pickuma.

Concurrency, Retries, and Timeouts: Building Reliable AI Agents in TypeScript

Why Promise.race leaks model calls and billing in AI agents, and how a single-owner pattern with AbortSignal, deadline budgets, and jittered retries fixes it.

An AI agent rarely does one thing at a time. A single turn might call a model, run three tool invocations in parallel, fetch a document, and query a vector store — each with its own latency curve, cost, and failure mode. When one task hangs, the reflexive fix is a timeout. When one fails, the reflexive fix is a retry. Stack both across a dozen concurrent tasks and you get a system that quietly burns tokens on work nobody is waiting for anymore.

The part most agent code gets wrong is ownership: who controls a task’s lifecycle once it has started.

Why Promise.race leaks work and money

The most common timeout in TypeScript looks like this:

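```typescript
// Anti-pattern: the timeout can win the race, but callModel keeps running.
const result = await Promise.race([
  callModel(prompt),
  new Promise<never>((_, reject) =>
    setTimeout(() => reject(new Error('timeout')), 30_000)),
]);
```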

It looks correct. It is not. Promise.race settles with whichever promise finishes first, but it has no power to stop the others. When the timeout wins, callModel(prompt) is still running. The HTTP request is still open. The provider is still streaming tokens you are still paying for. The promise just has nobody listening.

For one call that is a rounding error. For an agent that fans out several tool calls per turn across hundreds of turns, the leaked work compounds: orphaned model calls, connection-pool exhaustion, and a bill that does not reconcile with your logs.
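The sketch below makes the leak visible without a network or a bill. `fakeModelCall` is a hypothetical stand-in for callModel: it takes 200 ms and records when it finishes. The race settles at 50 ms, but the "model call" still runs to completion afterward.

```typescript
// Records what happens, in order.
const log: string[] = [];

// Hypothetical stand-in for a model call: resolves after `ms` and records completion.
function fakeModelCall(ms: number): Promise<string> {
  return new Promise((resolve) =>
    setTimeout(() => {
      log.push('model call finished'); // still runs after the race is lost
      resolve('tokens');
    }, ms),
  );
}

// The same timeout promise as in the anti-pattern above.
function timeout(ms: number): Promise<never> {
  return new Promise((_, reject) =>
    setTimeout(() => reject(new Error('timeout')), ms),
  );
}

async function demo(): Promise<void> {
  try {
    await Promise.race([fakeModelCall(200), timeout(50)]);
  } catch (err) {
    log.push(`race settled: ${(err as Error).message}`);
  }
}
```

When `await demo()` returns, `log` holds only the timeout entry. Roughly 150 ms later, `'model call finished'` is appended even though nothing is awaiting it. With a real provider, that tail is an open connection and billed tokens.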

One owner per task

The fix is to give every task a single owner holding three controls: the signal that cancels it, the timer that enforces its deadline, and the catch block that decides retries. AbortController is the primitive that ties them together.

```typescript
// The task's signal fires on its own deadline or when its parent (e.g. the turn) aborts.
function withDeadline<T>(
  work: (signal: AbortSignal) => Promise<T>,
  ms: number,
  parent?: AbortSignal,
): Promise<T> {
  const signal = parent
    ? AbortSignal.any([parent, AbortSignal.timeout(ms)])
    : AbortSignal.timeout(ms);
  return work(signal);
}
```

Two things matter. First, the signal is passed into the work, not wrapped around it. callModel must accept an AbortSignal and forward it to fetch. fetch accepts a signal natively, and the major provider SDKs let you pass one through. When the deadline fires, the request is aborted, the socket closes, and the provider can stop generating. Second, AbortSignal.any (Node 18.17+ and 20.3+, current browsers) lets a task be cancelled by either its own timeout or its parent. When a user cancels a turn, every in-flight tool call beneath it dies in one propagation instead of running to its own deadline.
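Here is a sketch of what "forward it to fetch" means in practice. The endpoint URL and JSON body are placeholders, not any particular provider's API. The only load-bearing line is passing `signal` into the fetch options.

```typescript
// Hypothetical endpoint and payload shape; substitute your provider's API or SDK call.
const MODEL_URL = 'https://api.example.com/v1/complete';

async function callModel(prompt: string, signal: AbortSignal): Promise<string> {
  // Forwarding the signal is the point: aborting it aborts the HTTP request itself.
  const res = await fetch(MODEL_URL, {
    method: 'POST',
    headers: { 'content-type': 'application/json' },
    body: JSON.stringify({ prompt }),
    signal,
  });
  if (!res.ok) throw new Error(`model call failed: ${res.status}`);
  // Reading the body also rejects if the signal aborts mid-response.
  return res.text();
}
```

Wrapped as `withDeadline((signal) => callModel(prompt, signal), 30_000, turn.signal)`, the request is aborted by whichever fires first: the 30-second deadline or the turn's cancellation.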

That is the single-owner idea: a task is never cancelled by a race against an unrelated promise. It is cancelled by a signal its owner controls and that the task itself listens to.
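The fan-out case can be shown in a self-contained sketch. One turn launches three tool calls through withDeadline, and cancelling the turn's AbortController rejects all three at once. `fakeTool` and `runTurn` are hypothetical names. What matters is that each task listens to its signal and clears its own timer on abort, which is what "the task itself listens to" requires.

```typescript
// Same helper as above: a task's signal fires on its own timeout or its parent's abort.
function withDeadline<T>(
  work: (signal: AbortSignal) => Promise<T>,
  ms: number,
  parent?: AbortSignal,
): Promise<T> {
  const signal = parent
    ? AbortSignal.any([parent, AbortSignal.timeout(ms)])
    : AbortSignal.timeout(ms);
  return work(signal);
}

// Hypothetical tool: resolves after `ms` unless its signal aborts first.
function fakeTool(name: string, ms: number) {
  return (signal: AbortSignal) =>
    new Promise<string>((resolve, reject) => {
      if (signal.aborted) return reject(signal.reason);
      const timer = setTimeout(() => resolve(name), ms);
      signal.addEventListener(
        'abort',
        () => {
          clearTimeout(timer); // stop the work itself, not just the promise
          reject(signal.reason);
        },
        { once: true },
      );
    });
}

// One turn fans out three tool calls, all owned by the turn's signal.
function runTurn(turn: AbortSignal) {
  return Promise.allSettled([
    withDeadline(fakeTool('search', 1_000), 5_000, turn),
    withDeadline(fakeTool('fetch', 1_000), 5_000, turn),
    withDeadline(fakeTool('vector', 1_000), 5_000, turn),
  ]);
}
```

Calling `turn.abort()` 50 ms after `runTurn(turn.signal)` settles all three results as rejected with an `AbortError`. This happens long before their 1-second work or 5-second deadlines.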

Retries and timeouts share one budget

Retries and timeouts are usually written on different days by different people, and they fight. A 30-second per-attempt timeout with three retries is a two-minute worst case — long after the user gave up. The fix is one deadline budget that every retry draws down from, instead of a fresh timeout per attempt.

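```typescript
// Defined by you: classify errors by cause (see the rules below).
declare function isRetryable(err: unknown): boolean;

const sleep = (ms: number) =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

async function retry<T>(
  work: (signal: AbortSignal) => Promise<T>,
  opts: { attempts: number; budgetMs: number; parent?: AbortSignal },
): Promise<T> {
  const start = Date.now();
  const remaining = () => opts.budgetMs - (Date.now() - start);
  for (let i = 1; ; i++) {
    // Each attempt's timeout is what is left of the shared budget.
    const left = remaining();
    if (left <= 0) throw new Error('deadline exceeded');
    try {
      return await withDeadline(work, left, opts.parent);
    } catch (err) {
      // Give up on the last attempt, on non-retryable errors, or if the turn was cancelled.
      if (i >= opts.attempts || !isRetryable(err) || opts.parent?.aborted) throw err;
      const backoff = Math.min(500 * 2 ** i, 8_000);
      // Full jitter, capped so the sleep cannot outlast the budget.
      await sleep(Math.min(Math.random() * backoff, Math.max(remaining(), 0)));
    }
  }
}
```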

Three rules this enforces:

  • Each attempt’s timeout is the remaining budget, so total wall-clock time never exceeds budgetMs.
  • isRetryable must distinguish causes. Retry on 429, 503, and connection resets. Do not retry on 400 or 401 — a malformed or unauthorized request fails identically every time, and you have tripled latency for nothing.
  • Backoff uses full jitter (Math.random() * backoff), not a fixed delay. When a provider rate-limits your whole agent at once, synchronized retries arrive as a thundering herd and get rate-limited again.
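The sketch below shows one possible shape for isRetryable and the jittered delay. It assumes SDK errors expose a numeric `status` and socket errors a Node-style `code`. That error shape is an assumption, so check what your SDK actually throws. Aborts and timeouts are treated as final: with a shared budget, a timeout means the budget is already gone.

```typescript
// Assumed error shape: SDK errors carry an HTTP `status`, socket errors a Node-style `code`.
type AgentError = { name?: string; status?: number; code?: string; cause?: unknown };

const RETRYABLE_STATUS = new Set([429, 503]);

function isRetryable(err: unknown): boolean {
  if (typeof err !== 'object' || err === null) return false;
  const e = err as AgentError;
  // A cancelled turn or an exhausted deadline must never be retried.
  if (e.name === 'AbortError' || e.name === 'TimeoutError') return false;
  // 400, 401 and other client errors fall through to false.
  if (e.status !== undefined) return RETRYABLE_STATUS.has(e.status);
  if (e.code === 'ECONNRESET') return true;
  // Node's fetch wraps socket failures in TypeError('fetch failed') with the real error in `cause`.
  return e.cause !== undefined && isRetryable(e.cause);
}

// Full jitter: a uniform delay in [0, min(capMs, baseMs * 2^attempt)).
function backoffDelay(attempt: number, baseMs = 500, capMs = 8_000): number {
  return Math.random() * Math.min(baseMs * 2 ** attempt, capMs);
}
```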

One trap is specific to agents: idempotency. Retrying a model call is safe — it has no side effect beyond cost. Retrying a tool call that sends an email, charges a card, or writes a row is not. Tag each tool as idempotent or not, and let only the idempotent ones into the retry path. The rest should fail loudly on the first error rather than repeat a side effect.
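The idempotency tagging can be sketched with a hypothetical `Tool` interface. The attempt loop here is deliberately bare (no backoff, no budget) so the routing decision stands out. In a real agent the idempotent branch would go through the budgeted retry helper instead.

```typescript
// Hypothetical tool shape: each tool declares whether repeating it is safe.
interface Tool<T> {
  name: string;
  idempotent: boolean;
  run: (signal: AbortSignal) => Promise<T>;
}

// Only idempotent tools get more than one attempt; the rest fail on the first error.
async function runTool<T>(tool: Tool<T>, signal: AbortSignal, maxAttempts = 3): Promise<T> {
  const attempts = tool.idempotent ? maxAttempts : 1;
  let lastError: unknown;
  for (let i = 0; i < attempts; i++) {
    try {
      return await tool.run(signal);
    } catch (err) {
      lastError = err;
      if (signal.aborted) break; // a cancelled turn never retries
    }
  }
  throw lastError;
}
```

A `send_email` tool that throws is attempted exactly once, and its error surfaces. A `search` tool that fails once and then succeeds gets its second attempt.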

Wire these three patterns together and the payoff is structural: a cancelled turn stops all of its work, a slow provider cannot blow your latency budget, and a retry storm never amplifies an outage. None of it requires a framework. AbortController, AbortSignal.any, and a budget counter are enough.

FAQ

Does AbortController actually stop an LLM API call, or just the local promise?
It stops the call if the signal reaches the underlying fetch. AbortController aborts the HTTP request, the provider sees the connection close, and generation typically stops there. The catch is that you must thread the signal all the way down, through your SDK wrapper and into fetch. If any layer drops it, abort only settles your local promise while the request keeps running and billing server-side.
When should I retry a failed agent step instead of failing the whole turn?
Retry transient, idempotent failures within the deadline budget: rate limits (429), provider 503s, and connection resets. Fail the turn for client errors like 400 and 401, for non-idempotent tool calls that already had partial effects, and once the budget is exhausted. A retry that cannot change the outcome only adds latency.
Is Promise.race ever the right tool for a timeout?
It is fine when the losing task has no cost and no side effects — racing two in-memory cache lookups, for example. It is wrong whenever the loser holds a network connection, bills per token, or mutates state, because race cannot cancel it. For anything that touches a provider API, use an AbortSignal the task actually listens to.
