Concurrency, Retries, and Timeouts: Building Reliable AI Agents in TypeScript
Why Promise.race leaks model calls and billing in AI agents, and how a single-owner pattern with AbortSignal, deadline budgets, and jittered retries fixes it.
An AI agent rarely does one thing at a time. A single turn might call a model, run three tool invocations in parallel, fetch a document, and query a vector store — each with its own latency curve, cost, and failure mode. When one task hangs, the reflexive fix is a timeout. When one fails, the reflexive fix is a retry. Stack both across a dozen concurrent tasks and you get a system that quietly burns tokens on work nobody is waiting for anymore.
The part most agent code gets wrong is ownership: who controls a task’s lifecycle once it has started.
Why Promise.race leaks work and money
The most common timeout in TypeScript looks like this:
```typescript
const result = await Promise.race([
  callModel(prompt),
  new Promise((_, reject) =>
    setTimeout(() => reject(new Error('timeout')), 30_000),
  ),
]);
```

It looks correct. It is not. Promise.race settles with whichever promise finishes first, but it has no power to stop the others. When the timeout wins, callModel(prompt) is still running. The HTTP request is still open. The provider is still streaming tokens you are still paying for. The promise just has nobody listening.
For one call that is a rounding error. For an agent that fans out several tool calls per turn across hundreds of turns, the leaked work compounds: orphaned model calls, connection-pool exhaustion, and a bill that does not reconcile with your logs.
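The leak is easy to reproduce without a real provider. The sketch below is illustrative only: slowCall, progress, raceWithTimeout, and demoLeak are hypothetical names, and slowCall merely counts ticks as a stand-in for streamed tokens. The timeout wins the race, yet the counter keeps climbing afterwards.

```typescript
// Hypothetical stand-in for a model call: it "streams" for several ticks
// and records progress so we can observe whether it kept running.
const progress = { ticks: 0 };

function sleep(ms: number): Promise<void> {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

async function slowCall(totalTicks: number, tickMs: number): Promise<string> {
  for (let i = 0; i < totalTicks; i++) {
    await sleep(tickMs);
    progress.ticks++; // each tick stands for tokens you would be billed for
  }
  return "done";
}

// The Promise.race timeout pattern, with a short limit for the demo.
function raceWithTimeout<T>(task: Promise<T>, ms: number): Promise<T> {
  return Promise.race([
    task,
    new Promise<never>((_, reject) =>
      setTimeout(() => reject(new Error("timeout")), ms),
    ),
  ]);
}

async function demoLeak(): Promise<{ atTimeout: number; later: number }> {
  let atTimeout = -1;
  try {
    await raceWithTimeout(slowCall(10, 5), 20);
  } catch {
    atTimeout = progress.ticks; // ticks completed when the timeout won
  }
  await sleep(400); // nobody is awaiting slowCall, yet it keeps going
  return { atTimeout, later: progress.ticks };
}
```

Calling demoLeak() and comparing the two fields shows that the abandoned task ran to completion after the caller had already moved on.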
One owner per task
The fix is to give every task a single owner holding three controls: the signal that cancels it, the timer that enforces its deadline, and the catch block that decides retries. AbortController is the primitive that ties them together.
```typescript
function withDeadline<T>(
  work: (signal: AbortSignal) => Promise<T>,
  ms: number,
  parent?: AbortSignal,
): Promise<T> {
  // Abort on whichever comes first: the parent's cancellation or this task's own deadline.
  const signal = parent
    ? AbortSignal.any([parent, AbortSignal.timeout(ms)])
    : AbortSignal.timeout(ms);
  return work(signal);
}
```

Two things matter. First, the signal is passed into the work, not wrapped around it. callModel must accept an AbortSignal and forward it to fetch; fetch supports this, and most current provider SDKs accept a signal as well. When the deadline fires, the socket closes and the provider stops generating. Second, AbortSignal.any (Node 20.3+, current browsers) lets a task be cancelled by either its own timeout or its parent. When a user cancels a turn, every in-flight tool call beneath it dies in one propagation instead of running to its own deadline.
That is the single-owner idea: a task is never cancelled by a race against an unrelated promise. It is cancelled by a signal its owner controls and that the task itself listens to.
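A minimal sketch of both points follows, under stated assumptions: the endpoint URL and request body in callModel are placeholders rather than any real provider's API, and fakeTool and runTurn are hypothetical helpers that stand in for real tool calls. withDeadline is repeated so the block is self-contained.

```typescript
// Same shape as withDeadline above, repeated so this sketch stands alone.
function withDeadline<T>(
  work: (signal: AbortSignal) => Promise<T>,
  ms: number,
  parent?: AbortSignal,
): Promise<T> {
  const signal = parent
    ? AbortSignal.any([parent, AbortSignal.timeout(ms)])
    : AbortSignal.timeout(ms);
  return work(signal);
}

// Hypothetical model call: the URL and payload shape are placeholders.
// The point is only that the signal reaches fetch.
async function callModel(prompt: string, signal: AbortSignal): Promise<string> {
  const res = await fetch("https://api.example.com/v1/generate", {
    method: "POST",
    headers: { "content-type": "application/json" },
    body: JSON.stringify({ prompt }),
    signal, // aborting the signal rejects the fetch and drops the request
  });
  return res.text();
}

// Hypothetical tool that honors cancellation, standing in for real work.
function fakeTool(name: string, ms: number, signal: AbortSignal): Promise<string> {
  return new Promise<string>((resolve, reject) => {
    if (signal.aborted) return reject(signal.reason);
    const timer = setTimeout(() => {
      signal.removeEventListener("abort", onAbort);
      resolve(name);
    }, ms);
    const onAbort = () => {
      clearTimeout(timer); // stop the work, not just the waiting
      reject(signal.reason);
    };
    signal.addEventListener("abort", onAbort, { once: true });
  });
}

// One turn owns one controller; every tool call beneath it inherits its signal.
function runTurn(turn: AbortController): Promise<PromiseSettledResult<string>[]> {
  return Promise.allSettled([
    withDeadline((s) => fakeTool("search", 5_000, s), 10_000, turn.signal),
    withDeadline((s) => fakeTool("lookup", 5_000, s), 10_000, turn.signal),
  ]);
}
```

Calling turn.abort() while runTurn is pending rejects both tool calls at once, long before either 10-second deadline would fire.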
Retries and timeouts share one budget
Retries and timeouts are usually written on different days by different people, and they fight. A 30-second per-attempt timeout with three retries is a two-minute worst case — long after the user gave up. The fix is one deadline budget that every retry draws down from, instead of a fresh timeout per attempt.
```typescript
// isRetryable and sleep are helpers assumed to be defined elsewhere.
async function retry<T>(
  work: (signal: AbortSignal) => Promise<T>,
  opts: { attempts: number; budgetMs: number; parent?: AbortSignal },
): Promise<T> {
  const start = Date.now();
  for (let i = 1; ; i++) {
    // Each attempt gets only what is left of the shared budget.
    const left = opts.budgetMs - (Date.now() - start);
    if (left <= 0) throw new Error('deadline exceeded');
    try {
      return await withDeadline(work, left, opts.parent);
    } catch (err) {
      // Give up once attempts run out or the error is not worth retrying.
      if (i >= opts.attempts || !isRetryable(err)) throw err;
      const backoff = Math.min(500 * 2 ** i, 8_000);
      await sleep(Math.random() * backoff); // full jitter
    }
  }
}
```

Three rules this enforces:
- Each attempt’s timeout is the remaining budget, so total wall-clock time never exceeds budgetMs.
- isRetryable must distinguish causes. Retry on 429, 503, and connection resets. Do not retry on 400 or 401: a malformed or unauthorized request fails identically every time, and you have tripled latency for nothing.
- Backoff uses full jitter (Math.random() * backoff), not a fixed delay. When a provider rate-limits your whole agent at once, synchronized retries arrive as a thundering herd and get rate-limited again.
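The retry function above leaves isRetryable and sleep undefined. One possible sketch follows, assuming errors expose either a numeric status (HTTP failures) or a Node-style string code (network failures). That error shape is an assumption, not any particular SDK's contract, and jitteredDelay is a hypothetical helper that isolates the backoff arithmetic.

```typescript
// Assumed error shape: HTTP failures carry a numeric `status`, and network
// failures carry a Node-style string `code`. Real SDKs differ; adapt the checks.
const RETRYABLE_STATUS = new Set<number>([429, 503]);
const RETRYABLE_CODES = new Set<string>(["ECONNRESET"]);

function isRetryable(err: unknown): boolean {
  if (typeof err !== "object" || err === null) return false;
  const e = err as { status?: unknown; code?: unknown };
  if (typeof e.status === "number") return RETRYABLE_STATUS.has(e.status);
  if (typeof e.code === "string") return RETRYABLE_CODES.has(e.code);
  return false; // unknown errors, including aborts, are not retried
}

function sleep(ms: number): Promise<void> {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

// Full jitter: pick uniformly between 0 and the capped exponential backoff.
function jitteredDelay(attempt: number, baseMs = 500, capMs = 8_000): number {
  const backoff = Math.min(baseMs * 2 ** attempt, capMs);
  return Math.random() * backoff;
}
```

Treating everything unrecognized as non-retryable is a deliberately conservative default: a cancelled or timed-out attempt has already spent its budget, so repeating it would only add cost.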
One trap is specific to agents: idempotency. Retrying a model call is safe — it has no side effect beyond cost. Retrying a tool call that sends an email, charges a card, or writes a row is not. Tag each tool as idempotent or not, and let only the idempotent ones into the retry path. The rest should fail loudly on the first error rather than repeat a side effect.
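One way to encode that tagging is a flag on each tool that gates how many attempts it gets. The Tool interface and runTool function below are hypothetical, and the loop is deliberately bare (no backoff, no budget) to keep the focus on the gate itself.

```typescript
// Hypothetical tool descriptor: `idempotent` is set by hand for each tool.
interface Tool<A, R> {
  name: string;
  idempotent: boolean;
  run: (args: A, signal: AbortSignal) => Promise<R>;
}

// Only idempotent tools get more than one attempt; the rest fail on first error.
async function runTool<A, R>(
  tool: Tool<A, R>,
  args: A,
  signal: AbortSignal,
  maxAttempts = 3,
): Promise<R> {
  const attempts = tool.idempotent ? maxAttempts : 1;
  let lastError: unknown;
  for (let i = 1; i <= attempts; i++) {
    try {
      return await tool.run(args, signal);
    } catch (err) {
      lastError = err;
      if (signal.aborted) break; // a cancelled turn must not be retried
    }
  }
  throw lastError;
}
```

In a fuller version, the idempotent branch would go through the budgeted retry function shown earlier rather than a plain loop, so that backoff and the shared deadline still apply.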
Wire these three patterns together and the payoff is structural: a cancelled turn stops all of its work, a slow provider cannot blow your latency budget, and a retry storm never amplifies an outage. None of it requires a framework — AbortController, AbortSignal.any, and a budget counter are enough.