pickuma.

Temporal Hits 3,000 Customers: Durable Execution for AI Agent Workflows

Temporal's durable execution engine crossed 3,000 paying customers as teams building long-running LLM agents swap DIY retry code for crash-proof workflows. We break down what durable execution buys you and where it costs you.


Temporal says it crossed 3,000 paying customers. The number on its own is a vanity metric — what’s interesting is who’s signing up. A growing share are teams building AI agents: long-running LLM pipelines that call models, hit tools, wait on humans, and have to survive a process restart in the middle of all of it.

If you’ve shipped an agent that runs longer than a single request, you know the failure mode. The model call times out on step 9 of 14. Your worker gets redeployed mid-run. A tool API returns a 429. The agent loop was holding all of its state in memory, and now that state is gone. Temporal’s pitch is that this class of bug should not be your problem. We read through its docs and SDKs to see how well that holds up for agent workloads specifically.

What durable execution actually changes

Temporal is a workflow engine built around one idea: your workflow code runs as if the machine never fails. You write an ordinary function (call a model, branch on the result, sleep for an hour, call a tool) and Temporal makes that function’s execution durable. If the process running it dies, another worker picks the workflow up and continues from where it left off.

It does this with event sourcing. Every step a workflow takes — every activity it schedules, every timer it sets, every signal it receives — is appended to an event history stored by the Temporal service. When a worker resumes a workflow, it replays that history to rebuild in-memory state, then continues. The workflow function never persists anything explicitly. You do not write checkpoint code.
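
The replay idea can be sketched in a few lines of plain Python. This is a toy model, not the Temporal SDK: `ToyWorkflowContext`, `run_activity`, and `agent_workflow` are hypothetical names, and the "history" is just a list of recorded activity results. It assumes a worker died after two of three steps and a second worker resumes from the stored history.

```python
from typing import Any, Callable, List, Optional


class ToyWorkflowContext:
    """Toy stand-in for an event history: records results, replays them on resume."""

    def __init__(self, history: Optional[List[Any]] = None) -> None:
        self.history: List[Any] = list(history or [])
        self._cursor = 0                  # position in the history during replay
        self.executed: List[str] = []     # activities actually run by this worker

    def run_activity(self, name: str, fn: Callable[[], Any]) -> Any:
        if self._cursor < len(self.history):
            # Replay: the result is already recorded, so skip the side effect.
            result = self.history[self._cursor]
        else:
            # New work: run the side effect and append its result to the history.
            result = fn()
            self.history.append(result)
            self.executed.append(name)
        self._cursor += 1
        return result


def agent_workflow(ctx: ToyWorkflowContext) -> str:
    plan = ctx.run_activity("plan", lambda: "search docs")
    found = ctx.run_activity("tool", lambda: f"result for {plan}")
    return ctx.run_activity("summarize", lambda: found.upper())


# First worker completes two steps, then "crashes" before the third.
first = ToyWorkflowContext()
first.run_activity("plan", lambda: "search docs")
first.run_activity("tool", lambda: "result for search docs")

# Second worker starts the function from the top, fed the stored history.
second = ToyWorkflowContext(history=first.history)
output = agent_workflow(second)
```

In this sketch the second worker re-enters the function at the top, but only `summarize` actually runs; the first two results come from the recorded history.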

The core of the model is a split: workflow code is the deterministic orchestration layer, and activities are the side effects. An activity is a plain function: an HTTP call to a model API, a database write, a tool invocation. Activities fail and get retried independently, with backoff policies you set per activity instead of hand-rolling. The workflow that called them never sees the retries; it sees the eventual result, or a failure once the retry policy is exhausted.
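
To make "a backoff policy per activity" concrete, here is a minimal sketch of what such a policy does, written by hand in plain Python. The names `ToyRetryPolicy` and `run_activity` and the field names are assumptions for illustration, not Temporal's API; the `sleep` function is injected so the example runs without waiting.

```python
from dataclasses import dataclass
from typing import Callable, List, TypeVar

T = TypeVar("T")


@dataclass(frozen=True)
class ToyRetryPolicy:
    # Hypothetical field names, chosen for this sketch only.
    max_attempts: int = 3
    initial_interval_s: float = 1.0
    backoff_coefficient: float = 2.0


def run_activity(fn: Callable[[], T], policy: ToyRetryPolicy,
                 sleep: Callable[[float], None]) -> T:
    """Call fn until it succeeds or the policy runs out of attempts."""
    if policy.max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    delay = policy.initial_interval_s
    attempt = 1
    while True:
        try:
            return fn()
        except Exception:
            if attempt >= policy.max_attempts:
                raise  # retries exhausted: the caller finally sees the failure
            sleep(delay)
            delay *= policy.backoff_coefficient  # exponential backoff
            attempt += 1


calls = {"n": 0}


def flaky_model_call() -> str:
    # Simulated model API that is rate limited twice, then succeeds.
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("429 Too Many Requests")
    return "model output"


delays: List[float] = []  # records the waits instead of actually sleeping
result = run_activity(flaky_model_call, ToyRetryPolicy(max_attempts=5),
                      sleep=delays.append)
```

The caller only receives `"model output"`; the two failed attempts and the 1 s and 2 s waits stay inside `run_activity`, which is the property the paragraph above describes.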

For an agent, the mapping is direct. The loop (decide, act, observe, repeat) becomes a workflow. Each model call and each tool call becomes an activity. A six-hour sleep becomes a durable timer that ties up no worker while it waits and survives any number of deploys. Waiting on a human approval becomes a signal: the workflow blocks until your app sends one, even if that takes three days.
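
The "block until a signal arrives" shape can be illustrated with a plain Python generator. This is only an in-memory analogy under stated assumptions: `approval_workflow` is a hypothetical name, and nothing here is durable, whereas in Temporal the wait would survive restarts because it is backed by the event history.

```python
from typing import Generator


def approval_workflow(draft: str) -> Generator[str, str, str]:
    # Execution pauses at the yield until a decision is sent in.
    decision = yield "waiting_for_approval"
    return f"{draft}: {decision}"


run = approval_workflow("refund $40")
status = next(run)  # advance the workflow to its wait point

try:
    run.send("approved")  # the "signal" from your app, whenever it arrives
except StopIteration as finished:
    outcome = finished.value  # the workflow's return value
```

Between `next(run)` and `run.send(...)` the function is suspended mid-body with its local variables intact, which is the behavior a durable workflow offers across hours or days rather than within a single process.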

The DIY retry code you are replacing

Most agent projects start without any of this. The loop lives in one process, state lives in a variable, and reliability is whatever try/except and a retry decorator give you. That works in a notebook. It stops working the first time a run outlives the process that started it.

The two common upgrades both have sharp edges. The first is scattering retry logic — tenacity in Python, a backoff wrapper in TypeScript — around every external call. It handles transient failures and does nothing for a crash. If the process dies, the half-finished run dies with it, and you have no record of where it was. You also end up with retry policy duplicated across a dozen call sites, each one slightly different.

The second is a job queue: Celery, BullMQ, SQS with workers. Queues are good at fan-out and at surviving restarts, but they push a different cost onto you. A multi-step run becomes several queued jobs, and now you own the glue: persisting state between steps, making every step idempotent so a redelivered message does not double-charge a model call, and reconstructing which step the run was on after a failure. You are building a workflow engine, badly, one queue at a time.
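
As a rough illustration of the glue a queue pushes onto you, here is a minimal idempotency-key sketch. `IdempotentStepRunner`, the key format, and the in-memory dict are all assumptions; a real version would need a durable store and would still have to handle a crash between running the step and saving its result.

```python
from typing import Callable, Dict


class IdempotentStepRunner:
    """Runs each step at most once per key, even if the message is redelivered."""

    def __init__(self) -> None:
        # Stands in for a durable store such as a database table.
        self._results: Dict[str, str] = {}

    def run(self, idempotency_key: str, step: Callable[[], str]) -> str:
        if idempotency_key in self._results:
            # Redelivery: return the saved result instead of repeating the call.
            return self._results[idempotency_key]
        result = step()
        self._results[idempotency_key] = result
        return result


billed = {"calls": 0}


def call_model() -> str:
    billed["calls"] += 1  # each real call costs money
    return "summary text"


runner = IdempotentStepRunner()
key = "run-123:step-9"  # hypothetical key: run id plus step index
first_delivery = runner.run(key, call_model)
second_delivery = runner.run(key, call_model)  # same message delivered again
```

Both deliveries return the same text and the model is billed once. Multiply this by every step, add state hand-off between steps, and you have the hand-built workflow engine the paragraph warns about.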

Temporal collapses that work. State between steps is the workflow’s own local variables, persisted for you. Idempotency gets easier because a replayed workflow does not re-run activities that already completed; it reads their results from history. An activity that fails or crashes before its completion is recorded can still be retried, so side effects that cost money should stay idempotent. Which step the run is on is answered by the event history, visible in a UI you did not build. Retry policy lives in one place per activity.

The honest version: you do not adopt Temporal to write less code on day one. You adopt it so the reliability code you would otherwise write, and keep rewriting, is no longer yours to maintain. Building it out does mean writing typed SDK code — workflow definitions, activity stubs, worker registration — and that is where an AI-native editor earns its place.

Where Temporal makes you pay

None of this is free in effort. Three costs are worth knowing before you commit.

Determinism is the big one. Workflow code is replayed, so it cannot do anything non-deterministic directly: no wall-clock reads such as Date.now(), no random number generation, no direct network calls, no reading a file. Those go through activities or the SDK’s deterministic equivalents. Break the rule and a replay diverges from history, which surfaces as a non-determinism error when a worker next replays the workflow, often right after a deploy or restart. The constraint is learnable, but it is a real shift in how you write the orchestration layer.
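
A toy model shows why a replay diverges. In this sketch, which is not Temporal's implementation, a workflow emits a list of commands, and replay simply re-runs the function and compares. `NonDeterminismError`, `run_workflow`, `replay`, and `branching_workflow` are hypothetical names; `coin` stands in for a random call that returns a different value on each execution.

```python
from typing import Callable, List


class NonDeterminismError(Exception):
    """Raised when replayed commands do not match the recorded history."""


Workflow = Callable[[Callable[[str], None], Callable[[], float]], None]


def run_workflow(workflow: Workflow, coin: Callable[[], float]) -> List[str]:
    commands: List[str] = []
    workflow(commands.append, coin)
    return commands


def replay(workflow: Workflow, history: List[str],
           coin: Callable[[], float]) -> None:
    commands = run_workflow(workflow, coin)
    if commands != history:
        raise NonDeterminismError(f"history {history} != replay {commands}")


def branching_workflow(schedule: Callable[[str], None],
                       coin: Callable[[], float]) -> None:
    schedule("call_model")
    # Bug: branches on a value that differs between the original run and replay.
    schedule("search_tool" if coin() < 0.5 else "calculator_tool")


# Original execution: the random value happened to be 0.1.
history = run_workflow(branching_workflow, coin=lambda: 0.1)

# Replay on another worker: the random value is now 0.9.
try:
    replay(branching_workflow, history, coin=lambda: 0.9)
    diverged = False
except NonDeterminismError:
    diverged = True
```

The fix in this toy model would be to obtain the random value through a recorded step, so replay reads the same 0.1 from history instead of drawing a new number.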

Versioning is the second. Because old workflows replay old history, changing a running workflow’s code can break in-flight executions. Temporal gives you patching APIs for this, but long-lived agent workflows — ones that sleep for days — mean you will hit it. You have to treat code changes the way you treat database migrations.
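
The idea behind patching can be sketched as a branch that new executions take and old, in-flight executions skip. This is a simplified model under stated assumptions, not the SDK's API: `ToyRun`, its `patched` method, and the patch id are hypothetical, and real patching has more states than this.

```python
from typing import Iterable, List, Optional, Set


class ToyRun:
    """Tracks which patches a workflow execution was started with."""

    def __init__(self, recorded_patches: Optional[Iterable[str]] = None,
                 replaying: bool = False) -> None:
        self.replaying = replaying
        self.patches: Set[str] = set(recorded_patches or [])

    def patched(self, patch_id: str) -> bool:
        if self.replaying:
            # Old execution: take the new path only if it was recorded originally.
            return patch_id in self.patches
        # New execution: record the marker and take the new path.
        self.patches.add(patch_id)
        return True


def agent_workflow(run: ToyRun) -> List[str]:
    steps = ["call_model"]
    if run.patched("add-guardrail-check"):
        steps.append("guardrail_check")  # step added in the new deployment
    steps.append("call_tool")
    return steps


# A run that started before the deploy and is now being replayed.
old_steps = agent_workflow(ToyRun(replaying=True))

# A run that starts after the deploy.
new_run = ToyRun()
new_steps = agent_workflow(new_run)
```

The same deployed code serves both generations of runs, which is why the migration analogy fits: the old branch can only be deleted once no in-flight execution still depends on it.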

Operations is the third. Self-hosting means running the service plus a database and keeping event history from growing without bound. Temporal Cloud removes that, but its usage-based billing scales with how many actions your workflows take, and a chatty agent loop generates a lot of actions. Model the cost before you move a high-volume workload onto it.
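
A back-of-the-envelope model is enough to see how a chatty loop adds up. Everything numeric below is a placeholder: the price per million actions is hypothetical, and the assumption that a workflow start, each activity, each timer, and each signal counts as one action should be checked against current Temporal Cloud pricing. Retries would add further actions on top.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentRunProfile:
    loop_iterations: int            # decide/act/observe cycles per run
    activities_per_iteration: int   # e.g. one model call plus one tool call
    timers_per_run: int
    signals_per_run: int


def actions_per_run(p: AgentRunProfile) -> int:
    # Assumption: 1 action for the workflow start, 1 per activity, timer, signal.
    return (1 + p.loop_iterations * p.activities_per_iteration
            + p.timers_per_run + p.signals_per_run)


def monthly_cost(p: AgentRunProfile, runs_per_month: int,
                 price_per_million_actions: float) -> float:
    total_actions = actions_per_run(p) * runs_per_month
    return total_actions * price_per_million_actions / 1_000_000


profile = AgentRunProfile(loop_iterations=14, activities_per_iteration=2,
                          timers_per_run=1, signals_per_run=1)
per_run = actions_per_run(profile)

# HYPOTHETICAL price, for illustration only; substitute the real rate.
estimate = monthly_cost(profile, runs_per_month=200_000,
                        price_per_million_actions=50.0)
```

With these placeholder inputs a 14-step agent comes to 31 actions per run and 6.2 million actions a month, so the per-action rate matters far more than it would for a workflow with three or four steps.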

For a single short-lived agent call, Temporal is overkill; a plain retry wrapper is the right tool. The line to cross is when runs are long, span multiple services, wait on humans or timers, or cannot afford to lose state. That is the kind of workload behind the agent teams in the 3,000-customer figure, and one that previously meant stitching together retries, queues, and hand-built state tracking.

FAQ

Does Temporal replace an agent framework like LangGraph or CrewAI?
No — it sits underneath one. Temporal handles durability and orchestration; the framework handles prompt construction, tool definitions, and model routing. A common pattern runs the framework's reasoning step as a Temporal activity, with the agent loop itself as the workflow. You can also skip the framework and write the loop directly.
Can I add Temporal to an existing agent without a rewrite?
Partly. Reasoning and tool calls usually become activities with little change, since they are already plain functions. The cost is the loop itself: it has to become deterministic workflow code, which means moving anything that touches time, randomness, or the network into activities. For a small agent that is an afternoon; for a large one, plan for a few days.
Is the open-source version enough, or do I need Temporal Cloud?
The open-source server is fully functional, and Temporal Cloud runs the same code. Self-hosting is reasonable for a service or two if you are comfortable operating a stateful service and its database. Teams tend to move to Cloud once the work of managing event-history retention and scaling outweighs the per-action bill.
