Temporal Cloud Serverless: Durable Execution Without the Ops Overhead

Temporal Cloud now lets you run durable workflows on AWS Lambda with zero infrastructure management. Here's what changed, what the tradeoffs are, and whether it fits your workload.

If you’ve evaluated Temporal before and decided the ops surface was too heavy, the picture has shifted. At Replay 2026, Temporal announced Serverless Workers — currently in pre-release — which run your Temporal Workers on AWS Lambda rather than a persistent fleet you manage. The core programming model stays the same, but Temporal now handles invoking, scaling, and shutting down the Lambda functions based on queue depth. You write the same Workflows and Activities you’d write for a self-hosted cluster; what disappears is the always-on compute bill and the autoscaling strategy.

Before the specifics, it helps to be clear about what Temporal is and why the serverless announcement matters in context.

What Durable Execution Actually Means

Temporal’s core abstraction is that your code runs to completion regardless of failures — process crashes, network partitions, infrastructure restarts. It achieves this by recording every step of a Workflow’s execution as an event history on the Temporal Service. If a Worker crashes mid-execution, another Worker picks up the history, replays it to reconstruct in-memory state, and continues from where things stopped.
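To make the replay idea concrete, here is a toy sketch in plain Python. It is not the Temporal SDK: `EventHistory`, `run_workflow`, `WorkerCrash`, and the step names are hypothetical stand-ins, and a real event history records far more than step results. The sketch only illustrates the mechanism described above, where a second Worker reuses recorded results instead of repeating completed steps.

```python
from typing import Callable, Dict, List


class WorkerCrash(Exception):
    """Simulates a Worker process dying mid-execution."""


class EventHistory:
    """Hypothetical stand-in for the event history kept by the Temporal Service."""

    def __init__(self) -> None:
        self.results: List[str] = []  # one recorded result per completed step


def run_workflow(history: EventHistory,
                 steps: List[Callable[[], str]],
                 crash_before_step: int = -1) -> List[str]:
    """Run steps in order, reusing recorded results instead of re-executing."""
    state: List[str] = []
    for index, step in enumerate(steps):
        if index < len(history.results):
            # Replay: this step already completed, so take its recorded result.
            state.append(history.results[index])
            continue
        if index == crash_before_step:
            raise WorkerCrash(f"worker died before step {index}")
        result = step()                 # the side effect happens only here
        history.results.append(result)  # record it before moving on
        state.append(result)
    return state


# Count how often each side-effecting step really runs.
calls: Dict[str, int] = {"debit": 0, "credit": 0, "notify": 0}


def make_step(name: str) -> Callable[[], str]:
    def step() -> str:
        calls[name] += 1
        return f"{name}-done"
    return step


steps = [make_step("debit"), make_step("credit"), make_step("notify")]
history = EventHistory()

try:
    run_workflow(history, steps, crash_before_step=2)  # first Worker crashes
except WorkerCrash:
    pass

final_state = run_workflow(history, steps)  # second Worker replays, then finishes
print(final_state)
print(calls)
```

In this toy model each step runs once even though the workflow function runs twice, because the second run rebuilds its state from the recorded results.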

The practical result: you write business logic as ordinary functions without embedding retry loops, checkpoint files, or manual state management. A Workflow that transfers funds, processes a batch of documents, or runs a multi-step ML pipeline looks like sequential code. The durability comes from Temporal’s event log, not from your code’s defensive patterns.

The unit of work is split into two layers. Workflows define the control flow — what happens, in what order, with what branching logic. Activities are the side-effectful units that talk to databases, APIs, or external services. Activities get automatic retry policies; Workflows don’t execute side effects directly, which is what makes replay safe.

Workers are the processes that actually execute this code. They poll a Task Queue on the Temporal Service, pull tasks, run them, and report results back. Traditional Temporal deployments require you to run long-lived Worker processes — on Kubernetes, EC2, ECS, wherever — and manage their scaling yourself.

Serverless Workers: What Changed at Replay 2026

Serverless Workers are a different lifecycle model for the same programming model. Instead of a long-running process polling the queue continuously, you upload your Worker code to AWS Lambda, create a cross-account IAM role using a Temporal-provided CloudFormation template, and register the Lambda ARN with Temporal via CLI or UI.

From there, Temporal watches the Task Queue metrics — specifically the backlog count and sync match rate — and decides when to invoke your Lambda. When tasks arrive, Temporal assumes the IAM role in your account and triggers the function. The Worker processes available tasks and shuts down before Lambda’s maximum invocation duration.
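The "shuts down before Lambda’s maximum invocation duration" behavior can be pictured as a deadline-aware drain loop. The sketch below is an illustration under stated assumptions, not Temporal’s implementation: `drain_until_deadline`, `FakeClock`, and the 30-second safety margin are hypothetical, and tasks are reduced to their expected durations. In a real Python Lambda handler, the remaining time would come from the context object’s `get_remaining_time_in_millis()` method rather than a simulated clock.

```python
from collections import deque
from typing import Callable, Deque, List

SAFETY_MARGIN_S = 30.0  # assumed buffer for reporting results and cleanup


def drain_until_deadline(tasks: Deque[float],
                         remaining_seconds: Callable[[], float],
                         run_task: Callable[[float], None]) -> List[float]:
    """Process queued tasks while enough invocation time is left.

    Each task is represented by its expected duration in seconds.
    Returns the durations of the tasks that were completed.
    """
    completed: List[float] = []
    while tasks:
        expected = tasks[0]
        if remaining_seconds() - expected < SAFETY_MARGIN_S:
            break  # leave the task queued for the next invocation
        run_task(tasks.popleft())
        completed.append(expected)
    return completed


class FakeClock:
    """Simulated invocation clock so the example runs instantly."""

    def __init__(self, limit_s: float) -> None:
        self.limit_s = limit_s
        self.elapsed_s = 0.0

    def remaining(self) -> float:
        return self.limit_s - self.elapsed_s

    def spend(self, seconds: float) -> None:
        self.elapsed_s += seconds


clock = FakeClock(limit_s=900.0)  # Lambda's 15-minute maximum
queue: Deque[float] = deque([300.0, 300.0, 200.0, 120.0])
done = drain_until_deadline(queue, clock.remaining, clock.spend)
print(done, list(queue))
```

With these numbers the loop completes the first three tasks and leaves the last one queued, since running it would eat into the safety margin.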

The setup is intentionally minimal: three steps, standard SDK code, no new APIs to learn. The pre-release currently supports Go, Python, and TypeScript SDKs. Google Cloud Run support is listed as coming.

The scaling model changes meaningfully. With a traditional Worker fleet, you define autoscaling policies and pay for minimum capacity even during quiet periods. With Serverless Workers, compute runs only when tasks exist. For workloads that are bursty, infrequent, or unpredictable in volume — background jobs, triggered pipelines, intermittent integrations — this eliminates a real cost and operational surface.

The Constraint You Can’t Ignore

Lambda imposes a maximum invocation duration of 15 minutes. Temporal handles this cleanly at the Workflow level — a Workflow can span arbitrarily many Lambda invocations across its lifetime, because the state lives in the event log, not in the process. But individual Activities are bounded by that 15-minute ceiling.

If you have an Activity that calls a slow external API, runs a database migration, or performs a computation that regularly takes longer than 15 minutes, Serverless Workers are the wrong fit for that Activity. Long-running Workflows are supported; long-running Activities within a single invocation are not. This is a real limitation for ML training steps, video encoding, or any processing that cannot be broken into chunks under the time limit.
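Where the work can be split, the chunking can be planned up front. The following is a minimal sketch, assuming you have per-item runtime estimates; `plan_chunks` and the 10-minute budget are hypothetical choices, not part of any Temporal API. Each resulting chunk would then run as its own Activity execution.

```python
from typing import List, Sequence

LAMBDA_LIMIT_S = 15 * 60  # Lambda's maximum invocation duration
CHUNK_BUDGET_S = 10 * 60  # assumed target, leaving headroom under the limit
assert CHUNK_BUDGET_S < LAMBDA_LIMIT_S


def plan_chunks(item_seconds: Sequence[float],
                budget_s: float = CHUNK_BUDGET_S) -> List[List[int]]:
    """Group item indexes into chunks whose estimated runtime fits the budget.

    Raises ValueError if a single item cannot fit, since that work cannot be
    split further and needs a long-lived Worker instead.
    """
    chunks: List[List[int]] = []
    current: List[int] = []
    current_s = 0.0
    for index, seconds in enumerate(item_seconds):
        if seconds > budget_s:
            raise ValueError(
                f"item {index} needs {seconds}s, over the {budget_s}s budget")
        if current and current_s + seconds > budget_s:
            # Adding this item would overflow the chunk, so start a new one.
            chunks.append(current)
            current, current_s = [], 0.0
        current.append(index)
        current_s += seconds
    if current:
        chunks.append(current)
    return chunks


estimates = [240.0, 240.0, 240.0, 500.0, 90.0, 30.0]  # seconds per item
plan = plan_chunks(estimates)
print(plan)
```

The `ValueError` branch is the useful signal here: an item that cannot fit the budget on its own is exactly the kind of Activity that belongs on a traditional long-lived Worker.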

The Temporal team is candid about this tradeoff in the documentation. It is not an edge case you can work around; it is an architectural constraint of the underlying compute platform.

Why This Matters for AI Agent Workflows

The timing of the serverless announcement is not accidental. AI agent architectures have become one of Temporal’s fastest-growing use cases, and the two are naturally complementary for reasons that go beyond marketing alignment.

Agentic workflows are structurally difficult: they run for unpredictable durations, call unreliable APIs (LLM providers, external tools, retrieval systems), branch based on model outputs, and need to be observable and recoverable when something goes wrong. Temporal’s primitives address each of these directly.

Also announced at Replay 2026 alongside Serverless Workers:

  • Workflow Streams (public preview): A durable streaming primitive using Signals and Updates that delivers incremental outputs — useful for streaming token-by-token LLM responses through a durable layer rather than buffering everything in memory.
  • External Payload Storage (public preview for Python and Go): Routes large inputs and outputs through Amazon S3 or custom storage drivers, sidestepping Temporal’s payload size limits when you’re passing large context windows or embedding vectors between steps.
  • Google ADK and OpenAI Agents SDK integrations: Official integrations that give agent frameworks access to Temporal’s durability primitives without manual wiring.

For multi-agent systems specifically, Temporal’s Signals and Queries give you a structured inter-agent messaging layer backed by the event log. Each agent is a separate Workflow; Signals pass messages between them; Queries expose current state without mutating it. The Temporal UI records every inter-agent communication with timestamps and inputs, which converts the usual opacity of agent orchestration into something you can actually inspect and debug.
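The Signal and Query distinction can be sketched with a toy model. This is plain Python, not the Temporal SDK: `AgentWorkflow`, `signal`, and `query_pending` are hypothetical stand-ins that only mirror the semantics described above, where a Signal delivers a message and is recorded, and a Query reads state without changing anything.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class AgentWorkflow:
    """Hypothetical stand-in for one agent running as a Workflow."""
    name: str
    inbox: List[str] = field(default_factory=list)
    event_log: List[Tuple[str, str, str]] = field(default_factory=list)

    def signal(self, sender: str, message: str) -> None:
        """Signal: delivers a message and records it in the log."""
        self.event_log.append(("signal", sender, message))
        self.inbox.append(message)

    def query_pending(self) -> int:
        """Query: reads current state without changing it or the log."""
        return len(self.inbox)


planner = AgentWorkflow("planner")
researcher = AgentWorkflow("researcher")

researcher.signal(sender=planner.name, message="summarize the new document")
researcher.signal(sender=planner.name, message="extract citations")

log_before = list(researcher.event_log)
pending = researcher.query_pending()
log_unchanged = researcher.event_log == log_before
print(pending, log_unchanged)
```

The recorded log is what makes the exchange inspectable afterwards: every message between the two agents leaves an entry, while reads leave none.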

The Serverless Workers model fits agent workloads that are event-triggered — a new document arrives, a user submits a form, a schedule fires. Those agents don’t need always-on Workers. They need Workers that start in response to demand and stop when the queue is empty.

Pricing and When the Model Makes Sense

Temporal Cloud bills on Actions: billable operations between your application and the Temporal Service, such as starting a Workflow, recording a heartbeat, or sending a Signal. Published pricing starts at $50 per million Actions, with volume discounts applied automatically as usage grows. Storage is billed separately as active storage (running Workflows) and retained storage (event histories for closed Workflows, up to a 90-day retention window).
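A rough Actions estimate is simple arithmetic. The sketch below uses only the $50 per million starting rate quoted above and deliberately ignores volume discounts, plan allocations, and storage, so it is an upper-bound style estimate rather than a bill. The workload figures (20,000 Workflows per day, 12 Actions each) are hypothetical.

```python
PRICE_PER_MILLION_ACTIONS_USD = 50.0  # starting rate quoted in the article


def monthly_actions(workflows_per_day: int,
                    actions_per_workflow: int,
                    days: int = 30) -> int:
    """Total billable Actions in a month for a steady workload."""
    return workflows_per_day * actions_per_workflow * days


def actions_cost_usd(actions_per_month: int) -> float:
    """List-price cost, ignoring volume discounts and plan allocations."""
    return actions_per_month / 1_000_000 * PRICE_PER_MILLION_ACTIONS_USD


# Hypothetical workload: 20,000 Workflows a day, 12 Actions per Workflow.
total = monthly_actions(workflows_per_day=20_000, actions_per_workflow=12)
cost = actions_cost_usd(total)
print(f"{total:,} Actions/month, about ${cost:,.2f} at the starting rate")
```

The Actions-per-Workflow figure is the number to measure first, since heartbeats, Signals, and retries all add to it.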

The base plan tiers start at $100/month for Essentials and $500/month for Business. These include baseline action and storage allocations before consumption billing kicks in.

Serverless Workers don’t introduce a new Temporal billing line; you still pay for Actions and Storage as usual. What changes is your compute bill: Lambda invocations instead of persistent EC2 or Kubernetes nodes. For workloads running continuously at high volume, total Lambda spend can exceed what you’d pay for a small always-on fleet. The break-even depends on your specific invocation pattern and Lambda configuration, and Temporal’s own documentation on estimating costs is worth reading before committing.
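The break-even can be approximated with a small model. Everything numeric here is an assumption to check against current AWS pricing for your region and architecture: the per-request and per-GB-second rates, the $70 monthly fleet cost, and the 2-second, 0.5 GB invocation profile are all illustrative, and the Lambda free tier is ignored.

```python
REQUEST_PRICE_USD = 0.20 / 1_000_000  # assumed price per invocation
GB_SECOND_PRICE_USD = 0.0000166667    # assumed price per GB-second


def lambda_monthly_cost(invocations: int,
                        avg_seconds: float,
                        memory_gb: float) -> float:
    """Compute charge plus request charge for one month of invocations."""
    compute = invocations * avg_seconds * memory_gb * GB_SECOND_PRICE_USD
    requests = invocations * REQUEST_PRICE_USD
    return compute + requests


def break_even_invocations(fleet_monthly_usd: float,
                           avg_seconds: float,
                           memory_gb: float) -> int:
    """Invocations per month at which Lambda spend matches the fleet cost."""
    per_invocation = (avg_seconds * memory_gb * GB_SECOND_PRICE_USD
                      + REQUEST_PRICE_USD)
    return int(fleet_monthly_usd / per_invocation)


fleet_usd = 70.0  # hypothetical monthly cost of a small always-on fleet
n = break_even_invocations(fleet_usd, avg_seconds=2.0, memory_gb=0.5)
bursty = lambda_monthly_cost(100_000, avg_seconds=2.0, memory_gb=0.5)
print(f"break-even at about {n:,} invocations/month")
print(f"100,000 invocations/month costs about ${bursty:.2f}")
```

Under these assumed rates, a bursty workload of 100,000 short invocations a month costs under two dollars in Lambda compute, while the hypothetical fleet only wins once volume reaches several million invocations a month. Longer or more memory-hungry invocations pull that crossover point down sharply.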

The model makes the clearest sense for:

  • Background job pipelines where tasks arrive in unpredictable bursts
  • Development and staging environments where you want Temporal’s durability semantics without paying for idle Workers
  • Early-stage products where you’re not yet sure whether the workload justifies dedicated infrastructure
  • Agent systems where each Workflow execution is triggered by an external event rather than running continuously

It makes less sense for latency-sensitive workflows (Lambda cold starts add tail latency you can’t fully control), high-throughput steady-state processing (at sufficient volume, long-lived Workers are cheaper), or any use case involving Activities that approach or exceed the 15-minute Lambda limit.

The Broader Picture

Temporal has grown from a Cadence fork to a funded company with over 3,000 paying customers, a managed cloud product, and now a serverless deployment mode. The programming model has stayed stable enough that early-adopter code from three years ago largely still works. That’s genuinely unusual for infrastructure tooling.

What’s changed is the deployment surface. Self-hosted Temporal clusters typically run on Kubernetes and need a production-grade persistence store (PostgreSQL or Cassandra, for example). Temporal Cloud removed the cluster ops but still assumed you ran your own Workers. Serverless Workers remove the Worker ops. The progression is coherent.

The remaining question for most teams is whether the Temporal programming model (deterministic Workflows, separate Activities, replay-based recovery) is the right abstraction for their workload. If it is, the serverless option removes the last significant deployment objection. If it isn’t, Serverless Workers don’t change the fundamental model fit. That evaluation still requires reading the documentation, running the hello-world, and stress-testing the determinism constraints against your actual code.

The pre-release is open. The setup is documented. Whether the 15-minute Activity limit and Lambda cold-start tail latency are acceptable depends on your workload, and that’s something only you can benchmark.

FAQ

Can a Temporal Workflow running on Serverless Workers run for more than 15 minutes?
Yes. The 15-minute ceiling applies to individual Lambda invocations, not to the Workflow as a whole. Temporal persists Workflow state in its event log, so a single Workflow can span many Lambda invocations over hours, days, or longer. The constraint is that any single Activity execution must complete within one invocation window.

Do Serverless Workers support the same Temporal SDK APIs as self-hosted Workers?
The programming model is identical: you register the same Workflows and Activities using the same SDK. Serverless Workers are currently in pre-release and support Go, Python, and TypeScript. Java and other SDKs are not yet available for this deployment mode. The pre-release label means the setup APIs and CLI commands may change before GA.

Is Temporal Cloud serverless suitable for AI agent orchestration?
It depends on your agent architecture. Event-triggered agents that start in response to external inputs are a good fit: they benefit from scale-to-zero compute and Temporal’s built-in durability without paying for idle Workers. Agents that need to run continuously or that have Activities exceeding 15 minutes are better served by traditional long-lived Workers, which Temporal Cloud also supports.
