Why Long-Running AI Agents Break on HTTP, and How Ably's Durable Sessions Fix It
HTTP's request-response model was never built for AI agents that run for minutes or hours. Here is why connections drop mid-task and how Ably's durable sessions keep messages, state, and reconnects intact.
An AI agent that summarizes a paragraph finishes in two seconds. An AI agent that researches a question, calls six tools, and drafts a report can run for four minutes — or forty. The first fits HTTP comfortably. The second fights it the whole way.
Most agent backends are still wired the way web apps have been wired since the 1990s: a client sends a request, the server sends a response, the connection closes. That contract holds because the response usually arrives fast enough that nobody notices the connection was open at all. Long-running agents break the contract. They produce output gradually, they outlive the patience of every proxy between client and server, and they keep working even after the user closes the tab. We dug into why this fails so often, and how Ably’s durable session model is built to absorb it.
Where HTTP runs out of road
HTTP’s request-response cycle assumes a short, bounded exchange. Three things go wrong once an agent runs for minutes instead of milliseconds.
Idle timeouts close the socket. Your connection passes through load balancers, reverse proxies, and CDNs, and each one drops connections that go quiet. An AWS Application Load Balancer closes idle connections after 60 seconds by default. An agent that reasons for 90 seconds before emitting its first token has already lost the socket underneath it.
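To make the arithmetic concrete, here is a minimal sketch that checks whether a stream's quiet gaps would trip an idle timeout. The function name and the event timings are hypothetical; only the 60-second default comes from the paragraph above, and the model assumes the proxy counts any byte in either direction as activity.

```typescript
// Hypothetical helper: find the moment a proxy would drop an idle connection.
// Times are milliseconds since the connection opened, in ascending order.
function firstIdleDrop(eventTimesMs: number[], idleTimeoutMs: number): number | null {
  let lastActivity = 0; // opening the connection counts as activity
  for (const t of eventTimesMs) {
    if (t - lastActivity > idleTimeoutMs) {
      // The socket was closed before this event could be sent.
      return lastActivity + idleTimeoutMs;
    }
    lastActivity = t;
  }
  return null; // no gap was long enough to trigger the timeout
}

const ALB_DEFAULT_IDLE_MS = 60000;

// Agent reasons for 90 s before its first token: the socket is gone at the 60 s mark.
const silentAgent = firstIdleDrop([90000, 91000], ALB_DEFAULT_IDLE_MS);

// Same agent, but a progress event or keepalive goes out every 30 s: no drop.
const chattyAgent = firstIdleDrop([30000, 60000, 90000, 91000], ALB_DEFAULT_IDLE_MS);
```

Here `silentAgent` is `60000` and `chattyAgent` is `null`, which is the whole story of idle timeouts: silence, not duration, is what kills the connection.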
Streaming is still one fragile pipe. Server-Sent Events and WebSockets hold the connection open and, as long as data or keepalive frames keep flowing, avoid the idle timeout, which is why most agent UIs use them today. But the stream is bound to a single TCP connection. When a phone switches from Wi-Fi to cellular, a laptop sleeps, or the server is redeployed mid-task, that connection dies, and every token emitted during the gap is gone. The agent kept running on the server; the client simply stopped hearing it.
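Nothing remembers what was missed. Reopen the connection and you get a fresh stream from that instant forward. Plain HTTP gives you no way to ask which messages arrived between second 30 and second 95. SSE clients can send a Last-Event-ID header on reconnect, but that only helps if the server kept the missed events somewhere to resend. The protocol itself has no concept of a session that outlives the socket.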
What durable sessions actually mean
Ably’s approach is to stop treating the session and the connection as the same object. A durable session is a logical channel that lives on the server; the WebSocket connection is just a temporary attachment to it. Three mechanisms make that work.
Decoupled lifecycle. The agent publishes to a channel, not to a socket. The session exists whether or not a client is currently listening. The user can shut the laptop, the agent keeps running, and the messages wait on the channel.
Message persistence and replay. Every message gets an ID and is retained for a configurable window. Ably’s history and rewind features let a reconnecting client request what it missed, either by time window or by a count of recent messages, and receive those messages in order, so the gap can be filled without losing tokens.
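The persistence-and-replay idea can be modeled in a few lines. This is an in-memory stand-in we made up for illustration, not Ably's API: `RetainedLog`, `publish`, and `replayAfter` are hypothetical names, and the log is keyed on a simple integer ID with timestamps passed in by the caller.

```typescript
interface StoredMessage {
  id: number;         // assigned by the log, strictly increasing
  data: string;
  storedAtMs: number; // when the message was published
}

// Hypothetical in-memory channel that retains messages for a fixed window.
class RetainedLog {
  private messages: StoredMessage[] = [];
  private nextId = 1;
  private retentionMs: number;

  constructor(retentionMs: number) {
    this.retentionMs = retentionMs;
  }

  publish(data: string, nowMs: number): StoredMessage {
    const message: StoredMessage = { id: this.nextId++, data, storedAtMs: nowMs };
    this.messages.push(message);
    return message;
  }

  // Everything newer than lastSeenId that is still inside the retention window, in order.
  replayAfter(lastSeenId: number, nowMs: number): StoredMessage[] {
    return this.messages.filter(
      (m) => m.id > lastSeenId && nowMs - m.storedAtMs <= this.retentionMs
    );
  }
}

const log = new RetainedLog(120000);
log.publish("token-1", 0);    // client sees this one (id 1), then drops off
log.publish("token-2", 1000); // published while the client is disconnected
log.publish("token-3", 2000); // published while the client is disconnected

// On reconnect at t = 5 s the client asks for everything after id 1.
const missed = log.replayAfter(1, 5000);
```

`missed` holds the messages with IDs 2 and 3, in publish order. Asking again with a last-seen ID of 3 returns nothing, which is what keeps replay from inserting duplicates.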
Connection state recovery. When a client reconnects inside the recovery window — roughly two minutes by default — Ably restores the prior connection state and resumes delivery from the last message the client acknowledged. To the application, the interruption never happened.
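A client-side view of the recovery window might look like the sketch below. The function and type names are hypothetical, and the two-minute default is simply the figure quoted above. The point is the branch: inside the window the client resumes, outside it the client has to rebuild from retained history.

```typescript
type ReconnectPlan =
  | { kind: "resume"; fromSerial: number } // continue right after the last message seen
  | { kind: "fresh"; reason: string };     // window expired: fall back to replay from history

// Hypothetical planner; recoveryWindowMs defaults to the "roughly two minutes" in the text.
function planReconnect(
  disconnectedAtMs: number,
  nowMs: number,
  lastSerialSeen: number,
  recoveryWindowMs: number = 120000
): ReconnectPlan {
  const offlineMs = nowMs - disconnectedAtMs;
  if (offlineMs <= recoveryWindowMs) {
    return { kind: "resume", fromSerial: lastSerialSeen + 1 };
  }
  return {
    kind: "fresh",
    reason: `offline for ${offlineMs} ms, recovery window is ${recoveryWindowMs} ms`,
  };
}

const shortBlip = planReconnect(0, 15000, 41);  // 15 s offline: resume from serial 42
const longSleep = planReconnect(0, 600000, 41); // 10 min offline: start fresh
```

Treating the "fresh" branch as a normal path rather than an error is the design choice that matters here: a laptop that sleeps through lunch will always land outside the window.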
Presence sits alongside these three: the server can see whether a human is currently attached, so an agent can decide whether to stream every token or just checkpoint its progress and notify the user later.
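The presence check reduces to a small decision. In this sketch `attachedClientIds` stands in for whatever the presence set reports for the session; the names and the two delivery modes are our own illustration, not part of any SDK.

```typescript
type DeliveryMode = "stream-tokens" | "checkpoint-and-notify";

// Hypothetical policy: stream only when a human is actually attached.
function chooseDeliveryMode(attachedClientIds: ReadonlySet<string>): DeliveryMode {
  return attachedClientIds.size > 0 ? "stream-tokens" : "checkpoint-and-notify";
}

const watching = chooseDeliveryMode(new Set<string>(["user-123"])); // "stream-tokens"
const away = chooseDeliveryMode(new Set<string>());                 // "checkpoint-and-notify"
```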
Patterns for infrastructure that survives a dropped connection
You don’t need Ably specifically to apply the ideas, but you do need to design for them on purpose.
Give every message a monotonic ID. Ordering and gap detection are impossible without one. The client tracks the last ID it processed, and reconnect logic replays from there.
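On the client, that tracking can be as small as the sketch below. It assumes IDs are consecutive integers starting at 1, which is stricter than merely monotonic but makes gaps detectable by subtraction; the class and its return values are hypothetical.

```typescript
interface Envelope {
  id: number;
  data: string;
}

// Hypothetical receiver that processes messages strictly in ID order.
class OrderedReceiver {
  private lastProcessedId = 0;
  readonly processed: string[] = [];

  receive(msg: Envelope): "processed" | "duplicate" | "gap" {
    if (msg.id <= this.lastProcessedId) return "duplicate"; // already handled, skip it
    if (msg.id > this.lastProcessedId + 1) return "gap";     // something is missing: replay first
    this.processed.push(msg.data);
    this.lastProcessedId = msg.id;
    return "processed";
  }

  // The ID to hand to the replay request after a reconnect.
  get resumeFrom(): number {
    return this.lastProcessedId;
  }
}

const receiver = new OrderedReceiver();
const outcomes = [
  receiver.receive({ id: 1, data: "a" }),
  receiver.receive({ id: 2, data: "b" }),
  receiver.receive({ id: 2, data: "b" }), // redelivered after a reconnect
  receiver.receive({ id: 4, data: "d" }), // id 3 never arrived
];
```

The four calls return `processed`, `processed`, `duplicate`, and `gap`, and `receiver.resumeFrom` stays at 2 so the replay request starts in the right place.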
Make the session the unit of work, not the request. Store run state — current step, tool calls, partial output — keyed by a session ID the client holds. Reconnection re-attaches to that ID; it never re-submits the prompt and never starts the agent over.
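A minimal sketch of the session-as-unit-of-work idea, using an in-memory map where a real system would use a database. `SessionStore`, `attach`, and the `RunState` fields are hypothetical names that mirror the state listed above.

```typescript
interface RunState {
  prompt: string;
  currentStep: number;
  toolCalls: string[];
  partialOutput: string;
}

// Hypothetical store: run state is keyed by a session ID the client holds.
class SessionStore {
  private runs = new Map<string, RunState>();
  startCount = 0; // how many agent runs were actually started

  // The first call for a session ID starts the run; later calls only re-attach.
  attach(sessionId: string, prompt: string): RunState {
    const existing = this.runs.get(sessionId);
    if (existing) return existing; // reconnect: no re-submit, no restart
    const state: RunState = { prompt, currentStep: 0, toolCalls: [], partialOutput: "" };
    this.runs.set(sessionId, state);
    this.startCount++;
    return state;
  }
}

const store = new SessionStore();
const run = store.attach("session-abc", "Research topic X");
run.currentStep = 3;
run.toolCalls.push("web_search");
run.partialOutput = "Draft so far";

// The client drops and comes back with the same session ID.
const afterReconnect = store.attach("session-abc", "Research topic X");
```

`afterReconnect` is the same run at step 3 with its partial output intact, and `store.startCount` is still 1.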
Guard every side effect. Even with clean resume logic, a tool call that fires twice should not double-charge a card or send two emails. Put an idempotency key on each external action.
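One way to sketch the guard is a wrapper that runs an action at most once per key. Everything here is illustrative: the executor is in-memory and synchronous, `chargeCard` is a fake, and in practice the key would usually also be passed to the external API so it can deduplicate on its side.

```typescript
// Hypothetical executor: repeats with the same key return the stored result.
class IdempotentExecutor {
  private results = new Map<string, string>();

  run(idempotencyKey: string, action: () => string): string {
    const cached = this.results.get(idempotencyKey);
    if (cached !== undefined) return cached; // already done: do not fire again
    const result = action();                 // if this throws, nothing is stored and a retry is safe
    this.results.set(idempotencyKey, result);
    return result;
  }
}

let chargesMade = 0;
const chargeCard = (): string => {
  chargesMade++;
  return `charge-${chargesMade}`;
};

const executor = new IdempotentExecutor();
// Derive the key from session and step so a replayed step maps to the same key.
const chargeKey = "session-abc:step-4:charge";
const firstAttempt = executor.run(chargeKey, chargeCard);
const secondAttempt = executor.run(chargeKey, chargeCard); // resumed run fires the tool call again
```

Both attempts return `charge-1` and `chargesMade` ends at 1, so the resumed run sees a normal result without a second charge.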
Separate “the agent finished” from “the client got the result.” Persist the final output, and treat delivery as its own retryable step. An agent that completes while the user is offline should still deliver when they return.
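The split between finishing and delivering can be sketched as a small outbox. `ResultOutbox` and its methods are hypothetical, the store is in-memory, and `send` is a stand-in for whatever transport reaches the user; it returns `true` only when delivery succeeded.

```typescript
interface FinishedRun {
  sessionId: string;
  finalOutput: string;
  delivered: boolean;
}

// Hypothetical outbox: completion and delivery are recorded as separate steps.
class ResultOutbox {
  private finished = new Map<string, FinishedRun>();

  // Step 1: the agent finished. Persist the result no matter who is listening.
  recordFinished(sessionId: string, finalOutput: string): void {
    this.finished.set(sessionId, { sessionId, finalOutput, delivered: false });
  }

  // Step 2: delivery, safe to call repeatedly until send() reports success.
  tryDeliver(sessionId: string, send: (output: string) => boolean): boolean {
    const entry = this.finished.get(sessionId);
    if (!entry) return false;         // nothing finished yet
    if (entry.delivered) return true; // already delivered: do not send twice
    entry.delivered = send(entry.finalOutput);
    return entry.delivered;
  }
}

const outbox = new ResultOutbox();
outbox.recordFinished("session-abc", "Final report");

const received: string[] = [];
const offlineSend = (_output: string): boolean => false; // user is offline
const onlineSend = (output: string): boolean => {
  received.push(output);
  return true;
};

const attempt1 = outbox.tryDeliver("session-abc", offlineSend); // false: still pending
const attempt2 = outbox.tryDeliver("session-abc", onlineSend);  // true: delivered on return
const attempt3 = outbox.tryDeliver("session-abc", onlineSend);  // true, with no second send
```

After the three attempts `received` contains the report exactly once, even though the agent finished while nobody was connected.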
Done together, these patterns turn a dropped connection from a lost task into a resumable one — the difference between an agent demo and an agent users trust with a forty-minute job.