Why Long-Running AI Agents Break on HTTP, and How Ably's Durable Sessions Fix It
HTTP's request-response model was never built for AI agents that run for minutes or hours. Here is why connections drop mid-task and how Ably's durable sessions keep messages, state, and reconnects intact.
An AI agent that summarizes a paragraph finishes in two seconds. An AI agent that researches a question, calls six tools, and drafts a report can run for four minutes — or forty. The first fits HTTP comfortably. The second fights it the whole way.
Most agent backends are still wired the way web apps have been wired since the 1990s: a client sends a request, the server sends a response, the connection closes. That contract holds because the response usually arrives fast enough that nobody notices the connection was open at all. Long-running agents break the contract. They produce output gradually, they outlive the patience of every proxy between client and server, and they keep working even after the user closes the tab. We dug into why this fails so often, and how Ably’s durable session model is built to absorb it.
Where HTTP runs out of road
HTTP’s request-response cycle assumes a short, bounded exchange. Three things go wrong once an agent runs for minutes instead of milliseconds.
Idle timeouts close the socket. Your connection passes through load balancers, reverse proxies, and CDNs, and each one drops connections that go quiet. An AWS Application Load Balancer closes idle connections after 60 seconds by default. An agent that reasons for 90 seconds before emitting its first token has already lost the socket underneath it.
Streaming is still one fragile pipe. Server-Sent Events and WebSockets hold the connection open, and as long as the server keeps sending data or periodic heartbeats, the proxies in between never see it go idle. That is why so many agent UIs stream this way today. But the stream is bound to a single TCP connection. When a phone switches from Wi-Fi to cellular, a laptop sleeps, or the server is redeployed mid-task, that connection dies, and every token emitted during the gap is gone. The agent kept running on the server; the client simply stopped hearing it.
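A quiet stream survives idle timers only if something crosses the wire. Here is a minimal sketch, assuming you write SSE frames by hand. The SSE format allows an `id:` field on each event and comment lines beginning with `:`, which `EventSource` clients ignore, so a comment makes a cheap keepalive. (WebSocket servers get the same effect from ping frames.) `sseEvent`, `sseHeartbeat`, `needsHeartbeat`, and the 15-second interval are illustrative choices, not part of any library.

```typescript
// Hypothetical helpers for writing Server-Sent Events frames by hand.
// Each field is a "name: value" line; a blank line ends the event;
// lines starting with ":" are comments that clients discard.

// Assumed interval: comfortably under a 60-second proxy idle timeout.
const HEARTBEAT_INTERVAL_MS = 15000;

function sseEvent(id: number, data: string): string {
  // Multi-line payloads need one "data:" line per line of text.
  const dataLines = data
    .split("\n")
    .map((line) => `data: ${line}`)
    .join("\n");
  return `id: ${id}\n${dataLines}\n\n`;
}

function sseHeartbeat(): string {
  // A comment frame: ignored by EventSource, but every proxy on the
  // path sees bytes and resets its idle timer.
  return ": keepalive\n\n";
}

// True when the stream has been silent long enough to need a heartbeat.
function needsHeartbeat(lastWriteMs: number, nowMs: number): boolean {
  return nowMs - lastWriteMs >= HEARTBEAT_INTERVAL_MS;
}
```

A server would check `needsHeartbeat` on a timer and write `sseHeartbeat()` while the agent is still reasoning toward its first token. This only defeats idle timeouts: when the underlying connection dies, the heartbeat dies with it.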
Nothing remembers what was missed. Reopen the connection and you get a fresh stream from that instant forward. HTTP gives you no way to ask which messages arrived between second 30 and second 95. The protocol has no concept of a session that outlives the socket.
What durable sessions actually mean
Ably’s approach is to stop treating the session and the connection as the same object. A durable session is a logical channel that lives on the server; the WebSocket connection is just a temporary attachment to it. Three mechanisms make that work.
Decoupled lifecycle. The agent publishes to a channel, not to a socket. The session exists whether or not a client is currently listening. The user can shut the laptop, the agent keeps running, and the messages wait on the channel.
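To make the decoupling concrete, here is a toy, in-memory model of the idea. It is not Ably's API: `SessionChannel`, `publish`, `attach`, and `detach` are hypothetical names, it supports one listener at a time for brevity, and a real service keeps this log durably on the server rather than in one process's memory.

```typescript
// Toy model of a channel whose lifetime is independent of any listener.
type AgentMessage = { id: number; text: string };
type Listener = (msg: AgentMessage) => void;

class SessionChannel {
  private log: AgentMessage[] = [];
  private listener: Listener | null = null;
  private nextId = 1;

  // The agent publishes whether or not anyone is listening.
  publish(text: string): AgentMessage {
    const msg = { id: this.nextId++, text };
    this.log.push(msg);
    if (this.listener) this.listener(msg);
    return msg;
  }

  // A client attaches, first receives everything after the last ID it saw,
  // then gets live messages from that point on.
  attach(listener: Listener, lastSeenId = 0): void {
    for (const msg of this.log) {
      if (msg.id > lastSeenId) listener(msg);
    }
    this.listener = listener;
  }

  // Detaching (tab closed, laptop asleep) does not stop the agent.
  detach(): void {
    this.listener = null;
  }
}
```

Messages published while nobody is attached simply accumulate in the log, which is the "messages wait on the channel" behavior in miniature.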
Message persistence and replay. Every message gets an ID and is retained for a configurable window. Ably's history and rewind features let a reconnecting client fetch the messages it missed and receive the gap in order, with no tokens lost and no duplicates inserted.
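Retention and replay can be sketched together, assuming each message carries a numeric ID and a timestamp. Messages older than the retention window are pruned, and a reconnecting client asks for everything after the last ID it processed. `RetainedLog`, `since`, and the window length used in practice are illustrative, not values from any SDK. If part of the gap has already aged out, the sketch says so explicitly instead of returning a partial replay, because a silent partial replay is exactly the failure this article is about.

```typescript
type LoggedMessage = { id: number; at: number; text: string };

type ReplayResult =
  | { ok: true; messages: LoggedMessage[] }
  | { ok: false; reason: "gap-expired" };

class RetainedLog {
  private messages: LoggedMessage[] = [];
  private nextId = 1;
  private readonly retentionMs: number;

  constructor(retentionMs: number) {
    this.retentionMs = retentionMs;
  }

  append(text: string, now: number): LoggedMessage {
    const msg = { id: this.nextId++, at: now, text };
    this.messages.push(msg);
    this.prune(now);
    return msg;
  }

  // Drop messages that have aged out of the retention window.
  prune(now: number): void {
    this.messages = this.messages.filter((m) => now - m.at <= this.retentionMs);
  }

  // Everything after lastSeenId in publish order, or an explicit error
  // when some of the messages the client missed were already pruned.
  since(lastSeenId: number, now: number): ReplayResult {
    this.prune(now);
    const first = this.messages[0];
    const oldest = first ? first.id : this.nextId;
    if (lastSeenId + 1 < oldest) {
      return { ok: false, reason: "gap-expired" };
    }
    return { ok: true, messages: this.messages.filter((m) => m.id > lastSeenId) };
  }
}
```

On `gap-expired`, the client has to fall back to something coarser, such as reloading the persisted run state, rather than pretend the stream is complete.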
Connection state recovery. When a client reconnects inside the recovery window — roughly two minutes by default — Ably restores the prior connection state and resumes delivery from the last message the client acknowledged. To the application, the interruption never happened.
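The recovery window can be modeled as a server-side table of per-connection cursors that expire. This is a hypothetical sketch of the concept, not how Ably implements it: `RecoveryTable`, the connection key, and the cursor shape are invented for illustration, and the constant simply mirrors the rough two-minute default mentioned above.

```typescript
const RECOVERY_WINDOW_MS = 2 * 60 * 1000; // illustrative, roughly two minutes

type ConnectionCursor = { lastDeliveredId: number; disconnectedAt: number | null };

type Reattach = { kind: "resumed"; fromId: number } | { kind: "fresh" };

class RecoveryTable {
  private cursors = new Map<string, ConnectionCursor>();

  // Record each message the client has acknowledged on this connection.
  delivered(connectionKey: string, messageId: number): void {
    this.cursors.set(connectionKey, { lastDeliveredId: messageId, disconnectedAt: null });
  }

  disconnected(connectionKey: string, now: number): void {
    const cursor = this.cursors.get(connectionKey);
    if (cursor) cursor.disconnectedAt = now;
  }

  // Inside the window: resume right after the last acknowledged message.
  // Outside it: forget the cursor and treat the client as new.
  reconnect(connectionKey: string, now: number): Reattach {
    const cursor = this.cursors.get(connectionKey);
    if (!cursor) return { kind: "fresh" };
    const away = cursor.disconnectedAt === null ? 0 : now - cursor.disconnectedAt;
    if (away > RECOVERY_WINDOW_MS) {
      this.cursors.delete(connectionKey);
      return { kind: "fresh" };
    }
    cursor.disconnectedAt = null;
    return { kind: "resumed", fromId: cursor.lastDeliveredId + 1 };
  }
}
```

What a client does on `fresh` is an application decision; fetching history or reloading the session's stored state are the usual options.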
Presence sits alongside these three: the server can see whether a human is currently attached, so an agent can decide whether to stream every token or just checkpoint its progress and notify the user later.
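The decision presence enables can be a small pure function. The sketch below assumes the agent can read how many humans are currently attached (for example, a count of presence members on the channel) and picks a delivery mode. `chooseDeliveryMode`, `shouldPublishAt`, the mode names, and the five-step checkpoint cadence are all hypothetical.

```typescript
type DeliveryMode =
  | { kind: "stream-tokens" }
  | { kind: "checkpoint"; everySteps: number; notifyOnFinish: boolean };

// Hypothetical policy: stream every token while someone is watching;
// otherwise save progress periodically and notify the user at the end.
function chooseDeliveryMode(presentHumans: number): DeliveryMode {
  if (presentHumans > 0) {
    return { kind: "stream-tokens" };
  }
  return { kind: "checkpoint", everySteps: 5, notifyOnFinish: true };
}

// Whether the agent should publish something at this step.
function shouldPublishAt(mode: DeliveryMode, step: number): boolean {
  return mode.kind === "stream-tokens" || step % mode.everySteps === 0;
}
```

Because the mode is recomputed from presence, an agent can switch back to token streaming the moment the user reattaches.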
Patterns for infrastructure that survives a dropped connection
You don’t need Ably specifically to apply the ideas, but you do need to design for them on purpose.
Give every message a monotonic ID. Ordering and gap detection are impossible without one. The client tracks the last ID it processed, and reconnect logic replays from there.
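On the client, a monotonic ID turns gap detection into arithmetic. A minimal sketch, assuming IDs are consecutive integers starting at 1 (systems that use ordered but non-consecutive serials would compare order instead of subtracting): duplicates are dropped, the next expected message is applied, and anything that skips ahead is buffered while the client requests the missing range. `GapDetector` and its callbacks are hypothetical names.

```typescript
type Msg = { id: number; body: string };

class GapDetector {
  lastProcessed = 0;
  private pending = new Map<number, Msg>();
  private deliver: (m: Msg) => void;
  private onGap: (fromId: number, toId: number) => void;

  constructor(deliver: (m: Msg) => void, onGap: (fromId: number, toId: number) => void) {
    this.deliver = deliver;
    this.onGap = onGap;
  }

  receive(msg: Msg): void {
    // Already seen: a replay overlapped with live delivery.
    if (msg.id <= this.lastProcessed || this.pending.has(msg.id)) return;

    if (msg.id > this.lastProcessed + 1) {
      // Skipped ahead: hold the message and ask for the missing range.
      this.pending.set(msg.id, msg);
      this.onGap(this.lastProcessed + 1, msg.id - 1);
      return;
    }

    // Exactly the next message: apply it, then drain anything buffered.
    this.apply(msg);
    let next = this.pending.get(this.lastProcessed + 1);
    while (next) {
      this.pending.delete(next.id);
      this.apply(next);
      next = this.pending.get(this.lastProcessed + 1);
    }
  }

  private apply(msg: Msg): void {
    this.deliver(msg);
    this.lastProcessed = msg.id;
  }
}
```

`lastProcessed` is the value to persist and send on reconnect. A production client would also deduplicate overlapping gap requests, which this sketch does not.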
Make the session the unit of work, not the request. Store run state — current step, tool calls, partial output — keyed by a session ID the client holds. Reconnection re-attaches to that ID; it never re-submits the prompt and never starts the agent over.
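A minimal sketch of a session-keyed run store, assuming an in-memory `Map` stands in for whatever database you would really use. `startOrAttach` creates a run only the first time it sees a session ID; every later call with that ID returns the existing state instead of re-submitting the prompt. `RunStore`, `RunState`, and the field names are hypothetical.

```typescript
type ToolCall = { name: string; idempotencyKey: string };

type RunState = {
  sessionId: string;
  prompt: string;
  currentStep: number;
  toolCalls: ToolCall[];
  partialOutput: string;
  status: "running" | "done";
};

class RunStore {
  private runs = new Map<string, RunState>();

  // First call creates the run; reconnects with the same session ID re-attach.
  startOrAttach(sessionId: string, prompt: string): { state: RunState; created: boolean } {
    const existing = this.runs.get(sessionId);
    if (existing) return { state: existing, created: false };
    const state: RunState = {
      sessionId,
      prompt,
      currentStep: 0,
      toolCalls: [],
      partialOutput: "",
      status: "running",
    };
    this.runs.set(sessionId, state);
    return { state, created: true };
  }

  // The agent checkpoints progress after each step.
  recordStep(sessionId: string, outputChunk: string): void {
    const run = this.runs.get(sessionId);
    if (!run) throw new Error(`unknown session ${sessionId}`);
    run.currentStep += 1;
    run.partialOutput += outputChunk;
  }
}
```

Only the `created: true` path should ever kick off the agent; the attach path just hands the client the state it needs to render.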
Guard every side effect. Even with clean resume logic, a tool call that fires twice should not double-charge a card or send two emails. Put an idempotency key on each external action.
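A sketch of the guard, assuming each external action can be keyed by something stable across retries, such as session ID plus step number plus tool name. The first call runs the action and caches its promise under the key; a retry or a concurrent duplicate with the same key gets that same promise instead of acting again. `IdempotentRunner` and `toolCallKey` are hypothetical. An in-memory map forgets on restart, so a real guard needs a durable store, and some payment APIs accept an idempotency key of their own that is worth passing through as well.

```typescript
// Hypothetical guard: at most one execution per idempotency key.
class IdempotentRunner {
  private inFlightOrDone = new Map<string, Promise<unknown>>();

  run<T>(key: string, action: () => Promise<T>): Promise<T> {
    const existing = this.inFlightOrDone.get(key);
    // A retry (or a concurrent duplicate) gets the original outcome.
    if (existing) return existing as Promise<T>;

    const attempt = action().catch((err: unknown) => {
      // Release the key on failure so a later retry can try again.
      this.inFlightOrDone.delete(key);
      throw err;
    });
    this.inFlightOrDone.set(key, attempt);
    return attempt;
  }
}

// Example key: identical for every retry of the same step's tool call.
function toolCallKey(sessionId: string, step: number, tool: string): string {
  return `${sessionId}:${step}:${tool}`;
}
```

Releasing the key on failure is only safe if a failed action had no effect, which is another reason to hand the key to the downstream API too.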
Separate “the agent finished” from “the client got the result.” Persist the final output, and treat delivery as its own retryable step. An agent that completes while the user is offline should still deliver when they return.
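One way to keep the two apart is an outbox. The agent writes its final output with `delivered` set to false, and delivery flips that flag only once the client confirms receipt. The sketch assumes the client calls `pendingFor` whenever it reconnects and `acknowledge` once it has rendered a result; `Outbox` and its methods are hypothetical, and the in-memory array stands in for a durable table.

```typescript
type FinalResult = {
  id: number;
  sessionId: string;
  output: string;
  finishedAt: number;
  delivered: boolean;
};

class Outbox {
  private results: FinalResult[] = [];
  private nextId = 1;

  // Step 1: the agent finished. Persist first, regardless of who is online.
  recordFinished(sessionId: string, output: string, now: number): number {
    const id = this.nextId++;
    this.results.push({ id, sessionId, output, finishedAt: now, delivered: false });
    return id;
  }

  // Step 2: on every (re)connect, hand over anything not yet confirmed.
  // Safe to call repeatedly; nothing is marked delivered here.
  pendingFor(sessionId: string): FinalResult[] {
    return this.results.filter((r) => r.sessionId === sessionId && !r.delivered);
  }

  // Step 3: only the client's acknowledgement of a specific result completes delivery.
  acknowledge(resultId: number): void {
    const r = this.results.find((x) => x.id === resultId);
    if (r) r.delivered = true;
  }
}
```

Acknowledging by result ID rather than by session avoids marking a result delivered that finished after the client last fetched.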
Done together, these patterns turn a dropped connection from a lost task into a resumable one — the difference between an agent demo and an agent users trust with a forty-minute job.