Backpressure, Explained Through a Queue That Won't Fall Over
What backpressure actually is, why an unbounded queue is a memory leak in disguise, and the four strategies a producer can take when a consumer falls behind.
A queue feels like the safe answer. Producer writes fast, consumer reads slow, so you drop a buffer between them and assume the buffer absorbs the difference. It does — right up until the producer is faster than the consumer for long enough that the buffer is no longer a buffer. It’s a backlog. And a backlog with no ceiling is a memory leak that takes a while to show up in your graphs.
Backpressure is the mechanism that stops that. It’s the signal that travels backward — from the slow consumer to the fast producer — saying “slow down, I can’t keep up.” Without it, the producer keeps shoving work into a queue that grows until the process is killed by the OOM killer or the latency on every queued item climbs past the point where the result still matters.
The unbounded queue is the bug, not the fix
Here’s the version most of us write first:
queue = []  # no max size

def produce(item):
    queue.append(item)  # never blocks, never fails

def consume():
    while queue:
        handle(queue.pop(0))
This works in every test you run, because in a test the producer stops. In production the producer doesn’t stop. If handle() takes 50ms and items arrive every 10ms, five items arrive for every one handled, so the queue grows by 4 items per 50ms cycle, forever. Memory climbs linearly. The 10,000th item arrives 100 seconds in and isn’t handled until roughly the 500-second mark, so it waits about 400 seconds before anyone looks at it. By the time you see the memory alert, the queued work is already stale.
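A quick back-of-the-envelope check of those numbers, as a minimal sketch. It assumes perfectly regular arrivals and a consumer that is never idle; the constant and function names are illustrative, not part of any library.

```python
ARRIVAL_INTERVAL_MS = 10  # one item arrives every 10 ms
SERVICE_TIME_MS = 50      # handle() takes 50 ms per item


def backlog_after(elapsed_ms: int) -> int:
    """Items still waiting after elapsed_ms, assuming the consumer never idles."""
    arrived = elapsed_ms // ARRIVAL_INTERVAL_MS
    handled = elapsed_ms // SERVICE_TIME_MS
    return arrived - handled


def timeline_for(n: int) -> tuple[int, int]:
    """(arrival time, time handling starts) in ms for the n-th item."""
    arrived_at = n * ARRIVAL_INTERVAL_MS
    started_at = (n - 1) * SERVICE_TIME_MS  # the n-1 earlier items go first
    return arrived_at, started_at


arrived_at, started_at = timeline_for(10_000)
print("backlog after 60 s:", backlog_after(60_000))
print("10,000th item arrives at", arrived_at / 1000, "s")
print("10,000th item is picked up at", started_at / 1000, "s")
```

With these inputs the backlog after one minute is 4,800 items, and the 10,000th item arrives at the 100-second mark but is not picked up until just before the 500-second mark.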
The fix is not a bigger queue. A bigger queue just moves the cliff further out and makes the fall taller. The fix is a bounded queue plus a decision about what happens when it’s full. That decision is backpressure.
Four things a full queue can do
Once the queue has a maximum size, produce() has to answer one question: what do I do when there’s no room? There are exactly four honest answers, and picking the wrong one for your workload is how systems fail in surprising ways.
Block the producer. The producer waits until a slot frees up. This is the cleanest form of backpressure: the slowness propagates all the way up the chain, so an upstream HTTP server responds more slowly, which in turn slows its clients. Go channels with a fixed capacity do this by default: a send on a full channel blocks. The risk is that blocking can cascade into a deadlock if the producer holds a lock the consumer needs.
Drop the new item. Reject what just arrived. Sensible when any single item is expendable: a metrics pipeline sampling 1-in-N under load loses precision, not correctness. You must surface the drop as a counter, or you’ve built silent data loss.
Drop the oldest item. Evict the head to make room for the tail. Right when the newest data is the most valuable: live sensor readings, the current price, the latest frame. A ring buffer does this for free.
Fail fast. Return an error to the caller immediately — HTTP 503, a rejected future. This is what a bounded thread pool’s rejection policy does, and it’s the foundation of load shedding: better to cleanly reject 10% of requests than to slowly degrade 100% of them into timeouts.
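The four answers can be sketched with Python's standard library. This is one possible shape under stated assumptions, not a definitive implementation: the put_* helpers, the stats dictionary, and the QueueFull exception are hypothetical names introduced for illustration, while queue.Queue and collections.deque are the real standard-library types.

```python
import queue
from collections import deque


class QueueFull(Exception):
    """Hypothetical error for the fail-fast strategy (e.g. mapped to a 503)."""


def put_blocking(q: queue.Queue, item) -> None:
    # Block the producer: put() waits until the consumer frees a slot.
    q.put(item)


def put_drop_newest(q: queue.Queue, item, stats: dict) -> bool:
    # Drop the new item, and count the drop so the loss is visible.
    try:
        q.put_nowait(item)
        return True
    except queue.Full:
        stats["dropped"] = stats.get("dropped", 0) + 1
        return False


def put_drop_oldest(ring: deque, item) -> None:
    # Drop the oldest item: a deque with maxlen evicts from the opposite end.
    ring.append(item)


def put_fail_fast(q: queue.Queue, item) -> None:
    # Fail fast: turn "full" into an error the caller has to handle.
    try:
        q.put_nowait(item)
    except queue.Full:
        raise QueueFull("queue at capacity") from None


bounded = queue.Queue(maxsize=2)
stats: dict = {}
accepted = [put_drop_newest(bounded, i, stats) for i in range(3)]
print(accepted, stats)

ring: deque = deque(maxlen=2)
for i in range(3):
    put_drop_oldest(ring, i)
print(list(ring))
```

Each helper has the same signature shape on purpose: the strategy is a one-line choice at the call site, which makes it easy to see (and review) what the full case does.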
Why “just add a queue” hides the real number
The queue length you can tolerate is determined by Little’s Law: the average number of items in the system equals the arrival rate times the average time each item spends inside. Flip it around and a full queue tells you your worst-case latency. A 1,000-slot queue draining at 200 items/second means a freshly queued item waits up to 5 seconds. If your SLA is 1 second, your queue is already five times too deep (200 slots is the most that budget allows), and no amount of buffering fixes that, because the buffer is the latency.
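The same arithmetic as a small sketch, assuming a steady drain rate. The function names are illustrative.

```python
import math


def worst_case_wait_s(capacity: int, drain_rate_per_s: float) -> float:
    """Seconds an item waits if it lands in the last slot of a full queue."""
    return capacity / drain_rate_per_s


def max_capacity_for(latency_budget_s: float, drain_rate_per_s: float) -> int:
    """Deepest queue that still meets the latency budget at this drain rate."""
    return math.floor(latency_budget_s * drain_rate_per_s)


print(worst_case_wait_s(1000, 200))  # wait for the 1,000-slot queue
print(max_capacity_for(1.0, 200))    # slots a 1-second SLA can afford
```

For the example above this gives a 5-second worst-case wait and a ceiling of 200 slots for a 1-second budget.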
This is the part people skip. A queue doesn’t make a slow consumer fast. It converts a throughput problem into a latency problem and hides it inside a data structure. Backpressure forces the throughput problem back into the open where you can either scale the consumer, shed load, or tell the producer the truth.
Reactive libraries make this explicit. In Reactive Streams (the contract behind Project Reactor, RxJava, and Akka Streams), the consumer calls request(n) on its subscription to ask for a specific number of items, and the producer is contractually forbidden from sending more than were requested. The demand signal is the backpressure. Node.js streams do the same with a lower-level handshake: writable.write() returns false once the internal buffer reaches its highWaterMark, and a well-behaved producer pauses until the 'drain' event fires.
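function pump(source, dest) {
  // source is a Readable in flowing mode; each chunk arrives via 'data'
  source.on('data', (chunk) => {
    const ok = dest.write(chunk);
    if (!ok) {
      // buffer is full: stop reading until it drains
      source.pause();
      dest.once('drain', () => source.resume());
    }
  });
  source.on('end', () => dest.end());
}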
That if (!ok) is the whole idea. The plumbing exists in your runtime already. The bug is ignoring the return value — calling write() in a tight loop without checking it is the Node equivalent of the unbounded queue.append() above.
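The request(n) idea from Reactive Streams can be illustrated in a few lines of Python. This is a hypothetical, synchronous sketch and not the API of any real reactive library: DemandDrivenSource and its on_next callback are names invented here, and the actual specification is asynchronous and covers errors, completion, and cancellation as well.

```python
from typing import Callable, Iterable, Iterator


class DemandDrivenSource:
    """Hypothetical source that only emits what the consumer has requested."""

    def __init__(self, items: Iterable[int], on_next: Callable[[int], None]) -> None:
        self._items: Iterator[int] = iter(items)
        self._on_next = on_next

    def request(self, n: int) -> int:
        """Emit at most n items; return how many were actually delivered."""
        delivered = 0
        for _ in range(n):
            try:
                item = next(self._items)
            except StopIteration:
                break  # source ran out before the demand was met
            self._on_next(item)
            delivered += 1
        return delivered


received: list[int] = []
source = DemandDrivenSource(range(10), received.append)

source.request(3)  # the consumer has room for three items
print(received)    # [0, 1, 2]

source.request(2)  # it asks for two more only once it is ready
print(received)    # [0, 1, 2, 3, 4]
```

The producer here cannot get ahead of the consumer by construction: nothing is emitted unless request() is called, so the demand count plays the role of the bounded queue.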
When you’re tracing a backpressure path through an unfamiliar codebase — finding every place a producer ignores the consumer’s signal — an editor with whole-repo context speeds up the read considerably. You’re looking for the inverse of a pattern (writes with no corresponding check), and that’s exactly the kind of structural search an AI-assisted editor handles better than grep.
The mental model to keep: a queue is a shock absorber for bursts, not a fix for a sustained rate mismatch. Size it for the burst you expect, bound it hard, and decide — explicitly, in code — what happens at the boundary. A queue that won’t fall over is just a queue whose full case you actually wrote.