What Back-Pressure Is, and Why Your Queues Need It
A queue without back-pressure does not absorb load — it hides it until you run out of memory. Learn what back-pressure means, the four ways systems apply it, and how to add it to your own services.
A queue sits between something that produces work and something that consumes it. The pitch is that it smooths out bursts: when requests arrive faster than you can handle them, they wait in line instead of failing. That works right up until the line grows faster than it drains. Then the queue stops being a buffer and becomes the place your system goes to die — usually by running out of memory, sometimes by serving responses so stale they are worthless. Back-pressure is the mechanism that stops that from happening. It is the signal a slow consumer sends back to a fast producer that says: slow down, I cannot keep up.
A queue is a shock absorber until it isn’t
The failure mode is worth spelling out because it is so common. Say your producer pushes 1,000 messages per second and your consumer processes 800 per second. Every second, 200 messages pile up. The queue depth climbs linearly. For the first few minutes nothing looks wrong — latency creeps up, but no errors. Then one of three things happens, depending on where the queue lives.
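The arithmetic is simple enough to sketch. The helper names below (`queue_depth`, `seconds_until_full`) and the one-million-message capacity are made up for illustration, and the model assumes both rates stay constant, which real traffic never does.

```python
def queue_depth(produce_rate: float, consume_rate: float, seconds: float) -> float:
    """Backlog after `seconds`, assuming constant rates and an empty queue at the start."""
    return max(0.0, float(produce_rate - consume_rate) * seconds)


def seconds_until_full(produce_rate: float, consume_rate: float, capacity: float) -> float:
    """Time until a queue holding `capacity` messages fills; infinity if it never does."""
    surplus = produce_rate - consume_rate
    return capacity / surplus if surplus > 0 else float("inf")


print(queue_depth(1000, 800, 60))                 # 12000.0 messages after one minute
print(queue_depth(1000, 800, 300))                # 60000.0 messages after five minutes
print(seconds_until_full(1000, 800, 1_000_000))   # 5000.0 seconds, roughly 83 minutes
```

Doubling the capacity only doubles the time to failure. The surplus of 200 messages per second is the thing that has to change.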
If the queue is an in-memory list in your process, memory runs out and the process dies, on Linux typically at the hands of the OOM killer, dropping every queued message at once. If it is a managed broker with a bounded size, it starts rejecting new messages, and now your producer is the one getting errors, often without code to handle them. If it is an unbounded broker with disk spillover, the queue keeps growing and your consumers fall further and further behind, so by the time a message is finally processed the event it describes is minutes old and the user has long since given up.
None of these is the queue “absorbing” load. The load was never absorbed. It was deferred, and deferral with no exit is just a slower crash. An unbounded queue does not have a capacity problem you can buy your way out of with more RAM — it has a rate problem, and the only real fixes are to speed up the consumer, slow down the producer, or shed work. Back-pressure is how the system chooses one of those automatically instead of waiting for the OOM killer to choose for it.
How back-pressure actually works
Back-pressure is not one technique — it is a category. The shared idea is that the consumer’s capacity is allowed to influence the producer’s rate. There are four common ways to wire that influence, in rough order of how gentle they are.
Blocking. The simplest form. When the queue is full, the producer’s put call blocks until a slot frees up. Go channels do this by default: send to a full channel and the goroutine parks until a receiver pulls a value. Java’s ArrayBlockingQueue does the same when you call put(); its add() and offer() methods fail fast instead of waiting. Blocking propagates back-pressure for free, because a slow consumer literally stalls the producer, but it only works when producer and consumer run in the same process and share the queue directly. It does not cross a network.
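A minimal sketch of blocking back-pressure, using Python’s standard `queue.Queue` in place of a Go channel. The function name `run_blocking_demo`, the capacity of 2, and the 10 ms of simulated work are all arbitrary choices for illustration.

```python
import queue
import threading
import time


def run_blocking_demo(items: int = 20, capacity: int = 2, work_seconds: float = 0.01):
    """Push `items` through a bounded queue; return what was consumed and how long the producer took."""
    buf = queue.Queue(maxsize=capacity)  # the bound is what creates back-pressure
    consumed = []

    def consumer() -> None:
        while True:
            item = buf.get()
            if item is None:          # sentinel: the producer is done
                break
            time.sleep(work_seconds)  # simulate slow processing
            consumed.append(item)

    worker = threading.Thread(target=consumer)
    worker.start()

    started = time.monotonic()
    for i in range(items):
        buf.put(i)  # blocks whenever the queue already holds `capacity` items
    producer_elapsed = time.monotonic() - started

    buf.put(None)
    worker.join()
    return consumed, producer_elapsed


consumed, elapsed = run_blocking_demo()
print(f"consumed {len(consumed)} items; producer was held for {elapsed:.2f}s")
```

The producer loop does no work of its own, yet it cannot finish much faster than the consumer drains. Nothing is lost and memory use is capped at `capacity` items.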
Dropping and load shedding. When you cannot afford to block — say the producer is handling live HTTP requests and stalling it would stall users — you drop instead. Either drop the newest message (reject the incoming request with a 429 or 503), or drop the oldest (overwrite stale data nobody will miss). Load shedding is the deliberate version: under overload, reject a fraction of requests immediately so the ones you do accept finish on time. A system that sheds 20% of load and serves the rest in 50 ms is healthier than one that accepts everything and serves all of it in 8 seconds.
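Both dropping policies fit in a few lines of standard-library Python. This is a sketch, not a server: `offer_or_reject` is a hypothetical helper, and the 202 and 503 status codes stand in for whatever your framework would actually return.

```python
import queue
from collections import deque


def offer_or_reject(buf: queue.Queue, item) -> int:
    """Drop-newest: try to enqueue without waiting and report the outcome as an HTTP-style status."""
    try:
        buf.put_nowait(item)  # never blocks the caller
        return 202            # accepted for later processing
    except queue.Full:
        return 503            # shed: the caller is told immediately


requests = queue.Queue(maxsize=2)
statuses = [offer_or_reject(requests, f"req-{i}") for i in range(3)]
print(statuses)  # [202, 202, 503]

# Drop-oldest: a deque with maxlen silently evicts from the other end when full.
latest_readings = deque(maxlen=3)
for reading in range(5):
    latest_readings.append(reading)
print(list(latest_readings))  # [2, 3, 4]
```

Drop-newest protects work already accepted; drop-oldest protects freshness. Which one is right depends on whether a stale item still has value.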
Credit-based flow control. The consumer hands the producer a budget of “credits”: permission to send N messages. The producer spends a credit per message and stops when it runs out, resuming only when the consumer grants more. TCP’s receive window works on this principle, with bytes as the unit: the receiver advertises how many bytes it has buffer space for, and the sender may not have more than that outstanding. HTTP/2 applies the same idea per stream and per connection through WINDOW_UPDATE frames, and gRPC, which runs over HTTP/2, inherits it. Credits are how you do back-pressure across a network, where blocking is not an option.
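The credit mechanism can be sketched in-process with a semaphore. This is an illustration of the bookkeeping only, not of any real wire protocol: `CreditGate`, `send_all`, and the list standing in for the network are all hypothetical.

```python
import threading


class CreditGate:
    """Tracks how many messages the consumer has given permission to send."""

    def __init__(self) -> None:
        self._credits = threading.Semaphore(0)  # start with no permission at all

    def grant(self, n: int) -> None:
        """Called on behalf of the consumer: allow n more messages."""
        for _ in range(n):
            self._credits.release()

    def try_spend(self) -> bool:
        """Called by the producer before each send; False means hold the message."""
        return self._credits.acquire(blocking=False)


def send_all(gate: CreditGate, outbox: list, wire: list) -> None:
    """Move messages from outbox to wire while credits last; the rest stay queued locally."""
    while outbox and gate.try_spend():
        wire.append(outbox.pop(0))


gate = CreditGate()
outbox, wire = ["a", "b", "c"], []

gate.grant(2)                 # consumer: "I have room for two"
send_all(gate, outbox, wire)  # "a" and "b" go out, "c" waits
print(wire, outbox)           # ['a', 'b'] ['c']

gate.grant(1)                 # consumer finished one item and grants another credit
send_all(gate, outbox, wire)
print(wire, outbox)           # ['a', 'b', 'c'] []
```

The producer never learns how fast the consumer is. It only learns how much it is currently allowed to send, which is all it needs.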
Pull instead of push. Invert the relationship: instead of the producer pushing whenever it has data, the consumer pulls when it is ready. Kafka consumers poll for records at their own pace; the broker never forces messages on them. Reactive Streams (the spec behind RxJava, Project Reactor, and Java’s Flow API) builds the entire protocol around a request(n) call in which the subscriber signals how many more items it is ready to receive, and the publisher may not send more than that. Pull-based systems have back-pressure built into their shape: a slow consumer simply pulls less often.
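Python generators give a compact picture of pull-based flow, because a generator does no work until it is asked for the next value. The `request` helper below only borrows its name from Reactive Streams; it is a plain-Python analogy, not an implementation of that spec, and `event_source` and `produced` are invented for the example.

```python
from itertools import islice
from typing import Iterator, List

produced: List[int] = []  # records what the source was actually asked to make


def event_source() -> Iterator[int]:
    """An endless producer that only does work when someone asks for the next item."""
    n = 0
    while True:
        produced.append(n)
        yield n
        n += 1


def request(source: Iterator[int], n: int) -> List[int]:
    """Pull at most n items from the source, and not one more."""
    return list(islice(source, n))


source = event_source()
first_batch = request(source, 3)
second_batch = request(source, 2)
print(first_batch, second_batch, len(produced))  # [0, 1, 2] [3, 4] 5
```

The source is infinite, but only five items ever existed, because only five were requested. There is no queue between the two sides to overflow.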
Putting back-pressure into your own services
You rarely need to invent any of this. You need to not opt out of it, which is surprisingly easy to do by accident.
Start by bounding every queue, including the implicit ones. A thread pool’s task queue is a queue. A buffered channel is a queue. An async runtime’s pending-task list is a queue. Most expose a maximum size and a policy for what to do when full; set both deliberately rather than taking the default, which is frequently “unbounded.” Where no such setting exists, enforce the limit yourself in front of the queue.
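Python’s `concurrent.futures.ThreadPoolExecutor` is one example of an implicit queue: its constructor has no option to cap the number of pending tasks. One possible way to add a bound is to wrap it with a semaphore, as sketched here. `BoundedExecutor` and its `max_pending` parameter are hypothetical names, and blocking the submitter is only one of several reasonable full-queue policies.

```python
import threading
from concurrent.futures import Future, ThreadPoolExecutor


class BoundedExecutor:
    """Thread pool whose submit() blocks once too many tasks are running or waiting."""

    def __init__(self, max_workers: int, max_pending: int) -> None:
        self._pool = ThreadPoolExecutor(max_workers=max_workers)
        # One slot per task that may be running or queued at any moment.
        self._slots = threading.BoundedSemaphore(max_workers + max_pending)

    def submit(self, fn, *args, **kwargs) -> Future:
        self._slots.acquire()  # back-pressure: the caller waits here when the pool is saturated
        try:
            future = self._pool.submit(fn, *args, **kwargs)
        except BaseException:
            self._slots.release()  # the task never entered the pool, so give the slot back
            raise
        future.add_done_callback(lambda _: self._slots.release())
        return future

    def shutdown(self, wait: bool = True) -> None:
        self._pool.shutdown(wait=wait)


executor = BoundedExecutor(max_workers=2, max_pending=2)
futures = [executor.submit(pow, n, 2) for n in range(10)]
print([f.result() for f in futures])  # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
executor.shutdown()
```

With this wrapper, at most four tasks exist at once (two running, two waiting), no matter how quickly the caller tries to submit.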
Next, make sure the bound actually propagates. A bounded queue that silently drops messages when full has back-pressure that goes nowhere. The whole point is that fullness becomes a signal someone upstream reacts to — a blocked call, a rejected request, a paused poll. If your producer ignores the rejection and retries in a tight loop, you have replaced a memory leak with a busy-wait. Honor the signal: back off, or propagate the rejection further upstream until it reaches something that can legitimately slow down or shed.
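On the producer side, honoring the signal might look like the sketch below: a bounded number of attempts with exponentially growing, randomized pauses, and a clear failure at the end so the caller can pass the rejection upstream. `put_with_backoff` and its default values are illustrative assumptions, not a recommendation for any particular workload.

```python
import queue
import random
import time


def put_with_backoff(buf: queue.Queue, item, attempts: int = 5,
                     base_delay: float = 0.01, max_delay: float = 1.0,
                     sleep=time.sleep) -> bool:
    """Try to enqueue; on a full queue, wait a randomized, growing delay before retrying.

    Returns False once attempts are exhausted so the caller can propagate the rejection.
    """
    for attempt in range(attempts):
        try:
            buf.put_nowait(item)
            return True
        except queue.Full:
            if attempt == attempts - 1:
                break  # out of attempts: do not sleep, just give up
            ceiling = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, ceiling))  # jitter keeps many producers from retrying in lockstep
    return False


full = queue.Queue(maxsize=1)
full.put_nowait("already here")
if not put_with_backoff(full, "new item", attempts=3):
    print("still full after backing off; rejecting upstream")
```

The important part is the final `return False`. Backing off buys the consumer time, but if the queue is still full, the rejection has to keep travelling upstream.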
Finally, decide your overload policy before you are overloaded. “What do we drop, and how do we tell the caller?” is a product question as much as a technical one. Dropping the oldest analytics events is fine; dropping the oldest payment instructions is not. Returning a 503 with a Retry-After header lets a well-behaved client cooperate; silently timing out does not. The systems that survive traffic spikes are not the ones with the biggest queues. They are the ones that decided, in advance, exactly how they would say no.