
What Back-Pressure Is, and Why Your Queues Need It

A queue without back-pressure does not absorb load — it hides it until you run out of memory. Learn what back-pressure means, the four ways systems apply it, and how to add it to your own services.


A queue sits between something that produces work and something that consumes it. The pitch is that it smooths out bursts: when requests arrive faster than you can handle them, they wait in line instead of failing. That works right up until the line grows faster than it drains. Then the queue stops being a buffer and becomes the place your system goes to die — usually by running out of memory, sometimes by serving responses so stale they are worthless. Back-pressure is the mechanism that stops that from happening. It is the signal a slow consumer sends back to a fast producer that says: slow down, I cannot keep up.

A queue is a shock absorber until it isn’t

The failure mode is worth spelling out because it is so common. Say your producer pushes 1,000 messages per second and your consumer processes 800 per second. Every second, 200 messages pile up. The queue depth climbs linearly. For the first few minutes nothing looks wrong — latency creeps up, but no errors. Then one of three things happens, depending on where the queue lives.

If the queue is an in-memory list in your process, memory use climbs until the runtime fails an allocation or the operating system’s OOM killer terminates the process, dropping every queued message at once. If it is a managed broker with a bounded size, it starts rejecting new messages, and now your producer is the one getting errors, often without code to handle them. If it is an unbounded broker with disk spillover, the queue keeps growing and your consumers fall further and further behind, so by the time a message is finally processed the event it describes is minutes old and the user has long since given up.

None of these is the queue “absorbing” load. The load was never absorbed. It was deferred, and deferral with no exit is just a slower crash. An unbounded queue does not have a capacity problem you can buy your way out of with more RAM — it has a rate problem, and the only real fixes are to speed up the consumer, slow down the producer, or shed work. Back-pressure is how the system chooses one of those automatically instead of waiting for the OOM killer to choose for it.
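To put numbers on the 1,000-in, 800-out example, here is a minimal sketch of the arithmetic. It assumes constant rates and an empty queue at the start; the function names and the one-million-message capacity are made up for illustration.

```python
def queue_depth(produce_rate: float, consume_rate: float, seconds: float) -> float:
    """Backlog after `seconds`, assuming constant rates and an empty start."""
    return max(0.0, (produce_rate - consume_rate) * seconds)


def seconds_until_full(produce_rate: float, consume_rate: float, capacity: float) -> float:
    """Time until the backlog reaches `capacity`; infinite if the consumer keeps up."""
    surplus = produce_rate - consume_rate
    return float("inf") if surplus <= 0 else capacity / surplus


# The example from the text: 1,000 msg/s in, 800 msg/s out.
print(queue_depth(1000, 800, 60))                  # 12000.0 messages after one minute
print(seconds_until_full(1000, 800, 1_000_000))    # 5000.0 s for a hypothetical 1M capacity
print(seconds_until_full(1000, 800, 2_000_000))    # 10000.0 s: twice the room, same outcome
```

Doubling the capacity only doubles the time to failure. As long as the surplus is positive, the queue still fills.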

How back-pressure actually works

Back-pressure is not one technique — it is a category. The shared idea is that the consumer’s capacity is allowed to influence the producer’s rate. There are four common ways to wire that influence, in rough order of how gentle they are.

Blocking. The simplest form. When the queue is full, the producer’s put call blocks until a slot frees up. Go channels do this by default: send to a full channel and the goroutine parks until a receiver pulls a value. Java’s ArrayBlockingQueue does the same when you call put(); its offer() and add() methods return false or throw instead of waiting. Blocking propagates back-pressure for free, because a slow consumer literally stalls the producer, but it only works when producer and consumer run in the same process. It does not cross a network by itself.
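The same behavior is available in Python’s standard library. The sketch below is illustrative only: run_pipeline, the capacity of 4, and the simulated work time are assumptions, not part of any real service. It shows that a bounded queue.Queue never holds more than its maxsize, because put() parks the producer whenever the queue is full.

```python
import queue
import threading
import time


def run_pipeline(items: int = 20, capacity: int = 4, work_seconds: float = 0.005) -> int:
    """Push `items` through a bounded queue; return the largest depth observed."""
    q: queue.Queue = queue.Queue(maxsize=capacity)  # bounded: put() blocks when full
    max_depth = 0

    def consumer() -> None:
        while True:
            item = q.get()
            if item is None:          # sentinel: the producer is done
                return
            time.sleep(work_seconds)  # simulate a slow consumer

    worker = threading.Thread(target=consumer)
    worker.start()
    for i in range(items):
        q.put(i)                      # the producer stalls here while the queue is full
        max_depth = max(max_depth, q.qsize())
    q.put(None)
    worker.join()
    return max_depth


print(run_pipeline())  # never exceeds the capacity of 4
```

The producer loop contains no explicit throttling. The queue’s bound alone sets its pace to the consumer’s.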

Dropping and load shedding. When you cannot afford to block — say the producer is handling live HTTP requests and stalling it would stall users — you drop instead. Either drop the newest message (reject the incoming request with a 429 or 503), or drop the oldest (overwrite stale data nobody will miss). Load shedding is the deliberate version: under overload, reject a fraction of requests immediately so the ones you do accept finish on time. A system that sheds 20% of load and serves the rest in 50 ms is healthier than one that accepts everything and serves all of it in 8 seconds.
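Both drop policies fit in a few lines of standard-library Python. This is a sketch, not a production handler: the offer helper and the tiny capacities are hypothetical, chosen so the behavior is easy to see.

```python
import queue
from collections import deque


def offer(q: queue.Queue, item) -> bool:
    """Drop-newest: try to enqueue without blocking and report whether it fit."""
    try:
        q.put_nowait(item)
        return True
    except queue.Full:
        return False  # the caller would turn this into a 429 or 503


# Drop the newest: the third request is rejected because the queue holds two.
requests = queue.Queue(maxsize=2)
print([offer(requests, n) for n in range(3)])  # [True, True, False]

# Drop the oldest: a deque with maxlen silently evicts from the other end.
latest = deque(maxlen=2)
for reading in (1, 2, 3):
    latest.append(reading)
print(list(latest))  # [2, 3]
```

Drop-newest gives the caller an explicit signal it can react to. Drop-oldest gives no signal at all, so it suits data where only the most recent values matter.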

Credit-based flow control. The consumer hands the producer a budget of “credits”: permission to send N messages. The producer spends a credit per message and stops when it runs out, resuming only when the consumer grants more. TCP’s receive window works this way: the receiver advertises how many bytes it has buffer space for, and the sender is not allowed to exceed it. HTTP/2 applies the same idea per stream and per connection with WINDOW_UPDATE frames, and gRPC, which runs over HTTP/2, inherits it. Credits are how you do back-pressure across a network, where blocking is not an option.
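The bookkeeping behind credits is small. The sketch below keeps both sides in one process for clarity; in a real networked system the grant would travel as a message from consumer to producer. CreditGate and its method names are hypothetical, not taken from any library.

```python
import threading


class CreditGate:
    """Hypothetical credit counter shared by one producer and one consumer."""

    def __init__(self, initial_credits: int) -> None:
        self._credits = threading.Semaphore(initial_credits)

    def try_spend(self) -> bool:
        """Producer side: take one credit if any remain, without blocking."""
        return self._credits.acquire(blocking=False)

    def grant(self, n: int = 1) -> None:
        """Consumer side: hand back n credits after finishing n messages."""
        for _ in range(n):
            self._credits.release()


gate = CreditGate(initial_credits=3)
sent = [msg for msg in range(5) if gate.try_spend()]
print(sent)               # [0, 1, 2]: the budget ran out after three messages
gate.grant(2)             # the consumer has caught up on two of them
print(gate.try_spend())   # True: sending may resume
```

The producer never needs to know how fast the consumer is. It only needs to stop when the budget reaches zero.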

Pull instead of push. Invert the relationship: instead of the producer pushing whenever it has data, the consumer pulls when it is ready. Kafka consumers poll for records at their own pace; the broker never forces messages on them. Reactive Streams (the spec behind RxJava, Project Reactor, and Java’s Flow API) builds the entire protocol around a request(n) call where the subscriber asks for exactly as many items as it can handle. Pull-based systems have back-pressure built into their shape — a slow consumer simply pulls less often.
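Python generators make a convenient analogy for the pull model, since a generator produces nothing until asked. The sketch below is not the Reactive Streams API; the request helper only mimics the idea behind request(n), and both function names are made up for illustration.

```python
import itertools
from typing import Iterator, List


def source() -> Iterator[int]:
    """A producer that does no work until the consumer asks for the next item."""
    n = 0
    while True:
        yield n
        n += 1


def request(stream: Iterator[int], n: int) -> List[int]:
    """Pull at most n items, in the spirit of Reactive Streams' request(n)."""
    return list(itertools.islice(stream, n))


stream = source()
print(request(stream, 3))  # [0, 1, 2]
print(request(stream, 2))  # [3, 4]
```

The source is infinite, yet nothing accumulates anywhere: each item exists only because the consumer asked for it.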

Putting back-pressure into your own services

You rarely need to invent any of this. You need to not opt out of it, which is surprisingly easy to do by accident.

Start by bounding every queue, including the implicit ones. A thread pool’s task queue is a queue. A buffered channel is a queue. An async runtime’s pending-task list is a queue. Each has a configuration for maximum size and a policy for what to do when full — set both deliberately rather than taking the default, which is frequently “unbounded.”
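Python’s concurrent.futures.ThreadPoolExecutor is one example of an implicit queue: its internal work queue has no size limit. One way to bound it, sketched below, is to guard submit with a semaphore. BoundedExecutor and max_pending are hypothetical names, and blocking the caller is just one of the possible policies.

```python
import threading
from concurrent.futures import Future, ThreadPoolExecutor


class BoundedExecutor:
    """Hypothetical wrapper capping running plus queued tasks at `max_pending`."""

    def __init__(self, max_workers: int, max_pending: int) -> None:
        self._pool = ThreadPoolExecutor(max_workers=max_workers)
        self._slots = threading.BoundedSemaphore(max_pending)

    def submit(self, fn, *args, **kwargs) -> Future:
        self._slots.acquire()  # blocks the caller once the bound is reached
        try:
            future = self._pool.submit(fn, *args, **kwargs)
        except BaseException:
            self._slots.release()  # submission failed, so free the slot
            raise
        # Free the slot when the task finishes, fails, or is cancelled.
        future.add_done_callback(lambda _f: self._slots.release())
        return future

    def shutdown(self) -> None:
        self._pool.shutdown(wait=True)


executor = BoundedExecutor(max_workers=2, max_pending=4)
futures = [executor.submit(pow, 2, n) for n in range(8)]
print([f.result() for f in futures])  # [1, 2, 4, 8, 16, 32, 64, 128]
executor.shutdown()
```

Swapping acquire() for acquire(blocking=False) and raising on failure would turn the same wrapper from a blocking policy into a rejecting one.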

Next, make sure the bound actually propagates. A bounded queue that silently drops messages when full has back-pressure that goes nowhere. The whole point is that fullness becomes a signal someone upstream reacts to — a blocked call, a rejected request, a paused poll. If your producer ignores the rejection and retries in a tight loop, you have replaced a memory leak with a busy-wait. Honor the signal: back off, or propagate the rejection further upstream until it reaches something that can legitimately slow down or shed.
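Honoring the signal usually means waiting longer after each rejection. The sketch below uses capped exponential backoff with random jitter. The function name, the default delays, and the try_send callback are all assumptions for illustration; a real producer would tune them to its own traffic.

```python
import random
import time
from typing import Callable


def send_with_backoff(
    try_send: Callable[[], bool],
    max_attempts: int = 5,
    base_delay: float = 0.01,
    max_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Retry a rejected send, waiting a random time under a doubling ceiling."""
    for attempt in range(max_attempts):
        if try_send():
            return True
        if attempt == max_attempts - 1:
            break  # no point sleeping after the final attempt
        ceiling = min(max_delay, base_delay * 2 ** attempt)
        sleep(random.uniform(0, ceiling))  # jitter keeps retries from synchronizing
    return False  # give up and let the caller propagate the rejection upstream


# A hypothetical send that is rejected twice and then accepted.
outcomes = iter([False, False, True])
print(send_with_backoff(lambda: next(outcomes)))  # True
```

Returning False after the last attempt matters as much as the waiting: it hands the rejection to the next layer up instead of retrying forever.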

Finally, decide your overload policy before you are overloaded. “What do we drop, and how do we tell the caller?” is a product question as much as a technical one. Dropping the oldest analytics events is fine; dropping the oldest payment instructions is not. Returning a 503 with a Retry-After header lets a well-behaved client cooperate; silently timing out does not. The systems that survive traffic spikes are not the ones with the biggest queues. They are the ones that decided, in advance, exactly how they would say no.
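Telling the caller can be as simple as a status code and one header. The sketch below is framework-agnostic: handle, the 202 success status, and the five-second hint are illustrative assumptions, and a real service would return the equivalent through its own web framework.

```python
import queue
from typing import Dict, Tuple


def handle(work: queue.Queue, job, retry_after_seconds: int = 5) -> Tuple[int, Dict[str, str]]:
    """Accept the job if there is room; otherwise say no and hint when to retry."""
    try:
        work.put_nowait(job)
    except queue.Full:
        # Overloaded: reject immediately and tell the client how long to wait.
        return 503, {"Retry-After": str(retry_after_seconds)}
    return 202, {}  # accepted for later processing


work = queue.Queue(maxsize=1)
print(handle(work, "job-1"))  # (202, {})
print(handle(work, "job-2"))  # (503, {'Retry-After': '5'})
```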

FAQ

Is back-pressure the same as rate limiting?
They overlap but differ in where the limit comes from. Rate limiting caps the producer at a fixed rate you choose up front (say, 100 requests per second), regardless of how the consumer is doing. Back-pressure makes the limit dynamic — it reflects the consumer's actual, current capacity. A back-pressured system speeds up when the consumer catches up and slows down when it falls behind, while a rate limiter holds the same ceiling either way.
Does adding more consumers remove the need for back-pressure?
No. Scaling consumers raises the rate at which the queue drains, which buys headroom, but it does not bound the queue. If producers can still outpace the larger consumer pool — during a spike, or because a downstream dependency slowed down — the queue grows without limit just as before. You still need a maximum size and an overload policy; more consumers just makes them trigger less often.
Where does TCP fit into this?
TCP is probably the most widely deployed back-pressure mechanism in existence. Its sliding receive window is credit-based flow control: the receiver advertises how much buffer space it has, and the sender is not permitted to send more than that until the receiver frees space and advertises a larger window. When you write to a socket faster than the other end reads, the window closes, your local send buffer fills, and a blocking write call eventually stalls (a non-blocking one fails with EAGAIN or EWOULDBLOCK instead). That stall is TCP propagating back-pressure all the way to your application code.
