Backpressure, Explained Through a Queue That Won't Fall Over

A queue feels like the safe answer. Producer writes fast, consumer reads slow, so you drop a buffer between them and assume the buffer absorbs the difference. It does — right up until the producer is faster than the consumer for long enough that the buffer is no longer a buffer. It’s a backlog. And a backlog with no ceiling is a memory leak that takes a while to show up in your graphs.

Backpressure is the mechanism that stops that. It’s the signal that travels backward — from the slow consumer to the fast producer — saying “slow down, I can’t keep up.” Without it, the producer keeps shoving work into a queue that grows until the process is killed by the OOM killer or the latency on every queued item climbs past the point where the result still matters.

The unbounded queue is the bug, not the fix

Here’s the version most of us write first:

queue = []  # no max size

def produce(item):
    queue.append(item)   # never blocks, never fails

def consume():
    while queue:
        handle(queue.pop(0))

This works in every test you run, because in a test the producer stops. In production the producer doesn’t stop. If handle() takes 50ms and items arrive every 10ms, five items arrive for every one handled, so the queue grows by 4 items per 50ms cycle, forever. Memory climbs linearly. The 10,000th item arrives 100 seconds in and isn’t picked up until around the 500-second mark, a wait of roughly 400 seconds. By the time you see the memory alert, the queued work is already stale.
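
To make that arithmetic concrete, here is a small sketch. It assumes perfectly steady arrivals and a single consumer that is never idle; the function names are invented for illustration.

```python
ARRIVAL_INTERVAL_MS = 10   # one item arrives every 10 ms
SERVICE_TIME_MS = 50       # handle() takes 50 ms per item


def backlog_after(elapsed_ms):
    """Items still queued after elapsed_ms of steady traffic."""
    arrived = elapsed_ms // ARRIVAL_INTERVAL_MS
    handled = elapsed_ms // SERVICE_TIME_MS
    return arrived - handled


def queue_wait_s(n):
    """Seconds the n-th item (1-based) sits in the queue before handle() starts on it."""
    arrives_at_ms = (n - 1) * ARRIVAL_INTERVAL_MS
    # Arrivals outpace service, so the consumer is always busy and must
    # finish the n - 1 items ahead of this one first.
    starts_at_ms = (n - 1) * SERVICE_TIME_MS
    return (starts_at_ms - arrives_at_ms) / 1000


print(backlog_after(1_000))     # 80 items queued after one second
print(queue_wait_s(10_000))     # 399.96 seconds in the queue
```

The backlog grows by 80 items per second with no upper limit, which is the linear memory climb described above.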

The fix is not a bigger queue. A bigger queue just moves the cliff further out and makes the fall taller. The fix is a bounded queue plus a decision about what happens when it’s full. That decision is backpressure.

Four things a full queue can do

Once the queue has a maximum size, produce() has to answer one question: what do I do when there’s no room? There are exactly four honest answers, and picking the wrong one for your workload is how systems fail in surprising ways.

Block the producer. The producer waits until a slot frees up. This is the cleanest form of backpressure — the slowness propagates all the way up the chain, and an upstream HTTP server starts returning slower, which makes its clients slow down. Go channels with a fixed capacity do this by default: a send on a full channel blocks. The risk is that blocking can cascade into a deadlock if the producer holds a lock the consumer needs.
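
A minimal Python sketch of the blocking option, using the standard library's queue.Queue with a maxsize. The function name, the item count, and the 1ms sleep standing in for slow work are all assumptions for illustration.

```python
import queue
import threading
import time


def run_blocking_demo(n_items=20, capacity=5):
    """Feed a slow consumer from a fast producer; return the deepest queue seen."""
    q = queue.Queue(maxsize=capacity)
    max_depth = 0

    def consumer():
        while True:
            item = q.get()
            if item is None:       # sentinel: producer is done
                break
            time.sleep(0.001)      # stand-in for slow work

    worker = threading.Thread(target=consumer)
    worker.start()
    for i in range(n_items):
        q.put(i)                   # blocks here while the queue is full
        max_depth = max(max_depth, q.qsize())
    q.put(None)
    worker.join()
    return max_depth
```

The producer loop never checks anything; put() simply takes longer when the consumer is behind, and the observed depth never exceeds the capacity.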

Drop the new item. Reject what just arrived. Sensible when individual items are expendable: a metrics pipeline sampling 1-in-N under load loses precision, not correctness. You must surface the drop as a counter, or you’ve built silent data loss.
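
One way to sketch this in Python, with the drop counter built in. DropNewestQueue and its method names are hypothetical, and the plain integer counter assumes a single producer thread.

```python
import queue


class DropNewestQueue:
    """Bounded queue that rejects new items when full and counts the rejections."""

    def __init__(self, capacity):
        self._q = queue.Queue(maxsize=capacity)
        self.dropped = 0           # export this as a metric

    def offer(self, item):
        """Return True if the item was queued, False if it was dropped."""
        try:
            self._q.put_nowait(item)
            return True
        except queue.Full:
            self.dropped += 1
            return False

    def take(self):
        """Remove and return the oldest queued item; raises queue.Empty if none."""
        return self._q.get_nowait()
```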

Drop the oldest item. Evict the head to make room for the tail. Right when the newest data is the most valuable: live sensor readings, the current price, the latest frame. A ring buffer does this for free.
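
In Python, collections.deque with a maxlen behaves like such a ring buffer: appending to a full deque discards the item at the opposite end. The wrapper below is a hypothetical sketch that adds an eviction counter, since the deque itself evicts silently.

```python
from collections import deque


class LatestOnlyBuffer:
    """Keeps the most recent `capacity` items and counts what it evicts."""

    def __init__(self, capacity):
        self._items = deque(maxlen=capacity)
        self.evicted = 0

    def push(self, item):
        if len(self._items) == self._items.maxlen:
            self.evicted += 1      # the append below will push out the oldest item
        self._items.append(item)

    def snapshot(self):
        """Return the buffered items, oldest first."""
        return list(self._items)
```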

Fail fast. Return an error to the caller immediately — HTTP 503, a rejected future. This is what a bounded thread pool’s rejection policy does, and it’s the foundation of load shedding: better to cleanly reject 10% of requests than to slowly degrade 100% of them into timeouts.
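
A sketch of fail-fast at the edge of a service. QueueFullError, submit, and handle_http are invented names, and returning 202 for accepted work is an assumption; only the 503 comes from the text above.

```python
import queue


class QueueFullError(Exception):
    """Raised when the work queue has no free slot."""


def submit(q, request):
    """Queue the request or raise immediately; never wait for space."""
    try:
        q.put_nowait(request)
    except queue.Full:
        raise QueueFullError("work queue is full, retry later") from None


def handle_http(q, request):
    """Return a status code: 202 if the work was accepted, 503 if it was shed."""
    try:
        submit(q, request)
    except QueueFullError:
        return 503
    return 202
```

Unlike a silent drop, the caller learns about the rejection right away and can retry, back off, or go elsewhere.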

Why “just add a queue” hides the real number

The queue length you can tolerate is determined by Little’s Law: the average number of items in the system equals the arrival rate times the average time each item spends inside. Flip it around and a full queue tells you your worst-case latency. A 1,000-slot queue draining at 200 items/second means a freshly queued item waits up to 5 seconds. If your SLA is 1 second, your queue is already five times too deep, and no amount of buffering fixes that, because the buffer is the latency.
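
The same calculation as a pair of helpers, assuming a constant drain rate. The function names are illustrative, not from any library.

```python
import math


def worst_case_wait_s(depth, drain_rate_per_s):
    """Seconds an item waits if it joins at the back of a full queue."""
    return depth / drain_rate_per_s


def max_depth_for_sla(sla_s, drain_rate_per_s):
    """Deepest queue whose worst-case wait still fits inside the SLA."""
    return math.floor(sla_s * drain_rate_per_s)


print(worst_case_wait_s(1_000, 200))   # 5.0
print(max_depth_for_sla(1.0, 200))     # 200
```

With a 1-second SLA and a drain rate of 200 items/second, anything past slot 200 is already too late when it is queued.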

This is the part people skip. A queue doesn’t make a slow consumer fast. It converts a throughput problem into a latency problem and hides it inside a data structure. Backpressure forces the throughput problem back into the open where you can either scale the consumer, shed load, or tell the producer the truth.

Reactive libraries make this explicit. In Reactive Streams (the contract behind Project Reactor, RxJava, and Akka Streams), the consumer calls request(n) to signal demand for a specific number of items, and the producer is contractually forbidden from sending more than were requested. The demand signal is the backpressure. Node.js streams do the same with a lower-level handshake: writable.write() returns false once the internal buffer has reached its highWaterMark, and a well-behaved producer pauses until the 'drain' event fires.

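function pump(source, dest) {
  // 'data' puts the readable into flowing mode, so pause()/resume() apply
  source.on('data', (chunk) => {
    const ok = dest.write(chunk);
    if (!ok) {
      // buffer is full: stop reading until it drains
      source.pause();
      dest.once('drain', () => source.resume());
    }
  });
  // close the destination once the source has nothing left
  source.on('end', () => dest.end());
}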

That if (!ok) is the whole idea. The plumbing exists in your runtime already. The bug is ignoring the return value — calling write() in a tight loop without checking it is the Node equivalent of the unbounded queue.append() above.
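
The request(n) side of the story can be modelled in a few lines of Python. This is a toy model of demand signalling, not the Reactive Streams API: DemandDrivenSource and its callback are invented for illustration, and a real implementation would also handle errors, completion, and concurrency.

```python
class DemandDrivenSource:
    """Toy producer that emits only as many items as the consumer has requested."""

    def __init__(self, items, on_next):
        self._items = iter(items)
        self._on_next = on_next    # consumer callback, called once per item
        self.emitted = 0

    def request(self, n):
        """Emit at most n items; return how many were actually sent."""
        sent = 0
        for _ in range(n):
            try:
                item = next(self._items)
            except StopIteration:
                break              # source exhausted before demand was met
            self._on_next(item)
            sent += 1
        self.emitted += sent
        return sent


received = []
source = DemandDrivenSource(range(10), received.append)
source.request(3)
print(received)   # [0, 1, 2]
```

Nothing moves until the consumer asks, so the producer cannot run ahead of the consumer by more than the outstanding demand.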

When you’re tracing a backpressure path through an unfamiliar codebase — finding every place a producer ignores the consumer’s signal — an editor with whole-repo context speeds up the read considerably. You’re looking for the inverse of a pattern (writes with no corresponding check), and that’s exactly the kind of structural search an AI-assisted editor handles better than grep.

The mental model to keep: a queue is a shock absorber for bursts, not a fix for a sustained rate mismatch. Size it for the burst you expect, bound it hard, and decide — explicitly, in code — what happens at the boundary. A queue that won’t fall over is just a queue whose full case you actually wrote.

FAQ

What's the difference between backpressure and rate limiting?

Rate limiting is a fixed cap on the producer's rate, set ahead of time and enforced regardless of the consumer's current state: 100 requests per second, always. Backpressure is dynamic and reactive: the consumer's actual capacity, right now, determines how fast the producer is allowed to go. Rate limiting protects against a known ceiling; backpressure adapts to a moving one.

How big should I make a bounded queue?

Size it for the largest short burst you expect to absorb, then check the worst-case latency it implies using Little's Law (queue depth divided by drain rate). If that latency exceeds your SLA, the queue is too deep — you need a faster consumer or load shedding, not more slots. Start small; a deep queue masks the real throughput problem.

Does an async runtime give me backpressure for free?

No. Async lets you handle many in-flight operations without blocking threads, but nothing stops you from spawning unbounded work. You still need a bounded channel, a semaphore, or a demand-based protocol like Reactive Streams to actually limit concurrency. async/await is a concurrency model, not a flow-control mechanism.
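
As one illustration, a bounded asyncio.Queue adds the missing flow control: await q.put() suspends the producer whenever the queue is full. The function name, the sizes, and the 1ms sleep standing in for slow I/O are assumptions.

```python
import asyncio


async def run_pipeline(n_items=20, capacity=4):
    """Run a fast producer against a slow consumer; return the deepest queue seen."""
    q = asyncio.Queue(maxsize=capacity)
    max_depth = 0

    async def producer():
        nonlocal max_depth
        for i in range(n_items):
            await q.put(i)                 # suspends while the queue is full
            max_depth = max(max_depth, q.qsize())
        await q.put(None)                  # sentinel: no more work

    async def consumer():
        while True:
            item = await q.get()
            if item is None:
                break
            await asyncio.sleep(0.001)     # stand-in for slow I/O

    await asyncio.gather(producer(), consumer())
    return max_depth
```

Calling asyncio.run(run_pipeline()) returns a depth no larger than the capacity. Swap in an unbounded asyncio.Queue() and the same producer would enqueue all of its items without ever waiting.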
