Consistent Hashing, Explained Through the Problem It Actually Solves
Why hash(key) % N falls apart when you add a server, how the hash ring fixes it, and what virtual nodes do — a practical walkthrough for developers.
Most explanations of consistent hashing start with the ring. That’s backwards. The ring only makes sense once you’ve felt the pain it removes, so we’ll start with the pain.
The remapping problem
You have 40 million cached objects spread across 4 cache servers. The rule that decides which server holds which object is one line: server = hash(key) % 4. It works. Reads land on the right box, load is roughly even, and you move on.
Then traffic grows and you add a fifth server. The rule becomes hash(key) % 5. That one-character change is far more violent than it looks. A key stays on its old server only if hash(key) % 4 and hash(key) % 5 happen to agree. Both results depend only on hash(key) % 20, and they match for just 4 of the 20 possible remainders (0 through 3), so with a uniform hash they agree about 20% of the time. The other 80%, roughly 32 of your 40 million objects, now resolve to a different server.
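The arithmetic is easy to check empirically. The sketch below is a minimal simulation, not a benchmark: the key names (object-0, object-1, ...) and the MD5-based stable_hash helper are illustrative assumptions, chosen only because Python's built-in hash() is randomized per process for strings.

```python
import hashlib


def stable_hash(key: str) -> int:
    """Deterministic 64-bit hash (assumption: MD5 is used only for its even spread)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def moved_fraction(num_keys: int, old_n: int, new_n: int) -> float:
    """Fraction of keys whose server changes when the modulus goes old_n -> new_n."""
    moved = 0
    for i in range(num_keys):
        h = stable_hash(f"object-{i}")  # hypothetical key naming
        if h % old_n != h % new_n:
            moved += 1
    return moved / num_keys


# Exact view: the pair (h % 4, h % 5) repeats every 20 values of h.
agreeing_remainders = [r for r in range(20) if r % 4 == r % 5]
print(agreeing_remainders)  # [0, 1, 2, 3]

# Sampled view: should land close to 0.8 for a well-mixed hash.
fraction = moved_fraction(100_000, old_n=4, new_n=5)
print(f"moved: {fraction:.3f}")
```

Only the remainders 0 through 3 survive the change, which is where the 20% figure comes from.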
For a cache, that’s close to a cold start. Thirty-two million lookups miss, fall through to the database, and refetch at the same moment you were trying to reduce load by adding capacity. The act of scaling up triggers a thundering herd.
The root cause is that modulo ties every key’s location to the exact value of N. Change N at all and you’ve rewritten the placement of nearly the entire dataset. What you actually want is for adding one server to disturb roughly one server’s worth of keys — no more.
How the ring fixes it
Consistent hashing changes the question. Instead of mapping keys into “one of N slots,” you map both keys and servers onto the same large circular number space — say 0 to 2³² − 1, with the top wrapped around to meet the bottom to form a ring.
Hash each server’s name to a point on the ring. Hash each key to a point too. A key belongs to the first server you hit walking clockwise from the key’s position. That’s the entire ownership rule.
Now watch what happens when you add a server. It lands at a single point on the ring and claims only the keys sitting in the arc between it and the next server counter-clockwise. Every other key on the ring keeps the exact owner it had before. On average the new server pulls in K/N keys, where K is the total number of keys and N is the server count after the addition: about 8 million of our 40 million, not 32 million. Those are the only keys that move.
Removal is the mirror image. When a server leaves the ring, the keys it owned fall to the next server clockwise, and nothing else is touched. No global reshuffle, no cache stampede across the fleet — just one arc changing hands.
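Both properties can be checked with a few lines of code. What follows is a minimal sketch under stated assumptions: the server names (cache-a through cache-e), the key names, and the use of the first four bytes of MD5 as the ring position are all illustrative choices, not part of any particular library.

```python
import bisect
import hashlib


def ring_hash(value: str) -> int:
    """Map a string to a point in 0 .. 2**32 - 1 (assumption: first 4 bytes of MD5)."""
    digest = hashlib.md5(value.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")


def build_ring(servers):
    """Return parallel sorted lists: ring points and the server at each point."""
    pairs = sorted((ring_hash(name), name) for name in servers)
    return [p for p, _ in pairs], [n for _, n in pairs]


def owner(ring, key: str) -> str:
    """First server at or clockwise of the key's point, wrapping past the top."""
    points, names = ring
    idx = bisect.bisect_left(points, ring_hash(key))
    return names[idx % len(names)]


keys = [f"object-{i}" for i in range(50_000)]
before = build_ring(["cache-a", "cache-b", "cache-c", "cache-d"])
after = build_ring(["cache-a", "cache-b", "cache-c", "cache-d", "cache-e"])

# Adding cache-e: every key that changes owner must now belong to cache-e.
moved = [k for k in keys if owner(before, k) != owner(after, k)]
all_to_new = all(owner(after, k) == "cache-e" for k in moved)
print(f"moved {len(moved)} of {len(keys)} keys, all to cache-e: {all_to_new}")

# Removing cache-e is the same comparison read backwards: its keys fall
# to the single next server clockwise.
heirs = {owner(before, k) for k in moved}
print(f"servers inheriting cache-e's keys on removal: {heirs}")
```

With one point per server, the exact share that moves depends on where the new server happens to land, so a single run can sit well above or below the K/N average. The set of heirs on removal contains at most one server, which is the weakness the next section addresses.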
Virtual nodes, and where this runs in production
The basic ring has two weaknesses, and both come from the same source: with only N points scattered on a huge ring, the arcs between them are uneven. One server might randomly own a 30% slice of the ring while another owns 5%. Worse, when a server dies, its entire slice lands on a single clockwise neighbor, which can see its load jump sharply in an instant.
Virtual nodes solve both at once. Instead of placing each physical server at one point, you place it at many — 100 to 200 is a common range — by hashing server-name#1, server-name#2, and so on. Each physical server now owns a hundred small arcs scattered around the ring rather than one big one.
Two things improve. First, distribution smooths out: more points means the law of large numbers works in your favor, and the gap between the busiest and least-busy server typically shrinks into the single-digit percentage range. Second, failure is graceful: a dead server’s hundred small arcs are inherited by many different neighbors instead of dumping everything on one, so the redistributed load spreads across the fleet.
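The smoothing effect can be measured directly by adding up the arc length each server owns. This is a rough sketch rather than a production layout: the server names, the choice of 150 virtual nodes, and the MD5-derived ring position are assumptions for illustration, and the server-name#i labels follow the scheme described above.

```python
import hashlib

RING_SIZE = 2**32


def ring_hash(value: str) -> int:
    """Map a string to a point in 0 .. 2**32 - 1 (assumption: first 4 bytes of MD5)."""
    digest = hashlib.md5(value.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")


def ring_shares(servers, vnodes: int):
    """Fraction of the ring owned by each server with `vnodes` points apiece."""
    points = sorted(
        (ring_hash(f"{name}#{i}"), name)
        for name in servers
        for i in range(1, vnodes + 1)
    )
    arc_total = {name: 0 for name in servers}
    for idx, (point, name) in enumerate(points):
        # A point owns the arc back to its counter-clockwise predecessor;
        # index -1 handles the wrap from the first point to the last.
        prev_point = points[idx - 1][0]
        arc_total[name] += (point - prev_point) % RING_SIZE
    return {name: arc / RING_SIZE for name, arc in arc_total.items()}


servers = ["cache-a", "cache-b", "cache-c", "cache-d"]
for vnodes in (1, 150):
    shares = ring_shares(servers, vnodes)
    spread = max(shares.values()) - min(shares.values())
    print(f"{vnodes:>3} vnodes/server: spread between busiest and idlest = {spread:.3f}")
```

With four servers the ideal share is 0.25 each. The single-point ring usually lands far from that, while the 150-point ring should sit within a few percentage points of it; the exact figures depend on the hash and the names used.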
This isn’t academic. Amazon’s 2007 Dynamo paper put consistent hashing into wide industrial use, and Cassandra and Riak inherited the design directly. memcached client libraries distribute keys across servers with it through the “ketama” algorithm. CDNs and L7 load balancers use it to pin a given client to the same backend so sessions and warm caches survive, even as the backend pool changes underneath.
If you’re implementing it yourself, the core is small: a sorted list of (hash, node) pairs and a binary search to find the first entry clockwise of a key’s hash. The subtle parts are choosing a hash with good distribution and getting the wrap-around case right at the top of the ring.
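As a concrete starting point, here is one way that core might look. It is a sketch under assumptions, not a reference implementation: the HashRing class, its method names, and the injectable hash_fn parameter are all hypothetical, and the fake point table exists only so the edge cases can be exercised at known ring positions.

```python
import bisect
import hashlib
from typing import Callable, List, Tuple

RING_SIZE = 2**32


def md5_point(value: str) -> int:
    """Default ring position (assumption: first 4 bytes of MD5)."""
    return int.from_bytes(hashlib.md5(value.encode("utf-8")).digest()[:4], "big")


class HashRing:
    """Sorted list of (point, node) pairs with binary-search lookup."""

    def __init__(self, hash_fn: Callable[[str], int] = md5_point) -> None:
        self._hash = hash_fn
        self._entries: List[Tuple[int, str]] = []  # kept sorted by point

    def add(self, node: str) -> None:
        bisect.insort(self._entries, (self._hash(node), node))

    def remove(self, node: str) -> None:
        self._entries = [entry for entry in self._entries if entry[1] != node]

    def owner(self, key: str) -> str:
        if not self._entries:
            raise LookupError("ring is empty")
        # The 1-tuple (h,) sorts before any (h, node), so bisect_left returns
        # the first entry whose point is >= the key's point.
        idx = bisect.bisect_left(self._entries, (self._hash(key),))
        if idx == len(self._entries):
            idx = 0  # walked past the top of the ring: wrap to the first entry
        return self._entries[idx][1]


# Fake positions (hypothetical) to pin down the edge cases deterministically.
fake_points = {
    "node-low": 100,
    "node-high": RING_SIZE - 100,
    "key-mid": 5_000,
    "key-top": RING_SIZE - 1,
    "key-exact": 100,
}
ring = HashRing(hash_fn=fake_points.__getitem__)
ring.add("node-low")
print(ring.owner("key-top"))    # node-low (single node owns everything)
ring.add("node-high")
print(ring.owner("key-mid"))    # node-high (first point clockwise of 5000)
print(ring.owner("key-top"))    # node-low (wraps around past 2**32 - 1)
print(ring.owner("key-exact"))  # node-low (a key exactly on a point belongs to it)
```

Rebuilding the list on removal is O(n), which is fine for a sketch. Adding virtual nodes would only change add and remove, since lookup already works on whatever points are in the list.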
The mental model to keep: modulo hashing optimizes for even distribution at a fixed size and pays for it catastrophically when the size changes. Consistent hashing trades a little structural complexity for stability under change. If your node count never moves, plain modulo is simpler and fine. The moment nodes come and go — autoscaling, failures, rolling deploys — the ring earns its keep.