How Hash Tables Achieve O(1) Lookups
Why hash tables give average-case constant-time lookups — hashing keys to indices, handling collisions, load factor, rehashing, and the O(n) worst case.
A hash table looks like magic the first time you reach for one: you store a million entries, then pull any single value back out in roughly the same time it takes to read one with ten entries. No scanning, no comparisons against every key. The trick is turning the key itself into the address where its value lives.
From key to index
The core idea is a function that converts any key into a number, then folds that number into the bounds of a backing array. Storing a value at key k means computing index = hash(k) % capacity and writing to that slot. Reading it back computes the same index and reads the same slot. Array indexing is a single memory operation — it does not get slower as the array fills — so the lookup cost is dominated by computing the hash, not by the number of stored entries.
Picture a table with 8 slots holding strings keyed by name:
hash("alice") % 8 -> 3 slots[3] = "alice's data"hash("bob") % 8 -> 6 slots[6] = "bob's data"hash("carol") % 8 -> 1 slots[1] = "carol's data"To look up "bob", you hash it again, get 6, and read slots[6] directly. You never touch slots 3 or 1. That direct jump is the whole reason lookups are constant time on average: the work does not depend on n, the number of entries.
For this to hold, the hash function must spread keys evenly across the slots. A good hash makes outputs look random, so keys scatter uniformly. A bad hash that clumps keys into a few slots destroys the property — and so does an attacker who deliberately picks keys that collide.
Collisions and how they are resolved
Because you are squeezing an enormous space of possible keys into a small array, two different keys will eventually compute the same index. This is a collision, and every hash table needs a strategy for it.
Separate chaining stores a small secondary structure — usually a linked list — at each slot. Colliding keys are appended to the list at that slot. A lookup hashes to the slot, then walks the (normally short) list comparing keys. If "dave" also hashes to 6, slot 6 holds a list of both "bob" and "dave", and a lookup checks each.
Open addressing keeps everything in the array itself. On a collision, it probes for the next free slot by a fixed rule — linear probing checks the next slot, the next, and so on. Lookups follow the same probe sequence until they find the key or hit an empty slot. This is more cache-friendly but degrades faster as the table fills.
Either way, the cost of a lookup is proportional to how many keys share its slot (or probe sequence). Keep that number small and lookups stay fast.
Load factor, rehashing, and the worst case
The knob that keeps collisions rare is the load factor: entries divided by capacity. As it climbs toward 1, slots fill up, chains lengthen, and probe sequences grow. So hash tables resize before that happens — typically when the load factor crosses a threshold like 0.75. Resizing allocates a larger array and rehashes every existing entry into it, because % capacity now yields different indices.
Rehashing is an O(n) operation, but it happens rarely and each resize roughly doubles capacity, so the cost spreads thinly across all the cheap inserts in between. Averaged over a sequence of operations, insertion is amortized O(1) — occasional expensive resizes, vastly more often free.
The worst case is therefore O(n): all n keys collide into a single slot, and a lookup must scan all of them. You get O(1) in practice because a decent hash function plus a controlled load factor make that case astronomically unlikely for ordinary data. The guarantee is statistical, not absolute — which is exactly why the choice of hash function matters so much.
FAQ
FAQ
Why is the worst case O(n) if lookups are supposed to be constant time?+
What is a load factor and why does it trigger resizing?+
What is the difference between separate chaining and open addressing?+
Related reading
2026-06-04
ACID vs BASE: What Database Guarantees Actually Promise
ACID and BASE describe two ends of a tradeoff between strict correctness and scalable availability. Learn what each guarantee means, when each fits, and why most modern databases sit somewhere in between.
2026-06-04
Big-Endian vs Little-Endian
Byte order explained: how big-endian and little-endian lay out multi-byte numbers in memory, why network protocols pick one, and when the difference actually bites you.
2026-06-04
Big-O Notation in Plain English
Big-O describes how an algorithm's runtime or memory grows as input grows. Learn the common classes — O(1), O(log n), O(n), O(n log n), O(n^2), O(2^n) — with plain examples.
2026-06-04
CORS in Plain English: Why the Browser Blocks Your Fetch
A clear walkthrough of CORS and the same-origin policy — what an origin is, why your fetch fails, how servers opt in, and the big misconception about who CORS actually protects.
2026-06-04
Environment Variables and PATH, Explained
What environment variables actually are, why they hold config and secrets, and how PATH decides which binary runs when you type a command.
Get the best tools, weekly
One email every Friday. No spam, unsubscribe anytime.