How Hash Tables Achieve O(1) Lookups

A hash table looks like magic the first time you reach for one: you store a million entries, then pull any single value back out in roughly the same time it takes to read one with ten entries. No scanning, no comparisons against every key. The trick is turning the key itself into the address where its value lives.

From key to index

The core idea is a function that converts any key into a number, then folds that number into the bounds of a backing array. Storing a value at key k means computing index = hash(k) % capacity and writing to that slot. Reading it back computes the same index and reads the same slot. Array indexing is a single memory operation — it does not get slower as the array fills — so the lookup cost is dominated by computing the hash, not by the number of stored entries.

Picture a table with 8 slots holding strings keyed by name:

hash("alice") % 8 -> 3   slots[3] = "alice's data"
hash("bob")   % 8 -> 6   slots[6] = "bob's data"
hash("carol") % 8 -> 1   slots[1] = "carol's data"

To look up "bob", you hash it again, get 6, and read slots[6] directly. You never touch slots 3 or 1. That direct jump is the whole reason lookups are constant time on average: the work does not depend on n, the number of entries.

For this to hold, the hash function must spread keys evenly across the slots. A good hash makes outputs look random, so keys scatter uniformly. A bad hash that clumps keys into a few slots destroys the property — and so does an attacker who deliberately picks keys that collide.

Collisions and how they are resolved

Because you are squeezing an enormous space of possible keys into a small array, two different keys will eventually compute the same index. This is a collision, and every hash table needs a strategy for it.

Separate chaining stores a small secondary structure — usually a linked list — at each slot. Colliding keys are appended to the list at that slot. A lookup hashes to the slot, then walks the (normally short) list comparing keys. If "dave" also hashes to 6, slot 6 holds a list of both "bob" and "dave", and a lookup checks each.

Open addressing keeps everything in the array itself. On a collision, it probes for the next free slot by a fixed rule — linear probing checks the next slot, the next, and so on. Lookups follow the same probe sequence until they find the key or hit an empty slot. This is more cache-friendly but degrades faster as the table fills.

Either way, the cost of a lookup is proportional to how many keys share its slot (or probe sequence). Keep that number small and lookups stay fast.

Load factor, rehashing, and the worst case

The knob that keeps collisions rare is the load factor: entries divided by capacity. As it climbs toward 1, slots fill up, chains lengthen, and probe sequences grow. So hash tables resize before that happens — typically when the load factor crosses a threshold like 0.75. Resizing allocates a larger array and rehashes every existing entry into it, because % capacity now yields different indices.

Rehashing is an O(n) operation, but it happens rarely and each resize roughly doubles capacity, so the cost spreads thinly across all the cheap inserts in between. Averaged over a sequence of operations, insertion is amortized O(1) — occasional expensive resizes, vastly more often free.

The worst case is therefore O(n): all n keys collide into a single slot, and a lookup must scan all of them. You get O(1) in practice because a decent hash function plus a controlled load factor make that case astronomically unlikely for ordinary data. The guarantee is statistical, not absolute — which is exactly why the choice of hash function matters so much.

FAQ

Why is the worst case O(n) if lookups are supposed to be constant time?

Constant time depends on keys spreading evenly across slots. If every key hashes to the same slot — through a bad hash function or maliciously chosen keys — they all pile into one chain or probe sequence, and a lookup must scan all n of them. That collapses the table to linear-scan behavior.

What is a load factor and why does it trigger resizing?

Load factor is the ratio of stored entries to total capacity. As it rises, collisions become more frequent and lookups slow down. Resizing once it crosses a threshold (often around 0.75) keeps the table sparse, trading occasional O(n) rehashes for consistently fast operations.

What is the difference between separate chaining and open addressing?

Separate chaining stores a secondary list at each slot and appends collisions there. Open addressing keeps all entries in the array and probes for the next free slot on collision. Chaining handles high load factors more gracefully; open addressing is more cache-friendly but degrades faster as the table fills.

How Hash Tables Achieve O(1) Lookups

From key to index

Collisions and how they are resolved

Load factor, rehashing, and the worst case

FAQ

FAQ

TCP vs UDP, Explained Through What Breaks When You Pick Wrong

Write-Ahead Logging: How Databases Survive a Power Cut

Backpressure, Explained Through a Queue That Won't Fall Over

What a Bloom Filter Actually Saves You (and When It Lies)

Idempotency, Explained Through the Retry That Doesn't Double-Charge

Get the best tools, weekly