pickuma.

Race Condition, Explained

A race condition is when a program's correctness depends on the unpredictable timing of concurrent operations. Here's why they happen, the classic lost-update bug, and how to fix them by design.

A race condition is a bug where the correctness of your program depends on the timing or interleaving of concurrent operations — and that timing is something you do not control. The code looks right when you read it top to bottom. It is wrong only because two things ran at once, and which one “won the race” changed the answer.

The classic example: a lost update

Imagine two threads sharing a single counter that starts at 0, and each thread runs counter = counter + 1 a thousand times. You would expect a final value of 2000. In practice you often get something less — 1873, 1991, a different number each run.

The reason is that counter = counter + 1 is not one indivisible step. At the machine level it is three steps, a read-modify-write:

1. read counter into a register (sees 41)
2. add 1 in the register (now 42)
3. write the register back (stores 42)
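Python's bytecode is not machine code, but it shows the same shape. The sketch below disassembles the statement with the standard `dis` module. The function name `increment` is just for illustration, and the exact opcode names vary between Python versions. The read, the add, and the write still come out as separate instructions, and Python's own FAQ lists `i = i+1` among the operations that are not atomic.

```python
import dis

counter = 0

def increment():
    global counter
    counter = counter + 1  # one line of source...

# ...but several bytecode instructions. Opcode names differ across Python versions.
opnames = [ins.opname for ins in dis.get_instructions(increment)]
print(opnames)

# Pick out the three phases of the read-modify-write.
read_op = next(op for op in opnames if op == "LOAD_GLOBAL")               # read counter
add_op = next(op for op in opnames if op in ("BINARY_OP", "BINARY_ADD"))  # add 1
write_op = next(op for op in opnames if op == "STORE_GLOBAL")             # write it back
print(read_op, "->", add_op, "->", write_op)
```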

Now interleave two threads, A and B, around the same value:

A: read counter -> 41
B: read counter -> 41 # B read before A wrote
A: add 1 -> 42, write 42
B: add 1 -> 42, write 42 # overwrites A's increment

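Both threads incremented, but the counter only went up by one. One update was silently lost. The window where this can happen is tiny, a few CPU cycles, so any single increment almost always gets through. Across thousands of increments, a few land in the window. That is exactly why the bug is intermittent and maddening rather than constant.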

The region of code that touches shared state and must not be interleaved is called a critical section. The whole problem is that the read, the modify, and the write of the critical section are not happening as a single, uninterruptible unit.

How you actually fix it

There are three durable approaches, and “add a retry” is not one of them.

Mutual exclusion (locks/mutexes). Wrap the critical section in a lock so only one thread can be inside at a time. The other thread blocks until the first releases the lock. This makes the read-modify-write effectively atomic by serializing access.

with lock:                 # acquired here, released on exit even if the body raises
    counter = counter + 1  # only one thread here at a time
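A complete, runnable version of the two-thread counter from the start of this article might look like the sketch below. It assumes Python's `threading` module, and `increment_many` is a hypothetical helper name. `with lock:` acquires the lock on entry and releases it on exit, even if the body raises, so a thread cannot leave the lock held by accident.

```python
import threading

counter = 0
lock = threading.Lock()

def increment_many(times):
    """Hypothetical worker: add 1 to the shared counter `times` times."""
    global counter
    for _ in range(times):
        with lock:  # critical section: only one thread reads, adds, and writes at a time
            counter = counter + 1

threads = [threading.Thread(target=increment_many, args=(1000,)) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()  # wait until both workers have finished

print(counter)  # 2000
```

Because every increment happens while holding the lock, the result is 2000 on every run.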

The cost is contention: threads wait, and careless locking introduces deadlocks (two threads each holding a lock the other needs).
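One standard way to rule out that kind of deadlock is to make every thread acquire locks in the same global order. The sketch below assumes a hypothetical `Account` class with one lock per account and orders the locks by `account_id`. If each transfer instead locked `src` and then `dst`, a transfer from `a` to `b` and one from `b` to `a` could each take their first lock and then wait forever for the other's.

```python
import threading

class Account:
    """Hypothetical account: a balance protected by its own lock."""

    def __init__(self, account_id, balance):
        self.account_id = account_id
        self.balance = balance
        self.lock = threading.Lock()

def transfer(src, dst, amount):
    # Lock in a fixed global order (lowest account_id first), no matter which
    # direction the money moves. Every thread waits on locks in the same order,
    # so no two threads can each hold a lock the other is waiting for.
    first, second = sorted((src, dst), key=lambda acct: acct.account_id)
    with first.lock:
        with second.lock:
            src.balance -= amount
            dst.balance += amount

a = Account(1, 1000)
b = Account(2, 1000)

def move_many(src, dst, times):
    for _ in range(times):
        transfer(src, dst, 1)

# Two threads moving money in opposite directions at the same time.
t1 = threading.Thread(target=move_many, args=(a, b, 10_000))
t2 = threading.Thread(target=move_many, args=(b, a, 10_000))
t1.start()
t2.start()
t1.join()
t2.join()

print(a.balance, b.balance)  # 1000 1000
```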

Atomic operations. Many languages and CPUs offer operations that do read-modify-write as a single hardware-guaranteed step — AtomicInteger.incrementAndGet() in Java, std::atomic in C++, atomic types in Go and Rust. No lock, no interleaving window. Atomics are ideal for simple updates like counters and flags; they do not scale up to “update three related fields together.”

Avoid shared mutable state. The race needs two ingredients: shared state and mutation. Remove either one and the race cannot exist. Give each thread its own copy, use immutable data, or funnel all changes through a single owner (the actor model, a message queue, a single-threaded event loop). This is why functional and message-passing styles sidestep whole categories of concurrency bugs.
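A single-owner design can be sketched with Python's `queue.Queue`. Worker threads send increment requests, and one owner thread is the only code that ever touches the total. The function names and the `None` stop sentinel are illustrative choices. The queue does its own internal locking, but your code no longer has a shared mutable counter to protect.

```python
import queue
import threading

def owner(inbox, result):
    """The only code that touches `total`; other threads just send messages."""
    total = 0
    while True:
        msg = inbox.get()
        if msg is None:  # sentinel: no more updates are coming
            break
        total += msg
    result.append(total)

def producer(inbox, times):
    for _ in range(times):
        inbox.put(1)  # request an increment instead of performing it

inbox = queue.Queue()
result = []
owner_thread = threading.Thread(target=owner, args=(inbox, result))
owner_thread.start()

producers = [threading.Thread(target=producer, args=(inbox, 1000)) for _ in range(2)]
for t in producers:
    t.start()
for t in producers:
    t.join()

inbox.put(None)  # every producer has finished, so the sentinel arrives last
owner_thread.join()
print(result[0])  # 2000
```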

Beyond threads: check-then-act and TOCTOU

Races are not only a threading problem. Any time you check a condition and then act on it as if the check is still true, something can change in the gap. This is the check-then-act bug, and its security-flavored cousin is TOCTOU — time-of-check to time-of-use.

A filesystem example: a program checks that a file exists and the user may read it, then opens it. Between the check and the open, an attacker swaps the file for a symlink to /etc/passwd. The check passed against a safe file; the use happened against a dangerous one.
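The usual fix is to drop the separate check: open the file first, then inspect the handle you actually got. A descriptor keeps referring to the same file even if the path is changed afterwards. The sketch below assumes a POSIX system (`os.O_NOFOLLOW` is not available on Windows), and `open_regular_file` is a hypothetical helper. `O_NOFOLLOW` only refuses a symlink in the last path component, and the permission check is left to `open()` itself.

```python
import os
import stat
import tempfile

def open_regular_file(path):
    """Hypothetical helper: open path for reading, refusing a final-component symlink."""
    # Act first: with O_NOFOLLOW, open() fails if the last component is a symlink.
    fd = os.open(path, os.O_RDONLY | os.O_NOFOLLOW)
    try:
        # Then check the file actually opened, through its descriptor, not the path.
        if not stat.S_ISREG(os.fstat(fd).st_mode):
            raise OSError(f"{path!r} is not a regular file")
        return os.fdopen(fd, "rb")
    except BaseException:
        os.close(fd)
        raise

# Usage: a real file opens fine; a symlink to it is refused.
workdir = tempfile.mkdtemp()
target = os.path.join(workdir, "report.txt")
with open(target, "wb") as f:
    f.write(b"hello")
link = os.path.join(workdir, "report-link")
os.symlink(target, link)

with open_regular_file(target) as f:
    contents = f.read()

try:
    with open_regular_file(link):
        link_refused = False
except OSError:
    link_refused = True
```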

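The same pattern appears in databases: SELECT to see if a row exists, then INSERT if it does not. Two requests can both see no row and both insert the same key. The fix is to make check-and-act atomic: a unique constraint plus INSERT ... ON CONFLICT, a SELECT ... FOR UPDATE row lock when you read an existing row and then update it (in PostgreSQL, for example, it locks only rows that already exist, so it cannot stop two concurrent inserts), or, for files, an atomic open flag like O_CREAT | O_EXCL instead of checking first.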
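Two of these fixes can be sketched in Python: SQLite's `INSERT ... ON CONFLICT DO NOTHING` (available since SQLite 3.24) standing in for the database case, and `os.open` with `O_CREAT | O_EXCL` for files. `claim_username`, `create_exclusively`, and the `users` table are hypothetical. In each case one operation both checks and acts, so the loser of a race gets a clean "already exists" answer instead of a duplicate.

```python
import os
import sqlite3
import tempfile

# Database: a PRIMARY KEY (unique) constraint decides, in one statement, who gets the key.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE users (username TEXT PRIMARY KEY)")

def claim_username(conn, name):
    """Return True if this call inserted the row, False if it already existed."""
    cur = conn.execute(
        "INSERT INTO users (username) VALUES (?) ON CONFLICT (username) DO NOTHING",
        (name,),
    )
    conn.commit()
    return cur.rowcount == 1  # 0 rows changed means someone else got there first

# Filesystem: O_CREAT | O_EXCL creates the file only if it does not exist yet,
# as a single step; a second caller gets FileExistsError.
def create_exclusively(path):
    """Return True if this call created path, False if it already existed."""
    try:
        fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    except FileExistsError:
        return False
    os.close(fd)
    return True

first_claim = claim_username(db, "alice")
second_claim = claim_username(db, "alice")

lock_path = os.path.join(tempfile.mkdtemp(), "job.lock")
first_create = create_exclusively(lock_path)
second_create = create_exclusively(lock_path)
print(first_claim, second_claim, first_create, second_create)  # True False True False
```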

The mental shift that makes races tractable is this: stop reasoning about your code as a single sequence of steps. Assume any other thread can run between any two of your instructions, see your half-finished state, and act on it. Code that is still correct under that assumption is concurrency-safe; code that is not has a race waiting to be triggered.
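For a few lines of code you can take that assumption literally and check every possible schedule. The toy model below is an illustration, not real threads: two threads each run the read, add, and write steps against one shared counter starting at 0, and the sketch enumerates all 20 orderings that keep each thread's own steps in order. Only 2 of them, the fully serial ones, produce the correct total of 2.

```python
from itertools import combinations

STEPS = ("read", "add", "write")  # one thread's increment, as in the steps listed earlier

def interleavings():
    """Yield every merge of thread A's and B's steps that keeps each thread's order."""
    total = 2 * len(STEPS)
    for a_slots in combinations(range(total), len(STEPS)):
        a_steps, b_steps = iter(STEPS), iter(STEPS)
        yield [
            ("A", next(a_steps)) if slot in a_slots else ("B", next(b_steps))
            for slot in range(total)
        ]

def run(schedule, start=0):
    """Replay one schedule. Each thread has a private register; only read/write touch shared state."""
    shared, register = start, {}
    for thread, step in schedule:
        if step == "read":
            register[thread] = shared
        elif step == "add":
            register[thread] += 1
        else:  # "write"
            shared = register[thread]
    return shared

schedules = list(interleavings())
correct = sum(1 for s in schedules if run(s) == 2)
print(len(schedules), correct)  # 20 2
```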

FAQ

Is a race condition the same as a deadlock?
No. A race condition is a safety bug caused by uncontrolled timing: the wrong interleaving produces a wrong result. A deadlock is a liveness bug: threads block forever waiting on each other's locks. Ironically, locks added to fix races are a common source of deadlocks, so the two often show up together, but they are distinct failure modes.
Why can't I just retry the operation until it works?
Retrying does not remove the race; it just rolls the dice again, and the corrupted state may already be written. Because the bug is nondeterministic, a retry can appear to 'fix' it in testing while leaving it fully present in production. The only real fixes make the bad interleaving impossible: locks, atomics, or eliminating shared mutable state.
Are single-threaded programs immune to race conditions?
Not entirely. A single-threaded program avoids thread-level races, but it can still hit check-then-act races against the outside world — files changed by other processes, rows changed by other database clients, or async callbacks interleaving on one event loop. Shared external state is enough to create a race even without multiple threads.
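The event-loop case can be shown deterministically with `asyncio`. In the sketch below, the account dict and the function names are hypothetical, and `await asyncio.sleep(0)` stands in for any real `await` such as a network call. It hands control back to the event loop between the check and the act, so two withdrawals both pass the check. Holding an `asyncio.Lock` across the check and the act closes that gap.

```python
import asyncio

async def withdraw(account, amount):
    if account["balance"] >= amount:  # check
        await asyncio.sleep(0)        # yield to the event loop, as a real I/O await would
        account["balance"] -= amount  # act, on a check that may now be stale

async def withdraw_locked(account, amount, lock):
    async with lock:  # no other holder of this lock runs between the check and the act
        if account["balance"] >= amount:
            await asyncio.sleep(0)
            account["balance"] -= amount

async def main():
    racy = {"balance": 100}
    await asyncio.gather(withdraw(racy, 80), withdraw(racy, 80))

    safe = {"balance": 100}
    lock = asyncio.Lock()  # created inside the running event loop
    await asyncio.gather(withdraw_locked(safe, 80, lock), withdraw_locked(safe, 80, lock))
    return racy["balance"], safe["balance"]

racy_balance, safe_balance = asyncio.run(main())
print(racy_balance, safe_balance)  # -60 20
```

With one thread and no preemption, another task can only run at an `await`, so the fix is either to hold a lock across the check and the act or to avoid awaiting between them.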
