How Git Actually Stores Your Code: Blobs, Trees, and Commits
Git is a content-addressed object store. Learn the four object types — blobs, trees, commits, tags — and why Git keeps whole snapshots instead of diffs.
Most people picture Git as a tool that records changes — a stack of diffs layered on top of each other. That mental model is wrong, and it makes Git feel mysterious. Git is really a small key-value database that stores snapshots, and once you see the four object types it uses, commands like reset, checkout, and rebase stop being magic.
Git is a content-addressed object store
Everything Git tracks lives in .git/objects as an object, and every object has an ID that is the hash of its content (plus a short header recording the object’s type and size). By default that hash is SHA-1, written as 40 hexadecimal characters; newer Git versions can optionally use SHA-256 instead. The same bytes always produce the same ID, so the ID is the content’s address: change one byte and you get a completely different object. This is why Git data is effectively immutable. You never edit an object in place; you create a new one with a new name.
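A minimal Python sketch makes the addressing scheme concrete. It assumes the default SHA-1 format and handles only blobs, and `blob_id` is our own name, not a Git API. Git hashes a short header (the word `blob`, a space, the content length in bytes as decimal text, and a NUL byte) followed by the raw content:

```python
import hashlib


def blob_id(content: bytes) -> str:
    """Return the SHA-1 object ID Git would assign to a blob holding `content`."""
    # Header: object type, a space, the size in bytes as decimal text, then NUL.
    header = b"blob %d\0" % len(content)
    return hashlib.sha1(header + content).hexdigest()


if __name__ == "__main__":
    original = blob_id(b"hello world\n")
    print(original)                                # 40 hex characters
    print(original == blob_id(b"hello world\n"))   # True: same bytes, same ID
    print(original == blob_id(b"hello world!\n"))  # False: one byte changed
```

If the sketch matches Git’s behavior, the first line printed should agree with `echo 'hello world' | git hash-object --stdin`, an ID beginning with 3b18e512, the same object inspected in the next example.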
You can look inside any object with git cat-file. The -t flag prints the type, -p pretty-prints the content:
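$ git cat-file -t 3b18e512
blob
$ git cat-file -p 3b18e512
hello world
There are exactly four object types: blob, tree, commit, and tag. The first three carry your project; a tag object, created by an annotated tag (git tag -a), is a named pointer to another object (usually a commit) with its own tagger and message.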
A blob is just file contents — raw bytes, with no filename and no metadata. The blob for README.md knows nothing about being named README.md; it only knows what’s inside.
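Where do those bytes physically go? In a repository that has not been packed yet, each object is a separate “loose” file under .git/objects: the first two hex characters of its ID name a subdirectory, the remaining 38 name the file, and the file holds the header plus the content, zlib-compressed. The Python sketch below writes and reads objects in that layout inside a throwaway directory. It assumes SHA-1 IDs, ignores packfiles entirely, and `write_loose_object` and `read_loose_object` are hypothetical helpers, not Git commands. Notice that what gets stored is a type, a size, and the bytes; no filename appears anywhere.

```python
import hashlib
import os
import tempfile
import zlib
from typing import Tuple


def write_loose_object(objects_dir: str, obj_type: str, content: bytes) -> str:
    """Store an object using Git's loose-object layout and return its ID."""
    raw = obj_type.encode() + b" %d\0" % len(content) + content
    oid = hashlib.sha1(raw).hexdigest()
    # First two hex characters name the subdirectory, the other 38 the file.
    path = os.path.join(objects_dir, oid[:2], oid[2:])
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "wb") as f:
        f.write(zlib.compress(raw))  # header and content are compressed together
    return oid


def read_loose_object(objects_dir: str, oid: str) -> Tuple[str, bytes]:
    """Decompress a loose object and split it into (type, content)."""
    with open(os.path.join(objects_dir, oid[:2], oid[2:]), "rb") as f:
        raw = zlib.decompress(f.read())
    header, _, content = raw.partition(b"\0")
    obj_type, size = header.decode().split(" ")
    if int(size) != len(content):
        raise ValueError("corrupt object: size in header does not match content")
    return obj_type, content


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as objects_dir:
        oid = write_loose_object(objects_dir, "blob", b"hello world\n")
        print(oid[:8])                              # a short ID, as cat-file accepts
        print(read_loose_object(objects_dir, oid))  # ('blob', b'hello world\n')
```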
A tree is a directory listing. It maps names to other objects: each entry has a mode (100644 for a regular file, 100755 for an executable, 120000 for a symbolic link, 040000 for a subdirectory), a name, and the hash of either a blob (a file’s contents) or another tree (a subdirectory). Trees are how Git represents folder structure. Inspecting one shows exactly that:
$ git cat-file -p HEAD^{tree}
100644 blob a906cb... README.md
040000 tree fe8e3b... src
A commit ties it together. A commit object points to exactly one top-level tree (the full state of your project at that moment), plus the hashes of its parents: one for an ordinary commit, two or more for a merge, none for the very first commit. It also records the author and committer, each with a timestamp, and the commit message. Running git cat-file -p HEAD shows these fields in plain text. Because each commit names its parents, the commits form a chain (more precisely a directed acyclic graph, since merges have several parents), and that graph is your history.
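Both object types are simple to serialize. This Python sketch assumes SHA-1, handles only regular files and subdirectories, and uses made-up helper names (`object_id`, `tree_id`, `commit_id`) and a hypothetical author identity. Two details are easy to miss: a tree stores each child’s hash as 20 raw bytes rather than 40 hex characters, and a commit is plain text whose parent lines are hashed along with everything else, so a commit’s ID depends on its entire ancestry.

```python
import hashlib
from typing import List, Tuple

# A tree entry: (mode, name, hex object ID). Raw trees write the
# subdirectory mode as "40000"; cat-file pads it to "040000" for display.
Entry = Tuple[str, str, str]


def object_id(obj_type: str, content: bytes) -> str:
    """SHA-1 over Git's '<type> <size>' + NUL header followed by the content."""
    return hashlib.sha1(obj_type.encode() + b" %d\0" % len(content) + content).hexdigest()


def tree_id(entries: List[Entry]) -> str:
    """Serialize entries as a tree object and return its ID."""
    def sort_key(entry: Entry) -> bytes:
        mode, name, _ = entry
        # Git orders a subdirectory as if its name ended with "/".
        return name.encode() + (b"/" if mode == "40000" else b"")

    body = b"".join(
        mode.encode() + b" " + name.encode() + b"\0" + bytes.fromhex(oid)  # 20 raw bytes
        for mode, name, oid in sorted(entries, key=sort_key)
    )
    return object_id("tree", body)


def commit_id(tree: str, parents: List[str], who: str, when: int, message: str) -> str:
    """Build a commit object's text and return its ID (author == committer here)."""
    lines = [f"tree {tree}"]
    lines += [f"parent {p}" for p in parents]  # zero parent lines for a root commit
    lines.append(f"author {who} {when} +0000")
    lines.append(f"committer {who} {when} +0000")
    text = "\n".join(lines) + "\n\n" + message + "\n"
    return object_id("commit", text.encode())


if __name__ == "__main__":
    readme = object_id("blob", b"hello world\n")
    src = tree_id([("100644", "main.py", object_id("blob", b"print('hi')\n"))])
    root = tree_id([("100644", "README.md", readme), ("40000", "src", src)])

    who = "Ada Example <ada@example.com>"  # hypothetical identity
    first = commit_id(root, [], who, 1700000000, "Initial commit")
    second = commit_id(root, [first], who, 1700000100, "Same tree, new commit")
    print(first != second)  # True: the parent line alone changes the ID
```

Renaming README.md in this sketch would change the root tree’s ID while reusing exactly the same blob, which is the name-versus-content split in action.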
Snapshots, not diffs
Here is the part that surprises people: a commit stores a snapshot of your entire tree, not a diff against the previous commit. Each commit points to a complete tree describing every file in the project at that point.
That sounds wasteful, but it isn’t, because of content addressing. If a file didn’t change between two commits, its blob hash is identical, so both commits’ trees point at the very same blob object, and Git stores that blob once. The same goes for directories: an unchanged subdirectory yields an identical tree object that is reused across commits. A commit that changes one file in a deeply nested folder creates only a new blob, a new tree for each directory on the path from that file up to the root, and the commit object itself; everything else is shared by reference.
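You can watch the sharing happen with a toy object store. The Python sketch below uses SHA-1, a plain dict standing in for .git/objects, and the hypothetical helpers `put` and `write_tree`. It snapshots a small project twice, changing one file deep inside it the second time, and counts how many objects the second snapshot adds. Commit objects are left out to keep it short.

```python
import hashlib


def put(store: dict, obj_type: str, content: bytes) -> str:
    """Hash an object and add it to the store; adding it again changes nothing."""
    raw = obj_type.encode() + b" %d\0" % len(content) + content
    oid = hashlib.sha1(raw).hexdigest()
    store.setdefault(oid, raw)
    return oid


def write_tree(store: dict, files: dict) -> str:
    """Store a {name: bytes or nested dict} snapshot recursively; return the root tree ID."""
    entries = []
    for name, value in files.items():
        if isinstance(value, dict):  # subdirectory: store it first, then reference it
            entries.append((name.encode() + b"/", b"40000", name, write_tree(store, value)))
        else:                        # file: its contents become a blob
            entries.append((name.encode(), b"100644", name, put(store, "blob", value)))
    body = b"".join(
        mode + b" " + name.encode() + b"\0" + bytes.fromhex(oid)
        for _, mode, name, oid in sorted(entries)  # Git's tree ordering
    )
    return put(store, "tree", body)


if __name__ == "__main__":
    store = {}
    v1 = {"README.md": b"hello\n",
          "src": {"util.py": b"x = 1\n", "app": {"main.py": b"print(1)\n"}}}
    v2 = {"README.md": b"hello\n",
          "src": {"util.py": b"x = 1\n", "app": {"main.py": b"print(2)\n"}}}
    write_tree(store, v1)
    before = len(store)         # 3 blobs + 3 trees
    write_tree(store, v2)
    print(len(store) - before)  # 4: one new blob plus new app/, src/, and root trees
```

README.md and util.py cost nothing the second time: their blobs already exist under the same IDs, so `setdefault` leaves the store unchanged.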
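The diffs you see in git diff or git log -p are computed on the fly by comparing two snapshots. Git doesn’t store them; it derives them when you ask. (Packfiles do delta-compress similar objects to save disk space, but that is a storage detail beneath the object model: every object still reads back as a complete blob, tree, or commit.)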
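Deriving a change list from two snapshots is cheap precisely because of content addressing: equal hashes mean equal content, so only paths whose IDs differ need a closer look. This Python sketch compares two flattened snapshots, represented (as an assumption of ours) as maps from path to blob ID, and reports added, deleted, and modified paths, roughly what `git diff --name-status` summarizes. Real Git walks trees and can skip a whole unchanged subdirectory by comparing tree IDs; this sketch also stops short of line-level diffs. `name_status` is a hypothetical name.

```python
from typing import Dict, List, Tuple


def name_status(old: Dict[str, str], new: Dict[str, str]) -> List[Tuple[str, str]]:
    """Compare two {path: blob_id} snapshots; return (status, path) pairs sorted by path.

    Status letters follow git diff --name-status: A added, D deleted, M modified.
    """
    changes = []
    for path in sorted(old.keys() | new.keys()):
        if path not in old:
            changes.append(("A", path))
        elif path not in new:
            changes.append(("D", path))
        elif old[path] != new[path]:  # different hash means different bytes
            changes.append(("M", path))
    return changes


if __name__ == "__main__":
    old = {"README.md": "aaa", "src/app.py": "bbb"}  # placeholder IDs
    new = {"README.md": "aaa", "src/app.py": "ccc", "src/util.py": "ddd"}
    print(name_status(old, new))  # [('M', 'src/app.py'), ('A', 'src/util.py')]
```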
Branches and HEAD are just pointers
If commits are immutable objects in a graph, what is a branch? Almost nothing. A branch is a ref: a small file under .git/refs/heads/ that contains a single commit hash. The file for main literally holds the 40-character hash of the latest commit on that line of work. (Git may also consolidate many refs into a single .git/packed-refs file, but the idea is the same.)
git commit writes a tree for the staged content, writes a new commit whose parent is the current one, then updates the branch ref to point at it. That’s the whole operation. Creating a branch (git branch feature) just writes a new ref file containing the same hash, which is why it’s instant and cheap no matter how large the repo.
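The two-step shape of git commit, and the cheapness of git branch, fit in a few lines of a toy model. In this Python sketch the class `TinyRepo` and its methods are hypothetical, the hash input is simplified rather than Git’s real commit format, and trees, the staging area, and the working copy are left out.

```python
import hashlib
from typing import Dict, Optional, Tuple


class TinyRepo:
    """Toy model: commits are write-once entries, branches are names pointing at IDs.

    It skips trees, the staging area, and the working copy, and hashes a
    simplified text rather than Git's real commit format.
    """

    def __init__(self) -> None:
        self.commits: Dict[str, Tuple[Optional[str], str]] = {}  # ID -> (parent, message)
        self.refs: Dict[str, Optional[str]] = {"main": None}      # branch -> commit ID
        self.head = "main"                                         # branch HEAD names

    def commit(self, message: str) -> str:
        parent = self.refs[self.head]
        oid = hashlib.sha1(f"parent {parent}\n\n{message}\n".encode()).hexdigest()
        self.commits[oid] = (parent, message)  # 1. write a new, immutable object
        self.refs[self.head] = oid             # 2. move the current branch to it
        return oid

    def branch(self, name: str) -> None:
        """Create a branch: copy one hash, touch no commits."""
        self.refs[name] = self.refs[self.head]


if __name__ == "__main__":
    repo = TinyRepo()
    first = repo.commit("first")
    repo.branch("feature")          # feature and main now hold the same hash
    second = repo.commit("second")  # only main moves, because HEAD names main
    print(repo.refs["main"] == second, repo.refs["feature"] == first)  # True True
```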
HEAD is one more layer of indirection: usually it’s a file containing ref: refs/heads/main, meaning “I am on the branch main.” When you check out a different branch, Git rewrites HEAD to point at that ref and updates your working files to match its tree. A “detached HEAD” simply means HEAD holds a commit hash directly instead of pointing at a branch.
Once this clicks, the scary commands lose their mystery. git reset moves the current branch pointer to a different commit (by default it also resets the staging area to match, --hard resets the working tree too, and --soft moves only the pointer). git rebase replays commits to create new ones with new hashes, which is why it rewrites history. Nothing reaches into an object and mutates it: Git only ever creates new objects and moves pointers.
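A rebase, stripped to its object-level essence, is a replay plus a pointer move. In this Python sketch, `commit_id` and `replay` are hypothetical helpers, commits omit author lines, and each replayed commit keeps its original tree to isolate the effect of the new parent; a real rebase re-applies each change onto the new base, so the trees usually change too. The empty-tree ID serves only as a stand-in tree.

```python
import hashlib
from typing import List, Optional, Tuple

EMPTY_TREE = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"  # Git's ID for an empty tree


def commit_id(tree: str, parent: Optional[str], message: str) -> str:
    """Simplified commit ID: real header and tree/parent lines, no author lines."""
    lines = [f"tree {tree}"] + ([f"parent {parent}"] if parent else [])
    text = ("\n".join(lines) + "\n\n" + message + "\n").encode()
    return hashlib.sha1(b"commit %d\0" % len(text) + text).hexdigest()


def replay(commits: List[Tuple[str, str]], onto: str) -> List[str]:
    """Recreate (tree, message) commits on top of `onto`; return the new IDs."""
    new_ids, parent = [], onto
    for tree, message in commits:
        parent = commit_id(tree, parent, message)  # new parent, so a new ID
        new_ids.append(parent)
    return new_ids


if __name__ == "__main__":
    base = commit_id(EMPTY_TREE, None, "base")
    refs = {"feature": commit_id(EMPTY_TREE, base, "feature work")}
    upstream = commit_id(EMPTY_TREE, base, "upstream fix")

    old_tip = refs["feature"]
    refs["feature"] = replay([(EMPTY_TREE, "feature work")], onto=upstream)[-1]
    print(old_tip != refs["feature"])  # True: same tree and message, new parent
    # The old commit still exists; pointing the branch back at it is a reset.
    refs["feature"] = old_tip
```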
FAQ
Does Git store a separate full copy of every file in every commit?
What is the difference between a blob and a file?
If commits are immutable, how can rebase or amend change history?