How Git Actually Stores Your Code: Blobs, Trees, and Commits

Most people picture Git as a tool that records changes — a stack of diffs layered on top of each other. That mental model is wrong, and it makes Git feel mysterious. Git is really a small key-value database that stores snapshots, and once you see the four object types it uses, commands like reset, checkout, and rebase stop being magic.

Git is a content-addressed object store

Everything Git tracks lives in .git/objects as an object, and every object has an ID that is the hash of its own content (strictly, of a short type-and-size header followed by the content). By default that hash is a SHA-1 digest, written as 40 hexadecimal characters; newer Git versions can also create repositories that use SHA-256. The same bytes always produce the same ID, so the ID is the content’s address. Change one byte and you get a completely different ID, and therefore a different object. This is why Git data is effectively immutable: you never edit an object in place, you create a new one with a new name.

You can look inside any object with git cat-file. The -t flag prints the type, -p pretty-prints the content:

$ git cat-file -t 3b18e512
blob
$ git cat-file -p 3b18e512
hello world
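To make the addressing concrete, here is a minimal Python sketch of how a blob ID can be reproduced by hand. It assumes a SHA-1 repository; the helper names git_object_id and loose_object_path are made up for this illustration and are not part of Git.

```python
import hashlib


def git_object_id(obj_type: str, body: bytes) -> str:
    """Return the SHA-1 ID Git assigns to an object of this type and body."""
    # Git hashes a short header, "<type> <size>" plus a NUL byte,
    # followed by the raw body.
    header = f"{obj_type} {len(body)}".encode() + b"\x00"
    return hashlib.sha1(header + body).hexdigest()


def loose_object_path(object_id: str) -> str:
    """Where a loose (not yet packed) object with this ID is stored."""
    # The first two hex digits name a directory, the other 38 name the file.
    return f".git/objects/{object_id[:2]}/{object_id[2:]}"


# The blob from the cat-file example: "hello world" plus a trailing newline.
blob_id = git_object_id("blob", b"hello world\n")
print(blob_id)                     # 3b18e512dba79e4c8300dd08aeb37f8e728b8dad
print(loose_object_path(blob_id))  # .git/objects/3b/18e512dba79e4c8300dd08aeb37f8e728b8dad
```

The file on disk holds the same header and body, zlib-compressed. Older objects are usually moved into packfiles, so a given ID will not always have a loose file of its own.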

There are exactly four object types: blob, tree, commit, and tag.

A blob is just file contents — raw bytes, with no filename and no metadata. The blob for README.md knows nothing about being named README.md; it only knows what’s inside.

A tree is a directory listing. It maps names to other objects: each entry has a mode (marking it as, for example, a regular file, an executable, or a subdirectory), a name, and the hash of either a blob (a file) or another tree (a subdirectory). Trees are how Git represents folder structure. Inspecting one shows exactly that:

$ git cat-file -p HEAD^{tree}
100644 blob a906cb...    README.md
040000 tree fe8e3b...    src
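Under the hood a tree is a compact binary listing, and the text above is only how cat-file renders it. The following Python sketch builds one by hand, assuming SHA-1 IDs. The file names and contents are invented for the example, and tree_body is a hypothetical helper, not a Git API.

```python
import hashlib


def git_object_id(obj_type: str, body: bytes) -> str:
    # Same scheme as for blobs: hash "<type> <size>" + NUL + body.
    header = f"{obj_type} {len(body)}".encode() + b"\x00"
    return hashlib.sha1(header + body).hexdigest()


def tree_body(entries) -> bytes:
    """Serialize (mode, name, hex_id) entries the way a tree stores them."""

    def sort_key(entry):
        mode, name, _ = entry
        # Entries are ordered by name; directories sort as if they ended in "/".
        return name.encode() + (b"/" if mode == "40000" else b"")

    body = b""
    for mode, name, hex_id in sorted(entries, key=sort_key):
        # Each entry is "<mode> <name>", a NUL byte, then the 20 raw hash bytes.
        body += f"{mode} {name}".encode() + b"\x00" + bytes.fromhex(hex_id)
    return body


# Made-up contents standing in for README.md and src/main.py.
readme_id = git_object_id("blob", b"hello world\n")
main_id = git_object_id("blob", b"print('hi')\n")

src_body = tree_body([("100644", "main.py", main_id)])
src_id = git_object_id("tree", src_body)

root_body = tree_body([("40000", "src", src_id), ("100644", "README.md", readme_id)])
root_id = git_object_id("tree", root_body)
print(root_id)
```

Note that the stored mode for a directory is 40000; cat-file -p pads it to 040000 when printing. The blob IDs inside the entry are the only link between a name and its contents.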

A commit ties it together. A commit object points to exactly one top-level tree (the full state of your project at that moment) and records the hash of its parent commit (two or more parents for a merge, none for the very first commit), the author and committer with timestamps, and the commit message. Running git cat-file -p HEAD shows these fields in plain text. Because each commit names its parent, the commits form a chain, or more precisely a directed acyclic graph, and that graph is your history.
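A commit body is plain text, so it is easy to sketch. The Python below assembles two commits by hand, assuming SHA-1 IDs. The identity string, timestamp, and messages are invented, the empty tree is used only as a placeholder snapshot, and commit_body is a hypothetical helper rather than anything Git ships.

```python
import hashlib

# ID of the empty tree, used here purely as a stand-in for a real snapshot.
EMPTY_TREE = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"


def git_object_id(obj_type: str, body: bytes) -> str:
    header = f"{obj_type} {len(body)}".encode() + b"\x00"
    return hashlib.sha1(header + body).hexdigest()


def commit_body(tree_id, parent_ids, author, committer, message) -> bytes:
    """Build the text of a commit object from its fields."""
    lines = [f"tree {tree_id}"]
    lines += [f"parent {parent}" for parent in parent_ids]  # none for a root commit
    lines.append(f"author {author}")
    lines.append(f"committer {committer}")
    # A blank line separates the header fields from the message.
    return ("\n".join(lines) + "\n\n" + message).encode()


# Hypothetical identity: name, email, Unix timestamp, timezone offset.
who = "A U Thor <author@example.com> 1700000000 +0000"

root_body = commit_body(EMPTY_TREE, [], who, who, "Initial commit\n")
root_id = git_object_id("commit", root_body)

child_body = commit_body(EMPTY_TREE, [root_id], who, who, "Second commit\n")
child_id = git_object_id("commit", child_body)

print(child_body.decode())
print(child_id)
```

Because the parent's ID is part of the child's body, the child's own ID depends on its entire ancestry. Changing anything in an earlier commit would change every ID after it.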

Snapshots, not diffs

Here is the part that surprises people: a commit stores a snapshot of your entire tree, not a diff against the previous commit. Each commit points to a complete tree describing every file in the project at that point.

That sounds wasteful, but it isn’t, because of content addressing. If a file didn’t change between two commits, its blob hash is identical, so both commits’ trees point at the very same blob object. Git stores that blob once. The same goes for unchanged directories: an unchanged subdirectory yields an identical tree object, reused across commits. A commit that touches one file in a deep folder only creates new objects along that one path; everything else is shared by reference.
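The sharing is easy to see in a toy model. The following Python sketch is not Git's implementation: ObjectStore and write_tree are invented for the example, the store is an in-memory dictionary, and entry ordering is simplified to a plain sort by name. It writes two snapshots that differ in one nested file and counts how many objects each one adds.

```python
import hashlib


class ObjectStore:
    """A toy content-addressed store mapping ID -> (type, body)."""

    def __init__(self):
        self.objects = {}

    def put(self, obj_type: str, body: bytes) -> str:
        header = f"{obj_type} {len(body)}".encode() + b"\x00"
        object_id = hashlib.sha1(header + body).hexdigest()
        # Storing identical content again lands on the same key, so nothing grows.
        self.objects[object_id] = (obj_type, body)
        return object_id


def write_tree(store: ObjectStore, directory: dict) -> str:
    """Store a nested dict (name -> bytes or dict) and return its tree ID."""
    body = b""
    for name in sorted(directory):  # simplified ordering, see lead-in
        value = directory[name]
        if isinstance(value, dict):
            mode, child_id = "40000", write_tree(store, value)
        else:
            mode, child_id = "100644", store.put("blob", value)
        body += f"{mode} {name}".encode() + b"\x00" + bytes.fromhex(child_id)
    return store.put("tree", body)


store = ObjectStore()

snapshot1 = {
    "README.md": b"hello\n",
    "docs": {"guide.md": b"Read me first.\n"},
    "src": {"main.py": b"print('hi')\n", "util.py": b"X = 1\n"},
}
root1 = write_tree(store, snapshot1)
count_after_first = len(store.objects)  # 4 blobs + 3 trees

# Second snapshot: only src/main.py changes.
snapshot2 = {**snapshot1, "src": {**snapshot1["src"], "main.py": b"print('hello')\n"}}
root2 = write_tree(store, snapshot2)
new_objects = len(store.objects) - count_after_first

print(count_after_first)  # 7
print(new_objects)        # 3: the new blob, the new src tree, the new root tree
```

README.md, util.py, and the whole docs directory are written again for the second snapshot, but their IDs already exist, so the store does not grow for them.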

The diffs you see in git diff or git log -p are computed on the fly by comparing two snapshots. Git doesn’t store them; it derives them when you ask.
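As a rough illustration of deriving a diff from two snapshots, the Python sketch below compares two dictionaries of path to file text with the standard difflib module. It is a simplification: diff_snapshots is a made-up helper, the snapshots are flat dictionaries rather than trees, and Git's own diff algorithm and output details differ.

```python
import difflib


def diff_snapshots(old: dict, new: dict) -> str:
    """Return a unified diff between two snapshots (path -> file text)."""
    chunks = []
    for path in sorted(old.keys() | new.keys()):
        before = old.get(path, "")
        after = new.get(path, "")
        if before == after:
            continue  # unchanged file: nothing to compute or report
        chunks.extend(
            difflib.unified_diff(
                before.splitlines(keepends=True),
                after.splitlines(keepends=True),
                fromfile=f"a/{path}",
                tofile=f"b/{path}",
            )
        )
    return "".join(chunks)


old_snapshot = {"README.md": "hello\n", "src/main.py": "print('hi')\n"}
new_snapshot = {"README.md": "hello\n", "src/main.py": "print('hello')\n"}

patch = diff_snapshots(old_snapshot, new_snapshot)
print(patch)
```

Real Git can skip unchanged content even faster than this: if two tree entries carry the same hash, everything beneath them is identical and never needs to be read.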

Branches and HEAD are just pointers

If commits are immutable objects in a graph, what is a branch? Almost nothing. A branch is a ref: typically a small file under .git/refs/heads/ that contains a single commit hash (Git sometimes packs many refs into one .git/packed-refs file, but the idea is the same). With SHA-1, the branch main is literally a 40-character string naming the latest commit on that line of work.

git commit writes a new commit whose parent is the current one, then updates the branch ref to point at it. That’s the whole operation. Creating a branch (git branch feature) just writes a new file with the same hash — which is why it’s instant and cheap no matter how large the repo.
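To show how little a branch is, this Python sketch fakes a repository layout in a temporary directory and "creates a branch" by copying one hash into a new file. The commit ID is a made-up placeholder and create_branch is a hypothetical helper. In a real repository you would let git branch or git update-ref do this, since Git also handles locking, reflogs, and packed refs.

```python
import tempfile
from pathlib import Path

# Placeholder hash for illustration; it does not name a real commit.
FAKE_COMMIT = "0123456789abcdef0123456789abcdef01234567"


def create_branch(git_dir: Path, new_branch: str, start_branch: str) -> str:
    """Create a branch by copying one commit hash into a new ref file."""
    heads = git_dir / "refs" / "heads"
    commit_id = (heads / start_branch).read_text().strip()
    (heads / new_branch).write_text(commit_id + "\n")
    return commit_id


with tempfile.TemporaryDirectory() as tmp:
    # Build just enough of a .git directory to hold one branch.
    git_dir = Path(tmp) / ".git"
    (git_dir / "refs" / "heads").mkdir(parents=True)
    (git_dir / "refs" / "heads" / "main").write_text(FAKE_COMMIT + "\n")

    create_branch(git_dir, "feature", "main")
    feature_ref = (git_dir / "refs" / "heads" / "feature").read_text()

print(repr(feature_ref))  # 40 hex characters and a newline
```

The cost is one 41-byte file regardless of how many commits or files the project has, because no objects are copied.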

HEAD is one more layer of indirection: usually it’s a file containing ref: refs/heads/main, meaning “I am on the branch main.” When you check out a different branch, Git rewrites HEAD to point at that ref and updates your working files to match its tree. A “detached HEAD” simply means HEAD holds a commit hash directly instead of pointing at a branch.
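The two-step lookup can be sketched in a few lines of Python. This is a simplified reading of the layout described above, not Git's resolver: resolve_head is a hypothetical helper, it ignores packed refs, and the demo runs against a fake .git directory with a placeholder hash.

```python
import tempfile
from pathlib import Path
from typing import Optional, Tuple

# Placeholder hash for illustration; it does not name a real commit.
FAKE_COMMIT = "0123456789abcdef0123456789abcdef01234567"


def resolve_head(git_dir: Path) -> Tuple[Optional[str], Optional[str]]:
    """Return (ref name or None, commit ID or None) for HEAD."""
    head = (git_dir / "HEAD").read_text().strip()
    if not head.startswith("ref: "):
        # Detached HEAD: the file holds a commit hash directly.
        return None, head
    ref = head[len("ref: "):]
    ref_file = git_dir / ref
    if ref_file.exists():
        return ref, ref_file.read_text().strip()
    # Simplification: packed refs and branches with no commits are not resolved.
    return ref, None


with tempfile.TemporaryDirectory() as tmp:
    git_dir = Path(tmp) / ".git"
    (git_dir / "refs" / "heads").mkdir(parents=True)
    (git_dir / "refs" / "heads" / "main").write_text(FAKE_COMMIT + "\n")

    # On a branch: HEAD names the ref, and the ref names the commit.
    (git_dir / "HEAD").write_text("ref: refs/heads/main\n")
    attached = resolve_head(git_dir)

    # Detached: HEAD names the commit itself.
    (git_dir / "HEAD").write_text(FAKE_COMMIT + "\n")
    detached = resolve_head(git_dir)

print(attached)
print(detached)
```

Both cases end at the same commit. The only difference is whether a branch name sits in between, which is what decides whether a new commit moves a branch or leaves it behind.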

Once this clicks, the scary commands lose their mystery. git reset moves the current branch pointer to a different commit (and, depending on the mode, updates the index and working files to match). git rebase replays commits to create new ones with new hashes (which is why it rewrites history). Nothing reaches into an object and mutates it. Git only ever creates new objects and moves pointers.
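A toy model makes the "new objects, moved pointers" rule visible. The Python below is deliberately not Git's format: commits are reduced to a parent and a message, the hashing scheme is invented, and rebase_one only re-parents a single commit instead of re-applying its changes. All names here are hypothetical.

```python
import hashlib

objects = {}  # commit ID -> (parent ID or None, message); never modified in place
refs = {}     # branch name -> commit ID; the only thing that moves


def commit(parent, message) -> str:
    """Store a toy commit and return its content-derived ID."""
    commit_id = hashlib.sha1(f"parent {parent}\n\n{message}".encode()).hexdigest()
    objects[commit_id] = (parent, message)
    return commit_id


def reset(branch: str, target: str) -> None:
    """Toy reset: repoint the branch, touching no objects."""
    refs[branch] = target


def rebase_one(branch: str, onto: str) -> str:
    """Toy rebase of a one-commit branch: same message, new parent, new ID."""
    _, message = objects[refs[branch]]
    new_id = commit(onto, message)
    refs[branch] = new_id
    return new_id


base = commit(None, "base")
main_tip = commit(base, "main work")
feature_tip = commit(base, "feature work")
refs["main"] = main_tip
refs["feature"] = feature_tip

rebased_tip = rebase_one("feature", refs["main"])
reset("main", base)

print(rebased_tip != feature_tip)  # True: the replayed commit is a new object
print(feature_tip in objects)      # True: the old commit still exists
print(main_tip in objects)         # True: reset dropped the pointer, not the commit
```

In real Git the unreferenced commits stay reachable through the reflog for a while, which is why a mistaken reset or rebase is usually recoverable.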

FAQ

Does Git store a separate full copy of every file in every commit?

No. Each commit references a complete tree, but unchanged files and folders point to the exact same blob and tree objects as earlier commits because their content hashes are identical. Git stores each unique piece of content only once and shares it by reference.

What is the difference between a blob and a file?

A blob holds only the raw contents of a file — it has no name, no path, and no permissions. The filename and mode live in the tree object that references the blob. So renaming a file with identical contents creates a new tree entry but reuses the same blob.

If commits are immutable, how can rebase or amend change history?

They don't mutate existing commits — they create brand-new commit objects with new hashes and then move the branch pointer to them. The old commits still exist (you can often recover them via the reflog) until Git's garbage collection eventually prunes unreferenced objects.
