How Git Stores Data Under the Hood: Objects, Trees, and Commits Explained
This article walks through Git's internal storage model using a concrete example, showing how blobs, trees, commits and tags are created, how the three Git areas (working directory, index, repository) interact, and why Git can efficiently track history while ensuring data integrity.
Introduction
The article uses a step‑by‑step example to illustrate Git’s internal mechanisms, explaining what Git stores, how it stores it, and the benefits of this design.
Git Objects and Their Types
Git stores data as objects inside the .git/objects directory. The main object types are:
blob : contains the raw content of a file. Its SHA‑1 hash serves as a unique identifier.
tree : records a snapshot of a directory, listing each entry’s mode, type, SHA‑1, and filename.
commit : points to a tree object and records metadata such as author, committer, timestamp, parent commit(s), and commit message.
tag : an optional annotated tag object, created with git tag -a, that points to another object (usually a commit) and records the tagger, date, and message.
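Together these objects form a content‑addressed key‑value store: the key is an object's SHA‑1 hash and the value is its content. Because trees and commits refer to other objects by hash, the store is structured as a Merkle tree (strictly, a directed acyclic graph), similar to blockchain data structures.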
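The key‑value idea can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Git's implementation: the ObjectStore class and its put/get methods are hypothetical names, and objects are kept in an in‑memory dict instead of files under .git/objects. The key derivation (SHA‑1 over a "<type> <size>" header, a NUL byte, then the content) and the zlib compression mirror how Git writes loose objects.

```python
import hashlib
import zlib


class ObjectStore:
    """Toy content-addressed store (hypothetical; not Git's real code)."""

    def __init__(self):
        self._objects = {}  # key: hex SHA-1, value: zlib-compressed bytes

    def put(self, obj_type: str, content: bytes) -> str:
        # The hashed bytes are "<type> <size>\0" followed by the content.
        raw = f"{obj_type} {len(content)}".encode() + b"\0" + content
        key = hashlib.sha1(raw).hexdigest()
        self._objects[key] = zlib.compress(raw)
        return key

    def get(self, key: str):
        raw = zlib.decompress(self._objects[key])
        header, _, content = raw.partition(b"\0")
        obj_type, _size = header.decode().split(" ")
        return obj_type, content


store = ObjectStore()
key = store.put("blob", b"111\n")
print(key, store.get(key))
```

Storing the same content twice yields the same key, so identical files are kept only once.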
Creating a Repository and Initial Objects
$ git init
$ echo '111' > a.txt
$ echo '222' > b.txt
$ git add *.txt
After git add, two blob objects appear under .git/objects. You can inspect them with:
$ git cat-file -t 58c9
blob
$ git cat-file -p 58c9
111
This shows that the blob stores the file content ("111") and is identified by its SHA‑1 hash 58c9bdf9d017fcd178dc8c073cbfcbb7ff240d6c.
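To make the link between content and identifier concrete, here is a minimal Python sketch that recomputes a blob ID without calling Git. The function name blob_id is an assumption made for this example. It hashes the header "blob <size>", a NUL byte, and the file content, which is the format Git uses for blob objects.

```python
import hashlib


def blob_id(content: bytes) -> str:
    """Return the SHA-1 object ID Git assigns to a blob with this content."""
    header = f"blob {len(content)}".encode() + b"\0"
    return hashlib.sha1(header + content).hexdigest()


# echo '111' > a.txt writes the three digits plus a trailing newline.
print(blob_id(b"111\n"))
```

The printed ID should match the 58c9bdf9... hash shown above. Note that the filename plays no part in the calculation: only the content and its length are hashed.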
Creating a Commit
$ git commit -am '[+] init'
The commit creates a new tree object (snapshot of the directory) and a commit object that references the tree. Inspecting them:
$ git cat-file -t 4caaa1
tree
$ git cat-file -p 4caaa1
100644 blob 58c9bdf9d017fcd178dc8c073cbfcbb7ff240d6c a.txt
100644 blob c200906efd24ec5e783bee7... b.txt
$ git cat-file -t 0c96bf
commit
$ git cat-file -p 0c96bf
tree 4caaa1a9ae0b274fba9e3675f9ef071616e5b209
author lzane 1573302343 +0800
committer lzane 1573302343 +0800
[+] init
The commit object stores the tree hash, author/committer info, timestamp, and message.
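The way a tree and a commit are assembled from the objects beneath them can be sketched in Python. This is a simplified illustration: the helper names object_id, tree_body, and commit_body are assumptions, entries are sorted by plain filename (Git's real ordering treats directories slightly differently), and the author line is copied from the output above. Real commits also carry an email address in the author and committer lines, so the commit ID computed here is not expected to match 0c96bf.

```python
import hashlib


def object_id(obj_type: str, body: bytes) -> str:
    header = f"{obj_type} {len(body)}".encode() + b"\0"
    return hashlib.sha1(header + body).hexdigest()


def tree_body(entries):
    """entries: iterable of (mode, name, hex_id) tuples for files."""
    out = b""
    for mode, name, hex_id in sorted(entries, key=lambda e: e[1]):
        # Each entry is "<mode> <name>\0" followed by the 20-byte binary ID.
        out += f"{mode} {name}".encode() + b"\0" + bytes.fromhex(hex_id)
    return out


def commit_body(tree_id, author_line, message, parents=()):
    lines = [f"tree {tree_id}"]
    lines += [f"parent {p}" for p in parents]
    lines += [f"author {author_line}", f"committer {author_line}", "", message]
    return ("\n".join(lines) + "\n").encode()


a_id = object_id("blob", b"111\n")
b_id = object_id("blob", b"222\n")
entries = [("100644", "a.txt", a_id), ("100644", "b.txt", b_id)]
tree_id = object_id("tree", tree_body(entries))
commit_id = object_id(
    "commit", commit_body(tree_id, "lzane 1573302343 +0800", "[+] init")
)
print(tree_id, commit_id)
```

The commit never stores file content itself. It reaches the files only through the tree hash, and the tree reaches them only through the blob hashes.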
Git’s Three Areas
Working directory : the actual files on your filesystem.
Index (staging area) : a snapshot of files that will become the next commit.
Git repository : the collection of objects (blobs, trees, commits, tags) that record the project history.
Updating a File – What Happens Internally?
$ echo "333" > a.txt # modify file
The working directory changes, but the index and repository stay the same.
$ git add a.txt # stage the change
A new blob object is created for the updated content, and the index now points to this new blob.
$ git commit -m 'update' # create a new commit
Git builds a new tree object from the index (a snapshot of the current state).
A new commit object is created, linking to the new tree and the previous commit as its parent.
The branch pointer (e.g., master) is moved to the new commit.
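The edit, stage, and commit steps above can be modelled with plain Python dictionaries. This is a toy model under assumptions, not how Git is implemented: the names working_dir, index, objects, commits, and refs are invented for the example, a "tree" is simply a copy of the index, and commits are identified by list position instead of by hash.

```python
import hashlib


def blob_id(content: bytes) -> str:
    header = f"blob {len(content)}".encode() + b"\0"
    return hashlib.sha1(header + content).hexdigest()


working_dir = {"a.txt": b"111\n", "b.txt": b"222\n"}  # files on disk
index = {}                 # staging area: filename -> blob ID
objects = {}               # repository: blob ID -> content
commits = []               # repository: list of commit records
refs = {"master": None}    # branch pointer: position in `commits`


def add(name: str) -> None:
    """Stage a file: store its content as a blob and record the ID in the index."""
    oid = blob_id(working_dir[name])
    objects[oid] = working_dir[name]
    index[name] = oid


def commit(message: str) -> None:
    """Snapshot the index, link to the previous commit, and move the branch."""
    commits.append({"tree": dict(index), "parent": refs["master"], "message": message})
    refs["master"] = len(commits) - 1


add("a.txt")
add("b.txt")
commit("[+] init")
working_dir["a.txt"] = b"333\n"   # edit: only the working directory changes
staged_before_add = index["a.txt"]
add("a.txt")                      # stage: new blob stored, index updated
commit("update")                  # commit: new snapshot, parent link, branch moves
print(len(objects), refs["master"], commits[-1]["parent"])
```

After both commits the model holds three blobs, not four, because b.txt did not change and both snapshots refer to the same blob ID for it.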
Branch and Tag Pointers
$ cat .git/HEAD
ref: refs/heads/master
$ cat .git/refs/heads/master
0c96bf...
HEAD is normally a symbolic reference to the current branch, while branch refs and lightweight tags are simple pointers to commit SHA‑1 hashes.
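Following these pointers takes only a couple of file reads, as the Python sketch below suggests. The function name resolve_head is an assumption, the sketch handles only loose ref files (it ignores packed refs), and it runs against a throwaway directory with a placeholder commit ID so that it does not depend on a real repository.

```python
import tempfile
from pathlib import Path


def resolve_head(git_dir: Path) -> str:
    """Follow HEAD to a commit ID (loose refs only; packed refs are ignored)."""
    head = (git_dir / "HEAD").read_text().strip()
    if head.startswith("ref: "):
        # Symbolic ref: HEAD names a branch file that holds the commit ID.
        return (git_dir / head[len("ref: "):]).read_text().strip()
    return head  # detached HEAD: the file holds the commit ID directly


# Build a temporary directory that mimics the two files shown above.
with tempfile.TemporaryDirectory() as tmp:
    git_dir = Path(tmp) / ".git"
    (git_dir / "refs" / "heads").mkdir(parents=True)
    (git_dir / "HEAD").write_text("ref: refs/heads/master\n")
    placeholder_id = "0" * 40  # stand-in for a real commit ID
    (git_dir / "refs" / "heads" / "master").write_text(placeholder_id + "\n")
    resolved = resolve_head(git_dir)
    print(resolved)
```

Moving a branch therefore amounts to rewriting one small file, which is why creating and switching branches is cheap.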
Common Questions
Why store file mode and name in a tree, not in a blob?
Storing them in a tree allows Git to reuse the same blob when only the filename changes, saving space.
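A small Python sketch shows why a rename needs no new blob. The dictionaries tree_before and tree_after are simplified stand‑ins for tree objects, and the filename renamed.txt is a hypothetical example.

```python
import hashlib


def blob_id(content: bytes) -> str:
    header = f"blob {len(content)}".encode() + b"\0"
    return hashlib.sha1(header + content).hexdigest()


content = b"111\n"

# Simplified trees: filename -> (mode, blob ID). Only the tree knows the name.
tree_before = {"a.txt": ("100644", blob_id(content))}
tree_after = {"renamed.txt": ("100644", blob_id(content))}

# Collect every blob that the two snapshots need.
blobs_needed = {entry[1] for tree in (tree_before, tree_after) for entry in tree.values()}
print(len(blobs_needed))
```

Both trees point at the same blob ID, so the rename costs one new tree object and no additional file content.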
Does Git store full snapshots or diffs?
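Git stores full snapshots, not diffs: each changed file gets a new blob object, while unchanged files are referenced again through their existing blobs. This enables fast checkout and diff operations at some cost in storage, which Git mitigates by zlib‑compressing objects and by packing them with delta compression during garbage collection (git gc).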
How does Git ensure history cannot be tampered with?
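Each object’s SHA‑1 hash covers its own content, and that content includes the hashes of the objects it references. Altering any file changes its blob hash, which changes the hash of every tree that contains it, the commit that points to that tree, and every later commit that has it as an ancestor. Because every clone has the full history, such a mismatch is easily detected.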
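The propagation can be demonstrated with a compact Python sketch. It is deliberately simplified: the helper names oid, tree_id, and commit_id are assumptions, every file is given mode 100644, and author and committer lines are left out of the commit text.

```python
import hashlib


def oid(kind: str, body: bytes) -> str:
    return hashlib.sha1(f"{kind} {len(body)}".encode() + b"\0" + body).hexdigest()


def tree_id(files: dict) -> str:
    body = b""
    for name in sorted(files):
        blob = bytes.fromhex(oid("blob", files[name]))
        body += f"100644 {name}".encode() + b"\0" + blob
    return oid("tree", body)


def commit_id(files: dict, message: str, parent: str = "") -> str:
    text = f"tree {tree_id(files)}\n"
    if parent:
        text += f"parent {parent}\n"
    text += f"\n{message}\n"  # author/committer lines omitted for brevity
    return oid("commit", text.encode())


first = commit_id({"a.txt": b"111\n", "b.txt": b"222\n"}, "[+] init")
second = commit_id({"a.txt": b"333\n", "b.txt": b"222\n"}, "update", parent=first)

# Tamper with a.txt in the first commit and recompute the chain.
forged_first = commit_id({"a.txt": b"112\n", "b.txt": b"222\n"}, "[+] init")
forged_second = commit_id(
    {"a.txt": b"333\n", "b.txt": b"222\n"}, "update", parent=forged_first
)
print(first != forged_first, second != forged_second)
```

Even though the second commit's files are identical in both histories, its ID differs because its parent line differs. Anyone holding the original branch tip would see the mismatch.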
Conclusion
Understanding Git’s three areas and its object model (blob, tree, commit, tag) provides a visual way to grasp most Git commands, as they mainly manipulate these structures and the commit chain.
Liangxu Linux
Liangxu, a self‑taught IT professional now working as a Linux development engineer at a Fortune 500 multinational, shares extensive Linux knowledge: fundamentals, applications, tools, plus Git, databases, Raspberry Pi, etc.
