How Git Stores Data Under the Hood: Objects, Trees, and Commits Explained

This article walks through Git's internal storage model using a concrete example, showing how blobs, trees, commits and tags are created, how the three Git areas (working directory, index, repository) interact, and why Git can efficiently track history while ensuring data integrity.

Liangxu Linux

Dec 16, 2019

Introduction

The article uses a step‑by‑step example to illustrate Git’s internal mechanisms, explaining what Git stores, how it stores it, and the benefits of this design.

Git Objects and Their Types

Git stores data as objects inside the .git/objects directory. The main object types are:

blob : contains the raw content of a file and nothing else (no filename, no mode). Its SHA‑1 hash, computed over a short header plus the content, serves as its identifier.

tree : records a snapshot of a directory, listing each entry’s mode, type, SHA‑1, and filename.

commit : points to a tree object and records metadata such as author, committer, timestamp, parent commit(s), and commit message.

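tag : an annotated tag object, created with git tag -a, that points to another object (usually a commit) and records the tagger, date, and message. Lightweight tags do not create a tag object.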

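These objects form a content‑addressed key‑value store: the key is the SHA‑1 hash of an object and the value is the object itself. Because trees refer to blobs, and commits refer to trees and parent commits, all by hash, the objects form a Merkle DAG, a structure similar to those used in blockchains.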
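The key‑value idea can be sketched in a few lines of Python. This is a toy in‑memory model rather than Git's implementation: the ObjectStore class and its put/get methods are hypothetical names chosen for illustration. The header format ("<type> <size>" followed by a NUL byte) and the zlib compression mirror how Git writes loose objects.

```python
import hashlib
import zlib


class ObjectStore:
    """Toy content-addressed store: the key is the SHA-1 of the value."""

    def __init__(self):
        self._objects = {}  # hex object ID -> zlib-compressed bytes

    def put(self, obj_type: str, body: bytes) -> str:
        # Git hashes a small header plus the body, not the body alone.
        raw = f"{obj_type} {len(body)}".encode() + b"\x00" + body
        oid = hashlib.sha1(raw).hexdigest()
        # Storing the same content twice is a no-op: same key, same value.
        self._objects.setdefault(oid, zlib.compress(raw))
        return oid

    def get(self, oid: str):
        raw = zlib.decompress(self._objects[oid])
        header, _, body = raw.partition(b"\x00")
        obj_type, size = header.decode().split(" ")
        assert int(size) == len(body)
        return obj_type, body

    def __len__(self):
        return len(self._objects)


store = ObjectStore()
first = store.put("blob", b"hello\n")
second = store.put("blob", b"hello\n")  # identical content, identical key
print(first == second, len(store))      # True 1
print(store.get(first))                 # ('blob', b'hello\n')
```

Because the key is derived from the content, identical content is stored only once, and any change to the content produces a different key.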

Creating a Repository and Initial Objects

$ git init
$ echo '111' > a.txt
$ echo '222' > b.txt
$ git add *.txt

After git add, two blob objects appear under .git/objects. You can inspect one of them by an abbreviated hash, using git cat-file -t for the object type and -p for its content:

$ git cat-file -t 58c9
blob
$ git cat-file -p 58c9
111

This shows that the blob stores the file content ("111") and is identified by its SHA‑1 hash 58c9bdf9d017fcd178dc8c073cbfcbb7ff240d6c.
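The hash above can be reproduced outside Git. The following is a minimal sketch, assuming the file was written by echo and therefore ends with a newline; blob_id is a hypothetical helper name, not a Git API.

```python
import hashlib


def blob_id(content: bytes) -> str:
    """Return the object ID Git assigns to a blob with this content."""
    header = f"blob {len(content)}".encode() + b"\x00"
    return hashlib.sha1(header + content).hexdigest()


# `echo '111'` writes the three digits plus a trailing newline.
oid = blob_id(b"111\n")
print(oid)  # 58c9bdf9d017fcd178dc8c073cbfcbb7ff240d6c

# Loose objects are stored under the first two hex digits of the ID.
print(f".git/objects/{oid[:2]}/{oid[2:]}")
```

The filename a.txt plays no part in the computation, which is why the blob is identified by content alone.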

Creating a Commit

$ git commit -am '[+] init'

The commit creates a new tree object (snapshot of the directory) and a commit object that references the tree. Inspecting them:

$ git cat-file -t 4caaa1
tree
$ git cat-file -p 4caaa1
100644 blob 58c9bdf9d017fcd178dc8c073cbfcbb7ff240d6c    a.txt
100644 blob c200906efd24ec5e783bee7...    b.txt

$ git cat-file -t 0c96bf
commit
$ git cat-file -p 0c96bf
tree 4caaa1a9ae0b274fba9e3675f9ef071616e5b209
author lzane 1573302343 +0800
committer lzane 1573302343 +0800

[+] init

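The commit object stores the tree hash, author/committer info, timestamp, and message. This first commit has no parent line; every later commit also records the hash of its parent commit(s).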
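The byte layout behind these two objects can be sketched as follows. This is an illustration under stated assumptions, not Git's code: the helper names are hypothetical, only regular files (mode 100644) in a flat directory are handled, and the author identity is a placeholder because the one in the transcript is not fully shown.

```python
import hashlib


def object_id(obj_type: str, body: bytes) -> str:
    header = f"{obj_type} {len(body)}".encode() + b"\x00"
    return hashlib.sha1(header + body).hexdigest()


def tree_body(entries):
    """entries: iterable of (mode, name, hex object ID) for regular files."""
    out = b""
    # Entries are sorted by name (plain ASCII file names assumed here).
    for mode, name, oid in sorted(entries, key=lambda e: e[1]):
        # Each entry is "<mode> <name>\0" followed by the 20-byte binary ID.
        out += f"{mode} {name}".encode() + b"\x00" + bytes.fromhex(oid)
    return out


def commit_body(tree, author, timestamp, message, parents=()):
    lines = [f"tree {tree}"]
    lines += [f"parent {p}" for p in parents]
    lines += [f"author {author} {timestamp}",
              f"committer {author} {timestamp}"]
    # A blank line separates the headers from the message.
    return ("\n".join(lines) + "\n\n" + message + "\n").encode()


blob_a = object_id("blob", b"111\n")
blob_b = object_id("blob", b"222\n")
tree = object_id("tree", tree_body([("100644", "a.txt", blob_a),
                                    ("100644", "b.txt", blob_b)]))
# The identity string below is a placeholder, not the author's real one.
commit = object_id("commit", commit_body(
    tree, "Example User <user@example.com>", "1573302343 +0800", "[+] init"))
print(tree)
print(commit)
```

The commit ID printed here will not match 0c96bf, since the identity is a placeholder and every byte of the commit body feeds into the hash.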

Git’s Three Areas

Working directory : the actual files on your filesystem.

Index (staging area) : a file (.git/index) that maps each tracked path to a blob hash and file mode; it describes the snapshot that will become the next commit.

Git repository : the collection of objects (blobs, trees, commits, tags) that record the project history.

Updating a File – What Happens Internally?

$ echo "333" > a.txt   # modify file

The working directory changes, but the index and repository stay the same.

$ git add a.txt   # stage the change

A new blob object is created for the updated content, and the index now points to this new blob.

$ git commit -m 'update'   # create a new commit

Git builds a new tree object from the index (a snapshot of the current state).

A new commit object is created, linking to the new tree and the previous commit as its parent.

The branch pointer (e.g., master) is moved to the new commit.
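The whole sequence can be modelled with four dictionaries standing in for the working directory, the index, the object store, and the branch refs. This is a simplified sketch: the function names add and commit merely echo the Git commands, and commit bodies omit the author and committer lines for brevity.

```python
import hashlib

objects = {}             # object ID -> (type, body): the repository
index = {}               # path -> blob ID: the staging area
workdir = {}             # path -> file content: the working directory
refs = {"master": None}  # branch name -> commit ID


def store(obj_type, body):
    raw = f"{obj_type} {len(body)}".encode() + b"\x00" + body
    oid = hashlib.sha1(raw).hexdigest()
    objects[oid] = (obj_type, body)
    return oid


def add(path):
    # Staging writes a blob and records its ID in the index.
    index[path] = store("blob", workdir[path])


def commit(message):
    # The tree is built from the index, not from the working directory.
    tree_body = b"".join(
        f"100644 {p}".encode() + b"\x00" + bytes.fromhex(index[p])
        for p in sorted(index))
    tree = store("tree", tree_body)
    parent = refs["master"]
    lines = [f"tree {tree}"] + ([f"parent {parent}"] if parent else [])
    body = ("\n".join(lines) + "\n\n" + message + "\n").encode()
    refs["master"] = store("commit", body)  # move the branch pointer
    return refs["master"]


workdir.update({"a.txt": b"111\n", "b.txt": b"222\n"})
add("a.txt")
add("b.txt")
first = commit("[+] init")

workdir["a.txt"] = b"333\n"    # only the working directory changes
staged_before = index["a.txt"]
add("a.txt")                   # new blob, index updated
second = commit("update")      # new tree, new commit, branch moved

print(index["a.txt"] != staged_before)                     # True
print(f"parent {first}" in objects[second][1].decode())    # True
print(sum(1 for t, _ in objects.values() if t == "blob"))  # 3
```

Only three blobs exist after two commits of two files: b.txt did not change, so the second tree points at the same blob as the first.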

Branch and Tag Pointers

$ cat .git/HEAD
ref: refs/heads/master
$ cat .git/refs/heads/master
0c96bf...

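Branch refs and lightweight tags are plain files that contain a commit SHA‑1 hash. HEAD is normally a symbolic reference to the current branch, as the ref: line above shows; in a detached HEAD state it holds a commit hash directly. The ref of an annotated tag points to a tag object, which in turn points to the commit.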
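Following these pointers takes only a couple of file reads. The sketch below uses a hypothetical resolve_head helper and a throwaway directory that mimics the two files shown above. It handles loose ref files only; a real repository may also keep refs in .git/packed-refs, which this sketch does not read.

```python
import os
import tempfile


def resolve_head(git_dir):
    """Follow HEAD to a commit ID; None if the ref file does not exist."""
    with open(os.path.join(git_dir, "HEAD")) as f:
        value = f.read().strip()
    if not value.startswith("ref: "):
        return value                      # detached HEAD: a raw commit ID
    ref_path = os.path.join(git_dir, *value[len("ref: "):].split("/"))
    if not os.path.exists(ref_path):
        return None                       # unborn branch, or ref is packed
    with open(ref_path) as f:
        return f.read().strip()


# Build a throwaway directory that mimics the two files shown above.
with tempfile.TemporaryDirectory() as git_dir:
    os.makedirs(os.path.join(git_dir, "refs", "heads"))
    with open(os.path.join(git_dir, "HEAD"), "w") as f:
        f.write("ref: refs/heads/master\n")
    with open(os.path.join(git_dir, "refs", "heads", "master"), "w") as f:
        f.write("0" * 40 + "\n")          # placeholder ID, not a real commit
    resolved = resolve_head(git_dir)

print(resolved)
```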

Common Questions

Why store file mode and name in a tree, not in a blob?

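Keeping names and modes in the tree means a blob depends only on file content. Git can therefore reuse the same blob when a file is renamed, or when several files have identical content, which saves space.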
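A short sketch makes the effect of a rename visible. The helpers blob_id and tree_id and the name renamed.txt are hypothetical, and the tree encoding covers regular files only.

```python
import hashlib


def blob_id(content: bytes) -> str:
    header = f"blob {len(content)}".encode() + b"\x00"
    return hashlib.sha1(header + content).hexdigest()


def tree_id(entries) -> str:
    """entries: dict of file name -> blob ID (regular files only)."""
    body = b"".join(
        f"100644 {name}".encode() + b"\x00" + bytes.fromhex(oid)
        for name, oid in sorted(entries.items()))
    header = f"tree {len(body)}".encode() + b"\x00"
    return hashlib.sha1(header + body).hexdigest()


content = b"111\n"
before = tree_id({"a.txt": blob_id(content)})
after = tree_id({"renamed.txt": blob_id(content)})  # hypothetical new name

print(blob_id(content))   # same blob ID before and after the rename
print(before != after)    # True: only the tree changes
```

The rename produces a new tree object but no new blob, so the file content is stored once.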

Does Git store full snapshots or diffs?

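At the object level, Git stores snapshots, not diffs: each changed file gets a new blob with its full content, while unchanged files reuse their existing blobs. This makes checkout and diff fast at some cost in storage, which Git reduces by zlib‑compressing every object and by packing objects into packfiles (during git gc, for example), where similar objects are stored as deltas.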

How does Git ensure history cannot be tampered with?

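Each object’s SHA‑1 hash covers the hashes of the objects it refers to: a tree contains the hashes of its blobs and subtrees, and a commit contains the hashes of its tree and its parent commit(s). Altering any file changes its blob hash, which changes the tree hash, the commit hash, and the hash of every descendant commit. Because every full clone holds the complete history, a rewritten history no longer matches the hashes held elsewhere, so tampering is easy to detect.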
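The propagation can be demonstrated with a two‑commit history over a single file. This is a reduced sketch: commit bodies keep only the lines that carry hashes plus a message, and the helper names are hypothetical.

```python
import hashlib


def oid(obj_type: str, body: bytes) -> str:
    header = f"{obj_type} {len(body)}".encode() + b"\x00"
    return hashlib.sha1(header + body).hexdigest()


def history(file_content: bytes):
    """Return (blob, tree, first commit, second commit) IDs for one file."""
    blob = oid("blob", file_content)
    tree = oid("tree", b"100644 a.txt\x00" + bytes.fromhex(blob))
    # Commit bodies are reduced to the fields that carry hashes.
    c1 = oid("commit", f"tree {tree}\n\ninit\n".encode())
    c2 = oid("commit", f"tree {tree}\nparent {c1}\n\nupdate\n".encode())
    return blob, tree, c1, c2


original = history(b"111\n")
tampered = history(b"112\n")   # one byte changed in the file

# Every ID along the chain differs, including the descendant commit.
print([a != b for a, b in zip(original, tampered)])  # [True, True, True, True]
```

The second commit never touches the file directly, yet its ID changes because it embeds the ID of its parent.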

Conclusion

Understanding Git’s three areas and its object model (blob, tree, commit, tag) provides a visual way to grasp most Git commands, as they mainly manipulate these structures and the commit chain.
