Understanding Git Internals: How the Version Control System Works Under the Hood
This article delves into Git’s core architecture, explaining its fast, distributed nature, immutable objects, porcelain and plumbing commands, the content‑addressable version database, and how init, add, and commit manipulate the working directory, index, and repository to create snapshots of project history.
Understanding the essence of a system brings you closer to its truth. Learning how Git works internally and its data structures is crucial for grasping its power. This first part of the Git series introduces Git’s characteristics, internal data‑structure design, and how data changes during a complete commit flow.
What are Git’s characteristics?
Fast, scalable, distributed revision control system
The stupid content tracker
Immutable objects
Porcelain (high‑level commands)
Plumbing (low‑level commands)
What is the Git Version Database?
Git is a content‑addressable file system, essentially a simple key‑value store. Objects are stored under the .git/objects directory, keyed by SHA‑1 hashes. Four object types exist: commit, tree, blob, and tag.
Basic Git concepts
Content addressable filesystem
Simple key‑value data store
Key: SHA‑1 hash (40‑character hexadecimal string)
Value: zlib‑compressed object data (a short type‑and‑size header followed by the content)
Note: a commit can be understood as a snapshot: it points to one root tree, which in turn lists the blobs (and subtrees) that made up the project at that moment.
SHA‑1 is a cryptographic hash function that produces a 160‑bit (20‑byte) value, typically shown as 40 hexadecimal characters. It behaves like a pure function: identical input yields identical output.
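To make the key side of this store concrete, here is a minimal Python sketch of how a blob's key is derived. Git hashes a short header (the word blob, a space, the content length in bytes, and a NUL byte) followed by the raw content. The name git_blob_hash is ours, not part of Git, and the sketch assumes a SHA‑1 repository.

```python
import hashlib


def git_blob_hash(content: bytes) -> str:
    """Return the SHA-1 key Git assigns to a blob with this content.

    git_blob_hash is a hypothetical helper name used for illustration.
    """
    # Git hashes "<type> <size>\0" followed by the raw bytes.
    header = b"blob " + str(len(content)).encode("ascii") + b"\x00"
    return hashlib.sha1(header + content).hexdigest()


if __name__ == "__main__":
    # echo appends a newline, so the hashed content is b"hello git\n".
    print(git_blob_hash(b"hello git\n"))
    # Identical input always yields the identical key.
    assert git_blob_hash(b"hello git\n") == git_blob_hash(b"hello git\n")
```

Because the header is part of the hashed input, the result differs from a plain SHA‑1 of the file. For the same bytes it should match what git hash-object --stdin prints.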
How does Git work?
The simplest Git flow consists of three steps:
Modify files in the working directory.
Stage files, placing a snapshot into the index.
Commit the staged snapshot, permanently storing it in the repository.
Corresponding high‑level commands:
$ git init
$ git add .
$ git commit
git init
Initialize a project and inspect its directory structure:
$ git init demo1 && cd demo1
$ tree .git
.git
├── HEAD
├── config
├── description
├── hooks
│   ├── applypatch-msg.sample
│   ├── commit-msg.sample
│   └── ...
├── info
│   └── exclude
├── objects
│   ├── info
│   └── pack
└── refs
    ├── heads
    └── tags
The important entries are HEAD, the (yet‑to‑be‑created) index, objects, and refs. objects stores all hashed data; refs holds pointers to commit objects; HEAD points to the current branch; index tracks the staging area.
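As a rough illustration of how HEAD relates to refs, the sketch below classifies the text of a HEAD file. It assumes the two common forms: a symbolic reference such as ref: refs/heads/master, or a bare 40‑character commit hash (a detached HEAD). parse_head is a hypothetical helper, not a Git API.

```python
import re
from typing import Tuple


def parse_head(text: str) -> Tuple[str, str]:
    """Classify the contents of a .git/HEAD file.

    Returns ("ref", "<ref path>") for a symbolic reference, or
    ("detached", "<commit hash>") when HEAD holds a hash directly.
    """
    value = text.strip()
    if value.startswith("ref:"):
        return ("ref", value[len("ref:"):].strip())
    if re.fullmatch(r"[0-9a-f]{40}", value):
        return ("detached", value)
    raise ValueError(f"unrecognized HEAD contents: {value!r}")


if __name__ == "__main__":
    print(parse_head("ref: refs/heads/master\n"))
```

Right after git init, the ref file that HEAD names does not exist yet, which is why refs/heads is empty in the listing; the first commit creates it.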
git add
$ echo 'hello git' > index.txt
$ git add index.txt
After adding, the .git directory changes:
.git
├── HEAD
├── config
├── description
├── index
├── info
│   └── exclude
├── objects
│   ├── 8d
│   │   └── 0e41234f24b6da002d962a26c2495ea16a425f
│   └── ...
└── refs
    ├── heads
    └── tags
The new file index.txt is stored as a blob object whose hash is 8d0e41234f24b6da002d962a26c2495ea16a425f. The first two hex characters of the hash name the directory and the remaining 38 name the file inside it. Using low‑level commands we can replicate git add:
$ echo 'hello git' | git hash-object -w --stdin
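$ git update-index --add --cacheinfo 100644 8d0e41234f24b6da002d962a26c2495ea16a425f index.txt
The -w flag tells hash-object to write the object to the database rather than only print its hash. update-index then records that blob in the index under the path index.txt. The mode 100644 denotes a regular, non‑executable file.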
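What gets written to the object database can be sketched in a few lines. Assuming the loose‑object layout visible in the listing (two‑character directory, 38‑character file name) and zlib compression of header plus content, the following Python stores a blob in a throwaway directory and reads it back. write_loose_blob and read_loose_object are illustrative names, and the code deliberately targets a temporary directory rather than a real .git.

```python
import hashlib
import os
import tempfile
import zlib


def write_loose_blob(objects_dir: str, content: bytes) -> str:
    """Store content as a loose blob under objects_dir and return its key."""
    store = b"blob " + str(len(content)).encode("ascii") + b"\x00" + content
    key = hashlib.sha1(store).hexdigest()
    # Directory = first 2 hex characters, file name = remaining 38.
    path = os.path.join(objects_dir, key[:2], key[2:])
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "wb") as fh:
        fh.write(zlib.compress(store))
    return key


def read_loose_object(objects_dir: str, key: str):
    """Return (type, content) for a loose object, checking the stored size."""
    path = os.path.join(objects_dir, key[:2], key[2:])
    with open(path, "rb") as fh:
        store = zlib.decompress(fh.read())
    header, _, content = store.partition(b"\x00")
    obj_type, size = header.decode("ascii").split(" ")
    if int(size) != len(content):
        raise ValueError("size in header does not match content length")
    return obj_type, content


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as objects_dir:
        key = write_loose_blob(objects_dir, b"hello git\n")
        print(key[:2], key[2:])
        print(read_loose_object(objects_dir, key))
```

The read side is a simplified stand‑in for what git cat-file does for loose objects; packed objects use a different storage format that this sketch does not cover.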
git commit
After git commit -m 'init-1', new objects appear:
.git
├── COMMIT_EDITMSG
├── HEAD
├── config
├── description
├── index
├── logs
│   ├── HEAD
│   └── refs
│       └── heads
│           └── master
├── objects
│   ├── 75
│   │   └── 0d7c0f7f998d3e2ce2d71ec801902f69bf6a39
│   ├── 88
│   │   └── bc066ebf3d864e34297f7051a0ded16e49813a
│   ├── 8d
│   │   └── 0e41234f24b6da002d962a26c2495ea16a425f
│   └── ...
└── refs
    ├── heads
    │   └── master
    └── tags
The commit object points to a tree object (88bc066ebf3d864e34297f7051a0ded16e49813a), which in turn references the blob for index.txt. The chain of references is:
HEAD → current ref (e.g., refs/heads/master)
ref → commit hash
commit → tree hash
tree → file entries (mode, type, blob hash, path)
blob → file content
This chain forms a complete snapshot of the repository at the moment of the commit.
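The chain can be walked end to end with a small in‑memory model. The sketch below is an illustration under stated assumptions, not Git's implementation: the object store is a plain dict, the author identity is a placeholder, entries are sorted by name only (Git's real ordering treats directories specially), and all function names are ours. The byte layouts for tree entries (mode, space, name, NUL, 20 raw hash bytes) and for the commit body (a leading tree line) follow Git's object format.

```python
import hashlib
from typing import Dict, List, Tuple

Store = Dict[str, Tuple[str, bytes]]  # key -> (type, body)


def put(store: Store, obj_type: str, body: bytes) -> str:
    """Hash the type-and-size header plus body and record the object."""
    header = f"{obj_type} {len(body)}".encode("ascii") + b"\x00"
    key = hashlib.sha1(header + body).hexdigest()
    store[key] = (obj_type, body)
    return key


def tree_body(entries: List[Tuple[str, str, str]]) -> bytes:
    """Serialize (mode, name, hex key) entries; keys become 20 raw bytes."""
    out = b""
    for mode, name, key in sorted(entries, key=lambda e: e[1]):
        out += f"{mode} {name}".encode("utf-8") + b"\x00" + bytes.fromhex(key)
    return out


def parse_tree(body: bytes) -> List[Tuple[str, str, str]]:
    """Inverse of tree_body: return (mode, name, hex key) entries."""
    entries, pos = [], 0
    while pos < len(body):
        nul = body.index(b"\x00", pos)
        mode, name = body[pos:nul].decode("utf-8").split(" ", 1)
        entries.append((mode, name, body[nul + 1:nul + 21].hex()))
        pos = nul + 21
    return entries


def read_path(store: Store, refs: Dict[str, str], head: str, path: str) -> bytes:
    """Follow ref -> commit -> tree -> blob for a top-level path."""
    _, commit = store[refs[head]]
    # The first line of a commit body is "tree <key>".
    tree_key = commit.split(b"\n", 1)[0].decode("ascii").split(" ")[1]
    for _mode, name, key in parse_tree(store[tree_key][1]):
        if name == path:
            return store[key][1]
    raise KeyError(path)


if __name__ == "__main__":
    store: Store = {}
    blob = put(store, "blob", b"hello git\n")
    tree = put(store, "tree", tree_body([("100644", "index.txt", blob)]))
    who = "A U Thor <author@example.com> 0 +0000"  # placeholder identity
    commit = put(store, "commit", (
        f"tree {tree}\nauthor {who}\ncommitter {who}\n\ninit-1\n"
    ).encode("utf-8"))
    refs = {"refs/heads/master": commit}  # HEAD -> refs/heads/master
    print(read_path(store, refs, "refs/heads/master", "index.txt"))
```

The commit key produced here will not match the one in the listing, because a real commit embeds the actual author, committer, and timestamps, all of which feed into the hash.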
Reviewing the full commit process reveals how Git’s immutable objects, content‑addressable storage, and plumbing commands combine to provide a fast, distributed version control system.
Programmer DD
A tinkering programmer and author of "Spring Cloud Microservices in Action"
