
Understanding Git Internals: How the Version Control System Works Under the Hood

This article delves into Git’s core architecture, explaining its fast, distributed nature, immutable objects, porcelain and plumbing commands, the content‑addressable version database, and how init, add, and commit manipulate the working directory, index, and repository to create snapshots of project history.

Programmer DD

Jan 31, 2021


Understanding how a system works at its core brings you closer to what it really is. Learning how Git works internally, and how its data structures are designed, is key to grasping its power. This first part of the Git series introduces Git’s characteristics, its internal data‑structure design, and how data changes over the course of a complete commit.

What are Git’s characteristics?

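Fast, scalable, distributed revision control system

The stupid content tracker (the description in Git’s own man page)

Immutable objects: an object’s name is the hash of its content, so a stored object never changes in place

Porcelain (high‑level, user‑facing commands such as git add and git commit)

Plumbing (low‑level commands such as git hash-object and git update-index, which work directly on objects and the index)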
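Immutability is easy to observe with two plumbing commands. The following is a minimal sketch, assuming a POSIX shell, git on the PATH, and a throwaway repository created with mktemp: changing the content yields a new object under a new key, while the original object stays intact.

```shell
set -eu
cd "$(mktemp -d)" && git init -q .

# Store two versions of a file's content; each is keyed by its own hash
v1=$(echo 'hello git' | git hash-object -w --stdin)
v2=$(echo 'hello git!' | git hash-object -w --stdin)
echo "v1: $v1"
echo "v2: $v2"

# Writing the first version again creates nothing new: same content, same key
v1_again=$(echo 'hello git' | git hash-object -w --stdin)

# The original object was never overwritten and is still readable
git cat-file -p "$v1"    # prints: hello git

# Both versions now sit side by side in the object database
find .git/objects -type f | wc -l    # 2
```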

What is the Git Version Database?

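Git is a content‑addressable file system, essentially a simple key‑value store. Objects are stored under the .git/objects directory, keyed by their SHA‑1 hashes: each object lives in a file whose directory name is the first two hex characters of the hash and whose file name is the remaining 38. Four object types exist: commit, tree, blob, and tag.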
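The key‑value view can be tried directly. A minimal sketch under the same assumptions (POSIX shell, git on the PATH, throwaway repository): git hash-object -w acts as the "put", git cat-file as the "get", and the key maps straight to a file path under .git/objects.

```shell
set -eu
cd "$(mktemp -d)" && git init -q .

# PUT: store a value; Git prints its key (the object's SHA-1)
key=$(echo 'hello git' | git hash-object -w --stdin)

# The key is also the storage location: the first 2 hex characters name a
# directory under .git/objects, the remaining 38 name the file
dir=$(printf '%s' "$key" | cut -c1-2)
file=$(printf '%s' "$key" | cut -c3-)
ls ".git/objects/$dir/$file"

# GET: look the value up by its key
git cat-file -t "$key"    # type: blob
git cat-file -s "$key"    # size in bytes: 10 ("hello git" plus newline)
git cat-file -p "$key"    # content: hello git
```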

Basic Git concepts

Content addressable filesystem

Simple key‑value data store

Key: SHA‑1 hash (40‑character hexadecimal string)

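Value: the object itself (a header giving its type and size, followed by its content), zlib‑compressed into a binary file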

Note: a commit can be understood as a snapshot: it points to one root tree, and that tree references the blobs (and subtrees) holding the project’s files.

SHA‑1 is a cryptographic hash function that produces a 160‑bit (20‑byte) value, typically shown as 40 hexadecimal characters. It behaves like a pure function: identical input yields identical output.
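Because the key is a pure function of the content, it can be recomputed without Git. The sketch below assumes the default SHA‑1 object format and the sha1sum tool from GNU coreutils (on macOS, shasum produces the same digest). It hashes a header of the form "blob <size>" plus a NUL byte, followed by the file content, then checks that git hash-object agrees.

```shell
set -eu
cd "$(mktemp -d)" && git init -q .
echo 'hello git' > index.txt

# A blob's key is SHA-1 over a header plus the raw bytes:
#   "blob " <size in bytes> NUL <content>
size=$(wc -c < index.txt | tr -d ' ')
manual=$( { printf 'blob %s\0' "$size"; cat index.txt; } | sha1sum | cut -d' ' -f1 )

# Git's own computation; without -w the object is only hashed, not written
by_git=$(git hash-object index.txt)

echo "manual: $manual"
echo "git:    $by_git"
```

Running it twice gives the same key both times, which is exactly the property that lets Git use the hash as the object's name.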

How does Git work?

The simplest Git flow consists of three steps:

Modify files in the working directory.

Stage files, placing a snapshot into the index.

Commit the staged snapshot, permanently storing it in the repository.

Corresponding high‑level commands:

$ git init
$ git add .
$ git commit
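To watch the three steps move data between the working directory, the index, and the repository, git status --porcelain can be run after each one. A minimal sketch, assuming a POSIX shell and git on the PATH; the author and committer identity exported here is a hypothetical placeholder so the commit succeeds in a clean environment.

```shell
set -eu
cd "$(mktemp -d)" && git init -q .

# Hypothetical identity so the commit works without global git config
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# 1. Modify: the file exists only in the working directory
echo 'hello git' > index.txt
s1=$(git status --porcelain)    # ?? index.txt  (untracked)

# 2. Stage: a snapshot of the file is placed into the index
git add .
s2=$(git status --porcelain)    # A  index.txt  (staged, not yet committed)

# 3. Commit: the staged snapshot is stored in the repository
git commit -q -m 'init-1'
s3=$(git status --porcelain)    # empty: all three areas agree

printf '%s\n' "after edit:   $s1" "after add:    $s2" "after commit: $s3"
```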

git init

Initialize a project and inspect its directory structure:

$ git init demo1 && cd demo1
$ tree .git
.git
├── HEAD
├── config
├── description
├── hooks
│   ├── applypatch-msg.sample
│   ├── commit-msg.sample
│   └── ...
├── info
│   └── exclude
├── objects
│   ├── info
│   └── pack
└── refs
    ├── heads
    └── tags

The important entries are HEAD, the (yet‑to‑be‑created) index, objects, and refs. objects stores all hashed data; refs holds named pointers to objects, usually commits (branches under refs/heads, tags under refs/tags); HEAD points to the current branch; index, created by the first git add, records the staging area.
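A quick way to confirm this initial state, as a sketch assuming a POSIX shell and git on the PATH. The branch name depends on your git version and the init.defaultBranch setting, so master is only an example.

```shell
set -eu
cd "$(mktemp -d)" && git init -q demo1 && cd demo1

# HEAD names the current branch even before that branch has any commit
cat .git/HEAD            # e.g. ref: refs/heads/master
git symbolic-ref HEAD    # e.g. refs/heads/master

# The branch ref does not exist yet, so HEAD cannot be resolved to a commit
git rev-parse --verify -q HEAD || echo 'no commit yet'

# The index is created lazily by the first git add
[ -e .git/index ] || echo 'no .git/index yet'

# objects/ holds only the empty info/ and pack/ directories
find .git/objects -type f | wc -l    # 0
```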

git add

$ echo 'hello git' > index.txt
$ git add index.txt

After adding, the .git directory changes (hooks omitted):

.git
├── HEAD
├── config
├── description
├── index
├── info
│   └── exclude
├── objects
│   ├── 8d
│   │   └── 0e41234f24b6da002d962a26c2495ea16a425f
│   └── ...
└── refs
    ├── heads
    └── tags

The new file index.txt is stored as a blob object whose hash is 8d0e41234f24b6da002d962a26c2495ea16a425f, and an entry for it is recorded in the newly created index file. Using low‑level commands we can replicate git add:

$ echo 'hello git' | git hash-object -w --stdin
$ git update-index --add --cacheinfo 100644 8d0e41234f24b6da002d962a26c2495ea16a425f index.txt

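Without -w, hash-object only prints the hash; with -w it also writes the object to the database. The --add flag lets update-index insert a path that is not yet in the index, and --cacheinfo registers an object that already exists in the database instead of reading a file from the working directory. The mode 100644 denotes a regular, non‑executable file.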
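The two plumbing commands can be run end to end in a fresh repository. A minimal sketch, assuming a POSIX shell and git on the PATH; it shows that this route only touches the object database and the index, never the working directory.

```shell
set -eu
cd "$(mktemp -d)" && git init -q .

# Step 1: write the blob into .git/objects and keep its key
blob=$(echo 'hello git' | git hash-object -w --stdin)

# Step 2: record it in the index as index.txt (100644 = regular file)
git update-index --add --cacheinfo 100644 "$blob" index.txt

# Index entry format: <mode> <hash> <stage> TAB <path>
git ls-files --stage

# The working directory was never touched, so Git reports the file as
# added to the index but missing from the working tree
git status --porcelain    # AD index.txt
ls                        # (no output: index.txt does not exist on disk)
```

Running git add on a real file does both steps at once and leaves the file in place, which is why the porcelain command reports a clean "A" instead.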

git commit

After git commit -m 'init-1', new objects and files appear (hooks and info omitted):

.git
├── COMMIT_EDITMSG
├── HEAD
├── config
├── description
├── index
├── logs
│   ├── HEAD
│   └── refs
│       └── heads
│           └── master
├── objects
│   ├── 75
│   │   └── 0d7c0f7f998d3e2ce2d71ec801902f69bf6a39
│   ├── 88
│   │   └── bc066ebf3d864e34297f7051a0ded16e49813a
│   ├── 8d
│   │   └── 0e41234f24b6da002d962a26c2495ea16a425f
│   └── ...
└── refs
    ├── heads
    │   └── master
    └── tags
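Just as git add was replicated with plumbing, git commit can be approximated the same way: git write-tree turns the index into a tree object, git commit-tree wraps that tree in a commit, and git update-ref moves the branch to it. This is a sketch, not a full equivalent (the porcelain command also runs hooks and writes COMMIT_EDITMSG, for example); it assumes a POSIX shell, git on the PATH, and a hypothetical demo identity. The tree hash depends only on the index entries, but the commit hash also covers the author, committer, timestamps, and message, so it will not match the listing above.

```shell
set -eu
cd "$(mktemp -d)" && git init -q .
# Hypothetical identity so commit-tree works without global git config
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Stage index.txt with plumbing, as in the git add section
blob=$(echo 'hello git' | git hash-object -w --stdin)
git update-index --add --cacheinfo 100644 "$blob" index.txt

# 1. Write the index out as a tree object
tree=$(git write-tree)

# 2. Create a commit object for that tree; no -p, so it has no parent
commit=$(echo 'init-1' | git commit-tree "$tree")

# 3. Point the current branch at the commit; HEAD follows via the branch
git update-ref "$(git symbolic-ref HEAD)" "$commit"

git log --oneline    # one line: <short hash> init-1
```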

Of the two new objects in the listing, 750d7c0f7f998d3e2ce2d71ec801902f69bf6a39 is the commit. It points to a tree object (88bc066ebf3d864e34297f7051a0ded16e49813a), which in turn references the blob for index.txt. The chain of references is:

HEAD → current ref (e.g., refs/heads/master)

ref → commit hash

commit → tree hash

tree → entries (mode, type, hash, name), each pointing to a blob or, for a subdirectory, to another tree

blob → file content

This chain forms a complete snapshot of the repository at the moment of the commit.
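Each link in the chain can be followed with plumbing commands. A minimal sketch, assuming a POSIX shell, git on the PATH, and a hypothetical demo identity; it rebuilds the one‑file repository from this article and walks from HEAD down to the blob.

```shell
set -eu
cd "$(mktemp -d)" && git init -q .
# Hypothetical identity so the commit works without global git config
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
echo 'hello git' > index.txt
git add index.txt
git commit -q -m 'init-1'

ref=$(git symbolic-ref HEAD)                 # HEAD -> ref, e.g. refs/heads/master
commit=$(git rev-parse "$ref")               # ref -> commit hash
tree=$(git rev-parse "$commit^{tree}")       # commit -> tree hash
blob=$(git rev-parse "$commit:index.txt")    # tree entry -> blob hash

git cat-file -p "$commit"    # "tree <hash>", author, committer, message
git cat-file -p "$tree"      # 100644 blob <hash>  index.txt (tab before name)
git cat-file -p "$blob"      # hello git
```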

Reviewing the full commit process reveals how Git’s immutable objects, content‑addressable storage, and plumbing commands combine to provide a fast, distributed version control system.


Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.


Written by Programmer DD, a tinkering programmer and author of "Spring Cloud Microservices in Action"
