How Git’s First 1,000-Line Commit Shaped Its Core Design
An in‑depth exploration of Git’s initial 1,000‑line source code reveals the foundational design principles—workspace, index, objects, and three core object types—showing how early implementations of init‑db, update‑cache, write‑tree, and commit‑tree established the powerful, distributed version‑control system we use today.
Preface
Git is the most widely used modern version control system. Linus Torvalds created it in 2005 and wrote the first usable version in C in just two weeks. The initial commit contained only about 1,000 lines of code, yet it already implemented the core features: repository initialization, committing, diff viewing, and the three Git objects (blob, tree, and commit).
Compilation
Getting the Source
# Fetch the Git source
$ git clone https://github.com/git/git.git
# Show the first commit
$ git log --date-order --reverse
commit e83c5163316f89bfbde7d9ab23ca2e25604af290
Author: Linus Torvalds <[email protected]>
Date: Thu Apr 7 15:13:13 2005 -0700
Initial revision of "git", the information manager from hell
# Check out the first commit
$ git checkout e83c5163316f89bfbde7d9ab23ca2e25604af290
File Structure
$ tree -h
.
├── [2.4K] cache.h
├── [503] cat-file.c # inspect object files
├── [4.0K] commit-tree.c # create a commit from a tree
├── [1.2K] init-db.c # initialize the repository
├── [970] Makefile
├── [5.5K] read-cache.c # read the current index file
├── [8.2K] README
├── [986] read-tree.c # read a tree object
├── [2.0K] show-diff.c # show diffs
├── [5.3K] update-cache.c # add files to the index
└── [1.4K] write-tree.c # write the index out as a tree
# Count lines of code: 1,089 in total
$ find . \( -name "*.c" -or -name "*.h" -or -name "Makefile" \) -print | xargs wc -l
1089 total
Compilation Issues
$ git diff ./Makefile
- LIBS= -lssl
+ LIBS= -lssl -lz -lcrypto
$ make
The patch adds -lz and -lcrypto so that the zlib and SHA‑1 symbols resolve at link time. Compilation is only supported on Linux platforms.
Source Code Analysis
Write programs that do one thing and do it well. —Unix philosophy
The first commit implements three Git objects (blob, tree, commit) and three areas (workspace, index, commit history). Below is a simple Git workflow derived from these concepts.
init-db: Initialize Repository
Command
$ init-db
Execution Flow
Create directory .dircache.
Create directory .dircache/objects.
Create 256 sub‑directories .dircache/objects/00 … .dircache/objects/ff.
The .dircache directory is the early form of Git’s repository directory, which holds the object database and the index; modern Git uses .git.
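The three steps above can be sketched in C roughly as follows. This is a minimal illustration, not the code from init-db.c: the function name init_db_sketch, the root parameter, and the choice to treat an already existing directory as success are assumptions made for this example.

```c
#include <errno.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Create one directory, treating "already exists" as success (assumption). */
static int make_dir(const char *path)
{
    if (mkdir(path, 0700) < 0 && errno != EEXIST)
        return -1;
    return 0;
}

/*
 * Hypothetical helper: build <root>, <root>/objects, and the 256
 * fan-out directories <root>/objects/00 .. <root>/objects/ff.
 * Returns 0 on success, -1 on the first failure.
 */
int init_db_sketch(const char *root)
{
    char path[4096];
    unsigned int i;

    if (make_dir(root) < 0)
        return -1;

    snprintf(path, sizeof(path), "%s/objects", root);
    if (make_dir(path) < 0)
        return -1;

    for (i = 0; i < 256; i++) {
        /* %02x yields the two lowercase hex digits of one byte value */
        snprintf(path, sizeof(path), "%s/objects/%02x", root, i);
        if (make_dir(path) < 0)
            return -1;
    }
    return 0;
}
```

Calling init_db_sketch(".dircache") in an empty directory would produce the layout described in the steps above.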
update-cache: Add Files to Index
Updates the index with changes from the workspace.
$ update-cache <file> ...
Execution Flow
Read and parse the index file .dircache/index.
Traverse files, compute SHA‑1, and write entries to the index.
Compress file contents into blob objects stored under .dircache/objects.
cat-file: Inspect Objects
$ cat-file <sha1>
Locates the object file, decompresses it, and writes the content to a temporary file for viewing.
show-diff: Show Differences
$ show-diff
Compares workspace files with the index and displays differences or confirms that files are unchanged.
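One cheap way to decide whether a workspace file still matches its index entry is to compare file metadata before looking at any content. The sketch below assumes the index records modification time, size, and mode for each entry; the struct entry_meta type, the flag names, and changed_fields are hypothetical and chosen for this example.

```c
#include <sys/stat.h>
#include <sys/types.h>
#include <time.h>

/* Hypothetical subset of the metadata an index entry might record. */
struct entry_meta {
    time_t mtime;
    off_t  size;
    mode_t mode;
};

#define MTIME_CHANGED 0x1
#define MODE_CHANGED  0x2
#define SIZE_CHANGED  0x4

/* Return a bitmask of the fields in which st differs from the entry. */
unsigned int changed_fields(const struct entry_meta *e, const struct stat *st)
{
    unsigned int changed = 0;

    if (e->mtime != st->st_mtime)
        changed |= MTIME_CHANGED;
    if (e->mode != st->st_mode)
        changed |= MODE_CHANGED;
    if (e->size != st->st_size)
        changed |= SIZE_CHANGED;
    return changed;
}
```

A result of 0 means the file can be reported as unchanged without reading it; any other value would trigger a real content comparison.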
write-tree: Create Tree Object
$ write-tree
Aggregates indexed objects into a single tree object representing the directory structure.
read-tree: Read Tree Object
$ read-tree <sha1>
Reads the tree object with the given SHA‑1 and prints its entries.
commit-tree: Create Commit Object
$ commit-tree <tree-sha1> < changelog
Writes a commit object containing the tree reference, parent(s), author, committer, and message.
Design Principles
Write programs that work together. —Unix philosophy
Git is a decentralized system: each developer’s workspace is a full repository. The three object types are:
blob – stores file snapshots.
tree – records directory structure and blob references.
commit – points to a tree and stores commit metadata.
The three areas are:
Workspace – where files are edited.
Index – staging area for changes.
Commit history – permanent storage of commits.
Typical workflow: init-db → update-cache → write-tree → commit-tree, with show-diff and read-tree for inspection.
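Object Files
Objects are stored under .dircache/objects. The first byte of an object’s SHA‑1 (its first two hex digits) names one of the 256 sub‑directories, and the remaining 38 hex digits name the file, which keeps any single directory from growing too large.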
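The mapping from an object ID to its file path can be illustrated with a short C sketch. The function name object_path and its parameters are hypothetical; the hex string in the usage note is only a sample value.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: turn a 40-character hex SHA-1 into
 * "<objects_dir>/<first 2 hex digits>/<remaining 38 hex digits>".
 * Returns 0 on success, -1 if the input or the buffer is unsuitable.
 */
int object_path(const char *objects_dir, const char *hex_sha1,
                char *out, size_t out_len)
{
    int n;

    if (strlen(hex_sha1) != 40)
        return -1;

    /* %.2s prints only the first two characters of the hex string */
    n = snprintf(out, out_len, "%s/%.2s/%s",
                 objects_dir, hex_sha1, hex_sha1 + 2);
    if (n < 0 || (size_t)n >= out_len)
        return -1;
    return 0;
}
```

For example, with objects_dir set to ".dircache/objects" and the sample ID e83c5163316f89bfbde7d9ab23ca2e25604af290, the resulting path is .dircache/objects/e8/3c5163316f89bfbde7d9ab23ca2e25604af290.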
Blob Object
A blob stores file content in the form <type> + <size> + <content>, and the result is compressed with zlib.
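To make the layout concrete, the sketch below builds the uncompressed bytes of a blob. It assumes the header is the text "blob <size>" terminated by a NUL byte and leaves out the zlib step; build_blob_payload is a hypothetical name used only here.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: build "blob <size>" + NUL + <content> in a
 * freshly malloc'ed buffer (compression is not performed here).
 * On success, *out_len receives the total length and the caller frees
 * the returned pointer; on allocation failure, NULL is returned.
 */
unsigned char *build_blob_payload(const void *content, size_t size,
                                  size_t *out_len)
{
    char header[32];
    /* +1 keeps the terminating NUL that separates header and content */
    int hdr_len = snprintf(header, sizeof(header), "blob %lu",
                           (unsigned long)size) + 1;
    unsigned char *buf = malloc((size_t)hdr_len + size);

    if (!buf)
        return NULL;
    memcpy(buf, header, (size_t)hdr_len);
    memcpy(buf + hdr_len, content, size);
    *out_len = (size_t)hdr_len + size;
    return buf;
}
```

For the six-byte content "hello\n", the payload is 13 bytes long: the 6-character header "blob 6", one NUL, and the content itself.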
Tree Object
A tree contains one <mode> <filename> <sha1> entry per file; like a blob, it is compressed with zlib.
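A single tree entry could be serialized along these lines. The sketch assumes the mode is written in octal, the name is NUL-terminated, and the SHA-1 is stored as 20 raw bytes rather than hex; append_tree_entry and its parameters are hypothetical.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: append one tree entry to buf at offset.
 * Layout assumed here: "<octal mode> <name>" + NUL + 20 raw SHA-1 bytes.
 * Returns the new offset, or 0 if the entry might not fit.
 */
size_t append_tree_entry(unsigned char *buf, size_t cap, size_t offset,
                         unsigned int mode, const char *name,
                         const unsigned char sha1[20])
{
    size_t name_len = strlen(name);
    /* worst case: 11 octal digits + space + name + NUL + 20-byte SHA-1 */
    size_t need_max = 11 + 1 + name_len + 1 + 20;
    int n;

    if (offset > cap || cap - offset < need_max)
        return 0;

    n = sprintf((char *)buf + offset, "%o %s", mode, name);
    offset += (size_t)n + 1;    /* keep the NUL written by sprintf */
    memcpy(buf + offset, sha1, 20);
    return offset + 20;
}
```

Calling the function repeatedly, each time passing the previous return value as the offset, concatenates entries into one tree body.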
Commit Object
Combines tree SHA‑1, parent SHA‑1(s), author, committer, and message.
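The text of a commit can be assembled from those fields as in the sketch below. It assumes one field per line, a blank line before the message, and at most one parent; format_commit and its parameters are hypothetical, and the author and committer strings are passed in already formatted.

```c
#include <stdio.h>

/*
 * Hypothetical helper: format the text of a commit with at most one
 * parent. parent_hex may be NULL for a root commit. Returns the number
 * of bytes written (excluding the NUL), or -1 if the buffer is too small.
 */
int format_commit(char *out, size_t cap,
                  const char *tree_hex, const char *parent_hex,
                  const char *author, const char *committer,
                  const char *message)
{
    int n = snprintf(out, cap,
                     "tree %s\n"
                     "%s%s%s"           /* optional "parent <sha1>\n" line */
                     "author %s\n"
                     "committer %s\n"
                     "\n"
                     "%s",
                     tree_hex,
                     parent_hex ? "parent " : "",
                     parent_hex ? parent_hex : "",
                     parent_hex ? "\n" : "",
                     author, committer, message);

    if (n < 0 || (size_t)n >= cap)
        return -1;
    return n;
}
```

Because a commit names its parent by SHA-1, every commit implicitly pins the entire history behind it.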
Index File
The binary .dircache/index stores a header and a list of entries describing staged files. It is protected by a lock file .dircache/index.lock during updates.
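The lock-file pattern can be sketched with plain POSIX calls. This is an illustration of the general technique rather than the code in update-cache.c; write_with_lock and its parameters are hypothetical.

```c
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Hypothetical helper: replace `path` with `data` via a lock file.
 * O_EXCL makes open() fail if `lock_path` already exists, so only one
 * writer proceeds; rename() then swaps the finished file into place.
 * Returns 0 on success, -1 on failure.
 */
int write_with_lock(const char *path, const char *lock_path,
                    const void *data, size_t len)
{
    ssize_t written;
    int close_failed;
    int fd = open(lock_path, O_RDWR | O_CREAT | O_EXCL, 0600);

    if (fd < 0)
        return -1;              /* lock exists or cannot be created */

    written = write(fd, data, len);
    close_failed = close(fd) < 0;
    if (written != (ssize_t)len || close_failed) {
        unlink(lock_path);      /* drop the incomplete file */
        return -1;
    }
    if (rename(lock_path, path) < 0) {
        unlink(lock_path);
        return -1;
    }
    return 0;
}
```

Readers never see a half-written index with this approach: they observe either the old file or the complete new one.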
Hash Algorithm
Git uses SHA‑1 (via OpenSSL) for object IDs, with ongoing migration to SHA‑256.
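#include <stddef.h>   /* offsetof */
#include <string.h>   /* memcmp */
#include <openssl/sha.h>
#include "cache.h"    /* struct cache_header */
static int verify_hdr(struct cache_header *hdr, unsigned long size)
{
	SHA_CTX c;
	unsigned char sha1[20];
	/* ... signature and version checks elided ... */
	SHA1_Init(&c);
	/* Hash the header fields that precede the stored checksum, */
	SHA1_Update(&c, hdr, offsetof(struct cache_header, sha1));
	/* then every index entry that follows the header. */
	SHA1_Update(&c, hdr + 1, size - sizeof(*hdr));
	SHA1_Final(sha1, &c);
	/* A mismatch with the stored checksum means the index is corrupt. */
	if (memcmp(sha1, hdr->sha1, 20))
		return -1;
	return 0;
}
Summary and Thoughts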
The first commit laid the groundwork for Git’s distributed architecture, object model, and three‑area design, while many advanced features (branches, remote handling, hooks) were added later. Understanding these fundamentals helps appreciate Git’s powerful design.
Alibaba Cloud Developer
Alibaba's official tech channel, featuring all of its technology innovations.
