How Git’s First 1,000-Line Commit Shaped Its Core Design

An in‑depth exploration of Git’s initial 1,000‑line source code reveals the foundational design principles—workspace, index, objects, and three core object types—showing how early implementations of init‑db, update‑cache, write‑tree, and commit‑tree established the powerful, distributed version‑control system we use today.

Alibaba Cloud Developer

Preface

Git is the most widely used modern version control system. Linus Torvalds created it in 2005 and wrote the first usable version in C in about two weeks. The initial commit contains only about 1,000 lines of code, yet it already implements the core ideas: repository initialization, committing, diff viewing, and the three Git object types (blob, tree, and commit).

Compilation

Getting the Source

# Get the git source code
$ git clone https://github.com/git/git.git

# Show the first commit
$ git log --date-order --reverse
commit e83c5163316f89bfbde7d9ab23ca2e25604af290
Author: Linus Torvalds <torvalds@ppc970.osdl.org>
Date:   Thu Apr 7 15:13:13 2005 -0700

    Initial revision of "git", the information manager from hell

# Check out the first commit
$ git checkout e83c5163316f89bfbde7d9ab23ca2e25604af290

File Structure

$ tree -h
.
├── [2.4K]  cache.h
├── [503]   cat-file.c                  # inspect object files
├── [4.0K]  commit-tree.c               # commit a tree
├── [1.2K]  init-db.c                   # initialize the repository
├── [970]   Makefile
├── [5.5K]  read-cache.c                # read the current index file
├── [8.2K]  README
├── [986]   read-tree.c                 # read a tree
├── [2.0K]  show-diff.c                 # show diff output
├── [5.3K]  update-cache.c              # add files or directories
└── [1.4K]  write-tree.c                # write the index to a tree

# Count lines of code: 1089 in total
$ find . \( -name "*.c" -or -name "*.h" -or -name "Makefile" \) -print | xargs wc -l
1089 total

Compilation Issues

The original Makefile links only against libssl. On a current toolchain, zlib (-lz) and libcrypto (-lcrypto) have to be added to LIBS before make succeeds:

$ git diff ./Makefile
- LIBS= -lssl
+ LIBS= -lssl -lz -lcrypto

$ make

Compilation is only supported on Linux platforms.

Source Code Analysis

Write programs that do one thing and do it well. —Unix philosophy

The first commit implements three Git objects (blob, tree, commit) and three areas (workspace, index, commit history). The sections below walk through a simple Git workflow built from the commands that operate on them.

init-db: Initialize Repository

Command: $ init-db

Execution flow:

Create directory .dircache.

Create directory .dircache/objects.

Create 256 sub-directories, .dircache/objects/00 through .dircache/objects/ff.

The .dircache directory is the early form of Git's repository directory, which holds the object store and the index; modern Git uses .git.
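
The init-db steps above can be sketched in a few lines of C. This is a minimal illustration, not the original init-db.c: the function names fanout_dir, make_dir, and init_db and the 0700 directory mode are assumptions made for the example.

```c
#include <errno.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Format the path of fan-out directory number `i` (0..255) under `root`.
 * Returns 0 on success, -1 if `i` is out of range or `buf` is too small. */
int fanout_dir(char *buf, size_t len, const char *root, int i)
{
    if (i < 0 || i > 255)
        return -1;
    int n = snprintf(buf, len, "%s/objects/%02x", root, i);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* mkdir that treats EEXIST (path already present) as success. */
static int make_dir(const char *path)
{
    if (mkdir(path, 0700) == 0 || errno == EEXIST)
        return 0;
    return -1;
}

/* Create root, root/objects, and root/objects/00 .. root/objects/ff. */
int init_db(const char *root)
{
    char path[4096];

    if (make_dir(root) < 0)
        return -1;
    if (snprintf(path, sizeof(path), "%s/objects", root) >= (int)sizeof(path))
        return -1;
    if (make_dir(path) < 0)
        return -1;
    for (int i = 0; i < 256; i++) {
        if (fanout_dir(path, sizeof(path), root, i) < 0 || make_dir(path) < 0)
            return -1;
    }
    return 0;
}
```

Calling init_db(".dircache") from the top of a working directory would produce the layout listed above.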

update-cache: Add Files to Index

Updates the index with changes from the workspace.

Command: $ update-cache <file> ...

Execution flow:

Read and parse the index file .dircache/index.

Traverse files, compute SHA‑1, and write entries to the index.

Compress file contents into blob objects stored under .dircache/objects.

cat-file: Inspect Objects

$ cat-file <sha1>

Locates the object file, decompresses it, and writes the content to a temporary file for viewing.

show-diff: Show Differences

$ show-diff

Compares workspace files with the index and displays differences or confirms that files are unchanged.
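
One cheap way to decide whether a file is unchanged is to compare the metadata cached in the index with a fresh stat() of the workspace file. The sketch below shows that idea only; struct cached_stat, the flag names, and the choice of fields are hypothetical and smaller than what the real index entry records.

```c
#include <sys/stat.h>
#include <sys/types.h>
#include <time.h>

/* Hypothetical, trimmed-down view of what an index entry caches about a file. */
struct cached_stat {
    time_t mtime;   /* last modification time, in seconds */
    off_t  size;    /* file size in bytes */
    mode_t mode;    /* file type and permission bits */
};

#define MTIME_CHANGED 0x1u
#define SIZE_CHANGED  0x2u
#define MODE_CHANGED  0x4u

/* Return a bitmask of the fields that differ; 0 means the file looks unchanged. */
unsigned int stat_changes(const struct cached_stat *ce, const struct stat *st)
{
    unsigned int changed = 0;

    if (ce->mtime != st->st_mtime)
        changed |= MTIME_CHANGED;
    if (ce->size != st->st_size)
        changed |= SIZE_CHANGED;
    if (ce->mode != st->st_mode)
        changed |= MODE_CHANGED;
    return changed;
}
```

Only when this check reports a change does the tool need to read the stored blob and produce an actual diff.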

write-tree: Create Tree Object

$ write-tree

Aggregates indexed objects into a single tree object representing the directory structure.

read-tree: Read Tree Object

$ read-tree <sha1>

Reads the tree object identified by the given SHA-1 and prints the mode, path, and SHA-1 of each entry.

commit-tree: Create Commit Object

$ commit-tree <tree-sha1> [-p <parent-sha1>]* < changelog

Writes a commit object containing tree reference, parent(s), author, committer, and message.

Design Principles

Write programs that work together. —Unix philosophy

Git is a decentralized system: each developer’s workspace is a full repository. The three object types are:

blob – stores file snapshots.

tree – records directory structure and blob references.

commit – points to a tree and stores commit metadata.

The three areas are:

Workspace – where files are edited.

Index – staging area for changes.

Commit history – permanent storage of commits.

Typical workflow: init-db → update-cache → write-tree → commit-tree, with show-diff and read-tree for inspection.

Object Files

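Objects are stored under .dircache/objects. The first byte of the SHA-1 (two hex digits) names the sub-directory and the remaining 38 hex digits name the file, which spreads objects across 256 directories so that no single directory grows too large for the filesystem to handle efficiently.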
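
The mapping from an object ID to its file path can be sketched as follows. The function name object_path is an assumption made for this example; the root directory is passed in rather than hard-coded.

```c
#include <stdio.h>

/* Write "<root>/xx/yyyy..." for a 20-byte SHA-1 into `buf`, where xx is the
 * first byte in hex and yyyy... the remaining 19 bytes (38 hex digits).
 * Returns 0 on success, -1 if `buf` is too small. */
int object_path(char *buf, size_t len, const char *root,
                const unsigned char sha1[20])
{
    static const char hex[] = "0123456789abcdef";
    int n = snprintf(buf, len, "%s/", root);

    /* Still needed: 2 hex digits + '/' + 38 hex digits + terminating NUL. */
    if (n < 0 || (size_t)n + 42 > len)
        return -1;

    char *p = buf + n;
    for (int i = 0; i < 20; i++) {
        *p++ = hex[sha1[i] >> 4];
        *p++ = hex[sha1[i] & 0x0f];
        if (i == 0)
            *p++ = '/';   /* the first byte becomes the directory name */
    }
    *p = '\0';
    return 0;
}
```

With the root set to .dircache/objects, an ID that starts with the byte 0xe8 ends up in .dircache/objects/e8/.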

Blob Object

Stores a file's content in the form <type> + <size> + <content>, where the type is blob, and compresses the result with zlib.
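
A minimal sketch of how such an object image could be assembled before compression, assuming the header is the text "<type> <size>" terminated by a NUL byte. The function name build_object is hypothetical, and the zlib step is left out so the example needs only the C standard library.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Build the uncompressed object image "<type> <size>\0<content>".
 * `content` must not be NULL. Returns a malloc'ed buffer that the caller
 * frees and stores its length in *out_len, or returns NULL on failure. */
unsigned char *build_object(const char *type, const void *content,
                            size_t size, size_t *out_len)
{
    char header[64];
    /* +1 keeps the terminating NUL as the header/content separator. */
    int hdr_len = snprintf(header, sizeof(header), "%s %lu",
                           type, (unsigned long)size) + 1;

    if (hdr_len <= 0 || (size_t)hdr_len > sizeof(header))
        return NULL;

    unsigned char *buf = malloc((size_t)hdr_len + size);
    if (!buf)
        return NULL;

    memcpy(buf, header, (size_t)hdr_len);
    memcpy(buf + hdr_len, content, size);
    *out_len = (size_t)hdr_len + size;
    return buf;
}
```

For the six-byte file "hello\n" this yields a 13-byte buffer: the seven-byte header "blob 6" plus NUL, followed by the content. That buffer is what would then be handed to zlib.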

Tree Object

Contains one <mode> <filename> <sha1> entry for each file and is also compressed with zlib.
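
The entry layout can be illustrated with a small helper. It assumes each entry is the octal mode and the file name as text, a NUL byte, and then the raw 20-byte SHA-1; the function name append_tree_entry is hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* Append one entry "<octal mode> <name>\0<20 raw SHA-1 bytes>" at buf+offset.
 * Returns the new offset, or 0 if the entry does not fit within `cap` bytes. */
size_t append_tree_entry(unsigned char *buf, size_t cap, size_t offset,
                         unsigned int mode, const char *name,
                         const unsigned char sha1[20])
{
    if (offset >= cap)
        return 0;

    int n = snprintf((char *)buf + offset, cap - offset, "%o %s", mode, name);

    /* Needed: n text bytes, 1 NUL separator, 20 hash bytes. */
    if (n < 0 || (size_t)n + 1 + 20 > cap - offset)
        return 0;

    offset += (size_t)n + 1;          /* keep the NUL written by snprintf */
    memcpy(buf + offset, sha1, 20);
    return offset + 20;
}
```

Calling the helper once per index entry, each time passing the offset returned by the previous call, builds up the body of a tree object.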

Commit Object

Combines tree SHA‑1, parent SHA‑1(s), author, committer, and message.
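
As a rough sketch, the commit body is plain text with one field per line followed by a blank line and the message. The helper below assumes at most one parent, and its name format_commit and the identity strings passed to it are hypothetical; the real commit-tree accepts several parents and derives author and committer from the environment.

```c
#include <stdio.h>

/* Format a commit body with at most one parent (pass NULL for a root commit).
 * Returns the number of bytes written, excluding the NUL, or -1 if `buf`
 * is too small. */
int format_commit(char *buf, size_t len,
                  const char *tree_hex, const char *parent_hex,
                  const char *author, const char *committer,
                  const char *message)
{
    int n;

    if (parent_hex)
        n = snprintf(buf, len,
                     "tree %s\nparent %s\nauthor %s\ncommitter %s\n\n%s",
                     tree_hex, parent_hex, author, committer, message);
    else
        n = snprintf(buf, len,
                     "tree %s\nauthor %s\ncommitter %s\n\n%s",
                     tree_hex, author, committer, message);

    return (n < 0 || (size_t)n >= len) ? -1 : n;
}
```

Because the parent line names the previous commit by its hash, a chain of commit objects forms the project history.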

Index File

The binary .dircache/index stores a header and a list of entries describing staged files. It is protected by a lock file .dircache/index.lock during updates.
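
The lock-file pattern can be sketched with standard POSIX calls: create the lock file exclusively, write the new contents into it, then rename it over the real file. The function name write_with_lock is an assumption for this example, and a short write is simply treated as a failure.

```c
#include <fcntl.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Replace `path` with `data` by writing `lock_path` first and renaming it.
 * Returns 0 on success, -1 if the lock is already held or any step fails. */
int write_with_lock(const char *path, const char *lock_path,
                    const void *data, size_t size)
{
    /* O_EXCL makes open() fail if another writer already created the lock. */
    int fd = open(lock_path, O_RDWR | O_CREAT | O_EXCL, 0600);
    if (fd < 0)
        return -1;

    ssize_t written = write(fd, data, size);
    int close_failed = close(fd);

    if (written != (ssize_t)size || close_failed != 0) {
        unlink(lock_path);            /* drop the partial file */
        return -1;
    }
    /* rename() swaps in the new file in one step, so readers never see
     * a half-written index. */
    if (rename(lock_path, path) != 0) {
        unlink(lock_path);
        return -1;
    }
    return 0;
}
```

If a lock file is already present, the function leaves it alone and reports failure, which is what keeps two concurrent updates from overwriting each other.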

Hash Algorithm

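The first commit computes object IDs with SHA-1 through OpenSSL; modern Git is gradually adding SHA-256 support. The same hash also protects the index file: verify_hdr recomputes the checksum over the index contents and compares it with the value stored in the header.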

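#include <stddef.h>        /* offsetof */
#include <string.h>        /* memcmp */
#include <openssl/sha.h>
#include "cache.h"         /* struct cache_header */

/* Recompute the index checksum and compare it with the one stored in the header.
   Returns 0 when they match, -1 otherwise. */
static int verify_hdr(struct cache_header *hdr, unsigned long size) {
  SHA_CTX c;
  unsigned char sha1[20];
  // ... compute SHA1 of header ...
  SHA1_Init(&c);
  /* Hash the header fields that precede the stored checksum ... */
  SHA1_Update(&c, hdr, offsetof(struct cache_header, sha1));
  /* ... then every byte that follows the header. */
  SHA1_Update(&c, hdr+1, size - sizeof(*hdr));
  SHA1_Final(sha1, &c);
  return memcmp(sha1, hdr->sha1, 20) ? -1 : 0;
}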

Summary and Thoughts

The first commit laid the groundwork for Git’s distributed architecture, object model, and three‑area design, while many advanced features (branches, remote handling, hooks) were added later. Understanding these fundamentals helps appreciate Git’s powerful design.
