Demystify Git: Build Your Own Git Commands from Scratch
This article walks through the fundamentals of Git’s internal architecture, explains how objects, hashes, and the repository directory work, and provides step‑by‑step Node.js code to implement core plumbing commands such as init, hash‑object, and cat‑file.
Preface
This article explores how Git works by re‑implementing several of its commands from scratch. The idea sounds daunting, but the implementation shows that the core concepts are approachable.
Current Situation Analysis
Git appears complex because of its many high‑level commands and extensive documentation (e.g., the Pro Git book). However, for daily use we only need a handful of commands, so focusing on the underlying mechanisms simplifies the learning curve.
Git Directory Structure
The .git folder contains all metadata needed to manage a repository. Its layout is illustrated below:
├── COMMIT_EDITMSG      // last commit message
├── FETCH_HEAD          // remote branch heads (hashes)
├── HEAD                // current HEAD pointer
├── ORIG_HEAD
├── config              // repository configuration
├── description
├── hooks               // hook scripts (e.g., pre‑commit)
├── index               // staging area
├── info
│   └── exclude
├── logs
│   ├── HEAD
│   └── refs
├── objects             // stored objects (compressed)
├── lost-found
│   ├── commit
│   └── other
├── packed-refs
└── refs
    ├── heads
    ├── remotes
    └── tags
Git Hash
Git stores four object types: commit, tree, blob, and tag. An object's hash is the SHA‑1 of a short header followed by the content, where the length is the content's size in bytes:
"{type} {content.length}\0{content}"
Example:
echo -n "hello,world" | git hash-object --stdin
Node.js can reproduce the same hash using the crypto module:
const crypto = require('crypto');
const sha1 = crypto.createHash('sha1');
// "hello,world" is 11 bytes, so the header is "blob 11\0"
sha1.update('blob 11\0hello,world');
console.log(sha1.digest('hex'));
Git thus functions as a key‑value store where the SHA‑1 hash is the key and the compressed object is the value.
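The hashing rule above can be wrapped in a small function. This is a minimal sketch, not part of the article's code: the name gitObjectId is hypothetical, and it assumes string content is encoded as UTF‑8. It also shows why the header must use the byte length rather than the JavaScript string length.

```javascript
const crypto = require('crypto');

// Hypothetical helper: compute the ID Git would assign to an object.
// `content` may be a string (encoded as UTF-8) or a Buffer.
function gitObjectId(type, content) {
  const body = Buffer.isBuffer(content) ? content : Buffer.from(content, 'utf8');
  // The header carries the length in bytes, not in UTF-16 code units.
  const header = Buffer.from(`${type} ${body.length}\0`, 'utf8');
  return crypto
    .createHash('sha1')
    .update(Buffer.concat([header, body]))
    .digest('hex');
}

// The empty blob has a well-known ID.
console.log(gitObjectId('blob', ''));
// → e69de29bb2d1d6434b8b29ae775ad8c2e48c5391

// Same result as: echo 'test content' | git hash-object --stdin
console.log(gitObjectId('blob', 'test content\n'));
// → d670460b4b4aece5915caf5c68d12f560a9fe3e4

// Multi-byte characters: one code unit in JavaScript, two bytes in UTF-8.
console.log('\u00e9'.length, Buffer.byteLength('\u00e9', 'utf8'));
// → 1 2
```

Passing a Buffer avoids any encoding guesswork, which matters for binary files.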
Git Objects
Blob
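A blob stores raw file content and nothing else; the filename and mode live in the tree that references it. The blob's hash therefore changes only when the content changes, so renaming a file creates no new blob and identical files share one object.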
Tree
A tree object represents one directory. Each entry records a mode, a name, and the hash of a blob (a file) or another tree (a subdirectory), so nested trees form a hierarchy whose leaves are blobs.
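To make the tree layout concrete, here is a sketch of how tree entries could be serialized and hashed. The helper names serializeTree and treeId are hypothetical, and the entry shape ({ mode, name, sha1 }) is an assumption made for illustration. Each entry is written as the mode in ASCII, a space, the name, a NUL byte, and then the child hash as 20 raw bytes.

```javascript
const crypto = require('crypto');

// Hypothetical helper: serialize entries into Git's binary tree format.
// Assumed entry shape: { mode: '100644' | '40000' | ..., name, sha1 (40 hex chars) }
function serializeTree(entries) {
  // Git orders entries bytewise by name, treating directories as if
  // their name ended with "/".
  const sortKey = (e) => Buffer.from(e.mode === '40000' ? `${e.name}/` : e.name);
  const sorted = [...entries].sort((a, b) => Buffer.compare(sortKey(a), sortKey(b)));
  const parts = sorted.map((e) =>
    Buffer.concat([
      Buffer.from(`${e.mode} ${e.name}\0`, 'utf8'),
      Buffer.from(e.sha1, 'hex'), // 20 raw bytes, not 40 hex characters
    ])
  );
  return Buffer.concat(parts);
}

// Hypothetical helper: hash the serialized tree with the usual object header.
function treeId(entries) {
  const body = serializeTree(entries);
  const header = Buffer.from(`tree ${body.length}\0`, 'utf8');
  return crypto.createHash('sha1').update(Buffer.concat([header, body])).digest('hex');
}

// A tree with no entries has a well-known ID.
console.log(treeId([]));
// → 4b825dc642cb6eb9a060e54bf8d69288fbee4904
```

Because the child hash is stored in binary, a tree cannot be printed as text directly; it has to be decoded entry by entry.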
Commit
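A commit records the author and committer (each with a timestamp), a message, the hash of one root tree, and zero or more parent commits: none for the first commit, one for an ordinary commit, and two or more for a merge. Following the parent links yields the history, which forms a directed acyclic graph rather than a simple linked list.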
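Unlike a tree, a commit body is plain text: header lines, a blank line, then the message. The sketch below builds such a body. The function name serializeCommit and its parameter object are hypothetical, and the author string is example data, not taken from the article.

```javascript
// Hypothetical helper: build the text body of a commit object.
// `author` and `committer` are full identity strings of the form
// "Name <email> <unix-timestamp> <timezone>".
function serializeCommit({ tree, parents = [], author, committer = author, message }) {
  const lines = [`tree ${tree}`];
  // One "parent" line per parent; a root commit has none.
  for (const parent of parents) lines.push(`parent ${parent}`);
  lines.push(`author ${author}`);
  lines.push(`committer ${committer}`);
  // A blank line separates the headers from the message.
  return `${lines.join('\n')}\n\n${message}\n`;
}

// Example: a root commit pointing at the empty tree (illustrative identity).
const body = serializeCommit({
  tree: '4b825dc642cb6eb9a060e54bf8d69288fbee4904',
  author: 'Jane Doe <jane@example.com> 1700000000 +0000',
  message: 'Initial commit',
});
console.log(body);
```

To store it, the body would be prefixed with the "commit {size}\0" header and hashed like any other object.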
Tag
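A tag object backs an annotated tag: it stores the hash and type of the tagged object (usually a commit), the tag name, the tagger, and a message. A lightweight tag is not an object at all; it is just a ref under .git/refs/tags/ that holds a commit hash. Both are often used to mark releases.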
Implementation Overview
Preparation
Node’s built‑in crypto and zlib modules provide SHA‑1 hashing and compression, and the built‑in fs module handles reading and writing the files under .git.
Command Parsing
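// Plain JavaScript has no enum keyword, so a frozen object stands in for one.
const CommandEnum = Object.freeze({ Add: 'add', Init: 'init', /* ... */ });
const chooseCommand = (command) => {
  switch (command) {
    case CommandEnum.Add: return add();
    case CommandEnum.Init: return init();
    default: console.log('Command not supported');
  }
};
// Call this only after add() and init() have been defined.
chooseCommand(process.argv[2]);
Init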
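import fs from 'fs';
const init = () => {
  // recursive: true creates missing parents and does not throw if the directory exists
  fs.mkdirSync('.git/refs/heads', { recursive: true });
  fs.mkdirSync('.git/refs/tags', { recursive: true });
  fs.mkdirSync('.git/objects', { recursive: true });
  // HEAD is a symbolic ref to the default branch; Git ends the line with a newline
  fs.writeFileSync('.git/HEAD', 'ref: refs/heads/master\n');
  fs.writeFileSync('.git/config', `[core]
    repositoryformatversion = 0
    filemode = true
    bare = false
    logallrefupdates = true
    ignorecase = true
    precomposeunicode = true
`);
  fs.writeFileSync('.git/description', '');
};
Read and Write Objects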
Reading an object involves locating the file under .git/objects/, inflating it, and parsing the header to obtain type and content.
// ES module syntax: run with "type": "module" in package.json or a .mjs extension
import fs from 'fs';
import zlib from 'zlib';
import crypto from 'crypto';
export const readObject = (sha1) => {
  // The first two hex characters name the directory; the remaining 38 name the file
  const data = fs.readFileSync(`.git/objects/${sha1.substring(0, 2)}/${sha1.substring(2)}`);
  const buf = zlib.inflateSync(data);
  // The header "{type} {length}" ends at the first NUL byte
  const headerEnd = buf.indexOf(0);
  const header = buf.subarray(0, headerEnd).toString('utf8');
  const [type, length] = header.split(' ');
  const content = buf.subarray(headerEnd + 1);
  return { type, length: parseInt(length, 10), content };
};
Writing an object serializes the data, hashes it, compresses it, and stores it in the appropriate directory.
export const createObject = (obj) => {
  // serialize() must return the full "{type} {length}\0{content}" bytes for the hash to match Git's
  const data = obj.serialize();
  const sha1 = crypto.createHash('sha1').update(data).digest('hex');
  const zip = zlib.deflateSync(data);
  const dir = `.git/objects/${sha1.substring(0, 2)}`;
  fs.mkdirSync(dir, { recursive: true });
  fs.writeFileSync(`${dir}/${sha1.substring(2)}`, zip);
  return sha1;
};
Plumbing Commands
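cat-file reads an object and prints its type, size, existence, or content, depending on the flag.
import fs from 'fs';
import crypto from 'crypto';
// readObject is the function defined in the previous section
export const catFile = () => {
  const flag = process.argv[3];
  const sha1 = process.argv[4];
  if (flag === '-e') {
    // readObject throws when the object file is missing.
    // Real Git reports existence through the exit status instead of printing.
    try { readObject(sha1); console.log(true); } catch { console.log(false); }
    return;
  }
  const obj = readObject(sha1);
  if (flag === '-t') console.log(obj.type);
  if (flag === '-s') console.log(obj.length);
  if (flag === '-p') console.log(obj.content.toString('utf8'));
};
hash-object computes the object ID of a file stored as a blob: the SHA‑1 of the "blob {size}\0" header followed by the file’s bytes.
export const hashObject = () => {
  const path = process.argv[3];
  const data = fs.readFileSync(path);
  // Prepend the header; hashing the raw bytes alone would not match git hash-object
  const header = Buffer.from(`blob ${data.length}\0`);
  const sha1 = crypto.createHash('sha1').update(header).update(data).digest('hex');
  console.log(sha1);
};
Object Parsing Enhancements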
The readObject function is extended to return proper class instances for blobs, commits, and trees, allowing richer manipulation.
// fs and zlib are Node built-ins, imported at the top of the module
export const readObject = (sha1) => {
  const data = fs.readFileSync(`.git/objects/${sha1.substring(0, 2)}/${sha1.substring(2)}`);
  const buf = zlib.inflateSync(data);
  const headerEnd = buf.indexOf(0);
  const [type] = buf.subarray(0, headerEnd).toString('utf8').split(' ');
  const content = buf.subarray(headerEnd + 1);
  // GitBlob, GitCommit, and GitTree are the project's own classes (not shown here)
  if (type === 'blob') return new GitBlob(content);
  if (type === 'commit') return new GitCommit(content.toString('utf8'));
  if (type === 'tree') return new GitTree(content);
  // Tags and unknown types are not handled; fail loudly instead of returning undefined
  throw new Error(`Unsupported object type: ${type}`);
};
Commit parsing extracts author, parent hashes, and message; tree parsing walks the binary format to list mode, path, and child hash.
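The article does not show the parsing code itself, so the following is only a sketch of what the two parsers might look like. The function names parseTree and parseCommit and the shapes of their return values are assumptions. The commit parser ignores headers it does not recognise, including multi-line ones such as signatures.

```javascript
// Hypothetical helper: decode the binary body of a tree object.
// Each entry is "<mode> <name>\0" followed by 20 raw hash bytes.
function parseTree(buf) {
  const entries = [];
  let pos = 0;
  while (pos < buf.length) {
    const space = buf.indexOf(0x20, pos); // end of the mode
    const nul = buf.indexOf(0x00, space); // end of the name
    entries.push({
      mode: buf.toString('utf8', pos, space),
      name: buf.toString('utf8', space + 1, nul),
      sha1: buf.toString('hex', nul + 1, nul + 21),
    });
    pos = nul + 21; // skip the NUL and the 20-byte hash
  }
  return entries;
}

// Hypothetical helper: split a commit body into headers and message.
function parseCommit(text) {
  const split = text.indexOf('\n\n'); // blank line ends the headers
  const headerText = split === -1 ? text : text.slice(0, split);
  const message = split === -1 ? '' : text.slice(split + 2);
  const commit = { tree: null, parents: [], author: null, committer: null, message };
  for (const line of headerText.split('\n')) {
    const sp = line.indexOf(' ');
    const key = line.slice(0, sp);
    const value = line.slice(sp + 1);
    if (key === 'parent') commit.parents.push(value);
    else if (['tree', 'author', 'committer'].includes(key)) commit[key] = value;
  }
  return commit;
}

// Example: a tree with a single empty file named a.txt.
const sample = Buffer.concat([
  Buffer.from('100644 a.txt\0'),
  Buffer.from('e69de29bb2d1d6434b8b29ae775ad8c2e48c5391', 'hex'),
]);
console.log(parseTree(sample));
```

Searching for the space and NUL from the current position, rather than from the start of the buffer, matters because the raw hash bytes may themselves contain 0x20 or 0x00.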
Branches and Refs
Branch names are stored as files under .git/refs/heads/ containing the commit hash they point to, enabling history traversal.
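Resolving the current commit means following HEAD to a branch file and reading the hash stored there. A minimal sketch, assuming loose refs only (refs listed in packed-refs are not handled); the name resolveRef and its gitDir parameter are hypothetical:

```javascript
const fs = require('fs');
const path = require('path');

// Hypothetical helper: follow HEAD (or any ref file) to a commit hash.
// Assumes loose refs only; refs stored in packed-refs are not handled.
function resolveRef(gitDir, ref = 'HEAD') {
  const file = path.join(gitDir, ref);
  // A freshly initialised repository has HEAD but no branch file yet.
  if (!fs.existsSync(file)) return null;
  const value = fs.readFileSync(file, 'utf8').trim();
  // A symbolic ref looks like "ref: refs/heads/master"; follow it.
  if (value.startsWith('ref: ')) return resolveRef(gitDir, value.slice(5));
  return value; // otherwise the file holds the commit hash itself
}

// Usage, run from the root of a repository:
console.log(resolveRef('.git'));
```

When HEAD contains a hash directly instead of a "ref: " line, the repository is in detached HEAD state and the function returns that hash unchanged.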
Logs and Reflogs
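These are plain‑text files under .git/logs/. Each line records one update to HEAD or to a ref: the old hash, the new hash, who made the change and when, and a short message. The commit history itself lives in the commit objects.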
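Each line of a file under .git/logs/ has a fixed layout: old hash, new hash, identity, timestamp, and timezone, then a tab and a message. The sketch below parses one such line. The name parseReflogLine is hypothetical, the sample line is made-up data, and 40‑character SHA‑1 hashes are assumed.

```javascript
// Hypothetical helper: parse one line of a file under .git/logs/.
// Assumes 40-character SHA-1 hashes.
function parseReflogLine(line) {
  // Everything after the first tab is the free-form message.
  const tab = line.indexOf('\t');
  const meta = tab === -1 ? line : line.slice(0, tab);
  const message = tab === -1 ? '' : line.slice(tab + 1);
  const match = /^([0-9a-f]{40}) ([0-9a-f]{40}) (.*) <(.*)> (\d+) ([+-]\d{4})$/.exec(meta);
  if (!match) return null; // not a line in the expected layout
  const [, oldSha, newSha, name, email, timestamp, tz] = match;
  return { oldSha, newSha, name, email, timestamp: Number(timestamp), tz, message };
}

// Illustrative entry: the first commit on a branch has an all-zero old hash.
const sampleLine =
  '0'.repeat(40) + ' ' + 'a'.repeat(40) +
  ' Jane Doe <jane@example.com> 1700000000 +0800\tcommit (initial): first commit';
console.log(parseReflogLine(sampleLine));
```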
Staging Area (Index)
The index is a binary file that tracks staged files. Implementing it is beyond the scope of this article, but references are provided for further study.
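Although a full index implementation is out of scope, the fixed 12‑byte header is easy to inspect and gives a feel for the format: a 4‑byte "DIRC" signature, a 4‑byte version, and a 4‑byte entry count, the last two stored big‑endian. A minimal sketch; the name parseIndexHeader is hypothetical, and the entries that follow the header are not parsed:

```javascript
// Hypothetical helper: read the 12-byte header at the start of .git/index.
// The entries that follow the header are not parsed here.
function parseIndexHeader(buf) {
  if (buf.length < 12) throw new Error('index file too short');
  const signature = buf.toString('ascii', 0, 4);
  if (signature !== 'DIRC') throw new Error(`unexpected signature: ${signature}`);
  return {
    signature,
    version: buf.readUInt32BE(4),    // index format version
    entryCount: buf.readUInt32BE(8), // number of staged paths
  };
}

// Illustrative header: version 2 with three entries.
const sampleHeader = Buffer.concat([
  Buffer.from('DIRC', 'ascii'),
  Buffer.from([0, 0, 0, 2]),
  Buffer.from([0, 0, 0, 3]),
]);
console.log(parseIndexHeader(sampleHeader));
// → { signature: 'DIRC', version: 2, entryCount: 3 }
```

On a real repository the input would come from fs.readFileSync('.git/index').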
Conclusion
The article demonstrates a minimal yet functional set of Git plumbing APIs written in Node.js, covering repository initialization, object storage, and basic command implementations. Readers are encouraged to extend the code to support full commit creation, tree building, remote interactions, and garbage collection.
Further Exercises
Explore Git’s garbage collection (gc) mechanism.
Implement high‑level porcelain commands using the plumbing layer.
Design a simple remote Git server and protocol.
Complete the missing features to build a fully functional Git clone.
Talk is cheap, show me the code.