
Demystify Git: Build Your Own Git Commands from Scratch

This article walks through the fundamentals of Git’s internal architecture, explains how objects, hashes, and the repository directory work, and provides step‑by‑step Node.js code to implement core plumbing commands such as init, hash‑object, and cat‑file.

ELab Team

Preface

This article explains how Git works by re-implementing several Git commands from the ground up. The idea may sound daunting, but the implementation shows that the core concepts are approachable.

Current Situation Analysis

Git appears complex because of its many high‑level commands and extensive documentation (e.g., the Pro Git book). However, for daily use we only need a handful of commands, so focusing on the underlying mechanisms simplifies the learning curve.

Git Directory Structure

The .git folder contains all metadata needed to manage a repository. Its layout is illustrated below:

├── COMMIT_EDITMSG   // last commit message
├── FETCH_HEAD       // remote branch heads (hashes)
├── HEAD             // current HEAD pointer
├── ORIG_HEAD
├── config           // repository configuration
├── description
├── hooks            // hook scripts (e.g., pre‑commit)
├── index            // staging area
├── info
│   └── exclude
├── logs
│   ├── HEAD
│   └── refs
├── objects          // stored objects (compressed)
├── lost-found
│   ├── commit
│   └── other
├── packed-refs
└── refs
    ├── heads
    ├── remotes
    └── tags

Git Hash

Git stores four object types: commit, tree, blob, and tag. Each object's ID is the SHA-1 hash of a short header followed by the content, where the length is the size of the content in bytes:

"{type} {content.length}\0{content}"

Example:

echo -n "hello,world" | git hash-object --stdin

Node.js can reproduce the same hash using the crypto module:

const crypto = require('crypto');
const sha1 = crypto.createHash('sha1');
sha1.update('blob 11\0hello,world');
console.log(sha1.digest('hex'));

Git thus functions as a key‑value store where the SHA‑1 hash is the key and the compressed object is the value.
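
The length in the header counts bytes, not characters, which matters as soon as the content contains non-ASCII text. As a minimal sketch (the helper name gitObjectHash is hypothetical and not part of the original code), the hashing rule can be wrapped like this:

```javascript
const crypto = require('crypto');

// Hypothetical helper: compute the ID Git would assign to an object.
// `content` may be a string (encoded as UTF-8) or a Buffer.
function gitObjectHash(type, content) {
  const body = Buffer.isBuffer(content) ? content : Buffer.from(content, 'utf8');
  // The header holds the BYTE length of the body and ends with a NUL byte.
  const header = Buffer.from(`${type} ${body.length}\0`, 'utf8');
  return crypto
    .createHash('sha1')
    .update(Buffer.concat([header, body]))
    .digest('hex');
}

// The empty blob has a well-known ID.
console.log(gitObjectHash('blob', '')); // e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
```

Working on a Buffer rather than a string avoids miscounting multi-byte characters when building the header.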

Git Objects

Blob

A blob stores raw file content. Its hash depends only on the content, not on the filename, so two files with identical content share a single blob and a renamed file needs no new blob.

Tree

A tree object represents a directory hierarchy. It links mode, path, and child object hashes, forming a multi‑branch tree that ultimately points to blobs.
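
To make the layout concrete, here is a sketch of how a tree body can be assembled. It assumes a SHA-1 repository, where each entry is the ASCII text "<mode> <name>", a NUL byte, and then the 20 raw bytes of the child hash. The function name serializeTreeBody and the entry shape are hypothetical.

```javascript
// Hypothetical helper: build the body of a tree object from a list of entries.
// Each entry: { mode: '100644' | '40000' | ..., name: string, sha1: 40-char hex }.
function serializeTreeBody(entries) {
  // Git orders entries by name, comparing directories as if the name ended in "/".
  const sortKey = (e) => (e.mode === '40000' ? `${e.name}/` : e.name);
  const sorted = [...entries].sort((a, b) =>
    Buffer.compare(Buffer.from(sortKey(a)), Buffer.from(sortKey(b))));

  const parts = sorted.map((e) => Buffer.concat([
    Buffer.from(`${e.mode} ${e.name}\0`, 'utf8'), // "<mode> <name>\0"
    Buffer.from(e.sha1, 'hex'),                    // 20 raw bytes, not hex text
  ]));
  return Buffer.concat(parts);
}

const treeBody = serializeTreeBody([
  { mode: '100644', name: 'README.md', sha1: 'e69de29bb2d1d6434b8b29ae775ad8c2e48c5391' },
]);
console.log(treeBody.length); // 37 = 17 bytes of "100644 README.md\0" + 20 hash bytes
```

The result is only the body; it still needs the "tree <length>\0" header before it is hashed and stored.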

Commit

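A commit records the author, the committer, their timestamps, a message, a reference to a tree, and zero or more parent commits: none for the initial commit, one for an ordinary commit, and two or more for a merge. Together the commits form a directed acyclic graph of history.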
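
A commit body is plain text: header lines, a blank line, then the message. The sketch below assembles one. The helper name buildCommitBody is hypothetical, and it assumes the author and committer strings are already formatted as "Name <email> <unix-seconds> <timezone>".

```javascript
// Hypothetical helper: assemble the text body of a commit object.
// `message` is expected without a trailing newline; one is appended here.
function buildCommitBody({ tree, parents = [], author, committer = author, message }) {
  const lines = [`tree ${tree}`];
  // One "parent" line per parent; a root commit has none.
  for (const parent of parents) lines.push(`parent ${parent}`);
  lines.push(`author ${author}`, `committer ${committer}`, '', message);
  return lines.join('\n') + '\n';
}

console.log(buildCommitBody({
  tree: '4b825dc642cb6eb9a060e54bf8d69288fbee4904', // ID of the empty tree
  author: 'Jane Doe <jane@example.com> 1700000000 +0000',
  message: 'Initial commit',
}));
```

As with trees, the body still needs a "commit <length>\0" header before hashing.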

Tag

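A tag object (an annotated tag) records a tagger, a message, and the hash of the object it points to, usually a commit; it is often used for releases. A lightweight tag, by contrast, is just a ref file under .git/refs/tags/ and creates no object.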

Implementation Overview

Preparation

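Node's built-in crypto and zlib modules provide SHA-1 hashing and compression, and the built-in fs module handles file access. The snippets that follow assume fs, crypto, and zlib have already been loaded with require (or import).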

Command Parsing

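// TypeScript enum listing the supported sub-commands.
enum CommandEnum { Add = 'add', Init = 'init', /* ... */ }

// Dispatch to the handler for the requested sub-command.
const chooseCommand = (command: string) => {
  switch (command) {
    case CommandEnum.Add: return add();
    case CommandEnum.Init: return init();
    default:
      console.error(`Command not supported: ${command}`);
      process.exitCode = 1;
  }
};

// process.argv[2] is the first argument after the script name.
chooseCommand(process.argv[2]);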

Init

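const init = () => {
  // recursive: true creates parent directories and does not fail if they exist.
  fs.mkdirSync('.git/objects', { recursive: true });
  fs.mkdirSync('.git/refs/heads', { recursive: true });
  fs.mkdirSync('.git/refs/tags', { recursive: true });
  // HEAD is a symbolic ref to the default branch; Git ends it with a newline.
  fs.writeFileSync('.git/HEAD', 'ref: refs/heads/master\n');
  // ignorecase and precomposeunicode match a typical macOS file system; adjust elsewhere.
  fs.writeFileSync('.git/config', `[core]
    repositoryformatversion = 0
    filemode = true
    bare = false
    logallrefupdates = true
    ignorecase = true
    precomposeunicode = true
`);
  fs.writeFileSync('.git/description', '');
};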

Read and Write Objects

Reading an object involves locating the file under .git/objects/, inflating it, and parsing the header to obtain type and content.

export const readObject = (sha1) => {
  // Objects live at .git/objects/<first 2 hex chars>/<remaining 38>.
  const data = fs.readFileSync(`.git/objects/${sha1.substring(0, 2)}/${sha1.substring(2)}`);
  const buf = zlib.inflateSync(data);
  // The header "<type> <length>" ends at the first NUL byte.
  const headerEnd = buf.indexOf(0);
  const header = buf.subarray(0, headerEnd).toString('utf8');
  const [type, length] = header.split(' ');
  const content = buf.subarray(headerEnd + 1);
  return { type, length: parseInt(length, 10), content };
};

Writing an object serializes the data, hashes it, compresses it, and stores it in the appropriate directory.

export const createObject = (obj) => {
  // serialize() must return the full "<type> <length>\0<content>" bytes.
  const data = obj.serialize();
  const sha1 = crypto.createHash('sha1').update(data).digest('hex');
  const zip = zlib.deflateSync(data);
  const dir = `.git/objects/${sha1.substring(0, 2)}`;
  fs.mkdirSync(dir, { recursive: true }); // no error if the directory already exists
  fs.writeFileSync(`${dir}/${sha1.substring(2)}`, zip);
  return sha1;
};
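
createObject relies on the object knowing how to serialize itself. The article does not show that class, so the following is only a sketch of what a blob wrapper might look like; the class body is an assumption, with the constructor taking the raw content.

```javascript
const zlib = require('zlib');

// Hypothetical blob wrapper; createObject only needs its serialize() method.
class GitBlob {
  constructor(content) {
    this.type = 'blob';
    this.content = Buffer.isBuffer(content) ? content : Buffer.from(content, 'utf8');
  }

  // Returns "blob <byte length>\0<content>" as a Buffer, ready to hash and deflate.
  serialize() {
    const header = Buffer.from(`${this.type} ${this.content.length}\0`, 'utf8');
    return Buffer.concat([header, this.content]);
  }
}

// Round trip through zlib, as the object store does on write and read.
const raw = new GitBlob('hello,world').serialize();
const restored = zlib.inflateSync(zlib.deflateSync(raw));
console.log(restored.equals(raw)); // true
```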

Plumbing Commands

cat-file

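cat-file reads an object and, depending on the flag, prints its type (-t), its size (-s), or its content (-p), or checks that it exists (-e).

export const catFile = () => {
  const flag = process.argv[3];
  const sha1 = process.argv[4];
  // -e is handled before reading, because readObject throws if the file is missing.
  if (flag === '-e') {
    const file = `.git/objects/${sha1.substring(0, 2)}/${sha1.substring(2)}`;
    process.exitCode = fs.existsSync(file) ? 0 : 1; // like Git: no output, exit status only
    return;
  }
  const obj = readObject(sha1);
  if (flag === '-t') console.log(obj.type);
  if (flag === '-s') console.log(obj.length);
  // Printing as UTF-8 suits blobs and commits; tree content is binary.
  if (flag === '-p') console.log(obj.content.toString('utf8'));
};

hash-object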

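hash-object computes the object ID of a file: the SHA-1 of the blob header followed by the file's raw bytes.

export const hashObject = () => {
  const path = process.argv[3];
  const data = fs.readFileSync(path);
  // Hash "blob <byte length>\0" plus the content, not the raw bytes alone.
  const header = Buffer.from(`blob ${data.length}\0`);
  const sha1 = crypto.createHash('sha1').update(header).update(data).digest('hex');
  console.log(sha1);
};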

Object Parsing Enhancements

The readObject function is extended to return proper class instances for blobs, commits, and trees, allowing richer manipulation.

export const readObject = (sha1) => {
  const data = fs.readFileSync(`.git/objects/${sha1.substring(0, 2)}/${sha1.substring(2)}`);
  const buf = zlib.inflateSync(data);
  const headerEnd = buf.indexOf(0);
  const [type] = buf.subarray(0, headerEnd).toString('utf8').split(' ');
  const content = buf.subarray(headerEnd + 1);
  // GitBlob, GitCommit, and GitTree are the wrapper classes for each object type.
  if (type === 'blob') return new GitBlob(content);
  if (type === 'commit') return new GitCommit(content.toString('utf8'));
  if (type === 'tree') return new GitTree(content);
  // Tags and unknown types are not handled yet; fail loudly instead of returning undefined.
  throw new Error(`Unsupported object type: ${type}`);
};

Commit parsing extracts author, parent hashes, and message; tree parsing walks the binary format to list mode, path, and child hash.
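
The two parsers described above might look roughly like the sketch below. The function names and returned shapes are hypothetical, and the tree parser assumes a SHA-1 repository, where every entry ends with 20 raw hash bytes.

```javascript
// Hypothetical parser for the text body of a commit (object header already removed).
function parseCommit(text) {
  const split = text.indexOf('\n\n'); // a blank line separates headers from the message
  const headerText = split === -1 ? text : text.slice(0, split);
  const message = split === -1 ? '' : text.slice(split + 2);
  const commit = { tree: null, parents: [], author: null, committer: null, message };
  for (const line of headerText.split('\n')) {
    const space = line.indexOf(' ');
    if (space <= 0) continue; // skip continuation lines and malformed headers
    const key = line.slice(0, space);
    const value = line.slice(space + 1);
    if (key === 'parent') commit.parents.push(value);
    else if (['tree', 'author', 'committer'].includes(key)) commit[key] = value;
  }
  return commit;
}

// Hypothetical parser for the binary body of a tree.
// Each entry is "<mode> <name>\0" followed by 20 raw hash bytes.
function parseTree(buf) {
  const entries = [];
  let offset = 0;
  while (offset < buf.length) {
    const space = buf.indexOf(0x20, offset);
    const nul = space === -1 ? -1 : buf.indexOf(0x00, space);
    if (space === -1 || nul === -1 || nul + 21 > buf.length) {
      throw new Error('Malformed tree entry');
    }
    entries.push({
      mode: buf.toString('utf8', offset, space),
      name: buf.toString('utf8', space + 1, nul),
      sha1: buf.toString('hex', nul + 1, nul + 21),
    });
    offset = nul + 21; // jump past the hash to the next entry
  }
  return entries;
}
```

Headers other than tree, parent, author, and committer (for example a signature) are ignored in this sketch.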

Branches and Refs

Branch names are stored as files under .git/refs/heads/ containing the commit hash they point to, enabling history traversal.
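
Resolving the current commit therefore means reading HEAD and following it to a branch file. Here is a minimal sketch; the function name resolveRef is hypothetical, and the packed-refs lookup is a simplified fallback that ignores peeled-tag lines.

```javascript
const fs = require('fs');
const path = require('path');

// Hypothetical helper: resolve HEAD (or any ref name) to a commit hash.
function resolveRef(gitDir, ref = 'HEAD') {
  const file = path.join(gitDir, ref);
  if (fs.existsSync(file)) {
    const value = fs.readFileSync(file, 'utf8').trim();
    // Symbolic refs look like "ref: refs/heads/master"; follow them.
    return value.startsWith('ref: ') ? resolveRef(gitDir, value.slice(5)) : value;
  }
  // Refs may also be stored in packed-refs as "<hash> <ref name>" lines.
  const packed = path.join(gitDir, 'packed-refs');
  if (fs.existsSync(packed)) {
    for (const line of fs.readFileSync(packed, 'utf8').split('\n')) {
      const [sha1, name] = line.split(' ');
      if (name === ref) return sha1;
    }
  }
  return null; // for example, a branch that has no commits yet
}

// Usage (assumes a repository in the current directory):
// console.log(resolveRef('.git'));
```

From the resolved hash, history can be walked by reading each commit and following its parent lines.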

Logs and Reflogs

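The files under .git/logs/ are plain-text reflogs. Each line records one update to HEAD or to a branch ref, with the old and new commit hashes, who made the change, and a short message.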
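
Each reflog line has the form "<old hash> <new hash> <name> <email> <timestamp> <timezone>", then a tab and a message. The sketch below parses one such line; the function name parseReflogLine is hypothetical, and the pattern assumes 40-character SHA-1 hashes.

```javascript
// Hypothetical parser for one line of a file under .git/logs/.
function parseReflogLine(line) {
  const tab = line.indexOf('\t'); // the message follows the first tab
  const meta = tab === -1 ? line : line.slice(0, tab);
  const message = tab === -1 ? '' : line.slice(tab + 1);
  const match = /^([0-9a-f]{40}) ([0-9a-f]{40}) (.*) <(.*)> (\d+) ([+-]\d{4})$/.exec(meta);
  if (!match) return null; // not a well-formed reflog line
  const [, oldSha, newSha, name, email, timestamp, timezone] = match;
  return { oldSha, newSha, name, email, timestamp: Number(timestamp), timezone, message };
}

const sample = `${'0'.repeat(40)} ${'a'.repeat(40)} Jane Doe <jane@example.com> 1700000000 +0100\tcommit (initial): first`;
console.log(parseReflogLine(sample).message); // commit (initial): first
```

An old hash of all zeros marks the creation of the ref.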

Staging Area (Index)

The index is a binary file that tracks staged files. Implementing it is beyond the scope of this article, but references are provided for further study.
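
Although a full index parser is out of scope, the fixed 12-byte header is easy to read and makes a good first step: a 4-byte "DIRC" signature, a 4-byte version, and a 4-byte entry count, all big-endian. The function name parseIndexHeader is hypothetical.

```javascript
// Hypothetical helper: read only the fixed 12-byte header of an index file.
function parseIndexHeader(buf) {
  if (buf.length < 12 || buf.toString('ascii', 0, 4) !== 'DIRC') {
    throw new Error('Not a Git index file');
  }
  return {
    version: buf.readUInt32BE(4),    // index format version
    entryCount: buf.readUInt32BE(8), // number of entries that follow the header
  };
}

// Usage (assumes a repository with at least one staged file):
// console.log(parseIndexHeader(require('fs').readFileSync('.git/index')));
```

The entries after the header are variable-length records, which is where most of the real parsing work lies.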

Conclusion

The article demonstrates a minimal yet functional set of Git plumbing APIs written in Node.js, covering repository initialization, object storage, and basic command implementations. Readers are encouraged to extend the code to support full commit creation, tree building, remote interactions, and garbage collection.

Further Exercises

Explore Git’s garbage collection (gc) mechanism.

Implement high‑level porcelain commands using the plumbing layer.

Design a simple remote Git server and protocol.

Complete the missing features to build a fully functional Git clone.

Talk is cheap, show me the code.
