What Linus Missed in Git’s Init: Deep Dive into C Code and Memory Leaks

The article examines Linus Torvalds’s original Git init implementation. It walks through the C source, explains how the directories are created and why 40 extra bytes are added to the allocation size, and highlights a missing free() call that leaks memory. It also discusses when manual deallocation is necessary.

ITPUB

Overview

The first Git prototype was created by Linus Torvalds in April 2005 (commit e83c516 – https://github.com/git/git/commit/e83c5163316f89bfbde7d9ab23ca2e25604af290). The initial source tree contains only eleven files and about a thousand lines of C code, making it an ideal case study for examining Linus’s coding style.

Source tree size

The commit referenced above contains a README, a Makefile, the header cache.h, and eight .c source files. Among them, init-db.c is the smallest and most self‑contained.

Function of init-db.c

When compiled, init-db.c produces the command init-db, which performs the same basic operation as the modern git init:

Create a hidden directory .dircache in the current working directory.

Inside .dircache, create a sub‑directory objects.

Populate .dircache/objects with 256 sub‑directories named 00 through ff, each representing a possible two‑digit hexadecimal prefix for object files.
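To make the role of the 256 sub‑directories concrete, the sketch below maps a 40‑character hexadecimal object name to the directory it would fall into. It assumes the two‑level layout that modern Git uses (first two hex digits as the directory, the remaining 38 as the file name); the helper name object_path is hypothetical and does not appear in the original source.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not from the original source): writes the fan-out
 * path for a 40-character hex object name into out.
 * Returns 0 on success, -1 if the name is not 40 characters long or the
 * output buffer is too small. */
int object_path(const char *hex, char *out, size_t out_size)
{
    assert(hex != NULL && out != NULL);

    if (strlen(hex) != 40)
        return -1;

    /* "%.2s" prints only the first two characters: the directory name.
     * hex + 2 is the remaining 38 characters: the file name. */
    int n = snprintf(out, out_size, ".dircache/objects/%.2s/%s", hex, hex + 2);

    /* snprintf returns the untruncated length; reject truncated results. */
    return (n < 0 || (size_t)n >= out_size) ? -1 : 0;
}
```

With this layout, every object lands in exactly one of the 256 directories created by init-db, which keeps any single directory from growing too large.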

Key implementation steps

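The program follows these steps. The listing is a simplified reconstruction rather than the verbatim original; among other things, it omits the original's error handling: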

#include <stdio.h>      /* sprintf */
#include <stdlib.h>     /* malloc */
#include <string.h>     /* strlen */
#include <sys/stat.h>   /* mkdir */

int main(void) {
    /* 1. Create .dircache and .dircache/objects */
    mkdir(".dircache", 0755);
    mkdir(".dircache/objects", 0755);

    /* 2. Allocate a buffer for constructing each path */
    size_t len = strlen(".dircache/objects/");
    char *path = malloc(len + 40);   /* only 3 extra bytes are needed ("ff" plus the null terminator); 40 is headroom */

    /* 3. Loop over all 256 possible prefixes */
    for (int i = 0; i < 256; i++) {
        sprintf(path, ".dircache/objects/%02x", i);
        mkdir(path, 0755);
    }

    /* NOTE: the original code does NOT free(path) */
    return 0;
}

Key points in the original source:

The call to mkdir() creates the top‑level directories.

A for loop iterates 256 times, formatting the directory name with %02x to obtain the two‑digit hexadecimal string.

Memory for path is allocated with malloc(len + 40). Because len already covers the .dircache/objects/ prefix, the extra 40 bytes only need to hold the two‑character suffix and the terminating null byte.

Memory‑management issue

The buffer allocated for path is never released: the original source has no free(path) call, which the simplified listing above flags with a comment. In a long‑running program, or in code that repeated the allocation, this would be a memory leak.

In practice, init-db is a short‑lived utility; the operating system reclaims all process memory on exit, so the leak does not affect the system. However, the omission is still noteworthy because:

Linus’s own code elsewhere calls free() inside loops that allocate large or repeated buffers.

Good practice dictates freeing any heap allocation that is no longer needed, regardless of program lifetime, to avoid unnecessary resource consumption.

Why the extra 40 bytes?

The allocation size len + 40 is chosen to accommodate the longest possible path string that will be stored in path. len is the length of the base directory (e.g., .dircache/objects/). The additional 40 bytes provide space for:

The two‑character hexadecimal suffix (00 through ff).

The slash that separates the base directory from the suffix, when the base is stored without a trailing slash.

The terminating null character.

Although the actual suffix occupies only three characters (the slash plus two hexadecimal digits), or four bytes with the terminating null, the extra margin keeps the buffer comfortably larger than the loop ever needs and leaves room for future modifications.
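The arithmetic can be checked directly. The sketch below assumes the base string includes its trailing slash, as in the simplified listing; the helper names needed_path_bytes and unused_margin are hypothetical and only illustrate the calculation.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: number of bytes required to store
 * "<base><two hex digits>" including the terminating null byte. */
size_t needed_path_bytes(const char *base)
{
    /* With a NULL buffer and size 0, snprintf writes nothing and returns
     * the length the formatted string would have, excluding the null (C99). */
    int n = snprintf(NULL, 0, "%s%02x", base, 255u);
    return n < 0 ? 0 : (size_t)n + 1;
}

/* Hypothetical helper: bytes left unused when the buffer is sized
 * as strlen(base) + 40, the way the listing allocates it. */
size_t unused_margin(const char *base)
{
    size_t needed = needed_path_bytes(base);
    size_t allocated = strlen(base) + 40;

    assert(allocated >= needed);   /* the +40 margin always suffices here */
    return allocated - needed;
}
```

For the base ".dircache/objects/" (18 characters), needed_path_bytes returns 21 and unused_margin returns 37, so the allocation of 58 bytes leaves 37 bytes unused.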

Best‑practice recommendation

Even for utilities that terminate quickly, it is advisable to pair every malloc() (or calloc(), realloc()) with a corresponding free() when the allocated memory is no longer needed. This habit prevents hidden leaks in longer‑running programs and makes the code easier to audit.

In the case of init-db.c, adding the following line before returning would eliminate the leak:

free(path);
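When a function has more than one way to exit, a single release point keeps the malloc()/free() pairing easy to audit. The following is a minimal sketch of that pattern, not the original code: the names for_each_fanout_path, mkdir_or_exists and dry_run are hypothetical, the buffer is sized at len + 3 instead of len + 40, and treating an already existing directory as success is an assumption made for the example.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Hypothetical helper: builds "<base>00" .. "<base>ff" in one heap buffer,
 * hands each path to make_dir, and frees the buffer on every exit path.
 * Returns the number of paths processed (256) or -1 on failure. */
int for_each_fanout_path(const char *base, int (*make_dir)(const char *path))
{
    assert(base != NULL && make_dir != NULL);

    size_t len = strlen(base);
    char *path = malloc(len + 3);          /* two hex digits + null byte */
    int processed = 0;
    int status = 0;

    if (path == NULL)
        return -1;                         /* nothing allocated, nothing to free */

    memcpy(path, base, len);
    for (unsigned int i = 0; i < 256; i++) {
        snprintf(path + len, 3, "%02x", i);
        if (make_dir(path) != 0) {
            status = -1;                   /* remember the error, do not return yet */
            break;
        }
        processed++;
    }

    free(path);                            /* single release point for all outcomes */
    return status == 0 ? processed : -1;
}

/* Callback that creates the directory; an existing directory counts as success
 * (an assumption for this sketch). */
int mkdir_or_exists(const char *path)
{
    if (mkdir(path, 0755) == 0 || errno == EEXIST)
        return 0;
    return -1;
}

/* Callback that touches nothing on disk; useful for exercising the loop. */
int dry_run(const char *path)
{
    (void)path;
    return 0;
}
```

Recording the error in status and breaking out of the loop, rather than returning from inside it, is what guarantees the buffer is freed even when a mkdir() call fails.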

Conclusion

The original init-db command demonstrates a clear, straightforward implementation of Git’s repository initialization logic: creating a hidden directory structure and populating it with 256 hexadecimal sub‑directories. The source also illustrates a subtle memory‑leak bug—an allocated buffer that is never freed. Understanding this example helps developers appreciate both the elegance of Linus’s low‑level C code and the importance of diligent resource management in foundational tools.
