
Building a Simple malloc and Mark‑Sweep GC on 32‑bit Linux

This guide walks through building a memory allocator for 32‑bit Linux processes on top of a free list, implementing the morecore and add_to_free_list functions, and then adding a basic mark‑and‑sweep garbage collector that scans the heap, the BSS and initialized data segments, and the stack to reclaim unreachable blocks.

Liangxu Linux

Overview

The article explains how to implement a basic memory allocator (malloc) and a simple mark‑and‑sweep garbage collector for user‑space C programs on 32‑bit Linux. It emphasizes that the difficulty lies in memory allocation rather than in the garbage‑collection algorithm itself.

Memory Allocator Design

The allocator maintains a linked list of free memory blocks. Each block has a header containing its size and a pointer to the next free block. When a request arrives, a suitable block is removed from the free list and given to the caller; if no block fits, the allocator requests more memory from the kernel.

Free list management: Blocks are added back to the free list when freed.

Used‑list tracking: An additional list tracks currently allocated blocks, allowing easy transfer between free and used lists.
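The free-list and used-list bookkeeping described above can be sketched roughly as follows. This is a minimal illustration, not the article's code: the names header_t, units_for and sketch_alloc are hypothetical, the lists are NULL-terminated singly linked lists for brevity, and sizes are counted in header-sized units (header included), a convention borrowed from the K&R allocator.

```c
#include <stddef.h>

/* Hypothetical block header (names are assumptions for this sketch). */
typedef struct header {
    size_t size;          /* block size in header-sized units, header included */
    struct header *next;  /* next block on whichever list holds this one */
} header_t;

static header_t *free_list = NULL;  /* blocks available for allocation */
static header_t *used_list = NULL;  /* blocks currently handed out */

/* Units needed to hold nbytes of payload plus one header. */
static size_t units_for(size_t nbytes)
{
    return (nbytes + sizeof(header_t) - 1) / sizeof(header_t) + 1;
}

/* First-fit allocation: take the first free block that is large enough,
 * splitting it when it is larger than needed, and track it on the used list. */
static void *sketch_alloc(size_t nbytes)
{
    size_t units = units_for(nbytes);
    header_t **link = &free_list;

    for (header_t *p = free_list; p != NULL; link = &p->next, p = p->next) {
        if (p->size < units)
            continue;                 /* too small, keep looking */
        if (p->size == units) {
            *link = p->next;          /* exact fit: unlink the whole block */
        } else {
            p->size -= units;         /* split: the free block keeps its head */
            p += p->size;             /* the tail becomes the allocated block */
            p->size = units;
        }
        p->next = used_list;          /* push onto the used list */
        used_list = p;
        return p + 1;                 /* payload starts right after the header */
    }
    return NULL;  /* a full allocator would request more memory here */
}
```

Handing out the tail of a split block means the remaining free block keeps its place in the list, so no relinking is needed in the common case.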

To obtain more memory, the allocator calls the Unix sbrk function, which raises the program break (the address marking the end of the process's data segment) by a requested number of bytes; the allocator asks for memory in whole pages. The function morecore() encapsulates this operation, while add_to_free_list() inserts the newly acquired memory into the free list.
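A rough sketch of how morecore() and add_to_free_list() might fit together is shown below. The function names come from the article, but the bodies are assumptions: the page size is hard-coded as MIN_ALLOC_BYTES (4096), the free list is kept NULL-terminated and sorted by address, and adjacent blocks are merged on insertion. It also assumes the program break is suitably aligned for the header type.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE   /* exposes sbrk() in glibc headers */
#endif
#include <stddef.h>
#include <stdint.h>
#include <unistd.h>

typedef struct header {
    size_t size;          /* block size in header-sized units */
    struct header *next;
} header_t;

#define MIN_ALLOC_BYTES 4096u   /* assumed page size */

static header_t *free_list = NULL;  /* kept sorted by address in this sketch */

/* Insert a block in address order, merging it with adjacent free blocks. */
static void add_to_free_list(header_t *bp)
{
    header_t *prev = NULL, *cur = free_list;

    while (cur != NULL && cur < bp) {
        prev = cur;
        cur = cur->next;
    }
    if (cur != NULL && bp + bp->size == cur) {   /* merge with the block after */
        bp->size += cur->size;
        bp->next = cur->next;
    } else {
        bp->next = cur;
    }
    if (prev != NULL && prev + prev->size == bp) {  /* merge with the block before */
        prev->size += bp->size;
        prev->next = bp->next;
    } else if (prev != NULL) {
        prev->next = bp;
    } else {
        free_list = bp;
    }
}

/* Ask the OS for at least `units` header-sized units, rounded up to whole pages. */
static header_t *morecore(size_t units)
{
    size_t bytes = units * sizeof(header_t);
    bytes = (bytes + MIN_ALLOC_BYTES - 1) / MIN_ALLOC_BYTES * MIN_ALLOC_BYTES;

    void *mem = sbrk((intptr_t)bytes);
    if (mem == (void *)-1)
        return NULL;                  /* the break could not be moved */

    header_t *bp = mem;
    bp->size = bytes / sizeof(header_t);
    add_to_free_list(bp);
    return free_list;
}
```

Merging neighbours on insertion keeps fragmentation down: two consecutive morecore() calls that return contiguous memory end up as a single larger free block.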

Mark‑and‑Sweep Garbage Collection

The GC uses a two‑phase algorithm:

Mark phase: Scan every word that could hold a pointer (the BSS and initialized data segments, the stack, and the allocated heap blocks themselves) and mark any allocated block that such a word points into.

Sweep phase: Walk the used‑list and move any unmarked blocks back to the free list.
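The two phases can be sketched as below. This is a simplified illustration under stated assumptions: the header carries a separate boolean mark flag to keep the code short, the lists are NULL-terminated, and swept blocks are pushed onto the head of the free list without merging. The names mark_from_region and sweep are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical header with an explicit mark flag (a simplification). */
typedef struct header {
    size_t size;          /* block size in header-sized units, header included */
    struct header *next;
    bool marked;
} header_t;

static header_t *used_list = NULL;
static header_t *free_list = NULL;

/* Mark phase: treat every word in [start, end) as a potential pointer and
 * mark the used block, if any, whose payload contains that address. */
static void mark_from_region(const uintptr_t *start, const uintptr_t *end)
{
    for (const uintptr_t *w = start; w < end; w++) {
        for (header_t *b = used_list; b != NULL; b = b->next) {
            uintptr_t lo = (uintptr_t)(b + 1);         /* first payload byte */
            uintptr_t hi = (uintptr_t)(b + b->size);   /* one past the block */
            if (*w >= lo && *w < hi) {
                b->marked = true;
                break;
            }
        }
    }
}

/* Sweep phase: move unmarked blocks to the free list and clear the marks
 * on survivors. Returns the number of blocks reclaimed. */
static size_t sweep(void)
{
    size_t freed = 0;
    header_t **link = &used_list;

    while (*link != NULL) {
        header_t *b = *link;
        if (b->marked) {
            b->marked = false;    /* reset for the next collection */
            link = &b->next;
        } else {
            *link = b->next;      /* unlink from the used list */
            b->next = free_list;  /* simplified: push without merging */
            free_list = b;
            freed++;
        }
    }
    return freed;
}
```

Because any word that happens to look like an address into a block keeps that block alive, this style of collector is conservative: it may retain garbage, but it should never free a block that a scanned word still points into.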

Key observations that make this feasible in C:

In C, a block can only be accessed through a pointer to it; if a block is still in use, a pointer into it must exist somewhere in the program's memory.

All variables reside somewhere in memory, so scanning known regions can discover references.

Because block headers are word‑aligned, the low bits of any pointer to a header are always zero, so a low bit of the next pointer in the header can serve as a tag recording whether the block is marked.
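The tagging trick might look like the sketch below. The helper names (untag, is_marked, set_mark, clear_mark) and the choice of bit 0 as the mark bit are assumptions made for illustration; the only requirement is that headers are aligned to at least two bytes so that bit is otherwise unused.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct header {
    size_t size;
    struct header *next;   /* bit 0 doubles as the mark flag */
} header_t;

#define MARK_BIT ((uintptr_t)1)

/* Strip the tag to recover the real pointer before following it. */
static header_t *untag(header_t *p)
{
    return (header_t *)((uintptr_t)p & ~MARK_BIT);
}

static int is_marked(const header_t *b)
{
    return ((uintptr_t)b->next & MARK_BIT) != 0;
}

static void set_mark(header_t *b)
{
    b->next = (header_t *)((uintptr_t)b->next | MARK_BIT);
}

static void clear_mark(header_t *b)
{
    b->next = untag(b->next);
}
```

Any code that walks a list containing marked blocks has to go through untag() first; following a tagged pointer directly would land one byte past the real header.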

Scanning Regions

The collector scans:

BSS and initialized data segments (global and static variables).

The heap itself (to find internal pointers).

The stack, using the process’s /proc/<pid>/stat file to locate the stack base, then scanning the words between the current stack pointer and that base.

For the stack base, the article follows the approach used by Boehm GC: read the 28th field of that proc file (the stack start address), skipping the first 27 fields.
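One way to extract that field is sketched below. It reads /proc/self/stat, the current process's view of its own stat file; the function names parse_stack_base and read_stack_base are hypothetical. Instead of a long scanf format, it skips past the last closing parenthesis first, since the second field (the command name) is parenthesised and may itself contain spaces.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Extract field 28 (stack start address) from the contents of a stat file.
 * Returns 0 if the line is malformed or too short. */
static uintptr_t parse_stack_base(const char *stat_line)
{
    /* Field 2 is "(comm)" and may contain spaces, so resume after the last ')'. */
    const char *p = strrchr(stat_line, ')');
    if (p == NULL)
        return 0;
    p++;

    /* p now sits just before field 3; skip fields 3 through 27. */
    for (int field = 3; field < 28; field++) {
        p += strspn(p, " ");
        p += strcspn(p, " ");
    }
    return (uintptr_t)strtoull(p, NULL, 10);  /* field 28, printed in decimal */
}

/* Read the stack base of the current process, or 0 on failure. */
static uintptr_t read_stack_base(void)
{
    char buf[4096];
    FILE *fp = fopen("/proc/self/stat", "r");
    if (fp == NULL)
        return 0;
    size_t n = fread(buf, 1, sizeof buf - 1, fp);
    fclose(fp);
    buf[n] = '\0';
    return parse_stack_base(buf);
}
```

On a typical Linux system the returned address lies above every local variable of the main thread, so a collector can scan from the current stack pointer up to this value. A result of 0 should be treated as "unknown", for example when /proc is not mounted.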

Initialization

An initialization function opens the appropriate /proc file, extracts the stack base, and sets up the free and used lists. After this setup, the allocator and GC can operate.

Conclusion

The presented code is not a complete production‑ready solution; it may contain bugs and lacks some edge‑case handling. Nevertheless, it demonstrates that a functional malloc and a basic mark‑and‑sweep collector can be built from scratch in C on Linux, illustrating the “divide‑and‑conquer” strategy for tackling complex systems programming tasks.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.
