
Unlocking Linux Memory Management: From malloc to Buddy Allocation

This guide explores Linux's memory management ecosystem: user-space allocation with malloc, kernel-space mechanisms such as kmalloc, vmalloc, and mmap, and core kernel algorithms including the buddy system, CMA, the slab allocator, and memory pools, with code examples and practical usage tips.


1. Memory Allocation Trio

In the complex Linux ecosystem, memory management is the core hub that keeps the system running efficiently and stably. Think of Linux as a bustling metropolis where memory is the land; each process needs a portion of this land to store data and execute instructions. Without a scientific memory management mechanism, the city would suffer from uneven allocation, leaks, slow performance, and even crashes.

1.1 malloc (User‑Space Allocation)

The malloc function is the standard C library routine for dynamic memory allocation. It first tries to satisfy requests from its own free lists; only when those are exhausted does it ask the kernel for more memory via system calls. In glibc, small requests (below the 128 KB M_MMAP_THRESHOLD by default) extend the heap with brk, while larger ones receive a private anonymous mapping via mmap.

Steps of malloc implementation:

Adjust request size for alignment and internal metadata.

Search a free‑list for a block large enough.

Split the block if it is larger than needed.

Update metadata of the allocated block.

Return a pointer to the usable memory.

<code>typedef struct node {
    size_t size;       // block size, including this header
    struct node* next; // next free block
    struct node* prev; // previous free block (optional)
} Node;

Node* free_list = NULL; // head of the free list

/* Simplified sketch: find_suitable_block(), allocate_new_block() and
 * remove_from_list() are assumed helpers, not shown here. */
void* malloc(size_t size) {
    size = (size + 7) & ~(size_t)7; // step 1: align the request
    size += sizeof(Node);           // reserve space for metadata
    Node* block = find_suitable_block(free_list, size);
    if (!block) {
        block = allocate_new_block(size); // may extend the heap or split an existing block
    } else {
        remove_from_list(block);
    }
    block->size = size;
    return (void*)((char*)block + sizeof(Node)); // usable memory starts after the header
}
</code>

1.2 kmalloc (Kernel‑Space Allocation)

kmalloc allocates physically contiguous memory in the kernel, playing a role similar to malloc in user space. It draws from pre-built slab caches of geometric sizes (commonly from a few tens of bytes up to 128 KB, depending on architecture and SLAB implementation). Allocations with GFP_KERNEL may sleep; GFP_ATOMIC must be used in interrupt or other atomic context where sleeping is forbidden.

<code>static inline void *kmalloc(size_t size, gfp_t flags) {
    if (__builtin_constant_p(size)) {
        if (size > KMALLOC_MAX_CACHE_SIZE)
            return kmalloc_large(size, flags);
        unsigned int index = kmalloc_index(size);
        if (!index)
            return ZERO_SIZE_PTR;
        return kmem_cache_alloc_trace(kmalloc_caches[kmalloc_type(flags)][index], flags, size);
    }
    return __kmalloc(size, flags);
}
</code>

1.3 vmalloc (Kernel Virtual Allocation)

vmalloc creates virtually contiguous but physically non-contiguous memory, useful for large buffers when physical continuity is unavailable. It builds a new vm_struct, allocates pages, and maps them into a contiguous virtual range.

<code>void *vmalloc(unsigned long size);
void vfree(const void *addr);
</code>

2. mmap Function Details

mmap maps a file or device into a process's address space, allowing the process to read/write the file via normal memory accesses. The kernel creates a vm_area_struct for the mapping and links it to the file's inode.

Typical usage steps:

Open the file with open to obtain a file descriptor.

Call mmap to create the mapping; it returns the start address.

Read or write through the returned pointer.

Unmap with munmap.

Close the file descriptor.

Key points:

The file offset must be a multiple of the page size; the kernel rounds the mapping length up to whole pages.

Accessing unmapped pages triggers a page‑fault; the kernel loads the needed page from disk.

Changes are written back automatically or can be forced with msync.

2.1 mmap Prototype

<code>void *mmap(void *addr, size_t length, int prot, int flags, int fd, off_t offset);
int munmap(void *addr, size_t length);
</code>

2.2 Practical Example (iOS)

<code>// ViewController.m (iOS example)
#import "ViewController.h"
#import <errno.h>
#import <fcntl.h>
#import <string.h>
#import <sys/mman.h>
#import <sys/stat.h>
#import <unistd.h>

int MapFile(const char *inPathName, void **outDataPtr, size_t *outDataLength, size_t appendSize) {
    int fd = open(inPathName, O_RDWR, 0);
    if (fd < 0) return errno;
    struct stat st;
    if (fstat(fd, &st) != 0) { close(fd); return errno; }
    if (ftruncate(fd, st.st_size + appendSize) != 0) { close(fd); return errno; }
    *outDataLength = (size_t)st.st_size;  // original length, before the appended region
    *outDataPtr = mmap(NULL, st.st_size + appendSize, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
    if (*outDataPtr == MAP_FAILED) { int err = errno; close(fd); return err; }
    close(fd);  // the mapping remains valid after the descriptor is closed
    return 0;
}

void ProcessFile(const char *path) {
    size_t len; void *ptr; const char *append = " append_key2";
    if (MapFile(path, &ptr, &len, strlen(append)) == 0) {
        memcpy((char*)ptr + len, append, strlen(append));
        munmap(ptr, len + strlen(append));
    }
}
</code>

3. Page Allocation Functions

Linux provides low-level page allocators such as alloc_page, alloc_pages, and the __get_free_pages family. They return either struct page pointers (alloc_pages) or kernel virtual addresses (__get_free_pages) for one or more physically contiguous pages.

<code>#define alloc_page(gfp_mask)  alloc_pages(gfp_mask, 0)
#define alloc_pages(gfp_mask, order) alloc_pages_node(numa_node_id(), gfp_mask, order)
static inline struct page *alloc_pages_node(int nid, gfp_t gfp_mask, unsigned int order) {
    if (unlikely(order >= MAX_ORDER))
        return NULL;
    if (nid < 0)
        nid = numa_node_id();
    return __alloc_pages(gfp_mask, order, node_zonelist(nid, gfp_mask));
}
</code>

The companion functions free_page and free_pages release the pages back to the buddy system.

<code>#define free_page(addr) free_pages((addr), 0)
void free_pages(unsigned long addr, unsigned int order) {
    if (addr) __free_pages(virt_to_page((void *)addr), order);
}
</code>

4. Buddy System

The Buddy allocator manages memory in blocks of size 2ⁿ pages, keeping free lists for each order. It minimizes external fragmentation by merging adjacent free blocks (buddies) when possible.

4.1 Data Structures

<code>#define MAX_ORDER 11
struct zone {
    struct free_area free_area[MAX_ORDER];
    /* ... */
};
struct free_area {
    struct list_head free_list;
    unsigned long nr_free; // number of free blocks of this order
};
</code>

4.2 Allocation

When a request for order k arrives, the allocator searches the free list of that order. If empty, it looks for a higher order, splits a larger block, and inserts the remainder into the appropriate lower‑order list.

<code>/* Core of __rmqueue(): find the smallest order with a free block */
for (current_order = order; current_order < MAX_ORDER; ++current_order) {
    area = zone->free_area + current_order;
    if (list_empty(&area->free_list))
        continue;
    page = list_first_entry(&area->free_list, struct page, lru);
    list_del(&page->lru);
    rmv_page_order(page);
    area->nr_free--;
    zone->free_pages -= 1UL << order;
    /* expand() splits the block and queues each remainder on a lower-order list */
    return expand(zone, page, order, current_order, area);
}
</code>

4.3 Freeing

When a block is freed, the allocator checks whether its buddy is also free and of the same order; if so, they are merged into a higher‑order block, repeating until no further merge is possible.

<code>/* Merge loop from free_pages_bulk(): the buddy of the block at page_idx
 * is found by flipping bit `order` of the index */
while (order < MAX_ORDER-1) {
    buddy_idx = page_idx ^ (1 << order);
    buddy = base + buddy_idx;
    if (!page_is_buddy(buddy, order))
        break;                       /* buddy busy or of a different order */
    list_del(&buddy->lru);
    area = zone->free_area + order;
    area->nr_free--;
    rmv_page_order(buddy);
    page_idx &= buddy_idx;           /* merged block starts at the lower index */
    order++;
}
coalesced = base + page_idx;
set_page_order(coalesced, order);
list_add(&coalesced->lru, &zone->free_area[order].free_list);
zone->free_area[order].nr_free++;
</code>

5. CMA (Contiguous Memory Allocator)

CMA reserves a region of physical memory that can be reclaimed and turned into a contiguous block for devices that need DMA‑able memory. When a large contiguous allocation is requested and normal zones cannot satisfy it, the kernel moves movable pages out of the CMA area, updates page tables, and hands the now‑free contiguous region to the device.
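CMA is typically configured at boot rather than from code. As an illustrative (not prescriptive) configuration, the reserved area can be sized on the kernel command line and then checked after boot; the 64 MB figure below is an arbitrary example:

```
# Kernel command line (e.g. in the bootloader config): reserve 64 MB for CMA
cma=64M

# After boot, verify the reservation:
#   dmesg | grep -i cma
#   grep -i Cma /proc/meminfo    -> CmaTotal / CmaFree
```

Drivers then obtain contiguous DMA-able memory from this area through the usual DMA API (dma_alloc_coherent), without needing to know CMA is the backing mechanism.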

6. Slab Allocator

The slab allocator provides caches (slab caches) for objects of the same size, reducing fragmentation and allocation overhead for frequently used kernel structures.

6.1 Creating a Slab Cache

<code>struct kmem_cache *kmem_cache_create(const char *name, unsigned int size,
                                      unsigned int align, slab_flags_t flags,
                                      void (*ctor)(void *));
</code>

Parameters include the cache name, object size, alignment, allocation flags (e.g., SLAB_HWCACHE_ALIGN, SLAB_CACHE_DMA), and an optional constructor callback run on newly allocated objects (destructor support was removed from modern kernels).

6.2 Allocation and Free

<code>void *kmem_cache_alloc(struct kmem_cache *cachep, gfp_t flags);
void kmem_cache_free(struct kmem_cache *cachep, void *objp);
</code>

6.3 Destroying a Cache

<code>void kmem_cache_destroy(struct kmem_cache *cachep);
</code>

Example: a cache for struct thread_info objects.

<code>static struct kmem_cache *thread_info_cache;
thread_info_cache = kmem_cache_create("thread_info", sizeof(struct thread_info),
                                      0, SLAB_HWCACHE_ALIGN | SLAB_PANIC, NULL);
struct thread_info *ti = kmem_cache_alloc(thread_info_cache, GFP_KERNEL);
/* use ti */
kmem_cache_free(thread_info_cache, ti);
kmem_cache_destroy(thread_info_cache);
</code>

7. Memory Pools

Linux also offers generic memory‑pool support for allocating a predefined number of objects, useful when allocation failures must be avoided.

7.1 Creating a Pool

<code>mempool_t *mempool_create(int min_nr,
                         mempool_alloc_t *alloc_fn,
                         mempool_free_t *free_fn,
                         void *pool_data);
</code>

7.2 Allocate / Free

<code>void *mempool_alloc(mempool_t *pool, gfp_t gfp_mask);
void mempool_free(void *element, mempool_t *pool);
</code>

7.3 Destroying a Pool

<code>void mempool_destroy(mempool_t *pool);
</code>

8. Application Cases

8.1 Real‑World Scenarios

Database servers benefit from the Buddy allocator for large contiguous buffers and from the slab allocator for frequently used small structures such as connection objects. Embedded devices like smart cameras rely on CMA to obtain contiguous DMA‑able memory for high‑throughput image transfers.

8.2 Common Issues and Solutions

Long-running workloads can fragment physical memory, making large contiguous allocations fail. Writing 1 to /proc/sys/vm/compact_memory (as root) triggers kernel memory compaction, which migrates movable pages to create larger contiguous free blocks. In constrained embedded environments, using memory pools or pre-allocating CMA regions prevents allocation failures.

8.3 Monitoring Memory Allocation

Buddy statistics are available in /proc/buddyinfo, showing free blocks per order for each zone. Slab usage can be inspected with cat /proc/slabinfo, which lists active objects, total objects, object size, and per-slab statistics. General memory usage can be observed with free -h, top, or htop.
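A quick inspection session might look like the following (the fallbacks handle non-root shells and minimal systems where free is absent):

```shell
# Free blocks per order in each zone (columns are order 0 .. MAX_ORDER-1)
cat /proc/buddyinfo

# Slab cache statistics (reading usually requires root)
head -n 5 /proc/slabinfo 2>/dev/null || echo "slabinfo: needs root"

# Overall memory usage
free -h 2>/dev/null || grep MemTotal /proc/meminfo
```

In /proc/buddyinfo, a healthy system shows non-zero counts at the higher orders; counts collapsing toward order 0 is the classic signature of external fragmentation.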

Written by Deepin Linux

Research areas: Windows & Linux platforms, C/C++ backend development, embedded systems and Linux kernel.