
How Linux Allocates Heap Memory: Inside vm_area_struct and the brk System Call

This article explains the Linux kernel's heap memory allocation process, covering the vm_area_struct and mm_struct data structures, the brk system call implementation, and the page‑fault handling that maps virtual addresses to physical memory, while also mentioning other allocation mechanisms such as mmap and HugePages.

Liangxu Linux

Memory Region (vm_area_struct)

Linux represents each virtual memory region with struct vm_area_struct. A simplified definition is:

struct vm_area_struct {
    struct mm_struct *vm_mm;   // owning mm_struct
    unsigned long vm_start;   // start address
    unsigned long vm_end;     // end address
    struct vm_area_struct *vm_next; // linked‑list pointer
    struct rb_node vm_rb;      // node in the red‑black tree
    /* ... other fields omitted ... */
};

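vm_start and vm_end delimit the region as the half-open range [vm_start, vm_end). vm_next chains all of a process's regions in ascending address order, and vm_rb places the region in a red-black tree keyed by address, so find_vma() can locate the region for a given address in O(log n) time. (Linux 6.1 replaced both the list and the tree with a maple tree.)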
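The lookup contract matters later in the page-fault path. The sketch below models the kernel's find_vma() over the sorted vm_next list, assuming a hypothetical stripped-down struct vma; the helper names find_vma_list and addr_in_vma are also hypothetical, and the real function searches the red-black tree instead of scanning the list. Note that the result is the first region ending above the address, which need not contain it.

```c
#include <stddef.h>

/* Hypothetical, stripped-down VMA holding only the fields discussed above. */
struct vma {
    unsigned long vm_start;   /* first address inside the region      */
    unsigned long vm_end;     /* first address past the region        */
    struct vma   *vm_next;    /* next VMA, in ascending address order */
};

/* Same contract as the kernel's find_vma(): return the first VMA whose
 * vm_end is greater than addr, or NULL if there is none. The returned
 * VMA may start above addr, so callers must still check vm_start. */
static struct vma *find_vma_list(struct vma *head, unsigned long addr)
{
    for (struct vma *v = head; v != NULL; v = v->vm_next)
        if (addr < v->vm_end)
            return v;
    return NULL;
}

/* Mirrors the "vma && vma->vm_start <= address" test in do_page_fault. */
static int addr_in_vma(struct vma *head, unsigned long addr)
{
    struct vma *v = find_vma_list(head, addr);
    return v != NULL && v->vm_start <= addr;
}
```

With a text VMA at [0x400000, 0x401000) followed by a heap VMA at [0x602000, 0x623000), looking up 0x500000 returns the heap VMA even though the address falls in the gap, which is why the extra vm_start check is needed.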

Process Memory Descriptor (mm_struct)

Each process has a struct mm_struct that owns all its vm_area_struct objects:

struct mm_struct {
    struct vm_area_struct *mmap;   // head of VMA list
    struct rb_root mm_rb;          // root of VMA red‑black tree
    unsigned long start_brk, brk;  // heap start and current top
    unsigned long start_data, end_data; // data segment bounds (checked by sys_brk)
    /* ... other fields omitted ... */
};
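User space can observe the break that mm->brk tracks. This is a minimal sketch, assuming Linux with glibc, where sbrk(0) returns the current break without changing it and sbrk(n) moves it by n bytes through the brk() system call; the helper name grow_break is hypothetical.

```c
#define _DEFAULT_SOURCE          /* exposes sbrk() in glibc's <unistd.h> */
#include <stdint.h>
#include <unistd.h>

/* Grow the heap by `increment` bytes and report the program break
 * (the value the kernel keeps in mm->brk) before and after.
 * Returns 0 on success, -1 if either sbrk() call fails. */
static int grow_break(intptr_t increment, void **before, void **after)
{
    *before = sbrk(0);                    /* query only: break unchanged */
    if (*before == (void *)-1)
        return -1;
    if (sbrk(increment) == (void *)-1)    /* issues the brk() system call */
        return -1;
    *after = sbrk(0);
    return 0;
}
```

Calling grow_break(4096, &b, &a) should leave a exactly 4096 bytes above b: the kernel records the unaligned requested value in mm->brk, even though the heap VMA itself only grows in whole pages.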

Heap Expansion via sys_brk

In Linux 2.6.32, the brk() system call is implemented by sys_brk in mm/mmap.c (defined through the SYSCALL_DEFINE1(brk, ...) macro). Its core logic, lightly simplified:

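unsigned long sys_brk(unsigned long brk)
{
    struct mm_struct *mm = current->mm;
    unsigned long rlim, newbrk, oldbrk, retval;

    down_write(&mm->mmap_sem);
    if (brk < mm->start_brk)            /* mm->end_code with CONFIG_COMPAT_BRK */
        goto out;

    rlim = current->signal->rlim[RLIMIT_DATA].rlim_cur;
    if (rlim < RLIM_INFINITY &&
        (brk - mm->start_brk) + (mm->end_data - mm->start_data) > rlim)
        goto out;

    newbrk = PAGE_ALIGN(brk);
    oldbrk = PAGE_ALIGN(mm->brk);
    if (oldbrk == newbrk)               /* same last page: no VMA change */
        goto set_brk;

    if (brk <= mm->brk) {               /* shrinking is always allowed */
        if (!do_munmap(mm, newbrk, oldbrk - newbrk))
            goto set_brk;
        goto out;
    }

    /* growing: refuse to run into an existing mapping */
    if (find_vma_intersection(mm, oldbrk, newbrk + PAGE_SIZE))
        goto out;
    if (do_brk(oldbrk, newbrk - oldbrk) != oldbrk)
        goto out;
set_brk:
    mm->brk = brk;
out:
    retval = mm->brk;                   /* on failure: the unchanged break */
    up_write(&mm->mmap_sem);
    return retval;
}

Four main steps:

Reject a break below the start of the heap, then check the RLIMIT_DATA limit: the heap size plus the data segment size must not exceed it.

Page-align the old and new break. If both fall in the same page, no mapping has to change, so skip straight to recording the new value.

When shrinking, unmap the released pages with do_munmap. When growing, make sure the new range does not collide with an existing mapping, then call do_brk to extend the heap VMA.

Store the new value in mm->brk. sys_brk always returns the current break, so on failure the caller sees the old, unchanged value.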
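The decisions sys_brk makes can be modeled without touching any memory. This hypothetical sketch assumes 4 KiB pages and classifies a request the way the 2.6.32 source does: reject, record only, shrink (do_munmap in the kernel) or grow (do_brk). The names classify_brk, MODEL_PAGE_ALIGN and enum brk_action are invented, and the collision check against neighboring mappings (find_vma_intersection in the kernel) is not modeled.

```c
#define MODEL_PAGE_SIZE 4096UL   /* assumption: 4 KiB pages, as on x86 */
/* Round up to the next page boundary, like the kernel's PAGE_ALIGN(). */
#define MODEL_PAGE_ALIGN(x) (((x) + MODEL_PAGE_SIZE - 1) & ~(MODEL_PAGE_SIZE - 1))

enum brk_action { BRK_REJECT, BRK_RECORD_ONLY, BRK_SHRINK, BRK_GROW };

/* Hypothetical model of sys_brk's decision logic. `data_size` stands for
 * end_data - start_data; pass ~0UL as `rlim` for "unlimited". */
static enum brk_action classify_brk(unsigned long start_brk, unsigned long cur_brk,
                                    unsigned long data_size, unsigned long rlim,
                                    unsigned long req)
{
    if (req < start_brk)
        return BRK_REJECT;                        /* below the heap start */
    if ((req - start_brk) + data_size > rlim)
        return BRK_REJECT;                        /* RLIMIT_DATA exceeded */
    if (MODEL_PAGE_ALIGN(req) == MODEL_PAGE_ALIGN(cur_brk))
        return BRK_RECORD_ONLY;                   /* same last page       */
    return req <= cur_brk ? BRK_SHRINK : BRK_GROW;
}
```

The record-only case is why small moves of the break inside an already mapped page cost nothing beyond the system call itself.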

do_brk

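do_brk makes the range [oldbrk, newbrk) part of the heap. In the common case the heap VMA ends exactly at oldbrk, so do_brk calls vma_merge, which simply extends that VMA's vm_end to newbrk. If no compatible VMA ends there, do_brk allocates a new vm_area_struct for the range and links it into the list and the red-black tree. Either way only the bookkeeping changes: no physical pages are allocated at this point.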
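The extend-or-create decision can be modeled in a few lines of user-space C. This is a hypothetical sketch: struct model_vma, its anon flag (standing in for the compatibility checks vma_merge performs) and model_do_brk are invented for illustration, and the tree update, accounting and overlap handling of the real do_brk are left out.

```c
#include <stdlib.h>

/* Hypothetical minimal VMA; `anon` means "mergeable anonymous mapping". */
struct model_vma {
    unsigned long vm_start, vm_end;
    int anon;
    struct model_vma *vm_next;
};

/* Cover [addr, addr + len): extend `prev` if it is mergeable and ends
 * exactly at addr (the vma_merge case), otherwise allocate a new VMA
 * and link it after prev. Returns the covering VMA, or NULL on failure. */
static struct model_vma *model_do_brk(struct model_vma *prev,
                                      unsigned long addr, unsigned long len)
{
    if (prev && prev->anon && prev->vm_end == addr) {
        prev->vm_end = addr + len;              /* common case: heap grows */
        return prev;
    }
    struct model_vma *v = malloc(sizeof *v);    /* kernel: kmem_cache_zalloc */
    if (!v)
        return NULL;
    v->vm_start = addr;
    v->vm_end = addr + len;
    v->anon = 1;
    v->vm_next = prev ? prev->vm_next : NULL;
    if (prev)
        prev->vm_next = v;                      /* kernel: vma_link(), list + tree */
    return v;
}
```

Growing a heap VMA [0x601000, 0x603000) by 0x2000 bytes just moves its vm_end to 0x605000; only when the preceding VMA is not mergeable does a second structure appear.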

Page‑Fault Handling

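When a program touches a virtual address that has no valid page-table mapping, the CPU raises a page-fault exception and, on x86, leaves the faulting address in the CR2 register. The kernel handles the exception in do_page_fault (simplified):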

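void do_page_fault(struct pt_regs *regs, unsigned long error_code)
{
    unsigned long address = read_cr2();      /* faulting address (x86 CR2) */
    struct mm_struct *mm = current->mm;
    struct vm_area_struct *vma;
    int write = error_code & PF_WRITE;       /* non-zero for write accesses */

    down_read(&mm->mmap_sem);
    vma = find_vma(mm, address);             /* first VMA with vm_end > address */
    if (vma && vma->vm_start <= address) {   /* address lies inside this VMA */
        /* (permission checks against vma->vm_flags omitted) */
        handle_mm_fault(mm, vma, address, write ? FAULT_FLAG_WRITE : 0);
    }
    /* ... otherwise: grow the stack (VM_GROWSDOWN) or send SIGSEGV ... */
    up_read(&mm->mmap_sem);
}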
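Because sys_brk and do_brk only adjust the VMA, physical pages arrive later, one fault at a time. A minimal sketch to observe this, assuming Linux with glibc: it grows the heap with sbrk(), then counts the minor page faults reported by getrusage() (the ru_minflt field) while writing one byte per page. The helper names are hypothetical, and the exact count can vary (for example, if the old break was mid-page or transparent huge pages are in use), but it is typically close to one fault per page.

```c
#define _DEFAULT_SOURCE          /* exposes sbrk() in glibc's <unistd.h> */
#include <stddef.h>
#include <stdint.h>
#include <sys/resource.h>
#include <unistd.h>

/* Minor page faults taken so far by this process, or -1 on error. */
static long minor_faults(void)
{
    struct rusage ru;
    if (getrusage(RUSAGE_SELF, &ru) != 0)
        return -1;
    return ru.ru_minflt;
}

/* Extend the heap by `pages` pages, then write one byte per page.
 * Returns the minor faults counted during the writes, or -1 if sbrk()
 * fails. The sbrk() call itself only grows the VMA. */
static long faults_when_touching(size_t pages)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    volatile char *base = sbrk((intptr_t)(pages * page));
    if (base == (void *)-1)
        return -1;
    long before = minor_faults();
    for (size_t i = 0; i < pages; i++)
        base[i * page] = 1;      /* first write to each page faults */
    return minor_faults() - before;
}
```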

handle_mm_fault

handle_mm_fault walks the four-level page-table hierarchy (PGD, PUD, PMD, PTE), allocating any missing intermediate tables, and finally calls handle_pte_fault:

int handle_mm_fault(struct mm_struct *mm, struct vm_area_struct *vma,
                    unsigned long address, unsigned int flags)
{
    pgd_t *pgd = pgd_offset(mm, address);      /* top level always exists */
    pud_t *pud;
    pmd_t *pmd;
    pte_t *pte;

    pud = pud_alloc(mm, pgd, address);         /* allocate missing tables */
    if (!pud)
        return VM_FAULT_OOM;
    pmd = pmd_alloc(mm, pud, address);
    if (!pmd)
        return VM_FAULT_OOM;
    pte = pte_alloc_map(mm, pmd, address);
    if (!pte)
        return VM_FAULT_OOM;
    return handle_pte_fault(mm, vma, address, pte, pmd, flags);
}
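To make the four-level walk concrete, here is a sketch of how a virtual address splits into per-level table indices, assuming x86-64 with 4-level paging and 4 KiB pages (9 index bits per level, 12 offset bits). The kernel's pgd_index() and pte_index() style macros perform equivalent shift-and-mask operations; split_va and struct va_parts are hypothetical names.

```c
#include <stdint.h>

/* Index fields of a 48-bit x86-64 virtual address (4-level, 4 KiB pages). */
struct va_parts {
    unsigned pgd, pud, pmd, pte;   /* indices into each 512-entry table */
    unsigned offset;               /* byte offset inside the 4 KiB page */
};

static struct va_parts split_va(uint64_t va)
{
    struct va_parts p;
    p.pgd    = (unsigned)((va >> 39) & 0x1ff);   /* bits 47..39 */
    p.pud    = (unsigned)((va >> 30) & 0x1ff);   /* bits 38..30 */
    p.pmd    = (unsigned)((va >> 21) & 0x1ff);   /* bits 29..21 */
    p.pte    = (unsigned)((va >> 12) & 0x1ff);   /* bits 20..12 */
    p.offset = (unsigned)(va & 0xfff);           /* bits 11..0  */
    return p;
}
```

For the address 0x601000, for instance, the PGD and PUD indices are 0, the PMD index is 3 and the PTE index is 1, with a page offset of 0.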

handle_pte_fault

If the PTE is empty (pte_none) and the VMA has no fault handler in vm_ops, which is the case for anonymous memory such as the heap, handle_pte_fault delegates to do_anonymous_page:

int handle_pte_fault(struct mm_struct *mm, struct vm_area_struct *vma,
                     unsigned long address, pte_t *pte, pmd_t *pmd,
                     unsigned int flags)
{
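    pte_t entry = *pte;
    if (!pte_present(entry)) {
        if (pte_none(entry)) {
            /* file-backed VMAs (vma->vm_ops->fault) go to do_linear_fault */
            if (vma->vm_ops && vma->vm_ops->fault)
                return do_linear_fault(mm, vma, address, pte, pmd, flags, entry);
            return do_anonymous_page(mm, vma, address, pte, pmd, flags);
        }
        /* non-empty but not present: swapped out (do_swap_page), etc. */
    }
    /* present PTE: e.g. write to a read-only page, do_wp_page (COW) */
    return 0;
}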
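The dispatch in handle_pte_fault can be summarized as a small decision function. This is a hypothetical sketch of 2.6.32-era behavior: the enum names and pick_handler are invented, the nonlinear pte_file case is omitted, and the comments name the kernel handlers each branch corresponds to.

```c
/* PTE states handle_pte_fault distinguishes (simplified). */
enum pte_state { PTE_NONE, PTE_SWAPPED, PTE_PRESENT };
/* Which kernel handler each state reaches. */
enum handler { H_ANON, H_FILE, H_SWAP, H_PRESENT };

static enum handler pick_handler(enum pte_state s, int vma_has_fault_op)
{
    switch (s) {
    case PTE_NONE:                 /* never mapped */
        /* file-backed: do_linear_fault; anonymous: do_anonymous_page */
        return vma_has_fault_op ? H_FILE : H_ANON;
    case PTE_SWAPPED:              /* not present, but holds a swap entry */
        return H_SWAP;             /* do_swap_page */
    case PTE_PRESENT:
    default:                       /* e.g. write to read-only PTE: do_wp_page */
        return H_PRESENT;
    }
}
```

A first touch of a heap page therefore always lands in the PTE_NONE, no-fault-op branch, which is the do_anonymous_page path.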

do_anonymous_page

do_anonymous_page distinguishes read faults from write faults:

int do_anonymous_page(struct mm_struct *mm, struct vm_area_struct *vma,
                      unsigned long address, pte_t *pte, pmd_t *pmd,
                      unsigned int flags)
{
    pte_t entry;
    struct page *page;

    if (!(flags & FAULT_FLAG_WRITE)) {
        /* read fault: map the global zero page, read-only */
        entry = pte_mkspecial(pfn_pte(my_zero_pfn(address), vma->vm_page_prot));
        goto setpte;
    }
    /* write fault: allocate a fresh zero-filled page */
    page = alloc_zeroed_user_highpage_movable(vma, address);
    if (!page)
        return VM_FAULT_OOM;
    entry = mk_pte(page, vma->vm_page_prot);
    if (vma->vm_flags & VM_WRITE)
        entry = pte_mkwrite(pte_mkdirty(entry));
    /* (omitted: anon rmap setup, RSS accounting, page-table locking) */
setpte:
    set_pte_at(mm, address, pte, entry);
    return 0;
}

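Read faults map the shared zero page to save memory; write faults allocate a fresh physical page and, if the VMA is writable, map it writable and dirty. If the process later writes to a page it has so far only read, the read-only zero-page PTE triggers another fault, and the copy-on-write path replaces it with a private page.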
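The zero-page optimization can be observed from user space. This sketch, assuming Linux with glibc and a mounted /proc, grows the heap with sbrk(), reads one byte from each new page, then writes one byte to each page, sampling the resident set size (the second field of /proc/self/statm) after each pass. Reads should return zero and add almost nothing to the RSS, while writes should add roughly one resident page per page touched. The names are hypothetical, and the kernel's RSS counters can lag slightly, so treat the numbers as approximate.

```c
#define _DEFAULT_SOURCE          /* exposes sbrk() in glibc's <unistd.h> */
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

/* Resident set size in pages: second field of /proc/self/statm. */
static long resident_pages(void)
{
    long size, resident = -1;
    FILE *f = fopen("/proc/self/statm", "r");
    if (f != NULL) {
        if (fscanf(f, "%ld %ld", &size, &resident) != 2)
            resident = -1;
        fclose(f);
    }
    return resident;
}

struct zero_page_probe {
    unsigned long sum;     /* sum of bytes read from the fresh pages */
    long rss_gain_read;    /* resident pages gained by the read pass  */
    long rss_gain_write;   /* resident pages gained by the write pass */
};

static int probe_zero_page(size_t pages, struct zero_page_probe *out)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    (void)resident_pages();              /* let stdio/malloc set up first */

    char *raw = sbrk((intptr_t)((pages + 1) * page));
    if (raw == (void *)-1)
        return -1;
    /* Round up to a page boundary so every probed page is brand new. */
    volatile char *base =
        (volatile char *)(((uintptr_t)raw + page - 1) & ~(uintptr_t)(page - 1));

    long r0 = resident_pages();
    out->sum = 0;
    for (size_t i = 0; i < pages; i++)
        out->sum += (unsigned char)base[i * page];   /* read fault: zero page */
    long r1 = resident_pages();
    for (size_t i = 0; i < pages; i++)
        base[i * page] = 1;                          /* write fault: new page */
    long r2 = resident_pages();

    out->rss_gain_read = r1 - r0;
    out->rss_gain_write = r2 - r1;
    return 0;
}
```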

Illustrative Diagram

Figure: Memory region layout

Summary

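The complete path from a malloc call to physical memory is as follows. For small requests served from the main arena, glibc's malloc grows the heap with brk (requests above the mmap threshold, 128 KiB by default, are served with mmap instead). brk runs sys_brk, which validates the request against the heap start and RLIMIT_DATA, calls do_brk, and updates mm->brk. do_brk extends the heap VMA, typically by raising its vm_end; no physical memory is allocated yet.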

When the process first accesses the new region, a page fault occurs. The kernel follows the path do_page_fault → handle_mm_fault → handle_pte_fault → do_anonymous_page, walking (and if necessary populating) the page-table hierarchy, and maps the virtual address to a physical page: the shared zero page for reads, a newly allocated page for writes.

Other allocation mechanisms such as mmap or HugePages follow different code paths and are not covered here.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.

Written by Liangxu Linux

Liangxu, a self-taught IT professional now working as a Linux development engineer at a Fortune 500 multinational, shares extensive Linux knowledge: fundamentals, applications and tools, plus Git, databases, Raspberry Pi and more.
