How Linux Allocates Heap Memory: Inside vm_area_struct and the brk System Call
This article explains how the Linux kernel allocates heap memory. It covers the vm_area_struct and mm_struct data structures, the implementation of the brk system call, and the page-fault handling that maps heap virtual addresses to physical memory. Other allocation mechanisms such as mmap and HugePages are mentioned only briefly.
Memory Region (vm_area_struct)
Linux represents each virtual memory region with struct vm_area_struct. A simplified definition is:
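struct vm_area_struct {
    struct mm_struct *vm_mm;        // owning mm_struct
    unsigned long vm_start;         // first address of the region
    unsigned long vm_end;           // first address after the region (exclusive)
    struct vm_area_struct *vm_next; // next region, in ascending address order
    pgprot_t vm_page_prot;          // page protection used when building PTEs
    unsigned long vm_flags;         // VM_READ, VM_WRITE, VM_GROWSDOWN, ...
    struct rb_node vm_rb;           // node in the per-process red-black tree
    /* ... other fields omitted ... */
};
A region covers the half-open range [vm_start, vm_end). vm_next chains all regions of a process in address order, and vm_rb links the same regions into a red-black tree, so finding the region for an address takes O(log n) time instead of a list walk.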
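The half-open range convention and the address ordering can be modelled in a few lines of user-space C. This is a toy sketch, not kernel code: `struct toy_vma`, `toy_find_vma`, and `toy_addr_mapped` are hypothetical names, and the lookup walks the sorted list linearly where the kernel descends the red-black tree. It does follow the documented contract of the kernel's find_vma(): return the first region whose vm_end lies above the address, a region that may still start above it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical user-space stand-in for vm_area_struct: a half-open
 * range [vm_start, vm_end) plus a link to the next-higher region. */
struct toy_vma {
    unsigned long vm_start;
    unsigned long vm_end;        /* exclusive */
    struct toy_vma *vm_next;     /* list is sorted by address */
};

/* Same contract as the kernel's find_vma(): return the first region
 * with vm_end > addr, or NULL if addr lies above every region.
 * The kernel searches its red-black tree; this walks the list. */
static struct toy_vma *toy_find_vma(struct toy_vma *head, unsigned long addr)
{
    for (struct toy_vma *v = head; v != NULL; v = v->vm_next) {
        assert(v->vm_start < v->vm_end);   /* regions are never empty */
        if (addr < v->vm_end)
            return v;
    }
    return NULL;
}

/* True only if addr really lies inside some region: the returned
 * region may start above addr, so vm_start must be checked too. */
static int toy_addr_mapped(struct toy_vma *head, unsigned long addr)
{
    struct toy_vma *v = toy_find_vma(head, addr);
    return v != NULL && v->vm_start <= addr;
}
```

With a heap region at [0x1000, 0x3000) and a stack region at [0x8000, 0x9000), looking up 0x5000 returns the stack region, yet toy_addr_mapped reports the address as unmapped. That second test is the same vma->vm_start <= address check that do_page_fault performs.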
Process Memory Descriptor (mm_struct)
Each process has one struct mm_struct (shared by all of its threads) that describes its whole address space and owns all of its vm_area_struct objects:
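struct mm_struct {
    struct vm_area_struct *mmap;        // head of the VMA list
    struct rb_root mm_rb;               // root of the VMA red-black tree
    struct rw_semaphore mmap_sem;       // protects the VMA list and tree
    unsigned long start_data, end_data; // initialized data segment
    unsigned long start_brk, brk;       // heap start and current top (program break)
    /* ... other fields omitted ... */
};
Heap Expansion via sys_brk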
In Linux 2.6.32 the brk() system call is defined in mm/mmap.c with SYSCALL_DEFINE1(brk, ...), which expands to sys_brk. Its core logic, simplified:
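unsigned long sys_brk(unsigned long brk)
{
    struct mm_struct *mm = current->mm;
    unsigned long rlim, newbrk, oldbrk, retval;

    down_write(&mm->mmap_sem);
    if (brk < mm->start_brk)    /* lower bound; mm->end_code with CONFIG_COMPAT_BRK */
        goto out;
    /* The heap plus the initialized data must fit within RLIMIT_DATA. */
    rlim = current->signal->rlim[RLIMIT_DATA].rlim_cur;
    if (rlim < RLIM_INFINITY &&
        (brk - mm->start_brk) + (mm->end_data - mm->start_data) > rlim)
        goto out;
    newbrk = PAGE_ALIGN(brk);
    oldbrk = PAGE_ALIGN(mm->brk);
    if (oldbrk == newbrk)       /* same last page: no VMA change needed */
        goto set_brk;
    if (brk <= mm->brk) {       /* shrinking: unmap [newbrk, oldbrk) */
        if (!do_munmap(mm, newbrk, oldbrk - newbrk))
            goto set_brk;
        goto out;
    }
    /* Growing: refuse to run into an existing mapping. */
    if (find_vma_intersection(mm, oldbrk, newbrk + PAGE_SIZE))
        goto out;
    if (do_brk(oldbrk, newbrk - oldbrk) != oldbrk)
        goto out;
set_brk:
    mm->brk = brk;              /* stored unaligned */
out:
    retval = mm->brk;           /* new break on success, old one on failure */
    up_write(&mm->mmap_sem);
    return retval;
}
Its main steps are: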
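Check that the heap plus the initialized data segment stays within RLIMIT_DATA.
If the page-aligned new break equals the page-aligned old break, no VMA change is needed; only mm->brk is updated.
Otherwise call do_brk to grow the heap VMA (in the full 2.6.32 source, a shrinking request goes through do_munmap instead, and growth into an existing mapping is refused).
Store the new value in mm->brk. The call returns the current break, so a failed request returns the old break rather than an error code.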
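The branch structure of sys_brk is easier to follow with locking and VMA manipulation stripped away. The pure function below is a hypothetical model: `brk_decide`, `enum brk_action`, and the `TOY_*` constants (a fixed 4 KiB page, RLIM_INFINITY taken as all-ones) are assumptions for illustration. It follows the 2.6.32 decision order, including the shrink path through do_munmap, but leaves out the check for collisions with neighbouring mappings.

```c
#include <assert.h>

#define TOY_PAGE_SIZE      4096UL                 /* assumed page size */
#define TOY_PAGE_ALIGN(x)  (((x) + TOY_PAGE_SIZE - 1) & ~(TOY_PAGE_SIZE - 1))
#define TOY_RLIM_INFINITY  (~0UL)                 /* assumed "unlimited" value */

enum brk_action {
    BRK_REJECT,    /* leave mm->brk alone and return it */
    BRK_SET_ONLY,  /* same last page: only mm->brk changes */
    BRK_GROW,      /* do_brk(oldbrk, newbrk - oldbrk) */
    BRK_SHRINK     /* do_munmap(mm, newbrk, oldbrk - newbrk) */
};

/* Hypothetical model of sys_brk's decisions.
 * start_brk, cur_brk: mm->start_brk and mm->brk
 * req:                the requested new break
 * data_size:          mm->end_data - mm->start_data
 * rlim:               RLIMIT_DATA soft limit */
static enum brk_action brk_decide(unsigned long start_brk, unsigned long cur_brk,
                                  unsigned long req, unsigned long data_size,
                                  unsigned long rlim)
{
    if (req < start_brk)
        return BRK_REJECT;
    if (rlim < TOY_RLIM_INFINITY && (req - start_brk) + data_size > rlim)
        return BRK_REJECT;

    unsigned long newbrk = TOY_PAGE_ALIGN(req);
    unsigned long oldbrk = TOY_PAGE_ALIGN(cur_brk);
    assert(newbrk % TOY_PAGE_SIZE == 0 && oldbrk % TOY_PAGE_SIZE == 0);
    if (newbrk == oldbrk)
        return BRK_SET_ONLY;
    return req <= cur_brk ? BRK_SHRINK : BRK_GROW;
}
```

A request that stays within the current last page (for example, moving the break from 0x10100 to 0x10800) touches no VMA at all; only mm->brk changes. This is why the stored break can be unaligned while the heap VMA always ends on a page boundary.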
do_brk
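do_brk(addr, len) sets up anonymous memory for [addr, addr + len). It first tries vma_merge to extend a compatible neighbouring VMA; because the heap VMA ends exactly at the old page-aligned break, growth normally just raises that VMA's vm_end to the new page-aligned break. If no neighbour can be extended, do_brk allocates a new vm_area_struct and links it into the list and the tree. Either way, no physical memory is allocated yet (unless the process has locked its memory with mlockall): the new range is only reserved in the address space.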
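From user space the same machinery is reachable through sbrk(), the glibc wrapper around brk that returns the previous break. The sketch below assumes Linux with glibc (`_DEFAULT_SOURCE` exposes sbrk in <unistd.h>); `grow_touch_shrink` is a hypothetical helper name. It grows the heap, writes to every new byte (which triggers the page faults described in the next section), and then hands the memory back. It deliberately avoids malloc and stdio between the sbrk calls, since they may move the break themselves.

```c
#define _DEFAULT_SOURCE              /* glibc: declare sbrk() in <unistd.h> */
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: grow the heap by len bytes, fill the new bytes,
 * then shrink back. Records the break before, after growing, and after
 * shrinking. Returns 0 on success, -1 if sbrk fails. */
static int grow_touch_shrink(intptr_t len, void **before, void **grown, void **after)
{
    assert(len > 0);
    *before = sbrk(0);                   /* current program break */
    void *old = sbrk(len);               /* grows the heap; returns the previous break */
    if (old == (void *)-1)
        return -1;                       /* e.g. RLIMIT_DATA exceeded */
    *grown = sbrk(0);
    memset(old, 0xAB, (size_t)len);      /* first touch: page faults happen here */
    if (sbrk(-len) == (void *)-1)        /* shrinking: the kernel unmaps the tail */
        return -1;
    *after = sbrk(0);
    return 0;
}
```

Growing by 3 × 4096 + 123 bytes moves the break by exactly that amount, even though do_brk maps whole pages underneath; shrinking by the same amount restores the original break.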
Page‑Fault Handling
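When a program touches a virtual address whose page-table entry is not present, or accesses a page in a way its protection forbids, the CPU raises a page-fault exception. On x86 the CPU saves the faulting address in the CR2 register and pushes an error code describing the access; the kernel handles the exception in do_page_fault (arch/x86/mm/fault.c):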
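void do_page_fault(struct pt_regs *regs, unsigned long error_code)
{
    unsigned long address = read_cr2();  // faulting address, saved by the CPU in CR2
    /* find_vma returns the first VMA with vm_end > address, so the
       vm_start check below is still required */
    struct vm_area_struct *vma = find_vma(current->mm, address);
    if (vma && vma->vm_start <= address) {
        int write = error_code & PF_WRITE;  /* bit 1 of the x86 error code */
        handle_mm_fault(current->mm, vma, address,
                        write ? FAULT_FLAG_WRITE : 0);
    }
    /* ... omitted: mmap_sem locking, access-permission checks, stack
       growth for VM_GROWSDOWN regions, and SIGSEGV for bad addresses ... */
}
handle_mm_fault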
Walks the four-level page-table hierarchy (PGD, PUD, PMD, PTE), allocating any missing intermediate tables, and finally calls handle_pte_fault:
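int handle_mm_fault(struct mm_struct *mm, struct vm_area_struct *vma,
                    unsigned long address, unsigned int flags)
{
    if (is_vm_hugetlb_page(vma))                /* HugePages take a separate path */
        return hugetlb_fault(mm, vma, address, flags);
    pgd_t *pgd = pgd_offset(mm, address);       /* top-level table always exists */
    pud_t *pud = pud_alloc(mm, pgd, address);   /* *_alloc: create the table if missing */
    if (!pud)
        return VM_FAULT_OOM;
    pmd_t *pmd = pmd_alloc(mm, pud, address);
    if (!pmd)
        return VM_FAULT_OOM;
    pte_t *pte = pte_alloc_map(mm, pmd, address);
    if (!pte)
        return VM_FAULT_OOM;
    return handle_pte_fault(mm, vma, address, pte, pmd, flags);
}
handle_pte_fault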
If the PTE is empty (pte_none) and the VMA has no fault handler of its own, which is the case for the anonymous heap VMA, it delegates to do_anonymous_page:
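int handle_pte_fault(struct mm_struct *mm, struct vm_area_struct *vma,
                     unsigned long address, pte_t *pte, pmd_t *pmd,
                     unsigned int flags)
{
    pte_t entry = *pte;
    if (!pte_present(entry)) {
        if (pte_none(entry)) {                       /* never mapped before */
            if (vma->vm_ops && vma->vm_ops->fault)   /* file-backed mapping */
                return do_linear_fault(mm, vma, address, pte, pmd,
                                       flags, entry);
            return do_anonymous_page(mm, vma, address, pte, pmd, flags);
        }
        /* omitted: swap-in (do_swap_page) and nonlinear file faults */
    }
    /* omitted: present PTE, e.g. a write to a read-only page (do_wp_page) */
    return 0;
}
do_anonymous_page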
Distinguishes read faults from write faults:
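int do_anonymous_page(struct mm_struct *mm, struct vm_area_struct *vma,
                      unsigned long address, pte_t *pte, pmd_t *pmd,
                      unsigned int flags)
{
    struct page *page;
    pte_t entry;

    if (!(flags & FAULT_FLAG_WRITE)) {
        /* read: map the global zero page (read-only) */
        entry = pte_mkspecial(pfn_pte(my_zero_pfn(address), vma->vm_page_prot));
        goto setpte;
    }
    /* write: allocate a private, zero-filled page */
    if (anon_vma_prepare(vma))
        return VM_FAULT_OOM;
    page = alloc_zeroed_user_highpage_movable(vma, address);
    if (!page)
        return VM_FAULT_OOM;
    entry = mk_pte(page, vma->vm_page_prot);
    if (vma->vm_flags & VM_WRITE)
        entry = pte_mkwrite(pte_mkdirty(entry));
    page_add_new_anon_rmap(page, vma, address);  /* reverse mapping, used by reclaim */
setpte:
    /* omitted: page-table locking, memcg charging, RSS accounting */
    set_pte_at(mm, address, pte, entry);
    return 0;
}
Read faults map the shared zero page, so heap memory that is only read costs no extra physical memory. Write faults allocate a fresh zero-filled physical page and, when the VMA allows writes (VM_WRITE), mark it writable and dirty. If a page first read through the zero page is written later, the write faults again on the read-only entry, and do_wp_page replaces the zero page with a private copy.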
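The read/write split can be observed indirectly from user space by counting minor page faults with getrusage(), whose ru_minflt field counts faults serviced without I/O. This sketch assumes Linux with glibc and a kernel that maps the zero page for anonymous read faults, as the 2.6.32 code above does; `minor_faults` and `touch_fresh_heap` are hypothetical helper names. A read of each fresh heap page should return 0, and the first write to the same page should fault a second time. Exact counts depend on the kernel (transparent huge pages, for instance, can change them), so only treat the direction of the change as meaningful.

```c
#define _DEFAULT_SOURCE              /* glibc: declare sbrk() in <unistd.h> */
#include <assert.h>
#include <stdint.h>
#include <sys/resource.h>
#include <unistd.h>

/* Minor (no-I/O) page faults taken so far by this process. */
static long minor_faults(void)
{
    struct rusage ru;
    int rc = getrusage(RUSAGE_SELF, &ru);
    assert(rc == 0);
    return ru.ru_minflt;
}

/* Hypothetical helper: grow the heap by `pages` whole pages, read one
 * byte from each new page, then write one byte to each. The fault
 * deltas go to *read_faults and *write_faults. Returns the sum of the
 * bytes read (0 expected: fresh anonymous memory is zero-filled), or
 * -1 if sbrk fails. No malloc or stdio here: they could move the break. */
static long touch_fresh_heap(long pages, long *read_faults, long *write_faults)
{
    long psz = sysconf(_SC_PAGESIZE);
    uintptr_t cur = (uintptr_t)sbrk(0);
    /* Start at the next page boundary so every touched page is new. */
    uintptr_t base = (cur + (uintptr_t)psz - 1) & ~((uintptr_t)psz - 1);
    intptr_t grow = (intptr_t)(base - cur) + (intptr_t)(pages * psz);
    if (sbrk(grow) == (void *)-1)
        return -1;

    volatile char *p = (volatile char *)base;
    long sum = 0;
    long f0 = minor_faults();
    for (long i = 0; i < pages; i++)
        sum += p[i * psz];           /* read fault: zero page gets mapped */
    long f1 = minor_faults();
    for (long i = 0; i < pages; i++)
        p[i * psz] = 1;              /* write fault: private page replaces it */
    long f2 = minor_faults();

    *read_faults = f1 - f0;
    *write_faults = f2 - f1;
    sbrk(-grow);                     /* hand the pages back */
    return sum;
}
```

On a typical x86-64 system both deltas come out close to the page count, one fault per page for the reads and another per page for the writes, which matches the two-step zero-page behaviour described above.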
Summary
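The complete path from a malloc call to physical memory is as follows. For small requests served from the main arena, glibc's malloc grows the heap with sbrk, which issues the brk system call and so runs sys_brk. sys_brk validates the request, calls do_brk, and updates mm->brk; do_brk extends the heap VMA by raising its vm_end. At this point only virtual address space has been reserved.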
When the process first touches the new region, a page fault occurs. The kernel follows do_page_fault → handle_mm_fault → handle_pte_fault → do_anonymous_page, filling in missing page-table levels on the way, and maps the faulting virtual address to a physical page: the shared zero page for a read, a newly allocated page for a write.
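To make the page-table walk concrete: with x86-64 four-level paging and 4 KiB pages, a 48-bit virtual address splits into four 9-bit table indices plus a 12-bit page offset, and pgd_offset, pud_alloc, pmd_alloc and pte_alloc_map each consume one index. The sketch assumes exactly that layout (other architectures and five-level paging differ); `struct va_parts` and `split_va` are hypothetical names, and the shift constants mirror the x86-64 values of the kernel's PGDIR_SHIFT, PUD_SHIFT, PMD_SHIFT and PAGE_SHIFT (39, 30, 21, 12).

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout: x86-64, 4-level paging, 4 KiB pages. */
#define TOY_PAGE_SHIFT   12
#define TOY_PMD_SHIFT    21
#define TOY_PUD_SHIFT    30
#define TOY_PGDIR_SHIFT  39
#define TOY_PTRS_PER_TBL 512   /* 9 index bits per level */

struct va_parts {
    unsigned pgd, pud, pmd, pte;   /* index into each table level */
    unsigned offset;               /* byte offset within the 4 KiB page */
};

/* Split a lower-half (user-space) virtual address into the index used
 * at each level of the walk, top level first. */
static struct va_parts split_va(uint64_t va)
{
    assert((va >> 48) == 0 && "expects a user-space (lower-half) address");
    struct va_parts p;
    p.pgd    = (unsigned)((va >> TOY_PGDIR_SHIFT) & (TOY_PTRS_PER_TBL - 1));
    p.pud    = (unsigned)((va >> TOY_PUD_SHIFT)   & (TOY_PTRS_PER_TBL - 1));
    p.pmd    = (unsigned)((va >> TOY_PMD_SHIFT)   & (TOY_PTRS_PER_TBL - 1));
    p.pte    = (unsigned)((va >> TOY_PAGE_SHIFT)  & (TOY_PTRS_PER_TBL - 1));
    p.offset = (unsigned)(va & ((1u << TOY_PAGE_SHIFT) - 1));
    return p;
}
```

The highest canonical user address, 0x00007fffffffffff, splits into PGD index 255 with 511 at every lower level and page offset 0xfff; PGD slots 256 to 511 cover the kernel half of the address space.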
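Other allocation mechanisms follow different code paths and are not covered here, among them mmap (which glibc's malloc uses instead of brk for large requests, by default those of roughly 128 KiB and up; see M_MMAP_THRESHOLD) and HugePages.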
This article has been distilled and summarized from source material, then republished for learning and reference.
Liangxu Linux
Liangxu, a self-taught IT professional now working as a Linux development engineer at a Fortune 500 multinational, shares extensive Linux knowledge: fundamentals, applications, tools, plus Git, databases, Raspberry Pi, etc.
