Master Linux Memory Management: From CPU Access to CMA and Page Faults
This comprehensive guide walks through Linux memory management, explaining CPU memory access, virtual‑to‑physical address translation, page‑table structures, zone organization, the buddy and slab allocators, vmalloc, page‑fault handling, and the Contiguous Memory Allocator (CMA) with detailed code examples and diagrams.
CPU Access to Memory
Memory access proceeds in steps: the CPU issues a virtual address, the MMU translates it using the TLB (walking the page tables on a miss), and the resulting physical address is used to access the caches and physical memory.
Virtual Address Translation
On ARMv8, the size of the virtual address space is set by CONFIG_ARM64_VA_BITS (typically 48 bits). Kernel space is translated through TTBR1_EL1 and user space through TTBR0_EL1. With 4 KB pages and 48‑bit addresses, translation follows a four‑level page table hierarchy (PGD → PUD → PMD → PTE). The MMU selects the base register from the upper bits of the virtual address (all ones for kernel addresses, all zeros for user addresses), then walks the tables to obtain the physical page frame number (PFN) and combines it with the page offset to form the final physical address.
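The index extraction performed during the walk can be sketched in a few lines. This is a minimal model, assuming 4 KB pages and a 48‑bit address space (9 index bits per level, 12 offset bits); the helper names split_va and phys_addr are hypothetical and not kernel functions.

```python
PAGE_SHIFT = 12        # assumption: 4 KB pages
BITS_PER_LEVEL = 9     # 512 eight-byte descriptors fit in one 4 KB table
VA_BITS = 48           # assumption: CONFIG_ARM64_VA_BITS=48


def split_va(va):
    """Split a 64-bit virtual address into base register, table indexes and offset."""
    top = va >> VA_BITS                      # the 16 bits above the translated range
    if top == 0x0000:
        ttbr = "TTBR0_EL1"                   # user space
    elif top == 0xFFFF:
        ttbr = "TTBR1_EL1"                   # kernel space
    else:
        raise ValueError("address is outside both translated ranges")
    mask = (1 << BITS_PER_LEVEL) - 1
    return {
        "ttbr": ttbr,
        "pgd": (va >> 39) & mask,
        "pud": (va >> 30) & mask,
        "pmd": (va >> 21) & mask,
        "pte": (va >> 12) & mask,
        "offset": va & ((1 << PAGE_SHIFT) - 1),
    }


def phys_addr(pfn, offset):
    """Combine the PFN found in the last-level entry with the page offset."""
    return (pfn << PAGE_SHIFT) | offset
```

For example, split_va(0xFFFF000000001234) selects TTBR1_EL1 with PTE index 1 and offset 0x234; the real MMU performs the same bit slicing in hardware.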
Linux Memory Initialization
During early boot, head.S creates the initial page tables via __create_page_tables, which sets up an identity mapping for idmap_text and mappings for the kernel image (.text, .rodata, .data, .bss, and so on). Once __create_page_tables returns, the boot code proceeds to CPU setup.
Zone‑Based Page‑Frame Allocator
Linux organizes physical memory into nodes (several on NUMA systems, a single node on UMA systems), then subdivides each node into zones (ZONE_DMA, ZONE_NORMAL, ZONE_HIGHMEM). Within a zone, each page frame is described by a struct page object; a page is typically 4 KB. Page frames are identified by their PFN (physical_address >> PAGE_SHIFT).
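The PFN arithmetic is simple enough to check by hand. The sketch below assumes 4 KB pages (PAGE_SHIFT = 12); the helper names are hypothetical.

```python
PAGE_SHIFT = 12                 # assumption: 4 KB pages
PAGE_SIZE = 1 << PAGE_SHIFT


def phys_to_pfn(phys):
    """Page frame number that contains a physical address."""
    return phys >> PAGE_SHIFT


def pfn_to_phys(pfn):
    """Physical address of the first byte of a page frame."""
    return pfn << PAGE_SHIFT


def pages_spanned(start, size):
    """Number of page frames touched by the byte range [start, start + size)."""
    first = phys_to_pfn(start)
    last = phys_to_pfn(start + size - 1)
    return last - first + 1
```

Note that a two‑byte range starting at 0x1FFF spans two page frames, because it crosses a page boundary.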
Allocation Process
Before any allocation, the memblock subsystem manages the raw memory.
Page‑table mappings are then created.
Finally, the zoned allocator fulfills requests.
Allocation Functions
Six public interfaces eventually call __alloc_pages_nodemask. The fast path tries the per‑CPU page caches and the buddy free lists; if that fails, the slow path reclaims memory (which may swap pages out), waits, and retries.
Buddy Allocator
The buddy system maintains free lists for block sizes of 1, 2, 4, 8, … 1024 pages (orders 0 through 10). When a request cannot be satisfied at the desired order, a larger block is split; when a block is freed, it is merged with its adjacent free buddy. The core functions get_page_from_freelist, rmqueue, and __alloc_pages_nodemask implement this logic, with watermark checks to decide between the fast and slow paths.
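The split and merge behaviour described above can be illustrated with a toy model. This is a sketch under simplifying assumptions (one zone, no per‑CPU caches, no migrate types, no watermarks); the class and method names are hypothetical and do not mirror the kernel's data structures.

```python
MAX_ORDER = 11   # orders 0..10, so the largest block is 1024 pages


class Buddy:
    def __init__(self, total_pages):
        largest = 1 << (MAX_ORDER - 1)
        assert total_pages % largest == 0
        # free[order] holds the starting PFN of every free block of that order
        self.free = [set() for _ in range(MAX_ORDER)]
        for pfn in range(0, total_pages, largest):
            self.free[MAX_ORDER - 1].add(pfn)

    def alloc(self, order):
        """Return the starting PFN of a 2**order page block, or None."""
        for cur in range(order, MAX_ORDER):
            if not self.free[cur]:
                continue
            pfn = min(self.free[cur])
            self.free[cur].remove(pfn)
            # Split: keep the lower half, put the upper half on the smaller list.
            while cur > order:
                cur -= 1
                self.free[cur].add(pfn + (1 << cur))
            return pfn
        return None

    def free_block(self, pfn, order):
        """Free a block, merging with its buddy as long as the buddy is free."""
        while order < MAX_ORDER - 1:
            buddy = pfn ^ (1 << order)       # buddy differs only in bit `order`
            if buddy not in self.free[order]:
                break
            self.free[order].remove(buddy)
            pfn = min(pfn, buddy)
            order += 1
        self.free[order].add(pfn)
```

Allocating a single page from a fresh 1024‑page zone leaves one free block on each of the orders 0 through 9; freeing it again merges everything back into one order‑10 block.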
Watermarks
Each zone has three watermarks: min, low, and high, in a default ratio of 4:5:6. If free pages fall below low, the kswapd daemon is woken to reclaim in the background; if they fall below min, the allocating task performs direct reclaim; once free pages rise above high, the zone is considered healthy.
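The 4:5:6 ratio and the resulting decisions can be written out as a small worked example. The function names and the min_pages value are hypothetical, and the state between low and high is labeled neutrally because the text does not assign it an action.

```python
def watermarks(min_pages):
    """Derive low and high from min so that min:low:high is 4:5:6."""
    low = min_pages + min_pages // 4
    high = min_pages + min_pages // 2
    return min_pages, low, high


def zone_state(free_pages, min_pages):
    """Classify a zone by comparing its free page count with the watermarks."""
    wmark_min, wmark_low, wmark_high = watermarks(min_pages)
    if free_pages < wmark_min:
        return "direct_reclaim"
    if free_pages < wmark_low:
        return "wake_kswapd"
    if free_pages < wmark_high:
        return "between_low_and_high"
    return "healthy"
```

With min_pages = 4000 the watermarks are 4000, 5000, and 6000 pages.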
Fragmentation and Compaction
Internal fragmentation occurs when a whole 4 KB page is allocated to satisfy a smaller request; external fragmentation occurs when enough total free memory exists but not as one contiguous block. The kernel's page‑migration mechanism compacts memory by scanning from both ends of a zone, moving movable pages to create larger contiguous free regions. Three compaction strategies are implemented in __alloc_pages_direct_compact.
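The two‑ended scan can be sketched on a string that stands for a zone. This is a toy model, assuming one character per page ("F" free, "M" movable, "U" unmovable); the function names are hypothetical, and real compaction works on pageblocks and isolates pages in batches.

```python
def compact(zone):
    """Move movable pages from the low end into free pages at the high end."""
    pages = list(zone)
    migrate = 0                   # migration scanner, walks upward
    free = len(pages) - 1         # free scanner, walks downward
    while True:
        while migrate < free and pages[migrate] != "M":
            migrate += 1
        while free > migrate and pages[free] != "F":
            free -= 1
        if migrate >= free:       # scanners met: compaction is finished
            break
        pages[free] = "M"         # page contents now live in the high free page
        pages[migrate] = "F"      # the low page frame becomes free
    return "".join(pages)


def largest_free_run(zone):
    """Length of the longest run of contiguous free pages."""
    best = cur = 0
    for page in zone:
        cur = cur + 1 if page == "F" else 0
        best = max(best, cur)
    return best
```

For the zone "MFMFUFMF" the largest free run grows from one page to four after compaction, while the unmovable page stays where it is.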
Slab Allocator
For small, byte‑granular allocations, the slab allocator builds on top of the buddy system. kmem_cache_alloc proceeds in four steps: it tries the per‑CPU cache, then the per‑CPU partial list, then the node‑level partial list, and finally creates a new slab if all of these are empty.
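The fallback order can be modeled as a chain of lookups. This is a rough sketch with one CPU and one node; ToyCache and its fields are hypothetical names, and objects are plain integers rather than memory.

```python
class ToyCache:
    def __init__(self, objs_per_slab):
        self.objs_per_slab = objs_per_slab
        self.cpu_freelist = []    # free objects of the CPU's active slab
        self.cpu_partial = []     # partially used slabs kept per CPU
        self.node_partial = []    # partially used slabs kept per node
        self.slabs_created = 0

    def _new_slab(self):
        # Stand-in for getting fresh pages from the buddy allocator.
        self.slabs_created += 1
        base = self.slabs_created * 1000
        return [base + i for i in range(self.objs_per_slab)]

    def alloc(self):
        """Return (object, source), trying the cheapest source first."""
        if self.cpu_freelist:
            return self.cpu_freelist.pop(), "cpu freelist"
        for slabs, name in ((self.cpu_partial, "cpu partial"),
                            (self.node_partial, "node partial")):
            if slabs:
                self.cpu_freelist = slabs.pop()   # slab becomes the active slab
                return self.cpu_freelist.pop(), name
        self.cpu_freelist = self._new_slab()
        return self.cpu_freelist.pop(), "new slab"
```

The first allocation from an empty cache has to create a slab; the next one is served from the per‑CPU free list without touching any shared list.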
vmalloc
vmalloc provides virtually contiguous memory by allocating individual pages (via alloc_page) and mapping them into a contiguous virtual range between VMALLOC_START and VMALLOC_END. The three‑step process is: find a free virtual hole, allocate the required pages, and map them into the hole.
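The three steps can be modeled with a dictionary standing in for the page tables. This is a sketch under stated assumptions: the address range and PFNs are made up, ToyVmalloc is a hypothetical name, and the guard page that the real vmalloc places between areas is left out.

```python
PAGE_SIZE = 4096   # assumption: 4 KB pages


class ToyVmalloc:
    def __init__(self, start, end, free_pfns):
        self.start, self.end = start, end     # stand-ins for VMALLOC_START/END
        self.areas = []                       # (va_start, va_end) already in use
        self.free_pfns = list(free_pfns)      # scattered physical pages
        self.page_table = {}                  # virtual page address -> PFN

    def _find_hole(self, size):
        addr = self.start
        for a_start, a_end in sorted(self.areas):
            if a_start - addr >= size:
                break                         # gap before this area is big enough
            addr = max(addr, a_end)
        return addr if addr + size <= self.end else None

    def vmalloc(self, size):
        npages = -(-size // PAGE_SIZE)        # round up to whole pages
        length = npages * PAGE_SIZE
        va = self._find_hole(length)                       # step 1: find a hole
        if va is None or npages > len(self.free_pfns):
            return None
        pfns = [self.free_pfns.pop(0) for _ in range(npages)]  # step 2: get pages
        for i, pfn in enumerate(pfns):                     # step 3: map them
            self.page_table[va + i * PAGE_SIZE] = pfn
        self.areas.append((va, va + length))
        return va
```

With free PFNs 7, 3, and 42, a three‑page request yields consecutive virtual pages backed by physically scattered frames, which is the point of vmalloc.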
Page‑Fault Handling
When a process accesses an unmapped virtual address, the CPU raises a page‑fault exception. The ARM64 vector table dispatches to the synchronous exception handler (el0_sync for faults taken from user space, el1_sync for faults taken in kernel mode), which examines the ESR register to determine the fault class and eventually calls do_page_fault. The kernel then locates the vm_area_struct, checks permissions, and invokes handle_mm_fault. Depending on the situation, the fault is handled by:
do_anonymous_page for anonymous mappings (a read maps the shared zero page; a write allocates a fresh zeroed page).
do_swap_page to bring pages back from swap.
do_wp_page for write faults on write‑protected pages, which is where copy‑on‑write takes place.
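The choice between these handlers depends on the state of the page table entry. The sketch below is a simplified decision function, not the kernel's code: pick_handler and its boolean parameters are hypothetical, and do_fault (the kernel's handler for file‑backed mappings) is included only to make the decision complete, since the text above does not cover it.

```python
def pick_handler(pte_present, pte_none, vma_is_anonymous, is_write, pte_writable):
    """Name the handler that would service a fault with the given PTE state."""
    if not pte_present:
        if pte_none:
            # Nothing was ever mapped at this address.
            return "do_anonymous_page" if vma_is_anonymous else "do_fault"
        # The entry is not empty but not present: the page was swapped out.
        return "do_swap_page"
    if is_write and not pte_writable:
        # Page is mapped read-only and the access is a write.
        return "do_wp_page"
    return "update access flags"
```

A first read of a fresh anonymous mapping therefore lands in do_anonymous_page, and a later write to a page shared after fork lands in do_wp_page.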
Contiguous Memory Allocator (CMA)
CMA reserves a region of memory for allocating large contiguous blocks, which is useful for devices that need physically contiguous DMA buffers. The region can be defined via the Device Tree (reserved-memory/linux,cma) or the kernel command line (cma=...). During boot, cma_init_reserved_areas marks the pages as MIGRATE_CMA and hands them to the buddy system via __free_pages, so movable allocations can use them in the meantime. Allocation uses cma_alloc, which ultimately calls alloc_contig_range with the MIGRATE_CMA migrate type. Because CMA may trigger page migration and reclamation, it must not be used in atomic context.
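The idea of lending the reserved region to movable pages and reclaiming it on demand can be shown with a toy model. This is a sketch, assuming one list element per page ("F" free, "M" movable page borrowed through the buddy system, "C" already handed out by CMA); the function below is a hypothetical stand‑in and does not reproduce the real cma_alloc signature.

```python
def toy_cma_alloc(region, count):
    """Claim `count` contiguous pages; return (start index, pages migrated) or None."""
    for start in range(len(region) - count + 1):
        window = region[start:start + count]
        if "C" in window:
            continue                      # overlaps an earlier CMA allocation
        migrated = window.count("M")      # movable pages must be moved elsewhere
        region[start:start + count] = ["C"] * count
        return start, migrated
    return None
```

The migrated count is what makes the operation expensive and potentially sleeping: every borrowed page in the chosen window has to be moved out before the block can be handed to the caller.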
Conclusion
The article ties together CPU memory access, address translation, zone‑based allocation, the buddy and slab allocators, vmalloc, page‑fault handling, and CMA, providing a complete picture of Linux memory management and forming a solid foundation for further kernel study.
Liangxu Linux
Liangxu, a self‑taught IT professional now working as a Linux development engineer at a Fortune 500 multinational, shares extensive Linux knowledge—fundamentals, applications, tools, plus Git, databases, Raspberry Pi, etc. (Reply “Linux” to receive essential resources.)
