Inside Linux Physical Memory Management: From FLATMEM to NUMA, Watermarks, and Page Structures
This article provides an in‑depth, step‑by‑step explanation of how the Linux kernel organizes and manages physical memory, covering memory models (FLATMEM, DISCONTIGMEM, SPARSEMEM), NUMA vs. UMA architectures, zone partitioning, watermarks, reserved pages, hot‑cold page handling, and the detailed struct page layout used for both anonymous and file‑backed pages.
1. Physical Memory Models
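The kernel has supported three models for mapping page frame numbers (PFNs) to their struct page descriptors. FLATMEM treats physical memory as one contiguous range managed by a single global mem_map array. DISCONTIGMEM keeps a separate node_mem_map array per node to cover non‑contiguous regions; it was deprecated and removed in Linux 5.14. SPARSEMEM splits the physical address space into fixed‑size sections and allocates struct page arrays only for sections that are present, which suits sparse layouts and memory hot‑plug.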
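To make the difference between the models concrete, here is a minimal Python sketch of how pfn_to_page() resolves a page frame number under FLATMEM and SPARSEMEM. It assumes 4 KiB pages and the x86_64 section size of 128 MiB (SECTION_SIZE_BITS = 27). The classes Page, FlatMem, and SparseMem are hypothetical stand‑ins. The real SPARSEMEM code also encodes each section's mem_map so that it can be indexed by the raw PFN rather than by an in‑section offset as done here.

```python
# Simplified userspace model of pfn_to_page() under FLATMEM and SPARSEMEM.
PAGE_SHIFT = 12                  # 4 KiB pages (assumed)
SECTION_SIZE_BITS = 27           # 128 MiB sections, the x86_64 default (assumed)
PFN_SECTION_SHIFT = SECTION_SIZE_BITS - PAGE_SHIFT
PAGES_PER_SECTION = 1 << PFN_SECTION_SHIFT


class Page:
    """Hypothetical stand-in for struct page; it only records its own PFN."""

    def __init__(self, pfn):
        self.pfn = pfn


class FlatMem:
    """FLATMEM: one mem_map array covering every PFN from arch_pfn_offset upward."""

    def __init__(self, arch_pfn_offset, nr_pages):
        self.arch_pfn_offset = arch_pfn_offset
        self.mem_map = [Page(arch_pfn_offset + i) for i in range(nr_pages)]

    def pfn_to_page(self, pfn):
        # A single subtraction: the array must also span any holes.
        return self.mem_map[pfn - self.arch_pfn_offset]


class SparseMem:
    """SPARSEMEM: page arrays exist only for sections that are present."""

    def __init__(self):
        self.sections = {}       # section number -> list of Page (models mem_section[])

    def add_section(self, section_nr):
        base = section_nr << PFN_SECTION_SHIFT
        self.sections[section_nr] = [Page(base + i) for i in range(PAGES_PER_SECTION)]

    def pfn_to_page(self, pfn):
        section = self.sections.get(pfn >> PFN_SECTION_SHIFT)
        if section is None:
            return None          # hole: no struct page was ever allocated for this PFN
        return section[pfn & (PAGES_PER_SECTION - 1)]


flat = FlatMem(arch_pfn_offset=0x100, nr_pages=1024)
sparse = SparseMem()
sparse.add_section(0)
sparse.add_section(8)            # sections 1-7 form a hole and cost no struct page memory
```

The trade‑off is visible in the code. A FLATMEM lookup is one subtraction but needs an array that covers every hole. A SPARSEMEM lookup adds a shift and a table lookup in exchange for skipping absent sections. With SPARSEMEM_VMEMMAP, the kernel maps all section arrays into one virtually contiguous region, so the lookup becomes a single addition again.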
2. Memory Architectures
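Two architectures are described. In UMA (Uniform Memory Access), every CPU reaches all memory with the same latency, and the kernel models the machine as a single memory node. In NUMA (Non‑Uniform Memory Access), CPUs and memory are grouped into nodes. Each CPU has local memory attached to its own node, and accessing another node's memory goes through the interconnect at higher latency.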
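On NUMA systems the allocator falls back to other nodes in order of distance, using the same table that numactl -H prints. The sketch below assumes a hypothetical four‑node distance table in which 10 means local. It derives each node's fallback order by sorting on distance and breaking ties by node ID. The kernel's find_next_best_node() additionally applies load‑balancing and tie‑breaking heuristics that are omitted here.

```python
# Hypothetical SLIT-style distance table for a 4-node machine (10 = local access).
DISTANCE = [
    [10, 21, 31, 31],
    [21, 10, 31, 31],
    [31, 31, 10, 21],
    [31, 31, 21, 10],
]


def fallback_order(node, distance=DISTANCE):
    """Return node IDs sorted by distance from `node`, ties broken by node ID.

    Simplification: the kernel's zonelist construction also weighs node load
    and other heuristics; this only sorts by distance.
    """
    return sorted(range(len(distance)), key=lambda other: (distance[node][other], other))


for n in range(len(DISTANCE)):
    print(f"node {n}: {fallback_order(n)}")
```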
3. NUMA Nodes and Zones
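Each NUMA node is described by a struct pglist_data (pg_data_t), which embeds an array of struct zone objects (node_zones): ZONE_DMA, ZONE_DMA32, ZONE_NORMAL, ZONE_HIGHMEM, and ZONE_MOVABLE. Which zones exist depends on the architecture and kernel configuration. Each zone tracks its own per‑zone statistics and watermarks. Each zone also runs a buddy allocator through its free_area array: free_area[order] holds free blocks of 2^order contiguous pages, further split into lists by migrate type.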
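The buddy mechanism behind free_area can be illustrated with a toy allocator. This is a minimal sketch that assumes MAX_ORDER = 11 (orders 0 to 10, the long‑standing default; recent kernels call the top order MAX_PAGE_ORDER). It ignores migrate types, per‑CPU lists, and zone locking, and BuddyZone is a hypothetical name. Allocation splits a larger block and returns the unused halves to lower orders. Freeing merges a block with its buddy, whose PFN differs only in bit `order`.

```python
MAX_ORDER = 11  # orders 0..10, the long-standing default (assumed)


class BuddyZone:
    """Toy zone: free_area[order] holds start PFNs of free 2**order blocks."""

    def __init__(self, start_pfn, nr_pages):
        self.free_area = [set() for _ in range(MAX_ORDER)]
        # Seed the free lists with the largest naturally aligned blocks that fit.
        pfn, end = start_pfn, start_pfn + nr_pages
        while pfn < end:
            order = MAX_ORDER - 1
            while order > 0 and (pfn & ((1 << order) - 1) or pfn + (1 << order) > end):
                order -= 1
            self.free_area[order].add(pfn)
            pfn += 1 << order

    def alloc(self, order):
        """Take the smallest free block >= order and split it down (like expand())."""
        for current in range(order, MAX_ORDER):
            if self.free_area[current]:
                pfn = min(self.free_area[current])   # deterministic pick for the demo
                self.free_area[current].remove(pfn)
                # Return each unused upper half to the next lower free list.
                while current > order:
                    current -= 1
                    self.free_area[current].add(pfn + (1 << current))
                return pfn
        return None  # nothing large enough: the real allocator would reclaim/compact

    def free(self, pfn, order):
        """Free a block, merging with its buddy for as long as the buddy is free."""
        while order < MAX_ORDER - 1:
            buddy = pfn ^ (1 << order)           # same formula as __find_buddy_pfn()
            if buddy not in self.free_area[order]:
                break
            self.free_area[order].remove(buddy)
            pfn &= buddy                         # merged block starts at the lower PFN
            order += 1
        self.free_area[order].add(pfn)

    def nr_free(self):
        return [len(area) for area in self.free_area]


zone = BuddyZone(0, 1024)        # one free order-10 block
page = zone.alloc(0)             # splits it all the way down to order 0
zone.free(page, 0)               # merges everything back into one order-10 block
```

The XOR works because buddies are naturally aligned. A block of order n starts at a multiple of 2^n, so flipping bit n yields its buddy, and the merged block starts at the lower of the two PFNs.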
4. Reserved Memory and Low‑Memory Reserves
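Zones can set aside whole pageblocks as a high‑order atomic reserve (MIGRATE_HIGHATOMIC, tracked by nr_reserved_highatomic), so that high‑order allocations that cannot sleep still succeed when memory is fragmented. Separately, each zone keeps a lowmem_reserve array derived from /proc/sys/vm/lowmem_reserve_ratio. Sometimes an allocation that could have been served by a higher zone (for example ZONE_NORMAL) falls back to a lower zone (for example ZONE_DMA32). In that case the lower zone must keep that many extra pages free. This stops such fallbacks from exhausting memory that only the lower zone can provide.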
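The lowmem_reserve computation and the check that uses it can be sketched as follows. The loop mirrors the shape of setup_per_zone_lowmem_reserve() in recent kernels: each lower zone reserves the managed pages of all zones above it, up to the requested one, divided by its own ratio. The three‑zone layout, the page counts, and the default ratios 256/256/32 are assumptions for illustration. zone_ok() is a hypothetical simplification of __zone_watermark_ok().

```python
def setup_lowmem_reserve(managed, ratio):
    """reserve[i][j]: extra free pages zone i keeps against allocations whose highest allowed zone is j."""
    n = len(managed)
    reserve = [[0] * n for _ in range(n)]
    for i in range(n - 1):
        upper_pages = 0
        for j in range(i + 1, n):
            upper_pages += managed[j]       # managed pages of zones i+1..j
            if ratio[i] and managed[i]:     # ratio 0 or an empty zone disables the reserve
                reserve[i][j] = upper_pages // ratio[i]
    return reserve


def zone_ok(free_pages, mark, reserve_row, highest_zoneidx):
    """Simplified watermark check: free pages must exceed watermark + reserve."""
    return free_pages > mark + reserve_row[highest_zoneidx]


ZONES = ["DMA", "DMA32", "Normal"]
MANAGED = [3840, 512000, 3072000]          # hypothetical managed pages per zone
RATIO = [256, 256, 32]                     # assumed lowmem_reserve_ratio defaults
reserve = setup_lowmem_reserve(MANAGED, RATIO)
for name, row in zip(ZONES, reserve):
    print(f"{name:>6}: {row}")
```

With these numbers, DMA32 keeps 12000 extra pages (about 47 MiB) free against fallbacks from ZONE_NORMAL. A request that targets DMA32 directly only has to clear the watermark. These per‑zone values are what /proc/zoneinfo reports on each zone's "protection:" line.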
5. Watermarks (WMARK_MIN, WMARK_LOW, WMARK_HIGH)
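Each zone has three watermarks derived from min_free_kbytes. When free pages drop below WMARK_LOW, the allocator wakes kswapd, which reclaims in the background until the zone is back above WMARK_HIGH. If free pages fall below WMARK_MIN, the allocating task enters direct reclaim (and compaction) itself. WMARK_MIN is min_free_kbytes split across zones in proportion to their managed pages. The gaps up to WMARK_LOW and WMARK_HIGH grow with watermark_scale_factor. Both inputs can be tuned via /proc/sys/vm/min_free_kbytes and /proc/sys/vm/watermark_scale_factor.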
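The watermark arithmetic can be reproduced in a few lines. This sketch follows the structure of __setup_per_zone_wmarks() for kernels without highmem. It ignores watermark boosting and uses hypothetical zone sizes. default_min_free_kbytes() assumes the sqrt(lowmem_kbytes * 16) rule with the 128 to 262144 KiB clamp used by recent kernels. reclaim_action() is a deliberately rough, hypothetical stand‑in for the allocator slow path.

```python
from math import isqrt

PAGE_SHIFT = 12


def default_min_free_kbytes(lowmem_kbytes):
    """sqrt(lowmem_kbytes * 16), clamped to [128, 262144] KiB (assumed recent-kernel bounds)."""
    return min(max(isqrt(lowmem_kbytes * 16), 128), 262144)


def setup_zone_wmarks(min_free_kbytes, zone_managed, watermark_scale_factor=10):
    """Return {zone: (min, low, high)} in pages; no highmem, no watermark boost."""
    pages_min = min_free_kbytes >> (PAGE_SHIFT - 10)          # KiB -> pages
    lowmem_pages = sum(zone_managed.values())
    marks = {}
    for name, managed in zone_managed.items():
        wmin = pages_min * managed // lowmem_pages             # proportional share of the minimum
        gap = max(wmin >> 2, managed * watermark_scale_factor // 10000)
        marks[name] = (wmin, wmin + gap, wmin + 2 * gap)
    return marks


def reclaim_action(free_pages, wmin, wlow, whigh):
    """Very rough summary of what the allocator does at a given free-page level."""
    if free_pages < wmin:
        return "direct reclaim"
    if free_pages < wlow:
        return "wake kswapd"        # kswapd then reclaims until free >= whigh
    return "ok"


marks = setup_zone_wmarks(default_min_free_kbytes(16 * 1024 * 1024),
                          {"DMA32": 1048576, "Normal": 3145728})
print(marks)
```

If the kernel counts roughly 16 GiB of low memory, the default min_free_kbytes comes out at 16384 KiB (4096 pages). With the default watermark_scale_factor of 10 (0.1% of a zone's managed pages), it is the scale‑factor term, not min/4, that sets the gap between the watermarks in both zones.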
6. Hot and Cold Pages
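The hot/cold distinction appears in two places. For reclaim, each node (and, with memory cgroups, each cgroup on that node) keeps LRU lists in a lruvec. There are active (recently used, hot) and inactive (cold) lists, kept separately for anonymous and file‑backed pages, plus an unevictable list. Reclaim takes its victims from the inactive lists, and the swappiness setting biases how hard the anonymous lists are scanned relative to the file lists. For allocation, each zone also keeps per‑CPU page lists (struct per_cpu_pages) that cache free pages (historically only order‑0 pages). The most recently freed pages are handed out first because they are likely still in the CPU cache, and serving them without taking the zone lock reduces allocation latency.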
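The active/inactive aging can be modelled with two ordered lists. This is a simplified sketch that assumes the classic two‑touch promotion of mark_page_accessed() and a second chance for referenced pages found at the inactive tail. TwoListLRU is hypothetical and handles only one page type. It leaves out swappiness, per‑cgroup lruvecs, and the multi‑generational LRU.

```python
from collections import OrderedDict


class TwoListLRU:
    """Toy active/inactive LRU for one page type.

    In each OrderedDict the first key is the oldest entry (the list tail in
    kernel terms) and the value is the page's "referenced" flag.
    """

    def __init__(self):
        self.active = OrderedDict()
        self.inactive = OrderedDict()

    def add(self, page):
        self.inactive[page] = False      # newly faulted pages start out inactive

    def access(self, page):
        if page in self.active:
            self.active[page] = True     # already hot: just note the reference
        elif self.inactive[page]:
            del self.inactive[page]      # second touch while inactive: promote
            self.active[page] = False
        else:
            self.inactive[page] = True   # first touch: mark referenced only

    def shrink(self, nr):
        """Try to evict nr pages, taking victims from the inactive tail."""
        evicted = []
        # Refill a short inactive list by demoting the oldest active pages.
        while len(self.inactive) < nr and self.active:
            page, _ = self.active.popitem(last=False)
            self.inactive[page] = False
        while len(evicted) < nr and self.inactive:
            page, referenced = self.inactive.popitem(last=False)
            if referenced:
                self.active[page] = False    # used since it went inactive: second chance
            else:
                evicted.append(page)
        return evicted


lru = TwoListLRU()
for p in "abcd":
    lru.add(p)
lru.access("a")
lru.access("a")                  # "a" is touched twice and becomes active
lru.access("b")                  # "b" is only marked referenced
print(lru.shrink(2))
```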
7. struct page Overview
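The kernel describes every physical page frame with a struct page. Key fields include:
flags – status bits (e.g., PG_locked, PG_dirty, PG_active, PG_lru); the upper bits also encode the page's zone and node.
mapping – for page‑cache pages, a pointer to the owning struct address_space; for anonymous pages, a pointer to struct anon_vma with the low bit (PAGE_MAPPING_ANON) set, which is how the two cases are told apart.
index – for page‑cache pages, the page's offset within the file in page‑size units; for anonymous pages, the linear page index computed from the virtual address and the VMA's vm_pgoff.
_mapcount – the number of page‑table entries mapping the page, stored minus one (it starts at -1, so 0 means mapped once).
_refcount – the number of references held on the page; the page is freed when it drops to zero.
lru – list head linking the page into the appropriate LRU list.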
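The tagged mapping pointer, the anonymous index, and the biased map count can be decoded as follows. The constants mirror PAGE_MAPPING_ANON (0x1) and PAGE_MAPPING_MOVABLE (0x2) from include/linux/page-flags.h, where both bits together mark a KSM page. Pointers are modelled as plain integers, the helper signatures are hypothetical, and very recent kernels may spell these constants differently. For a private anonymous mapping, vm_pgoff normally starts out as vm_start >> PAGE_SHIFT, so the anonymous index ends up equal to the virtual page number.

```python
PAGE_SHIFT = 12
PAGE_MAPPING_ANON = 0x1
PAGE_MAPPING_MOVABLE = 0x2
PAGE_MAPPING_KSM = PAGE_MAPPING_ANON | PAGE_MAPPING_MOVABLE
PAGE_MAPPING_FLAGS = PAGE_MAPPING_ANON | PAGE_MAPPING_MOVABLE


def decode_mapping(mapping):
    """Split a page->mapping value (modelled as an int) into (kind, untagged pointer)."""
    tag = mapping & PAGE_MAPPING_FLAGS
    pointer = mapping & ~PAGE_MAPPING_FLAGS
    if tag == PAGE_MAPPING_KSM:
        return "ksm", pointer
    if tag == PAGE_MAPPING_ANON:
        return "anon_vma", pointer
    if tag == PAGE_MAPPING_MOVABLE:
        return "movable", pointer           # non-LRU movable page managed by a driver
    return "address_space", pointer         # ordinary page-cache page


def linear_page_index(vm_start, vm_pgoff, address):
    """Same arithmetic as the kernel's linear_page_index() for non-hugetlb VMAs."""
    return ((address - vm_start) >> PAGE_SHIFT) + vm_pgoff


def page_mapcount(raw_mapcount):
    """_mapcount is stored biased by -1: -1 means unmapped, 0 means one mapping."""
    return raw_mapcount + 1


# Hypothetical anonymous VMA whose vm_pgoff equals vm_start >> PAGE_SHIFT.
anon_start = 0x7F0000000000
print(hex(linear_page_index(anon_start, anon_start >> PAGE_SHIFT, 0x7F0000003000)))
```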
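Compound pages (such as huge pages) combine 2^order physically contiguous pages. The head page has PG_head set and carries the reference count for the whole compound page. In kernels of the 5.x era, metadata such as compound_order and compound_dtor is stored in the first tail page, because the head page's own fields are already in use. Every tail page stores the head page's address in compound_head with bit 0 set, which is how compound_head() finds the head from any tail page.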
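Finding the head from any tail page and reading the order from the first tail page can be sketched like this. The 64‑byte struct page size, the vmemmap‑style base address used in the example, and all names are assumptions for illustration. The helpers model only the pointer‑tagging scheme behind compound_head(), not the kernel's real field layout.

```python
STRUCT_PAGE_SIZE = 64    # typical sizeof(struct page) on 64-bit kernels (assumed)


class Page:
    """Hypothetical struct page with just the fields this sketch needs."""

    def __init__(self, addr):
        self.addr = addr             # address of this struct page
        self.flags = set()
        self.compound_head = 0       # tail pages: head address | 1
        self.compound_order = None   # only meaningful in the first tail page


def make_mem_map(base, n):
    return {base + i * STRUCT_PAGE_SIZE: Page(base + i * STRUCT_PAGE_SIZE) for i in range(n)}


def prep_compound_page(mem_map, head_addr, order):
    mem_map[head_addr].flags.add("PG_head")
    for i in range(1, 1 << order):
        tail = mem_map[head_addr + i * STRUCT_PAGE_SIZE]
        tail.compound_head = head_addr | 1           # bit 0 marks a tail page
    mem_map[head_addr + STRUCT_PAGE_SIZE].compound_order = order  # first tail page


def compound_head(mem_map, page):
    if page.compound_head & 1:
        return mem_map[page.compound_head - 1]       # strip the tag bit
    return page                                      # head page or ordinary page


def compound_order(mem_map, page):
    head = compound_head(mem_map, page)
    if "PG_head" not in head.flags:
        return 0
    return mem_map[head.addr + STRUCT_PAGE_SIZE].compound_order


base = 0xFFFFEA0000000000        # example address in the style of x86_64 vmemmap
mem_map = make_mem_map(base, 8)
prep_compound_page(mem_map, base, 2)  # pages 0-3 form an order-2 compound page
```

The tag bit is safe to borrow because struct page addresses are always aligned to at least a word, so bit 0 of a genuine head pointer is never set.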
8. Slab Allocation
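Small kernel objects (e.g., anon_vma, vm_area_struct) are allocated from slab caches. When a page is used as a slab, its struct page fields are reused (in newer kernels through the overlaid struct slab). They record the owning struct kmem_cache (slab_cache), a pointer to the first free object (freelist), and usage counters (inuse, objects).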
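SLUB threads its freelist through the free objects themselves. The toy model below assumes a 4 KiB slab page of 256‑byte objects and keeps the "next free" links in a dict rather than inside raw memory. ToySlab is hypothetical and skips per‑CPU slabs, partial lists, and freelist hardening.

```python
class ToySlab:
    """One slab page carved into equal objects, identified by byte offset."""

    def __init__(self, page_size=4096, object_size=256):
        self.objects = page_size // object_size      # like slab->objects
        self.next_free = {}                          # offset -> next free offset (or None)
        for i in range(self.objects):
            offset = i * object_size
            self.next_free[offset] = offset + object_size if i + 1 < self.objects else None
        self.freelist = 0                            # first free object
        self.inuse = 0                               # like slab->inuse

    def alloc(self):
        obj = self.freelist
        if obj is None:
            return None      # slab full: a real allocator would move on to another slab
        self.freelist = self.next_free.pop(obj)      # follow the link stored in the object
        self.inuse += 1
        return obj

    def free(self, obj):
        self.next_free[obj] = self.freelist          # push onto the front (LIFO)
        self.freelist = obj
        self.inuse -= 1


slab = ToySlab()
first, second = slab.alloc(), slab.alloc()
slab.free(first)
print(slab.alloc(), slab.inuse)  # the object just freed is reused first
```

Freeing pushes the object onto the front of the freelist, so the most recently freed and most likely cache‑hot object is handed out next.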
9. Anonymous Page Reverse Mapping
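Anonymous pages use struct anon_vma and struct anon_vma_chain to map a physical page back to every VMA that may reference it. Each anon_vma_chain links one vm_area_struct to one anon_vma. The anon_vma keeps its chains in an interval tree (built on a red‑black tree) keyed by each VMA's page‑offset range. During reclaim or migration, the kernel looks up the page's index in that tree and computes the virtual address in each matching VMA. It then unmaps or updates the corresponding page‑table entries.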
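The reverse‑mapping walk can be sketched as follows. The kernel iterates the anon_vma's interval tree (anon_vma_interval_tree_foreach() inside rmap_walk_anon()). This hypothetical model scans a plain list instead, which gives the same results at linear cost. The VMA layout is invented: a parent mapping, a forked child copy that was later moved with mremap() but kept its vm_pgoff, and a split‑off half that shares the anon_vma but does not cover the page.

```python
PAGE_SHIFT = 12


class VMA:
    """Hypothetical stand-in for vm_area_struct with the fields the walk needs."""

    def __init__(self, name, vm_start, vm_end, vm_pgoff):
        self.name, self.vm_start, self.vm_end, self.vm_pgoff = name, vm_start, vm_end, vm_pgoff

    def pgoff_last(self):
        return self.vm_pgoff + ((self.vm_end - self.vm_start) >> PAGE_SHIFT) - 1


class AnonVMA:
    """Hypothetical anon_vma; each entry in chains plays the role of an anon_vma_chain."""

    def __init__(self):
        self.chains = []

    def link(self, vma):
        self.chains.append(vma)

    def walk(self, page_index):
        """Yield (vma, address) for every linked VMA whose pgoff range covers page_index."""
        for vma in self.chains:          # the kernel queries an interval tree here
            if vma.vm_pgoff <= page_index <= vma.pgoff_last():
                yield vma, vma.vm_start + ((page_index - vma.vm_pgoff) << PAGE_SHIFT)


anon_vma = AnonVMA()
anon_vma.link(VMA("parent", 0x7F0000000000, 0x7F0000010000, 0x7F0000000))
anon_vma.link(VMA("child", 0x600000000000, 0x600000010000, 0x7F0000000))    # moved by mremap()
anon_vma.link(VMA("split-off", 0x7F0000010000, 0x7F0000020000, 0x7F0000010))
page_index = 0x7F0000003         # page->index of an anonymous page faulted in by the parent
for vma, addr in anon_vma.walk(page_index):
    print(vma.name, hex(addr))
```

Because the address is recomputed from vm_pgoff rather than stored in the page, the same page->index stays valid after mremap() moves the child's mapping to a different virtual address.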
10. Practical Tools
Useful commands to inspect the state:
cat /proc/zoneinfo – per‑zone free pages, watermarks, lowmem protection, and LRU counts.
numactl -H – NUMA node layout, per‑node memory sizes, and the node distance table.
cat /proc/sys/vm/min_free_kbytes (likewise swappiness, watermark_scale_factor, and lowmem_reserve_ratio) – current values of the tunables discussed above.
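The watermark fields of /proc/zoneinfo can also be checked programmatically. This sketch parses only the "pages free", min, low, and high lines of each zone and flags zones whose free pages are below their low watermark. The SAMPLE text is a trimmed, hypothetical excerpt. Because the file's layout varies between kernel versions, the regular expressions are an assumption to verify against your own system.

```python
import re

# Trimmed, hypothetical excerpt in the style of /proc/zoneinfo.
SAMPLE = """\
Node 0, zone      DMA
  pages free     3840
        boost    0
        min      8
        low      11
        high     14
        spanned  4095
        present  3998
        managed  3840
Node 0, zone   Normal
  pages free     5000
        boost    0
        min      3072
        low      6217
        high     9362
        spanned  3145728
        present  3145728
        managed  3145728
"""

ZONE_RE = re.compile(r"Node\s+(\d+),\s+zone\s+(\S+)")
FIELD_RE = re.compile(r"\s+(pages free|min|low|high)\s+(\d+)\s*$")


def parse_zoneinfo(text):
    """Return {(node, zone): {"free", "min", "low", "high"}} from zoneinfo text."""
    zones, current = {}, None
    for line in text.splitlines():
        m = ZONE_RE.match(line)
        if m:
            current = zones.setdefault((int(m.group(1)), m.group(2)), {})
            continue
        m = FIELD_RE.match(line)
        if m and current is not None:
            # Keep the first occurrence so later per-CPU sections cannot overwrite it.
            current.setdefault(m.group(1).replace("pages ", ""), int(m.group(2)))
    return zones


def zones_below_low(zones):
    return [key for key, f in zones.items() if f.get("free", 0) < f.get("low", 0)]


if __name__ == "__main__":
    text = SAMPLE
    try:
        with open("/proc/zoneinfo") as f:   # only present on Linux
            text = f.read()
    except OSError:
        pass                                # fall back to the sample elsewhere
    for key, fields in parse_zoneinfo(text).items():
        print(key, fields)
```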
Conclusion
The Linux kernel combines hierarchical structures (nodes → zones → pages), flexible memory models, and sophisticated reclaim mechanisms to efficiently manage physical memory on modern NUMA systems while providing fast access for both file‑backed and anonymous pages.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
Bin's Tech Cabin
