Understanding Linux Kernel Memory Management: From Allocation to OOM
This article explains Linux kernel memory management, covering process address space layout, memory allocation, OOM handling, where allocated memory resides, and both manual and automatic memory reclamation techniques, with practical examples and visual illustrations.
During an internship the author became fascinated by Linux kernel memory management and decided to document the knowledge after extensive study.
The article analyzes a single process's memory layout and allocation from a global perspective, covering four main topics:
Process memory request and allocation
Out‑of‑Memory (OOM) handling
Where the allocated memory resides
System memory reclamation
1. Process Memory Request and Allocation
When a program is started, the exec system call loads the executable into memory, mapping the code, data, BSS, and stack segments via mmap, while the heap is created on demand. The dynamic linker then loads required shared libraries before execution begins.
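The resulting layout can be inspected through /proc/<pid>/maps, which lists one mapped region per line. The following is a minimal sketch, assuming the usual "start-end perms offset dev inode [pathname]" line format; the names Mapping, parse_maps_line, and summarize are illustrative helpers, not part of any library.

```python
from typing import NamedTuple


class Mapping(NamedTuple):
    start: int
    end: int
    perms: str
    path: str  # "" for regions without a name (anonymous memory)


def parse_maps_line(line: str) -> Mapping:
    """Parse one line of /proc/<pid>/maps."""
    # Assumed format: start-end perms offset dev inode [pathname]
    fields = line.split(None, 5)
    start_hex, end_hex = fields[0].split("-")
    path = fields[5].strip() if len(fields) > 5 else ""
    return Mapping(int(start_hex, 16), int(end_hex, 16), fields[1], path)


def summarize(maps_text: str) -> dict:
    """Total mapped bytes per pathname ("[anon]" for unnamed regions)."""
    totals = {}
    for line in maps_text.splitlines():
        if not line.strip():
            continue
        m = parse_maps_line(line)
        key = m.path or "[anon]"
        totals[key] = totals.get(key, 0) + (m.end - m.start)
    return totals


if __name__ == "__main__":
    # Linux only: show how this interpreter's own address space is laid out.
    with open("/proc/self/maps") as f:
        for name, size in summarize(f.read()).items():
            print(f"{size // 1024:>10} KiB  {name}")
```

Run against a live process, the output typically shows the executable, its shared libraries, and the special [heap] and [stack] regions.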
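For small requests, malloc typically extends the heap with brk. The kernel grows the existing heap VMA or, if none exists yet, creates a new anonymous region and inserts it into the process's mm_struct (a red-black tree of VMAs in older kernels, a maple tree since Linux 6.1). Large allocations (128 KiB or more by default in glibc) call mmap directly. In both cases the call returns only virtual memory; physical pages are assigned on first access, when a page fault occurs.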
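The lazy backing of virtual memory can be observed from user space. The sketch below is Linux-only and assumes /proc/self/statm is readable; resident_pages and touch_and_measure are hypothetical helper names. It maps anonymous memory, then writes one byte per page and reports how much the resident set grew at each step.

```python
import mmap


def resident_pages() -> int:
    """Resident set size of this process in pages (2nd field of statm)."""
    with open("/proc/self/statm") as f:
        return int(f.read().split()[1])


def touch_and_measure(num_pages: int = 4096):
    """Map anonymous memory, then touch it.

    Returns (RSS growth after mmap, RSS growth after touching), in pages.
    """
    length = num_pages * mmap.PAGESIZE
    before = resident_pages()
    region = mmap.mmap(-1, length,
                       flags=mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS,
                       prot=mmap.PROT_READ | mmap.PROT_WRITE)
    after_map = resident_pages()
    for offset in range(0, length, mmap.PAGESIZE):
        region[offset] = 1  # the first write to a page triggers a page fault
    after_touch = resident_pages()
    region.close()
    return after_map - before, after_touch - after_map


if __name__ == "__main__":
    on_map, on_touch = touch_and_measure()
    print(f"RSS growth after mmap:  {on_map} pages")
    print(f"RSS growth after touch: {on_touch} pages")
```

On a typical system the first number stays close to zero while the second is roughly the number of pages touched, though the exact values depend on the kernel and on what else the interpreter allocates in between.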
When free releases memory, a region that was allocated with mmap is returned to the kernel immediately with munmap. Otherwise the memory goes back to the allocator's free lists, and the allocator may later shrink the heap to hand it back to the system.
2. OOM After Memory Exhaustion
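When the system runs out of memory, the OOM killer selects a process to terminate. In older kernels the choice depends on factors such as memory usage, runtime, priority, user ID, child count, and the oom_adj value; newer kernels rely mainly on memory usage plus the oom_score_adj adjustment. The function select_bad_process computes an oom_score for each process, and the process with the highest score is killed.
Administrators can influence the decision by writing to /proc/<pid>/oom_adj. Setting oom_adj to -17 makes a process immune to OOM termination. Since Linux 2.6.36, oom_adj is deprecated in favor of /proc/<pid>/oom_score_adj, where -1000 has the same effect.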
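To see which processes the OOM killer would currently favor, the per-process score files can be read directly. This is a minimal sketch, assuming a kernel that exposes both /proc/<pid>/oom_score and /proc/<pid>/oom_score_adj; read_oom_info and likeliest_victims are hypothetical helpers, and the proc_root parameter exists only so the code can be pointed at a different directory.

```python
import os


def read_oom_info(pid, proc_root="/proc"):
    """Return (oom_score, oom_score_adj) for a process as integers."""
    base = os.path.join(proc_root, str(pid))
    with open(os.path.join(base, "oom_score")) as f:
        score = int(f.read())
    with open(os.path.join(base, "oom_score_adj")) as f:
        adj = int(f.read())
    return score, adj


def likeliest_victims(pids, proc_root="/proc", top=5):
    """Rank processes by oom_score, highest (most likely victim) first."""
    ranked = []
    for pid in pids:
        try:
            score, adj = read_oom_info(pid, proc_root)
        except (OSError, ValueError):
            continue  # the process exited or its files are unreadable
        ranked.append((score, adj, pid))
    ranked.sort(reverse=True)
    return ranked[:top]


if __name__ == "__main__":
    pids = [int(name) for name in os.listdir("/proc") if name.isdigit()]
    for score, adj, pid in likeliest_victims(pids):
        print(f"pid={pid:<8} oom_score={score:<6} oom_score_adj={adj}")
```

Reading the scores needs no special privileges for one's own processes; lowering oom_score_adj generally requires root.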
The kernel parameter /proc/sys/vm/overcommit_memory controls allocation behavior:
0 – heuristic overcommit (default)
1 – always allow overcommit
2 – never exceed a calculated limit (swap + RAM × vm.overcommit_ratio / 100)
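The limit used in mode 2 is simple arithmetic. The sketch below assumes the formula given above with the kernel default overcommit_ratio of 50, and ignores refinements such as huge page reservations; commit_limit_kib and would_exceed are illustrative names.

```python
def commit_limit_kib(mem_total_kib, swap_total_kib, overcommit_ratio=50):
    """Approximate commit limit in KiB: swap + RAM * ratio / 100."""
    return swap_total_kib + mem_total_kib * overcommit_ratio // 100


def would_exceed(committed_kib, request_kib, limit_kib):
    """True if granting the request would push committed memory past the limit."""
    return committed_kib + request_kib > limit_kib


if __name__ == "__main__":
    # Hypothetical machine: 8 GiB of RAM and 2 GiB of swap.
    limit = commit_limit_kib(8 * 1024 * 1024, 2 * 1024 * 1024)
    print(f"commit limit: {limit} KiB")
    print("1 GiB request refused:",
          would_exceed(committed_kib=5_500_000,
                       request_kib=1024 * 1024,
                       limit_kib=limit))
```

The kernel reports its own values as CommitLimit and Committed_AS in /proc/meminfo, which is the authoritative place to check.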
3. Where Does Allocated Memory Reside?
Memory can be mapped as shared or private file mappings, or as anonymous mappings. Shared file mappings (code and shared libraries) are cached in the kernel page cache. Private file mappings also use the cache, but modifications trigger copy‑on‑write, allocating separate pages.
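The difference between shared and private file mappings can be demonstrated with a temporary file. This is a small sketch using Python's standard mmap module; write_through_mapping and demo are hypothetical helper names. A write through a private mapping lands on a copied page and never reaches the file, while a write through a shared mapping does.

```python
import mmap
import os
import tempfile


def write_through_mapping(path, flags, new_bytes):
    """Map the file with the given flags and overwrite its first bytes.

    Returns (bytes seen through the mapping, bytes seen in the file).
    """
    with open(path, "r+b") as f:
        region = mmap.mmap(f.fileno(), 0, flags=flags,
                           prot=mmap.PROT_READ | mmap.PROT_WRITE)
        region[:len(new_bytes)] = new_bytes
        seen_in_mapping = bytes(region[:len(new_bytes)])
        if flags & mmap.MAP_SHARED:
            region.flush()  # write the modified shared page back to the file
        region.close()
    with open(path, "rb") as f:
        seen_in_file = f.read(len(new_bytes))
    return seen_in_mapping, seen_in_file


def demo():
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, b"original data")
        os.close(fd)
        private = write_through_mapping(path, mmap.MAP_PRIVATE, b"PRIVATE")
        shared = write_through_mapping(path, mmap.MAP_SHARED, b"SHARED!")
        return private, shared
    finally:
        os.unlink(path)


if __name__ == "__main__":
    private, shared = demo()
    print("private mapping:", private)
    print("shared mapping: ", shared)
```

The private case leaves the file's original bytes untouched, which is the copy-on-write behavior described above.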
Anonymous private mappings (heap, stack, BSS) are allocated directly without involving the page cache, increasing only the used memory.
Shared anonymous mappings (used for inter‑process communication) are backed by the page cache, so the cache grows when they are created.
4. System Memory Reclamation
4.1 Manual Reclamation
Writing 1 to /proc/sys/vm/drop_caches frees the page cache, 2 frees reclaimable slab objects such as dentries and inodes, and 3 frees both. Only clean pages are dropped, so run sync first to write dirty pages back and make them droppable.
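The sync-then-drop sequence can be wrapped in a few lines. This is a sketch rather than a recommended tool: drop_caches and DROP_MODES are hypothetical names, the path parameter is there so the function can be pointed at another file, and writing to the real file requires root.

```python
import os

# Meaning of each value accepted by this sketch.
DROP_MODES = {
    1: "page cache",
    2: "reclaimable slab objects (dentries and inodes)",
    3: "page cache and reclaimable slab objects",
}


def drop_caches(mode, path="/proc/sys/vm/drop_caches", sync_first=True):
    """Optionally flush dirty pages, then ask the kernel to drop caches."""
    if mode not in DROP_MODES:
        raise ValueError(f"mode must be one of {sorted(DROP_MODES)}")
    if sync_first:
        os.sync()  # write dirty pages back so they become droppable
    with open(path, "w") as f:
        f.write(f"{mode}\n")
    return DROP_MODES[mode]


if __name__ == "__main__":
    try:
        print("dropped:", drop_caches(3))
    except PermissionError:
        print("writing to drop_caches requires root")
```

Dropping caches is non-destructive but forces later reads to go back to disk, so it is mostly useful for testing and measurement.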
4.2 tmpfs
tmpfs, procfs, sysfs, and ramfs are memory‑based filesystems. Files in tmpfs are stored in the page cache and may also use swap. The read/write paths involve shmem_file_read and shmem_file_write, which operate on the page cache and mark pages dirty when modified.
4.3 Shared Memory
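POSIX and System V shared memory are implemented on top of tmpfs: they create a file in tmpfs and map it into processes. Because these pages have no backing file on disk, they cannot simply be dropped like ordinary page cache; they can only be swapped out, and they are freed only when the segment or file is explicitly removed.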
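Because tmpfs and shared memory pages are counted as cache, it helps to separate them when reading /proc/meminfo. The sketch below assumes the Cached and Shmem fields, with Shmem included in Cached; parse_meminfo and cache_breakdown are hypothetical helpers, and the subtraction gives only a rough estimate of cache that is backed by real files.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo text into {field name: integer value}."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts:
            fields[name.strip()] = int(parts[0])  # most values are in kB
    return fields


def cache_breakdown(fields):
    """Split the cache into tmpfs/shared memory and a file-backed remainder."""
    cached = fields.get("Cached", 0)
    shmem = fields.get("Shmem", 0)
    return {
        "cached_kib": cached,
        "shmem_kib": shmem,
        # Rough estimate: cache that is not tmpfs or shared memory.
        "file_backed_kib": max(cached - shmem, 0),
    }


if __name__ == "__main__":
    with open("/proc/meminfo") as f:
        for key, value in cache_breakdown(parse_meminfo(f.read())).items():
            print(f"{key:>16}: {value}")
```

A large shmem_kib value explains why the cache figure sometimes barely shrinks after writing to drop_caches.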
4.4 Automatic Reclamation
The kernel's kswapd daemon is woken when free memory falls below a zone's watermarks; it then scans the LRU lists and reclaims pages. It first tries to free clean file pages, writing back dirty pages if necessary. Anonymous pages are swapped out to disk. The vm.swappiness parameter (default 60) controls the balance between the two: lower values favor reclaiming page cache, and higher values make swapping anonymous pages more likely.
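Background reclaim activity can be followed through the counters in /proc/vmstat. The sketch below assumes counters whose names begin with pgscan_kswapd and pgsteal_kswapd; exact names vary between kernel versions (some kernels split them per zone), so the code sums every counter sharing the prefix. sum_counters and kswapd_activity are hypothetical helpers.

```python
def sum_counters(vmstat_text, prefix):
    """Sum all /proc/vmstat counters whose name starts with prefix."""
    total = 0
    for line in vmstat_text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[0].startswith(prefix):
            total += int(parts[1])
    return total


def kswapd_activity(vmstat_text):
    """Return (pages scanned, pages reclaimed) by kswapd since boot."""
    scanned = sum_counters(vmstat_text, "pgscan_kswapd")
    reclaimed = sum_counters(vmstat_text, "pgsteal_kswapd")
    return scanned, reclaimed


if __name__ == "__main__":
    with open("/proc/vmstat") as f:
        scanned, reclaimed = kswapd_activity(f.read())
    print(f"kswapd scanned {scanned} pages and reclaimed {reclaimed}")
```

Sampling these values twice and comparing them shows whether kswapd is currently active; a reclaimed count far below the scanned count suggests that many scanned pages could not be freed.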
5. Summary
The article reviewed process address space, explained how Linux allocates and frees memory, described manual and automatic reclamation mechanisms, and detailed the behavior of OOM killing. It demonstrates how the kernel strives to free memory efficiently while preserving necessary data.
MaGe Linux Operations
