Understanding Linux Kernel Memory Management: From Process Allocation to OOM Handling
This article explains Linux kernel memory management, covering process address space allocation, OOM selection criteria, memory mapping types, cache behavior, tmpfs, and both manual and automatic memory reclamation mechanisms.
Motivated by a talk on OOM during an internship, the author studied Linux kernel memory management in depth and shares a comprehensive overview here.
1. Process Memory Allocation
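When a program starts, the execve system call loads the executable. The kernel's ELF loader maps the code and data segments from the file, sets up zero-filled anonymous memory for BSS, and creates the stack, while the heap grows on demand through brk or anonymous mmap. Each region is described by a VMA (vm_area_struct) attached to the process's mm_struct; older kernels index VMAs with a red-black tree plus a linked list, and Linux 6.1 replaced both with a maple tree. User-space allocators (ptmalloc, tcmalloc, jemalloc) carve this virtual memory into individual allocations, and large requests (in glibc, those at or above M_MMAP_THRESHOLD, 128 KiB by default and adjusted dynamically) get a dedicated mmap. All of this memory stays purely virtual until the first access triggers a page fault, at which point the kernel allocates a physical page.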
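The VMAs the kernel builds for a process can be inspected from user space through /proc/<pid>/maps. Below is a minimal Python sketch, assuming the standard maps line format (address range, permissions, offset, device, inode, optional path); the names Vma, parse_maps_line, and read_vmas are hypothetical helpers, not a library API. It classifies each region along the same file/anonymous and shared/private axes used later in this article.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Vma:
    start: int
    end: int
    perms: str   # e.g. "r-xp": read, write, execute, then 'p' (private) or 's' (shared)
    inode: int   # 0 when no file backs the region
    path: str    # file path, "[heap]", "[stack]", or "" for plain anonymous memory

    @property
    def size(self) -> int:
        return self.end - self.start

    def kind(self) -> str:
        """Classify the region as shared/private and file/anonymous."""
        sharing = "shared" if self.perms[3] == "s" else "private"
        # Shared anonymous mappings are backed by an internal shmem file and
        # show up as "/dev/zero (deleted)", so treat them as anonymous here.
        if self.inode == 0 or self.path.startswith("/dev/zero"):
            backing = "anonymous"
        else:
            backing = "file"
        return f"{sharing} {backing}"


def parse_maps_line(line: str) -> Vma:
    # Fields: address perms offset dev inode [pathname]; the pathname may be
    # missing and may itself contain spaces, so split at most five times.
    parts = line.split(maxsplit=5)
    start, end = (int(x, 16) for x in parts[0].split("-"))
    path = parts[5].strip() if len(parts) > 5 else ""
    return Vma(start, end, parts[1], int(parts[4]), path)


def read_vmas(pid: str = "self") -> list[Vma]:
    """Parse every line of /proc/<pid>/maps (Linux only)."""
    with open(f"/proc/{pid}/maps") as f:
        return [parse_maps_line(line) for line in f if line.strip()]
```

For example, `for v in read_vmas(): print(hex(v.start), v.kind(), v.path)` lists the interpreter's own code, shared libraries, heap, and stack regions.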
2. Out‑of‑Memory (OOM) Mechanism
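When memory is exhausted, the OOM killer kills the process with the highest oom_score (shown in /proc/<pid>/oom_score). Kernels before 2.6.36 derived this score from factors such as memory usage, runtime, nice priority, user (root processes were less likely to be chosen), number of children, and oom_adj; current kernels base it mainly on the process's resident memory, swap usage, and page tables, shifted by /proc/<pid>/oom_score_adj (-1000 to 1000). The older /proc/<pid>/oom_adj knob (-17 to 15) is deprecated but still accepted: setting it to -17, equivalent to oom_score_adj = -1000, makes a process immune.
The /proc/sys/vm/overcommit_memory setting controls how much memory processes may commit and therefore when OOM can occur: 0 = heuristic overcommit (the default; only obviously excessive requests are refused), 1 = always allow overcommit, 2 = strict accounting, where total committed memory may not exceed swap + RAM * overcommit_ratio / 100 (overcommit_ratio defaults to 50), so excess requests fail immediately with ENOMEM instead of being granted and causing trouble later.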
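The scoring and overcommit rules above can be illustrated with a simplified Python model. It assumes the modern (2.6.36 and later) heuristic, in which a task's score is its resident pages plus swap entries plus page-table pages, shifted by oom_score_adj in units of 1/1000 of the memory being considered (total_pages, roughly RAM plus swap for a system-wide OOM). The function names are hypothetical, and the real kernel also handles cgroups, hugepages, and other exemptions that this sketch ignores.

```python
from __future__ import annotations

from typing import Optional

OOM_SCORE_ADJ_MIN = -1000   # oom_score_adj value that exempts a task
OOM_SCORE_ADJ_MAX = 1000


def badness(rss_pages: int, swap_pages: int, pgtable_pages: int,
            oom_score_adj: int, total_pages: int) -> Optional[int]:
    """Simplified model of the modern OOM badness score.

    Returns None for a task that the OOM killer must never select.
    """
    if not OOM_SCORE_ADJ_MIN <= oom_score_adj <= OOM_SCORE_ADJ_MAX:
        raise ValueError("oom_score_adj must be in [-1000, 1000]")
    if oom_score_adj == OOM_SCORE_ADJ_MIN:
        return None
    points = rss_pages + swap_pages + pgtable_pages
    # Each unit of oom_score_adj is worth 1/1000 of the memory being considered.
    points += oom_score_adj * (total_pages // 1000)
    return points


def pick_victim(tasks: dict[str, tuple[int, int, int, int]],
                total_pages: int) -> Optional[str]:
    """tasks maps a name to (rss, swap, page tables, oom_score_adj), in pages."""
    scores = {name: badness(*t, total_pages) for name, t in tasks.items()}
    eligible = {name: s for name, s in scores.items() if s is not None}
    return max(eligible, key=eligible.__getitem__) if eligible else None


def commit_limit_kb(ram_kb: int, swap_kb: int, overcommit_ratio: int = 50) -> int:
    """CommitLimit under overcommit_memory=2, ignoring hugepages and
    vm.overcommit_kbytes: swap + RAM * overcommit_ratio / 100."""
    return swap_kb + ram_kb * overcommit_ratio // 100
```

The same arithmetic explains the protection trick: writing -1000 to /proc/<pid>/oom_score_adj (or the legacy -17 to oom_adj) removes a task from consideration, and lowering the value generally requires CAP_SYS_RESOURCE. The kernel's own CommitLimit and Committed_AS figures appear in /proc/meminfo.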
3. Where Allocated Memory Resides
Memory can be mapped as shared file, private file, private anonymous, or shared anonymous. Experiments show:
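Shared file mappings (a file mapped with MAP_SHARED) increase buff/cache because their pages are the file's own page-cache pages. Program code and shared libraries behave the same way in practice: they are mapped MAP_PRIVATE, but because the text is never written, all processes share the same page-cache pages.
Private file mappings also start out in the page cache; a write triggers copy-on-write, and the resulting private copy is an anonymous page that counts toward used memory rather than buff/cache.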
Private anonymous mappings (heap, stack, BSS) increase only used memory, not cache.
Shared anonymous mappings (mmap with MAP_SHARED | MAP_ANONYMOUS) increase buff/cache because the kernel backs them with an internal tmpfs (shmem) file, so their pages are counted as page cache.
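The difference between the two file-backed cases can be checked directly: a write through a MAP_PRIVATE mapping triggers copy-on-write and leaves the file untouched, while a write through a MAP_SHARED mapping lands in the page cache and is visible to ordinary reads. A minimal Python sketch using the standard mmap module on Linux (file_mapping_demo is a hypothetical name):

```python
from __future__ import annotations

import mmap
import os
import tempfile


def file_mapping_demo() -> tuple[bytes, bytes, bytes]:
    """Write through MAP_PRIVATE and MAP_SHARED mappings of the same file.

    Returns (what the private mapping shows, file contents after the private
    write, file contents after the shared write).
    """
    data = b"original"
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, data)
        rw = mmap.PROT_READ | mmap.PROT_WRITE

        # Private file mapping: the first write triggers copy-on-write, so the
        # process gets its own anonymous copy of the page; the file is unchanged.
        priv = mmap.mmap(fd, len(data), flags=mmap.MAP_PRIVATE, prot=rw)
        priv[:4] = b"PRIV"
        seen_private = priv[:]
        priv.close()
        after_private = os.pread(fd, len(data), 0)

        # Shared file mapping: the write modifies the page-cache page that backs
        # the file, so an ordinary read of the file sees it immediately.
        shared = mmap.mmap(fd, len(data), flags=mmap.MAP_SHARED, prot=rw)
        shared[:4] = b"SHRD"
        shared.flush()   # msync(): write the dirty page back to the file
        shared.close()
        after_shared = os.pread(fd, len(data), 0)
    finally:
        os.close(fd)
        os.unlink(path)
    return seen_private, after_private, after_shared
```

os.pread is used instead of a buffered file object so that every read goes straight to the kernel and cannot be served from a stale Python-level buffer.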
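Tmpfs and ramfs keep file contents entirely in the page cache (procfs and sysfs are also memory-based, but generate their contents on demand). Files created in /dev/shm or other tmpfs mounts therefore increase buff/cache (reported as Shmem in /proc/meminfo), and unlike ordinary cache these pages cannot simply be dropped: they stay until the file is deleted and no longer open or mapped, although tmpfs pages can be swapped out under pressure.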
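The buff/cache and Shmem effects described above can be observed in /proc/meminfo. A minimal parsing sketch follows; it assumes that buff/cache is computed as Buffers + Cached + SReclaimable, which matches recent procps-ng versions of free but may differ elsewhere, and the helper names parse_meminfo and cache_breakdown are hypothetical.

```python
from __future__ import annotations


def parse_meminfo(text: str) -> dict[str, int]:
    """Parse /proc/meminfo-style text into {field: value}; most values are in kB."""
    fields: dict[str, int] = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts:
            fields[key.strip()] = int(parts[0])
    return fields


def cache_breakdown(m: dict[str, int]) -> dict[str, int]:
    # Assumption: buff/cache = Buffers + Cached + SReclaimable, as in recent
    # procps-ng `free`. Shmem (tmpfs files, shared anonymous memory) is already
    # part of Cached, yet drop_caches cannot free it.
    buff_cache = m["Buffers"] + m["Cached"] + m.get("SReclaimable", 0)
    shmem = m.get("Shmem", 0)
    return {
        "buff_cache_kb": buff_cache,
        "shmem_kb": shmem,
        # Rough upper bound only: dirty, mapped, or locked pages can't be dropped either.
        "droppable_upper_bound_kb": buff_cache - shmem,
    }
```

Comparing parse_meminfo(open("/proc/meminfo").read()) before and after writing a file into /dev/shm should show Cached and Shmem growing by roughly the file's size, while dropping caches leaves Shmem unchanged.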
4. Memory Reclamation
4.1 Manual Reclamation
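Writing 1, 2, or 3 to /proc/sys/vm/drop_caches frees clean page-cache pages (1), reclaimable slab objects such as dentries and inodes (2), or both (3). The operation never writes back dirty data, so run sync first to make more pages eligible.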
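Because the value works as a bitmask (1 selects the page cache, 2 selects slab objects, and 3 = 1 | 2 selects both), the manual procedure can be wrapped in a small Python helper. This is a sketch: drop_caches and drop_caches_value are hypothetical names, the path parameter exists only so the function can be exercised against an ordinary file, and writing the real /proc file requires root.

```python
import os

DROP_PAGECACHE = 1   # clean page-cache pages
DROP_SLAB = 2        # reclaimable slab objects, e.g. dentries and inodes


def drop_caches_value(pagecache: bool = True, slab: bool = True) -> str:
    """Build the text to write into drop_caches."""
    value = (DROP_PAGECACHE if pagecache else 0) | (DROP_SLAB if slab else 0)
    if value == 0:
        raise ValueError("select at least one cache to drop")
    return str(value)


def drop_caches(pagecache: bool = True, slab: bool = True,
                path: str = "/proc/sys/vm/drop_caches") -> None:
    # drop_caches never writes back dirty data, so flush it first with sync(2);
    # pages that become clean can then be dropped.
    os.sync()
    with open(path, "w") as f:   # the real /proc file is writable only by root
        f.write(drop_caches_value(pagecache, slab))
```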
4.2 Automatic Reclamation
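The kernel runs a kswapd thread per memory node, which wakes when free memory in a zone falls below the pages_low watermark. It scans the LRU lists, moving pages from the active list to the inactive list and reclaiming inactive pages in batches, until free memory climbs back above pages_high. If free memory drops below pages_min, the allocating process enters direct reclaim: it performs the same kind of reclaim itself, synchronously, and stalls until enough pages are freed.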
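The watermark logic can be modeled in a few lines of Python. This is a simplified sketch, not kernel code: real watermarks are per zone (visible as min, low, and high in /proc/zoneinfo), the real check also accounts for lowmem reserves, and the default batch of 32 pages is borrowed from the kernel's SWAP_CLUSTER_MAX constant purely for illustration. Watermarks, allocation_path, and kswapd_run are hypothetical names.

```python
from __future__ import annotations

from dataclasses import dataclass

SWAP_CLUSTER_MAX = 32   # illustrative batch size, borrowed from the kernel constant


@dataclass
class Watermarks:
    wmark_min: int    # pages_min
    wmark_low: int    # pages_low
    wmark_high: int   # pages_high


def allocation_path(free_pages: int, wm: Watermarks) -> str:
    """Which path an allocation takes at a given free-page level (simplified)."""
    if free_pages > wm.wmark_low:
        return "fast path"        # enough free memory, no reclaim
    if free_pages > wm.wmark_min:
        return "wake kswapd"      # background reclaim starts; allocation proceeds
    return "direct reclaim"       # the allocating task must reclaim synchronously


def kswapd_run(free_pages: int, wm: Watermarks,
               reclaimable: int, batch: int = SWAP_CLUSTER_MAX) -> tuple[int, int]:
    """Reclaim in batches until free memory reaches the high watermark.

    `reclaimable` caps how many pages can actually be freed, so the loop also
    stops when nothing more can be reclaimed. Returns (free_pages, batches).
    """
    batches = 0
    while free_pages < wm.wmark_high and reclaimable > 0:
        freed = min(batch, reclaimable)
        free_pages += freed
        reclaimable -= freed
        batches += 1
    return free_pages, batches
```

The gap between pages_low and pages_high is what lets kswapd work ahead of demand: it keeps reclaiming past the point that woke it, so allocations can stay on the fast path for a while afterwards.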
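File pages are reclaimed either by dropping them (clean pages, which can be re-read from the file later) or by writing them back first (dirty pages). Anonymous pages have no backing file, so they must be written to swap before they can be freed; without swap space they cannot be reclaimed.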
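The per-page decision described above can be summarized as a small Python function. It is a deliberately simplified sketch (Action and reclaim_action are hypothetical names) that ignores mlocked pages, recently referenced pages, shmem, and anonymous pages that already have an up-to-date copy in swap.

```python
from enum import Enum


class Action(Enum):
    DROP = "drop clean page"
    WRITEBACK = "write back, then drop"
    SWAP_OUT = "write to swap, then free"
    KEEP = "cannot reclaim"


def reclaim_action(is_anon: bool, dirty: bool, swap_free_pages: int) -> Action:
    """Simplified per-page reclaim decision."""
    if not is_anon:
        # File pages have a backing file: clean ones are simply discarded and
        # can be re-read later; dirty ones must be written back first.
        return Action.WRITEBACK if dirty else Action.DROP
    # Anonymous pages have no file behind them, so their only copy is in RAM
    # and they must be written to swap before the page frame can be reused.
    return Action.SWAP_OUT if swap_free_pages > 0 else Action.KEEP
```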
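Two tunables shape this balance. /proc/sys/vm/swappiness (default 60) sets how readily the kernel reclaims anonymous pages by swapping them out relative to dropping file cache; higher values favor swapping. /proc/sys/vm/vfs_cache_pressure (default 100) controls how aggressively the dentry and inode caches are reclaimed relative to the page cache; higher values reclaim them more aggressively.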
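In the kernel, vfs_cache_pressure takes effect by scaling the number of freeable dentry and inode objects that each filesystem's shrinker reports to reclaim; with default settings, the helper vfs_pressure_ratio() computes val * vfs_cache_pressure / 100 (details may differ between kernel versions). The Python sketch below mirrors that arithmetic and reads the current tunables; read_vm_tunable is a hypothetical helper, and its root parameter exists only so it can be tested outside /proc.

```python
from pathlib import Path


def read_vm_tunable(name: str, root: str = "/proc/sys/vm") -> int:
    """Read an integer VM sysctl such as "swappiness" or "vfs_cache_pressure"."""
    return int(Path(root, name).read_text().strip())


def vfs_pressure_ratio(freeable_objects: int, vfs_cache_pressure: int = 100) -> int:
    """Scale a dentry/inode shrinker's object count by vfs_cache_pressure / 100.

    Mirrors the kernel's mult_frac(val, sysctl_vfs_cache_pressure, 100); for
    non-negative integers that equals floor(val * pressure / 100).
    """
    return freeable_objects * vfs_cache_pressure // 100
```

With the default of 100 the shrinker reports its true object count; 200 doubles it, making dentries and inodes twice as attractive to reclaim, and 0 means they are never reclaimed due to memory pressure, which the kernel documentation warns can easily lead to out-of-memory conditions.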
5. Summary
The article reviews the process address space, explains how OOM selects victims, details where different types of memory allocations end up (cache vs. used memory), and describes both manual (drop_caches) and automatic (kswapd) reclamation strategies, highlighting the kernel’s effort to free memory before resorting to OOM.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
ITPUB
Official ITPUB account sharing technical insights, community news, and exciting events.
