How the Linux Kernel Manages Memory: Allocation, OOM, and Recovery
This article explains Linux kernel memory management by covering process address space layout, allocation mechanisms, OOM killer behavior, overcommit settings, various types of file and anonymous mappings, tmpfs usage, and both manual and automatic memory reclamation techniques.
1. Process Memory Allocation
When a program is started from a shell, the shell forks a child process, which calls execve to load the executable. The kernel's ELF loader maps the code, data, and BSS segments into the new address space and sets up the stack, while the heap is created on demand later. For a dynamically linked program, the dynamic linker (ld.so) then maps the required shared libraries with mmap before the program's own code starts running. The whole sequence can be traced with strace.
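One way to see what the dynamic linker has loaded is to ask it from inside the process. The sketch below assumes glibc or another C library that provides the GNU extension dl_iterate_phdr(); count_loaded_objects is a hypothetical helper name, not part of any API.

```c
#define _GNU_SOURCE
#include <link.h>
#include <stdio.h>

/* Called once per loaded ELF object: the main program, ld.so, libc, ... */
static int visit_object(struct dl_phdr_info *info, size_t size, void *data)
{
    (void)size;
    int *count = data;
    /* The main executable usually reports an empty name. */
    printf("%-40s loaded at %#lx\n",
           info->dlpi_name[0] ? info->dlpi_name : "(main program)",
           (unsigned long)info->dlpi_addr);
    ++*count;
    return 0; /* returning 0 continues the iteration */
}

/* Hypothetical helper: print and count the objects mapped into this process. */
int count_loaded_objects(void)
{
    int count = 0;
    dl_iterate_phdr(visit_object, &count);
    return count;
}
```

From the outside, strace -f -e trace=execve,openat,mmap ./prog (where ./prog stands for any dynamically linked program) shows the same sequence: the execve call, then each shared library being opened and mapped.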
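On the first malloc, the user-space allocator typically asks the kernel for memory with the brk system call. If the process has no heap VMA yet, the kernel creates an anonymous one and inserts it into the process's VMA tree (a red-black tree in older kernels, a maple tree since Linux 6.1); later brk calls simply extend it. The allocator (ptmalloc in glibc, tcmalloc, jemalloc, etc.) then subdivides this region and returns the requested block. Large allocations (in glibc, typically requests of 128 KiB or more, the default M_MMAP_THRESHOLD) bypass the heap and are served by a separate anonymous mmap. In both cases the memory is only virtual at first: physical pages are allocated on first access, through a page fault.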
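The claim that memory stays virtual until first touched can be checked with mincore(), which reports whether each page of a range is resident. A minimal sketch, assuming Linux semantics for anonymous mappings (untouched pages are reported as not resident); it calls mmap directly to mimic the path malloc takes for large requests, and page_resident and demo_demand_paging are hypothetical names:

```c
#define _DEFAULT_SOURCE
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* Return 1 if the page containing addr is resident in RAM, 0 if not, -1 on error. */
int page_resident(const void *addr)
{
    long page = sysconf(_SC_PAGESIZE);
    unsigned char vec = 0;
    /* mincore() needs a page-aligned start address. */
    void *start = (void *)((uintptr_t)addr & ~((uintptr_t)page - 1));
    if (mincore(start, (size_t)page, &vec) != 0)
        return -1;
    return vec & 1;
}

/* Map a private anonymous region and check whether its first page is
   resident before and after the first write. */
int demo_demand_paging(int *before, int *after)
{
    size_t len = 4 * (size_t)sysconf(_SC_PAGESIZE);
    char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;
    *before = page_resident(p); /* virtual only: no page frame yet */
    p[0] = 1;                   /* first write causes a page fault */
    *after = page_resident(p);  /* the fault handler allocated a physical page */
    munmap(p, len);
    return 0;
}
```

On Linux, demo_demand_paging is expected to report 0 before the write and 1 after it.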
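When free is called on a block that was served by its own mmap, the allocator returns it to the kernel immediately with munmap. A block from the heap goes back to the allocator's free lists instead; the allocator may later return memory to the kernel, for example by shrinking the heap with brk when the free space at its top grows large enough.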
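For blocks served by their own mmap, free() ends with munmap(), after which the address range no longer exists in the process. The sketch below imitates that step directly and uses mincore(), which fails with ENOMEM when a range contains unmapped pages, as a probe. range_is_mapped and demo_munmap are hypothetical names, and the probe is limited to small ranges for simplicity.

```c
#define _DEFAULT_SOURCE
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Return 1 if [addr, addr + len) is fully mapped, 0 if part of it is
   unmapped, -1 on other errors. addr must be page-aligned. */
int range_is_mapped(void *addr, size_t len)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    unsigned char vec[64];
    if ((len + page - 1) / page > sizeof vec)
        return -1; /* sketch only handles small ranges */
    if (mincore(addr, len, vec) == 0)
        return 1;
    return errno == ENOMEM ? 0 : -1; /* ENOMEM: range has unmapped pages */
}

/* Map a region, then release it the way free() releases an mmap-served chunk. */
int demo_munmap(int *mapped_before, int *mapped_after)
{
    size_t len = 8 * (size_t)sysconf(_SC_PAGESIZE);
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;
    *mapped_before = range_is_mapped(p, len);
    if (munmap(p, len) != 0)
        return -1;
    *mapped_after = range_is_mapped(p, len); /* VMA is gone after munmap */
    return 0;
}
```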
2. OOM After Memory Exhaustion
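The OOM (Out-of-Memory) killer selects a process to terminate when the system can no longer satisfy memory requests. Older kernels (before 2.6.36) combined many factors, including memory usage, run time, nice value, user ID, number of child processes, and the oom_adj value. Since 2.6.36 the heuristic is much simpler: a process's "badness" is essentially the memory it uses (resident pages, swap entries, and page tables), shifted by its oom_score_adj. The kernel exposes the result as /proc/<pid>/oom_score, and the process with the highest score is killed.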
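In kernels since 2.6.36 the score is driven almost entirely by memory footprint. The following is a simplified model loosely based on oom_badness() in mm/oom_kill.c, not the kernel code itself: struct task_mem, oom_badness_sketch, and pick_oom_victim are hypothetical, sizes are in pages, and details such as tasks that are already exiting are omitted.

```c
#include <limits.h>
#include <stddef.h>

#define OOM_SCORE_ADJ_MIN (-1000) /* value that exempts a task */

/* Hypothetical per-task snapshot; all sizes are in pages. */
struct task_mem {
    const char *name;
    long rss;          /* resident pages (anon + file + shmem) */
    long swap;         /* swapped-out pages */
    long pgtables;     /* pages used by the task's page tables */
    int oom_score_adj; /* -1000 .. 1000 */
};

/* Simplified badness: memory footprint plus oom_score_adj scaled to total pages. */
long oom_badness_sketch(const struct task_mem *t, long totalpages)
{
    if (t->oom_score_adj == OOM_SCORE_ADJ_MIN)
        return LONG_MIN; /* never selected */
    long points = t->rss + t->swap + t->pgtables;
    points += (long)t->oom_score_adj * (totalpages / 1000);
    return points;
}

/* Index of the task with the highest badness, or -1 if none is killable. */
int pick_oom_victim(const struct task_mem *tasks, size_t n, long totalpages)
{
    int victim = -1;
    long best = LONG_MIN;
    for (size_t i = 0; i < n; i++) {
        long score = oom_badness_sketch(&tasks[i], totalpages);
        if (score != LONG_MIN && (victim < 0 || score > best)) {
            best = score;
            victim = (int)i;
        }
    }
    return victim;
}
```

In this model a smaller process with oom_score_adj = 500 can outrank a larger one left at 0, because the adjustment adds 500 × totalpages / 1000 points, while a task at -1000 is never chosen.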
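Administrators can influence the decision by writing to /proc/<pid>/oom_adj. Valid adjustments range from -16 (least likely to be killed) to +15 (most likely to be killed), and the special value -17 exempts the process from the OOM killer entirely, giving it VIP-like protection. oom_adj is deprecated: current kernels provide /proc/<pid>/oom_score_adj instead, with a range of -1000 to +1000, where -1000 disables OOM killing for that process.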
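A process can also adjust itself. This sketch uses the non-deprecated /proc/self/oom_score_adj interface rather than oom_adj; set_own_oom_score_adj and the two getters are hypothetical helpers. Raising the value is always allowed, while lowering it below the floor recorded for the process requires CAP_SYS_RESOURCE, so the result of the write is checked when the stream is closed.

```c
#include <stdio.h>

/* Read one integer from a /proc file; returns 0 on success. */
static int read_proc_int(const char *path, int *out)
{
    FILE *f = fopen(path, "r");
    if (!f)
        return -1;
    int ok = fscanf(f, "%d", out) == 1;
    fclose(f);
    return ok ? 0 : -1;
}

/* Set the calling process's oom_score_adj (-1000..1000); 0 on success. */
int set_own_oom_score_adj(int adj)
{
    if (adj < -1000 || adj > 1000)
        return -1;
    FILE *f = fopen("/proc/self/oom_score_adj", "w");
    if (!f)
        return -1;
    int ok = fprintf(f, "%d\n", adj) > 0;
    /* The buffered write reaches the kernel, and is validated, at fclose. */
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}

int get_own_oom_score_adj(int *adj) { return read_proc_int("/proc/self/oom_score_adj", adj); }
int get_own_oom_score(int *score)   { return read_proc_int("/proc/self/oom_score", score); }
```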
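The /proc/sys/vm/overcommit_memory setting controls the allocation policy:
0 – heuristic overcommit (the default): moderate overcommit is allowed, but allocations that are obviously too large for the system fail immediately with ENOMEM.
1 – always overcommit: the kernel never refuses an allocation, and the OOM killer runs only when RAM and swap are actually exhausted.
2 – strict accounting: total committed memory may not exceed swap + RAM × overcommit_ratio / 100 (RAM excluding huge pages; overcommit_ratio defaults to 50); once this limit is reached, further allocations fail.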
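The strict-mode limit is simple arithmetic, and the kernel publishes both the limit and the current commitments in /proc/meminfo as CommitLimit and Committed_AS. A sketch, assuming values in kB and ignoring huge pages and vm.overcommit_kbytes; commit_limit_kb and meminfo_kb are hypothetical helpers:

```c
#include <stdio.h>
#include <string.h>

/* Strict-mode (overcommit_memory = 2) limit, simplified: huge pages and
   vm.overcommit_kbytes are ignored. All values in kB. */
unsigned long long commit_limit_kb(unsigned long long ram_kb,
                                   unsigned long long swap_kb,
                                   unsigned ratio_percent)
{
    return swap_kb + ram_kb * ratio_percent / 100;
}

/* Read one "Key:   value kB" line from /proc/meminfo; 0 if not found. */
unsigned long long meminfo_kb(const char *key)
{
    FILE *f = fopen("/proc/meminfo", "r");
    if (!f)
        return 0;
    char line[256];
    unsigned long long value = 0;
    size_t klen = strlen(key);
    while (fgets(line, sizeof line, f)) {
        if (strncmp(line, key, klen) == 0 && line[klen] == ':') {
            sscanf(line + klen + 1, "%llu", &value);
            break;
        }
    }
    fclose(f);
    return value;
}
```

For example, with 8 GiB of RAM, 2 GiB of swap, and overcommit_ratio = 50, the limit is 2 + 8 × 50 / 100 = 6 GiB. CommitLimit is reported in every mode but enforced only when overcommit_memory is 2.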
3. Where Allocated Memory Resides
Linux uses two main mapping types:
File mappings (code, data, shared libraries) are cached in the page cache. When multiple processes map the same file, they share the same physical pages.
Anonymous mappings (heap, BSS, stack, and memory that malloc obtains via brk or mmap) are not backed by a file and reside in ordinary RAM until they are swapped out.
Experiments show that shared file mappings increase the buff/cache figure reported by free, while private anonymous mappings increase only used memory.
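The split between file-backed and anonymous mappings is visible in /proc/self/maps, whose last column holds the backing file's path. A rough sketch that classifies a process's own mappings by that column; count_mappings is a hypothetical helper, and the classification is deliberately naive, as the comment in the code explains:

```c
#include <stdio.h>

/* Classify this process's mappings by the pathname column of /proc/self/maps.
   A path starting with '/' counts as file-backed; everything else (no name,
   [heap], [stack], [vdso], ...) counts as anonymous or kernel-provided.
   Caveat: shared anonymous memory appears as "/dev/zero (deleted)" and
   System V segments as "/SYSV... (deleted)", so this sketch counts them as files. */
int count_mappings(int *file_backed, int *other)
{
    FILE *f = fopen("/proc/self/maps", "r");
    if (!f)
        return -1;
    char line[8192];
    *file_backed = *other = 0;
    while (fgets(line, sizeof line, f)) {
        char path[4096] = "";
        /* Fields: address perms offset dev inode [pathname] */
        sscanf(line, "%*s %*s %*s %*s %*s %4095[^\n]", path);
        if (path[0] == '/')
            ++*file_backed;
        else
            ++*other;
    }
    fclose(f);
    return 0;
}
```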
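Shared anonymous mappings (mmap with MAP_SHARED | MAP_ANONYMOUS) are backed internally by tmpfs (shmem), so they also live in the page cache: the memory appears under buff/cache (and as Shmem in /proc/meminfo), and it is shared between the creating process and the children it forks afterwards.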
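The difference between shared and private anonymous memory shows up as soon as a process forks. In this sketch (anon_mapping_roundtrip is a hypothetical name), the child writes into a freshly mapped int and the parent reads it back after the child exits:

```c
#define _DEFAULT_SOURCE
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Map one int, fork, let the child store `value`, and return what the parent
   reads afterwards (-1 on error). share != 0 uses MAP_SHARED | MAP_ANONYMOUS,
   so both processes use the same shmem page; share == 0 uses MAP_PRIVATE,
   so the child's write goes to its own copy-on-write page. */
int anon_mapping_roundtrip(int share, int value)
{
    int flags = (share ? MAP_SHARED : MAP_PRIVATE) | MAP_ANONYMOUS;
    int *cell = mmap(NULL, sizeof *cell, PROT_READ | PROT_WRITE, flags, -1, 0);
    if (cell == MAP_FAILED)
        return -1;
    *cell = 0;

    pid_t pid = fork();
    if (pid < 0) {
        munmap(cell, sizeof *cell);
        return -1;
    }
    if (pid == 0) { /* child */
        *cell = value;
        _exit(0);
    }
    int status;
    if (waitpid(pid, &status, 0) < 0) { /* wait until the write is done */
        munmap(cell, sizeof *cell);
        return -1;
    }
    int seen = *cell;
    munmap(cell, sizeof *cell);
    return seen;
}
```

With share set, the parent reads the child's value; without it, the parent still reads 0.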
Tmpfs (including /dev/shm) is a memory-backed filesystem. Its files are stored in the page cache, but because they have no backing store on disk, their pages cannot be dropped like ordinary clean cache for as long as the files exist; under memory pressure they can only be swapped out.
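Whether a directory really is tmpfs can be checked with statfs(2) and the TMPFS_MAGIC constant from <linux/magic.h>. A minimal sketch; is_tmpfs is a hypothetical helper, and some minimal containers may not mount /dev/shm at all:

```c
#include <linux/magic.h>
#include <sys/vfs.h>

/* Return 1 if path is on tmpfs, 0 if on another filesystem, -1 on error. */
int is_tmpfs(const char *path)
{
    struct statfs sb;
    if (statfs(path, &sb) != 0)
        return -1;
    return sb.f_type == TMPFS_MAGIC;
}
```

Running grep Shmem /proc/meminfo before and after writing a large file into /dev/shm shows the file's pages being counted as shared memory.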
POSIX and System V shared memory are implemented on top of tmpfs, so their pages are also part of the page cache and share the same reclamation constraints.
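System V segments behave the same way: their pages stay allocated while the segment exists, even with no process attached, until it is marked for removal with IPC_RMID and the last process detaches. A sketch, assuming SysV IPC is available (some sandboxes disable it); sysv_shm_demo is a hypothetical name:

```c
#include <string.h>
#include <sys/ipc.h>
#include <sys/shm.h>

/* Create a private System V segment, touch every byte, report its size via
   IPC_STAT, then remove it. The pages are shmem-backed, so while the segment
   exists they are counted as Shmem in /proc/meminfo. Returns 0 on success. */
int sysv_shm_demo(size_t len, size_t *segsz)
{
    int id = shmget(IPC_PRIVATE, len, IPC_CREAT | 0600);
    if (id < 0)
        return -1;
    char *p = shmat(id, NULL, 0);
    if (p == (void *)-1) {
        shmctl(id, IPC_RMID, NULL);
        return -1;
    }
    memset(p, 0xAB, len); /* fault in every page */
    struct shmid_ds ds;
    int rc = shmctl(id, IPC_STAT, &ds);
    if (rc == 0)
        *segsz = ds.shm_segsz;
    shmdt(p);
    shmctl(id, IPC_RMID, NULL); /* freed once no process is attached */
    return rc;
}
```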
4. Memory Reclamation
4.1 Manual Reclamation
Writing to /proc/sys/vm/drop_caches makes the kernel drop clean caches:
echo 1 > /proc/sys/vm/drop_caches # drop the page cache
echo 2 > /proc/sys/vm/drop_caches # drop reclaimable slab objects (dentries and inodes)
echo 3 > /proc/sys/vm/drop_caches # drop both
Dirty pages are not dropped, so run sync first to write them back to disk.
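The same two steps from C, as a sketch: drop_clean_caches is a hypothetical wrapper, it needs root, and /proc/sys is often mounted read-only inside containers. The kernel documentation warns that using drop_caches can cause performance problems, since the dropped objects may have to be recreated at a significant I/O and CPU cost.

```c
#include <errno.h>
#include <stdio.h>
#include <unistd.h>

/* Flush dirty data, then ask the kernel to drop clean caches.
   mode: 1 = page cache, 2 = reclaimable slab (dentries, inodes), 3 = both.
   Returns 0 on success, -1 with errno set on failure. */
int drop_clean_caches(int mode)
{
    if (mode < 1 || mode > 3) {
        errno = EINVAL;
        return -1;
    }
    sync(); /* dirty pages are never dropped, so write them back first */
    FILE *f = fopen("/proc/sys/vm/drop_caches", "w");
    if (!f)
        return -1; /* typically EACCES without root, EROFS in containers */
    int ok = fprintf(f, "%d\n", mode) > 0;
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}
```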
4.2 Automatic Reclamation
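The kernel's kswapd daemon does not run on a fixed schedule; it is woken when the free pages in a memory zone fall below the zone's low watermark. It ages pages from the active to the inactive LRU lists and reclaims pages from the inactive lists until free memory rises back above the high watermark (called pages_high in older kernels). If free memory drops below the min watermark, allocating tasks also perform direct reclaim themselves, which is more aggressive and adds latency to the allocation.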
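The watermark logic can be summarized as a small decision function. This is a deliberately simplified, hypothetical model (enum reclaim_action, struct zone_wmarks, and both functions are invented for illustration); real reclaim also depends on allocation order, lowmem reserves, and per-node state. The actual per-zone min, low, and high values can be read from /proc/zoneinfo.

```c
/* Simplified model: compare a zone's free pages with its watermarks (in pages). */
enum reclaim_action {
    RECLAIM_NONE,        /* free >= low: allocation proceeds normally */
    RECLAIM_WAKE_KSWAPD, /* min <= free < low: kswapd reclaims in the background */
    RECLAIM_DIRECT       /* free < min: the allocating task also reclaims itself */
};

struct zone_wmarks {
    unsigned long min, low, high;
};

enum reclaim_action reclaim_action_for(unsigned long free_pages,
                                       const struct zone_wmarks *w)
{
    if (free_pages < w->min)
        return RECLAIM_DIRECT;
    if (free_pages < w->low)
        return RECLAIM_WAKE_KSWAPD;
    return RECLAIM_NONE;
}

/* Pages kswapd would try to free before going back to sleep. */
unsigned long kswapd_target(unsigned long free_pages, const struct zone_wmarks *w)
{
    return free_pages >= w->high ? 0 : w->high - free_pages;
}
```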
Clean file pages are simply dropped, while dirty ones are written back first and then freed. Anonymous pages have no file to fall back on, so they can only be reclaimed by swapping them out, which requires a swap device or swap file.
The vm.swappiness parameter (0-100, extended to 0-200 in Linux 5.8; default 60) controls the balance between swapping out anonymous pages and reclaiming the page cache; higher values favor swapping.
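A sketch of reading the parameter and of what it means under a simplifying assumption. In the kernel's scan balancing (get_scan_count() in mm/vmscan.c), anonymous pages are weighted by swappiness and file pages by 200 - swappiness, each further scaled by recent reclaim costs; if those costs are assumed equal, the anonymous share reduces to swappiness / 200. The helper names are hypothetical, and kernels using the multi-generational LRU may apply swappiness differently.

```c
#include <stdio.h>

/* Read vm.swappiness; -1 on error. */
int read_swappiness(void)
{
    int v = -1;
    FILE *f = fopen("/proc/sys/vm/swappiness", "r");
    if (!f)
        return -1;
    if (fscanf(f, "%d", &v) != 1)
        v = -1;
    fclose(f);
    return v;
}

/* Rough percentage of scan pressure aimed at anonymous pages, ASSUMING equal
   anon and file reclaim costs: weights are swappiness vs (200 - swappiness). */
int anon_scan_share_percent(int swappiness)
{
    return swappiness * 100 / 200;
}
```

Under that assumption, the default of 60 puts about 30% of the pressure on anonymous pages, and 100 splits it evenly.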
5. Summary
The article reviewed the Linux process address space, explained how memory is allocated via brk and mmap, described the OOM killer's decision process and overcommit policies, distinguished between file-backed and anonymous mappings, and covered both manual (drop_caches) and automatic (kswapd, swap) memory reclamation mechanisms.
Efficient Ops
This public account is maintained by Xiaotianguo and friends, regularly publishing widely-read original technical articles. We focus on operations transformation and accompany you throughout your operations career, growing together happily.