Why Does Linux Need Swapping? Uncovering Memory Management Mechanics
This article explains Linux's swapping mechanism, detailing how the kernel moves rarely used memory pages to disk to handle memory shortage and idle memory, the underlying allocation paths, key parameters, and the trade‑offs involved for system performance and stability.
Memory Shortage
When demand for memory exceeds available physical RAM, the Linux kernel swaps out infrequently used pages to the configured swap space, freeing memory for active processes. This synchronous reclamation, called direct page reclaim, is triggered during page allocation: when the fast path fails, __alloc_pages_nodemask falls back to __alloc_pages_slowpath, which may call __alloc_pages_direct_compact, __alloc_pages_direct_reclaim, and __alloc_pages_may_oom to compact memory, reclaim pages, or ultimately invoke the OOM killer if memory remains insufficient.
    static inline struct page *__alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order, struct alloc_context *ac)
    {
        ...
        /* Wake kswapd for background reclaim, then retry the free lists. */
        if (alloc_flags & ALLOC_KSWAPD)
            wake_all_kswapds(order, gfp_mask, ac);
        page = get_page_from_freelist(gfp_mask, order, alloc_flags, ac);
        if (page) goto got_pg;
        /* Costly or non-movable high-order requests try compaction first. */
        if (can_direct_reclaim && (costly_order || (order > 0 && ac->migratetype != MIGRATE_MOVABLE)) && !gfp_pfmemalloc_allowed(gfp_mask)) {
            page = __alloc_pages_direct_compact(gfp_mask, order, alloc_flags, ac, INIT_COMPACT_PRIORITY, &compact_result);
            if (page) goto got_pg;
            ...
        }
    retry:
        /* Reclaim pages synchronously, then try to allocate. */
        page = __alloc_pages_direct_reclaim(gfp_mask, order, alloc_flags, ac, &did_some_progress);
        if (page) goto got_pg;
        /* Compact memory, then try to allocate. */
        page = __alloc_pages_direct_compact(gfp_mask, order, alloc_flags, ac, compact_priority, &compact_result);
        if (page) goto got_pg;
        /* Last resort: the OOM killer may be invoked. */
        page = __alloc_pages_may_oom(gfp_mask, order, ac, &did_some_progress);
        if (page) goto got_pg;
        ...
    got_pg:
        return page;
    }
Wake the kswapd daemon to reclaim memory and call get_page_from_freelist for fast page retrieval.
Expensive (high-order) allocations invoke __alloc_pages_direct_compact to compact memory before another get_page_from_freelist attempt.
Call __alloc_pages_direct_reclaim to reclaim pages directly, in the context of the allocating task.
Retry __alloc_pages_direct_compact to compact memory and obtain free pages.
Finally, __alloc_pages_may_oom makes a last allocation attempt and may trigger an out-of-memory kill if it fails.
These steps illustrate Linux's strategies for handling memory pressure: compaction, direct reclamation, and OOM termination.
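
The escalation order described above can be sketched as a small userspace model. This is only an illustration of the ordering, not kernel code: the names toy_slowpath, alloc_stage, stage_fn, stage_fails, and stage_succeeds are hypothetical, and the real slow path also loops, retries, and checks many flags that are left out here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stages, named after the kernel functions they stand in for. */
enum alloc_stage {
    STAGE_FREELIST,        /* get_page_from_freelist */
    STAGE_DIRECT_RECLAIM,  /* __alloc_pages_direct_reclaim */
    STAGE_DIRECT_COMPACT,  /* __alloc_pages_direct_compact */
    STAGE_OOM,             /* __alloc_pages_may_oom */
    STAGE_FAILED           /* no stage produced a page */
};

/* Each stage is modeled as a callback that reports whether it produced a page. */
typedef bool (*stage_fn)(void *ctx);

/* Two trivial callbacks, useful for experimenting with the model. */
bool stage_fails(void *ctx)    { (void)ctx; return false; }
bool stage_succeeds(void *ctx) { (void)ctx; return true; }

/*
 * Try the stages in escalation order and return the first one that succeeds.
 * Later, more expensive stages run only if every earlier stage failed.
 */
enum alloc_stage toy_slowpath(const stage_fn stages[STAGE_FAILED], void *ctx)
{
    for (int s = STAGE_FREELIST; s < STAGE_FAILED; s++) {
        assert(stages[s] != NULL);
        if (stages[s](ctx))
            return (enum alloc_stage)s;
    }
    return STAGE_FAILED;
}
```

Passing an array in which the first two callbacks fail and the third succeeds makes toy_slowpath return STAGE_DIRECT_COMPACT, which mirrors how the OOM stage is reached only when every cheaper option has been exhausted.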
Idle Memory
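Applications often allocate large amounts of memory during startup that they seldom touch afterward. The background daemon kswapd watches each zone's watermarks (WMARK_MIN, WMARK_LOW, WMARK_HIGH): it is woken when free pages drop below WMARK_LOW and swaps out idle pages until free memory climbs back to WMARK_HIGH, keeping memory available for other processes. If free pages fall below WMARK_MIN, allocations fall back to the direct reclaim path described in the previous section.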
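
As a rough illustration of how the three watermarks divide the work between kswapd and direct reclaim, the following sketch classifies a free-page count. The struct zone_wmarks, the enum reclaim_action, and both functions are hypothetical names, and the comparison is a simplification: the kernel's real watermark check also accounts for allocation order, reserves, and allocation flags.

```c
#include <assert.h>

/* Hypothetical per-zone watermarks, in pages; expected order: min <= low <= high. */
struct zone_wmarks {
    unsigned long min;   /* stands in for WMARK_MIN */
    unsigned long low;   /* stands in for WMARK_LOW */
    unsigned long high;  /* stands in for WMARK_HIGH */
};

enum reclaim_action {
    RECLAIM_NONE,        /* enough free pages; kswapd can stay asleep */
    RECLAIM_BACKGROUND,  /* below low: wake kswapd */
    RECLAIM_DIRECT       /* below min: the allocating task reclaims itself */
};

/* Decide what kind of reclaim a given free-page count calls for. */
enum reclaim_action classify_free_pages(const struct zone_wmarks *w,
                                        unsigned long free_pages)
{
    assert(w->min <= w->low && w->low <= w->high);
    if (free_pages < w->min)
        return RECLAIM_DIRECT;
    if (free_pages < w->low)
        return RECLAIM_BACKGROUND;
    return RECLAIM_NONE;
}

/* Once woken, kswapd keeps going until free pages reach the high watermark. */
int kswapd_should_continue(const struct zone_wmarks *w, unsigned long free_pages)
{
    return free_pages < w->high;
}
```

The gap between the low and high watermarks gives kswapd some hysteresis: it starts late and stops late, so it does not flap on and off around a single threshold.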
The kernel uses a Least Recently Used (LRU) algorithm with active_list and inactive_list to track page usage. Pages move to the head of their list on access; the tail of the active list holds the oldest pages. kswapd balances the two lists by moving pages from the active tail to the inactive head, while shrink_zones reclaims pages from the inactive list.
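
The list movement described above can be modeled with two small arrays. This is a toy under stated assumptions, not the kernel's implementation: all toy_* names are hypothetical, a page seen for the first time is assumed to start on the active list, and reclaim is assumed to take the page at the tail of the inactive list.

```c
#include <assert.h>
#include <string.h>

#define TOY_MAX_PAGES 8

/* A toy list of page ids; index 0 is the head, index len-1 is the tail. */
struct toy_list {
    int pages[TOY_MAX_PAGES];
    int len;
};

struct toy_lru {
    struct toy_list active;
    struct toy_list inactive;
};

static void toy_push_head(struct toy_list *l, int page)
{
    assert(l->len < TOY_MAX_PAGES);
    memmove(&l->pages[1], &l->pages[0], (size_t)l->len * sizeof(int));
    l->pages[0] = page;
    l->len++;
}

static int toy_pop_tail(struct toy_list *l)
{
    assert(l->len > 0);
    return l->pages[--l->len];
}

/* Remove `page` from the list if present; returns 1 if it was found. */
static int toy_remove(struct toy_list *l, int page)
{
    for (int i = 0; i < l->len; i++) {
        if (l->pages[i] == page) {
            memmove(&l->pages[i], &l->pages[i + 1],
                    (size_t)(l->len - i - 1) * sizeof(int));
            l->len--;
            return 1;
        }
    }
    return 0;
}

/* On access, a page moves to the head of the list it is already on. */
void toy_access(struct toy_lru *lru, int page)
{
    if (toy_remove(&lru->inactive, page)) {
        toy_push_head(&lru->inactive, page);
        return;
    }
    toy_remove(&lru->active, page);  /* no-op for a page seen for the first time */
    toy_push_head(&lru->active, page);
}

/* kswapd-style balancing: demote the oldest active page to the inactive head. */
void toy_balance(struct toy_lru *lru)
{
    if (lru->active.len > 0)
        toy_push_head(&lru->inactive, toy_pop_tail(&lru->active));
}

/* Reclaim the oldest inactive page; returns its id, or -1 if none is left. */
int toy_reclaim(struct toy_lru *lru)
{
    return lru->inactive.len > 0 ? toy_pop_tail(&lru->inactive) : -1;
}
```

Starting from a zero-initialized struct toy_lru, touching pages 1, 2, 3 and then 1 again leaves page 2 at the active tail, so two calls to toy_balance followed by toy_reclaim evict page 2 first while the recently touched page 1 stays active.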
    enum lru_list {
        LRU_INACTIVE_ANON = LRU_BASE,
        LRU_ACTIVE_ANON = LRU_BASE + LRU_ACTIVE,
        LRU_INACTIVE_FILE = LRU_BASE + LRU_FILE,
        LRU_ACTIVE_FILE = LRU_BASE + LRU_FILE + LRU_ACTIVE,
        LRU_UNEVICTABLE,
        NR_LRU_LISTS
    };
Anonymous pages (ANON) hold process stacks and heaps and have no backing file, so swap is the only place they can be written out. File-backed pages (FILE) correspond to program binaries or data files. LRU_UNEVICTABLE marks pages that must not be reclaimed, for example pages locked with mlock().
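
To make the index arithmetic in the enum concrete, the sketch below repeats it together with the three offsets it depends on; in the kernel headers these are defined as 0, 1, and 2. The helpers toy_page_lru and toy_is_file_lru are hypothetical names introduced only for this illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Offsets the enum is built from. */
#define LRU_BASE   0
#define LRU_ACTIVE 1
#define LRU_FILE   2

enum lru_list {
    LRU_INACTIVE_ANON = LRU_BASE,
    LRU_ACTIVE_ANON = LRU_BASE + LRU_ACTIVE,
    LRU_INACTIVE_FILE = LRU_BASE + LRU_FILE,
    LRU_ACTIVE_FILE = LRU_BASE + LRU_FILE + LRU_ACTIVE,
    LRU_UNEVICTABLE,
    NR_LRU_LISTS
};

/* Hypothetical helper: compute the list index for an evictable page. */
enum lru_list toy_page_lru(bool file_backed, bool active)
{
    return (enum lru_list)(LRU_BASE
                           + (file_backed ? LRU_FILE : 0)
                           + (active ? LRU_ACTIVE : 0));
}

/* Hypothetical helper: true for the two file-backed lists. */
bool toy_is_file_lru(enum lru_list lru)
{
    return lru == LRU_INACTIVE_FILE || lru == LRU_ACTIVE_FILE;
}
```

Because the four evictable lists occupy indices 0 through 3, code can select a list by adding two independent offsets instead of branching on every combination; LRU_UNEVICTABLE then takes index 4 and NR_LRU_LISTS evaluates to 5.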
Conclusion
Swapping provides a safety net for both memory shortage and idle memory scenarios, allowing Linux to keep applications running by moving rarely used pages to disk instead of terminating them outright. Administrators can tune swap behavior with kernel parameters, but the decision to enable or disable swap should depend on the workload. Kubernetes, for example, has historically required swap to be disabled on its nodes by default.
Swapping frees memory for active processes when physical RAM is exhausted.
Swapping reclaims idle pages to make room for future allocations.
Two questions are worth discussing: which kernel parameters control swap behavior, and when is it acceptable to trade some service quality for partial availability?
Linux provides parameters such as vm.swappiness, vm.min_free_kbytes, and vm.watermark_scale_factor to fine-tune swapping and the watermark thresholds.
Reduced service quality is preferable to outright failure for latency-tolerant background jobs and in environments where uptime is critical.
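
The tunables mentioned above are exposed as small text files under /proc/sys/vm on a typical Linux system, so they can be inspected from a program as well as with sysctl. The helper below is a minimal sketch: read_vm_tunable is a hypothetical name, and it assumes each file holds a single integer, which is the case for swappiness, min_free_kbytes, and watermark_scale_factor.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Read a single integer from a procfs-style file such as
 * /proc/sys/vm/swappiness. Returns 0 on success and -1 if the file
 * cannot be opened or does not start with an integer.
 */
int read_vm_tunable(const char *path, long *out)
{
    assert(path != NULL && out != NULL);

    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;

    int ok = (fscanf(f, "%ld", out) == 1);
    fclose(f);
    return ok ? 0 : -1;
}
```

Calling read_vm_tunable("/proc/sys/vm/swappiness", &value) fills value with the current setting on systems where that file exists; the function reports failure rather than guessing when it does not, for instance on a non-Linux host.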
References:
Kubelet/Kubernetes should work with Swap Enabled #53533 – https://github.com/kubernetes/kubernetes/issues/53533
Linux Performance: Why You Should Almost Always Add Swap Space – https://haydenjames.io/linux-performance-almost-always-add-swap-space/
Do we really need swap on modern systems? – https://www.redhat.com/en/blog/do-we-really-need-swap-modern-systems
ITPUB
Official ITPUB account sharing technical insights, community news, and exciting events.
