How Linux’s kswapd Thread Reclaims Memory: Inside the Kernel’s Reclamation Engine
This article explains how the Linux kernel’s kswapd thread is created, when it is woken up, and the steps it follows to reclaim memory efficiently on memory‑constrained devices. These steps include LRU management, the move from zone‑based to node‑based LRUs, balance_pgdat(), and the shrink functions.
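Memory is a limited resource on mobile devices, and the Linux kernel reclaims it aggressively to keep applications responsive. When free memory runs low, a dedicated per‑node kernel thread called kswapd reclaims pages in the background. It frees clean page‑cache pages, swaps out anonymous pages, and shrinks kernel caches. kswapd does not invoke the OOM killer itself; if background and direct reclaim both fail, the page allocator's slow path may eventually do so.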
Creating the kswapd Thread
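During system initialization, the kernel creates one kswapd thread for each NUMA node that has memory. Each thread is named kswapdN, where N is the node ID. start_kernel() eventually spawns the init thread (kernel_init()), which runs the initcalls. One of these is kswapd_init(), registered with module_init(), which calls kswapd_run() for each node; kswapd_run() starts the thread with kthread_run(). The per‑node state kswapd relies on lives in the node's pg_data_t structure. This includes kswapd_order, kswapd_wait, kswapd_failures, and kswapd_highest_zoneidx (kswapd_classzone_idx in older kernels).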
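The per‑node creation loop can be modeled in user space as below. This is a minimal sketch, not kernel code: struct pgdat_model, nodes[], kswapd_run_model() and kswapd_init_model() are hypothetical stand‑ins. The kernel version iterates with for_each_node_state(nid, N_MEMORY) and starts a real kthread, whereas the sketch only records the thread name each node would get.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

#define MAX_NUMNODES 4

/* Hypothetical, trimmed-down stand-in for pg_data_t: only the
 * kswapd-related fields named in the text, plus the thread name. */
struct pgdat_model {
    int node_id;
    bool has_memory;            /* models node_state(nid, N_MEMORY) */
    int kswapd_order;
    int kswapd_highest_zoneidx;
    int kswapd_failures;
    char kswapd_name[16];       /* stands in for the kthread's 16-byte comm */
};

static struct pgdat_model nodes[MAX_NUMNODES];

/* Models kswapd_run(): the kernel starts the thread with roughly
 * kthread_run(kswapd, pgdat, "kswapd%d", nid); here we only record
 * the name the thread would get. */
static void kswapd_run_model(struct pgdat_model *pgdat)
{
    assert(pgdat != NULL);
    snprintf(pgdat->kswapd_name, sizeof(pgdat->kswapd_name),
             "kswapd%d", pgdat->node_id);
}

/* Models kswapd_init(): one kswapd per node that has memory.
 * Returns the number of threads that would be started. */
static int kswapd_init_model(void)
{
    int started = 0;
    for (int nid = 0; nid < MAX_NUMNODES; nid++) {
        if (!nodes[nid].has_memory)
            continue;              /* memoryless node: no kswapd */
        kswapd_run_model(&nodes[nid]);
        started++;
    }
    return started;
}
```

With nodes 0 and 2 populated and node 1 memoryless, the model starts two threads named kswapd0 and kswapd2. Node 1 gets none, mirroring the N_MEMORY filter.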
When kswapd Is Woken Up
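If an allocation cannot be satisfied without pushing a zone below its low watermark, the kernel wakes kswapd. The fast path in __alloc_pages_nodemask() (__alloc_pages() in newer kernels) first calls get_page_from_freelist() with ALLOC_WMARK_LOW. If that fails, the allocator enters __alloc_pages_slowpath(), which calls wake_all_kswapds(); this in turn calls wakeup_kswapd() for each eligible zone, provided the GFP mask includes __GFP_KSWAPD_RECLAIM.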
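The watermark decision above can be sketched as follows. This is a simplified model, not the allocator: struct zone_model, watermark_ok_model() and alloc_model() are hypothetical. The check mirrors only the core of __zone_watermark_ok(); lowmem reserves, per‑order free lists and most allocation flags are ignored. The slow‑path retry against the min watermark is likewise reduced to a single check.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-zone watermark state (all counts in pages). */
struct zone_model {
    long free_pages;
    long wmark_min, wmark_low, wmark_high;
    bool kswapd_woken;
};

/* Simplified zone_watermark_ok(): after taking 2^order pages the zone
 * must still have more than `mark` free pages. */
static bool watermark_ok_model(const struct zone_model *z, int order, long mark)
{
    long free_after = z->free_pages - ((1L << order) - 1);
    return free_after > mark;
}

/* Returns true if the allocation succeeds. The fast path checks the low
 * watermark; on failure the "slow path" wakes kswapd and retries against
 * the min watermark. */
static bool alloc_model(struct zone_model *z, int order)
{
    assert(order >= 0 && order < 11);
    if (watermark_ok_model(z, order, z->wmark_low)) {   /* ALLOC_WMARK_LOW */
        z->free_pages -= 1L << order;
        return true;
    }
    z->kswapd_woken = true;                              /* wakeup_kswapd() */
    if (watermark_ok_model(z, order, z->wmark_min)) {   /* ALLOC_WMARK_MIN */
        z->free_pages -= 1L << order;
        return true;
    }
    return false;  /* the real slow path would try direct reclaim next */
}
```

With min/low/high set to 100/125/150 pages, an order‑0 request from a zone holding 120 free pages fails the low‑watermark check and wakes kswapd. It still succeeds against the min watermark, leaving 119 pages free.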
kswapd Working Flow
The core of kswapd is the kswapd() function, which runs in a loop after performing several initial steps:
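Bind the thread to the CPUs of its own node (set_cpus_allowed_ptr() with cpumask_of_node()).
Set the thread flags PF_MEMALLOC (lets reclaim dip into memory reserves), PF_SWAPWRITE (allows writing pages to swap), and PF_KSWAPD (marks the task as kswapd).
Reset kswapd_order to 0 and kswapd_highest_zoneidx to MAX_NR_ZONES. On each wakeup, the thread reads the order and highest zone index that wakeup_kswapd() recorded on behalf of the slow‑path allocator.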
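Each iteration starts with kswapd_try_to_sleep(), which places the thread on pgdat->kswapd_wait. If the node is already balanced, kswapd first naps for HZ/10 to guard against sleeping prematurely. If the node is still balanced after the nap, kswapd sleeps until wakeup_kswapd() wakes it; if memory pressure persists, it does not sleep at all. After waking, kswapd reads the requested order and zone index and calls balance_pgdat() to reclaim pages.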
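The wake/sleep handshake above can be modeled deterministically. In this sketch a boolean replaces the wait queue, and a counter replaces the call to balance_pgdat(). struct node_model, wakeup_kswapd_model() and kswapd_iteration_model() are hypothetical. The merge rule in wakeup_kswapd_model() follows wakeup_kswapd(), which keeps the largest requested order and highest zone index. When no zone index was recorded, the sketch assumes a fallback to the highest zone; the kernel instead reuses the previous value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_NR_ZONES 4

/* Hypothetical node state touched by wakeup_kswapd() and kswapd(). */
struct node_model {
    int kswapd_order;            /* largest order requested since last run */
    int kswapd_highest_zoneidx;  /* MAX_NR_ZONES means "nothing recorded" */
    bool wakeup_pending;         /* stands in for a wake-up on kswapd_wait */
    int balance_calls;           /* times balance_pgdat() would have run */
    int last_order, last_zoneidx;
};

/* Models wakeup_kswapd(): merge the request into the node, then wake. */
static void wakeup_kswapd_model(struct node_model *n, int order, int zoneidx)
{
    assert(n != NULL);
    if (n->kswapd_highest_zoneidx == MAX_NR_ZONES ||
        n->kswapd_highest_zoneidx < zoneidx)
        n->kswapd_highest_zoneidx = zoneidx;
    if (n->kswapd_order < order)
        n->kswapd_order = order;
    n->wakeup_pending = true;
}

/* One pass of the kswapd() loop; returns false if kswapd stays asleep. */
static bool kswapd_iteration_model(struct node_model *n)
{
    assert(n != NULL);
    if (!n->wakeup_pending)          /* kswapd_try_to_sleep(): keep sleeping */
        return false;
    n->wakeup_pending = false;

    /* Read and reset the request, as kswapd() does after waking. */
    int order = n->kswapd_order;
    int zoneidx = n->kswapd_highest_zoneidx == MAX_NR_ZONES
                      ? MAX_NR_ZONES - 1 : n->kswapd_highest_zoneidx;
    n->kswapd_order = 0;
    n->kswapd_highest_zoneidx = MAX_NR_ZONES;

    /* balance_pgdat(pgdat, order, zoneidx) would run here. */
    n->balance_calls++;
    n->last_order = order;
    n->last_zoneidx = zoneidx;
    return true;
}
```

Three wakeups that arrive before kswapd runs collapse into a single balance_pgdat() call. That call uses the largest order and highest zone index among the requests.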
LRU Lists
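The kernel maintains five LRU lists for each lruvec: LRU_INACTIVE_ANON, LRU_ACTIVE_ANON, LRU_INACTIVE_FILE, LRU_ACTIVE_FILE, and LRU_UNEVICTABLE. Pages age from the active lists to the inactive lists, and only pages on the inactive lists are candidates for reclamation. A page on an inactive list that is referenced again can be promoted back to the active list. New file pages start on the inactive list. New anonymous pages traditionally started on the active list, though newer kernels place them on the inactive list as well. Unevictable pages, such as mlocked pages, are not scanned for reclaim.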
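The list indices are not arbitrary. In include/linux/mmzone.h, enum lru_list is built from LRU_BASE, LRU_ACTIVE and LRU_FILE, so one bit selects active versus inactive and another selects file versus anon. The enum below reproduces that layout. struct page_model, page_lru_model() and deactivate_model() are hypothetical helpers in the spirit of the kernel's page_lru(). They show how a page's state maps to a list and how deactivation just clears the active bit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Same layout as the kernel's enum lru_list. */
#define LRU_BASE   0
#define LRU_ACTIVE 1
#define LRU_FILE   2

enum lru_list {
    LRU_INACTIVE_ANON = LRU_BASE,
    LRU_ACTIVE_ANON   = LRU_BASE + LRU_ACTIVE,
    LRU_INACTIVE_FILE = LRU_BASE + LRU_FILE,
    LRU_ACTIVE_FILE   = LRU_BASE + LRU_FILE + LRU_ACTIVE,
    LRU_UNEVICTABLE,
    NR_LRU_LISTS
};

/* Hypothetical page state used to choose a list. */
struct page_model {
    bool file;          /* page cache (file-backed) vs. anonymous */
    bool active;
    bool unevictable;   /* e.g. mlocked */
};

static enum lru_list page_lru_model(const struct page_model *p)
{
    assert(p != NULL);
    if (p->unevictable)
        return LRU_UNEVICTABLE;
    return (enum lru_list)(LRU_BASE + (p->file ? LRU_FILE : 0) +
                           (p->active ? LRU_ACTIVE : 0));
}

/* Deactivation clears the active bit: ACTIVE_x -> INACTIVE_x. */
static enum lru_list deactivate_model(enum lru_list lru)
{
    assert(lru != LRU_UNEVICTABLE && lru < NR_LRU_LISTS);
    return (enum lru_list)(lru & ~LRU_ACTIVE);
}
```

This encoding lets reclaim code compute "the inactive list of the same type" with simple arithmetic instead of a lookup table.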
LRU Aging and the Second‑Chance Algorithm
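When kswapd scans an inactive page, page_referenced() walks the page's reverse mappings. For each PTE that maps the page, it tests and clears the accessed ("young") bit (PTE_AF on arm64, _PAGE_ACCESSED on x86). page_check_references() combines that count with the page's PG_referenced flag and returns one of the PAGEREF_* values. An unreferenced page is reclaimed (PAGEREF_RECLAIM). A referenced page gets a second chance: it is either kept on the inactive list for another pass (PAGEREF_KEEP) or moved back to the active list (PAGEREF_ACTIVATE).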
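The decision can be sketched as a simplified model of page_check_references(), loosely following its 4.x/5.x form. The model omits the VM_LOCKED case and the real rmap walk. struct page_ref_model and check_references_model() are hypothetical; referenced_ptes stands for the number of young PTEs that page_referenced() would have found and cleared.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum page_references {
    PAGEREF_RECLAIM,
    PAGEREF_RECLAIM_CLEAN,
    PAGEREF_KEEP,
    PAGEREF_ACTIVATE,
};

/* Hypothetical page state seen by the scanner. */
struct page_ref_model {
    int referenced_ptes;  /* young PTEs found (and cleared) via rmap */
    bool pg_referenced;   /* models the PG_referenced flag */
    bool swap_backed;     /* anonymous or shmem page */
    bool vm_exec;         /* mapped executable */
};

static enum page_references check_references_model(struct page_ref_model *p)
{
    assert(p != NULL);
    bool referenced_page = p->pg_referenced;  /* TestClearPageReferenced() */
    p->pg_referenced = false;
    int ptes = p->referenced_ptes;
    p->referenced_ptes = 0;                   /* young bits are now clear */

    if (ptes > 0) {
        if (p->swap_backed)
            return PAGEREF_ACTIVATE;          /* referenced anon page */
        p->pg_referenced = true;              /* mark it: second chance */
        if (referenced_page || ptes > 1)
            return PAGEREF_ACTIVATE;          /* used more than once */
        if (p->vm_exec)
            return PAGEREF_ACTIVATE;          /* executable file page */
        return PAGEREF_KEEP;                  /* one more inactive pass */
    }
    if (referenced_page && !p->swap_backed)
        return PAGEREF_RECLAIM_CLEAN;         /* reclaim only if clean */
    return PAGEREF_RECLAIM;
}
```

A file page touched once is kept (PAGEREF_KEEP) and marked PG_referenced. If it is touched again before the next scan, it is activated. A page nobody touched is reclaimed.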
Zone‑Based to Node‑Based LRUs (Kernel 4.8+)
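Before kernel 4.8, each zone had its own LRU lists protected by zone->lru_lock. Starting with kernel 4.8, the LRUs are managed per node and protected by pgdat->lru_lock. This simplifies locking and makes pages from all zones of a node age on the same lists. When reclaim targets only lower zones, pages from higher zones are skipped during isolation rather than reclaimed. In later kernels (5.11 and newer), the lock moved again, into each lruvec.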
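With one list per node, reclaim for a low zone has to filter by zone. The kernel's isolate_lru_pages() does this by skipping pages whose zone index exceeds the scan's reclaim_idx. The sketch below models only that filter. The zone constants Z_DMA32, Z_NORMAL and Z_MOVABLE, struct lru_page_model and isolate_model() are hypothetical, and locking is reduced to a comment.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical zone indices, lowest first. */
enum { Z_DMA32, Z_NORMAL, Z_MOVABLE };

/* A page on the node-wide inactive list still remembers its zone. */
struct lru_page_model { int zone; };

/* Walk the list from the tail (oldest pages first) and isolate up to
 * nr_to_scan pages, skipping pages from zones above reclaim_idx.
 * In the kernel this walk runs under the node's LRU lock. Returns the
 * number isolated; *skipped receives the number of ineligible pages. */
static size_t isolate_model(const struct lru_page_model *lru, size_t len,
                            int reclaim_idx, size_t nr_to_scan,
                            size_t *skipped)
{
    size_t taken = 0;
    assert(lru != NULL && skipped != NULL);
    *skipped = 0;
    for (size_t i = len; i > 0 && taken < nr_to_scan; i--) {
        if (lru[i - 1].zone > reclaim_idx) {
            (*skipped)++;        /* ineligible zone: leave it on the list */
            continue;
        }
        taken++;                 /* eligible: would move to a private list */
    }
    return taken;
}
```

For a five‑page list containing two ZONE_MOVABLE pages, a reclaim limited to Z_NORMAL isolates three pages and skips two. A reclaim that may use every zone isolates all five.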
balance_pgdat()
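The function balance_pgdat() drives the reclamation cycle. It fills in a struct scan_control that records how many pages to reclaim, the allocation's GFP mask, the highest eligible zone (reclaim_idx), and the scan priority. The priority starts at DEF_PRIORITY (12) and is lowered whenever a pass does not make enough progress. Each LRU scan covers roughly lru_size >> priority pages, so every step down doubles the scan size. Helper functions such as pgdat_balanced(), zone_watermark_ok(), and high_wmark_pages() decide whether the node is balanced, that is, whether an eligible zone has free pages above its high watermark.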
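The priority ramp can be made concrete with a small model. At priority 12, a 1,000,000‑page LRU yields a scan of 1000000 >> 12 = 244 pages. This is a sketch under loud simplifications: struct node_reclaim_model and balance_model() are hypothetical, there is a single LRU and a single watermark, every scanned page is assumed to be reclaimed, and the priority drops on every unbalanced pass.

```c
#include <assert.h>
#include <stddef.h>

#define DEF_PRIORITY 12

/* Hypothetical node: one reclaimable LRU and one high watermark. */
struct node_reclaim_model {
    long lru_pages;    /* reclaimable pages on the LRU */
    long free_pages;
    long high_wmark;
};

/* Simplified balance_pgdat(): scan lru_pages >> priority pages per pass,
 * lowering the priority until the node is balanced or priority < 1.
 * Assumes every scanned page is reclaimed. Returns the final priority. */
static int balance_model(struct node_reclaim_model *n)
{
    assert(n != NULL);
    int priority = DEF_PRIORITY;
    do {
        if (n->free_pages > n->high_wmark)   /* pgdat_balanced() */
            break;
        long nr_scan = n->lru_pages >> priority;
        n->lru_pages -= nr_scan;
        n->free_pages += nr_scan;
        priority--;                          /* scan twice as much next pass */
    } while (priority >= 1);
    return priority;
}
```

Starting from 1,000,000 LRU pages, no free pages and a 1,000‑page high watermark, the passes scan 244, 488 and 975 pages. The node balances at 1,707 free pages with the priority at 9.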
shrink_lruvec()
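shrink_lruvec() (named shrink_node_memcg() in older kernels) scans both the active and inactive LRUs of one lruvec. It first calls get_scan_count() to decide how many pages to scan from each list. This weights anonymous pages against file pages using vm.swappiness (default 60) and recent reclaim statistics. When the inactive list becomes too small relative to the active list, shrink_active_list() moves active pages to the inactive list. Inactive pages are handed to shrink_inactive_list() and reclaimed if they have not been referenced.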
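A rough model of how swappiness splits the scan is shown below. It uses the weighting found in older kernels (roughly up to 5.7), where the anonymous weight is swappiness and the file weight is 200 minus swappiness. It deliberately ignores the recent scanned/rotated feedback and the special cases such as swappiness 0 or no swap. scan_count_model() is hypothetical, and newer kernels use a cost‑based variant with swappiness ranging up to 200.

```c
#include <assert.h>

/* Simplified get_scan_count() split (older-kernel weighting, feedback
 * terms ignored): anon weight = swappiness, file weight = 200 - swappiness. */
static void scan_count_model(long anon_pages, long file_pages, int priority,
                             int swappiness, long *nr_anon, long *nr_file)
{
    assert(swappiness >= 0 && swappiness <= 100);
    long anon_prio = swappiness;
    long file_prio = 200 - swappiness;
    long denom = anon_prio + file_prio;   /* always 200 in this model */
    *nr_anon = (anon_pages >> priority) * anon_prio / denom;
    *nr_file = (file_pages >> priority) * file_prio / denom;
}
```

With 409,600 anonymous and 409,600 file pages at priority 12, each list contributes a base of 100 pages. The default swappiness of 60 splits that into 30 anonymous and 70 file pages, while swappiness 100 gives 50 of each.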
shrink_slab()
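Kernel caches such as the inode and dentry caches live in slab memory and are reclaimed through the shrinker infrastructure. A cache registers a struct shrinker with register_shrinker() (shrinker_alloc() and shrinker_register() in recent kernels) and supplies count_objects and scan_objects callbacks. During reclaim, shrink_node() calls shrink_slab(). shrink_slab() walks the registered shrinkers (the global shrinker_list, plus per‑memcg shrinkers for memcg‑aware caches) and calls do_shrink_slab() for each. do_shrink_slab() sizes the scan from the object count, the reclaim priority, and the shrinker's seeks value, then calls scan_objects in batches.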
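The count/scan contract can be sketched in user space. SHRINK_BATCH (128), SHRINK_STOP and a default seeks value of 2 match the kernel's constants. Everything else is hypothetical, including the *_model structs, the toy cache, and do_shrink_model(). The sizing follows the shape of newer do_shrink_slab() (freeable >> priority, times 4, divided by seeks), but it omits deferred work (nr_deferred) and memcg handling.

```c
#include <assert.h>

#define SHRINK_BATCH 128
#define SHRINK_STOP  (~0UL)

struct shrink_control_model { unsigned long nr_to_scan; };

/* Hypothetical shrinker mirroring the count_objects/scan_objects split. */
struct shrinker_model {
    unsigned long (*count_objects)(struct shrinker_model *, struct shrink_control_model *);
    unsigned long (*scan_objects)(struct shrinker_model *, struct shrink_control_model *);
    int seeks;                 /* cost to recreate an object; 2 by default */
    unsigned long cache_size;  /* private state of the toy cache */
};

/* Toy cache: count reports its size, scan frees up to nr_to_scan objects. */
static unsigned long toy_count(struct shrinker_model *s, struct shrink_control_model *sc)
{
    (void)sc;
    return s->cache_size;
}

static unsigned long toy_scan(struct shrinker_model *s, struct shrink_control_model *sc)
{
    unsigned long n = sc->nr_to_scan < s->cache_size ? sc->nr_to_scan : s->cache_size;
    s->cache_size -= n;
    return n;
}

/* Simplified do_shrink_slab(): size the scan, then call scan_objects
 * in batches of SHRINK_BATCH. Returns the number of objects freed. */
static unsigned long do_shrink_model(struct shrinker_model *s, int priority)
{
    struct shrink_control_model sc = { 0 };
    unsigned long freed = 0;
    assert(s->seeks > 0);

    unsigned long freeable = s->count_objects(s, &sc);
    if (freeable == 0)
        return 0;
    unsigned long total_scan = (freeable >> priority) * 4 / (unsigned long)s->seeks;
    if (total_scan > freeable * 2)
        total_scan = freeable * 2;

    while (total_scan >= SHRINK_BATCH || total_scan >= freeable) {
        unsigned long nr = total_scan < SHRINK_BATCH ? total_scan : SHRINK_BATCH;
        sc.nr_to_scan = nr;
        unsigned long ret = s->scan_objects(s, &sc);
        if (ret == SHRINK_STOP)
            break;             /* shrinker cannot make progress now */
        freed += ret;
        total_scan -= nr;
    }
    return freed;
}
```

For a 100,000‑object cache, priority 12 yields a 48‑object scan, which is below one batch, so nothing is freed here. The real kernel defers that work instead. At priority 6 the scan is 3,124 objects, which runs 24 full batches and frees 3,072.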
Conclusion
Memory reclamation in the Linux kernel is a complex, multi‑stage process that balances page aging, zone watermarks, and cache shrinkers. The kswapd thread orchestrates these mechanisms, and ongoing kernel development continues to refine the algorithms for better performance on both mobile devices and servers.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
OPPO Kernel Craftsman
Sharing Linux kernel-related cutting-edge technology, technical articles, technical news, and curated tutorials
