Understanding Linux Swap: Partitions, Data Structures, Swap Out/In Processes, and Optimizations
Linux swap moves anonymous pages to a dedicated swap area—either a partition or a file—using structures such as swap_info_struct, swap_map and swp_entry_t, with a two‑pass swap‑out, per‑CPU slot cache, and SSD‑focused optimizations like clustering and readahead to improve performance.
Due to the large performance gap between memory and disk I/O, Linux uses free memory to cache disk data when memory is abundant, improving I/O speed. When memory becomes scarce, these caches are reclaimed and dirty pages are written back to disk. Anonymous pages such as heap and stack have no backing file, so they are swapped out to a dedicated swap area on disk.
Linux supports two forms of swap: a swap partition (swap disk) and a swap file. The former is a raw block device reserved for swapping, while the latter is a regular file stored on a filesystem, whose implementation depends on the underlying filesystem.
Figure 1: Two types of swap area.
A swap area is created with the mkswap command, which formats a swap partition or file. The area can then be enabled or disabled with swapon and swapoff. Current swap usage can be inspected via cat /proc/swaps or swapon -s.
Figure 2: Creation and usage of a swap area.
The kernel manages each swap area with a swap_info_struct structure. A swap area is divided into fixed‑size swap slots, and a swap_map records the usage of each slot (0 = free, >0 = occupied by one or more processes).
Figure 3: swap_info_struct layout.
When a memory page is reclaimed, it is assigned to a swap slot and its page‑table entry (PTE) is replaced with a swp_entry_t value. swp_entry_t is a 64‑bit field where bits 2‑7 store the swap type and bits 8‑57 store the slot offset within the partition. During a swap‑in, the kernel reads the PTE, extracts the slot, and fetches the data from disk.
Figure 4: swp_entry_t structure.
Swap‑out process
The kernel’s memory reclamation eventually reaches shrink_page_list, which walks a list of candidate pages and attempts to reclaim them. Anonymous pages undergo two shrink passes:
First shrink: the page is handed to add_to_swap, which places it into the swap cache; it is marked dirty and written back to its slot, but not yet freed.
Second shrink: if the page is already clean, it is removed from the swap cache and the memory is released.
Figure 5: Swap‑out flow.
Swap slot cache
To accelerate slot allocation, the kernel maintains a per‑CPU swap slot cache consisting of an alloc and a free cache. This reduces contention on the global swap_map.
Figure 6: Swap slot cache structure.
SSD‑oriented optimizations
For SSD devices, the kernel introduces swap_cluster_info. Consecutive slots are grouped into clusters of SWAPFILE_CLUSTER slots (256 by default). During swap‑out, the kernel searches for free slots at the cluster level, reducing lock contention and improving wear‑leveling on SSDs.
Figure 7: swap_cluster_info layout.
Swap‑in process
When a swapped‑out page triggers a page‑fault, the kernel’s do_swap_page locates the corresponding slot and reads the data back into memory.
Figure 8: Swap‑in flow.
The kernel also implements a readahead mechanism called swapin_readahead. Similar to normal I/O readahead, it pre‑fetches additional pages beyond the one that faulted. On SSDs, both physical‑address‑based and VMA‑based readahead are available, and the choice between them can be toggled via /sys/kernel/mm/swap/vma_ra_enabled.
Figure 9: swapin_readahead illustration.
Conclusion
This article gave a concise overview of Linux’s swap mechanism, including partition/file creation, core data structures, the swap‑out/in workflows, and performance‑enhancing techniques such as the per‑CPU slot cache and SSD‑aware clustering. As SSDs become more prevalent, these optimizations significantly improve swap performance and open new avenues for further enhancement.
OPPO Kernel Craftsman
Sharing Linux kernel-related cutting-edge technology, technical articles, technical news, and curated tutorials