How TencentOS Engineers Revamped Linux Swap for 5‑20% Performance Gains
This article translates and consolidates three LWN analyses of the Linux swap subsystem modernization led by TencentOS kernel engineer Kairui Song. It covers the introduction of swap tables, the removal of the swap map, virtual swap concepts, the code changes involved, performance improvements of up to 20%, and the broader impact on the kernel community.
The Linux kernel’s swap subsystem has long been a complex and performance‑critical part of memory management. Over the past 18 months, TencentOS kernel engineer Kairui Song and collaborators have undertaken a systematic redesign that introduces a new swap table, removes the legacy swap map, and explores virtual swap concepts.
Background
Traditional swap handling relied on the XArray‑based swapper_spaces structure and a per‑slot swap_map byte array, leading to high lock contention and memory overhead. The first phase, merged into Linux 6.18, replaced the XArray with a swap table and introduced swap_cluster_info to improve locality and reduce metadata usage.
1. Introducing the Swap Table
Each swap slot is now represented by a 64‑bit (or 32‑bit on 32‑bit arches) entry in a dynamically allocated array:
typedef struct { unsigned long val; } swp_entry_t;
The array pointer is stored in the new table field:
atomic_long_t __rcu *table;
This design eliminates the need for the XArray lookup, reduces per‑slot memory from ~30 % to a few bytes, and allows the kernel to allocate the array lazily – only when a cluster is actually used.
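The lazy allocation described above can be modelled in user space. The following is a minimal sketch, not the kernel implementation: struct cluster_model, SLOTS_PER_CLUSTER, and the helper names are hypothetical, the cluster size of 512 slots is an assumption, and the single-threaded first-use check stands in for whatever locking the kernel applies.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>

#define SLOTS_PER_CLUSTER 512   /* assumed cluster size, for illustration only */

/* Hypothetical user-space stand-in for a swap cluster. */
struct cluster_model {
    atomic_ulong *table;        /* NULL until the cluster is first used */
};

/* Allocate the per-cluster table on first use; returns 0 on success. */
static int cluster_table_get(struct cluster_model *ci)
{
    if (ci->table)
        return 0;

    atomic_ulong *t = malloc(SLOTS_PER_CLUSTER * sizeof(*t));
    if (!t)
        return -1;
    for (size_t i = 0; i < SLOTS_PER_CLUSTER; i++)
        atomic_init(&t[i], 0UL);    /* 0 means "empty slot" */
    ci->table = t;
    return 0;
}

/* Store an entry, allocating the table if this is the first store. */
static int slot_store(struct cluster_model *ci, unsigned int off,
                      unsigned long val)
{
    assert(off < SLOTS_PER_CLUSTER);
    if (cluster_table_get(ci) != 0)
        return -1;
    atomic_store(&ci->table[off], val);
    return 0;
}

/* Load an entry; an unallocated table reads as all-empty. */
static unsigned long slot_load(struct cluster_model *ci, unsigned int off)
{
    assert(off < SLOTS_PER_CLUSTER);
    if (!ci->table)
        return 0;
    return atomic_load(&ci->table[off]);
}

/* Release the table once the cluster is no longer in use. */
static void cluster_table_free(struct cluster_model *ci)
{
    free(ci->table);
    ci->table = NULL;
}
```

In this model a cluster that never receives a store costs only one pointer, which is the point of allocating the array lazily.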
Swap Entry Layout
Bits in the entry encode the slot state:
0 – empty slot (NULL)
1 – shadow entry for a swapped‑out folio (high bits store reference count)
2 – resident folio (high bits store PFN)
3 – unused pointer entry
4 – bad slot marker
By moving reference‑count tracking into the table, the separate swap map can be removed entirely.
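To make the idea of a tagged entry concrete, here is a small sketch of packing a state and a payload into one unsigned long. The state numbering follows the list above, but the bit positions are an assumption: the text does not give them, so this sketch simply puts the state in the low three bits and the payload (reference count or PFN) in the remaining high bits. All names are hypothetical.

```c
#include <assert.h>

/* State numbering taken from the list above. */
enum slot_state {
    SLOT_EMPTY   = 0,
    SLOT_SHADOW  = 1,   /* payload: reference count */
    SLOT_FOLIO   = 2,   /* payload: PFN */
    SLOT_POINTER = 3,
    SLOT_BAD     = 4,
};

#define STATE_BITS 3UL                          /* assumption: low 3 bits */
#define STATE_MASK ((1UL << STATE_BITS) - 1)

/* Pack a state and its payload into a single table entry. */
static unsigned long entry_make(enum slot_state st, unsigned long payload)
{
    assert(payload <= (~0UL >> STATE_BITS));    /* payload must fit */
    return (payload << STATE_BITS) | (unsigned long)st;
}

/* Extract the state from an entry. */
static enum slot_state entry_state(unsigned long e)
{
    return (enum slot_state)(e & STATE_MASK);
}

/* Extract the payload (count or PFN) from an entry. */
static unsigned long entry_payload(unsigned long e)
{
    return e >> STATE_BITS;
}
```

With this packing an empty slot is the all-zero word, which matches the "empty slot (NULL)" row of the list.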
2. Removing the Swap Map
The legacy swap_map was an unsigned char * array storing per‑slot usage counts and special bits such as SWAP_HAS_CACHE (0x40). Its removal eliminates the extra byte‑per‑slot overhead and the complex bit‑lock logic used for swap‑in synchronization.
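For readers unfamiliar with the legacy format, the sketch below shows how one swap_map byte could be split into a cache flag and a usage count. Only the SWAP_HAS_CACHE value comes from the text. Treating the low six bits as the count is a simplifying assumption that ignores the other special values the kernel reserved, and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define SWAP_HAS_CACHE 0x40     /* flag bit named in the text above */

/* True when the slot also has a page in the swap cache. */
static bool map_has_cache(unsigned char v)
{
    return (v & SWAP_HAS_CACHE) != 0;
}

/* Usage count, assuming it lives in the bits below SWAP_HAS_CACHE. */
static unsigned char map_count(unsigned char v)
{
    return (unsigned char)(v & (SWAP_HAS_CACHE - 1));
}
```

Under these assumptions the byte 0x41 reads as "one user, cached" and 0x02 as "two users, not cached".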
Performance measurements reported by the author show throughput and RPS improvements of roughly 5‑20 % after the first phase, mainly due to the elimination of XArray lookups and reduced lock contention.
3. Virtual Swap and GhostSwap
Beyond the swap table, the community is exploring virtual swap layers. Meta proposes a virtual swap space that abstracts physical devices, while TencentOS introduced a Virtual GhostSwap implementation based on Google’s GhostSwap idea. Both approaches use a unified swp_desc structure:
struct swp_desc {
    union {
        swp_slot_t slot;
        struct zswap_entry *zswap_entry;
    };
    union {
        struct folio *swap_cache;
        void *shadow;
    };
    unsigned int swap_count;
    unsigned short memcgid:16;
    bool in_swapcache:1;
    enum swap_type type:2;
};
This structure can represent a real device slot, a zero‑filled page, a zswap entry, or a resident folio, enabling flexible migration between devices and eliminating the need to scan page tables when a swap device is removed.
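The reason page tables need not be scanned is that they would refer to the descriptor, not to the backing store, so only the descriptor has to change when data moves. The sketch below illustrates that with a reduced model of the structure. The enum value names, swp_desc_model, the swp_slot_t stand-in, and both helpers are hypothetical and are not taken from any posted patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-ins so the sketch compiles outside the kernel. */
typedef struct { unsigned long val; } swp_slot_t;
struct zswap_entry;                         /* opaque here */

/* Hypothetical names for the four backends listed above. */
enum swap_type {
    SWAP_TYPE_SLOT,     /* real device slot */
    SWAP_TYPE_ZERO,     /* zero-filled page */
    SWAP_TYPE_ZSWAP,    /* compressed in zswap */
    SWAP_TYPE_FOLIO,    /* resident folio */
};

/* Reduced model: only the fields needed to show retargeting. */
struct swp_desc_model {
    union {
        swp_slot_t slot;
        struct zswap_entry *zswap_entry;
    };
    unsigned int swap_count;
    enum swap_type type;
};

/* Point the descriptor at a physical slot; users of the descriptor
 * keep their reference and swap_count is left untouched. */
static void desc_move_to_slot(struct swp_desc_model *d, swp_slot_t slot)
{
    d->slot = slot;
    d->type = SWAP_TYPE_SLOT;
}

/* True when the data currently lives on a swap device. */
static bool desc_on_device(const struct swp_desc_model *d)
{
    return d->type == SWAP_TYPE_SLOT;
}
```

Removing a device would then amount to retargeting every descriptor that still points at it, rather than walking every process's page tables.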
Design Trade‑offs
The virtual‑swap design increases per‑entry memory from 8 bytes to up to 32 bytes and adds complexity, but it simplifies device removal and supports tiered swap configurations (e.g., fast NVMe tier + slower HDD tier) as proposed by Youngjun Park.
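To put the 8-byte versus 32-byte figures in perspective, the following sketch computes total metadata for a given swap size. The function name and the example parameters (16 GiB of swap, 4 KiB pages) are assumptions chosen for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Metadata bytes needed to describe every slot of a swap area. */
static uint64_t swap_metadata_bytes(uint64_t swap_bytes,
                                    uint64_t page_size,
                                    uint64_t bytes_per_entry)
{
    assert(page_size != 0);
    return (swap_bytes / page_size) * bytes_per_entry;
}
```

For 16 GiB of swap with 4 KiB pages there are 4,194,304 slots, so 8 bytes per entry costs 32 MiB while 32 bytes per entry costs 128 MiB.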
4. Community Impact
The patches have been reviewed on the Linux‑MM mailing list, with contributions from Google’s Chris Li and Meta’s Nhat Pham. Discussions cover performance regressions, memory‑usage concerns, and the interaction with existing subsystems such as zswap and swap‑cgroup. The work also paves the way for future extensions, including integrating memory‑controller limits into the swap table.
5. Future Outlook
Stage 3 of the swap‑table project aims to eliminate the remaining swap‑map responsibilities entirely. Additional work on MGLRU page‑reclaim logic shows up to 30 % performance gains on HDD‑bound workloads and significant OOM reductions. Patches beyond the already merged first phase are awaiting final integration into the mainline kernel.
Tencent Technical Engineering
Official account of Tencent Technology. A platform for publishing and analyzing Tencent's technological innovations and cutting-edge developments.
