Evolution from Radix Tree to XArray in the Linux Kernel
The Linux kernel’s page‑cache indexing migrated from the classic radix‑tree structure—using shift‑based multi‑way nodes, RCU‑protected insert, lookup, and delete operations—to the newer XArray API introduced in 4.20, which retains the radix layout while offering automatic resizing, built‑in locking, richer marking, and a cleaner, safer interface.
This article explains the evolution of the kernel data structure used for the page cache, from the traditional radix tree to the newer XArray implementation introduced in Linux 4.20.
1. Page Cache – When a CPU needs to read a file from disk, the file data is first copied into RAM. The kernel uses a portion of free memory, called the page cache, to store these copies. The address_space structure manages all pages cached for a file, and i_mapping in the inode points to this structure. The page cache works on a demand‑paging basis, similar to CPU hardware caches.
2. Radix Tree Basics – A radix tree is a multi‑way search tree that stores key‑value pairs using a long integer key. It is used in the kernel for page cache indexing, IDR, and other subsystems. Important fields of a radix‑tree node include shift , offset , count , exceptional , parent , slots (a pointer array), and tags . The tree depth is determined by RADIX_TREE_MAP_SHIFT (usually 6, giving 64 slots per node).
3. Radix Tree Operations
Insertion – The kernel exports EXPORT_SYMBOL(radix_tree_insert); . The insertion algorithm creates the necessary nodes based on the index, then stores the entry in the appropriate slot.
EXPORT_SYMBOL(radix_tree_insert);
// ① Create tree nodes according to the index
// ② Insert the entry into the node's slotLookup – The lookup routine walks the tree from the root, shifting the index according to the node's shift value until it reaches a leaf slot. If the index is out of the current tree range, the item is not present.
// Example lookup flow:
// ① Compute max index range; if index > max, not found
// ② Descend nodes using radix_tree_descend (shift operations)
// ③ Return the entry found in the leaf slotDeletion – Deletion uses EXPORT_SYMBOL(radix_tree_delete_item); . It first looks up the entry, clears the slot, updates the slot count, and may shrink the tree if a node becomes empty.
EXPORT_SYMBOL(radix_tree_delete_item);
// ① Find the entry via lookup
// ② Clear the slot (set to NULL) and decrement count
// ③ If the node becomes empty, call radix_tree_shrink to reduce height4. RCU Mechanism – Radix trees are protected by Read‑Copy‑Update (RCU) to allow lock‑free reads. Readers use rcu_read_lock() / rcu_read_unlock() , and updates are performed on a new copy of the data before publishing the new pointer with memory barriers.
5. XArray Replacement – XArray is a redesigned API that keeps the underlying radix‑tree layout but presents it as an automatically resizing array. It provides built‑in locking, removes the pre‑load mechanism, and adds a richer marking system (up to three marks per entry). Example XArray functions:
void xa_init(struct xarray *xa);
void *xa_store(struct xarray *xa, unsigned long index, void *entry, gfp_t gfp);
void *xa_erase(struct xarray *xa, unsigned long index);
void *xa_cmpxchg(struct xarray *xa, unsigned long index, void *old, void *entry, gfp_t gfp);
void *xa_load(struct xarray *xa, unsigned long index);The lookup path converts an xarray node to an xa_state , then uses xas_descend (similar to radix_tree_descend ) to shift the index and locate the entry.
6. Maple Tree – The article also mentions the maple tree, another kernel data structure that offers a simple API and good performance for contiguous ranges. It is intended to replace both red‑black trees and radix trees in some subsystems.
7. Conclusion – XArray has largely replaced the radix tree for page‑cache management, providing a cleaner API and better RCU safety. The kernel source ( include/linux/radix-tree.h , include/linux/xarray.h ) can be consulted for the exact definitions.
OPPO Kernel Craftsman
Sharing Linux kernel-related cutting-edge technology, technical articles, technical news, and curated tutorials
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.