Exploring eBPF‑Based Programmable Memory Management in the Linux Kernel

This article examines recent efforts to make Linux kernel memory management programmable with eBPF. It covers the BPF‑MM patches for mTHP order selection, the customizable LRU eviction of cache_ext, FetchBPF prefetch policies, and BPF OOM hooks, and discusses their design, implementation details, and performance impact.

Linux Kernel Journey
eBPF already provides significant flexibility for customizing kernel scheduler policies. This article continues that exploration by turning to programmable memory management.

Over the past year, both academia and industry have worked on programmable memory management, aiming to tailor policies such as mTHP order selection, LRU eviction, prefetching, and OOM handling to specific workloads.

Yafang Shao submitted the "mm, bpf: BPF‑MM, BPF‑THP" patchset, which introduces a struct bpf_thp_ops that lets a BPF program specify the allocation order for multi‑size transparent huge pages (mTHP). The hook modifies thp_vma_allowable_orders(), enabling per‑workload decisions for anonymous page faults, swap‑in, and other paths.
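To make the idea of "a program chooses the order" concrete, here is a minimal user‑space sketch in C. It is not the patchset's API: struct thp_request, enum fault_kind, pick_thp_orders() and the thresholds are all hypothetical names and values invented for illustration. The only idea taken from the text above is that a policy narrows the set of allowable orders for a given fault path. Orders are modelled as a bitmask in which bit n means "order n is allowed", and 4 KiB base pages are assumed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of the path asking for a huge page. */
enum fault_kind { FAULT_ANON, FAULT_SWAPIN };

/* Hypothetical request context; not a kernel structure. */
struct thp_request {
    enum fault_kind kind;     /* which path is allocating */
    uint64_t vma_bytes;       /* size of the mapping being faulted */
    uint32_t allowed_orders;  /* bit n set => order n passed the usual checks */
};

/* Return the highest allowed order that is <= max_order, or -1 if none. */
static int highest_order_at_most(uint32_t orders, int max_order)
{
    for (int order = max_order; order >= 0; order--)
        if (orders & ((uint32_t)1 << order))
            return order;
    return -1;
}

/*
 * Example policy (an assumption, not the patchset's behaviour):
 * large anonymous mappings may use up to order 9 (2 MiB with 4 KiB pages),
 * everything else, including swap-in, is capped at order 4 (64 KiB).
 * The result is a bitmask holding at most one order.
 */
uint32_t pick_thp_orders(const struct thp_request *req)
{
    int cap, order;

    assert(req != NULL);
    if (req->kind == FAULT_SWAPIN)
        cap = 4;
    else if (req->vma_bytes >= ((uint64_t)2 << 20))
        cap = 9;
    else
        cap = 4;

    order = highest_order_at_most(req->allowed_orders, cap);
    return order < 0 ? 0 : (uint32_t)1 << order;
}
```

The policy only ever narrows the mask it is given, so it cannot enable an order that the existing checks rejected. Whether the real hook enforces the same constraint is not stated above.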

This offers greater flexibility than the earlier sysfs interface for controlling mTHP order, which was harder to adapt and drew criticism from the community.

At SOSP ’25, researchers from Columbia University and IBM presented "cache_ext: Customizing the Page Cache with eBPF"[2][3], which hooks the LRU eviction algorithm to provide custom behavior.

The provided eviction‑list API mirrors familiar LRU operations such as adding and removing folios, marking accesses, and iterating over folios. Using this API, a typical LFU policy can be implemented as follows.

When the policy loads, lfu_policy_init() creates a new eviction list. During insertion, lfu_folio_added() uses list_add() to append the folio and initializes its frequency to 1 in an internal map. On access, lfu_folio_accessed() increments the frequency. During eviction, lfu_evict_folios() iterates over N nodes, invoking score_lfu() to score each folio by access frequency, selects the lowest‑scoring C candidates, and adds them to ctx->candidates. Non‑selected folios are moved to the list tail, and after successful reclamation, lfu_folio_removed() cleans up metadata.
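The flow just described can be sketched as a small user‑space simulation in C. This is not cache_ext code: the function names echo the ones above, but the signatures, struct lfu_list, the fixed‑size arrays standing in for the eviction list and the frequency map, and the tie‑breaking rule are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_FOLIOS 64

/* Hypothetical stand-in for an eviction list plus its frequency map. */
struct lfu_list {
    int ids[MAX_FOLIOS];        /* folio identifiers, list head at index 0 */
    unsigned freq[MAX_FOLIOS];  /* access count for the folio at each index */
    size_t len;
};

void lfu_policy_init(struct lfu_list *l) { l->len = 0; }

static void lfu_append(struct lfu_list *l, int id, unsigned freq)
{
    l->ids[l->len] = id;
    l->freq[l->len] = freq;
    l->len++;
}

/* Insertion: append at the tail with an initial frequency of 1. */
int lfu_folio_added(struct lfu_list *l, int id)
{
    if (l->len == MAX_FOLIOS)
        return -1;
    lfu_append(l, id, 1);
    return 0;
}

/* Access: bump the folio's frequency if it is tracked. */
void lfu_folio_accessed(struct lfu_list *l, int id)
{
    for (size_t i = 0; i < l->len; i++)
        if (l->ids[i] == id) {
            l->freq[i]++;
            return;
        }
}

/*
 * Eviction: scan the first scan_n entries, pick the `want` entries with the
 * lowest frequency (ties go to the entry nearest the head), write their ids
 * to out[], and move the scanned survivors to the tail. Returns the number
 * of candidates written. Dropping an entry here also drops its frequency,
 * which plays the role of the metadata cleanup step.
 */
size_t lfu_evict_folios(struct lfu_list *l, size_t scan_n, size_t want, int *out)
{
    int taken[MAX_FOLIOS] = {0};
    struct lfu_list next = {0};
    size_t n_out = 0;

    assert(out != NULL);
    if (scan_n > l->len)
        scan_n = l->len;
    if (want > scan_n)
        want = scan_n;

    for (size_t c = 0; c < want; c++) {
        size_t best = scan_n;  /* sentinel: nothing chosen yet */
        for (size_t i = 0; i < scan_n; i++)
            if (!taken[i] && (best == scan_n || l->freq[i] < l->freq[best]))
                best = i;
        taken[best] = 1;
        out[n_out++] = l->ids[best];
    }

    /* Unscanned entries keep their order; scanned survivors go to the tail. */
    for (size_t i = scan_n; i < l->len; i++)
        lfu_append(&next, l->ids[i], l->freq[i]);
    for (size_t i = 0; i < scan_n; i++)
        if (!taken[i])
            lfu_append(&next, l->ids[i], l->freq[i]);

    *l = next;
    return n_out;
}
```

Scanning only the first N entries rather than the whole list keeps each eviction pass bounded. The cost is that the chosen victims are the least frequently used within the scanned window, not within the whole list.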

The paper reports that customizing LRU eviction improves performance for several workloads.

At USENIX ATC 2024, researchers from the University of British Columbia and Wake Forest presented "FetchBPF: Customizable Prefetching Policies in Linux with eBPF"[4]. It defines three helpers (bpf_prefetch_physical_page, bpf_prefetch_virtual_page, and bpf_<start/stop>_block_plug) for rapidly prototyping prefetch algorithms and comparing their effectiveness across different workload patterns.
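As an illustration of the kind of policy such helpers are meant to host, the following user‑space C sketch implements a simple stride prefetcher. It does not use the FetchBPF helpers, whose signatures are not given here. issue_prefetch() is a hypothetical stub that records requests where a real policy would presumably call something like bpf_prefetch_virtual_page, and struct prefetch_state and on_page_fault() are likewise invented for this example.

```c
#include <assert.h>
#include <stddef.h>

#define PREFETCH_LOG_MAX 32

/* Hypothetical per-stream state kept by the policy. */
struct prefetch_state {
    long last_page;             /* previously faulted page number, -1 if none */
    long last_stride;           /* distance between the last two faults */
    long log[PREFETCH_LOG_MAX]; /* pages the policy asked to prefetch */
    size_t log_len;
};

void prefetch_state_init(struct prefetch_state *s)
{
    s->last_page = -1;
    s->last_stride = 0;
    s->log_len = 0;
}

/* Hypothetical stand-in for a prefetch helper: it only records the request. */
static void issue_prefetch(struct prefetch_state *s, long page)
{
    if (page >= 0 && s->log_len < PREFETCH_LOG_MAX)
        s->log[s->log_len++] = page;
}

/*
 * Called once per fault. When two consecutive faults show the same non-zero
 * stride, request the next `depth` pages along that stride. Returns how many
 * requests were recorded for this fault.
 */
size_t on_page_fault(struct prefetch_state *s, long page, size_t depth)
{
    size_t before;

    assert(s != NULL);
    before = s->log_len;
    if (s->last_page >= 0) {
        long stride = page - s->last_page;

        if (stride != 0 && stride == s->last_stride)
            for (size_t i = 1; i <= depth; i++)
                issue_prefetch(s, page + (long)i * stride);
        s->last_stride = stride;
    }
    s->last_page = page;
    return s->log_len - before;
}
```

With faults on pages 10, 12, and 14 and a depth of 2, the third fault confirms a stride of 2 and the sketch requests pages 16 and 18. A subsequent fault on page 15 breaks the pattern and requests nothing.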

Roman Gushchin contributed the "mm: BPF OOM" patchset[5], which hooks two points. The first is the Pressure Stall Information (PSI) subsystem, via int bpf_handle_psi_event(struct psi_trigger *t), which lets a BPF program assess system state and decide on OOM events. The second is the OOM killer, via int bpf_handle_out_of_memory(struct oom_control *oc), which enables custom kill decisions while falling back to the default killer if memory cannot be freed.
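The "custom decision first, default killer as fallback" structure can be sketched in plain user‑space C. Nothing below is the patchset's code: struct task_info, struct mock_oom_control, the low_priority flag, and both selection rules are hypothetical, and "the custom policy found no victim" is used here as a simplified stand‑in for "memory could not be freed".

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical view of a candidate task. */
struct task_info {
    int pid;
    unsigned long rss_pages;  /* resident memory, in pages */
    int low_priority;         /* 1 if the workload marked it expendable */
};

/* Hypothetical stand-in for the OOM context; not the kernel's oom_control. */
struct mock_oom_control {
    const struct task_info *tasks;
    size_t ntasks;
};

/* Custom policy: the largest task among those marked low priority, or -1. */
static int custom_oom_pick(const struct mock_oom_control *oc)
{
    const struct task_info *best = NULL;

    for (size_t i = 0; i < oc->ntasks; i++) {
        const struct task_info *t = &oc->tasks[i];

        if (t->low_priority && (!best || t->rss_pages > best->rss_pages))
            best = t;
    }
    return best ? best->pid : -1;
}

/* Simplified default: the largest task overall, or -1 if there are none. */
static int default_oom_pick(const struct mock_oom_control *oc)
{
    const struct task_info *best = NULL;

    for (size_t i = 0; i < oc->ntasks; i++)
        if (!best || oc->tasks[i].rss_pages > best->rss_pages)
            best = &oc->tasks[i];
    return best ? best->pid : -1;
}

/* Try the custom policy first and fall back to the default when it declines. */
int handle_out_of_memory(const struct mock_oom_control *oc)
{
    int pid;

    assert(oc != NULL);
    pid = custom_oom_pick(oc);
    return pid >= 0 ? pid : default_oom_pick(oc);
}
```

Keeping the default path reachable means a policy that declines to act, or that has nothing it is willing to kill, cannot leave the system stuck under memory pressure.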

Discussion continues about which parts of memory management are suitable for eBPF instrumentation. The upcoming Linux Plumbers Conference (LPC) will feature a call for papers titled "Where to use eBPF in MM, and where not," reflecting the community’s optimism about programmable memory management.

The author likens kernel work to "building bridges and paving roads" versus "setting up stalls and doing business." Providing eBPF‑based customization for the memory manager is seen as foundational infrastructure—necessary before any commercial exploitation can occur.

References:

[1] https://lore.kernel.org/linux-mm/[email protected]/

[2] https://dl.acm.org/doi/pdf/10.1145/3731569.3764820

[3] https://github.com/cache-ext/cache_ext

[4] https://www.usenix.org/system/files/atc24-cao.pdf

[5] https://lore.kernel.org/linux-mm/[email protected]/

[6] https://lore.kernel.org/linux-mm/[email protected]/