Self‑Destructing Page Cache: Huawei’s ModelFS and vivo’s F2FS Uncached Buffer I/O Innovations
The article examines how Huawei’s ModelFS makes prefetch and memory reclamation programmable and how vivo adapts F2FS to support uncached buffer I/O, enabling AI model weights to be loaded without lingering in the page cache and eliminating kswapd overhead.
At the CLK2025 conference, Huawei engineer Huang Xiaojia highlighted that loading AI model files incurs long latency and high memory usage, yet the weight data does not need to stay resident once it has been consumed, which makes it a natural fit for a "self‑destructing" page cache that drops entries as soon as they are read.
Huawei’s response is ModelFS, a file‑system design that turns prefetch and memory‑reclaim operations into programmable hooks. By registering prefetch and evict callbacks in user space, applications can implement NUMA‑aware algorithms and tailor caching policies to the observed I/O patterns.
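ModelFS's user-space interface has not been published, so the following is only a hypothetical sketch of what such hook registration could look like, modeled on the userfaultfd pattern of kernel-posted events answered from user space. Every MODELFS_* name, struct, and field below is invented for illustration; only the surrounding syscalls (open, ioctl) are real.

```c
/* Hypothetical sketch of a ModelFS-style user-space hook. The interface
 * is not public; all MODELFS_* names and structs are illustrative. The
 * pattern mirrors userfaultfd: the kernel posts prefetch/evict events
 * on a file descriptor, and user space replies with a NUMA-aware policy. */
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>

struct modelfs_event {             /* illustrative, not a real UAPI */
	uint32_t type;             /* e.g. prefetch or evict request */
	uint64_t offset;           /* file offset being accessed */
	uint64_t len;
};

struct modelfs_reply {             /* illustrative */
	uint64_t prefetch_bytes;   /* how far ahead to read */
	int32_t  numa_node;        /* where to place the folios */
	uint32_t evict_after_use;  /* drop from page cache once consumed */
};

#define MODELFS_IOC_WAIT_EVENT _IOR('M', 1, struct modelfs_event) /* invented */
#define MODELFS_IOC_REPLY      _IOW('M', 2, struct modelfs_reply) /* invented */

int main(void)
{
	/* Example path; any file on a hypothetical ModelFS mount. */
	int fd = open("/mnt/modelfs/model.weights", O_RDONLY);
	if (fd < 0) { perror("open"); return 1; }

	for (;;) {
		struct modelfs_event ev;
		struct modelfs_reply r;

		if (ioctl(fd, MODELFS_IOC_WAIT_EVENT, &ev) < 0)
			break;

		memset(&r, 0, sizeof(r));
		r.prefetch_bytes  = 64 << 20; /* sequential weights: read far ahead */
		r.numa_node       = 1;        /* e.g. the node closest to the NPU */
		r.evict_after_use = 1;        /* read-once data: self-destruct */
		ioctl(fd, MODELFS_IOC_REPLY, &r);
	}
	close(fd);
	return 0;
}
```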
vivo engineer Han Qi presented their work on enabling uncached buffer I/O in F2FS. Uncached buffer I/O relies on a drop‑behind mechanism: folios are discarded as soon as their data has been consumed, so they never sit on the LRU lists waiting to be reclaimed. Compared with fully cached buffer I/O it avoids that LRU and reclamation overhead, yet unlike direct I/O it still routes reads and writes through the page cache and keeps them synchronized, much as Tencent's swap work avoids bypassing the swap cache to sidestep the same class of synchronization problems.
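Upstream Linux exposes uncached buffered I/O to applications through the RWF_DONTCACHE flag to preadv2()/pwritev2(), merged in recent kernels (6.14+); per-filesystem support is still needed, which is what the vivo work wires up for F2FS. A minimal sketch of an uncached read, with an illustrative file path:

```c
/* Minimal example of an uncached buffered read via RWF_DONTCACHE.
 * Requires a recent kernel (6.14+) and a filesystem that supports
 * the flag; otherwise preadv2() fails with EOPNOTSUPP. */
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/uio.h>
#include <unistd.h>

#ifndef RWF_DONTCACHE
#define RWF_DONTCACHE 0x00000080 /* from newer uapi <linux/fs.h> headers */
#endif

int main(int argc, char **argv)
{
	const char *path = argc > 1 ? argv[1] : "model.weights"; /* example */
	int fd = open(path, O_RDONLY);
	if (fd < 0) { perror("open"); return 1; }

	size_t len = 1 << 20; /* 1 MiB */
	char *buf = malloc(len);
	if (!buf) { close(fd); return 1; }

	struct iovec iov = { .iov_base = buf, .iov_len = len };

	/* The read still goes through the page cache, so it stays coherent
	 * with other accesses, but the kernel drops the folios once the data
	 * is copied out: nothing lingers and kswapd has nothing to reclaim. */
	ssize_t n = preadv2(fd, &iov, 1, 0, RWF_DONTCACHE);
	if (n < 0)
		perror("preadv2"); /* EOPNOTSUPP if fs/kernel lacks support */
	else
		printf("read %zd bytes uncached\n", n);

	free(buf);
	close(fd);
	return 0;
}
```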
Bypassing a cache is risky: earlier swap-cache work by kernel developers such as Huang Ying, Chris Li, Kairui Song, and Barry Song needed assorted workarounds for the synchronization problems it creates. vivo's contribution focuses on the write path: folios cannot be dropped while F2FS still holds its atomic writeback context, so the drop is deferred to an asynchronous worker that runs after that context is released. The result is the same "self‑destructing" behavior, and it reduces kswapd overhead to zero.
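vivo's actual patch is not quoted in the source, but the deferral pattern it describes, dropping folios from a worker instead of inside the atomic writeback path, can be sketched with stock kernel primitives. Everything beyond the core APIs (INIT_WORK, queue_work, invalidate_inode_pages2_range) below is illustrative:

```c
/* Kernel-style sketch (not vivo's actual patch): defer the folio drop
 * to a workqueue so it runs outside F2FS's atomic writeback context.
 * The drop_behind_* names are illustrative. */
#include <linux/fs.h>
#include <linux/pagemap.h>
#include <linux/slab.h>
#include <linux/workqueue.h>

struct drop_behind_work {
	struct work_struct work;
	struct address_space *mapping;
	pgoff_t start, end;        /* folio range just written back */
};

static void drop_behind_fn(struct work_struct *w)
{
	struct drop_behind_work *dw =
		container_of(w, struct drop_behind_work, work);

	/* Safe to sleep here: drop the now-clean folios from the cache. */
	invalidate_inode_pages2_range(dw->mapping, dw->start, dw->end);
	kfree(dw);
}

/* Called from the atomic writeback-completion path. */
static void queue_drop_behind(struct address_space *mapping,
			      pgoff_t start, pgoff_t end)
{
	struct drop_behind_work *dw = kzalloc(sizeof(*dw), GFP_ATOMIC);

	if (!dw)
		return; /* best effort: the folios simply stay cached */

	dw->mapping = mapping;
	dw->start = start;
	dw->end = end;
	INIT_WORK(&dw->work, drop_behind_fn);
	queue_work(system_unbound_wq, &dw->work);
}
```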
However, Han Qi also noted a trade-off: each folio must be written back synchronously before it can be dropped, so write performance can regress; the team is still investigating this.
Trend analysis suggests two directions: (1) programmable file systems and memory management that let user space customize reclamation and prefetch policies, echoing earlier work on programmable kernel memory management; (2) the kernel must cope with the massive I/O and memory demands of AI-scale models, coordinating I/O, memory, and reclamation while staying aware of NUMA topology and NPU workflows.
