Multithreaded Parallel Writeback: Vivo’s Exploration of Page Cache Write Acceleration
The article examines Linux's page‑cache writeback mechanism, explains why the single writeback thread per block device becomes a bottleneck under heavy writes, and details Vivo's extensions to the multithreaded writeback patches, including filesystem‑aware inode‑to‑context mapping and a sysfs‑tunable thread count. Reported results include writeback throughput reaching 2.4 GB/s on XFS and a 22 % gain on F2FS, along with a discussion of fragmentation trade‑offs and of matching the thread count to the allocation‑group layout.
Background: Linux Page‑Cache Writeback
When a file is written via buffered I/O, Linux stores the data in the page cache and flushes dirty pages only after a timeout expires or the amount of dirty data exceeds a threshold. Under light write loads the kernel writes the data back in the background without the application noticing, but a burst of writes can fill the pool of allowed dirty pages, at which point the writing process blocks until writeback catches up.
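In a stock kernel, those triggers are controlled by the usual vm sysctls under /proc/sys/vm. A minimal sketch (helper name is illustrative) that prints the relevant knobs:

    /* Minimal sketch: print the vm sysctls that govern when dirty pages
     * are written back (values in centiseconds / percent of memory). */
    #include <stdio.h>

    static void print_knob(const char *path)
    {
        char buf[64];
        FILE *f = fopen(path, "r");

        if (f && fgets(buf, sizeof(buf), f))
            printf("%-40s %s", path, buf);
        if (f)
            fclose(f);
    }

    int main(void)
    {
        print_knob("/proc/sys/vm/dirty_expire_centisecs"); /* age-based flush */
        print_knob("/proc/sys/vm/dirty_background_ratio"); /* background writeback starts */
        print_knob("/proc/sys/vm/dirty_ratio");            /* writers start blocking */
        return 0;
    }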
Prior Work and Limitations
Community research showed that using large folios for writeback can dramatically increase throughput (for example, from 800 MB/s to 2.4 GB/s on XFS over NVMe) because a single thread iterates over larger units. However, this approach does not address the fundamental bottleneck: each block device still has only one writeback thread.
Parallel Writeback Patchset
Kundan Kumar’s patchset introduced multiple writeback contexts per block device, turning the single dirty‑inode list into several lists processed by a thread pool whose default size equals the number of CPUs. This multithreaded design removes the serialisation point of a single writeback thread.
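The patchset itself changes kernel structures, but the shape of the idea can be shown with a small user‑space sketch. All names here (wb_context, pick_context) are hypothetical, not the patch's actual identifiers; the point is one dirty‑inode list per context plus a default modulo mapping from inode number to context.

    /* Illustrative sketch only: one dirty list per writeback context,
     * with inodes spread across contexts by a simple modulo mapping. */
    #include <stdio.h>

    #define NR_CONTEXTS 4   /* the patchset's default is the number of CPUs */

    struct wb_context {
        unsigned long nr_dirty_inodes;   /* stand-in for the per-context dirty list */
    };

    static struct wb_context contexts[NR_CONTEXTS];

    /* Default policy: pick a context from the inode number. */
    static struct wb_context *pick_context(unsigned long ino)
    {
        return &contexts[ino % NR_CONTEXTS];
    }

    int main(void)
    {
        for (unsigned long ino = 1; ino <= 16; ino++)
            pick_context(ino)->nr_dirty_inodes++;

        for (int i = 0; i < NR_CONTEXTS; i++)
            printf("context %d: %lu dirty inodes\n", i, contexts[i].nr_dirty_inodes);
        return 0;
    }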
Vivo’s Extensions
Building on the above, Vivo’s contributors Wang Yufei and Zhang Xirui focused on two enhancements, both implemented for XFS:
Allow the filesystem to bind specific inodes to particular writeback contexts instead of using a simple modulo mapping, reducing cross‑thread allocation‑group lock contention.
Expose the number of writeback threads via a sysfs entry (/sys/class/bdi/<major>:<minor>/nwritebacks) so that the count can be tuned independently of the CPU count; both changes are sketched below.
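A hedged sketch of both ideas, again with hypothetical names: choosing a context from the inode's allocation group instead of a plain modulo over inode numbers, and writing the desired thread count to the nwritebacks entry (which only exists with the patches applied; "8:16" stands in for a real device's major:minor numbers).

    /* Hypothetical illustration, not the actual kernel patch. */
    #include <stdio.h>

    /* Simplifying assumption: inodes are numbered contiguously within an
     * allocation group, so dividing by inodes-per-AG yields the AG index.
     * Real XFS encodes the AG number in the high bits of the inode number. */
    static unsigned int context_for_inode(unsigned long ino,
                                          unsigned long inodes_per_ag,
                                          unsigned int nr_contexts)
    {
        unsigned long ag = ino / inodes_per_ag;
        return ag % nr_contexts;   /* all inodes of one AG land in the same context */
    }

    int main(void)
    {
        /* Tune the per-device thread count via the sysfs entry described
         * above; the path is a placeholder and requires the patches. */
        FILE *f = fopen("/sys/class/bdi/8:16/nwritebacks", "w");
        if (f) {
            fprintf(f, "4\n");     /* e.g. one thread per allocation group */
            fclose(f);
        }

        printf("inode 123456 maps to context %u\n",
               context_for_inode(123456, 65536, 4));
        return 0;
    }

Keeping every inode of an allocation group on the same writeback thread is what avoids the cross‑thread AG lock contention mentioned above.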
Experimental Results
QEMU tests with an emulated 20 GB NVMe SSD (8 CPU cores, 4 GB RAM) showed that setting the writeback thread count equal to the number of allocation groups (AGs) yields the best performance for XFS. On real hardware, Samsung’s measurements reported writeback speeds rising from 800 MB/s to 2.4 GB/s when multiple threads are used.
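As a sketch of how that tuning rule might be applied, assuming the xfsprogs development headers (<xfs/xfs.h>) are installed, the allocation‑group count of a mounted XFS filesystem can be read with the XFS_IOC_FSGEOMETRY ioctl and used as the value written to nwritebacks:

    /* Sketch: query the XFS geometry and report the allocation-group count
     * as the suggested writeback thread count. Error handling kept minimal;
     * assumes the xfsprogs headers are available. */
    #include <stdio.h>
    #include <fcntl.h>
    #include <unistd.h>
    #include <sys/ioctl.h>
    #include <xfs/xfs.h>

    int main(int argc, char **argv)
    {
        struct xfs_fsop_geom geo;
        int fd = open(argc > 1 ? argv[1] : "/mnt/xfs", O_RDONLY);

        if (fd < 0 || ioctl(fd, XFS_IOC_FSGEOMETRY, &geo) < 0) {
            perror("xfs geometry");
            return 1;
        }
        /* Per the experiments above, one writeback thread per AG worked best. */
        printf("agcount = %u; suggested nwritebacks = %u\n",
               geo.agcount, geo.agcount);
        close(fd);
        return 0;
    }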
For F2FS on UFS devices, the authors observed a 22 % performance gain with their parallel writeback implementation.
Fragmentation Trade‑offs
The authors note that binding each inode to a single writeback context can reduce filesystem fragmentation caused by concurrent writes to the same inode, but it also eliminates parallelism for that inode, which may hurt workloads that heavily write a single file.
Even with each inode confined to a single writeback context, the final experiments indicated that multithreaded writeback still increased fragmentation overall, suggesting that further investigation is needed.
Conclusion
Multithreaded writeback can substantially improve write throughput on modern SSDs and filesystems, provided that the number of writeback threads is tuned to the allocation‑group layout and that inode‑to‑context mapping is managed to balance performance against fragmentation.