
Linux Write Operation: How Data Flows from User Space to Disk

This article explains how Linux write operations work internally, detailing how data flows from user space through the kernel's page cache to disk, including the timing of actual disk writes and kernel parameters that control writeback behavior.

Refining Core Development Skills

This article provides a comprehensive analysis of how Linux write operations work internally, focusing on the common case where files are written without the O_DIRECT or O_SYNC flags. The author begins by explaining that the kernel's write path is extremely complex, and that the kernel source code now spans millions of lines.

The article traces the write path starting from a simple C code example that writes a single byte to a file. The author explains that most write operations do not go directly to disk; for performance reasons they land in the Page Cache in memory. Mechanical hard drives have random access times on the order of milliseconds, while memory access is orders of magnitude faster.
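As a rough illustration of the kind of single-byte write the article starts from, here is a minimal sketch. The original example is in C; this Python version, the function name `write_one_byte`, and the file name are assumptions made for illustration. The file is opened without O_DIRECT or O_SYNC, so a successful return from `os.write` only means the byte was accepted by the kernel, typically into the Page Cache.

```python
import os
import tempfile


def write_one_byte(path):
    """Write a single byte using a plain buffered write (no O_DIRECT, no O_SYNC)."""
    # Hypothetical helper: mirrors open() + write() + close() from a C program.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        # Returns the number of bytes accepted by the kernel, not bytes on disk.
        return os.write(fd, b"a")
    finally:
        os.close(fd)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        demo_path = os.path.join(tmp, "demo.txt")  # illustrative file name
        print("bytes accepted by the kernel:", write_one_byte(demo_path))
```

The return value reports how many bytes the kernel took from the buffer; it says nothing about whether those bytes have reached the disk yet.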

The write operation flow is illustrated through a diagram showing how data moves through the kernel layers: the VFS layer, the ext4 filesystem, and finally the block layer. The key point is that after the write system call returns, data typically remains in the Page Cache as dirty pages rather than being written to disk.

The article then explains three scenarios when data actually gets written to disk:

1. When the amount of dirty page memory exceeds the dirty_ratio (a percentage) or dirty_bytes (an absolute size) threshold. In this case the write call itself blocks while dirty pages are written back.

2. When a background kernel worker thread, which checks periodically, finds that dirty pages exceed the dirty_background_ratio or dirty_background_bytes threshold and initiates writeback.

3. When dirty pages have been in memory longer than the dirty_expire_centisecs timeout (default 30 seconds), triggering writeback regardless of the dirty page ratio.
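The three scenarios above can be sketched as a toy decision function. This is a simplified model for illustration only, not the kernel's actual logic: the function name, the return labels, and the idea of a single "oldest dirty page age" input are assumptions, and the ratio defaults of 20 and 10 are illustrative values that vary between systems. Only the 30 second expiry (3000 centiseconds) comes from the article.

```python
def writeback_trigger(dirty_pages, dirtyable_pages, oldest_dirty_age_cs,
                      dirty_ratio=20, dirty_background_ratio=10,
                      dirty_expire_centisecs=3000):
    """Classify which writeback scenario applies in this simplified model."""
    dirty_percent = 100 * dirty_pages / dirtyable_pages

    # Scenario 1: hard limit exceeded, the writing process itself is held up.
    if dirty_percent > dirty_ratio:
        return "block_writer"

    # Scenario 2: soft limit exceeded, a background worker starts writeback.
    if dirty_percent > dirty_background_ratio:
        return "background_writeback"

    # Scenario 3: below both limits, but some dirty page is older than the timeout.
    if oldest_dirty_age_cs > dirty_expire_centisecs:
        return "expired_writeback"

    # Otherwise the dirty pages simply stay in the Page Cache for now.
    return "stay_in_page_cache"


if __name__ == "__main__":
    print(writeback_trigger(250, 1000, 0))     # 25% dirty
    print(writeback_trigger(150, 1000, 0))     # 15% dirty
    print(writeback_trigger(50, 1000, 3500))   # 5% dirty, oldest page is 35 s old
    print(writeback_trigger(50, 1000, 100))    # 5% dirty, oldest page is 1 s old
```

The ordering of the checks reflects severity: the blocking threshold is tested first because it affects the caller directly, while the other two only wake background work.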

The author provides systemtap code examples to trace kernel writeback operations and shows how to check and modify kernel parameters like dirty_ratio, dirty_background_ratio, and dirty_expire_centisecs through /proc/sys/vm/ or /etc/sysctl.conf.
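For checking the current values, a small script can read the files under /proc/sys/vm/ directly. The sketch below is an assumption-laden illustration rather than the author's method: the helper name `read_vm_param` and the `base` parameter are hypothetical, and the function returns `None` when a file is missing (for example on a non-Linux system) instead of raising.

```python
import os

# Parameters named in the article; each is a plain-text file under /proc/sys/vm/.
VM_PARAMS = ("dirty_ratio", "dirty_background_ratio", "dirty_expire_centisecs")


def read_vm_param(name, base="/proc/sys/vm"):
    """Return the integer value of a vm parameter, or None if it cannot be read."""
    try:
        with open(os.path.join(base, name)) as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        # File missing, unreadable, or not an integer: report "unknown".
        return None


if __name__ == "__main__":
    for name in VM_PARAMS:
        print(f"vm.{name} = {read_vm_param(name)}")
```

To make a change persistent, the corresponding entry in /etc/sysctl.conf takes the form `vm.dirty_ratio = 20`, where the value 20 is only an example.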

The conclusion emphasizes that this write-back mechanism prioritizes performance over data safety: if power is lost before dirty pages are written to disk, the data they hold is lost. For critical applications that require guaranteed persistence, the author recommends calling fsync, which forces the file's dirty pages out to disk before it returns.
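A minimal sketch of the fsync approach might look like the following. The helper name `durable_write` and the file name are assumptions made for illustration; `os.fsync` is Python's wrapper around the fsync system call.

```python
import os
import tempfile


def durable_write(path, data):
    """Write data, then ask the kernel to flush it to disk before returning."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        written = 0
        # os.write may accept fewer bytes than requested, so loop until done.
        while written < len(data):
            written += os.write(fd, data[written:])
        # Blocks until the file's dirty pages have been handed to the device.
        os.fsync(fd)
    finally:
        os.close(fd)
    return written


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        demo_path = os.path.join(tmp, "critical.dat")  # illustrative file name
        print("bytes written and synced:", durable_write(demo_path, b"important"))
```

The trade-off is the one the article describes: each call now waits on the storage device, so the write is safer but considerably slower than leaving the data in the Page Cache.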

Tags: File I/O, Linux kernel, page cache, ext4 filesystem, fsync, kernel parameters, systemtap, Write Operations, Writeback
Written by

Refining Core Development Skills

Fei has over 10 years of development experience at Tencent and Sogou. Through this account, he shares his deep insights on performance.
