Understanding Linux I/O: From Application Buffers to Disk Writes
This article explains the Linux I/O stack, detailing how data moves from user‑space buffers through libc and page cache to the disk, covering system calls, synchronization primitives, scheduler behavior, consistency, safety, and performance considerations.
1. Introduction
Linux I/O is the foundation of file storage. This article summarizes basic Linux I/O concepts.
2. Linux I/O Stack
Linux file I/O uses a layered design, providing clear architecture and functional decoupling.
To balance generality against performance, an ordinary disk write does not go straight to the device; it passes through several buffers on the way. For example:
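#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* MAX_SIZE is assumed to be defined elsewhere; fp must already be open for writing. */
void foo(FILE *fp, const char *src) {
    char *buf = malloc(MAX_SIZE);    /* application buffer */
    if (buf == NULL) return;
    strncpy(buf, src, MAX_SIZE);     /* pads with '\0' up to MAX_SIZE */
    fwrite(buf, MAX_SIZE, 1, fp);    /* application buffer -> libc buffer */
    fclose(fp);                      /* libc buffer -> page cache */
    free(buf);
}

Here buf is the application buffer. fwrite copies the data into the libc (standard I/O) buffer, which still lives in user space; the kernel is not involved yet. If the process terminates abnormally before fclose (for example, it is killed by a signal or calls _exit), whatever is still in the libc buffer is lost because it never reached the kernel. A normal exit, or a return from main, flushes open stdio streams first.
fclose and fflush only move data from the libc buffer to the kernel page cache. To make sure the data reaches the disk, the kernel buffer must be flushed as well, for example with sync or fsync. POSIX allows sync to return once the writes have been scheduled, without waiting for them to finish; Linux's sync does wait for completion. Either way, the final write to the physical medium is carried out by the disk controller.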
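The three hops described above (application buffer to libc buffer, libc buffer to page cache, page cache to device) can be sketched as a single helper. This is a minimal illustration, not the article's own code: the function name write_durably and its return convention are assumptions made for the example.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <unistd.h>

/* Push len bytes through every buffer layer between the application
 * and the storage device. Returns 0 on success, -1 on failure.
 * (write_durably is a hypothetical helper name.) */
int write_durably(const char *path, const void *data, size_t len)
{
    FILE *fp = fopen(path, "wb");
    if (fp == NULL)
        return -1;

    int ok = 1;
    /* 1. application buffer -> libc buffer (user space only) */
    if (len > 0 && fwrite(data, 1, len, fp) != len)
        ok = 0;
    /* 2. libc buffer -> kernel page cache */
    if (ok && fflush(fp) != 0)
        ok = 0;
    /* 3. page cache -> storage device */
    if (ok && fsync(fileno(fp)) != 0)
        ok = 0;
    /* fclose would also perform step 2, but never step 3 */
    if (fclose(fp) != 0)
        ok = 0;
    return ok ? 0 : -1;
}
```

Calling write_durably("out.bin", "hello", 5) returns only after fsync has reported success, which is the point at which the kernel considers the data handed to the device.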
When using write directly, data is copied from the application buffer to the page cache via a system call, which incurs a user‑to‑kernel mode switch. By default write returns as soon as the data is in the page cache; the kernel decides later when to issue the actual disk I/O. Opening the file with O_SYNC makes each write block until the data has been transferred to the device.
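A short sketch of the two ways to make a write-based path durable: set O_SYNC at open time, or write normally and call fsync once at the end. The helper names write_all and write_file_sync are hypothetical, and the 0644 mode is just an example value.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Write the whole buffer, retrying after short writes and EINTR. */
static int write_all(int fd, const char *buf, size_t len)
{
    while (len > 0) {
        ssize_t n = write(fd, buf, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        buf += n;
        len -= (size_t)n;
    }
    return 0;
}

/* use_o_sync != 0: every write() blocks until the device has the data.
 * use_o_sync == 0: write() returns once the page cache has the data,
 *                  and one fsync() at the end makes it durable.
 * Returns 0 on success, -1 on failure. */
int write_file_sync(const char *path, const char *buf, size_t len,
                    int use_o_sync)
{
    int flags = O_WRONLY | O_CREAT | O_TRUNC;
    if (use_o_sync)
        flags |= O_SYNC;

    int fd = open(path, flags, 0644);
    if (fd < 0)
        return -1;

    int rc = write_all(fd, buf, len);
    if (rc == 0 && !use_o_sync)
        rc = fsync(fd);
    if (close(fd) != 0)
        rc = -1;
    return rc;
}
```

With many small writes, the fsync variant usually costs less, because the device is waited on once rather than after every call.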
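To bypass the page cache and transfer data directly between the application buffer and the device, open the file with O_DIRECT; buffers, offsets, and lengths then generally have to be aligned to the device's block size. To write to disk sectors without going through a file system at all, write to the raw block device, as dd does when its output is a device node.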
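The following sketch shows what an O_DIRECT write typically looks like on Linux with glibc. Several things here are assumptions for illustration: the helper name direct_write_block, the 4096-byte alignment (real code should query the device), and the return convention. Some file systems reject O_DIRECT with EINVAL, which the sketch reports separately.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE            /* exposes O_DIRECT in <fcntl.h> on glibc */
#endif
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#ifndef O_DIRECT
#define O_DIRECT 0             /* fallback: degrades to a buffered write */
#endif

#define DIRECT_ALIGN 4096      /* assumed alignment and transfer size */

/* Write one zero-padded block with O_DIRECT.
 * Returns 0 on success, 1 if O_DIRECT is rejected with EINVAL,
 * -1 on any other error. (Hypothetical helper.) */
int direct_write_block(const char *path, const void *data, size_t len)
{
    if (len > DIRECT_ALIGN)
        return -1;

    /* The user buffer itself must be aligned, so malloc is not enough. */
    void *buf = NULL;
    if (posix_memalign(&buf, DIRECT_ALIGN, DIRECT_ALIGN) != 0)
        return -1;
    memset(buf, 0, DIRECT_ALIGN);
    memcpy(buf, data, len);

    int rc = 0;
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC | O_DIRECT, 0644);
    if (fd < 0) {
        rc = (errno == EINVAL) ? 1 : -1;
    } else {
        /* The length must be a multiple of the block size as well. */
        ssize_t n = write(fd, buf, DIRECT_ALIGN);
        if (n != DIRECT_ALIGN)
            rc = (n < 0 && errno == EINVAL) ? 1 : -1;
        if (close(fd) != 0 && rc == 0)
            rc = -1;
    }
    free(buf);
    return rc;
}
```

Note that O_DIRECT only skips the page cache; it does not by itself guarantee that the data has left the drive's own write cache.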
3. I/O Call Chain
fwrite is the highest‑level interface. It buffers small writes in user space, merges them, and eventually hands them to write in one call. write copies data from the application buffer to the kernel, triggers a user‑kernel transition, and places the data in the page cache. The kernel does not immediately forward the data to the disk; a background writeback thread (pdflush on older kernels, per‑device flusher threads since 2.6.32) writes dirty pages to the I/O queue, where the I/O scheduler decides when to issue the actual disk operation.
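The merging behaviour of fwrite can be observed from the outside by checking the file size before and after fflush. The sketch below is an illustration under assumptions: the helper name observe_stdio_buffering, the 4096-byte stdio buffer, and the ten 8-byte records are all chosen for the example.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <sys/stat.h>

/* Issue ten small fwrite calls and report the file size the kernel sees
 * before and after fflush. Returns 0 on success, -1 on failure.
 * (Hypothetical helper.) */
int observe_stdio_buffering(const char *path, long *before, long *after)
{
    char iobuf[4096];                 /* user-space buffer handed to libc */
    struct stat st;
    int rc = -1;

    FILE *fp = fopen(path, "wb");
    if (fp == NULL)
        return -1;
    if (setvbuf(fp, iobuf, _IOFBF, sizeof iobuf) != 0)
        goto out;

    /* Ten 8-byte records: all 80 bytes fit in the 4096-byte buffer. */
    for (int i = 0; i < 10; i++)
        if (fwrite("12345678", 1, 8, fp) != 8)
            goto out;

    if (stat(path, &st) != 0)
        goto out;
    *before = (long)st.st_size;       /* nothing has reached the kernel */

    if (fflush(fp) != 0)              /* the buffered bytes go out together */
        goto out;
    if (stat(path, &st) != 0)
        goto out;
    *after = (long)st.st_size;
    rc = 0;
out:
    if (fclose(fp) != 0)              /* iobuf is still valid here */
        rc = -1;
    return rc;
}
```

In this setup the size stays at 0 until fflush runs, after which all 80 bytes appear at once.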
4. I/O Scheduler Layer
Requests entering the I/O scheduler queue are reordered to maximize overall throughput, often using an elevator algorithm that moves the disk head in one direction before reversing. Linux has traditionally provided several schedulers, such as noop, deadline, and cfq; kernels from 5.0 onward use only the multi‑queue block layer, where the choices are none, mq-deadline, bfq, and kyber. For SSDs, which lack mechanical seek latency, noop (or none) is typically preferred.
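On Linux the scheduler in use for a block device is exposed through sysfs at /sys/block/<dev>/queue/scheduler, where the active entry is shown in square brackets. The sketch below parses such a line; the helper names and the device name passed by the caller (for example "sda") are assumptions for illustration.

```c
#include <stdio.h>
#include <string.h>

/* Copy the bracketed (active) scheduler name from a sysfs line into out.
 * Returns 0 on success, -1 if no [name] is present or out is too small.
 * (Hypothetical helper.) */
int active_scheduler(const char *line, char *out, size_t outlen)
{
    const char *lb = strchr(line, '[');
    if (lb == NULL)
        return -1;
    const char *rb = strchr(lb, ']');
    if (rb == NULL)
        return -1;

    size_t n = (size_t)(rb - lb - 1);
    if (n + 1 > outlen)
        return -1;
    memcpy(out, lb + 1, n);
    out[n] = '\0';
    return 0;
}

/* Read the scheduler line for a device name such as "sda". */
int read_active_scheduler(const char *dev, char *out, size_t outlen)
{
    char path[256], line[256];
    snprintf(path, sizeof path, "/sys/block/%s/queue/scheduler", dev);

    FILE *fp = fopen(path, "r");
    if (fp == NULL)
        return -1;
    char *got = fgets(line, sizeof line, fp);
    fclose(fp);
    return got ? active_scheduler(line, out, outlen) : -1;
}
```

For a line such as "noop deadline [cfq]" the parser yields "cfq". Some devices report a bare "none" without brackets, which this sketch treats as "not found".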
After leaving the scheduler, the request reaches the driver layer, which uses DMA to transfer data to the disk cache.
5. Consistency and Safety
Data may reside in several caches before reaching the disk, and what survives a failure depends on how far it has travelled:
Process exit: data still in the application or libc buffer is lost; data already in the page cache persists and is written back by the kernel.
Kernel crash: data in the page cache that has not yet been handed to the disk cache is lost.
Power loss: everything still held in volatile memory, including the page cache and the disk's own write cache, is lost; only data already on the physical medium survives.
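When multiple file descriptors write to the same file, writes can overwrite each other because each independently opened descriptor has its own file offset. Opening the file with O_APPEND makes the kernel move the offset to the end of the file and perform the write as a single atomic step, so each write lands after the existing data instead of on top of it. O_SYNC does not help here: it controls when data reaches the disk, not where in the file it is written.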
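The effect of per-descriptor offsets is easy to reproduce inside a single process. In the sketch below, the helper name two_writers_size and the 4-byte payloads are assumptions for illustration; the same file is opened twice and each descriptor writes once.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Two independently opened descriptors each write 4 bytes to the same
 * file. Returns the resulting file size, or -1 on error.
 * (Hypothetical helper; pass 0 or O_APPEND as extra_flags.) */
long two_writers_size(const char *path, int extra_flags)
{
    long size = -1;
    struct stat st;

    /* Create or empty the file first. */
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;
    close(fd);

    int a = open(path, O_WRONLY | extra_flags);
    int b = open(path, O_WRONLY | extra_flags);
    if (a >= 0 && b >= 0 &&
        write(a, "AAAA", 4) == 4 &&   /* a's offset moves from 0 to 4 */
        write(b, "BBBB", 4) == 4 &&   /* b still starts at 0 unless O_APPEND */
        fstat(a, &st) == 0)
        size = (long)st.st_size;

    if (a >= 0)
        close(a);
    if (b >= 0)
        close(b);
    return size;
}
```

Without O_APPEND the second write lands on top of the first and the file ends up 4 bytes long; with O_APPEND both writes are kept and the file is 8 bytes long.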
6. Performance Issues
Disk seek time is around 10 ms, which caps a drive at roughly 100 seeks per second. Rotational speed also affects throughput: a 15 000 rpm drive makes 250 rotations per second (15 000 / 60), so one rotation takes 4 ms and, on average, the head waits about 2 ms for the target sector to come around.
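The arithmetic behind these limits is simple enough to write down. For a 15 000 rpm drive, 15 000 / 60 gives 250 rotations per second, half a rotation takes 2 ms, and with a 10 ms seek each random access costs about 12 ms. The function names below are hypothetical, and the model (one seek plus average rotational latency per operation, ignoring transfer time) is a deliberate simplification.

```c
/* Rotations per second for a drive spinning at rpm. */
double rotations_per_second(double rpm)
{
    return rpm / 60.0;
}

/* Average rotational latency in milliseconds: half of one rotation. */
double avg_rotational_latency_ms(double rpm)
{
    return 0.5 * 60000.0 / rpm;
}

/* Rough upper bound on random operations per second when every operation
 * pays one seek plus the average rotational latency. */
double max_random_iops(double seek_ms, double rpm)
{
    return 1000.0 / (seek_ms + avg_rotational_latency_ms(rpm));
}
```

Under this model max_random_iops(10.0, 15000.0) comes out a little above 83 operations per second, which is why random access on spinning disks is so much slower than sequential access.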
Typical sequential write speeds are up to about 30 MB/s for HDDs and up to 400 MB/s for SSDs; sequential reads reach up to about 50 MB/s on HDDs and more on SSDs. These are rough figures that vary widely with the hardware generation.
Open Source Linux
Focused on sharing Linux/Unix content, covering fundamentals, system development, network programming, automation/operations, cloud computing, and related professional knowledge.