Understanding the Linux I/O Stack and File Read Process
This article explains how a seemingly simple read‑of‑one‑byte in user code triggers a complex Linux I/O stack involving the I/O engine, system calls, VFS, page cache, file system implementations, generic block layer and I/O scheduler, and clarifies when actual disk I/O occurs and its granularity.
Hello, I'm Fei! In everyday development many of us use high‑level file‑read functions without truly understanding what happens inside the kernel.
Reading a single byte raises two questions: does it cause disk I/O, and if so, how large is that I/O?
To answer, we need to look inside the Linux I/O stack.
1. Linux I/O Stack Overview
The stack consists of several layers that cooperate to satisfy a read request.
1.1 I/O Engine
Functions such as read, write, pread, and pwrite make up the synchronous I/O engine: the calling thread issues the request and blocks until the kernel returns the data, relying on the lower-level components described below.
1.2 System Call
When a user program calls a function such as read, the C library traps into the kernel via a system call (for read, sys_read); the system-call layer wraps the lower-level kernel machinery and exposes it to user space.
1.3 VFS (Virtual File System)
VFS provides a uniform interface to different file-system implementations. Its core structures are superblock, inode, dentry, and file.
struct file {
    ...
    const struct file_operations *f_op;
};

struct file_operations {
    ...
    ssize_t (*read)(struct file *, char __user *, size_t, loff_t *);
    ssize_t (*write)(struct file *, const char __user *, size_t, loff_t *);
    ...
};

VFS only defines abstract function pointers; concrete file systems (e.g., ext4) provide the actual implementations.
1.4 Page Cache
The page cache is a pure‑memory cache that stores recently accessed disk pages (typically 4 KB). If the requested block is already cached, the kernel copies data from the cache to user space without any disk I/O.
1.5 File System
Specific file‑system drivers (ext4, XFS, ZFS, etc.) implement the VFS operations. For example, ext4 defines ext4_file_operations where read maps to do_sync_read .
const struct file_operations ext4_file_operations = {
    .llseek = ext4_llseek,
    .read   = do_sync_read,
    .write  = do_sync_write,
    ...
};

1.6 Generic Block Layer
This layer offers a uniform block‑device interface, abstracting away differences between physical devices. It creates bio structures that represent I/O requests.
1.7 I/O Scheduler
The scheduler orders and merges pending block requests to maximize throughput; classic single-queue schedulers include deadline, cfq, and noop (on modern multi-queue, blk-mq, kernels their counterparts are mq-deadline, bfq, and none). For SSDs, which have no seek penalty, the pass-through noop/none scheduler is often sufficient.
2. The File‑Read Process
A detailed diagram (omitted here) shows the flow from the user‑level read call through the system call, VFS, page cache, generic block layer, scheduler, and finally to the disk.
3. Revisiting the Initial Questions
Does reading one byte cause disk I/O? If the data is already in the page cache, no disk I/O occurs; the kernel simply copies from memory.
If disk I/O does happen, how large is it? The kernel works with blocks larger than a single byte: page cache pages (4 KB), file‑system blocks (commonly 4 KB), and disk sectors (typically 512 bytes). Even a request for one byte will trigger a read of at least one sector, and usually an entire page (4 KB) is transferred.
Additional caching layers (disk‑internal cache, RAID controller cache) may further absorb the request before the mechanical arm moves.
4. Conclusion
The operating system abstracts away the complexity, giving you the illusion of reading a single byte while performing many behind‑the‑scenes operations to optimize performance. Understanding these mechanisms helps you diagnose performance issues when they arise.
For deeper study, refer to Chapter 14 of "Understanding the Linux Kernel".
Refining Core Development Skills
Fei has over 10 years of development experience at Tencent and Sogou. Through this account, he shares his deep insights on performance.