Why Traditional Linux System Call I/O Involves Four Copies and How to Optimize It
The article explains the traditional Linux read/write system‑call I/O path, detailing the multiple CPU and DMA copies, context switches, page‑cache mechanisms, and advanced techniques such as zero‑copy, multiplexing, and direct I/O for performance optimization.
Traditional System Call I/O
In Linux, the classic way to access files or network sockets is through the read() and write() system calls. read() copies data from a kernel buffer into a user buffer, and write() copies data from a user buffer into a kernel buffer (for a socket, the socket buffer), from which it is later sent to the device.
Sending a file over the network with one read() followed by one write() involves two CPU copies and two DMA copies (four copies in total) plus four context switches.
CPU copy : Data is moved directly by the CPU, occupying CPU resources.
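DMA copy : The CPU programs the DMA controller to move the data and is free to do other work while the transfer runs; the controller raises an interrupt when it finishes.
Context switch : The CPU switches from user mode to kernel mode when a system call is made and back again when it returns. Each system call therefore costs two switches.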
Read Operation
If the requested data is already in the kernel's page cache, it is served directly from memory. Otherwise the data is first loaded from disk into the kernel's read buffer and then copied to the user buffer.
read(file_fd, tmp_buf, len);
A traditional read triggers two context switches, one DMA copy, and one CPU copy.
The read flow:
User process calls read(), switching from user space to kernel space.
The CPU instructs the DMA controller to move data from the disk (or other device) into the kernel's read buffer.
CPU copies data from the read buffer to the user buffer.
Context switches back to user space and the call returns.
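The read flow above can be sketched in C. Because read() may return fewer bytes than requested, callers usually loop until the buffer is full or end-of-file is reached. The helper name read_full is hypothetical and not part of libc; this is a minimal sketch, not the article's own code.

```c
#include <assert.h>
#include <errno.h>
#include <unistd.h>

/* Hypothetical helper: keep calling read() until len bytes have arrived
 * or end-of-file is hit. Each read() call costs two mode switches and one
 * CPU copy from the kernel buffer into buf.
 * Returns the number of bytes read, or -1 on error. */
ssize_t read_full(int fd, void *buf, size_t len)
{
    size_t got = 0;

    assert(buf != NULL);
    while (got < len) {
        ssize_t n = read(fd, (char *)buf + got, len - got);
        if (n == 0)
            break;              /* end of file */
        if (n < 0) {
            if (errno == EINTR)
                continue;       /* interrupted by a signal: retry */
            return -1;
        }
        got += (size_t)n;
    }
    return (ssize_t)got;
}
```

A short count from read_full therefore always means end-of-file, which keeps the calling code simpler than handling partial reads at every call site.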
Write Operation
When an application calls write(), data is first copied from the user buffer to the kernel's socket buffer, then from the socket buffer to the NIC for transmission.
A traditional write triggers two context switches, one CPU copy, and one DMA copy.
The write flow:
User process calls write(), switching to kernel space.
CPU copies data from the user buffer to the kernel's socket buffer.
CPU uses DMA to move data from the socket buffer to the NIC.
Context switches back to user space and the call returns.
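Putting the read and write flows together gives the traditional copy loop. The sketch below assumes two already-open descriptors (for example a file and a connected socket); the function name copy_read_write and the 4096-byte buffer size are illustrative choices. Every iteration pays for one read() and one write(), so the data crosses the user/kernel boundary twice.

```c
#include <assert.h>
#include <errno.h>
#include <unistd.h>

/* Hypothetical helper: copy everything from in_fd to out_fd with plain
 * read()/write(). tmp_buf is the user buffer that both CPU copies pass
 * through. Returns the total bytes copied, or -1 on error. */
ssize_t copy_read_write(int in_fd, int out_fd)
{
    char tmp_buf[4096];
    ssize_t total = 0;

    assert(in_fd >= 0 && out_fd >= 0);
    for (;;) {
        ssize_t n = read(in_fd, tmp_buf, sizeof tmp_buf);
        if (n == 0)
            break;                      /* end of input */
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        ssize_t off = 0;
        while (off < n) {               /* write() may be partial too */
            ssize_t w = write(out_fd, tmp_buf + off, (size_t)(n - off));
            if (w < 0) {
                if (errno == EINTR)
                    continue;
                return -1;
            }
            off += w;
        }
        total += n;
    }
    return total;
}
```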
[Figure: network I/O path]
[Figure: disk I/O path]
High‑performance I/O optimizations include:
Zero‑copy techniques.
Multiplexing (e.g., epoll).
PageCache usage.
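Of the techniques listed above, zero-copy is the easiest to show in a few lines. The sketch below uses the Linux sendfile() system call, which moves data from a file to another descriptor inside the kernel so the payload never enters a user buffer. The function name send_file_path is hypothetical, and whether the remaining kernel-side CPU copy is also avoided depends on hardware and driver support, which this example does not check.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/sendfile.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: send the whole file at path to out_fd (typically
 * a connected socket) without a user-space buffer.
 * Returns the number of bytes sent, or -1 on error. */
ssize_t send_file_path(int out_fd, const char *path)
{
    assert(path != NULL);

    int in_fd = open(path, O_RDONLY);
    if (in_fd < 0)
        return -1;

    struct stat st;
    if (fstat(in_fd, &st) < 0) {
        close(in_fd);
        return -1;
    }

    off_t offset = 0;                   /* advanced by sendfile() */
    while (offset < st.st_size) {
        ssize_t n = sendfile(out_fd, in_fd, &offset,
                             (size_t)(st.st_size - offset));
        if (n == 0)
            break;                      /* file shrank underneath us */
        if (n < 0) {
            if (errno == EINTR)
                continue;
            close(in_fd);
            return -1;
        }
    }
    close(in_fd);
    return (ssize_t)offset;
}
```

Compared with a read()/write() loop, each sendfile() call replaces a pair of system calls, which removes two context switches as well as the user-buffer copies.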
PageCache is the OS cache for file data, reducing disk I/O by keeping file pages in memory. It enables near‑memory speed for sequential reads/writes.
PageCache read strategy : When a read request arrives, the kernel first checks if the data is in the page cache. If present, it reads directly from the cache; otherwise it schedules a disk I/O to load the needed pages into the cache.
If the data is in the cache, the disk is bypassed.
If not, the kernel reads a few pages from disk into the cache, then serves the request.
PageCache write strategy : A write() first writes to the page cache and marks the page as “dirty”. A background flusher later writes dirty pages to disk, triggered by low memory, dirty‑page age, or explicit sync() / fsync() calls.
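The write strategy has a practical consequence: a successful write() only means the data reached the page cache. A minimal sketch of forcing it to disk with fsync() follows; the helper name write_durably and the file mode are assumptions made for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: write len bytes to path and wait until they are
 * on stable storage. Returns 0 on success, -1 on error. */
int write_durably(const char *path, const void *data, size_t len)
{
    assert(path != NULL && data != NULL);

    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;

    const char *p = data;
    size_t left = len;
    while (left > 0) {
        /* On return the bytes sit in dirty page-cache pages. */
        ssize_t n = write(fd, p, left);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            close(fd);
            return -1;
        }
        p += n;
        left -= (size_t)n;
    }

    /* Block until the dirty pages for this file have been written back. */
    if (fsync(fd) < 0) {
        close(fd);
        return -1;
    }
    return close(fd);
}
```

Skipping the fsync() call leaves write-back to the background flusher, which is faster but means recently written data can be lost on power failure.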
Storage I/O stack diagram:
The Linux I/O stack has three layers below the system‑call interface:
File‑system layer: the kernel copies user data into the file‑system cache.
Block layer: manages block‑device queues, merges and schedules I/O requests.
Device layer: the device driver programs DMA transfers between memory and the device.
Understanding this stack helps relate Buffered I/O, mmap, and Direct I/O to their positions in the hierarchy.
Traditional Buffered I/O reads a “cold” file by creating a page cache entry, scheduling block‑device I/O, using DMA to fill the cache, and finally copying data to the user buffer—two copies in total. mmap maps the page cache directly into user space, eliminating the second copy.
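As a rough illustration of the mmap path, the sketch below maps a file read-only and scans it in place, so no read() call copies the data into a separate user buffer. The function name count_newlines_mmap is hypothetical, and counting newlines is just a stand-in for any read-only processing.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: count '\n' bytes in a file through a read-only
 * mapping. Returns the count, or -1 on error. */
long count_newlines_mmap(const char *path)
{
    assert(path != NULL);

    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    struct stat st;
    if (fstat(fd, &st) < 0) {
        close(fd);
        return -1;
    }
    if (st.st_size == 0) {              /* mmap() rejects length 0 */
        close(fd);
        return 0;
    }

    const char *data = mmap(NULL, (size_t)st.st_size, PROT_READ,
                            MAP_PRIVATE, fd, 0);
    close(fd);                          /* the mapping outlives the fd */
    if (data == MAP_FAILED)
        return -1;

    long count = 0;
    for (off_t i = 0; i < st.st_size; i++) {
        if (data[i] == '\n')            /* page faults pull pages in on demand */
            count++;
    }

    munmap((void *)data, (size_t)st.st_size);
    return count;
}
```

The trade-off is that the cost moves from explicit copies to page faults, so the benefit is largest when the same pages are touched repeatedly or only part of the file is needed.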
Direct I/O bypasses the page cache, moving data directly between user buffers and the block device via DMA, reducing copies for writes and improving write throughput. Reads can be faster for the first access but may lose the cache benefit for subsequent reads.
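mmap and Direct I/O both carry alignment requirements. For mmap, the file offset must be a multiple of the page size. For Direct I/O, the user buffer address, the file offset, and the I/O size generally must all be multiples of the underlying logical block size; the exact rules depend on the file system and kernel version.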
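A hedged sketch of a Direct I/O write shows what the alignment rules mean in practice. The 4096-byte alignment is an assumption that is safe on many systems but not guaranteed; production code should query the device or file system. The names DIO_ALIGN, round_up_to, and direct_write are hypothetical, and some file systems reject O_DIRECT outright, in which case the helper returns -1.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE                     /* exposes O_DIRECT on Linux */
#endif
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define DIO_ALIGN 4096                  /* assumed alignment, not queried */

/* Round n up to the next multiple of align. */
size_t round_up_to(size_t n, size_t align)
{
    return (n + align - 1) / align * align;
}

/* Hypothetical helper: write data to path with O_DIRECT, zero-padding the
 * payload to a multiple of DIO_ALIGN. Returns the bytes written (the
 * padded size), or -1 on error. */
ssize_t direct_write(const char *path, const void *data, size_t len)
{
    assert(path != NULL && data != NULL);
    if (len == 0)
        return 0;

    size_t padded = round_up_to(len, DIO_ALIGN);
    void *buf = NULL;
    if (posix_memalign(&buf, DIO_ALIGN, padded) != 0)
        return -1;                      /* aligned buffer is mandatory */
    memset(buf, 0, padded);
    memcpy(buf, data, len);

    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC | O_DIRECT, 0644);
    if (fd < 0) {
        free(buf);
        return -1;
    }

    ssize_t n = write(fd, buf, padded); /* bypasses the page cache */
    close(fd);
    free(buf);
    return n;
}
```

Note that the file ends up padded to the aligned size; callers that need the exact length would have to truncate it afterwards or track the logical size separately.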
Standard I/O buffering in the C library adds a user‑space buffer to reduce the number of expensive system calls. Functions like fflush and setbuf control this buffer.
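To illustrate the user-space buffer, the sketch below installs a 64 KiB buffer with setvbuf() (the more general sibling of setbuf) so that many small fprintf() calls collapse into a few write() system calls. The function name write_lines_buffered and the buffer size are illustrative assumptions.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: write `lines` short lines through a fully
 * buffered stream. Returns 0 on success, -1 on error. */
int write_lines_buffered(const char *path, int lines)
{
    assert(path != NULL);

    FILE *fp = fopen(path, "w");
    if (fp == NULL)
        return -1;

    /* The buffer must stay valid until fclose(); every path below closes
     * the stream before this function returns. setvbuf() must be called
     * before any other operation on the stream. */
    char buf[1 << 16];
    if (setvbuf(fp, buf, _IOFBF, sizeof buf) != 0) {
        fclose(fp);
        return -1;
    }

    for (int i = 0; i < lines; i++) {
        /* Usually just a memcpy into buf, not a system call. */
        if (fprintf(fp, "line %d\n", i) < 0) {
            fclose(fp);
            return -1;
        }
    }

    /* Push buffered bytes to the kernel (page cache, not yet disk). */
    if (fflush(fp) != 0) {
        fclose(fp);
        return -1;
    }
    return fclose(fp) == 0 ? 0 : -1;
}
```

fflush() only hands the data to the kernel; reaching the disk still depends on the page-cache write-back described earlier.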
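Linux also distinguishes between the Page Cache (file content) and the Buffer Cache (raw block-device data). The Page Cache is tied to the file system, while the Buffer Cache holds device blocks independent of any file system. Since kernel 2.4 the two have been unified: buffers are backed by page-cache pages instead of living in a separate cache.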
MaGe Linux Operations