How DMA and Zero‑Copy Revolutionize Linux I/O Performance
This article explains the principles of Direct Memory Access (DMA), compares it with traditional I/O, details the DMA data-transfer workflow, and explores zero-copy techniques such as mmap + write and sendfile, showing how they cut context switches and data copies to improve overall Linux I/O efficiency.
Direct Memory Access (DMA) offloads data movement from the CPU to a dedicated controller, allowing the CPU to perform other tasks while the controller transfers data between memory and devices.
Traditional I/O vs. DMA
Without DMA, the CPU must copy every byte moved between disk and memory itself (programmed I/O), wasting cycles on data movement. With DMA, the controller performs those device-side copies on its own, freeing the CPU for other work; the number of CPU-performed copies drops, although the system calls and context switches of the read/write path remain.
DMA Transfer Process
The typical DMA‑based I/O flow is:
1. The user process calls read, causing the kernel to issue an I/O request and block the process.
2. The kernel forwards the request to the DMA controller.
3. The DMA controller commands the disk to read data into its internal buffer.
4. When the disk buffer is full, the disk interrupts the DMA controller.
5. The DMA controller copies the data from the disk buffer to the kernel buffer, with no CPU involvement.
6. Once enough data has been transferred, the DMA controller interrupts the CPU.
7. The CPU copies the data from the kernel buffer to user space; the system call completes and execution returns to user mode.
For a complete file-to-socket transfer (a read followed by a write), this model involves four data copies (two performed by the CPU, two by DMA) and four user-kernel context switches.
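The four-copy, four-switch path can be sketched in user space as the classic read/write loop. This is an illustrative sketch (the helper name `copy_traditional` and the chunk size are my own, not from the article); Python's `os.read` and `socket.sendall` wrap the underlying syscalls:

```python
import os
import socket

def copy_traditional(src_fd: int, sock: socket.socket, chunk: int = 65536) -> int:
    """Copy a file to a socket with the classic read()+write() loop.

    Each chunk costs two system calls, i.e. four user/kernel context
    switches, and two CPU copies on top of the two DMA copies
    (disk -> kernel buffer, socket buffer -> NIC).
    """
    total = 0
    while True:
        buf = os.read(src_fd, chunk)   # kernel buffer -> user buffer (CPU copy)
        if not buf:
            break
        sock.sendall(buf)              # user buffer -> socket buffer (CPU copy)
        total += len(buf)
    return total
```

Note that the per-chunk cost is fixed regardless of chunk size, which is why larger buffers reduce (but never eliminate) the syscall overhead.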
Optimisation Strategies
To improve performance, reduce the number of context switches and data copies. Since user space cannot directly access disks or NICs, system calls are unavoidable; minimizing their frequency is key.
Zero‑Copy Techniques
Two common zero‑copy approaches are:
mmap + write
sendfile
mmap + write
Using mmap maps the file's page-cache pages directly into the process address space, eliminating the read() copy from kernel to user space. When the process then calls write() on the socket, the kernel copies data straight from the mapped page-cache buffer to the socket buffer, and DMA moves it from the socket buffer to the NIC.
Performance impact: one fewer data copy than the traditional path (three copies instead of four), but it still incurs two system calls (mmap and write) and therefore four user-kernel context switches.
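A minimal sketch of the mmap + write path (the helper name `copy_mmap_write` is illustrative; Python's `mmap` module wraps mmap(2)):

```python
import mmap
import os
import socket

def copy_mmap_write(src_fd: int, sock: socket.socket) -> int:
    """Send a file over a socket via mmap + write.

    mmap() maps the page cache into the process, so no read() copy
    into a user buffer is needed; sendall() still triggers one CPU
    copy from the mapped pages into the socket buffer, one copy
    fewer than the read()+write() path.
    """
    size = os.fstat(src_fd).st_size
    with mmap.mmap(src_fd, size, prot=mmap.PROT_READ) as m:
        sock.sendall(m)   # mapped page cache -> socket buffer (one CPU copy)
    return size
```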
sendfile
The sendfile system call (available since Linux 2.2) replaces the read/write pair, saving one system call and two context switches. Data moves from the kernel page cache to the socket buffer entirely inside the kernel; on NICs with scatter-gather support (SG-DMA, available since Linux 2.4), even that copy is skipped: the socket buffer records only descriptors, and the NIC's DMA engine pulls the data straight from the page cache.
Resulting workflow: only two context switches and two data copies (disk → page cache → NIC), at least a 2× speedup over the four-copy, four-switch traditional path.
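This path can be sketched with `os.sendfile`, which wraps the sendfile(2) syscall (the helper name `copy_sendfile` and the retry loop are my own; the loop is needed because sendfile may transfer fewer bytes than requested):

```python
import os
import socket

def copy_sendfile(src_fd: int, sock: socket.socket, count: int) -> int:
    """Transfer a file to a socket with sendfile().

    One system call per iteration, no pass through user space: the
    kernel (or the NIC's SG-DMA engine) moves the data directly from
    the page cache toward the network.
    """
    sent = 0
    offset = 0
    while sent < count:
        n = os.sendfile(sock.fileno(), src_fd, offset, count - sent)
        if n == 0:          # EOF reached before `count` bytes
            break
        offset += n
        sent += n
    return sent
```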
Two data copies, with no CPU involvement in copying
One system call
Two user-kernel context switches
Kernel Page Cache Role
The page cache stores recently accessed data, provides read‑ahead (pre‑fetch) based on spatial locality, and allows the kernel I/O scheduler to merge requests, reducing disk seek time. However, large files can fill the cache, displacing hot small files and degrading performance.
Large‑File Transfer Recommendation
For large files, zero-copy is less effective because a large streaming transfer fills the page cache with pages that will not be reused, evicting hotter data. A better approach combines asynchronous I/O with direct I/O, which bypasses the page cache entirely: data moves straight between the disk and user buffers, avoiding the extra cache copy and the cache pollution.
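Direct I/O can be sketched with the O_DIRECT open flag. This is a minimal, assumption-laden sketch (helper name `read_direct` and the 4096-byte block size are illustrative; O_DIRECT requires block-aligned buffers, offsets, and lengths, and an anonymous mmap provides a page-aligned buffer):

```python
import mmap
import os

BLOCK = 4096  # O_DIRECT requires block-aligned buffers, offsets, and sizes

def read_direct(path: str, size: int = BLOCK) -> bytes:
    """Read the start of a file while bypassing the page cache.

    With O_DIRECT, the transfer goes straight from the disk into the
    user buffer (no page-cache copy), so the buffer must be aligned.
    """
    fd = os.open(path, os.O_RDONLY | os.O_DIRECT)
    try:
        buf = mmap.mmap(-1, size)   # page-aligned scratch buffer
        n = os.readv(fd, [buf])     # DMA directly into our buffer
        return buf[:n]
    finally:
        os.close(fd)
```

In production code this is typically paired with io_uring or POSIX AIO so the process is not blocked while the transfer is in flight; note also that O_DIRECT is unsupported on some filesystems (e.g. tmpfs), where open() fails with EINVAL.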
Conclusion
DMA and zero‑copy techniques dramatically reduce CPU involvement, context switches, and data copies in Linux I/O, improving throughput. Nevertheless, zero‑copy cannot be used when the application needs to process data in user space or when transferring very large files, where async + direct I/O is preferable.
Liangxu Linux
Liangxu, a self-taught IT professional now working as a Linux development engineer at a Fortune 500 multinational, shares extensive Linux knowledge: fundamentals, applications, tools, plus Git, databases, Raspberry Pi, and more.
