Understanding Linux Zero‑Copy Techniques: DMA, sendfile, mmap and Direct I/O
This article explains how Linux zero‑copy mechanisms such as DMA, sendfile, mmap and Direct I/O reduce data copies and context switches, detailing their operation, advantages, limitations, and real‑world usage in systems like Kafka and MySQL.
Note: Except for Direct I/O, all file read/write operations that involve disks use the page cache technique.
1. Four Copies and Four Context Switches of Data
When an application handles a client request, it typically performs the following system calls:
read(file, buf, len);
write(socket, buf, len);
For example, the Kafka message middleware reads a batch of messages from disk and writes them unchanged to the network interface card (NIC) for transmission.
Without any optimization, the operating system performs four data copies and four context switches to read data from disk and send it through the NIC, which yields poor performance.
Four copies
Physical device ↔ Memory: CPU moves data from disk to the kernel‑space Page Cache. CPU moves data from the kernel‑space socket buffer to the network.
Internal memory copies: CPU moves data from the kernel Page Cache to the user‑space buffer. CPU moves data from the user‑space buffer to the kernel socket buffer.
Four context switches
read system call: user → kernel.
read returns: kernel → user.
write system call: user → kernel.
write returns: kernel → user.
These redundant copies and context switches are the overhead that zero-copy techniques aim to eliminate.
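The traditional path above can be sketched with Python's os module, which wraps the underlying read(2)/write(2) syscalls (the helper name copy_via_user_space is illustrative):

```python
import os

def copy_via_user_space(in_path, out_fd, bufsize=64 * 1024):
    """Traditional I/O path: every chunk travels
    disk -> page cache -> user buffer -> socket buffer -> NIC,
    with a user/kernel context switch on each syscall."""
    in_fd = os.open(in_path, os.O_RDONLY)
    try:
        while True:
            buf = os.read(in_fd, bufsize)   # kernel -> user copy
            if not buf:
                break
            os.write(out_fd, buf)           # user -> kernel copy
    finally:
        os.close(in_fd)
```

Each loop iteration costs two syscalls (four switches per chunk pair) and two CPU-controlled memory copies, which is exactly what the techniques below remove.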
2. Data Copies with DMA Involvement
DMA (Direct Memory Access) uses a dedicated controller (DMAC) to transfer data between memory and I/O devices without CPU intervention, acting as a co‑processor.
A DMAC is valuable both for large or very fast transfers, which would saturate the CPU, and for slow devices, where CPU-driven transfers would waste cycles waiting on the hardware.
When DMA is used, the CPU only issues control signals while the DMAC handles the actual data movement.
Now the DMAC replaces the CPU for memory‑disk and memory‑NIC transfers, while the CPU remains the controller.
The DMAC only handles transfers between devices and memory; the CPU still performs memory-to-memory copies such as kernel-space ↔ user-space transfers.
3. Zero‑Copy Techniques
3.1 What Is Zero‑Copy?
Zero‑copy is a design concept where the CPU does not need to copy data from one memory region to another; instead, the data is transferred directly between the source and destination, often within the kernel.
Zero‑copy does not mean that no copying occurs at all; it means the CPU is not responsible for the entire copy process. When data is not already in memory, the CPU may still manage the transfer while a DMA controller performs the actual copy.
Common implementations include:
sendfile
mmap
Direct I/O
splice
3.2 sendfile
sendfile is used when data read from disk does not need any processing before being sent over the network, such as in message‑queue scenarios.
Traditional I/O requires four copies and four context switches; sendfile reduces this to two copies and two context switches by leveraging DMA and passing file descriptors.
Key techniques used by sendfile:
DMA technology.
Passing file descriptors instead of copying data.
By using DMA, sendfile eliminates the two CPU-controlled device copies: the DMAC moves data from disk into the kernel Page Cache and from the kernel socket buffer to the NIC.
Passing the file descriptor avoids an extra copy because both the Page Cache and socket buffer reside in kernel space and the data is not modified during transfer.
Note: Only NICs that support SG‑DMA (Scatter‑Gather DMA) can avoid the extra kernel‑space copy.
Because sendfile performs a single system call (instead of separate read and write), the number of user‑kernel context switches drops from four to two.
Limitation: If the application needs to modify the data (e.g., encryption), sendfile cannot be used because the data never reaches user space.
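The single-syscall path can be sketched with Python's os.sendfile wrapper (the destination would normally be a socket descriptor; the helper name send_file_zero_copy is illustrative):

```python
import os

def send_file_zero_copy(out_fd, in_path):
    """sendfile(2): the kernel moves page-cache data to the destination
    descriptor directly (with SG-DMA NICs, only descriptors are passed);
    the data never enters user space."""
    in_fd = os.open(in_path, os.O_RDONLY)
    try:
        size = os.fstat(in_fd).st_size
        offset = 0
        while offset < size:
            sent = os.sendfile(out_fd, in_fd, offset, size - offset)
            if sent == 0:
                break
            offset += sent
        return offset
    finally:
        os.close(in_fd)
```

One sendfile call replaces a read/write pair, so the per-chunk context switches drop from four to two, matching the analysis above.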
3.3 mmap
mmap maps a file directly into the process’s address space, allowing the application to read or write the file without explicit read/write system calls. It relies on DMA and address‑mapping to avoid data copies between kernel and user space.
Reference: https://spongecaptain.cool/SimpleClearFileIO/3.%20mmap.html
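A minimal sketch of the mapping approach using Python's mmap module (the helper name read_via_mmap is illustrative): the file's page-cache pages are mapped into the process's address space, so no read(2) copy into a separate user buffer is needed.

```python
import mmap
import os

def read_via_mmap(path):
    """Map the file into this process and read it through the mapping.
    Accesses go straight to the page-cache pages; bytes(m) below makes
    one explicit copy only to return an independent result."""
    fd = os.open(path, os.O_RDONLY)
    try:
        with mmap.mmap(fd, 0, prot=mmap.PROT_READ) as m:
            return bytes(m)
    finally:
        os.close(fd)
```

In a real zero-copy pipeline you would operate on slices of the mapping in place rather than materializing a bytes copy as this sketch does for simplicity.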
3.4 Direct I/O
Direct I/O bypasses the page cache, allowing user‑space buffers to communicate directly with the disk or NIC via DMA. This eliminates kernel buffering for data transfer, though metadata (inode cache, dentry cache) still uses the kernel cache.
Advantages:
Reduces kernel‑level buffering overhead, improving performance for large data transfers.
Avoids kernel‑space ↔ user‑space copies, similar to other zero‑copy techniques.
Disadvantages:
Requires page pinning, which adds overhead and forces applications to manage a persistent memory pool.
Every read goes to disk, so small or repeated accesses that the page cache would otherwise serve from memory become slow.
Application‑level complexity increases because the program must handle its own caching.
Typical users include self‑caching applications such as database management systems. For example, MySQL can use O_DIRECT to perform Direct I/O.
Self‑caching applications maintain their own cache in user space and therefore prefer Direct I/O to avoid kernel page cache interference.
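A minimal Direct I/O sketch on Linux (the helper name write_direct and the 4096-byte alignment are assumptions; O_DIRECT generally requires the buffer address, file offset, and length to be aligned, and some filesystems such as tmpfs reject the flag entirely):

```python
import mmap
import os

def write_direct(path, payload):
    """Write `payload` bypassing the page cache with O_DIRECT.
    Assumes a 4096-byte alignment requirement; an anonymous mmap is
    used because it is guaranteed to be page-aligned."""
    align = 4096
    assert len(payload) % align == 0, "O_DIRECT needs aligned lengths"
    buf = mmap.mmap(-1, len(payload))        # page-aligned user buffer
    buf[:] = payload
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_DIRECT, 0o644)
    try:
        os.write(fd, buf)                    # DMA moves buf -> disk directly
    finally:
        os.close(fd)
        buf.close()
```

The alignment bookkeeping and buffer pooling hinted at here are exactly the extra application-level complexity listed among the disadvantages above.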
4. Typical Cases
4.1 Kafka
Kafka uses mmap (via java.nio.MappedByteBuffer) for persisting incoming messages and sendfile (via FileChannel.transferTo) for delivering messages to consumers. The combination provides high‑throughput persistence and zero‑copy network transmission, allowing multiple consumers to share the same page‑cached data.
4.2 MySQL
MySQL’s I/O path is more complex due to SQL processing, but it also leverages Direct I/O (O_DIRECT) for certain workloads to bypass the page cache and achieve predictable performance.
5. Summary
DMA enables the CPU to issue only control signals while the DMAC performs the actual data movement between memory and devices such as disks and NICs.
Linux zero‑copy techniques can be categorized as:
Eliminating or reducing user‑kernel copies: mmap, sendfile, splice – data stays within the kernel.
Bypassing the kernel with Direct I/O: user‑space communicates directly with hardware via DMA.
Optimizing transfers between kernel buffers and user buffers: techniques such as copy‑on‑write that reduce CPU copy overhead while still going through the kernel.
These strategies improve performance for high‑throughput services such as Kafka, databases, and other I/O‑intensive applications.
Top Architect
Top Architect focuses on sharing practical architecture knowledge, covering enterprise, system, website, large‑scale distributed, and high‑availability architectures, plus architecture adjustments using internet technologies. We welcome idea‑driven, sharing‑oriented architects to exchange and learn together.