Zero‑Copy Explained: From write+read to mmap, sendfile and splice
This article compares traditional write+read I/O with four zero-copy techniques (mmap+write, sendfile, sendfile + SG-DMA, and splice) and shows how each one reduces CPU copies and context switches to boost I/O performance in modern operating systems.
1. write+read data transfer principle
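Traditional I/O sends a file over the network with two system calls: read pulls the file into a user buffer, and write pushes that buffer to the socket. For read, the process switches to kernel mode, DMA copies the data from disk into a kernel buffer, the CPU copies it into the user buffer, and the call returns to user mode. For write, the process switches to kernel mode again, the CPU copies the user buffer into the socket buffer, DMA copies the socket buffer to the NIC, and the call returns. The total is four context switches and four data copies (2 DMA, 2 CPU).
Every byte crosses the user/kernel boundary twice even when the application never inspects it, and these repeated copies and switches limit throughput.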
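The read-then-write loop described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names (send_via_read_write, recv_all, demo_read_write), the 4 KiB chunk size, and the use of a local socket pair in place of a real NIC are all choices made for this sketch, not something the article prescribes.

```python
import os
import socket
import tempfile

CHUNK = 4096  # user-space buffer size; an arbitrary choice for this sketch


def send_via_read_write(path: str, sock: socket.socket) -> int:
    """Send a file with read + write, staging every chunk in a user buffer."""
    sent = 0
    fd = os.open(path, os.O_RDONLY)
    try:
        while True:
            chunk = os.read(fd, CHUNK)  # kernel buffer -> user buffer (CPU copy)
            if not chunk:
                break
            sock.sendall(chunk)  # user buffer -> socket buffer (CPU copy)
            sent += len(chunk)
    finally:
        os.close(fd)
    return sent


def recv_all(sock: socket.socket) -> bytes:
    """Read from the socket until the peer signals end of stream."""
    parts = []
    while True:
        data = sock.recv(65536)
        if not data:
            return b"".join(parts)
        parts.append(data)


def demo_read_write(payload: bytes = b"zero-copy demo\n" * 100) -> bytes:
    """Write payload to a temp file, send it over a socket pair, return what arrives."""
    with tempfile.NamedTemporaryFile(delete=False) as f:
        f.write(payload)
        path = f.name
    left, right = socket.socketpair()
    try:
        with left, right:
            send_via_read_write(path, left)
            left.shutdown(socket.SHUT_WR)  # tell the reader no more data is coming
            return recv_all(right)
    finally:
        os.unlink(path)
```

The demo keeps the payload small so that it fits in the socket buffer; a larger transfer would need a reader running concurrently on the other end.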
2. mmap + write
mmap replaces the read call. It maps a file directly into the process address space, so the kernel buffer and the user-visible buffer share the same physical memory. When the data is accessed, the kernel loads the file into a kernel buffer via DMA, and because of the shared mapping no additional CPU copy to user space is needed. The write call then copies the data from the kernel buffer to the socket buffer with the CPU, followed by a DMA copy to the NIC. This method still incurs four context switches (two for mmap, two for write) but only three copies (2 DMA, 1 CPU), one CPU copy fewer than the traditional approach.
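A minimal Python sketch of the mmap + write pattern follows. The names send_via_mmap, recv_all, and demo_mmap are hypothetical, and a local socket pair stands in for a network connection. The point it tries to show is that the mapped pages are handed straight to the send call, with no intermediate bytes object built in user space.

```python
import mmap
import os
import socket
import tempfile


def send_via_mmap(path: str, sock: socket.socket) -> int:
    """Map the file into memory and pass the mapping directly to the socket."""
    with open(path, "rb") as f:
        # Length 0 maps the whole file; this raises ValueError for an empty file.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            size = len(mm)
            view = memoryview(mm)  # a view over the mapped pages, not a copy
            try:
                sock.sendall(view)  # mapped pages -> socket buffer (CPU copy)
            finally:
                view.release()  # must be released before the mapping is closed
            return size


def recv_all(sock: socket.socket) -> bytes:
    """Read from the socket until the peer signals end of stream."""
    parts = []
    while True:
        data = sock.recv(65536)
        if not data:
            return b"".join(parts)
        parts.append(data)


def demo_mmap(payload: bytes = b"zero-copy demo\n" * 100) -> bytes:
    """Write payload to a temp file, send it via mmap, return what arrives."""
    with tempfile.NamedTemporaryFile(delete=False) as f:
        f.write(payload)
        path = f.name
    left, right = socket.socketpair()
    try:
        with left, right:
            send_via_mmap(path, left)
            left.shutdown(socket.SHUT_WR)  # tell the reader no more data is coming
            return recv_all(right)
    finally:
        os.unlink(path)
```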
3. sendfile
sendfile, introduced in the Linux 2.1 development series, is a single system call that moves data from a file descriptor to a socket entirely inside the kernel, so the data never enters user space. The process switches to kernel mode, DMA copies the file into a kernel buffer, the CPU copies the data from the kernel buffer to the socket buffer, a final DMA copy moves it to the NIC, and the call returns to user mode. Because one call replaces two, the number of context switches drops to two, while the copies remain at three (2 DMA, 1 CPU), offering a modest performance gain over mmap+write.
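In Python the same system call is reachable through os.sendfile. The sketch below assumes a Linux host; the helper names (send_via_sendfile, recv_all, demo_sendfile) and the socket pair used as the destination are illustrative choices. The loop is there because sendfile may transfer fewer bytes than requested.

```python
import os
import socket
import tempfile


def send_via_sendfile(path: str, sock: socket.socket) -> int:
    """Ask the kernel to move the file to the socket without a user buffer."""
    fd = os.open(path, os.O_RDONLY)
    try:
        size = os.fstat(fd).st_size
        offset = 0
        while offset < size:
            # Returns the number of bytes actually sent, which may be partial.
            sent = os.sendfile(sock.fileno(), fd, offset, size - offset)
            if sent == 0:
                break
            offset += sent
        return offset
    finally:
        os.close(fd)


def recv_all(sock: socket.socket) -> bytes:
    """Read from the socket until the peer signals end of stream."""
    parts = []
    while True:
        data = sock.recv(65536)
        if not data:
            return b"".join(parts)
        parts.append(data)


def demo_sendfile(payload: bytes = b"zero-copy demo\n" * 100) -> bytes:
    """Write payload to a temp file, send it with sendfile, return what arrives."""
    with tempfile.NamedTemporaryFile(delete=False) as f:
        f.write(payload)
        path = f.name
    left, right = socket.socketpair()
    try:
        with left, right:
            send_via_sendfile(path, left)
            left.shutdown(socket.SHUT_WR)  # tell the reader no more data is coming
            return recv_all(right)
    finally:
        os.unlink(path)
```

Whether the kernel also skips the CPU copy into the socket buffer is decided below this call, by the kernel and the NIC driver; the user-space code looks the same either way.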
4. sendfile + SG‑DMA
Linux 2.4 added SG-DMA (scatter-gather DMA) support to sendfile, which lets a NIC that supports it gather data from non-contiguous memory regions. With sendfile + SG-DMA, after the initial DMA copy into the kernel buffer, the kernel appends only a descriptor (the buffer's memory address and length) to the socket buffer instead of the data itself. The NIC then pulls the data straight from the kernel buffer via SG-DMA, eliminating the CPU copy to the socket buffer. This results in only two context switches and two DMA copies (no CPU copy), achieving true zero-copy.
5. splice
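splice, added in Linux 2.6.17, moves data between two file descriptors, one of which must be a pipe. To send a file to a socket, the kernel sets up a pipe between the file cache and the socket buffer and passes references to the cached pages through it, avoiding any CPU copy. The transfer involves two DMA copies and no CPU copy, with two context switches per splice call, making it one of the most efficient zero-copy implementations.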
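Python 3.10 and later expose this call on Linux as os.splice. The sketch below assumes that environment and uses hypothetical helper names (send_via_splice, recv_all, demo_splice). It makes the pipe explicit: one splice call moves file pages into the pipe, and a second moves them from the pipe to the socket.

```python
import os
import socket
import tempfile

CHUNK = 65536  # matches the default Linux pipe capacity; an assumption of this sketch


def send_via_splice(path: str, sock: socket.socket) -> int:
    """Move a file to a socket through a pipe using splice (Linux, Python 3.10+)."""
    fd = os.open(path, os.O_RDONLY)
    pipe_r, pipe_w = os.pipe()
    total = 0
    try:
        while True:
            moved = os.splice(fd, pipe_w, CHUNK)  # file -> pipe
            if moved == 0:  # end of file
                break
            remaining = moved
            while remaining:
                # pipe -> socket; may move fewer bytes than requested
                remaining -= os.splice(pipe_r, sock.fileno(), remaining)
            total += moved
        return total
    finally:
        os.close(pipe_r)
        os.close(pipe_w)
        os.close(fd)


def recv_all(sock: socket.socket) -> bytes:
    """Read from the socket until the peer signals end of stream."""
    parts = []
    while True:
        data = sock.recv(65536)
        if not data:
            return b"".join(parts)
        parts.append(data)


def demo_splice(payload: bytes = b"zero-copy demo\n" * 100) -> bytes:
    """Write payload to a temp file, send it with splice, return what arrives."""
    with tempfile.NamedTemporaryFile(delete=False) as f:
        f.write(payload)
        path = f.name
    left, right = socket.socketpair()
    try:
        with left, right:
            send_via_splice(path, left)
            left.shutdown(socket.SHUT_WR)  # tell the reader no more data is coming
            return recv_all(right)
    finally:
        os.unlink(path)
```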
Summary
All zero-copy techniques rely on at least two DMA copies; they differ mainly in how many CPU copies and context switches they eliminate. Implementations include mmap+write, sendfile, sendfile + SG-DMA, and splice. Support varies across operating systems, and in Java zero-copy is currently exposed only through NIO and epoll-based transfers. Messaging systems such as RocketMQ and Kafka use these methods (RocketMQ uses mmap+write for both producers and consumers; Kafka uses mmap+write for producers and sendfile for consumers). Java NIO's MappedByteBuffer is backed by mmap, while FileChannel's transferTo/transferFrom rely on sendfile where the platform supports it.
Lobster Programming
Sharing insights on technical analysis and exchange, making life better through technology.