How Zero-Copy Supercharges RocketMQ & Kafka: mmap, sendfile, DMA
This article explains why zero‑copy techniques such as mmap, sendfile and DMA dramatically reduce CPU copies and context switches in traditional read/write I/O, improving the performance of high‑throughput systems like RocketMQ and Kafka.
Traditional I/O
Traditional I/O relies on read() and write() system calls: data is read from disk into a kernel buffer, copied to a user buffer, then written from the user buffer to a socket buffer before reaching the network card.
The process incurs four user‑kernel context switches and four memory copies (two performed by the DMA controller and two by the CPU).
User process calls read(), switching from user mode to kernel mode.
DMA controller copies data from disk to the read buffer.
CPU copies data from the read buffer to the application buffer; read() returns.
User process calls write(), switching back to kernel mode.
CPU copies data from the application buffer to the socket buffer.
DMA controller moves data from the socket buffer to the network card; write() returns.
User space (process memory) and kernel space are isolated; switching between them is costly, especially under high concurrency.
DMA (Direct Memory Access) offloads data transfer from the CPU, allowing the controller to move data directly between memory and I/O devices, reducing CPU wait time.
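The read()/write() path described above can be sketched in a few lines. This is a minimal illustration, not code from RocketMQ or Kafka; the names send_with_read_write and roundtrip are hypothetical, and the local socket pair only stands in for a real network connection.

```python
import os
import socket
import tempfile


def send_with_read_write(path, sock, chunk_size=64 * 1024):
    """Send a file with the classic read()/write() loop; returns bytes sent."""
    sent = 0
    fd = os.open(path, os.O_RDONLY)
    try:
        while True:
            # read(): enter the kernel, then the CPU copies data from the
            # kernel read buffer into a user-space bytes object.
            chunk = os.read(fd, chunk_size)
            if not chunk:
                break
            # write(): enter the kernel again, and the CPU copies the
            # user-space buffer into the kernel socket buffer.
            sock.sendall(chunk)
            sent += len(chunk)
    finally:
        os.close(fd)
    return sent


def roundtrip(payload):
    """Hypothetical demo helper: push a small payload through a socket pair."""
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(payload)
    sender, receiver = socket.socketpair()
    try:
        send_with_read_write(tmp.name, sender)
        sender.close()  # signals EOF to the receiver
        received = b""
        while True:
            data = receiver.recv(65536)
            if not data:
                break
            received += data
        return received
    finally:
        sender.close()
        receiver.close()
        os.unlink(tmp.name)
```

Every loop iteration pays for two system calls and two CPU copies, which is the overhead the zero‑copy techniques below try to remove. The demo helper assumes a payload small enough to fit in the socket buffer, since nothing reads from the other end until sending has finished.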
Zero‑Copy
Zero‑copy means the CPU does not need to copy data between separate memory regions, saving CPU cycles and memory bandwidth during network transmission.
Zero‑copy does not eliminate all copies but reduces the number of user‑kernel transitions and CPU copies.
mmap+write
mmap replaces the read() call: it maps the file directly into the process address space so that user space and the kernel share the same buffer, which eliminates one CPU copy.
The operation still involves four context switches but only three copies.
User calls mmap(), switching to kernel mode.
DMA copies data from disk to the read buffer.
Kernel returns to user mode after mmap() completes.
User calls write(), switching back to kernel mode.
CPU copies data from the read buffer to the socket buffer.
DMA moves data from the socket buffer to the network card; write() returns.
Using mmap saves one CPU copy and reduces memory usage, making it suitable for large‑file transfers.
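As a rough sketch of the mmap+write pattern, the following uses Python's mmap module and hands the mapping straight to the socket. The function name send_with_mmap and the helper roundtrip are hypothetical, and this is an approximation of the technique rather than the way RocketMQ or Kafka implement it.

```python
import mmap
import os
import socket
import tempfile


def send_with_mmap(path, sock):
    """Map the file into memory and send the mapping; returns bytes sent."""
    with open(path, "rb") as f:
        size = os.fstat(f.fileno()).st_size
        if size == 0:
            return 0  # an empty file cannot be mapped
        # mmap(): the file's kernel buffer becomes visible in this process,
        # so there is no read() copy into a separate user-space buffer.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mapped:
            # write(): the remaining CPU copy goes from the mapped pages
            # into the kernel socket buffer.
            sock.sendall(mapped)
        return size


def roundtrip(payload):
    """Hypothetical demo helper: push a small payload through a socket pair."""
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(payload)
    sender, receiver = socket.socketpair()
    try:
        send_with_mmap(tmp.name, sender)
        sender.close()  # signals EOF to the receiver
        received = b""
        while True:
            data = receiver.recv(65536)
            if not data:
                break
            received += data
        return received
    finally:
        sender.close()
        receiver.close()
        os.unlink(tmp.name)
```

The application still makes two system calls (mmap and write), which is why the context‑switch count stays at four.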
sendfile
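Compared with mmap+write, sendfile performs the same number of copies (two DMA, one CPU) but cuts two context switches; compared with traditional read+write, it also saves one CPU copy.
sendfile is a Linux 2.1+ system call that transfers data between two file descriptors entirely within kernel space, so the data is never copied into user space and the separate read() and write() calls collapse into a single call.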
User calls sendfile(), switching to kernel mode.
DMA copies data from disk to the read buffer.
CPU copies data from the read buffer to the socket buffer.
DMA moves data from the socket buffer to the network card; sendfile returns.
Because the data never appears in user space, sendfile is ideal for static file servers.
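A minimal sketch of the sendfile path, assuming Python's standard socket.sendfile() method, which uses os.sendfile() where the platform supports it and falls back to ordinary send() calls otherwise. The names send_with_sendfile and roundtrip are hypothetical.

```python
import os
import socket
import tempfile


def send_with_sendfile(path, sock):
    """Send a whole file via socket.sendfile(); returns bytes sent."""
    with open(path, "rb") as f:
        # Where os.sendfile() is available, the kernel moves the data from
        # the read buffer to the socket buffer itself; the file contents
        # are never copied into a user-space buffer.
        return sock.sendfile(f)


def roundtrip(payload):
    """Hypothetical demo helper: push a small payload through a socket pair."""
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(payload)
    sender, receiver = socket.socketpair()
    try:
        send_with_sendfile(tmp.name, sender)
        sender.close()  # signals EOF to the receiver
        received = b""
        while True:
            data = receiver.recv(65536)
            if not data:
                break
            received += data
        return received
    finally:
        sender.close()
        receiver.close()
        os.unlink(tmp.name)
```

Because the process never sees the bytes, this style only fits cases where the data is forwarded unmodified, such as serving static files.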
sendfile+DMA Scatter/Gather
Since Linux 2.4, sendfile can use DMA Scatter/Gather, eliminating CPU copies entirely but requiring hardware support.
The process involves only two context switches and two DMA transfers, with no CPU copy.
User calls sendfile(), switching to kernel mode.
DMA scatter copies data from disk to fragmented read buffers.
CPU appends only buffer descriptors (the memory address and length of each data fragment) to the socket buffer; the data itself is not copied.
DMA gather moves data from kernel buffers to the network card. sendfile() returns, switching back to user mode.
This approach removes the last CPU copy and so improves throughput further, but it depends on network hardware that supports scatter/gather DMA.
Application Scenarios
Both RocketMQ and Kafka employ zero‑copy. RocketMQ uses mmap+write for persisting messages, while Kafka uses mmap+write for persistence and sendfile for delivering messages to consumers.
Summary
The speed gap between the CPU and I/O devices led to DMA, which moves data without CPU intervention. Traditional read+write incurs two DMA copies, two CPU copies, and four context switches. mmap+write reduces this to two DMA copies, one CPU copy, and four switches, saving memory for large files. sendfile needs only two DMA copies, one CPU copy, and two switches, making it well suited to static file serving. sendfile+DMA Scatter/Gather eliminates CPU copies altogether, achieving the highest performance at the cost of hardware requirements.
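The counts in this summary can be cross‑checked with a small tabulation. The figures are taken directly from the article; the names Cost, COSTS and cpu_copies_saved are hypothetical and exist only for this illustration.

```python
from collections import namedtuple

Cost = namedtuple("Cost", "context_switches dma_copies cpu_copies")

# Per-transfer costs as stated in the summary above.
COSTS = {
    "read+write": Cost(context_switches=4, dma_copies=2, cpu_copies=2),
    "mmap+write": Cost(context_switches=4, dma_copies=2, cpu_copies=1),
    "sendfile": Cost(context_switches=2, dma_copies=2, cpu_copies=1),
    "sendfile+SG-DMA": Cost(context_switches=2, dma_copies=2, cpu_copies=0),
}


def cpu_copies_saved(name, baseline="read+write"):
    """CPU copies saved by a technique relative to the baseline."""
    return COSTS[baseline].cpu_copies - COSTS[name].cpu_copies


if __name__ == "__main__":
    for name, cost in COSTS.items():
        total = cost.dma_copies + cost.cpu_copies
        print(f"{name:16} switches={cost.context_switches} "
              f"copies={total} cpu_copies_saved={cpu_copies_saved(name)}")
```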
Su San Talks Tech
Su San, former staff at several leading tech companies, is a top creator on Juejin and a premium creator on CSDN, and runs the free coding practice site www.susan.net.cn.
