Why Zero-Copy Supercharges RocketMQ & Kafka: mmap, sendfile, DMA
Zero‑copy techniques such as mmap + write, sendfile, and sendfile with DMA scatter/gather need fewer CPU memory copies and context switches than traditional read/write IO, which improves throughput in systems like RocketMQ and Kafka. This article also explains user and kernel mode, DMA, and the trade‑offs of each technique.
Interviewers often ask why RocketMQ and Kafka are so fast, or what mmap is; the common thread behind both questions is zero‑copy.
Traditional IO
Traditional IO uses the read() and write() system calls. Data is read from disk into a kernel read buffer, copied to a user buffer, copied again into a kernel socket buffer, and finally transferred to the NIC.
The process involves four user‑kernel context switches and four copies (two by DMA and two by the CPU):
User process calls read(), switching from user to kernel mode.
DMA controller copies data from disk to the read buffer.
CPU copies data from the read buffer to the application buffer; read() returns, switching back to user mode.
User process calls write(), switching to kernel mode.
CPU copies data from the application buffer to the socket buffer.
DMA controller copies data from the socket buffer to the NIC; write() returns, switching back to user mode.
User mode runs user processes, kernel mode runs the kernel; they are isolated for safety, and context switches are costly.
DMA (Direct Memory Access) moves data between memory and I/O devices without CPU intervention, reducing CPU wait time.
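The read() + write() path described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not code from RocketMQ or Kafka: the helper names copy_with_read_write and demo_read_write are hypothetical, and a local socket pair stands in for a real network connection.

```python
import os
import socket
import tempfile


def copy_with_read_write(path, sock, chunk_size=64 * 1024):
    """Send a file over a socket with plain read() and send() calls."""
    sent = 0
    # buffering=0 makes each f.read() map to one read() system call.
    with open(path, "rb", buffering=0) as f:
        while True:
            # CPU copy 1: kernel read buffer -> user-space bytes object.
            chunk = f.read(chunk_size)
            if not chunk:
                break
            # CPU copy 2: user-space bytes object -> kernel socket buffer.
            sock.sendall(chunk)
            sent += len(chunk)
    return sent


def demo_read_write():
    """Write a small temp file, send it, and check what arrives."""
    payload = b"hello zero-copy\n" * 256
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(payload)
        path = tmp.name
    receiver, sender = socket.socketpair()
    try:
        sent = copy_with_read_write(path, sender)
        sender.close()  # signal end of stream to the receiver
        received = b""
        while True:
            data = receiver.recv(65536)
            if not data:
                break
            received += data
        return sent, received == payload
    finally:
        sender.close()
        receiver.close()
        os.unlink(path)
```

Every chunk passes through a user‑space buffer twice, which is exactly the overhead the zero‑copy variants try to avoid.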
Zero Copy
Zero‑copy means the CPU does not need to copy data between memory regions; it is typically used to save CPU cycles and memory bandwidth when transferring files over the network.
Zero‑copy does not eliminate all copies, but it reduces the number of user‑kernel switches and CPU copies.
mmap + write
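mmap replaces the read() half of read + write, eliminating one CPU copy. The file's pages in the kernel read buffer are mapped into the process's virtual address space, so the application accesses the same physical memory instead of a private copy.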
The process involves four user‑kernel switches and three copies:
User process calls mmap(), switching to kernel mode.
DMA copies data from disk to the read buffer.
Context switches back to user mode, mmap() returns.
User process calls write(), switching to kernel mode.
CPU copies data from the read buffer to the socket buffer.
DMA copies data from the socket buffer to the NIC; write() returns, switching back to user mode.
mmap saves one CPU copy, and because the data is no longer duplicated in a user buffer, it can halve memory usage for large files.
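The mmap + write steps above can be sketched with Python's mmap module. This is a hedged illustration rather than production code: send_with_mmap and demo_mmap are hypothetical names, the file is assumed to be non‑empty (an empty file cannot be mapped), and a socket pair replaces a real network peer.

```python
import mmap
import os
import socket
import tempfile


def send_with_mmap(path, sock):
    """Map a file into memory and send the mapping over a socket."""
    with open(path, "rb") as f:
        size = os.fstat(f.fileno()).st_size
        # Length 0 maps the whole file; no user-space copy is made here.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mapped:
            # The kernel copies straight from the mapped pages into the
            # socket buffer: one CPU copy instead of two.
            sock.sendall(mapped)
    return size


def demo_mmap():
    """Write a small temp file, send it via mmap, and check what arrives."""
    payload = b"hello zero-copy\n" * 256
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(payload)
        path = tmp.name
    receiver, sender = socket.socketpair()
    try:
        sent = send_with_mmap(path, sender)
        sender.close()  # signal end of stream to the receiver
        received = b""
        while True:
            data = receiver.recv(65536)
            if not data:
                break
            received += data
        return sent, received == payload
    finally:
        sender.close()
        receiver.close()
        os.unlink(path)
```

Compared with the plain read/write loop, there is no intermediate bytes object holding the file contents in user space.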
sendfile
sendfile also removes one CPU copy compared with traditional IO, and it needs two fewer context switches than mmap + write because a single system call replaces two. It transfers data entirely in kernel space.
The process involves two user‑kernel switches and three copies:
User process calls sendfile(), switching to kernel mode.
DMA copies data from disk to the read buffer.
CPU copies data from the read buffer to the socket buffer.
DMA copies data from the socket buffer to the NIC; sendfile() returns, switching back to user mode.
sendfile is suitable for static‑file servers because the data never reaches user space; the flip side is that the application cannot inspect or modify the data on the way through.
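The sendfile flow above can be sketched with Python's os.sendfile wrapper. The sketch assumes Linux, where the call takes (out_fd, in_fd, offset, count) and the destination may be any socket; send_with_sendfile and demo_sendfile are hypothetical helper names, and a socket pair stands in for a real client connection.

```python
import os
import socket
import tempfile


def send_with_sendfile(path, sock):
    """Send a file over a socket without routing data through user space."""
    with open(path, "rb") as f:
        size = os.fstat(f.fileno()).st_size
        offset = 0
        while offset < size:
            # One system call per iteration; the kernel moves the bytes
            # from the read buffer to the socket buffer on our behalf.
            sent = os.sendfile(sock.fileno(), f.fileno(), offset, size - offset)
            if sent == 0:
                break  # unexpected end of file
            offset += sent
    return offset


def demo_sendfile():
    """Write a small temp file, send it via sendfile, and check what arrives."""
    payload = b"hello zero-copy\n" * 256
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(payload)
        path = tmp.name
    receiver, sender = socket.socketpair()
    try:
        sent = send_with_sendfile(path, sender)
        sender.close()  # signal end of stream to the receiver
        received = b""
        while True:
            data = receiver.recv(65536)
            if not data:
                break
            received += data
        return sent, received == payload
    finally:
        sender.close()
        receiver.close()
        os.unlink(path)
```

The loop is needed because sendfile may transfer fewer bytes than requested; the application only tracks the offset and never touches the payload.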
sendfile + DMA Scatter/Gather
Since Linux 2.4, sendfile can use DMA scatter/gather to eliminate the remaining CPU copy, provided the NIC supports gather operations.
The process involves two user‑kernel switches and two copies, with no CPU copy:
User process calls sendfile(), switching to kernel mode.
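DMA scatter copies data from disk into the read buffer, whose pages do not have to be contiguous in memory.
CPU appends descriptors holding the memory location and length of the data to the socket buffer; the data itself is not copied.
DMA gather uses those descriptors to copy data directly from the read buffer to the NIC; sendfile() returns, switching back to user mode.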
Application Scenarios
Both RocketMQ and Kafka use zero‑copy. RocketMQ persists messages with mmap + write; Kafka uses mmap for its index files and sends log data to consumers with sendfile (through Java's FileChannel.transferTo).
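To make the mmap‑based persistence idea more concrete, here is a minimal sketch of an append‑only log segment backed by a memory‑mapped file. It is an illustration under stated assumptions, not RocketMQ's or Kafka's actual storage code: the MappedLog class, its methods, and the fixed segment capacity are all hypothetical, and a Unix‑like system is assumed.

```python
import mmap
import os
import tempfile


class MappedLog:
    """Hypothetical fixed-size, append-only log segment backed by mmap."""

    def __init__(self, path, capacity):
        self._fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
        os.ftruncate(self._fd, capacity)  # pre-allocate the segment
        self._map = mmap.mmap(self._fd, capacity)
        self._capacity = capacity
        self._pos = 0

    def append(self, record):
        """Copy a record into the mapping and return its start offset."""
        end = self._pos + len(record)
        if end > self._capacity:
            raise ValueError("log segment is full")
        start = self._pos
        # A plain memory write; the kernel writes the dirty pages back later.
        self._map[start:end] = record
        self._pos = end
        return start

    def read(self, offset, length):
        return self._map[offset:offset + length]

    def flush(self):
        # Ask the kernel to write dirty pages to disk now.
        self._map.flush()

    def close(self):
        self._map.close()
        os.close(self._fd)


def demo_mapped_log():
    """Append two records, then read them back from memory and from disk."""
    with tempfile.TemporaryDirectory() as tmp_dir:
        path = os.path.join(tmp_dir, "segment.log")
        log = MappedLog(path, capacity=4096)
        try:
            first = log.append(b"order-created")
            second = log.append(b"order-paid")
            log.flush()
            echoed = log.read(second, 10)
        finally:
            log.close()
        with open(path, "rb") as f:
            on_disk = f.read(23)
    return first, second, echoed, on_disk
```

Appending is an ordinary memory write with no write() system call per record, which is the property that makes a mapped file attractive for a message store; the explicit flush controls when the data is forced to disk.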
Summary
Because I/O devices are much slower than the CPU, DMA was introduced to move data without CPU involvement. Traditional read + write incurs two DMA copies, two CPU copies, and four context switches. mmap + write removes one CPU copy (two DMA copies, one CPU copy, four switches). sendfile needs two DMA copies, one CPU copy, and only two switches, making it ideal for static file serving. sendfile + DMA scatter/gather eliminates CPU copies altogether (two DMA copies, two switches), further improving performance but requiring hardware support.
Su San Talks Tech
Su San, former staff at several leading tech companies, is a top creator on Juejin and a premium creator on CSDN, and runs the free coding practice site www.susan.net.cn.
