
Why Zero-Copy Supercharges RocketMQ & Kafka: mmap, sendfile, DMA

This article explains user mode, kernel mode, and DMA. It then shows how zero-copy techniques reduce CPU copies and context switches compared with traditional read/write IO, and weighs their trade-offs. The techniques covered are mmap + write, sendfile, and sendfile + DMA scatter/gather. These techniques are a large part of why high-throughput systems like RocketMQ and Kafka perform so well.

Su San Talks Tech

Interviewers often ask why RocketMQ and Kafka are so fast, or what mmap is. The common thread behind both questions is zero-copy.

Traditional IO

Traditional IO uses the read() and write() system calls. Data is read from disk into a kernel buffer and copied to a user buffer. It is then copied to a socket buffer and finally sent to the NIC. The whole process takes four user-kernel context switches and four copies (two by DMA, two by the CPU):

User process calls read(), switching from user to kernel mode.

DMA controller copies data from disk to the read buffer.

CPU copies data from the read buffer to the application buffer; read() returns, switching back to user mode.

User process calls write(), switching to kernel mode.

CPU copies data from the application buffer to the socket buffer.

DMA controller copies data from the socket buffer to the NIC; write() returns, switching back to user mode.
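The steps above can be sketched in Python, assuming Linux. Python's os.read() and os.write() are thin wrappers around the read() and write() system calls. This is a minimal illustration, not code from any particular system: the function name copy_with_read_write and the 64 KB buffer size are hypothetical choices. The user-space buffer buf is exactly where the two CPU copies land.

```python
import os

CHUNK_SIZE = 64 * 1024  # size of the user-space buffer (an arbitrary choice)


def copy_with_read_write(file_path: str, out_fd: int) -> int:
    """Copy a file to out_fd (e.g. a socket) with plain read()/write() calls.

    Every os.read() returns a new bytes object: the CPU copies data from the
    kernel buffer into user space. Every os.write() makes the CPU copy it back
    into a kernel buffer (the socket buffer when out_fd is a socket).
    """
    in_fd = os.open(file_path, os.O_RDONLY)
    total = 0
    try:
        while True:
            buf = os.read(in_fd, CHUNK_SIZE)  # read(2): kernel -> user copy
            if not buf:  # b"" signals end of file
                break
            view = memoryview(buf)
            while view:  # write(2) may accept only part of the buffer
                written = os.write(out_fd, view)  # write(2): user -> kernel copy
                view = view[written:]
            total += len(buf)
    finally:
        os.close(in_fd)
    return total
```

Each loop iteration costs one read() and at least one write(), so a large file multiplies both the context switches and the CPU copies.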

User mode runs user processes and kernel mode runs the kernel. The two are isolated for safety, and each switch between them (saving and restoring CPU state) is costly.

DMA (Direct Memory Access) moves data between memory and I/O devices without CPU intervention, reducing CPU wait time.

Zero Copy

Zero‑copy means the CPU does not need to copy data between memory regions; it is typically used to save CPU cycles and memory bandwidth when transferring files over the network.

Zero-copy does not mean no copies at all. DMA copies remain, but the number of user-kernel switches and CPU copies drops.

mmap + write

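mmap replaces the read() half of read + write, eliminating one CPU copy. The file's pages in the kernel buffer (page cache) are mapped into the process's address space. The kernel and the user process therefore share the same physical memory instead of keeping separate copies.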

The process involves four user‑kernel switches and three copies:

User process calls mmap(), switching to kernel mode.

DMA copies data from disk to the read buffer. In practice this happens lazily, on page faults when the mapped memory is first accessed.

mmap() returns, switching back to user mode.

User process calls write(), switching to kernel mode.

CPU copies data from the read buffer to the socket buffer.

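DMA copies data from the socket buffer to the NIC; write() returns, switching back to user mode.

Compared with read + write, mmap saves one CPU copy. Because no separate user buffer is needed, it can roughly halve memory usage for large files.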
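A minimal Python sketch of mmap + write, assuming Linux. mmap.mmap maps the file's page-cache pages into the process. A memoryview then lets os.write() read those pages in place instead of first copying them into a new bytes object. The function name send_with_mmap_write is hypothetical.

```python
import mmap
import os


def send_with_mmap_write(file_path: str, out_fd: int) -> int:
    """Send a file to out_fd using mmap + write.

    mmap maps the file's page-cache pages into this process, so there is no
    kernel -> user copy. os.write() still makes the CPU copy the mapped bytes
    into a kernel buffer (the socket buffer when out_fd is a socket).
    """
    in_fd = os.open(file_path, os.O_RDONLY)
    try:
        size = os.fstat(in_fd).st_size
        if size == 0:
            return 0  # mmap cannot map an empty file
        with mmap.mmap(in_fd, size, prot=mmap.PROT_READ) as mapped:
            # A memoryview reads the mapped pages in place; slicing the mmap
            # object itself would create a user-space copy.
            with memoryview(mapped) as view:
                sent = 0
                while sent < size:
                    sent += os.write(out_fd, view[sent:])  # write(2) may be partial
        return size
    finally:
        os.close(in_fd)
```

The memoryview is released before the mapping is closed. The nested with blocks guarantee this, and Python refuses to close an mmap while buffers exported from it are still alive.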

sendfile

Compared with read + write, sendfile also removes one CPU copy. Compared with mmap + write, it saves two more context switches, because a single system call does the work of two. The data is transferred entirely in kernel space.

The process involves two user‑kernel switches and three copies:

User process calls sendfile(), switching to kernel mode.

DMA copies data from disk to the read buffer.

CPU copies data from the read buffer to the socket buffer.

DMA copies data from the socket buffer to the NIC; sendfile() returns, switching back to user mode.

sendfile is well suited to static-file servers, because the data never reaches user space.
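Python exposes sendfile() directly as os.sendfile(out_fd, in_fd, offset, count). The sketch below assumes Linux and uses a hypothetical helper name, send_with_sendfile. It loops because, like write(), sendfile may transfer fewer bytes than requested.

```python
import os


def send_with_sendfile(file_path: str, out_fd: int) -> int:
    """Send a file to out_fd with sendfile(2); the data never enters user space."""
    in_fd = os.open(file_path, os.O_RDONLY)
    try:
        size = os.fstat(in_fd).st_size
        offset = 0
        while offset < size:
            # One system call moves up to `size - offset` bytes kernel-to-kernel;
            # it may move fewer, so continue from the new offset.
            sent = os.sendfile(out_fd, in_fd, offset, size - offset)
            if sent == 0:  # end of file reached early (file was truncated)
                break
            offset += sent
        return offset
    finally:
        os.close(in_fd)
```

The application never touches the file contents. Whether the remaining CPU copy is also avoided with scatter/gather DMA is up to the kernel and the network card, so the calling code stays the same either way.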

sendfile + DMA Scatter/Gather

Since Linux 2.4, sendfile can use DMA scatter/gather to eliminate the CPU copy entirely, provided the network card supports it.

The process involves two user‑kernel switches and two copies, with no CPU copy:

User process calls sendfile(), switching to kernel mode.

DMA copies data from disk into the kernel read buffer, which may be scattered across non-contiguous pages.

The CPU copies only descriptors (the memory address and length of the data in the read buffer) to the socket buffer. The file data itself is not copied.

Using those descriptors, DMA gathers the data directly from the read buffer to the NIC; sendfile() returns, switching back to user mode.

Application Scenarios

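Both RocketMQ and Kafka rely on zero-copy. RocketMQ uses mmap + write, which suits persisting and transferring small business messages. Kafka uses sendfile for its log data files, which suits high-throughput transfer of large blocks, while its index files use mmap + write.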
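To make the two patterns concrete, here is a heavily simplified Python sketch, assuming Linux. A hypothetical MappedLog class appends messages into a memory-mapped file, the mmap style of persistence. It then serves a byte range to a socket with socket.sendfile(), which uses os.sendfile where available, the sendfile style of delivery. It is not RocketMQ's or Kafka's actual code or file format. Both brokers run on the JVM and typically reach the same system calls through Java's FileChannel.map and FileChannel.transferTo.

```python
import mmap
import os
import socket


class MappedLog:
    """Hypothetical, heavily simplified append-only log backed by mmap.

    Illustration only: this is not RocketMQ's CommitLog or Kafka's log format.
    """

    def __init__(self, path: str, capacity: int) -> None:
        self.path = path
        self.capacity = capacity
        self.fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
        os.ftruncate(self.fd, capacity)          # preallocate a fixed-size file
        self.map = mmap.mmap(self.fd, capacity)  # shared read/write mapping
        self.write_pos = 0

    def append(self, message: bytes) -> int:
        """Copy a message into the mapping and return its offset in the file."""
        end = self.write_pos + len(message)
        if end > self.capacity:
            raise ValueError("log file is full")
        offset = self.write_pos
        self.map[offset:end] = message  # a memory write, not a write() call
        self.write_pos = end
        return offset

    def flush(self) -> None:
        """Ask the kernel to write dirty mapped pages to disk (msync)."""
        self.map.flush()

    def send_range(self, sock: socket.socket, offset: int, count: int) -> int:
        """Send `count` bytes starting at `offset`; uses os.sendfile where available."""
        with open(self.path, "rb") as f:
            return sock.sendfile(f, offset, count)

    def close(self) -> None:
        self.map.close()
        os.close(self.fd)
```

Because the mapping is shared, bytes written through it sit in the same page cache that sendfile reads from. A consumer can therefore be served before flush() has forced them to disk.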

Summary

Because I/O devices are much slower than the CPU, DMA was introduced to move data between devices and memory without tying up the CPU. Traditional read + write costs two DMA copies, two CPU copies, and four context switches. mmap + write removes one CPU copy (two DMA copies, one CPU copy, four switches). sendfile needs only two DMA copies, one CPU copy, and two switches, which makes it a good fit for static file serving. sendfile + DMA scatter/gather eliminates the CPU copy altogether (two DMA copies, two switches). This further improves performance but requires hardware support.

Written by

Su San Talks Tech

Su San, former staff at several leading tech companies, is a top creator on Juejin and a premium creator on CSDN, and runs the free coding practice site www.susan.net.cn.
