
How Zero-Copy Supercharges RocketMQ & Kafka: mmap, sendfile, DMA

This article explains how zero-copy techniques such as mmap and sendfile, together with DMA, reduce the CPU copies and context switches incurred by traditional read/write I/O, improving the performance of high-throughput systems like RocketMQ and Kafka.

Su San Talks Tech

Traditional IO

Traditional I/O relies on read() and write() system calls: data is read from disk into a kernel buffer, copied to a user buffer, then written from the user buffer to a socket buffer before reaching the network card.

The process incurs four user-kernel context switches (one on entry to and one on return from each system call) and four memory copies (two by DMA, two by the CPU).

User process calls read(), switching from user mode to kernel mode.

DMA controller copies data from disk to the read buffer.

CPU copies data from the read buffer to the application buffer; read() returns.

User process calls write(), switching back to kernel mode.

CPU copies data from the application buffer to the socket buffer.

DMA controller moves data from the socket buffer to the network card; write() returns.

User space (process memory) and kernel space are isolated; switching between them is costly, especially under high concurrency.

DMA (Direct Memory Access) offloads data transfer from the CPU, allowing the controller to move data directly between memory and I/O devices, reducing CPU wait time.
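The read()/write() path described above can be sketched in Python. This is a minimal illustration, not a benchmark: the function name send_file_read_write and the 64 KiB chunk size are assumptions made for the example. Each loop iteration performs one read (kernel buffer to user buffer) and at least one send (user buffer to socket buffer).

```python
import socket

CHUNK_SIZE = 64 * 1024  # assumed buffer size, chosen only for illustration


def send_file_read_write(path: str, sock: socket.socket) -> int:
    """Send a file with the classic read()/write() loop; returns bytes sent."""
    total = 0
    # buffering=0 makes each f.read() map to a plain read() system call.
    with open(path, "rb", buffering=0) as f:
        while True:
            # CPU copy 1: kernel read buffer -> user-space bytes object.
            chunk = f.read(CHUNK_SIZE)
            if not chunk:
                break  # end of file
            # CPU copy 2: user-space bytes object -> kernel socket buffer.
            sock.sendall(chunk)
            total += len(chunk)
    return total
```

The data passes through user space only to be handed straight back to the kernel, which is exactly the round trip that the zero-copy techniques below try to avoid.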

Zero‑Copy

Zero‑copy means the CPU does not need to copy data between separate memory regions, saving CPU cycles and memory bandwidth during network transmission.

Zero‑copy does not eliminate all copies but reduces the number of user‑kernel transitions and CPU copies.

mmap+write

mmap replaces the read() part, mapping the file directly into the process address space so that user space shares the kernel buffer, thus eliminating one CPU copy.

The operation still involves four context switches but only three copies.

User calls mmap(), switching to kernel mode.

DMA copies data from disk to the read buffer.

Kernel returns to user mode after mmap() completes.

User calls write(), switching back to kernel mode.

CPU copies data from the read buffer to the socket buffer.

DMA moves data from the socket buffer to the network card; write() returns.

Using mmap saves one CPU copy and reduces memory usage, making it suitable for large‑file transfers.
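A rough Python sketch of the mmap+write steps follows. The function name send_file_mmap is hypothetical, and the example assumes a blocking socket. The file is mapped read-only and the mapping is passed to the socket directly, so no intermediate user-space copy of the file contents is created.

```python
import mmap
import os
import socket


def send_file_mmap(path: str, sock: socket.socket) -> int:
    """Send a file through a read-only memory mapping; returns bytes sent."""
    with open(path, "rb") as f:
        size = os.fstat(f.fileno()).st_size
        if size == 0:
            return 0  # an empty file cannot be memory-mapped
        # Length 0 maps the whole file; pages are loaded on demand.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mapped:
            # The memoryview exposes the mapped pages without copying them.
            with memoryview(mapped) as view:
                # Remaining CPU copy: mapped pages -> kernel socket buffer.
                sock.sendall(view)
    return size
```

The memoryview is released before the mapping is closed, because Python refuses to close an mmap object while a buffer exported from it is still alive.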

sendfile

Compared with mmap+write, sendfile still needs one CPU copy but cuts two context switches, because a single system call replaces the mmap() and write() pair.

sendfile is a Linux 2.1+ system call that transfers data entirely within kernel space, avoiding user‑space copies and eliminating the read+write overhead.

User calls sendfile(), switching to kernel mode.

DMA copies data from disk to the read buffer.

CPU copies data from the read buffer to the socket buffer.

DMA moves data from the socket buffer to the network card; sendfile returns.

Because the data never appears in user space, sendfile is ideal for static file servers.
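The sendfile steps above can be tried from Python through os.sendfile, which wraps the system call. This is a sketch under assumptions: the function name send_file_sendfile is hypothetical, the socket is blocking, and the platform is Linux. The loop is there because a single call may transfer fewer bytes than requested.

```python
import os
import socket


def send_file_sendfile(path: str, sock: socket.socket) -> int:
    """Send a file with sendfile(); the data never enters user space."""
    with open(path, "rb") as f:
        size = os.fstat(f.fileno()).st_size
        offset = 0
        while offset < size:
            # The kernel moves bytes from the file to the socket directly.
            sent = os.sendfile(sock.fileno(), f.fileno(), offset, size - offset)
            if sent == 0:
                break  # nothing more could be sent (e.g. file was truncated)
            offset += sent
    return offset
```

Python's socket objects also provide a higher-level sendfile() method that uses os.sendfile where available and otherwise falls back to a plain send loop.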

sendfile+DMA Scatter/Gather

Since Linux 2.4, sendfile can use DMA Scatter/Gather, eliminating CPU copies entirely but requiring hardware support.

The process involves only two context switches and two DMA transfers, with no CPU copy.

User calls sendfile(), switching to kernel mode.

DMA scatter copies data from disk to fragmented read buffers.

CPU appends only buffer descriptors (the memory address and length of the data in the kernel buffers) to the socket buffer; the data itself is not copied.

DMA gather moves data from kernel buffers to the network card. sendfile() returns, switching back to user mode.

This approach removes the last CPU copy and so improves throughput further, but it only works when the network card supports scatter/gather DMA.

Application Scenarios

Both RocketMQ and Kafka employ zero-copy. RocketMQ uses mmap+write for persisting messages (its CommitLog files are memory-mapped), while Kafka memory-maps its index files and uses sendfile, through Java's FileChannel.transferTo, for delivering messages to consumers.

Summary

The speed gap between CPU and I/O devices led to DMA, which moves data without CPU intervention. Traditional read+write incurs two DMA copies, two CPU copies, and four context switches. mmap+write reduces this to two DMA copies, one CPU copy, and four switches, saving memory for large files. sendfile needs two DMA copies, one CPU copy, and only two switches, making it well suited to static file serving. sendfile+DMA Scatter/Gather eliminates CPU copies altogether (two DMA copies, two switches), achieving the highest performance at the cost of hardware requirements.
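The counts in this summary can be put side by side in a few lines of Python. The numbers come from the text above; the names IoCost, COSTS and savings_vs_baseline are invented for this illustration.

```python
from typing import NamedTuple


class IoCost(NamedTuple):
    dma_copies: int
    cpu_copies: int
    context_switches: int


# Per-transfer costs as stated in the summary above.
COSTS = {
    "read+write": IoCost(2, 2, 4),
    "mmap+write": IoCost(2, 1, 4),
    "sendfile": IoCost(2, 1, 2),
    "sendfile+scatter/gather": IoCost(2, 0, 2),
}


def savings_vs_baseline(name: str, baseline: str = "read+write") -> IoCost:
    """Return how many copies and switches `name` saves over `baseline`."""
    base, other = COSTS[baseline], COSTS[name]
    return IoCost(*(b - o for b, o in zip(base, other)))
```

For example, savings_vs_baseline("sendfile") reports one CPU copy and two context switches saved relative to read+write, while the number of DMA copies is unchanged for every technique.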

Written by

Su San Talks Tech

Su San, former staff at several leading tech companies, is a top creator on Juejin and a premium creator on CSDN, and runs the free coding practice site www.susan.net.cn.
