How Zero‑Copy Eliminates CPU Bottlenecks for High‑Performance I/O
This article explains the evolution of data transfer in computers—from early CPU‑bound copying to DMA, channel architectures, and I/O processors—then demonstrates how zero‑copy techniques like sendFile and memory‑mapped I/O dramatically reduce CPU usage and improve throughput.
Preface
"Zero‑copy" is a technique widely used in open‑source components such as Kafka, RocketMQ, Netty, and Nginx. This article shares essential knowledge about zero‑copy.
Data Transfer in Computers
Before discussing zero‑copy, it is useful to review how data moves within a computer system.
Early Stage
In the earliest machines, each I/O device was wired to the CPU separately, the CPU and devices worked serially, and the CPU acted like a babysitter: it read data from the I/O interface itself and copied it to main memory. Every step was CPU‑controlled, so the CPU was occupied for the whole transfer:
The CPU starts the I/O device.
The CPU repeatedly polls the device to check whether it is ready.
When the device signals readiness, the CPU reads the data from the I/O interface.
The CPU then copies the data to main memory.
This approach is inefficient because the CPU is tied up for the entire transfer.
Interface Module and DMA Stage
Interface Module
Early computers used point‑to‑point wiring (distributed connections), making I/O expansion difficult. Buses were introduced to provide a common transmission channel for multiple devices.
In this model, data exchange uses interrupt‑driven programming: the CPU starts the I/O device, then does other work; the device later interrupts the CPU when ready, and the CPU reads the data.
DMA
Even with interrupts, the CPU itself still moves every piece of data between the device and main memory. Direct Memory Access (DMA) adds a dedicated data path between main memory and the I/O device: the CPU only sets up the transfer, a DMA controller moves the data, and the CPU is interrupted once when the transfer completes.
Channel‑Based Architecture
For large systems, assigning a dedicated DMA interface to each I/O device raises hardware cost and complexity, and the CPU must manage many DMA channels. A channel processor—a specialized processor subordinate to the CPU—was introduced to handle I/O and memory exchanges, freeing the CPU and improving resource utilization.
I/O Processor Stage
The fourth stage adds an I/O processor (or peripheral processor) that works independently of the host CPU, handling I/O control, formatting, and error correction. This further increases parallelism and reduces CPU involvement.
Summary
The evolution of data transfer consistently aims to reduce CPU occupancy and improve resource utilization.
Data Copy Techniques
Traditional Copy
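In Java, a typical implementation reads the file into an application buffer and then writes that buffer to the socket. The data is copied four times (two DMA copies and two CPU copies), and the read and write calls cause four context switches between user mode and kernel mode. This lowers performance, but the code is simple to write. The steps are: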
CPU commands DMA to move data from disk to kernel buffer.
When the DMA transfer completes, it raises an interrupt, and the CPU copies data from kernel buffer to application buffer.
CPU copies data from application buffer to socket buffer.
DMA copies data from socket buffer to NIC buffer.
Advantages: low development cost, suitable when performance requirements are modest.
Disadvantages: four context switches and two CPU copies per read/write pair consume CPU time and limit throughput.
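The traditional path described above can be sketched in Java as follows. This is a minimal illustration, not a reference implementation: the class name TraditionalCopy, the method name copy, and the 8 KB buffer size are assumptions made for the example, and System.out stands in for a socket's output stream.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical example class: copies a file through a user-space buffer.
class TraditionalCopy {

    // Each read() brings data from the kernel buffer into the application
    // buffer; each write() pushes it back into a kernel-side buffer.
    static long copy(Path source, OutputStream out) throws IOException {
        byte[] buffer = new byte[8192]; // the "application buffer" (size is arbitrary)
        long total = 0;
        try (InputStream in = Files.newInputStream(source)) {
            int n;
            while ((n = in.read(buffer)) != -1) {
                out.write(buffer, 0, n);
                total += n;
            }
        }
        out.flush();
        return total;
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("traditional-copy", ".txt");
        Files.write(file, "hello, traditional copy\n".getBytes(StandardCharsets.UTF_8));
        // In a server, this would be socket.getOutputStream() instead of System.out.
        long copied = copy(file, System.out);
        System.err.println("copied " + copied + " bytes");
        Files.delete(file);
    }
}
```

Every pass through the loop crosses the user/kernel boundary twice, which is the overhead that the zero‑copy techniques in the following sections try to avoid.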
Zero‑Copy with sendFile
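Zero‑copy means sending a file over the network without copying its contents to user space; the transfer stays in kernel space. In Java NIO, FileChannel.transferTo() can delegate to the operating system's sendFile operation (the sendfile system call on Linux) when the platform and the target channel support it.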
The process:
Calling sendFile triggers DMA to copy disk data to a kernel buffer.
DMA completion generates an interrupt; the CPU copies data from the kernel buffer to the socket buffer, and sendFile returns.
DMA then moves data from the socket buffer to the NIC buffer.
This eliminates copying data into the application buffer, reducing the number of copies from four to three (two DMA copies and one CPU copy) and the number of context switches from four to two.
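Linux 2.4 and later optimize this further when the NIC supports scatter‑gather DMA: only buffer descriptors (address and length) are appended to the socket buffer, and the DMA engine reads the data directly from the kernel buffer into the NIC. The CPU no longer copies any data, leaving only the two DMA copies.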
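A minimal sketch of the transferTo() approach is shown below. The class name ZeroCopyTransfer and the method name send are assumptions made for the example. Whether the JDK actually uses sendFile underneath depends on the operating system and on the type of the target channel; with the stand‑in channel used in main, it will typically fall back to an ordinary copy.

```java
import java.io.IOException;
import java.nio.channels.Channels;
import java.nio.channels.FileChannel;
import java.nio.channels.WritableByteChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical example class: sends a file without an application-level buffer.
class ZeroCopyTransfer {

    static long send(Path source, WritableByteChannel target) throws IOException {
        try (FileChannel in = FileChannel.open(source, StandardOpenOption.READ)) {
            long size = in.size();
            long position = 0;
            // transferTo() may move fewer bytes than requested, so loop until done.
            while (position < size) {
                long sent = in.transferTo(position, size - position, target);
                if (sent <= 0) {
                    // No progress (for example, the file shrank). A non-blocking
                    // target would need selector-based retries instead of this break.
                    break;
                }
                position += sent;
            }
            return position;
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("zero-copy", ".txt");
        Files.write(file, "hello, transferTo\n".getBytes(StandardCharsets.UTF_8));
        // In a server, the target would be a SocketChannel rather than System.out.
        long sent = send(file, Channels.newChannel(System.out));
        System.err.println("sent " + sent + " bytes");
        Files.delete(file);
    }
}
```

The application never sees the file contents here, so this style only fits cases where the data is forwarded unchanged.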
Memory‑Mapped I/O (mmap)
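When an application needs to read or modify the file data rather than just forward it, memory‑mapping can be used. The OS maps a file directly into the process’s address space, turning file reads/writes into memory accesses and avoiding the copy between the kernel buffer and the application buffer. In Java, FileChannel.map() returns a MappedByteBuffer that provides this capability.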
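As a rough illustration of memory‑mapped access in Java, the sketch below overwrites part of a file through a MappedByteBuffer. The class name MappedFileDemo and the method name overwrite are assumptions made for the example, and it assumes the replacement bytes fit within the existing file length.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical example class: modifies a file through a memory mapping.
class MappedFileDemo {

    // Overwrites bytes starting at 'offset'. The caller must ensure that
    // offset + replacement.length does not exceed the file size; otherwise
    // put() throws BufferOverflowException.
    static void overwrite(Path file, int offset, byte[] replacement) throws IOException {
        try (FileChannel ch = FileChannel.open(file,
                StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            MappedByteBuffer map = ch.map(FileChannel.MapMode.READ_WRITE, 0, ch.size());
            map.position(offset);
            map.put(replacement); // a plain memory write, no write() call
            map.force();          // ask the OS to flush the changes to the file
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("mmap-demo", ".txt");
        Files.write(file, "hello world".getBytes(StandardCharsets.UTF_8));
        overwrite(file, 0, "HELLO".getBytes(StandardCharsets.UTF_8));
        System.out.println(new String(Files.readAllBytes(file), StandardCharsets.UTF_8));
    }
}
```

A mapping is limited to 2 GB per MappedByteBuffer because the buffer is indexed with int, so larger files need to be mapped in several regions.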
Conclusion
Zero‑copy reduces CPU usage, allowing the processor to focus on business logic rather than data movement. By understanding the underlying mechanisms—DMA, channel processors, and memory‑mapped I/O—developers can choose the most efficient data transfer method for their applications.
Java Backend Technology
Focus on Java-related technologies: SSM, Spring ecosystem, microservices, MySQL, MyCat, clustering, distributed systems, middleware, Linux, networking, multithreading. Occasionally cover DevOps tools like Jenkins, Nexus, Docker, and ELK. Also share technical insights from time to time, committed to Java full-stack development!
