Zero‑Copy Data Transfer: Principles, Mechanisms, and Applications in Kafka and Spark
This article explains the traditional copy‑based data transmission process, introduces the zero‑copy technique—including basic sendfile(), scatter/gather DMA and mmap support—shows how it reduces context switches and copies, and demonstrates its practical use in Kafka and Spark for high‑throughput workloads.
Introduction
Zero‑copy is an efficient data‑transfer mechanism widely used in low‑latency scenarios. The article first presents the traditional approach, then analyses the details of zero‑copy, and finally describes its applications.
Traditional Data Transfer Method
In Java the naïve way to send a file over a socket looks like this:
import java.io.*;
import java.net.Socket;

Socket socket = new Socket(HOST, PORT);
InputStream inputStream = new FileInputStream(FILE_PATH);
OutputStream outputStream = new DataOutputStream(socket.getOutputStream());
byte[] buffer = new byte[4096];
int bytesRead;
// read() may fill only part of the buffer, so write just the bytes actually read
while ((bytesRead = inputStream.read(buffer)) >= 0) {
    outputStream.write(buffer, 0, bytesRead);
}
outputStream.close();
socket.close();
inputStream.close();
The process involves multiple system calls and data copies:
The JVM issues a read() system call, causing a user‑to‑kernel context switch.
Data is read from storage into a kernel buffer via DMA.
The kernel copies data from its buffer to a user‑space buffer and returns to the JVM.
The JVM issues a write() system call, another context switch.
Data is copied from the user buffer to the socket’s kernel buffer.
The kernel sends the data to the NIC via DMA, then returns to user space.
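This results in four context switches and four copies (two by DMA and two by the CPU). The data passes through user space only to be handed straight back to the kernel, which wastes CPU cycles and memory bandwidth.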
Zero‑Copy Data Transfer Methods
Basic Zero‑Copy Mechanism
Most Unix‑like systems provide the sendfile() system call, which copies data directly between two file descriptors inside the kernel, eliminating the user‑space copy:
sendfile() copies data between one file descriptor and another. Because this copying is done within the kernel, sendfile() is more efficient than the combination of read(2) and write(2), which would require transferring data to and from user space.
In Java NIO the equivalent API is FileChannel.transferTo(), which ultimately invokes a native implementation that uses sendfile() when available.
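SocketAddress socketAddress = new InetSocketAddress(HOST, PORT);
SocketChannel socketChannel = SocketChannel.open();
socketChannel.connect(socketAddress);
File file = new File(FILE_PATH);
FileChannel fileChannel = new FileInputStream(file).getChannel();
long position = 0;
long size = fileChannel.size();
// transferTo() may move fewer bytes than requested, so loop until the whole file is sent
while (position < size) {
    position += fileChannel.transferTo(position, size - position, socketChannel);
}
fileChannel.close();
socketChannel.close();
This reduces the number of copies from four to three (DMA from disk to the kernel buffer, a CPU copy from the kernel buffer to the socket buffer, and DMA from the socket buffer to the NIC) and the number of context switches from four to two.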
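Because FileChannel.transferTo() is allowed to transfer fewer bytes than requested, callers usually wrap it in a loop. The following is a minimal sketch of such a loop, not a definitive implementation. The class name TransferToLoop and the method transferFully are hypothetical, and the demo copies between two temporary files instead of a socket so that it runs without a server.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.WritableByteChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper class, used only for illustration.
final class TransferToLoop {

    /** Transfers up to count bytes starting at position, retrying after short transfers. */
    static long transferFully(FileChannel source, long position, long count,
                              WritableByteChannel target) throws IOException {
        long transferred = 0;
        while (transferred < count) {
            long n = source.transferTo(position + transferred, count - transferred, target);
            if (n <= 0) {
                // Source exhausted, or the target cannot accept more data right now.
                break;
            }
            transferred += n;
        }
        return transferred;
    }

    public static void main(String[] args) throws IOException {
        Path src = Files.createTempFile("zero-copy-src", ".bin");
        Path dst = Files.createTempFile("zero-copy-dst", ".bin");
        try {
            Files.write(src, "hello zero-copy".getBytes(StandardCharsets.UTF_8));
            try (FileChannel in = FileChannel.open(src, StandardOpenOption.READ);
                 FileChannel out = FileChannel.open(dst, StandardOpenOption.WRITE)) {
                long sent = transferFully(in, 0, in.size(), out);
                System.out.println("transferred " + sent + " bytes");
            }
        } finally {
            Files.deleteIfExists(src);
            Files.deleteIfExists(dst);
        }
    }
}
```

Whether the JVM actually uses sendfile() for a given pair of channels depends on the platform and channel types; the loop is needed either way.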
Scatter/Gather Support
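Scatter/gather DMA works from a list of non‑contiguous memory descriptors, so the hardware can transfer all fragments in a single operation. When the NIC supports it, only descriptors (buffer address and length) are appended to the socket buffer, and the NIC's DMA engine reads the data straight from the kernel buffer. This avoids the extra copy from the kernel buffer to the socket buffer and leaves two copies, both performed by DMA.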
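The descriptor-list idea also appears at the application level as vectored I/O. The sketch below uses Java NIO's GatheringByteChannel.write(ByteBuffer[]) to hand several separate buffers to the kernel in one call. This is only an analogy: it is not the NIC's scatter/gather DMA itself, and it does not change how many copies sendfile() performs. The class name GatherWrite and the method writeAll are hypothetical, and the loop assumes a blocking channel.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.channels.GatheringByteChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper class, used only for illustration.
final class GatherWrite {

    /** Writes all fragments in order without first merging them into one buffer. */
    static long writeAll(GatheringByteChannel channel, ByteBuffer... fragments) throws IOException {
        long expected = 0;
        for (ByteBuffer fragment : fragments) {
            expected += fragment.remaining();
        }
        long written = 0;
        // A single gathering write may be partial, so repeat until every buffer is drained.
        while (written < expected) {
            written += channel.write(fragments);
        }
        return written;
    }

    public static void main(String[] args) throws IOException {
        Path out = Files.createTempFile("gather", ".bin");
        try (FileChannel channel = FileChannel.open(out, StandardOpenOption.WRITE)) {
            ByteBuffer header = ByteBuffer.wrap("HEADER|".getBytes(StandardCharsets.UTF_8));
            ByteBuffer body = ByteBuffer.wrap("BODY".getBytes(StandardCharsets.UTF_8));
            System.out.println("wrote " + writeAll(channel, header, body) + " bytes");
        } finally {
            Files.deleteIfExists(out);
        }
    }
}
```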
Memory‑Mapped (mmap) Support
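Using mmap(), the file is mapped into the process's virtual address space, backed by the kernel's page cache, so the application can read or modify the data in place without copying it into a separate user buffer. Sending the mapped data still takes an mmap() call followed by a write() call, so it incurs four context switches, plus page‑fault and TLB overhead, and must be used judiciously.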
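In Java, the mmap() route is exposed through FileChannel.map(), which returns a MappedByteBuffer. The sketch below modifies part of a file in place through such a mapping. The class name MmapEdit and the method overwrite are hypothetical, and the example assumes the edited range lies within the existing file.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper class, used only for illustration.
final class MmapEdit {

    /** Overwrites replacement.length bytes at offset through a memory mapping. */
    static void overwrite(Path file, long offset, byte[] replacement) throws IOException {
        // READ_WRITE mappings require a channel opened for both reading and writing.
        try (FileChannel channel = FileChannel.open(file,
                StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            MappedByteBuffer mapping =
                    channel.map(FileChannel.MapMode.READ_WRITE, offset, replacement.length);
            mapping.put(replacement); // modifies the mapped pages, no explicit write() call
            mapping.force();          // asks the OS to flush the dirty pages to storage
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("mmap-demo", ".txt");
        Files.write(file, "hello mmap".getBytes(StandardCharsets.US_ASCII));
        overwrite(file, 0, "HELLO".getBytes(StandardCharsets.US_ASCII));
        System.out.println(new String(Files.readAllBytes(file), StandardCharsets.US_ASCII));
    }
}
```

The mapping stays valid until it is garbage-collected, even after the channel is closed, which is one reason mapped files need care in long-running services.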
Applications of Zero‑Copy
In Kafka
Kafka’s transport layer defines a transferFrom() method that ultimately calls FileChannel.transferTo() to move message data from disk to the socket without extra copies.
long transferFrom(FileChannel fileChannel, long position, long count) throws IOException;
The implementation in PlaintextTransportLayer simply forwards to fileChannel.transferTo(). This zero‑copy path is used by FileRecords.writeTo() to write the records stored in the log file to the destination channel.
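The delegation pattern described here can be sketched in a few lines. The class below is a hypothetical stand-in, not Kafka's actual code: PlainTransportSketch and its destination field are assumed names, and the demo writes to a temporary file instead of a socket so that it runs on its own.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.WritableByteChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical stand-in for a plaintext transport layer; not Kafka's real class.
final class PlainTransportSketch {
    private final WritableByteChannel destination;

    PlainTransportSketch(WritableByteChannel destination) {
        this.destination = destination;
    }

    /** Forwards to transferTo(), so the bytes never enter a user-space buffer here. */
    long transferFrom(FileChannel fileChannel, long position, long count) throws IOException {
        return fileChannel.transferTo(position, count, destination);
    }

    public static void main(String[] args) throws IOException {
        Path log = Files.createTempFile("segment", ".log");
        Path sink = Files.createTempFile("sink", ".bin");
        Files.write(log, "record-1;record-2".getBytes(StandardCharsets.UTF_8));
        try (FileChannel in = FileChannel.open(log, StandardOpenOption.READ);
             FileChannel out = FileChannel.open(sink, StandardOpenOption.WRITE)) {
            PlainTransportSketch transport = new PlainTransportSketch(out);
            long position = 0;
            long size = in.size();
            // The caller keeps track of the position because a transfer may be partial.
            while (position < size) {
                long n = transport.transferFrom(in, position, size - position);
                if (n <= 0) {
                    break;
                }
                position += n;
            }
            System.out.println("sent " + position + " bytes");
        }
    }
}
```

A transport layer that has to transform the bytes, for example to encrypt them, cannot delegate this way, because the data must then pass through user space.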
In Spark
Spark’s shuffle write phase can use zero‑copy via the spark.file.transferTo configuration (default true). The Bypass‑Merge‑Sort shuffle writer calls a utility method that repeatedly invokes FileChannel.transferTo() to concatenate its per‑partition files into a single output file efficiently.
import java.nio.channels.FileChannel

def copyFileStreamNIO(
    input: FileChannel,
    output: FileChannel,
    startPosition: Long,
    bytesToCopy: Long): Unit = {
  var count = 0L
  // transferTo() may copy fewer bytes than requested, so loop until everything is copied
  while (count < bytesToCopy) {
    count += input.transferTo(count + startPosition, bytesToCopy - count, output)
  }
}
This method moves data between file channels without copying data into user space, greatly improving shuffle performance.
Overall, zero‑copy reduces both the number of memory copies and context switches, providing significant performance gains for high‑throughput systems such as Kafka and Spark.
Big Data Technology & Architecture
Wang Zhiwu, a big data expert, dedicated to sharing big data technology.
