
Zero-Copy Data Transfer Mechanism: Principles, Implementations, and Applications in Java, Kafka, and Spark

This article explains the zero‑copy data transfer technique, compares it with traditional read/write approaches, shows Java NIO code examples, and discusses its use in high‑performance systems such as Kafka and Spark, highlighting the reductions in context switches and memory copies.

Big Data Technology & Architecture

Preface

Zero‑copy is an efficient data‑transfer mechanism that is widely used in low‑latency scenarios. This article first introduces the traditional data‑transfer method, then analyses the details of zero‑copy, and finally presents several practical applications. The operating‑system concepts referenced here are covered in standard textbooks such as Silberschatz's Operating System Concepts and Tanenbaum's Modern Operating Systems.

Traditional Data Transfer Methods

In the Internet era, sending a file from one machine to another is commonplace. A typical Java implementation of the sender looks like the following:

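Socket socket = new Socket(HOST, PORT);
InputStream inputStream = new FileInputStream(FILE_PATH);
OutputStream outputStream = new DataOutputStream(socket.getOutputStream());

byte[] buffer = new byte[4096];
int bytesRead;
// read() returns the number of bytes actually read, or -1 at end of file.
while ((bytesRead = inputStream.read(buffer)) != -1) {
    // Write only the bytes that were read, not the whole buffer.
    outputStream.write(buffer, 0, bytesRead);
}

outputStream.close();
socket.close();
inputStream.close();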

Although the code appears simple, the OS‑level operations are much more involved, consisting of the following steps:

JVM issues a read() system call, causing a mode switch from user space to kernel space.

The file is read from storage (e.g., disk) into a kernel‑space buffer via DMA.

The kernel copies the data from its buffer to a user‑space buffer, then returns from the read() call.

JVM issues a write() system call, switching back to kernel space.

The kernel copies the user buffer to the socket’s kernel buffer.

The data is finally transmitted by the NIC through DMA, and the write() call returns.

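The process involves four context switches (one on entry to and one on return from each of read() and write()) and four data copies (two performed by DMA and two by the CPU), which can be visualised with the following sequence diagram: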

Two simplified diagrams illustrate the same process without the low‑level system‑call details.

Because every context switch costs CPU time and every CPU copy between kernel and user buffers consumes CPU cycles and memory bandwidth, the traditional approach is inefficient even for simple transfers.

Zero‑Copy Data Transfer Methods

Basic Zero‑Copy Mechanism

When the application neither inspects nor modifies the data, the second and third copies (kernel‑to‑user and user‑to‑kernel) are unnecessary; the data can be moved directly from the kernel buffer to the socket buffer. Linux and most other Unix‑like systems provide a sendfile() system call for this purpose, which the Linux man page describes as follows:

sendfile() copies data between one file descriptor and another.
Because this copying is done within the kernel, sendfile() is more efficient than the combination of read(2) and write(2), which would require transferring data to and from user space.

The following sequence diagram shows the zero‑copy flow:

In Java NIO, the corresponding API is FileChannel.transferTo(). The concrete implementation resides in sun.nio.ch.FileChannelImpl and relies on native code.

A zero‑copy version of the sender can be written as:

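SocketAddress socketAddress = new InetSocketAddress(HOST, PORT);
SocketChannel socketChannel = SocketChannel.open();
socketChannel.connect(socketAddress);

File file = new File(FILE_PATH);
FileChannel fileChannel = new FileInputStream(file).getChannel();
// transferTo() may transfer fewer bytes than requested, so loop until the whole file is sent.
long position = 0;
long size = fileChannel.size();
while (position < size) {
    position += fileChannel.transferTo(position, size - position, socketChannel);
}

fileChannel.close();
socketChannel.close();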

The simplified flow diagram for this method is:

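This basic form of zero‑copy reduces the number of copies to three (two DMA copies plus one CPU copy from the kernel buffer to the socket buffer) and the number of context switches to two, greatly improving efficiency, though further optimisation is possible.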
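
The sender above can be exercised end to end without a second machine. The following is a minimal sketch, assuming a loopback connection on an ephemeral port; the class name LoopbackTransferDemo and the helper methods sendFile and roundTrip are hypothetical names chosen for illustration. The receiver side uses an ordinary read loop, so only the sending path benefits from transferTo().

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.concurrent.CompletableFuture;

class LoopbackTransferDemo {

    /** Sends the whole file over the socket; loops because transferTo() may send fewer bytes than requested. */
    static long sendFile(Path file, SocketChannel socket) throws IOException {
        try (FileChannel in = FileChannel.open(file, StandardOpenOption.READ)) {
            long position = 0;
            long size = in.size();
            while (position < size) {
                position += in.transferTo(position, size - position, socket);
            }
            return position;
        }
    }

    /** Listens on an ephemeral loopback port, sends the file to it, and returns the bytes received. */
    static byte[] roundTrip(Path file) throws Exception {
        try (ServerSocketChannel server = ServerSocketChannel.open()) {
            server.bind(new InetSocketAddress(InetAddress.getLoopbackAddress(), 0));
            // Receiver: accept one connection and drain it until end of stream.
            CompletableFuture<byte[]> received = CompletableFuture.supplyAsync(() -> {
                try (SocketChannel peer = server.accept()) {
                    ByteArrayOutputStream out = new ByteArrayOutputStream();
                    ByteBuffer buf = ByteBuffer.allocate(8192);
                    while (peer.read(buf) != -1) {
                        buf.flip();
                        out.write(buf.array(), 0, buf.limit());
                        buf.clear();
                    }
                    return out.toByteArray();
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
            // Sender: closing the client channel signals end of stream to the receiver.
            try (SocketChannel client = SocketChannel.open(server.getLocalAddress())) {
                sendFile(file, client);
            }
            return received.get();
        }
    }

    public static void main(String[] args) throws Exception {
        Path file = Files.createTempFile("zero-copy", ".bin");
        Files.write(file, new byte[1 << 20]);
        System.out.println("received " + roundTrip(file).length + " bytes");
        Files.delete(file);
    }
}
```

Whether the JVM actually takes the sendfile() fast path depends on the platform and on the type of the destination channel, so this sketch demonstrates the API usage and not a guaranteed copy count.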

Support for Scatter/Gather DMA

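Traditional block DMA requires the source and destination physical addresses to be contiguous, which forces the kernel to copy the data into the socket buffer first. Scatter/Gather DMA instead works from a list of descriptors, each holding the address and length of one non‑contiguous fragment, so the hardware can transfer all fragments with a single interrupt. With this support, the kernel only appends descriptors pointing at the kernel buffer to the socket buffer, and the NIC gathers the data directly from there. The remaining CPU copy disappears, leaving just two DMA copies.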
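
Scatter/Gather DMA itself is a hardware and kernel feature with no direct Java API. The same descriptor‑list idea does appear one level up, though, in the gathering write offered by NIO channels: the caller passes a list of separate buffers instead of merging them into one contiguous array first. The sketch below is only an analogy under that assumption; the class name GatheringWriteDemo and the method writeFragments are hypothetical.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

class GatheringWriteDemo {

    /** Writes several separate buffers to the file without first merging them into one array. */
    static long writeFragments(Path target, ByteBuffer... fragments) throws IOException {
        try (FileChannel out = FileChannel.open(target, StandardOpenOption.CREATE,
                StandardOpenOption.WRITE, StandardOpenOption.TRUNCATE_EXISTING)) {
            long total = 0;
            // write(ByteBuffer[]) may consume only part of the list, so loop until all fragments are drained.
            while (hasRemaining(fragments)) {
                total += out.write(fragments);
            }
            return total;
        }
    }

    private static boolean hasRemaining(ByteBuffer[] buffers) {
        for (ByteBuffer buffer : buffers) {
            if (buffer.hasRemaining()) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) throws IOException {
        Path target = Files.createTempFile("gather", ".txt");
        ByteBuffer header = ByteBuffer.wrap("header|".getBytes(StandardCharsets.US_ASCII));
        ByteBuffer body = ByteBuffer.wrap("body".getBytes(StandardCharsets.US_ASCII));
        long written = writeFragments(target, header, body);
        System.out.println(written + " bytes: "
                + new String(Files.readAllBytes(target), StandardCharsets.US_ASCII));
        Files.delete(target);
    }
}
```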

The complete zero‑copy flow with Scatter/Gather is illustrated below:

Support for Memory‑Mapped Files (mmap)

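When data needs to be modified during transfer, sendfile() alone is insufficient, because the data never becomes visible to the application. Many OSes provide mmap() / munmap() to map a file into the process's virtual address space, so that the application and the kernel share the same pages; the application can then manipulate the data directly before it is written out. The sequence diagram is: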

Although mmap reduces the number of copies to three, the combination of mmap() and write() still incurs four context switches, and the mapping adds TLB overhead, so it must be used judiciously. Java NIO offers MappedByteBuffer for mmap support, while DirectByteBuffer allocates off‑heap memory.
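
As a minimal sketch of the MappedByteBuffer route, the example below maps a file read/write and modifies it in place, with no read() or write() calls on the data itself. The class name MmapEditDemo and the method upperCaseInPlace are hypothetical, and the sketch assumes the file is smaller than 2 GB, since a single mapping cannot exceed Integer.MAX_VALUE bytes.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

class MmapEditDemo {

    /** Upper-cases ASCII letters in place through a memory mapping of the file. */
    static void upperCaseInPlace(Path file) throws IOException {
        try (FileChannel channel = FileChannel.open(file,
                StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            MappedByteBuffer map = channel.map(FileChannel.MapMode.READ_WRITE, 0, channel.size());
            for (int i = 0; i < map.limit(); i++) {
                byte b = map.get(i);
                if (b >= 'a' && b <= 'z') {
                    map.put(i, (byte) (b - 32)); // 'a' - 'A' == 32 in ASCII
                }
            }
            // Ask the OS to write the modified pages back to the storage device.
            map.force();
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("mmap", ".txt");
        Files.write(file, "hello, mmap".getBytes(StandardCharsets.US_ASCII));
        upperCaseInPlace(file);
        System.out.println(new String(Files.readAllBytes(file), StandardCharsets.US_ASCII));
        Files.delete(file);
    }
}
```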

Applications of Zero‑Copy

Zero‑copy is widely adopted in many frameworks. Below are brief examples from Kafka and Spark.

In Kafka

Kafka achieves high throughput through several design choices, including zero‑copy. The transport layer defines a transferFrom() method that delegates to FileChannel.transferTo() when possible:

/**
 * Transfers bytes from `fileChannel` to this `TransportLayer`.
 *
 * This method will delegate to {@link FileChannel#transferTo(long, long, java.nio.channels.WritableByteChannel)},
 * but it will unwrap the destination channel, if possible, in order to benefit from zero copy. This is required
 * because the fast path of `transferTo` is only executed if the destination buffer inherits from an internal JDK
 * class.
 *
 * @param fileChannel The source channel
 * @param position The position within the file at which the transfer is to begin; must be non‑negative
 * @param count The maximum number of bytes to be transferred; must be non‑negative
 * @return The number of bytes, possibly zero, that were actually transferred
 * @see FileChannel#transferTo(long, long, java.nio.channels.WritableByteChannel)
 */
long transferFrom(FileChannel fileChannel, long position, long count) throws IOException;

The plaintext implementation, PlaintextTransportLayer, simply calls fileChannel.transferTo():

@Override
public long transferFrom(FileChannel fileChannel, long position, long count) throws IOException {
    return fileChannel.transferTo(position, count, socketChannel);
}

This method is invoked in FileRecords.writeTo() to send a range of a log file to the destination channel without extra copies.
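
The calling pattern, a caller that hands a file range to a transport which in turn delegates to transferTo(), can be sketched as follows. This is not Kafka's code: the Transport interface, the ChannelTransport class, and the writeSlice method are hypothetical stand‑ins, and the clamping of the range to the file size is an assumption made for the example.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.channels.Channels;
import java.nio.channels.FileChannel;
import java.nio.channels.WritableByteChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

class SliceTransferSketch {

    /** Hypothetical stand-in for a transport layer; not Kafka's real interface. */
    interface Transport {
        long transferFrom(FileChannel fileChannel, long position, long count) throws IOException;
    }

    /** Plaintext-style transport: passes the request straight to FileChannel.transferTo(). */
    static final class ChannelTransport implements Transport {
        private final WritableByteChannel destination;

        ChannelTransport(WritableByteChannel destination) {
            this.destination = destination;
        }

        @Override
        public long transferFrom(FileChannel fileChannel, long position, long count) throws IOException {
            return fileChannel.transferTo(position, count, destination);
        }
    }

    /** Sends bytes [offset, offset + length) of the file, clamped to the file size; returns bytes sent. */
    static long writeSlice(Path file, long offset, long length, Transport transport) throws IOException {
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            long end = Math.min(channel.size(), offset + length);
            long position = offset;
            while (position < end) {
                long sent = transport.transferFrom(channel, position, end - position);
                if (sent <= 0) {
                    break; // the destination accepted nothing more
                }
                position += sent;
            }
            return position - offset;
        }
    }

    public static void main(String[] args) throws IOException {
        Path log = Files.createTempFile("segment", ".log");
        Files.write(log, "0123456789".getBytes(StandardCharsets.US_ASCII));
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        long sent = writeSlice(log, 3, 4, new ChannelTransport(Channels.newChannel(sink)));
        System.out.println(sent + " bytes: " + sink.toString("US-ASCII"));
        Files.delete(log);
    }
}
```

The in‑memory sink used in main keeps the sketch self‑contained; with such a destination the JDK falls back to an ordinary buffered copy, so the zero‑copy benefit only applies when the destination is a channel type the fast path recognises, such as a socket channel.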

In Spark

Spark uses zero‑copy during shuffle spill merging. The configuration spark.file.transferTo (enabled by default) controls whether transferTo is used. The utility method copyFileStreamNIO demonstrates the technique:

def copyFileStreamNIO(
    input: FileChannel,
    output: FileChannel,
    startPosition: Long,
    bytesToCopy: Long): Unit = {
  val initialPos = output.position()
  var count = 0L
  // In case transferTo method transferred less data than we have required.
  while (count < bytesToCopy) {
    count += input.transferTo(count + startPosition, bytesToCopy - count, output)
  }
  assert(count == bytesToCopy,
    s"request to copy $bytesToCopy bytes, but actually copied $count bytes.")

  val finalPos = output.position()
  val expectedPos = initialPos + bytesToCopy
  assert(finalPos == expectedPos,
    s"""
       |Current position $finalPos do not equal to expected position $expectedPos
       |after transferTo, please check your kernel version to see if it is 2.6.32,
       |this is a kernel bug which will lead to unexpected behavior when using transferTo.
       |You can set spark.file.transferTo = false to disable this NIO feature.
       |""".stripMargin)
}

This method copies data from one FileChannel to another using zero‑copy, enabling efficient shuffle file merging.
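
To show how such a copy routine serves merging, here is a rough Java sketch that appends several input files to one output file with transferTo() and records the length of each appended segment. It is an illustration under assumptions and not Spark's implementation: the class name SpillMergeSketch and the method mergeFiles are hypothetical, and real shuffle merging works per partition, not per whole file.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Arrays;
import java.util.List;

class SpillMergeSketch {

    /** Appends each input file to the output in order; returns the number of bytes copied per input. */
    static long[] mergeFiles(List<Path> inputs, Path output) throws IOException {
        long[] lengths = new long[inputs.size()];
        try (FileChannel out = FileChannel.open(output, StandardOpenOption.CREATE,
                StandardOpenOption.WRITE, StandardOpenOption.TRUNCATE_EXISTING)) {
            for (int i = 0; i < inputs.size(); i++) {
                try (FileChannel in = FileChannel.open(inputs.get(i), StandardOpenOption.READ)) {
                    long size = in.size();
                    long copied = 0;
                    // transferTo() writes at the output's current position and advances it,
                    // but may copy fewer bytes than requested, hence the loop.
                    while (copied < size) {
                        copied += in.transferTo(copied, size - copied, out);
                    }
                    lengths[i] = copied;
                }
            }
        }
        return lengths;
    }

    public static void main(String[] args) throws IOException {
        Path first = Files.createTempFile("spill-0", ".bin");
        Path second = Files.createTempFile("spill-1", ".bin");
        Path merged = Files.createTempFile("merged", ".bin");
        Files.write(first, "abc".getBytes(StandardCharsets.US_ASCII));
        Files.write(second, "defg".getBytes(StandardCharsets.US_ASCII));
        long[] lengths = mergeFiles(Arrays.asList(first, second), merged);
        System.out.println(Arrays.toString(lengths) + " -> "
                + new String(Files.readAllBytes(merged), StandardCharsets.US_ASCII));
    }
}
```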

Copyright Notice: This article is authored by the "import_bigdata" public account and may not be reproduced without permission.

Written by

Big Data Technology & Architecture

Wang Zhiwu, a big data expert, dedicated to sharing big data technology.
