How Zero‑Copy, MMAP, and Off‑Heap Memory Boost Java I/O Performance

This article explains the concepts of virtual memory, kernel and user modes, DMA, and why data must move between kernel and user spaces, then compares standard I/O with zero‑copy techniques, MMAP, and off‑heap buffers, showing how they reduce copy operations and improve Java server performance.

Java High-Performance Architecture

Preface

Zero‑copy, MMAP, and off‑heap memory are often confusing.

Before understanding these concepts, you need to know:

Virtual memory: separates logical user memory from physical memory.

Kernel mode: the kernel has special privileges and can communicate with device controllers, controlling user‑space processes.

User mode: non‑privileged area where code cannot directly access hardware devices.

DMA (Direct Memory Access): allows peripheral devices to read/write system memory without CPU involvement, reducing CPU load.

With DMA, the device controller moves the data itself and raises an interrupt only when the transfer completes, so the CPU does not have to copy the data word by word.

Question: Why copy data from kernel space to user space?

Hardware usually cannot access user space directly.

Block devices operate on fixed‑size blocks, while user requests may be of arbitrary size or misaligned.

The kernel acts as an intermediary, breaking down and recombining data between storage and user processes.
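The mismatch between arbitrary user requests and fixed‑size blocks can be made concrete with a little arithmetic. The sketch below is only an illustration: the class name BlockRange and the 4096‑byte block size are assumptions, not something the kernel exposes under these names.

```java
/** Hypothetical helper: maps an arbitrary byte range onto fixed-size blocks. */
final class BlockRange {
    static final int BLOCK_SIZE = 4096; // assumed block size, for illustration only

    /** Index of the first block touched by a request starting at byte offset. */
    static long firstBlock(long offset) {
        return offset / BLOCK_SIZE;
    }

    /** Index of the last block touched by length bytes starting at offset (length > 0). */
    static long lastBlock(long offset, long length) {
        return (offset + length - 1) / BLOCK_SIZE;
    }

    public static void main(String[] args) {
        long offset = 5000, length = 6000; // the user asks for bytes 5000..10999
        long first = firstBlock(offset);
        long last = lastBlock(offset, length);
        long bytesRead = (last - first + 1) * BLOCK_SIZE;
        // prints: blocks 1..2, device reads 8192 bytes for a 6000-byte request
        System.out.printf("blocks %d..%d, device reads %d bytes for a %d-byte request%n",
                first, last, bytesRead, length);
    }
}
```

The device delivers whole blocks, and the kernel then hands the process only the bytes it asked for.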

Question: Why can only kernel mode directly access physical memory?

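Restricting direct access to physical memory and hardware to kernel mode protects the operating system and other processes from faulty or malicious user programs.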

(1) Standard I/O

Basic I/O: unbuffered I/O functions such as read() and write().

Standard I/O: adds streams and buffers on top of basic I/O, using functions like fopen(), getc(), putc().

Page Cache is used to improve read/write efficiency and protect disks.

Scenario: reading data from a file and sending it over the network.

The process involves four data copies:

First copy: the user process calls read(), execution switches to kernel mode, and DMA copies the data from disk into the Page Cache.

Second copy: the CPU copies the data from the Page Cache into the user buffer, and read() returns to the user process.

Third copy: the user process calls send(), execution switches back to kernel mode, and the CPU copies the data from the user buffer into the socket buffer.

Fourth copy: after send() returns, DMA asynchronously copies the data from the socket buffer to the protocol engine.
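In Java, the four‑copy path corresponds to the familiar read/write loop through a byte array. The following is a minimal sketch, assuming a hypothetical class name BufferedCopy; the demo in main writes to an in‑memory stream, whereas a real server would pass the socket's output stream.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

final class BufferedCopy {
    /** Copies src to out through a user-space byte[]; returns the number of bytes copied. */
    static long copy(Path src, OutputStream out) throws IOException {
        byte[] buf = new byte[8192]; // user-space buffer on the JVM heap
        long total = 0;
        try (InputStream in = Files.newInputStream(src)) {
            int n;
            while ((n = in.read(buf)) != -1) { // kernel -> user buffer
                out.write(buf, 0, n);          // user buffer -> kernel, when out is a socket stream
                total += n;
            }
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        Path src = Files.createTempFile("buffered-copy", ".bin");
        src.toFile().deleteOnExit();
        Files.write(src, new byte[20_000]);
        // An in-memory sink stands in for socket.getOutputStream().
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        System.out.println("copied " + copy(src, sink) + " bytes"); // prints: copied 20000 bytes
    }
}
```

Every pass through the loop moves the same bytes across the kernel/user boundary twice, which is the overhead the zero‑copy techniques below try to remove.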

Question: Why is Page Cache needed?

It acts as a cache, enabling pre‑reading of file data and improving I/O performance.

(2) Zero‑Copy

Can the number of copies be reduced? Yes, by using zero‑copy.

In Linux, the sendfile() system call transfers data directly between file descriptors, achieving zero‑copy.

Java also supports zero‑copy via NIO FileChannel methods:

transferTo(): transfers data from a FileChannel directly to another channel.

transferFrom(): transfers data from a channel into a FileChannel.

Scenario: reading a file and sending it over the network.

The process now involves three data copies:

First copy: DMA reads data into Page Cache.

Second copy: CPU copies data from Page Cache to socket buffer.

Third copy: DMA transfers data from socket buffer to the NIC.
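A minimal sketch of the same scenario with FileChannel.transferTo() might look like the code below. The class name TransferToSend is an assumption, and the demo uses a second file channel as the target so that it runs without a network; in a server the target would typically be a SocketChannel. Whether the JVM can actually use a zero‑copy path depends on the operating system and the channel types involved.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.WritableByteChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

final class TransferToSend {
    /** Sends the whole file to target with transferTo(); returns the number of bytes sent. */
    static long send(Path src, WritableByteChannel target) throws IOException {
        try (FileChannel in = FileChannel.open(src, StandardOpenOption.READ)) {
            long size = in.size();
            long position = 0;
            // transferTo() may move fewer bytes than requested, so loop until done.
            while (position < size) {
                long sent = in.transferTo(position, size - position, target);
                if (sent <= 0) break; // nothing moved (e.g. file truncated); stop instead of spinning
                position += sent;
            }
            return position;
        }
    }

    public static void main(String[] args) throws IOException {
        Path src = Files.createTempFile("zero-copy-src", ".bin");
        Path dst = Files.createTempFile("zero-copy-dst", ".bin");
        src.toFile().deleteOnExit();
        dst.toFile().deleteOnExit();
        Files.write(src, new byte[20_000]);
        // A file channel stands in for the SocketChannel a real server would use.
        try (FileChannel out = FileChannel.open(dst, StandardOpenOption.WRITE)) {
            System.out.println("sent " + send(src, out) + " bytes"); // prints: sent 20000 bytes
        }
    }
}
```

Note that no byte array appears anywhere: the application only names the source range and the target channel, and the data never has to enter the JVM heap.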

Case: Kafka Log Writing

Kafka also uses zero‑copy when it writes log data from a file out to a destination channel.

In Kafka’s source, FileRecords.writeTo calls FileChannel.transferTo():

    public class FileRecords extends AbstractRecords implements Closeable {
        @Override
        public long writeTo(GatheringByteChannel destChannel, long offset, int length) throws IOException {
            long newSize = Math.min(channel.size(), end) - start;
            int oldSize = sizeInBytes();
            if (newSize < oldSize)
                throw new KafkaException(String.format(
                        "Size of FileRecords %s has been truncated during write: old size %d, new size %d",
                        file.getAbsolutePath(), oldSize, newSize));
            long position = start + offset;
            int count = Math.min(length, oldSize);
            final long bytesTransferred;
            if (destChannel instanceof TransportLayer) {
                TransportLayer tl = (TransportLayer) destChannel;
                bytesTransferred = tl.transferFrom(channel, position, count);
            } else {
                // key point
                bytesTransferred = channel.transferTo(position, count, destChannel);
            }
            return bytesTransferred;
        }
    }

Further Reduction: Two Copies

Can we further reduce kernel copies?

Since Linux 2.4, the socket buffer can hold descriptors that record where the data is (its location and length) instead of the data itself. If the network card supports DMA gather operations, the kernel can avoid one CPU copy.

DMA reads the file into the Page Cache. Instead of copying the data into the socket buffer, the kernel appends only its offset and length to the socket buffer; DMA then reads the data directly from the kernel buffer into the protocol engine, eliminating the remaining CPU copy.

(3) MMAP

MMAP maps a file into a process’s virtual address space.

It creates a one‑to‑one mapping between the file's addresses on disk and a region of virtual memory.

Applications can access the file by reading/writing the mapped memory.

Benefits:

User processes treat file data as memory, avoiding read() / write() system calls.

Page faults automatically load data from disk; modified pages are marked dirty and flushed back.

The OS’s virtual memory subsystem caches pages intelligently.

Data is always page‑aligned, so no intermediate buffer copy is needed.

Large files can be accessed without loading them entirely into RAM, because pages are brought in on demand.

MMAP file operation steps:

MMAP creates a new virtual memory region for the process.

It establishes the mapping between the file's disk address and the virtual memory region (no data is copied yet).

When the process accesses the memory, a page fault may occur; the OS copies the required page from disk into memory.
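In Java, these steps are reached through FileChannel.map(), which returns a MappedByteBuffer. The sketch below is illustrative only: the class name MmapDemo, the helper names, and the 4096‑byte mapping size are assumptions chosen for the example.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

final class MmapDemo {
    /** Maps size bytes of the file and writes text at offset 0 (text must fit in size). */
    static void writeMapped(Path file, int size, String text) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.CREATE,
                StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            // Steps 1-2: reserve a virtual memory region and map the file; no data is copied yet.
            MappedByteBuffer map = ch.map(FileChannel.MapMode.READ_WRITE, 0, size);
            // Step 3: the first access may page-fault, and the OS brings the page into memory.
            map.put(text.getBytes(StandardCharsets.UTF_8));
            map.force(); // ask the OS to flush the dirty pages back to the file
        }
    }

    /** Reads the first length bytes of the file through a read-only mapping. */
    static String readMapped(Path file, int length) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            MappedByteBuffer map = ch.map(FileChannel.MapMode.READ_ONLY, 0, length);
            byte[] out = new byte[length];
            map.get(out);
            return new String(out, StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("mmap-demo", ".dat");
        file.toFile().deleteOnExit();
        writeMapped(file, 4096, "hello mmap");
        System.out.println(readMapped(file, 10)); // prints: hello mmap
    }
}
```

Neither helper issues an explicit read or write on the channel: the file contents are reached purely through memory accesses on the mapped buffer.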

Case: RocketMQ Log Writing

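A single MMAP mapping has a size limit (≈1.5‑2 GB; in Java, FileChannel.map() rejects sizes above Integer.MAX_VALUE bytes). RocketMQ keeps its mapped files well below this limit: the CommitLog file size is 1 GB and the ConsumeQueue file size is 5.72 MB.

When transientStorePoolEnable is false, messages are written to the page cache first; when it is true, they are written to off‑heap memory first and then to the FileChannel.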

(4) Off‑Heap Memory

When the JVM performs I/O on data held in the Java heap, the data must first be copied to off‑heap memory before the system call is made.

Why can’t the OS use JVM heap memory directly?

Two reasons:

The OS does not recognize the JVM heap layout; it cannot read/write it as it would regular memory.

Object addresses may change due to garbage collection, making direct I/O unsafe.

Developers use NIO DirectBuffer to allocate off‑heap memory. Ordinary buffers allocate on the heap; DirectBuffers have higher allocation/destruction cost, so they are usually reused. Their memory is released via Java’s reference mechanism when the wrapper is garbage‑collected.
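The reuse pattern can be sketched as follows. This is a minimal illustration, not a pooling implementation: the class name DirectBufferDemo, the 4096‑byte capacity, and the writeAll helper are assumptions, and each message is assumed to fit in the buffer.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

final class DirectBufferDemo {
    // Allocated once and reused, because direct buffers are costly to create and free.
    private static final ByteBuffer REUSED = ByteBuffer.allocateDirect(4096);

    /** Appends each message to the file through the shared direct buffer; returns bytes written. */
    static synchronized long writeAll(Path file, String... messages) throws IOException {
        long written = 0;
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.CREATE,
                StandardOpenOption.WRITE, StandardOpenOption.APPEND)) {
            for (String m : messages) {
                REUSED.clear();                                 // reset position and limit for reuse
                REUSED.put(m.getBytes(StandardCharsets.UTF_8)); // assumes the message fits in 4096 bytes
                REUSED.flip();                                  // switch from filling to draining
                while (REUSED.hasRemaining()) {
                    written += ch.write(REUSED);                // the channel reads from off-heap memory
                }
            }
        }
        return written;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(ByteBuffer.allocate(16).isDirect()); // prints: false (heap buffer)
        System.out.println(REUSED.isDirect());                  // prints: true (off-heap buffer)
        Path file = Files.createTempFile("direct-buffer", ".log");
        file.toFile().deleteOnExit();
        System.out.println(writeAll(file, "ab", "cde") + " bytes written"); // prints: 5 bytes written
    }
}
```

Because the buffer already lives outside the heap, the channel can hand it to the system call without staging the data in a temporary native buffer first.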

Case: Netty

Netty performs I/O using off‑heap memory, avoiding copies between the JVM heap and native memory.

Summary

Source: juejin.cn/post/7126195733471952910
