Fundamentals 78 min read

Understanding JDK NIO File I/O and Linux Kernel Mechanisms: Buffered vs Direct IO, Page Cache, and Dirty Page Management

This article provides a comprehensive analysis of how JDK NIO performs file read and write operations by examining the underlying Linux kernel mechanisms, including the differences between Buffered and Direct IO, the structure and management of the page cache, file readahead algorithms, and the kernel parameters governing dirty page writeback.

Refining Core Development Skills

This article provides an in-depth exploration of how JDK NIO handles file read and write operations by tracing the execution path from the Java layer down to the Linux kernel.

The process begins with FileChannel and ByteBuffer, whose read and write calls ultimately invoke native system calls. The kernel distinguishes between two primary I/O modes: Buffered IO and Direct IO. Buffered IO uses the page cache to keep hot disk data in memory, and it accelerates sequential reads through asynchronous readahead. In contrast, Direct IO bypasses the page cache entirely, transferring data directly between user-space buffers and disk via DMA, which suits applications that manage their own caching, such as databases, and random access patterns that gain little from readahead.
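As a minimal sketch of the Java-side entry point described above, the following example writes a string through a FileChannel and reads it back through a second channel. Both channels use the default Buffered IO path, so the data passes through the page cache. The class name BufferedIoRoundTrip and the temporary file name are assumptions made for illustration, not code from the article.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical demo class: a Buffered IO round trip through FileChannel.
class BufferedIoRoundTrip {

    // Writes the text to a temporary file, reads it back, and returns what was read.
    static String roundTrip(String text) {
        try {
            Path file = Files.createTempFile("nio-demo", ".txt");
            try {
                byte[] payload = text.getBytes(StandardCharsets.UTF_8);

                // write() may consume only part of the buffer, so loop until it is drained.
                try (FileChannel out = FileChannel.open(file, StandardOpenOption.WRITE)) {
                    ByteBuffer src = ByteBuffer.wrap(payload);
                    while (src.hasRemaining()) {
                        out.write(src);
                    }
                }

                // read() returns -1 at end of file; stop when the buffer is full or EOF is hit.
                try (FileChannel in = FileChannel.open(file, StandardOpenOption.READ)) {
                    ByteBuffer dst = ByteBuffer.allocate(payload.length);
                    while (dst.hasRemaining() && in.read(dst) != -1) {
                        // keep reading
                    }
                    dst.flip();
                    return StandardCharsets.UTF_8.decode(dst).toString();
                }
            } finally {
                Files.deleteIfExists(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("hello page cache"));
    }
}
```

For Direct IO, JDK 10 and later provide com.sun.nio.file.ExtendedOpenOption.DIRECT as an open option. It is left out of this sketch because it requires aligned buffers and is not supported on every file system.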

The page cache is indexed internally by a radix_tree (superseded by the XArray since Linux 4.20), enabling efficient page lookups and state tracking. When reading, the kernel uses a sliding-window readahead mechanism that adjusts the window size dynamically based on access patterns. A read request enters the kernel through the read system call, whose body is elided here:

SYSCALL_DEFINE3(read, unsigned int, fd, char __user *, buf, size_t, count){
    /* ... omitted ... */
}
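The sliding-window readahead idea mentioned above can be illustrated with a toy model. This is not the kernel's algorithm: the initial window of 4 pages, the doubling rule, and the cap of 32 pages are all assumptions chosen for illustration, and the class name ReadaheadWindowToy is hypothetical. It only shows the general shape, where sequential access grows the window and a non-sequential access resets it.

```java
// Toy model of a readahead window. All constants are illustrative assumptions.
class ReadaheadWindowToy {
    static final int INITIAL_PAGES = 4;  // assumed starting window
    static final int MAX_PAGES = 32;     // assumed upper bound

    private long nextExpectedPage = -1;  // page index that would continue the sequence
    private int windowPages = 0;         // current window size in pages

    // Records a read of one page and returns the window size the toy would prefetch.
    int onRead(long pageIndex) {
        if (pageIndex == nextExpectedPage) {
            // Sequential access: grow the window, capped at MAX_PAGES.
            windowPages = Math.min(windowPages * 2, MAX_PAGES);
        } else {
            // First access or a jump: fall back to the small initial window.
            windowPages = INITIAL_PAGES;
        }
        nextExpectedPage = pageIndex + 1;
        return windowPages;
    }

    public static void main(String[] args) {
        ReadaheadWindowToy toy = new ReadaheadWindowToy();
        for (long page : new long[] {0, 1, 2, 3, 4, 100}) {
            System.out.println("page " + page + " -> window " + toy.onRead(page));
        }
    }
}
```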

For write operations, data is first copied to the page cache and the affected pages are marked dirty. The kernel delays flushing these pages to disk so that it can batch I/O operations. The timing of dirty page writeback is controlled by six key kernel parameters: dirty_background_ratio, dirty_background_bytes, dirty_ratio, dirty_bytes, dirty_writeback_centisecs, and dirty_expire_centisecs. These parameters balance data safety against system performance by dictating when the background flusher thread wakes up and when writing processes must flush data synchronously themselves.
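Because a buffered write can return while the data is still only in dirty pages, an application that needs durability at a specific point can request a flush itself. The sketch below uses FileChannel.force(true) for that purpose. The class name DurableWrite and its helper methods are assumptions made for illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical demo class: write through the page cache, then force the data to storage.
class DurableWrite {

    // Writes data to the file, asks the OS to flush it, and returns the resulting file size.
    static long writeAndSync(Path file, byte[] data) {
        try (FileChannel ch = FileChannel.open(file,
                StandardOpenOption.CREATE,
                StandardOpenOption.WRITE,
                StandardOpenOption.TRUNCATE_EXISTING)) {
            ByteBuffer buf = ByteBuffer.wrap(data);
            while (buf.hasRemaining()) {
                ch.write(buf);          // lands in the page cache as dirty pages
            }
            ch.force(true);             // flush file content and metadata to the storage device
            return ch.size();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Convenience wrapper that runs writeAndSync against a temporary file.
    static long demo(byte[] data) {
        try {
            Path file = Files.createTempFile("durable-demo", ".bin");
            try {
                return writeAndSync(file, data);
            } finally {
                Files.deleteIfExists(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("bytes on disk: " + demo(new byte[] {1, 2, 3}));
    }
}
```

Calling force on every write gives up the batching that delayed writeback provides, so it is usually reserved for points where durability really matters, such as a commit.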

By analyzing the underlying C structures and system call flows, this guide clarifies the complete lifecycle of file I/O, offering developers actionable insights for optimizing Java applications and tuning Linux kernel settings.
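As a starting point for tuning, the six writeback parameters named earlier can be inspected from Java, since Linux exposes them as files under /proc/sys/vm. The sketch below reads whichever of them are present and skips any that are missing or unreadable, so it returns an empty map on non-Linux systems. The class name DirtyPageSettings is an assumption made for illustration, and the code assumes Java 11 or later.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: reads the dirty page writeback settings from a sysctl directory.
class DirtyPageSettings {

    static final List<String> NAMES = List.of(
            "dirty_background_ratio", "dirty_background_bytes",
            "dirty_ratio", "dirty_bytes",
            "dirty_writeback_centisecs", "dirty_expire_centisecs");

    // Returns name -> value for every parameter that could be read and parsed.
    static Map<String, Long> read(Path vmDir) {
        Map<String, Long> values = new LinkedHashMap<>();
        for (String name : NAMES) {
            try {
                String raw = Files.readString(vmDir.resolve(name)).trim();
                values.put(name, Long.parseLong(raw));
            } catch (IOException | NumberFormatException e) {
                // Missing or unreadable parameter: skip it rather than fail.
            }
        }
        return values;
    }

    public static void main(String[] args) {
        read(Paths.get("/proc/sys/vm"))
                .forEach((name, value) -> System.out.println(name + " = " + value));
    }
}
```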

File I/O, Linux Kernel, page cache, system calls, Buffered IO, Direct IO, Dirty Page Writeback, JDK NIO
Written by

Refining Core Development Skills

Fei has over 10 years of development experience at Tencent and Sogou. Through this account, he shares his deep insights on performance.
