
Why Disk I/O Speed Depends on Sequential Access: Page Cache, Scheduling, and B+Tree Insights

This article explains how disk I/O performance is shaped by page cache behavior, sequential versus random access patterns, elevator scheduling, and storage data structures such as B+Trees and LSM trees. It shows why operating systems and databases must design for sequential reads and writes to achieve good throughput.


Starting from Page Cache

Disk I/O performance is heavily influenced by the operating system's page cache. When an application calls read(), the kernel first checks whether the required data block is already in the page cache. If it is, the data is copied directly into the user buffer and the call returns immediately. If not, the kernel issues a DMA read request to the disk, suspends the calling process, and resumes it only after the data has been loaded into the page cache, a round trip that typically takes on the order of milliseconds on a spinning disk, several orders of magnitude slower than a cache hit.

For write(), the kernel also writes to the page cache first. If the target block is already cached, the data is copied into the cache and the call returns. If the block is not cached, the kernel may issue a read to bring the block into the cache before updating it. A background kernel thread (the flush thread) later writes the dirty pages to disk using an elevator (or “SCAN”) scheduling algorithm that minimizes head movement by serving requests in one direction before reversing.
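The write-back behavior can be sketched the same way. Again an illustrative model under assumed names, not kernel code: writes dirty pages in the cache and return immediately, and a later flush writes dirty blocks back in ascending order, approximating one sweep of the elevator.

```python
# Minimal sketch of write-back caching with a deferred flush.
cache, dirty = {}, set()

def cached_write(block_no, data, disk):
    # Read-modify-write: bring the block into the cache if it is absent.
    if block_no not in cache:
        cache[block_no] = bytearray(disk[block_no])   # simulated disk read
    cache[block_no][:len(data)] = data
    dirty.add(block_no)            # write-back: disk is NOT touched yet

def flush(disk):
    # Background flusher: write dirty blocks back in ascending block
    # order, approximating a one-directional elevator sweep.
    for block_no in sorted(dirty):
        disk[block_no] = bytes(cache[block_no])
    dirty.clear()
```

Until flush() runs, the on-disk copy is stale, which is exactly why a crash can lose recently written data unless the application forces a sync.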

What Is Sequential and What Is Random?

Sequential access means reading or writing blocks that are contiguous on the disk, allowing the elevator algorithm to serve them with minimal head movement. Random access jumps to unrelated blocks, causing the head to seek repeatedly and dramatically increasing latency.

Example: on a machine with 1 GB of RAM, a 10 GB file, and a 4 KB page cache block size, reading the file one byte at a time from start to end fetches each block exactly once, about 10 GB / 4 KB ≈ 2.6 million disk reads in total. Random one-byte reads at scattered offsets are far worse: with only 1 GB of cache in front of a 10 GB file, most reads miss the cache, so in the worst case every single read costs a full 4 KB disk fetch.
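The arithmetic behind that figure is straightforward; the sizes below are the article's stated assumptions.

```python
# How many block fetches a full sequential pass needs:
FILE_SIZE = 10 * 1024**3        # 10 GiB file
BLOCK_SIZE = 4 * 1024           # 4 KiB page cache block

sequential_reads = FILE_SIZE // BLOCK_SIZE
print(sequential_reads)         # 2621440, i.e. roughly 2.6 million
```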

Disk I/O Scheduling and the Elevator Algorithm

The physical nature of disks makes sequential access faster because the read/write head moves less. The elevator algorithm sorts pending I/O requests by cylinder number and services them in one direction, then reverses, similar to an elevator serving floors in order.
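A toy version of SCAN makes the head-movement savings concrete. The cylinder numbers and starting head position below are illustrative assumptions, not taken from any real scheduler.

```python
def elevator_order(requests, head, direction=+1):
    """Order cylinder requests as a SCAN sweep: first those ahead of the
    head in the travel direction (nearest first), then the rest on the
    way back."""
    ahead = sorted((r for r in requests if (r - head) * direction >= 0),
                   key=lambda r: abs(r - head))
    behind = sorted((r for r in requests if (r - head) * direction < 0),
                    key=lambda r: abs(r - head))
    return ahead + behind

def head_movement(order, head):
    """Total cylinders traveled to service requests in the given order."""
    total = 0
    for r in order:
        total += abs(r - head)
        head = r
    return total
```

For the request set [98, 183, 37, 122, 14, 124, 65, 67] with the head at cylinder 53 moving upward, SCAN services 65, 67, 98, 122, 124, 183 on the way up and 37, 14 on the way back, far less total travel than servicing the queue in arrival order.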

Why B+Tree Is the Dominant Index Structure

A B+Tree packs many keys into each node, sizing nodes to the page cache block (typically 4 KB). The resulting high fanout keeps the tree far shorter than binary trees (RB‑Tree, AVL‑Tree), decreasing the number of random disk I/Os needed to locate a key. The ordered leaf layout also enables efficient sequential range scans.
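The height difference is easy to quantify. The fanout of 256 keys per 4 KB node below is an illustrative assumption (real fanout depends on key and pointer sizes).

```python
# Rough height comparison: each level of a tree is one (potentially
# random) disk I/O, so height ~ I/Os per lookup.
import math

N = 100_000_000                      # 100 million keys
btree_fanout = 256                   # keys per 4 KB node (assumed)

btree_height = math.ceil(math.log(N, btree_fanout))
binary_height = math.ceil(math.log2(N))
print(btree_height, binary_height)   # 4 vs 27 levels
```

Four page reads versus twenty-seven is the whole argument: the B+Tree turns a lookup from dozens of random I/Os into a handful.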

Enterprise‑Grade SSDs vs. Consumer SSDs

Enterprise SSDs guarantee low‑latency fsync() and durable writes for workloads with heavy transactional logging (WAL, undo/redo). Consumer SSDs prioritize read performance and cost, making them suitable for read‑heavy or static‑content scenarios.

Kafka’s Disk‑Centric Design

Kafka treats log files as append‑only queues, exploiting fast sequential writes. It uses zero‑copy techniques (e.g., sendfile or Java NIO FileChannel.transferTo) to move data from page cache directly to the network socket, reducing user‑mode copies.
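The zero-copy path can be demonstrated with os.sendfile on Linux, which hands bytes from the page cache to a socket without a user-space copy. This is a sketch of the mechanism Kafka relies on, not Kafka's code; the payload and the socketpair standing in for a client connection are illustrative.

```python
# Zero-copy file-to-socket transfer via sendfile (Linux).
import os
import socket
import tempfile

payload = b"log-segment-bytes" * 100

with tempfile.TemporaryFile() as f:
    f.write(payload)                    # data lands in the page cache
    f.flush()
    server, client = socket.socketpair()   # stands in for a TCP connection
    sent = 0
    while sent < len(payload):          # sendfile may send fewer bytes than asked
        sent += os.sendfile(server.fileno(), f.fileno(),
                            sent, len(payload) - sent)
    server.close()
    received = client.recv(len(payload), socket.MSG_WAITALL)
    client.close()
```

The kernel copies from the page cache straight to the socket buffer; user space never holds the data, which is the copy the technique eliminates.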

LSM‑Tree: LevelDB, HBase, and the Quest for Sequential I/O

Log‑Structured Merge (LSM) trees keep recent writes in an in‑memory ordered structure (e.g., RB‑Tree or SkipList). When flushed, they produce sorted immutable files on disk. Periodic compaction merges these files into larger sorted runs, ensuring that most disk I/O remains sequential.
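The write path of an LSM tree can be reduced to a few lines. This is a deliberately minimal sketch under assumed names: a plain dict stands in for the RB‑Tree/SkipList memtable, a Python list of sorted runs stands in for the on-disk files, and compaction is a simple merge where later runs win on duplicate keys.

```python
# Minimal LSM-tree sketch: memtable -> sorted immutable runs -> compaction.
memtable = {}            # stands in for the in-memory RB-Tree / SkipList
runs = []                # each flush appends one sorted, immutable run

def put(key, value):
    memtable[key] = value            # in-memory write: fast, no disk I/O

def flush():
    # Write the memtable out as one sorted run: a sequential disk write.
    runs.append(sorted(memtable.items()))
    memtable.clear()

def compact():
    # Merge all runs into one larger sorted run; runs are in time order,
    # so later entries overwrite earlier ones for the same key.
    merged = {}
    for run in runs:
        merged.update(run)
    runs[:] = [sorted(merged.items())]
```

Every disk-facing step (flushing a run, writing the compacted output) emits keys in sorted order, which is how the structure keeps most I/O sequential.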

Compaction is I/O‑intensive, so it must be scheduled during low‑traffic periods to avoid degrading throughput. Reading cold data can evict hot pages from the cache, causing latency spikes for latency‑sensitive services.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.

Tags: Kafka, page cache, B+Tree, disk I/O, LSM, sequential access
Written by

G7 EasyFlow Tech Circle

Official G7 EasyFlow tech channel! All the hardcore tech, cutting‑edge innovations, and practical sharing you want are right here.
