
How Kafka Achieves Million-Message Writes: Inside Its Sequential Append & Page Cache Magic

This article explains how Apache Kafka reaches write throughput in the millions of messages by appending messages sequentially to log files and relying on the operating‑system page cache with asynchronous disk flushing. It also walks through the data flow from producer to broker to storage.

Mike Chen's Internet Architecture

Apache Kafka is a high‑throughput, scalable, distributed message‑queue system originally built for LinkedIn activity streams and now widely used for real‑time log analysis, stream processing and other scenarios.

In high‑volume data‑stream processing, Kafka’s core advantage is its ability to write messages sequentially, appending them to .log files without seeking or modifying existing data.
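Since the write path runs through Java's `FileChannel`, the append‑only pattern can be sketched with Java NIO (Java 11+). This is a minimal illustration, not Kafka's code: the temp file name and the newline‑delimited text format are assumptions made for readability (Kafka itself stores binary record batches). The key detail is `StandardOpenOption.APPEND`, which makes every write land at the current end of the file.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.List;

public class AppendOnlyLogDemo {

    // Appends one newline-terminated message. Because the channel was opened
    // with APPEND, each write() is positioned at the current end of file, so
    // earlier bytes are never overwritten and no seek back is needed.
    static void append(FileChannel ch, String message) throws IOException {
        ByteBuffer buf = ByteBuffer.wrap((message + "\n").getBytes(StandardCharsets.UTF_8));
        while (buf.hasRemaining()) {
            ch.write(buf);
        }
    }

    // Opens (or creates) the log in append-only mode and writes messages in order.
    static void appendAll(Path log, List<String> messages) throws IOException {
        try (FileChannel ch = FileChannel.open(log,
                StandardOpenOption.CREATE,
                StandardOpenOption.WRITE,
                StandardOpenOption.APPEND)) {
            for (String m : messages) {
                append(ch, m);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Path log = Files.createTempFile("segment-", ".log");
        appendAll(log, List.of("msg-0", "msg-1"));
        appendAll(log, List.of("msg-2")); // reopening still writes at the tail
        System.out.print(Files.readString(log));
        Files.delete(log);
    }
}
```

Keeping one channel open for a run of messages, rather than reopening per message, keeps the write path a simple stream of appends.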

Traditional HDDs suffer from seek time and rotational latency, making random writes much slower than sequential writes, which require minimal head movement.
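A rough cost model makes the gap concrete. Assume $n$ writes of $s$ bytes each, an average seek time $t_{\text{seek}}$, an average rotational latency $t_{\text{rot}}$, and a sustained transfer rate $B$ (symbols introduced here for illustration only). If every random write pays a seek and a rotation, while a sequential run pays them roughly once:

```latex
T_{\text{random}} \approx n\left(t_{\text{seek}} + t_{\text{rot}} + \frac{s}{B}\right),
\qquad
T_{\text{sequential}} \approx t_{\text{seek}} + t_{\text{rot}} + \frac{n\,s}{B}
```

For small messages, $s/B$ is tiny next to the mechanical delays, so $T_{\text{random}}$ grows with $n$ times the seek and rotation cost, while $T_{\text{sequential}}$ is dominated by the transfer term.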

When a producer sends messages to a partition on a Kafka broker, the broker appends them in order to the partition’s active log segment (a .log file). New messages are always written at the file’s tail.
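The in‑order tail append can be sketched as a toy segment that hands out consecutive offsets. Everything here is a hypothetical simplification: the class name and the `[4-byte length][payload]` record layout are illustrative and are NOT Kafka's on‑disk record‑batch format. The file name follows Kafka's convention of naming a segment after its base offset, zero‑padded to 20 digits.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy segment; the record layout is an assumption, not Kafka's format.
public class ToyLogSegment implements AutoCloseable {
    private final FileChannel channel;
    private long nextOffset;

    ToyLogSegment(Path dir, long baseOffset) throws IOException {
        // Segment file named after its base offset, zero-padded to 20 digits.
        Path file = dir.resolve(String.format("%020d.log", baseOffset));
        this.channel = FileChannel.open(file,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE, StandardOpenOption.APPEND);
        this.nextOffset = baseOffset;
    }

    // Appends one record at the tail as [4-byte length][payload] and returns its offset.
    long append(byte[] payload) throws IOException {
        ByteBuffer buf = ByteBuffer.allocate(4 + payload.length);
        buf.putInt(payload.length).put(payload).flip();
        while (buf.hasRemaining()) {
            channel.write(buf);
        }
        return nextOffset++;
    }

    // Reads every record back in file order, which is also offset order.
    static List<String> readAll(Path file) throws IOException {
        ByteBuffer buf = ByteBuffer.wrap(Files.readAllBytes(file));
        List<String> out = new ArrayList<>();
        while (buf.remaining() >= 4) {
            byte[] payload = new byte[buf.getInt()];
            buf.get(payload);
            out.add(new String(payload, StandardCharsets.UTF_8));
        }
        return out;
    }

    @Override
    public void close() throws IOException {
        channel.close();
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("partition-0-");
        try (ToyLogSegment seg = new ToyLogSegment(dir, 0)) {
            for (String m : List.of("a", "b", "c")) {
                long off = seg.append(m.getBytes(StandardCharsets.UTF_8));
                System.out.println(m + " -> offset " + off);
            }
        }
        System.out.println(readAll(dir.resolve("00000000000000000000.log")));
    }
}
```

Because records only ever go to the tail, file order and offset order coincide, which is what lets consumers read a partition as a simple forward scan.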

By default, Kafka does not call fsync after each write; instead it relies on the operating‑system page cache, combining sequential writes with asynchronous flushing to balance performance and durability.
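Kafka exposes broker settings such as `log.flush.interval.messages` and `log.flush.interval.ms` to force flushes by message count or elapsed time. The sketch below is a hypothetical writer loosely modeled on that idea, not Kafka's implementation: every message goes through `write()` into the page cache, and `FileChannel.force(false)` (Java's route to fsync) runs only when one of the two thresholds is reached. The clock is passed in by the caller so the policy is deterministic.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical writer: fsync by count or by time, never on every message.
public class BatchedFlushWriter implements AutoCloseable {
    private final FileChannel channel;
    private final long flushEveryMessages;
    private final long flushEveryMillis;
    private long unflushedMessages = 0;
    private long lastFlushMillis;
    int flushCount = 0; // exposed for demonstration and testing

    BatchedFlushWriter(Path file, long flushEveryMessages, long flushEveryMillis, long nowMillis)
            throws IOException {
        this.channel = FileChannel.open(file,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE, StandardOpenOption.APPEND);
        this.flushEveryMessages = flushEveryMessages;
        this.flushEveryMillis = flushEveryMillis;
        this.lastFlushMillis = nowMillis;
    }

    void write(String message, long nowMillis) throws IOException {
        ByteBuffer buf = ByteBuffer.wrap((message + "\n").getBytes(StandardCharsets.UTF_8));
        while (buf.hasRemaining()) {
            channel.write(buf); // returns once the bytes are in the OS page cache
        }
        unflushedMessages++;
        if (unflushedMessages >= flushEveryMessages
                || nowMillis - lastFlushMillis >= flushEveryMillis) {
            flush(nowMillis);
        }
    }

    private void flush(long nowMillis) throws IOException {
        channel.force(false); // ask the OS to write the file's content to the device
        flushCount++;
        unflushedMessages = 0;
        lastFlushMillis = nowMillis;
    }

    @Override
    public void close() throws IOException {
        channel.close(); // note: close() does not fsync the remaining dirty pages
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("flush-demo-", ".log");
        try (BatchedFlushWriter w = new BatchedFlushWriter(file, 4, 1_000, 0)) {
            for (int i = 0; i < 10; i++) {
                w.write("msg-" + i, i * 10L); // 10 ms apart, so only the count rule fires
            }
            System.out.println("10 writes, " + w.flushCount + " fsync calls");
        }
        Files.delete(file);
    }
}
```

The trade‑off is visible in `main`: ten writes cost two fsync calls, and the last two messages stay only in the page cache until the OS writes them back on its own schedule.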

The page cache is the kernel’s in‑memory cache of file data: bytes written via write() first land in kernel memory, and the OS writes them to disk asynchronously later.
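One observable consequence of this design is that data written with a plain `write()` is readable right away, before anything has been fsynced. The sketch below (class and method names are hypothetical) writes through one `FileChannel`, never calls `force()`, and reads the bytes back through a second, independent channel. Java cannot tell whether a read was served from memory or disk; the assumption, typical of buffered I/O on Linux, is that the kernel answers it from the pages the write just dirtied.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class PageCacheReadBack {

    // Writes the bytes with a plain write() and deliberately never calls force().
    static void writeWithoutFsync(Path file, byte[] data) throws IOException {
        try (FileChannel out = FileChannel.open(file, StandardOpenOption.WRITE)) {
            ByteBuffer buf = ByteBuffer.wrap(data);
            while (buf.hasRemaining()) {
                out.write(buf);
            }
        } // closing the channel does not imply an fsync
    }

    // Reads the whole file through a separate channel.
    static byte[] readBack(Path file) throws IOException {
        try (FileChannel in = FileChannel.open(file, StandardOpenOption.READ)) {
            ByteBuffer buf = ByteBuffer.allocate((int) in.size());
            while (buf.hasRemaining()) {
                if (in.read(buf) < 0) {
                    break; // end of file
                }
            }
            return buf.array();
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("page-cache-", ".log");
        writeWithoutFsync(file, "hello, page cache".getBytes(StandardCharsets.UTF_8));
        // Typically served from the kernel's cached pages; nothing was fsynced.
        System.out.println(new String(readBack(file), StandardCharsets.UTF_8));
        Files.delete(file);
    }
}
```

This is also why recently written messages can usually be consumed without touching the disk: readers and the writer share the same cached pages.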

Data flow:

<ol><li>Producer</li><li>Kafka Broker receives data</li><li>Write to Partition’s LogSegment (*.log)</li><li>Java <code>FileChannel</code> → mmap/buffer → <code>write()</code></li><li>Data written to OS page cache</li><li>Asynchronous flush to disk (fsync/flush/kernel scheduling)</li></ol>
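The broker‑side steps of this flow can be sketched as a hypothetical produce handler (Java 11+). The class and method names are illustrative, not Kafka's. It takes a batch of producer payloads, hands them to the active segment with a single gathering `FileChannel.write(ByteBuffer[])` call where the OS allows, and stops once the bytes are in the page cache, leaving the flush to the OS.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.List;

// Hypothetical broker-side handler mirroring the data-flow steps.
public class ToyProduceHandler implements AutoCloseable {
    private final FileChannel segment;

    ToyProduceHandler(Path segmentFile) throws IOException {
        // The partition's active segment, opened append-only.
        this.segment = FileChannel.open(segmentFile,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE, StandardOpenOption.APPEND);
    }

    // Broker receives a batch -> FileChannel gathering write -> OS page cache.
    long handleProduce(List<byte[]> batch) throws IOException {
        ByteBuffer[] buffers = batch.stream().map(ByteBuffer::wrap).toArray(ByteBuffer[]::new);
        long total = 0;
        while (hasRemaining(buffers)) {
            total += segment.write(buffers); // may need more than one call in theory
        }
        // No force() here: flushing to disk is left to the OS or a separate policy.
        return total;
    }

    private static boolean hasRemaining(ByteBuffer[] buffers) {
        for (ByteBuffer b : buffers) {
            if (b.hasRemaining()) {
                return true;
            }
        }
        return false;
    }

    @Override
    public void close() throws IOException {
        segment.close();
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("toy-segment-", ".log");
        List<byte[]> batch = List.of(
                "m1;".getBytes(StandardCharsets.UTF_8),
                "m2;".getBytes(StandardCharsets.UTF_8),
                "m3;".getBytes(StandardCharsets.UTF_8));
        try (ToyProduceHandler broker = new ToyProduceHandler(file)) {
            System.out.println("wrote " + broker.handleProduce(batch) + " bytes");
        }
        System.out.println(Files.readString(file));
        Files.delete(file);
    }
}
```

Handing a whole batch to the kernel at once amortizes the per‑call cost across many messages, which fits the same sequential‑append pattern.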

When the broker writes messages, the data first lands in the page cache and is flushed to the physical disk later by the OS; this delayed‑write approach is what boosts write throughput.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.

Tags: Kafka, Message Queue, High Throughput, page cache
Written by Mike Chen's Internet Architecture

Over ten years of BAT architecture experience, shared generously!