How Kafka Achieves Million‑Message Throughput: Sequential Writes, Page Cache, Batching & Zero‑Copy
This article explains how Kafka sustains high throughput: it writes to disk sequentially, relies on the OS page cache, batches on both the producer and consumer side through configurable parameters, and uses zero-copy sendfile to cut CPU and memory overhead. Together, these techniques let a cluster sustain rates on the order of a million messages per second.
Sequential Write
Kafka achieves high throughput by writing all data to disk sequentially. Each partition is an append-only log, and every incoming message is appended to the end of the current log file, so the broker avoids random-write overhead even under high concurrency. This keeps writes stable and efficient at very large data volumes.
Sequential writes get the most out of mechanical disks because the read/write head rarely has to reposition, which removes most of the seek time. Sequential write speeds can reach hundreds of MB/s on suitable hardware, far above the same disk's random-write throughput and much closer to what an SSD delivers.
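The append-only idea can be sketched in a few lines of Python. This is a toy illustration, not Kafka's on-disk format: the class name AppendOnlyLog, the helper read_record, and the 4-byte length prefix are all assumptions made for the example.

```python
import os
import struct


class AppendOnlyLog:
    """Toy append-only log: each record is a 4-byte length prefix plus payload."""

    def __init__(self, path):
        # "ab" opens the file for appending, so every write lands at the end.
        self._file = open(path, "ab")
        self._end = os.path.getsize(path)  # byte position of the next record
        self._positions = []               # byte position per logical offset

    def append(self, payload: bytes) -> int:
        """Append one record and return its logical offset (0, 1, 2, ...)."""
        record = struct.pack(">I", len(payload)) + payload
        self._file.write(record)
        self._positions.append(self._end)
        self._end += len(record)
        return len(self._positions) - 1

    def position_of(self, offset: int) -> int:
        """Byte position in the file of the record with this logical offset."""
        return self._positions[offset]

    def flush(self):
        self._file.flush()

    def close(self):
        self._file.close()


def read_record(path, position):
    """Read back the single record that starts at the given byte position."""
    with open(path, "rb") as f:
        f.seek(position)
        (length,) = struct.unpack(">I", f.read(4))
        return f.read(length)
```

Writers only ever touch the end of the file, while readers seek to a known position and then read forward, which is the access pattern the section describes.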
Page Cache
Kafka does not write directly to disk for each message; instead it first writes to the operating system’s page cache.
The page cache is a region of memory the OS uses to cache disk data. When the broker receives a message from a producer, its write lands in the page cache first, and kernel writeback threads (pdflush on older Linux kernels, per-device flusher threads on newer ones) flush the dirty pages to disk in the background. If a consumer reads data that is still in the page cache, the read completes without any disk I/O, which greatly reduces latency and increases throughput.
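The difference between a write that stops at the page cache and one that is forced to disk can be shown with plain file I/O. The sketch below is illustrative only; the function name write_via_page_cache and its force_to_disk flag are hypothetical and not part of Kafka.

```python
import os


def write_via_page_cache(path, data, force_to_disk=False):
    """Append data to path; optionally block until it is on stable storage."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
    try:
        view = memoryview(data)
        written = 0
        while written < len(view):
            # os.write() returns once the kernel has copied the bytes into
            # the page cache; the disk write itself happens later.
            written += os.write(fd, view[written:])
        if force_to_disk:
            # os.fsync() blocks until the dirty pages for this file are flushed.
            os.fsync(fd)
        return written
    finally:
        os.close(fd)


def read_back(path):
    """Read the file; recently written data is typically served from the page cache."""
    with open(path, "rb") as f:
        return f.read()
```

Calling write_via_page_cache without force_to_disk returns as soon as the data is in memory, which is why the write path is cheap; the trade-off is that unflushed data depends on the OS (and, in a cluster, on replication) to survive a crash.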
Batch Sending and Pulling
To reduce the number of network and disk I/O operations, Kafka’s client design uses batching.
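On the producer side, messages are accumulated into a batch before being sent to the broker. batch.size sets the upper bound, in bytes, on a batch for a single partition, and linger.ms sets how long the producer waits for more messages before sending a batch that is not yet full.
On the consumer side, messages are fetched in batches rather than one by one. fetch.min.bytes sets the minimum amount of data the broker should return for a fetch request, and fetch.max.wait.ms caps how long the broker waits for that much data to accumulate before responding anyway.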
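The interaction between the size limit and the wait time can be modeled with a small accumulator. This is a simplified sketch, not the Kafka producer's implementation: the class BatchAccumulator, its methods, and the values used are hypothetical, and the clock is injectable only so the behavior is easy to verify.

```python
import time


class BatchAccumulator:
    """Toy model of size-or-time batching (names mirror batch.size / linger.ms)."""

    def __init__(self, batch_size=16384, linger_ms=5, clock=time.monotonic):
        self.batch_size = batch_size        # flush once this many bytes are buffered
        self.linger_s = linger_ms / 1000.0  # or once the oldest record waited this long
        self._clock = clock
        self._records = []
        self._bytes = 0
        self._first_append_at = None

    def append(self, record: bytes):
        """Buffer a record; return the batch if it is now full, else None."""
        if not self._records:
            self._first_append_at = self._clock()
        self._records.append(record)
        self._bytes += len(record)
        if self._bytes >= self.batch_size:
            return self._drain()
        return None

    def poll(self):
        """Return the pending batch if the linger time has elapsed, else None."""
        if self._records and self._clock() - self._first_append_at >= self.linger_s:
            return self._drain()
        return None

    def _drain(self):
        batch, self._records, self._bytes = self._records, [], 0
        self._first_append_at = None
        return batch
```

A larger size limit or longer wait produces fewer, bigger requests at the cost of extra per-message latency, which is the tuning trade-off behind the two parameters.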
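The consumer-side rule amounts to a simple condition evaluated by the broker while a fetch request is pending. The function below is a hypothetical sketch of that decision; its name and signature are made up for illustration, and the default values are assumed to match the consumer defaults.

```python
def should_respond(available_bytes, waited_ms, fetch_min_bytes=1, fetch_max_wait_ms=500):
    """Decide whether a pending fetch should be answered now.

    The fetch is answered as soon as enough data has accumulated, or once
    the maximum wait has elapsed, whichever happens first.
    """
    return available_bytes >= fetch_min_bytes or waited_ms >= fetch_max_wait_ms
```

For example, with fetch_min_bytes set to 1024, a request that finds only 200 bytes available is held until more data arrives or the wait limit expires, so each response carries a larger batch.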
Zero‑Copy Transfer
When a consumer pulls messages, Kafka employs zero‑copy to avoid unnecessary CPU and memory copies.
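Without zero-copy, data takes four hops: from disk into a kernel buffer (the page cache), from there into an application buffer, from the application buffer into the kernel socket buffer, and finally from the socket buffer to the NIC. The two middle hops are CPU copies across the kernel/user boundary, and the read and send calls each cost a pair of context switches.
Kafka uses the sendfile system call (reached through Java's FileChannel.transferTo) to move data from the page cache to the network socket without passing through user space. This eliminates the two CPU copies into and out of the application buffer, along with the context switches that go with them. The result is lower CPU load and significantly better transmission efficiency and throughput.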
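The two paths can be compared side by side in Python, which exposes the same mechanism through socket.sendfile(). This is a sketch for illustration, not Kafka's code (Kafka is written in Java); the function names send_with_user_space_copy and send_zero_copy are hypothetical.

```python
import socket


def send_with_user_space_copy(sock, path, offset, count, chunk=64 * 1024):
    """Traditional path: read() into a user-space buffer, then send() it."""
    sent = 0
    with open(path, "rb") as f:
        f.seek(offset)
        while sent < count:
            data = f.read(min(chunk, count - sent))  # kernel -> user copy
            if not data:
                break  # reached end of file before count bytes
            sock.sendall(data)                       # user -> kernel copy
            sent += len(data)
    return sent


def send_zero_copy(sock, path, offset, count):
    """Zero-copy path: let the kernel move file bytes straight to the socket."""
    with open(path, "rb") as f:
        # socket.sendfile() uses os.sendfile() where the platform supports it
        # and falls back to a plain send() loop otherwise.
        return sock.sendfile(f, offset=offset, count=count)
```

Both functions deliver the same bytes to the peer; the difference is only in how many times the data is copied and how often execution crosses the user/kernel boundary on the way.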
Architect Chen
Sharing over a decade of architecture experience from Baidu, Alibaba, and Tencent.
