How Kafka Achieves Million‑Message Throughput: Partitions, Zero‑Copy, and Page Cache Explained
This article breaks down Kafka's high‑throughput design by detailing how partitioning enables parallelism, sequential disk writes and zero‑copy reduce I/O overhead, and OS page cache maximizes memory efficiency, together allowing millions of messages per second.
Partitioning and Parallelism
Kafka splits each topic into multiple partitions, each acting as an independent append-only log and unit of parallelism. Adding partitions raises the system's capacity to process data in parallel, so throughput scales roughly linearly with the number of brokers and partitions, at least until coordination and replication overhead begin to dominate. Producers can write to different partitions concurrently, while within a consumer group each partition is consumed by at most one consumer, so a group processes partitions in parallel and scales horizontally. Partitions are also the unit of replication, providing data redundancy and high availability.
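The key-to-partition mapping can be sketched in a few lines. Kafka's default partitioner hashes the record key with murmur2; the stand-in below uses CRC32 instead, purely to keep the sketch deterministic and dependency-free, but the property it illustrates is the same: equal keys always map to the same partition, which preserves per-key ordering while spreading distinct keys across partitions.

```python
import zlib

def choose_partition(key: bytes, num_partitions: int) -> int:
    """Simplified stand-in for Kafka's default partitioner.
    Kafka hashes the key with murmur2; CRC32 is used here only to
    keep the sketch deterministic and self-contained."""
    return zlib.crc32(key) % num_partitions

# Same key -> same partition (per-key ordering is preserved);
# different keys spread across the available partitions.
partitions = [choose_partition(k, 6) for k in (b"user-1", b"user-2", b"user-1")]
assert partitions[0] == partitions[2]
assert all(0 <= p < 6 for p in partitions)
```

A record with no key is instead spread round-robin (sticky, in recent clients) across partitions, trading ordering for balance.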
Sequential Writes and Zero‑Copy
Kafka appends incoming messages to the end of a partition log, performing fully sequential disk writes. Even mechanical hard drives can sustain sequential write speeds on the order of 100–200 MB/s, far exceeding their random-write performance. By leveraging the operating system's page cache, writes first land in memory and are flushed to disk asynchronously, dramatically boosting throughput.
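The append-only layout can be illustrated with a minimal segment-log sketch. This is not Kafka's on-disk record format (which carries CRCs, timestamps, and batch headers); it is a hypothetical length-prefixed log that shows the essential property: every write goes to the tail of the file, so the disk only ever sees sequential writes, and a record's byte offset doubles as its address for later reads.

```python
import os
import struct
import tempfile

class SegmentLog:
    """Minimal append-only log: length-prefixed records, tail-only writes.
    A deliberately simplified sketch, not Kafka's actual segment format."""

    def __init__(self, path: str):
        self.f = open(path, "ab+")

    def append(self, payload: bytes) -> int:
        self.f.seek(0, os.SEEK_END)          # writes always go to the tail
        offset = self.f.tell()
        self.f.write(struct.pack(">I", len(payload)) + payload)
        return offset                         # byte offset = record address

    def read(self, offset: int) -> bytes:
        self.f.seek(offset)
        (length,) = struct.unpack(">I", self.f.read(4))
        return self.f.read(length)

path = os.path.join(tempfile.mkdtemp(), "00000000.log")
log = SegmentLog(path)
o1 = log.append(b"msg-1")
o2 = log.append(b"msg-2")
assert log.read(o1) == b"msg-1" and log.read(o2) == b"msg-2"
assert o2 > o1   # strictly increasing offsets: purely sequential layout
```

Because records are immutable once appended, reads never contend with in-place updates, which is what lets both producers and the disk scheduler stay on the sequential fast path.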
Page Cache Utilization
Kafka does not maintain its own in-process data cache; it relies on the operating system. When Kafka writes data, it first lands in the OS page cache, and the kernel performs the actual disk flush later (an explicit fsync can force it). If a consumer reads data that is still cached, the read is satisfied entirely from memory, eliminating physical disk access. This approach maximizes memory utilization, and it has two further benefits: because the cached data lives outside the Java heap, it contributes nothing to JVM garbage-collection pauses, and because the page cache belongs to the OS rather than the process, warm data survives a Kafka process restart.
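The write path above can be made concrete with a small sketch at the OS level (file names here are illustrative). A plain `write()` returns as soon as the data is in the page cache, not when it reaches the platter; a reader opened immediately afterwards already sees the bytes, served from that cache, and `fsync()` is the explicit durability point that forces the flush.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "cache-demo.log")

# write() returns once the bytes are in the OS page cache,
# not once they have reached the physical disk.
fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o644)
os.write(fd, b"buffered in page cache")

# A reader sees the data immediately: the read is served from the
# page cache, with no physical disk access required.
with open(path, "rb") as r:
    assert r.read() == b"buffered in page cache"

os.fsync(fd)   # explicit durability point: force the flush to stable storage
os.close(fd)
```

Kafka by default leaves flushing to the OS and gets durability from replication instead of per-write fsync, which is a large part of why its write path stays fast.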
Zero‑Copy and Efficient Network Transmission
When sending data to consumers, Kafka employs zero-copy techniques (the sendfile() system call for transfers; mmap for index files) to avoid copying data between user space and kernel space, reducing CPU overhead and context switches. Producers accumulate messages into batches, so a single network request, disk write, or fetch carries many messages at once, cutting the number of system calls and spreading the per-message processing cost across the whole batch. Combined with per-batch compression using algorithms such as GZIP, Snappy, or LZ4, zero-copy and batching significantly lower bandwidth consumption while increasing the number of messages processed per second.
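Why batching and compression reinforce each other can be shown with a small experiment. Kafka compresses whole producer batches, not individual records; the sketch below (using gzip from the standard library as a stand-in for any of the supported codecs) compresses a batch of near-identical small messages once, versus compressing each message alone. The batch wins because the redundancy *across* messages is only visible to the codec when they share one compression context.

```python
import gzip
import json

# 1000 small, structurally similar messages, as a log workload tends to be.
messages = [
    json.dumps({"user": f"user-{i % 10}", "event": "click"}).encode()
    for i in range(1000)
]

raw = b"".join(messages)
batched = gzip.compress(raw)                            # one compressed batch
per_msg = sum(len(gzip.compress(m)) for m in messages)  # each compressed alone

assert len(batched) < len(raw)   # the batch compresses well...
assert len(batched) < per_msg    # ...and far better than per-message compression
```

In a real producer the batch size and wait time are tuned via `batch.size` and `linger.ms`, and the codec via `compression.type`; larger batches generally mean better compression ratios and fewer requests, at the cost of a little latency.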
Architect Chen
Sharing over a decade of architecture experience from Baidu, Alibaba, and Tencent.
