Why Can Kafka Process 20 Million Messages per Second? Inside Its High‑Performance Secrets

This article breaks down how Kafka achieves up to 20 million messages per second and 600 MB/s throughput by examining producer batching and compression, broker PageCache and zero‑copy techniques, and consumer parallelism with group coordination.

dbaplus Community

Jul 13, 2023

1. Producer Optimizations

Kafka’s producer cuts per‑message overhead by batching: messages are sent in bulk rather than one by one, which reduces the number of requests the broker must handle. It also uses a custom binary protocol and compresses the serialized data, shrinking payloads and saving network bandwidth.

The article ranks the supported compression codecs as follows: by throughput, LZ4 > Snappy > zstd > GZIP; by compression ratio, zstd > LZ4 > GZIP > Snappy.
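A small experiment can show why compression pairs well with batching. The sketch below uses Python's standard zlib module (DEFLATE, the algorithm underlying GZIP) purely as a stand-in; it is not Kafka's codec implementation, and the sample messages are invented for illustration.

```python
import json
import zlib

# Hypothetical sample payloads: 200 small JSON events with repetitive structure.
messages = [
    json.dumps({"user_id": i, "event": "page_view", "path": "/products/list"}).encode()
    for i in range(200)
]

raw_size = sum(len(m) for m in messages)

# Compress every message on its own: each one pays the codec's fixed overhead.
individual_size = sum(len(zlib.compress(m)) for m in messages)

# Compress the whole batch at once, so repeated field names are encoded only once.
batch = b"\n".join(messages)
batch_size = len(zlib.compress(batch))

print(f"raw: {raw_size} B, per-message: {individual_size} B, batched: {batch_size} B")
```

The batched figure should come out far smaller than the per-message figure, because the compressor can exploit redundancy across messages as well as within them.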

When send() is called, messages are buffered locally; the client decides the optimal moment to flush the buffer, forming batches that are transmitted together.
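The buffering behavior described above can be sketched with a toy accumulator. This is a simplified illustration, not the Kafka client's code: the class name `ToyBatcher`, its parameters, and the threshold values are all hypothetical. The two thresholds loosely correspond to the producer settings `batch.size` and `linger.ms`. Unlike a real client, which flushes from a background thread, this sketch only checks its thresholds when `send()` is called.

```python
import time
from typing import Callable, List, Optional


class ToyBatcher:
    """Buffers messages and flushes when the batch is full or has waited long enough."""

    def __init__(self, send_batch: Callable[[List[bytes]], None],
                 max_batch_bytes: int = 16_384, linger_seconds: float = 0.005,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._send_batch = send_batch          # transport callback (hypothetical)
        self._max_batch_bytes = max_batch_bytes
        self._linger_seconds = linger_seconds
        self._clock = clock
        self._buffer: List[bytes] = []
        self._buffered_bytes = 0
        self._first_append: Optional[float] = None

    def send(self, message: bytes) -> None:
        """Buffer one message; flush if the size or time threshold is reached."""
        if not self._buffer:
            self._first_append = self._clock()
        self._buffer.append(message)
        self._buffered_bytes += len(message)
        full = self._buffered_bytes >= self._max_batch_bytes
        waited = self._clock() - self._first_append >= self._linger_seconds
        if full or waited:
            self.flush()

    def flush(self) -> None:
        """Hand the buffered messages to the transport as one request."""
        if self._buffer:
            self._send_batch(self._buffer)
            self._buffer = []
            self._buffered_bytes = 0
            self._first_append = None


requests: List[List[bytes]] = []
batcher = ToyBatcher(requests.append, max_batch_bytes=1000, linger_seconds=60.0)
for _ in range(250):
    batcher.send(b"x" * 10)   # 250 messages of 10 bytes each
batcher.flush()               # send whatever is left over
print(len(requests), [len(r) for r in requests])
```

With these illustrative settings, 250 calls to `send()` turn into three requests instead of 250, which is the saving the broker sees.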

2. Broker (Server) Optimizations

The broker’s high performance stems from three key techniques:

PageCache acceleration: Writes first go to the OS PageCache, then are flushed to disk in large batches, reducing disk I/O. Reads also come from PageCache, giving a high hit rate for freshly written data.

File layout and sequential writes: Kafka stores data per topic and partition, and each partition has its own directory. Within a partition, log files are appended sequentially, and because partitions are independent, several files can be written in parallel to exploit disk throughput. Compared with RocketMQ’s single commit‑log approach, Kafka’s layout scales better, but it can suffer when the number of topics or partitions grows excessively, because many concurrent sequential writes start to look like random I/O to the disk.

Zero‑copy sendfile: The broker transfers data from PageCache to the network socket inside the kernel, bypassing the copy into user space and back. DMA moves the bytes, so the CPU does not have to copy them itself, which speeds up consumption.
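The per-partition layout and append-only writes described above can be sketched as follows. The directory naming (`<topic>-<partition>`) and the segment file name (a 20-digit base offset plus `.log`) follow Kafka's on-disk convention; everything else is a simplification. The helper names, the `orders` topic, and the length-prefixed record format are hypothetical and do not match Kafka's real record batch format.

```python
import os
import tempfile


def partition_dir(log_root: str, topic: str, partition: int) -> str:
    """Return the directory for one partition, e.g. <log_root>/orders-0."""
    return os.path.join(log_root, f"{topic}-{partition}")


def append_record(log_root: str, topic: str, partition: int, record: bytes) -> int:
    """Append one length-prefixed record to the partition's segment file.

    Returns the byte position at which the record starts.
    """
    directory = partition_dir(log_root, topic, partition)
    os.makedirs(directory, exist_ok=True)
    segment = os.path.join(directory, f"{0:020d}.log")  # single segment, base offset 0
    with open(segment, "ab") as f:   # append mode: writes always land at the end
        position = f.tell()
        f.write(len(record).to_bytes(4, "big") + record)
        # No fsync here: the bytes stay in the OS page cache until the kernel flushes them.
    return position


with tempfile.TemporaryDirectory() as root:
    p0 = append_record(root, "orders", 0, b"first")
    p1 = append_record(root, "orders", 0, b"second")
    append_record(root, "orders", 1, b"other partition")
    dirs = sorted(os.listdir(root))
    print(dirs, p0, p1)
```

Each partition gets its own directory and its own append point, so writes to different partitions never contend for a position within the same file.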

3. Consumer Optimizations

Consumers pull messages in batches from the partition leader. To increase throughput, multiple consumers can work in parallel within a consumer group (identified by group.id). For a topic with three partitions, the article illustrates several scenarios:

One consumer (group.id = 1) reads all three partitions.

Two consumers (group.id = 2) split the partitions 2‑1.

Three consumers (group.id = 3) each handle one partition.

Four consumers (group.id = 4) where the fourth remains idle because partitions are fewer than consumers.
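The four scenarios above can be reproduced with a simplified range-style assignment. This is a minimal sketch of the idea, not Kafka's actual assignor code; the function name `assign_partitions` and the consumer names are hypothetical.

```python
from typing import Dict, List


def assign_partitions(partitions: List[int], consumers: List[str]) -> Dict[str, List[int]]:
    """Split partitions into contiguous ranges, one range per consumer."""
    ordered_partitions = sorted(partitions)
    ordered_consumers = sorted(consumers)
    base, extra = divmod(len(ordered_partitions), len(ordered_consumers))
    assignment: Dict[str, List[int]] = {}
    start = 0
    for index, consumer in enumerate(ordered_consumers):
        count = base + (1 if index < extra else 0)  # the first `extra` consumers get one more
        assignment[consumer] = ordered_partitions[start:start + count]
        start += count
    return assignment


partitions = [0, 1, 2]
for group_size in range(1, 5):
    members = [f"consumer-{n}" for n in range(1, group_size + 1)]
    print(group_size, assign_partitions(partitions, members))
```

With four members, the last consumer receives an empty list: a partition is read by at most one consumer in a group, so adding consumers beyond the partition count adds no parallelism.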

On the consumer fetch path, zero‑copy lets the broker move data from PageCache to the socket buffer without an extra copy into user memory, which further reduces latency.
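The zero-copy transfer can be tried out with Python's standard library. Kafka itself is written in Java and relies on `FileChannel.transferTo`; the sketch below uses `socket.sendfile()` as an analogous call, with a local socket pair standing in for the broker-to-consumer connection. The function name `serve_file_zero_copy` and the payload are hypothetical.

```python
import os
import socket
import tempfile


def serve_file_zero_copy(conn: socket.socket, path: str) -> int:
    """Send a file over a connected socket; returns the number of bytes sent.

    socket.sendfile() uses os.sendfile() where the platform supports it,
    so the bytes move inside the kernel instead of through a Python buffer.
    Elsewhere it falls back to ordinary send() calls.
    """
    with open(path, "rb") as f:
        return conn.sendfile(f)


# Write a small stand-in "log segment" to disk.
payload = b"log-segment-bytes " * 100
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(payload)
    segment_path = tmp.name

broker_side, consumer_side = socket.socketpair()
sent = serve_file_zero_copy(broker_side, segment_path)
broker_side.close()  # signals end-of-stream to the reader

received = b""
while chunk := consumer_side.recv(65536):
    received += chunk
consumer_side.close()
os.remove(segment_path)
print(sent, received == payload)
```

The application code never reads the file contents into its own buffer; it only passes the file object and the socket to the kernel-backed call.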
