
Why Can Kafka Process 20 Million Messages per Second? Inside Its High‑Performance Architecture

This article explains how Kafka achieves extremely high throughput (reportedly up to 20 million messages and 600 MB per second per node) by optimizing the producer, broker, and consumer through batch sending, a custom protocol, page-cache usage, zero-copy transfers, and efficient compression algorithms.

Java High-Performance Architecture

Nov 5, 2022


According to one report, Kafka on a well-configured machine can handle nearly 20 million messages per second, a throughput of about 600 MB/s per node. Why is Kafka so fast, and how does it achieve this performance?

The analysis focuses on three perspectives: the producer side, the broker (server) side, and the consumer side.

Producer side

Broker side

Consumer side

(1) Producer

The producer workflow is: select a target topic; choose a partition (by the hash of the message key if one is set, otherwise round-robin by default); locate that partition's leader replica; open a socket connection to the broker hosting it; and send a request, encoded in Kafka's custom binary protocol, that carries a batch of messages.
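The partition-selection step can be sketched as follows. This is a minimal illustration, not Kafka's actual partitioner: `make_partitioner` and `choose_partition` are hypothetical helpers, and `zlib.crc32` is only a stand-in for the murmur2 hash that the Java client applies to the serialized key.

```python
import itertools
import zlib

def make_partitioner(num_partitions):
    """Return a function mapping an optional key to a partition index."""
    counter = itertools.count()  # drives round-robin for keyless messages

    def choose_partition(key=None):
        if key is None:
            # No key: rotate through the partitions to spread load evenly.
            return next(counter) % num_partitions
        # Keyed: hash the key bytes so equal keys always map to the same partition.
        # crc32 is a stand-in here; it is NOT the hash Kafka uses.
        return zlib.crc32(key) % num_partitions

    return choose_partition

choose = make_partitioner(3)
keyless = [choose() for _ in range(4)]           # cycles 0, 1, 2, 0
keyed = [choose(b"order-42") for _ in range(3)]  # same partition every time
```

Because equal keys always map to the same partition, messages that share a key keep their relative order within that partition.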

Key optimizations:

Batch sending: Messages are cached and sent in batches, reducing the number of broker requests and increasing overall throughput.

Custom protocol & compression: Serialization and compression shrink the payload, saving network bandwidth.
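The batching idea can be sketched with a small size-triggered buffer. `BatchAccumulator` is a hypothetical class written for illustration. The real producer keeps a separate batch per partition and also flushes on a timer, controlled by the `batch.size` and `linger.ms` settings; this sketch models only the size trigger.

```python
class BatchAccumulator:
    """Buffers messages and hands them to `send` once `batch_size` bytes are queued."""

    def __init__(self, send, batch_size):
        self.send = send              # callable that receives a list of messages
        self.batch_size = batch_size  # flush threshold in bytes
        self.buffer = []
        self.buffered_bytes = 0

    def append(self, message: bytes):
        self.buffer.append(message)
        self.buffered_bytes += len(message)
        if self.buffered_bytes >= self.batch_size:
            self.flush()

    def flush(self):
        if self.buffer:
            self.send(self.buffer)    # one request carries the whole batch
            self.buffer = []
            self.buffered_bytes = 0

requests = []                         # each entry represents one broker request
acc = BatchAccumulator(requests.append, batch_size=100)
for i in range(25):
    acc.append(b"0123456789")         # 25 messages of 10 bytes each
acc.flush()                           # send whatever is left over
```

Here 25 messages travel in 3 requests instead of 25, which is the effect batch sending aims for.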

By throughput, the compression algorithms rank LZ4 > Snappy > zstd > GZIP; by compression ratio (best to worst), they rank zstd > LZ4 > GZIP > Snappy. Exact results depend on the data and codec settings, so treat these orderings as a rough guide.
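Whichever codec is chosen, compression tends to pay off most when it is applied to a whole batch rather than to single messages, because similar messages share repeated content. The sketch below illustrates this with GZIP only, since it is in Python's standard library (LZ4, Snappy, and zstd would need third-party packages). The sample messages are invented for the example.

```python
import gzip
import json

# Hypothetical sample data: 1,000 small, similar JSON messages.
messages = [
    json.dumps({"user_id": i, "event": "page_view", "page": "/home"}).encode()
    for i in range(1000)
]

raw_size = sum(len(m) for m in messages)

# Compressing each message separately: little repetition to exploit,
# plus gzip header and trailer overhead on every message.
individual_size = sum(len(gzip.compress(m)) for m in messages)

# Compressing the whole batch at once: repeated field names compress well.
batch_size = len(gzip.compress(b"\n".join(messages)))

print(raw_size, individual_size, batch_size)
```

The printed sizes vary with the data, but the batch-compressed size should come out well below the other two.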

(2) Broker

The broker's high performance stems from three main techniques:

PageCache usage for read/write buffering.

Kafka's file layout and sequential disk writes per topic-partition.

Zero-copy sendfile to accelerate data transfer.

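With PageCache, writes land in the operating system's in-memory page cache first, and the OS flushes them to disk in batches, reducing disk I/O overhead. Reads are also served from the cache when possible. Because eviction is roughly LRU, recently written pages are likely to still be cached, so consumers that keep up with producers rarely need to touch the disk.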
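The difference between "written" and "flushed to disk" can be seen with ordinary file I/O. This is a generic operating-system illustration rather than Kafka code, and the file name is hypothetical. Kafka's default settings generally leave the flush to the OS instead of calling fsync after every write.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "segment.log")

with open(path, "ab") as log:
    log.write(b"message-1\n")
    log.flush()              # push Python's user-space buffer to the OS (page cache)

    # The data is already visible to readers, served from the page cache,
    # even though the OS may not have written it to the physical disk yet.
    with open(path, "rb") as reader:
        visible = reader.read()

    os.fsync(log.fileno())   # explicitly force the cached pages to disk
```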

The file layout organizes data by topic and partition, with each partition stored in its own directory. Kafka writes sequentially (append-only) within each partition, so different partitions can be written in parallel. This can give better disk I/O performance than designs such as RocketMQ's, in which all topics on a broker share a single commit log.
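A simplified sketch of this layout follows. The directory name `<topic>-<partition>` and the zero-padded segment file name mirror Kafka's on-disk naming, but `PartitionLog`, the `orders` topic, and the 4-byte length-prefixed record format are assumptions made for illustration; Kafka's real record format is more elaborate.

```python
import os
import struct
import tempfile

class PartitionLog:
    """Append-only log for one topic-partition, stored in its own directory."""

    def __init__(self, base_dir, topic, partition):
        self.dir = os.path.join(base_dir, f"{topic}-{partition}")
        os.makedirs(self.dir, exist_ok=True)
        # Segment files are named after their first offset, zero-padded.
        self.path = os.path.join(self.dir, f"{0:020d}.log")
        self.next_offset = 0

    def append(self, payload: bytes) -> int:
        # Always write at the end of the file: a purely sequential write.
        with open(self.path, "ab") as f:
            f.write(struct.pack(">I", len(payload)) + payload)
        offset = self.next_offset
        self.next_offset += 1
        return offset

base = tempfile.mkdtemp()
logs = {p: PartitionLog(base, "orders", p) for p in range(3)}
offsets = [logs[i % 3].append(f"msg-{i}".encode()) for i in range(6)]
directories = sorted(os.listdir(base))
```

Each partition numbers its own messages independently, which is why offsets are only meaningful within a single partition.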

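When using Kafka, be mindful of the number of topics and partitions. With many partitions, the broker appends to many files at once, and the disk access pattern drifts from sequential toward random, which hurts I/O performance.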

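Zero-copy sendfile moves data from PageCache to the socket inside the kernel, so it never passes through a user-space buffer. This eliminates the extra copies into and out of the application. The transfer is often carried out by DMA, so the CPU does little of the copying work.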
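The same system call is reachable from Python, which makes for a compact sketch. Kafka itself is written in Java and goes through `FileChannel.transferTo`; the code below is only an analogy, with a local socket pair standing in for the broker-to-consumer connection and a temporary file standing in for a log segment.

```python
import socket
import tempfile

payload = b"x" * 4096

with tempfile.TemporaryFile() as segment:
    segment.write(payload)
    segment.flush()
    segment.seek(0)

    # A connected socket pair stands in for the broker-to-consumer connection.
    broker_side, consumer_side = socket.socketpair()
    with broker_side, consumer_side:
        # socket.sendfile() uses os.sendfile() where available, so the bytes go
        # from the page cache to the socket without entering a user-space buffer.
        # On platforms without os.sendfile() it falls back to ordinary send().
        sent = broker_side.sendfile(segment)

        received = b""
        while len(received) < sent:
            received += consumer_side.recv(65536)
```

Note that the sending side never reads the file contents into a Python `bytes` object; it only passes file and socket handles to the kernel.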

(3) Consumer

Consumers pull messages in batches from the partition leader. Kafka supports consumer groups, identified by group.id: consumers in the same group share the load, with each partition assigned to at most one consumer in the group at a time.

The following examples, for a topic with three partitions, illustrate different group configurations:

One consumer (group.id=1) processes all partitions.

Two consumers (group.id=2) split the partitions (2 + 1).

Three consumers (group.id=3) each handle one partition.

Four consumers (group.id=4): one consumer stays idle because there are more consumers than partitions.
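The four cases can be reproduced with a small assignment function. This is a sketch loosely modeled on the idea behind range-style assignment, assuming a topic with three partitions; `assign_partitions` and the consumer names are hypothetical, and Kafka ships several assignment strategies that differ in detail.

```python
def assign_partitions(partitions, consumers):
    """Range-style assignment: spread partitions as evenly as possible."""
    consumers = sorted(consumers)
    base, extra = divmod(len(partitions), len(consumers))
    assignment, start = {}, 0
    for i, consumer in enumerate(consumers):
        # The first `extra` consumers take one additional partition.
        count = base + (1 if i < extra else 0)
        assignment[consumer] = partitions[start:start + count]
        start += count
    return assignment

partitions = [0, 1, 2]  # assumed: a topic with three partitions
one = assign_partitions(partitions, ["c1"])
two = assign_partitions(partitions, ["c1", "c2"])
three = assign_partitions(partitions, ["c1", "c2", "c3"])
four = assign_partitions(partitions, ["c1", "c2", "c3", "c4"])
```

In the last case `c4` receives an empty list, so adding consumers beyond the partition count does not increase parallelism.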
