How Kafka Hits 20M msgs/sec: Inside Producer, Broker & Consumer Optimizations

This article dissects why a well‑tuned Kafka cluster can process up to 20 million messages per second, examining producer batching and custom protocols, broker page‑cache, file layout and zero‑copy techniques, as well as consumer group strategies that together unlock its high throughput.

IT Architects Alliance

Sep 2, 2022

Kafka is often claimed to reach a single‑node throughput of around 20 million messages per second (≈600 MB/s), a figure that implies small messages of roughly 30 bytes each. This article explains the technical reasons behind such performance from three perspectives: the producer side, the broker side, and the consumer side.

Producer Optimizations

The producer workflow consists of selecting a topic, choosing a partition (round‑robin by default, or key‑based hashing when the message has a key), locating the broker that hosts the partition's leader replica, establishing a socket connection to that broker, and sending a request in Kafka's custom protocol, which can carry a batch of messages.
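The partition-selection step can be sketched as follows. This is a minimal illustration, not the client's actual code: the class name `SimplePartitioner` is hypothetical, and `zlib.crc32` stands in for the murmur2 hash that Kafka's Java client applies to keys.

```python
import itertools
import zlib
from typing import Optional


class SimplePartitioner:
    """Illustrative partitioner: key hashing when a key exists, round-robin otherwise."""

    def __init__(self, num_partitions: int) -> None:
        self.num_partitions = num_partitions
        self._counter = itertools.count()

    def partition(self, key: Optional[bytes]) -> int:
        if key is None:
            # No key: rotate through the partitions to spread the load.
            return next(self._counter) % self.num_partitions
        # Keyed: the same key always maps to the same partition.
        # crc32 is a stand-in here; it is not the hash Kafka uses.
        return zlib.crc32(key) % self.num_partitions
```

Because a keyed message always lands on the same partition, per-key ordering is preserved, while keyless messages are spread evenly.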

Two key techniques boost throughput:

Batch sending: Calls to send() buffer messages locally and transmit them in bulk, reducing the number of request‑handling cycles on the broker.

Custom protocol format: Efficient serialization and compression shrink the payload, saving network bandwidth.
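A minimal sketch of the batching idea, under simplifying assumptions: the `BatchingSender` class and its `send_request` callback are hypothetical, and only a size threshold is modeled. The real producer controls batching through settings such as `batch.size` and `linger.ms` and sends from a background thread.

```python
from typing import Callable, List


class BatchingSender:
    """Buffers messages and hands them to send_request in bulk."""

    def __init__(self, send_request: Callable[[List[bytes]], None],
                 batch_size_bytes: int = 16384) -> None:
        self._send_request = send_request
        self._batch_size_bytes = batch_size_bytes
        self._buffer: List[bytes] = []
        self._buffered_bytes = 0

    def send(self, message: bytes) -> None:
        # Buffer locally; only issue a request once enough bytes accumulate.
        self._buffer.append(message)
        self._buffered_bytes += len(message)
        if self._buffered_bytes >= self._batch_size_bytes:
            self.flush()

    def flush(self) -> None:
        # Send whatever is buffered as a single request.
        if self._buffer:
            self._send_request(self._buffer)
            self._buffer = []
            self._buffered_bytes = 0
```

With a 100-byte threshold and 10-byte messages, 25 calls to `send()` followed by one `flush()` produce 3 requests instead of 25.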

Compression algorithm comparison, as ranked by the source: by throughput, LZ4 > Snappy > zstd > GZIP; by compression ratio, zstd > LZ4 > GZIP > Snappy. Actual results depend on the data and the codec settings.
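Batching and compression reinforce each other, since compressing many similar messages together exposes redundancy that per-message compression cannot see. The sketch below illustrates this with gzip, the only one of the four codecs in Python's standard library; the function name `compressed_sizes` and the sample payload are assumptions made for the example.

```python
import gzip
import json
from typing import List, Tuple


def compressed_sizes(messages: List[bytes]) -> Tuple[int, int]:
    """Return (total bytes when compressing each message alone,
    bytes when compressing the whole batch at once)."""
    per_message = sum(len(gzip.compress(m)) for m in messages)
    batched = len(gzip.compress(b"".join(messages)))
    return per_message, batched


# Hypothetical sample data: 100 small, similarly shaped JSON events.
sample = [
    json.dumps({"user_id": i, "event": "page_view", "page": "/home"}).encode()
    for i in range(100)
]
per_message_total, batched_total = compressed_sizes(sample)
```

On data like this, `batched_total` comes out well below `per_message_total`, because repeated field names are encoded once per batch rather than once per message.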

Broker (Server) Optimizations

The broker’s high performance stems from three mechanisms:

PageCache usage: writes are first cached in memory, then flushed to disk in large batches, reducing disk I/O. Reads also come from the cache, and recent writes enjoy high cache‑hit rates due to LRU eviction.

Kafka’s file layout: data is organized as topic + partition, each partition having its own directory. This enables parallel, sequential writes at the partition level, fully exploiting disk I/O.
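The layout can be sketched as an append-only file per partition. This is a simplified model, with hypothetical helper names and a bare length-prefixed record format. Real segments store record batches alongside index files, but the directory naming (`<topic>-<partition>`) and the 20-digit zero-padded segment file name follow Kafka's convention.

```python
import os
import struct
import tempfile


def partition_dir(log_root: str, topic: str, partition: int) -> str:
    """Each partition gets its own directory, named <topic>-<partition>."""
    return os.path.join(log_root, f"{topic}-{partition}")


def append_record(log_root: str, topic: str, partition: int, payload: bytes) -> int:
    """Append one length-prefixed record; return the byte position where it starts."""
    directory = partition_dir(log_root, topic, partition)
    os.makedirs(directory, exist_ok=True)
    # Only the first segment (base offset 0) is modeled in this sketch.
    segment = os.path.join(directory, f"{0:020d}.log")
    with open(segment, "ab") as log:
        position = log.seek(0, os.SEEK_END)
        # Writes only ever go to the end of the file: sequential I/O.
        log.write(struct.pack(">I", len(payload)) + payload)
    return position
```

Because each partition appends to its own file, writes to different partitions are independent of one another and can proceed in parallel.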

Zero‑copy with sendfile: Data is transferred directly from PageCache to the socket buffer inside the kernel, without being copied into user space and back. DMA handles the transfers to and from the devices, so the CPU spends far less time copying data, which further speeds up the pipeline.
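The same system call is reachable from Python, which makes for a compact illustration. This is a sketch, not Kafka's code (the Java broker uses `FileChannel.transferTo`). The function names, the sample payload, and the local socket pair standing in for a broker-to-consumer connection are all assumptions.

```python
import os
import socket
import tempfile


def send_segment(path: str, conn: socket.socket) -> int:
    """Send a whole file over a connected socket; return the number of bytes sent."""
    with open(path, "rb") as segment:
        # socket.sendfile() uses os.sendfile() where available and
        # falls back to plain send() calls on other platforms.
        return conn.sendfile(segment)


def demo() -> bytes:
    """Write a small file, send it through a local socket pair, return what arrived."""
    payload = b"kafka-record-" * 100
    with tempfile.NamedTemporaryFile(delete=False) as f:
        f.write(payload)
        path = f.name
    broker_side, consumer_side = socket.socketpair()
    try:
        sent = send_segment(path, broker_side)
        broker_side.close()
        received = b""
        while len(received) < sent:
            chunk = consumer_side.recv(65536)
            if not chunk:
                break
            received += chunk
        return received
    finally:
        consumer_side.close()
        broker_side.close()
        os.remove(path)
```

Note that `send_segment` never reads the file contents into a Python buffer; on platforms with `os.sendfile`, the bytes move from the file to the socket inside the kernel.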

Consumer Optimizations

Consumers pull messages in batches from each partition's leader. To increase consumption speed, multiple consumers can work in parallel within a consumer group identified by group.id. Consider four scenarios with two brokers, one topic (3 partitions), and a different number of consumers in each group:

group.id=1: one consumer handles all three partitions.

group.id=2: two consumers split the partitions (2 + 1).

group.id=3: three consumers each own one partition.

group.id=4: four consumers, of which one remains idle because consumers outnumber partitions.
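The four scenarios follow from a simple rule: each partition is owned by exactly one consumer in the group. The sketch below uses a contiguous-range split, similar in spirit to a range-style assignment. The function name `assign_partitions` and the consumer names are hypothetical, and Kafka's real assignors handle multiple topics and rebalancing.

```python
from typing import Dict, List


def assign_partitions(partitions: List[int], consumers: List[str]) -> Dict[str, List[int]]:
    """Split partitions into contiguous ranges; earlier consumers get the extras."""
    assignment: Dict[str, List[int]] = {c: [] for c in consumers}
    per_consumer, extra = divmod(len(partitions), len(consumers))
    start = 0
    for i, consumer in enumerate(sorted(consumers)):
        # The first `extra` consumers each take one additional partition.
        count = per_consumer + (1 if i < extra else 0)
        assignment[consumer] = partitions[start:start + count]
        start += count
    return assignment
```

With partitions `[0, 1, 2]` and four consumers, the last consumer receives an empty list, which is why adding consumers beyond the partition count does not increase throughput.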

These strategies, combined with the producer’s batching, broker’s page‑cache and zero‑copy, and efficient consumer group coordination, explain how Kafka can sustain extremely high message‑per‑second rates.
