Boost Kafka to Over 1 Million Messages per Second: Metrics and Tuning Tips
This article explains what high concurrency means for Kafka, outlines the key performance metrics (QPS, TPS, throughput and latency), and provides concrete configuration and architectural techniques, including broker optimization, horizontal scaling, network batching and zero-copy, for reaching write rates above one million records per second.
What Is High Concurrency in Distributed Systems?
High concurrency refers to a system's ability to handle a large number of requests at the same time. For Kafka, it is measured by the aggregate number of messages the cluster processes per second, not by how fast a single thread runs.
Key Performance Indicators
QPS (Queries Per Second): Number of requests per second.
TPS (Transactions Per Second): Number of committed transactions per second.
Throughput: Amount of data processed per second (e.g., MB/s, GB/s).
Latency: Time from sending a request to receiving its response.
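In practice these indicators come from per-message timestamps. The following minimal Python sketch is not a Kafka API. It assumes a hypothetical Sample record (send time, acknowledgment time, payload size) collected by your own producer instrumentation. From those samples it reports the message rate (QPS, as messages per second), throughput in MB/s, and p50/p99 latency using nearest-rank percentiles.

```python
import math
from dataclasses import dataclass


@dataclass
class Sample:
    """Hypothetical record of one produced message."""
    sent_s: float     # send timestamp, seconds
    acked_s: float    # acknowledgment timestamp, seconds
    size_bytes: int   # payload size, bytes


def summarize(samples: list[Sample]) -> dict[str, float]:
    """Compute message rate, byte throughput and latency percentiles over one window."""
    if not samples:
        raise ValueError("need at least one sample")
    # The window runs from the first send to the last acknowledgment.
    start = min(s.sent_s for s in samples)
    end = max(s.acked_s for s in samples)
    window = end - start
    if window <= 0:
        raise ValueError("window must be positive")
    total_bytes = sum(s.size_bytes for s in samples)
    latencies_ms = sorted((s.acked_s - s.sent_s) * 1000 for s in samples)

    def percentile(p: float) -> float:
        # Nearest-rank: smallest latency with at least p% of samples at or below it.
        rank = max(1, math.ceil(p / 100 * len(latencies_ms)))
        return latencies_ms[rank - 1]

    return {
        "messages_per_s": len(samples) / window,
        "mb_per_s": total_bytes / window / 1_000_000,
        "p50_ms": percentile(50),
        "p99_ms": percentile(99),
    }
```

For example, 100 messages of 1,000 bytes each, acknowledged 10 ms after sending and spread over one second, give 100 messages/s and 0.1 MB/s. When tuning for high concurrency, tail latency (p99) is usually more telling than the average, because batching settings trade a little latency for throughput.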
Typical Throughput Benchmarks
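• Single broker (1 broker): 50,000 to 100,000 messages/s; sustained writes above 100,000 messages/s count as high for one broker.
• Small cluster (3 brokers): 300,000 to 500,000 messages/s.
• Medium cluster (6 to 10 brokers): 1,000,000 to 3,000,000 messages/s.
• Large cluster (20+ brokers): 5,000,000 to 10,000,000 messages/s, typical of "high-concurrency Kafka" scenarios.
These are rough ranges. Actual rates depend heavily on message size, hardware, replication factor and producer settings such as acks.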
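Converting messages per second into bytes shows why a single broker rarely reaches these rates. Below is a back-of-the-envelope Python helper. It assumes a fixed average uncompressed message size, uses decimal units, and ignores protocol overhead and consumer traffic. The function and its parameters are hypothetical.

```python
def required_bandwidth(messages_per_s: float, avg_message_bytes: int,
                       replication_factor: int = 1) -> dict[str, float]:
    """Rough bandwidth needed for a target write rate (decimal MB and Gbit)."""
    # Bytes the producers send to partition leaders each second.
    produce_bytes = messages_per_s * avg_message_bytes
    return {
        # Producer -> leader traffic.
        "produce_mb_s": produce_bytes / 1e6,
        "produce_gbit_s": produce_bytes * 8 / 1e9,
        # Each follower replica fetches its own copy from the leader.
        "replication_mb_s": produce_bytes * (replication_factor - 1) / 1e6,
        # Every replica (leader plus followers) appends the data to its own disk.
        "cluster_disk_write_mb_s": produce_bytes * replication_factor / 1e6,
    }
```

Consider 1,000,000 messages/s of 1,000 bytes with replication factor 3. Producers send about 1,000 MB/s (8 Gbit/s) and followers fetch another 2,000 MB/s. Across the cluster, about 3,000 MB/s is written to disk, which is why such rates call for several brokers.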
Core Techniques for Achieving High Concurrency
1. Broker-Layer Optimization
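Increase the number of partitions. Each partition is an independent append-only log, so more partitions let more writes proceed in parallel across brokers and disks.
Rely on sequential, append-only disk writes and the OS page cache, plus zero-copy transfers when serving reads, so throughput approaches the disks' sequential bandwidth.
Batch writes with the producer settings batch.size and linger.ms, so that many records travel in a single request.
Raise broker I/O parallelism with num.network.threads (threads that receive requests and send responses) and num.io.threads (threads that process requests, including disk I/O).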
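A small simulation shows the effect of batch.size and linger.ms. This simplified Python model is not the Kafka producer's actual accumulator. It covers a single partition and closes a batch when the next record would push it past batch_size bytes, or once linger_ms has elapsed since the batch opened. It ignores buffer.memory, compression and in-flight request limits. The function name and the (arrival_ms, record_bytes) input format are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Batch:
    opened_ms: float                                   # arrival time of the first record
    records: list[int] = field(default_factory=list)   # record sizes in bytes

    @property
    def size(self) -> int:
        return sum(self.records)


def simulate_batching(arrivals: list[tuple[float, int]], batch_size: int,
                      linger_ms: float) -> list[Batch]:
    """Group time-sorted (arrival_ms, record_bytes) pairs for ONE partition into batches.

    Only batch membership is modeled, not the exact moment each batch is sent.
    """
    sent: list[Batch] = []
    current: Optional[Batch] = None
    for t, size in arrivals:
        if current is not None and (
            t - current.opened_ms >= linger_ms           # linger time used up
            or current.size + size > batch_size          # next record would not fit
        ):
            sent.append(current)
            current = None
        if current is None:
            current = Batch(opened_ms=t)
        current.records.append(size)
    if current is not None:
        sent.append(current)  # flush the final partial batch
    return sent
```

With one 100-byte record per millisecond, linger_ms=0 produces one batch (one request) per record in this model. With linger_ms=10, records are grouped ten per batch: ten times fewer requests, at the cost of up to 10 ms of added latency.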
2. Horizontal Scaling of Brokers
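Adding broker nodes spreads partition leaders, and therefore write load, across more machines, reducing pressure on any single one. New brokers only take on load once partitions are created on them or reassigned to them.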
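Adding brokers helps only if partition leaders, and thus writes, actually land on them. The Python sketch below traces a record key to a partition, and then to the broker that leads that partition. It rests on two simplifying assumptions, both labeled in the code. First, CRC32 stands in for the murmur2 hash that Kafka's default partitioner applies to keyed records. Second, leaders are placed round-robin, whereas real assignment also accounts for replicas and racks. All function names are hypothetical.

```python
import zlib
from collections import Counter


def partition_for(key: bytes, num_partitions: int) -> int:
    """Map a key to a partition. CRC32 is a stand-in; Kafka's default uses murmur2."""
    return zlib.crc32(key) % num_partitions


def leaders_round_robin(num_partitions: int, num_brokers: int) -> dict[int, int]:
    """Simplified leader placement: partition p is led by broker p % num_brokers."""
    return {p: p % num_brokers for p in range(num_partitions)}


def writes_per_broker(keys: list[bytes], num_partitions: int,
                      num_brokers: int) -> Counter:
    """Count how many writes each broker receives as partition leader."""
    leaders = leaders_round_robin(num_partitions, num_brokers)
    return Counter(leaders[partition_for(k, num_partitions)] for k in keys)


keys = [f"user-{i}".encode() for i in range(60_000)]
for brokers in (3, 6, 24):
    load = writes_per_broker(keys, num_partitions=12, num_brokers=brokers)
    print(brokers, "brokers ->", len(load), "busy, max", max(load.values()), "writes")
```

With 12 partitions, going from 3 to 6 brokers roughly halves the writes each broker leads. With 24 brokers, however, only 12 would receive any writes, so the partition count should grow along with the cluster.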
3. Network and Batch Transmission
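Configure asynchronous, batched sending on the producer:
acks=1
linger.ms=10
batch.size=32768
compression.type=lz4
With acks=1 the producer waits only for the partition leader's acknowledgment. This is faster, but records can be lost if the leader fails before followers copy them. linger.ms=10 lets the producer wait up to 10 ms to fill a batch, batch.size=32768 caps each per-partition batch at 32 KiB, and lz4 compresses whole batches to cut network and disk bytes.
4. Zero-Copy Support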
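Kafka serves fetch requests with zero-copy transfers. On Linux, Java's FileChannel.transferTo uses the sendfile system call, so log data moves from the page cache to the network socket without being copied into the broker's user-space memory. This is a key factor in Kafka's high throughput. The benefit is lost when TLS is enabled, because the broker must then encrypt the data in user space.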
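Python exposes the same system call, which makes the idea easy to try. This sketch only illustrates the mechanism; Kafka itself does this in Java. It uses the standard library's socket.sendfile(), which relies on os.sendfile() where the platform supports it and otherwise falls back to ordinary send(). The helper names are hypothetical.

```python
import os
import socket
import tempfile
import threading


def send_file_zero_copy(path: str, sock: socket.socket) -> int:
    """Send a whole file over a stream socket; returns the number of bytes sent.

    socket.sendfile() uses os.sendfile() when available, so the kernel moves
    the bytes from the page cache to the socket without a user-space copy.
    """
    with open(path, "rb") as f:
        return sock.sendfile(f)


def demo(payload: bytes) -> bytes:
    """Write payload to a temp file, send it through a socket pair, return what arrived."""
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(payload)
        path = tmp.name
    received = bytearray()
    a, b = socket.socketpair()
    try:
        def reader() -> None:
            # Drain the receiving end concurrently so the sender never stalls.
            while chunk := b.recv(65536):
                received.extend(chunk)

        t = threading.Thread(target=reader, daemon=True)
        t.start()
        send_file_zero_copy(path, a)
        a.shutdown(socket.SHUT_WR)  # signal EOF to the reader
        t.join()
    finally:
        a.close()
        b.close()
        os.remove(path)
    return bytes(received)
```

A round trip such as demo(os.urandom(200_000)) returns exactly the bytes written to the file. When os.sendfile is used, the payload never passes through a Python buffer on the sending side.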
Example Configuration Snippet
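log.dirs=/data1/kafka,/data2/kafka,/data3/kafka
Listing several directories, ideally each on a separate physical disk, spreads partitions, and therefore writes, across disks. Each partition lives entirely in one directory, so the gain depends on having at least as many partitions as directories.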
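A few lines are enough to model how partitions end up on those directories. This simplified Python model assumes each new partition goes to the directory that currently holds the fewest partitions. It ignores how much data each partition stores, which in practice can still leave disks unevenly loaded. The function and partition names are hypothetical.

```python
def assign_log_dirs(partitions: list[str], log_dirs: list[str]) -> dict[str, list[str]]:
    """Place each new partition in the directory holding the fewest partitions.

    Simplified model: partition count is the only criterion, and ties go to
    the directory listed first (min() returns the first minimal element).
    """
    placement: dict[str, list[str]] = {d: [] for d in log_dirs}
    for p in partitions:
        target = min(log_dirs, key=lambda d: len(placement[d]))
        placement[target].append(p)
    return placement


dirs = ["/data1/kafka", "/data2/kafka", "/data3/kafka"]
print(assign_log_dirs([f"orders-{i}" for i in range(6)], dirs))
```

In this model, six partitions over the three directories land two per disk, while a single partition would use only the first disk.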
Mike Chen's Internet Architecture
Over ten years of BAT architecture experience, shared generously!
