Kafka Performance Testing and Optimization Report
This report presents a comprehensive performance‑testing plan for a Kafka cluster, detailing objectives, test scope, JVM and broker tuning, producer and consumer parameter experiments, extensive benchmark results, and practical recommendations for achieving high throughput and stability in large‑scale message processing.
The purpose of this performance test is to evaluate the ability of a single‑server Kafka instance in a production environment to handle MQ messages at the hundred‑million level, and to verify whether the current Kafka configuration meets the project’s throughput requirements.
1. Test Scope and Method
We used Kafka’s built‑in performance scripts to generate write and read loads. Different message volumes, batch sizes, compression codecs, acks, partitions and replication factors were varied to observe their impact on throughput, latency and resource consumption.
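As a concrete illustration, the built-in scripts can be invoked roughly as follows (a minimal sketch, not the exact commands used in the test: the topic name, broker address and record counts are placeholders, a running cluster is assumed, and flag names vary slightly between Kafka versions):

```shell
# Write load: 1M records of 1 KiB each, unthrottled, against a placeholder topic
bin/kafka-producer-perf-test.sh \
  --topic perf-test \
  --num-records 1000000 \
  --record-size 1024 \
  --throughput -1 \
  --producer-props bootstrap.servers=localhost:9092 acks=1 compression.type=lz4

# Read load: consume the same records back; the tool reports messages/s and MiB/s
bin/kafka-consumer-perf-test.sh \
  --bootstrap-server localhost:9092 \
  --topic perf-test \
  --messages 1000000
```

Each run prints throughput and latency figures, which were collected while varying one parameter at a time.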
1.1 Test Environment
All tests were executed on a single physical server running Kafka. Disk I/O performance was measured beforehand with the following commands:
1. Test read speed:
hdparm -t --direct /dev/sda3
2. Test write speed:
sync; /usr/bin/time -p bash -c "dd if=/dev/zero of=test.dd bs=1M count=20000"
The results showed read speeds between 163 MiB/s and 206 MiB/s and write speeds around 125 MiB/s.
2. JVM and Kafka Parameters
JVM tuning focused on memory allocation and the G1 garbage collector (recommended over CMS). Example JVM options:
-Xmx6G -Xms6G -XX:MetaspaceSize=96m -server -XX:+UseG1GC -XX:MaxGCPauseMillis=20 -XX:InitiatingHeapOccupancyPercent=35 -XX:G1HeapRegionSize=16m -XX:MinMetaspaceFreeRatio=50 -XX:MaxMetaspaceFreeRatio=80
Key Kafka broker settings examined include num.replica.fetchers, num.io.threads, num.network.threads, log.flush.interval.messages and log.flush.interval.ms.
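For reference, the broker settings under study would appear in server.properties roughly as follows (illustrative starting values only; the values actually observed to perform best are explored in Section 5):

```properties
# server.properties (illustrative starting values, tuned in Section 5)
num.replica.fetchers=32
num.io.threads=96
num.network.threads=32
log.flush.interval.messages=20000
log.flush.interval.ms=10000
```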
3. Producer Tests
Parameters varied: batch.size, acks, message.size, compression.type, partition, replication and overall throughput limits.
3.1 Batch Size
Increasing batch.size from 10 k to 80 k showed that throughput stabilised at ~30 k messages/s once the batch reached 20 k, with a data rate of 19.65 MiB/s.
3.2 Acks
Three ack strategies were compared (0, 1, -1). The fastest was acks=0, followed by leader-only acknowledgement (acks=1); full replication (acks=-1) reduced throughput to roughly 25 % of the no-ack case.
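A comparison like this can be scripted as a sweep over the three ack modes (a sketch under the same assumptions as before: placeholder topic and broker, live cluster required):

```shell
# Compare acks=0 (fire-and-forget), acks=1 (leader only), acks=-1 (all in-sync replicas)
for acks in 0 1 -1; do
  echo "=== acks=${acks} ==="
  bin/kafka-producer-perf-test.sh \
    --topic perf-test --num-records 500000 --record-size 687 --throughput -1 \
    --producer-props bootstrap.servers=localhost:9092 acks=${acks}
done
```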
3.3 Message Size
Tests with 687 B and 454 B messages indicated that larger messages (687 B) yielded higher throughput. A 4 k message size performed best in later experiments.
3.4 Compression Codec
Four codecs were evaluated (none, gzip, snappy, lz4) at various batch sizes and concurrency levels. LZ4 consistently delivered the highest throughput (up to 30 k+ messages/s), while gzip was the slowest due to compression overhead.
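The codec comparison follows the same pattern; a hedged sketch of such a sweep (placeholder topic and broker, batch size fixed at the 20 k value from Section 3.1):

```shell
# Compare the four codecs at a fixed batch size and 4 KiB message size
for codec in none gzip snappy lz4; do
  echo "=== compression.type=${codec} ==="
  bin/kafka-producer-perf-test.sh \
    --topic perf-test --num-records 500000 --record-size 4096 --throughput -1 \
    --producer-props bootstrap.servers=localhost:9092 acks=1 \
        compression.type=${codec} batch.size=20000
done
```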
3.5 Partitions and Replication
Increasing partition count improved throughput up to the point where broker threads became saturated; beyond that, additional partitions gave no benefit. Higher replication factors reduced throughput linearly.
4. Consumer Tests
Variables included thread count, fetch size, partition count, replication factor and fetch‑thread count.
4.1 Threads
Four consumer threads achieved the best throughput (~241 k messages/s); more threads offered diminishing returns.
4.2 Fetch Size
Larger fetch sizes improved consumption rates, with the optimum around 20 k bytes per fetch; beyond that, gains flattened out.
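The fetch-size and thread experiments can be reproduced with the consumer perf tool (a sketch; flag names such as --fetch-size and --threads exist in the tool but have varied across Kafka releases, and a populated topic on a live cluster is assumed):

```shell
# Vary the bytes fetched per request while holding the thread count at 4
for fetch in 10000 20000 40000; do
  echo "=== fetch-size=${fetch} ==="
  bin/kafka-consumer-perf-test.sh \
    --bootstrap-server localhost:9092 --topic perf-test \
    --messages 1000000 --fetch-size ${fetch} --threads 4
done
```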
4.3 Partitions and Replication
Similar to the producer side, more partitions increased overall throughput until broker resources were exhausted. Higher replication reduced consumer throughput.
5. Broker Parameter Tests
Each broker setting was varied while keeping other factors constant.
5.1 num.replica.fetchers
Increasing the fetcher count up to the number of CPU cores (+1) yielded modest improvements; the optimal value observed was 32.
5.2 num.io.threads
IO threads scaled with CPU cores; the best performance was observed at three times the core count (e.g., 96 threads on a 32‑core machine).
5.3 num.network.threads
Network threads performed best when set to the core count plus one (optimal around 32).
5.4 log.flush.interval.messages & log.flush.interval.ms
Both parameters had limited impact on throughput; a message interval of 20 k and a time interval of 10 s provided a good balance.
6. Disaster Recovery Tests
Scenarios such as full broker outage, partial broker failure, disk failure and full‑cluster recovery were exercised. Results showed that producers experience temporary errors but recover quickly, while consumers continue processing with minimal interruption. Replication factors of 2‑4 provide sufficient resilience.
7. Single‑Machine Tests
Various ack/compression combinations were benchmarked on a single node. The highest observed throughput was 350 988 messages/s (≈230 MiB/s) with acks=0 and compression=lz4. CPU usage peaked at 323 % on a 32‑core server, while memory usage stayed below 6 %.
8. Conclusions
For producers, use acks=1 for a balance of safety and performance, LZ4 compression, batch sizes around 1 M, 60 concurrent threads, message size ~4 k, 3‑5 partitions and a replication factor of 3.
For consumers, configure 4 threads, fetch size ~400 k messages, and 10 fetch‑threads.
Broker settings: num.replica.fetchers = core count, num.io.threads = 3 × core count, num.network.threads = core count, log.flush.interval.messages = 20 k, log.flush.interval.ms = 10 000 ms.
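Collected into configuration fragments, the recommendations read roughly as follows (illustrative for a 32-core broker; substitute the actual core count, and note that consumer thread and fetch counts are perf-tool arguments rather than broker properties):

```properties
# producer.properties (recommended)
acks=1
compression.type=lz4
batch.size=1048576

# server.properties (recommended, 32-core machine)
num.replica.fetchers=32
num.io.threads=96
num.network.threads=32
log.flush.interval.messages=20000
log.flush.interval.ms=10000
```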
Overall, a well‑tuned single‑node Kafka can sustain ~35 k messages/s with LZ4 compression, while disk I/O peaks at ~198 k writes/s and CPU usage remains within acceptable limits.
Big Data Technology & Architecture
Wang Zhiwu, a big data expert, dedicated to sharing big data technology.
