Kafka Performance Testing and Optimization Report

This report presents a comprehensive performance‑testing study of a Kafka deployment, detailing objectives, test scope, JVM and broker tuning, producer and consumer parameter experiments, benchmark results, and practical recommendations for achieving high throughput and stability in large‑scale message processing.

The purpose of this performance test is to evaluate the ability of a single‑server Kafka instance in a production environment to handle message volumes at the hundred‑million level, and to verify whether the current Kafka configuration meets the project's throughput requirements.

1. Test Scope and Method

We used Kafka's built‑in performance scripts (kafka-producer-perf-test.sh and kafka-consumer-perf-test.sh) to generate write and read loads. Message volume, batch size, compression codec, acks setting, partition count and replication factor were each varied to observe their impact on throughput, latency and resource consumption.
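
For reference, a typical pair of invocations looks like the following; the topic name test_perf, the broker address localhost:9092 and the record counts are illustrative placeholders, not values from the report:

    # Write load: 10 million 1 KiB records, unthrottled
    bin/kafka-producer-perf-test.sh --topic test_perf \
        --num-records 10000000 --record-size 1024 --throughput -1 \
        --producer-props bootstrap.servers=localhost:9092 acks=1

    # Read load: consume the same records (older releases take
    # --broker-list instead of --bootstrap-server)
    bin/kafka-consumer-perf-test.sh --bootstrap-server localhost:9092 \
        --topic test_perf --messages 10000000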

1.1 Test Environment

All tests were executed on a single physical server running Kafka. Disk I/O performance was measured beforehand with the following commands:

1. Test read speed
   hdparm -t --direct /dev/sda3
2. Test write speed
   sync; /usr/bin/time -p bash -c "dd if=/dev/zero of=test.dd bs=1M count=20000"

The results showed read speeds between 163 MiB/s and 206 MiB/s and write speeds around 125 MiB/s.

2. JVM and Kafka Parameters

JVM tuning focused on memory allocation and the G1 garbage collector (recommended over CMS). Example JVM options:

-Xmx6G -Xms6G -XX:MetaspaceSize=96m -server -XX:+UseG1GC -XX:MaxGCPauseMillis=20 -XX:InitiatingHeapOccupancyPercent=35 -XX:G1HeapRegionSize=16m -XX:MinMetaspaceFreeRatio=50 -XX:MaxMetaspaceFreeRatio=80
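
In a stock Kafka distribution these flags are typically supplied through the KAFKA_HEAP_OPTS and KAFKA_JVM_PERFORMANCE_OPTS environment variables read by kafka-run-class.sh; a minimal sketch:

    # Heap sizing and GC flags, picked up by bin/kafka-run-class.sh
    export KAFKA_HEAP_OPTS="-Xmx6G -Xms6G"
    export KAFKA_JVM_PERFORMANCE_OPTS="-server -XX:+UseG1GC -XX:MaxGCPauseMillis=20 \
        -XX:InitiatingHeapOccupancyPercent=35 -XX:G1HeapRegionSize=16m \
        -XX:MetaspaceSize=96m -XX:MinMetaspaceFreeRatio=50 -XX:MaxMetaspaceFreeRatio=80"
    bin/kafka-server-start.sh config/server.properties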

Key Kafka broker settings examined include num.replica.fetchers, num.io.threads, num.network.threads, log.flush.interval.messages and log.flush.interval.ms.

3. Producer Tests

Parameters varied: batch.size, acks, message.size, compression.type, partition, replication and overall throughput limits.

3.1 Batch Size

Increasing batch.size from 10 k to 80 k showed that throughput stabilised at ~30 k messages/s once the batch reached 20 k, with a data rate of 19.65 MiB/s.
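
A sweep of this kind can be scripted directly against the perf tool. The loop below assumes the 10 k–80 k figures refer to batch.size in bytes; the topic, record count, and the 687 B record size (mirroring the message size used in section 3.3) are placeholders:

    # Sweep batch.size (bytes) while holding everything else constant
    for bs in 10240 20480 40960 81920; do
        bin/kafka-producer-perf-test.sh --topic test_perf \
            --num-records 5000000 --record-size 687 --throughput -1 \
            --producer-props bootstrap.servers=localhost:9092 batch.size=$bs
    done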

3.2 Acks

Three ack strategies were compared (0, 1, -1). The fastest was acks=0, followed by leader‑only (acks=1); full replication (acks=-1) reduced throughput to roughly 25 % of the no‑ack case.
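
The three strategies map directly onto the acks producer property; a sketch of the comparison (topic and volumes illustrative):

    # acks=0: fire-and-forget; acks=1: leader ack; acks=-1: all in-sync replicas
    for a in 0 1 -1; do
        bin/kafka-producer-perf-test.sh --topic test_perf \
            --num-records 5000000 --record-size 687 --throughput -1 \
            --producer-props bootstrap.servers=localhost:9092 acks=$a
    done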

3.3 Message Size

Tests with 687 B and 454 B messages indicated that larger messages (687 B) yielded higher throughput. A 4 k message size performed best in later experiments.
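
Message size is controlled with the tool's --record-size flag; a sketch of the three sizes tested, taking "4 k" as 4096 bytes (other parameters are placeholders):

    # Compare 454 B, 687 B and 4 KiB payloads
    for sz in 454 687 4096; do
        bin/kafka-producer-perf-test.sh --topic test_perf \
            --num-records 5000000 --record-size $sz --throughput -1 \
            --producer-props bootstrap.servers=localhost:9092 acks=1
    done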

3.4 Compression Codec

Four codecs were evaluated (none, gzip, snappy, lz4) at various batch sizes and concurrency levels. LZ4 consistently delivered the highest throughput (up to 30 k+ messages/s), while gzip was the slowest due to compression overhead.
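
The codec is selected with the compression.type producer property; a sketch, with the batch size fixed at an illustrative value near the 20 k plateau from section 3.1:

    # Compare codecs at a fixed batch size
    for codec in none gzip snappy lz4; do
        bin/kafka-producer-perf-test.sh --topic test_perf \
            --num-records 5000000 --record-size 687 --throughput -1 \
            --producer-props bootstrap.servers=localhost:9092 \
                compression.type=$codec batch.size=20480
    done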

3.5 Partitions and Replication

Increasing partition count improved throughput up to the point where broker threads became saturated; beyond that, additional partitions gave no benefit. Higher replication factors reduced throughput linearly.
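
Partition count and replication factor are fixed at topic creation; a sketch with illustrative values, assuming enough brokers to host the replicas (releases before Kafka 2.2 take --zookeeper instead of --bootstrap-server):

    # Create a test topic with 8 partitions and replication factor 3
    bin/kafka-topics.sh --create --bootstrap-server localhost:9092 \
        --topic test_perf_p8r3 --partitions 8 --replication-factor 3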

4. Consumer Tests

Variables included thread count, fetch size, partition count, replication factor and fetch‑thread count.

4.1 Threads

Four consumer threads achieved the best throughput (~241 k messages/s); more threads offered diminishing returns.
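
Thread count is exposed through the consumer perf tool's --threads flag; a sketch (message count illustrative):

    # Consume 10 million messages with 4 processing threads
    bin/kafka-consumer-perf-test.sh --bootstrap-server localhost:9092 \
        --topic test_perf --messages 10000000 --threads 4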

4.2 Fetch Size

Increasing the fetch size improved consumption rates, with the optimal point around 20 k bytes per fetch.
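
Fetch size maps to the tool's --fetch-size flag (bytes per fetch request); a sketch of the sweep:

    # Sweep fetch size up to the ~20 k-byte optimum reported above
    for fs in 5120 10240 20480; do
        bin/kafka-consumer-perf-test.sh --bootstrap-server localhost:9092 \
            --topic test_perf --messages 10000000 --fetch-size $fs
    done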

4.3 Partitions and Replication

Similar to the producer side, more partitions increased overall throughput until broker resources were exhausted. Higher replication reduced consumer throughput.

5. Broker Parameter Tests

Each broker setting was varied while keeping other factors constant.

5.1 num.replica.fetchers

Increasing the fetcher count up to the number of CPU cores (+1) yielded modest improvements; the optimal value observed was 32.

5.2 num.io.threads

IO threads scaled with CPU cores; the best performance was observed at three times the core count (e.g., 96 threads on a 32‑core machine).

5.3 num.network.threads

Network threads performed best when set to the core count plus one (optimal around 32).

5.4 log.flush.interval.messages & log.flush.interval.ms

Both parameters had limited impact on throughput; a message interval of 20 k and a time interval of 10 s provided a good balance.
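
Taken together, sections 5.1–5.4 suggest a server.properties excerpt along the following lines for the 32‑core test host; this is an illustrative consolidation of the observed optima, not a configuration published in the report:

    # Broker tuning derived from sections 5.1-5.4 (32-core host)
    num.replica.fetchers=32
    num.io.threads=96
    num.network.threads=32
    log.flush.interval.messages=20000
    log.flush.interval.ms=10000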

6. Disaster Recovery Tests

Scenarios such as full broker outage, partial broker failure, disk failure and full‑cluster recovery were exercised. Results showed that producers experience temporary errors but recover quickly, while consumers continue processing with minimal interruption. Replication factors of 2‑4 provide sufficient resilience.
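
A broker-failure scenario of this kind can be reproduced by hand; the sketch below kills one broker process, inspects the shrunken in-sync replica set, then restarts it (the pid lookup is illustrative and would need narrowing on a host running multiple brokers):

    # Kill one broker process, observe ISR shrink, then restart
    kill -9 $(ps aux | grep 'kafka\.Kafka' | grep -v grep | awk '{print $2}')
    bin/kafka-topics.sh --describe --bootstrap-server localhost:9092 --topic test_perf
    bin/kafka-server-start.sh -daemon config/server.properties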

7. Single‑Machine Tests

Various ack/compression combinations were benchmarked on a single node. The highest observed throughput was 350,988 messages/s (≈230 MiB/s) with acks=0 and compression=lz4. CPU usage peaked at 323 % of a single core on the 32‑core server (roughly 10 % of total CPU capacity), while memory usage stayed below 6 %.

8. Conclusions

For producers, use acks=1 for a balance of safety and performance, LZ4 compression, a batch size of around 1 M, 60 concurrent threads, a message size of ~4 k, 3-5 partitions, and a replication factor of 3.

For consumers, configure 4 threads, fetch size ~400 k messages, and 10 fetch‑threads.

Broker settings: num.replica.fetchers = core count, num.io.threads = 3 × core count, num.network.threads = core count, log.flush.interval.messages = 20 k, log.flush.interval.ms = 10 000 ms.
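
As a concrete starting point, the producer recommendations above translate into a perf-test invocation along these lines; the topic name and broker address are placeholders, batch.size assumes the report's "1 M" is bytes, and the 60‑way concurrency would come from running that many instances in parallel:

    # Producer settings from the conclusions (run ~60 instances in parallel for full load)
    bin/kafka-producer-perf-test.sh --topic prod_topic \
        --num-records 100000000 --record-size 4096 --throughput -1 \
        --producer-props bootstrap.servers=localhost:9092 \
            acks=1 compression.type=lz4 batch.size=1048576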

Overall, a well‑tuned single‑node Kafka can sustain ~35 k messages/s with LZ4 compression, while disk I/O peaks at ~198 k writes/s and CPU usage remains within acceptable limits.
