
How to Accurately Size Kafka Clusters: Real‑World Disk I/O Tests and Capacity Planning

This article shares 360 Group's systematic Kafka capacity‑planning methodology, covering hardware performance analysis, disk I/O benchmarking, cluster configuration, load‑testing procedures, observed write‑read dynamics, and practical recommendations for reliable Kafka deployments.

360 Zhihui Cloud Developer

In 360 Group's big-data architecture, Apache Kafka serves as the core messaging middleware and is deeply integrated across the platform. Years of production experience have yielded a systematic set of operation and maintenance practices; this article focuses on one of them: Kafka cluster capacity evaluation.

Hardware Performance Impact

Kafka persists messages to disk, making disk performance a primary bottleneck. The key metrics are maximum write IOPS, random-read IOPS, write bandwidth, and random-read bandwidth. In our tests, an HDD delivered about 608 write IOPS and 505 random-read IOPS, while an NVMe device reached about 17,800 write IOPS and 45,500 random-read IOPS; the corresponding bandwidths were 121 MB/s vs 3,636 MB/s for sequential writes and 50 MB/s vs 5,287 MB/s for random reads.

Write tests use a 256 KB block size to match Kafka's large sequential log appends; read tests use 128 KB because consumers fetching from arbitrary offsets pull data from disk in smaller chunks.

Testing Methodology

Disk sequential-write performance was measured with fio using Direct I/O, 8 concurrent jobs, and an I/O queue depth of 64:

# Test sequential write, max throughput, block size 256k
fio --name=disk-max-write \
    --filename=/data/sdk \
    --rw=write \
    --bs=256k \
    --size=10g \
    --runtime=120 \
    --time_based \
    --direct=1 \
    --ioengine=libaio \
    --iodepth=64 \
    --numjobs=8 \
    --group_reporting > disk-max-write.log &
# Simulate Kafka read pattern, random read, block size 128k
fio --name=kafka-read-test \
    --filename=/data/sdk \
    --rw=randread \
    --bs=128k \
    --size=10g \
    --runtime=120 \
    --time_based \
    --direct=1 \
    --ioengine=libaio \
    --iodepth=64 \
    --numjobs=8 \
    --group_reporting \
    --output-format=normal > disk-read-test.log &

Real‑time disk metrics were observed with iostat -xd 1, focusing on r/s, w/s, rkB/s, wkB/s, and average request size.
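
A minimal way to watch those columns during a run looks like the following (assuming the sysstat package is installed; column names vary slightly across sysstat versions, with older releases reporting avgrq-sz and newer ones rareq-sz/wareq-sz):

# Extended per-device statistics, refreshed every second
# r/s, w/s      -> read/write IOPS
# rkB/s, wkB/s  -> read/write bandwidth
iostat -xd 1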

Performance Findings

For HDDs, sequential writes top out at about 121 MB/s when consumers have no backlog (lagging reads are absent, so consumers are served from the page cache); once consumers fall behind and must read cold data from disk, random reads top out at about 50 MB/s. The fio results confirm both limits.

Cluster Configuration

Typical node: 40 CPU cores, 192 GB RAM, 12 × 8 TB HDDs, and dual 10 Gb NICs in bond mode 4. Theoretical per-node limits: network ≈2.5 GB/s shared between produce and consume traffic, disk writes 121 MB/s per HDD (12 × 121 MB/s ≈ 1.5 GB/s), and random reads 50 MB/s per HDD (12 × 50 MB/s ≈ 600 MB/s). Under backlog, a 10-node cluster can therefore sustain up to about 6 GB/s of read traffic.
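
As a back-of-the-envelope check, those ceilings can be reproduced with a few lines of shell arithmetic; the constants are the measured HDD figures from the fio runs above:

# Per-node and cluster ceilings from the measured HDD figures
disks=12; nodes=10
write_mb=121   # sequential-write ceiling per HDD (MB/s)
read_mb=50     # random-read ceiling per HDD (MB/s)
echo "node write ceiling:   $(( disks * write_mb )) MB/s"          # 1452 ~ 1.5 GB/s
echo "node read ceiling:    $(( disks * read_mb )) MB/s"           # 600 MB/s
echo "cluster read ceiling: $(( nodes * disks * read_mb )) MB/s"   # 6000 ~ 6 GB/s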

Load Test Setup

Using the OpenMessaging benchmark tool, we created two topics with 200 partitions each at a replication factor of 2, i.e., 800 partition replicas spread across the cluster's 120 disks (≈7 per disk). After producing 3 TB of data to simulate a severe backlog, consumers started reading while producers continued writing. The driver configuration:

name: Kafka
driverClass: io.openmessaging.benchmark.driver.kafka.KafkaBenchmarkDriver
replicationFactor: 2
topicConfig: |
  min.insync.replicas=1
commonConfig: |
  bootstrap.servers=10.1.1.2:9092
  default.api.timeout.ms=12000
  request.timeout.ms=12000
producerConfig: |
  acks=all
  linger.ms=1
  batch.size=1048576
consumerConfig: |
  auto.offset.reset=earliest
  enable.auto.commit=false
  max.partition.fetch.bytes=10485760
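
With the driver file above saved as, say, driver-kafka/kafka.yaml, a run is launched through the benchmark's standard CLI. The workload file name here is illustrative; in the OpenMessaging workload format, the consumerBacklogSizeGB field is the knob that forces a backlog to accumulate before consumers catch up:

# Illustrative invocation of the OpenMessaging benchmark
# (paths relative to a checkout of github.com/openmessaging/benchmark)
bin/benchmark \
    --drivers driver-kafka/kafka.yaml \
    workloads/backlog-catchup.yaml > benchmark.log &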

During the catch‑up phase, write throughput dropped from 5 GB/s to near zero, causing massive producer failures; once the backlog cleared, write throughput recovered.

Disk I/O metrics showed IOPS hitting ~500 and read bandwidth peaking at 50 MB/s during the catch‑up period, then returning to idle once consumption finished.

Conclusions & Recommendations

Hardware selection: Disk performance is the primary bottleneck. For workloads with heavy consumer backlog or long-term retention, SSDs (random-read IOPS roughly 90× that of an HDD) are strongly recommended.

Capacity estimation: Write capacity = min(total disk write bandwidth, network bandwidth). For the HDD cluster above, 12 × 121 MB/s ≈ 1.5 GB/s per node, which needs roughly 12 Gb/s of network headroom for writes alone, hence the bonded dual 10 Gb NICs. Catch-up read capacity ≈ disk random-read IOPS × fetch block size: 12 × 500 IOPS × 128 KB ≈ 750 MB/s in theory, with the measured ceiling closer to 600 MB/s (50 MB/s per disk).
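
The read-side estimate in code form, using the same figures (a sketch; substitute your own measured IOPS and fetch size):

# Catch-up read estimate: random-read IOPS x fetch block size
disks=12; read_iops=500; block_kb=128
echo "theoretical: $(( disks * read_iops * block_kb / 1024 )) MB/s"  # 750
echo "measured:    $(( disks * 50 )) MB/s"                           # 600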

Key performance insight: In extreme catch-up scenarios, random reads saturate the disks' IOPS, triggering a "write avalanche" in which write latency spikes and producers time out, exactly as observed when write throughput fell from 5 GB/s to zero.

Effective Kafka capacity planning therefore requires precise disk I/O analysis, theoretical calculations, and validation through real‑world stress tests to ensure a stable, reliable messaging system.

Tags: Monitoring, Kafka, big-data, disk-io, performance-testing, capacity-planning
Written by 360 Zhihui Cloud Developer

360 Zhihui Cloud is an enterprise open service platform that aims to "aggregate data value and empower an intelligent future," leveraging 360's extensive product and technology resources to deliver platform services to customers.