
How to Build a High‑Performance Kafka Production Cluster: Sizing, Config, and Best Practices

This guide explains how to design and deploy a Kafka production cluster for reliable, high-throughput streaming. It covers capacity planning for 1 billion messages per day, hardware sizing, key configuration parameters, command-line operations, and useful management tools.

ITFLY8 Architecture Home

1. Kafka Production Cluster Deployment

Background: Assume the cluster must handle 1 billion messages per day. About 80% of them (800 million) arrive within 16 hours, and 80% of those (640 million) arrive in a peak 3-hour window. Spread over 10,800 seconds, that is about 59,000 messages per second, so the cluster must sustain roughly 60,000 QPS at peak.
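The peak-rate arithmetic can be reproduced with POSIX shell integer math. This is a back-of-envelope sketch that hard-codes the 80% and 3-hour assumptions stated above; the variable names are illustrative only.

```shell
#!/bin/sh
# Peak-QPS estimate using the 80/20 traffic assumptions from the text.
daily_msgs=1000000000                     # 1 billion messages per day
busy_msgs=$(( daily_msgs * 80 / 100 ))    # 80% arrive within 16 busy hours
peak_msgs=$(( busy_msgs * 80 / 100 ))     # 80% of those arrive in the 3-hour peak
peak_seconds=$(( 3 * 3600 ))              # 10,800 seconds
peak_qps=$(( peak_msgs / peak_seconds ))  # integer division truncates

echo "messages in busy 16 h: $busy_msgs"
echo "messages in peak 3 h:  $peak_msgs"
echo "peak QPS:              $peak_qps"
```

The script prints a peak of 59259 messages per second, which the text rounds up to 60,000 QPS.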

Disk space: Each message is about 50 KB, so raw data comes to roughly 46 TB per day (1 billion × 50 KB, counted in binary units). Two replicas double that to 92 TB, and retaining three days of data requires about 276 TB of storage.
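A similar sketch for storage, assuming binary units (1 KB = 1,024 bytes, 1 TiB = 1,024^4 bytes) and using awk only for the final division. It reports about 46.6 TiB of raw data per day, 93.1 TiB with two replicas, and 279.4 TiB for three days; the 92 TB and 276 TB above come from rounding the daily figure down to 46 TB before multiplying.

```shell
#!/bin/sh
# Storage estimate: daily volume x replicas x retention days.
# Binary units: 1 KB = 1024 bytes, 1 TiB = 1024^4 bytes.
msgs_per_day=1000000000
msg_kb=50
replicas=2
retention_days=3

raw_bytes=$(( msgs_per_day * msg_kb * 1024 ))
replicated_bytes=$(( raw_bytes * replicas ))
retained_bytes=$(( replicated_bytes * retention_days ))

# Convert a byte count to TiB with one decimal (awk does the floating point)
to_tib() {
  awk -v b="$1" 'BEGIN { printf "%.1f", b / (1024 ^ 4) }'
}

raw_tib=$(to_tib "$raw_bytes")
replicated_tib=$(to_tib "$replicated_bytes")
retained_tib=$(to_tib "$retained_bytes")

echo "raw per day:           $raw_tib TiB"
echo "with $replicas replicas:        $replicated_tib TiB"
echo "retained for $retention_days days:  $retained_tib TiB"
```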

Hardware sizing:

QPS: A single physical machine can sustain roughly 40,000 to 50,000 QPS, so 5 to 7 machines (about 200,000 to 300,000 QPS combined) leave roughly 3 to 5 times headroom over the 60,000 QPS peak.
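To turn a per-broker throughput figure into a broker count, the sketch below uses ceiling division for a desired headroom multiple and prints the resulting headroom for 5 to 7 brokers. The 40,000 QPS per-broker value is the conservative end of the range above, and the function names are hypothetical.

```shell
#!/bin/sh
# Broker-count sketch: how many brokers give a target headroom over peak QPS?

# $1 = peak QPS, $2 = per-broker QPS, $3 = desired headroom multiple
# Integer ceiling division: (a + b - 1) / b
brokers_needed() {
  echo $(( ($1 * $3 + $2 - 1) / $2 ))
}

# $1 = broker count, $2 = per-broker QPS, $3 = peak QPS; prints capacity / peak
headroom() {
  awk -v n="$1" -v q="$2" -v p="$3" 'BEGIN { printf "%.1f", n * q / p }'
}

peak=60000
per_broker=40000   # conservative end of the 40-50k range quoted in the text

echo "brokers for 3x headroom: $(brokers_needed "$peak" "$per_broker" 3)"
for n in 5 6 7; do
  echo "$n brokers -> $(( n * per_broker )) QPS, $(headroom "$n" "$per_broker" "$peak")x peak"
done
```

With 40,000 QPS per broker, 3x headroom needs 5 brokers, and 5, 6, and 7 brokers give about 3.3x, 4.0x, and 4.7x the 60,000 QPS peak.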

Disk: Spreading 276 TB across five servers means about 55 TB per server; choose the number and size of disks per server to match.
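Turning the per-server figure into a disk count depends on the drive size and how full you are willing to let the disks get. Both are hypothetical inputs in this sketch (8 TB drives, at most 80% full); it uses decimal units, and each resulting disk would get its own entry in log.dirs.

```shell
#!/bin/sh
# Disks-per-server sketch. Drive size and fill target are hypothetical inputs.
total_gb=276000   # 276 TB retained, decimal units (1 TB = 1000 GB)
servers=5
disk_gb=8000      # hypothetical 8 TB drive
fill_pct=80       # hypothetical: keep disks at most 80% full

per_server_gb=$(( total_gb / servers ))
# Capacity to provision so the data fills at most fill_pct of it (rounded up)
provisioned_gb=$(( (per_server_gb * 100 + fill_pct - 1) / fill_pct ))
# Round up to whole drives
disks=$(( (provisioned_gb + disk_gb - 1) / disk_gb ))

echo "data per server:         $per_server_gb GB"
echo "capacity to provision:   $provisioned_gb GB"
echo "disks of $disk_gb GB needed: $disks"
```

With these inputs each server holds 55,200 GB, needs 69,000 GB provisioned, and therefore nine 8 TB drives.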

SSD vs. SAS: Kafka writes to its logs sequentially, so high-capacity mechanical SAS drives are adequate and keep costs down; SSDs pay off mainly for random-I/O workloads such as MySQL.

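Memory: Give Kafka a JVM heap of about 10 GB and leave the rest of RAM to the OS page cache, which Kafka relies on for fast reads and writes. For 100 topics with 5 partitions each (500 partitions) and 2 replicas (1,000 partition replicas in total), about 50 GB of page cache per server is sufficient, so a 64 GB server is adequate.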
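The text does not show how the 50 GB figure is derived. One plausible route, sketched below under stated assumptions, is to keep the newest quarter of each partition replica's active log segment in page cache: with Kafka's default 1 GiB segment size (log.segment.bytes), 1,000 replicas need about 250 GiB, or 50 GiB per broker across five brokers. The 25% hot fraction is an assumption, not a Kafka setting.

```shell
#!/bin/sh
# Page-cache sizing sketch. Assumptions (not from the text): each partition
# replica's active segment is 1 GiB (Kafka's default log.segment.bytes) and
# caching its newest ~25% covers consumers reading near the tail.
topics=100
partitions_per_topic=5
replicas=2
brokers=5
segment_mib=1024   # log.segment.bytes default = 1 GiB
hot_pct=25         # assumed hot fraction of each active segment
heap_gib=10        # JVM heap from the text

replica_count=$(( topics * partitions_per_topic * replicas ))
cache_total_mib=$(( replica_count * segment_mib * hot_pct / 100 ))
cache_per_broker_gib=$(( cache_total_mib / brokers / 1024 ))
ram_per_broker_gib=$(( cache_per_broker_gib + heap_gib ))

echo "partition replicas:      $replica_count"
echo "page cache per broker:   $cache_per_broker_gib GiB"
echo "heap + cache per broker: $ram_per_broker_gib GiB"
```

Adding the 10 GB heap gives about 60 GiB per broker, which fits the 64 GB server recommended above.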

CPU: 16-core CPUs are typical; a broker usually runs 100 to 200 threads, which 16 cores can handle. 32 cores give extra headroom.

Network: Peak inbound traffic is about 488 MB/s, and replication to the second replica doubles that to about 976 MB/s (roughly 7.8 Gbit/s). A 1 GbE link carries only about 125 MB/s and would be a bottleneck, so use 10 GbE NICs.
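A quick check of the NIC choice, taking the 488 MB/s peak inbound figure above as given, treating MB as decimal megabytes, and assuming traffic spreads evenly over five brokers. Consumer fetch traffic is ignored here, so real usage will be higher.

```shell
#!/bin/sh
# NIC check using the text's figures. Assumes traffic spreads evenly across
# brokers and, like the text, counts replication as a simple 2x; consumer
# fetch traffic is ignored, so real usage will be higher.
inbound_mb_s=488   # peak producer traffic in MB/s (decimal), from the text
replicas=2
brokers=5
nic_1g_mbit=1000
nic_10g_mbit=10000

cluster_mb_s=$(( inbound_mb_s * replicas ))        # inbound plus replication
per_broker_mbit=$(( cluster_mb_s * 8 / brokers ))  # MB/s -> Mbit/s, per broker

echo "cluster traffic:    $cluster_mb_s MB/s"
echo "per-broker traffic: $per_broker_mbit Mbit/s"
if [ "$per_broker_mbit" -gt "$nic_1g_mbit" ]; then
  echo "exceeds 1 GbE"
fi
if [ "$per_broker_mbit" -le "$nic_10g_mbit" ]; then
  echo "fits within 10 GbE"
fi
```

Each broker needs about 1,561 Mbit/s at peak, well above what a 1 GbE port provides.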

2. Kafka Cluster Configuration

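Edit config/server.properties on each broker (this is the only file that needs changes). Important settings:

broker.id: a unique non-negative integer for each broker.

log.dirs: directories for log storage; list several comma-separated paths, one per disk, to spread partitions across disks.

zookeeper.connect: the ZooKeeper ensemble address (comma-separated host:port pairs).

listeners: the address and port clients connect to, e.g. PLAINTEXT://:9092 (9092 is the default port).

num.network.threads (default 3): threads that receive requests from and send responses to the network.

num.io.threads (default 8): threads that process requests, including disk I/O.

unclean.leader.election.enable: set to false so that only replicas in the ISR (in-sync replica set) can become leader.

delete.topic.enable: set to true to allow topic deletion.

log.retention.hours: how long data is kept (default 168 hours, i.e. 7 days).

min.insync.replicas: the minimum number of in-sync replicas that must acknowledge a write when the producer uses acks=all (acks=-1); if fewer are available, the write is rejected.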
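Putting the settings together, here is a minimal sketch that writes an example server.properties for one broker. The host names, data paths, and output file name are hypothetical; log.retention.hours=72 reflects the three-day retention from the sizing section and default.replication.factor=2 the two-replica plan, while the thread counts simply restate the defaults.

```shell
#!/bin/sh
# Writes an example server.properties for one broker. Host names, paths, and
# the output file name are hypothetical; review every value before use.
out="${1:-./server.properties.example}"

cat > "$out" <<'EOF'
# Unique integer ID; must differ on every broker
broker.id=0
# One directory per physical disk spreads partitions across disks
log.dirs=/data1/kafka-logs,/data2/kafka-logs,/data3/kafka-logs
# ZooKeeper ensemble (hypothetical hosts)
zookeeper.connect=zk1:2181,zk2:2181,zk3:2181
listeners=PLAINTEXT://:9092
num.network.threads=3
num.io.threads=8
# Only in-sync replicas may become leader
unclean.leader.election.enable=false
delete.topic.enable=true
# Three days of retention, matching the storage sizing
log.retention.hours=72
default.replication.factor=2
# With replication factor 2, a value of 2 here would make acks=all writes
# fail whenever one replica is out of the ISR
min.insync.replicas=1
EOF

echo "wrote $out"
```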

3. Basic Kafka Cluster Operations

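Typical commands (adjust host names and ports for your environment). The topic commands below use --bootstrap-server, available since Kafka 2.2; older releases take --zookeeper localhost:2181 instead, and Kafka 3.0 removed that option.

Create a topic:

bin/kafka-topics.sh --create --bootstrap-server localhost:9092 --replication-factor 2 --partitions 2 --topic tellYourDream

List topics:

bin/kafka-topics.sh --list --bootstrap-server localhost:9092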
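Applied to the 100-topic, 5-partition, 2-replica layout assumed in the memory sizing, topic creation can be scripted. In this sketch the topic names (app-topic-001 and so on) are hypothetical, the commands use --bootstrap-server (Kafka 2.2 and later), and DRY_RUN defaults to 1 so the script only prints what it would run; set DRY_RUN=0 and point KAFKA_BIN at your Kafka bin directory to execute. The final --describe call shows each partition's leader and ISR.

```shell
#!/bin/sh
# Creates 100 topics x 5 partitions x 2 replicas, the layout assumed in the
# memory sizing. Topic names are hypothetical placeholders.
# DRY_RUN=1 (default) prints the commands; DRY_RUN=0 executes them.
KAFKA_BIN="${KAFKA_BIN:-bin}"
BOOTSTRAP="${BOOTSTRAP:-localhost:9092}"

# Print or execute a command depending on DRY_RUN
run() {
  if [ "${DRY_RUN:-1}" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}

# $1 = number of topics to create
create_planned_topics() {
  i=1
  while [ "$i" -le "$1" ]; do
    topic=$(printf 'app-topic-%03d' "$i")
    run "$KAFKA_BIN/kafka-topics.sh" --create --bootstrap-server "$BOOTSTRAP" \
      --replication-factor 2 --partitions 5 --topic "$topic"
    i=$(( i + 1 ))
  done
}

create_planned_topics 100

# Afterwards, check leaders and in-sync replicas for one of the topics
run "$KAFKA_BIN/kafka-topics.sh" --describe --bootstrap-server "$BOOTSTRAP" --topic app-topic-001
```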

Produce messages:

bin/kafka-console-producer.sh --broker-list localhost:9092 --topic test

Consume messages:

bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic test --from-beginning
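The console tools can also exercise message keys and consumer groups. This sketch assumes a Kafka release whose console producer accepts --bootstrap-server (older releases only take --broker-list, as in the produce command above); the topic test, the group demo-group, and the sample records are hypothetical, and DRY_RUN=1 (the default) prints the commands instead of running them.

```shell
#!/bin/sh
# Keyed produce and group consume with the console tools. Topic "test",
# group "demo-group", and the sample records are hypothetical.
# DRY_RUN=1 (default) prints the commands; DRY_RUN=0 executes them.
KAFKA_BIN="${KAFKA_BIN:-bin}"
BOOTSTRAP="${BOOTSTRAP:-localhost:9092}"

run() {
  if [ "${DRY_RUN:-1}" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Reads "key:value" lines from stdin; with the default partitioner,
# records that share a key go to the same partition.
produce_keyed() {
  run "$KAFKA_BIN/kafka-console-producer.sh" --bootstrap-server "$BOOTSTRAP" \
    --topic "$1" --property parse.key=true --property key.separator=:
}

# Consumes as a member of group $2, so offsets are committed and lag can be
# tracked; print.key=true shows each record's key.
consume_group() {
  run "$KAFKA_BIN/kafka-console-consumer.sh" --bootstrap-server "$BOOTSTRAP" \
    --topic "$1" --group "$2" --from-beginning --property print.key=true
}

printf 'user-1:login\nuser-2:login\nuser-1:logout\n' | produce_keyed test
consume_group test demo-group
```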

Producer performance test (500,000 records of 200 bytes each; --throughput -1 disables throttling, and acks=-1 waits for all in-sync replicas):

bin/kafka-producer-perf-test.sh --topic test-topic --num-records 500000 --record-size 200 --throughput -1 --producer-props bootstrap.servers=hadoop03:9092,hadoop04:9092,hadoop05:9092 acks=-1

Consumer performance test (reads 500,000 messages with a 2,000-byte fetch size):

bin/kafka-consumer-perf-test.sh --broker-list hadoop03:9092,hadoop04:9092,hadoop05:9092 --fetch-size 2000 --messages 500000 --topic test-topic
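The 200-byte producer benchmark is much smaller than the ~50 KB messages used for sizing. Below is a sketch of a wrapper that reruns the producer test at 50 KB per record; the record count of 100,000 (about 5 GB before replication) is an arbitrary choice, and DRY_RUN=1 (the default) only prints the command.

```shell
#!/bin/sh
# Reruns the producer benchmark with ~50 KB records instead of 200 bytes.
# Record count and the DRY_RUN default (print only) are assumptions.
KAFKA_BIN="${KAFKA_BIN:-bin}"
BROKERS="${BROKERS:-hadoop03:9092,hadoop04:9092,hadoop05:9092}"

run() {
  if [ "${DRY_RUN:-1}" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}

# $1 = record count, $2 = record size in bytes
producer_perf() {
  run "$KAFKA_BIN/kafka-producer-perf-test.sh" --topic test-topic \
    --num-records "$1" --record-size "$2" --throughput -1 \
    --producer-props bootstrap.servers="$BROKERS" acks=-1
}

producer_perf 100000 $(( 50 * 1024 ))
```

Comparing the reported records/sec at both sizes is likely to show a much lower message rate for the larger records, which is the figure that matters for the sizing above.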

4. Management Tools

KafkaManager (Yahoo's Scala-based web UI, since renamed CMAK) monitors brokers, topics, and partitions, and supports administrative actions such as creating and deleting topics, adding partitions, and reassigning partitions.

KafkaOffsetMonitor provides consumer lag visibility; it is a simple JAR that can be started with a Java command.
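Kafka itself also reports consumer lag: kafka-consumer-groups.sh --describe --group followed by a group name prints one row per partition with a LAG column. The awk helper below is a sketch that sums that column from the tool's output on stdin; it locates the column by its LAG header rather than by position and skips non-numeric values such as "-". The function name is hypothetical.

```shell
#!/bin/sh
# Sums the LAG column of `kafka-consumer-groups.sh --describe` output on stdin.
# Assumes each table starts with a header row containing a field named LAG;
# rows whose LAG value is not a plain integer (for example "-") are skipped.
total_lag() {
  awk '
    # Header row: remember which column holds LAG, then move to the next line
    { for (i = 1; i <= NF; i++) if ($i == "LAG") { col = i; next } }
    # Data rows: add numeric LAG values
    col && $col ~ /^[0-9]+$/ { sum += $col }
    END { print sum + 0 }
  '
}

# Hypothetical usage against a live cluster ("demo-group" is a placeholder):
#   bin/kafka-consumer-groups.sh --bootstrap-server localhost:9092 \
#     --describe --group demo-group | total_lag
```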

Additional tools like MirrorMaker can be used for cross‑datacenter replication when needed.

Further articles will cover producer/consumer internals and common interview questions.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.
