
Zero‑Data‑Loss Kafka Cluster Scaling: Complete Step‑by‑Step Guide

This guide explains how to safely expand a Kafka cluster without data loss. It covers applicable scenarios and prerequisites, anti‑pattern warnings, a tested environment matrix, a pre‑flight checklist, step‑by‑step Linux commands for broker preparation, partition‑rebalance plan generation, throttled execution, real‑time monitoring, post‑rebalance verification, rollback procedures, troubleshooting, FAQs, and best practices, all illustrated with runnable code snippets and practical examples.

Applicable Scenarios & Prerequisites

This guide is intended for Kafka clusters that suffer capacity or performance bottlenecks, need additional broker nodes, have uneven partition loads, or require data migration. Supported OS: RHEL/CentOS 7.9+ or Ubuntu 20.04+. Required components: Kafka 2.8.0+ (3.4.0+ with KRaft recommended), OpenJDK 11+, ZooKeeper 3.6.0+ (if using ZooKeeper mode), 8 CPUs and 16 GB RAM per broker, 500 GB SSD (1 TB NVMe recommended), a ≥10 Gbps NIC, a dedicated kafka user with appropriate sudo permissions, and familiarity with Kafka architecture, partition‑replica mechanics, Linux commands, and shell scripting.

Anti‑Pattern Warnings

Single‑broker clusters cannot be rebalanced; expand to at least three brokers first.

Clusters with replication factor 1 lack redundancy; increase it to 2+ before rebalancing (see the check after this list).

Kafka versions < 2.4 lack kafka-reassign-partitions.sh advanced features; upgrade first.

Do not run rebalancing on a disk‑failed broker; fix the failure first.

Insufficient network bandwidth (< 1 Gbps) will saturate during migration; avoid.

Without monitoring (Kafka Manager/Cruise Control) you cannot assess the impact; deploy monitoring before scaling.
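
As a quick pre‑flight check for the replication‑factor warning above, the following sketch lists topics that still have replication factor 1. It assumes the standard kafka-topics.sh describe output, where each topic summary line contains "ReplicationFactor: N"; column positions can vary slightly between Kafka versions.

# List topics whose replication factor is 1 (fix these before rebalancing)
kafka-topics.sh --bootstrap-server localhost:9092 --describe \
  | awk '/ReplicationFactor: 1([^0-9]|$)/ {print $2}'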

Environment & Version Matrix

OS: RHEL 8.7+ / CentOS Stream 9 or Ubuntu 22.04 LTS (tested).

Kernel: 4.18.0‑425+ (RHEL) or 5.15.0‑60+ (Ubuntu) (tested).

Kafka: 2.8.2 (ZooKeeper) / 3.4.0 (KRaft) (tested).

ZooKeeper: 3.6.4 / 3.8.1 (tested).

Java: OpenJDK 11.0.18+ or 17.0.6+ (tested).

Recommended hardware: 16 CPU 32 GB RAM, 1 TB NVMe SSD per broker.

Network: ≥10 Gbps NIC.

Quick Checklist

Assess current load (CPU, memory, disk, network) one day before.

Back up ZooKeeper data (archive the dataDir snapshot and transaction‑log files; see the sketch after this checklist).

Provision new broker hardware and install Java.

Download matching Kafka binaries and create a dedicated kafka user.

Configure server.properties (broker.id, listeners, log.dirs, replication factor, JVM heap, network & I/O threads, throttling parameters, JMX port).

Deploy systemd service and start the new broker.
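
For the ZooKeeper backup item above, a minimal sketch, assuming the default dataDir of /var/lib/zookeeper (adjust the paths to your deployment):

# Archive ZooKeeper's snapshot and transaction-log files for a point-in-time backup
ZK_DATA_DIR=/var/lib/zookeeper                      # assumption: your dataDir may differ
BACKUP_DIR=/backup/zookeeper/$(date +%Y%m%d-%H%M%S)
sudo mkdir -p "$BACKUP_DIR"
sudo cp -a "$ZK_DATA_DIR/version-2" "$BACKUP_DIR/"
# Optionally record the registered broker ids for later comparison
echo "ls /brokers/ids" | zkCli.sh -server localhost:2181 > "$BACKUP_DIR/broker-ids.txt" 2>&1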

Implementation Steps

1. Evaluate Current Cluster State

# Check broker versions (each broker summary line contains "id:")
kafka-broker-api-versions.sh --bootstrap-server localhost:9092 | grep "id:"
# List topics
kafka-topics.sh --bootstrap-server localhost:9092 --list
# Describe a topic
kafka-topics.sh --bootstrap-server localhost:9092 --describe --topic my-topic
# Verify under‑replicated partitions (should be 0)
kafka-topics.sh --bootstrap-server localhost:9092 --describe --under-replicated-partitions
# Check disk usage on each broker
for broker in kafka-broker-{1..3}; do ssh $broker "df -h /kafka-logs | tail -n1"; done
# Check CPU & network load
for broker in kafka-broker-{1..3}; do ssh $broker "uptime; iostat -x 1 3"; done
# Verify consumer lag
kafka-consumer-groups.sh --bootstrap-server localhost:9092 --describe --group my-consumer-group

2. Prepare New Broker Nodes

# Install Java (RHEL/CentOS; on Ubuntu: sudo apt-get install -y openjdk-11-jdk)
sudo yum install -y java-11-openjdk java-11-openjdk-devel
# Download matching Kafka
cd /opt && sudo wget https://archive.apache.org/dist/kafka/3.4.0/kafka_2.13-3.4.0.tgz
sudo tar xzf kafka_2.13-3.4.0.tgz && sudo ln -s kafka_2.13-3.4.0 kafka
# Create kafka user and log directory
sudo useradd -r -s /bin/false kafka
sudo mkdir -p /kafka-logs && sudo chown -R kafka:kafka /opt/kafka /kafka-logs
# Configure server.properties (example excerpt; written with sudo tee because /opt/kafka is owned by the kafka user)
sudo tee /opt/kafka/config/server.properties >/dev/null <<'EOF'
broker.id=4
listeners=PLAINTEXT://192.168.1.14:9092
advertised.listeners=PLAINTEXT://192.168.1.14:9092
log.dirs=/kafka-logs
num.partitions=12
default.replication.factor=2
log.retention.hours=168
log.segment.bytes=1073741824
num.network.threads=8
num.io.threads=16
socket.send.buffer.bytes=102400
socket.receive.buffer.bytes=102400
replica.fetch.max.bytes=1048576
replica.socket.timeout.ms=30000
EOF
# Note: KAFKA_HEAP_OPTS is an environment variable, not a server.properties key;
# the heap is set in the systemd unit below instead
# Create systemd service
sudo tee /etc/systemd/system/kafka.service >/dev/null <<'EOF'
[Unit]
Description=Apache Kafka Server
After=network.target

[Service]
Type=simple
User=kafka
Group=kafka
Environment="KAFKA_HEAP_OPTS=-Xmx8G -Xms8G"
ExecStart=/opt/kafka/bin/kafka-server-start.sh /opt/kafka/config/server.properties
ExecStop=/opt/kafka/bin/kafka-server-stop.sh
Restart=on-failure
LimitNOFILE=100000

[Install]
WantedBy=multi-user.target
EOF
sudo systemctl daemon-reload && sudo systemctl enable kafka && sudo systemctl start kafka

3. Generate Rebalance Plan

# List topics to move (JSON format)
cat > /tmp/topics-to-move.json <<'EOF'
{"topics":[{"topic":"my-topic-1"},{"topic":"my-topic-2"}],"version":1}
EOF
# Generate plan for brokers 1‑5
kafka-reassign-partitions.sh \
  --bootstrap-server localhost:9092 \
  --topics-to-move-json-file /tmp/topics-to-move.json \
  --broker-list "1,2,3,4,5" \
  --generate > /tmp/reassignment-plan.json
# The --generate output contains two one-line JSON documents: the current
# assignment (keep it for rollback) and the proposed assignment
grep -A 1 "Current partition replica assignment" /tmp/reassignment-plan.json | tail -n 1 > /tmp/current.json
grep -A 1 "Proposed partition reassignment configuration" /tmp/reassignment-plan.json | tail -n 1 > /tmp/proposed.json
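
Before executing, it is worth confirming that both extracted files parse as JSON; a quick sketch assuming jq is installed (python3 -m json.tool works equally well):

# Fail fast if either extracted assignment file is not valid JSON
jq empty /tmp/current.json /tmp/proposed.json && echo "assignment files are valid JSON"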

4. Execute Rebalance with Throttling

# Set throttle to 50 MB/s (52428800 bytes/s)
THROTTLE=52428800
kafka-reassign-partitions.sh \
  --bootstrap-server localhost:9092 \
  --reassignment-json-file /tmp/proposed.json \
  --execute \
  --throttle $THROTTLE
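
After --execute returns, you can confirm the throttle was actually applied as dynamic broker configs (broker ids follow the examples above):

# The throttle appears as leader/follower.replication.throttled.rate on each affected broker
for id in 1 2 3 4 5; do
  kafka-configs.sh --bootstrap-server localhost:9092 \
    --entity-type brokers --entity-name $id --describe | grep -i throttled
done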

5. Real‑Time Monitoring

# Check progress every 30 s; exit once no partition is still in progress
while kafka-reassign-partitions.sh --bootstrap-server localhost:9092 \
    --reassignment-json-file /tmp/proposed.json --verify | grep -q "in progress"; do
  sleep 30
done
# Monitor under-replicated partitions (zero lines of output means healthy)
watch -n 5 "kafka-topics.sh --bootstrap-server localhost:9092 --describe --under-replicated-partitions | wc -l"
# Monitor consumer lag
watch -n 10 "kafka-consumer-groups.sh --bootstrap-server localhost:9092 --describe --group my-consumer-group | grep -E 'TOPIC|LAG'"
# Optional: network bandwidth with iftop or sar

6. Post‑Rebalance Verification & Cleanup

# Verify all partitions are complete
kafka-reassign-partitions.sh --bootstrap-server localhost:9092 \
  --reassignment-json-file /tmp/proposed.json --verify | grep -c "is complete"
# Remove any leftover throttling configuration from all brokers
# (--verify clears the throttle automatically on completion; this is a defensive double-check)
for id in 1 2 3 4 5; do
  kafka-configs.sh --bootstrap-server localhost:9092 \
    --entity-type brokers --entity-name $id \
    --alter --delete-config follower.replication.throttled.rate,leader.replication.throttled.rate || true
done
# Ensure no under‑replicated partitions
kafka-topics.sh --bootstrap-server localhost:9092 --describe --under-replicated-partitions
# Validate balanced leader distribution
kafka-topics.sh --bootstrap-server localhost:9092 --describe --topic my-topic | grep "Leader:" | awk '{print $6}' | sort | uniq -c
# Validate disk usage across brokers
for broker in kafka-broker-{1..5}; do
  echo -n "$broker: "
  ssh $broker "df -h /kafka-logs | tail -n1 | awk '{print \$5}'"
done
# Performance sanity test (optional; create the perf-test topic first if auto-creation is disabled)
kafka-producer-perf-test.sh --topic perf-test --num-records 1000000 --record-size 1024 --throughput -1 --producer-props bootstrap.servers=localhost:9092 acks=all
kafka-consumer-perf-test.sh --bootstrap-server localhost:9092 --topic perf-test --messages 1000000 --threads 4

Core Mechanisms

The partition‑rebalance workflow proceeds as follows: the controller issues a new assignment, the target brokers become followers and fetch data from the current leader, the new replicas catch up to the leader's log end offset (LEO) and join the ISR, an optional leader election runs, and finally the old replicas are removed. The ISR (in‑sync replicas) mechanism guarantees that only fully synchronized replicas can become leaders, which is what prevents data loss.
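
The assignment document the controller consumes is plain JSON. A minimal, hypothetical example of the format (topic name and broker ids are illustrative):

{
  "version": 1,
  "partitions": [
    {"topic": "my-topic-1", "partition": 0, "replicas": [1, 4], "log_dirs": ["any", "any"]},
    {"topic": "my-topic-1", "partition": 1, "replicas": [2, 5], "log_dirs": ["any", "any"]}
  ]
}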

Throttling parameters (leader.replication.throttled.rate and follower.replication.throttled.rate) limit migration bandwidth so that replica fetching cannot saturate the network and degrade production traffic.
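
If a migration proves too slow or too disruptive, the throttle can be adjusted on the fly with kafka-configs.sh instead of cancelling the reassignment; a sketch raising it to 100 MB/s on broker 1 (repeat for each affected broker):

# Raise the replication throttle to 100 MB/s (104857600 bytes/s) on broker 1
kafka-configs.sh --bootstrap-server localhost:9092 \
  --entity-type brokers --entity-name 1 --alter \
  --add-config leader.replication.throttled.rate=104857600,follower.replication.throttled.rate=104857600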

Observability & Alerts

Key JMX metrics to watch during scaling:

kafka.server:type=ReplicaManager,name=UnderReplicatedPartitions – should stay at 0.

kafka.server:type=ReplicaManager,name=IsrShrinksPerSec – should be 0 in steady state.

kafka.server:type=BrokerTopicMetrics,name=BytesInPerSec and BytesOutPerSec – monitor for spikes.

kafka.network:type=RequestChannel,name=RequestQueueSize – keep below 100.
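
These gauges can be sampled from the command line with Kafka's bundled JmxTool, assuming the broker was started with a JMX port exposed (JMX_PORT=9999 here; adjust host and port to your setup):

# Poll the under-replicated-partitions gauge every 5 s over JMX
kafka-run-class.sh kafka.tools.JmxTool \
  --object-name kafka.server:type=ReplicaManager,name=UnderReplicatedPartitions \
  --jmx-url service:jmx:rmi:///jndi/rmi://localhost:9999/jmxrmi \
  --reporting-interval 5000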

Prometheus alert rules should cover under‑replicated partitions, offline partitions, ISR shrink rate, and high disk usage.
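
A minimal sketch of one such rule, assuming a JMX‑exporter setup where the gauge is exported as kafka_server_replicamanager_underreplicatedpartitions (the exact metric name depends entirely on your exporter configuration):

groups:
  - name: kafka-scaling
    rules:
      - alert: KafkaUnderReplicatedPartitions
        expr: kafka_server_replicamanager_underreplicatedpartitions > 0  # metric name is exporter-dependent
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Broker {{ $labels.instance }} has under-replicated partitions"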

Common Faults & Troubleshooting

Typical symptoms such as stuck migration, persistent under‑replicated partitions, consumer lag spikes, broker start‑up failures, or leader election issues are mapped to root causes and concrete remediation steps (adjust throttling, fix disk space, restart brokers, verify ZooKeeper connectivity, increase replication factor, etc.).

FAQ Highlights

Restarting a broker during rebalancing is possible but discouraged; if required, move the controller role first.

Estimated migration time (hours) ≈ data to move (GB) × 1024 ÷ throttle (MB/s) ÷ 3600, multiplied by a 1.5 safety factor. For example, moving 500 GB at a 50 MB/s throttle takes roughly 500 × 1024 ÷ 50 ÷ 3600 ≈ 2.8 hours, or about 4.3 hours with the safety factor.

Consumer lag may increase due to bandwidth saturation or leader elections; mitigate by lowering throttle or adding consumer instances.

Rollback procedure: remove the throttle, then re-execute the original assignment:

kafka-reassign-partitions.sh --bootstrap-server localhost:9092 --reassignment-json-file /tmp/current.json --execute

Verify completion, then optionally decommission the new brokers.

Data integrity verification after scaling can be done with kafka-run-class.sh kafka.tools.GetOffsetShell and kafka-replica-verification.sh.
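
A sketch of both checks (topic name follows the examples above; kafka-replica-verification.sh reports the maximum lag between replicas, which should converge to 0):

# Compare latest (log-end) offsets per partition across the cluster
kafka-run-class.sh kafka.tools.GetOffsetShell --broker-list localhost:9092 --topic my-topic --time -1
# Continuously verify that replicas of matching topics hold identical data
kafka-replica-verification.sh --broker-list localhost:9092 --topic-white-list "my-topic.*"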

Best Practices

Establish performance baselines before scaling (throughput, partition distribution, disk usage).

Enable rack awareness to spread replicas across failure domains.

Consider Cruise Control for automated, optimal rebalancing.

Scale in phases (add one broker, validate, then add the next) to reduce risk.

Deploy comprehensive monitoring and alerting covering under‑replicated partitions, disk usage, network bandwidth, and ISR health.

Periodically rehearse the entire scaling process in a test environment.

Fine‑tune producer (acks=all, idempotence, compression) and consumer (fetch size, max poll) settings to minimise impact; an example follows this list.
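
A hypothetical starting point for those client settings (values are illustrative and should be tuned against your own workload):

# producer.properties (illustrative values)
acks=all
enable.idempotence=true
compression.type=lz4
linger.ms=10
# consumer.properties (illustrative values)
fetch.min.bytes=1048576
fetch.max.wait.ms=500
max.poll.records=500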

Key Scripts

Two ready‑to‑use Bash scripts accompany this guide:

kafka_scale_out.sh – automates broker provisioning, plan generation, throttled execution, progress monitoring, and post‑scale cleanup with rollback support.

kafka_balance_check.sh – validates leader and replica distribution, disk usage, and under‑replicated partitions after scaling.

Both scripts assume standard Kafka installation paths and require sudo privileges to manage systemd services.
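
The full scripts are not reproduced here, but a minimal sketch of the kind of checks kafka_balance_check.sh performs (broker names and paths follow the examples above):

#!/usr/bin/env bash
# Minimal post-scaling balance check: leader spread, URPs, and disk usage
set -euo pipefail
BOOTSTRAP=localhost:9092

echo "== Leaders per broker =="
kafka-topics.sh --bootstrap-server "$BOOTSTRAP" --describe \
  | grep "Leader:" | awk '{print $6}' | sort | uniq -c

echo "== Under-replicated partitions (expect no output) =="
kafka-topics.sh --bootstrap-server "$BOOTSTRAP" --describe --under-replicated-partitions

echo "== Disk usage per broker =="
for broker in kafka-broker-{1..5}; do
  echo -n "$broker: "
  ssh "$broker" "df -h /kafka-logs | tail -n1 | awk '{print \$5}'"
done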
