
Zero Data Loss Kafka Cluster Scaling: From 3 to 10 Nodes – A Complete Guide

This comprehensive guide walks you through expanding or shrinking a production‑grade Kafka cluster—covering prerequisites, anti‑pattern warnings, environment matrices, step‑by‑step expansion and contraction procedures, partition rebalancing principles, monitoring, best practices, and troubleshooting—to ensure zero data loss during scaling.

Ops Community
Kafka Cluster Scaling: Zero Data Loss Partition Rebalancing

This article provides a complete, production‑ready workflow for scaling a Kafka cluster from 3 nodes to up to 10 nodes (or shrinking it) while guaranteeing zero data loss.

1. Applicable Scenarios & Prerequisites

Typical scenarios: traffic growth requiring expansion, cost optimization requiring contraction, node replacement after failure, and cross‑datacenter migration.

Kafka version 3.0+ (recommended 3.6+ with KRaft support).

ZooKeeper 3.6+ (not needed in KRaft mode).

OS: RHEL/CentOS 7.9+ or Ubuntu 20.04+.

Java JDK 11 or 17.

Network: at least 1 Gbps NIC (10 Gbps recommended).

Storage: SSD preferred, or high‑performance HDD RAID 10.

Cluster size: minimum 3 nodes, 5+ nodes recommended for HA.

Replication factor ≥ 2 (3 recommended).

Admin rights on Kafka and OS root/sudo.

Familiarity with Kafka configuration, partition‑replica mechanics, and network tuning.
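The environment requirements above lend themselves to a quick pre‑flight script. A minimal sketch, assuming the data mount `/data/kafka` used later in this guide (the mount point and thresholds are illustrative):

```shell
#!/usr/bin/env bash
# Pre-flight sketch: verify Java is present and the Kafka data mount keeps
# more than 50 % free space for temporary replica data during rebalancing.
set -u

check_free_space() {   # usage: check_free_space <used_percent>
  if [ "$1" -gt 50 ]; then
    echo "FAIL: ${1}% used — need >50% free for replica migration"
    return 1
  fi
  echo "OK: ${1}% used"
}

command -v java >/dev/null && java -version 2>&1 | head -1

# df --output=pcent prints e.g. " 42%"; strip everything but digits.
used=$(df --output=pcent /data/kafka 2>/dev/null | tail -1 | tr -dc '0-9')
check_free_space "${used:-0}"
```

Run this on every node, existing and new, before starting a scaling window.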

2. Anti‑Pattern Warnings

Single‑node Kafka – no replica fault tolerance, scaling is meaningless.

No monitoring – real‑time traffic, latency, and replica sync must be observed.

Insufficient disk space – need > 50 % free space for temporary replica data.

Network bandwidth limits – rebalancing can double traffic; insufficient bandwidth causes timeouts.

Peak‑time operations – avoid scaling during business peaks to reduce load.

3. Alternative Solutions Comparison

Temporary traffic spikes – adjust partition count + client tuning instead of frequent scaling.

Single‑node failure – replace the node directly, no rebalancing needed.

Cost‑sensitive workloads – use managed Kafka (MSK, Alibaba Cloud) with pay‑as‑you‑go auto‑scaling.

Cross‑cloud migration – use MirrorMaker 2.0 for zero‑downtime data sync.

Small data volumes – consider RabbitMQ or RocketMQ; Kafka excels with large data.

4. Environment & Version Matrix

Key components tested:

Kafka 3.6.0 (30 % faster rebalancing than 3.0).

ZooKeeper 3.8.3 (enhanced stability).

KRaft (ZK‑less) supported from Kafka 3.3+ – simplifies architecture.

OS: Ubuntu 22.04 / RHEL 9.1.

Java OpenJDK 17 (better GC performance than 11).

5. Reading Navigation

Quick start (≈30 min): Sections 6 → 7.

Deep dive (≈90 min): Sections 8 → 7 → 10.

Troubleshooting: Sections 11 → 9.

6. Quick Checklist

Expansion Checklist

Preparation

Check current cluster status: kafka-topics.sh --describe.

Back up server.properties.

Provision new node (install Kafka + Java).

Verify disk space > 50 %.

Implementation

Start new broker and join cluster.

Generate partition reassignment plan with kafka-reassign-partitions.sh --generate.

Validate plan (dry‑run).

Execute reassignment with optional throttling.

Wait for ISR synchronization.

Verification

Confirm even partition distribution.

Verify all replicas are in‑sync (no under‑replicated partitions).

Run performance tests to ensure throughput improvement.

Shrinkage Checklist

Preparation

Identify broker(s) to decommission.

Ensure replication factor ≥ 2 for all partitions on those brokers.

Generate exclusion plan (remove broker IDs from replica list).

Implementation

Execute reassignment to move data off the target brokers.

Monitor progress until completion.

Verification

Confirm no partitions reference the decommissioned brokers.

Stop and clean up the old brokers (and ZooKeeper metadata if applicable).

Validate cluster health (no under‑replicated or offline partitions).
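The "no partitions reference the decommissioned brokers" check can be scripted by scanning `kafka-topics.sh --describe` output. A sketch, where the broker ID and the sample line are illustrative; in practice you would pipe real describe output into the helper:

```shell
#!/usr/bin/env bash
# Shrinkage check sketch: confirm a decommissioned broker no longer appears
# in any partition's replica list before stopping it.
broker_gone() {   # usage: kafka-topics.sh --describe ... | broker_gone <broker_id>
  if grep -Eq "Replicas: ([0-9]+,)*${1}(,|[[:space:]]|\$)"; then
    echo "broker ${1} still holds replicas"
    return 1
  fi
  echo "broker ${1} fully drained"
}

# Example against a captured describe line (broker 4 was decommissioned):
echo "Topic: my-topic Partition: 0 Leader: 1 Replicas: 1,2,3 Isr: 1,2,3" \
  | broker_gone 4
```

Only stop and wipe a broker once this reports it fully drained for every topic.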

7. Implementation Steps

Kafka Partition Rebalancing Architecture

[Kafka Cluster Architecture & Partition Distribution]

Initial state (3-broker cluster)
 ├─ Broker 0 (192.168.1.10)
 │   ├─ Topic: my-topic, Partition 0 (Leader)
 │   ├─ Topic: my-topic, Partition 3 (Follower)
 │   └─ Topic: my-topic, Partition 5 (Follower)
 ├─ Broker 1 (192.168.1.11)
 │   ├─ Topic: my-topic, Partition 1 (Leader)
 │   ├─ Topic: my-topic, Partition 4 (Follower)
 │   └─ Topic: my-topic, Partition 0 (Follower)
 └─ Broker 2 (192.168.1.12)
     ├─ Topic: my-topic, Partition 2 (Leader)
     ├─ Topic: my-topic, Partition 5 (Leader)
     └─ Topic: my-topic, Partition 1 (Follower)

After expansion (5-broker cluster)
 ├─ New Broker 3 (192.168.1.13)
 │   ├─ Topic: my-topic, Partition 0 (Follower)
 │   └─ Topic: my-topic, Partition 3 (Leader)
 ├─ New Broker 4 (192.168.1.14)
 │   ├─ Topic: my-topic, Partition 1 (Follower)
 │   └─ Topic: my-topic, Partition 4 (Leader)
 └─ Existing Brokers 0‑2 retain a subset of partitions

[Partition Rebalancing Flow]
Step 1: Generate the reassignment plan
  kafka-reassign-partitions.sh --generate …
Step 2: Execute the reassignment
  kafka-reassign-partitions.sh --execute …
Step 3: Replica synchronization (ISR, HW, LEO)
Step 4: Optional preferred leader election
Step 5: Clean up old replica data

Expansion Step 1 – Prepare New Nodes

Goal: Deploy Kafka on new machines and join them to the cluster.

# Install Java (Ubuntu)
apt update && apt install -y openjdk-17-jdk
java -version

# Download Kafka 3.6.0
cd /opt
wget https://archive.apache.org/dist/kafka/3.6.0/kafka_2.13-3.6.0.tgz
tar -xzf kafka_2.13-3.6.0.tgz
ln -s kafka_2.13-3.6.0 kafka

# Create data directory
mkdir -p /data/kafka/logs
chown -R kafka:kafka /data/kafka

Configure new broker (core parameters)

vi /opt/kafka/config/server.properties

# Example snippet
broker.id=3
listeners=PLAINTEXT://192.168.1.13:9092
advertised.listeners=PLAINTEXT://192.168.1.13:9092
log.dirs=/data/kafka/logs
zookeeper.connect=192.168.1.10:2181,192.168.1.11:2181,192.168.1.12:2181
num.network.threads=8
num.io.threads=16
socket.send.buffer.bytes=102400
socket.receive.buffer.bytes=102400
replica.lag.time.max.ms=30000
compression.type=snappy

Start the broker

# Foreground (testing)
/opt/kafka/bin/kafka-server-start.sh /opt/kafka/config/server.properties

# Background (production)
nohup /opt/kafka/bin/kafka-server-start.sh /opt/kafka/config/server.properties > /data/kafka/kafka.log 2>&1 &

# Systemd service (recommended)
cat > /etc/systemd/system/kafka.service <<EOF
[Unit]
Description=Apache Kafka Server
After=network.target
[Service]
Type=simple
User=kafka
ExecStart=/opt/kafka/bin/kafka-server-start.sh /opt/kafka/config/server.properties
ExecStop=/opt/kafka/bin/kafka-server-stop.sh
Restart=on-failure
[Install]
WantedBy=multi-user.target
EOF
systemctl daemon-reload
systemctl start kafka
systemctl enable kafka

Post‑start verification

# Check process
ps aux | grep kafka
# Verify port listening
netstat -tunlp | grep 9092
# Confirm broker joined cluster
/opt/kafka/bin/kafka-broker-api-versions.sh --bootstrap-server 192.168.1.13:9092
# List broker IDs (ZooKeeper mode only; skip under KRaft)
/opt/kafka/bin/zookeeper-shell.sh 192.168.1.10:2181 ls /brokers/ids

Expansion Step 2 – Generate Reassignment Plan

Goal: Create a JSON plan that evenly distributes partitions across all brokers.

# Create topic list
cat > topics-to-move.json <<EOF
{
  "topics": [
    {"topic": "my-topic"},
    {"topic": "order-events"},
    {"topic": "user-logs"}
  ],
  "version": 1
}
EOF

# Generate plan (spread partitions across brokers 0‑4, including the new ones)
/opt/kafka/bin/kafka-reassign-partitions.sh \
  --bootstrap-server 192.168.1.10:9092 \
  --topics-to-move-json-file topics-to-move.json \
  --broker-list "0,1,2,3,4" \
  --generate > plan-output.txt

# --generate prints BOTH the current assignment and the proposed one,
# each as a single JSON line. Keep the current assignment for rollback
# and feed only the proposed one to --execute.
grep '^{' plan-output.txt | head -1 > rollback-plan.json
grep '^{' plan-output.txt | tail -1 > reassignment-plan.json

# Inspect the proposed plan
jq '.' reassignment-plan.json

Expansion Step 3 – Execute Reassignment

Goal: Start data migration with optional throttling.

/opt/kafka/bin/kafka-reassign-partitions.sh \
  --bootstrap-server 192.168.1.10:9092 \
  --reassignment-json-file reassignment-plan.json \
  --execute \
  --throttle 50000000   # 50 MB/s, adjust per bandwidth

Monitor progress

# Verify status
/opt/kafka/bin/kafka-reassign-partitions.sh \
  --bootstrap-server 192.168.1.10:9092 \
  --reassignment-json-file reassignment-plan.json \
  --verify

# Loop until no partition reports "in progress"
while /opt/kafka/bin/kafka-reassign-partitions.sh \
    --bootstrap-server 192.168.1.10:9092 \
    --reassignment-json-file reassignment-plan.json \
    --verify | grep -q "in progress"; do
  sleep 10
done
echo "Reassignment completed!"

Check for under‑replicated partitions and confirm none remain. Do not skip the final --verify run: once the reassignment completes, it also removes the replication throttles that --execute applied.
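One way to script the health check, reusing the example cluster's bootstrap address; the helper simply interprets a count (a sketch, not part of the Kafka tooling):

```shell
#!/usr/bin/env bash
# Post-reassignment health sketch: zero under-replicated partitions is the
# pass condition before declaring the expansion complete.
report_urp() {   # usage: report_urp <count>
  if [ "$1" -eq 0 ]; then
    echo "all replicas in sync"
  else
    echo "WARNING: $1 under-replicated partitions"
  fi
}

count=0
if [ -x /opt/kafka/bin/kafka-topics.sh ]; then
  # --under-replicated-partitions prints one line per lagging partition
  count=$(/opt/kafka/bin/kafka-topics.sh \
    --bootstrap-server 192.168.1.10:9092 \
    --describe --under-replicated-partitions | grep -c "Partition:")
fi
report_urp "$count"
```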

Expansion Step 4 – Preferred Leader Election (Optional)

# Elect preferred leaders for all topics
/opt/kafka/bin/kafka-leader-election.sh \
  --bootstrap-server 192.168.1.10:9092 \
  --election-type preferred \
  --all-topic-partitions

Verify leader distribution is balanced across brokers.
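Leader balance can be eyeballed by tallying the `Leader:` field in describe output per broker. A sketch; the sample lines stand in for real `kafka-topics.sh --describe` output:

```shell
#!/usr/bin/env bash
# Sketch: count leaders per broker. Roughly equal counts mean leadership
# (and therefore client read/write load) is evenly spread.
leader_counts() {
  awk '{ for (i = 1; i < NF; i++) if ($i == "Leader:") count[$(i+1)]++ }
       END { for (b in count) print "broker " b ": " count[b] " leader(s)" }' \
  | sort
}

# Example with two captured describe lines:
printf '%s\n%s\n' \
  "Topic: my-topic Partition: 0 Leader: 1 Replicas: 1,2 Isr: 1,2" \
  "Topic: my-topic Partition: 1 Leader: 2 Replicas: 2,3 Isr: 2,3" \
  | leader_counts
```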

Shrinkage Procedure (Mirror of Expansion)

Generate a plan that excludes the brokers to be decommissioned, execute the reassignment, monitor until completion, then stop and clean up the old brokers and their metadata.
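The exclusion plan is generated exactly like the expansion plan, except the drained brokers are omitted from --broker-list. A sketch, with broker IDs 3 and 4 as the illustrative decommission targets and a small helper for building the list:

```shell
#!/usr/bin/env bash
# Shrinkage sketch: build a --broker-list omitting the brokers being drained,
# then ask the tool to propose replicas only on the remaining brokers.
exclude_brokers() {   # usage: exclude_brokers "<all ids>" "<ids to drop>"
  local all=$1 drop=$2 out="" b
  for b in $all; do
    case " $drop " in
      *" $b "*) ;;                       # skip drained broker
      *) out="${out:+$out,}$b" ;;
    esac
  done
  echo "$out"
}

KEEP=$(exclude_brokers "0 1 2 3 4" "3 4")
echo "remaining brokers: $KEEP"          # -> remaining brokers: 0,1,2

if [ -x /opt/kafka/bin/kafka-reassign-partitions.sh ]; then
  /opt/kafka/bin/kafka-reassign-partitions.sh \
    --bootstrap-server 192.168.1.10:9092 \
    --topics-to-move-json-file topics-to-move.json \
    --broker-list "$KEEP" \
    --generate
fi
```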

8. Partition Rebalancing Principles

Kafka does not automatically move partitions after adding nodes to avoid performance spikes. Administrators must manually create and execute a reassignment plan, optionally throttling the data transfer to protect production traffic.

Key Mechanisms

ISR (In‑Sync Replicas) – replicas that are fully caught up with the leader; only ISR members can become leaders.

HW (High Watermark) – the offset all ISR replicas have replicated; consumers can only read up to HW.

LEO (Log End Offset) – the latest offset of a replica’s log.
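The relationship between HW and LEO can be shown with a toy calculation (a conceptual sketch, not a Kafka command): the high watermark is the minimum LEO across the ISR, so consumers never read data that some in‑sync replica has not yet written.

```shell
# Toy illustration: HW = min(LEO over ISR members).
hw() { printf '%s\n' "$@" | sort -n | head -1; }
hw 120 118 121   # LEOs of three ISR replicas -> prints 118
```

During rebalancing, a new replica's LEO climbs toward the leader's; only once it catches up does it join the ISR and start constraining the HW.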

9. Monitoring & Validation

Use Prometheus + JMX Exporter to ensure:

under‑replicated partitions = 0

offline partitions = 0

ISR shrink rate = 0

request queue size stays below 100

Run producer performance tests before and after scaling to verify throughput gains.
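A before/after comparison can be run with kafka-producer-perf-test.sh, which ships with Kafka. A sketch; the topic name, record counts, and sample MB/s figures are illustrative:

```shell
#!/usr/bin/env bash
# Perf-comparison sketch: run the same load before and after scaling and
# compare the reported MB/s figures.
run_perf() {
  /opt/kafka/bin/kafka-producer-perf-test.sh \
    --topic my-topic \
    --num-records 1000000 \
    --record-size 1024 \
    --throughput -1 \
    --producer-props bootstrap.servers=192.168.1.10:9092
}

# Helper: percentage gain between the before/after MB/s readings.
gain_pct() { awk -v a="$1" -v b="$2" 'BEGIN { printf "%.1f\n", (b - a) / a * 100 }'; }

gain_pct 85.0 140.0   # e.g. 3-node baseline vs 5-node result -> 64.7
```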

10. Best Practices

Perform scaling during low‑traffic windows (e.g., 02:00‑05:00).

Start with moderate throttling (≈50 % of network capacity) and adjust as needed.

Continuously monitor ISR status to confirm replica sync.

Backup ZooKeeper data (or KRaft metadata) before any change.

Scale in small batches (2‑3 nodes at a time) to limit impact.

Validate consumer lag remains stable after rebalancing.

Prefer KRaft mode for new clusters to simplify operations.

Run Preferred Leader Election weekly to keep load balanced.
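The weekly election can be automated with a cron entry. A sketch of a cron.d fragment; the schedule, user, and bootstrap address are illustrative:

```
# /etc/cron.d/kafka-preferred-leader — run preferred leader election
# every Sunday at 03:00, inside a low-traffic window.
0 3 * * 0 kafka /opt/kafka/bin/kafka-leader-election.sh --bootstrap-server 192.168.1.10:9092 --election-type preferred --all-topic-partitions
```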

11. Common Issues & Troubleshooting

Reassignment stuck – check network connectivity and disk space.

Many under‑replicated partitions – allow more time for sync, or raise the throttle limit so replicas catch up faster.

Consumer lag spikes – lower the throttle limit or pause the rebalancing.

Broker fails to start – verify port availability and configuration syntax.

12. FAQ

Q1: Will expansion affect production? Slight impact; with proper throttling latency increase stays < 10 %.

Q2: Can a reassignment be rolled back? Not directly; generate a reverse plan and re‑execute.

Q3: How does KRaft differ? Same steps, but ZooKeeper operations are omitted.

Q4: How to speed up reassignment? Raise the throttle limit, use SSDs, or upgrade network bandwidth.

Q5: How to ensure safety before shrinkage? Verify replication factor ≥ 2 and that all data has migrated off the target broker.
