Zero Data Loss Kafka Cluster Scaling: From 3 to 10 Nodes – A Complete Guide
This comprehensive guide walks you through expanding or shrinking a production‑grade Kafka cluster—covering prerequisites, anti‑pattern warnings, environment matrices, step‑by‑step expansion and contraction procedures, partition rebalancing principles, monitoring, best practices, and troubleshooting—to ensure zero data loss during scaling.
1. Applicable Scenarios & Prerequisites
Traffic growth requiring expansion, cost optimization requiring shrinkage, node failure replacement, cross‑datacenter migration.
Kafka version 3.0+ (recommended 3.6+ with KRaft support).
ZooKeeper 3.6+ (not needed in KRaft mode).
OS: RHEL/CentOS 7.9+ or Ubuntu 20.04+.
Java JDK 11 or 17.
Network: at least 1 Gbps NIC (10 Gbps recommended).
Storage: SSD preferred, or high‑performance HDD RAID 10.
Cluster size: minimum 3 nodes, 5+ nodes recommended for HA.
Replication factor ≥ 2 (3 recommended).
Admin rights on Kafka and OS root/sudo.
Familiarity with Kafka configuration, partition‑replica mechanics, and network tuning.
2. Anti‑Pattern Warnings
Single‑node Kafka – no replica fault tolerance, scaling is meaningless.
No monitoring – real‑time traffic, latency, and replica sync must be observed.
Insufficient disk space – need > 50 % free space for temporary replica data.
Network bandwidth limits – rebalancing can double traffic; insufficient bandwidth causes timeouts.
Peak‑time operations – avoid scaling during business peaks to reduce load.
3. Alternative Solutions Comparison
Temporary traffic spikes – adjust partition count + client tuning instead of frequent scaling.
Single‑node failure – replace the node directly, no rebalancing needed.
Cost‑sensitive workloads – use managed Kafka (MSK, Alibaba Cloud) with pay‑as‑you‑go auto‑scaling.
Cross‑cloud migration – use MirrorMaker 2.0 for zero‑downtime data sync (a minimal config sketch follows this list).
Small data volumes – consider RabbitMQ or RocketMQ; Kafka excels with large data.
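For the MirrorMaker 2.0 option above, a minimal mm2.properties sketch; the cluster aliases and bootstrap addresses are illustrative placeholders, not values from this guide:
# mm2.properties – replicate everything from "source" to "target"
clusters = source, target
source.bootstrap.servers = 192.168.1.10:9092
target.bootstrap.servers = 10.0.0.10:9092
source->target.enabled = true
source->target.topics = .*
replication.factor = 3
# Run with: /opt/kafka/bin/connect-mirror-maker.sh mm2.properties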
4. Environment & Version Matrix
Key components tested:
Kafka 3.6.0 (30 % faster rebalancing than 3.0).
ZooKeeper 3.8.3 (enhanced stability).
KRaft (ZK‑less) supported from Kafka 3.3+ – simplifies architecture (see the quorum‑check sketch after this list).
OS: Ubuntu 22.04 / RHEL 9.1.
Java OpenJDK 17 (better GC performance than 11).
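For KRaft clusters, controller quorum health can be checked with kafka-metadata-quorum.sh, which ships with Kafka 3.3+; a minimal sketch, reusing the broker address from later in this guide:
# Show quorum leader, epoch, and voter status
/opt/kafka/bin/kafka-metadata-quorum.sh --bootstrap-server 192.168.1.10:9092 describe --status
# Show per-voter replication lag
/opt/kafka/bin/kafka-metadata-quorum.sh --bootstrap-server 192.168.1.10:9092 describe --replication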
5. Reading Navigation
Quick start (≈30 min): Sections 6 → 7 → 9.
Deep dive (≈90 min): Sections 8 → 7 → 10.
Troubleshooting: Sections 11 → 9.
6. Quick Checklist
Expansion Checklist
Preparation
Check current cluster status: kafka-topics.sh --describe.
Back up server.properties.
Provision new node (install Kafka + Java).
Verify > 50 % free disk space.
Implementation
Start new broker and join cluster.
Generate partition reassignment plan with kafka-reassign-partitions.sh --generate.
Validate plan (dry‑run).
Execute reassignment with optional throttling.
Wait for ISR synchronization.
Verification
Confirm even partition distribution.
Verify all replicas are in‑sync (no under‑replicated partitions).
Run performance tests to ensure throughput improvement.
Shrinkage Checklist
Preparation
Identify broker(s) to decommission.
Ensure replication factor ≥ 2 for all partitions on those brokers.
Generate exclusion plan (remove broker IDs from replica list).
Implementation
Execute reassignment to move data off the target brokers.
Monitor progress until completion.
Verification
Confirm no partitions reference the decommissioned brokers.
Stop and clean up the old brokers (and ZooKeeper metadata if applicable).
Validate cluster health (no under‑replicated or offline partitions).
7. Implementation Steps
Kafka Partition Rebalancing Architecture
[Kafka Cluster Architecture & Partition Distribution]
Initial state (3-broker cluster)
├─ Broker 0 (192.168.1.10)
│   ├─ Topic: my-topic, Partition 0 (Leader)
│   ├─ Topic: my-topic, Partition 3 (Follower)
│   └─ Topic: my-topic, Partition 5 (Follower)
├─ Broker 1 (192.168.1.11)
│   ├─ Topic: my-topic, Partition 1 (Leader)
│   ├─ Topic: my-topic, Partition 4 (Follower)
│   └─ Topic: my-topic, Partition 0 (Follower)
└─ Broker 2 (192.168.1.12)
    ├─ Topic: my-topic, Partition 2 (Leader)
    ├─ Topic: my-topic, Partition 5 (Leader)
    └─ Topic: my-topic, Partition 1 (Follower)
After expansion (5-broker cluster)
├─ New Broker 3 (192.168.1.13)
│   ├─ Topic: my-topic, Partition 0 (Follower)
│   └─ Topic: my-topic, Partition 3 (Leader)
├─ New Broker 4 (192.168.1.14)
│   ├─ Topic: my-topic, Partition 1 (Follower)
│   └─ Topic: my-topic, Partition 4 (Leader)
└─ Original Brokers 0-2 retain the remaining partitions
[Partition Rebalancing Workflow]
Step 1: Generate the reassignment plan – kafka-reassign-partitions.sh --generate …
Step 2: Execute the reassignment – kafka-reassign-partitions.sh --execute …
Step 3: Replica synchronization (ISR, HW, LEO)
Step 4: Optional preferred leader election
Step 5: Clean up old replica data
Expansion Step 1 – Prepare New Nodes
Goal: Deploy Kafka on new machines and join them to the cluster.
# Install Java (Ubuntu)
apt update && apt install -y openjdk-17-jdk
java -version
# Download Kafka 3.6.0
cd /opt
wget https://archive.apache.org/dist/kafka/3.6.0/kafka_2.13-3.6.0.tgz
tar -xzf kafka_2.13-3.6.0.tgz
ln -s kafka_2.13-3.6.0 kafka
# Create data directory
mkdir -p /data/kafka/logs
chown -R kafka:kafka /data/kafka
Configure the new broker (core parameters)
vi /opt/kafka/config/server.properties
# Example snippet
broker.id=3                      # must be unique; use the next free ID in the cluster
listeners=PLAINTEXT://192.168.1.13:9092
advertised.listeners=PLAINTEXT://192.168.1.13:9092
log.dirs=/data/kafka/logs
zookeeper.connect=192.168.1.10:2181,192.168.1.11:2181,192.168.1.12:2181   # omit in KRaft mode
num.network.threads=8
num.io.threads=16
socket.send.buffer.bytes=102400
socket.receive.buffer.bytes=102400
replica.lag.time.max.ms=30000    # follower lag tolerance before ISR eviction
compression.type=snappy
Start the broker
# Foreground (testing)
/opt/kafka/bin/kafka-server-start.sh /opt/kafka/config/server.properties
# Background (production)
nohup /opt/kafka/bin/kafka-server-start.sh /opt/kafka/config/server.properties > /data/kafka/kafka.log 2>&1 &
# Systemd service (recommended)
cat > /etc/systemd/system/kafka.service <<EOF
[Unit]
Description=Apache Kafka Server
After=network.target
[Service]
Type=simple
User=kafka
ExecStart=/opt/kafka/bin/kafka-server-start.sh /opt/kafka/config/server.properties
ExecStop=/opt/kafka/bin/kafka-server-stop.sh
Restart=on-failure
[Install]
WantedBy=multi-user.target
EOF
systemctl daemon-reload
systemctl start kafka
systemctl enable kafka
Post‑start verification
# Check process
ps aux | grep kafka
# Verify port listening
netstat -tunlp | grep 9092
# Confirm broker joined cluster
/opt/kafka/bin/kafka-broker-api-versions.sh --bootstrap-server 192.168.1.13:9092
# List broker IDs
/opt/kafka/bin/zookeeper-shell.sh 192.168.1.10:2181 <<< "ls /brokers/ids"
Expansion Step 2 – Generate Reassignment Plan
Goal: Create a JSON plan that evenly distributes partitions across all brokers.
# Create topic list
cat > topics-to-move.json <<EOF
{
"topics": [
{"topic": "my-topic"},
{"topic": "order-events"},
{"topic": "user-logs"}
],
"version": 1
}
EOF
# Generate plan across all brokers: existing (0-2) plus new (3-4)
/opt/kafka/bin/kafka-reassign-partitions.sh \
  --bootstrap-server 192.168.1.10:9092 \
  --topics-to-move-json-file topics-to-move.json \
  --broker-list "0,1,2,3,4" \
  --generate
# The command prints two JSON blocks: save the "Current partition replica
# assignment" block as rollback.json (needed for any rollback) and the
# "Proposed partition reassignment configuration" block as reassignment-plan.json
# Inspect the proposed plan
jq '.' reassignment-plan.json
Expansion Step 3 – Execute Reassignment
Goal: Start data migration with optional throttling.
/opt/kafka/bin/kafka-reassign-partitions.sh \
--bootstrap-server 192.168.1.10:9092 \
--reassignment-json-file reassignment-plan.json \
--execute \
  --throttle 50000000  # 50 MB/s; adjust to available bandwidth
Monitor progress
# Verify status
/opt/kafka/bin/kafka-reassign-partitions.sh \
--bootstrap-server 192.168.1.10:9092 \
--reassignment-json-file reassignment-plan.json \
--verify
# Loop until complete; --verify also removes the replication throttle
# once every reassignment has finished
while /opt/kafka/bin/kafka-reassign-partitions.sh \
    --bootstrap-server 192.168.1.10:9092 \
    --reassignment-json-file reassignment-plan.json \
    --verify | grep -q "in progress"; do
  sleep 10
done
echo "Reassignment completed!"
Finally, check for under‑replicated partitions and confirm the list is empty.
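If the migration is too slow or too disruptive, the throttle can be adjusted mid‑flight via Kafka's dynamic broker configs (these are the settings --throttle writes under the hood); a sketch raising broker 3 to 100 MB/s, to be repeated for each broker involved:
# Raise leader- and follower-side replication throttles on broker 3
/opt/kafka/bin/kafka-configs.sh --bootstrap-server 192.168.1.10:9092 \
  --entity-type brokers --entity-name 3 --alter \
  --add-config leader.replication.throttled.rate=100000000,follower.replication.throttled.rate=100000000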
Expansion Step 4 – Preferred Leader Election (Optional)
# Elect preferred leaders for all topics
/opt/kafka/bin/kafka-leader-election.sh \
--bootstrap-server 192.168.1.10:9092 \
--election-type preferred \
  --all-topic-partitions
Verify that leader distribution is balanced across the brokers.
Shrinkage Procedure (Mirror of Expansion)
Generate a plan that excludes the brokers to be decommissioned, execute the reassignment, monitor until completion, then stop and clean up the old brokers and their metadata.
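A minimal sketch of the shrink flow, assuming broker 4 is being decommissioned and the topics-to-move.json from Expansion Step 2 is reused:
# Broker 4 is deliberately absent from --broker-list
/opt/kafka/bin/kafka-reassign-partitions.sh \
  --bootstrap-server 192.168.1.10:9092 \
  --topics-to-move-json-file topics-to-move.json \
  --broker-list "0,1,2,3" \
  --generate
# Save the proposed JSON, then --execute and --verify exactly as in
# Expansion Step 3. Once no partition references broker 4, stop it:
systemctl stop kafka   # run on the decommissioned node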
8. Partition Rebalancing Principles
Kafka does not automatically move partitions after adding nodes to avoid performance spikes. Administrators must manually create and execute a reassignment plan, optionally throttling the data transfer to protect production traffic.
Key Mechanisms
ISR (In‑Sync Replicas) – replicas that are fully caught up with the leader; only ISR members can become leaders.
HW (High Watermark) – the offset all ISR replicas have replicated; consumers can only read up to HW.
LEO (Log End Offset) – the latest offset of a replica’s log.
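These values can be observed from the CLI; a sketch using the my-topic example, assuming Kafka 3.x (which ships kafka-get-offsets.sh):
# The Isr column shows which replicas are in sync with each leader
/opt/kafka/bin/kafka-topics.sh --bootstrap-server 192.168.1.10:9092 \
  --describe --topic my-topic
# Latest offset per partition (the leader's LEO); --time -1 means "latest"
/opt/kafka/bin/kafka-get-offsets.sh --bootstrap-server 192.168.1.10:9092 \
  --topic my-topic --time -1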
9. Monitoring & Validation
Use Prometheus + JMX Exporter to ensure:
under‑replicated partitions = 0
offline partitions = 0
ISR shrink rate = 0
request queue size stays below 100
Run producer performance tests before and after scaling to verify throughput gains.
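The same checks are available without Prometheus; a sketch using the stock CLI tools:
# Both commands should print nothing on a healthy cluster
/opt/kafka/bin/kafka-topics.sh --bootstrap-server 192.168.1.10:9092 \
  --describe --under-replicated-partitions
/opt/kafka/bin/kafka-topics.sh --bootstrap-server 192.168.1.10:9092 \
  --describe --unavailable-partitions
# Producer throughput baseline: 1M records of 1 KB, unthrottled, acks=all
/opt/kafka/bin/kafka-producer-perf-test.sh --topic my-topic \
  --num-records 1000000 --record-size 1024 --throughput -1 \
  --producer-props bootstrap.servers=192.168.1.10:9092 acks=all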
10. Best Practices
Perform scaling during low‑traffic windows (e.g., 02:00‑05:00).
Start with moderate throttling (≈50 % of network capacity) and adjust as needed.
Continuously monitor ISR status to confirm replica sync.
Backup ZooKeeper data (or KRaft metadata) before any change.
Scale in small batches (2‑3 nodes at a time) to limit impact.
Validate consumer lag remains stable after rebalancing.
Prefer KRaft mode for new clusters to simplify operations.
Run Preferred Leader Election weekly to keep load balanced (a cron sketch follows this list).
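For the weekly election above, a cron sketch; the schedule, log path, and kafka user are assumptions to adapt:
# /etc/cron.d/kafka-preferred-leader – every Sunday at 03:00
0 3 * * 0 kafka /opt/kafka/bin/kafka-leader-election.sh --bootstrap-server 192.168.1.10:9092 --election-type preferred --all-topic-partitions >> /var/log/kafka-ple.log 2>&1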
11. Common Issues & Troubleshooting
Reassignment stuck – check network connectivity and disk space (see the diagnostic sketch after this list).
Many under‑replicated partitions – wait for sync to catch up, or raise the throttle limit.
Consumer lag spikes – lower the throttle limit or postpone the rebalancing.
Broker fails to start – verify port availability and configuration syntax.
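A quick diagnostic sketch for a stuck reassignment; paths and addresses follow the examples in this guide:
df -h /data/kafka                   # disk space on every broker
ss -tnlp | grep 9092                # broker port is listening
ping -c 3 192.168.1.13              # new node is reachable
tail -f /opt/kafka/logs/server.log  # look for replication/fetch errors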
12. FAQ
Q1: Will expansion affect production? Slight impact; with proper throttling latency increase stays < 10 %.
Q2: Can a reassignment be rolled back? Not directly; re‑execute the original assignment that --generate printed (see the sketch after this FAQ).
Q3: How does KRaft differ? Same steps, but ZooKeeper operations are omitted.
Q4: How to speed up reassignment? Raise the throttle limit, use SSDs, or upgrade network bandwidth.
Q5: How to ensure safety before shrinkage? Verify replication factor ≥ 2 and that all data has migrated off the target broker.
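Rollback sketch for Q2, assuming the "Current partition replica assignment" block from --generate was saved as rollback.json (as recommended in Expansion Step 2):
/opt/kafka/bin/kafka-reassign-partitions.sh \
  --bootstrap-server 192.168.1.10:9092 \
  --reassignment-json-file rollback.json \
  --execute --throttle 50000000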