How to Minimize Data Movement When Scaling Kafka Replicas
This article explores strategies for batch-scaling Kafka replicas with minimal data migration. It presents two design ideas, walks through the calculation of the broker list, partition count, start index, and replica shift, and gives step-by-step algorithms for computing replica assignments in both expansion and shrinking scenarios.
Background
When scaling Kafka replicas in bulk, using the default --generate reassignment algorithm can cause massive data migration because many partitions change their leader and follower brokers. The goal is to design a method that keeps the movement as small as possible.
Idea 1 – Minimal‑change reassignment
Kafka does not support direct replica scaling, but the kafka-reassign-partitions.sh tool can be used. Manually configuring each topic’s replica placement is error‑prone and leads to unbalanced assignments. The idea is to compute a new assignment based on the existing broker‑to‑replica mapping while only changing the replica count.
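For reference, the reassignment file that kafka-reassign-partitions.sh consumes has the following shape (schematic only; the topic name and broker ids here are illustrative, not from the article):

```json
{
  "version": 1,
  "partitions": [
    {"topic": "my-topic", "partition": 0, "replicas": [0, 1, 4]},
    {"topic": "my-topic", "partition": 1, "replicas": [1, 4, 2]}
  ]
}
```

Hand-writing this file per topic is exactly the error-prone step the two ideas below try to automate.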
Key variables
BrokerList – the ordered list of brokers participating in the assignment.
Partition count – total number of partitions.
Replica count – desired number of replicas per partition.
startIndex – the index of the first replica of the first partition in BrokerList.
nextReplicaShift – a random offset used to compute the position of the second replica relative to the first.
Step‑by‑step calculation (case: no prior partition expansion)
Read the current replica assignment from ZooKeeper (example JSON shown).
Derive BrokerList by taking the first replica of each partition and ordering them into a list that covers every broker exactly once (e.g., {2,3,0,1,4}).
Set partition count = 10 and replica count = 3.
Compute startIndex by locating the first replica of partition 0 in BrokerList (here it is 0).
Determine nextReplicaShift by examining the offset between the first and second replicas of the first few partitions (example value = 3).
With these parameters, calling AdminUtils.assignReplicasToBrokersRackUnaware reproduces the original assignment; changing only the replica count yields a minimal‑change reassignment. If the original assignment was manually specified, this method cannot be used and a full recomputation is required.
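The logic of AdminUtils.assignReplicasToBrokersRackUnaware can be sketched in Python roughly as follows (a port of the upstream Scala algorithm; passing startIndex and nextReplicaShift in explicitly, instead of letting Kafka randomize them, is what makes reproducing the original assignment possible):

```python
def assign_replicas_rack_unaware(n_partitions, replication_factor, broker_list,
                                 start_index, next_replica_shift):
    """Sketch of Kafka's rack-unaware replica assignment.

    Fixing start_index and next_replica_shift lets us reproduce an existing
    assignment and then change only the replication factor, so the original
    replicas stay where they are.
    """
    n_brokers = len(broker_list)
    assignment = {}
    shift = next_replica_shift
    for partition in range(n_partitions):
        # After every full pass over the broker list the shift is bumped,
        # so replica sets are staggered rather than repeating.
        if partition > 0 and partition % n_brokers == 0:
            shift += 1
        first = (partition + start_index) % n_brokers
        replicas = [broker_list[first]]
        for j in range(replication_factor - 1):
            # The j-th follower sits 1 + (shift + j) % (n_brokers - 1) slots
            # after the leader, guaranteeing a different broker.
            offset = 1 + (shift + j) % (n_brokers - 1)
            replicas.append(broker_list[(first + offset) % n_brokers])
        assignment[partition] = replicas
    return assignment
```

Because followers are computed in order independently of the replication factor, raising the factor from 2 to 3 leaves each partition's existing replicas untouched and only appends one new broker, which is the minimal-change property Idea 1 relies on.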
Idea 2 – Simple sequential shift
This approach ignores most of the variables from Idea 1. For each partition, the last replica is moved to the next available broker (or removed when shrinking). It works regardless of whether partitions were previously expanded or manually assigned.
Replica expansion example
"0": [0,1,4] => [0,1,4,2]
"1": [1,4,2] => [1,4,2,3]
"2": [4,2,3] => [4,2,3,1]
"3": [3,4,0] => [3,4,0,1]
"4": [4,0,1] => [4,0,1,2]
"5": [0,2,3] => [0,2,3,4]
When only a single replica exists, the algorithm simply sorts partitions by number, builds a BrokerList, and applies the same sequential shift.
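One plausible implementation of the sequential shift for expansion (my sketch, not the article's unreleased code) is: for each partition, walk forward from the current last replica and append the first broker not already holding a replica. Note that the article's table shows partition 2 gaining broker 1 rather than 0, presumably due to a balancing adjustment like the safeguard it applies when shrinking; this sketch implements only the basic rule.

```python
def expand_replicas(assignment, brokers):
    """Grow each partition's replica set by one, moving no existing data.

    assignment: dict partition -> list of broker ids (current replicas)
    brokers:    ordered list of all broker ids in the cluster
    Only the newly appended replica is copied; every existing replica
    keeps its broker.
    """
    new_assignment = {}
    n = len(brokers)
    for partition, replicas in assignment.items():
        pos = brokers.index(replicas[-1])
        for step in range(1, n + 1):
            candidate = brokers[(pos + step) % n]
            if candidate not in replicas:
                new_assignment[partition] = replicas + [candidate]
                break
        else:
            # Replica count already equals broker count; nothing to add.
            new_assignment[partition] = list(replicas)
    return new_assignment
```

On the example above this reproduces every row except partition 2, and for every partition the old replica list is preserved as a prefix of the new one.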
Replica shrinking example
"0": [0,1,4] => [0,1]
"1": [1,4,2] => [1,4]
"2": [4,2,3] => [4,2]
"3": [3,4,0] => [3,4]
"4": [4,0,1] => [4,0]
"5": [0,2,3] => [0,2]
The method also includes a safeguard: if the last replica of many partitions resides on the same broker, removing them all at once could unbalance the cluster. The algorithm checks for such cases and shifts the removal to the previous broker when necessary.
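The shrink path, including a version of the balancing safeguard, can be sketched as follows. The per-broker removal cap used here (an even share of removals, rounded up) is my assumption; the article does not state the exact bound it checks.

```python
from collections import Counter

def shrink_replicas(assignment, target_rf):
    """Shrink each partition to target_rf replicas.

    Prefers to drop the last replica, but if one broker has already absorbed
    its share of removals, an earlier replica is dropped instead -- a sketch
    of the safeguard against unbalancing the cluster.
    """
    removed = Counter()
    all_brokers = {b for replicas in assignment.values() for b in replicas}
    # Even share of removals per broker, rounded up (assumed bound).
    cap = -(-len(assignment) // len(all_brokers))
    new_assignment = {}
    for partition, replicas in assignment.items():
        kept = list(replicas)
        while len(kept) > target_rf:
            # Prefer the last replica; fall back to an earlier one whose
            # broker has not yet hit the cap.
            victim = next((b for b in reversed(kept) if removed[b] < cap),
                          kept[-1])
            removed[victim] += 1
            kept.remove(victim)
        new_assignment[partition] = kept
    return new_assignment
```

On the six-partition example above with target_rf = 2 this reproduces the article's table; the fallback only kicks in when one broker would otherwise absorb more than its even share of removals.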
Comparison
Idea 1 handles many edge cases and preserves existing placements but requires complex calculations and may interfere with custom assignments.
Idea 2 is straightforward, works with any existing assignment, and changes only the newly added replicas, keeping data movement minimal.
Final solution
The chosen implementation uses Idea 2 by default, falling back to Idea 1 when the original replica count equals 1 or when specific conditions demand the more precise calculation.
Implementation note
The prototype is planned to be visualized with LogIKM; the actual code has not been released yet.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
dbaplus Community
