
Zero‑Downtime Kafka Migration with MirrorMaker 2: Full Step‑by‑Step Guide

This guide explains how to achieve a zero‑downtime Kafka cluster migration by deploying a new cluster, configuring MirrorMaker 2 for bidirectional replication, gradually switching producers and consumers, monitoring key metrics, and safely decommissioning the old cluster.


Zero‑downtime migration is a common operational scenario for Kafka where the old (blue) and new (green) clusters run simultaneously, data is kept in sync via a replication tool, and client traffic is switched seamlessly at the right moment.

Core Principle

The process follows a blue‑green deployment or dual‑write pattern:

Deploy the new Kafka cluster (green zone).

Enable bidirectional synchronization between the old (blue) and new clusters using MirrorMaker 2 (MM2) so that writes on either side are replicated.

Switch client producers to the new cluster, then switch consumers after offsets are synchronized.

Retire the old cluster once the new one is stable.

Phase 1 – Preparation

Plan the new cluster: decide broker count, disk size, network settings, and optionally upgrade the Kafka version.

Ensure network connectivity between the two clusters with minimal latency.
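Connectivity and latency between the two sites can be sanity-checked with standard tools before any replication is configured. A minimal sketch (the hostnames reuse the broker names from the MM2 config below; kafka-broker-api-versions.sh ships with the Kafka distribution):

```shell
# Check TCP reachability of a new-cluster broker from the old cluster's network
nc -vz new-kafka-broker1 9092

# Measure round-trip latency between the sites
ping -c 5 new-kafka-broker1

# Verify the new cluster answers Kafka protocol requests
kafka-broker-api-versions.sh --bootstrap-server new-kafka-broker1:9092
```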

Configure MirrorMaker 2

Create mm2.properties with the essential settings:

# Define the cluster aliases used in the replication flows
clusters = A, B
# Old cluster A connection
A.bootstrap.servers = old-kafka-broker1:9092,old-kafka-broker2:9092
# New cluster B connection
B.bootstrap.servers = new-kafka-broker1:9092,new-kafka-broker2:9092
# Enable bidirectional sync
A->B.enabled = true
B->A.enabled = true
# Replicate all topics and consumer groups
A->B.topics = .*
B->A.topics = .*
A->B.groups = .*
B->A.groups = .*
# Replicate topic configs and ACLs
sync.topic.configs.enabled = true
sync.topic.acls.enabled = true
# Write translated consumer-group offsets into the target cluster (Kafka 2.7+)
sync.group.offsets.enabled = true
# Internal topic replication factor for MM2 high availability
checkpoints.topic.replication.factor = 2
offset-syncs.topic.replication.factor = 2
heartbeats.topic.replication.factor = 2
# Max offset distance between offset-sync records (lower = more precise offset translation)
offset.lag.max = 1000000
# Security (if needed)
# A.security.protocol=SASL_SSL
# A.sasl.jaas.config=org.apache.kafka.common.security.scram.ScramLoginModule required username="..." password="...";
Tip: Enabling sync.topic.acls.enabled = true replicates ACLs automatically, avoiding permission issues after the switch. In high‑throughput scenarios, tune MM2's underlying client settings per cluster (e.g., A.consumer.fetch.max.bytes and B.producer.batch.size) to control network pressure.

Phase 2 – Synchronization and Switch

Start the MM2 process. It copies topics from cluster A to B and creates internal topics such as heartbeats (connection health) and mm2-offset-syncs.B.internal (consumer‑group offset sync).
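Starting the dedicated MM2 driver is a single command against the config file from Phase 1; a sketch (connect-mirror-maker.sh ships with the Kafka distribution, and the broker address reuses the name from the config example):

```shell
# Launch the dedicated MirrorMaker 2 driver with the Phase 1 config
connect-mirror-maker.sh mm2.properties

# On the new cluster, confirm replicated and internal topics have appeared
kafka-topics.sh --bootstrap-server new-kafka-broker1:9092 --list
```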

Validate data replication using console consumers. Note that MM2's default replication policy prefixes replicated topics with the source cluster alias, so cluster A's test-topic appears on cluster B as A.test-topic:

# Consume from old cluster
kafka-console-consumer.sh --bootstrap-server old-broker:9092 --topic test-topic --from-beginning
# Consume the replicated copy on the new cluster (note the A. prefix)
kafka-console-consumer.sh --bootstrap-server new-broker:9092 --topic A.test-topic --from-beginning
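Message counts can also be compared numerically. A sketch using the GetOffsetShell tool bundled with Kafka (--time -1 requests the latest offset per partition; the A. prefix assumes MM2's default replication policy, which renames replicated topics with the source alias):

```shell
# End offsets per partition on the old cluster
kafka-run-class.sh kafka.tools.GetOffsetShell \
  --broker-list old-broker:9092 --topic test-topic --time -1

# End offsets for the replicated copy on the new cluster
kafka-run-class.sh kafka.tools.GetOffsetShell \
  --broker-list new-broker:9092 --topic A.test-topic --time -1
```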

Monitor MM2 metrics such as replication-latency-ms, record-age-ms, and checkpoint-latency-ms, along with producer/consumer latency and overall network traffic.

Switch producers in batches by updating bootstrap.servers to point to the new cluster; the reverse B→A flow syncs their writes back to the old cluster so existing consumers remain unaffected.

After producers are fully migrated, switch consumers in stages. With offset sync enabled, MM2 translates consumer‑group offsets to the new cluster so consumption resumes near the previous position, minimizing duplicates and avoiding data loss.
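The producer switch itself is only a client configuration change; a minimal sketch of the bootstrap.servers update, reusing the broker addresses from the MM2 config example:

```properties
# Before: producer pointed at the old (blue) cluster
# bootstrap.servers=old-kafka-broker1:9092,old-kafka-broker2:9092

# After: producer points at the new (green) cluster
bootstrap.servers=new-kafka-broker1:9092,new-kafka-broker2:9092
```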

Phase 3 – Cleanup

Confirm that all producers and consumers have been running on the new cluster for at least 24 hours without issues.

Stop the MM2 process.

Delete the internal topics created by MM2 on both clusters (e.g., mm2-offset-syncs.B.internal, heartbeats, and the *.checkpoints.internal topics).
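The internal topics can be removed with the standard topic tool; a sketch assuming the broker address and topic names from the earlier examples:

```shell
# Remove MM2's internal topics once replication has been stopped
kafka-topics.sh --bootstrap-server new-kafka-broker1:9092 --delete --topic heartbeats
kafka-topics.sh --bootstrap-server new-kafka-broker1:9092 --delete --topic mm2-offset-syncs.B.internal
```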

Decommission the old cluster.

Key Considerations

Bidirectional sync & split‑brain: After the switch, promptly disable the reverse B→A sync to avoid circular replication.
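In mm2.properties this is a one-line change followed by an MM2 restart:

```properties
# Disable the reverse flow once the cutover is complete
B->A.enabled = false
```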

Network bandwidth: Dual‑write consumes cross‑data‑center bandwidth; plan capacity accordingly.

Monitoring: Track MM2 metrics, producer/consumer latency, and overall network traffic.

Testing: Perform a full rehearsal in a pre‑production environment before the actual migration.

Rollback plan: Keep the ability to revert producers to the old cluster; MM2’s bidirectional sync ensures data can be recovered.

Optional Tools

Confluent Replicator – commercial tool with graphical management.

Uber uReplicator – open‑source alternative addressing older MirrorMaker limitations.

MirrorMaker 2 – officially supported by Kafka, recommended for production use.

Process Overview

Deploy new cluster (green zone).

Start MM2 with one‑way sync A→B.

Validate data synchronization.

Enable B→A sync to prepare producer switch.

Gradually switch producers to the new cluster.

Gradually switch consumers to the new cluster.

Verify business stability.

Stop MM2 and clean up old cluster.

Written by Ray's Galactic Tech

Practice together, never alone. We cover programming languages, development tools, learning methods, and pitfall notes. We simplify complex topics, guiding you from beginner to advanced. Weekly practical content—let's grow together!