How to Seamlessly Migrate Cloud Kafka to an On-Premises Cluster with Zero Downtime
This guide details a step‑by‑step migration from a cloud‑hosted Kafka service to a self‑managed on‑premises cluster, covering requirements, a data‑synchronization strategy using Kafka MirrorMaker, consumer offset alignment, potential pitfalls, and practical scripts to ensure a smooth transition.
Background
The company has been using a cloud‑based Kafka service, but rising costs and scaling needs motivate a move to a self‑hosted Kafka cluster.
Requirements
Minimal code changes
Stability during the upgrade
Correctness of message production and consumption after migration
Why Dual‑Write/Read Was Rejected
Although dual‑write (producer sends to both old and new clusters) and dual‑read (consumer reads from both) seem straightforward, they introduce heavy code refactoring, complex consumer logic, and high development cost, making the approach impractical.
Data‑Sync Solution
The chosen approach uses a data‑synchronization tool to replicate messages from the old cluster to the new one, allowing producers and consumers to continue using a single cluster after a simple address change.
The overall flow is: data sync → migrate producers → migrate consumers → decommission the old cluster.
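The "simple address change" is easiest when the broker list is not hard-coded. The following is a minimal sketch, not the article's actual code: the object name, the `KAFKA_BOOTSTRAP_SERVERS` variable, and the fallback address are all hypothetical. Only the `bootstrap.servers` key is a standard Kafka client setting.

```scala
import java.util.Properties

object ClusterAddress {
  // Hypothetical environment variable name; not taken from the original article.
  val EnvKey = "KAFKA_BOOTSTRAP_SERVERS"

  /** Builds client properties whose broker list comes from the given environment map. */
  def clientProps(env: Map[String, String], fallback: String): Properties = {
    val props = new Properties()
    // "bootstrap.servers" is the standard Kafka client setting for the broker list.
    props.setProperty("bootstrap.servers", env.getOrElse(EnvKey, fallback))
    props
  }
}
```

Under these assumptions, a producer or consumer built from `ClusterAddress.clientProps(sys.env, "old-kafka:9092")` could be pointed at the new cluster by changing one environment variable and restarting, with no change to application logic.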
Message Synchronization with MirrorMaker
Kafka’s official kafka-mirror-maker tool is used. Its default message handler does not pin a record to the partition it was read from, so a record may land on a different partition number on the target cluster than it had on the source. The handler is therefore modified to forward each record to the same partition on the target cluster.
```scala
// Forwards each record to the same partition number it was read from
// by passing record.partition to the ProducerRecord constructor.
private[tools] object defaultMirrorMakerMessageHandler extends MirrorMakerMessageHandler {
  override def handle(record: BaseConsumerRecord): util.List[ProducerRecord[Array[Byte], Array[Byte]]] = {
    // A record without a timestamp is forwarded with a null timestamp.
    val timestamp: java.lang.Long = if (record.timestamp == RecordBatch.NO_TIMESTAMP) null else record.timestamp
    Collections.singletonList(new ProducerRecord(record.topic, record.partition, timestamp, record.key, record.value, record.headers))
  }
}
```
Before running the sync, ensure the target topics exist with identical partition counts. Sync can be scoped to specific topics, and a start/stop script automates the process.
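Because the handler writes to an explicit partition number, a target topic with fewer partitions than its source would make some writes fail. A pre-flight check along the following lines can catch that before the sync starts. This is a sketch under assumptions: the partition counts are supplied as plain maps (how they are fetched from each cluster is left out), and `PartitionParity` and `Mismatch` are hypothetical names.

```scala
object PartitionParity {
  /** A topic whose partition count differs, or that is missing on the target (None). */
  final case class Mismatch(topic: String, source: Int, target: Option[Int])

  /** Compares topic -> partition-count maps taken from the source and target clusters. */
  def findMismatches(source: Map[String, Int], target: Map[String, Int]): List[Mismatch] =
    for {
      (topic, srcCount) <- source.toList.sortBy(_._1)
      tgt = target.get(topic)
      if !tgt.contains(srcCount) // keep only missing topics or differing counts
    } yield Mismatch(topic, srcCount, tgt)
}
```

An empty result means every source topic exists on the target with the same partition count. Topics that exist only on the target are ignored, since they do not affect the sync.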
Consumer Offset Synchronization
Switching consumers to the new cluster is the most complex part. The strategy is a two‑phase offset alignment:
Mark offsets during the data‑sync phase.
When migrating a consumer group, reset its offsets in the new cluster to match the position it had in the old cluster.
A small utility was built to create subscriptions for a consumer group on the new cluster (shown in screenshots in the original article) and to calculate the correct starting offset for each partition from the old cluster’s log positions.
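The article does not show how the utility computes the new offset. One plausible calculation, sketched below, preserves the group's lag: the number of unconsumed messages on the old partition is subtracted from the end of the new partition. This relies on assumptions that are not confirmed by the source: the sync has fully caught up, the partition is no longer receiving writes on the old cluster, partitions map one to one, and the sync produced no duplicates. `OffsetAlignment`, `LogRange`, and `align` are hypothetical names.

```scala
object OffsetAlignment {
  /** Start and end offsets of one partition's log (end is the next offset to be written). */
  final case class LogRange(start: Long, end: Long)

  /**
   * Maps a committed offset on the old cluster to an offset on the new cluster
   * by preserving the group's lag (messages still left to consume).
   */
  def align(oldCommitted: Long, oldLog: LogRange, newLog: LogRange): Long = {
    // Clamp the committed offset into the old log in case retention already removed it.
    val committed = math.min(math.max(oldCommitted, oldLog.start), oldLog.end)
    val lag = oldLog.end - committed
    // Never point before the first message retained on the new cluster.
    math.max(newLog.start, newLog.end - lag)
  }
}
```

For example, if the sync began when the old partition was at offset 1300 and that partition now ends at 2000, the new partition holds offsets 0 to 699. A group committed at 1500 on the old cluster has a lag of 500, so `align(1500, LogRange(1000, 2000), LogRange(0, 700))` yields 200, which is the same message on the new cluster.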
Potential Sync Issues and Mitigations
If the sync process crashes, duplicate or missing messages may occur; manual offset reset can recover consistency.
Differences in log retention between clusters can cause mismatched message counts; the two‑phase approach accounts for this by aligning offsets before the final cut‑over.
ACL differences (old cluster without ACL, new cluster with ACL) require additional handling, which is addressed in the migration scripts.
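To decide whether a manual offset reset is needed after a sync crash, it can help to compare how many messages should have been mirrored with how many actually arrived. The sketch below is an illustration, not the article's tooling. It assumes the operator recorded the old-cluster offset at which mirroring began, that retention has not yet deleted anything from the new partition, and that the topic is neither compacted nor transactional. All names are hypothetical.

```scala
object SyncDrift {
  sealed trait Verdict
  case object InSync extends Verdict
  final case class Duplicates(count: Long) extends Verdict
  final case class Missing(count: Long) extends Verdict

  /**
   * @param syncStartOffset old-cluster offset at which mirroring began (recorded by the operator)
   * @param oldEnd          current end offset of the old partition
   * @param newStart        start offset of the new partition
   * @param newEnd          current end offset of the new partition
   */
  def check(syncStartOffset: Long, oldEnd: Long, newStart: Long, newEnd: Long): Verdict = {
    val expected = oldEnd - syncStartOffset // messages that should have been mirrored
    val actual = newEnd - newStart          // messages present on the new partition
    if (actual == expected) InSync
    else if (actual > expected) Duplicates(actual - expected)
    else Missing(expected - actual)
  }
}
```

A `Missing` verdict does not by itself prove data loss: it may only mean the sync has not caught up yet, so the check is most meaningful once writes to the old partition have stopped.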
Conclusion
The data‑synchronization method provides a low‑impact, “smooth” migration path with minimal code changes. The main operational burden shifts to the migration team, who must ensure correct message sync, offset alignment, and ACL configuration, but the approach avoids the heavy refactoring required by dual‑write/read solutions.
dbaplus Community
