How to Migrate Live Data from MySQL to HBase Without Downtime
This article explains the four‑step online data migration process—including dual‑write, historical data transfer, read cut‑over, and cleanup—using a real‑world MySQL‑to‑HBase fan‑list case study, and provides practical tips for ensuring zero‑downtime and high data consistency.
Online data migration means moving live service data from one location to another without stopping the service. Depending on the data layer, it is either cache migration or storage migration. Depending on whether the data organization changes, it is classified as "flat migration" or "transform migration".
Flat Migration vs. Transform Migration
Flat migration keeps the data organization unchanged, such as expanding MySQL from 1 to 4 instances or Redis from 4 to 16 ports. When the original design already supports scaling, migration is straightforward—e.g., adding read replicas in MySQL and switching traffic.
Transform migration changes the data model. An example is a social platform that upgraded user IDs from auto‑increment to UUID, requiring primary‑key changes and massive compatibility work. Most migrations avoid changing primary keys; they usually only alter storage format, such as moving a Redis hash counter to a KV store or moving fan‑list data from MySQL to HBase for better scalability.
Four‑Step Online Migration Process
1. Enable dual‑write: write to both old and new stores simultaneously.
2. Offline historical data transfer: move existing bulk data to the new system.
3. Read cut‑over: route read requests to the new store.
4. Cleanup: retire old data, resources, and code, and document lessons learned.
In some cases steps 1 and 2 can be swapped; if historical data is moved first, new writes must be queued and later replayed (“catch‑up”).
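The swapped ordering can be sketched as a writer that buffers live writes while the backfill runs and replays them afterwards. This is a minimal single-threaded illustration, not the case study's implementation: `CatchUpWriter`, the dict used as the new store, and the put-only write model are all assumptions made for the example.

```python
from collections import deque


class CatchUpWriter:
    """Hypothetical helper: queue writes during backfill, replay them after."""

    def __init__(self, new_store: dict):
        self.new_store = new_store   # stand-in for the new storage system
        self.backfilling = True      # True while historical data is being copied
        self.pending = deque()       # live writes waiting to be replayed, in order

    def write(self, key, value):
        if self.backfilling:
            # The new store is not ready yet: remember the write for later.
            self.pending.append((key, value))
        else:
            self.new_store[key] = value

    def finish_backfill(self):
        # Replay queued writes in arrival order so newer values win
        # over the snapshot that the backfill copied.
        while self.pending:
            key, value = self.pending.popleft()
            self.new_store[key] = value
        self.backfilling = False
```

A real deployment would also need to close the gap between draining the queue and switching to direct writes, which this sketch ignores.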
Case Study: Migrating a Fan List from MySQL to HBase
The platform first designed a detailed workflow (see diagram) before starting migration.
Dual‑Write Implementation
Before coding dual‑write, decide HBase table schema and primary‑key design based on business rules and performance targets. Two common HBase patterns for list data are:
Wide‑table mode: one row per list, with each item stored as a separate column.
Tall‑table mode: one row per item, similar to a MySQL row.
The team chose wide‑table mode for better read performance despite higher write complexity.
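To make the two layouts concrete, the sketch below arranges the same follow records both ways using plain dictionaries. The row-key and column names (`u<user>`, `f:<fan>`, `f:ts`) are illustrative assumptions, not the schema the team used.

```python
def to_wide(follows):
    """Wide layout: one row per user, one column per fan.

    follows is a list of (user_id, fan_id, followed_at) tuples.
    """
    rows = {}
    for user, fan, ts in follows:
        rows.setdefault(f"u{user}", {})[f"f:{fan}"] = ts
    return rows


def to_tall(follows):
    """Tall layout: one row per (user, fan) pair, like a MySQL row."""
    return {f"u{user}:f{fan}": {"f:ts": ts} for user, fan, ts in follows}


follows = [(1, 10, 1000), (1, 11, 2000), (2, 10, 3000)]
wide = to_wide(follows)   # 2 rows; row "u1" holds 2 columns
tall = to_tall(follows)   # 3 rows; each holds 1 column
```

In the wide layout a whole list comes back from a single row read, which matches the read-performance argument above. The cost is that every follow or unfollow has to address a specific column inside that row.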
Writes are made asynchronous via a message queue; the HBase write module can be chained serially after the existing MySQL write path or run in parallel with it. Because a queue may deliver the same message more than once, write handlers should be idempotent. For operations that are not idempotent, a duplicate‑message detection module is added to keep the two stores consistent.
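A minimal sketch of a dual-write consumer with duplicate-message detection follows. The in-memory `FanStore` objects stand in for MySQL and HBase, and the message fields (`id`, `op`, `user`, `fan`) are assumptions made for the example. The fan counter is deliberately non-idempotent to show why deduplication matters.

```python
from collections import defaultdict


class FanStore:
    """In-memory stand-in for one storage backend (assumption)."""

    def __init__(self):
        self.fans = defaultdict(set)
        self.fan_count = defaultdict(int)

    def apply(self, msg):
        if msg["op"] == "follow":
            self.fans[msg["user"]].add(msg["fan"])
            self.fan_count[msg["user"]] += 1   # non-idempotent increment
        elif msg["op"] == "unfollow":
            self.fans[msg["user"]].discard(msg["fan"])
            self.fan_count[msg["user"]] -= 1   # non-idempotent decrement
        else:
            raise ValueError(f"unknown op: {msg['op']}")


class DualWriteConsumer:
    """Applies each queue message to both stores exactly once per message id."""

    def __init__(self):
        self.mysql = FanStore()   # old store
        self.hbase = FanStore()   # new store
        self.seen = set()         # ids of messages already processed

    def handle(self, msg) -> bool:
        if msg["id"] in self.seen:
            return False          # duplicate delivery: skip
        self.mysql.apply(msg)
        self.hbase.apply(msg)
        self.seen.add(msg["id"])
        return True
```

In production the set of seen ids would normally live in a shared store with expiry, and a crash between the two `apply` calls would still need separate handling.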
Because HBase lacks secondary indexes, join, and order‑by, the migration plan must verify that new query patterns (e.g., fetching the latest 5,000 fans) are still supported.
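One common way to support a "latest N" query without order‑by relies on HBase returning the columns of a row sorted by qualifier bytes. If the qualifier starts with a reversed timestamp, the newest entries come first. The sketch below emulates this with a dict as the wide row; the qualifier encoding and the names `fan_qualifier` and `latest_fans` are assumptions, not the case study's actual design.

```python
import struct

MAX_TS = 2**63 - 1  # assumed upper bound for millisecond timestamps


def fan_qualifier(followed_at_ms: int, fan_id: int) -> bytes:
    """Encode (reverse timestamp, fan id) as a big-endian column qualifier."""
    return struct.pack(">QQ", MAX_TS - followed_at_ms, fan_id)


def latest_fans(row: dict, limit: int) -> list:
    """Return the fan ids of the newest `limit` entries in a wide row.

    Sorting the qualifiers emulates HBase's byte-ordered columns.
    """
    result = []
    for qualifier in sorted(row)[:limit]:
        _, fan_id = struct.unpack(">QQ", qualifier)
        result.append(fan_id)
    return result


row = {
    fan_qualifier(1000, 7): b"",
    fan_qualifier(3000, 9): b"",
    fan_qualifier(2000, 8): b"",
}
newest_two = latest_fans(row, 2)  # fan 9 (ts 3000), then fan 8 (ts 2000)
```

The trade-off is that removing a fan requires knowing the follow timestamp in order to rebuild the qualifier.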
After dual‑write is live, consistency is validated on two dimensions: storage (direct data comparison) and business (user‑visible results). The target consistency threshold is six‑nines (99.9999%).
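The storage-level check can be sketched as a key-by-key comparison that reports a consistency ratio against the six-nines target. Both stores are modeled as dictionaries here, which is an assumption; a real comparison would sample or stream rows from MySQL and HBase.

```python
TARGET = 0.999999  # six-nines threshold from the text


def consistency(old: dict, new: dict):
    """Compare two stores and return (ratio of matching keys, mismatched keys)."""
    keys = old.keys() | new.keys()
    # A key counts as mismatched if it is missing on one side or the values differ.
    mismatched = sorted(k for k in keys if old.get(k) != new.get(k))
    ratio = 1.0 if not keys else 1 - len(mismatched) / len(keys)
    return ratio, mismatched


old = {"u1": {10, 11}, "u2": {10}, "u3": set()}
new = {"u1": {10, 11}, "u2": {10, 12}, "u3": set()}
ratio, bad = consistency(old, new)
passed = ratio >= TARGET
```

The list of mismatched keys is usually more useful than the ratio itself, since each entry points at a concrete case to investigate.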
Historical Data Transfer
Once dual‑write passes validation, historical data is moved. The main challenge is handling concurrent modifications: if a list changes while it is being migrated, the copy job may re‑insert an entry that a live write has just deleted, leaving the two stores inconsistent. Since full transactions across the two stores are not available, a lightweight Memcached lock can serialize the migration job and live writes on the same list.
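A per-list lock of this kind is usually built on Memcached's `add` command, which succeeds only when the key does not already exist. The sketch below uses a small in-memory stand-in for the cache so that it runs without a server; `FakeMemcached`, `list_lock`, and the key format are hypothetical names for illustration.

```python
import time
from contextlib import contextmanager


class FakeMemcached:
    """In-memory stand-in that mimics memcached add/delete semantics."""

    def __init__(self):
        self._data = {}

    def add(self, key, value, ttl):
        now = time.monotonic()
        entry = self._data.get(key)
        if entry is not None and entry[1] > now:
            return False                      # key exists and has not expired
        self._data[key] = (value, now + ttl)
        return True

    def delete(self, key):
        self._data.pop(key, None)


@contextmanager
def list_lock(cache, user_id, ttl=5.0, retries=50, wait=0.01):
    """Hold an exclusive lock on one user's fan list while the body runs."""
    key = f"fanlist-lock:{user_id}"
    for _ in range(retries):
        if cache.add(key, "1", ttl):          # only one caller can add the key
            break
        time.sleep(wait)
    else:
        raise TimeoutError(f"could not lock fan list {user_id}")
    try:
        yield
    finally:
        cache.delete(key)                     # release the lock
```

Both the migration job and the live write path would wrap their updates to a list in `with list_lock(cache, user_id):`. The TTL keeps a crashed holder from blocking the list forever, but a holder that outlives its TTL can delete a lock now owned by someone else, so this remains a best-effort lock rather than a transaction.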
It is recommended to migrate a subset first, verify consistency, then proceed with the full dataset to reduce risk and time.
Read Cut‑Over
After full data migration and validation, reads are switched to HBase. The switch is controlled by a feature flag (e.g., via Config Service) and rolled out in stages: internal whitelist, then gray release at 0.01%, 1%, 10%, and finally 100%. Each stage validates functionality, performance, and resource usage, typically over one to two weeks.
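A staged switch like this can be sketched as a deterministic hash bucket per user, compared against the configured rollout size. The function name, the parts-per-million unit, and the use of CRC32 are assumptions for the example; in practice the rollout value would come from the configuration service.

```python
import zlib

PPM = 1_000_000  # buckets; 100 ppm = 0.01%, 10_000 ppm = 1%


def read_from_hbase(user_id: int, rollout_ppm: int, whitelist=frozenset()) -> bool:
    """Decide whether this user's reads go to the new store."""
    if user_id in whitelist:
        return True                              # internal whitelist stage
    # Stable bucket in [0, PPM): the same user always lands in the same bucket,
    # so raising rollout_ppm only ever adds users to the new path.
    bucket = zlib.crc32(str(user_id).encode()) % PPM
    return bucket < rollout_ppm


STAGES_PPM = [100, 10_000, 100_000, 1_000_000]   # 0.01%, 1%, 10%, 100%
```

Because the bucket is stable, a user who starts reading from HBase at one stage keeps doing so at every later stage, which makes stage-by-stage comparisons easier to interpret.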
Cleanup and Knowledge Capture
When the cut‑over succeeds, the old MySQL code, supporting services, and resources are retired. The final step is to document lessons learned, share the migration workflow, and refactor tools for reuse in future migrations.
Online data migration does not require exotic technology; it demands solid understanding of business logic, careful process design, and attention to detail.
Art of Distributed System Architecture Design
Introductions to large-scale distributed system architectures; insights and knowledge sharing on large-scale internet system architecture; front-end web architecture overviews; practical tips and experiences with PHP, JavaScript, Erlang, C/C++ and other languages in large-scale internet system development.
