Zero‑Downtime Online Data Migration: Step‑by‑Step Guide
This article explains how to migrate live service data between systems without downtime, covering migration types, a four‑stage process, practical examples with MySQL and HBase, and key techniques for ensuring consistency and smooth cut‑over.
Online data migration means moving data that is actively serving users from one location to another while keeping the service uninterrupted.
Based on the data layer, migrations can be classified as cache migration or storage migration; based on changes before and after migration, they are either "horizontal" (no change in data organization) or "vertical" (data organization changes).
Horizontal migration keeps the data structure unchanged, for example expanding MySQL from one instance to four, adding Redis instances (ports), or adding HBase nodes. Good up‑front design, such as sharding MySQL from the start, reduces scaling to adding replicas and switching traffic, and the process can sometimes be fully automated.
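As a rough illustration of why up-front sharding helps, the sketch below routes users through a fixed number of logical shards and a separate shard-to-instance map. The shard count, the modulo rule, and the instance names (`mysql-a` to `mysql-d`) are assumptions made for this example; the article does not specify a sharding scheme.

```python
# Minimal sketch: logical shards are fixed at design time, while the mapping
# from logical shard to physical instance can change during a horizontal move.
LOGICAL_SHARDS = 4  # assumed value


def logical_shard(user_id):
    """Return the logical shard for a user; this never changes."""
    return user_id % LOGICAL_SHARDS


# Hypothetical instance names. Before scaling, all shards share one instance.
BEFORE = {0: "mysql-a", 1: "mysql-a", 2: "mysql-a", 3: "mysql-a"}
# After scaling, each logical shard has its own instance.
AFTER = {0: "mysql-a", 1: "mysql-b", 2: "mysql-c", 3: "mysql-d"}


def instance_for(user_id, shard_map):
    """Resolve the physical instance that currently serves this user."""
    return shard_map[logical_shard(user_id)]
```

Because only the map changes, application code that calls `instance_for` stays the same before and after the expansion.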
Vertical migration changes the data organization, for example upgrading IDs from auto‑increment to UUIDs, or moving a Redis hash to a KV store for better batch query performance. Most migrations avoid altering primary keys, focusing instead on restructuring storage formats.
The biggest challenge is ensuring the service remains unaffected during migration. By following proven practices, even junior engineers can accomplish it.
Four‑Stage Migration Process
1. Online dual‑write: write to both old and new stores simultaneously.
2. Offline historical data migration: move existing data from the old system to the new one.
3. Read switching: route read requests to the new store.
4. Cleanup: remove old data, reclaim resources, and document lessons learned.
In some cases, steps 1 and 2 are swapped. New writes that arrive during the migration window then need careful handling, typically by buffering them in a queue and replaying them so that the new store catches up.
Figure 1 illustrates these steps.
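The dual-write stage can be sketched as a thin wrapper around the two stores. This is a minimal illustration, not the article's implementation: `DictStore` is a hypothetical in-memory stand-in for real MySQL and HBase clients, and recording failed writes in a list for later repair is an assumption.

```python
class DictStore:
    """Hypothetical in-memory stand-in for a real MySQL or HBase client."""

    def __init__(self):
        self.rows = {}

    def add(self, user_id, follower_id):
        self.rows.setdefault(user_id, set()).add(follower_id)

    def remove(self, user_id, follower_id):
        self.rows.get(user_id, set()).discard(follower_id)

    def list(self, user_id):
        return sorted(self.rows.get(user_id, set()))


class DualWriteStore:
    """Writes go to both stores; reads stay on the old store until cut-over."""

    def __init__(self, old, new):
        self.old = old
        self.new = new
        self.failed = []  # new-store writes to replay or repair later

    def _mirror(self, op, user_id, follower_id):
        try:
            getattr(self.new, op)(user_id, follower_id)
        except Exception:
            # The old store is still the source of truth, so a new-store
            # failure is recorded instead of failing the user request.
            self.failed.append((op, user_id, follower_id))

    def add(self, user_id, follower_id):
        self.old.add(user_id, follower_id)
        self._mirror("add", user_id, follower_id)

    def remove(self, user_id, follower_id):
        self.old.remove(user_id, follower_id)
        self._mirror("remove", user_id, follower_id)

    def list(self, user_id):
        return self.old.list(user_id)
```

Keeping reads on the old store means a bug in the new write path cannot affect users during this stage.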
Below is a concrete example of migrating a social platform's follower list from MySQL to HBase.
Before migration, a detailed workflow diagram is prepared.
During the online dual‑write phase, the HBase table schema and primary key design must be defined based on business rules and performance targets.
HBase offers two typical patterns for list data: a "tall" table (one row per item, similar to MySQL) and a "wide" table (one row per list, with each item stored as a separate column). The tall table is easier to understand but may suffer from poorer query performance due to data being spread across regions; the wide table offers better performance at the cost of more complex data handling.
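The difference between the two layouts can be shown with plain dictionaries standing in for HBase tables. The key formats, the `f:` column family prefix, and the zero-padded IDs are assumptions chosen for this sketch; no real HBase client is used.

```python
def tall_put(table, user_id, follower_id, followed_at):
    """Tall layout: one row per relationship, row key combines both IDs."""
    row_key = f"{user_id:010d}:{follower_id:010d}"
    table[row_key] = {"f:followed_at": followed_at}


def wide_put(table, user_id, follower_id, followed_at):
    """Wide layout: one row per list, one column qualifier per follower."""
    row_key = f"{user_id:010d}"
    table.setdefault(row_key, {})[f"f:{follower_id:010d}"] = followed_at


def tall_list(table, user_id):
    """Emulate a prefix scan over many rows."""
    prefix = f"{user_id:010d}:"
    return [int(k[len(prefix):]) for k in sorted(table) if k.startswith(prefix)]


def wide_list(table, user_id):
    """Emulate a single-row get followed by reading its columns."""
    row = table.get(f"{user_id:010d}", {})
    return [int(qualifier[2:]) for qualifier in sorted(row)]
```

Reading a list from the tall layout touches one row per follower, while the wide layout reads a single row, which is the trade-off described above.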
To achieve high availability, writes are often made asynchronous via a message queue, allowing the same message to be processed by multiple modules and supporting both serial and parallel execution. Modules should be idempotent so that retries do not corrupt data. When idempotency cannot be guaranteed (e.g., in HBase wide tables), an additional duplicate‑message detection component is introduced.
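A duplicate-message detection component can be as simple as remembering which message IDs have already been applied. The sketch below assumes each message carries a unique `id` field and keeps the seen set in memory; a production system would more likely use a shared store with expiry, which is not shown.

```python
class DedupConsumer:
    """Wraps a non-idempotent handler so that redelivered messages are skipped."""

    def __init__(self, handler):
        self.handler = handler
        self.seen = set()  # assumption: in-memory; real systems need shared state

    def consume(self, message):
        """Apply the message once; return False if it was a duplicate."""
        msg_id = message["id"]
        if msg_id in self.seen:
            return False
        self.handler(message)
        # Mark as seen only after the handler succeeds, so that a failed
        # attempt can still be retried by the queue.
        self.seen.add(msg_id)
        return True


# Example of a non-idempotent handler: it increments a follower counter.
counts = {}


def bump_follower_count(message):
    counts[message["user_id"]] = counts.get(message["user_id"], 0) + 1
```

Delivering the same message twice through `DedupConsumer(bump_follower_count)` increments the counter only once.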
Because HBase lacks secondary indexes, joins, and ORDER BY, the migration must verify that the new schema can satisfy all business queries, such as fetching the latest 5,000 followers while still supporting retrieval of the first 100.
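One common way to get "latest first" ordering without ORDER BY is to encode a reversed timestamp into the key, since HBase returns keys in lexicographic order. The article does not say which technique was used, so the sketch below is only one possible design; `MAX_TS` and the qualifier format are assumptions, and `sorted()` stands in for HBase's built-in ordering.

```python
MAX_TS = 10**13 - 1  # assumed upper bound for 13-digit millisecond timestamps


def qualifier(followed_at_ms, follower_id):
    """Build a column qualifier that sorts newest-first lexicographically."""
    reversed_ts = MAX_TS - followed_at_ms
    return f"{reversed_ts:013d}:{follower_id:010d}"


def latest_followers(row, limit):
    """Return up to `limit` follower IDs, newest first."""
    ordered = sorted(row)[:limit]  # emulates HBase's sorted column order
    return [int(q.split(":")[1]) for q in ordered]


# A row in the wide layout: qualifier -> value (value unused here).
row = {
    qualifier(1_000, 101): b"",
    qualifier(3_000, 103): b"",
    qualifier(2_000, 102): b"",
}
```

With this layout, fetching the first 100 followers and fetching the latest 5,000 are the same read with a different `limit`.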
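After dual‑write is deployed, consistency checks are performed at both the storage level (comparing the records held in the two stores) and the business level (comparing the results that business queries return), aiming for six‑nines (99.9999%) data consistency.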
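A storage-level check can be sketched as a comparison of the follower lists held in each store. The function names and the choice to compare lists as sets are assumptions for this example; only the 99.9999% target comes from the text above.

```python
TARGET = 0.999999  # six nines, as stated in the article


def consistency_ratio(old_lists, new_lists):
    """Compare per-user follower lists and return (ratio, mismatched user IDs)."""
    keys = set(old_lists) | set(new_lists)
    if not keys:
        return 1.0, []
    mismatched = sorted(
        k for k in keys
        if set(old_lists.get(k, ())) != set(new_lists.get(k, ()))
    )
    return 1 - len(mismatched) / len(keys), mismatched


def meets_target(old_lists, new_lists):
    """True when the measured consistency reaches the six-nines target."""
    ratio, _ = consistency_ratio(old_lists, new_lists)
    return ratio >= TARGET
```

The list of mismatched IDs is returned alongside the ratio so that each inconsistency can be investigated and repaired, not just counted.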
Historical Data Migration
Once dual‑write passes verification, historical data is migrated. The main difficulty is avoiding interference with live writes: if a list changes between extraction from MySQL and insertion into HBase, deletions may be lost. Lightweight locks (e.g., Memcache locks) can emulate serializable isolation without the overhead of full transactions.
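The lightweight lock can be illustrated with an "add only if absent" operation, which is the semantics of memcached's `add` command. `FakeCache` below is an in-memory stand-in written for this sketch, not a real client, and the per-user lock key is an assumption. A real lock would also need an expiry so that a crashed worker cannot hold it forever.

```python
class FakeCache:
    """In-memory stand-in that mimics memcached's add and delete semantics."""

    def __init__(self):
        self.data = {}

    def add(self, key, value):
        """Store the value only if the key is absent; report success."""
        if key in self.data:
            return False
        self.data[key] = value
        return True

    def delete(self, key):
        self.data.pop(key, None)


def _lock_key(user_id):
    return f"migrate_lock:{user_id}"  # hypothetical key format


def migrate_user(cache, old, new, user_id):
    """Copy one user's list while holding the per-user lock."""
    if not cache.add(_lock_key(user_id), 1):
        return False  # a live write holds the lock; retry later
    try:
        new[user_id] = set(old.get(user_id, set()))
        return True
    finally:
        cache.delete(_lock_key(user_id))


def unfollow(cache, old, new, user_id, follower_id):
    """Live write path: apply the deletion to both stores under the same lock."""
    if not cache.add(_lock_key(user_id), 1):
        return False  # migration in progress for this list; caller re-queues
    try:
        old.get(user_id, set()).discard(follower_id)
        new.get(user_id, set()).discard(follower_id)
        return True
    finally:
        cache.delete(_lock_key(user_id))
```

Because the copy and the live deletion take the same lock, a deletion can no longer fall between extraction from the old store and insertion into the new one.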
It is advisable to migrate a subset of data first, validate consistency, and then proceed with the full dataset to reduce risk and time.
Read Switching
After full data migration and validation, read traffic is switched to the new store. This is typically done via feature flags or configuration services, progressing from an internal whitelist, through small gray‑release percentages (0.01%, 1%, 10%), to 100% of traffic over one to two weeks.
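A percentage-based read switch can be sketched with a stable hash of the user ID. The function names, the use of SHA-256, and the basis-point granularity (1 basis point = 0.01%) are assumptions for this example; in practice the current percentage would come from the feature-flag or configuration service.

```python
import hashlib

BUCKETS = 10_000  # one bucket per 0.01% of traffic


def bucket(user_id):
    """Map a user to a stable bucket in the range 0..9999."""
    digest = hashlib.sha256(str(user_id).encode("utf-8")).hexdigest()
    return int(digest, 16) % BUCKETS


def read_from_new(user_id, basis_points, whitelist=frozenset()):
    """Decide whether this user's reads go to the new store.

    basis_points: 1 = 0.01%, 100 = 1%, 1000 = 10%, 10000 = 100%.
    """
    if user_id in whitelist:
        return True  # internal users are switched first
    return bucket(user_id) < basis_points
```

Since each user's bucket is fixed, raising the percentage only adds users to the new store; nobody flips back and forth between stores as the rollout proceeds.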
Cleanup and Consolidation
When read switching completes, the migration is considered finished, but cleanup remains: decommission old code, shut down supporting systems, reclaim resources, and most importantly, document lessons learned and generalize tools for future migrations.
Online data migration does not require exotic technology; it mainly demands solid understanding of business logic, careful process design, and attention to detail.
21CTO
