How to Migrate Large Tables and Sync Data Seamlessly in Production
This article explains common business scenarios such as large‑table splitting, cross‑database migration, and data synchronization; compares lossy (downtime) and lossless (smooth) migration strategies; and walks through the implementation of smooth dual‑write migration step by step, including real‑world tooling.
1. Business Scenarios
Typical Internet‑scale systems encounter:
Large‑table splitting (single tables storing user rights, orders, etc. become a performance bottleneck).
Cross‑database data migration when business modules are isolated into separate databases.
Real‑time data synchronization for downstream services (e.g., a payment middle‑platform that needs order data for anti‑addiction controls).
2. Migration Strategies
2.1 Lossy (downtime) migration
The service is stopped for a short maintenance window, usually scheduled during low‑traffic hours. The typical workflow:
Identify a low‑traffic time slot based on monitoring data.
Announce the maintenance window to users.
At the scheduled time, shut down the service, export data from the old database (multi‑threaded read/write or physical copy; a copy sketch follows below), and import it into the new sharded tables.
Update connection strings, restart services, verify read/write against the new tables, then reopen traffic.
This method is simple and fast but incurs a brief outage.
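For the export/import step above, a common approach is to scan the old table in primary‑key ranges with a thread pool and write each batch into the appropriate target shard. The following is only a minimal sketch, not the article's actual tooling: the table names, the user_id % 4 sharding rule, the connection URLs, and the batch size are all assumptions.
```java
// Minimal sketch of a batched, multi-threaded copy during the maintenance window.
// Assumed schema: t_order(id, user_id, amount, updated_at); assumed shards t_order_0..t_order_3.
import java.sql.*;
import java.util.concurrent.*;

public class OfflineCopyJob {
    private static final int BATCH = 1000;

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(8);
        long maxId = 1_000_000L;                       // assumed max primary key of the old table
        for (long start = 0; start < maxId; start += BATCH) {
            final long from = start, to = start + BATCH;
            pool.submit(() -> copyRange(from, to));    // each worker copies one id range
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.HOURS);
    }

    static void copyRange(long from, long to) {
        try (Connection oldDb = DriverManager.getConnection("jdbc:mysql://old-db/orders", "user", "pwd");
             Connection newDb = DriverManager.getConnection("jdbc:mysql://new-db/orders", "user", "pwd");
             PreparedStatement read = oldDb.prepareStatement(
                     "SELECT id, user_id, amount, updated_at FROM t_order WHERE id >= ? AND id < ?")) {
            read.setLong(1, from);
            read.setLong(2, to);
            try (ResultSet rs = read.executeQuery()) {
                while (rs.next()) {
                    long userId = rs.getLong("user_id");
                    String shard = "t_order_" + (userId % 4);   // assumed sharding rule
                    try (PreparedStatement write = newDb.prepareStatement(
                            "INSERT INTO " + shard + " (id, user_id, amount, updated_at) VALUES (?, ?, ?, ?)")) {
                        write.setLong(1, rs.getLong("id"));
                        write.setLong(2, userId);
                        write.setBigDecimal(3, rs.getBigDecimal("amount"));
                        write.setTimestamp(4, rs.getTimestamp("updated_at"));
                        write.executeUpdate();
                    }
                }
            }
        } catch (SQLException e) {
            throw new RuntimeException("copy failed for range [" + from + "," + to + ")", e);
        }
    }
}
```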
2.2 Smooth (lossless) migration
Also called dual‑write migration. The service remains online while writes are performed to both the old and the new databases.
Modify code or introduce a message queue so that every write to the old DB is duplicated to the new DB.
Synchronize historical data in batches; incremental changes are captured by the dual‑write path.
Run a verification task that compares each row between the two databases and reconciles differences, always keeping the newer version.
When full consistency is achieved, switch traffic to the new database using a gradual gray‑release strategy.
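The final switch is usually a routing decision rather than a big‑bang change: a configurable percentage of read traffic is sent to the new database and ramped up as verification stays clean, while writes continue to go to both sides. The snippet below is a minimal sketch of such a router; the class names, DAOs, and the dynamic grayPercent knob are hypothetical.
```java
// Minimal sketch of a percentage-based gray-release router for reads.
// grayPercent would typically come from a dynamic config center and be raised from 0 to 100.
import java.util.concurrent.ThreadLocalRandom;

public class OrderReadRouter {
    private final OrderDao oldOrderDao;    // legacy table
    private final OrderDao newOrderDao;    // new sharded tables
    private volatile int grayPercent = 0;  // % of reads routed to the new database

    public OrderReadRouter(OrderDao oldOrderDao, OrderDao newOrderDao) {
        this.oldOrderDao = oldOrderDao;
        this.newOrderDao = newOrderDao;
    }

    public void setGrayPercent(int percent) {   // raised step by step as verification stays clean
        this.grayPercent = percent;
    }

    public Order findOrder(long orderId) {
        if (ThreadLocalRandom.current().nextInt(100) < grayPercent) {
            return newOrderDao.findById(orderId);   // gray-released portion reads the new database
        }
        return oldOrderDao.findById(orderId);       // remaining traffic still reads the old database
    }

    public interface OrderDao { Order findById(long id); }
    public static class Order { }
}
```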
2.3 Incremental migration
Suitable for data with a limited lifetime (e.g., coupon tables). During a gray‑release period, new records are written only to the new table while the old table’s historical data is left untouched. After the old data becomes irrelevant, traffic is fully switched to the new table and the old table is archived.
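For the incremental pattern, the write path targets only the new table from the cut‑over point, while reads fall back to the old table when a record is not found in the new one. A minimal sketch, with hypothetical DAOs standing in for the real data‑access layer:
```java
// Minimal sketch of incremental migration for short-lived data (e.g., coupons).
// New records are written only to the new table; reads fall back to the old table
// for records that were created before the switch and have not yet expired.
public class CouponRepository {
    private final CouponDao oldTableDao;  // legacy coupon table, kept until its data expires
    private final CouponDao newTableDao;  // new (sharded) coupon table

    public CouponRepository(CouponDao oldTableDao, CouponDao newTableDao) {
        this.oldTableDao = oldTableDao;
        this.newTableDao = newTableDao;
    }

    public void save(Coupon coupon) {
        newTableDao.insert(coupon);       // all new writes go only to the new table
    }

    public Coupon find(long couponId) {
        Coupon c = newTableDao.findById(couponId);
        if (c != null) {
            return c;
        }
        return oldTableDao.findById(couponId);  // older coupon that was never migrated
    }

    public interface CouponDao { void insert(Coupon c); Coupon findById(long id); }
    public static class Coupon { long id; }
}
```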
3. Implementation Details of Smooth Migration
3.1 Dual‑write implementation methods
Direct code change: add a write to the new DB wherever the old write occurs. This tightly couples the two writes and requires compensation logic if the new write fails.
Message‑queue decoupling: log modifications to the old DB, push them to a queue (e.g., Kafka), and consume the queue asynchronously to write to the new DB (a sketch follows after this list).
Binlog capture: use a Canal‑based binlog collector to stream change events from MySQL, convert them to JSON, publish to Kafka, and let downstream consumers write to targets such as Hive, HBase, Elasticsearch, etc.
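As an illustration of the message‑queue approach above, the sketch below publishes each successful old‑DB write as a change event to Kafka so that an asynchronous consumer can replay it against the new database. The topic name, event payload, and serializer settings are assumptions for the example.
```java
// Minimal sketch of message-queue-decoupled dual writes.
// Producer side: after the old-DB write succeeds, publish the change event to Kafka.
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import java.util.Properties;

public class OrderWriteService {
    private final KafkaProducer<String, String> producer;

    public OrderWriteService() {
        Properties props = new Properties();
        props.put("bootstrap.servers", "kafka:9092");
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        this.producer = new KafkaProducer<>(props);
    }

    public void updateOrder(long orderId, String payloadJson) {
        writeToOldDb(orderId, payloadJson);                 // original write path, unchanged
        // Publish the same change so an async consumer can apply it to the new DB.
        producer.send(new ProducerRecord<>("order-change-events", String.valueOf(orderId), payloadJson));
    }

    private void writeToOldDb(long orderId, String payloadJson) {
        // existing DAO call against the legacy database (omitted)
    }
}
```
The consumer side subscribes to the same topic and applies each event to the new sharded tables using the "newer data wins" rule described in section 3.2.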
3.2 Handling CRUD operations while migrating historical data
Insert: both databases insert the same row; consistency is preserved.
Delete: if the row has already been migrated, both databases delete it; if not yet migrated, only the old DB deletes it, which is acceptable because the migration tool will not re‑insert a deleted row.
Update: if the row exists in the new DB, it is updated; otherwise the new DB treats the update as an insert. The rule “newer data wins” ensures that stale updates do not overwrite newer records (a sketch follows below).
Special cases (e.g., a row deleted in the old DB after the migration tool has fetched it) are resolved by a final consistency check before cut‑over.
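One common way to implement “newer data wins” is to guard every write to the new database with a timestamp (or version) comparison. The sketch below assumes plain JDBC and an updated_at column; the table and column names are illustrative, not the article's actual schema.
```java
// Minimal sketch of "newer data wins" when applying a captured change to the new DB.
// Assumes each row carries an updated_at timestamp (a version column works the same way).
import java.sql.*;

public class NewerWinsApplier {

    public void applyUpdate(Connection newDb, long id, String status, Timestamp updatedAt) throws SQLException {
        // Update only if the incoming change is not older than what the new DB already holds.
        String sql = "UPDATE t_order SET status = ?, updated_at = ? WHERE id = ? AND updated_at <= ?";
        try (PreparedStatement ps = newDb.prepareStatement(sql)) {
            ps.setString(1, status);
            ps.setTimestamp(2, updatedAt);
            ps.setLong(3, id);
            ps.setTimestamp(4, updatedAt);
            int updated = ps.executeUpdate();
            if (updated == 0 && !rowExists(newDb, id)) {
                // Row not migrated yet: treat the update as an insert, as described above.
                try (PreparedStatement ins = newDb.prepareStatement(
                        "INSERT INTO t_order (id, status, updated_at) VALUES (?, ?, ?)")) {
                    ins.setLong(1, id);
                    ins.setString(2, status);
                    ins.setTimestamp(3, updatedAt);
                    ins.executeUpdate();
                }
            }
            // If updated == 0 and the row exists, the new DB already has a newer version: drop the stale change.
        }
    }

    private boolean rowExists(Connection newDb, long id) throws SQLException {
        try (PreparedStatement ps = newDb.prepareStatement("SELECT 1 FROM t_order WHERE id = ?")) {
            ps.setLong(1, id);
            try (ResultSet rs = ps.executeQuery()) {
                return rs.next();
            }
        }
    }
}
```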
3.3 Consistency verification
A scheduled task iterates over each table, compares rows between old and new databases, and reconciles differences by preferring the newer version. The task runs repeatedly until no alerts are generated, after which a gray‑release can gradually route all traffic to the new database.
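In practice the verification task can scan both sides in primary‑key batches, compare a per‑row timestamp or checksum, and re‑copy any row where the two sides disagree. The skeleton below is a minimal sketch with hypothetical DAO and alerting helpers; a real comparison would cover all business columns.
```java
// Minimal sketch of a scheduled consistency check between the old and new databases.
// Compares rows batch by batch and repairs differences by keeping the newer version.
import java.util.Map;

public class ReconcileTask implements Runnable {

    @Override
    public void run() {
        long lastId = 0;
        int diffCount = 0;
        while (true) {
            Map<Long, RowSnapshot> oldRows = loadBatchFromOldDb(lastId, 1000);  // keyed by primary key
            if (oldRows.isEmpty()) break;
            Map<Long, RowSnapshot> newRows = loadBatchFromNewDb(oldRows.keySet());
            for (Map.Entry<Long, RowSnapshot> e : oldRows.entrySet()) {
                RowSnapshot oldRow = e.getValue();
                RowSnapshot newRow = newRows.get(e.getKey());
                if (newRow == null || oldRow.updatedAt > newRow.updatedAt) {
                    copyRowToNewDb(oldRow);     // old side is newer or missing on the new side
                    diffCount++;
                }
                lastId = Math.max(lastId, e.getKey());
            }
        }
        if (diffCount > 0) {
            alert("reconcile found " + diffCount + " differences");  // rerun until this stays at zero
        }
    }

    // Hypothetical helpers standing in for real DAO and alerting code.
    static class RowSnapshot { long id; long updatedAt; }
    private Map<Long, RowSnapshot> loadBatchFromOldDb(long afterId, int limit) { return Map.of(); }
    private Map<Long, RowSnapshot> loadBatchFromNewDb(java.util.Set<Long> ids) { return Map.of(); }
    private void copyRowToNewDb(RowSnapshot row) { }
    private void alert(String msg) { }
}
```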
4. Production Scheme
A department‑level order micro‑service needed to migrate order data from multiple legacy databases into a unified order middle‑platform with a different schema. The solution combined:
A Canal‑based binlog collector that captures incremental changes, converts them to JSON, and publishes them to Kafka.
Configuration steps: enable binlog on an offline replica, set binlog_format=ROW and log_slave_updates=ON, enable GTID, and record the starting GTID position.
A Kafka listener that consumes change events and applies them to the new order service tables.
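A minimal consumer for the Canal‑to‑Kafka pipeline described above might look like the following; the topic name, the shape of the Canal JSON event, and the mapping into the middle‑platform schema are assumptions for illustration.
```java
// Minimal sketch of a Kafka listener that applies Canal change events to the new order tables.
// Assumes Canal publishes row changes as JSON messages on the "canal-order-binlog" topic.
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;

public class CanalEventListener {

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "kafka:9092");
        props.put("group.id", "order-migration");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("canal-order-binlog"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    applyToNewOrderTables(record.value());   // parse the Canal JSON and upsert with "newer wins"
                }
                consumer.commitSync();                        // commit offsets only after the batch is applied
            }
        }
    }

    private static void applyToNewOrderTables(String canalJson) {
        // Parse the event type (INSERT/UPDATE/DELETE) and row data, map it to the
        // middle-platform schema, and apply it using the newer-data-wins rule (omitted).
    }
}
```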
After the bulk historical migration, a scheduled reconciliation task continues to compare old and new tables, fixing any residual differences until the data sets are fully consistent. Only then is the final cut‑over performed, routing all read/write traffic to the new database.
