How to Achieve Fast and Stable MySQL Data Center Migration at Scale
This article covers the background, migration options, and step‑by‑step automated procedures a large‑scale e‑commerce platform used to safely move over 400 MySQL clusters to a new data center. It compares expansion plus master‑slave switching with cascading replication and explains why the cascading approach was chosen as the faster, more reliable solution.
Background
TurnTurn needed to replace its legacy TGW load balancer with Tencent Cloud Load Balancer (CLB) and migrate MySQL, TiDB, and Redis services to a new IDC. The MySQL environment consisted of over 400 clusters, making migration high‑risk and complex.
Migration Options
Expansion + Master‑Slave Switch: Add enough replicas, then use MHA (Master High Availability) to switch masters and decommission the old nodes. This is simple to implement, but each MHA switch takes more than 30 seconds per cluster, which causes unacceptable business impact.
Cascading Replication Switch: Build a new cluster as a cascading replica of the old one, let the data sync, then cut the cascade and switch DNS. This achieves sub‑10‑second switch times but requires extensive automation (auto‑scaling, cascade setup, pre/post checks, traffic redirection).
Comparison: The cascading approach wins because of its faster switch (<10 s), smoother CLB upgrade, and lower business impact, despite the higher automation effort.
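The cascading switch can be sketched as an ordered plan of statements per target. This is a minimal illustration, not the platform's actual tooling: the function name, the host and domain arguments, and the exact ordering are assumptions. The SQL uses the classic `STOP SLAVE` / `RESET SLAVE ALL` syntax, which newer MySQL releases replace with `STOP REPLICA` / `RESET REPLICA ALL`.

```python
from typing import List, Tuple


def cascade_switch_plan(old_master: str, new_master: str, domain: str) -> List[Tuple[str, str]]:
    """Return (target, action) pairs in the order they would be executed.

    Hypothetical helper: it only builds the plan and does not connect to anything.
    """
    return [
        # 1. Stop writes on the old master so the cascade can fully catch up.
        (old_master, "SET GLOBAL read_only = ON"),
        # 2. Placeholder for a wait loop that confirms replication lag is zero.
        (new_master, "-- wait until replication from the old master has caught up"),
        # 3. Cut the cascade: the new master stops replicating from the old one.
        (new_master, "STOP SLAVE"),
        (new_master, "RESET SLAVE ALL"),
        # 4. Open the new master for writes.
        (new_master, "SET GLOBAL read_only = OFF"),
        # 5. Redirect clients (the DNS update mechanism is outside this sketch).
        ("dns", f"point {domain} at {new_master}"),
    ]


plan = cascade_switch_plan("old-db-01", "new-db-01", "orders.db.example.internal")
for target, action in plan:
    print(f"{target}: {action}")
```

The point of the ordering is that the old master becomes read‑only before the cascade is cut, so no write can land on the old side after the new master stops receiving its changes.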
Fast and Stable Migration Process
Pre‑build Cascading Clusters
Back up the old clusters and provision the new ones from those backups, establish cascade links between old and new clusters, and respect mixed‑instance deployment constraints (disk, memory). Automation scripts balance load against resource cost. Recommended limits per host:
Maximum 5 master instances
Maximum 10 slave instances
Maximum 15 total instances
Memory and disk usage ≤ 85 %
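The per‑host limits above lend themselves to a simple placement check. The sketch below is illustrative only: the `Host` record and its field names are hypothetical, and only the four thresholds come from the list above.

```python
from dataclasses import dataclass

# Limits taken from the recommended per-host list.
MAX_MASTERS = 5
MAX_SLAVES = 10
MAX_TOTAL = 15
MAX_USAGE = 0.85  # ceiling for both memory and disk usage


@dataclass
class Host:
    """Hypothetical host record; field names are illustrative."""
    name: str
    masters: int
    slaves: int
    mem_used_gb: float
    mem_total_gb: float
    disk_used_gb: float
    disk_total_gb: float


def can_place(host: Host, role: str, mem_gb: float, disk_gb: float) -> bool:
    """Return True if adding one instance of `role` keeps the host within all limits."""
    if role not in ("master", "slave"):
        raise ValueError(f"unknown role: {role}")
    masters = host.masters + (1 if role == "master" else 0)
    slaves = host.slaves + (1 if role == "slave" else 0)
    if masters > MAX_MASTERS or slaves > MAX_SLAVES:
        return False
    if masters + slaves > MAX_TOTAL:
        return False
    mem_ratio = (host.mem_used_gb + mem_gb) / host.mem_total_gb
    disk_ratio = (host.disk_used_gb + disk_gb) / host.disk_total_gb
    return mem_ratio <= MAX_USAGE and disk_ratio <= MAX_USAGE
```

A provisioning script could filter candidate hosts with `can_place` and then pick the least loaded one; how the real scripts weigh load against cost is not described in the article.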
Service Suspension
Because of complex upstream/downstream dependencies, a short maintenance window (early morning) is used to pause writes for core clusters, minimizing manual coordination.
Automated Batch Operations and Decoupling
All migration steps are automated and modular, forming a closed‑loop system that can quickly locate failures and roll back. Key automation modules include:
Automatic cascade cluster provisioning
Pre‑ and post‑migration checks
Batch read/write traffic switching
Automatic termination of old connections and verification of new ones
Batch decommissioning of old clusters
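One way to picture the modular, roll‑back‑capable design is a small step runner in which each module supplies an action and its undo. This is a hedged sketch: `run_steps` and the step tuple shape are hypothetical and are not the platform's actual framework.

```python
from typing import Callable, List, Tuple

# (name, apply, undo): each module contributes one or more such steps.
Step = Tuple[str, Callable[[], None], Callable[[], None]]


def run_steps(steps: List[Step]) -> Tuple[bool, List[str]]:
    """Run steps in order; on the first failure, undo completed steps in reverse."""
    log: List[str] = []
    done: List[Tuple[str, Callable[[], None]]] = []
    for name, apply, undo in steps:
        try:
            apply()
        except Exception as exc:
            # The log line pinpoints which module failed.
            log.append(f"FAILED {name}: {exc}")
            for prev_name, prev_undo in reversed(done):
                prev_undo()
                log.append(f"rolled back {prev_name}")
            return False, log
        done.append((name, undo))
        log.append(f"ok {name}")
    return True, log
```

Because every step is named and logged, a failed batch reports exactly which module broke, and the reverse‑order undo returns the cluster to its pre‑switch state.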
Cluster Tiering
Clusters are classified into three tiers (P1, P2, P3) of roughly equal size. The tier determines the migration window:
P3 – any daytime slot
P2 – evening 20:00‑22:00
P1 – early‑morning maintenance window
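The tier‑to‑window mapping can be expressed as a small guard that automation calls before starting a batch. Only the P2 window (20:00‑22:00) is stated in the article; the concrete hours used here for "daytime" and "early morning" are assumptions chosen for illustration.

```python
from datetime import time

WINDOWS = {
    "P3": (time(9, 0), time(18, 0)),   # "daytime": exact hours are an assumption
    "P2": (time(20, 0), time(22, 0)),  # stated in the article
    "P1": (time(2, 0), time(5, 0)),    # "early morning": exact hours are an assumption
}


def in_window(tier: str, now: time) -> bool:
    """Return True if `now` falls inside the migration window for `tier`."""
    start, end = WINDOWS[tier]  # raises KeyError for an unknown tier
    return start <= now < end


print(in_window("P2", time(21, 0)))  # True
print(in_window("P1", time(21, 0)))  # False
```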
Pre‑ and Post‑Migration Checks
Critical checks include:
VIP connectivity
buffer_pool_size and sql_mode consistency
Replica count
Cascade latency
Read‑only status of masters
Real‑time connection counts
DNS redirection correctness
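Several of the checks above can be combined into a single pre‑switch gate that returns the reasons a cluster is not ready. This is a sketch under stated assumptions: the dictionary keys, the lag threshold, and the rule that the new master must still be read‑only before the switch are all hypothetical, and collecting the values from live servers is left out.

```python
from typing import Dict, List


def pre_switch_failures(old: Dict, new: Dict, max_lag_seconds: float = 1.0) -> List[str]:
    """Compare snapshots of the old and new cluster; an empty list means ready."""
    failures: List[str] = []
    if not new["vip_reachable"]:
        failures.append("new VIP unreachable")
    if old["buffer_pool_size"] != new["buffer_pool_size"]:
        failures.append("buffer_pool_size differs")
    # sql_mode is a comma-separated set, so compare it order-insensitively.
    if set(old["sql_mode"].split(",")) != set(new["sql_mode"].split(",")):
        failures.append("sql_mode differs")
    if new["replica_count"] < old["replica_count"]:
        failures.append("new cluster has fewer replicas")
    if new["cascade_lag_seconds"] > max_lag_seconds:
        failures.append("cascade lag too high")
    if not new["master_read_only"]:
        failures.append("new master is writable before the switch")
    return failures
```

Post‑switch checks such as real‑time connection counts and DNS redirection would follow the same pattern, run against the new cluster after traffic has moved.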
Gray‑Scale Switch Verification
After automation is ready, multiple rounds of batch switches are performed on test clusters, then rolled out tier by tier. P3 clusters serve as a low‑impact pilot, while P1 clusters are switched after thorough validation. Issues encountered during gray‑scale testing:
Multiple domain names per cluster requiring special handling
Inaccurate CMDB metadata necessitating manual verification
Final Outcome
All 400+ MySQL clusters were migrated on September 27 during a brief maintenance window. P3 and P2 clusters completed batch switches before the window; the remaining 100+ P1 core clusters were switched at an average of 10 seconds per cluster, finishing within half an hour with smooth operation and successful post‑migration validation.
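A quick back‑of‑the‑envelope calculation shows why the per‑cluster switch time mattered for the maintenance window. It assumes exactly 100 P1 clusters switched one after another, which is a simplification: the article says "100+" and does not state how much parallelism was used.

```python
def serial_minutes(clusters: int, seconds_per_switch: float) -> float:
    """Total wall-clock minutes if clusters are switched strictly one at a time."""
    return clusters * seconds_per_switch / 60


P1_CLUSTERS = 100  # assumption: lower bound of the article's "100+"

print(f"{serial_minutes(P1_CLUSTERS, 10):.1f}")  # cascading switch: 16.7
print(f"{serial_minutes(P1_CLUSTERS, 30):.1f}")  # MHA switch: 50.0
```

Under these assumptions, the cascading switch leaves room for pre‑ and post‑checks within half an hour, while a 30‑second MHA switch would overrun it on switching time alone.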
dbaplus Community
Enterprise-level professional community for Database, BigData, and AIOps. Daily original articles, weekly online tech talks, monthly offline salons, and quarterly XCOPS&DAMS conferences—delivered by industry experts.