How to Achieve Fast and Stable MySQL Data Center Migration at Scale

This article details the background, migration options, and step‑by‑step automated procedures used by a large‑scale e‑commerce platform to safely move over 400 MySQL clusters, comparing expansion‑plus‑master‑slave switching with cascading replication and explaining the chosen fast, reliable solution.

dbaplus Community

Background

Zhuanzhuan needed to replace its legacy TGW load balancer with Tencent Cloud Load Balancer (CLB) and migrate its MySQL, TiDB, and Redis services to a new IDC. With more than 400 MySQL clusters in scope, the MySQL migration was both high‑risk and complex.

Migration Options

Expansion + Master‑Slave Switch: Add enough replicas to each cluster, then use MHA (Master High Availability) to promote a new master and decommission the old nodes. This is simple to implement, but an MHA switch takes more than 30 seconds per cluster, which is an unacceptable business impact.

Cascading Replication Switch: Build a new cluster whose master replicates from the old master (a cascade), let it catch up, then cut the cascade link and switch DNS to the new cluster. This achieves switch times under 10 seconds but requires extensive automation: automatic scaling, cascade setup, pre‑ and post‑switch checks, and traffic redirection.
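The cascade link can be sketched as the SQL a tool would send to the new cluster's master. This is a minimal sketch assuming GTID‑based replication and MySQL 5.7 syntax (8.0.23+ also accepts CHANGE REPLICATION SOURCE TO); `Endpoint`, `_literal`, and `cascade_setup_sql` are hypothetical names, and the article does not show its own tooling.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Endpoint:
    host: str
    port: int


def _literal(value: str) -> str:
    # Refuse characters that would need escaping instead of escaping them here.
    if "'" in value or "\\" in value:
        raise ValueError(f"unsupported character in {value!r}")
    return f"'{value}'"


def cascade_setup_sql(old_master: Endpoint, repl_user: str, repl_password: str) -> List[str]:
    """Statements for the NEW cluster's master so it replicates from the OLD
    master; the new cluster's own replicas keep replicating from the new master."""
    return [
        # Both should be ON: log_slave_updates makes the new master write relayed
        # events to its own binlog so replicas below it see the old master's
        # writes; GTIDs enable auto-positioning. log_slave_updates is not dynamic
        # in 5.7, so it must be set in my.cnf; here it is only checked.
        "SELECT @@GLOBAL.log_slave_updates, @@GLOBAL.gtid_mode",
        "STOP SLAVE",
        (
            "CHANGE MASTER TO "
            f"MASTER_HOST={_literal(old_master.host)}, "
            f"MASTER_PORT={int(old_master.port)}, "
            f"MASTER_USER={_literal(repl_user)}, "
            f"MASTER_PASSWORD={_literal(repl_password)}, "
            "MASTER_AUTO_POSITION=1"
        ),
        "START SLAVE",
    ]
```

Auto‑positioning avoids computing binlog coordinates by hand, which matters when hundreds of links are built by script.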

Comparison: The cascading approach was chosen because it switches faster (under 10 s versus more than 30 s with MHA), makes the CLB upgrade smoother, and has less business impact, despite requiring more automation.

Fast and Stable Migration Process

Pre‑build Cascading Clusters

Back up each old cluster, build its new counterpart (master and replicas) from the backup, and link the two through cascading replication. Because several instances share each host, placement must respect disk and memory constraints; automation scripts balance load against resource cost. Recommended limits per host:

Maximum 5 master instances

Maximum 10 slave instances

Maximum 15 total instances

Memory and disk usage ≤ 85 %
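A placement check against these limits might look like the following. The thresholds come from the list above; the `HostState` data model and the "lowest memory ratio wins" balancing rule in `pick_host` are hypothetical, not the article's algorithm.

```python
from dataclasses import dataclass
from typing import Dict, Optional

# Per-host limits from the list above.
MAX_MASTERS = 5
MAX_SLAVES = 10
MAX_TOTAL = 15
MAX_USAGE = 0.85  # applies to both memory and disk


@dataclass
class HostState:
    masters: int
    slaves: int
    mem_total_gb: float
    mem_used_gb: float
    disk_total_gb: float
    disk_used_gb: float


def can_place(host: HostState, role: str, mem_gb: float, disk_gb: float) -> bool:
    """True if one more instance of `role` ("master" or "slave") fits on `host`."""
    if role not in ("master", "slave"):
        raise ValueError(f"unknown role: {role!r}")
    masters = host.masters + (1 if role == "master" else 0)
    slaves = host.slaves + (1 if role == "slave" else 0)
    if masters > MAX_MASTERS or slaves > MAX_SLAVES or masters + slaves > MAX_TOTAL:
        return False
    mem_ratio = (host.mem_used_gb + mem_gb) / host.mem_total_gb
    disk_ratio = (host.disk_used_gb + disk_gb) / host.disk_total_gb
    return mem_ratio <= MAX_USAGE and disk_ratio <= MAX_USAGE


def pick_host(hosts: Dict[str, HostState], role: str,
              mem_gb: float, disk_gb: float) -> Optional[str]:
    """Among hosts within every limit, pick the one with the lowest memory ratio
    after placement (a simple balancing heuristic); None if nothing fits."""
    fits = [name for name, h in hosts.items() if can_place(h, role, mem_gb, disk_gb)]
    if not fits:
        return None
    return min(fits, key=lambda n: (hosts[n].mem_used_gb + mem_gb) / hosts[n].mem_total_gb)
```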

Service Suspension

Because of complex upstream/downstream dependencies, a short maintenance window (early morning) is used to pause writes for core clusters, minimizing manual coordination.
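The write pause and the cutover that follows it can be sketched as below. This assumes GTID replication on MySQL 5.7+, a hypothetical `query_old`/`query_new` interface (any function that runs one SQL statement and returns rows), and a hypothetical `repoint_dns` hook; it illustrates a cascade cutover, not the article's actual tooling.

```python
from typing import Callable, List, Tuple

# Hypothetical interface: runs one SQL statement and returns result rows.
Query = Callable[[str], List[Tuple]]


def cascade_cutover(query_old: Query, query_new: Query,
                    repoint_dns: Callable[[], None], timeout_s: int = 5) -> None:
    # 1. Pause writes on the old master. super_read_only=ON also turns on
    #    read_only and blocks writes from users with SUPER.
    query_old("SET GLOBAL super_read_only = ON")
    # 2. Capture everything the old master has committed; gtid_executed may
    #    contain newlines between UUID sets, so strip them.
    gtid_set = query_old("SELECT @@GLOBAL.gtid_executed")[0][0].replace("\n", "")
    # 3. Wait for the new master to apply it. WAIT_FOR_EXECUTED_GTID_SET
    #    returns 0 on success and 1 on timeout.
    status = query_new(
        f"SELECT WAIT_FOR_EXECUTED_GTID_SET('{gtid_set}', {int(timeout_s)})")[0][0]
    if status != 0:
        # Roll back: setting read_only=OFF also clears super_read_only.
        query_old("SET GLOBAL read_only = OFF")
        raise TimeoutError("new master did not catch up; old master reopened")
    # 4. Cut the cascade link and open the new master for writes.
    query_new("STOP SLAVE")
    query_new("RESET SLAVE ALL")
    query_new("SET GLOBAL read_only = OFF")
    # 5. Redirect the service domain(s) to the new cluster.
    repoint_dns()
```

Bounding the catch‑up wait with a timeout keeps the write pause short: if the new master lags, the old master is reopened instead of leaving the cluster read‑only.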

Automated Batch Operations and Decoupling

All migration steps are automated and modular, forming a closed‑loop system that can quickly locate failures and roll back. Key automation modules include:

Automatic cascade cluster provisioning

Pre‑ and post‑migration checks

Batch read/write traffic switching

Automatic termination of old connections and verification of new ones

Batch decommissioning of old clusters
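The closed loop of modular steps with rollback can be sketched as a small runner. `Step` and `run_pipeline` are hypothetical names; each module listed above (provisioning, checks, traffic switch, connection cleanup, decommissioning) would be one step with its own undo action.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Step:
    name: str
    run: Callable[[], None]
    rollback: Callable[[], None]


def run_pipeline(steps: List[Step]) -> List[str]:
    """Run steps in order. If one fails, roll back the completed steps in
    reverse order and raise; otherwise return the completed step names."""
    done: List[Step] = []
    for step in steps:
        try:
            step.run()
        except Exception as exc:
            for prev in reversed(done):
                prev.rollback()
            raise RuntimeError(
                f"step {step.name!r} failed; rolled back {len(done)} step(s)") from exc
        done.append(step)
    return [s.name for s in done]
```

The failing step itself is not rolled back here, so each `run` should either be atomic or clean up after itself; the raised error names the step, which is what makes failures quick to locate.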

Cluster Tiering

Clusters are classified into three tiers (P1, P2, P3), each holding roughly a third of the clusters, with P1 the most business‑critical. A cluster's tier determines its migration window:

P3 – any daytime slot

P2 – evening 20:00‑22:00

P1 – early‑morning maintenance window
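A tier‑to‑window gate might look like this. Only the P2 window (20:00 to 22:00) comes from the article; the P3 daytime hours and the P1 early‑morning hours are hypothetical placeholders, and `WINDOWS` and `may_switch` are illustrative names.

```python
from datetime import time

# Half-open windows [start, end) that do not cross midnight.
WINDOWS = {
    "P3": (time(9, 0), time(18, 0)),   # hypothetical "daytime" slot
    "P2": (time(20, 0), time(22, 0)),  # 20:00-22:00, as stated in the article
    "P1": (time(2, 0), time(5, 0)),    # hypothetical early-morning window
}


def may_switch(tier: str, now: time) -> bool:
    """True if a cluster of this tier may be switched at local time `now`."""
    start, end = WINDOWS[tier]
    return start <= now < end
```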

Pre‑ and Post‑Migration Checks

Critical checks include:

VIP connectivity

buffer_pool_size and sql_mode consistency

Replica count

Cascade latency

Read‑only status of masters

Real‑time connection counts

DNS redirection correctness
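The checks above can be expressed as pure functions over snapshots collected before and after the switch. The snapshot keys, the `min_replicas` and `max_lag_s` thresholds, and the function names are hypothetical; a real tool would gather these values with `SHOW VARIABLES`, `SHOW SLAVE STATUS`, `SHOW PROCESSLIST`, and a DNS lookup.

```python
from typing import Any, Dict, List

Snapshot = Dict[str, Any]


def pre_checks(old: Snapshot, new: Snapshot,
               min_replicas: int = 2, max_lag_s: int = 1) -> List[str]:
    """Failed pre-switch checks; an empty list means the cluster may switch."""
    failures = []
    if not new["vip_reachable"]:
        failures.append("new VIP unreachable")
    for var in ("buffer_pool_size", "sql_mode"):
        if old[var] != new[var]:
            failures.append(f"{var} mismatch")
    if new["replica_count"] < min_replicas:
        failures.append("too few replicas")
    # Seconds_Behind_Master can be NULL (None here), e.g. when the SQL thread is stopped.
    lag = new["cascade_lag_s"]
    if lag is None or lag > max_lag_s:
        failures.append("cascade lag")
    if not new["read_only"]:
        failures.append("new master writable before switch")
    return failures


def post_checks(old: Snapshot, new: Snapshot, dns_target: str, new_vip: str) -> List[str]:
    """Failed post-switch checks; `connections` counts business connections only."""
    failures = []
    if not old["read_only"]:
        failures.append("old master still writable")
    if new["read_only"]:
        failures.append("new master still read-only")
    if old["connections"] > 0:
        failures.append("clients still on old master")
    if dns_target != new_vip:
        failures.append("DNS not pointing at new VIP")
    return failures
```

Returning every failure instead of stopping at the first one gives operators the full picture for a batch in a single pass.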

Gray‑Scale Switch Verification

Once the automation is ready, several rounds of batch switches are run on test clusters, and the rollout then proceeds tier by tier: P3 clusters serve as a low‑impact pilot, and P1 clusters are switched only after thorough validation. Issues encountered during gray‑scale testing:

Multiple domain names per cluster requiring special handling

Inaccurate CMDB metadata necessitating manual verification
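Tier‑ordered batching, plus the lookup needed when one cluster is reached through several domain names, might be sketched as below. The data shapes (a cluster‑to‑tier map and domain‑to‑VIP records) and the function names are hypothetical; rejecting unknown tiers is one way to surface inaccurate CMDB entries before a batch runs.

```python
from collections import defaultdict
from typing import Dict, List

TIER_ORDER = ["P3", "P2", "P1"]  # lowest impact first


def plan_batches(clusters: Dict[str, str], batch_size: int) -> List[List[str]]:
    """Map cluster name -> tier into switch batches that never mix tiers."""
    by_tier = defaultdict(list)
    for name, tier in clusters.items():
        if tier not in TIER_ORDER:
            raise ValueError(f"{name}: unknown tier {tier!r}; verify the CMDB entry")
        by_tier[tier].append(name)
    batches: List[List[str]] = []
    for tier in TIER_ORDER:
        names = sorted(by_tier[tier])
        batches.extend(names[i:i + batch_size] for i in range(0, len(names), batch_size))
    return batches


def domains_for(vip: str, dns_records: Dict[str, str]) -> List[str]:
    """All domain names whose record points at `vip`; every one must be redirected."""
    return sorted(d for d, target in dns_records.items() if target == vip)
```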

Final Outcome

The migration of all 400+ MySQL clusters was completed on September 27. P3 and P2 clusters had already been batch‑switched before the brief maintenance window; during the window, the remaining 100+ P1 core clusters were switched in about 10 seconds each on average, finishing within half an hour. The switch ran smoothly and post‑migration validation passed.
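As a rough consistency check on these figures, assuming the P1 switches ran one after another (the article does not say whether they ran serially or in parallel) and taking 100 as the P1 count:

```latex
100 \times 10\,\text{s} = 1000\,\text{s} \approx 16.7\,\text{min} < 30\,\text{min}
```

At 10 s per cluster, serial switching stays within half an hour for up to 180 clusters (1800 s / 10 s).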

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.
