Databases 11 min read

Fast and Stable MySQL Data Center Migration: Choosing and Implementing the Optimal Strategy

This article details the background, migration plan selection, and step‑by‑step procedures—including pre‑building cascades, service pause, automated batch operations, cluster tiering, pre‑ and post‑checks, and gray‑scale validation—to achieve a fast, stable MySQL data‑center migration for a large‑scale production environment.

Zhuanzhuan Tech

Feb 21, 2023

Fast and Stable MySQL Data Center Migration: Choosing and Implementing the Optimal Strategy

1 Background

As a leading domestic circular economy company, the rapid growth of the ZhaiZhai business has pushed its basic services to a “shell‑shedding” stage. Existing IDC resources are saturated and cannot meet future demand. Additionally, Tencent Cloud’s load‑balancer upgrade requires replacing TGW with CLB.

After six months of planning, a new IDC and CLB have been deployed, and public services such as MySQL, TiDB, and Redis need orderly migration. The MySQL migration faces many clusters, high impact, and strict operational requirements, prompting thorough research and a reasonable plan to minimize service perception.

2 Migration Plan Selection

2.1 Option 1: Expansion + Master‑Slave Switch

Backup and expand enough slave instances, then use MHA (Master High Availability) to actively switch, finally decommission old nodes to change the cluster topology.

2.2 Option 2: Cascading Switch

Build a cascading cluster via backup, synchronize data to the new cluster, then perform a cascade break and domain‑name switch to complete the topology change.

2.3 Comparison

Option 1: Small development effort; expansion and MHA switch are easy, but each cluster’s MHA switch takes >30 s, causing prolonged business impact and demanding high‑performance MHA for large‑scale switches.

Option 2: Simple principle and fast switch (<10 s per cluster), but requires extensive automation (auto‑scaling, cascading setup, pre/post checks, traffic switching), leading to higher development cost.

Considering business impact, the fast and stable cascading solution (Option 2) was chosen; it also allows smooth CLB upgrade during new‑cluster construction.

Cascading read‑traffic switch diagram

Cascading write‑traffic switch diagram

3 How to Achieve a Fast and Stable MySQL Data‑Center Migration

MySQL cluster migration carries high risk; improper operations can render clusters unavailable. ZhaiZhai operates hundreds of MySQL clusters, requiring a rapid yet reliable migration.

3.1 Pre‑Build Cascades

Backup and expand new clusters, then establish cascading relationships with old clusters. Because ZhaiZhai uses a single‑machine multi‑instance architecture, the new clusters must consider mixed deployment issues, instance resource limits, and load balancing.

Constraints:

Maximum 5 master instances per machine

Maximum 10 slave instances per machine

Maximum 15 total instances per machine

Memory and disk usage ≤ 85 %

Cascading construction process:

3.2 Service Pause

Complex upstream/downstream relationships make selective pause difficult; therefore, a short low‑traffic window is used to pause services and batch‑switch core clusters, minimizing downtime while ensuring robust batch operation and recovery capabilities.

3.3 Automated Batch Operations and Decoupling Critical Steps

All migration steps are automated and modular, forming a closed loop that enables rapid fault localization and rollback.

Automated cascading cluster setup

Automated pre/post checks

Batch read‑traffic switch

Batch write‑traffic switch

Automatic kill of old‑cluster connections and detection of new connections

Batch decommission of old clusters

3.4 Cluster Tiering

Clusters are divided into three tiers (P1, P2, P3) with roughly equal distribution.

P3 clusters can be switched at any daytime

P2 clusters are switched between 20:00‑22:00

P1 clusters require early‑morning maintenance windows

3.5 Pre‑ and Post‑Switch Checks

Both new and old clusters undergo thorough checks to ensure configuration consistency.

Pre‑checks include:

VIP‑RSHOST connectivity

buffer_pool_size

sql_mode

Number of slave instances

Cascade latency

…

Post‑checks include:

Read‑only status of new and old masters

Real‑time business connections

Domain name points to the new cluster

…

3.6 Gray‑Scale Switch Validation

After automation development, multiple rounds of batch switch tests are performed on a test cluster, followed by tiered online switches. P3 clusters serve as a low‑impact pilot to surface issues before larger batches.

Gray‑scale switch order:

Single‑cluster switch

Small batch (<10)

Large batch (>30)

Issues encountered:

Multiple domain names : Some clusters have several master/slave domain names, requiring compatibility handling during traffic switch.

CMDB inaccuracies : Stale metadata leads to incorrect instance or domain information, necessitating pre‑migration verification.

4 Conclusion

ZhaiZhai operates over 400 MySQL clusters; all switches were completed during the early‑morning maintenance window on September 27. P3 and P2 clusters were switched beforehand; the remaining 100+ P1 core clusters averaged 10 s per switch, finishing within half an hour with smooth operation and successful validation.

Author: Huang Jianbo, ZhaiZhai DBA, responsible for MySQL operations and database platform development.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.

Automation Operations mysql Cloud database migration

Written by

Zhuanzhuan Tech

A platform for Zhuanzhuan R&D and industry peers to learn and exchange technology, regularly sharing frontline experience and cutting‑edge topics. We welcome practical discussions and sharing; contact waterystone with any questions.

0 followers

Reader feedback

How this landed with the community

Rate this article

Was this worth your time?

Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.