Mastering MySQL Disaster Recovery: Replication Modes and Strategies
This article explains MySQL disaster‑recovery techniques, covering cold and hot backups, same‑city versus remote setups, master‑slave topologies, async, semi‑sync and full‑sync replication, the MAR strong‑sync approach, and practical recommendations for building resilient two‑city three‑center architectures.
Disaster Recovery Overview
MySQL disaster recovery is a critical topic for backend developers; without a proper mechanism, service outages can cause severe business impact.
Recovery strategies are classified by temperature (cold vs. hot) and distance (same‑city vs. remote).
Cold backup: data is stored offline and restored when needed.
Hot backup: standby nodes are kept up‑to‑date for immediate failover.
Same‑city: low latency, suitable for hot or full‑sync modes.
Remote: higher latency, typically using cold or async modes.
Master‑Slave Deployment
MySQL can be deployed in master‑slave mode; if the master fails, a slave can be promoted. During normal operation, read traffic can be split to slaves to reduce load.
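The read/write split and the promotion step described above can be sketched in a few lines. This is a minimal illustration only: the class name `ReadWriteRouter`, the node labels, and the rule "anything starting with SELECT is a read" are assumptions made for the example, not part of MySQL or of any particular proxy.

```python
import itertools


class ReadWriteRouter:
    """Send writes to the master and spread reads across slaves (round-robin)."""

    def __init__(self, master, slaves):
        self.master = master
        self.slaves = list(slaves)
        self._reset_cycle()

    def _reset_cycle(self):
        # With no slaves left, reads fall back to the master.
        self._cycle = itertools.cycle(self.slaves) if self.slaves else None

    def route(self, sql):
        # Naive classification (assumption): only statements that start with
        # SELECT count as reads. A real router must also handle cases such as
        # SELECT ... FOR UPDATE and reads inside a write transaction.
        is_read = sql.lstrip().upper().startswith("SELECT")
        if is_read and self._cycle is not None:
            return next(self._cycle)
        return self.master

    def promote(self, slave):
        """Make `slave` the new master after the old master has failed."""
        self.slaves.remove(slave)
        self.master = slave
        self._reset_cycle()


router = ReadWriteRouter("master", ["slave1", "slave2"])
print(router.route("SELECT * FROM orders"))        # goes to a slave
print(router.route("INSERT INTO orders VALUES (1)"))  # goes to the master
```

In practice this logic usually lives in a proxy or in the database driver rather than in application code.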
Common master‑slave topologies include:
One‑master‑one‑slave: a single standby takes over when the master fails.
One‑master‑multiple‑slaves: multiple standbys share read load and provide redundancy.
Cascade (multi‑level) replication: a primary replicates to an intermediate master, which further replicates to downstream slaves.
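One way to reason about these topologies is to record, for each node, which node it replicates from, and then count the hops back to the top-level master. The sketch below assumes hypothetical node names and a plain dictionary as the topology description. Each extra hop in a cascade is another place where replication lag can accumulate.

```python
def replication_depth(source_of, node):
    """Return the number of replication hops between `node` and the top-level master.

    `source_of` maps each replica to the node it replicates from;
    the top-level master does not appear as a key.
    """
    depth = 0
    seen = {node}
    while node in source_of:
        node = source_of[node]
        if node in seen:
            raise ValueError("replication cycle detected")
        seen.add(node)
        depth += 1
    return depth


# Hypothetical cascade: master -> relay -> (slave_a, slave_b)
cascade = {"relay": "master", "slave_a": "relay", "slave_b": "relay"}

for name in ("master", "relay", "slave_a"):
    print(name, replication_depth(cascade, name))
```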
Replication Modes
Replication can work in three synchronization modes:
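Asynchronous: the master does not wait for any slave; it returns success to the client as soon as the transaction commits locally. Transactions that have not yet reached a slave are lost if the master fails.
Semi‑synchronous: the master waits until at least one slave acknowledges that it has received the transaction's events (written them to its relay log, not necessarily applied them) before confirming success, improving reliability at some performance cost.
Full synchronous: the transaction is considered committed only after all slaves have applied it, ensuring strong consistency but incurring high latency.
Full synchronous replication is rarely used in practice: every commit is as slow as the slowest slave, and a single unreachable slave can stall writes, so it only makes sense on a very reliable, low‑latency network.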
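The difference between the three modes comes down to how many slaves the master waits for before answering the client. The toy model below makes that concrete. The function name `commit_wait_ms` and the sample timings are invented for illustration, and the model assumes at least one slave and ignores timeouts and fallback behavior.

```python
def commit_wait_ms(mode, replica_ms):
    """Time the master spends waiting on slaves before returning success.

    `replica_ms` holds, per slave, the time until that slave acknowledges
    (semi-sync) or finishes applying (full sync) the transaction.
    """
    if mode == "async":
        return 0                    # never waits for a slave
    if mode == "semi_sync":
        return min(replica_ms)      # the first acknowledgment is enough
    if mode == "full_sync":
        return max(replica_ms)      # must wait for the slowest slave
    raise ValueError(f"unknown replication mode: {mode}")


# Hypothetical slaves: one in the same city, two farther away.
timings = [2, 35, 80]
for mode in ("async", "semi_sync", "full_sync"):
    print(mode, commit_wait_ms(mode, timings))
```

The `max` in the full‑sync branch is why one slow or distant slave dominates commit latency in that mode.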
MAR Strong‑Sync Replication
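To avoid the performance loss of synchronous replication, some systems adopt MAR (multi‑threaded asynchronous replication with strong synchronization). The master writes the transaction to the binlog, a dump thread sends the log to the slaves, and each slave acknowledges after writing it to its relay log. The client is told the commit succeeded only after an acknowledgment arrives, which gives the same safety as semi‑sync. The throughput gain comes from the thread pool: a worker thread does not sit blocked while the acknowledgment is in flight but moves on to other requests, and the waiting session is resumed once the acknowledgment comes back.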
Tencent’s TDSQL uses MAR for strong consistency with high performance.
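The core idea, not blocking a worker while an acknowledgment is pending, can be imitated with Python's `asyncio`. This is a toy model under stated assumptions and not TDSQL's implementation: the functions `slave_ack`, `commit`, and `run_batch` are invented for the example, the slave is simulated with a timer, and a single event loop stands in for the thread pool.

```python
import asyncio


async def slave_ack(txn_id, delay_s):
    """Stand-in for a slave writing the event to its relay log and acknowledging."""
    await asyncio.sleep(delay_s)
    return txn_id


async def commit(txn_id, delay_s, log):
    log.append(("binlog_written", txn_id))
    # The wait is non-blocking: while this transaction is parked awaiting its
    # acknowledgment, the event loop is free to start other transactions.
    await slave_ack(txn_id, delay_s)
    # Success is reported to the client only after the slave has acknowledged.
    log.append(("client_ok", txn_id))


async def run_batch(n, delay_s=0.01):
    log = []
    await asyncio.gather(*(commit(i, delay_s, log) for i in range(n)))
    return log


log = asyncio.run(run_batch(3))
for event in log:
    print(event)
```

All three transactions reach the binlog before any acknowledgment returns, yet no client is answered before its own acknowledgment. A blocking design would instead process each transaction's full round trip one after another.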
Practical Usage Recommendations
When the service must fail over quickly with little or no data loss, same‑city hot backup with full‑sync (or MAR) replication is preferred. For protection against regional disasters, remote cold backup with asynchronous replication is common, accepting that a small amount of recent data may be lost in a failure.
When a disaster occurs, failover procedures differ:
Same‑city strong‑sync: simply promote the standby.
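Remote setups: switch to the remote standby only if the outage is prolonged. For a short outage, leave the service on the failed primary offline and wait for it to recover, because an asynchronously replicated standby may be missing the most recent transactions.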
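The two failover rules above amount to a small decision function. The sketch below is illustrative only: the deployment labels, the action names, and the 300‑second threshold are assumptions chosen for the example, and a real threshold depends on how much downtime and data loss the business can tolerate.

```python
def failover_action(deployment, outage_s, max_tolerated_outage_s=300):
    """Pick a failover action for a failed primary.

    `max_tolerated_outage_s` is a hypothetical threshold separating a
    short outage from a prolonged one.
    """
    if deployment == "same_city_strong_sync":
        # The standby already holds every acknowledged transaction.
        return "promote_standby"
    if deployment == "remote_async":
        if outage_s >= max_tolerated_outage_s:
            return "switch_to_remote_standby"
        # Switching early risks losing transactions that were never replicated.
        return "wait_for_primary"
    raise ValueError(f"unknown deployment: {deployment}")


print(failover_action("same_city_strong_sync", outage_s=5))
print(failover_action("remote_async", outage_s=30))
print(failover_action("remote_async", outage_s=1800))
```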
Two‑City Three‑Center Architecture
This architecture combines two same‑city data centers (cold and hot) with one remote center, providing both low‑latency failover and protection against large‑scale regional disasters.
During a failure, traffic can be redirected to the surviving center, and data can be synchronized back once the affected site recovers.
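Redirecting traffic to a surviving center can be modeled as walking a preference list and taking the first healthy entry. The center names and their ordering below are hypothetical. The sketch assumes the same‑city standby is preferred over the remote center because it is closer and more likely to be fully up to date.

```python
# Hypothetical preference order: primary, same-city standby, remote center.
PREFERENCE = ["city_a_dc1", "city_a_dc2", "city_b_dc1"]


def pick_active_center(preference, healthy):
    """Return the most preferred data center that is still healthy."""
    for center in preference:
        if center in healthy:
            return center
    raise RuntimeError("no surviving data center")


# Single-center failure: traffic stays in the same city.
print(pick_active_center(PREFERENCE, {"city_a_dc2", "city_b_dc1"}))
# City-wide disaster: only the remote center is left.
print(pick_active_center(PREFERENCE, {"city_b_dc1"}))
```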
Conclusion
Understanding MySQL disaster‑recovery mechanisms, replication modes, and advanced strategies like MAR and two‑city three‑center designs is essential for building robust, high‑availability services and can be a strong differentiator in technical interviews.
NiuNiu MaTe
Joined Tencent (nicknamed "Goose Factory") through campus recruitment at a second‑tier university. Career path: Tencent → foreign firm → ByteDance → Tencent. Started as an interviewer at the foreign firm and hopes to help others.
