Understanding MySQL Master‑Slave Replication: Principles, Lag, and Failover
This article explains MySQL master‑slave replication, covering its architecture, binlog‑based replication process, causes and mitigation of replication lag, and strategies for master‑slave failover, helping readers grasp why and how to use replication for read/write separation, high availability, and backup.
Hello everyone, I am Lou Zai.
MySQL master‑slave replication is a common interview topic; although the concepts are basic, few can answer them comprehensively.
Below is the article outline.
1. MySQL Master‑Slave
1.1 What is MySQL master‑slave?
MySQL master‑slave means having two identical databases: a master that handles read‑write operations and a slave that provides read‑only queries.
1.2 Why use MySQL master‑slave?
On a single machine (e.g., 4‑core 8 GB running MySQL 5.7) you can handle about 500 TPS and 10 000 QPS. When traffic spikes, you need master‑slave separation.
Most systems have a read‑heavy access pattern; by using one master and multiple slaves, the master handles writes and core queries, while slaves handle read queries, improving performance and reducing master load.
When the master fails, a slave can be promoted to master, ensuring high availability, and the master can serve as a backup.
Read/Write Separation : slaves handle reads, reducing master pressure.
High Availability : failover to a slave when the master crashes.
Data Backup : slaves store a copy of the data for disaster recovery.
2. Master‑Slave Replication
2.1 Replication principle
MySQL replication relies on the binary log (binlog), which records every change in binary form.
The master writes updates to the binlog; the slave asynchronously pulls the binlog and replays it.
Master writes binlog : update/insert/delete statements are recorded.
Master sends binlog : a dump thread streams the binlog to the slave.
Slave writes relay log : an IO thread fetches the binlog and stores it in a relay log.
Slave replays : an SQL thread reads the relay log and applies the changes, achieving consistency.
2.2 Ensuring consistency
If the binlog format is STATEMENT , the master may use one index while the slave uses another, leading to data divergence. Example:
delete from t where a > '666' and create_time < '2022-03-01' limit 1;To avoid this, switch the binlog format to ROW , which records the exact rows affected (Table_map and Delete_rows events), eliminating inconsistency.
However, row format can be large for massive deletes. A mixed format ( MIXED ) uses row format only when potential inconsistency is detected, otherwise statement format.
3. Replication Lag
Lag occurs when the time between the master writing a binlog entry and the slave replaying it grows, causing the slave to return stale data.
3.1 Lag causes
Single‑threaded replication: the master writes binlog sequentially, but the slave’s SQL thread processes events one by one, leading to bottlenecks.
IO thread speed vs. SQL thread speed mismatch.
Lock contention on the slave, especially when a large DDL blocks the SQL thread.
Summarized: high TPS, large transactions, and lock contention on the slave are the main sources of lag.
3.2 Typical lag scenarios
Slave hardware inferior : use machines with comparable specs to the master.
Slave overload : consider adding more slaves or offloading queries to Hadoop‑like systems.
Too many slaves : keep the number of slaves reasonable (3‑5).
Large transactions : avoid massive deletes in a single statement; split into batches.
Network latency : increase bandwidth.
Old MySQL version : upgrade to a version that supports multi‑threaded replication.
3.3 Lag mitigation solutions
Use caching : write to cache and read from cache first (introduces cache‑DB consistency challenges).
Read from master : direct critical reads to the master (adds load, not recommended for all queries).
Data redundancy : for asynchronous processing, push data IDs to a message queue so consumers don’t need to query the slave.
In practice, critical paths (e.g., inventory, payment) should query the master, while non‑critical paths can use slaves.
4. Master‑Slave Switchover
4.1 One master, one slave
Two machines: A (master) handles reads/writes, B (slave) handles reads. If A fails, B is promoted to master; after repair, A becomes slave.
Pros : simple, provides read offloading and basic HA.
Cons : limited read capacity and still a single point of failure for both machines.
4.2 One master, multiple slaves
One master (A) with several slaves (B, C, D). If A fails, B can become the new master while C and D continue reads; after repair, A re‑joins as a slave.
Pros : many slaves share read load, significantly improving read concurrency.
Cons : write concurrency remains limited to the single master.
Large companies (e.g., Baidu, Didi) typically adopt the one‑master‑many‑slaves architecture to handle high query volume, achieve read/write separation, and ensure high availability and disaster recovery.
IT Services Circle
Delivering cutting-edge internet insights and practical learning resources. We're a passionate and principled IT media platform.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.