Understanding and Solving MySQL Slave Lag: Causes, Scenarios, and Optimizations
This article analyzes common MySQL slave‑lag scenarios—including missing indexes, large master transactions, high write rates, and MyISAM backups—explains their risks, and provides practical solutions such as index tuning, hardware upgrades, configuration changes, and enabling parallel replication to improve replication performance.
Author Introduction
Yang Qilong, known as "North in South", is a 7‑year DBA veteran currently working at Hangzhou Youzan Technology as a DBA, responsible for database architecture design and operation platform development, and skilled in performance tuning and fault diagnosis.
1 Introduction
When operating MySQL databases, DBAs often receive alerts about master‑slave lag, for example:
check_ins_slave_lag (err_cnt:1)critical‑slavelag on ins:3306=39438
Slave lag is a long‑standing issue. The risks include:
In abnormal cases, HA cannot switch; consistency checks reveal master‑slave divergence.
Backup hangs because flush tables with read lock may timeout after 900 seconds.
Backups based on the slave are not up‑to‑date, reflecting delayed data.
This article explores how to solve and avoid slave lag by analyzing several common causes.
2 Slave Lag Scenarios and Solutions
1. No primary key, no index or low‑selectivity index
Typical characteristics:
a. show slave status shows position unchanged
b. show open tables shows a table with in_use = 1
c. show create table reveals no primary key or low‑selectivity indexes.Solution:
a. Identify high‑selectivity columns and test:
select count(*) from xx;
select count(*) from (select distinct xx from xxx) t;
If the two counts are similar, add indexes on those columns.
b. stop slave; // may take long because of transaction rollback
c. set global slave_rows_search_algorithms='TABLE_SCAN,INDEX_SCAN,HASH_SCAN';
set sql_log_bin=0;
alter table xx add key xx(xx);
start slave;
d. For InnoDB, monitor rows_inserted/updated/deleted/selected via
show innodb status
to gauge replication speed.2. Large transactions on the master causing slave delay
Symptoms can be seen in the binlog, where a single transaction occupies many events.
Solution:
Prevention: cooperate with developers to add caching, use asynchronous writes, and avoid large single‑transaction writes.
Remediation: tune IO‑related parameters such as innodb_flush_log_at_trx_commit and sync_binlog, or enable parallel replication.3. High write frequency on the master overwhelms the slave
When the master performs many INSERT/UPDATE/DELETE operations, the slave's single sql_thread cannot keep up.
Solution:
a. Upgrade slave hardware (e.g., SSD, high‑performance I/O).
b. Use the pre‑heat tool (relay fetch) to load required data into memory before the sql_thread executes it.
c. Deploy multi‑threaded replication (Alibaba MySQL implementation) based on row‑level parallelism; this requires binlog format ROW.
The approach allows two transactions that modify different rows of the same table to run concurrently.4. Many MyISAM tables cause delay during backup
Tools like xtrabackup lock tables (via flush tables with read lock ) for a consistent backup; MyISAM locks block the slave SQL thread, leading to hangs. Converting those tables to InnoDB eliminates the problem.
3 MySQL Improvements
MySQL continuously optimizes replication to reduce latency.
1. Group‑commit based parallel replication
Replication works by the io_thread pulling the master’s binlog into the relay log, and the sql_thread replaying it. MySQL 5.6 introduced parallel replication with three thread types:
coordinator_thread : reads the relay log, groups events by transaction, and dispatches them to worker threads; also handles DDL or cross‑database transactions.
worker_thread : executes assigned binlog events; the number of workers is controlled by slave_parallel_workers .
When only one database exists, parallelism offers little benefit; otherwise it can improve throughput.
Enabling group‑commit parallelism in MySQL 5.7 requires:
slave_parallel_workers > 0
global.slave_parallel_type='LOGICAL_CLOCK'With this setting, transactions that share the same sequence number (seqno) can be executed concurrently on the slave, breaking the previous limitation that transactions from the same database could not run in parallel.
# show variables like 'slave_parallel%';
+------------------------+----------+
| Variable_name | Value |
+------------------------+----------+
| slave_parallel_type | DATABASE |
| slave_parallel_workers | 4 |
+------------------------+----------+After enabling, show processlist displays additional threads waiting for events from the coordinator:
+----+-------------+-----------------+------+---------+------+--------------------------------------------------------+------------------+
| Id | User | Host | db | Command | Time | State | Info |
+----+-------------+-----------------+------+---------+------+--------------------------------------------------------+------------------+
| 9 | root | localhost:40270 | NULL | Query | 0 | starting | show processlist |
| 10 | system user | NULL | NULL | Connect | 1697 | Waiting for master to send event | NULL |
| 31 | system user | NULL | NULL | Connect | 5 | Slave has read all relay log; waiting for more updates | NULL |
... (other waiting threads) ...
+----+-------------+-----------------+------+---------+------+--------------------------------------------------------+------------------+2. Write‑set based parallel replication
Since MySQL 5.7.22, a write‑set based optimization allows transactions that modify disjoint rows to be applied in parallel, effectively moving parallelism from the transaction level to the row level. This can increase replication speed by 5‑10×. Detailed performance graphs are available in external articles.
4 Conclusion
Slave lag occurs when the slave cannot apply binlog events as quickly as the master writes them. Mitigation strategies include adding appropriate indexes, reducing large transactions, upgrading hardware, and enabling parallel replication (database‑, transaction‑, or row‑based). Storage technology advances from HDD to SSD/PCI‑E SSD/NVMe continuously improve I/O, while MySQL’s replication evolves from single‑threaded to sophisticated parallel algorithms.
5 Further Reading
Speed up 5‑10× with WRITESET based MySQL parallel replication
A MySQL master‑slave acceleration solution – improvement
MySQL multi‑threaded sync MySQL‑Transfer (deprecated)
MySQL parallel replication evolution and WriteSet optimization in MySQL 8.0
Aikesheng Open Source Community
The Aikesheng Open Source Community provides stable, enterprise‑grade MySQL open‑source tools and services, releases a premium open‑source component each year (1024), and continuously operates and maintains them.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.