MySQL Replication Delay Analysis and Troubleshooting for Large Transactions
The article describes a MySQL 5.7.31 replication delay where Seconds_Behind_Master fluctuates between 0 and over 10,000 seconds, analyzes root causes such as single‑threaded replication, large or long‑running transactions, and provides preventive, diagnostic, and remediation steps including enabling parallel replication and using the infobin tool.
The author, a DBA from the iKesheng community, received a report that a MySQL replica’s Seconds_Behind_Master value was repeatedly jumping between 0 and more than 10,000 seconds.
Running SHOW SLAVE STATUS\G confirmed the erratic delay values. The server was MySQL 5.7.31, using the DATABASE replication mode with slave_parallel_workers=0 , i.e., a single SQL thread.
The client had recently launched a new high‑QPS service that generated many INSERT and UPDATE statements on a modest cloud server. After the issue appeared, the new service was temporarily taken offline.
Investigation revealed that Seconds_Behind_Master is affected by three factors: the replica’s system time, the time difference between master and replica, and the mi->last_timestamp . In single‑threaded mode, DML events are recorded in the binlog; the query_event exec_time is usually 0 and can be ignored. The timestamp in each event header is used to compute the delay, so long‑running or large transactions can cause instantaneous jumps.
Since the new workload was shut down, the delay gradually returned to normal, and the issue was considered resolved after monitoring confirmed stability.
The author then reflected on three questions: how to prevent such phenomena, how to identify long‑uncommitted or large transactions, and how to remediate them.
Prevention recommendations include enabling parallel replication when the production environment permits and performing pre‑deployment SQL review to avoid problematic statements.
For diagnosis, the author suggests using SHOW PROCESSLIST , querying information_schema.innodb_trx , and, for historical analysis, parsing binlog files with tools like mysqlbinlog or the third‑party infobin utility. Example command:
infobin mysql-bin.000005 20 2000000 10 -t >/root/result.logThe command splits the binlog into 20 pieces, treats transactions larger than 2 MB as “large”, and flags those lasting more than 10 seconds as long‑running. The tool’s output shows total transaction count, largest transaction size, and per‑table binlog statistics.
Remediation involves collaborating with the business side to refactor the workload; if the business is uncooperative, the DBA may need to coordinate directly during off‑hours to apply fixes.
Additional sections promote the SQLE open‑source SQL audit platform, providing repository and documentation links, and invite readers to join the official QQ group for further discussion.
Aikesheng Open Source Community
The Aikesheng Open Source Community provides stable, enterprise‑grade MySQL open‑source tools and services, releases a premium open‑source component each year (1024), and continuously operates and maintains them.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.