Why Does MySQL Master‑Slave Replication Lag? Causes and Fixes
This article explains why MySQL master‑slave replication can become delayed, outlines the differences between DDL and DML, shows how to diagnose lag with show slave status, and provides architectural, hardware, and configuration solutions—including semi‑synchronous replication and tuning parameters—to reduce latency.
1. Causes of MySQL Master‑Slave Replication Lag
MySQL replication runs single‑threaded; the master writes all DDL and DML events to the binlog sequentially, which is efficient. The slave’s SQL Thread replays these events, but DML/DDL I/O is random and costly. When the master’s concurrency is high, the slave may not keep up, especially if large queries cause lock waits. Common reasons include high load on master or slave, network latency, insufficient hardware, or sub‑optimal MySQL configuration.
2. About DDL and DML
SQL language categories:
DQL (Data Query Language) – SELECT, FROM, WHERE, GROUP BY, HAVING, ORDER BY, LIMIT.
DDL (Data Definition Language) – CREATE, ALTER, DROP, TRUNCATE, COMMENT, RENAME; used to define tables, indexes, views, procedures, triggers.
DML (Data Manipulation Language) – SELECT, INSERT, UPDATE, DELETE, MERGE, CALL, EXPLAIN PLAN, LOCK TABLE; used for data manipulation.
DCL (Data Control Language) – GRANT, REVOKE; manages permissions.
TCL (Transaction Control Language) – COMMIT, ROLLBACK, SAVEPOINT, SET TRANSACTION.
3. How to Diagnose Replication Lag
Run SHOW SLAVE STATUS and check the Seconds_Behind_Master value:
NULL – either the I/O thread or SQL thread has failed.
0 – replication is healthy.
Positive number – indicates lag; larger numbers mean greater delay.
4. Solutions
Data‑loss mitigation:
Semi‑synchronous replication – supported from MySQL 5.5; the master waits until at least one slave writes the transaction to its relay log before returning to the client, improving safety at the cost of extra round‑trip latency.
Configure sync_binlog=1 on the master and innodb_flush_log_at_trx_commit=1 to ensure binlog and InnoDB logs are flushed to disk after each transaction.
Note: Setting both values to 1 can limit write performance and should only be used when data safety is critical (e.g., financial transactions) and the I/O subsystem can handle the load.
4.1 Reducing Slave Lag
Architecture
Adopt sharding to distribute load across multiple MySQL instances.
Use read/write splitting: one master, multiple slaves for reads.
Introduce a cache layer (Memcached or Redis) between the application and MySQL.
Physically separate business‑specific MySQL instances on different machines.
Deploy slaves on more powerful hardware than the master.
Hardware
Upgrade servers (e.g., 4U > 2U > 1U).
Use SSDs, disk arrays, or SAN for faster random writes.
Place master and slaves on the same switch with 10 GbE connectivity.
In short, stronger hardware naturally reduces latency; the ultimate solution is to invest both money and time.
MySQL replication acceleration tips:
Set sync_binlog=0 on the slave.
Use --logs-slave-updates to prevent slave updates from being written to its own binlog.
Disable binlog on the slave if not needed.
If using InnoDB, set innodb_flush_log_at_trx_commit=2 on the slave.
Filesystem‑level optimizations: Disable atime updates (e.g., add noatime to /etc/fstab and remount) to avoid unnecessary write‑back of file access timestamps.
Adjust sync parameters: high safety settings (e.g., sync_binlog=1, innodb_flush_log_at_trx_commit=1) are essential on the master but can be relaxed on the slave to improve performance.
4.2 Other Slave‑Sync Issues and Solutions
Problems:
Potential data loss if the master crashes.
Single SQL thread on the slave leads to replication lag under heavy write load.
Solutions:
Semi‑synchronous replication – addresses data‑loss risk.
Parallel replication – reduces slave lag.
Semi‑synchronous replication details:
Available as a plugin from MySQL 5.5; requires separate installation.
Ensures the transaction’s binlog reaches at least one slave before the master commits.
Does not guarantee the slave has applied the transaction.
Introduces some performance overhead and longer response times.
Network issues or slave failure can block the master until timeout or recovery.
Comparison of asynchronous, semi‑synchronous, and parallel replication principles.
Link: https://juejin.cn/post/7305582809811599360 (© original author, removal upon request)
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
MaGe Linux Operations
Founded in 2009, MaGe Education is a top Chinese high‑end IT training brand. Its graduates earn 12K+ RMB salaries, and the school has trained tens of thousands of students. It offers high‑pay courses in Linux cloud operations, Python full‑stack, automation, data analysis, AI, and Go high‑concurrency architecture. Thanks to quality courses and a solid reputation, it has talent partnerships with numerous internet firms.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
