Databases 13 min read

Why MySQL Replication Lag Happens and How to Fix It: Real-World Cases

This article examines the common causes of MySQL master‑slave replication delay, explains how to monitor the issue with SHOW SLAVE STATUS, and provides practical solutions for six typical scenarios ranging from heavy write traffic to hardware mismatches.

UCloud Tech
UCloud Tech
UCloud Tech
Why MySQL Replication Lag Happens and How to Fix It: Real-World Cases

MySQL master‑slave replication delay has long troubled the industry, reducing the value of read/write separation for latency‑sensitive applications. UDB, UCloud's cloud database service, has operated thousands of MySQL instances for six years and routinely resolves 2‑3 replication‑delay cases daily.

Why Replication Delay Matters

The replication mechanism is used throughout UDB, including standard master‑slave setups and the dual‑master high‑availability mode. Any delay harms data consistency, prevents automatic disaster‑recovery switchover, and can cause severe business impact such as missing newly inserted or updated records.

DBAs typically run SHOW SLAVE STATUS and watch the Seconds_Behind_Master value; UDB even exposes this metric as a dedicated monitoring item.

Production‑Environment Analysis and Solutions

We categorize the most frequent replication‑delay cases and summarize their symptoms, root causes, and remediation steps.

Case 1: Frequent DML on the Master

Symptoms : Sudden spike in master write QPS accompanied by rising delay.

Root Cause : Massive binlog generation overwhelms the single‑threaded SQL thread on the replica.

Solution : For MySQL 5.7‑, shard the workload; for MySQL 5.7 use Group Commit parallel replication; for MySQL 8.0 use Write‑Set parallel replication.

Case 2: Large Transactions on the Master

Symptoms : Exec_Master_Log_Pos stalls while Seconds_Behind_Master climbs; Slave_SQL_Running_State shows “Reading event from the relay log”.

Root Cause : Long‑running transactions generate extensive binlog entries that take a long time to replay on the replica.

Solution : Split large transactions into smaller ones to allow timely commits.

Case 3: DDL on Large Tables

Symptoms : Exec_Master_Log_Pos unchanged, Slave_SQL_Running_State indicates waiting for metadata lock or “altering table”.

Root Cause : Replicating DDL on big tables is slow, similar to large transactions.

Solution : Identify blocking DDL via SHOW PROCESSLIST or information_schema.innodb_trx and kill it; schedule DDL during low‑traffic windows or execute it manually on both master and replica with sql_log_bin=0 after careful testing.

Case 4: Mismatched Master/Replica Configurations

Symptoms : Significant performance metric differences between master and replica.

Root Cause : Hardware or configuration disparities (e.g., SSD vs. SAS disks, different RAID policies, kernel parameters).

Solution : Align hardware and MySQL settings; for OLAP workloads, provision replica with equal or higher resources.

Case 5: Missing Primary Key or Proper Index

Symptoms : Slave_SQL_Running_State shows “Reading event from the relay log”, high CPU on replica, and no change in Exec_Master_Log_Pos.

Root Cause : In ROW binlog format, updates on tables without suitable indexes cause full‑table scans for each row, dramatically slowing replay.

Solution : Ensure every table has an explicit auto‑increment primary key and appropriate indexes.

Case 6: Replica Overload

Symptoms : High CPU or I/O utilization on the replica, causing slow SQL‑thread execution.

Root Cause : Read‑heavy workloads or OLAP queries on the replica exceed its capacity.

Solution : Add more replicas to distribute read traffic; dedicate a replica for OLAP workloads and tolerate some delay.

Summary

When using MySQL master‑slave replication, delay is a critical factor affecting data consistency and high‑availability failover. By monitoring SHOW SLAVE STATUS, analyzing QPS, transaction size, DDL activity, hardware parity, indexing, and replica load, operators can systematically diagnose and mitigate replication lag.

MySQLmaster‑slaveTroubleshootingDatabase PerformanceUDBreplication lag
UCloud Tech
Written by

UCloud Tech

UCloud is a leading neutral cloud provider in China, developing its own IaaS, PaaS, AI service platform, and big data exchange platform, and delivering comprehensive industry solutions for public, private, hybrid, and dedicated clouds.

0 followers
Reader feedback

How this landed with the community

Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.