
Analysis of MySQL Master‑Slave Replication Delay and Mitigation Strategies

The article recounts a pre‑promotion MySQL replication incident at JD.com, explains the master‑slave architecture and thread roles, identifies slow‑SQL and missing indexes as root causes of replication lag, and proposes practical measures to reduce latency and improve system stability.

JD Tech

The author documents a real‑world incident that occurred on the eve of a major JD.com promotion: an online service raised an alarm for a null‑pointer exception, thrown because a database query returned no rows. Initial investigation pointed to master‑slave replication delay. Each episode lasted only a few seconds, but the problem recurred several times over the following months and worsened on the first day of the 11.11 campaign.

Further diagnosis revealed that the slave’s SQL thread was processing a large number of DELETE statements against a table that had no suitable index on the slave, although the master already had an appropriate index. Statements that were cheap on the master were therefore expensive on the slave, which fell behind and showed noticeable replication lag.
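A mismatch like this can be caught by comparing index definitions between the two servers. The following is a minimal sketch, not the team's actual tooling: it assumes the rows have already been fetched from each server (for example from `SHOW INDEX`) and reduced to `(Key_name, Seq_in_index, Column_name)` tuples. The index and column names in the usage example are hypothetical.

```python
from collections import defaultdict


def index_map(show_index_rows):
    """Group rows of (key_name, seq_in_index, column_name) into
    {index_name: (col1, col2, ...)} with columns in index order."""
    parts = defaultdict(list)
    for key_name, seq_in_index, column_name in show_index_rows:
        parts[key_name].append((seq_in_index, column_name))
    return {name: tuple(col for _, col in sorted(cols))
            for name, cols in parts.items()}


def missing_on_slave(master_rows, slave_rows):
    """Return master indexes whose column list has no equivalent on the slave.

    Indexes are compared by column list rather than by name, so an index
    that exists under a different name on the slave is not reported.
    """
    master = index_map(master_rows)
    slave_definitions = set(index_map(slave_rows).values())
    return {name: cols for name, cols in master.items()
            if cols not in slave_definitions}


# Hypothetical rows for one table on each server.
master_rows = [("PRIMARY", 1, "id"),
               ("idx_order_status", 1, "order_id"),
               ("idx_order_status", 2, "status")]
slave_rows = [("PRIMARY", 1, "id")]

print(missing_on_slave(master_rows, slave_rows))
```

Running such a comparison for every replicated table after a schema change is one way to notice that an index was created on the master only.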

The article then outlines the fundamentals of MySQL master‑slave replication, which involves three threads:

Master binlog dump thread: reads data‑change events from the master's binary log and sends them to the slave.

Slave I/O thread: connects to the master, requests binlog data, and stores it in the relay log.

Slave SQL thread: reads the relay log and re‑executes the events to keep the slave in sync with the master.
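The state of the two slave-side threads, together with an estimate of the lag, is reported by `SHOW SLAVE STATUS` in the `Slave_IO_Running`, `Slave_SQL_Running`, and `Seconds_Behind_Master` columns. The sketch below assumes that one row of that output has already been loaded into a dictionary; the function name and the one-second threshold are illustrative assumptions, not values from the article.

```python
def replication_problems(status, max_lag_seconds=1):
    """Return a list of human-readable problems found in one
    SHOW SLAVE STATUS row, given as a dict of column name -> value."""
    problems = []
    if status.get("Slave_IO_Running") != "Yes":
        problems.append("I/O thread is not running")
    if status.get("Slave_SQL_Running") != "Yes":
        problems.append("SQL thread is not running")

    lag = status.get("Seconds_Behind_Master")
    if lag is None:
        # MySQL reports NULL when the lag cannot be determined.
        problems.append("lag is unknown")
    elif int(lag) > max_lag_seconds:
        problems.append(f"slave is {int(lag)}s behind the master")
    return problems


healthy = {"Slave_IO_Running": "Yes", "Slave_SQL_Running": "Yes",
           "Seconds_Behind_Master": 0}
lagging = {"Slave_IO_Running": "Yes", "Slave_SQL_Running": "Yes",
           "Seconds_Behind_Master": 5}

print(replication_problems(healthy))
print(replication_problems(lagging))
```

In the incident described here both threads would have been reported as running; only the lag value would have exposed the problem.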

The replication process is illustrated by an accompanying diagram (image omitted for brevity).

Combining the replication theory with the observed symptoms, the root cause is the slave’s SQL thread, which applies relay‑log events one after another. A single slow statement therefore delays every event queued behind it, so any poorly performing operation inflates replication latency.
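The effect of serial application can be illustrated with a small model. This is a simplified sketch under stated assumptions: network transfer and relay-log I/O are ignored, times are integers in milliseconds, and the event timings are invented for illustration.

```python
def apply_lags(events):
    """Model a single SQL thread applying events strictly in order.

    events: list of (commit_ms_on_master, apply_ms_on_slave), sorted by
    commit time. Returns, for each event, the delay between its commit on
    the master and the moment the slave finishes applying it.
    """
    slave_clock = 0
    lags = []
    for commit_ms, apply_ms in events:
        # An event cannot start before it exists, nor before the
        # previous event has finished.
        start = max(slave_clock, commit_ms)
        slave_clock = start + apply_ms
        lags.append(slave_clock - commit_ms)
    return lags


# One commit per second; the second event is a slow, unindexed delete.
events = [(0, 100), (1000, 3000), (2000, 100), (3000, 100), (4000, 100)]
print(apply_lags(events))  # [100, 3000, 2100, 1200, 300]
```

The fast events that follow the slow one inherit most of its delay, and the backlog drains only because they are cheaper to apply than the interval at which they arrive.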

To mitigate the issue, the team added the missing index on the slave, which immediately reduced the lag and made the impact on the online service negligible.

Additional recommendations for lowering master‑slave delay include:

Introduce a caching layer (e.g., Memcached or Redis) between the application and MySQL to reduce read pressure.

Deploy higher‑performance hardware for the slave.

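Set sync_binlog=0 on the slave so that flushing the binary log to disk is left to the operating system rather than forced by MySQL; this trades some crash safety for write throughput.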

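Disable log_slave_updates (for example, by starting the slave with --skip-log-slave-updates) so that the slave does not write the updates it receives from the master to its own binary log. Keep it enabled only if the slave also acts as a master for other slaves.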

Disable binary logging on the slave when it is not needed.
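The three binary-log settings above can be checked against a running slave. The sketch below assumes the values have already been read (for example with `SHOW VARIABLES`) into a dictionary of strings; the function name and the wording of the suggestions are hypothetical, and whether a suggestion should be acted on depends on the durability and topology requirements of the deployment.

```python
def audit_slave_settings(variables):
    """Return suggestions for the binary-log settings of a slave.

    variables: dict of MySQL system variable name -> value as a string,
    e.g. {"log_bin": "ON", "log_slave_updates": "ON", "sync_binlog": "1"}.
    """
    suggestions = []
    log_bin_on = str(variables.get("log_bin", "OFF")).upper() == "ON"
    if not log_bin_on:
        # Without a binary log the other two settings have no effect.
        return suggestions

    suggestions.append("log_bin is ON: disable it if nothing reads "
                       "this slave's binary log")
    if str(variables.get("log_slave_updates", "OFF")).upper() == "ON":
        suggestions.append("log_slave_updates is ON: replicated changes "
                           "are written to the slave's binary log")
    if str(variables.get("sync_binlog", "0")) != "0":
        suggestions.append("sync_binlog is not 0: MySQL forces "
                           "binary-log syncs to disk")
    return suggestions


for line in audit_slave_settings(
        {"log_bin": "ON", "log_slave_updates": "ON", "sync_binlog": "1"}):
    print(line)
```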

JD.com’s architecture typically employs one master and eight slaves; high replication lag directly affects read‑heavy services across all business lines. By addressing indexing and employing the above strategies, the system’s stability and performance can be significantly improved.
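For read‑heavy services such as these, the caching recommendation is commonly implemented as a cache‑aside lookup. The following is a minimal sketch, assuming an in‑process dictionary as a stand‑in for Memcached or Redis and a hypothetical `load_from_db` callable that queries a slave; it is not a description of JD.com's implementation.

```python
import time


class CacheAside:
    """Serve reads from a cache and fall back to the database on a miss."""

    def __init__(self, load_from_db, ttl_seconds=60.0, clock=time.monotonic):
        self._load = load_from_db      # hypothetical callable: key -> row or None
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}             # key -> (expires_at, value)

    def get(self, key):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]            # cache hit: no database read
        value = self._load(key)
        if value is not None:
            # Misses are not cached: the row may simply not have
            # reached the slave yet.
            self._entries[key] = (now + self._ttl, value)
        return value

    def invalidate(self, key):
        """Drop a key, for example after the application writes to the master."""
        self._entries.pop(key, None)


db_reads = []

def load_from_db(key):
    db_reads.append(key)
    return {"id": key}

cache = CacheAside(load_from_db)
cache.get(1)
cache.get(1)
print(len(db_reads))  # 1: the second read was served from the cache
```

Not caching empty results is a deliberate choice in this sketch: a row that is missing because of replication lag should be looked up again on the next request rather than remembered as absent.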
