
How to Diagnose and Fix MySQL Master‑Slave Replication Lag

This article explains why MySQL master‑slave replication can become delayed in high‑traffic order systems, describes the underlying binlog synchronization mechanism, and provides practical steps—including network upgrades, server tuning, transaction reduction, version updates, and slave count limits—to eliminate the lag.

Su San Talks Tech

Introduction

I often encounter MySQL master‑slave latency issues, especially in a restaurant ordering system where the order service uses a master‑slave architecture (one master, two slaves) to ensure performance and high availability during peak lunch and dinner periods.

1. Incident Overview

The ordering system publishes an MQ message containing only the order ID after a user places an order. The dish‑allocation service consumes this message, calls the order query API using the ID, and the API reads data from the slave database. If master‑slave synchronization is delayed, the API may return incomplete data, causing the dish‑allocation write to fail.

2. How MySQL Master‑Slave Replication Works

The master writes all changes to a binary log (binlog) on disk. Replication copies the binlog to each slave, usually asynchronously. The process consists of:

The master writes changes to the binlog.

A Log Dump thread on the master streams the binlog to the slaves.

The slave’s I/O thread receives the binlog and writes it to a relay log.

The slave’s SQL thread reads the relay log and replays the changes into the slave database.

If any of these steps is slow or stalls, the slave falls behind the master and replication delay occurs.
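To diagnose which of the steps above is lagging, one option is to compare three binlog coordinates: what the master has written, what the slave's I/O thread has received, and what the SQL thread has applied. The sketch below assumes the rows from SHOW MASTER STATUS and SHOW SLAVE STATUS have already been fetched into dictionaries keyed by their column names. The function names are hypothetical, and it assumes binlog file names end in a numeric suffix such as mysql-bin.000012.

```python
def binlog_coord(file_name, position):
    """Turn ('mysql-bin.000012', 5000) into a comparable tuple (12, 5000)."""
    sequence = int(file_name.rsplit(".", 1)[1])
    return (sequence, int(position))


def locate_lag(slave_status, master_status):
    """Return the replication stages that are behind, in pipeline order.

    slave_status:  one row of SHOW SLAVE STATUS as a dict
    master_status: one row of SHOW MASTER STATUS as a dict
    """
    if slave_status["Slave_IO_Running"] != "Yes":
        return ["io_thread_not_running"]
    if slave_status["Slave_SQL_Running"] != "Yes":
        return ["sql_thread_not_running"]

    written = binlog_coord(master_status["File"], master_status["Position"])
    received = binlog_coord(slave_status["Master_Log_File"],
                            slave_status["Read_Master_Log_Pos"])
    applied = binlog_coord(slave_status["Relay_Master_Log_File"],
                           slave_status["Exec_Master_Log_Pos"])

    stages = []
    if received < written:
        stages.append("transfer")  # network or I/O thread is behind
    if applied < received:
        stages.append("replay")    # SQL thread is behind
    return stages
```

An empty list means the slave had caught up at the moment the two rows were read. Because the rows are sampled at slightly different times, a small "transfer" gap on a busy master is normal and only a persistent or growing gap points to a problem.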

3. How to Resolve Master‑Slave Delay

3.1 Network Issues

Network problems can slow or block binlog transmission between master and slave. If the link is saturated, increasing bandwidth (e.g., from 100 Mbps to 300 Mbps) helps.
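A rough way to judge whether bandwidth is the bottleneck is to compare the rate at which the master produces binlog with the capacity of the link. The sketch below is back-of-the-envelope arithmetic only: the function name and the sample figures are hypothetical, and it assumes every slave pulls a full copy of the binlog over the same outbound link, ignoring protocol overhead and other traffic.

```python
def link_utilization(binlog_bytes_per_sec, link_mbps, replica_count=1):
    """Fraction of the master's outbound link needed to ship the binlog.

    A value close to or above 1.0 suggests the link cannot keep up.
    """
    needed_bits_per_sec = binlog_bytes_per_sec * 8 * replica_count
    link_bits_per_sec = link_mbps * 1_000_000
    return needed_bits_per_sec / link_bits_per_sec


# Hypothetical peak: 5 MB/s of binlog, two slaves.
print(link_utilization(5_000_000, 100, replica_count=2))  # 100 Mbps link
print(link_utilization(5_000_000, 300, replica_count=2))  # 300 Mbps link
```

With these made-up numbers the 100 Mbps link would run at about 80% just for replication, which leaves little headroom during peaks, while the 300 Mbps link would sit below 30%.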

3.2 Server Performance

Slaves are often provisioned with weaker hardware than the master. Under heavy write traffic the master then produces binlog faster than the slave can replay it, and the slave falls behind. Upgrading the slave hardware to match the master can alleviate this.

3.3 Large Transactions

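Big transactions slow both master writes and slave replay, because the slave cannot finish applying a transaction until it has replayed all of it. Reduce transaction scope, move non‑critical queries out of transactions, and run work that does not need to be part of the transaction asynchronously where possible.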
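One common way to reduce transaction scope is to split a bulk write into many small transactions. The following is a minimal sketch of that idea, not the code used in the system described here. The helper names are hypothetical, and apply_batch stands for whatever function opens a transaction, writes one batch, and commits.

```python
from itertools import islice


def chunked(ids, size):
    """Yield successive lists of at most `size` items from `ids`."""
    iterator = iter(ids)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


def run_in_small_transactions(ids, batch_size, apply_batch):
    """Call apply_batch once per chunk and return the number of chunks.

    apply_batch is assumed to begin a transaction, write the rows for
    the given ids, and commit, so each chunk replays quickly on a slave.
    """
    batches = 0
    for batch in chunked(ids, batch_size):
        apply_batch(batch)
        batches += 1
    return batches
```

The trade-off is that the bulk operation is no longer atomic as a whole, so this only suits work that can safely be resumed or repeated after a partial failure.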

3.4 MySQL Version

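Before MySQL 5.6, a slave applied the relay log with a single SQL thread, which is slow when the master handles many concurrent writes. MySQL 5.6 introduced multi‑threaded replication, parallelized per database, and MySQL 5.7 added parallel apply within a single database. Upgrading to MySQL 5.6 or later therefore makes multi‑threaded replication available.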
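As a rough illustration of how the available options depend on the server version, the sketch below maps a MySQL version string to the parallel-apply variables one might set on a slave. The function names and the worker count of 4 are assumptions for illustration, not recommendations from this article. The variables slave_parallel_workers and slave_parallel_type are real MySQL system variables, though newer 8.0 releases expose them under replica_* names, so check the manual for your exact version.

```python
def parse_version(text):
    """Parse '5.7.44-log' into (5, 7, 44). MySQL version strings only."""
    core = text.split("-", 1)[0]
    return tuple(int(part) for part in core.split(".")[:3])


def parallel_apply_settings(version_text, workers=4):
    """Suggest slave-side variables for multi-threaded replication.

    Returns an empty dict when the version predates MySQL 5.6 and
    only single-threaded apply is available.
    """
    version = parse_version(version_text)
    if version < (5, 6, 0):
        return {}
    settings = {"slave_parallel_workers": workers}
    if version >= (5, 7, 2):
        # LOGICAL_CLOCK allows parallel apply within one database.
        settings["slave_parallel_type"] = "LOGICAL_CLOCK"
    return settings
```

On 5.6 the workers only help when writes are spread across several databases, since parallelism there is per database.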

3.5 Too Many Slaves

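Having many slaves can degrade overall replication speed. The master runs a separate Log Dump thread for each connected slave, and every one of them reads the binlog and consumes network bandwidth; with semi‑synchronous replication the master also waits for slave acknowledgement before a commit returns. Keeping the number of slaves below five is recommended.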

4. Our Solution

We first asked operations to increase network bandwidth. Then we optimized the business code by shrinking large transactions and moving non‑critical work to asynchronous processing, reducing concurrent writes. We also added an automatic retry mechanism: if the MQ consumer receives incomplete order data, the exception is logged and a background job retries the operation. After these changes, the master‑slave latency issue was essentially resolved.
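The retry idea can be sketched as follows. This is a simplified, in-process version under stated assumptions, not the actual implementation: read_order stands for the order query API that reads from the slave, is_complete is a hypothetical check that the order has all the fields the dish‑allocation service needs, and OrderNotReady is a hypothetical exception that the caller would log and hand to the background retry job.

```python
import time


class OrderNotReady(Exception):
    """Raised when the slave still lacks complete data after all attempts."""


def fetch_order_with_retry(read_order, order_id, is_complete,
                           attempts=3, delay=0.2, sleep=time.sleep):
    """Read an order, retrying with exponential backoff while it is incomplete.

    read_order(order_id) returns the order or None if it is not visible yet.
    """
    for attempt in range(attempts):
        order = read_order(order_id)
        if order is not None and is_complete(order):
            return order
        if attempt < attempts - 1:
            # Wait 1x, 2x, 4x ... the base delay before the next read.
            sleep(delay * 2 ** attempt)
    raise OrderNotReady(order_id)
```

Short in-process retries cover the common case where the slave is only a few hundred milliseconds behind; anything longer is left to the background job so the MQ consumer is not blocked.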

Open Question

If you need strict strong consistency between master and slave, what approach should you take?

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.

Written by

Su San Talks Tech

Su San, former staff at several leading tech companies, is a top creator on Juejin and a premium creator on CSDN, and runs the free coding practice site www.susan.net.cn.
