
Mastering MySQL Master‑Slave Replication: Principles, Challenges & Solutions

This article explains MySQL master‑slave replication fundamentals, why it’s used, the replication workflow, causes of lag, and presents multiple practical solutions—including system tuning, partitioning, caching, multi‑threaded relay log replay, and read‑from‑master strategies—to improve performance and reliability.

Architecture & Thinking

1 Review of MySQL Master-Slave Replication

Master-slave replication maintains a replica database identical to the primary: the primary's binary log of DDL/DML changes is copied to the replica and replayed there to keep the data consistent.

1.1 Why Use Replication

1. Separate reads from writes: the master handles writes and the slaves handle reads, which reduces lock contention on the master.

2. Provide a hot backup, so a slave can take over if the master fails.

3. Scale out the architecture: multiple slaves spread the read I/O load.

4. Divide and conquer: spread pressure across several nodes instead of one.

5. Adjust the master-to-slave ratio to match the read/write ratio (example table omitted).

1.2 Replication Mechanism

When replication starts, an I/O thread on the slave connects to the master, which creates a Binlog Dump thread to send events. The I/O thread writes them to the relay log, and the SQL thread on the slave replays them.

Steps:

The master records data changes (INSERT, DELETE, UPDATE) in its binary log.

The master's Binlog Dump thread sends the binlog events to the slave's I/O thread, which writes them to the relay log.

The slave's SQL thread replays the changes from the relay log.

Three threads are involved: binlog dump on master, I/O and SQL threads on slave; each slave gets its own binlog dump thread.
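The flow above can be sketched as a toy, single-process model. This is only an illustration under simplifying assumptions: the class and method names (`Master`, `Replica`, `dump`, `io_thread`, `sql_thread`) are hypothetical, the "threads" are plain method calls, and events are stored as strings rather than real binlog events.

```python
from collections import deque


class Master:
    """Toy master: every executed statement is appended to the binlog."""

    def __init__(self):
        self.binlog = []  # ordered list of change events

    def execute(self, statement):
        self.binlog.append(statement)

    def dump(self, from_pos):
        """Stand-in for the Binlog Dump thread: events after `from_pos`."""
        return self.binlog[from_pos:]


class Replica:
    """Toy slave with an I/O step and a SQL (replay) step."""

    def __init__(self):
        self.relay_log = deque()
        self.received = 0   # binlog position fetched so far
        self.applied = []   # statements already replayed

    def io_thread(self, master):
        """Fetch new events from the master and append them to the relay log."""
        events = master.dump(self.received)
        self.relay_log.extend(events)
        self.received += len(events)

    def sql_thread(self):
        """Replay relay log events one at a time, in order."""
        while self.relay_log:
            self.applied.append(self.relay_log.popleft())


master = Master()
replica = Replica()
master.execute("INSERT INTO t_score (stu_code, score) VALUES (374532, 721)")
master.execute("UPDATE t_score SET score = 806 WHERE stu_code = 374532")
replica.io_thread(master)   # binlog -> relay log
replica.sql_thread()        # relay log -> applied changes
```

After both steps run, `replica.applied` holds the same statements, in the same order, as `master.binlog`.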

1.3 Causes of Replication Lag

High TPS can generate more DML/DDL than a single SQL thread on the slave can process, leading to lag.

Replay is serial: the slave's single SQL thread applies relay log events one at a time, while the master executes writes from many connections concurrently.
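A back-of-the-envelope model makes the effect concrete. The functions below are a hypothetical sketch, and the rates in the example are made-up numbers, not measurements.

```python
def replay_backlog(write_rate, replay_rate, seconds):
    """Events waiting in the relay log after `seconds` of sustained load.

    write_rate:  events per second produced by the master
    replay_rate: events per second the slave's SQL thread can apply
    """
    return max(0, (write_rate - replay_rate) * seconds)


def drain_seconds(backlog, replay_rate):
    """Time to clear the backlog, assuming no new writes arrive meanwhile."""
    return backlog / replay_rate


# Hypothetical load: 5000 writes/s on the master, 4000 replays/s on the slave.
backlog = replay_backlog(write_rate=5000, replay_rate=4000, seconds=60)
```

With these assumed rates, one minute of load leaves 60000 events unapplied, and the slave needs another 15 seconds of idle time to catch up. If the slave can keep up (replay rate at or above write rate), the backlog stays at zero.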

2 Several Solutions

2.1 Optimal System Configuration

Tune the operating system, connection, and storage engine settings (maximum connections, timeouts, pool sizes, and so on), and make sure the hosts have sufficient CPU, memory, and storage.

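Linux kernel parameters and resource limits can be tuned:

# /etc/sysctl.conf
# How long an orphaned socket stays in FIN_WAIT_2 (default 60s)
net.ipv4.tcp_fin_timeout = 10
# Increase the SYN backlog
net.ipv4.tcp_max_syn_backlog = 65535
# Cap the number of TIME_WAIT sockets and allow their reuse for outgoing connections
net.ipv4.tcp_max_tw_buckets = 8000
net.ipv4.tcp_tw_reuse = 1
# tcp_tw_recycle breaks clients behind NAT and was removed in Linux 4.12;
# enable it only on older kernels where you know it is safe
# net.ipv4.tcp_tw_recycle = 1

# /etc/security/limits.conf: open files limit
* soft nofile 65535
* hard nofile 65535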

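MySQL 5.5+ uses InnoDB by default. Key parameters affecting performance include the following; the values shown are illustrative starting points (several match the server defaults) and should be sized to your hardware and workload: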

max_connections = 151
sort_buffer_size = 2M
open_files_limit = 1024
innodb_buffer_pool_size = 128M
innodb_buffer_pool_instances = 1
innodb_flush_log_at_trx_commit = 1
sync_binlog = 1
innodb_file_per_table = ON
innodb_log_buffer_size = 8M

2.2 Logical Partitioning at the Database Layer

Split data across several databases (sharding) so that the write load, and therefore the replay load on any single slave's SQL thread, is smaller; see a separate article on sharding.

2.3 Wait for Slave Sync Before Acknowledging Writes

Confirming a write only after it has been replicated to all slaves improves consistency but greatly reduces throughput, because every write then waits for the slowest slave.
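The trade-off can be sketched with in-memory stand-ins. Everything here is hypothetical (`AckingReplica`, `write_with_ack`, the `required_acks` parameter); it models the idea only and is not how MySQL implements it. MySQL's own semisynchronous replication is a lighter variant that waits for a replica to acknowledge receipt of the events rather than for every replica to apply them.

```python
class AckingReplica:
    """In-memory stand-in for a slave that acknowledges replicated writes."""

    def __init__(self, name, healthy=True):
        self.name = name
        self.healthy = healthy
        self.rows = {}

    def replicate(self, key, value):
        """Apply the write and return True, or return False if unreachable."""
        if not self.healthy:
            return False
        self.rows[key] = value
        return True


def write_with_ack(master_rows, replicas, key, value, required_acks):
    """Write to the master, then report success only if enough slaves acked.

    The write is left in place on the master either way; a real system
    would need a timeout and a fallback policy, which this sketch omits.
    """
    master_rows[key] = value
    acks = sum(1 for replica in replicas if replica.replicate(key, value))
    return acks >= required_acks


master_rows = {}
replicas = [AckingReplica("slave-1"), AckingReplica("slave-2", healthy=False)]
all_synced = write_with_ack(master_rows, replicas, "stu:374532", 721,
                            required_acks=len(replicas))
```

Requiring an acknowledgement from every slave means a single slow or failed slave blocks confirmation, which is why `all_synced` is False here; lowering `required_acks` trades consistency for availability.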

2.4 Introduce Caching

Use Redis or another NoSQL store to cache frequently accessed data: on a write, update both the database and the cache; on a read, return the cached value if present; once the slaves have synced, delete the cache entry.

Note: Frequent cache deletions under high concurrency can be problematic; consider periodic eviction and avoid deleting cache before confirming DB sync.
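A minimal sketch of that flow, using plain dictionaries in place of MySQL and Redis. The class and method names (`CachedStore`, `on_replicas_synced`) are hypothetical, and how the application learns that the slaves have caught up is deliberately left out.

```python
class CachedStore:
    """Dictionary-backed stand-in for a database fronted by a cache."""

    def __init__(self):
        self.db = {}      # stands in for the master database
        self.cache = {}   # stands in for Redis

    def write(self, key, value):
        """Write to the database and the cache together."""
        self.db[key] = value
        self.cache[key] = value

    def read(self, key):
        """Serve from the cache when possible, else fall back to the database."""
        if key in self.cache:
            return self.cache[key]
        value = self.db.get(key)
        if value is not None:
            self.cache[key] = value
        return value

    def on_replicas_synced(self, key):
        """Drop the entry once the slaves are known to have the write."""
        self.cache.pop(key, None)


store = CachedStore()
store.write("stu:374532", 899)
fresh = store.read("stu:374532")       # served from the cache
store.on_replicas_synced("stu:374532")  # safe to read from the database again
```

Because the cache entry is only removed after the sync is confirmed, a read that arrives during the lag window sees the new value from the cache instead of a stale row.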

2.5 Multi‑Threaded Relay Log Replay

MySQL normally replays the relay log with a single thread; parallel replay can reduce lag but requires careful partitioning to avoid inconsistency.

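Writes to the same table must be replayed by the same thread so that their order is preserved; writes to different tables can be replayed in parallel. For example, these three updates to one row must be applied in order, or the final score will be wrong: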

update t_score set score = 721 where stu_code=374532;
update t_score set score = 806 where stu_code=374532;
update t_score set score = 899 where stu_code=374532;

One way to assign work is to hash the database name to a thread number. All writes to one database, and therefore to any one of its tables, then land on the same thread and keep their order.
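The assignment rule can be sketched as follows. The function names and the event format (a `(database, statement)` pair) are assumptions for illustration; `zlib.crc32` is used instead of Python's built-in `hash()` because the latter is randomized per process for strings.

```python
import zlib


def worker_for(db_name, worker_count):
    """Map a database name to a stable worker index."""
    return zlib.crc32(db_name.encode("utf-8")) % worker_count


def dispatch(events, worker_count):
    """Split (db_name, statement) events into one ordered queue per worker."""
    queues = [[] for _ in range(worker_count)]
    for db_name, statement in events:
        queues[worker_for(db_name, worker_count)].append(statement)
    return queues


events = [
    ("school", "update t_score set score = 721 where stu_code=374532"),
    ("shop", "update t_order set status = 2 where id=10"),
    ("school", "update t_score set score = 806 where stu_code=374532"),
    ("school", "update t_score set score = 899 where stu_code=374532"),
]
queues = dispatch(events, worker_count=4)
school_queue = queues[worker_for("school", 4)]
```

The three `t_score` updates end up in `school_queue` in their original order, so the final value is still 899, while statements for other databases may be replayed by other workers at the same time.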

2.6 Direct Reads from Master for Small Workloads

For low-traffic workloads, read directly from the master to avoid replication lag altogether, but only for reads that cannot tolerate stale data; all other reads should still go to the slaves.
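A routing rule of this kind might look like the sketch below. `ReadRouter` and the `must_be_fresh` flag are hypothetical names, node identifiers are plain strings, and the round-robin choice among slaves is an assumption rather than something the approach requires.

```python
import itertools


class ReadRouter:
    """Send consistency-critical reads to the master, the rest to slaves."""

    def __init__(self, master, slaves):
        self.master = master
        self.has_slaves = len(slaves) > 0
        self._slaves = itertools.cycle(slaves)  # simple round-robin

    def node_for_read(self, must_be_fresh=False):
        """Return the node that should serve this read."""
        if must_be_fresh or not self.has_slaves:
            return self.master
        return next(self._slaves)


router = ReadRouter("master", ["slave-1", "slave-2"])
critical = router.node_for_read(must_be_fresh=True)   # "master"
ordinary = [router.node_for_read() for _ in range(3)]
```

Keeping the flag opt-in means the master only absorbs the small share of reads that truly need current data.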

2.7 Rate Limiting and Degradation

When traffic exceeds capacity, apply caching, rate limiting, and graceful degradation to handle load and reduce lag.
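Rate limiting can take many forms; a token bucket is one common choice and is used here purely as an example, since the article does not name a specific algorithm. The class below is a hypothetical sketch that takes the current time as an argument so its behavior is easy to follow.

```python
class TokenBucket:
    """Allow up to `capacity` requests in a burst, refilled at `rate` per second."""

    def __init__(self, rate, capacity, now=0.0):
        self.rate = rate
        self.capacity = capacity
        self.tokens = float(capacity)
        self.updated = now

    def allow(self, now):
        """Return True if a request arriving at time `now` may proceed."""
        elapsed = now - self.updated
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


bucket = TokenBucket(rate=1, capacity=2)
burst = [bucket.allow(now=0.0) for _ in range(3)]   # third request is rejected
later = bucket.allow(now=1.0)                       # one token has refilled
```

Requests that are rejected can be queued, retried later, or answered with a degraded response, which keeps the write rate on the master within what the slaves can replay.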

3 Summary

Each solution has trade-offs; choose based on your scenario. MySQL 5.6+ supports per-database parallel replication, and MySQL 5.7+ adds parallel replication within a single database, based on the master's group commit (slave_parallel_type = LOGICAL_CLOCK).
