How to Fix MySQL Replication Lag: Practical Parameter Tweaks
When MySQL replication falls behind, common culprits include oversized buffer pools, backup‑induced MDL waits, large transactions, slow queries, parallel replication quirks, and network issues, and this guide offers concrete configuration adjustments to restore sync and improve performance.
Introduction
During database operations such as migration or scaling, MySQL replication lag is a frequent and painful problem. The author, a DBA from Sina Weibo, lists typical root causes and then shares practical parameter‑tuning ideas.
Typical Causes of Lag
buffer_poolset too large, causing MySQL to swap.
Backup processes causing SQL_THREAD to wait for MDL locks.
Large transactions.
Slow queries that degrade replica performance.
Parallel replication keeping the replica delay monitor stuck at 1 s.
Network problems.
Each case requires specific analysis, which is omitted here.
Idea 1: Sync‑Related Settings
Increase the frequency of log flushing to improve write‑ahead logging speed. The following settings can be applied temporarily on the replica and reverted after the issue is resolved.
sync_binlog=0
sync_master_info=10000 # default
sync_relay_log=10000 # default
sync_relay_log_info=10000 # defaultIdea 2: Buffer and Concurrency Adjustments
Enlarge the InnoDB buffer pool and adjust change‑buffer settings to speed up SQL_THREAD replay. Example values:
innodb_buffer_pool_size=24G # 24*1024*1024*1024
innodb_change_buffer_max_size=50
innodb_thread_concurrency=0
innodb_adaptive_hash_index=0Risks of increasing innodb_buffer_pool_size:
Excessive memory allocation may trigger swap or OOM; keep the buffer pool below ~70 % of physical RAM.
Changes should be applied incrementally while monitoring system memory.
Idea 3: Replica‑Specific Settings
Enable Parallel Replication
For MySQL 8.0+, consider using writeset parallelism. Adjust the number of workers but do not exceed the number of CPU cores or the innodb_thread_concurrency value.
slave_parallel_type=LOGICAL_CLOCK
slave_parallel_workers=16
slave_preserve_commit_order=OFFDisable slave_preserve_commit_order during lag recovery, then re‑enable after the replica catches up.
Consider Disabling log_slave_updates
This variable requires a restart to take effect; experienced users may modify it via gdb, but it is risky and not recommended for most.
Idea 4: Group Replication (MGR) Adjustments
Temporarily switch MGR to asynchronous mode by disabling slave_preserve_commit_order. Re‑enable once the replica has caught up.
Idea 5: Other Performance Parameters
Follow the provided template for additional settings; they generally pose little risk and should help reduce replication delay, eventually bringing it down to near zero.
Appendix: Parallel Replication Statistics Example
2021-01-10T16:08:39.947611+08:00 85441 [Note] Multi-threaded slave statistics for channel '';
seconds elapsed = 120; events assigned = 4005889; worker queues filled over overrun level = 0;
waited due a Worker queue full = 0; waited due the total size = 0; waited at clock conflicts = 6918018179200;
waited (count) when Workers occupied = 0; waited when Workers occupied = 0Aikesheng Open Source Community
The Aikesheng Open Source Community provides stable, enterprise‑grade MySQL open‑source tools and services, releases a premium open‑source component each year (1024), and continuously operates and maintains them.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
