How to Handle MySQL Replication Lag in Read/Write Splitting Architectures
This article explains why read/write splitting is used, compares direct-client and proxy-based master-slave setups, and presents five practical strategies for mitigating stale reads caused by MySQL replication lag: forcing reads to the master, sleeping before the read, checking the replica for lag, waiting for the master position, and waiting for a GTID.
1. Master‑Slave Architecture
Read/write splitting is introduced to relieve pressure on a single server by routing read requests to replicas and write requests to the primary, improving availability and read performance. A basic MySQL topology with one master, one standby, and three slaves is shown, where the client performs load balancing by selecting the appropriate server.
An alternative diagram adds a proxy layer; the client connects only to the proxy, which decides routing based on request type and context.
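The routing decision described above, whether it lives in the client or in a proxy, can be sketched roughly as follows. This is a minimal illustration, not the logic of any particular proxy: the Router class, the keyword list, the needs_fresh_data flag, and the server names are all hypothetical, and real routers also have to handle cases such as SELECT ... FOR UPDATE and open transactions.

```python
# Statement keywords treated as writes in this sketch (an assumption, not exhaustive).
WRITE_KEYWORDS = {"insert", "update", "delete", "replace",
                  "create", "alter", "drop", "truncate"}


class Router:
    """Toy read/write splitter: writes go to the primary, reads rotate over replicas."""

    def __init__(self, primary, replicas):
        self.primary = primary
        self.replicas = list(replicas)
        self._next = 0  # index of the replica that serves the next read

    def route(self, sql, needs_fresh_data=False):
        words = sql.split()
        first_word = words[0].lower() if words else ""
        # Writes, reads that must see the latest data, and the no-replica case
        # all go to the primary.
        if first_word in WRITE_KEYWORDS or needs_fresh_data or not self.replicas:
            return self.primary
        # Everything else is spread over the replicas in round-robin order.
        target = self.replicas[self._next % len(self.replicas)]
        self._next += 1
        return target


router = Router("primary", ["replica-1", "replica-2"])
print(router.route("UPDATE account SET balance = 90 WHERE id = 1"))
print(router.route("SELECT balance FROM account WHERE id = 1"))
```

The needs_fresh_data flag stands in for the "context" part of the decision: the caller, not the SQL text, knows whether a read must observe a write that was just made.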
Comparison of the two architectures
Direct client connection offers better query performance and simpler troubleshooting, but any change in backend topology (master‑standby switch, migration) requires client reconfiguration.
Proxy‑based architecture hides backend details from the client, simplifying connection management, but demands a highly available proxy and adds complexity to the overall system.
Regardless of the chosen setup, replication lag can cause "stale reads" when a transaction commits on the master and the subsequent read is directed to a replica that has not yet applied the changes.
General strategies to avoid stale reads
Force reads to the master.
Introduce a sleep delay before reading from replicas.
Determine whether the replica is lag‑free.
Wait for the master’s binlog position to be applied on the replica.
Wait for the GTID set to be executed on the replica.
2. Master‑Slave Synchronization
A brief review of MySQL asynchronous replication: the master maintains a binlog, the replica establishes a persistent connection, and two threads on the replica (io_thread and sql_thread) fetch and apply the binlog entries. The relay log stores the received binlog before execution.
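To make the two-thread flow concrete, here is a toy in-memory model. It is only a sketch: ToyPrimary, ToyReplica, and the step methods are invented for illustration and say nothing about MySQL internals beyond the receive-then-apply ordering described above.

```python
class ToyPrimary:
    def __init__(self):
        self.data = {}
        self.binlog = []  # ordered list of (key, value) change events

    def write(self, key, value):
        self.data[key] = value
        self.binlog.append((key, value))


class ToyReplica:
    def __init__(self):
        self.data = {}
        self.relay_log = []  # events received from the primary
        self.applied = 0     # how many relay-log events have been applied

    def io_thread_step(self, primary):
        """Copy binlog events that have not been fetched yet into the relay log."""
        self.relay_log.extend(primary.binlog[len(self.relay_log):])

    def sql_thread_step(self):
        """Apply one pending relay-log event, if there is one."""
        if self.applied < len(self.relay_log):
            key, value = self.relay_log[self.applied]
            self.data[key] = value
            self.applied += 1


primary, replica = ToyPrimary(), ToyReplica()
primary.write("balance", 100)
replica.io_thread_step(primary)       # event is now in the relay log
stale = replica.data.get("balance")   # read before the event is applied
replica.sql_thread_step()             # event is applied
fresh = replica.data.get("balance")
print(stale, fresh)  # prints: None 100
```

The read taken between the two steps is the stale read the article is concerned with: the change exists on the primary and even in the relay log, but the replica's data does not reflect it yet.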
Typical causes of replication lag include high write concurrency on the master, large transactions (e.g., bulk deletes, DDL), and single-threaded apply on the replica in versions before MySQL 5.6, which introduced parallel (per-database) replication.
3. Replication‑Lag Mitigation Solutions
1. Force reads to the master
Classify requests: those requiring the latest data go to the master, while others may use replicas. This eliminates stale reads for the requests that need fresh data, but when most requests fall into that category, nearly all of the load shifts back to the master and the benefit of read/write splitting is lost.
2. Sleep approach
Execute SELECT SLEEP(1) before querying a replica. This hides the lag only when it is shorter than the sleep duration: every read pays the full one-second delay even if the replica is already caught up, and the read still returns stale data whenever the lag exceeds the sleep interval.
3. Check for zero lag
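Run SHOW SLAVE STATUS on the replica and examine Seconds_Behind_Master; if it is zero, proceed with the read. Because that value has only one-second resolution, a finer check is to compare Master_Log_File with Relay_Master_Log_File and Read_Master_Log_Pos with Exec_Master_Log_Pos. The GTID sets (Retrieved_Gtid_Set vs. Executed_Gtid_Set) can also be compared. These checks are more accurate than the sleep method, but they only compare what the replica has received with what it has applied. A transaction that has committed on the master but whose binlog has not yet reached the replica is invisible to them, so a stale read is still possible, especially during high-throughput periods.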
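The position-based check can be sketched as a small helper. It assumes the SHOW SLAVE STATUS row has already been fetched as a dict keyed by column name (for example through a driver's dictionary cursor); the function name replica_caught_up and the sample values are made up for illustration.

```python
def replica_caught_up(status):
    """Return True if the status row suggests the replica has applied everything it received."""
    lag = status.get("Seconds_Behind_Master")
    if lag is None or int(lag) != 0:
        return False  # NULL (replication not running) or a measurable lag

    # Finer check: the applied position must equal the received position.
    same_file = status["Master_Log_File"] == status["Relay_Master_Log_File"]
    same_pos = int(status["Read_Master_Log_Pos"]) == int(status["Exec_Master_Log_Pos"])
    return same_file and same_pos


# Made-up sample row: the file matches but the applied position trails the received one.
sample = {
    "Seconds_Behind_Master": 0,
    "Master_Log_File": "binlog.000007",
    "Relay_Master_Log_File": "binlog.000007",
    "Read_Master_Log_Pos": 4521,
    "Exec_Master_Log_Pos": 4100,
}
print(replica_caught_up(sample))  # prints: False
```

Note that a True result only means "nothing received is still pending"; it cannot see transactions the replica has not received yet.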
4. Wait for master position
Use the function SELECT master_pos_wait(file, pos[, timeout]) on the replica. After a transaction commits, obtain the master's binlog file and position via SHOW MASTER STATUS, then execute master_pos_wait on a chosen replica with a short timeout (e.g., 1 second). If the function returns a non-negative value (the number of events it waited for), the replica has reached that position and the read can proceed there. If it returns -1 (timeout) or NULL (for example, the replica's SQL thread is not running), fall back to the master.
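The sequence above might look like this in application code. This is a sketch under several assumptions: primary_cur and replica_cur are Python DB-API cursors from a MySQL driver that uses %s placeholders and returns rows as tuples, the function name is hypothetical, and error handling is left out. Newer MySQL releases rename these to SOURCE_POS_WAIT() and SHOW BINARY LOG STATUS; the sketch keeps the names used in the text.

```python
def read_after_write_by_position(primary_cur, replica_cur, query, timeout_s=1):
    """Run `query` on the replica if it reaches the primary's binlog position in time,
    otherwise run it on the primary."""
    # 1. Right after the write commits, record where the primary's binlog stands.
    primary_cur.execute("SHOW MASTER STATUS")
    row = primary_cur.fetchone()

    waited = None
    if row is not None:
        binlog_file, binlog_pos = row[0], row[1]  # first two columns: File, Position
        # 2. Ask the replica to wait until it has applied up to that position.
        replica_cur.execute(
            "SELECT MASTER_POS_WAIT(%s, %s, %s)",
            (binlog_file, binlog_pos, timeout_s),
        )
        # >= 0: caught up; -1: timed out; None: error or SQL thread not running
        waited = replica_cur.fetchone()[0]

    # 3. Read from the replica only if it caught up; otherwise fall back to the primary.
    cur = replica_cur if waited is not None and waited >= 0 else primary_cur
    cur.execute(query)
    return cur.fetchall()
```

The cost of this approach is one extra round trip to the primary and one to the replica per read, plus up to timeout_s of waiting before the fallback happens.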
5. Wait for GTID
When GTID mode is enabled, use SELECT wait_for_executed_gtid_set(gtid_set, 1) on the replica. The GTID of the committed transaction is obtained from the client's response packet, which requires session_track_gtids to be set to OWN_GTID so that the server includes it. If the function returns 0, the replica has executed the transaction and the read is safe; if it returns 1 (timeout), query the master.
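A sketch of the GTID variant follows. It assumes the caller has already extracted the transaction's GTID from the server response (how to do that is driver-specific and not shown), that primary_cur and replica_cur are DB-API cursors using %s placeholders and tuple rows, and that the function name and example GTID are hypothetical.

```python
def read_after_write_by_gtid(primary_cur, replica_cur, query, gtid, timeout_s=1):
    """Run `query` on the replica if it has executed `gtid` within the timeout,
    otherwise run it on the primary."""
    # Ask the replica to wait until the given GTID set has been executed.
    replica_cur.execute(
        "SELECT WAIT_FOR_EXECUTED_GTID_SET(%s, %s)",
        (gtid, timeout_s),
    )
    result = replica_cur.fetchone()[0]  # 0: executed; 1: timed out

    # Only a 0 result proves the replica has the transaction; anything else falls back.
    cur = replica_cur if result == 0 else primary_cur
    cur.execute(query)
    return cur.fetchall()


# Example call shape (the GTID value is made up):
# rows = read_after_write_by_gtid(
#     primary_cur, replica_cur,
#     "SELECT balance FROM account WHERE id = 1",
#     "3e11fa47-71ca-11e1-9e33-c80aa9429562:23",
# )
```

Compared with the position-based wait, this saves the extra SHOW MASTER STATUS round trip to the primary, since the GTID arrives with the commit response.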
Both the master‑position and GTID‑waiting methods provide more reliable guarantees than simple lag checks, but they still require a fallback to the master when the replica cannot catch up within the timeout.
4. Summary
The article reviewed read/write splitting architectures and the challenges posed by replication lag. The mitigation techniques discussed range from forcing reads to the master to waiting on a binlog position or GTID, and each trades off complexity, latency, and consistency differently. Selecting an appropriate solution depends on the specific workload and its tolerance for stale reads.
JD Cloud Developers
JD Cloud Developers (Developer of JD Technology) is a JD Technology Group platform offering technical sharing and communication for AI, cloud computing, IoT and related developers. It publishes JD product technical information, industry content, and tech event news. Embrace technology and partner with developers to envision the future.
