Databases 9 min read

Improving MySQL Asynchronous Replication by Aligning Read_Master_Log_Pos with Exec_Master_Log_Pos

This article analyzes a MySQL master‑crash scenario where mismatched Read_Master_Log_Pos and Exec_Master_Log_Pos prevent HA failover, explains binlog event structures, and proposes updating Read_Master_Log_Pos only after a full transaction is received to ensure reliable asynchronous replication.

Tencent Database Technology
Tencent Database Technology
Tencent Database Technology
Improving MySQL Asynchronous Replication by Aligning Read_Master_Log_Pos with Exec_Master_Log_Pos

Recently a failure was observed in an asynchronous master‑slave setup: after the master crashed, the slave remained unavailable because Read_Master_Log_Pos and Exec_Master_Log_Pos diverged, indicating that binlog replay was still in progress and HA was waiting for it to finish.

MySQL binlog overview : Introduced in MySQL 3.23.14, the binary log records all data‑modifying operations. It serves two main purposes: (1) replication, where the master’s binlog is sent to the slave for replay to keep data consistent, and (2) data recovery, which is not the focus of this article.

Replication based on binlog : The master generates binlog events; the slave’s IO thread pulls them into a relay log, and the SQL thread replays them. The smallest unit is an event , while a transaction consists of one or more events. Each event has a header (type, timestamp, server ID, etc.) and data specific to the event type (e.g., XID_EVENT, WRITE_ROWS_EVENT). The first event describes the file format, the last is a log‑rotation event, and the rest are regular operation events.

An example from GTID mode shows the file‑header event, a Previous_gtids event, followed by a GTID event that uniquely identifies each transaction, then events for table creation and data insertion.

Limitations of asynchronous replication : Because the master sends binlog only after a transaction commits, a master crash can leave the slave with an incomplete set of events. HA will not switch traffic to the slave until Read_Master_Log_Pos (the position up to which events have been fetched) matches Exec_Master_Log_Pos (the position up to which events have been executed). If the master fails while a transaction is only partially fetched, the positions diverge and HA waits indefinitely, making failover unreliable.

When to update Read_Master_Log_Pos : The root cause of the mismatch is that Read_Master_Log_Pos is updated per event, while Exec_Master_Log_Pos updates only after a full transaction is replayed. By changing the IO thread to update Read_Master_Log_Pos only after a complete transaction has been read, the two positions stay synchronized regardless of when the master crashes. This adjustment allows HA to safely redirect traffic once the positions are equal.

Implementation considerations : The slave’s IO thread must now parse each event to determine transaction boundaries, handling differences between GTID and non‑GTID modes and between row‑based and statement‑based replication. The necessary changes are limited to the queue_event function, where additional logic can be added to detect transaction‑end events and update Read_Master_Log_Pos accordingly.

asynchronousMySQLbinlogReplicationHAExec_Master_Log_PosRead_Master_Log_Pos
Tencent Database Technology
Written by

Tencent Database Technology

Tencent's Database R&D team supports internal services such as WeChat Pay, WeChat Red Packets, Tencent Advertising, and Tencent Music, and provides external support on Tencent Cloud for TencentDB products like CynosDB, CDB, and TDSQL. This public account aims to promote and share professional database knowledge, growing together with database enthusiasts.

0 followers
Reader feedback

How this landed with the community

login Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.