Understanding MySQL GTID and Position‑Based Replication: Issues and Solutions
This article explains the transition from dual‑primary MySQL to a master‑slave read/write split, outlines the pain points of position‑based replication, and demonstrates how GTID simplifies setup, failover, and monitoring of replication lag while ensuring data consistency.
Hello, I am Wukong.
This article discusses the transition from a dual‑primary MySQL setup to a master‑slave read/write split architecture, the pain points of position‑based replication, and how GTID solves many of these issues, including failover, setup complexity, and data consistency.
1. Background
To ensure high availability, a dual‑primary MySQL cluster was deployed in the test environment, allowing traffic to switch to the other primary when one fails. However, frequent data conflicts led to switching to a master‑slave read/write split mode, with the slave handling I/O‑intensive tasks such as large‑scale statistics.
Dual‑primary mode
The principle diagram of dual‑primary mode is shown below.
Because of frequent primary‑key conflicts, the dual‑primary mode was changed to master‑slave read/write split mode. The master serves read/write, while the slave handles I/O‑intensive tasks.
Position‑based replication
The slave replication uses the position method: specifying the binlog file and position so the slave knows where to start replicating.
Problems encountered:
Problem 1: Primary‑key conflict on slave B due to binlog format mixed , causing replication failure.
Problem 2: Data inconsistency after slave B stops replicating.
Problem 3: Recovery requires fixing failed transactions before restarting replication.
Problem 4: Need to know the exact binlog file and position (position) where replication stopped; even if found, it may be imprecise.
Problem 5: Binlog files older than a few days are purged on the master before the slave can catch up, causing further failures.
Problem 6: Replication lag exists.
This article focuses on problems 4 and 6.
2. Pain points of position‑based replication
2.1 Principle diagram
Diagram of position‑based replication.
Steps:
1. Master generates multiple binlog files.
2. Slave I/O thread requests the specified binlog file and position.
3. Master dump thread provides the requested binlog.
4. Master reads the binlog from the given position and pushes it to the slave.
5. Slave writes the received binlog to its relay log.
6. Slave SQL thread reads and parses the relay log.
7. Slave SQL thread replays the commands.
Pain point 1: Complex steps to start replication
Ensure master and slave are consistent.
Find the master’s binlog position.
Set the slave’s binlog position.
Start the slave replication threads.
Pain point 2: Complex steps to recover replication
Find the position where the slave stopped.
Resolve the failing transaction or skip errors (e.g., slave_skip_errors=1032,1062 ).
Both the initial setup and recovery steps are cumbersome and error‑prone, which GTID introduced in MySQL 5.6 aims to eliminate.
3. GTID solution
3.1 What is GTID?
GTID (Global Transaction Identifier) uniquely identifies each transaction. Example:
c5d74746-d7ec-11ec-bf8f-0242ac110002:1Structure:
GTID=server_uuid:gnoserver_uuid is generated on first startup; gno is an incrementing integer.
3.2 Advantages of GTID
Simpler failover without manual position lookup.
Easier master‑slave setup.
More secure replication.
Continuous, gap‑free GTID ensures data consistency and zero loss.
3.3 Enabling GTID
Modify the configuration files:
#GTID:
gtid_mode=on
enforce_gtid_consistency=onSlave replication parameters:
CHANGE MASTER TO
MASTER_HOST=$host_name
MASTER_PORT=$port
MASTER_USER=$user_name
MASTER_PASSWORD=$password
master_auto_position=1With GTID, MASTER_LOG_FILE and MASTER_LOG_POS are no longer needed.
3.4 GTID synchronization scheme
GTID synchronization works as follows:
GTID scheme: the master calculates the difference between its GTID set and the slave’s GTID set, then pushes the missing binlog entries to the slave.
Steps:
Slave B connects to master A.
Slave B sends its GTID set (y) to master A.
Master A computes the difference (x − y) and identifies missing GTIDs.
Master A reads binlog from the first missing GTID and streams it to the slave.
Slave B writes the binlog to its relay log, and its SQL thread replays it.
Compared with position‑based replication, GTID eliminates manual position setting and is friendlier to operators.
4. How to detect replication lag
Problem 6 concerns replication lag after read/write split. Three methods are presented:
Method 1: Check seconds_behind_master
Run SHOW SLAVE STATUS \G on the slave and examine seconds_behind_master . This metric is in seconds and not very precise.
Method 2: Compare positions
Inspect Master_Log_File and Read_Master_Log_Pos (master’s latest position) versus Relay_Master_Log_File and Exec_Master_Log_Pos (slave’s executed position). If both pairs match, there is no lag.
Method 3: Compare GTID sets
Run SHOW SLAVE STATUS \G and compare Retrieved_Gtid_Set with Executed_Gtid_Set . If the executed set contains the retrieved set, replication is complete.
c5d74746-d7ec-11ec-bf8f-0242ac110002:1-87323GTID comparison is more accurate than seconds_behind_master , though achieving perfect precision may still require semi‑sync replication.
Conclusion: Using GTID simplifies master‑slave synchronization and provides better ways to monitor replication delay.
References: www.passjava.cn, https://time.geekbang.org/column/article/77636, High Performance MySQL 4th Edition, MySQL Performance Optimization Pyramid.
Wukong Talks Architecture
Explaining distributed systems and architecture through stories. Author of the "JVM Performance Tuning in Practice" column, open-source author of "Spring Cloud in Practice PassJava", and independently developed a PMP practice quiz mini-program.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.