Databases 9 min read

MySQL Replication Failure Due to GTID Set Loss and Binlog Persistence Issues and Its Resolution

This article analyzes a MySQL replication failure caused by stale GTID execution data and unsynchronized binlog persistence, explains the underlying mechanisms, demonstrates reproducible testing, and provides two practical remediation methods to skip duplicate-key errors and restore replication.

Aikesheng Open Source Community
Aikesheng Open Source Community
Aikesheng Open Source Community
MySQL Replication Failure Due to GTID Set Loss and Binlog Persistence Issues and Its Resolution

1. Environment Description

The MySQL instance runs version 5.7.34‑log. Important replication‑related parameters are shown below:

root@3306 (none)> SELECT @@VERSION;
+------------+
| @@VERSION |
+------------+
| 5.7.34-log |
+------------+
1 row in set (0.01 sec)

Binlog settings:

root@3306 (none)> select @@log_bin,@@log_slave_updates;
+-----------+---------------------+
| @@log_bin | @@log_slave_updates |
+-----------+---------------------+
| 1 | 1 |
+-----------+---------------------+
1 row in set (0.00 sec)

GTID settings:

root@3306 (none)> select @@binlog_gtid_simple_recovery,@@enforce_gtid_consistency,@@gtid_mode;
+-------------------------------+----------------------------+-------------+
| @@binlog_gtid_simple_recovery | @@enforce_gtid_consistency | @@gtid_mode |
+-------------------------------+----------------------------+-------------+
| 1 | ON | ON |
+-------------------------------+----------------------------+-------------+
1 row in set (0.01 sec)

Semi‑synchronous and multi‑thread replication parameters are also displayed in the original article.

2. Fault Phenomenon

After the slave host rebooted, the sql_thread reported errors indicating a duplicate‑key conflict:

root@3306 (none)> show slave status\G
-- ...
Slave_IO_Running: Yes
Slave_SQL_Running: No
Last_Errno: 1062
Last_Error: Coordinator stopped because there were error(s) in the worker(s). The most recent failure being: Worker 1 failed executing transaction '471c2974-f9bb-11eb-afb1-52540010fb89:88313207' ...
Duplicate entry '14765130' for key 'PRIMARY', Error_code: 1062

The worker thread attempted to replay transaction 471c2974-f9bb-11eb-afb1-52540010fb89:88313207, but the primary key already existed, causing the failure.

3. Fault Analysis

In GTID mode with master_auto_position=1, the slave sends the union of Retrieved_Gtid_Set and Executed_Gtid_Set to the master. After the reboot, Retrieved_Gtid_Set is re‑initialized to empty, while Executed_Gtid_Set retains a stale value, indicating that some GTID transactions were lost from memory.

The gtid_executed variable is persisted either in the mysql.gtid_executed table or in binlog files. When log_bin=ON and log_slave_updates=OFF, the table is updated in real time; when log_slave_updates=ON, updates occur only on binlog rotation, and after a restart the value is reconstructed from the last binlog, which may be outdated.

Checking sync_binlog shows it is set to 600, meaning binlog data is flushed to disk only every 600 transactions, confirming that the binlog was not persisted in real time.

root@3306 (none)> show variables like 'sync_binlog';
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| sync_binlog   | 600   |
+---------------+-------+
1 row in set (0.00 sec)

Thus the root cause is: stale gtid_executed → binlog not flushed → duplicate‑key error on replay.

4. Test Verification

A master‑slave environment was built, and sysbench was used to generate concurrent inserts on the master. After forcibly powering off the slave, the same duplicate‑key error reappeared, confirming the analysis.

[email protected]][(none)]> select * from performance_schema.replication_applier_status_by_worker\G
*************************** 1. row ***************************
CHANNEL_NAME:
WORKER_ID: 1
SERVICE_STATE: OFF
LAST_SEEN_TRANSACTION: 4a0ad3da-b89e-11eb-9d0b-000c299b4d6c:452362
LAST_ERROR_NUMBER: 1062
LAST_ERROR_MESSAGE: Worker 1 failed executing transaction '4a0ad3da-b89e-11eb-9d0b-000c299b4d6c:452362' ...

5. Fault Handling

Two remediation approaches are suggested:

5.1 – Execute an empty transaction with the offending GTID to skip the error, then restart the SQL thread:

set gtid_next='xxxxxx';
begin;
commit;
set gtid_next=AUTOMATIC;
start slave sql_thread;

For many missing GTIDs, a stored procedure can automate the process (code omitted for brevity).

5.2 – Configure the slave to skip error 1062 permanently and restart MySQL:

SET GLOBAL slave_skip_errors=1062;
STOP SLAVE;
START SLAVE;

After replication stabilizes, remove the slave_skip_errors setting and restart the server.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Sign in to view source
Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactadmin@besthub.devand we will review it promptly.

databasemysqlReplicationtroubleshootingGTID
Aikesheng Open Source Community
Written by

Aikesheng Open Source Community

The Aikesheng Open Source Community provides stable, enterprise‑grade MySQL open‑source tools and services, releases a premium open‑source component each year (1024), and continuously operates and maintains them.

0 followers
Reader feedback

How this landed with the community

Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.