
Troubleshooting MySQL Master‑Slave Replication: Relay Log Corruption and GTID Skipping

This article explains a MySQL master‑slave synchronization failure caused by a corrupted relay log, walks through the replication principles, analyzes slave status and GTID differences, and provides step‑by‑step commands to skip the problematic GTID and restore replication.

Wukong Talks Architecture

The author encountered a MySQL master‑slave synchronization issue where the slave's SQL thread stopped because the relay log could not be parsed.

The error message reported a relay log read failure, suggesting possible binlog or relay‑log corruption, network problems, or MySQL bugs.

To understand the problem, the article reviews the replication architecture: the slave runs an I/O thread (Slave_IO_Running) that fetches the master’s binlog and writes it to a local relay log, and an SQL thread (Slave_SQL_Running) that reads the relay log and executes the statements.
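The division of work between the two threads can be sketched as a toy Python model. It assumes an in-process queue stands in for the relay log and is not how MySQL implements replication. `replicate` and `CORRUPT` are hypothetical names. The model shows why an unparseable relay-log event stops Slave_SQL_Running while the I/O side is unaffected.

```python
import queue
import threading

CORRUPT = object()  # hypothetical marker for a relay-log event that cannot be parsed
_END = object()     # marks the end of the toy master binlog


def replicate(master_binlog):
    """Toy model of the slave's I/O and SQL threads (not MySQL's real code)."""
    relay_log = queue.Queue()  # stands in for the slave's local relay log files
    applied = []               # events the SQL thread has executed
    status = {"Slave_IO_Running": "Yes", "Slave_SQL_Running": "Yes"}

    def io_thread():
        # I/O thread: copy every event from the master's binlog into the relay log.
        for event in master_binlog:
            relay_log.put(event)
        relay_log.put(_END)

    def sql_thread():
        # SQL thread: read the relay log in order and execute each event.
        while True:
            event = relay_log.get()
            if event is _END:
                return
            if event is CORRUPT:
                # An unparseable event stops only the SQL thread; the I/O thread
                # is unaffected and keeps writing to the relay log.
                status["Slave_SQL_Running"] = "No"
                return
            applied.append(event)

    threads = [threading.Thread(target=io_thread), threading.Thread(target=sql_thread)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return applied, status


applied, status = replicate(["txn1", CORRUPT, "txn3"])
print(applied, status)
```

Because the two threads only share the relay log, a fetch problem and an apply problem show up in different status fields, which is why the article checks them separately.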

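By examining the SHOW SLAVE STATUS\G output, the author compares key fields: Master_Log_File and Read_Master_Log_Pos (how far the I/O thread has read the master's binlog), Relay_Log_File and Relay_Log_Pos (where the SQL thread is in the local relay log), Relay_Master_Log_File and Exec_Master_Log_Pos (the master binlog position the SQL thread has executed up to), and Slave_IO_Running and Slave_SQL_Running. When the executed position lags the read position, the SQL thread has fallen behind the I/O thread and the slave is out of sync.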
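A minimal Python sketch of that comparison, assuming the `\G` output has been captured as text. `parse_slave_status` and `diagnose` are hypothetical helper names, and the sample values are made up for illustration, not taken from the incident.

```python
def parse_slave_status(text):
    r"""Parse `SHOW SLAVE STATUS\G` output into {field: value}.

    Assumes one field per line; multi-line values (such as a GTID set that
    lists several server UUIDs) are not handled by this sketch.
    """
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.strip().partition(":")
        if sep and key:  # the "*** 1. row ***" banner has no colon and is skipped
            fields[key] = value.strip()
    return fields


def diagnose(fields):
    """Report stopped threads and a gap between read and executed positions."""
    findings = []
    for thread in ("Slave_IO_Running", "Slave_SQL_Running"):
        if fields.get(thread) != "Yes":
            findings.append(f"{thread} = {fields.get(thread)}")
    # How far the I/O thread has read the master's binlog ...
    read = (fields["Master_Log_File"], int(fields["Read_Master_Log_Pos"]))
    # ... and how far the SQL thread has executed, in master binlog coordinates.
    executed = (fields["Relay_Master_Log_File"], int(fields["Exec_Master_Log_Pos"]))
    if executed != read:
        findings.append(
            f"SQL thread at {executed[0]}:{executed[1]}, I/O thread at {read[0]}:{read[1]}"
        )
    return findings


# Hypothetical output; values are illustrative, not from the incident.
sample = """*************************** 1. row ***************************
             Slave_IO_Running: Yes
            Slave_SQL_Running: No
              Master_Log_File: mysql-bin.000010
          Read_Master_Log_Pos: 5000
        Relay_Master_Log_File: mysql-bin.000009
          Exec_Master_Log_Pos: 1200
"""
for finding in diagnose(parse_slave_status(sample)):
    print(finding)
```

Comparing positions in master binlog coordinates on both sides avoids mixing relay-log offsets with binlog offsets.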

The investigation considers possible causes: network issues, MySQL bugs, or a very large transaction (a manual backup of a 250,000-row table) that produced an oversized binlog and relay log.

GTID analysis shows that the master's GTID set reaches ...:8634832 while the slave has executed only up to ...:8634831, so transaction 8634832 is the one the SQL thread failed to apply. Skipping that GTID is identified as the recovery path.
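The GTID comparison amounts to interval subtraction. On a live server, MySQL's built-in `GTID_SUBTRACT(set1, set2)` function does this directly. The Python below is a hypothetical offline equivalent (`parse_gtid_set`, `subtract_gtid_sets` and `format_gtid_set` are illustrative names). The sample sets reuse the server UUID from the recovery commands and assume both ranges start at 1, since the article elides the full sets.

```python
def parse_gtid_set(text):
    """Parse 'uuid:1-5:7,uuid2:1-3' into {uuid: [(start, end), ...]}.

    Assumes plain uuid:interval[:interval...] syntax (no tagged GTIDs).
    """
    result = {}
    for part in text.replace("\n", "").split(","):
        part = part.strip()
        if not part:
            continue
        source_uuid, *intervals = part.split(":")
        ranges = result.setdefault(source_uuid.lower(), [])
        for interval in intervals:
            start, _, end = interval.partition("-")
            ranges.append((int(start), int(end or start)))
    return result


def subtract_gtid_sets(a, b):
    """Return the GTIDs present in a but missing from b, as intervals."""
    missing = {}
    for source_uuid, a_ranges in a.items():
        b_ranges = sorted(b.get(source_uuid, []))
        for start, end in sorted(a_ranges):
            cur = start  # first GTID in [start, end] not yet accounted for
            for b_start, b_end in b_ranges:
                if b_end < cur or b_start > end:
                    continue  # this interval of b does not overlap what is left
                if b_start > cur:
                    missing.setdefault(source_uuid, []).append((cur, b_start - 1))
                cur = max(cur, b_end + 1)
                if cur > end:
                    break
            if cur <= end:
                missing.setdefault(source_uuid, []).append((cur, end))
    return missing


def format_gtid_set(gtids):
    """Render {uuid: [(start, end), ...]} back into MySQL's text form."""
    return ",".join(
        source_uuid + ":" + ":".join(str(s) if s == e else f"{s}-{e}" for s, e in ranges)
        for source_uuid, ranges in gtids.items()
    )


master_set = "c5d74746-d7ec-11ec-bf8f-0242ac110002:1-8634832"  # assumed to start at 1
slave_set = "c5d74746-d7ec-11ec-bf8f-0242ac110002:1-8634831"
print(format_gtid_set(subtract_gtid_sets(parse_gtid_set(master_set), parse_gtid_set(slave_set))))
# c5d74746-d7ec-11ec-bf8f-0242ac110002:8634832
```

Working on intervals rather than expanding each range into individual numbers keeps memory flat even for sets with millions of transactions.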

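Recovery steps are provided. First, stop slave; and reset slave; stop the replication threads and clear the existing (corrupted) relay logs. Next, set gtid_next='c5d74746-d7ec-11ec-bf8f-0242ac110002:8634832'; begin; commit; commits an empty transaction under the failing GTID, so the slave records that GTID as executed. Finally, set gtid_next=automatic; restores automatic GTID assignment and start slave; restarts replication. With GTID auto-positioning, the I/O thread then fetches only the transactions the slave has not yet executed. After the restart, the I/O and SQL threads both run, the slave catches up, and Seconds_Behind_Master drops to zero.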
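The same statement sequence can be generated for any list of failing transaction numbers. This is a hypothetical helper (`skip_gtid_statements` is not part of any MySQL tool) that only builds the statements shown in the article, assuming an operator reviews and runs them in a MySQL client.

```python
import uuid


def skip_gtid_statements(source_uuid, transaction_ids):
    """Return the article's statement sequence for skipping the given GTIDs.

    Each GTID is marked as executed by committing an empty transaction under it.
    The statements are only built here, not executed.
    """
    canonical = str(uuid.UUID(source_uuid))  # ValueError on malformed input
    if canonical != source_uuid.lower():
        raise ValueError(f"expected a hyphenated UUID, got {source_uuid!r}")
    statements = ["STOP SLAVE;", "RESET SLAVE;"]
    for txn in transaction_ids:
        txn = int(txn)
        if txn < 1:
            raise ValueError(f"invalid transaction number: {txn}")
        statements += [f"SET gtid_next='{canonical}:{txn}';", "BEGIN;", "COMMIT;"]
    statements += ["SET gtid_next='AUTOMATIC';", "START SLAVE;"]
    return statements


for stmt in skip_gtid_statements("c5d74746-d7ec-11ec-bf8f-0242ac110002", [8634832]):
    print(stmt)
```

Building the statements rather than executing them keeps the sketch free of any MySQL driver. Each skipped GTID means that transaction's changes never reach the slave, so the list is worth checking against the GTID difference before it is run.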

The article concludes that the relay log was indeed corrupted. By manually advancing the GTID and restarting replication, the slave resumes normal operation without executing the problematic transaction, which also means the changes that transaction carried are not present on the slave.

Written by

Wukong Talks Architecture

Explaining distributed systems and architecture through stories. Author of the "JVM Performance Tuning in Practice" column, open-source author of "Spring Cloud in Practice PassJava", and independently developed a PMP practice quiz mini-program.
