Databases 6 min read

Handling Replication Anomalies in MySQL Slave IO Thread

This article analyzes MySQL replication anomalies caused by master failures or network interruptions that lead to incomplete transaction replay on slaves, demonstrates a reproducible experiment using network delay and iptables, and provides practical guidance for both recovering and permanently handling stalled slave IO threads.

Aikesheng Open Source Community
Aikesheng Open Source Community
Aikesheng Open Source Community
Handling Replication Anomalies in MySQL Slave IO Thread

Replication Anomaly

During replication, if the master crashes or the network is interrupted, the slave IO thread may stop, causing the slave to receive only a part of a transaction. For example, the master executes:

begin;
insert 1;
insert 2;
commit;

The slave might receive only the BEGIN statement, or BEGIN; INSERT 1; , etc., and the slave SQL thread will replay this half‑transaction without committing or rolling back.

When the slave IO thread recovers, the SQL thread waits for the remaining binlog; until then the half‑transaction behaves like a manually executed incomplete transaction.

Experiment Process

Test steps:

##1. On the slave, use tc to simulate network latency, slowing binlog reading
tc qdisc add dev eth0 root netem delay 3000ms 3000ms 100%
##2. On the master, execute a multi‑statement transaction
begin;
update t2 set pad='4' where id < 40;
update t2 set pad='5' where id < 50;
update t2 set pad='6' where id < 60;
commit;
##3. Immediately after commit, cut the network between master and slave
iptables -A OUTPUT -d 172.16.21.4 -j DROP
iptables -A INPUT -s 172.16.21.4 -j DROP

Observations on the slave include a worker thread showing Waiting for an event from Coordinator while simultaneously executing UPDATE t2 SET pad='5' WHERE id < 50 , indicating a contradiction. Additionally, SHOW SLAVE STATUS reports Retrieved_Gtid_Set equal to Executed_Gtid_Set , yet the worker thread is still replaying SQL, revealing another inconsistency.

When Slave IO Thread Cannot Recover

If the slave IO thread remains down for a long time, the SQL thread cannot obtain the remaining binlog, so the half‑transaction stays open and holds its locks, potentially blocking business queries. In a master‑failover scenario, the promoted master may inherit these locks.

The recommended action is to run STOP SLAVE , which forces the SQL thread to roll back the transaction and release locks. Note that STOP SLAVE will wait up to 60 seconds for the IO thread to fetch the remaining binlog before terminating the SQL thread.

When Slave IO Thread Recovers

After the IO thread resumes, if GTID‑based replication is used, the slave will start fetching the full binlog from the current GTID, roll back the partially applied transaction, and then replay the complete set of events.

DatabaseMySQLReplicationTroubleshootingGTIDslave_io_thread
Aikesheng Open Source Community
Written by

Aikesheng Open Source Community

The Aikesheng Open Source Community provides stable, enterprise‑grade MySQL open‑source tools and services, releases a premium open‑source component each year (1024), and continuously operates and maintains them.

0 followers
Reader feedback

How this landed with the community

login Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.