
Analysis of MySQL Bug #89370: Semi‑Synchronous Replication Stall

This article examines MySQL bug #89370, detailing how configuring semi‑synchronous replication with multiple slaves can cause a slave to stop receiving data for minutes despite normal replication status, and explains the underlying mechanisms, reproduction steps, and root cause of the issue.

Aikesheng Open Source Community

This article is part of the "Illustrated MySQL" series produced by the Aikesheng R&D team, which aims to provide high‑quality technical analysis of MySQL internals.

The focus is MySQL bug #89370. Its main symptom is that, when semi‑synchronous replication is configured with multiple slaves, one of the slaves may stop receiving data for several minutes while every replication status indicator still appears normal.

Reproduction steps

1. Configure a master with two slaves in semi‑synchronous mode, set rpl_semi_sync_master_wait_for_slave_count=2 on the master, and keep a steady write load running.

2. Verify that the master's rpl_semi_sync_master_status is ON, which confirms that semi‑sync has not degraded to asynchronous replication.

3. Change the master setting to rpl_semi_sync_master_wait_for_slave_count=1.

4. Restart replication on one slave with stop slave; start slave.

After step 4, the restarted slave may not receive any binlog events for several minutes, even though the replication status on both master and slave shows as OK.
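The reproduction steps above can be written down as the SQL a tester would issue. The sketch below only assembles the statements as strings and prints them. It does not connect to any server, and the helper name `reproduction_statements` is hypothetical. The variable and status names are the ones quoted in the steps.

```python
def reproduction_statements():
    """Return (target, SQL) pairs for the four reproduction steps.

    'master' and 'slave' only say where a tester would run each statement;
    this sketch does not talk to a MySQL server.
    """
    return [
        # Step 1: require ACKs from both slaves (run while a write load is active).
        ("master", "SET GLOBAL rpl_semi_sync_master_wait_for_slave_count = 2"),
        # Step 2: confirm that semi-sync has not degraded to async.
        ("master", "SHOW GLOBAL STATUS LIKE 'Rpl_semi_sync_master_status'"),
        # Step 3: lower the required ACK count to one.
        ("master", "SET GLOBAL rpl_semi_sync_master_wait_for_slave_count = 1"),
        # Step 4: restart replication on one of the slaves.
        ("slave", "STOP SLAVE"),
        ("slave", "START SLAVE"),
    ]


for target, sql in reproduction_statements():
    print(f"{target}> {sql};")
```

While the stall lasts, the status check from step 2 would still be expected to report ON on the master, which is what makes the problem hard to notice.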

Diagram explanations

Figure 1: Overview of semi‑synchronous replication flow

The diagram shows the role of the semi‑sync plugin during the binlog group commit process, which consists of three phases:

Flush phase (label 2.1 in Figure 1): transactions are written to the binlog buffer, and the plugin registers each transaction and records the updated binlog file position.

Sync phase (label 4 in Figure 1): the binlog is flushed to disk with an fsync operation.

Commit phase (label 5 in Figure 1): the master waits for the semi‑sync plugin to confirm that at least one slave has received the transaction before committing.
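The ordering of the three phases can be sketched as a toy model. This is not MySQL's implementation: the function `group_commit`, the list standing in for the durable binlog, and the `wait_for_ack` callback are all assumptions made for illustration. The point is only that the position recorded during the flush phase is the one the commit phase later waits on.

```python
def group_commit(durable_log, group, wait_for_ack):
    """Run one toy group commit and return the phases in the order they ran.

    durable_log  -- list standing in for the binlog file on disk
    group        -- transactions committed together in this group
    wait_for_ack -- hypothetical callback: True if a slave acknowledged
                    the given binlog position
    """
    phases = []

    # Flush phase: write the group's events to the binlog buffer and record
    # the position that the semi-sync plugin must later see acknowledged.
    buffer = list(group)
    wait_position = len(durable_log) + len(buffer)
    phases.append(("flush", wait_position))

    # Sync phase: make the buffered events durable (stands in for fsync).
    durable_log.extend(buffer)
    phases.append(("sync", len(durable_log)))

    # Commit phase: commit only once a slave has acknowledged the position.
    if wait_for_ack(wait_position):
        phases.append(("commit", wait_position))
    else:
        phases.append(("not acknowledged", wait_position))
    return phases


log = []
print(group_commit(log, ["trx1", "trx2"], wait_for_ack=lambda pos: True))
# [('flush', 2), ('sync', 2), ('commit', 2)]
```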

Semi‑synchronous plugin behavior

After the binlog position is updated (label 3 in Figure 1), the plugin's replication thread reads the binlog and sends events to the slave.

The plugin returns a transaction‑completion acknowledgment to the master (label 5.1 in Figure 1), allowing the master to decide whether to commit.

MySQL introduced the semi‑sync plugin in version 5.5 to avoid data loss when the master crashes. With semi‑sync enabled, a transaction is acknowledged to the client only after at least one slave confirms receipt.

Historically, MySQL versions before 5.6 suffered from bug #13669: the group commit mechanism was ineffective because binlog group commit had not yet been fully implemented. MySQL 5.6 introduced true binlog group commit, which improved performance in high‑concurrency scenarios.

Figure 2: ACK handling in MySQL 5.6

In MySQL 5.6, the master’s replication thread both sends transactions and receives ACKs from slaves; it cannot send the next transaction until the previous ACK is received, creating a performance bottleneck.

Figure 3: ACK handling in MySQL 5.7

MySQL 5.7 separates ACK processing into a dedicated thread within the semi‑sync plugin, allowing transaction sending and ACK reception to run in parallel, dramatically improving semi‑sync performance.
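The difference between the two designs can be illustrated with a back‑of‑the‑envelope timing model. It is a simplification under stated assumptions, not a benchmark: every event takes the same time to send, every ACK arrives a fixed delay after its event was sent, and ACK processing itself is free. The function names and the numbers are hypothetical.

```python
def serial_send_time(n, send, ack_delay):
    """5.6-style: one thread sends an event, then blocks until its ACK arrives."""
    return n * (send + ack_delay)


def pipelined_send_time(n, send, ack_delay):
    """5.7-style: the sender never waits; a separate thread collects the ACKs."""
    # The last ACK arrives ack_delay after the last event has been sent.
    return n * send + ack_delay


n, send, ack_delay = 1000, 1, 4  # arbitrary time units
print(serial_send_time(n, send, ack_delay))     # 5000
print(pipelined_send_time(n, send, ack_delay))  # 1004
```

In this model the serial design pays the ACK delay once per event, while the pipelined design pays it only once at the end.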

Figure 4: Root cause of the observed defect

The defect arises when the ACK receive thread (label 1 in Figure 4) competes with the replication thread (label 2 in Figure 4) for a mutex. On the master, the ACK receive thread listens for ACKs in an endless while loop and keeps the lock occupied almost continuously, so a newly started replication thread cannot acquire it. As a result, the slave's slave_io_thread stalls and no data is replicated.
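The starvation described above can be illustrated with a small deterministic model. It is a toy under stated assumptions, not the plugin's code: time is counted in abstract ticks, the ACK receiver repeats a fixed hold/free cycle, and the new replication thread only tries the lock when it happens to be scheduled. The function name and all parameters are hypothetical.

```python
def first_acquire_tick(hold_ticks, gap_ticks, attempt_every, max_ticks):
    """Return the first tick at which the new thread gets the lock, or None.

    The ACK receiver repeats a cycle: it holds the lock for hold_ticks,
    then leaves it free for gap_ticks. The new replication thread only
    tries the lock every attempt_every ticks (whenever it is scheduled).
    """
    cycle = hold_ticks + gap_ticks
    for tick in range(0, max_ticks, attempt_every):
        if tick % cycle >= hold_ticks:  # the lock is free at this tick
            return tick
    return None  # never acquired within max_ticks


# Loop that releases the lock and immediately takes it again: no free gap.
print(first_acquire_tick(hold_ticks=10, gap_ticks=0, attempt_every=7, max_ticks=10_000))
# None

# Loop that leaves the lock free for part of every cycle.
print(first_acquire_tick(hold_ticks=10, gap_ticks=10, attempt_every=7, max_ticks=10_000))
# 14
```

With no free gap, the waiting thread never gets in, however long it keeps trying. A stall that lasts minutes rather than forever, as in the article, would correspond to a gap that is very small but not zero.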

Further reading

MySQL Bug #89370

MySQL Bug #13669

Analysis of MySQL semi‑sync behavior
