
Investigating Data Loss with gh-ost in MySQL AFTER_SYNC Semi‑Sync Replication and Applying a Fix

This article documents a reproducible test showing that gh-ost can lose rows in a MySQL 5.7 setup that uses AFTER_SYNC semi-synchronous replication, explains the underlying cause, and presents a source-code modification that prevents the loss.

Aikesheng Open Source Community

Background – A recent post claimed that using gh-ost for online DDL in MySQL AFTER_SYNC mode may cause data loss. The author reproduced the issue by configuring a MySQL 5.7 primary‑secondary setup with semi‑sync replication and a 60‑second artificial delay in gh‑ost.

Environment Preparation

Clone the latest gh‑ost source (v1.1.2) with git clone https://github.com/github/gh-ost.git and build it using the provided build.sh script.

Deploy a MySQL 5.7 master‑slave cluster (1 master, 1 slave) and enable AFTER_SYNC semi‑sync replication.

Configure the master’s rpl_semi_sync_master_timeout to a value larger than the artificial delay (e.g., 120 000 ms).

Validation Steps

Insert a 60‑second sleep at the start of addDMLEventsListener in ./gh-ost-master/go/logic/migrator.go.

Set the master's semi-sync timeout to 120 s (rpl_semi_sync_master_timeout = 120000), twice the injected delay.

Create a test table t and insert a row (id=1).

Run gh‑ost to execute ALTER TABLE t ENGINE=InnoDB;.

Stop the slave's IO thread (STOP SLAVE IO_THREAD) so that the master receives no ACK.

Insert a second row (id=2) on the master while gh-ost is still inside the 60-second sleep.

The DDL completes after about 120 seconds, but the newly inserted row (id=2) is missing from the migrated table, confirming data loss.

Principle Analysis

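The loss occurs because gh-ost reads the table's primary-key range before the transaction that inserted id=2 becomes visible. In AFTER_SYNC mode the master writes the transaction to the binary log and then waits for an ACK from the slave before committing it in the storage engine. Until the ACK arrives or the timeout expires, the row is invisible to other sessions, so gh-ost computes a range that ends at id=1 and never copies the new key value.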
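The timing can be illustrated with a toy model that involves no MySQL at all. This is a sketch under stated assumptions: all type and function names are hypothetical, and it assumes that binlog events read before the DML listener is registered are not replayed to it, which is why the injected sleep matters in addition to the range read.

```go
package main

import (
	"fmt"
	"sort"
)

// master is a toy model of the original table on the master.
type master struct {
	visible map[int]bool // rows committed in the storage engine
	waiting []int        // rows already in the binlog, still waiting for the ACK
	binlog  []int        // ids of inserted rows, in binlog order
}

// insert models an INSERT under AFTER_SYNC: the event reaches the binlog
// first, and the row stays invisible until the ACK or the timeout.
func (m *master) insert(id int) {
	m.binlog = append(m.binlog, id)
	m.waiting = append(m.waiting, id)
}

// release models the ACK arriving or the semi-sync timeout expiring.
func (m *master) release() {
	for _, id := range m.waiting {
		m.visible[id] = true
	}
	m.waiting = nil
}

// maxVisible is the upper bound a plain (non-locking) range query can see.
func (m *master) maxVisible() int {
	highest := 0
	for id := range m.visible {
		if id > highest {
			highest = id
		}
	}
	return highest
}

// simulate returns the ids that end up in the original table but not in
// the ghost table.
func simulate(listenerReadyBeforeInsert bool) []int {
	m := &master{visible: map[int]bool{1: true}}
	ghost := map[int]bool{}
	listening := listenerReadyBeforeInsert
	pos := 0
	// stream reads unread binlog events; without a listener they are dropped.
	stream := func() {
		for ; pos < len(m.binlog); pos++ {
			if listening {
				ghost[m.binlog[pos]] = true
			}
		}
	}

	m.insert(2)      // id=2 is in the binlog but not yet visible
	stream()         // the streamer moves past the event
	listening = true // the delayed listener is registered only now

	upper := m.maxVisible() // the range read sees only id=1
	for id := range m.visible {
		if id <= upper {
			ghost[id] = true // row copy within the computed range
		}
	}

	m.release() // timeout: id=2 becomes visible in the original table
	stream()    // nothing new is left in the binlog

	var lost []int
	for id := range m.visible {
		if !ghost[id] {
			lost = append(lost, id)
		}
	}
	sort.Ints(lost)
	return lost
}

func main() {
	fmt.Println("listener delayed:", simulate(false))
	fmt.Println("listener in time:", simulate(true))
}
```

In this model the row is lost only when both paths miss it: the range read ends below id=2, and the binlog event has already been passed over by the time the listener exists.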

Fix Implementation

A pull request adds a shared read lock and a retry mechanism when gh‑ost fetches the range. The changes are made in ./gh-ost-master/go/sql/builder.go and ./gh-ost-master/go/logic/migrator.go. After recompiling and re‑running the test with the same configuration, the second row persists, proving the fix works.

Precautions

Adjust rpl_semi_sync_master_timeout only on the master.

Set rpl_semi_sync_master_wait_no_slave=ON to ensure the master truly waits for an ACK.

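When multiple slaves exist, rpl_semi_sync_master_wait_for_slave_count determines how many slave ACKs the master must receive before it commits, so take it into account when reasoning about ACK behavior.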
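These precautions can be checked before running the test. The sketch below is only an illustration: the semiSyncSettings type and the reproductionProblems function are hypothetical, and the values would have to be read from the master by hand (for example with SHOW GLOBAL VARIABLES LIKE 'rpl_semi_sync%').

```go
package main

import "fmt"

// semiSyncSettings (hypothetical) mirrors the master variables named above.
type semiSyncSettings struct {
	WaitPoint   string // rpl_semi_sync_master_wait_point
	TimeoutMs   int    // rpl_semi_sync_master_timeout, in milliseconds
	WaitNoSlave bool   // rpl_semi_sync_master_wait_no_slave
}

// reproductionProblems lists the reasons why the master might not hold the
// INSERT long enough for the test to reproduce the loss.
func reproductionProblems(s semiSyncSettings, delayMs int) []string {
	var problems []string
	if s.WaitPoint != "AFTER_SYNC" {
		problems = append(problems, "wait point is not AFTER_SYNC")
	}
	if s.TimeoutMs <= delayMs {
		problems = append(problems, "semi-sync timeout does not exceed the injected delay")
	}
	if !s.WaitNoSlave {
		problems = append(problems, "master may fall back to async when no slave is connected")
	}
	return problems
}

func main() {
	s := semiSyncSettings{WaitPoint: "AFTER_SYNC", TimeoutMs: 120000, WaitNoSlave: true}
	fmt.Println(len(reproductionProblems(s, 60000)), "problems found")
}
```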

Conclusion

The experiment confirms that gh-ost can lose data under specific AFTER_SYNC timing conditions. With the source-code fix applied, the same test no longer loses the row, which makes gh-ost safer to use in semi-synchronous environments.
