
Why MySQL 8.0 MTS Hangs After 2³¹ Transactions and How to Reproduce the Bug

MySQL versions before 8.0.28 can hang in multi-threaded slave (MTS) replication when a commit sequence number held in a 32-bit int overflows after 2³¹ transactions: workers stop being woken and the replica hangs. This article explains the underlying slave_preserve_commit_order mechanism, the overflow bug, and a step-by-step reproduction method.

dbaplus Community

Problem description

In MySQL versions before 8.0.28, a multi-threaded slave (MTS) replica may hang after running for a long time. The hang is triggered when an internal 32-bit integer overflows after 2³¹ transactions, which makes it a hidden time bomb.
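To get a feel for how long the fuse is, the following minimal sketch estimates the replica uptime needed to reach the limit of a signed 32-bit counter. The function name and the steady transaction rate are assumptions for illustration only; real workloads are bursty, and the count starts over whenever the counter is reset.

```cpp
#include <cassert>
#include <cstdint>

// Largest value a signed 32-bit counter can hold: 2^31 - 1.
constexpr std::uint64_t kInt32Max = 2147483647ULL;

// Whole days of uptime before a replica that applies `tps` transactions
// per second reaches kInt32Max. `tps` is a hypothetical steady rate.
std::uint64_t days_until_overflow(std::uint64_t tps) {
  assert(tps > 0);
  const std::uint64_t seconds = kInt32Max / tps;
  return seconds / 86400;  // 86,400 seconds per day
}
```

Under these assumptions, days_until_overflow(1000) returns 24 and days_until_overflow(100) returns 248, so a busy replica can reach the limit within weeks.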

Relevant bug

https://bugs.mysql.com/bug.php?id=103636

Implementation of slave_preserve_commit_order

The parameter ensures that the commit order on a replica matches the primary. Its implementation consists of two main steps:

During SQL thread dispatch, each transaction’s order is recorded using a sequence number derived from the primary’s binlog order.

Worker threads execute transactions out of order, but must commit in the recorded order.

A global Commit_order_manager tracks this information. Key fields in each worker node include:

m_commit_sequence_nr: commit sequence number.

value_type m_worker_id: worker thread ID.

MDL_context *m_mdl_context: MDL lock information.

memory::Aligned_atomic<Commit_order_queue::enum_worker_stage> m_stage: state information.

The commit queue (m_commit_queue) is a lock-free FIFO that stores worker IDs in dispatch order. Workers check the head of this queue before committing; if the IDs do not match, they wait, changing their state to REQUESTED_GRANT.
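The gate described above can be sketched roughly as follows. This is a simplified, single-threaded illustration and not MySQL's code: CommitGate, dispatch, try_commit, and the REGISTERED and COMMITTED stages are hypothetical names, and a plain std::deque stands in for the lock-free queue.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <vector>

// Simplified stages; only REQUESTED_GRANT is taken from the article,
// REGISTERED and COMMITTED are placeholders for this sketch.
enum class Stage { REGISTERED, REQUESTED_GRANT, COMMITTED };

struct CommitGate {
  std::deque<std::size_t> queue;  // worker IDs in dispatch order
  std::vector<Stage> stage;       // one stage per worker

  explicit CommitGate(std::size_t workers)
      : stage(workers, Stage::REGISTERED) {}

  // Dispatcher side: record the order in which transactions are handed out.
  void dispatch(std::size_t worker_id) {
    queue.push_back(worker_id);
    stage[worker_id] = Stage::REGISTERED;
  }

  // Worker side: returns true if this worker is at the head of the queue
  // and may commit; otherwise marks it as waiting for its turn.
  bool try_commit(std::size_t worker_id) {
    assert(!queue.empty());
    if (queue.front() != worker_id) {
      stage[worker_id] = Stage::REQUESTED_GRANT;
      return false;
    }
    queue.pop_front();
    stage[worker_id] = Stage::COMMITTED;
    return true;
  }
};
```

With workers dispatched in the order 2, 0, 1, a call to try_commit(0) is refused until worker 2 has committed, which is the ordering guarantee the parameter is meant to provide.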

Wake‑up logic

When a worker finishes applying a transaction, Commit_order_manager::finish_one attempts to wake the next waiting worker. The wake‑up succeeds only if three conditions are met:

A worker thread exists.

The next worker’s node state is FINISHED_APPLYING or REQUESTED_GRANT.

The next worker’s m_commit_sequence_nr equals the current worker’s m_commit_sequence_nr + 1.

The bug occurs in the third condition because the sequence number is stored as an int after being retrieved from an unsigned long long generator. When the counter reaches 2,147,483,647, adding one overflows to -2,147,483,648, so the equality check can never succeed again and no further wake-ups happen.
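The failing comparison can be illustrated with a small sketch. The function names below are hypothetical and the code only mirrors the shape of the sequence-number check, not the actual MySQL source. The 32-bit wrap is computed explicitly so that the example does not itself rely on undefined signed-overflow behavior.

```cpp
#include <cassert>
#include <cstdint>

// Wraps a 64-bit value into a signed 32-bit one the way two's-complement
// hardware does, using only well-defined arithmetic.
std::int32_t wrap_to_int32(std::uint64_t value) {
  const std::uint64_t low = value & 0xFFFFFFFFULL;
  const std::int64_t wide = low >= 0x80000000ULL
      ? static_cast<std::int64_t>(low) - 0x100000000LL
      : static_cast<std::int64_t>(low);
  assert(wide >= INT32_MIN && wide <= INT32_MAX);
  return static_cast<std::int32_t>(wide);
}

// Buggy shape: the expected next sequence number is held in a 32-bit int
// and compared with the 64-bit number stored on the next worker's node.
bool can_wake_next_int32(std::uint64_t current_seq,
                         std::uint64_t next_worker_seq) {
  const std::int32_t expected = wrap_to_int32(current_seq + 1);
  // Comparing int with unsigned long long sign-extends the int first.
  return static_cast<std::uint64_t>(static_cast<std::int64_t>(expected)) ==
         next_worker_seq;
}

// Shape with a 64-bit expected value: no wrap at 2^31.
bool can_wake_next_uint64(std::uint64_t current_seq,
                          std::uint64_t next_worker_seq) {
  return current_seq + 1 == next_worker_seq;
}
```

For small values both functions agree. With current_seq equal to 2,147,483,647 and the next worker holding 2,148,483,648 minus 1,000,000 (that is, 2,147,483,648), can_wake_next_int32 returns false because the expected value has wrapped to -2,147,483,648, while can_wake_next_uint64 returns true.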

Bug reproduction

To reproduce the issue quickly, the source is modified so that the global commit sequence generator is initialized near the overflow point (e.g., 2,147,483,640), and the replica is configured with:

mysql> set global transaction_write_set_extraction=XXHASH64;
mysql> set global binlog_transaction_dependency_tracking=WRITESET;

Eight parallel worker threads are enabled on the replica. After compiling and restarting, the hang can be observed. Debugging is performed by setting a breakpoint around line 286 of rpl_slave_commit_order_manager.cc (inside Commit_order_manager::finish_one) and inspecting the sequence numbers:

(gdb) p *(this->m_workers[next_worker].m_commit_sequence_nr.m_underlying)
(gdb) p next_seq_nr

The output shows m_commit_sequence_nr as 2,147,483,648 while next_seq_nr is -2,147,483,648, confirming the overflow.

Impact and mitigation

The issue is tied to the slave_preserve_commit_order parameter; disabling it avoids the bug because the commit order manager is not initialized.

Restarting the primary and replica resets the global counter, temporarily fixing the hang.

The bug was fixed in MySQL 8.0.28; the patch changes the type of cs::apply::Commit_order_queue::sequence_type to unsigned long long and adjusts related code.

Conclusion

The hang is a multi-threaded replication bug caused by a 32-bit integer overflow in the handling of the commit sequence number.

It only manifests under high concurrency when the replica processes more than 2,147,483,647 transactions.

Disabling slave_preserve_commit_order or upgrading to MySQL 8.0.28+ resolves the problem.
