Databases 11 min read

Why Flush Tables with Read Lock Can Deadlock MySQL Replication and How to Fix It

The article analyzes a rare three‑thread deadlock caused by FLUSH TABLES WITH READ LOCK during parallel replication on MySQL 5.7, explains the MDL lock interactions, shows how to reproduce the issue, and provides practical kill‑thread and configuration solutions to avoid it.

dbaplus Community
dbaplus Community
dbaplus Community
Why Flush Tables with Read Lock Can Deadlock MySQL Replication and How to Fix It

Background

A production MySQL 5.7.18 instance (with minor internal patches) experienced a backup‑related replica stall. The SHOW FULL PROCESSLIST output revealed two deadlocks involving the backup thread and parallel replication workers.

Deadlock Cases

Case 1 (Figure 1) : Thread 162 runs innobackup and holds a global read lock (MDL::global read lock). Thread 144 is the parallel replication coordinator, while threads 145 and 146 are workers. Thread 145 waits for the global read lock to be released, thread 146 holds a global commit lock, and thread 162 blocks on the commit lock held by thread 146, forming a 145→162→146→145 cycle.

Case 2 (Figure 2) : Thread 183 (backup) holds the global read lock, thread 165 is the coordinator, and threads 166/167 are workers. Thread 165 waits for a dependent transaction, while thread 183 blocks on the commit lock held by thread 167, and thread 166 blocks on the read lock held by thread 183, creating a 183→167→166→183 cycle.

MDL Lock Analysis

The deadlocks involve two service‑level MDL locks:

MDL::global read lock (shared)

MDL::global commit lock (shared)

During any data‑modifying transaction MySQL also requests an intention‑exclusive lock on the global MDL key and, before commit, an intention‑exclusive lock on the commit MDL key. These locks are not compatible with the shared locks held by the backup thread.

Reproduction Steps

On the primary, create parallel‑replication transactions and synchronize them with DEBUG_SYNC to pause at the flush stage.

On the replica, modify the source code to insert sleep(20) at critical points in Xid_apply_log_event::do_apply_event_worker and the worker dispatch loop, then rebuild the server.

Example debug commands:

session 1
SET DEBUG_SYNC='waiting_in_the_middle_of_flush_stage SIGNAL s1 WAIT_FOR f';
INSERT INTO test.test VALUES (13); // Transaction A

session 2
SET DEBUG_SYNC='now WAIT_FOR s1';
SET DEBUG_SYNC='bgc_after_enrolling_for_flush_stage SIGNAL f';
INSERT INTO test.test VALUES (16); // Transaction B

Modified code snippet (inserted sleeps):

// Xid_apply_log_event::do_apply_event_worker
if (w->id == 0) {
    std::cout << "before commit" << std::endl;
    sleep(20);
}
// pop_jobs_item
if (worker->id == 0) {
    sleep(20);
}

Resolution

For the first deadlock, killing the backup thread (thread 162) is the safest because the waiting worker (thread 146) is stuck in mysql_cond_wait and cannot be killed.

For the second deadlock, also kill the backup thread (thread 183) to break the cycle.

Prevention

Set kill‑long‑queries‑timeout in XtraBackup to automatically kill long‑running queries that block the global lock.

Use the safe‑slave‑backup option to pause the SQL thread during backup (though this is aggressive).

Disable ordered commit on the replica by setting slave_preserve_commit_order=0, which removes the ordering constraint that contributes to the deadlock.

Adjust kill‑long‑query‑type explicitly (e.g., ALL or SELECT) because the default may not take effect without a proper value.

Summary

The root cause of the replica stall is that FLUSH TABLES WITH READ LOCK acquires two MDL locks (global read and global commit) which are not atomic with transaction execution; combined with parallel replication and ordered commit, three threads can deadlock. The article also uncovers a subtle bug in XtraBackup’s kill‑long‑query‑type handling, emphasizing the need to specify the type explicitly.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Sign in to view source
Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactadmin@besthub.devand we will review it promptly.

deadlockmysqlxtrabackupparallel replicationflush tablesMDL lock
dbaplus Community
Written by

dbaplus Community

Enterprise-level professional community for Database, BigData, and AIOps. Daily original articles, weekly online tech talks, monthly offline salons, and quarterly XCOPS&DAMS conferences—delivered by industry experts.

0 followers
Reader feedback

How this landed with the community

Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.