Databases 22 min read

Production MySQL Deadlocks: Diagnosis Strategies and Permanent Fixes

The article explains how MySQL InnoDB deadlocks occur, details the four necessary conditions, shows how to enable full deadlock logging, demonstrates queries against information_schema and performance_schema, and provides concrete scenarios with code‑level solutions to prevent and resolve deadlocks in production environments.

MaGe Linux Operations
MaGe Linux Operations
MaGe Linux Operations
Production MySQL Deadlocks: Diagnosis Strategies and Permanent Fixes

What is a MySQL deadlock?

InnoDB uses row‑level locks; when two or more transactions each hold a lock the other needs, a circular wait forms and the involved rows cannot be modified. A deadlock can cause isolated request errors or a complete write‑path blockage.

1. Deadlock formation principles

1.1 Transaction and lock basics

When a transaction updates a row it acquires an exclusive lock that is released only on COMMIT or ROLLBACK.

-- Transaction A locks id=1
BEGIN;
SELECT * FROM accounts WHERE id = 1 FOR UPDATE;  -- exclusive lock on id=1

-- Transaction B locks id=2
BEGIN;
SELECT * FROM accounts WHERE id = 2 FOR UPDATE;  -- exclusive lock on id=2

-- Each transaction now waits for the lock held by the other → deadlock

1.2 Four necessary conditions (Coffman conditions)

Mutual exclusion : a row can be held by only one transaction at a time.

Hold and wait : a transaction holds one lock while requesting another (e.g., Transaction A holds id=1 and waits for id=2).

No preemption : locks are released only by the owning transaction.

Circular wait : a cycle of waiting transactions (A→B, B→A).

InnoDB’s deadlock detector breaks the cycle by rolling back the transaction with the smallest cost, usually the one holding the fewest row locks.

1.3 Lock types and compatibility

Shared lock (S) – SELECT ... LOCK IN SHARE MODE; compatible with other S locks, conflicts with X locks.

Exclusive lock (X) – SELECT ... FOR UPDATE; conflicts with both S and X locks.

Record lock – locks a single index record; the most common row lock.

Gap lock – locks a range between index records; prevents phantom reads.

Next‑Key lock – combination of record lock and gap lock; the default lock in REPEATABLE READ.

Intention lock – table‑level lock (IX/IS) indicating that a transaction will acquire row‑level locks.

Next‑Key lock is a frequent deadlock source because a range query such as WHERE id > 10 AND id < 20 locks the interval (10,20). An insert into that interval is blocked, and repeated attempts can create a deadlock.

-- Transaction A locks the interval (100, +∞)
BEGIN;
SELECT * FROM orders WHERE user_id > 100 FOR UPDATE;

-- Transaction B tries to insert id=101 and is blocked by the gap lock
INSERT INTO orders (id, user_id, amount) VALUES (NULL, 101, 100);

-- Transaction A inserts id=102, which now conflicts with B’s gap lock → deadlock

2. How to locate deadlocks

2.1 Enable full deadlock logging

MySQL records only a summary in the error log. Turn on detailed logging with:

SHOW VARIABLES LIKE 'innodb_print_all_deadlocks';   -- default OFF
SET GLOBAL innodb_print_all_deadlocks = ON;       -- requires SUPER
-- The error log (e.g., /var/log/mysql/error.log) will now contain the complete lock‑wait graph for every deadlock.

2.2 Query information_schema for lock information

SELECT t.trx_id, t.trx_state, t.trx_started, t.trx_rows_locked, t.trx_query,
       l.lock_id, l.lock_mode, l.lock_type, l.lock_table, l.lock_index,
       l.lock_space, l.lock_page, l.lock_rec, l.lock_data
FROM information_schema.INNODB_TRX t
JOIN information_schema.INNODB_LOCKS l ON t.trx_id = l.lock_trx_id
ORDER BY t.trx_started;

To see the waiting relationship:

SELECT requesting_trx.trx_id AS requesting_trx_id,
       requesting_trx.trx_query AS requesting_query,
       blocking_trx.trx_id AS blocking_trx_id,
       blocking_trx.trx_query AS blocking_query,
       blocking_locks.lock_id AS blocking_lock_id,
       blocking_locks.lock_mode AS blocking_lock_mode,
       blocking_locks.lock_type AS blocking_lock_type,
       blocking_locks.lock_table AS blocking_lock_table
FROM information_schema.INNODB_LOCK_WAITS lw
JOIN information_schema.INNODB_TRX requesting_trx ON lw.requesting_trx_id = requesting_trx.trx_id
JOIN information_schema.INNODB_TRX blocking_trx   ON lw.blocking_trx_id   = blocking_trx.trx_id
JOIN information_schema.INNODB_LOCKS blocking_locks ON lw.blocking_lock_id = blocking_locks.lock_id;

2.3 Use performance_schema for lock events (MySQL 8.0)

UPDATE performance_schema.setup_instruments
SET ENABLED = 'YES', TIMED = 'YES'
WHERE NAME LIKE 'wait/lock%';

UPDATE performance_schema.setup_consumers
SET ENABLED = 'YES'
WHERE NAME LIKE '%events_transactions%';

SELECT * FROM performance_schema.events_waits_history_long
WHERE event_name LIKE '%lock%'
ORDER BY TIMER_END DESC
LIMIT 20;

2.4 Interpreting a deadlock report

Typical fields: LOCK WAIT – the transaction is waiting for a lock. HOLDS THE LOCK(S) – locks already owned. WE ROLL BACK TRANSACTION – the chosen victim. lock_mode X locks rec but not gap – a record lock. lock_mode X locks gap before rec – a gap lock.

3. Common deadlock scenarios and fixes

3.1 Scenario 1 – Different order of row access

Problem : Transaction A locks row 1 then row 2; Transaction B locks row 2 then row 1 → circular wait.

# Wrong concurrent write (high deadlock risk)
def transfer_funds_wrong(from_id, to_id, amount):
    with connection.cursor() as cursor:
        cursor.execute("SELECT balance FROM accounts WHERE id = %s FOR UPDATE", (from_id,))
        cursor.execute("SELECT balance FROM accounts WHERE id = %s FOR UPDATE", (to_id,))

# Correct version – lock in a deterministic order
def transfer_funds_correct(from_id, to_id, amount):
    first_id, second_id = (from_id, to_id) if from_id < to_id else (to_id, from_id)
    with connection.cursor() as cursor:
        cursor.execute("SELECT balance FROM accounts WHERE id = %s FOR UPDATE", (first_id,))
        cursor.execute("SELECT balance FROM accounts WHERE id = %s FOR UPDATE", (second_id,))

3.2 Scenario 2 – Index‑driven gap‑lock conflicts

Problem : Range scans or index scans acquire wide Next‑Key locks, blocking inserts.

Use a covering index to limit the locked range.

Lower the isolation level from REPEATABLE READ to READ COMMITTED to reduce gap locks.

-- Create a covering index
CREATE INDEX idx_user_id_covering ON orders(user_id, status, amount);

-- Query that uses the covering index
SELECT status, amount FROM orders WHERE user_id = 100;

-- Change isolation level (session)
SET SESSION TRANSACTION ISOLATION LEVEL READ COMMITTED;

-- Permanent change (my.cnf)
# transaction-isolation = READ-COMMITTED

3.3 Scenario 3 – Master‑slave delay causing lock escalation

Long‑running transactions on the master hold locks while the slave lags, increasing deadlock probability.

-- Check slave delay
SHOW SLAVE STATUS\G   -- pay attention to Seconds_Behind_Master

-- Optimize slave application speed
STOP SLAVE;
CHANGE MASTER TO MASTER_RETRY_COUNT = 3;
START SLAVE;

3.4 Scenario 4 – Large transactions

Processing too many rows in a single transaction holds locks for a long window.

# Wrong: single transaction updates 100 000 rows
for id in ids:
    cursor.execute("UPDATE orders SET status='processed' WHERE id = %s", (id,))

# Correct: batch updates of 500 rows each
for i in range(0, len(ids), 500):
    batch = ids[i:i+500]
    cursor.execute("BEGIN")
    cursor.execute("UPDATE orders SET status='processed' WHERE id IN (%s)" % ",".join(["%s"]*len(batch)), batch)
    cursor.execute("COMMIT")
    connection.commit()

4. Code‑level deadlock prevention

4.1 Application‑level lock ordering

import threading
LOCK_ORDER = {}

class AccountService:
    def __init__(self, db_connection):
        self.conn = db_connection

    def transfer(self, from_id: int, to_id: int, amount: decimal.Decimal):
        first_id, second_id = sorted([from_id, to_id])
        with self._get_lock(first_id):
            with self._get_lock(second_id):
                self._do_transfer(first_id, second_id, amount)

    def _get_lock(self, account_id: int):
        """Get a per‑account threading lock"""
        if account_id not in LOCK_ORDER:
            LOCK_ORDER[account_id] = threading.Lock()
        return LOCK_ORDER[account_id]

4.2 Lock‑wait timeout configuration

SHOW VARIABLES LIKE 'innodb_lock_wait_timeout';   -- default 50 seconds
SET GLOBAL innodb_lock_wait_timeout = 10;          -- shorten to 10 seconds

# In application code (Python example)
try:
    with connection.cursor() as cursor:
        cursor.execute("SELECT ... FOR UPDATE")
except OperationalError as e:
    if e.args[0] == 1205:  # Lock wait timeout
        logger.error("Lock wait timeout exceeded")
        raise RetryableError("Lock timeout, should retry")

4.3 Limited retry mechanism

MAX_RETRIES = 3
RETRY_DELAY = 0.5  # seconds

for attempt in range(MAX_RETRIES):
    try:
        with connection.cursor() as cursor:
            cursor.execute("BEGIN")
            # lock‑dependent logic here
            cursor.execute("COMMIT")
        return True
    except OperationalError as e:
        if e.args[0] == 1205:
            connection.rollback()
            logger.warning(f"Attempt {attempt+1} failed, retrying…")
            time.sleep(RETRY_DELAY * (attempt + 1))
            continue
        raise
return False

5. Monitoring and prevention

5.1 Continuous monitoring metrics

Innodb_row_lock_waits – alert if > 100 per minute (2× baseline).

Innodb_row_lock_time_avg – alert if > 500 ms (3× baseline).

Threads_connected – alert if > 70 % of max_connections.

Lock_wait_timeout – any occurrence should trigger an alarm.

5.2 Slow‑query correlation

Long‑running queries often lead to deadlocks. Analyse the slow‑query log regularly:

# Show slow‑query variables
mysql -e "SHOW VARIABLES LIKE 'slow_query%';"
mysql -e "SHOW VARIABLES LIKE 'long_query_time';"

# Top 20 slow queries by time
mysqldumpslow -s t -t 20 /var/log/mysql/slow.log

# Top 20 by execution count
mysqldumpslow -s c -t 20 /var/log/mysql/slow.log

5.3 Troubleshooting checklist (sample)

Deadlock error "Deadlock found"

Check error log.

Analyse lock‑wait graph.

Identify circular SQL and adjust order or lock scope.

Lock wait timeout exceeded

Inspect innodb_lock_wait_timeout.

Find long‑running transactions holding locks.

Split large transactions into smaller batches.

Frequent deadlocks on a specific table

Review access patterns.

Check index design.

Consider lowering isolation level.

Slave lag causing master deadlocks

Run SHOW SLAVE STATUS and monitor Seconds_Behind_Master.

Optimize slave I/O/SQL threads or add more slaves.

Occasional deadlocks during batch updates

Analyse lock range of batch SQL.

Process rows in primary‑key order to reduce conflicts.

6. Key take‑aways

When a deadlock occurs in production, immediately export the three InnoDB system tables INNODB_TRX, INNODB_LOCKS and INNODB_LOCK_WAITS, and enable innodb_print_all_deadlocks to capture the full context. Post‑mortem log analysis is often more valuable than on‑the‑spot debugging because the victim transaction may already have been rolled back.

References

MySQL 8.0 Reference Manual – InnoDB Locking

High Performance MySQL, 4th Edition – Chapter 8 (Optimizing Locking Operations)

DeadlockInnoDBMySQLTroubleshootingperformance_schemalock_timeout
MaGe Linux Operations
Written by

MaGe Linux Operations

Founded in 2009, MaGe Education is a top Chinese high‑end IT training brand. Its graduates earn 12K+ RMB salaries, and the school has trained tens of thousands of students. It offers high‑pay courses in Linux cloud operations, Python full‑stack, automation, data analysis, AI, and Go high‑concurrency architecture. Thanks to quality courses and a solid reputation, it has talent partnerships with numerous internet firms.

0 followers
Reader feedback

How this landed with the community

Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.