Unraveling MySQL InnoDB Deadlocks in Video Conference Sessions
A detailed investigation of intermittent MySQL InnoDB deadlocks occurring during video‑conference session creation, covering log extraction, lock analysis, source‑code debugging, reproducible test cases, and a simple code change that eliminates the deadlock while explaining the underlying locking mechanisms.
1. Capture Deadlock Information
The issue surfaced as occasional "create meeting" failures. The MySQL error message was:
Deadlock found when trying to get lock; try restarting transaction
Caused by: com.mysql.cj.jdbc.exceptions.MySQLTransactionRollbackException: Deadlock found when trying to get lock; try restarting transactionRunning SHOW ENGINE INNODB STATUS exported the full deadlock trace, which was sanitized for privacy.
2. Parse the Log
Key observations from the trace:
Transaction 1 (ID 272174) was executing DELETE FROM session … and waited for an X lock on a row in the session table.
Transaction 2 (ID 272193) was executing INSERT INTO session_endpoint … and waited for an insert‑intention lock (gap lock) on the session_endpoint table.
Transaction 2 already held a record lock on the session table.
The database uses InnoDB with the default REPEATABLE‑READ isolation level on MySQL 5.7.
3. Why session_endpoint Blocks
When InnoDB deletes a row it marks it as deleted but keeps the record on the page. If an INSERT later targets the same page, the insert‑intention lock can clash with the lingering record lock. The log shows info bits 32, which corresponds to the deleted‑flag (0x20UL):
// storage/innobase/include/rem0rec.h
#define REC_INFO_DELETED_FLAG 0x20UL /* record has been delete‑marked */This proves that Transaction 2 tried to insert exactly where Transaction 1 had just marked a row as deleted, causing the lock conflict.
4. Why session Blocks
Transaction 2 also inserted a new session row (ID 965b1face74948039ab0dd8da6daa71d). While Transaction 1 was batch‑deleting sessions, it eventually reached a row that was still locked by Transaction 2, so the delete waited, creating the second part of the deadlock cycle.
5. Reproduce and Debug
To verify the hypothesis, the following DDL and test data were used:
CREATE TABLE `session` (
`id` binary(16) NOT NULL,
`code` varchar(64) DEFAULT NULL,
`topic` varchar(255) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `code_idx` (`code`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;
CREATE TABLE `session_endpoint` (
`id` binary(16) NOT NULL,
`nickname` varchar(100) DEFAULT NULL,
`session_id` binary(16) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `session_id_idx` (`session_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;Sample inserts populated both tables with binary UUIDs. The isolation level and version were confirmed with:
SELECT @@tx_isolation;
SELECT @@version;Two concurrent transactions were scripted: T2 inserted a new session record, then T1 performed the batch delete. The delete blocked on the third step, matching the original deadlock.
Debugging was performed by setting breakpoints in the MySQL source (e.g., storage/innobase/lock/lock0lock.cc, sql/sql_delete.cc, row_search_mvcc). Screenshots of the debugger showed the lock mode value mode = LOCK_X | LOCK_ORDINARY (value 3) and the exact point where lock_rec_lock_slow waited for the conflicting lock.
6. Fix the Deadlock
The root cause was the inconsistent order of operations on the two tables. Swapping the delete order resolves the circular wait:
@Transactional(rollbackFor = Exception.class)
public void clearSessions(List<UUID> sessionIds) {
sessionRepository.deleteSessionByIds(sessionIds);
sessionEndpointRepository.deleteSessionEndpointBySessionIds(sessionIds);
}After the change, automated stress tests that create and clean up thousands of empty meetings no longer produce deadlocks.
7. Remaining Questions
Why does InnoDB continue scanning past the last targeted row instead of exiting the loop immediately?
Why does the engine read the next record after the last one and then another unrelated record?
What exactly causes the engine to encounter an uncommitted insert from another transaction during the scan?
These nuances are tied to InnoDB’s internal MVCC and page‑level lock handling and merit further source‑code investigation.
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
dbaplus Community
Enterprise-level professional community for Database, BigData, and AIOps. Daily original articles, weekly online tech talks, monthly offline salons, and quarterly XCOPS&DAMS conferences—delivered by industry experts.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
