Investigation of Performance Degradation and Locking Issues in MySQL 5.7.18 Partition Tables
This article investigates the performance degradation and lock timeout issues observed in MySQL 5.7.18 partition tables, reproduces the problem with test cases, analyzes InnoDB lock behavior through source code debugging, validates the root cause related to partition scan locking, and confirms it as a regression bug in MySQL 5.7.
Introduction The author, a MySQL core developer at Ctrip, reports that after upgrading a test environment from MySQL 5.6.21 to 5.7.18, the performance of partitioned tables degraded significantly, manifesting as frequent lock‑timeout events during primary‑key updates.
Problem Description Three observations were made: (1) with version 5.7.18 and partitioned tables, performance drops; (2) converting the same tables to non‑partitioned form restores performance; (3) rolling back to 5.6.21 while keeping partitions also restores performance. This points to the upgrade as the root cause.
Reproduction Steps A minimal test case was built using a simple partitioned table:
CREATE TABLE `t2` (
`id` INT(11) NOT NULL,
`dt` DATETIME NOT NULL,
`data` VARCHAR(10) DEFAULT NULL,
PRIMARY KEY (`id`,`dt`),
KEY `idx_dt` (`dt`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
/*!50100 PARTITION BY RANGE (TO_DAYS(dt))
(PARTITION p20170218 VALUES LESS THAN (736744) ENGINE=InnoDB,
PARTITION p20170219 VALUES LESS THAN (736745) ENGINE=InnoDB,
PARTITION pMax VALUES LESS THAN MAXVALUE ENGINE=InnoDB) */Three rows are inserted, then two concurrent sessions each start a transaction, update different primary‑key rows, and keep the first transaction uncommitted. Session 2 blocks indefinitely, revealing that an update on one partition acquires locks on rows belonging to other partitions.
Inspection of information_schema.innodb_locks shows two lock records, confirming that the first transaction holds a lock on a page that blocks the second transaction.
Further Analysis Debugging the InnoDB source revealed that the lock information is written by fill_innodb_locks_from_cache , which pulls data from a cache object. Tracing the cache usage leads to add_lock_to_cache and ultimately to RecLock::lock_add . The stack trace points to Partition_helper::handle_ordered_index_scan , where the variable m_part_spec.end_part determines the number of rows to lock during an index scan.
Investigation shows that for each UPDATE on a partitioned table, InnoDB incorrectly locks a number of rows equal to the total number of partitions, not just the rows in the target partition. This over‑locking is the fundamental cause of the observed lock‑waits.
Verification Additional rows (ids 4 and 5) were inserted, exceeding the number of partitions in the test case. Updating id 4 succeeds without blocking the earlier transaction on id 1, confirming that rows beyond the partition count are not locked. In real deployments with dozens or hundreds of partitions, this bug would cause massive lock contention.
Conclusion The analysis confirms that MySQL 5.7.18 introduces a regression in the handling of row locks for partitioned tables, causing excessive locking and performance degradation. The issue has been reported to the MySQL community and acknowledged by Oracle as a bug requiring further investigation.
Ctrip Technology
Official Ctrip Technology account, sharing and discussing growth.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.