Analysis of MySQL Bug #115352: InnoDB Assertion Failure When Killing a Partition‑Table ALTER
This article walks through the investigation of MySQL bug #115352, describing how an assertion failure in InnoDB’s dictionary cache caused a server crash during a killed ALTER TABLE on a partitioned table, the debugging steps taken, and the eventual fix applied in MySQL 8.0.40.
The article starts from a support ticket: a DBA killed a DDL statement that was modifying a partitioned table, and MySQL crashed afterwards. The error log showed an assertion failure: table->get_ref_count() == 0.
The relevant backtrace is displayed, followed by a description of previous attempts to reproduce the issue, which were unsuccessful.
Investigation of the core file revealed that the table object's reference count was not zero (n_ref_count = 1), indicating that another thread was still using the table when InnoDB attempted to remove it.
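The invariant behind that assertion can be illustrated with a toy model. This is only a sketch: ToyTable and toy_safe_to_remove are hypothetical names invented for illustration and are not InnoDB's dict_table_t or any real server API.

```cpp
#include <cassert>
#include <string>

// Toy stand-in for a cached table object (hypothetical, not dict_table_t).
struct ToyTable {
  std::string name;
  int ref_count = 0;  // number of users currently holding the table

  void acquire() { ++ref_count; }
  void release() {
    assert(ref_count > 0);  // releasing an unreferenced table is a bug
    --ref_count;
  }
};

// Models the precondition checked by ut_a(table->get_ref_count() == 0):
// an entry may leave the cache only when nobody references it.
bool toy_safe_to_remove(const ToyTable &table) {
  return table.ref_count == 0;
}
```

In this model, a table that another session has acquired and not yet released reports that it is not safe to remove, which corresponds to the n_ref_count = 1 state seen in the core file.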
Further analysis showed that the function dict_table_remove_from_cache_low removes a table from the data‑dictionary object cache after confirming the reference count is zero. The code snippet is reproduced below:
/** Removes a table object from the dictionary cache. */
static void dict_table_remove_from_cache_low(
    dict_table_t *table, /*!< in, own: table */
    bool lru_evict)      /*!< in: true if table being evicted
                         to make room in the table LRU list */
{
  dict_foreign_t *foreign;
  dict_index_t *index;

  ut_ad(table);
  ut_ad(dict_lru_validate());
  ut_a(table->get_ref_count() == 0);
  /* ... remainder of the function omitted in this excerpt ... */
}

The name matching logic in dict_partitioned_table_remove_from_cache compares only the first name_len bytes of the requested table name with the cached object's name. Because the partition name test/a_1#p#p0 shares its first six characters with test/a, the function mistakenly treats test/a_1#p#p0 as a partition of test/a and removes the wrong entry.
Memory layout diagrams illustrate this prefix match, and the article explains that the bug originates from this insufficient name check.
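The prefix match can be reproduced outside the server with a few lines of C++. This is a minimal sketch, not the InnoDB source: toy_prefix_only_match is a hypothetical helper that assumes the comparison behaves like strncmp over the first name_len bytes, as the article describes.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical helper modelling the comparison described above: only the
// first name_len bytes of the cached name are compared with the requested
// table name, and nothing after the prefix is inspected.
bool toy_prefix_only_match(const char *requested, const char *cached) {
  const std::size_t name_len = std::strlen(requested);
  return std::strncmp(requested, cached, name_len) == 0;
}
```

With this check, both test/a#p#p0 and test/a_1#p#p0 match a request for test/a, because each begins with the six bytes of test/a. Only the first is actually a partition of that table.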
To reproduce the issue, the author provides SQL commands that create a database and two partitioned tables, test.a and test.a_1, load test.a_1 into the dictionary cache, and then run two concurrent loops: one session issues ALTER statements against test.a while another kills them. The setup statements are:
create database test;
create table test.a ( x int)
PARTITION BY RANGE (x) (
PARTITION p0 VALUES LESS THAN (10000),
PARTITION pmax VALUES LESS THAN MAXVALUE
);
create table test.a_1 like test.a;

After loading test.a_1 into the cache (select count(*) from test.a_1;), the following shell loops are used to kill the ALTER and to issue the ALTER repeatedly:
while true; do { mysql -BNe 'select concat("kill ",id,";") from information_schema.processlist where state = "committing alter table to storage engine";' | mysql -vvv ; } ; done
while true; do { mysql -BNe "ALTER TABLE test.a ADD PARTITION (PARTITION pmax VALUES LESS THAN MAXVALUE);" ; mysql -BNe "ALTER TABLE test.a DROP PARTITION pmax;" ; } ; done

The bug was filed as MySQL bug #115352 and a Percona report (PS‑9264). Oracle later fixed the problem in MySQL 8.0.40 by adding an extra check for the partition separator #p# after the matched prefix:
strncmp(
    dict_name::PART_SEPARATOR,
    prev_table->name.m_name + name_len,
    dict_name::PART_SEPARATOR_LEN
) == 0

The patch applied to Percona‑Server 8.0.39 is shown, and the corresponding commit in MySQL 8.0.40 is referenced.
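The effect of the extra separator check can be sketched in standalone C++. The names here are hypothetical and chosen for illustration: TOY_PART_SEPARATOR is assumed to be "#p#" as stated in the article, and toy_is_partition_of is not a function from the MySQL source.

```cpp
#include <cassert>
#include <cstring>

// Assumed separator value, taken from the article's description ("#p#").
static const char TOY_PART_SEPARATOR[] = "#p#";
static const std::size_t TOY_PART_SEPARATOR_LEN =
    sizeof(TOY_PART_SEPARATOR) - 1;

// Hypothetical helper: the cached name must start with the requested table
// name AND continue with the partition separator right after that prefix.
bool toy_is_partition_of(const char *requested, const char *cached) {
  const std::size_t name_len = std::strlen(requested);
  if (std::strncmp(requested, cached, name_len) != 0) {
    return false;  // the prefix itself differs
  }
  // A matching prefix guarantees cached holds at least name_len bytes, so
  // cached + name_len is valid; strncmp stops at the terminating NUL.
  return std::strncmp(TOY_PART_SEPARATOR, cached + name_len,
                      TOY_PART_SEPARATOR_LEN) == 0;
}
```

Under these assumptions, test/a#p#p0 is still recognised as a partition of test/a, while test/a_1#p#p0 is rejected because the bytes following the prefix are _1# rather than #p#.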
References to the MySQL documentation on the data‑dictionary object cache, the bug report, and the commit details are listed at the end of the article.
Aikesheng Open Source Community
The Aikesheng Open Source Community provides stable, enterprise‑grade MySQL open‑source tools and services, releases a premium open‑source component each year (1024), and continuously operates and maintains them.