How a DDL Change Can Double MySQL Table Size: The Hidden B+Tree Page‑Split Bug
After a normal DDL change on the OneDBA platform, a table's storage grew nearly 100% and some queries slowed down, revealing a MySQL InnoDB B+Tree page‑split flaw that stores only one row per 16KB page when large records are inserted, which also corrupts statistics and leads to slow SQL.
Problem Background
After a normal DDL change on the OneDBA platform, some SQLs became slow and the table size almost doubled. The root cause was a defect in MySQL's B+Tree page‑split handling when a single row exceeds 8KB.
Index Structure
B+Tree
InnoDB tables are index‑organized; both clustered and secondary indexes use a B+Tree structure. A B+Tree consists of a root page, internal nodes, and leaf pages.
Each page (default 16KB) stores rows in a singly linked list, with two virtual records (Infimum and Supremum) that delimit the range.
Page (page)
A page must store at least two rows. If a row is larger than 8KB, it overflows to an overflow page, and the original page stores only the overflow pointer (20 bytes).
Overflow Page
Rows larger than 8KB are stored in overflow pages; InnoDB chooses TEXT fields first, then VARCHAR fields. This behavior cannot be overridden.
Page Split
When a page becomes full, InnoDB splits it. Two split strategies exist:
Mid‑point split : moves roughly 50% of records to a new page.
Insert‑point split : used for sequential inserts; may split at the insertion point and move records accordingly.
Page split and merge are crucial for storage efficiency, I/O patterns, and locking.
Current DDL Change Mechanism
The platform uses a container‑based, lock‑free DDL tool (based on gh‑ost) that copies all data and then replays binlog increments. The three‑step process can cause large‑ID rows to be inserted before the copied rows, exposing the page‑split defect.
Why the Table Inflates After Change?
During the DDL, large rows are inserted first, triggering page splits that leave many pages with only one row (≈5KB of data in a 16KB page). This halves the rows per page, doubling the number of pages and thus the storage size.
Investigation Process
1. Restored a backup and observed row sizes of 4‑7KB. 2. Analyzed page‑level row distribution before and after the change, finding a shift from ~2 rows/page to ~1 row/page. 3. Reproduced the issue with a test table and a series of INSERT statements, printing page and heap numbers.
CREATE TABLE sbtest (
id int(11) NOT NULL AUTO_INCREMENT,
pad varchar(12000),
PRIMARY KEY (id)
) ENGINE=InnoDB; INSERT INTO sbtest VALUES (10000, REPEAT('a',5120));
INSERT INTO sbtest VALUES (10001, REPEAT('a',5120));
-- subsequent inserts ...The output showed pages splitting and some pages ending up with a single record.
Why Statistics Become Inaccurate?
InnoDB statistics sample leaf pages and count distinct keys. When a page contains only one row, the code subtracts one from the distinct count, resulting in zero for that page and a massive under‑estimation of total rows.
if (n_diff_on_leaf_page > 0) {
n_diff_on_leaf_page--;
}This explains why the primary key statistic dropped from ~200M to ~200K after the change.
Relation Between Statistics and Slow SQL
The optimizer uses the (incorrect) row count to decide whether to use an ordered index (idx_modify_time) or a more selective filter index (idx_user_biz_del). The underestimated row count makes the optimizer favor the ordered index, causing excessive row scans and slow queries.
Temporary Fix
Running a native ALTER TABLE … ENGINE=InnoDB during a low‑traffic window rebuilds the tablespace, eliminating the one‑row‑per‑page situation, but it cannot be throttled and may cause replication lag.
Long‑Term Solution
Modify the DDL tool to copy all data first and apply binlog increments only after the copy completes, avoiding the interleaved inserts that trigger the page‑split bug. This approach aligns the tool’s behavior with native DDL.
Summary
The article explains InnoDB B+Tree mechanics, demonstrates how a DDL change combined with large rows leads to a page‑split defect, causing table bloat and inaccurate statistics, and proposes both immediate and permanent remediation strategies.
DeWu Technology
A platform for sharing and discussing tech knowledge, guiding you toward the cloud of technology.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
