Why MySQL Tables Should Stay Below 20 Million Rows: A Deep Dive into InnoDB Page Structure
This article examines the practical limits of MySQL single‑table row counts, explains how InnoDB stores data in 16 KB pages, analyzes B+‑tree index structures, and provides formulas and experiments that justify the commonly cited 20 million‑row guideline.
The author starts by questioning the popular advice that a MySQL table should not exceed 20 million rows, noting that many developers repeat this rule without testing it.
Experiment Setup
A simple person table is created and populated using MySQL pseudo‑column rownum and a loop that doubles the number of inserted rows each iteration. By running the insert statement 20 times, roughly 1 million rows are generated; 23 times yields about 8 million rows. The author mentions that the process can be throttled with a WHERE clause to control growth.
When the row count approaches 8–10 million, MySQL may raise The total number of locks exceeds the lock table size. The fix is to increase temporary table memory, e.g.:
SET GLOBAL tmp_table_size = 512*1024*1024; -- 512 MiB
SET GLOBAL innodb_buffer_pool_size = 1*1024*1024*1024; -- 1 GiBMaximum Row Count Theory
The primary key size determines the theoretical row limit: a 32‑bit INT primary key caps at ~2.1 billion rows, while BIGINT extends to ~9.2 quintillion. In practice, storage and index size become bottlenecks before reaching these limits.
InnoDB stores data in 16 KB pages. Each page contains a fixed header/footer and a variable‑size directory, leaving roughly 15 KB for actual data. For an index page, each entry occupies 12 bytes (8‑byte primary key + 4‑byte page number), allowing about 1 280 entries per index page ( 15*1024/12 ≈ 1280). For a data page, assuming an average row size of 1 KB, about 15 rows fit per page.
B+‑Tree Structure
InnoDB indexes are B+‑trees. Leaf pages store the actual rows; non‑leaf pages store only the smallest key of each child page and the child page number. The tree height (Z) is typically 2 or 3 levels. The total row capacity can be approximated by Total = x^(Z‑1) * y, where x is entries per non‑leaf page and y is rows per leaf page.
Using the numbers above (x≈1280, y≈15):
If Z=2, Total ≈ 1 280 × 15 ≈ 19 200 rows.
If Z=3, Total ≈ 1 280² × 15 ≈ 24.6 million rows.
This aligns with the empirical 20 million‑row recommendation. If rows are larger (e.g., 5 KB), only 3 rows fit per leaf page, reducing the total to about 5 million rows for a three‑level tree.
Practical Implications
When a table grows beyond the point where its indexes no longer fit entirely in memory, MySQL must read index pages from disk, causing additional I/O and slower queries. Increasing memory or using partitioning can mitigate the impact, but the underlying page‑size limits remain.
Conclusion
MySQL stores tables in 16 KB pages; not all space is usable for data.
B+‑tree leaf nodes hold actual rows, non‑leaf nodes hold key‑page pointers.
The 20 million‑row guideline emerges from typical three‑level B+‑tree calculations with 1 KB rows.
Larger rows or deeper trees lower the practical limit, while sufficient memory can hide the performance penalty.
Understanding these storage mechanics helps engineers make informed decisions about table design, sharding, and when to consider alternative architectures.
AI Architecture Hub
Focused on sharing high-quality AI content and practical implementation, helping people learn with fewer missteps and become stronger through AI.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
