Why MySQL Single-Table Data Should Stay Under 20 Million Rows
This article explains how InnoDB stores data in 16 KB pages, how B+Tree indexes work, derives the formula for the maximum number of rows a single MySQL table can hold, and why the practical limit of about twenty million rows exists, along with performance implications and optimization tips.
Introduction
The industry often says “MySQL single‑table data should not exceed ten million rows.” This guideline assumes the InnoDB storage engine, the default B+Tree index structure, and a normal table with a moderate number of columns and typical field types.
How MySQL Stores Data
InnoDB stores data in fixed‑size 16 KB pages. Each page contains a header, a footer, a page directory, and the actual records. The records are kept in the table’s .ibd file as a sequence of pages.
Usable space per page is roughly 15 KB after accounting for header, footer and directory.
Each row is called a record .
Pages are linked by a page number and previous/next pointers, allowing sequential traversal.
From Page to Index – B+Tree Index
Leaf pages store the smallest primary‑key id of each page; non‑leaf pages store the key together with a pointer to the child page, forming a B+Tree. The tree height is usually two or three levels for tables up to tens of millions of rows.
Clustered index : data are stored in the leaf nodes.
Secondary index : leaf nodes store only the key, requiring a lookup to the clustered index (a “back‑lookup”).
Row format : e.g., COMPACT stores transaction ID, rollback pointer, column values, etc.
How MySQL Queries Data
Example query: SELECT * FROM user WHERE id = 5; The engine starts at the root page, binary‑searches non‑leaf pages to locate the leaf page that contains id = 5, then reads that leaf page to fetch the row. The I/O cost equals the tree height (typically 1–3 disk reads).
Query steps summary:
Load root page (often cached in the buffer pool).
Iteratively locate the child page by binary search on non‑leaf pages.
When the leaf page is reached, perform a binary search on the records to find the exact row.
I/O count equals the number of levels traversed.
Why the 20 Million Row Upper Bound?
The maximum rows per table can be approximated by the formula: (fanout ^ (height‑1)) × rows_per_leaf_page For a 16 KB page:
Fanout ≈ 1280 (each index entry is about 12 B; 15 KB usable → 15 000 / 12 ≈ 1280).
Rows per leaf page ≈ 15 when each row is about 1 KB.
Two‑level tree: (1280 ^ 1) × 15 ≈ 20 k rows (≈ 20 million). Three‑level tree: (1280 ^ 2) × 15 ≈ 2.5 M rows, which is the commonly cited “2 kw” (≈ 20 million) limit. When the row count exceeds this, the tree height grows to four, increasing I/O from three reads to four and causing a noticeable performance cliff.
Core Principle Summary
All data are organized by a B+Tree; pages store records; non‑leaf pages provide fast navigation; leaf pages store the actual rows. High fan‑out keeps the tree shallow, limiting the number of disk I/O operations required for a lookup.
Extended Questions
Why 16 KB Page Size?
16 KB balances memory consumption, disk I/O, and index density. Larger pages would reduce the number of pages that fit in memory; smaller pages would increase the tree height and the number of I/O operations.
How to Index Strings?
MySQL uses B+Tree for string columns, sorting by lexicographic order. For long strings, consider prefix indexes or full‑text indexes. For Chinese strings, you can use a pinyin plugin or Unicode collation.
Is There a Limit on Index Length?
In InnoDB the maximum index length is 767 bytes (unless a prefix index is used). With utf8mb4, each character may occupy up to 4 bytes, so a VARCHAR(255) can exceed the limit unless a shorter prefix is specified.
Optimization Suggestions
Keep row size small; avoid storing large TEXT / BLOB columns in the main table.
Plan sharding or partitioning when approaching the 10 M‑20 M row range.
Archive cold data to separate tables or storage.
Conclusion
MySQL’s 16 KB page size and B+Tree index design strike a balance between disk I/O and memory usage. When a table grows beyond the practical 20 million‑row threshold, the B+Tree height increases, leading to extra I/O and a performance cliff. Proper row‑size control, sharding, and archiving can keep the system performant.
DeWu Technology
A platform for sharing and discussing tech knowledge, guiding you toward the cloud of technology.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
