Why MySQL Single‑Table Data Should Stay Below 10 Million Rows – B+Tree, 16KB Pages & Performance Secrets
This article explains how MySQL stores data in 16KB pages, uses B+‑tree indexes to locate rows, derives the commonly cited 20‑million‑row limit from page fan‑out and row size calculations, compares B‑tree and B+‑tree structures, and offers practical optimization tips.
Introduction
The core question is why the industry often advises that a MySQL single table should not exceed the ten‑million‑row range.
How MySQL Stores Data
Assumptions: InnoDB storage engine, default B+‑tree index, ordinary tables with a moderate number of columns and typical field types.
Data Page Structure
InnoDB’s smallest storage unit is a fixed 16KB page that holds table rows, indexes, and metadata. Pages are loaded into memory as whole units to reduce disk I/O.
Each table is stored in an .ibd file (tablespace). Although rows appear contiguous in the logical table, physically they are split across many 16KB pages.
Every page contains a header with a page number and forward/backward pointers, a checksum at the tail, and a page directory that enables binary search, turning linear scans (O(n)) into O(log n) lookups.
From Page to B+Tree Index
Pages are linked together forming a double‑linked list. By selecting the smallest primary‑key record on each page and storing its key and page number, a new upper‑level page is created, adding a page level field. This creates a hierarchical structure that resembles an inverted tree – the B+‑tree.
Leaf level (page level 0) stores full rows; non‑leaf levels store only primary‑key values and pointers to child pages. The fan‑out of a typical 16KB page is about 1,280 pointers (12 bytes per entry, ~15 KB usable space).
Storing Data Example
CREATE TABLE `user` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT COMMENT '主键',
`name` varchar(100) NOT NULL DEFAULT '' COMMENT '名字',
`age` int(11) NOT NULL DEFAULT '0' COMMENT '年龄',
PRIMARY KEY (`id`),
KEY `idx_age` (`age`)
) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8;When inserting a new row, MySQL loads the relevant page, checks the key range, and writes the record into the appropriate page if space permits.
How MySQL Queries Data
To find a row (e.g., SELECT * FROM user WHERE id = 5), MySQL starts at the root page, follows pointers using binary search, and reaches the leaf page containing the target row. Typically only 1–3 pages are accessed, resulting in 1–3 disk I/Os.
Query Steps Summary
Load root page (often cached).
Binary‑search non‑leaf pages to locate the child page range.
Repeat until leaf page is reached.
Binary‑search within leaf page to find the exact row.
I/O count: tree height = 3 → up to 3 disk reads; height = 4 → 4 reads.
Calculating the 20 Million Row Limit
The formula for maximum rows in a B+‑tree is (fanout ^ (height‑1)) × rows_per_leaf_page. With a fan‑out of ~1,280 and ~15 rows per leaf (1 KB per row), a two‑level tree holds ~2 × 10⁴ rows, while a three‑level tree holds ~2.5 × 10⁵ rows, explaining the “2 kw” (≈20 million) recommendation.
If rows are smaller (≈250 bytes), a leaf can hold ~60 rows, and a three‑level tree can store about 1 billion rows without exceeding the three‑I/O limit.
Fan‑out Calculation
Each non‑leaf entry stores an 8‑byte BIGINT primary key plus a 4‑byte page number (~12 bytes). After subtracting ~1 KB for headers and directory, ~15 KB remain, yielding roughly 1,280 pointers per page.
Row Capacity per Leaf
Assuming 1 KB per row, a leaf page holds about 15 rows; with 250 bytes per row, it holds about 60 rows.
B‑Tree vs B+‑Tree
B‑tree stores rows in both leaf and non‑leaf nodes, drastically reducing fan‑out (≈15 pointers per page) and increasing tree height. To store ~20 million rows, a B‑tree would need ≥6 levels, causing up to 6 disk I/Os per query, whereas a B+‑tree needs only 2–3 levels.
Optimization Suggestions
Keep row size small; avoid large TEXT/BLOB columns.
Consider sharding or partitioning when a table approaches the ten‑million‑row range.
Archive cold data to separate tables.
Extended Questions
Why 16KB Page Size?
16KB balances memory usage, disk I/O, and index depth: larger pages waste memory and increase I/O; smaller pages increase tree height.
How to Index Strings?
MySQL can index VARCHAR columns using B+‑tree with lexicographic ordering. For long strings, use prefix indexes or full‑text indexes. For Chinese strings, either use a pinyin parser plugin or the utf8mb4_unicode_ci collation.
INSTALL PLUGIN pinyin SONAME 'ha_pinyin.so'; CREATE INDEX idx_name_pinyin ON mytable(name) USING BTREE WITH PARSER pinyin; CREATE INDEX idx_name_unicode ON mytable(name) USING BTREE;Is There a Limit on Index Length?
In InnoDB the maximum index length is 767 bytes (prefix indexes exempt). For utf8mb4 each character can take up to 4 bytes, so a VARCHAR(255) may exceed the limit unless a prefix is used.
Sanyou's Java Diary
Passionate about technology, though not great at solving problems; eager to share, never tire of learning!
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
