
Why MySQL Tables Shouldn’t Exceed 10 Million Rows – A Deep Dive into InnoDB Pages & B+Tree Limits

This article explains why the industry advises keeping MySQL single‑table row counts below ten million by examining InnoDB’s 16KB page structure, B+‑tree indexing mechanics, fan‑out calculations, and how page size and row size together determine the practical limits and performance cliffs of large tables.

dbaplus Community

Introduction

This article examines, from a technical perspective, the common rule of thumb that a MySQL table should hold no more than ten million rows, focusing on InnoDB's storage architecture and the characteristics of B+-tree indexes.

How MySQL Stores Data

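InnoDB stores data in fixed-size pages, 16KB by default (the size is set by innodb_page_size when the instance is initialized). Each page holds records (rows or index entries) plus metadata. With innodb_file_per_table enabled, the default since MySQL 5.6, a table's data and indexes live in its own .ibd tablespace file. Every page has a file header (which includes the page number, a checksum, and pointers to the previous and next pages), a page header, the records themselves, a page directory that enables binary search, and a file trailer used to detect incomplete or corrupted writes.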

Example table definition:

CREATE TABLE `user` (
  `id` int unsigned NOT NULL AUTO_INCREMENT COMMENT 'primary key',
  `name` varchar(100) NOT NULL DEFAULT '' COMMENT 'name',
  `age` int NOT NULL DEFAULT 0 COMMENT 'age',
  PRIMARY KEY (`id`),
  KEY `idx_age` (`age`)
) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8mb4;

1. Data Page

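The page is InnoDB's basic unit of storage and I/O: rows are stored as records inside pages. Pages on the same B+-tree level are doubly linked through the previous/next page pointers in their headers, and the checksums in the header and trailer let InnoDB detect a corrupted page instead of silently using it.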
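To make the per-page overhead concrete, here is a minimal Python sketch that subtracts InnoDB's fixed page structures from a 16KB page. The byte sizes (38-byte file header, 56-byte page header, 26 bytes for the infimum and supremum system records, 8-byte file trailer, 2 bytes per page-directory slot) follow the commonly documented COMPACT/DYNAMIC index-page layout; the function name usable_bytes is hypothetical, and the model ignores record headers and any space InnoDB deliberately leaves free.

```python
# Rough model of an InnoDB index page (COMPACT/DYNAMIC layout); not an exact reimplementation.
PAGE_SIZE = 16 * 1024     # default innodb_page_size
FIL_HEADER = 38           # checksum, page number, prev/next page pointers, LSN, page type, ...
PAGE_HEADER = 56          # directory slot count, record count, free-space pointers, ...
INFIMUM_SUPREMUM = 26     # two system records that bound the user records
FIL_TRAILER = 8           # old-style checksum + low 32 bits of the LSN
DIR_SLOT = 2              # each page-directory slot is a 2-byte record offset


def usable_bytes(page_size: int = PAGE_SIZE, dir_slots: int = 2) -> int:
    """Bytes left for user records once fixed structures and `dir_slots` slots are reserved."""
    fixed = FIL_HEADER + PAGE_HEADER + INFIMUM_SUPREMUM + FIL_TRAILER
    return page_size - fixed - dir_slots * DIR_SLOT


if __name__ == "__main__":
    # An empty page already has two directory slots (one for infimum, one for supremum).
    print(usable_bytes())  # 16252
```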

2. B+‑Tree Index

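Leaf pages of the clustered index store full rows; non-leaf pages store, for each child page, the smallest primary-key value it contains and the child's page number. Within a page, a binary search over the page directory narrows the search to a small group of at most eight records, which is then scanned linearly. Repeating this at each level turns an O(n) scan into an O(log n) lookup.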
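The in-page part of that search can be sketched in Python. This is a simplified model, assuming records are kept in a sorted list and each hypothetical directory slot records the largest key of a fixed-size group; most real InnoDB slots own four to eight records, and the group is walked through next-record pointers rather than list indices.

```python
from bisect import bisect_left
from typing import List, Optional, Tuple


def build_directory(keys: List[int], group_size: int = 4) -> Tuple[List[int], List[int]]:
    """Split sorted `keys` into groups; return (largest key per group, last index per group)."""
    ends = [min(i + group_size, len(keys)) - 1 for i in range(0, len(keys), group_size)]
    return [keys[e] for e in ends], ends


def page_search(keys: List[int], slot_keys: List[int], slot_ends: List[int],
                target: int) -> Optional[int]:
    """Return the index of `target` in `keys`, or None if it is not on this page."""
    # 1) Binary search over the directory: first group whose largest key is >= target.
    g = bisect_left(slot_keys, target)
    if g == len(slot_keys):
        return None  # target is larger than every key on the page
    # 2) Linear scan inside that small group only.
    start = slot_ends[g - 1] + 1 if g > 0 else 0
    for pos in range(start, slot_ends[g] + 1):
        if keys[pos] == target:
            return pos
    return None


if __name__ == "__main__":
    keys = list(range(1, 40, 2))  # 20 sorted odd keys: 1, 3, ..., 39
    slot_keys, slot_ends = build_directory(keys)
    print(page_search(keys, slot_keys, slot_ends, 17))  # 8
    print(page_search(keys, slot_keys, slot_ends, 18))  # None
```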

Clustered index: rows are organized in a B+‑tree; leaf nodes contain the complete row data.

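Secondary index: leaf nodes store the indexed column values plus the primary-key value, so retrieving any other column requires a back-lookup into the clustered index (unless the index alone covers the query).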

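Row format: in the COMPACT row format, each clustered-index record consists of a header (variable-length field lengths, a NULL bitmap, and 5 bytes of record metadata), the primary key, a 6-byte transaction ID, a 7-byte rollback pointer, and the remaining column values. DYNAMIC, the default since MySQL 5.7, uses the same layout but can store long variable-length values entirely off-page.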
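The practical difference between the two index types is the extra lookup. The Python sketch below uses hypothetical in-memory stand-ins: a dict keyed by id for the clustered index and a sorted list of (age, id) pairs for idx_age from the example user table. A query that needs other columns goes back to the clustered index once per match, while one that only needs id is answered from the secondary index alone.

```python
from bisect import bisect_left
from typing import Dict, List, NamedTuple, Tuple


class User(NamedTuple):
    id: int
    name: str
    age: int


# Clustered index (hypothetical stand-in): primary key -> complete row.
clustered: Dict[int, User] = {
    u.id: u
    for u in (User(1, "alice", 30), User(2, "bob", 25), User(3, "carol", 30), User(4, "dave", 41))
}

# Secondary index idx_age (hypothetical stand-in): sorted (age, primary key) pairs.
idx_age: List[Tuple[int, int]] = sorted((u.age, u.id) for u in clustered.values())


def _ids_for_age(age: int) -> List[int]:
    """Range-scan idx_age; ids start at 1, so (age, 0) sorts before every entry for `age`."""
    i = bisect_left(idx_age, (age, 0))
    ids = []
    while i < len(idx_age) and idx_age[i][0] == age:
        ids.append(idx_age[i][1])
        i += 1
    return ids


def select_ids_by_age(age: int) -> List[int]:
    """SELECT id FROM user WHERE age = ?: answered from idx_age alone (covering index)."""
    return _ids_for_age(age)


def select_all_by_age(age: int) -> Tuple[List[User], int]:
    """SELECT * FROM user WHERE age = ?: one clustered-index back-lookup per match."""
    ids = _ids_for_age(age)
    return [clustered[pk] for pk in ids], len(ids)


if __name__ == "__main__":
    rows, back_lookups = select_all_by_age(30)
    print([r.name for r in rows], back_lookups)  # ['alice', 'carol'] 2
```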

3. Inserting Data Example

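When inserting a new row with id=11 into a table that already holds ids 1-10, InnoDB descends the clustered index to the correct leaf page, checks its free space, and writes the record:

Load the root page (page 1 in this example) into memory and binary-search it for the child page whose key range covers id=11.

Follow that pointer to leaf page 5, which holds the largest existing ids, and check whether it has enough free space for the new record.

If it does, write the new record into page 5 after id=10; if not, InnoDB allocates a new leaf page (a page split) and links it into the leaf-level list.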
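The insert path can be modelled in a few lines of Python. This is a simplified sketch, not InnoDB's algorithm: leaf pages are plain sorted lists, the hypothetical PAGE_CAPACITY is tiny so splits are easy to see, a list of each page's first key stands in for the root page, and the "start a new page for ascending inserts at the right edge" branch only loosely mimics how InnoDB avoids half-empty pages under sequential inserts.

```python
from bisect import bisect_right, insort
from typing import List

PAGE_CAPACITY = 3  # hypothetical records per leaf page, kept tiny for illustration


def insert(pages: List[List[int]], key: int) -> int:
    """Insert `key` into a non-empty list of sorted leaf pages; return the receiving page index."""
    # Root-page routing: the last leaf whose first key is <= key (leaf 0 if key is smallest).
    first_keys = [page[0] for page in pages]
    idx = max(bisect_right(first_keys, key) - 1, 0)
    page = pages[idx]
    if len(page) < PAGE_CAPACITY:          # enough free space: write into this page
        insort(page, key)
        return idx
    if idx == len(pages) - 1 and key > page[-1]:
        pages.append([key])                # ascending insert at the right edge: new page
        return idx + 1
    insort(page, key)                      # otherwise split the overfull page in half
    mid = len(page) // 2
    pages.insert(idx + 1, page[mid:])
    del page[mid:]
    return idx if key < pages[idx + 1][0] else idx + 1


if __name__ == "__main__":
    leaves = [[1, 2], [3, 4], [5, 6], [7, 8], [9, 10]]  # ids 1-10 spread over five leaves
    print(insert(leaves, 11), leaves[-1])  # 4 [9, 10, 11]: the last leaf had free space
    print(insert(leaves, 12), leaves[-1])  # 5 [12]: last leaf full, a new page is started
```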

Query Process

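Example query: SELECT * FROM user WHERE id = 5. MySQL starts at the root page and, at each level, binary-searches the page to pick the child pointer to follow, until it reaches the leaf page that contains id=5; it then returns that row. The number of pages touched equals the tree height, typically 2-3, but upper-level pages are usually already in the buffer pool, so the number of actual disk reads is often lower.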

Query Steps

Load the root page (often cached in the buffer pool).

Binary‑search non‑leaf pages to locate the child page that may contain the target id.

Repeat the process until a leaf page is reached.

Binary‑search the leaf page for the exact row and return the full record.

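If the B+-tree height is 3, a primary-key lookup touches exactly three pages, so at most three disk reads when nothing is cached. Each additional level adds one more page access per lookup, and one more disk read whenever that page is not in the buffer pool, which directly increases latency.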
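The query steps and the I/O count can be simulated together. The following Python sketch bulk-loads sorted keys into a toy B+-tree (the Page class, fan-out and rows-per-leaf values are hypothetical) and counts pages touched versus disk reads, treating a plain set of page numbers as the buffer pool. It illustrates the traversal only; real InnoDB caching, eviction and page layout are far more involved.

```python
from bisect import bisect_right
from dataclasses import dataclass, field
from itertools import count
from typing import List, Set, Tuple


@dataclass
class Page:
    page_no: int
    keys: List[int]                      # leaf: row keys; non-leaf: smallest key of each child
    children: List["Page"] = field(default_factory=list)  # empty for leaf pages


def build_tree(keys: List[int], rows_per_leaf: int, fan_out: int) -> Page:
    """Bulk-load sorted `keys` bottom-up and return the root page."""
    page_no = count(1)
    level = [Page(next(page_no), keys[i:i + rows_per_leaf])
             for i in range(0, len(keys), rows_per_leaf)]
    while len(level) > 1:                # add non-leaf levels until a single root remains
        level = [Page(next(page_no), [c.keys[0] for c in level[i:i + fan_out]],
                      level[i:i + fan_out])
                 for i in range(0, len(level), fan_out)]
    return level[0]


def lookup(root: Page, key: int, buffer_pool: Set[int]) -> Tuple[bool, int, int]:
    """Return (found, pages_touched, disk_reads); pages already in buffer_pool cost no read."""
    page, touched, reads = root, 0, 0
    while True:
        touched += 1
        if page.page_no not in buffer_pool:   # cache miss: this page comes from disk
            reads += 1
            buffer_pool.add(page.page_no)
        i = bisect_right(page.keys, key) - 1   # binary search within the page
        if not page.children:                  # leaf page: does the row exist?
            return i >= 0 and page.keys[i] == key, touched, reads
        page = page.children[max(i, 0)]        # descend into the covering child


if __name__ == "__main__":
    root = build_tree(list(range(1, 1001)), rows_per_leaf=10, fan_out=10)  # height 3
    pool: Set[int] = set()
    print(lookup(root, 500, pool))  # (True, 3, 3): cold cache, every level is a disk read
    print(lookup(root, 495, pool))  # (True, 3, 0): same path, now fully cached
```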

Why the 20 Million Row “Limit”?

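The figure comes from two quantities: the fan-out of non-leaf pages and the number of rows a leaf page can hold. Assume about 15KB of each 16KB page is usable (InnoDB tries to leave roughly 1/16 of a clustered-index page free for future growth). Each non-leaf entry takes about 12 bytes (an 8-byte BIGINT primary key plus a 4-byte child page number; per-record headers are ignored, so the real fan-out is somewhat lower), giving a fan-out of about 15 360 / 12 ≈ 1 280. With 1KB rows, a leaf page holds about 15 rows. The capacity of a tree of height h is then:

rows ≈ fan_out^(h - 1) × rows_per_leaf

For a two-level tree (h = 2): 1 280 × 15 ≈ 2 × 10⁴ rows. For a three-level tree (h = 3): 1 280² × 15 ≈ 2.5 × 10⁷ rows, which is where the "20 million" figure comes from. If rows are smaller (e.g., 250 bytes), a leaf holds about 60 rows, raising the three-level capacity to about 1 × 10⁸ rows. The ten-million rule of thumb therefore leaves headroom below the three-level capacity for 1KB rows, since real pages are rarely packed this tightly.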
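The arithmetic can be reproduced with a short Python sketch. It assumes about 15/16 of a 16KB page is usable and 12-byte non-leaf entries, which yields the text's 1 280 entries per non-leaf page and 15 rows per 1KB-row leaf; the helper names are illustrative only.

```python
PAGE_SIZE = 16 * 1024
USABLE = PAGE_SIZE * 15 // 16    # ~15 KB: assumes ~1/16 of each page is kept free
ENTRY_BYTES = 12                 # 8-byte BIGINT key + 4-byte child page number


def fan_out(usable: int = USABLE, entry_bytes: int = ENTRY_BYTES) -> int:
    """Child pointers that fit in one non-leaf page (record headers ignored)."""
    return usable // entry_bytes


def rows_per_leaf(row_bytes: int, usable: int = USABLE) -> int:
    """Full rows that fit in one leaf page of the clustered index."""
    return usable // row_bytes


def capacity(height: int, fan: int, per_leaf: int) -> int:
    """rows ~= fan ** (height - 1) * per_leaf for a perfectly packed tree."""
    return fan ** (height - 1) * per_leaf


if __name__ == "__main__":
    f = fan_out()                                  # 1280
    for row_bytes in (1024, 250):
        r = rows_per_leaf(row_bytes)               # 15 for 1 KB rows, 61 for 250-byte rows
        print(row_bytes, r, [capacity(h, f, r) for h in (2, 3)])
```

With 250-byte rows the integer division gives 61 rows per leaf, which the text rounds to about 60.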

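Once the row count exceeds what the current height can hold (for example, when the tree grows from 3 to 4 levels), every primary-key lookup must traverse one more page. Whenever that extra page is not already in the buffer pool, it costs an additional disk read, which is the source of the often-described performance cliff.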
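Inverting the capacity formula shows where the height steps up. This hedged Python sketch uses the text's illustrative numbers (fan-out 1 280, 15 rows per 1KB leaf) and returns the smallest height whose capacity covers a given row count. It assumes a perfectly packed tree; real pages are usually less full, so the actual thresholds arrive at lower row counts.

```python
def tree_height(n_rows: int, fan: int = 1280, per_leaf: int = 15) -> int:
    """Smallest height h such that fan ** (h - 1) * per_leaf >= n_rows (perfectly packed tree)."""
    height = 1
    while fan ** (height - 1) * per_leaf < n_rows:
        height += 1
    return height


if __name__ == "__main__":
    for n in (10_000, 1_000_000, 10_000_000, 30_000_000):
        print(f"{n:>12,} rows -> height {tree_height(n)}")
    # With 250-byte rows (61 rows per leaf), 30 million rows still fit in three levels.
    print(tree_height(30_000_000, per_leaf=61))
```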

Design Choices Behind 16KB Page Size

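A 16KB page strikes a balance among memory consumption, disk I/O, and index depth. Larger pages increase fan-out and reduce tree height, but each read transfers more data and the buffer pool caches fewer distinct pages, which wastes memory and bandwidth on small point lookups. Smaller pages make each read cheaper but lower fan-out, so trees grow taller. 16KB is also a multiple of the common 4KB operating-system page size. The page size (innodb_page_size supports 4KB, 8KB, 16KB, 32KB, and 64KB) can be chosen only when a MySQL instance is initialized.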
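The trade-off can be made concrete by rerunning the same capacity arithmetic for each page size InnoDB supports. This Python sketch assumes, as a simplification, that 15/16 of each page is usable, non-leaf entries take 12 bytes, rows are 1KB, and the tree is perfectly packed. It reports the height needed for 10 million rows and the bytes read by one fully uncached point lookup, and it ignores other effects of page size such as row-size limits.

```python
def height_for(n_rows: int, fan: int, per_leaf: int) -> int:
    """Smallest height whose perfectly packed capacity holds n_rows."""
    height = 1
    while fan ** (height - 1) * per_leaf < n_rows:
        height += 1
    return height


def page_size_tradeoff(page_size: int, n_rows: int = 10_000_000,
                       entry_bytes: int = 12, row_bytes: int = 1024) -> tuple:
    """Return (fan-out, rows per leaf, tree height, bytes read by a cold point lookup)."""
    usable = page_size * 15 // 16
    fan, per_leaf = usable // entry_bytes, usable // row_bytes
    height = height_for(n_rows, fan, per_leaf)
    return fan, per_leaf, height, height * page_size


if __name__ == "__main__":
    for kb in (4, 8, 16, 32, 64):  # the innodb_page_size values MySQL accepts
        print(f"{kb:>2}KB page:", page_size_tradeoff(kb * 1024))
```

Under these assumptions, 16KB pages hold 10 million rows in three levels and read 48KB on a cold lookup, while 4KB pages need four levels but read only 16KB, and 64KB pages stay at three levels while reading 192KB.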

Indexing Strings

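MySQL indexes VARCHAR columns with ordinary B+-tree indexes; keys are ordered by the column's collation. For long strings, a prefix index keeps index entries short (and fan-out high); full-text indexes serve word searches rather than ordering. Chinese text can be sorted by pinyin with a pinyin-based collation such as utf8mb4_zh_0900_as_cs (MySQL 8.0 and later) or by a general Unicode collation such as utf8mb4_unicode_ci. Parser plugins (WITH PARSER) apply only to FULLTEXT indexes, so they cannot change the order of a B+-tree index.

-- Assumes mytable.name is a utf8mb4 VARCHAR(100) NOT NULL column (MySQL 8.0+).
-- Pinyin order for Chinese: give the column the utf8mb4_zh_0900_as_cs collation, then index it.
ALTER TABLE mytable MODIFY name VARCHAR(100) CHARACTER SET utf8mb4 COLLATE utf8mb4_zh_0900_as_cs NOT NULL;
CREATE INDEX idx_name ON mytable(name);
SELECT * FROM mytable ORDER BY name;  -- idx_name entries are stored in pinyin order
-- Sorting by a different collation works, but the index cannot supply that order.
SELECT * FROM mytable ORDER BY name COLLATE utf8mb4_unicode_ci;
-- Prefix index for long strings: only the first 20 characters are indexed.
CREATE INDEX idx_name_prefix ON mytable(name(20));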
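A common way to pick a prefix length is to compare how many distinct values survive truncation, i.e. SELECT COUNT(DISTINCT LEFT(name, n)) / COUNT(*) FROM mytable for increasing n. The Python sketch below computes the same ratio over an in-memory list of sample values; the function names and the 0.95 default threshold are illustrative assumptions, not MySQL settings. It slices by character because MySQL prefix lengths for nonbinary string columns are counted in characters.

```python
from typing import Sequence


def prefix_selectivity(values: Sequence[str], n: int) -> float:
    """Distinct length-n prefixes / row count (like COUNT(DISTINCT LEFT(col, n)) / COUNT(*))."""
    if not values:
        return 0.0
    return len({v[:n] for v in values}) / len(values)


def shortest_prefix(values: Sequence[str], threshold: float = 0.95) -> int:
    """Smallest n whose selectivity reaches `threshold` times the full-column selectivity."""
    longest = max((len(v) for v in values), default=0)
    target = threshold * prefix_selectivity(values, longest)
    for n in range(1, longest + 1):
        if prefix_selectivity(values, n) >= target:
            return n
    return longest


if __name__ == "__main__":
    names = ["alexander", "alexandra", "alexis", "bob", "bobby"]
    print(shortest_prefix(names, threshold=1.0))  # 8: "alexander" and "alexandra" differ at char 8
```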

Practical Recommendations

Keep row size moderate; avoid storing large TEXT/BLOB columns directly in the main table.

Plan sharding or partitioning before a table approaches ten‑million rows.

Archive cold historical data to separate tables.
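For capacity planning, the ten-million guideline translates directly into a shard count. The Python sketch below is a hypothetical illustration, not a MySQL feature: it assumes rows spread evenly across shards, rounds the shard count up to a power of two (so each shard can later be split in two), and routes rows to made-up tables user_00, user_01, ... by id modulo the shard count.

```python
TARGET_ROWS_PER_SHARD = 10_000_000  # rule-of-thumb ceiling discussed in this article


def shards_needed(expected_rows: int, target: int = TARGET_ROWS_PER_SHARD) -> int:
    """Smallest power of two that keeps every shard at or below `target` rows (even spread)."""
    needed = max(1, (expected_rows + target - 1) // target)  # integer ceiling division
    return 1 << (needed - 1).bit_length()


def shard_table(user_id: int, n_shards: int) -> str:
    """Route a row to a hypothetical physical table by id modulo the shard count."""
    return f"user_{user_id % n_shards:02d}"


if __name__ == "__main__":
    n = shards_needed(25_000_000)   # 3 shards would suffice; rounded up to 4
    print(n, shard_table(11, n))    # 4 user_03
```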

Conclusion

MySQL's 16KB page size and high fan-out B+-trees keep lookups fast up to a few tens of millions of rows, depending on row size. Beyond that, the tree gains a level, adding one more page access (and potentially one more disk read) to every lookup. Careful schema design, compact rows, and early sharding plans keep a table within its optimal performance envelope.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.
