Do 20‑Million‑Row MySQL Tables Still Slow Down on SSDs? Experimental Results
The article investigates whether the long‑standing rule of avoiding MySQL tables larger than 20 million rows still holds on modern SSD‑backed instances by creating tables of varying sizes, measuring query latency under different InnoDB buffer pool configurations, and analyzing the impact of caching and index usage.
Rumor
There is a widely‑circulated claim that a MySQL table should not exceed 20 million rows because performance degrades sharply once the table passes this soft limit. The claim originated when HDDs were common; the article tests whether it still applies to SSD‑based MySQL in 2023.
Environment
Database: MySQL 8.0.25 on AWS db.r5.large (2 vCPU, 16 GiB RAM) with gp2 SSD storage.
Test client: Linux kernel 6.1 on AWS t2.micro (1 vCPU, 1 GiB RAM).
Experiment Design
Create tables with identical schema but different row counts: 100 k, 200 k, 500 k, 1 M, 2 M, 5 M, 10 M, 20 M, 30 M, 50 M, 60 M.
Populate each table with random person_id, person_name, insert_time, update_time values using a test client script (link omitted).
Run four types of queries on each table:
Full‑table count: SELECT COUNT(*) FROM <table> Primary‑key lookup: SELECT COUNT(*) FROM <table> WHERE id = 12345 Indexed column lookup: SELECT COUNT(*) FROM <table> WHERE insert_time = 12345 Full row fetch by non‑indexed column: SELECT * FROM <table> WHERE person_name = 'XYZ' After each query batch, restart MySQL to flush the InnoDB buffer pool and avoid stale cache effects.
Results
Query 1 – Full‑table count
The first execution is slower because the InnoDB buffer pool is empty and must load the entire table from disk (no‑cache round). Subsequent executions are faster once the data is cached (cache round).
Observations:
First run takes noticeably longer than later runs.
The query forces the whole table into the buffer pool, revealing the size of the pool relative to the table.
If the buffer pool cannot hold the entire table, the latency spikes when the table size exceeds the pool capacity.
With an 11 GiB pool, the spike occurs around a 50 MiB table; with a 7 GiB pool, around 30 MiB; with a 3 GiB pool, around 20 MiB. When the table cannot be fully cached, the query must perform expensive disk I/O, causing a sharp increase in execution time.
Query 2 & 3 – Indexed lookups
These queries use a B+‑tree index. Because they are point lookups, only a few index pages (3‑4 I/O operations) are needed to warm the buffer pool. After the first run, latency drops to ~0.5 ms (network RTT) as the pages are served from memory.
If an index page is evicted, the query incurs another 3‑4 disk I/O operations.
Query 4 – Full row fetch
This query requires two index lookups: one on the indexed column to locate the primary key, then a second lookup on the primary‑key index to retrieve the full row (including non‑indexed columns). Consequently, it loads roughly twice as many pages as Queries 2/3 and runs 1.5‑2× slower (6‑10 ms).
How the Rumor Originated
The original rule was based on HDD‑dominated environments where random I/O was expensive. Engineers limited B+‑tree depth (typically 3‑4 levels) to keep random I/O low. With modern SSDs, random I/O is much cheaper, and the theoretical row capacity of a 4‑level B+‑tree (≈24 million rows for 1 KB rows) is no longer a hard limit.
Using utf8mb4 (4 bytes per character) and ~816 bytes per row, a 4‑level tree can hold about 29.5 million rows, close to the old 20 million guideline but not a strict ceiling.
Conclusion
The relationship between InnoDB buffer‑pool size and table size determines whether performance degradation occurs.
A more meaningful metric for deciding to split a table is the ratio of query latency to buffer‑pool hit rate; if queries always hit the pool, size is irrelevant.
Increasing the InnoDB buffer pool or overall DB memory can mitigate the issue without sharding.
Avoid SELECT * in production when possible, as it forces two index lookups and higher I/O.
Given SSD prevalence, the 20 million‑row rule is no longer a hard limit; it was an empirical guideline for HDD‑based systems.
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
ITPUB
Official ITPUB account sharing technical insights, community news, and exciting events.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
