Databases 11 min read

Do 20‑Million‑Row MySQL Tables Still Slow Down on SSDs? Experimental Results

The article investigates whether the long‑standing rule of avoiding MySQL tables larger than 20 million rows still holds on modern SSD‑backed instances by creating tables of varying sizes, measuring query latency under different InnoDB buffer pool configurations, and analyzing the impact of caching and index usage.

ITPUB

Dec 10, 2023

Do 20‑Million‑Row MySQL Tables Still Slow Down on SSDs? Experimental Results

Rumor

There is a widely‑circulated claim that a MySQL table should not exceed 20 million rows because performance degrades sharply once the table passes this soft limit. The claim originated when HDDs were common; the article tests whether it still applies to SSD‑based MySQL in 2023.

Environment

Database: MySQL 8.0.25 on AWS db.r5.large (2 vCPU, 16 GiB RAM) with gp2 SSD storage.

Test client: Linux kernel 6.1 on AWS t2.micro (1 vCPU, 1 GiB RAM).

Experiment Design

Create tables with identical schema but different row counts: 100 k, 200 k, 500 k, 1 M, 2 M, 5 M, 10 M, 20 M, 30 M, 50 M, 60 M.

Populate each table with random person_id, person_name, insert_time, update_time values using a test client script (link omitted).

Run four types of queries on each table:

Full‑table count: SELECT COUNT(*) FROM <table> Primary‑key lookup: SELECT COUNT(*) FROM <table> WHERE id = 12345 Indexed column lookup: SELECT COUNT(*) FROM <table> WHERE insert_time = 12345 Full row fetch by non‑indexed column: SELECT * FROM <table> WHERE person_name = 'XYZ' After each query batch, restart MySQL to flush the InnoDB buffer pool and avoid stale cache effects.

Results

Query 1 – Full‑table count

The first execution is slower because the InnoDB buffer pool is empty and must load the entire table from disk (no‑cache round). Subsequent executions are faster once the data is cached (cache round).

Observations:

First run takes noticeably longer than later runs.

The query forces the whole table into the buffer pool, revealing the size of the pool relative to the table.

If the buffer pool cannot hold the entire table, the latency spikes when the table size exceeds the pool capacity.

With an 11 GiB pool, the spike occurs around a 50 MiB table; with a 7 GiB pool, around 30 MiB; with a 3 GiB pool, around 20 MiB. When the table cannot be fully cached, the query must perform expensive disk I/O, causing a sharp increase in execution time.

Query 2 & 3 – Indexed lookups

These queries use a B+‑tree index. Because they are point lookups, only a few index pages (3‑4 I/O operations) are needed to warm the buffer pool. After the first run, latency drops to ~0.5 ms (network RTT) as the pages are served from memory.

If an index page is evicted, the query incurs another 3‑4 disk I/O operations.

Query 4 – Full row fetch

This query requires two index lookups: one on the indexed column to locate the primary key, then a second lookup on the primary‑key index to retrieve the full row (including non‑indexed columns). Consequently, it loads roughly twice as many pages as Queries 2/3 and runs 1.5‑2× slower (6‑10 ms).

How the Rumor Originated

The original rule was based on HDD‑dominated environments where random I/O was expensive. Engineers limited B+‑tree depth (typically 3‑4 levels) to keep random I/O low. With modern SSDs, random I/O is much cheaper, and the theoretical row capacity of a 4‑level B+‑tree (≈24 million rows for 1 KB rows) is no longer a hard limit.

Using utf8mb4 (4 bytes per character) and ~816 bytes per row, a 4‑level tree can hold about 29.5 million rows, close to the old 20 million guideline but not a strict ceiling.

Conclusion

The relationship between InnoDB buffer‑pool size and table size determines whether performance degradation occurs.

A more meaningful metric for deciding to split a table is the ratio of query latency to buffer‑pool hit rate; if queries always hit the pool, size is irrelevant.

Increasing the InnoDB buffer pool or overall DB memory can mitigate the issue without sharding.

Avoid SELECT * in production when possible, as it forces two index lookups and higher I/O.

Given SSD prevalence, the 20 million‑row rule is no longer a hard limit; it was an empirical guideline for HDD‑based systems.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.

performance InnoDB mysql Large Tables SSD Buffer Pool

Written by

ITPUB

Official ITPUB account sharing technical insights, community news, and exciting events.

0 followers

Reader feedback

How this landed with the community

Rate this article

Was this worth your time?

Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.

Rumor

Environment

Experiment Design

Results

Query 1 – Full‑table count

Query 2 & 3 – Indexed lookups

Query 4 – Full row fetch

How the Rumor Originated

Conclusion

ITPUB

How this landed with the community

Was this worth your time?

0 Comments

Query 1 – Full‑table count

Query 2 & 3 – Indexed lookups

Query 4 – Full row fetch