Databases 10 min read

Why MySQL Single Tables Hit Performance Limits and How B‑Link Trees Help

This article examines the technical reasons why MySQL single tables struggle with massive data volumes, debunks the index‑depth myth, explains InnoDB's SMO concurrency bottlenecks, and shows how B‑Link Tree indexes and heap‑organized tables in GaussDB provide superior performance for large‑scale workloads.

MaGe Linux Operations
MaGe Linux Operations
MaGe Linux Operations
Why MySQL Single Tables Hit Performance Limits and How B‑Link Trees Help

Recently I read an article titled "I say MySQL tables should not exceed 20 million rows, the interviewer asked me to wait for a notice," which sparked interest in the limits of MySQL single‑table size.

During an interview, a candidate mentioned storing user operation data in MySQL, generating three tables daily for about 50 million rows each by modulo partitioning.

Common lore in the industry claims that tables over 5 million rows should be sharded, and performance sharply degrades beyond 20 million rows. This article analyzes the fundamental reasons why MySQL single tables cannot be too large.

Hypothesis 1: Is it the index depth?

Many assume that when data exceeds 5 million or 20 million rows, the B+Tree height increases, lengthening the search path and degrading performance. In reality, MySQL uses an index‑organized table where leaf nodes store data and non‑leaf nodes store primary‑key‑to‑page mappings.

With an 8‑byte primary key, a non‑leaf node entry occupies 12 bytes (8 bytes key + 4 bytes page offset). Given MySQL's default 16 KB page (≈15 KB usable), a non‑leaf page can hold about 1 280 entries, and each leaf page stores roughly 15 rows (assuming 1 KB per row). Three levels of B+Tree can therefore hold up to 24.6 million rows before the depth increases to four, so index depth does not grow easily.

The impact of longer search paths was more significant on older mechanical disks (≈100 IOPS) and limited memory, but modern SSDs (tens of thousands of IOPS) and terabyte‑scale RAM mitigate this factor.

Hypothesis 2: Is SMO (Structure‑Modification‑Operation) the concurrency bottleneck?

InnoDB uses B+Tree indexes, and B+Tree modifications are non‑atomic. During an SMO, multiple nodes may be altered, requiring concurrency control.

InnoDB employs optimistic and pessimistic locking. Simple leaf‑node updates can use shared (S) locks on root and non‑leaf nodes, and exclusive (X) locks on the leaf if no structural change occurs.

If an SMO is needed (e.g., node split), InnoDB escalates to a pessimistic lock: a global SX lock on the root and X locks on affected nodes, forcing other SMOs to wait. This limits write concurrency for large tables with frequent SMOs.

Solution: B‑Link Tree

B‑Link Tree improves upon B+Tree by adding a link pointer to the right sibling and a high‑key field in each node, allowing lock granularity to be limited to the current level during SMO. This enables concurrent writes without locking the root.

In practice, when a node splits, only the parent node needs to be locked for the final pointer update, avoiding global root locks and greatly improving write concurrency.

GaussDB adopts the B‑Link Tree index structure.

InnoDB Index‑Organized Tables Trigger SMO More Frequently

With a 16 KB page and 1 KB rows, an index‑organized leaf node holds only about 16 rows. Low fan‑out increases SMO frequency, amplifying the concurrency limitation.

Heap‑organized tables, as used by GaussDB, store row pointers in leaf nodes, allowing more rows per leaf and reducing SMO occurrences under the same workload.

Performance Comparison

Two 8U32G servers were set up with MySQL (B+Tree + index‑organized tables) and GaussDB (B‑Link Tree + heap‑organized tables). A 10 GB base table with 10 million rows (1 KB each) was populated, then 10 million additional rows were inserted concurrently.

Result: As concurrency increased, GaussDB’s TPS scaled steadily, while MySQL’s TPS showed little improvement.

Conclusion: MySQL’s inability to handle massive concurrent modifications stems from its index concurrency control design and the use of index‑organized tables, making it suitable mainly for simple primary‑key‑lookup workloads. In contrast, B‑Link Tree and heap‑organized tables in GaussDB deliver superior performance for complex, high‑concurrency scenarios.

Link: https://www.cnblogs.com/huaweiyun/p/17420032.html

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Sign in to view source
Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactadmin@besthub.devand we will review it promptly.

InnoDBMySQLDatabase PerformanceB-TreeGaussDBSMOB-Link Tree
MaGe Linux Operations
Written by

MaGe Linux Operations

Founded in 2009, MaGe Education is a top Chinese high‑end IT training brand. Its graduates earn 12K+ RMB salaries, and the school has trained tens of thousands of students. It offers high‑pay courses in Linux cloud operations, Python full‑stack, automation, data analysis, AI, and Go high‑concurrency architecture. Thanks to quality courses and a solid reputation, it has talent partnerships with numerous internet firms.

0 followers
Reader feedback

How this landed with the community

Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.