Managing Large MySQL Tables: Evaluation, Root Causes, and Optimization Strategies
This article explains how to assess MySQL table size, why massive tables degrade performance due to B+‑tree depth, and presents three practical solutions—table partitioning, database sharding, and hot‑cold data archiving—along with guidance on selecting the appropriate approach.
When a MySQL table grows to tens of millions or even billions of rows, insert and query latencies increase, schema changes become costly, and often only a subset of the data (e.g., recent time windows) is actually needed.
Evaluating Table Size – Consider table capacity, disk usage, and instance capacity. For OLTP tables, a common rule of thumb is to keep rows under 20 million and total size under 15 GB, with read/write throughput below 1,600 ops/s. Row count and table status can be checked with:

```sql
select count(*) from table_name;
show table status like 'table_name'\G
```

Disk usage can be inspected via information_schema.tables:
```sql
select table_schema as 'Database',
       table_name as 'Table',
       table_rows as 'Rows',
       truncate(data_length/1024/1024, 2) as 'Data_MB',
       truncate(index_length/1024/1024, 2) as 'Index_MB'
from information_schema.tables
order by data_length desc, index_length desc;
```

The recommendation is to keep disk usage below 70% and to consider archiving fast-growing data to slower storage.
Root Cause of Slow Queries – As a table grows, the height of its B+‑tree index increases, and each additional level costs one more page read per lookup. InnoDB pages are 16 KB by default: a leaf page holds about 16 rows of 1 KB each, while an internal page holds index entries of roughly 14 B each (an 8‑byte bigint key plus a 6‑byte page pointer), i.e., about 1,170 entries per page. A height‑2 B+‑tree can therefore hold about 1,170 × 16 ≈ 18 k rows, and a height‑3 tree about 1,170² × 16 ≈ 22 million rows, which is why tables in the tens of millions of rows sit at the limit of what a height‑3 tree can serve.
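The capacity arithmetic above can be sketched as follows (assuming 16 KB pages, 1 KB average rows, and 14 B per internal index entry, as in the estimate):

```python
PAGE_SIZE = 16 * 1024   # InnoDB default page size in bytes
ROW_SIZE = 1024         # assumed average row size
ENTRY_SIZE = 14         # 8-byte bigint key + 6-byte page pointer

rows_per_leaf = PAGE_SIZE // ROW_SIZE            # 16 rows per leaf page
entries_per_internal = PAGE_SIZE // ENTRY_SIZE   # 1170 entries per internal page

def btree_capacity(height: int) -> int:
    """Approximate maximum rows held by a B+-tree of the given height."""
    return entries_per_internal ** (height - 1) * rows_per_leaf

print(btree_capacity(2))  # 18720 rows (~18 k)
print(btree_capacity(3))  # 21902400 rows (~22 million)
```

Once a table outgrows the height‑3 capacity, the tree needs a fourth level, adding another disk read to every index lookup.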
Optimization Solutions
Solution 1: Table Partitioning
Partitioning splits a logical table into multiple physical files based on a range, list, or hash of a partition key, reducing the scan range and improving index hit rate. Benefits include easy bulk deletion (dropping a whole partition), better parallel aggregation, and higher I/O throughput. Limitations: a table supports at most 8,192 partitions (1,024 before MySQL 5.6.7), every primary and unique key must include the partition key columns, foreign keys are not supported, and partitioning applies to both data and indexes.
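As a sketch, a range-partitioned log table might look like this (table and column names are illustrative; note that the primary key includes the partition column, as MySQL requires):

```sql
-- Hypothetical example: monthly range partitions on created_at.
-- Every primary/unique key must include the partition column.
create table access_log (
    id bigint not null auto_increment,
    created_at datetime not null,
    detail varchar(255),
    primary key (id, created_at)
) engine = innodb
partition by range columns (created_at) (
    partition p202401 values less than ('2024-02-01'),
    partition p202402 values less than ('2024-03-01'),
    partition pmax    values less than (maxvalue)
);
```

Expiring a month of data then becomes `alter table access_log drop partition p202401;`, which is far cheaper than a bulk DELETE.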
Check partition support:

```sql
show variables like '%partition%';
```

(In MySQL 8.0 this variable was removed; partitioning is built into InnoDB there, and support can be confirmed via SHOW PLUGINS.)

Solution 2: Database Sharding (Horizontal/Vertical)
Sharding reduces per‑table row count, lowering B+‑tree height and I/O. Horizontal sharding distributes rows across multiple tables or databases (e.g., modulo 4). Vertical sharding separates rarely used columns into another table. Common strategies:
Modulo: user_id % 4 determines target table.
Range: rows within a numeric range go to a specific table.
Combined hash‑plus‑range for balanced load and future scalability.
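The modulo and range strategies above can be sketched in application code (the table prefix `order_tab`, the shard count of 4, and the range boundaries are assumptions for illustration):

```python
SHARD_COUNT = 4  # assumed number of physical tables

def route_table(user_id: int, base_name: str = "order_tab") -> str:
    """Modulo sharding: map user_id to a physical table by user_id % SHARD_COUNT."""
    return f"{base_name}_{user_id % SHARD_COUNT}"

def route_range(order_id: int,
                ranges=((0, "order_tab_0"), (10_000_000, "order_tab_1"))) -> str:
    """Range sharding: each entry is (lower_bound, table); pick the last bound <= id."""
    table = ranges[0][1]
    for lower, name in ranges:
        if order_id >= lower:
            table = name
    return table

print(route_table(10))          # order_tab_2
print(route_range(20_000_000))  # order_tab_1
```

The combined hash-plus-range approach layers the two: ranges decide the database, and the hash decides the table within it, so new ranges can be added without rehashing existing data.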
Solution 3: Hot‑Cold Data Archiving
Identify hot data (e.g., last week or month) and keep it in the primary table, while moving older (cold) data to archive tables or separate databases, thus keeping the active table small and fast.
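A minimal archiving sketch in SQL, assuming an `orders` table with a `created_at` column and an identically structured `orders_archive` table (all names illustrative); moving rows in small batches avoids long locks and oversized transactions:

```sql
-- Copy a batch of cold rows (older than 3 months) into the archive table.
insert into orders_archive
select * from orders
where created_at < date_sub(now(), interval 3 month)
limit 1000;

-- Delete the same batch from the hot table; repeat until no cold rows remain.
delete from orders
where created_at < date_sub(now(), interval 3 month)
limit 1000;
```

In production the copy and delete would typically run in a transaction or be driven by a tool such as Percona's pt-archiver, which handles batching and throttling.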
Choosing a Strategy – Use partitioning for simple range queries, sharding when write concurrency is high, and archiving when data naturally separates by time. Often a combination yields the best results.
Code Ape Tech Column
Former Ant Group P8 engineer and pure technologist, sharing full-stack Java content, interview preparation, and career advice. Site: java-family.cn