Evaluating Large MySQL Table Size and Optimizing Query Performance with Partitioning, Sharding, and Archiving
This article explains how to assess MySQL table size through table capacity, disk usage, and instance resources, illustrates the impact of B+‑tree index depth on query speed, and presents three practical solutions—table partitioning, database sharding, and hot‑cold data archiving—to improve performance.
When a business table grows, common symptoms include long insert/query times, difficulty adding new columns, and the need to filter only recent data.
We can evaluate the data volume from three perspectives: table capacity, disk space, and instance capacity.
Table capacity considers record count, average row length, growth rate, read/write load, and total size. For OLTP tables, it is recommended not to exceed 20 million rows or 15 GB, with read/write load under 1600 ops/s.
Typical SQL to get row count:
select count(*) from table;
select count(1) from table; -- may timeout on very large tablesTo view detailed table metadata:
use database_name;
show table status like 'table_name';
show table status like 'table_name'\G;Disk space can be inspected with queries on information_schema.tables :
select
table_schema as '数据库',
table_name as '表名',
table_rows as '记录数',
truncate(data_length/1024/1024, 2) as '数据容量(MB)',
truncate(index_length/1024/1024, 2) as '索引容量(MB)'
from information_schema.tables
order by data_length desc, index_length desc;Adding a WHERE clause limits the result to a specific database:
select
table_schema as '数据库',
table_name as '表名',
table_rows as '记录数',
truncate(data_length/1024/1024, 2) as '数据容量(MB)',
truncate(index_length/1024/1024, 2) as '索引容量(MB)'
from information_schema.tables
where table_schema='mysql'
order by data_length desc, index_length desc;Instance capacity notes that MySQL uses a thread‑based model, so a single instance may become a bottleneck under high concurrency.
The root cause of slowdown for very large tables is the increased height of the B+‑tree index. InnoDB pages are 16 KB; a leaf node that stores 1 KB rows can hold 16 rows, while internal nodes store 14‑byte pointers, allowing roughly 1 170 pointers per node. A height‑2 tree can store about 18 720 rows, a height‑3 tree about 21 902 400 rows, which explains why millions of rows still perform well, but higher trees increase disk I/O.
To address the performance degradation, three main strategies are presented:
Solution 1: Table Partitioning
Partitioning limits query ranges and can improve index hit rates. Benefits include easier data deletion, parallel aggregation, and higher throughput. Limitations: maximum 1024 partitions, partition key must include primary/unique keys, no foreign keys, and partitions apply to both data and indexes.
Check partition support:
show variables like '%partition%';
+-------------------+-------+
| Variable_name | Value |
+-------------------+-------+
| have_partitioning | YES |
+-------------------+-------+
1 row in set (0.00 sec)Solution 2: Database Sharding
Sharding reduces table size and B+‑tree height. Two types: horizontal (splitting rows across tables/databases) and vertical (splitting columns). Common schemes:
Modulo: distribute rows by id % N .
Range: assign rows to tables based on value intervals.
Combined hash‑modulo and range for balanced load and future scalability.
Solution 3: Hot‑Cold Data Archiving
Separate frequently accessed “hot” data from older “cold” data, moving the latter to archive tables that mirror the original schema. The process includes creating archive tables, initializing data, handling incremental business data, and retrieving archived data when needed.
Choosing the right approach depends on workload characteristics: high‑traffic tables with massive data benefit from partitioning; tables that can be cleanly split by key are good candidates for sharding; and workloads with clear temporal access patterns suit hot‑cold archiving.
Architect's Guide
Dedicated to sharing programmer-architect skills—Java backend, system, microservice, and distributed architectures—to help you become a senior architect.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.