Evaluating MySQL Table Size and Optimizing Large Tables with Partitioning, Sharding, and Hot‑Cold Archiving
This article explains how to assess MySQL table size from table capacity, disk usage, and instance limits, describes why large tables degrade performance due to B+‑tree depth, and presents three practical solutions—partitioning, sharding, and hot‑cold archiving—to improve query speed and manage growth.
Evaluating Table Size
When a business table grows, you may encounter slow inserts and queries, heavy impact from adding new columns, and the need to query only recent data.
Table Capacity
Table capacity is evaluated by row count, average row length, growth rate, read/write volume, and total size. For OLTP tables, it is recommended not to exceed 20 million rows (≈15 GB) and to keep read/write traffic under 1,600 ops/s.
Typical row‑count queries:
SELECT COUNT(*) FROM table_name;  -- exact count; may time out on very large tables
SELECT COUNT(1) FROM table_name;
USE database_name;
SHOW TABLE STATUS LIKE 'table_name'\G
The SHOW TABLE STATUS command returns detailed metadata such as engine, version, row count, and average row size.
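When an exact COUNT(*) is too slow, an approximate row count can also be read from the optimizer statistics in information_schema (the schema and table names below are placeholders):

```sql
-- Approximate row count from InnoDB statistics; it can deviate
-- noticeably from the exact count (refresh with ANALYZE TABLE).
SELECT table_rows
FROM information_schema.tables
WHERE table_schema = 'database_name'
  AND table_name   = 'table_name';
```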
Disk Space
Query the sizes of all tables across every database:
SELECT table_schema AS 'Database',
table_name AS 'Table',
table_rows AS 'Rows',
TRUNCATE(data_length/1024/1024, 2) AS 'Data Size (MB)',
TRUNCATE(index_length/1024/1024, 2) AS 'Index Size (MB)'
FROM information_schema.tables
ORDER BY data_length DESC, index_length DESC;

Query the size of all tables in a single database (e.g., mysql):
SELECT table_schema AS 'Database',
table_name AS 'Table',
table_rows AS 'Rows',
TRUNCATE(data_length/1024/1024, 2) AS 'Data Size (MB)',
TRUNCATE(index_length/1024/1024, 2) AS 'Index Size (MB)'
FROM information_schema.tables
WHERE table_schema='mysql'
ORDER BY data_length DESC, index_length DESC;

It is advisable to keep disk usage below 70 % of the available space; fast‑growing data can be archived to slower storage.
Instance Capacity
MySQL uses a thread‑per‑connection model, so in high‑concurrency scenarios a single instance may become a CPU bottleneck. Consider scaling out with multiple instances.
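As a rough health check under this model, the current connection and thread load can be compared against the configured ceiling (a sketch; sensible thresholds depend on your hardware):

```sql
-- How many client threads exist versus the configured limit.
SHOW VARIABLES LIKE 'max_connections';
SHOW GLOBAL STATUS LIKE 'Threads_connected';
SHOW GLOBAL STATUS LIKE 'Threads_running';
```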
Root Cause of Slow Queries
When a table reaches tens of millions or billions of rows, the benefit of indexes diminishes because the B+‑tree height increases, leading to more disk I/O per lookup.
The InnoDB page size is 16 KB . Assuming a row size of 1 KB, a leaf page holds 16 rows. With a bigint primary key (8 bytes) and a 6‑byte pointer, a non‑leaf page holds about 1,170 pointers.
Thus, a B+‑tree of height 2 can store roughly 1,170 × 16 ≈ 18,720 rows; height 3 can store about 1,170 × 1,170 × 16 ≈ 21.9 million rows, which covers most tables in the tens‑of‑millions range. Taller trees require more I/O per lookup and degrade performance.
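These capacity figures can be reproduced with simple arithmetic from the assumptions above (16 KB page, 8‑byte bigint key plus 6‑byte page pointer, 16 rows per leaf page):

```sql
SELECT
  FLOOR(16 * 1024 / (8 + 6))               AS pointers_per_internal_page,  -- 1170
  FLOOR(16 * 1024 / (8 + 6)) * 16          AS rows_at_height_2,            -- 18720
  POW(FLOOR(16 * 1024 / (8 + 6)), 2) * 16  AS rows_at_height_3;            -- ~21.9 million
```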
How to Solve Large‑Table Performance Issues
Three common approaches are presented: table partitioning, database/table sharding, and hot‑cold data archiving.
Solution 1: Table Partitioning
Partitioning limits query ranges to specific partitions, improving index hit rates and reducing I/O. Benefits include easier data deletion, parallel aggregation, and higher throughput across multiple disks.
Limitations:
Maximum of 8,192 partitions per table (1,024 in versions before MySQL 5.6.7).
Before MySQL 5.5, the partition expression had to evaluate to an integer; RANGE COLUMNS and LIST COLUMNS partitioning later lifted this restriction.
All primary‑key or unique‑key columns must be part of the partition key.
Foreign keys are not supported on partitioned tables.
Partitioning applies to the whole table and its indexes.
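A minimal RANGE-partitioning sketch consistent with these rules (the table and column names are illustrative; note that the partition column created_year must be part of the primary key):

```sql
CREATE TABLE orders (
  id           BIGINT NOT NULL,
  created_year INT    NOT NULL,
  amount       DECIMAL(10,2),
  PRIMARY KEY (id, created_year)   -- partition column must appear in every unique key
)
PARTITION BY RANGE (created_year) (
  PARTITION p2022 VALUES LESS THAN (2023),
  PARTITION p2023 VALUES LESS THAN (2024),
  PARTITION pmax  VALUES LESS THAN MAXVALUE
);

-- Deleting old data becomes a metadata operation instead of a slow DELETE:
ALTER TABLE orders DROP PARTITION p2022;
```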
Check partition support:
mysql> SHOW VARIABLES LIKE '%partition%';
+-------------------+-------+
| Variable_name     | Value |
+-------------------+-------+
| have_partitioning | YES   |
+-------------------+-------+
1 row in set (0.00 sec)

Note that have_partitioning was removed in MySQL 5.6; on newer versions, check partition support with SHOW PLUGINS instead.

Solution 2: Database/Table Sharding
Sharding reduces the size of each individual table, lowering B+‑tree height and I/O. Two types exist:
Horizontal sharding: split rows across multiple tables or databases based on a rule (e.g., user_id % 4).
Vertical sharding: split columns, keeping frequently accessed columns in one table and rarely used columns in another.
Common sharding strategies:
Modulo: distribute rows by id % N. Simple, but re‑sharding is costly when N changes.
Range: allocate a range of IDs to each table. Easy to expand, but may create hotspots.
Hash + Range: combine hash modulo for distribution and range for locality, mitigating both the hotspot and re‑sharding issues.
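As an illustration of the modulo strategy, four identical shard tables can be created and rows routed by user_id % 4 (the names are hypothetical; in practice the routing lives in the application or in sharding middleware such as ShardingSphere):

```sql
-- Four physical shards with identical schemas.
CREATE TABLE user_0 (user_id BIGINT PRIMARY KEY, name VARCHAR(64));
CREATE TABLE user_1 LIKE user_0;
CREATE TABLE user_2 LIKE user_0;
CREATE TABLE user_3 LIKE user_0;

-- Routing rule: a row for user_id = 10 goes to shard 10 % 4 = 2.
INSERT INTO user_2 (user_id, name) VALUES (10, 'alice');
```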
Solution 3: Hot‑Cold Archiving
Separate recent (hot) data from older (cold) data. Archive cold data to another database or table, keeping hot tables small for fast queries.
Create an archive table with the same schema as the original.
Initialize the archive with historical data.
Subsequent incremental data is written to the hot table, while periodic jobs move aged rows to the archive.
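The periodic archive job from the steps above can be sketched as a pair of statements (the table names and 90‑day cutoff are assumptions; in production, run the DELETE in batches to limit lock time):

```sql
-- Move rows older than 90 days from the hot table to the archive.
INSERT INTO orders_archive
SELECT * FROM orders
WHERE  created_at < NOW() - INTERVAL 90 DAY;

DELETE FROM orders
WHERE  created_at < NOW() - INTERVAL 90 DAY
LIMIT  10000;   -- delete in batches; repeat until no rows remain
```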
Choosing the Right Approach
Consider your workload, data growth pattern, and operational constraints. Partitioning is transparent to the application, sharding offers finer‑grained scaling at the cost of complexity, and hot‑cold archiving is ideal when data naturally separates by recency.
Final Note
If this article helped you, please like, view, share, or bookmark it. Your support encourages the author to keep producing quality content.
For deeper learning, the author offers paid knowledge‑sharing groups covering Spring, MyBatis, micro‑services, and large‑scale sharding practices.
Code Ape Tech Column
Former Ant Group P8 engineer and pure technologist, sharing full‑stack Java, interview, and career advice through this column. Site: java-family.cn