How to Diagnose and Solve MySQL Table Size Bottlenecks
This article explains how to assess MySQL table size using table capacity, disk usage, and instance limits, reveals why large tables slow down due to B+‑tree overhead, and presents three practical solutions—partitioning, sharding, and hot‑cold archiving—to restore query performance.
Assessing Table Data Volume
We can evaluate a table's data volume from three perspectives: table capacity, disk space, and instance capacity.
Table Capacity
Table capacity is measured by record count, average row length, growth rate, read/write throughput, and total size. For OLTP tables, it is recommended to keep a single table under 20 million rows and its total size below 15 GB, with read/write traffic under 1 600 operations per second.
Typical row‑count queries:
select count(*) from table_name;
select count(1) from table_name;
(Both may time out on very large tables.)
use database_name;
show table status like 'table_name';
or, for vertical output:
show table status like 'table_name'\G
(note that \G replaces the semicolon terminator). The show table status command returns detailed metadata such as the storage engine, version, row count, and average row length.
Disk Space
To inspect the disk usage of a specific database or all tables, query the information_schema.tables view:
select
table_schema as 'Database',
table_name as 'Table',
table_rows as 'Rows',
truncate(data_length/1024/1024, 2) as 'Data Size (MB)',
truncate(index_length/1024/1024, 2) as 'Index Size (MB)'
from information_schema.tables
order by data_length desc, index_length desc;
For a single database (e.g., mysql), add a where table_schema='mysql' clause.
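Applying that clause, a per-database variant looks like the following (the schema name mysql is just an example):

```sql
-- Disk usage for all tables in one schema.
select
  table_name as 'Table',
  table_rows as 'Rows',
  truncate(data_length/1024/1024, 2) as 'Data Size (MB)',
  truncate(index_length/1024/1024, 2) as 'Index Size (MB)'
from information_schema.tables
where table_schema = 'mysql'
order by data_length desc, index_length desc;
```

Note that for InnoDB tables, table_rows is only an estimate; use select count(*) when an exact figure is required.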
Recommendation: keep data usage below 70 % of the available disk space. For rapidly growing data, consider moving older data to slower, larger disks for archiving.
Instance Capacity
MySQL uses a thread‑per‑connection model, so in high‑concurrency scenarios a single instance may not fully utilize CPU cores, causing throughput to be limited at the MySQL layer. Adjust the instance configuration or consider scaling out with multiple instances.
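Before deciding to scale out, it helps to check how close an instance is to its connection ceiling. The standard variables and status counters below apply; what counts as "too close" is workload-dependent:

```sql
-- Configured connection limit and observed thread usage.
show variables like 'max_connections';
show global status like 'Threads_connected';
show global status like 'Max_used_connections';
```

If Max_used_connections regularly approaches max_connections, the instance is a bottleneck candidate.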
Root Cause of Slow Queries on Large Tables
When a table reaches tens of millions or hundreds of millions of rows, the benefit of additional indexes diminishes because the B+‑tree height grows, increasing the number of disk I/O operations required for each lookup.
In InnoDB, the minimum storage unit is a 16 KB page. Assuming a 1 KB row size, a leaf page can hold 16 rows. With an 8‑byte BIGINT primary key and a 6‑byte pointer, a non‑leaf page can hold roughly 1 170 pointers.
Thus, a B+‑tree of height 2 can store about 1 170 × 16 ≈ 18 720 rows; a height 3 tree can store about 1 170 × 1 170 × 16 ≈ 21 902 400 rows, which already covers tens of millions of records. Higher tree heights increase I/O and degrade performance.
Solutions for Large‑Table Performance Degradation
Three main strategies are presented: table partitioning, database/table sharding, and hot‑cold data archiving.
Solution 1: Table Partitioning
Partitioning narrows query ranges and can improve index hit rates. It stores data in separate file segments while keeping a single logical table.
Allows a table to hold more data than fits on a single disk or file-system partition.
Facilitates easy deletion of obsolete partitions.
Enables partition‑pruned queries, reducing scan scope.
Improves parallel execution of aggregate functions (e.g., SELECT col, COUNT(*) FROM table GROUP BY col;).
Distributes I/O across multiple disks.
Limitations:
Maximum of 1 024 partitions per table in MySQL 5.5 and earlier (raised to 8 192 in 5.6.7 and later).
Before MySQL 5.5 introduced COLUMNS partitioning, the partition expression had to evaluate to an integer.
All primary‑key or unique‑key columns must be part of the partition key.
Foreign‑key constraints are not supported on partitioned tables.
Partitioning applies to both data and indexes; you cannot partition only one of them.
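To make these rules concrete, here is an illustrative sketch (table and column names are hypothetical) of a RANGE-partitioned table:

```sql
-- The partitioning column (created_at) must appear in the primary key.
create table access_log (
  id bigint not null auto_increment,
  created_at date not null,
  detail varchar(255),
  primary key (id, created_at)
)
partition by range (year(created_at)) (
  partition p2022 values less than (2023),
  partition p2023 values less than (2024),
  partition pmax  values less than maxvalue
);

-- Dropping an obsolete partition discards its rows without a full-table delete.
alter table access_log drop partition p2022;
```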
Check partition support:
show variables like '%partition%';
-- Expected output: have_partitioning | YES
(The have_partitioning variable was removed in MySQL 5.7, where partitioning support is built in.)
Solution 2: Database/Table Sharding
Sharding reduces the amount of data each table holds, thereby lowering B+‑tree height and I/O.
Two common approaches:
Horizontal sharding: split rows across multiple tables or databases based on a rule (e.g., user ID modulo 4).
Vertical sharding: split columns, moving rarely used columns to a separate table while keeping primary-key relationships.
Typical sharding strategies:
Modulo: target_table = user_id % N. Simple, but re-sharding is costly when N changes.
Range: allocate a continuous ID range to each table, easing future expansion.
Combined hash-range: first hash to a database group, then range to a table within that group, balancing hotspot avoidance and scalability.
Solution 3: Hot‑Cold Data Archiving
When data exhibits clear hot and cold patterns (e.g., recent week/month is hot), move cold data to an archive table or separate database to keep the hot table small.
Archiving steps:
Create an archive table with a schema identical to the original.
Initialize the archive with historical data.
Continuously migrate old rows from the hot table to the archive.
Query hot data from the primary table; query cold data from the archive when needed.
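The steps above can be sketched in SQL (table and column names are hypothetical; the batch size and cold-data cutoff should match your workload):

```sql
-- 1. Archive table with an identical schema.
create table orders_archive like orders;

-- 2/3. Move cold rows in small batches; repeat until no rows match.
start transaction;
insert into orders_archive
  select * from orders
  where created_at < '2024-01-01'
  order by id
  limit 1000;
delete from orders
  where created_at < '2024-01-01'
  order by id
  limit 1000;
commit;
```

Using the same ORDER BY and LIMIT in both statements inside one transaction keeps the copied and deleted batches aligned; for production use, a purpose-built tool such as Percona's pt-archiver is often safer.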
Choosing the Right Strategy
Select a solution based on workload characteristics:
If queries often filter by a date range, partitioning by that range is straightforward.
If the table is extremely large and write traffic is high, horizontal sharding provides the most reduction in B+‑tree height.
If only a portion of the data is rarely accessed, hot‑cold archiving minimizes impact on the primary workload.
Complex scenarios may combine partitioning with sharding or archiving to balance performance, scalability, and operational simplicity.
Architect's Guide
Dedicated to sharing programmer-architect skills—Java backend, system, microservice, and distributed architectures—to help you become a senior architect.