
How to Diagnose and Solve MySQL Table Size Bottlenecks

This article explains how to assess MySQL table size using table capacity, disk usage, and instance limits, reveals why large tables slow down due to B+‑tree overhead, and presents three practical solutions—partitioning, sharding, and hot‑cold archiving—to restore query performance.


Assessing Table Data Volume

We can evaluate a table's data volume from three perspectives: table capacity, disk space, and instance capacity.

Table Capacity

Table capacity is measured by record count, average row length, growth rate, read/write throughput, and total size. For OLTP tables, a common rule of thumb is to keep a single table under 20 million rows and its total size below 15 GB, with read/write traffic under 1,600 operations per second.
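As a quick sanity check, those thresholds can be encoded in a small script. This is a sketch: the function name and the report format are ours, and the limits are the rule-of-thumb values above, not an official MySQL specification.

```python
# Rule-of-thumb OLTP limits from the text above (illustrative, not official).
ROW_LIMIT = 20_000_000
SIZE_LIMIT_GB = 15
OPS_LIMIT = 1_600

def exceeds_limits(rows: int, size_gb: float, ops_per_sec: float) -> list[str]:
    """Return which of the rule-of-thumb limits a table exceeds."""
    breached = []
    if rows > ROW_LIMIT:
        breached.append("rows")
    if size_gb > SIZE_LIMIT_GB:
        breached.append("size")
    if ops_per_sec > OPS_LIMIT:
        breached.append("throughput")
    return breached

print(exceeds_limits(35_000_000, 12.4, 900))  # → ['rows']
```

A table that trips any of these checks is a candidate for one of the three strategies discussed later.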

Typical row‑count queries:

select count(*) from table_name;
select count(1) from table_name;

(Both perform a full count and may time out on very large tables.)

use database_name;
show table status like 'table_name';

Append \G in place of the semicolon for vertical output: show table status like 'table_name'\G. The SHOW TABLE STATUS command returns detailed metadata such as the storage engine, version, row count, and average row length. Note that for InnoDB tables the reported row count is an estimate, not an exact value.

Disk Space

To inspect the disk usage of a specific database or all tables, query the information_schema.tables view:

select
  table_schema as 'Database',
  table_name   as 'Table',
  table_rows   as 'Rows',
  truncate(data_length/1024/1024, 2) as 'Data Size (MB)',
  truncate(index_length/1024/1024, 2) as 'Index Size (MB)'
from information_schema.tables
order by data_length desc, index_length desc;

For a single database (e.g., mysql), add a where table_schema='mysql' clause.

Recommendation: keep data usage below 70 % of the available disk space. For rapidly growing data, consider moving older data to slower, larger disks for archiving.

Instance Capacity

MySQL uses a thread-per-connection model, so under high concurrency a single instance may fail to fully utilize all CPU cores, capping throughput at the MySQL layer. In that case, tune the instance configuration or scale out across multiple instances.

Root Cause of Slow Queries on Large Tables

When a table reaches tens of millions or hundreds of millions of rows, the benefit of additional indexes diminishes because the B+‑tree height grows, increasing the number of disk I/O operations required for each lookup.

In InnoDB, the minimum storage unit is a 16 KB page. Assuming a 1 KB row size, a leaf page holds 16 rows. With an 8-byte BIGINT primary key plus a 6-byte page pointer per entry, a non-leaf page holds roughly 16,384 ÷ 14 ≈ 1,170 pointers.

Thus, a B+-tree of height 2 can store about 1,170 × 16 = 18,720 rows, and a height-3 tree about 1,170 × 1,170 × 16 = 21,902,400 rows, which already covers tens of millions of records. Every additional level adds a disk I/O per lookup and degrades performance.
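The arithmetic above can be checked with a few lines of code. This uses exactly the assumptions from the text (16 KB pages, 1 KB rows, 8-byte keys, 6-byte pointers); real InnoDB pages also carry headers and fill factors, so treat these as upper-bound estimates.

```python
# Estimate how many rows an InnoDB B+-tree of a given height can address,
# under the simplified assumptions from the text.
PAGE_SIZE = 16 * 1024          # bytes per InnoDB page
ROW_SIZE = 1024                # assumed bytes per row
FANOUT = PAGE_SIZE // (8 + 6)  # key + pointer entries per non-leaf page -> 1170
ROWS_PER_LEAF = PAGE_SIZE // ROW_SIZE  # 16 rows per leaf page

def max_rows(height: int) -> int:
    """Rows a tree of the given height can hold (height 1 = a single leaf)."""
    return FANOUT ** (height - 1) * ROWS_PER_LEAF

print(max_rows(2))  # → 18720
print(max_rows(3))  # → 21902400
```

Keeping a table within the height-3 capacity is precisely what the sharding and archiving strategies below aim for.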

Solutions for Large‑Table Performance Degradation

Three main strategies are presented: table partitioning, database/table sharding, and hot‑cold data archiving.

Solution 1: Table Partitioning

Partitioning stores data in separate file segments while preserving a single logical table, which narrows query ranges and can improve index hit rates. Key benefits:

Allows more data than a single disk or file system partition.

Facilitates easy deletion of obsolete partitions.

Enables partition‑pruned queries, reducing scan scope.

Improves parallel execution of aggregate functions (e.g., SELECT col, COUNT(*) FROM table GROUP BY col;).

Distributes I/O across multiple disks.

Limitations:

Maximum of 1,024 partitions per table (raised to 8,192 in MySQL 5.6.7 and later).

Older MySQL versions require integer partition expressions.

All primary‑key or unique‑key columns must be part of the partition key.

Foreign‑key constraints are not supported on partitioned tables.

Partitioning applies to both data and indexes; you cannot partition only one of them.

Check partition support:

show variables like '%partition%';
-- Expected output: have_partitioning | YES
-- Note: this variable was removed in MySQL 5.7; there, look for an ACTIVE
-- "partition" plugin in SHOW PLUGINS. In 8.0, partitioning is built into InnoDB.

Solution 2: Database/Table Sharding

Sharding reduces the amount of data each table holds, thereby lowering B+‑tree height and I/O.

Two common approaches:

Horizontal sharding: split rows across multiple tables or databases based on a rule (e.g., user ID modulo 4).

Vertical sharding: split columns, moving rarely used columns to a separate table while keeping primary-key relationships.

Typical sharding strategies:

Modulo: target_table = user_id % N. Simple, but re-sharding is costly when N changes.

Range: allocate a continuous ID range to each table, easing future expansion.

Combined hash-range: first hash to a database group, then range to a table within that group, balancing hotspot avoidance and scalability.

Solution 3: Hot‑Cold Data Archiving

When data exhibits clear hot and cold patterns (e.g., recent week/month is hot), move cold data to an archive table or separate database to keep the hot table small.

Archiving steps:

Create an archive table with a schema identical to the original.

Initialize the archive with historical data.

Continuously migrate old rows from the hot table to the archive.

Query hot data from the primary table; query cold data from the archive when needed.
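The migration step above can be sketched as a batched loop. This uses SQLite as a stand-in for MySQL; the table names, the `created` cutoff column, and the batch size are illustrative, and production code would additionally need retries and idempotency guarantees.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, created TEXT);
    CREATE TABLE orders_archive (id INTEGER PRIMARY KEY, created TEXT);
    INSERT INTO orders VALUES
        (1, '2023-01-05'), (2, '2023-06-01'), (3, '2024-02-10');
""")

def archive_batch(cutoff: str, batch_size: int = 500) -> int:
    """Move up to batch_size rows older than cutoff; return rows moved."""
    rows = conn.execute(
        "SELECT id, created FROM orders WHERE created < ? LIMIT ?",
        (cutoff, batch_size)).fetchall()
    if rows:
        conn.executemany("INSERT INTO orders_archive VALUES (?, ?)", rows)
        conn.executemany("DELETE FROM orders WHERE id = ?",
                         [(r[0],) for r in rows])
        conn.commit()
    return len(rows)

while archive_batch("2024-01-01"):  # loop until no cold rows remain
    pass

print(conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0])          # → 1
print(conn.execute("SELECT COUNT(*) FROM orders_archive").fetchone()[0])  # → 2
```

Small batches keep lock times and replication lag low on the hot table, which is the point of migrating continuously rather than in one large transaction.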

Choosing the Right Strategy

Select a solution based on workload characteristics:

If queries often filter by a date range, partitioning by that range is straightforward.

If the table is extremely large and write traffic is high, horizontal sharding provides the most reduction in B+‑tree height.

If only a portion of the data is rarely accessed, hot‑cold archiving minimizes impact on the primary workload.

Complex scenarios may combine partitioning with sharding or archiving to balance performance, scalability, and operational simplicity.

Written by

Architect's Guide

Dedicated to sharing programmer-architect skills—Java backend, system, microservice, and distributed architectures—to help you become a senior architect.
