Databases 16 min read

Handling Large MySQL Tables: Partitioning, Sharding, and Archiving Strategies

This article explains how to assess MySQL table size, identify performance issues caused by massive data volumes, and presents three practical solutions—table partitioning, database sharding, and hot‑cold data archiving—along with detailed SQL examples and considerations for implementation.

Top Architect
Top Architect
Top Architect
Handling Large MySQL Tables: Partitioning, Sharding, and Archiving Strategies

Scenario

When a business table grows to tens of millions of rows, insert and query latency increase, schema changes become costly, and only a subset of data (e.g., recent time ranges) is needed.

Evaluating Table Size

Table Capacity

For OLTP tables, keep rows under 20 million and total size under 15 GB; read/write throughput should stay below 1 600 ops/s.

Typical row‑count query:

select count(*) from table;

When the table is huge, use SHOW TABLE STATUS LIKE 'table_name'\G to view detailed metadata.

Disk Space

Query database and table storage usage:

select table_schema as 'database',
       table_name as 'table',
       table_rows as 'rows',
       truncate(data_length/1024/1024,2) as 'data_MB',
       truncate(index_length/1024/1024,2) as 'index_MB'
from information_schema.tables
order by data_length desc, index_length desc;

Keep disk usage below 70 % of capacity; consider moving fast‑growing data to slower disks for archiving.

Instance Capacity

MySQL’s thread‑per‑connection model can become a bottleneck under high concurrency; scaling instances or using sharding may be required.

Root Causes of Slow Queries

Large tables (>10 M rows) increase B+‑tree height, leading to more disk I/O per lookup. InnoDB pages are 16 KB; a leaf node storing 1 KB rows holds ~16 rows, while internal nodes store pointers (≈14 B) and can hold ~1 170 pointers. A height‑2 B+‑tree can store ~18 720 rows; height‑3 can store ~21 902 400 rows, illustrating why millions of rows degrade performance.

Solutions

1. Table Partitioning

Partitioning splits a single logical table into multiple physical files based on a range or list expression, reducing the data scanned per query and improving index selectivity. Benefits include easier data deletion, parallel aggregation, and higher throughput across disks. Limitations: max 1 024 partitions, integer‑only expressions in MySQL 5.1, foreign keys not supported, and partition key must include all primary/unique columns.

Check partition support:

show variables like '%partition%';

2. Database Sharding (Horizontal/Vertical)

Horizontal sharding distributes rows across multiple tables or databases, lowering per‑table row count and B+‑tree height. Example: split 40 M rows into four tables of 10 M each.

Common sharding strategies:

Modulo : id % 4 determines target table (e.g., id = 17 → table 2).

Range : allocate contiguous id ranges to different tables.

Combined Modulo + Range : first hash to a database group, then range‑partition within the group.

Sharding drawbacks: transaction management across databases, cross‑shard joins, and increased application complexity.

3. Hot‑Cold Data Archiving

Separate frequently accessed “hot” data (e.g., recent week/month) from older “cold” data into archive tables or databases, reducing the active table size and I/O.

Archiving steps:

Create an archive table mirroring the original schema.

Initialize archive data (e.g., copy historical rows).

Continuously move new cold data into the archive.

Query hot data from the primary table; retrieve cold data from the archive when needed.

Choosing the Right Approach

Use partitioning when you need range‑based pruning and can keep a single logical table. Opt for sharding when the overall data volume exceeds what a single instance can handle or when you need to scale write throughput. Combine both when you have massive data with hot‑cold access patterns. Consider archiving for clear temporal separation of data.

Conclusion

The article provides a practical guide to diagnosing large‑table performance issues in MySQL and offers three concrete mitigation techniques, each with advantages, constraints, and implementation tips.

Performance OptimizationShardingMySQLdata archivingdatabase partitioning
Top Architect
Written by

Top Architect

Top Architect focuses on sharing practical architecture knowledge, covering enterprise, system, website, large‑scale distributed, and high‑availability architectures, plus architecture adjustments using internet technologies. We welcome idea‑driven, sharing‑oriented architects to exchange and learn together.

0 followers
Reader feedback

How this landed with the community

login Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.