
Optimizing Large MySQL Tables: Partitioning, Sharding, and Cold Archiving Strategies

This article explains how to evaluate MySQL table size, understand B+‑tree limitations, and improve query performance by using table partitioning, horizontal/vertical sharding, and hot‑cold data archiving, while also comparing the trade‑offs of each approach for practical selection.


When a business table grows to tens of millions or even billions of rows, insert and query latencies increase, schema changes become costly, and often only a recent time range of the data is actually needed.

We can assess table size from three perspectives: table capacity (row count, average row length, growth rate, read/write volume, total size), disk space usage, and instance capacity.

For OLTP tables, it is recommended to keep a single table under 20 million rows and 15 GB in size, with read/write throughput below 1,600 ops/s.

Typical row-count queries such as SELECT COUNT(*) FROM table or SELECT COUNT(1) FROM table may time out on large tables. A faster alternative is to run USE database_name followed by SHOW TABLE STATUS LIKE 'table_name' (append \G for vertical output), which returns a row count together with detailed table metadata; note that for InnoDB the Rows value is an estimate, not an exact count.
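For example, for a hypothetical table named `orders`, the metadata lookup described above looks like this:

```sql
-- Switch to the target schema (name is illustrative)
USE mydb;

-- Returns Rows (an estimate for InnoDB), Avg_row_length,
-- Data_length, Index_length, Update_time, and more.
SHOW TABLE STATUS LIKE 'orders'\G
```

Because this reads table metadata rather than scanning the table, it returns immediately even on billion-row tables.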

Disk‑space queries can be performed with SQL like:

select
  table_schema as 'database',
  table_name as 'table',
  table_rows as 'rows',
  truncate(data_length/1024/1024, 2) as 'data size (MB)',
  truncate(index_length/1024/1024, 2) as 'index size (MB)'
from information_schema.tables
order by data_length desc, index_length desc;

and for a specific schema:

select
  table_schema as 'database',
  table_name as 'table',
  table_rows as 'rows',
  truncate(data_length/1024/1024, 2) as 'data size (MB)',
  truncate(index_length/1024/1024, 2) as 'index size (MB)'
from information_schema.tables
where table_schema='mysql'
order by data_length desc, index_length desc;

It is advisable to keep data-disk usage under 70% and to consider archiving rapidly growing data to slower, cheaper storage.

The root cause of performance degradation in very large tables is the growing height of the B+-tree index: each additional level of the tree adds another disk I/O to every lookup.

In InnoDB, the minimum storage unit is a 16 KB page. A B+-tree leaf page can store roughly 16 rows of 1 KB each, while internal nodes store 8-byte primary keys plus 6-byte page pointers, yielding about 1,170 pointers per node. Consequently, a height-2 B+-tree can hold ~18,720 rows, and a height-3 tree ~21.9 million rows, which explains why MySQL can handle tens of millions of rows with a 3-level index.
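These figures can be reproduced with a quick back-of-envelope calculation, expressed here as a MySQL SELECT (the 1 KB average row size is an assumption):

```sql
-- B+-tree capacity estimate for a 16 KB InnoDB page,
-- an 8-byte primary key, a 6-byte page pointer, and 1 KB rows.
SELECT
  FLOOR(16 * 1024 / (8 + 6))            AS pointers_per_node,  -- 1170
  FLOOR(16 * 1024 / (8 + 6)) * 16       AS rows_at_height_2,   -- 18720
  FLOOR(16 * 1024 / (8 + 6))
    * FLOOR(16 * 1024 / (8 + 6)) * 16   AS rows_at_height_3;   -- 21902400
```

Once the row count pushes the tree to a fourth level, every lookup pays an extra page read, which is where the "tens of millions of rows" rule of thumb comes from.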

To address slow queries on massive tables, three solutions are presented:

Table partitioning – dividing a table into logical partitions based on rules such as value ranges, to reduce scan scope and improve index efficiency.

Database sharding (horizontal and vertical) – splitting data across multiple tables or databases to lower per‑table row count and index height.

Hot‑cold data archiving – moving stale data to separate tables or databases, keeping only recent “hot” data in the primary table.

Partitioning benefits include easier data deletion, parallel aggregation, and better I/O distribution, but it has limits: at most 1,024 partitions per table (raised to 8,192 in MySQL 5.6.7), the partitioning expression must return an integer in MySQL 5.1 (later versions added COLUMNS partitioning for other types), and partitioned tables cannot use foreign keys.
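A minimal sketch of range partitioning, using a hypothetical `orders` table partitioned by year (note that the partition key must be part of every unique key, so the primary key includes `created_at`):

```sql
CREATE TABLE orders (
  id         BIGINT NOT NULL AUTO_INCREMENT,
  user_id    BIGINT NOT NULL,
  created_at DATETIME NOT NULL,
  amount     DECIMAL(10,2),
  PRIMARY KEY (id, created_at)
)
PARTITION BY RANGE (YEAR(created_at)) (
  PARTITION p2022 VALUES LESS THAN (2023),
  PARTITION p2023 VALUES LESS THAN (2024),
  PARTITION pmax  VALUES LESS THAN MAXVALUE
);

-- Dropping stale data becomes a cheap metadata operation
-- instead of a slow, lock-heavy DELETE:
ALTER TABLE orders DROP PARTITION p2022;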

Sharding can be implemented via hash‑modulo or range‑based schemes. Hash sharding distributes rows evenly but makes future scaling harder; range sharding aligns with natural data intervals but may create hotspots.

Combining hash and range sharding can mitigate each method’s drawbacks, providing balanced distribution and scalable expansion.
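As a sketch of how routing works under the two schemes, assume four hypothetical shard tables orders_0 through orders_3 plus per-month range tables; the application computes the target table name before issuing the query:

```sql
-- Hash-modulo routing: shard index = user_id % 4,
-- so user_id 10007 maps to orders_3 (10007 % 4 = 3).
SELECT * FROM orders_3 WHERE user_id = 10007;

-- Range routing: one table per month, chosen from the query's
-- time window (table names here are illustrative).
SELECT * FROM orders_202405
WHERE created_at >= '2024-05-01' AND created_at < '2024-06-01';
```

The combined scheme typically hashes first for even distribution, then ranges within each shard so old intervals can be archived or new ones added without rebalancing.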

Compared to sharding, partitioning keeps a single logical table that is transparent to applications (even though each partition still has its own data file on disk), while sharding creates fully independent tables, each with its own storage files (.MYD/.MYI for MyISAM, .ibd for InnoDB).

Both techniques improve MySQL performance under high concurrency, but sharding focuses on parallelism across tables, whereas partitioning aims to overcome disk I/O limits.

Potential issues with sharding include complex transaction management across databases, inability to perform cross‑database joins, and increased data‑management overhead for operations like global ranking.

Cold archiving follows a two‑step process: create an archive table mirroring the original schema, then migrate stale rows, allowing the primary table to retain only hot data for faster access.
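The two steps can be sketched with hypothetical table names as follows; in production the migration would be batched to limit lock time and binlog volume:

```sql
-- Step 1: create an archive table with the same structure.
CREATE TABLE orders_archive LIKE orders;

-- Step 2: copy rows older than one year, then remove them
-- from the hot table.
INSERT INTO orders_archive
SELECT * FROM orders
WHERE created_at < NOW() - INTERVAL 1 YEAR;

DELETE FROM orders
WHERE created_at < NOW() - INTERVAL 1 YEAR;
```

After the migration, the primary table holds only hot data, so its B+-tree stays shallow and recent-range queries stay fast.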

Finally, the article advises selecting the appropriate strategy based on business scenarios: use partitioning for very large tables with uniform access, sharding when high concurrency and horizontal scaling are needed, and hot‑cold archiving when data exhibits clear temporal access patterns.

Tags: performance optimization, sharding, MySQL, B+ tree, database partitioning, cold archiving
Written by

Java Architect Essentials

Committed to sharing quality articles and tutorials to help Java programmers progress from junior to mid-level to senior architect. We curate high-quality learning resources, interview questions, videos, and projects from across the internet to help you systematically improve your Java architecture skills. Follow and reply '1024' to get Java programming resources. Learn together, grow together.
