
How to Tackle Large MySQL Tables: Partitioning, Sharding, and Archiving Strategies

When MySQL tables grow to millions of rows, insert and query latency rise, schema changes become costly, and rarely used historical data bloats storage. This guide shows how to evaluate table size, explains why B+‑tree depth hurts performance, and presents partitioning, sharding, and hot/cold archiving solutions with practical SQL examples.

Java High-Performance Architecture

Nov 2, 2023


As business data accumulates, large MySQL tables lead to slow inserts, long query times, and cumbersome schema changes, even though most queries only need recent records.

Evaluating Table Size

Table size can be assessed from three angles: table capacity, disk usage, and instance capacity.

Table Capacity

For OLTP tables, a common rule of thumb is to keep a single table under 20 million rows and under 15 GB in total size, with read/write load below 1,600 operations per second.
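To make these rules of thumb concrete, here is a minimal Python sketch that flags a table exceeding any of the three limits. The TableStats structure and the way you gather its numbers (for example from the queries below plus your monitoring system) are assumptions for illustration; the limits are guidelines, not hard MySQL limits.

```python
from dataclasses import dataclass

# Rule-of-thumb limits from the guideline above (not enforced by MySQL).
MAX_ROWS = 20_000_000
MAX_SIZE_GB = 15
MAX_OPS_PER_SEC = 1_600


@dataclass
class TableStats:
    """Hypothetical container for the measurements of one table."""
    name: str
    rows: int           # e.g. COUNT(*) or the table_rows estimate
    size_gb: float      # data + index size
    ops_per_sec: float  # observed read/write rate


def over_limits(stats: TableStats) -> list[str]:
    """Return which guidelines the table exceeds (empty list if none)."""
    problems = []
    if stats.rows > MAX_ROWS:
        problems.append("rows")
    if stats.size_gb > MAX_SIZE_GB:
        problems.append("size")
    if stats.ops_per_sec > MAX_OPS_PER_SEC:
        problems.append("load")
    return problems
```

For example, a hypothetical orders table with 35 million rows, 12.5 GB, and 900 ops/s would be flagged only for its row count.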

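select count(*) from table_name;

select count(1) from table_name;

On InnoDB, both statements scan a full index, so they can take a long time on very large tables; for an approximate count, use the Rows value from show table status or table_rows from information_schema.tables.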

Additional commands to view table metadata:

use database_name;

show table status like 'table_name'\G

Disk Space

Query information_schema.tables to get each table's row count and its data and index sizes (for InnoDB tables, table_rows is only an estimate):

select table_schema as 'Database',
       table_name as 'Table',
       table_rows as 'Rows',
       truncate(data_length/1024/1024,2) as 'Data_MB',
       truncate(index_length/1024/1024,2) as 'Index_MB'
from information_schema.tables
order by data_length desc, index_length desc;

Keep disk usage below 70% of capacity. For fast-growing data, consider archiving older rows to cheaper, slower storage.
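A quick way to check the 70% guideline from a script is the standard library's shutil.disk_usage. This is a minimal sketch; the path you pass is an assumption and should be the filesystem holding the MySQL data directory (select @@datadir shows it; /var/lib/mysql is only a common default).

```python
import shutil

DISK_USAGE_LIMIT = 0.70  # keep usage below 70% of capacity


def disk_usage_ratio(path: str) -> float:
    """Fraction of the filesystem containing `path` that is in use."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    return usage.used / usage.total


def needs_archiving(path: str, limit: float = DISK_USAGE_LIMIT) -> bool:
    """True once usage reaches the limit, signalling time to archive cold data."""
    return disk_usage_ratio(path) >= limit
```

For example, needs_archiving("/var/lib/mysql") could run from a scheduled job that alerts the team before the disk fills up.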

Instance Capacity

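MySQL serves each connection with its own thread, so under high concurrency a single instance may become a bottleneck and fail to make full use of the server's CPUs; upgrading the instance or spreading the load across multiple instances can improve CPU utilization.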

Root Causes of Slow Queries

When a table reaches tens of millions of rows, its B+‑tree index gains another level, adding disk I/O to every lookup. InnoDB pages are 16 KB by default. Assuming 1 KB rows, a leaf page holds about 16 rows. Internal (non-leaf) pages store an 8‑byte key (a BIGINT primary key) plus a 6‑byte pointer per entry, so each holds roughly 16,384 / 14 ≈ 1,170 pointers. A B+‑tree of height 2 can therefore store about 1,170 × 16 = 18,720 rows, and one of height 3 about 1,170 × 1,170 × 16 = 21,902,400 rows. Beyond roughly 20 million rows the tree needs a fourth level, which is why very large tables degrade performance.
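The arithmetic above can be reproduced with a few lines of Python. This sketch uses the same assumptions as the text (16 KB pages, 8-byte keys, 6-byte pointers, 1 KB rows) and ignores page headers and fill factor, so real capacities are somewhat lower.

```python
PAGE_SIZE = 16 * 1024  # InnoDB default page size in bytes
KEY_BYTES = 8          # BIGINT primary key
POINTER_BYTES = 6      # pointer to a child page
ROW_BYTES = 1024       # assumed average row size


def fanout() -> int:
    """Child pointers that fit in one internal page."""
    return PAGE_SIZE // (KEY_BYTES + POINTER_BYTES)


def rows_per_leaf() -> int:
    """Rows that fit in one leaf page."""
    return PAGE_SIZE // ROW_BYTES


def max_rows(height: int) -> int:
    """Approximate row capacity of a clustered index with `height` levels."""
    return fanout() ** (height - 1) * rows_per_leaf()


for h in (2, 3):
    print(h, max_rows(h))  # prints "2 18720" then "3 21902400"
```

Each extra level multiplies capacity by about 1,170, but also costs one more page read on every primary-key lookup that misses the buffer pool.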

How to Solve Large‑Table Performance Issues

Option 1: Table Partitioning

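Partitioning lets queries that filter on the partitioning key read only the relevant partitions (partition pruning), which narrows the amount of data scanned. Benefits include cheaper deletion of old data (drop a whole partition instead of deleting rows), aggregates such as SUM() and COUNT() that can be computed per partition, and the option of spreading partitions across disks for higher I/O throughput. Limitations: a table can have at most 1,024 partitions before MySQL 5.6.7 and 8,192 since then; every primary key and unique key must include all columns used in the partitioning expression; partitioned InnoDB tables cannot have foreign keys; and partitioning applies to both data and indexes, so you cannot partition only one of them.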
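To illustrate how RANGE partitioning routes rows and how pruning restricts a range query, here is a small Python model. It is not MySQL code: the yearly boundaries and partition names p0 to p3 are hypothetical, mirroring a table defined with VALUES LESS THAN (2021), (2022), (2023), and MAXVALUE.

```python
from bisect import bisect_right

# Upper bounds of the RANGE partitions (VALUES LESS THAN ...);
# the last partition acts as VALUES LESS THAN MAXVALUE.
BOUNDS = [2021, 2022, 2023]
NAMES = ["p0", "p1", "p2", "p3"]


def partition_for(value: int) -> str:
    """Partition that stores a row with this partitioning-key value."""
    return NAMES[bisect_right(BOUNDS, value)]


def partitions_for_range(low: int, high: int) -> list[str]:
    """Partitions a query WHERE key BETWEEN low AND high must read."""
    first = bisect_right(BOUNDS, low)
    last = bisect_right(BOUNDS, high)
    return NAMES[first:last + 1]
```

For example, a query on 2021 through 2022 touches only p1 and p2, while a query without a condition on the partitioning key has to read all four partitions.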

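Check partition support:

show variables like '%partition%';

On older versions this shows the have_partitioning variable. On newer versions the variable no longer exists; run show plugins and look for the partition entry instead (MySQL 8.0 implements partitioning natively in InnoDB).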

Option 2: Database/Table Sharding

Sharding reduces the number of rows per table, which keeps the B+‑tree shallow. Horizontal sharding splits rows across multiple tables or databases (e.g., by ID modulo or by ID range). Vertical sharding moves rarely used columns into a separate table that shares the same primary key.
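Vertical sharding can be pictured as splitting one logical row into two rows that share a primary key. In this minimal Python sketch, the column names and the choice of which columns are "hot" are hypothetical.

```python
# Hypothetical split of a user row: frequently read columns stay in the
# main table; rarely used ones move to an extension table with the same key.
HOT_COLUMNS = {"id", "name", "status"}


def split_row(row: dict) -> tuple[dict, dict]:
    """Split one logical row into (main-table row, extension-table row)."""
    main = {k: v for k, v in row.items() if k in HOT_COLUMNS}
    ext = {k: v for k, v in row.items() if k not in HOT_COLUMNS}
    ext["id"] = row["id"]  # both halves keep the primary key for joining
    return main, ext


def merge_rows(main: dict, ext: dict) -> dict:
    """Rebuild the full row, as a join on the primary key would."""
    if main["id"] != ext["id"]:
        raise ValueError("rows belong to different primary keys")
    return {**ext, **main}
```

The main table stays narrow, so more rows fit in each 16 KB page, and the extension table is read only when the rarely used columns are actually needed.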

Typical horizontal sharding using modulo:

Assign rows to N tables by id % N. After sharding, remove auto_increment from the sharded tables, because each table's own counter would produce duplicate IDs; instead, generate IDs from a separate sequence or from Redis.
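A minimal sketch of modulo routing in Python, assuming four hypothetical tables named order_0 to order_3. The itertools.count generator is only an in-process stand-in for the central ID source (a sequence table or a Redis INCR counter) that keeps IDs unique across all shards.

```python
import itertools

N_SHARDS = 4  # hypothetical number of physical tables


def table_for(id_: int, n: int = N_SHARDS) -> str:
    """Route a row to order_0 .. order_{n-1} by id % n (names hypothetical)."""
    return f"order_{id_ % n}"


# Stand-in for a central ID generator; the shard tables themselves no
# longer use AUTO_INCREMENT, so every ID must come from one place.
_next_id = itertools.count(1)


def new_id() -> int:
    return next(_next_id)
```

Modulo routing spreads consecutive IDs evenly, but changing N later (say from 4 to 5) maps most existing IDs to a different table, so expansion requires migrating data.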

Range‑based sharding stores specific ID ranges in dedicated tables. It makes expansion easy (add a table for the next range), but new writes concentrate on the newest table, creating a hotspot.
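Range routing is a simple integer division. This sketch assumes a hypothetical width of 10 million IDs per table and the same hypothetical order_N naming.

```python
ROWS_PER_TABLE = 10_000_000  # hypothetical range width


def table_for_range(id_: int) -> str:
    """IDs 1..10M -> order_0, 10M+1..20M -> order_1, and so on."""
    if id_ < 1:
        raise ValueError("ids start at 1")
    return f"order_{(id_ - 1) // ROWS_PER_TABLE}"
```

Because IDs grow monotonically, every new insert lands in the highest-numbered table, which is exactly the hotspot described above; older tables receive mostly reads.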

Combining the two, using ranges to pick a group of tables and modulo to pick a table within that group, spreads write load while keeping expansion simple.
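One way to combine range and modulo routing, sketched in Python under assumed parameters: each range of 10 million IDs forms a group of four tables, and id % 4 picks the table inside the group. The group size, table count, and order_g{group}_{slot} naming are all hypothetical.

```python
RANGE_SIZE = 10_000_000  # IDs per range group (hypothetical)
TABLES_PER_GROUP = 4     # tables sharing one range group (hypothetical)


def table_for_hybrid(id_: int) -> str:
    """Range picks the group; modulo picks the table inside the group."""
    if id_ < 1:
        raise ValueError("ids start at 1")
    group = (id_ - 1) // RANGE_SIZE  # range step: which ID segment
    slot = id_ % TABLES_PER_GROUP    # modulo step: spread writes in the segment
    return f"order_g{group}_{slot}"
```

Consecutive IDs rotate across the four tables of the current group, avoiding a single hot table, and capacity grows by adding a new group for the next ID range without moving existing rows.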

Option 3: Hot/Cold Archiving

Separate frequently accessed “hot” data (e.g., the last week or month) from older “cold” data. Create an archive table that mirrors the original schema, migrate cold rows into it, and keep only hot data in the primary table so that everyday operations stay fast.

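1. Create the archive table with the same structure as the original table.

2. Initialize the archive by copying the existing cold rows into it, then deleting them from the primary table.

3. Run an incremental job periodically to move newly cold rows into the archive.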
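The archiving steps can be sketched in Python using the standard library's sqlite3 module as a stand-in for MySQL. The table names orders and orders_archive, the created_at column, and the batch size are assumptions. In MySQL, step 1 would typically be CREATE TABLE orders_archive LIKE orders. Moving rows in small batches, each in its own transaction, keeps locks short on a busy table.

```python
import sqlite3


def make_demo_db() -> sqlite3.Connection:
    """In-memory stand-in: a hot table and an archive with the same structure."""
    conn = sqlite3.connect(":memory:")
    schema = "(id INTEGER PRIMARY KEY, created_at TEXT NOT NULL, amount INTEGER)"
    conn.execute("CREATE TABLE orders " + schema)
    conn.execute("CREATE TABLE orders_archive " + schema)  # step 1
    return conn


def archive_cold_rows(conn: sqlite3.Connection, cutoff: str,
                      batch_size: int = 1000) -> int:
    """Move rows with created_at < cutoff into orders_archive (steps 2 and 3).

    Each batch is copied and then deleted inside one transaction, so a
    failure never leaves a row in both tables or in neither.
    """
    moved = 0
    while True:
        with conn:  # commits the batch on success, rolls back on error
            ids = [r[0] for r in conn.execute(
                "SELECT id FROM orders WHERE created_at < ? ORDER BY id LIMIT ?",
                (cutoff, batch_size))]
            if not ids:
                return moved
            marks = ",".join("?" * len(ids))  # "?,?,?" placeholders
            conn.execute(
                f"INSERT INTO orders_archive SELECT * FROM orders WHERE id IN ({marks})",
                ids)
            conn.execute(f"DELETE FROM orders WHERE id IN ({marks})", ids)
        moved += len(ids)
```

The same function serves both the initial migration and the periodic incremental job: run it once with the initial cutoff, then schedule it with a moving cutoff such as "one month ago".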

Selecting the Right Approach

Use partitioning when you need to reduce disk I/O on a single large table; use sharding when you need to increase concurrency across multiple MySQL instances; combine both for very high‑traffic tables. Consider transaction complexity, cross‑shard joins, and operational overhead when deciding.

All three techniques aim to keep MySQL performant under large data volumes.


Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
