How to Tame Massive MySQL Tables: Partitioning, Sharding, and Archiving Strategies

This article walks through evaluating massive MySQL tables, explains why large row counts slow queries, and compares three practical solutions—table partitioning, database sharding, and hot/cold archiving—while highlighting their trade‑offs and offering guidance on selecting the right approach for a given workload.

Architect

Oct 3, 2024

Scenario

When a business table grows to tens of millions or even billions of rows, insert and query latency increase, schema changes become painful, and many rows are no longer actively used, leading to performance bottlenecks.

Long insert/query times

Schema changes affect many rows

Only recent data is frequently accessed

Evaluating Table Size

Three dimensions are used to assess data volume: table capacity, disk usage, and instance capacity.

Table Capacity

As a rule of thumb for OLTP tables, keep each table below about 20 million rows and 15 GB in total size, and keep read/write load below about 1,600 QPS. The exact row count can be checked with:

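select count(*) from table_name;
select count(1) from table_name;  -- InnoDB handles count(1) the same way as count(*)

On very large tables these exact counts have to scan an index and may time out. Table metadata returns a row count immediately, but for InnoDB the Rows value is an estimate, not an exact figure:

use db_name;
show table status like 'table_name';
-- In the mysql client, \G replaces the semicolon and prints one column per line
show table status like 'table_name'\G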

Disk Space

select
  table_schema as 'Database',
  table_name   as 'Table',
  table_rows   as 'Rows',
  truncate(data_length/1024/1024,2) as 'Data Size (MB)',
  truncate(index_length/1024/1024,2) as 'Index Size (MB)'
from information_schema.tables
order by data_length desc, index_length desc;

(Result screenshot omitted.) As a guideline, keep disk usage below 70% of capacity. For fast-growing tables, consider moving archived data to a larger, slower disk.

Instance Capacity

MySQL uses a thread‑per‑connection model; under high concurrency a single instance may become a bottleneck, so consider scaling out or using multiple instances.

Root Causes of Slow Queries on Large Tables

When a table reaches tens of millions of rows, index effectiveness drops because the B+‑tree height grows, leading to more disk I/O per lookup.

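InnoDB stores data in 16 KB pages. With rows of about 1 KB, a leaf page holds roughly 16 rows. A non-leaf page stores key-plus-pointer entries; assuming an 8-byte BIGINT primary key and a 6-byte page pointer (14 bytes per entry), it holds about 16,384 / 14 ≈ 1,170 entries. A B+-tree of height 2 can therefore address roughly 1,170 × 16 = 18,720 rows, and one of height 3 about 1,170 × 1,170 × 16 = 21,902,400 rows, which is where the 20-million-row guideline comes from. Beyond that the tree needs a fourth level, so each lookup touches one more page and may incur extra disk I/O. These figures ignore page headers and fill factor, so treat them as estimates.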
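The arithmetic above can be reproduced with a few lines of Python. This is a back-of-the-envelope sketch, not a model of real InnoDB page layout: the 1 KB row size and 14-byte index entry are the same assumptions used in the text, and `btree_capacity` is a name made up for this example.

```python
PAGE_SIZE = 16 * 1024   # InnoDB default page size in bytes
ROW_SIZE = 1024         # assumed average row size (1 KB, as in the text)
ENTRY_SIZE = 14         # assumed index entry: 8-byte key + 6-byte page pointer


def btree_capacity(height: int) -> int:
    """Approximate number of rows a clustered index of this height can address."""
    if height < 1:
        raise ValueError("height must be >= 1")
    rows_per_leaf = PAGE_SIZE // ROW_SIZE   # rows stored in one leaf page
    fanout = PAGE_SIZE // ENTRY_SIZE        # child pointers in one non-leaf page
    return fanout ** (height - 1) * rows_per_leaf


for h in (2, 3):
    print(h, btree_capacity(h))
# 2 18720
# 3 21902400
```

Changing `ROW_SIZE` shows how sensitive the guideline is to row width: narrower rows push the height-3 limit well past 20 million, wider rows pull it below.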

Solution 1: Table Partitioning

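Partitioning splits a single logical table into multiple physical partitions according to a partitioning expression (RANGE, LIST, HASH, or KEY). When a query filters on the partitioning column, the optimizer can skip partitions that cannot contain matching rows (partition pruning).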

Benefits

One table can hold more data than fits on a single disk or file system partition.

Obsolete data can be dropped by dropping whole partitions.

Queries that filter on the partition key can skip irrelevant partitions, improving performance.

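Aggregations (SUM, COUNT) are easy to parallelize across partitions in principle, but whether a given MySQL version actually runs them in parallel should be verified rather than assumed.

Partitions can be placed on different disks, which can increase throughput.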

Limitations

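The number of partitions per table is capped: 1,024 before MySQL 5.6.7 and 8,192 from 5.6.7 onward (subpartitions included).

MySQL 5.1 requires the partitioning expression to evaluate to an integer; from 5.5 onward, RANGE COLUMNS and LIST COLUMNS also accept date and string columns.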

Every column used in the partitioning expression must be part of every unique key on the table, including the primary key.

No foreign‑key constraints on partitioned tables.

Partitioning applies to all data and indexes of a table; you cannot partition only the data or only the indexes.

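Check partition support. The right check depends on the version; MySQL 8.0 needs none, because partitioning is built into InnoDB:

show variables like '%partition%';  -- MySQL 5.5 and earlier: look for have_partitioning
show plugins;                       -- MySQL 5.6 and 5.7: look for an ACTIVE "partition" row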
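As an illustration of range partitioning, the sketch below builds an ALTER TABLE statement that creates one partition per month plus a catch-all. The table `orders` and column `created_at` are hypothetical, as are the helper names. The column is assumed to be a DATE or DATETIME that is already part of the primary key. The script only generates the statement; review it before running it against a real server.

```python
from datetime import date


def month_starts(first: date, count: int) -> list:
    """Return `count` consecutive first-of-month dates starting at `first`."""
    out = []
    year, month = first.year, first.month
    for _ in range(count):
        out.append(date(year, month, 1))
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return out


def range_partition_ddl(table: str, column: str, first: date, months: int) -> str:
    """Build an ALTER TABLE statement with one RANGE partition per month."""
    bounds = month_starts(first, months + 1)
    parts = []
    for start, end in zip(bounds, bounds[1:]):
        # Each partition holds rows whose date is before the next month's start.
        parts.append(
            f"  PARTITION p{start:%Y%m} "
            f"VALUES LESS THAN (TO_DAYS('{end.isoformat()}'))"
        )
    # Catch-all so inserts with later dates do not fail.
    parts.append("  PARTITION pmax VALUES LESS THAN MAXVALUE")
    body = ",\n".join(parts)
    return (
        f"ALTER TABLE {table}\n"
        f"PARTITION BY RANGE (TO_DAYS({column})) (\n{body}\n);"
    )


print(range_partition_ddl("orders", "created_at", date(2024, 10, 1), 3))
```

With this layout, a query filtering on `created_at` only touches the matching months, and an obsolete month can be removed with a statement such as `ALTER TABLE orders DROP PARTITION p202410;` instead of a large DELETE.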

Solution 2: Database Sharding (Horizontal & Vertical Splitting)

Sharding reduces the size of each individual table by distributing rows across multiple tables or databases.

Horizontal Sharding

Rows are split based on a rule (e.g., modulo of an ID). Example: 40 million rows divided into four tables of 10 million each.

Vertical Sharding

Columns are split into separate tables based on usage patterns, keeping frequently accessed columns together and moving rarely used columns to another table.

Drawbacks include increased JOIN complexity and potential data consistency challenges.

Sharding Strategies

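Modulo: id % N determines the target table. Example: id=17, N=4 gives 17 % 4 = 1, so the row goes to the table with index 1, the second of four (table2 when tables are numbered table1 to table4).

Range: IDs within a given range go to a specific table.

Combined Hash-Range: First hash to a database, then use a range to pick a table within that database.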
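The three strategies can be sketched as small routing functions. This is a minimal illustration, not a production router. The range boundaries, the `tenant_key` argument, and the function names are assumptions made for this example, and table indexes are zero-based.

```python
import bisect
import zlib

# Hypothetical layout: four tables, each owning a 10-million-wide ID range.
RANGE_UPPER_BOUNDS = [10_000_000, 20_000_000, 30_000_000, 40_000_000]


def modulo_shard(row_id: int, n_tables: int) -> int:
    """Modulo strategy: the remainder is the zero-based table index."""
    return row_id % n_tables


def range_shard(row_id: int, upper_bounds=RANGE_UPPER_BOUNDS) -> int:
    """Range strategy: first range whose exclusive upper bound exceeds row_id."""
    idx = bisect.bisect_right(upper_bounds, row_id)
    if row_id < 0 or idx >= len(upper_bounds):
        raise ValueError(f"id {row_id} is outside every configured range")
    return idx


def hash_range_shard(tenant_key: str, row_id: int, n_databases: int):
    """Combined strategy: a stable hash picks the database, a range picks the table."""
    db_index = zlib.crc32(tenant_key.encode("utf-8")) % n_databases
    return db_index, range_shard(row_id)


print(modulo_shard(17, 4))        # 1
print(range_shard(25_000_000))    # 2
print(hash_range_shard("tenant-42", 25_000_000, 2))
```

`zlib.crc32` is used instead of Python's built-in `hash()` because string hashes from `hash()` vary between processes, which would send the same key to different databases after a restart.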

Solution 3: Hot/Cold Archiving

Separate rarely accessed (cold) data into archive tables or databases, keeping hot data (e.g., last week or month) in the primary tables for fast access.

Archiving Process

Create an archive table with the same schema as the source table.

Initialize the archive with historical data.

Continuously move new cold data into the archive.

Update application logic to read/write hot data from the main table and cold data from the archive.
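The copy-then-delete step of the process above can be sketched as a batched job. To keep the example self-contained it uses Python's built-in sqlite3 module as a stand-in for MySQL. The `orders` and `orders_archive` tables, their columns, and the function names are hypothetical. Against MySQL the same pattern applies with a MySQL driver, and the archive table can be created with `CREATE TABLE orders_archive LIKE orders`.

```python
import sqlite3


def make_demo_db() -> sqlite3.Connection:
    """Create an in-memory source table and an archive table with the same schema."""
    conn = sqlite3.connect(":memory:")
    schema = "(id INTEGER PRIMARY KEY, created_at TEXT, amount REAL)"
    conn.execute(f"CREATE TABLE orders {schema}")
    conn.execute(f"CREATE TABLE orders_archive {schema}")
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?, ?)",
        [(1, "2024-01-15", 10.0), (2, "2024-06-01", 20.0), (3, "2024-09-28", 30.0)],
    )
    conn.commit()
    return conn


def archive_cold_rows(conn: sqlite3.Connection, cutoff: str, batch_size: int = 500) -> int:
    """Move rows older than `cutoff` into the archive, one small batch at a time."""
    moved = 0
    while True:
        ids = [row[0] for row in conn.execute(
            "SELECT id FROM orders WHERE created_at < ? ORDER BY id LIMIT ?",
            (cutoff, batch_size),
        )]
        if not ids:
            return moved
        marks = ",".join("?" * len(ids))
        # One transaction per batch: the copy and the delete succeed or fail together.
        with conn:
            conn.execute(
                f"INSERT INTO orders_archive SELECT * FROM orders WHERE id IN ({marks})",
                ids,
            )
            conn.execute(f"DELETE FROM orders WHERE id IN ({marks})", ids)
        moved += len(ids)


conn = make_demo_db()
moved = archive_cold_rows(conn, cutoff="2024-09-01", batch_size=1)
print(moved)  # 2
```

Small batches keep each transaction short, so the job holds locks only briefly and can be stopped and resumed without leaving rows in both tables.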

Differences and Trade‑offs Between Partitioning and Sharding

Implementation: Partitioning keeps a single logical table; sharding creates multiple physical tables or databases.

Performance: Partitioning reduces I/O by limiting scan range; sharding improves concurrency by spreading load across instances.

Complexity: Sharding often requires application-level routing and may introduce cross-shard JOIN challenges; partitioning is transparent to the application.

Choosing the Right Approach

Consider the following factors:

Data growth rate – fast‑growing tables benefit from sharding or combined hash‑range.

Query patterns – range‑based queries favor partitioning; uniform key distribution favors sharding.

Operational overhead – partitioning is easier to manage; sharding adds routing and consistency complexity.

Hot vs. cold data – archiving is ideal when a clear temporal separation exists.

Often a hybrid solution (e.g., partitioned tables within each shard) provides the best balance of scalability and maintainability.


