How to Speed Up Massive MySQL Tables: Partitioning, Sharding, and Hot/Cold Archiving
When a MySQL table grows to tens of millions of rows, insert and query latency spikes, schema changes become painful, and only a subset of data is needed; this guide explains how to assess table size, why large tables slow down, and three practical solutions—partitioning, sharding, and hot/cold archiving—along with their trade‑offs and implementation steps.
Scenario
When a business table grows to tens of millions of rows, inserts become slow, query latency increases, schema changes become costly, and most queries only need recent data.
Assessing Table Size
Evaluate data volume from three perspectives:
Table capacity : record count, average row length, growth rate, read/write QPS, and total size. For OLTP tables keep rows < 20 million, total size < 15 GB, and QPS < 1 600 ops/s.
Disk space :
SELECT table_schema AS 'database',
table_name AS 'table',
table_rows AS 'rows',
TRUNCATE(data_length/1024/1024,2) AS 'data_MB',
TRUNCATE(index_length/1024/1024,2) AS 'index_MB'
FROM information_schema.tables
ORDER BY data_length DESC, index_length DESC;Use WHERE table_schema='your_db' to limit to a single database.
Instance capacity : MySQL’s thread‑based model may not fully utilize CPU under high concurrency; consider scaling the instance or using clustering.
Root Cause of Slow Queries
Large tables increase the height of the InnoDB B+Tree index. InnoDB pages are 16 KB; a leaf page holds about 16 rows of 1 KB each, while internal pages store ~1 170 pointers (8‑byte primary key + 6‑byte pointer). A height‑2 tree stores ~18 720 rows, a height‑3 tree ~21.9 million rows. Beyond this, deeper trees cause more disk I/O per lookup, degrading query performance.
Solution 1: Table Partitioning
Partitioning splits a table’s physical storage into multiple files while keeping a single logical name.
Increases storage capacity across disks.
Allows easy deletion of obsolete partitions.
Optimizes queries that target a specific partition via WHERE clauses.
Enables parallel execution of aggregate functions (e.g., SUM(), COUNT()) across partitions.
Improves throughput by distributing data across disks.
Limitations:
Maximum 1 024 partitions per table.
MySQL 5.1 requires integer partition expressions; non‑integer support added in 5.5.
All primary‑key or unique‑key columns must be part of the partition key.
Foreign keys are not supported on partitioned tables.
Partitioning applies to the whole table and its indexes; you cannot partition only data or only indexes.
Check partition support:
SHOW VARIABLES LIKE '%partition%';
+-------------------+-------+
| Variable_name | Value |
+-------------------+-------+
| have_partitioning | YES |
+-------------------+-------+
1 row in set (0.00 sec)Solution 2: Database Sharding (Horizontal / Vertical Splitting)
Horizontal Sharding
Rows are distributed across multiple tables or databases based on a key.
Range sharding : assign contiguous key ranges to separate tables.
Modulo sharding : use id % N to select the target table.
Pros:
Even data distribution reduces hotspot risk.
Cons:
Changing the shard count requires data migration.
Cross‑shard joins become impossible or expensive.
Aggregations (e.g., top‑N) need per‑shard queries and a final merge.
Vertical Sharding
Columns are split into separate tables based on usage frequency. Frequently accessed columns stay in a “hot” table; rarely used columns move to a “cold” table linked by the primary key.
Drawback: queries that need both hot and cold columns require joins, increasing complexity.
Solution 3: Hot/Cold Data Archiving
When access patterns are clearly temporal, move older (cold) data to archive tables or separate databases, keeping only recent (hot) data in the primary tables.
Typical steps:
Create an archive table with the same schema as the source.
Initialize the archive with historical rows, e.g.:
INSERT INTO archive
SELECT * FROM source
WHERE create_time < '2023-01-01';Schedule a regular job to move newly‑cold data from the hot table to the archive and optionally delete it from the hot table.
Choosing the Right Approach
Use the following guidelines to match a solution to your workload:
Table Partitioning : best for very large tables where queries are confined to a date or range and no complex joins are required. It keeps the logical table name unchanged but is limited to 1 024 partitions.
Sharding (horizontal or vertical) : suitable when the table size is large but there is no obvious hot/cold boundary, or when uniform data distribution is needed. It reduces B+Tree height but adds operational complexity and makes cross‑shard joins difficult.
Hot/Cold Archiving : ideal for tables with clear temporal access patterns. It has minimal code changes and low migration impact, but requires a clear archiving rule and extra storage for the archive.
Pick the method that aligns with your data volume, query patterns, and operational constraints.
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
dbaplus Community
Enterprise-level professional community for Database, BigData, and AIOps. Daily original articles, weekly online tech talks, monthly offline salons, and quarterly XCOPS&DAMS conferences—delivered by industry experts.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
