
Evaluating and Optimizing Large MySQL Tables: Partitioning, Sharding, and Hot‑Cold Archiving Strategies

This article explains how to assess MySQL table size using table capacity, disk usage, and instance resources, describes why large tables degrade performance, and presents three practical solutions—partitioning, sharding (horizontal/vertical), and hot‑cold data archiving—along with their trade‑offs and implementation tips.

Architecture Digest

Sep 23, 2024


When a business database grows, common pain points include long insert/query times, difficulty adding new columns, and the need to filter only valid data within a time range.

Evaluating Table Size

Three dimensions are used:

Table capacity (rows, average length, growth, read/write volume, total size). For OLTP tables, keep rows < 20 million and size < 15 GB, with read/write < 1600/s.

Disk space – query information_schema.tables to get the data and index size of each table (the DATA_LENGTH and INDEX_LENGTH columns, reported in bytes).

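Instance capacity – MySQL serves connections with a thread-based model, so under high concurrency a single instance may not make full use of the server's CPUs, and throughput can bottleneck inside MySQL itself.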
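The guideline numbers above can be turned into a quick check. The following is a minimal sketch, assuming the row count and byte sizes have already been read from information_schema.tables; the helper name, the labels it returns, and the choice to count size as data plus index are illustrative assumptions, not part of the original article.

```python
# Guideline thresholds quoted in the text for OLTP tables.
MAX_ROWS = 20_000_000
MAX_BYTES = 15 * 1024**3        # 15 GB
MAX_OPS_PER_SEC = 1600


def capacity_warnings(rows, data_bytes, index_bytes, ops_per_sec):
    """Return the guideline dimensions a table exceeds (hypothetical helper)."""
    warnings = []
    if rows >= MAX_ROWS:
        warnings.append("rows")
    # Assumption: "size" means data plus index bytes.
    if data_bytes + index_bytes >= MAX_BYTES:
        warnings.append("size")
    if ops_per_sec >= MAX_OPS_PER_SEC:
        warnings.append("read/write")
    return warnings


small = capacity_warnings(5_000_000, 2 * 1024**3, 1 * 1024**3, 300)
large = capacity_warnings(30_000_000, 14 * 1024**3, 2 * 1024**3, 300)
```

A table that trips any of these checks is a candidate for the strategies discussed in the rest of the article, not proof that it must be split.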

Why Large Tables Slow Down

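When a table reaches tens of millions or billions of rows, the B+-tree index grows taller, and every extra level means one more page read per lookup. InnoDB pages are 16 KB: a leaf page holds about 16 rows of 1 KB each, while an internal page holds only keys and pointers (≈14 B per entry), or roughly 1,170 entries. A three-level tree therefore covers about 1,170 × 1,170 × 16 ≈ 21.9 million rows, which is consistent with the 20-million-row guideline above; beyond that the tree needs a fourth level.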
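The capacity arithmetic can be worked through in a few lines. This is a rough sketch using the figures quoted in the text (16 KB pages, 1 KB rows, about 14 B per internal entry); it ignores page headers and fill factor, so real trees hold somewhat fewer rows per level.

```python
PAGE_SIZE = 16 * 1024       # InnoDB page size from the text, in bytes
POINTER_ENTRY = 14          # approximate bytes per internal-node entry (from the text)
ROW_SIZE = 1024             # assumed average row size of 1 KB

fanout = PAGE_SIZE // POINTER_ENTRY       # entries per internal page
rows_per_leaf = PAGE_SIZE // ROW_SIZE     # rows per leaf page


def max_rows(height):
    """Rows addressable by a tree with `height` levels, leaf level included."""
    return fanout ** (height - 1) * rows_per_leaf


for h in (2, 3, 4):
    print(h, max_rows(h))
# 2 18720
# 3 21902400
# 4 25625808000
```

Under these assumptions, crossing roughly 21.9 million rows is the point where a primary-key lookup starts to need a fourth page read.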

Solution 1: Table Partitioning

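Partitioning splits a table’s data into multiple physical files while keeping a single logical table. Benefits include a reduced query range when queries filter on the partition key, a better index hit rate, and easier deletion of obsolete data by dropping whole partitions. Limitations: at most 1024 partitions per table before MySQL 5.6.7 (8192 from that version on), integer-only partitioning expressions in older MySQL versions, and no foreign keys on partitioned tables.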
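To make the "reduced query range" benefit concrete, here is a small simulation of how RANGE partitioning maps a value to a partition. It is only a sketch: the partition names and year bounds are hypothetical, and MySQL performs this selection internally rather than in application code.

```python
from bisect import bisect_right

# Hypothetical RANGE partitions: each holds values strictly below its bound.
BOUNDS = [2022, 2023, 2024]             # p0: < 2022, p1: < 2023, p2: < 2024
NAMES = ["p0", "p1", "p2", "pmax"]      # pmax catches everything else


def partition_for(year):
    """Name of the partition that stores rows for `year`."""
    return NAMES[bisect_right(BOUNDS, year)]


def partitions_for_range(lo, hi):
    """Partitions that must be scanned for lo <= year <= hi."""
    first = bisect_right(BOUNDS, lo)
    last = bisect_right(BOUNDS, hi)
    return NAMES[first:last + 1]


print(partition_for(2022))                 # p1
print(partitions_for_range(2022, 2023))    # ['p1', 'p2']
```

A query restricted to 2022 through 2023 touches two of the four partitions, and removing everything before 2022 amounts to dropping p0.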

To count the rows in a table (InnoDB handles the two forms identically):

<code style="font-family: Consolas, Menlo, Courier, monospace">SELECT COUNT(*) FROM table_name;</code>
<code style="font-family: Consolas, Menlo, Courier, monospace">SELECT COUNT(1) FROM table_name;</code>

To view a table's status, including whether it is partitioned (shown in the Create_options column):

<code style="font-family: Consolas, Menlo, Courier, monospace">USE db_name;</code>
<code style="font-family: Consolas, Menlo, Courier, monospace">SHOW TABLE STATUS LIKE 'table_name'\G</code>

Solution 2: Sharding (Database/Table Splitting)

Sharding reduces per‑table row count, lowering B+‑tree height. Two main types:

Horizontal sharding: split rows across multiple tables or databases based on a rule (e.g., modulo of an ID).

Vertical sharding: split columns into separate tables, keeping frequently used columns in a “hot” table.

Example of modulo sharding:

<code style="font-family: Consolas, Menlo, Courier, monospace">id = 17;  // 17 % 4 = 1 → with four tables numbered user1..user4, remainder 1 maps to user2</code>

After horizontal sharding, do not rely on each shard's own auto_increment, because independent counters would hand out the same ID on different shards; use a global ID generator such as Redis INCR.
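The routing rule and the global ID requirement can be sketched together. The code below assumes four tables named user1 through user4, so that remainder 1 maps to user2 as in the example above. The `GlobalIdGenerator` class is a hypothetical in-process stand-in for a shared counter such as Redis INCR; it is not a Redis client and only works within one process.

```python
import itertools
import threading

SHARD_COUNT = 4     # assumed number of user tables


def table_for(user_id):
    """Map an ID to one of user1..user4 via modulo."""
    return f"user{user_id % SHARD_COUNT + 1}"


class GlobalIdGenerator:
    """In-process stand-in for a shared counter (hypothetical helper)."""

    def __init__(self, start=0):
        self._counter = itertools.count(start + 1)
        self._lock = threading.Lock()

    def next_id(self):
        # The lock keeps IDs unique when several threads ask at once.
        with self._lock:
            return next(self._counter)


gen = GlobalIdGenerator()
new_ids = [gen.next_id() for _ in range(3)]
routes = [table_for(i) for i in (16, 17, 18, 19)]
```

Because every shard draws from the same counter, an ID identifies one row across all tables, and `table_for` can always locate it again.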

Solution 3: Hot‑Cold Data Archiving

Separate recent (hot) data from older (cold) data. Archive cold data to another database/table to keep hot tables small and fast.

Typical archiving steps:

Create an archive table with the same schema as the source.

Initialize the archive with historical data.

Continuously move new cold data into the archive.
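The steps above can be sketched as a batched move. This example uses Python's built-in sqlite3 module purely as a runnable stand-in for MySQL; the `orders` and `orders_archive` tables, the `created_at` column, and the batch size are all hypothetical.

```python
import sqlite3


def archive_cold_rows(conn, cutoff, batch_size=500):
    """Move rows older than `cutoff` from orders to orders_archive in batches."""
    moved = 0
    while True:
        ids = [row[0] for row in conn.execute(
            "SELECT id FROM orders WHERE created_at < ? ORDER BY id LIMIT ?",
            (cutoff, batch_size))]
        if not ids:
            return moved
        marks = ",".join("?" * len(ids))
        with conn:  # one transaction per batch: copy first, then delete
            conn.execute(
                f"INSERT INTO orders_archive SELECT * FROM orders WHERE id IN ({marks})",
                ids)
            conn.execute(f"DELETE FROM orders WHERE id IN ({marks})", ids)
        moved += len(ids)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, created_at TEXT)")
# Step 1: the archive table has the same schema as the source.
conn.execute("CREATE TABLE orders_archive (id INTEGER PRIMARY KEY, created_at TEXT)")
conn.executemany("INSERT INTO orders VALUES (?, ?)",
                 [(1, "2022-01-01"), (2, "2023-06-01"), (3, "2024-09-01")])
conn.commit()

# Steps 2 and 3: the same call serves the initial load and later periodic runs.
moved = archive_cold_rows(conn, "2024-01-01", batch_size=1)
```

Small batches keep each transaction short, which matters on a busy source table; copying before deleting inside one transaction means a failed batch leaves the row in the hot table rather than losing it.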

Choosing the Right Strategy

• Use partitioning when you need range‑based queries and want transparent handling.

• Use sharding when the table size exceeds partition limits or you need to scale write throughput across multiple servers.

• Combine hash‑modulo and range sharding to balance hotspot avoidance and future scalability.

• Apply hot‑cold archiving when data naturally separates by recency, reducing I/O for frequent queries.
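One way to read the combined hash-modulo and range approach is as a two-step lookup: the ID range selects a group of tables, and modulo selects a table inside that group. The sketch below is an assumption about how such a scheme could be wired; the range size and tables-per-group values are made up for illustration.

```python
RANGE_SIZE = 10_000_000     # assumed number of IDs covered by each range group
TABLES_PER_GROUP = 4        # assumed number of tables inside each group


def locate(record_id):
    """Return (group, table): the range picks the group, modulo picks the table."""
    group = record_id // RANGE_SIZE
    table = record_id % TABLES_PER_GROUP
    return group, table


print(locate(17))            # (0, 1)
print(locate(25_000_003))    # (2, 3)
```

New ID ranges open new groups without moving existing rows, which is where the scalability comes from, while modulo spreads the newest and busiest IDs across several tables instead of concentrating them in one.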

Additional Considerations

Transaction management becomes complex across shards; distributed transactions are costly.

Cross‑shard joins are not supported directly; you may need multiple queries and aggregation in the application layer.

Data management overhead increases with more shards/partitions.
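Application-layer aggregation across shards usually follows a scatter-gather shape: run the same query on every shard, then merge the partial results. The sketch below uses in-memory lists as hypothetical shards, with `query_shard` standing in for a per-shard grouped SUM query; no real database driver is involved.

```python
from collections import Counter

# Hypothetical shards: each is a list of (user_id, order_amount) rows.
SHARDS = [
    [(4, 30), (8, 10)],
    [(1, 25), (5, 5), (1, 15)],
]


def query_shard(rows):
    """Stand-in for a per-shard query that sums amounts grouped by user_id."""
    totals = Counter()
    for user_id, amount in rows:
        totals[user_id] += amount
    return totals


def total_per_user(shards):
    """Scatter the query to each shard, then merge the partial totals."""
    merged = Counter()
    for rows in shards:
        merged.update(query_shard(rows))   # Counter.update adds values per key
    return dict(merged)


totals = total_per_user(SHARDS)
```

Sums and counts merge cleanly this way; results that need global ordering or pagination require an extra sort over the merged rows in the application.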

By evaluating table size, understanding B+‑tree behavior, and selecting an appropriate combination of partitioning, sharding, and archiving, you can maintain MySQL performance even as data grows to tens or hundreds of millions of rows.

