Three Strategies Big Companies Use to Manage Massive MySQL Tables

The article walks through how to assess MySQL table size, explains why large tables slow down due to B+‑tree index overhead, and compares three practical solutions—table partitioning, database sharding (horizontal, vertical, hash‑range), and hot‑cold archiving—while outlining their trade‑offs and selection criteria.

Programmer XiaoFu

Oct 10, 2024

Scenario

When a business table reaches tens of millions of rows, insert and query latency increase, schema changes become costly, and many rows are inactive.

Evaluating Table Size

Table capacity: consider record count, average row length, growth rate, and read/write QPS. As a rule of thumb for OLTP tables, keep a single table under 20 million rows and 15 GB, with read/write QPS below 1,600.

Disk usage: query information_schema.tables for data_length and index_length, which are reported in bytes, and divide by 1024 × 1024 to get the data and index size in MB.

Instance capacity: MySQL’s thread‑based model can become a bottleneck under high concurrency.
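The size checks above can be sketched in Python. The query string reads the standard data_length and index_length columns from information_schema.tables; the helper names, the %s placeholder style (used by common MySQL drivers), and the hard-coded limits (copied from the rule of thumb above) are illustrative assumptions, not part of the original article.

```python
# Sketch only: helper names are hypothetical, limits are the article's rule of thumb.
# information_schema.tables reports data_length and index_length in bytes.
TABLE_SIZE_SQL = """
SELECT table_name,
       table_rows,  -- an estimate for InnoDB tables
       ROUND(data_length  / 1024 / 1024, 2) AS data_mb,
       ROUND(index_length / 1024 / 1024, 2) AS index_mb
FROM information_schema.tables
WHERE table_schema = %s
ORDER BY data_length + index_length DESC
"""

MAX_ROWS = 20_000_000
MAX_SIZE_GB = 15
MAX_QPS = 1_600


def bytes_to_mb(n_bytes: int) -> float:
    """Convert a byte count to MB (1024 * 1024 bytes), rounded to two decimals."""
    return round(n_bytes / 1024 / 1024, 2)


def exceeded_limits(rows: int, size_gb: float, qps: float) -> list[str]:
    """Return the names of the rule-of-thumb limits this table has reached."""
    checks = {
        "rows": rows >= MAX_ROWS,
        "size": size_gb >= MAX_SIZE_GB,
        "qps": qps >= MAX_QPS,
    }
    return [name for name, crossed in checks.items() if crossed]
```

A table with 25 million rows, 10 GB, and 500 QPS would be flagged only for "rows", which suggests looking at row count reduction (archiving or sharding) before anything else.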

Root Cause of Slow Queries

When a table grows to tens of millions of rows, index maintenance cost rises because the B+‑tree height increases, causing more disk I/O per lookup.

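InnoDB stores data in 16 KB pages. With 1 KB rows, a leaf page holds about 16 rows; with an 8-byte BIGINT key and a 6-byte page pointer, an internal page holds about 1,170 pointers (16,384 / 14). A B+-tree of height 2 therefore holds roughly 1,170 × 16 ≈ 1.9 × 10⁴ rows, and a tree of height 3 roughly 1,170 × 1,170 × 16 ≈ 2.2 × 10⁷ rows. Tables beyond that size need a fourth level, which adds a page read to every lookup.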
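The capacity arithmetic can be checked with a few lines of Python. The 8-byte key and 6-byte pointer sizes are the usual assumptions behind the ~1,170 figure (a BIGINT primary key plus a page pointer); real fan-out varies with key type and page fill factor.

```python
PAGE_SIZE = 16 * 1024   # InnoDB default page size in bytes
KEY_BYTES = 8           # assumption: BIGINT primary key
POINTER_BYTES = 6       # assumption: size of a child-page pointer
ROW_BYTES = 1024        # assumption: about 1 KB per row, as in the text

fanout = PAGE_SIZE // (KEY_BYTES + POINTER_BYTES)   # pointers per internal page
rows_per_leaf = PAGE_SIZE // ROW_BYTES              # rows per leaf page


def max_rows(height: int) -> int:
    """Rows addressable by a B+-tree with `height` levels, leaf level included."""
    return fanout ** (height - 1) * rows_per_leaf


print(fanout, rows_per_leaf)   # 1170 16
print(max_rows(2))             # 18720
print(max_rows(3))             # 21902400
```

Under these assumptions the three-level limit lands near 22 million rows, which is where the commonly quoted 20-million-row guideline comes from.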

Solution 1: Table Partitioning

Partitioning splits a single logical table into multiple physical partitions based on a range or list condition, reducing the amount of data scanned per query.

Benefits

A partitioned table can hold more data than fits on a single disk or file-system partition.

Obsolete data can be removed by dropping the corresponding partition.

Queries that filter on the partition key read only the relevant partitions (partition pruning), so less data and smaller per-partition indexes are scanned.

Aggregate queries such as SUM() and COUNT() are easy to parallelize in principle, because each partition can be aggregated independently and the partial results combined.

Partitions can be placed on different disks, which increases aggregate throughput.
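As a rough illustration of range partitioning and of removing old data by dropping a partition, the helpers below build the corresponding MySQL statements as strings. The function names, the orders table, and the created_at column are hypothetical; the sketch assumes created_at is a DATE or DATETIME column that is part of every unique key, and that the identifiers come from trusted code rather than user input.

```python
def range_partition_ddl(table: str, column: str, years: list[int]) -> str:
    """Build an ALTER TABLE statement with one RANGE partition per year."""
    parts = [f"  PARTITION p{y} VALUES LESS THAN ({y + 1})" for y in sorted(years)]
    # Catch-all partition so rows from later years are still accepted.
    parts.append("  PARTITION pmax VALUES LESS THAN MAXVALUE")
    body = ",\n".join(parts)
    return f"ALTER TABLE {table} PARTITION BY RANGE (YEAR({column})) (\n{body}\n);"


def drop_partition_ddl(table: str, year: int) -> str:
    """Dropping a partition discards its rows without a row-by-row DELETE."""
    return f"ALTER TABLE {table} DROP PARTITION p{year};"


print(range_partition_ddl("orders", "created_at", [2022, 2023]))
print(drop_partition_ddl("orders", 2022))
```

A query with a condition such as created_at >= '2023-01-01' AND created_at < '2024-01-01' can then be pruned to partition p2023 alone.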

Limitations

A table can have at most 1,024 partitions before MySQL 5.6.7; later versions raise the limit to 8,192.

MySQL 5.1 requires partition expressions to evaluate to an integer; MySQL 5.5 adds non-integer support through RANGE COLUMNS and LIST COLUMNS partitioning.

If the table has a primary key or unique index, every column used in the partitioning expression must be part of each of those keys.

Foreign keys are not supported on partitioned tables.

Partitioning applies to the whole table and its indexes; you cannot partition only data or only indexes.
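The unique-key limitation is the one that most often blocks a partitioning plan, so it can be worth checking before writing any DDL. MySQL requires every primary or unique key to include all columns used in the partitioning expression. The function and the sample key definitions below are hypothetical.

```python
def keys_missing_partition_columns(
    partition_columns: list[str],
    unique_keys: dict[str, list[str]],
) -> dict[str, list[str]]:
    """Map each unique key to the partition columns it fails to include."""
    missing = {}
    for key_name, key_columns in unique_keys.items():
        absent = set(partition_columns) - set(key_columns)
        if absent:
            missing[key_name] = sorted(absent)
    return missing


problems = keys_missing_partition_columns(
    partition_columns=["created_at"],
    unique_keys={
        "PRIMARY": ["id"],                          # lacks created_at
        "uk_order_no": ["order_no", "created_at"],  # satisfies the rule
    },
)
print(problems)   # {'PRIMARY': ['created_at']}
```

In this example the primary key would have to become (id, created_at) before the table could be partitioned on created_at.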

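Check partition support (MySQL 5.1 and 5.5; the have_partitioning variable was removed in MySQL 5.6, where SHOW PLUGINS lists the partition plugin instead):

mysql> SHOW VARIABLES LIKE '%partition%';
+-------------------+-------+
| Variable_name     | Value |
+-------------------+-------+
| have_partitioning | YES   |
+-------------------+-------+
1 row in set (0.00 sec)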

Solution 2: Database Sharding

Sharding reduces the size of each physical table by distributing rows across multiple tables or databases.

Horizontal Sharding (Modulo)

Example: 40 million rows divided into four tables of 10 million each. Modulo rule determines the target table.

With tables named user1 through user4 and remainder r routed to table user(r + 1): id = 17 → 17 % 4 = 1 → store in the user2 table
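A minimal sketch of the modulo rule follows. It assumes four tables named user1 through user4, with remainder 0 mapped to user1, so that id 17 lands in user2 as in the example above; the function name and naming scheme are illustrative.

```python
SHARD_COUNT = 4   # four tables of roughly 10 million rows each


def route_modulo(row_id: int, shard_count: int = SHARD_COUNT) -> str:
    """Pick the target table from the remainder of the ID (assumed 1-based names)."""
    return f"user{row_id % shard_count + 1}"


print(route_modulo(17))   # user2
print(route_modulo(16))   # user1
```

Changing SHARD_COUNT later remaps most IDs to different tables, so the shard count is usually fixed up front or grown by doubling with a planned migration.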

After horizontal sharding, drop AUTO_INCREMENT on the sharded tables, since each table would otherwise generate overlapping IDs. Use a shared ID generator (e.g., Redis INCR) to assign globally unique IDs.
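The contract of such an ID generator is small: every call returns the next integer, and no two callers ever see the same value. The class below is an in-process stand-in for a shared counter such as Redis INCR, written so the sketch runs without a Redis server; in production the counter would have to live in a service that all application nodes share.

```python
import itertools
import threading


class InMemoryIdGenerator:
    """Stand-in for a shared atomic counter; not suitable across processes."""

    def __init__(self, start: int = 1) -> None:
        self._counter = itertools.count(start)
        self._lock = threading.Lock()

    def next_id(self) -> int:
        """Return the next unique ID; the lock keeps concurrent callers apart."""
        with self._lock:
            return next(self._counter)


gen = InMemoryIdGenerator()
new_ids = [gen.next_id() for _ in range(3)]
print(new_ids)   # [1, 2, 3]
```

The returned ID is then passed to the routing rule to decide which table receives the row.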

Vertical Sharding

Split columns: frequently accessed columns stay in a “hot” table; rarely used columns move to a “cold” table linked by the primary key.
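As a small illustration of the column split, the helper below divides one logical row into a hot part and a cold part that share the primary key. The column names and the choice of which columns count as hot are hypothetical.

```python
HOT_COLUMNS = {"id", "name", "status"}   # hypothetical frequently read columns


def split_row(row: dict) -> tuple[dict, dict]:
    """Split a row into (hot, cold) parts; both keep the primary key `id`."""
    hot = {k: v for k, v in row.items() if k in HOT_COLUMNS}
    cold = {k: v for k, v in row.items() if k not in HOT_COLUMNS}
    cold["id"] = row["id"]   # the shared key lets the two halves be rejoined
    return hot, cold


hot, cold = split_row({"id": 7, "name": "Ann", "status": "active", "bio": "long text"})
print(hot)    # {'id': 7, 'name': 'Ann', 'status': 'active'}
print(cold)   # {'bio': 'long text', 'id': 7}
```

Keeping wide, rarely read columns out of the hot table means more hot rows fit in each 16 KB page.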

Range Sharding

Rows are assigned to tables based on ID ranges (e.g., IDs 1 to 10 M → table 1, IDs 10 M + 1 to 20 M → table 2).
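A range rule reduces to integer division. The sketch below assumes IDs start at 1 and that each table covers 10 million consecutive IDs, so ID 10,000,000 is the last row of table 1; the function name is illustrative.

```python
ROWS_PER_TABLE = 10_000_000


def route_range(row_id: int, rows_per_table: int = ROWS_PER_TABLE) -> int:
    """Return the 1-based table number whose ID range contains row_id."""
    if row_id < 1:
        raise ValueError("IDs are assumed to start at 1")
    return (row_id - 1) // rows_per_table + 1


print(route_range(1))            # 1
print(route_range(10_000_000))   # 1
print(route_range(10_000_001))   # 2
```

New tables can be added for new ranges without moving existing rows, but the newest table takes all inserts and tends to become a hotspot.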

Combined Hash‑Range Sharding

Hash selects a database, then range selects a table within that database, balancing hotspot avoidance and future scalability.
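One way to combine the two rules is sketched below, under stated assumptions: four databases named db0 through db3 chosen by ID modulo, and tables named user_1, user_2, and so on, chosen by 10-million-ID ranges. The names and constants are hypothetical, and real deployments often route through a configurable group table instead of fixed constants.

```python
DB_COUNT = 4
ROWS_PER_TABLE = 10_000_000


def route_hash_range(row_id: int) -> tuple[str, str]:
    """Hash (modulo) picks the database; the ID range picks the table inside it."""
    db_index = row_id % DB_COUNT
    table_index = (row_id - 1) // ROWS_PER_TABLE + 1
    return f"db{db_index}", f"user_{table_index}"


print(route_hash_range(17))           # ('db1', 'user_1')
print(route_hash_range(10_000_002))   # ('db2', 'user_2')
```

Consecutive new IDs rotate across all databases, which spreads the write hotspot, while a new ID range only requires creating a new table in each database.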

Drawbacks

Distributed transactions become complex and incur high overhead.

Joins across shards on different instances cannot be expressed in a single SQL statement; queries may require multiple round-trips and client-side aggregation.

Additional data‑management burden: locating data, handling CRUD across shards, and merging results (e.g., per‑shard top‑100 then final merge).
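The per-shard top-N merge mentioned above can be sketched with the standard library. Each shard is asked for its own top N, and the global top N is guaranteed to be among those candidates; the function name and the sample scores are illustrative.

```python
import heapq


def merge_top_n(per_shard_results: list[list[int]], n: int) -> list[int]:
    """Combine each shard's local top-n into the global top-n, largest first."""
    candidates = (score for shard in per_shard_results for score in shard)
    return heapq.nlargest(n, candidates)


shard_tops = [[9, 7, 1], [8, 6, 5], [10, 2]]   # top results returned by three shards
print(merge_top_n(shard_tops, 3))              # [10, 9, 8]
```

Deep pagination is where this gets expensive: fetching page 100 of a global ordering requires every shard to return its first 100 pages of candidates.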

Solution 3: Hot‑Cold Archiving

Separate recent “hot” data (e.g., last week or month) from historical “cold” data. Move cold data to archive tables or separate storage, reducing the active table size.

Archiving Process

Create an archive table with the same schema as the original.

Initialize the archive table with historical rows.

Continuously move new cold data into the archive (incremental batch jobs).

Update the application to read hot data from the primary table and cold data from the archive when needed.
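The incremental batch job in the steps above can be sketched as follows. The example uses Python's built-in sqlite3 with an in-memory database purely so that it runs without a MySQL server; the orders and orders_archive tables, the created_at cutoff, and the batch size are hypothetical, and the same copy-then-delete pattern would be issued through a MySQL driver in practice.

```python
import sqlite3


def archive_batch(conn: sqlite3.Connection, cutoff: str, batch_size: int = 500) -> int:
    """Move up to batch_size rows older than cutoff into orders_archive."""
    with conn:  # one transaction: the copy and the delete commit together
        ids = [row[0] for row in conn.execute(
            "SELECT id FROM orders WHERE created_at < ? ORDER BY id LIMIT ?",
            (cutoff, batch_size),
        )]
        if not ids:
            return 0
        marks = ",".join("?" * len(ids))
        conn.execute(
            f"INSERT INTO orders_archive SELECT * FROM orders WHERE id IN ({marks})", ids)
        conn.execute(f"DELETE FROM orders WHERE id IN ({marks})", ids)
    return len(ids)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, created_at TEXT)")
conn.execute("CREATE TABLE orders_archive (id INTEGER PRIMARY KEY, created_at TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(1, "2023-01-05"), (2, "2023-06-01"), (3, "2024-09-30")],
)
conn.commit()

# Repeat small batches until nothing older than the cutoff remains.
while archive_batch(conn, cutoff="2024-01-01", batch_size=1):
    pass
```

Small batches keep each transaction short, so the job holds locks on the hot table only briefly and can be paused or resumed between batches.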

Partition vs. Sharding

Implementation

Partitioning keeps a single logical table; each partition is stored in separate files but the table metadata remains unified.

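Sharding creates independent tables (and optionally databases), each with its own data, index, and definition files (.MYD, .MYI, and .frm for MyISAM; an .ibd file per table for InnoDB with file-per-table tablespaces).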

Performance Impact

Partitioning improves I/O by limiting the amount of data a single query scans.

Sharding improves concurrency by reducing the per‑instance row count and index height.

Complexity

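Sharding through the MERGE storage engine, which presents several identical MyISAM tables as one, is the simplest form and is about as easy to set up as native partitioning. Other sharding approaches need custom routing logic in the application or middleware and are considerably more involved.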

Partitioning is generally easier to set up and transparent to the application.

Choosing an Approach

Data growth rate and whether hot/cold access patterns exist.

Maximum acceptable table size (≈ 20 M rows, ≤ 15 GB).

Need for cross‑shard transactions or joins.

Operational overhead you are willing to manage.

Time‑based hot data often fits hot‑cold archiving or date‑based partitioning. When table size alone drives latency and higher concurrency is required, horizontal sharding (hash or range) is preferable. Vertical sharding helps when only a subset of columns is frequently accessed.

