Three Strategies Big Companies Use to Manage Massive MySQL Tables
The article walks through how to assess MySQL table size, explains why large tables slow down due to B+‑tree index overhead, and compares three practical solutions—table partitioning, database sharding (horizontal, vertical, hash‑range), and hot‑cold archiving—while outlining their trade‑offs and selection criteria.
Scenario
When a business table reaches tens of millions of rows, insert and query latency increase, schema changes become costly, and many rows are inactive.
Evaluating Table Size
Table capacity: record count, average row length, growth rate, read/write QPS. For OLTP tables, keep rows < 20 million and size < 15 GB; QPS < 1 600 /s.
Disk usage: query information_schema.tables to obtain data and index size in MB.
Instance capacity: MySQL’s thread‑based model can become a bottleneck under high concurrency.
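The size check above could be scripted roughly as follows. This is a minimal sketch: the names `SIZE_QUERY` and `exceeds_guideline` are hypothetical, the `%s` placeholder assumes a driver that uses that parameter style (PyMySQL, for example), and the 15 GB limit is assumed here to cover data plus indexes.

```python
# Guideline limits quoted above for OLTP tables (rules of thumb, not hard limits).
MAX_ROWS = 20_000_000
MAX_BYTES = 15 * 1024**3  # 15 GB, assumed to mean data + index

# Data and index size per table, in MB, from information_schema.tables.
# Note: table_rows is only an estimate for InnoDB tables.
SIZE_QUERY = """
SELECT table_name,
       table_rows,
       ROUND(data_length  / 1024 / 1024, 2) AS data_mb,
       ROUND(index_length / 1024 / 1024, 2) AS index_mb
FROM information_schema.tables
WHERE table_schema = %s
ORDER BY data_length + index_length DESC
"""

def exceeds_guideline(rows: int, data_bytes: int, index_bytes: int) -> bool:
    """Return True when a table is past either guideline limit."""
    return rows >= MAX_ROWS or data_bytes + index_bytes >= MAX_BYTES
```

Passing the schema name as a query parameter rather than formatting it into the string keeps the query safe to reuse across databases.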
Root Cause of Slow Queries
When a table grows to tens of millions of rows, index maintenance cost rises because the B+‑tree height increases, causing more disk I/O per lookup.
InnoDB stores data in 16 KB pages. Assuming 1 KB rows (16 rows per leaf page) and about 1 170 pointers per internal page, a two‑level B+‑tree holds roughly 1 170 × 16 ≈ 1.9 × 10⁴ rows, and a three‑level tree holds roughly 1 170 × 1 170 × 16 ≈ 2.2 × 10⁷ rows. Tables beyond that size need a fourth level, which adds one more page read to every lookup.
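The capacity arithmetic can be checked with a few lines of Python. This is a back-of-the-envelope sketch: the 8-byte key (a BIGINT primary key) and 6-byte page pointer are assumptions that yield the ~1 170 fan-out quoted above, and page headers and fill factor are ignored.

```python
PAGE_BYTES = 16 * 1024   # InnoDB default page size
KEY_BYTES = 8            # assumed BIGINT primary key
POINTER_BYTES = 6        # assumed size of a child-page pointer
ROW_BYTES = 1024         # assumed average row size (1 KB)

fanout = PAGE_BYTES // (KEY_BYTES + POINTER_BYTES)  # pointers per internal page
rows_per_leaf = PAGE_BYTES // ROW_BYTES             # rows per leaf page

def capacity(levels: int) -> int:
    """Approximate rows a B+-tree with `levels` levels can hold."""
    return fanout ** (levels - 1) * rows_per_leaf

for levels in (2, 3):
    print(levels, capacity(levels))
# 2 18720
# 3 21902400
```

The jump from about 19 thousand to about 22 million rows between two and three levels is why the 20-million-row guideline sits where it does under these assumptions.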
Solution 1: Table Partitioning
Partitioning splits a single logical table into multiple physical partitions based on a range or list condition, reducing the amount of data scanned per query.
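Range partitioning is often keyed on time. The sketch below generates the DDL for monthly RANGE partitions. The table `orders` and column `created_at` are hypothetical, and the statement only succeeds if the partitioning column is part of every primary and unique key on the table.

```python
from datetime import date

def month_starts(first: date, count: int):
    """Yield the first day of count + 1 consecutive months, starting at `first`."""
    year, month = first.year, first.month
    for _ in range(count + 1):
        yield date(year, month, 1)
        month += 1
        if month > 12:
            year, month = year + 1, 1

def range_partition_ddl(table: str, column: str, first: date, months: int) -> str:
    """Build ALTER TABLE ... PARTITION BY RANGE with one partition per month."""
    bounds = list(month_starts(first, months))
    parts = [
        f"  PARTITION p{lo:%Y%m} VALUES LESS THAN (TO_DAYS('{hi:%Y-%m-%d}'))"
        for lo, hi in zip(bounds, bounds[1:])
    ]
    # Catch-all partition so rows newer than the last bound are still accepted.
    parts.append("  PARTITION p_max VALUES LESS THAN MAXVALUE")
    return (
        f"ALTER TABLE {table}\n"
        f"PARTITION BY RANGE (TO_DAYS({column})) (\n"
        + ",\n".join(parts)
        + "\n);"
    )

def drop_partition_ddl(table: str, month: date) -> str:
    """Build the statement that discards one month of data."""
    return f"ALTER TABLE {table} DROP PARTITION p{month:%Y%m};"

print(range_partition_ddl("orders", "created_at", date(2023, 11, 1), 3))
```

Dropping a partition with the second helper removes that month's rows as a metadata operation instead of a row-by-row DELETE.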
Benefits
A table can hold more data than fits on a single disk or file‑system partition.
Obsolete data can be removed by dropping the corresponding partition.
Queries that filter on the partition key read only the relevant partitions (partition pruning), so less data is scanned.
Aggregate functions (e.g., SUM(), COUNT()) are easy to parallelize in principle, since each partition can be aggregated independently and the partial results combined.
Data is spread across multiple disks, increasing throughput.
Limitations
Maximum 1 024 partitions per table before MySQL 5.6.7; later versions allow up to 8 192.
MySQL 5.1 requires integer partition expressions; MySQL 5.5 adds non‑integer support.
If the table has a primary key or unique index, every column used in the partitioning expression must be part of each of those keys.
Foreign keys are not supported on partitioned tables.
Partitioning applies to the whole table and its indexes; you cannot partition only data or only indexes.
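Check partition support (on MySQL 5.1 and 5.5; the have_partitioning variable was removed in MySQL 5.6, where SHOW PLUGINS lists the partition plugin instead):
mysql> SHOW VARIABLES LIKE '%partition%';
+-------------------+-------+
| Variable_name     | Value |
+-------------------+-------+
| have_partitioning | YES   |
+-------------------+-------+
1 row in set (0.00 sec)
Solution 2: Database Sharding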
Sharding reduces the size of each physical table by distributing rows across multiple tables or databases.
Horizontal Sharding (Modulo)
Example: 40 million rows divided into four tables of 10 million each. Modulo rule determines the target table.
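id = 17 → 17 % 4 = 1 → store in the user2 table (tables are numbered user1 to user4, so the target is remainder + 1)
After horizontal sharding, remove AUTO_INCREMENT from the shard tables, because each table would otherwise hand out colliding IDs. Assign unique IDs from a shared generator instead (e.g., Redis INCR).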
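The modulo rule and the external ID source can be sketched as follows. The function and class names are hypothetical, and `LocalIdGenerator` is only an in-process stand-in for a shared counter such as Redis INCR; it does not coordinate across processes.

```python
import itertools
import threading

def shard_table(row_id: int, shard_count: int = 4, prefix: str = "user") -> str:
    """Pick the target table by modulo; tables are numbered from 1."""
    # Remainder 0 maps to user1, remainder 1 to user2, and so on.
    return f"{prefix}{row_id % shard_count + 1}"

class LocalIdGenerator:
    """Thread-safe counter standing in for a shared ID service."""

    def __init__(self, start: int = 1):
        self._counter = itertools.count(start)
        self._lock = threading.Lock()

    def next_id(self) -> int:
        with self._lock:
            return next(self._counter)

ids = LocalIdGenerator()
new_id = ids.next_id()
print(new_id, shard_table(new_id))  # 1 user2
print(shard_table(17))              # user2
```

Changing `shard_count` later remaps most existing IDs to different tables, which is the main reason modulo sharding is hard to expand.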
Vertical Sharding
Split columns: frequently accessed columns stay in a “hot” table; rarely used columns move to a “cold” table linked by the primary key.
Range Sharding
Rows are assigned to tables based on ID ranges (e.g., IDs 1 to 10 M → table 1, IDs 10 M + 1 to 20 M → table 2).
Combined Hash‑Range Sharding
Hash selects a database, then range selects a table within that database, balancing hotspot avoidance and future scalability.
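One possible reading of the range and hash‑range rules, as a routing sketch. The naming scheme (`db0`..`db3`, `user_1`, `user_2`, ...) and the 10 M rows-per-table boundary are assumptions for illustration only.

```python
def range_table(row_id: int, rows_per_table: int = 10_000_000) -> int:
    """Range rule: IDs 1..10M -> table 1, 10M+1..20M -> table 2, and so on."""
    return (row_id - 1) // rows_per_table + 1

def hash_range_route(row_id: int, db_count: int = 4,
                     rows_per_table: int = 10_000_000) -> tuple:
    """Hash step picks the database, range step picks the table inside it."""
    db_index = row_id % db_count                       # spreads writes across databases
    table_index = range_table(row_id, rows_per_table)  # new ranges add new tables
    return f"db{db_index}", f"user_{table_index}"

print(hash_range_route(25_000_003))  # ('db3', 'user_3')
```

Under this scheme, growth adds a new range table to each database rather than rehashing existing rows, while the hash step keeps recent IDs from all landing on one database.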
Drawbacks
Distributed transactions become complex and incur high overhead.
Joins across shards on different instances cannot be expressed in a single SQL statement; queries may require multiple round‑trips and client‑side aggregation.
Additional data‑management burden: locating data, handling CRUD across shards, and merging results (e.g., per‑shard top‑100 then final merge).
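The per‑shard top‑N merge mentioned above can be sketched with the standard library. It assumes each shard has already returned its own rows sorted in descending order; `merge_top_n` is a hypothetical helper name.

```python
import heapq
from itertools import islice

def merge_top_n(per_shard_rows, n, key=lambda row: row):
    """Merge descending-sorted per-shard results and keep the global top n."""
    # heapq.merge streams the inputs lazily, so only n rows are materialized.
    merged = heapq.merge(*per_shard_rows, key=key, reverse=True)
    return list(islice(merged, n))

shards = [[9, 7, 1], [8, 6, 5], [10, 2]]
print(merge_top_n(shards, 4))  # [10, 9, 8, 7]
```

Each shard must return at least its own top n rows, because any one shard could hold all of the global winners.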
Solution 3: Hot‑Cold Archiving
Separate recent “hot” data (e.g., last week or month) from historical “cold” data. Move cold data to archive tables or separate storage, reducing the active table size.
Archiving Process
Create an archive table with the same schema as the original.
Initialize the archive table with historical rows.
Continuously move new cold data into the archive (incremental batch jobs).
Update the application to read hot data from the primary table and cold data from the archive when needed.
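The incremental step of the process above might look like the batch job below. It is a sketch only: `sqlite3` stands in for MySQL so the example runs without a server, and the table names `orders` and `orders_archive` and the `created_at` column are hypothetical.

```python
import sqlite3

def archive_batch(conn, cutoff, batch_size=500):
    """Move up to batch_size rows older than cutoff; return the number moved."""
    with conn:  # one transaction: commit on success, roll back on error
        ids = [row[0] for row in conn.execute(
            "SELECT id FROM orders WHERE created_at < ? ORDER BY id LIMIT ?",
            (cutoff, batch_size),
        )]
        if not ids:
            return 0
        marks = ",".join("?" * len(ids))
        # Copy first, then delete, so a failure never loses rows.
        conn.execute(
            f"INSERT INTO orders_archive SELECT * FROM orders WHERE id IN ({marks})",
            ids,
        )
        conn.execute(f"DELETE FROM orders WHERE id IN ({marks})", ids)
    return len(ids)

def archive_all(conn, cutoff, batch_size=500):
    """Repeat small batches until no cold rows remain; return the total moved."""
    total = 0
    while (moved := archive_batch(conn, cutoff, batch_size)):
        total += moved
    return total
```

Small batches keep each transaction short, which limits lock time on the hot table while the job runs.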
Partition vs. Sharding
Implementation
Partitioning keeps a single logical table; each partition is stored in separate files but the table metadata remains unified.
Sharding creates independent tables (and optionally databases), each with its own files on disk (.MYD, .MYI, and .frm for MyISAM; .ibd for InnoDB with file‑per‑table tablespaces).
Performance Impact
Partitioning improves I/O by limiting the amount of data a single query scans.
Sharding improves concurrency by reducing the per‑instance row count and index height.
Complexity
At the simple end, sharding can use a MERGE table (MyISAM only) over identical underlying tables, and partitioning is native to MySQL; sharding becomes more involved once it needs custom routing logic.
Partitioning is generally easier to set up and transparent to the application.
Choosing an Approach
Data growth rate and whether hot/cold access patterns exist.
Maximum acceptable table size (≈ 20 M rows, ≤ 15 GB).
Need for cross‑shard transactions or joins.
Operational overhead you are willing to manage.
Time‑based hot data often fits hot‑cold archiving or date‑based partitioning. When table size alone drives latency and higher concurrency is required, horizontal sharding (hash or range) is preferable. Vertical sharding helps when only a subset of columns is frequently accessed.
Programmer XiaoFu
xiaofucode.com – a programmer learning guide driven by the pursuit of profit
