How to Solve Read‑Amplification in MySQL Sharding: Strategies & Tips
This article explains the concepts of database sharding, compares vertical and horizontal partitioning, discusses ID‑range and modulo sharding methods, analyzes the read‑amplification problem caused by non‑sharding keys, and presents practical solutions such as auxiliary index tables, Elasticsearch integration, and TiDB adoption.
Sharding Overview
When a single table grows beyond hundreds of thousands of rows, the B+‑tree depth increases, causing more disk I/O and slower queries. To handle large data volumes, developers split the table into multiple smaller tables, a practice known as sharding (in Chinese, "分库分表", literally "splitting databases and tables").
Vertical vs Horizontal Sharding
Vertical sharding moves a subset of columns to a new table, reducing row size and allowing more rows per data page.
Horizontal sharding is the most common approach: the original user table is divided into user_0, user_1, user_2 … user_N tables, each storing a portion of the data.
Sharding by ID Range
Assume each shard can hold about 20 000 rows. user_0 stores IDs 1‑20 000, user_1 stores 20 001‑40 000, and so on. The routing logic calculates the target shard from the ID range, allowing the business code to continue using a single logical user table while the middleware directs reads/writes to the appropriate physical table.
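The range calculation above can be sketched in a few lines. This is only an illustration of the arithmetic, assuming 1‑based integer IDs and the 20 000‑row capacity from the example; the function name range_shard is hypothetical and is not taken from any particular middleware.

```python
ROWS_PER_SHARD = 20_000  # capacity assumed in the example above


def range_shard(user_id: int, rows_per_shard: int = ROWS_PER_SHARD) -> str:
    """Return the physical table name for a 1-based ID under range sharding."""
    if user_id < 1:
        raise ValueError("IDs are assumed to start at 1")
    # IDs 1..20000 map to shard 0, 20001..40000 to shard 1, and so on.
    return f"user_{(user_id - 1) // rows_per_shard}"


if __name__ == "__main__":
    for uid in (1, 20_000, 20_001, 40_000, 40_001):
        print(uid, "->", range_shard(uid))
```

Because new IDs always land in the newest range, adding capacity only means creating the next table; existing rows do not move.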
Sharding by Modulo
Another common method uses the modulo of the ID. For example, with five shards, ID 31 yields 31 % 5 = 1, so the record resides in user_1. This approach distributes load evenly but complicates scaling: changing the number of shards changes the modulo result for most IDs, so the affected rows must be migrated to their new shards.
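To make the migration cost concrete, the sketch below computes the modulo route and counts how many IDs would change shard when the shard count grows. The helper names (modulo_shard, moved_fraction) and the move from five to six shards are assumptions chosen for illustration.

```python
from typing import Iterable


def modulo_shard(user_id: int, shard_count: int) -> int:
    """Return the shard index for an ID under modulo sharding."""
    return user_id % shard_count


def moved_fraction(ids: Iterable[int], old_count: int, new_count: int) -> float:
    """Fraction of IDs whose shard index changes when the shard count changes."""
    id_list = list(ids)
    moved = sum(
        1
        for i in id_list
        if modulo_shard(i, old_count) != modulo_shard(i, new_count)
    )
    return moved / len(id_list)


if __name__ == "__main__":
    print("ID 31 with 5 shards -> shard", modulo_shard(31, 5))
    print("moved when growing 5 -> 6 shards:", moved_fraction(range(1, 31), 5, 6))
```

For IDs 1 to 30, only 5 of the 30 keep the same shard index when going from five to six shards, which is why resizing a modulo scheme usually implies a bulk migration.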
Combining Range and Modulo
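To get the benefits of both, apply modulo within each ID range. For instance, IDs 20 001‑40 000 belong to the range user_1; within that range, apply id % 5 to further distribute rows across five sub‑tables, user_1_0 … user_1_4. New IDs are then spread over several tables instead of all landing in the newest one, which reduces write hotspots, and capacity can still be added by opening a new range, possibly on another database.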
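A minimal sketch of the two‑step routing, assuming the same 20 000‑row ranges and five sub‑tables per range as in the example. It returns a pair of indexes rather than a table name, since the physical naming scheme is up to the deployment; combined_route is a hypothetical name.

```python
from typing import Tuple


def combined_route(
    user_id: int,
    rows_per_range: int = 20_000,
    tables_per_range: int = 5,
) -> Tuple[int, int]:
    """Return (range_index, sub_table_index) for a 1-based ID."""
    if user_id < 1:
        raise ValueError("IDs are assumed to start at 1")
    # Step 1: the ID range picks the group of tables (or the database).
    range_index = (user_id - 1) // rows_per_range
    # Step 2: modulo spreads rows evenly inside that group.
    sub_table_index = user_id % tables_per_range
    return range_index, sub_table_index


if __name__ == "__main__":
    for uid in (31, 20_001, 20_002, 40_000):
        print(uid, "->", combined_route(uid))
```

Consecutive IDs such as 20 001 and 20 002 fall into the same range but different sub‑tables, so a burst of inserts is spread over five tables rather than one.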
Read‑Amplification Problem
When queries use a non‑sharding column (e.g., name), the system cannot determine which shard holds the matching rows, so it must execute the query on every shard concurrently. As the number of shards grows, the number of queries—and thus the load—grows proportionally.
select * from user where name = "小白";
Solution 1: Auxiliary Index Table
Create a new table that stores only the primary key id and the indexed column name. Query this table first to obtain the relevant IDs, then fetch the full rows from the original shards using those IDs. This reduces the number of shards accessed and mirrors the behavior of an inverted index.
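The difference between the scatter query and the index‑table lookup can be shown with an in‑memory stand‑in for the shards. Everything here is a sketch under stated assumptions: five modulo shards held as Python dicts, a hypothetical ShardedUsers class, and a counter that records how many shards each lookup touches. A real deployment would also need to keep the index table consistent with the shards on every write.

```python
from collections import defaultdict
from typing import Dict, List, Set

SHARD_COUNT = 5  # assumed number of modulo shards


def shard_of(user_id: int) -> int:
    return user_id % SHARD_COUNT


class ShardedUsers:
    def __init__(self) -> None:
        self.shards: List[Dict[int, dict]] = [dict() for _ in range(SHARD_COUNT)]
        # Auxiliary index table: name -> set of primary-key ids.
        self.name_index: Dict[str, Set[int]] = defaultdict(set)
        self.shard_queries = 0  # how many shard lookups have been issued

    def insert(self, user_id: int, name: str, email: str) -> None:
        row = {"id": user_id, "name": name, "email": email}
        self.shards[shard_of(user_id)][user_id] = row
        self.name_index[name].add(user_id)

    def find_by_name_scatter(self, name: str) -> List[dict]:
        """No index table: every shard must be queried."""
        rows: List[dict] = []
        for shard in self.shards:
            self.shard_queries += 1
            rows.extend(r for r in shard.values() if r["name"] == name)
        return rows

    def find_by_name_indexed(self, name: str) -> List[dict]:
        """Index table first, then only the shards that hold matching ids."""
        ids_by_shard: Dict[int, List[int]] = defaultdict(list)
        for uid in sorted(self.name_index.get(name, ())):
            ids_by_shard[shard_of(uid)].append(uid)
        rows: List[dict] = []
        for shard_no, uids in ids_by_shard.items():
            self.shard_queries += 1
            rows.extend(self.shards[shard_no][u] for u in uids)
        return rows
```

With rows for ids 1 and 6 both named "小白", the scatter version issues five shard lookups while the indexed version issues one, because both ids live in shard 1.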
Solution 2: Elasticsearch Integration
Synchronize MySQL binlog changes to Elasticsearch using tools like canal. Elasticsearch provides near‑real‑time search and built‑in inverted indexes, allowing efficient multi‑dimensional queries without read amplification.
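The idea behind this pipeline, replaying row changes into an inverted index, can be illustrated with a toy in‑memory version. This is not the Canal message format or the Elasticsearch API: the event dictionaries ({"op", "id", "name"}) and the NameSearchIndex class are hypothetical stand‑ins used only to show how inserts, updates, and deletes keep a name‑to‑ids index current.

```python
from collections import defaultdict
from typing import Dict, List, Set


class NameSearchIndex:
    """Toy inverted index kept in sync by replaying row-change events."""

    def __init__(self) -> None:
        self._ids_by_name: Dict[str, Set[int]] = defaultdict(set)
        self._name_by_id: Dict[int, str] = {}

    def apply(self, event: dict) -> None:
        """Apply one change event (hypothetical shape: op, id, optional name)."""
        op, uid = event["op"], event["id"]
        if op not in ("insert", "update", "delete"):
            raise ValueError(f"unknown operation: {op}")
        # Remove the previous entry for this id, if there was one.
        old_name = self._name_by_id.pop(uid, None)
        if old_name is not None:
            self._ids_by_name[old_name].discard(uid)
            if not self._ids_by_name[old_name]:
                del self._ids_by_name[old_name]
        # Inserts and updates register the current name.
        if op != "delete":
            self._name_by_id[uid] = event["name"]
            self._ids_by_name[event["name"]].add(uid)

    def search(self, name: str) -> List[int]:
        """Return the ids whose current name matches exactly."""
        return sorted(self._ids_by_name.get(name, ()))


if __name__ == "__main__":
    index = NameSearchIndex()
    for ev in (
        {"op": "insert", "id": 1, "name": "小白"},
        {"op": "insert", "id": 2, "name": "bob"},
        {"op": "update", "id": 2, "name": "小白"},
        {"op": "delete", "id": 1},
    ):
        index.apply(ev)
    print(index.search("小白"))
```

Because the index is updated from the change stream rather than in the write path, searches may briefly lag behind MySQL, which matches the "near‑real‑time" behavior described above.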
Solution 3: TiDB Adoption
TiDB is a distributed SQL database that supports range‑based sharding and secondary indexes out of the box, eliminating the need for custom auxiliary tables. Migration tools simplify moving data from MySQL to TiDB.
Summary
When a single MySQL table becomes too large, horizontal sharding improves query performance.
Choose a shard key (usually the primary key) and split data by ID range or modulo; combining both methods balances load and scalability.
Queries on non‑shard columns cause read amplification; mitigate it with auxiliary index tables, Elasticsearch, or TiDB.
Avoid premature optimization—don’t create excessive shards unless necessary.
References
《图解分库分表》
https://mp.weixin.qq.com/s/OI5y4HMTuEZR1hoz9aOMxg
Su San Talks Tech
Su San, former staff at several leading tech companies, is a top creator on Juejin and a premium creator on CSDN, and runs the free coding practice site www.susan.net.cn.