
How to Solve Read‑Amplification in MySQL Sharding: Strategies & Tips

This article explains the concepts of database sharding, compares vertical and horizontal partitioning, discusses ID‑range and modulo sharding methods, analyzes the read‑amplification problem caused by non‑sharding keys, and presents practical solutions such as auxiliary index tables, Elasticsearch integration, and TiDB adoption.

Su San Talks Tech

Sharding Overview

As a single table grows very large, its B+-tree index gains additional levels, so each lookup reads more pages, causing more disk I/O and slower queries. To handle large data volumes, developers split the table into multiple smaller tables, a practice known as sharding (in Chinese "分库分表", literally "splitting databases and tables").

Vertical vs Horizontal Sharding

Vertical sharding moves a subset of columns to a new table, reducing row size and allowing more rows per data page.

Horizontal sharding is the most common approach: the original user table is divided into user_0, user_1, user_2 … user_N tables, each storing a portion of the data.

Sharding by ID Range

Assume each shard can hold about 20 000 rows. user_0 stores IDs 1‑20 000, user_1 stores 20 001‑40 000, and so on. The routing logic calculates the target shard from the ID range, allowing the business code to continue using a single logical user table while the middleware directs reads/writes to the appropriate physical table.
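
The range routing described above can be sketched in a few lines. This is a minimal illustration, not any particular middleware's implementation; the function name `range_shard` and the constant `ROWS_PER_SHARD` are hypothetical, and the capacity is the 20 000 rows assumed in the text.

```python
ROWS_PER_SHARD = 20_000  # per-shard capacity assumed in the text


def range_shard(user_id: int, rows_per_shard: int = ROWS_PER_SHARD) -> str:
    """Return the physical table name for a 1-based user ID."""
    if user_id < 1:
        raise ValueError("user_id must be >= 1")
    # IDs 1..20000 map to user_0, 20001..40000 to user_1, and so on.
    return f"user_{(user_id - 1) // rows_per_shard}"


print(range_shard(20_000))  # last ID of the first range
print(range_shard(20_001))  # first ID of the second range
```

Subtracting 1 before the integer division keeps the boundary IDs (20 000, 40 000, ...) in the shard the text assigns them to.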

Sharding by Modulo

Another common method takes the ID modulo the number of shards. For example, with five shards, ID 31 yields 31 % 5 = 1, so the record resides in user_1. This approach distributes load evenly but complicates scaling: changing the number of shards changes the modulo result for most IDs, so existing rows must be migrated.
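
A small sketch makes the scaling cost concrete. The helper names `modulo_shard` and `moved_fraction` are hypothetical, and the 30 sample IDs are chosen only for illustration.

```python
def modulo_shard(user_id: int, shard_count: int = 5) -> str:
    """Return the physical table name chosen by ID modulo shard count."""
    return f"user_{user_id % shard_count}"


def moved_fraction(ids, old_count: int, new_count: int) -> float:
    """Fraction of IDs whose shard changes when the shard count changes."""
    ids = list(ids)
    moved = sum(1 for i in ids if i % old_count != i % new_count)
    return moved / len(ids)


print(modulo_shard(31))                   # the example from the text
print(moved_fraction(range(1, 31), 5, 6))  # growing from 5 to 6 shards
```

For IDs 1 to 30, going from five to six shards relocates 25 of the 30 rows, which is why resharding under a plain modulo scheme usually means a large migration.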

Combining Range and Modulo

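To get the benefits of both, apply modulo within each ID-range shard. For instance, IDs 20 001-40 000 belong to the user_1 range; within that range, id % 5 distributes rows across five sub-tables, user_1_0 … user_1_4. With range sharding alone, all new inserts land in the newest shard; adding modulo spreads them over several tables, which reduces write hotspots. Capacity still grows by adding a new range, so existing rows do not have to move, and the ranges can be placed in different databases.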
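
The two-level routing can be sketched as follows. The function returns indices rather than table names, because the naming scheme is a deployment detail; `combined_shard` and its parameter names are hypothetical.

```python
def combined_shard(
    user_id: int, rows_per_range: int = 20_000, tables_per_range: int = 5
) -> tuple[int, int]:
    """Return (range index, sub-table index) for a 1-based user ID."""
    if user_id < 1:
        raise ValueError("user_id must be >= 1")
    range_index = (user_id - 1) // rows_per_range  # which ID range
    table_index = user_id % tables_per_range       # which sub-table in that range
    return range_index, table_index


print(combined_shard(20_001))  # first ID of the second range
print(combined_shard(20_005))  # same range, different sub-table
```

Consecutive IDs inside one range rotate through the sub-tables, so a burst of new inserts is spread over five tables instead of one.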

Read‑Amplification Problem

When a query filters on a non-sharding column (e.g., name), the router cannot determine which shard holds the matching rows, so it must run the query on every shard (typically concurrently) and merge the results. One logical read becomes N physical reads, so as the number of shards grows, the number of queries, and thus the load, grows proportionally.

select * from user where name = "小白";
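
The fan-out can be simulated with in-memory lists standing in for shards. This is only a toy model under stated assumptions: modulo sharding on id with five shards, made-up sample rows, and hypothetical function names. It counts how many shard queries each access path issues.

```python
from concurrent.futures import ThreadPoolExecutor

SHARD_COUNT = 5
# Each list stands in for one physical table, sharded by id % SHARD_COUNT.
shards = [[] for _ in range(SHARD_COUNT)]
for user_id, name in [(1, "Ann"), (2, "Bob"), (7, "Ann"), (13, "Cid")]:
    shards[user_id % SHARD_COUNT].append({"id": user_id, "name": name})


def query_shard(rows, name):
    """Stand-in for running the name query against one shard."""
    return [row for row in rows if row["name"] == name]


def find_by_name(name):
    """Scatter-gather: returns (matching rows, shard queries issued)."""
    with ThreadPoolExecutor(max_workers=SHARD_COUNT) as pool:
        per_shard = list(pool.map(lambda rows: query_shard(rows, name), shards))
    matches = [row for rows in per_shard for row in rows]
    return sorted(matches, key=lambda row: row["id"]), len(shards)


def find_by_id(user_id):
    """Routed lookup: the sharding key identifies exactly one shard."""
    rows = shards[user_id % SHARD_COUNT]
    return [row for row in rows if row["id"] == user_id], 1


print(find_by_name("Ann"))  # queries all 5 shards
print(find_by_id(7))        # queries 1 shard
```

The lookup by name always costs one query per shard, even though the matching rows live in only two of them.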

Solution 1: Auxiliary Index Table

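Create a new table that stores only the indexed column name and the primary key id, and index it on name. Query this table first to obtain the matching IDs, then fetch the full rows from the original shards by ID. Because the second step uses the sharding key, it touches only the shards that actually hold matching rows instead of all of them. The idea resembles an inverted index; the cost is that every write to the user table must also update the auxiliary table.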
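
A minimal sketch of the two-step lookup, using Python's built-in sqlite3 module with several tables in one in-memory database as a stand-in for MySQL shards. The table name `user_name_index`, the `email` column, the sample rows, and the modulo routing with five shards are all assumptions made for illustration.

```python
import sqlite3

SHARD_COUNT = 5
conn = sqlite3.connect(":memory:")
for i in range(SHARD_COUNT):
    conn.execute(f"CREATE TABLE user_{i} (id INTEGER PRIMARY KEY, name TEXT, email TEXT)")
# Auxiliary table: only the queried column and the sharding key.
conn.execute("CREATE TABLE user_name_index (name TEXT, id INTEGER PRIMARY KEY)")
conn.execute("CREATE INDEX idx_user_name ON user_name_index (name)")


def insert_user(user_id, name, email):
    """Write the full row to its shard and the (name, id) pair to the index."""
    conn.execute(
        f"INSERT INTO user_{user_id % SHARD_COUNT} VALUES (?, ?, ?)",
        (user_id, name, email),
    )
    conn.execute("INSERT INTO user_name_index VALUES (?, ?)", (name, user_id))


def find_by_name(name):
    """Return (full rows, shards touched) using the index table first."""
    ids = [r[0] for r in conn.execute(
        "SELECT id FROM user_name_index WHERE name = ?", (name,))]
    by_shard = {}
    for user_id in ids:  # group IDs so each shard is queried once
        by_shard.setdefault(user_id % SHARD_COUNT, []).append(user_id)
    rows = []
    for shard, shard_ids in by_shard.items():
        marks = ",".join("?" for _ in shard_ids)
        rows += conn.execute(
            f"SELECT id, name, email FROM user_{shard} WHERE id IN ({marks})",
            shard_ids,
        ).fetchall()
    return sorted(rows), sorted(by_shard)


insert_user(1, "Ann", "ann@example.com")
insert_user(2, "Bob", "bob@example.com")
insert_user(6, "Ann", "ann6@example.com")
print(find_by_name("Ann"))  # both rows come from a single shard
```

In this toy setup both writes share one connection. In a real deployment the index table and the shards may live in different databases, so keeping them consistent needs separate care.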

Solution 2: Elasticsearch Integration

Synchronize MySQL binlog changes to Elasticsearch using tools like canal. Elasticsearch provides near‑real‑time search and built‑in inverted indexes, allowing efficient multi‑dimensional queries without read amplification.

Solution 3: TiDB Adoption

TiDB is a distributed SQL database that is compatible with the MySQL protocol. It splits data by key range automatically and supports secondary indexes out of the box, eliminating the need for custom auxiliary tables. Migration tools such as TiDB Data Migration (DM) simplify moving data from MySQL to TiDB.

Summary

When a single MySQL table becomes too large, horizontal sharding improves query performance.

Choose a shard key (usually the primary key) and split data by ID range or modulo; combining both methods balances load and scalability.

Queries on non‑shard columns cause read amplification; mitigate it with auxiliary index tables, Elasticsearch, or TiDB.

Avoid premature optimization: shard only when data volume or load actually requires it, and don't create more shards than necessary.

References

Illustrated Guide to Database and Table Sharding (《图解分库分表》)

https://mp.weixin.qq.com/s/OI5y4HMTuEZR1hoz9aOMxg
