Master Database Sharding: Solving Read Diffusion with Range and Modulo Strategies
This article explains why large MySQL tables degrade performance, introduces vertical and horizontal sharding techniques—including range‑based and modulo‑based partitioning—covers the read‑diffusion problem caused by non‑sharding keys, and presents practical solutions such as auxiliary index tables, Elasticsearch integration, and TiDB as a distributed alternative.
Database Sharding
In typical project development we start with a single table. Once that table grows very large (as a rule of thumb, into the tens of millions of rows), the underlying B+ tree gains a level, so each lookup needs more disk I/O and queries slow down.
Therefore, when a single table holds too much data, we need to consider sharding. Sharding can be vertical or horizontal.
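As a rough back-of-the-envelope check on how row count relates to tree height, the sketch below estimates how many rows a clustered B+ tree can hold at a given height. The 1 KB row size, 8-byte key, and 6-byte page pointer are assumptions chosen for illustration, and page headers and fill factor are ignored, so the results are orders of magnitude only.

```python
PAGE_SIZE = 16 * 1024  # InnoDB's default page size, in bytes


def btree_capacity(height, row_bytes=1024, key_bytes=8, pointer_bytes=6):
    """Estimate the maximum rows held by a B+ tree with `height` levels.

    Assumed sizes (illustrative only): 1 KB rows, 8-byte keys, 6-byte page pointers.
    """
    fanout = PAGE_SIZE // (key_bytes + pointer_bytes)  # child pointers per internal page
    rows_per_leaf = PAGE_SIZE // row_bytes             # rows stored in each leaf page
    return fanout ** (height - 1) * rows_per_leaf


for h in (2, 3, 4):
    print(h, btree_capacity(h))
```

Under these assumptions a three-level tree tops out at roughly 20 million rows. Beyond that a fourth level is needed, which adds one more page read to every lookup.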
Vertical Sharding
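Vertical sharding moves some columns (typically large or rarely read ones) into a separate table that shares the same primary key. Each row in the original table becomes smaller, so more rows fit into each 16KB data page.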
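The effect on page density can be sketched with simple arithmetic. The column names and byte sizes below are hypothetical, and per-page and per-record overhead is ignored.

```python
PAGE_SIZE = 16 * 1024  # InnoDB's default page size, in bytes


def rows_per_page(row_bytes):
    """Approximate rows per data page, ignoring page and record overhead."""
    return PAGE_SIZE // row_bytes


# Hypothetical column sizes (bytes) for a `user` row.
hot_columns = {"id": 8, "name": 64, "age": 4}
cold_columns = {"bio": 900}  # rarely read, so a candidate for a separate table

before = rows_per_page(sum(hot_columns.values()) + sum(cold_columns.values()))
after = rows_per_page(sum(hot_columns.values()))
print(before, after)
```

With these made-up sizes, moving the one large column out raises the density from 16 rows per page to 215, so the same scan touches far fewer pages.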
Horizontal Sharding
Horizontal sharding splits the original user table into many small tables such as user_0, user_1, ... user_N. The application still reads/writes a logical user table, while a routing layer directs operations to the appropriate physical table.
Each small table typically stores 500k–2M rows. The routing logic determines which table an id belongs to by calculating the range:
For example, if each table holds 2M rows, user_0 stores ids 1 to 2M, user_1 stores ids 2M+1 to 4M, and so on.
Range‑Based Sharding Example
Assume each table can hold 2M rows. With ids starting at 1, the table index is floor((id - 1) / 2M). An id of 3M gives (3M - 1) / 2M ≈ 1.5, which floors to 1, so the data is routed to user_1. The business code continues to operate on a single logical table without knowing the underlying split.
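A minimal sketch of such a routing function follows. The function name and the `user_` table prefix are illustrative, and ids are assumed to start at 1.

```python
ROWS_PER_TABLE = 2_000_000


def range_shard(user_id, rows_per_table=ROWS_PER_TABLE):
    """Return the physical table name for `user_id` under range-based sharding."""
    if user_id < 1:
        raise ValueError("ids are assumed to start at 1")
    # Subtract 1 so that ids 1..2M land in table 0 and 2M+1..4M in table 1.
    index = (user_id - 1) // rows_per_table
    return f"user_{index}"


print(range_shard(3_000_000))  # id 3M
print(range_shard(2_000_000))  # last id of the first range
print(range_shard(2_000_001))  # first id of the second range
```

Subtracting 1 before dividing keeps the boundary id 2M in the first table. A plain `id // 2M` would push it into the second.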
Modulo‑Based Sharding
Another common approach is to shard by id % N. For 5 tables, an id of 31 maps to user_1 because 31 % 5 = 1. As long as ids are assigned roughly uniformly (for example, by auto-increment), this spreads reads and writes evenly across tables.
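The modulo rule and its even spread can be sketched as follows. The function name and table prefix are illustrative, and sequential ids are assumed.

```python
from collections import Counter


def mod_shard(user_id, table_count=5):
    """Return the physical table name for `user_id` under modulo sharding."""
    return f"user_{user_id % table_count}"


print(mod_shard(31))

# Sequential ids 1..1000 spread across the 5 tables.
counts = Counter(mod_shard(i) for i in range(1, 1001))
print(sorted(counts.items()))
```

Each of the five tables receives exactly 200 of the 1000 sequential ids.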
Modulo sharding is simple but has a drawback: when the number of tables changes, the mapping changes, requiring data migration.
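To see how much data a change in table count can move, the sketch below counts the ids whose target table changes. The id range and table counts are arbitrary examples.

```python
def moved_ids(ids, old_count, new_count):
    """Return the ids whose shard index changes when the table count changes."""
    return [i for i in ids if i % old_count != i % new_count]


ids = range(1, 3001)
moved = moved_ids(ids, old_count=5, new_count=6)
print(len(moved), len(moved) / len(ids))
```

Going from 5 to 6 tables relocates 2500 of the 3000 ids, about five sixths of the data. Doubling the count instead (5 to 10) moves only half, which is one reason table counts are often grown by doubling.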
Read Diffusion Problem
When a non‑sharding column (e.g., name) is queried, the system cannot determine which physical table to read, so the query must be executed on all shards concurrently. With 100 shards this means 100 queries; with 200 shards, 200 queries, leading to the read‑diffusion issue.
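The difference between the two access paths can be sketched with in-memory dictionaries standing in for physical tables. The shard count, sample data, and function names are all hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

SHARD_COUNT = 4

# Each shard is modelled as {id: name}; rows are placed by id % SHARD_COUNT.
shards = [dict() for _ in range(SHARD_COUNT)]
for user_id, name in [(1, "alice"), (2, "bob"), (3, "carol"), (6, "bob")]:
    shards[user_id % SHARD_COUNT][user_id] = name


def query_shard(shard, name):
    """Stand-in for `SELECT id FROM user_k WHERE name = ?` on one shard."""
    return [user_id for user_id, n in shard.items() if n == name]


def find_by_id(user_id):
    """Sharding-key lookup: exactly one shard is touched."""
    return shards[user_id % SHARD_COUNT].get(user_id)


def find_ids_by_name(name):
    """Non-sharding-key lookup: every shard must be queried (read diffusion)."""
    with ThreadPoolExecutor(max_workers=SHARD_COUNT) as pool:
        per_shard = list(pool.map(lambda s: query_shard(s, name), shards))
    queries_issued = len(per_shard)
    ids = sorted(i for hits in per_shard for i in hits)
    return ids, queries_issued


print(find_by_id(3))
print(find_ids_by_name("bob"))
```

The lookup by name issues one query per shard even though both matching rows live in the same shard, because the router has no way to know that in advance.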
Introducing an Auxiliary Index Table
To avoid scanning all shards, create a new table that stores only the primary key id and the non‑sharding column (e.g., name). Query this auxiliary table first to obtain the relevant ids, then fetch the rows from the original sharded tables using those ids. This works much like an inverted index from name to id.
The drawback is the need to maintain two tables and keep them in sync when the indexed column changes.
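A minimal sketch of the two-step lookup and the synchronization burden, using an in-memory SQLite database as a stand-in for MySQL. The table name `user_name_idx`, the modulo routing, and the helper functions are assumptions for illustration.

```python
import sqlite3

SHARD_COUNT = 4
db = sqlite3.connect(":memory:")
for k in range(SHARD_COUNT):
    db.execute(f"CREATE TABLE user_{k} (id INTEGER PRIMARY KEY, name TEXT, age INTEGER)")
# Auxiliary table: only the non-sharding column and the primary key.
db.execute("CREATE TABLE user_name_idx (name TEXT, id INTEGER PRIMARY KEY)")
db.execute("CREATE INDEX idx_name ON user_name_idx (name)")


def insert_user(user_id, name, age):
    """Write the row to its shard and the (name, id) pair to the auxiliary table."""
    with db:  # a single transaction here; across separate databases this needs more care
        db.execute(f"INSERT INTO user_{user_id % SHARD_COUNT} VALUES (?, ?, ?)",
                   (user_id, name, age))
        db.execute("INSERT INTO user_name_idx VALUES (?, ?)", (name, user_id))


def rename_user(user_id, new_name):
    """Changing the indexed column means updating both tables."""
    with db:
        db.execute(f"UPDATE user_{user_id % SHARD_COUNT} SET name = ? WHERE id = ?",
                   (new_name, user_id))
        db.execute("UPDATE user_name_idx SET name = ? WHERE id = ?", (new_name, user_id))


def find_by_name(name):
    """Step 1: resolve ids via the auxiliary table. Step 2: fetch rows by id."""
    ids = [row[0] for row in db.execute(
        "SELECT id FROM user_name_idx WHERE name = ? ORDER BY id", (name,))]
    rows = []
    for user_id in ids:
        rows += db.execute(
            f"SELECT id, name, age FROM user_{user_id % SHARD_COUNT} WHERE id = ?",
            (user_id,)).fetchall()
    return rows


insert_user(1, "alice", 30)
insert_user(2, "bob", 25)
insert_user(6, "bob", 41)
print(find_by_name("bob"))
```

Only the shards that actually hold matching ids are touched. The cost is visible in `rename_user`: every write to the indexed column becomes two writes that must succeed or fail together.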
Using Elasticsearch for Multi‑Dimensional Queries
Elasticsearch provides built‑in sharding and inverted indexes, allowing near‑real‑time queries on fields other than the primary key. Data can be synchronized from MySQL via tools like canal that capture binlog changes.
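The synchronization idea can be sketched as a stream of row-change events applied to a search-side index. This is a toy in-memory inverted index, not the Elasticsearch API, and the event format (`op`, `id`, `name`) is hypothetical rather than canal's actual message schema.

```python
from collections import defaultdict


class NameIndex:
    """Toy inverted index: name -> set of row ids."""

    def __init__(self):
        self.postings = defaultdict(set)
        self.current = {}  # id -> name currently indexed for that row

    def apply(self, event):
        """Apply one change event of the hypothetical form {"op", "id", "name"}."""
        user_id = event["id"]
        # Drop any stale posting first, so updates and deletes stay consistent.
        old_name = self.current.pop(user_id, None)
        if old_name is not None:
            self.postings[old_name].discard(user_id)
        if event["op"] in ("insert", "update"):
            self.postings[event["name"]].add(user_id)
            self.current[user_id] = event["name"]

    def search(self, name):
        """Return the sorted ids whose indexed name equals `name`."""
        return sorted(self.postings.get(name, ()))


index = NameIndex()
events = [
    {"op": "insert", "id": 1, "name": "alice"},
    {"op": "insert", "id": 2, "name": "bob"},
    {"op": "update", "id": 2, "name": "robert"},
    {"op": "delete", "id": 1},
]
for event in events:
    index.apply(event)
print(index.search("robert"), index.search("bob"), index.search("alice"))
```

Because the index is fed from the change log rather than written by the application, the application code only ever writes to MySQL. The trade-off is a short delay before a change becomes searchable.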
Switching to TiDB
TiDB is a distributed SQL database that supports range‑based sharding (e.g., 0–2M, 2M–4M) and also shards secondary indexes, offering a similar experience to the Elasticsearch approach but with MySQL‑compatible syntax.
Summary
When a single MySQL table grows large, query performance degrades; horizontal sharding is needed.
Horizontal sharding requires a shard key, usually the primary key, and can be implemented by range or modulo partitioning.
Queries on non‑shard keys cause read diffusion; solutions include auxiliary index tables, Elasticsearch integration, or using a distributed database like TiDB.
Avoid premature optimization; do not create an excessive number of shards unless truly necessary.
Final Thoughts
In a real‑world game project we planned for billions of registrations and hundreds of thousands of concurrent users, ultimately partitioning data into four tables using range‑based sharding, which proved sufficient for the load.
macrozheng
Dedicated to Java tech sharing and dissecting top open-source projects. Topics include Spring Boot, Spring Cloud, Docker, Kubernetes and more. Author’s GitHub project “mall” has 50K+ stars.
