Master Database Sharding: Solving Read Diffusion with Range and Modulo Strategies

This article explains why large MySQL tables degrade performance, introduces vertical and horizontal sharding techniques—including range‑based and modulo‑based partitioning—covers the read‑diffusion problem caused by non‑sharding keys, and presents practical solutions such as auxiliary index tables, Elasticsearch integration, and TiDB as a distributed alternative.

macrozheng

Sep 19, 2025

Database Sharding

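In typical project development we start with a single table. Once that table grows very large (a commonly cited rule of thumb for InnoDB is roughly 20 million rows, though the real threshold depends on row size and hardware), the underlying B+ tree gains a level, so each lookup costs more disk I/O and queries slow down.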

Therefore, when a single table holds too much data, we need to consider sharding. Sharding can be vertical or horizontal.

Vertical Sharding

Vertical sharding moves some of a table's columns into a new table that shares the same primary key. Rows in the original table become smaller, so more of them fit into each 16KB data page.
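To see why row size matters, a rough back-of-the-envelope estimate of B+ tree capacity can help. The sketch below is an approximation, not exact InnoDB accounting: it assumes 16KB pages, an 8-byte BIGINT key plus a 6-byte child pointer per internal entry (a commonly used simplification), completely full pages, and no page header overhead. The function name and defaults are illustrative.

```python
PAGE_BYTES = 16 * 1024  # InnoDB default page size


def approx_row_capacity(row_bytes, height=3, key_bytes=8, pointer_bytes=6):
    """Estimate how many rows a B+ tree of the given height can hold.

    Assumptions: fixed-size rows, full pages, no page header overhead.
    """
    # Entries per internal (non-leaf) page.
    fanout = PAGE_BYTES // (key_bytes + pointer_bytes)
    # Rows per leaf page.
    rows_per_leaf = PAGE_BYTES // row_bytes
    # (height - 1) internal levels fan out to the leaf level.
    return fanout ** (height - 1) * rows_per_leaf


if __name__ == "__main__":
    for row_bytes in (1024, 512):
        print(row_bytes, approx_row_capacity(row_bytes))
```

Under these assumptions, halving the row size from 1KB to 512 bytes doubles the number of rows a three-level tree can hold, which is the effect vertical sharding aims for.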

Horizontal Sharding

Horizontal sharding splits the original user table into many small tables such as user_0, user_1, ... user_N. The application still reads/writes a logical user table, while a routing layer directs operations to the appropriate physical table.

Each small table typically stores 500k–2M rows. With range-based routing, the routing layer determines which table an id belongs to from the id range that each table covers.

For example, if each table holds 2M rows, user_0 stores ids 1–2M, user_1 stores ids 2M+1–4M, and so on.

Range‑Based Sharding Example

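Assume each table can hold 2M rows. An id of 3M falls into user_1 because 3M / 2M = 1.5, which floors to 1, so the data is routed to user_1. To match 1-based ranges exactly (so that id 2M itself stays in user_0), the table index should be computed as floor((id - 1) / 2M). The business code continues to operate on a single logical table without knowing about the underlying split.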
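The range calculation can be sketched as a small routing function. This is a minimal illustration, assuming ids start at 1 and tables are named user_0, user_1, and so on; the function name `range_table` is hypothetical and not taken from any particular sharding middleware.

```python
ROWS_PER_TABLE = 2_000_000  # capacity per physical table, as in the example


def range_table(user_id, rows_per_table=ROWS_PER_TABLE):
    """Return the physical table name for a 1-based id under range sharding."""
    if user_id < 1:
        raise ValueError("ids are assumed to start at 1")
    # Subtract 1 so that ids 1..2M land in user_0 and 2M+1..4M in user_1.
    return f"user_{(user_id - 1) // rows_per_table}"


if __name__ == "__main__":
    for user_id in (1, 2_000_000, 2_000_001, 3_000_000):
        print(user_id, range_table(user_id))
```

A practical attraction of this scheme is that adding capacity only requires creating the next table; existing rows never move.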

Modulo‑Based Sharding

Another common approach is to shard by id % N, where N is the number of tables. For 5 tables, an id of 31 maps to user_1 because 31 % 5 = 1. With sequential ids, this distributes reads and writes evenly across tables.
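A minimal sketch of modulo routing, together with a quick check of how evenly sequential ids spread out. The function name `modulo_table` and the id range used for the check are illustrative assumptions.

```python
from collections import Counter


def modulo_table(user_id, table_count=5):
    """Return the physical table name for an id under modulo sharding."""
    return f"user_{user_id % table_count}"


def distribution(ids, table_count=5):
    """Count how many of the given ids land in each table."""
    return Counter(modulo_table(i, table_count) for i in ids)


if __name__ == "__main__":
    print(modulo_table(31))
    print(distribution(range(1, 1001)))
```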

Modulo sharding is simple but has a drawback: when the number of tables changes, many ids map to a different table than before, so existing rows must be migrated.
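The migration cost can be estimated by counting how many ids change tables when the table count changes. The sketch below is illustrative only; `moved_ids` is a hypothetical helper and the id range is arbitrary.

```python
def moved_ids(ids, old_count, new_count):
    """Return the ids whose table index changes when resharding by modulo."""
    return [i for i in ids if i % old_count != i % new_count]


if __name__ == "__main__":
    ids = range(1, 3001)
    for new_count in (6, 10):
        moved = moved_ids(ids, 5, new_count)
        print(f"5 -> {new_count} tables: {len(moved)} of {len(ids)} ids move")
```

In this sketch, going from 5 to 6 tables relocates five out of every six ids, while doubling to 10 tables relocates half of them, which is one reason table counts are often grown by doubling.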

Read Diffusion Problem

When a query filters on a non-sharding column (e.g., name), the routing layer cannot tell which physical table holds the matching rows, so the query must be executed on all shards concurrently. With 100 shards this means 100 queries; with 200 shards, 200 queries. This fan-out is the read-diffusion problem.
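The fan-out can be reproduced with a toy setup. The sketch below uses Python's built-in sqlite3 module as a stand-in for MySQL, with a hypothetical four modulo-sharded tables in one in-memory database; it queries the shards one after another rather than concurrently, purely to keep the example short.

```python
import sqlite3

N_SHARDS = 4  # hypothetical shard count for the demo

conn = sqlite3.connect(":memory:")
for i in range(N_SHARDS):
    conn.execute(f"CREATE TABLE user_{i} (id INTEGER PRIMARY KEY, name TEXT)")


def insert_user(user_id, name):
    """Route the write by the shard key (id % N_SHARDS)."""
    table = f"user_{user_id % N_SHARDS}"
    conn.execute(f"INSERT INTO {table} (id, name) VALUES (?, ?)", (user_id, name))


def find_by_name(name):
    """name is not the shard key, so every shard has to be queried."""
    rows, queries = [], 0
    for i in range(N_SHARDS):
        cursor = conn.execute(f"SELECT id, name FROM user_{i} WHERE name = ?", (name,))
        rows += cursor.fetchall()
        queries += 1
    return sorted(rows), queries


for uid, uname in [(1, "alice"), (2, "bob"), (7, "alice"), (12, "carol")]:
    insert_user(uid, uname)

if __name__ == "__main__":
    print(find_by_name("alice"))
```

The number of queries issued equals the number of shards regardless of how many rows match, which is why the cost grows as shards are added.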

Introducing an Auxiliary Index Table

To avoid scanning all shards, create a new table that stores only the primary key id and the non-sharding column (e.g., name). Query this auxiliary table first to obtain the relevant ids, then fetch the rows from the original sharded tables using those ids. The auxiliary table plays a role similar to an inverted index: it maps a column value back to the ids that carry it.

The drawback is the need to maintain two tables and keep them in sync whenever rows are inserted or deleted or the indexed column changes.
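The two-step lookup and the synchronization burden can be sketched as follows. This is a toy under stated assumptions: sqlite3 stands in for MySQL, everything lives in one in-memory database so a single transaction can cover both writes, and the names `user_name_index`, `insert_user`, `rename_user`, and `find_by_name` are hypothetical. In a real deployment the index table and the shards may sit on different instances, where a single local transaction is not available.

```python
import sqlite3
from collections import defaultdict

N_SHARDS = 4  # hypothetical shard count for the demo

conn = sqlite3.connect(":memory:")
for i in range(N_SHARDS):
    conn.execute(f"CREATE TABLE user_{i} (id INTEGER PRIMARY KEY, name TEXT)")
# Auxiliary table: only the primary key and the non-sharding column.
conn.execute("CREATE TABLE user_name_index (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE INDEX idx_user_name ON user_name_index (name)")


def insert_user(user_id, name):
    with conn:  # one transaction writes the shard row and the index row
        table = f"user_{user_id % N_SHARDS}"
        conn.execute(f"INSERT INTO {table} (id, name) VALUES (?, ?)", (user_id, name))
        conn.execute("INSERT INTO user_name_index (id, name) VALUES (?, ?)", (user_id, name))


def rename_user(user_id, new_name):
    with conn:  # both copies of the name must change together
        table = f"user_{user_id % N_SHARDS}"
        conn.execute(f"UPDATE {table} SET name = ? WHERE id = ?", (new_name, user_id))
        conn.execute("UPDATE user_name_index SET name = ? WHERE id = ?", (new_name, user_id))


def find_by_name(name):
    """Look up ids in the index table, then query only the shards that own them."""
    cursor = conn.execute("SELECT id FROM user_name_index WHERE name = ?", (name,))
    by_shard = defaultdict(list)
    for (user_id,) in cursor.fetchall():
        by_shard[user_id % N_SHARDS].append(user_id)
    rows = []
    for shard, shard_ids in by_shard.items():
        placeholders = ",".join("?" * len(shard_ids))
        sql = f"SELECT id, name FROM user_{shard} WHERE id IN ({placeholders})"
        rows += conn.execute(sql, shard_ids).fetchall()
    # One index query plus one query per shard that actually holds a match.
    return sorted(rows), 1 + len(by_shard)


for uid, uname in [(1, "alice"), (2, "bob"), (7, "alice"), (12, "carol")]:
    insert_user(uid, uname)
```

Here a lookup touches only the shards that contain matching ids, at the price of an extra write on every insert and rename.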

Using Elasticsearch for Multi‑Dimensional Queries

Elasticsearch provides built-in sharding and inverted indexes, allowing near-real-time queries on fields other than the primary key. Data can be synchronized from MySQL with tools such as Canal, which captures binlog changes and forwards them to downstream systems.
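The idea of keeping a search index up to date from a stream of row changes can be illustrated with a toy in-memory inverted index. This is a conceptual sketch only: the event dictionaries below are a made-up shape, not Canal's message format, and the `NameIndex` class is hypothetical and does not use or imitate the Elasticsearch API.

```python
from collections import defaultdict


class NameIndex:
    """Toy inverted index from name to the set of ids that carry it."""

    def __init__(self):
        self._ids_by_name = defaultdict(set)
        self._name_by_id = {}

    def apply(self, event):
        """Apply one change event: {"op": "insert" | "update" | "delete", "id": ..., "name": ...}."""
        user_id = event["id"]
        # Drop the old posting first, so updates and deletes leave no stale entry.
        old_name = self._name_by_id.pop(user_id, None)
        if old_name is not None:
            self._ids_by_name[old_name].discard(user_id)
        if event["op"] != "delete":
            self._name_by_id[user_id] = event["name"]
            self._ids_by_name[event["name"]].add(user_id)

    def search(self, name):
        """Return the ids whose current name matches, in ascending order."""
        return sorted(self._ids_by_name.get(name, ()))


index = NameIndex()
for change in [
    {"op": "insert", "id": 1, "name": "alice"},
    {"op": "insert", "id": 7, "name": "alice"},
    {"op": "update", "id": 7, "name": "dave"},
    {"op": "delete", "id": 1},
]:
    index.apply(change)
```

Because the index is fed from a change stream rather than written by the application, it lags slightly behind the source tables, which matches the near-real-time behavior described above.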

Switching to TiDB

TiDB is a distributed SQL database that supports range‑based sharding (e.g., 0–2M, 2M–4M) and also shards secondary indexes, offering a similar experience to the Elasticsearch approach but with MySQL‑compatible syntax.

Summary

When a single MySQL table grows large, query performance degrades; horizontal sharding is needed.

Horizontal sharding requires a shard key, usually the primary key, and can be implemented by range or modulo partitioning.

Queries on non‑shard keys cause read diffusion; solutions include auxiliary index tables, Elasticsearch integration, or using a distributed database like TiDB.

Avoid premature optimization; do not create an excessive number of shards unless truly necessary.

Final Thoughts

In a real‑world game project we planned for billions of registrations and hundreds of thousands of concurrent users, ultimately partitioning data into four tables using range‑based sharding, which proved sufficient for the load.
