Databases 15 min read

Master Database Sharding: Solving Read Diffusion with Range and Modulo Strategies

This article explains why large MySQL tables degrade performance, introduces vertical and horizontal sharding techniques—including range‑based and modulo‑based partitioning—covers the read‑diffusion problem caused by non‑sharding keys, and presents practical solutions such as auxiliary index tables, Elasticsearch integration, and TiDB as a distributed alternative.

macrozheng
macrozheng
macrozheng
Master Database Sharding: Solving Read Diffusion with Range and Modulo Strategies

Database Sharding

In typical project development we start with a single table, but when a table grows beyond 200k rows the underlying B+ tree deepens, causing more disk I/O and slower queries.

Therefore, when a single table holds too much data we need to consider sharding . Sharding can be vertical or horizontal.

Vertical Sharding

Vertical sharding simply moves a few columns to a new table, reducing row size so more rows fit into each 16KB data page.

Horizontal Sharding

Horizontal sharding splits the original user table into many small tables such as user_0, user_1, ... user_N. The application still reads/writes a logical user table, while a routing layer directs operations to the appropriate physical table.

Each small table typically stores 500k–2M rows. The routing logic determines which table an id belongs to by calculating the range:

For example, if each table holds 2M rows, user0 stores ids 1–2M, user1 stores 2M+1–4M, and so on.

Sharding illustration
Sharding illustration

Range‑Based Sharding Example

Assume each table can hold 2M rows. An id of 3M falls into user1 because 3M / 2M = 1.5, floor to 1, so the data is routed to user1. The business code continues to operate on a single logical table without knowing the underlying split.

Modulo‑Based Sharding

Another common approach is to shard by id % N. For 5 tables, an id of 31 maps to user1 because 31 % 5 = 1. This distributes reads and writes evenly across tables.

Modulo sharding illustration
Modulo sharding illustration

Modulo sharding is simple but has a drawback: when the number of tables changes, the mapping changes, requiring data migration.

Read Diffusion Problem

When a non‑sharding column (e.g., name) is queried, the system cannot determine which physical table to read, so the query must be executed on all shards concurrently. With 100 shards this means 100 queries; with 200 shards, 200 queries, leading to the read‑diffusion issue.

Read diffusion illustration
Read diffusion illustration

Introducing an Auxiliary Index Table

To avoid scanning all shards, create a new table that stores only the primary key id and the non‑sharding column (e.g., name). Query this auxiliary table first to obtain the relevant id s, then fetch the rows from the original sharded tables using those ids. This mimics an inverted index.

Auxiliary index table solution
Auxiliary index table solution

The drawback is the need to maintain two tables and keep them in sync when the indexed column changes.

Using Elasticsearch for Multi‑Dimensional Queries

Elasticsearch provides built‑in sharding and inverted indexes, allowing near‑real‑time queries on fields other than the primary key. Data can be synchronized from MySQL via tools like canal that capture binlog changes.

MySQL to Elasticsearch sync
MySQL to Elasticsearch sync

Switching to TiDB

TiDB is a distributed SQL database that supports range‑based sharding (e.g., 0–2M, 2M–4M) and also shards secondary indexes, offering a similar experience to the Elasticsearch approach but with MySQL‑compatible syntax.

TiDB architecture
TiDB architecture

Summary

When a single MySQL table grows large, query performance degrades; horizontal sharding is needed.

Horizontal sharding requires a shard key, usually the primary key, and can be implemented by range or modulo partitioning.

Queries on non‑shard keys cause read diffusion; solutions include auxiliary index tables, Elasticsearch integration, or using a distributed database like TiDB.

Avoid premature optimization; do not create an excessive number of shards unless truly necessary.

Final Thoughts

In a real‑world game project we planned for billions of registrations and hundreds of thousands of concurrent users, ultimately partitioning data into four tables using range‑based sharding, which proved sufficient for the load.

Project demo GIF
Project demo GIF
Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Sign in to view source
Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactadmin@besthub.devand we will review it promptly.

ElasticsearchTiDBdatabase shardinghorizontal partitioningread diffusionRange Shardingauxiliary index tablemodulo sharding
macrozheng
Written by

macrozheng

Dedicated to Java tech sharing and dissecting top open-source projects. Topics include Spring Boot, Spring Cloud, Docker, Kubernetes and more. Author’s GitHub project “mall” has 50K+ stars.

0 followers
Reader feedback

How this landed with the community

Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.