
Mastering Database Sharding: Solving Read Amplification and Partition Strategies

This article explains database sharding concepts, compares vertical and horizontal partitioning, details ID‑range and modulo sharding methods, discusses the read‑amplification issue caused by non‑shard queries, and presents solutions such as auxiliary index tables, Elasticsearch integration, and TiDB as alternatives.

Su San Talks Tech

Oct 19, 2023

Sharding Basics

Once a single table grows beyond roughly 20 million rows (a common rule of thumb), its B+ tree index tends to gain a level, so each lookup needs more disk I/O and queries slow down. To handle that much data, the table is split into several smaller tables, either vertically (by columns) or horizontally (by rows).
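The 20 million figure can be motivated with a back-of-the-envelope calculation. The sketch below is only an estimate: the 16 KB page size matches InnoDB's default, but the key size, pointer size, and average row size are assumptions chosen for illustration, not values from this article.

```python
# Rough capacity of a B+ tree index under assumed sizes.
PAGE_SIZE = 16 * 1024   # bytes per page (InnoDB's default page size)
KEY_SIZE = 8            # assumed: BIGINT primary key
POINTER_SIZE = 6        # assumed: size of a child-page pointer
ROW_SIZE = 1024         # assumed: average row size in bytes


def max_rows(height: int) -> int:
    """Approximate number of rows a B+ tree with `height` levels can address."""
    fanout = PAGE_SIZE // (KEY_SIZE + POINTER_SIZE)  # entries per internal page
    rows_per_leaf = PAGE_SIZE // ROW_SIZE            # rows stored per leaf page
    return fanout ** (height - 1) * rows_per_leaf


for height in (2, 3, 4):
    print(height, max_rows(height))
```

Under these assumptions a three-level tree tops out at about 21.9 million rows, so growing past that point adds a fourth level and one more page read per lookup. Smaller rows raise the threshold and larger rows lower it.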

Horizontal Partitioning (Sharding)

Horizontal sharding creates many small tables such as user_0, user_1 … user_N, each storing a subset of rows. Typical shard size is 500k–2M rows per table.

Sharding by ID Range

Assuming each shard holds 2M rows, IDs 1 to 2M go to user_0, IDs 2M+1 to 4M go to user_1, and so on. Because IDs start at 1, the routing logic computes the shard index as (ID - 1) divided by 2M, rounded down.
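A minimal sketch of this routing rule is shown below. The function name `range_shard` is hypothetical, and the code assumes IDs are positive integers starting at 1, which is why the ID is shifted by one before dividing.

```python
SHARD_SIZE = 2_000_000  # rows per shard, as in the example above


def range_shard(user_id: int, shard_size: int = SHARD_SIZE) -> str:
    """Return the table name for an ID under range sharding."""
    if user_id < 1:
        raise ValueError("IDs are assumed to start at 1")
    # Shift by one so that IDs 1..2M map to index 0, 2M+1..4M to index 1, ...
    return f"user_{(user_id - 1) // shard_size}"


print(range_shard(2_000_000))  # last ID of the first range
print(range_shard(2_000_001))  # first ID of the second range
```

Without the shift, ID 2,000,000 would be routed to user_1 even though it belongs to the first range.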

Sharding by Modulo

Another common method uses ID % N to select a shard. For example, ID 31 with 5 shards maps to user_1 because 31 % 5 = 1. This approach is simple, but it makes changing the number of shards difficult: once N changes, most IDs map to a different shard, so existing rows have to be migrated.
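The migration cost can be made concrete with a small sketch. The helper names `mod_shard` and `moved_fraction` are hypothetical; the second one simply counts how many IDs would land on a different shard if the shard count changed.

```python
def mod_shard(user_id: int, num_shards: int) -> str:
    """Return the table name for an ID under modulo sharding."""
    return f"user_{user_id % num_shards}"


def moved_fraction(ids, old_n: int, new_n: int) -> float:
    """Fraction of IDs whose shard changes when going from old_n to new_n shards."""
    ids = list(ids)
    moved = sum(1 for i in ids if i % old_n != i % new_n)
    return moved / len(ids)


print(mod_shard(31, 5))                       # the example above
print(moved_fraction(range(1, 30_001), 5, 6)) # growing from 5 to 6 shards
```

For consecutive IDs, going from 5 to 6 shards relocates five out of every six rows, which is why adding a single shard under plain modulo routing amounts to a near-full data migration.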

Combining Range and Modulo

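Range sharding gives a stable allocation, because a new ID range simply gets new tables and existing rows never move, but all new writes land on the newest range. Modulo sharding distributes writes evenly but is hard to expand. The two can be combined by applying modulo within each range: for example, IDs 2M+1 to 4M are further split into five sub-shards, so writes to the current range are spread over five tables instead of one.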
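One way to sketch the combined scheme is to pick a range group first and then a sub-shard inside it. The `user_<group>_<sub>` naming and the function name `hybrid_shard` are assumptions made for this illustration; the article does not prescribe a naming convention.

```python
GROUP_SIZE = 2_000_000  # IDs per range group, as in the example above
SUB_SHARDS = 5          # sub-shards inside each range group


def hybrid_shard(user_id: int) -> str:
    """Route by ID range first, then by modulo within that range."""
    if user_id < 1:
        raise ValueError("IDs are assumed to start at 1")
    group = (user_id - 1) // GROUP_SIZE  # which range the ID falls into
    sub = user_id % SUB_SHARDS           # which sub-shard inside that range
    return f"user_{group}_{sub}"


# Consecutive new IDs in the same range land on different tables.
for uid in (2_000_001, 2_000_002, 2_000_003):
    print(uid, hybrid_shard(uid))
```

Adding capacity only requires creating the tables for the next range group, while writes inside the active group still rotate across its sub-shards.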

Read Amplification Problem

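When a query filters on a non-shard column (e.g., name), the router cannot tell which shard holds the matching rows, so it must send the same query to every shard concurrently and merge the results. Most of those queries return nothing, and the wasted work grows with the number of shards. This is the read amplification problem.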

select * from user where name = "小白";
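The fan-out behind such a query can be simulated in a few lines. This is a toy model, not a database client: the shards are in-memory dictionaries, the sample rows are made up, and `query_shard` stands in for running the statement above against one physical table.

```python
from concurrent.futures import ThreadPoolExecutor

NUM_SHARDS = 5

# Hypothetical stand-in for user_0 .. user_4, sharded by id % NUM_SHARDS.
shards = [dict() for _ in range(NUM_SHARDS)]
for user_id, name in [(1, "alice"), (2, "bob"), (7, "carol"), (31, "dave")]:
    shards[user_id % NUM_SHARDS][user_id] = name


def query_shard(shard: dict, name: str) -> list:
    """Stand-in for a name lookup executed on a single shard."""
    return [(uid, n) for uid, n in shard.items() if n == name]


def find_by_name(name: str):
    """Send the same query to every shard and merge the results."""
    with ThreadPoolExecutor(max_workers=NUM_SHARDS) as pool:
        per_shard = list(pool.map(lambda s: query_shard(s, name), shards))
    rows = [row for part in per_shard for row in part]
    return rows, len(per_shard)  # matching rows, number of shard queries issued


rows, queries_issued = find_by_name("carol")
print(rows, queries_issued)
```

One logical lookup turns into five physical queries, four of which come back empty. With 100 shards the same lookup would cost 100 queries.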

Solution: Auxiliary Index Table

Create a separate table that stores only the primary key and the queried column (e.g., name). A query first looks up the matching IDs in this table, then routes by ID to fetch the full rows from the original shards. Only the shards that actually hold those IDs are accessed, instead of all of them.
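The two-step lookup can be sketched with in-memory structures. Everything here is illustrative: `name_index` stands in for the auxiliary table, the shards are dictionaries, and the sample rows and the `city` column are made up.

```python
NUM_SHARDS = 5

shards = [dict() for _ in range(NUM_SHARDS)]  # id -> full row, sharded by id % NUM_SHARDS
name_index = {}                               # name -> list of ids (auxiliary index stand-in)


def insert_user(user_id: int, name: str, city: str) -> None:
    """Write the row to its shard and record (name, id) in the index table."""
    shards[user_id % NUM_SHARDS][user_id] = {"id": user_id, "name": name, "city": city}
    name_index.setdefault(name, []).append(user_id)


def find_by_name(name: str):
    """Step 1: look up IDs in the index. Step 2: fetch rows from only those shards."""
    ids = name_index.get(name, [])
    touched = sorted({uid % NUM_SHARDS for uid in ids})
    rows = [shards[uid % NUM_SHARDS][uid] for uid in ids]
    return rows, touched


insert_user(7, "carol", "Oslo")
insert_user(31, "carol", "Lima")
insert_user(3, "bob", "Cairo")

rows, touched = find_by_name("carol")
print([r["id"] for r in rows], touched)
```

The trade-off is that every write to the name column must also update the index table, so the two have to be kept consistent, and the index table itself may eventually need sharding by the indexed column.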

Alternative: Use Search Engine or Distributed DB

Integrating Elasticsearch lets queries on non-shard columns be served from inverted indexes in near real time, while TiDB splits data by key range automatically and supports secondary indexes, eliminating many of the read-amplification issues.

Summary

Large single tables degrade performance; horizontal sharding is needed.

Select a shard key (usually the primary key) and shard by ID range or modulo.

Non‑shard queries cause read amplification; auxiliary index tables or external search engines can mitigate it.

TiDB and Elasticsearch are viable alternatives for scalable querying.


Written by

Su San Talks Tech

Su San, former staff at several leading tech companies, is a top creator on Juejin and a premium creator on CSDN, and runs the free coding practice site www.susan.net.cn.
