Databases 11 min read

Mastering Database Sharding: When and How to Split Tables and Schemas

This article reviews the complete process of database and table sharding, covering when to shard, range and hash strategies, combining both, data migration challenges, business compatibility, and long‑term archiving to keep MySQL workloads manageable.

Programmer DD
Programmer DD
Programmer DD
Mastering Database Sharding: When and How to Split Tables and Schemas

Background

Earlier I posted two articles about table partitioning: "A Pitfall‑Driven Practice of Table Splitting" and "Things to Note After Partitioning". At that time we only did table sharding; as the business grew we also performed database sharding, which has been smooth so far, so I’m doing a retrospective.

Here is the overall process of database and table sharding:

The process is straightforward and matches the development path of most companies. Few businesses design sharding from the start; they usually start with a single table and consider sharding only when a table can no longer support the load.

When a table reaches tens of millions or even billions of rows and daily growth exceeds 2%, query speed degrades and IO stays high, sharding becomes necessary.

Table Sharding

We focus on horizontal sharding.

Horizontal sharding distributes rows of a large table across N smaller tables using a routing algorithm.

Range

Range sharding splits data by a range, e.g., by month of creation time or by primary‑key intervals such as 1‑10000, 10001‑20000, etc.

This method is suitable for archiving scenarios where only recent data is queried.

Pros: natural horizontal expansion without much intervention.

Cons: data may become uneven (e.g., a month with a traffic spike).

Hash

Range sharding is simple but narrow; most queries are not time‑based. For use‑cases like “all orders of a user”, a hash‑mod combination is preferred.

Example: split an order table into 64 tables using hash+mod. The hash step hashes the sharding column to achieve uniform distribution; if the column is already a unique integer, the mod step alone can be used.

Choosing the Number of Tables

The number of shards (e.g., 64) should be chosen based on projected growth. Each shard should stay below the tens‑of‑millions threshold for several years. It is advisable to pick a power of two (2^n) to simplify future expansion.

Range + Hash

Combining range and hash can mitigate the need for massive data migration when expanding shards. For example, start with hash sharding, then add monthly range tables to avoid full‑scale re‑sharding.

Data Migration Pain Points

After sharding goes live, existing data must be migrated, otherwise business impact is severe. Our migration process for a 200‑million‑row table took 4‑5 days, during which old data was invisible to users.

All new writes go to the sharded tables; old data remains in the original table.

During a transition period (about two months) both old and new tables are accessed based on routing logic.

After migration completes, routing logic is removed and all operations target the sharded tables.

Database Sharding

Table sharding relieves single‑table pressure but does not reduce overall database IO. We moved unrelated large tables to a separate database and accessed them via a Dubbo service, optionally using a message queue for high‑throughput writes.

Business Compatibility

Reporting now requires aggregating results from all shards, often using multithreaded queries or a big‑data platform for massive data sets.

Pagination queries must include the sharding key; otherwise they would need to scan all shards, which is impractical for billions of rows.

Summary

Historical data older than N months should be archived to systems like HBase, keeping the MySQL dataset within an acceptable size. The whole sharding practice illustrates the challenges of retrofitting scaling solutions to an already‑running system.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Sign in to view source
Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactadmin@besthub.devand we will review it promptly.

Data MigrationScalabilitydatabase shardinghorizontal partitioningHash ShardingRange Sharding
Programmer DD
Written by

Programmer DD

A tinkering programmer and author of "Spring Cloud Microservices in Action"

0 followers
Reader feedback

How this landed with the community

Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.