Mastering Database Sharding: When and How to Split Tables and Schemas
This article reviews the complete process of database and table sharding, covering when to shard, range and hash strategies, combining both, data migration challenges, business compatibility, and long‑term archiving to keep MySQL workloads manageable.
Background
Earlier I posted two articles about table partitioning: "A Pitfall‑Driven Practice of Table Splitting" and "Things to Note After Partitioning". At that time we only did table sharding; as the business grew we also performed database sharding, which has been smooth so far, so I’m doing a retrospective.
Here is the overall process of database and table sharding:
The process is straightforward and matches the development path of most companies. Few businesses design sharding from the start; they usually start with a single table and consider sharding only when a table can no longer support the load.
When a table reaches tens of millions or even billions of rows and daily growth exceeds 2%, query speed degrades and IO stays high, sharding becomes necessary.
Table Sharding
We focus on horizontal sharding.
Horizontal sharding distributes rows of a large table across N smaller tables using a routing algorithm.
Range
Range sharding splits data by a range, e.g., by month of creation time or by primary‑key intervals such as 1‑10000, 10001‑20000, etc.
This method is suitable for archiving scenarios where only recent data is queried.
Pros: natural horizontal expansion without much intervention.
Cons: data may become uneven (e.g., a month with a traffic spike).
Hash
Range sharding is simple but narrow; most queries are not time‑based. For use‑cases like “all orders of a user”, a hash‑mod combination is preferred.
Example: split an order table into 64 tables using hash+mod. The hash step hashes the sharding column to achieve uniform distribution; if the column is already a unique integer, the mod step alone can be used.
Choosing the Number of Tables
The number of shards (e.g., 64) should be chosen based on projected growth. Each shard should stay below the tens‑of‑millions threshold for several years. It is advisable to pick a power of two (2^n) to simplify future expansion.
Range + Hash
Combining range and hash can mitigate the need for massive data migration when expanding shards. For example, start with hash sharding, then add monthly range tables to avoid full‑scale re‑sharding.
Data Migration Pain Points
After sharding goes live, existing data must be migrated, otherwise business impact is severe. Our migration process for a 200‑million‑row table took 4‑5 days, during which old data was invisible to users.
All new writes go to the sharded tables; old data remains in the original table.
During a transition period (about two months) both old and new tables are accessed based on routing logic.
After migration completes, routing logic is removed and all operations target the sharded tables.
Database Sharding
Table sharding relieves single‑table pressure but does not reduce overall database IO. We moved unrelated large tables to a separate database and accessed them via a Dubbo service, optionally using a message queue for high‑throughput writes.
Business Compatibility
Reporting now requires aggregating results from all shards, often using multithreaded queries or a big‑data platform for massive data sets.
Pagination queries must include the sharding key; otherwise they would need to scan all shards, which is impractical for billions of rows.
Summary
Historical data older than N months should be archived to systems like HBase, keeping the MySQL dataset within an acceptable size. The whole sharding practice illustrates the challenges of retrofitting scaling solutions to an already‑running system.
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
Programmer DD
A tinkering programmer and author of "Spring Cloud Microservices in Action"
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
