
When Should You Shard Your Database? Practical Guide to Table Partitioning and Scaling

We first implemented table sharding and later added database sharding. This retrospective covers when to partition tables, the main sharding strategies (range, hash, range+hash), migration challenges, business compatibility, and final archiving practices, with diagrams and practical tips.

dbaplus Community

Background

Earlier articles introduced table sharding; as the business grew we also introduced database sharding. This post revisits the entire process, summarizing lessons learned and providing a reference for similar migrations.

Sharding Overview

The typical evolution for many companies starts with a single table, then moves to table sharding when a table can no longer handle the data volume, and finally to database sharding when overall DB I/O becomes a bottleneck.

When to Shard a Table

Table sharding becomes necessary when a single table reaches tens of millions or even billions of rows and daily growth exceeds 2%, causing noticeable query slowdown and high I/O.

Sharding Strategies

1. Range Sharding

Data is divided by a range, such as month‑based tables or primary‑key intervals (e.g., 1‑10,000 in one table, 10,001‑20,000 in another). This works well for archival scenarios where only recent data is queried.
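
As a rough illustration of the two range schemes just mentioned, the following sketch maps a row to a table name. The base name `orders`, the suffix format, and the function names are assumptions made for this example; only the 10,000-row interval comes from the text.

```python
from datetime import date

ROWS_PER_TABLE = 10_000  # interval size taken from the example in the text


def table_for_month(base: str, day: date) -> str:
    """Month-based range: one table per calendar month (hypothetical naming)."""
    return f"{base}_{day:%Y%m}"


def table_for_id(base: str, row_id: int) -> str:
    """Primary-key range: IDs 1-10,000 go to table 0, 10,001-20,000 to table 1, and so on."""
    if row_id < 1:
        raise ValueError("row_id must be >= 1")
    return f"{base}_{(row_id - 1) // ROWS_PER_TABLE}"


print(table_for_month("orders", date(2024, 3, 15)))  # orders_202403
print(table_for_id("orders", 10_001))                # orders_1
```

Because the target table depends only on the value itself, adding a new month or a new ID interval never touches existing tables.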

Advantage: Scales horizontally with little intervention, because each new range simply gets a new table and existing data never has to move.

Disadvantage: May lead to uneven data distribution if one range receives far more data or traffic than the others.

2. Hash Sharding

Hash + mod is used to map a sharding key to one of many tables, ensuring a more uniform distribution. For example, a large order table can be split into 64 tables using hash(key) % 64. If the key is already a unique integer, the hash step can be omitted.

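The number of shards (e.g., 64) should be chosen based on projected growth. A power of two (2^n) is recommended to simplify future expansion: when the count is doubled, every row either stays in its current table or moves to exactly one known new table, so each existing table splits cleanly in two.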
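
A minimal sketch of hash + mod routing follows. The use of `zlib.crc32` as the hash, the table naming, and the function names are assumptions for illustration; the source does not say which hash function was used.

```python
import zlib
from typing import Union


def shard_index(key: Union[str, int], shard_count: int = 64) -> int:
    """Map a sharding key to a table index in [0, shard_count)."""
    if isinstance(key, int):
        # Integer keys skip the hash step, as the text notes.
        return key % shard_count
    # crc32 is stable across processes, unlike Python's built-in hash() for strings.
    return zlib.crc32(key.encode("utf-8")) % shard_count


def table_name(base: str, key: Union[str, int], shard_count: int = 64) -> str:
    """Hypothetical naming scheme: orders_00 .. orders_63."""
    return f"{base}_{shard_index(key, shard_count):02d}"


print(table_name("orders", 130))  # orders_02, since 130 % 64 == 2

# Doubling a power-of-two shard count: each key keeps its index or moves up by 64.
for key in range(1000):
    old, new = shard_index(key, 64), shard_index(key, 128)
    assert new in (old, old + 64)
```

The loop at the end checks the property that makes powers of two convenient: going from 64 to 128 tables splits each old table into exactly two destinations.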

3. Range + Hash

Combining range and hash can mitigate the limitations of each method. For instance, start with hash sharding, then add a monthly range layer so that scaling from 64 to 256 tables does not require a massive data migration.

Figure: Range+Hash diagram
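
One possible reading of the range + hash combination is sketched below: the month selects a group of tables, and the shard count in effect for that month selects the table within the group. The schedule dates, the naming scheme, and the function names are all hypothetical; the source does not describe its exact layout.

```python
from datetime import date

# Hypothetical schedule: the shard count that applies from a given month onward.
SHARD_SCHEDULE = [
    (date(2023, 1, 1), 64),
    (date(2024, 7, 1), 256),
]


def shard_count_for(created: date) -> int:
    """Return the shard count in effect for the month a row was created in."""
    count = SHARD_SCHEDULE[0][1]
    for start, n in SHARD_SCHEDULE:
        if created >= start:
            count = n
    return count


def range_hash_table(base: str, order_id: int, created: date) -> str:
    """Month picks the range layer; order_id modulo that month's count picks the table."""
    n = shard_count_for(created)
    return f"{base}_{created:%Y%m}_{order_id % n:03d}"


print(range_hash_table("orders", 200, date(2024, 3, 10)))  # orders_202403_008
print(range_hash_table("orders", 200, date(2024, 8, 1)))   # orders_202408_200
```

Under this assumption, rows written before the expansion stay in their 64-way layout and only new months use 256 tables. The trade-off is that every lookup needs the creation month as well as the sharding key.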

Data Migration Challenges

After sharding goes live, existing data must be migrated, which can take several days for tables with hundreds of millions of rows. To minimize business impact, we adopted a hybrid approach:

New writes go directly to the sharded tables.

Historical reads continue to use the old table.

After a warm‑up period (about two months), we gradually route more operations to the sharded tables and start a background migration.

Once migration completes, the routing logic is removed and all traffic uses the sharded tables.
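
The routing described in these steps can be sketched roughly as follows. This assumes monotonically increasing order IDs and a backfill that copies the newest historical rows first; `cutover_id`, `watermark`, and the class itself are hypothetical names, not the author's implementation.

```python
class MigrationRouter:
    """Decides whether an order ID is served by the legacy table or a shard."""

    def __init__(self, cutover_id: int, shard_count: int = 64,
                 legacy_table: str = "orders") -> None:
        self.shard_count = shard_count
        self.legacy_table = legacy_table
        # IDs >= watermark live on the shards. The watermark starts at the first
        # ID written after go-live and moves down as the backfill progresses.
        self.watermark = cutover_id

    def shard_table(self, order_id: int) -> str:
        return f"{self.legacy_table}_{order_id % self.shard_count:02d}"

    def table_for_write(self, order_id: int) -> str:
        # New rows always go to the sharded tables.
        return self.shard_table(order_id)

    def table_for_read(self, order_id: int) -> str:
        # Historical rows are read from the legacy table until they are copied.
        if order_id >= self.watermark:
            return self.shard_table(order_id)
        return self.legacy_table

    def mark_migrated(self, lowest_copied_id: int) -> None:
        """Called by the background job after copying a batch of old rows."""
        self.watermark = min(self.watermark, lowest_copied_id)

    @property
    def migration_complete(self) -> bool:
        return self.watermark <= 1


router = MigrationRouter(cutover_id=1_000_000)
print(router.table_for_read(999_999))  # orders (not yet migrated)
router.mark_migrated(500_000)
print(router.table_for_read(999_999))  # orders_63 (now served from a shard)
```

The sketch ignores updates that hit a historical row while it is being copied; a real migration would need to handle that case, for example by re-checking modified rows.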

Business Compatibility

Sharding affects reporting and pagination:

Reporting: Queries must aggregate results from all shards, often using multithreaded parallelism. For very large datasets, a big‑data platform may be required.

Pagination: Traditional offset‑based pagination is impractical on billions of rows; queries must include the sharding key to avoid scanning all shards.
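
Both points can be illustrated with a small in-memory sketch. Lists stand in for the sharded tables. The shard count, the row layout, and the use of a last-seen `order_id` cursor instead of an offset (keyset pagination) are assumptions for this example and are not taken from the source.

```python
from concurrent.futures import ThreadPoolExecutor

SHARD_COUNT = 4  # kept small for illustration

# Stand-in for sharded tables: shard index -> list of (user_id, order_id, amount).
shards = {i: [] for i in range(SHARD_COUNT)}
for order_id in range(1, 101):
    user_id = order_id % 10
    shards[user_id % SHARD_COUNT].append((user_id, order_id, order_id * 10))


def shard_total(index):
    """Per-shard partial aggregate; in practice a SUM() query against one table."""
    return sum(amount for _, _, amount in shards[index])


def report_total():
    """Reporting: fan out to every shard in parallel, then merge the partial sums."""
    with ThreadPoolExecutor(max_workers=SHARD_COUNT) as pool:
        return sum(pool.map(shard_total, range(SHARD_COUNT)))


def user_orders_page(user_id, after_order_id=0, page_size=5):
    """Pagination: the sharding key pins the query to a single shard, and the
    last-seen order_id acts as the cursor instead of a large OFFSET."""
    rows = shards[user_id % SHARD_COUNT]
    matches = sorted(r for r in rows if r[0] == user_id and r[1] > after_order_id)
    return matches[:page_size]


print(report_total())  # 50500
first = user_orders_page(3)
print([r[1] for r in first])  # [3, 13, 23, 33, 43]
print([r[1] for r in user_orders_page(3, after_order_id=first[-1][1])])  # [53, 63, 73, 83, 93]
```

The report touches every shard, so its cost grows with the shard count, while the paginated query touches exactly one shard no matter how many exist.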

Database Sharding

Even after table sharding, overall DB I/O may remain high due to other large tables. We isolated those tables into a separate database accessed via a dedicated Dubbo service, optionally switching to asynchronous messaging for high‑throughput writes.

Final Archiving

Older data (e.g., beyond N months) is periodically archived to HBase or similar storage, keeping the MySQL instance within an acceptable size. Queries on archived data rely on a big‑data service.
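
Selecting what to archive could look roughly like the sketch below, assuming monthly tables named like `orders_202401`. That naming and both function names are hypothetical, and the actual copy to HBase is not shown.

```python
from datetime import date


def archive_cutoff(today: date, keep_months: int) -> date:
    """First day of the oldest month that stays in MySQL."""
    months = today.year * 12 + (today.month - 1) - keep_months
    return date(months // 12, months % 12 + 1, 1)


def tables_to_archive(tables, today, keep_months):
    """Pick monthly tables whose YYYYMM suffix falls before the cutoff month."""
    cutoff_key = f"{archive_cutoff(today, keep_months):%Y%m}"
    return sorted(t for t in tables if t.rsplit("_", 1)[1] < cutoff_key)


tables = ["orders_202307", "orders_202308", "orders_202309", "orders_202403"]
print(archive_cutoff(date(2024, 3, 15), 6))             # 2023-09-01
print(tables_to_archive(tables, date(2024, 3, 15), 6))  # ['orders_202307', 'orders_202308']
```

Here `keep_months` plays the role of N in the text: the current month and the N months before it stay online, and anything older becomes a candidate for archiving.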

Conclusion

Database and table sharding is a complex, high‑risk operation that must be carefully planned and executed. The strategies, migration steps, and compatibility considerations described here aim to help practitioners avoid common pitfalls and achieve a smooth scaling transition.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.
