When Should You Shard Your Database? A Practical Guide to Table Partitioning and Scaling
We first implemented table sharding and later added database sharding. This retrospective covers when to split tables, the main sharding strategies (range, hash, and range + hash), data migration challenges, business compatibility, and final archiving, with practical tips throughout.
Background
Earlier articles introduced table sharding; as the business grew, we also introduced database sharding. This post revisits the entire process, summarizes the lessons learned, and serves as a reference for teams planning a similar migration.
Sharding Overview
The typical evolution for many companies starts with a single table, then moves to table sharding when a table can no longer handle the data volume, and finally to database sharding when overall DB I/O becomes a bottleneck.
When to Shard a Table
Table sharding becomes necessary when a single table reaches tens of millions or even billions of rows and daily growth exceeds 2%, causing noticeable query slowdown and high I/O.
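The thresholds above can be turned into a quick check, along with the growth arithmetic that makes them urgent. This is a minimal sketch: the function names are hypothetical, the cutoffs (10 million rows, more than 2% daily growth) are taken from the rule of thumb rather than from any measured limit, and a real decision should also look at query latency and I/O.

```python
import math

# Hypothetical thresholds mirroring the rule of thumb in the text.
ROW_THRESHOLD = 10_000_000      # "tens of millions" of rows
DAILY_GROWTH_THRESHOLD = 0.02   # daily growth above 2%


def should_shard_table(row_count: int, daily_growth_rate: float) -> bool:
    """True when a table is both large and still growing quickly."""
    return row_count >= ROW_THRESHOLD and daily_growth_rate > DAILY_GROWTH_THRESHOLD


def projected_rows(row_count: int, daily_growth_rate: float, days: int) -> int:
    """Row count after `days` days of compound daily growth."""
    return int(row_count * (1 + daily_growth_rate) ** days)


def days_to_double(daily_growth_rate: float) -> float:
    """Days until the table doubles at a constant daily growth rate."""
    return math.log(2) / math.log(1 + daily_growth_rate)


print(should_shard_table(50_000_000, 0.025))  # True
print(should_shard_table(50_000_000, 0.005))  # False
print(round(days_to_double(0.02), 1))         # 35.0
```

At 2% per day, compound growth doubles a table in about 35 days, so a table that already holds tens of millions of rows has little headroom left.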
Sharding Strategies
1. Range Sharding
Data is divided by range, such as one table per month or primary-key intervals (e.g., ids 1-10,000 in one table and 10,001-20,000 in the next). This works well for archival scenarios where only recent data is queried.
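Advantage: Scaling out is simple; a new range just gets a new table, and existing data never has to move.
Disadvantage: Data and load can be uneven. A busy month produces an oversized table, and because all new writes land in the latest range, that table becomes a write hotspot.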
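Both range variants route by a value known at write time. A minimal sketch, assuming a hypothetical table prefix such as t_order, ids starting at 1, 10,000 ids per table as in the example above, and month tables named <prefix>_YYYYMM:

```python
from datetime import date

RANGE_SIZE = 10_000  # ids per table, matching the 1-10,000 / 10,001-20,000 example


def table_for_id(base: str, record_id: int) -> str:
    """Primary-key range sharding: ids 1-10,000 -> _0, 10,001-20,000 -> _1, ..."""
    if record_id < 1:
        raise ValueError("ids are assumed to start at 1")
    return f"{base}_{(record_id - 1) // RANGE_SIZE}"


def table_for_month(base: str, created: date) -> str:
    """Month-based range sharding: one table per calendar month (hypothetical naming)."""
    return f"{base}_{created.year}{created.month:02d}"


print(table_for_id("t_order", 10_001))                 # t_order_1
print(table_for_month("t_order", date(2024, 3, 15)))   # t_order_202403
```

Because routing depends only on the id or the date, adding capacity means creating the next table, and nothing already written moves. The same property causes the hotspot: every new row lands in the newest table.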
2. Hash Sharding
Hash sharding maps a sharding key to one of many tables with hash + modulo, which spreads data more evenly. For example, a large order table can be split into 64 tables using hash(key) % 64. If the key is already a unique integer, the hash step can be skipped and key % 64 used directly.
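The number of shards (64 in this example) should be chosen based on projected growth. A power of two (2^n) is recommended to simplify future expansion: growth then proceeds by doubling, and when the count doubles from N to 2N, a row in table i either stays in i or moves to i + N, so each old table splits into exactly two and only about half the rows move. It also lets key % N be computed as a cheap bitmask, key & (N - 1).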
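A sketch of hash + mod routing with a power-of-two shard count, plus a check of how many keys move when the count doubles. It assumes integer keys are used directly and other keys are hashed with CRC32 (chosen here because Python's built-in hash() is randomized per process for strings); shard_index and moved_fraction are hypothetical names.

```python
import zlib


def shard_index(key, shard_count: int) -> int:
    """Map a sharding key to a shard with hash + mod.

    Integer keys skip the hash step; other keys use CRC32, which is stable
    across processes.
    """
    if shard_count <= 0 or shard_count & (shard_count - 1):
        raise ValueError("shard_count is assumed to be a power of two")
    h = key if isinstance(key, int) else zlib.crc32(str(key).encode("utf-8"))
    # For a power of two N, h % N == h & (N - 1).
    return h & (shard_count - 1)


def moved_fraction(keys, old_count: int, new_count: int) -> float:
    """Fraction of keys whose shard changes when resharding."""
    keys = list(keys)
    moved = sum(shard_index(k, old_count) != shard_index(k, new_count) for k in keys)
    return moved / len(keys)


print(moved_fraction(range(10_000), 64, 128))  # 0.4992
```

Roughly half the keys move from 64 to 128, and every moved key goes from table i to table i + 64. Changing the count to a non-multiple, say 64 to 100, would instead relocate about 96% of keys.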
3. Range + Hash
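Combining the two methods offsets the weaknesses of each. For instance, start with hash sharding into 64 tables, then add a monthly range layer on top: when capacity runs short, only new months are hashed into 256 tables, while earlier months keep their 64-table layout, so scaling from 64 to 256 tables does not require migrating existing data.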
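One way to layer range on top of hash, sketched under assumptions: each month gets its own set of hash tables, a hypothetical LAYOUTS list records from which month a given shard count applies, and tables are named <prefix>_YYYYMM_<index>. Integer sharding keys are assumed.

```python
from bisect import bisect_right
from datetime import date

# Hypothetical layout history: (first month it applies to, shard count).
# Months before 2024-07 keep 64 tables; later months use 256.
LAYOUTS = [((2023, 1), 64), ((2024, 7), 256)]


def shard_count_for(month: tuple) -> int:
    """Find the layout in force for a (year, month) pair."""
    starts = [start for start, _ in LAYOUTS]
    i = bisect_right(starts, month) - 1
    if i < 0:
        raise ValueError(f"no layout covers {month}")
    return LAYOUTS[i][1]


def route(base: str, key: int, created: date) -> str:
    """Range layer picks the month; hash layer picks a table within that month."""
    count = shard_count_for((created.year, created.month))
    return f"{base}_{created.year}{created.month:02d}_{key % count}"


print(route("t_order", 1000, date(2024, 3, 1)))  # t_order_202403_40
print(route("t_order", 1000, date(2024, 8, 1)))  # t_order_202408_232
```

Raising the count to 256 only changes where future months are written; months already stored in 64 tables stay where they are, which is the migration the combined scheme avoids. The trade-off is that a lookup must know a record's month to find its table.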
Data Migration Challenges
After sharding goes live, existing data must be migrated, which can take several days for tables with hundreds of millions of rows. To minimize business impact, we adopted a hybrid approach:
New writes go directly to the sharded tables.
Historical reads continue to use the old table.
After a warm‑up period (about two months), we gradually route more operations to the sharded tables and start a background migration.
Once migration completes, the routing logic is removed and all traffic uses the sharded tables.
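The phased cutover above can be expressed as a small routing layer. A minimal sketch: MigrationRouter, the Phase values, and the cutover timestamp are hypothetical, and the shards-then-old-table fallback during migration is an assumption about how routing could be widened gradually, not necessarily what the authors built.

```python
from dataclasses import dataclass
from datetime import datetime
from enum import Enum
from typing import List


class Phase(Enum):
    WARM_UP = "warm_up"      # new writes to shards; historical reads stay on the old table
    MIGRATING = "migrating"  # background copy running; reads try shards first
    DONE = "done"            # migration complete; routing layer can be removed


@dataclass
class MigrationRouter:
    cutover: datetime  # hypothetical: moment writes switched to the sharded tables
    phase: Phase = Phase.WARM_UP

    def write_target(self) -> str:
        # New writes always go to the sharded tables.
        return "sharded"

    def read_targets(self, created_at: datetime) -> List[str]:
        """Tables to query, in order, for a record created at created_at."""
        if self.phase is Phase.DONE or created_at >= self.cutover:
            return ["sharded"]
        if self.phase is Phase.MIGRATING:
            # Assumption: try the shards first, then fall back to the old
            # table for rows the background job has not copied yet.
            return ["sharded", "legacy"]
        return ["legacy"]


router = MigrationRouter(cutover=datetime(2024, 1, 1))
print(router.read_targets(datetime(2023, 6, 1)))  # ['legacy']
```

Data written after the cutover is read from the shards from day one, since it exists nowhere else. Once the background copy finishes and the phase is DONE, the whole class can be deleted, matching the last step above.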
Business Compatibility
Sharding affects reporting and pagination:
Reporting: Queries must aggregate results from all shards, often using multithreaded parallelism. For very large datasets, a big‑data platform may be required.
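Pagination: Traditional offset-based pagination is impractical on billions of rows. A query without the sharding key must ask every shard for offset + limit rows and merge them, so its cost grows with both page depth and shard count; paged queries must therefore include the sharding key so that only one shard is scanned.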
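The two compatibility issues can be illustrated with in-memory stand-ins for shard tables. A minimal sketch, assuming orders are sharded by user_id % shard count: parallel_total shows the scatter-gather pattern for reports using Python's ThreadPoolExecutor, global_page shows why offset pagination without the sharding key gets expensive, and page_with_key shows the single-shard path. All names are hypothetical.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple

# Hypothetical stand-in for sharded order tables. Each shard is a list of
# (order_id, user_id) rows sorted by order_id, sharded by user_id % len(shards).
Row = Tuple[int, int]
Shards = List[List[Row]]


def shard_count_rows(shard: List[Row]) -> int:
    # Stands in for running "SELECT COUNT(*)" against one shard table.
    return len(shard)


def parallel_total(shards: Shards, workers: int = 8) -> int:
    """Reporting: run the same aggregate on every shard in parallel, then sum."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(shard_count_rows, shards))


def global_page(shards: Shards, offset: int, limit: int) -> Tuple[List[Row], int]:
    """Offset pagination without the sharding key.

    Any shard may hold rows of the requested page, so each must return its
    first offset + limit rows; these are merged by order_id and the page is
    sliced out. Also returns how many rows were fetched in total.
    """
    per_shard = [rows[: offset + limit] for rows in shards]
    fetched = sum(len(rows) for rows in per_shard)
    merged = list(heapq.merge(*per_shard))
    return merged[offset: offset + limit], fetched


def page_with_key(shards: Shards, user_id: int, offset: int, limit: int) -> List[Row]:
    """With the sharding key in the query, only the owning shard is read."""
    owner = shards[user_id % len(shards)]
    rows = [row for row in owner if row[1] == user_id]
    return rows[offset: offset + limit]


# 1,000 orders spread over 8 users and 4 shards.
shards = [[(i, i % 8) for i in range(1000) if (i % 8) % 4 == s] for s in range(4)]
print(parallel_total(shards))  # 1000
page, fetched = global_page(shards, offset=20, limit=10)
print([order_id for order_id, _ in page], fetched)  # [20, 21, 22, 23, 24, 25, 26, 27, 28, 29] 120
print(page_with_key(shards, user_id=5, offset=0, limit=3))  # [(5, 5), (13, 5), (21, 5)]
```

The rows fetched by global_page grow as shards × (offset + limit): page 1,000 at 20 rows per page across 64 shards pulls 1,280,000 rows to return 20.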
Database Sharding
Even after table sharding, the database's overall I/O may remain high because of other large tables. We moved those tables into a separate database, accessed only through a dedicated Dubbo service; for high-throughput writes, the service can optionally switch to asynchronous writes via messaging.
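The vertical split can be sketched as a service that owns the isolated tables, with an optional asynchronous write path. This is a Python stand-in for the pattern, not the Dubbo service itself: IsolatedStore, IsolatedTableService, and the in-process queue.Queue (standing in for a real message broker) are all assumptions.

```python
import queue
import threading
from typing import Dict, List


class IsolatedStore:
    """Stand-in for the separate database holding the split-out tables."""

    def __init__(self) -> None:
        self.rows: List[Dict] = []
        self._lock = threading.Lock()

    def insert(self, row: Dict) -> None:
        with self._lock:
            self.rows.append(row)


class IsolatedTableService:
    """Sole entry point to the split-out tables (a Dubbo service in the text).

    Writes are applied synchronously by default. With async_writes=True they
    are put on an in-process queue and applied by a background consumer, a
    stand-in for a real message broker.
    """

    def __init__(self, store: IsolatedStore, async_writes: bool = False) -> None:
        self.store = store
        self.async_writes = async_writes
        self._queue: "queue.Queue[Dict]" = queue.Queue()
        if async_writes:
            threading.Thread(target=self._consume, daemon=True).start()

    def write(self, row: Dict) -> None:
        if self.async_writes:
            self._queue.put(row)  # caller returns immediately
        else:
            self.store.insert(row)

    def _consume(self) -> None:
        while True:
            row = self._queue.get()
            self.store.insert(row)
            self._queue.task_done()

    def flush(self) -> None:
        """Wait until every queued write has been applied."""
        self._queue.join()


svc = IsolatedTableService(IsolatedStore(), async_writes=True)
for i in range(5):
    svc.write({"id": i})
svc.flush()
print(len(svc.store.rows))  # 5
```

Because callers only see write(), switching a high-throughput caller from synchronous to queued writes is a change on the service side; the price is that a queued write is not yet visible when write() returns.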
Final Archiving
Older data (e.g., beyond N months) is periodically archived to HBase or similar storage, keeping the MySQL instance within an acceptable size. Queries on archived data rely on a big‑data service.
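Choosing what to archive is mostly month arithmetic. A minimal sketch, assuming month tables named <prefix>_YYYYMM and interpreting "beyond N months" as any month earlier than N months before the current one; tables_to_archive and months_back are hypothetical helpers.

```python
from datetime import date
from typing import List, Tuple


def months_back(today: date, n: int) -> Tuple[int, int]:
    """Return the (year, month) that is n months before today's month."""
    index = today.year * 12 + (today.month - 1) - n
    return index // 12, index % 12 + 1


def tables_to_archive(base: str, existing: List[str], today: date, keep_months: int) -> List[str]:
    """Pick month tables (<base>_YYYYMM) older than keep_months months before today."""
    year, month = months_back(today, keep_months)
    cutoff = year * 100 + month  # e.g. 202403
    prefix = base + "_"
    old = []
    for name in existing:
        suffix = name[len(prefix):]
        if name.startswith(prefix) and len(suffix) == 6 and suffix.isdigit() and int(suffix) < cutoff:
            old.append(name)
    return sorted(old)


existing = ["t_order_202312", "t_order_202401", "t_order_202402",
            "t_order_202403", "t_order_202406", "t_user"]
print(tables_to_archive("t_order", existing, date(2024, 6, 15), keep_months=3))
# ['t_order_202312', 't_order_202401', 't_order_202402']
```

The copy to HBase (or similar) and the removal of the archived tables from MySQL are not shown; the sketch only picks the candidates.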
Conclusion
Database and table sharding is a complex, high‑risk operation that must be carefully planned and executed. The strategies, migration steps, and compatibility considerations described here aim to help practitioners avoid common pitfalls and achieve a smooth scaling transition.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
dbaplus Community
Enterprise-level professional community for Database, BigData, and AIOps. Daily original articles, weekly online tech talks, monthly offline salons, and quarterly XCOPS&DAMS conferences—delivered by industry experts.