Combining Hash and Range Sharding to Avoid Hotspots Without Data Migration
This article explains vertical and horizontal database sharding, compares hash-modulo and range-based partitioning, and proposes a hybrid design that balances load, prevents hotspots, and eliminates costly data migrations during scaling.
In large‑scale projects, when data volume grows, developers often need to split databases either vertically (by business domain) or horizontally (by rows). Vertical splitting separates logical domains, such as orders and users, into different databases. Horizontal splitting distributes rows of a single table across multiple tables or databases.
Horizontal Sharding Scenarios
For example, an order table with 40 million rows far exceeds the commonly recommended single‑table size for MySQL (roughly 10 million rows). To maintain performance, the data can be divided across four or more tables, possibly combined with database‑level splitting.
Common Sharding Strategies
Two typical approaches are hash‑modulo and range partitioning, both relying on a routing algorithm that maps a key (e.g., id) to a specific table.
1. Hash‑Modulo Scheme
Assume an estimated 40 million orders and a capacity of 10 million rows per table, resulting in four tables. The routing key id is taken modulo the total number of tables (4). For instance, id=12 → 12 % 4 = 0, so the record goes to table 0; id=13 → 1, so it goes to table 1.
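The modulo routing described above can be sketched as follows; the class and method names are illustrative, not from the original article:

```java
// Hash-modulo routing: maps an order id to one of a fixed number of tables.
public class HashModuloRouter {
    // 40 million estimated rows / 10 million rows per table = 4 tables.
    private static final int TABLE_COUNT = 4;

    // Returns the table index for a given routing key.
    public static int route(long id) {
        return (int) (id % TABLE_COUNT);
    }

    public static void main(String[] args) {
        System.out.println("id=12 -> table_" + route(12)); // table_0
        System.out.println("id=13 -> table_" + route(13)); // table_1
    }
}
```

Note that `TABLE_COUNT` is baked into every routing decision, which is exactly why changing it later forces a rehash of existing rows.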
Advantages: Data is evenly distributed, reducing hotspot contention.
Disadvantages: Adding tables changes the modulo base, causing existing rows to map to different tables, which requires costly data migration and makes scaling painful.
2. Range Scheme
Rows are grouped by predefined ID ranges, e.g., IDs 0‑12 million in table 0, 12‑24 million in table 1, etc. Adding new tables later does not affect existing ranges, so no data migration is needed.
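A minimal sketch of range routing under the 12‑million‑row step used in the example above (names are illustrative):

```java
// Range routing: each table owns a contiguous block of ids.
public class RangeRouter {
    // Each table covers 12 million ids, matching the article's example.
    private static final long RANGE_STEP = 12_000_000L;

    // Table index is the id's position within the range steps;
    // new tables simply extend the sequence, so old rows never move.
    public static int route(long id) {
        return (int) (id / RANGE_STEP);
    }

    public static void main(String[] args) {
        System.out.println("id=5,000,000  -> table_" + route(5_000_000L));  // table_0
        System.out.println("id=13,000,000 -> table_" + route(13_000_000L)); // table_1
    }
}
```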
Advantages: Easy to scale without moving existing data.
Disadvantages: Since IDs increase monotonically, recent IDs concentrate in the highest‑range table, creating a hotspot.
Hybrid Design Idea
The proposed solution combines the uniform distribution of hash sharding with the migration‑free property of range sharding. Data is first assigned to a group based on its ID range (range step). Within each group, a hash‑modulo operation is performed on the total number of tables across all databases in the group, not just the number of databases.
Example: Group 01 covers IDs 0‑40 million and contains three databases (DB_0, DB_1, DB_2) with a total of 10 tables (4 in DB_0, 3 in DB_1, 3 in DB_2). An ID is taken modulo 10 to select a table; the table's owning database is then looked up in a predefined table‑to‑DB mapping. This allows DB_0 to hold more tables (higher capacity) while DB_1 and DB_2 hold fewer, matching differences in server performance and storage.
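The Group 01 example can be sketched as a routing function; the exact allocation of tables to databases is configurable, and the split below (DB_0 gets tables 0‑3, DB_1 gets 4‑6, DB_2 gets 7‑9) is an illustrative assumption:

```java
// Hybrid routing for Group 01: ids 0..40M, 10 tables spread over 3 DBs.
public class HybridRouter {
    private static final long GROUP_RANGE = 40_000_000L;
    private static final int TOTAL_TABLES = 10;

    // Index = table number, value = owning database. DB_0 takes 4 tables,
    // DB_1 and DB_2 take 3 each, reflecting unequal server capacity.
    private static final String[] TABLE_TO_DB = {
        "DB_0", "DB_0", "DB_0", "DB_0",
        "DB_1", "DB_1", "DB_1",
        "DB_2", "DB_2", "DB_2"
    };

    public static String route(long id) {
        if (id < 0 || id >= GROUP_RANGE) {
            throw new IllegalArgumentException("id outside Group 01 range");
        }
        // Modulo runs over the TOTAL table count, not the database count,
        // so ids spread uniformly across all 10 tables.
        int table = (int) (id % TOTAL_TABLES);
        return TABLE_TO_DB[table] + ".table_" + table;
    }

    public static void main(String[] args) {
        System.out.println("id=12 -> " + route(12)); // DB_0.table_2
        System.out.println("id=27 -> " + route(27)); // DB_2.table_7
    }
}
```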
Core Process Flow
The core process, shown as a flow diagram in the original article, has four steps: determine the group by ID range, compute id % total_table_count, map the result to a specific database and table, and finally execute the query. This approach spreads IDs uniformly across tables, mitigates hotspots, and respects heterogeneous server capabilities.
Scaling / Expansion
When the data volume exceeds the current group’s range, a new group (e.g., Group 02) is defined with its own ID range and database/table allocation. Because each group operates independently, adding a new group does not require migrating existing data.
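Group resolution can be sketched as a sorted lookup over group start boundaries; adding Group 02 only appends a new entry, so Group 01's data never moves. The boundary values and group names below are illustrative:

```java
import java.util.NavigableMap;
import java.util.TreeMap;

// Resolves which group owns an id. Expansion = one new map entry;
// existing entries (and their data) are untouched.
public class GroupResolver {
    // Maps each group's starting id to the group name.
    private static final NavigableMap<Long, String> GROUPS = new TreeMap<>();
    static {
        GROUPS.put(0L, "GROUP_01");          // ids 0 .. 39,999,999
        GROUPS.put(40_000_000L, "GROUP_02"); // ids 40 million and above
    }

    public static String resolve(long id) {
        // floorEntry finds the largest start boundary <= id.
        return GROUPS.floorEntry(id).getValue();
    }
}
```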
Configuration can be stored in a distributed configuration center (e.g., Zookeeper or a cloud‑based config service) to avoid restarting services during expansion. Caching the group‑DB‑table mapping locally (JVM cache) further improves performance.
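A local JVM cache in front of the configuration center might look like the sketch below. The loader would normally fetch a group's routing config from Zookeeper or a similar service; the class name and the `Function`-based loader are assumptions for illustration:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Caches each group's table-index -> database mapping in the JVM,
// so the config center is only hit on first access per group.
public class RoutingConfigCache {
    private final Map<String, Map<Integer, String>> cache = new ConcurrentHashMap<>();
    private final Function<String, Map<Integer, String>> loader;

    public RoutingConfigCache(Function<String, Map<Integer, String>> loader) {
        this.loader = loader; // e.g., a Zookeeper read in production
    }

    // Loads the mapping lazily; subsequent calls are served from memory.
    public Map<Integer, String> tableToDb(String group) {
        return cache.computeIfAbsent(group, loader);
    }

    // Called from a config-center watch when a group's mapping changes,
    // so services pick up new groups without a restart.
    public void invalidate(String group) {
        cache.remove(group);
    }
}
```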
System Design Overview
Three main tables capture the relationships:
Group ↔ Database mapping (Group 01 → DB_0, DB_1, DB_2)
Database ↔ Table mapping (DB_0 → Table_0‑3, DB_1 → Table_0‑2, DB_2 → Table_0‑2)
Routing logic that uses the hash‑modulo result to select the appropriate table.
Developers can cache these mappings to avoid frequent lookups.
Conclusion
The hybrid sharding strategy leverages both hash and range techniques to achieve even data distribution, eliminate the need for data migration during scaling, and allow allocation of storage based on server capabilities, thereby solving hotspot problems while supporting seamless expansion.
dbaplus Community
Enterprise-level professional community for Database, BigData, and AIOps. Daily original articles, weekly online tech talks, monthly offline salons, and quarterly XCOPS&DAMS conferences—delivered by industry experts.