Avoid Hotspots and Data Migration: Hybrid Hash‑Range Sharding Strategy
This article explains vertical and horizontal database partitioning, compares hash‑modulo and range sharding schemes, and proposes a hybrid approach that balances load, prevents hotspots, and eliminates the need for costly data migrations during scaling.
Introduction
In medium‑to‑large projects, when data volume grows large, developers need to split the data. There are two basic ways: vertical and horizontal partitioning.
Vertical vs. Horizontal Partitioning
Vertical partitioning is simple: a single database is split into multiple databases based on business domains, such as separate order and user databases.
Horizontal partitioning (sharding) distributes rows of the same business entity across multiple tables or databases.
Sharding Schemes
Two common routing algorithms are used:
1. Hash‑modulo scheme
Assume an estimated order volume of 40 million and each table can hold 10 million rows, so four tables are created. The routing key (e.g., id) is modulo‑divided by the total number of tables. For example, id=12 % 4 = 0 stores the order in table 0, id=13 % 4 = 1 in table 1.
Advantages: data is evenly distributed, avoiding hotspot tables.
Hotspot definition: many operations concentrate on a single table, causing performance pressure.
Disadvantages: expanding the number of tables changes the modulo base, making existing data unreachable without migration.
When the business grows beyond the original capacity, increasing the number of tables (e.g., from 4 to 8) requires re‑hashing all existing data, which is painful and often impossible for large companies that cannot afford downtime.
2. Range scheme
Data is split by value ranges. For example, id values 0‑9 999 999 go to table 0, 10 000 000‑19 999 999 to table 1, etc.
Advantages: adding new tables does not require migrating existing data because the original ranges stay unchanged.
Disadvantages: because id is monotonically increasing, recent data tends to concentrate in the highest‑range table, creating a hotspot.
Hybrid Idea
The hash scheme provides uniform distribution, while the range scheme avoids migration. The proposed hybrid approach first uses a range to assign an id to a logical group, then applies hash modulo within that group to distribute rows evenly across tables.
Group‑Based Design
A group contains several databases and tables. Example: group01 handles id 0‑40 million and consists of three databases (DB_0, DB_1, DB_2) with a total of ten tables.
Routing steps:
Determine the group by the id range.
Use hash modulo with the total number of tables (10) to select a table.
Map the resulting table index to a specific database (e.g., indices 0‑3 → DB_0, 4‑6 → DB_1, 7‑9 → DB_2).
This allows databases with higher performance or storage to hold more tables (DB_0 gets 4 tables, DB_1 and DB_2 get 3 each), achieving load balancing according to server capacity.
Scalability and Expansion
When the data volume exceeds the current group’s range, a new group (e.g., group02) is created with its own range and the same routing logic. No data migration is required because existing id values remain in their original groups.
Configuration changes can be applied via distributed configuration centers (e.g., Zookeeper) to avoid service restarts.
System Design Overview
The architecture consists of three main tables linking groups, databases, and tables. Caching the mapping in the JVM avoids frequent lookups.
Overall, the hybrid hash‑range sharding design eliminates hotspots, supports incremental scaling, and removes the need for costly data migrations.
Key Points
Routing key must be ordered and auto‑incrementing.
Hash ensures uniform distribution; range ensures migration‑free expansion.
Group‑based routing enables uneven allocation based on server capabilities.
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
Java Backend Technology
Focus on Java-related technologies: SSM, Spring ecosystem, microservices, MySQL, MyCat, clustering, distributed systems, middleware, Linux, networking, multithreading. Occasionally cover DevOps tools like Jenkins, Nexus, Docker, and ELK. Also share technical insights from time to time, committed to Java full-stack development!
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
