Combining Horizontal and Vertical Sharding to Eliminate Hotspots Without Data Migration
This article explains vertical and horizontal database partitioning, compares hash‑modulo and range sharding schemes, and proposes a hybrid group‑based design that balances uniform data distribution with painless scaling, eliminating hotspots while avoiding costly data migrations.
Vertical vs. Horizontal Partitioning
In medium‑to‑large projects, when data volume grows, it is common to split the data. There are two approaches: vertical and horizontal.
Vertical splitting is straightforward: a single database is divided into multiple databases based on business domains, such as separate order and user databases.
Horizontal splitting distributes rows of a single table across multiple tables or databases. For example, 40 million order records exceed MySQL’s recommended single‑table size (≈1 million), so the data can be split into four tables or more, possibly combined with database splitting.
Sharding Schemes
Two common sharding strategies are hash‑modulo and range.
1. Hash‑Modulo Scheme
Design tables to hold a fixed number of rows (e.g., 10 million per table). Route data by computing id % total_table_count. For 40 million orders, four tables can be used.
The routing works by taking the routing key (e.g., id ) and applying modulo with the total number of tables. If id=12 and the total tables are 4, 12 % 4 = 0 , so the record goes to table 0.
Advantages: data is evenly distributed, avoiding hotspot issues.
Hotspot definition: when many operations concentrate on a single table, causing performance bottlenecks.
Disadvantages: expanding the number of tables changes the modulo base, requiring costly data migration.
When the table count increases (e.g., from 4 to 8), previously stored rows may no longer map to the same table, making the old data inaccessible without migration.
2. Range Scheme
Data is split by predefined value ranges. For example, id values 0‑10 million go to table 0, 10‑20 million to table 1, etc.
Advantages: no data migration needed when adding new tables; existing data stays in its original range.
Disadvantages: hotspot risk because sequential id values tend to fall into the same range, overloading a single table.
Summary of the Two Schemes
Hash‑Modulo : avoids hotspots but makes scaling painful due to migration.
Range : avoids migration but can create hotspots.
Hybrid Approach: Combining Hash and Range
The idea is to first use a range to allocate a data block to a group , then apply hash routing within that group to achieve uniform distribution.
Define a group that contains several databases and tables. For example, id=0‑40 million belongs to group01 , which has three databases (DB_0, DB_1, DB_2) and a total of ten tables.
Routing steps:
Determine the group based on the id range.
Apply id % total_table_count (10 in the example) to select a table.
Map the resulting table index to a specific database (e.g., indices 0‑3 → DB_0, 4‑6 → DB_1, 7‑9 → DB_2).
This allows servers with higher performance to host more tables, matching storage capacity to hardware capability.
The design eliminates hotspots and respects server capacity differences.
Scalability and Expansion
When data grows beyond the current group’s range, create a new group (e.g., group02 ) with its own id range (e.g., 40 million‑55 million). No data migration is required because existing ranges remain unchanged.
New groups can be added without restarting services by updating a distributed configuration center (e.g., Zookeeper or a dedicated config service) and refreshing the local JVM cache.
System Design Overview
Relationships:
Group ↔ Databases: each group contains multiple databases.
Database ↔ Tables: each database holds a subset of the group’s tables.
During runtime, cache the routing metadata locally to avoid frequent lookups.
When expanding, add a new group definition to the configuration center; services can reload the config without downtime.
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
Programmer DD
A tinkering programmer and author of "Spring Cloud Microservices in Action"
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
