Databases 12 min read

Avoid Hotspots and Data Migration: Hybrid Hash‑Range Sharding Strategy

This article explains vertical and horizontal database partitioning, compares hash‑modulo and range sharding schemes, and proposes a hybrid approach that balances load, prevents hotspots, and eliminates the need for costly data migrations during scaling.

Java Backend Technology
Java Backend Technology
Java Backend Technology
Avoid Hotspots and Data Migration: Hybrid Hash‑Range Sharding Strategy

Introduction

In medium‑to‑large projects, when data volume grows large, developers need to split the data. There are two basic ways: vertical and horizontal partitioning.

Vertical vs. Horizontal Partitioning

Vertical partitioning is simple: a single database is split into multiple databases based on business domains, such as separate order and user databases.

Horizontal partitioning (sharding) distributes rows of the same business entity across multiple tables or databases.

Sharding Schemes

Two common routing algorithms are used:

1. Hash‑modulo scheme

Assume an estimated order volume of 40 million and each table can hold 10 million rows, so four tables are created. The routing key (e.g., id) is modulo‑divided by the total number of tables. For example, id=12 % 4 = 0 stores the order in table 0, id=13 % 4 = 1 in table 1.

Advantages: data is evenly distributed, avoiding hotspot tables.

Hotspot definition: many operations concentrate on a single table, causing performance pressure.

Disadvantages: expanding the number of tables changes the modulo base, making existing data unreachable without migration.

When the business grows beyond the original capacity, increasing the number of tables (e.g., from 4 to 8) requires re‑hashing all existing data, which is painful and often impossible for large companies that cannot afford downtime.

2. Range scheme

Data is split by value ranges. For example, id values 0‑9 999 999 go to table 0, 10 000 000‑19 999 999 to table 1, etc.

Advantages: adding new tables does not require migrating existing data because the original ranges stay unchanged.

Disadvantages: because id is monotonically increasing, recent data tends to concentrate in the highest‑range table, creating a hotspot.

Hybrid Idea

The hash scheme provides uniform distribution, while the range scheme avoids migration. The proposed hybrid approach first uses a range to assign an id to a logical group, then applies hash modulo within that group to distribute rows evenly across tables.

Group‑Based Design

A group contains several databases and tables. Example: group01 handles id 0‑40 million and consists of three databases (DB_0, DB_1, DB_2) with a total of ten tables.

Routing steps:

Determine the group by the id range.

Use hash modulo with the total number of tables (10) to select a table.

Map the resulting table index to a specific database (e.g., indices 0‑3 → DB_0, 4‑6 → DB_1, 7‑9 → DB_2).

This allows databases with higher performance or storage to hold more tables (DB_0 gets 4 tables, DB_1 and DB_2 get 3 each), achieving load balancing according to server capacity.

Scalability and Expansion

When the data volume exceeds the current group’s range, a new group (e.g., group02) is created with its own range and the same routing logic. No data migration is required because existing id values remain in their original groups.

Configuration changes can be applied via distributed configuration centers (e.g., Zookeeper) to avoid service restarts.

System Design Overview

The architecture consists of three main tables linking groups, databases, and tables. Caching the mapping in the JVM avoids frequent lookups.

Overall, the hybrid hash‑range sharding design eliminates hotspots, supports incremental scaling, and removes the need for costly data migrations.

Key Points

Routing key must be ordered and auto‑incrementing.

Hash ensures uniform distribution; range ensures migration‑free expansion.

Group‑based routing enables uneven allocation based on server capabilities.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Sign in to view source
Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactadmin@besthub.devand we will review it promptly.

Scalabilityshardingdatabase partitioninghash modulorange partitioninghotspot mitigation
Java Backend Technology
Written by

Java Backend Technology

Focus on Java-related technologies: SSM, Spring ecosystem, microservices, MySQL, MyCat, clustering, distributed systems, middleware, Linux, networking, multithreading. Occasionally cover DevOps tools like Jenkins, Nexus, Docker, and ELK. Also share technical insights from time to time, committed to Java full-stack development!

0 followers
Reader feedback

How this landed with the community

Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.