Databases 13 min read

Combine Hash and Range Sharding to Prevent Hotspots Without Data Migration

This article explains how to split large-scale database tables using vertical and horizontal partitioning, compares hash-modulo and range-based sharding, and proposes a hybrid approach that leverages both methods to balance data distribution, avoid hotspot issues, and eliminate the need for costly data migrations during scaling.

Java Backend Technology
Java Backend Technology
Java Backend Technology
Combine Hash and Range Sharding to Prevent Hotspots Without Data Migration

Introduction

In medium to large projects, when data volume becomes large, it is necessary to split the data. There are two main types of splitting: vertical and horizontal.

Vertical and Horizontal Splitting

Vertical splitting is simple: a single database is divided into multiple databases from a business perspective when the data volume grows.

Horizontal splitting means that the same business data is split horizontally after the data volume becomes large.

For example, order data may reach 40 million rows, exceeding MySQL's recommended single-table size of about one million rows, which would degrade performance. Horizontal splitting can distribute the 40 million rows across four or more tables, possibly combined with database splitting.

Sharding Schemes

1. Hash Modulo Scheme

The routing works by taking the routing key (e.g., id ) and applying modulo with the total number of tables. For instance, with four tables, id=12 yields 12 % 4 = 0 , so the order is stored in table 0; id=13 yields 1 , stored in table 1.

Advantages: Orders are evenly distributed across the tables, avoiding hotspot problems.

Hotspot means that operations concentrate on a single table while others see little activity. Since order IDs are time‑based, recent orders may cluster in one table, creating a hotspot.

Disadvantages: Data migration and scaling become difficult. When the number of tables increases, the modulo base changes, causing previously stored IDs to map to different tables and making the old data unreachable.

To handle this, a data migration must be performed, which can be painful and may require downtime.

2. Range Scheme

Range splitting stores a defined range of IDs in a specific table (e.g., IDs 0‑10 000 000 in table 0, IDs 13 000 000‑... in table 1). The range is designed in advance, and routing is performed by ID.

Advantages: Future expansions do not require data migration because the original ranges remain unchanged.

Disadvantages: Hotspot issues can still occur because IDs increase monotonically; a large batch of recent IDs may all fall into the same table.

Summary of Schemes

Hash Modulo: No hotspot problem, but scaling and data migration are painful.

Range Scheme: No data migration needed, but hotspot problems may arise.

Hybrid Solution Idea

The idea is to combine the strengths of both schemes: use the range scheme to allocate IDs to a predefined range, ensuring no migration is needed when the range expands, and within each range use the hash scheme to distribute data evenly.

Design Details

Define a group that contains several databases and tables.

Key points:

1) IDs 0‑40 000 000 belong to group01. 2) Group01 has three databases; routing to a specific DB uses hash modulo based on the total number of tables (10 in the example). 3) The modulo result determines which DB and which table within that DB the ID maps to.

Scaling Strategy

When scaling, create a new group (e.g., group02) with its own data range, avoiding any data migration. The new group distributes its data according to server capacity, keeping hotspot mitigation.

System Design

The overall design links groups, databases, and tables. Only three tables are needed; the relationships are straightforward.

Group‑to‑DB relationship:

Table‑to‑DB relationship:

Cache the mapping locally to avoid frequent database lookups, and use a distributed configuration center (e.g., Zookeeper) for dynamic updates without downtime.

Overall, the hybrid approach balances data distribution, prevents hotspots, and eliminates costly data migrations during scaling.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Sign in to view source
Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactadmin@besthub.devand we will review it promptly.

data migrationdatabase shardinghorizontal scalingHash ShardingRange Shardinghotspot mitigation
Java Backend Technology
Written by

Java Backend Technology

Focus on Java-related technologies: SSM, Spring ecosystem, microservices, MySQL, MyCat, clustering, distributed systems, middleware, Linux, networking, multithreading. Occasionally cover DevOps tools like Jenkins, Nexus, Docker, and ELK. Also share technical insights from time to time, committed to Java full-stack development!

0 followers
Reader feedback

How this landed with the community

Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.