Databases 11 min read

Choosing the Right Sharding Algorithm: From Range to Consistent Hash

This article compares common database sharding strategies—range sharding, mapping tables, ID modulo, hash‑based sharding, and consistent hashing—explaining their implementation complexity, data distribution characteristics, scalability concerns, and practical trade‑offs with code examples.

ITPUB

Apr 22, 2023

Choosing the Right Sharding Algorithm: From Range to Consistent Hash

Range Sharding

When the sharding key is a timestamp, range sharding simply splits data by fixed time intervals. This approach is easy to implement and query‑friendly because all records for a given period reside in the same shard, making it suitable for low‑concurrency B‑end systems. However, for high‑traffic C‑end applications, time‑based ranges can cause hotspot skew and performance alerts.

Data Mapping Table

A mapping table stores the relationship between each sharding key and its target shard. Queries first look up the shard in this table, then access the appropriate database. This method offers flexible data placement and straightforward scaling—adjust the mapping to migrate data—but it adds an extra database round‑trip and can become a bottleneck if the mapping table grows large or is frequently updated.

ID Modulo Sharding

Dividing data by taking the user ID modulo the number of shards is a classic technique. For example, with 10 shards, user ID 2023007 maps to shard 7 (2023007 % 10 = 7). This method yields good uniformity when IDs are sequential and evenly distributed, but it relies on the ID’s continuity and can suffer if the ID pattern correlates with other attributes.

public class ShardingUtil {
    // Compute table number
    public static int shard(int id, int tableCount) {
        return id % tableCount;
    }
}

Hash Sharding

Hash sharding first hashes the sharding key, then applies modulo to the shard count, improving distribution over plain ID modulo. The implementation remains simple, but expanding the number of shards still requires power‑of‑two scaling, forcing migration of roughly half the data during each expansion.

public class ShardingUtil {
    // Compute hash value
    public static int hash(String key) {
        int hash = 0;
        for (int i = 0; i < key.length(); i++) {
            hash = hash * 31 + key.charAt(i);
        }
        return hash;
    }

    // Compute table number
    public static int shard(int hash, int tableCount) {
        return (hash & Integer.MAX_VALUE) % tableCount;
    }
}

Consistent Hash Algorithm

Consistent hashing maps both data keys and database nodes onto a fixed‑size hash ring (typically 2^32‑1). A key is stored on the first node encountered clockwise from its hash position. Adding or removing a node only affects the neighboring segment of the ring, minimizing data movement.

In practice, using raw IP addresses for hashing can lead to uneven node distribution, especially when the number of shards is small or IPs change in cloud environments. Common mitigations include:

Hashing logical node identifiers (e.g., Node1, Node2) instead of IPs.

Introducing virtual nodes to smooth out distribution.

Further refinements draw from Redis Cluster’s slot‑based approach, simplifying the ring into a fixed number of slots (e.g., 2048) and assigning each slot to a node. This reduces migration to at most 1/N of the data when scaling.

public static final int SHARDING_COUNT = 4;
// Compute table number
public static int shard(Long id) {
    return (Math.abs(id.hashCode()) % 2048) / (2048 / SHARDING_COUNT);
}

Advantages of consistent hashing include uniform data distribution, easy scaling without a strict power‑of‑two constraint, and minimal data movement during expansion. Drawbacks are higher implementation complexity and potential uneven node placement, which can be mitigated with virtual nodes or slot‑based mapping.

Redis Cluster uses a variant of consistent hashing called hash slots. The ring is divided into 16384 slots, each node owns a subset of slots. Adding or removing nodes only requires reassigning the affected slots, greatly reducing data migration compared to classic consistent hashing.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.

Sharding Hashing Consistent Hash

Written by

ITPUB

Official ITPUB account sharing technical insights, community news, and exciting events.

0 followers

Reader feedback

How this landed with the community

Rate this article

Was this worth your time?

Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.