
Consistent Hashing: Principles, Optimizations, Graceful Scaling and Comparison with Redis HashSlot

This article explains the concept of consistent hashing, its application in distributed cache load balancing, analyzes issues like data skew and cache avalanche, presents virtual‑node optimizations with Java test code, discusses graceful scaling strategies, and compares it to Redis’s HashSlot and P2P approaches.

Top Architect

Consistent hashing is a special hash algorithm valued for its balanced and stable mapping, widely used in load‑balancing scenarios such as Nginx and Memcached.

The article first introduces the basic idea of consistent hashing, showing how ordinary hash functions (e.g., MD5) scatter similar inputs into completely random outputs, and why a simple modulo‑based hash fails when the number of nodes changes.
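
To make the modulo problem concrete, here is a small sketch (not from the article) that counts how many keys change nodes under plain `hash % N` placement when the cluster grows from 4 to 5 nodes:

```java
// Sketch: with modulo placement, growing from 4 to 5 nodes remaps
// the vast majority of keys, invalidating almost the whole cache.
public class ModuloRemapDemo {
    public static void main(String[] args) {
        int moved = 0;
        for (int key = 0; key < 100000; key++) {
            // node index before vs. after the cluster grows
            if (key % 4 != key % 5) {
                moved++;
            }
        }
        System.out.println(moved + " of 100000 keys moved"); // 80000 of 100000 keys moved
    }
}
```

Only keys whose value modulo 20 falls below 4 keep their node, so 80% of the cache is lost on a single resize; consistent hashing is designed to avoid exactly this.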

It then describes the ring‑based consistent‑hash space: each node is placed on a circular hash ring, and a key is mapped to the first node clockwise from its hash value. This design ensures that only a small portion of keys move when nodes are added or removed.

Problems and Optimizations

Two main problems are identified: data skew (few nodes receive most keys when the node count is small) and cache avalanche (all keys of a removed node shift to a single neighbor, overloading it). The solution is to introduce virtual nodes—each real node is represented by many virtual nodes on the ring, which evens out the distribution and randomizes node order.

Code examples are provided to demonstrate the concepts. The first utility class implements the FNV1_32_HASH function (FNV-1a with additional avalanche mixing):

public class HashUtil {
    /**
     * FNV1_32_HASH: FNV-1a with extra avalanche shifts, clamped to a
     * non-negative int so it can be positioned on the ring.
     */
    public static int getHash(String str) {
        final int p = 16777619;
        int hash = (int) 2166136261L;
        for (int i = 0; i < str.length(); i++) {
            hash = (hash ^ str.charAt(i)) * p;
        }
        hash += hash << 13;
        hash ^= hash >>> 7;
        hash += hash << 3;
        hash ^= hash >>> 17;
        hash += hash << 5;
        // Math.abs(Integer.MIN_VALUE) is still negative, so mask instead
        return hash & 0x7FFFFFFF;
    }
}

A simple test without virtual nodes uses a TreeMap as the hash ring:

import java.util.*;

public class ConsistentHashingWithoutVirtualNode {
    private static String[] groups = {
        "192.168.0.0:111", "192.168.0.1:111", "192.168.0.2:111",
        "192.168.0.3:111", "192.168.0.4:111"
    };

    // The hash ring: a sorted map from node hash to node name
    private static SortedMap<Integer, String> sortedMap = new TreeMap<>();

    static {
        for (String group : groups) {
            int hash = HashUtil.getHash(group);
            System.out.println("[" + group + "] launched @ " + hash);
            sortedMap.put(hash, group);
        }
    }

    // Walk clockwise: pick the first node at or after the key's hash,
    // wrapping around to the smallest hash if none exists.
    private static String getServer(String widgetKey) {
        int hash = HashUtil.getHash(widgetKey);
        SortedMap<Integer, String> subMap = sortedMap.tailMap(hash);
        if (subMap.isEmpty()) {
            return sortedMap.get(sortedMap.firstKey());
        }
        return subMap.get(subMap.firstKey());
    }

    public static void main(String[] args) {
        Map<String, Integer> resMap = new HashMap<>();
        for (int i = 0; i < 100000; i++) {
            String server = getServer(Integer.toString(i));
            resMap.put(server, resMap.getOrDefault(server, 0) + 1);
        }
        resMap.forEach((k, v) ->
            System.out.println("group " + k + ": " + v + " (" + v / 1000.0 + "%)"));
    }
}

The test shows an uneven load distribution. Adding 1,000 virtual nodes per real node dramatically improves balance:

import java.util.*;

public class ConsistentHashingWithVirtualNode {
    private static String[] groups = {
        "192.168.0.0:111", "192.168.0.1:111", "192.168.0.2:111",
        "192.168.0.3:111", "192.168.0.4:111"
    };

    // Real nodes currently in the cluster
    private static List<String> realGroups = new LinkedList<>();

    // The ring now holds virtual-node names keyed by their hashes
    private static SortedMap<Integer, String> virtualNodes = new TreeMap<>();

    private static final int VIRTUAL_NODE_NUM = 1000;

    static {
        realGroups.addAll(Arrays.asList(groups));
        for (String real : realGroups) {
            for (int i = 0; i < VIRTUAL_NODE_NUM; i++) {
                String vName = real + "&&VN" + i;
                int hash = HashUtil.getHash(vName);
                System.out.println("[" + vName + "] launched @ " + hash);
                virtualNodes.put(hash, vName);
            }
        }
    }

    // Strip the "&&VNi" suffix to recover the real node's name
    private static String getRealNodeName(String virtualName) {
        return virtualName.split("&&")[0];
    }

    private static String getServer(String widgetKey) {
        int hash = HashUtil.getHash(widgetKey);
        SortedMap<Integer, String> subMap = virtualNodes.tailMap(hash);
        String vNode = subMap.isEmpty()
                ? virtualNodes.get(virtualNodes.firstKey())
                : subMap.get(subMap.firstKey());
        return getRealNodeName(vNode);
    }

    // main method similar to the one above
}

After adding virtual nodes, the load distribution becomes almost uniform. The article also shows how to handle node addition and removal gracefully by refreshing the virtual‑node map:

private static String getVirtualNodeName(String real, int index) {
    return real + "&&VN" + index;
}

private static void refreshHashCircle() {
    // Rebuild the ring from the current set of real nodes
    virtualNodes.clear();
    for (String real : realGroups) {
        for (int i = 0; i < VIRTUAL_NODE_NUM; i++) {
            String vName = getVirtualNodeName(real, i);
            virtualNodes.put(HashUtil.getHash(vName), vName);
        }
    }
}

private static void addGroup(String identifier) {
    realGroups.add(identifier);
    refreshHashCircle();
}

private static void removeGroup(String identifier) {
    realGroups.remove(identifier);
    refreshHashCircle();
}
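
The payoff of this design can be checked empirically. The sketch below (assumed names, using a simplified FNV-1a hash rather than the article's full HashUtil) builds the virtual-node ring before and after a sixth node joins and counts how many of 100,000 keys change owner:

```java
import java.util.*;

// Sketch: measure how many keys change owner when one node joins a
// five-node virtual-node ring. With 6 nodes the expected share is ~1/6.
public class RemapRatioDemo {
    static final int VN = 1000;

    // Simplified FNV-1a (no avalanche step), clamped to non-negative
    static int hash(String s) {
        final int p = 16777619;
        int h = (int) 2166136261L;
        for (int i = 0; i < s.length(); i++) h = (h ^ s.charAt(i)) * p;
        return h & 0x7FFFFFFF;
    }

    static SortedMap<Integer, String> buildRing(List<String> nodes) {
        SortedMap<Integer, String> ring = new TreeMap<>();
        for (String n : nodes)
            for (int i = 0; i < VN; i++) {
                String v = n + "&&VN" + i;
                ring.put(hash(v), v);
            }
        return ring;
    }

    static String locate(SortedMap<Integer, String> ring, String key) {
        SortedMap<Integer, String> tail = ring.tailMap(hash(key));
        String v = tail.isEmpty() ? ring.get(ring.firstKey()) : tail.get(tail.firstKey());
        return v.split("&&")[0]; // real node name
    }

    public static void main(String[] args) {
        List<String> nodes = new ArrayList<>(Arrays.asList(
            "192.168.0.0:111", "192.168.0.1:111", "192.168.0.2:111",
            "192.168.0.3:111", "192.168.0.4:111"));
        SortedMap<Integer, String> before = buildRing(nodes);
        nodes.add("192.168.0.5:111");
        SortedMap<Integer, String> after = buildRing(nodes);
        int moved = 0;
        for (int i = 0; i < 100000; i++)
            if (!locate(before, Integer.toString(i)).equals(locate(after, Integer.toString(i))))
                moved++;
        System.out.println(moved + " of 100000 keys moved");
    }
}
```

Roughly a sixth of the keys move, versus ~80% under modulo hashing; the rest keep their node and their cache entries.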

Graceful scaling strategies are then discussed: pre-warming hot keys (pulling high-frequency keys onto a new node before it serves traffic) and historical-hash fallback (querying the old hash ring when a key is missed on the new one). Pitfalls of shrinking the cluster are also covered, such as the need for circuit-breaker protection and the delay in propagating cluster-state updates to every load balancer.
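
The historical-hash fallback can be sketched as a two-step lookup: miss on the node the new ring chooses, then retry the node the old ring would have chosen. All names below are illustrative placeholders, with per-node caches simplified to in-memory maps:

```java
import java.util.*;

// Sketch of historical-hash fallback during a ring change. Ring entries map
// a position on the ring to a node name; "caches" stands in for real clients.
public class FallbackLookup {
    // Clockwise lookup: first node at or after the hash, wrapping around
    static String locate(SortedMap<Integer, String> ring, int hash) {
        SortedMap<Integer, String> tail = ring.tailMap(hash);
        return tail.isEmpty() ? ring.get(ring.firstKey()) : tail.get(tail.firstKey());
    }

    static String get(String key,
                      SortedMap<Integer, String> newRing,
                      SortedMap<Integer, String> oldRing,
                      Map<String, Map<String, String>> caches) {
        int hash = key.hashCode() & 0x7FFFFFFF; // placeholder hash function
        String value = caches.get(locate(newRing, hash)).get(key);
        if (value == null) {
            // Miss on the new ring: the key may still live where the
            // old ring placed it, so query that node before giving up.
            value = caches.get(locate(oldRing, hash)).get(key);
        }
        return value;
    }

    public static void main(String[] args) {
        SortedMap<Integer, String> oldRing = new TreeMap<>();
        oldRing.put(1000, "A");
        SortedMap<Integer, String> newRing = new TreeMap<>(oldRing);
        newRing.put(500, "B"); // B joined; low-hash keys now route to B
        Map<String, Map<String, String>> caches = new HashMap<>();
        caches.put("A", new HashMap<>(Map.of("k", "v"))); // "k" was cached on A
        caches.put("B", new HashMap<>());
        System.out.println(get("k", newRing, oldRing, caches)); // prints v
    }
}
```

A production version would also write the value back to the new owner on a fallback hit, so the old ring can eventually be retired.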

Comparison with Redis HashSlot

Redis Cluster uses a P2P-based HashSlot scheme: 16,384 slots are divided evenly among nodes, and a key's slot is computed as CRC16(key) % 16384. Adding or removing a node moves only the slots assigned to that node. Each node maintains the full slot-to-node map and learns about cluster changes via the Gossip protocol, enabling decentralized request routing.
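
The slot computation is easy to reproduce. The sketch below implements CRC16/XMODEM (polynomial 0x1021, init 0), the variant Redis Cluster uses; note it omits Redis's hash-tag rule, where only the part of the key between `{` and `}` is hashed:

```java
import java.nio.charset.StandardCharsets;

// Sketch of Redis-style slot assignment: CRC16/XMODEM modulo 16384.
public class HashSlot {
    static int crc16(byte[] bytes) {
        int crc = 0;
        for (byte b : bytes) {
            crc ^= (b & 0xFF) << 8;
            for (int i = 0; i < 8; i++) {
                // Shift out the top bit; XOR in the polynomial when it was set
                crc = ((crc & 0x8000) != 0) ? ((crc << 1) ^ 0x1021) : (crc << 1);
                crc &= 0xFFFF;
            }
        }
        return crc;
    }

    static int slot(String key) {
        return crc16(key.getBytes(StandardCharsets.UTF_8)) % 16384;
    }

    public static void main(String[] args) {
        // The Redis Cluster spec's reference vector: CRC16("123456789") = 0x31C3
        System.out.println(slot("123456789")); // prints 12739
    }
}
```

Because the slot-to-node map is explicit, rebalancing means reassigning slot numbers rather than recomputing a ring, which is what gives HashSlot its finer-grained control.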

Compared with classic consistent hashing, HashSlot + P2P eliminates the single‑point‑of‑failure of a central router and offers finer‑grained rebalancing, but its implementation is more complex.

In conclusion, consistent hashing with enough virtual nodes provides a simple yet effective load‑balancing solution for distributed caches, while Redis’s HashSlot approach offers better scalability and decentralization at the cost of added complexity.

Tags: Load Balancing, Redis, distributed cache, consistent hashing, virtual nodes, Hash Slot
Written by

Top Architect

Top Architect focuses on sharing practical architecture knowledge, covering enterprise, system, website, large‑scale distributed, and high‑availability architectures, plus architecture adjustments using internet technologies. We welcome idea‑driven, sharing‑oriented architects to exchange and learn together.
