Understanding Consistent Hashing: From Simple Modulo Hash to Optimizations
This article explains the drawbacks of a basic modulo hash algorithm for key distribution, demonstrates how consistent hashing resolves scaling and node‑failure issues, and discusses virtual‑node techniques to mitigate data skew and improve load balancing in distributed cache systems.
In a simple hash algorithm, keys are hashed and then the hash value is taken modulo the number of nodes to determine the node that stores the key, which works well only when the cluster size remains constant.
When a new node is added, the modulo divisor changes, causing many keys to be remapped to different nodes; without data migration this leads to massive cache invalidation and costly data movement. Similarly, if a node fails, the remaining nodes receive a different set of keys, resulting in cache misses and potential cache penetration.
Consistent hashing improves upon the basic modulo approach by mapping both nodes and keys onto a logical hash ring. Each node is placed at a point on the ring, and a key is stored on the first node encountered when moving clockwise from the key’s position, which limits the impact of node addition or removal to only the keys that fall between the affected nodes.
When a new node (e.g., node3) joins, only the keys that fall between the predecessor node and the new node need to be moved, leaving the rest of the cluster untouched. If a node crashes (e.g., node2), its keys are automatically reassigned to the next clockwise node without affecting other keys.
However, consistent hashing can suffer from data skew when the number of physical nodes is small or when nodes are unevenly distributed on the ring, causing many keys to map to a single node, overloading it and potentially triggering a cascade of node failures.
To alleviate data skew, virtual nodes are introduced: each physical node is represented by multiple virtual nodes placed at different points on the hash ring. Keys are still mapped to the nearest virtual node, but the load is spread more evenly across physical nodes, preventing any single node from becoming a hotspot.
In summary, the article outlines the limitations of ordinary modulo hashing, shows how consistent hashing addresses scaling and failure challenges, and presents virtual‑node optimization to solve data‑skew problems in distributed caching systems.
Laravel Tech Community
Specializing in Laravel development, we continuously publish fresh content and grow alongside the elegant, stable Laravel framework.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.