How Consistent Hashing Solves Cache Scaling and Reduces Data Skew
This article explains the principles of consistent hashing, compares it with simple modulo hashing, and shows how the hash ring, virtual nodes, and key mapping improve load balancing, minimize data loss during node changes, and prevent cache avalanche in distributed caching systems.
Hello everyone, I'm Su San. Recently I saw a discussion about consistent hashing in a tech group, so I decided to introduce its principles using a classic distributed cache scenario that often appears in interviews.
The Scenario
Assume we have three cache servers named node0, node1, and node2, and 30 million key objects that need to be evenly cached across the three machines.
The Modulo Hashing Problem
The straightforward solution is the modulo algorithm hash(key) % N, where N is the number of servers. This maps each key directly to one of the three nodes, which works but has a major limitation when the number of servers changes.

If a node fails or a new node is added, the result of hash(key) % N changes for most keys, causing them to be remapped to different servers. This leads to massive cache invalidation and a potential cache avalanche, which is unacceptable in production.
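To see the scale of the problem, here is a minimal sketch (not from the article; the `bucket` helper and key names are illustrative) that measures how many keys get remapped when a fourth server is added under plain modulo hashing:

```python
import hashlib

def bucket(key: str, n: int) -> int:
    # Use a stable hash (Python's built-in hash() is salted per process),
    # then take it modulo the number of servers.
    h = int(hashlib.md5(key.encode()).hexdigest(), 16)
    return h % n

keys = [f"key-{i}" for i in range(10_000)]
# Compare each key's bucket with 3 servers vs. 4 servers.
moved = sum(bucket(k, 3) != bucket(k, 4) for k in keys)
print(f"{moved / len(keys):.0%} of keys remap when N goes from 3 to 4")
```

For uniformly distributed hashes, roughly three quarters of the keys land on a different server after the change, which is exactly the mass cache invalidation described above.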
Consistent Hashing
Consistent hashing also uses a modulo operation, but instead of modulo N it takes modulo 2^32, effectively mapping both servers and keys onto a circular hash ring.
“IPv4 addresses consist of four 8‑bit groups, so using 2^32 ensures a unique mapping for each IP address.”
The 2^32 possible values are visualized as a circle (the hash ring), with positions ranging from 0 to 2^32−1.
Mapping Servers to the Hash Ring
Each server’s IP address is hashed, and the result modulo 2^32 determines its position on the ring. Thus node0, node1, and node2 are placed at distinct points on the ring.
Mapping Keys to the Hash Ring
Each key is also hashed and placed on the same ring using hash(key) % 2^32. The key is stored on the first server encountered when moving clockwise from the key’s position.
“Starting from the key’s position, the first server encountered clockwise is the server that will cache the object.”
Because both server and key hashes are stable, a given key will consistently map to the same server as long as the server set does not change.
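The clockwise lookup can be implemented with a sorted list and a binary search. Below is a minimal sketch (the `HashRing` class and `h32` helper are my own illustrative names, not from the article), using MD5 as the hash function:

```python
import bisect
import hashlib

def h32(s: str) -> int:
    # Map a string to a point on the 0 .. 2^32-1 ring.
    return int(hashlib.md5(s.encode()).hexdigest(), 16) % (2 ** 32)

class HashRing:
    def __init__(self, nodes):
        # Store (position, node) pairs sorted by ring position.
        self._points = sorted((h32(n), n) for n in nodes)
        self._hashes = [p for p, _ in self._points]

    def locate(self, key: str) -> str:
        # First node at or clockwise from the key's position;
        # wrap around to the start of the ring if we fall off the end.
        i = bisect.bisect_left(self._hashes, h32(key)) % len(self._hashes)
        return self._points[i][1]

ring = HashRing(["node0", "node1", "node2"])
print(ring.locate("key-1"))
```

Because the hash of both the key and the servers is deterministic, repeated lookups for the same key always return the same node, as the article notes.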
key-1 -> node-1
key-3 -> node-2
key-4 -> node-2
key-5 -> node-2
key-2 -> node-0

Advantages of Consistent Hashing
When adding a new node (e.g., node-4), only the keys that fall between the new node and its predecessor on the ring need to be remapped, affecting a small portion of data. Similarly, if a node fails, only the keys that were mapped to that node are reassigned to the next clockwise node, minimizing disruption.
Thus, changes in the number of servers affect only a small subset of keys, keeping the cache service largely available.
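This minimal-disruption property can be checked directly. In the sketch below (again my own illustrative code, with hypothetical node names), a fourth node is added and we verify that every key either stays where it was or moves to the new node — nothing else is shuffled:

```python
import hashlib

RING = 2 ** 32

def h32(s: str) -> int:
    return int(hashlib.md5(s.encode()).hexdigest(), 16) % RING

def locate(key: str, nodes) -> str:
    # Linear scan of the sorted ring: fine for a sketch, use binary
    # search in real code.
    points = sorted((h32(n), n) for n in nodes)
    pos = h32(key)
    for p, n in points:
        if pos <= p:
            return n
    return points[0][1]  # wrap around past the last node

keys = [f"key-{i}" for i in range(10_000)]
before = {k: locate(k, ["node0", "node1", "node2"]) for k in keys}
after = {k: locate(k, ["node0", "node1", "node2", "node4"]) for k in keys}
moved = sum(before[k] != after[k] for k in keys)
print(f"{moved / len(keys):.0%} of keys move after adding node4")
```

Contrast this with modulo hashing, where adding a node remaps the majority of keys: here only the arc claimed by the new node is affected.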
Data Skew Problem
If nodes are unevenly distributed on the ring, some nodes may receive a disproportionate amount of keys, leading to data skew and resource imbalance.
Introducing virtual nodes solves this by hashing multiple points for each physical server, spreading the load more evenly.
Virtual Nodes
Each server generates several virtual nodes, e.g., node-1#1, node-1#2, and node-1#3, each hashed with hash(IP#i) % 2^32. These virtual positions are placed on the ring, making the distribution of keys more uniform.
hash(10.24.23.227#1) % 2^32
hash(10.24.23.227#2) % 2^32
hash(10.24.23.227#3) % 2^32

“The more virtual nodes you add, the more evenly the hash ring is populated, reducing skew.”
When a key is looked up, the process becomes key → virtual node → real node.
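The key → virtual node → real node lookup is a small extension of the plain ring: each ring point remembers which physical node it belongs to. A sketch (the `VirtualRing` class and the replica count are illustrative choices, not from the article):

```python
import bisect
import hashlib
from collections import Counter

RING = 2 ** 32

def h32(s: str) -> int:
    return int(hashlib.md5(s.encode()).hexdigest(), 16) % RING

class VirtualRing:
    def __init__(self, nodes, replicas=150):
        # Each physical node contributes `replicas` virtual points,
        # named "node#0", "node#1", ... as in the article's IP#i scheme.
        self._points = sorted(
            (h32(f"{n}#{i}"), n) for n in nodes for i in range(replicas)
        )
        self._hashes = [p for p, _ in self._points]

    def locate(self, key: str) -> str:
        # Find the first virtual point clockwise from the key, then
        # resolve it to the real node it stands for.
        i = bisect.bisect_left(self._hashes, h32(key)) % len(self._hashes)
        return self._points[i][1]

ring = VirtualRing(["node0", "node1", "node2"])
counts = Counter(ring.locate(f"key-{i}") for i in range(30_000))
print(counts)  # roughly balanced across the three nodes
```

With enough replicas per node, the per-node key counts converge toward an even split, which is exactly the skew reduction the article describes; production libraries commonly use on the order of 100–200 virtual nodes per server.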
Application Scenarios
Consistent hashing is a widely used load-balancing technique in distributed systems. It can be implemented on the client side or inside middleware such as the cache servers memcached and redis. Memcached uses it for client-side routing, while Redis Cluster uses a similar slot-based approach. Other applications include:
- RPC frameworks such as Dubbo, for service provider selection
- Sharding in distributed relational databases
- LVS load-balancer scheduling
- …and many other cases
Summary
Consistent hashing provides a robust way to distribute keys across a dynamic set of nodes, minimizing cache invalidation during scaling or failures. However, when the number of nodes becomes very large or updates are frequent, lookup performance may degrade, and a reliable routing service is required to avoid a single point of failure.
Overall, the technique’s benefits outweigh its drawbacks, making it a valuable tool for solving real‑world scaling problems.
Su San Talks Tech
Su San, former staff at several leading tech companies, is a top creator on Juejin and a premium creator on CSDN, and runs the free coding practice site www.susan.net.cn.