Consistent Hashing Algorithm: Principles, Virtual Nodes, and Practical Applications
This article explains the consistent hashing algorithm, its hash‑ring model, how it overcomes the limitations of simple modulo hashing in distributed caches, the role of virtual nodes for load balancing, and common use cases such as memcached, Redis clusters, and load‑balancing routers.
When distributing millions of keys across a few cache servers, the naive modulo hashing method (hash(key) % N) works but suffers from massive data movement when servers are added or removed, leading to cache avalanches and uneven load.
Consistent hashing, proposed by MIT in 1997, maps both servers and keys onto a logical 2^32‑sized ring. Each server is placed on the ring by hashing its IP (or identifier) and taking the result modulo 2^32. A key is stored on the first server encountered when moving clockwise from the key’s position on the ring.
Because only the neighboring segment of the ring changes when a server joins or leaves, only a small subset of keys need to be remapped, dramatically reducing cache invalidation.
To improve balance, virtual nodes are introduced: each physical server is represented by multiple points on the ring (e.g., IP#1, IP#2, …). Keys first map to a virtual node, which then maps to the underlying physical server, smoothing out uneven distribution.
Typical steps:
hash(nodeIP#1) % 2^32
hash(nodeIP#2) % 2^32
hash(nodeIP#3) % 2^32When a server fails, only keys that were mapped to its segment are reassigned to the next server clockwise. When a new server is added, only keys in the segment between the new server and its predecessor need to move.
Consistent hashing is widely used in distributed systems for load balancing, including memcached client‑side routing, Redis cluster hash slots, RPC frameworks like Dubbo, sharding in relational databases, and LVS load‑balancing.
In practice, increasing the number of virtual nodes per physical server yields a more uniform key distribution, though it adds the overhead of maintaining the virtual‑to‑physical mapping.
Overall, consistent hashing provides a flexible, scalable solution for evenly distributing data and requests across dynamic server pools.
Architect's Guide
Dedicated to sharing programmer-architect skills—Java backend, system, microservice, and distributed architectures—to help you become a senior architect.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.