Master Consistent Hashing: Principles, Virtual Nodes, and Go Implementation
Consistent hashing is a cornerstone of distributed systems: it balances load, enhances scalability, and minimizes data migration. This article explains its fundamentals, the drawbacks of the basic algorithm, the role of virtual nodes, and walks through a complete Go implementation with code for adding, removing, and locating nodes.
What Is Consistent Hashing?
First, the definition from Wikipedia:
Consistent hashing is a special hashing algorithm. After using consistent hashing, changing the number of hash table slots (size) on average requires remapping only K/n keys, where K is the number of keys and n is the number of slots. In traditional hash tables, adding or removing a slot almost always requires remapping all keys. — Wikipedia
In distributed systems, consistent hashing appears everywhere—CDN, key‑value stores, load balancers—and is a foundational algorithm. It offers several advantages:
Load balancing: The algorithm distributes data evenly across nodes.
Scalability: When nodes are added or removed, only a portion of data needs to be remapped, making horizontal scaling easier.
Reduced data migration: Compared with traditional hashing, consistent hashing requires far less data movement when nodes change, lowering overhead, instability, and latency.
The goal of this article is to study consistent hashing and its simple implementation.
Principles of Consistent Hashing
Basic Consistent Hash Algorithm
The simplest consistent-hash algorithm places nodes directly on a ring that divides the hash value space. A key is hashed with hash(x), and the node whose range contains the resulting value stores the key. The most common value space is [0, 2^32), with each node assigned an interval within it. For example:
Node A stores the hash range [0, 2^12), node B stores [2^12, 2^28), and node C stores [2^28, 2^32), wrapping around to 0. The basic algorithm has obvious drawbacks:
Random node placement makes it hard to achieve uniform distribution of hash values.
After dynamically adding nodes, maintaining uniformity becomes difficult.
Node addition or removal causes serious issues:
If a node fails, its load shifts entirely to an adjacent node.
When a new node joins, it can only share the load of one neighboring node.
Virtual Nodes
As David Wheeler famously put it, "All problems in computer science can be solved by another level of indirection." Consistent hashing follows the same principle.
If three physical nodes are imbalanced, we create many virtual nodes—e.g., A[a1, a2, …, a1024]—and map them onto the hash ring. Each virtual node owns a hash interval and forwards keys to its corresponding physical node.
Introducing enough virtual nodes balances data across physical nodes (at the cost of engineering overhead). If a node crashes, its data is evenly redistributed across the cluster; similarly, a new node can share the load of all existing nodes.
Go Implementation
Full source code: https://github.com/hxzhouh/blog-example/tree/main/Distributed/consistent_hashing
The hash ring is defined as follows, with crc32.ChecksumIEEE as the default hash function:
```go
// HashFunc maps a byte slice to a position on the ring.
type HashFunc func([]byte) uint32

type VirtualNode struct {
	Hash uint32 // position on the ring
	Node *Node  // the physical node this virtual node belongs to
}

type Node struct {
	ID   string
	Addr string
}

type HashRing struct {
	Nodes        map[string]*Node
	VirtualNodes []VirtualNode // kept sorted by Hash
	mu           sync.Mutex
	hash         HashFunc
}

func NewHashRing(hash HashFunc) *HashRing {
	if hash == nil {
		hash = crc32.ChecksumIEEE
	}
	return &HashRing{
		Nodes:        make(map[string]*Node),
		VirtualNodes: make([]VirtualNode, 0),
		hash:         hash,
	}
}
```

Adding a node creates a set of virtual nodes and keeps the ring sorted by hash:
```go
const VirtualNodesPerNode = 1024 // replicas per physical node (value assumed here; see the full source)

func (hr *HashRing) AddNode(node *Node) {
	hr.mu.Lock()
	defer hr.mu.Unlock()
	hr.Nodes[node.ID] = node
	for i := 0; i < VirtualNodesPerNode; i++ {
		virtualNodeID := fmt.Sprintf("%s-%d", node.ID, i)
		hash := hr.hash([]byte(virtualNodeID))
		hr.VirtualNodes = append(hr.VirtualNodes, VirtualNode{Hash: hash, Node: node})
	}
	sort.Slice(hr.VirtualNodes, func(i, j int) bool {
		return hr.VirtualNodes[i].Hash < hr.VirtualNodes[j].Hash
	})
}
```

Removing a node deletes its virtual nodes:
```go
func (hr *HashRing) RemoveNode(nodeID string) {
	hr.mu.Lock()
	defer hr.mu.Unlock()
	delete(hr.Nodes, nodeID)
	var virtualNodes []VirtualNode
	for _, vn := range hr.VirtualNodes {
		if vn.Node.ID != nodeID {
			virtualNodes = append(virtualNodes, vn)
		}
	}
	hr.VirtualNodes = virtualNodes
}
```

Lookup finds the appropriate virtual node and then its physical node:
```go
func (hr *HashRing) GetNode(key string) *Node {
	hr.mu.Lock()
	defer hr.mu.Unlock()
	if len(hr.VirtualNodes) == 0 {
		return nil
	}
	hash := hr.hash([]byte(key))
	// Binary-search for the first virtual node at or after the key's hash.
	idx := sort.Search(len(hr.VirtualNodes), func(i int) bool {
		return hr.VirtualNodes[i].Hash >= hash
	})
	if idx == len(hr.VirtualNodes) {
		idx = 0 // wrap around the ring
	}
	return hr.VirtualNodes[idx].Node
}
```

Example KV system using the hash ring:
```go
type KVSystem struct {
	hashRing *HashRing
	kvStores map[string]*kvstorage.KVStore
}

func NewKVSystem(nodes int) *KVSystem {
	hashRing := NewHashRing(crc32.ChecksumIEEE)
	for i := 0; i < nodes; i++ {
		node := &Node{
			ID:   fmt.Sprintf("Node%d", i),
			Addr: fmt.Sprintf("192.168.1.%d", i+1),
		}
		hashRing.AddNode(node)
	}
	kvStores := make(map[string]*kvstorage.KVStore)
	for id := range hashRing.Nodes {
		kvStores[id] = kvstorage.NewKVStore()
	}
	return &KVSystem{hashRing: hashRing, kvStores: kvStores}
}

func (kv *KVSystem) Get(key string) (string, bool) {
	node := kv.hashRing.GetNode(key)
	if node == nil { // empty ring
		return "", false
	}
	return kv.kvStores[node.ID].Get(key)
}

func (kv *KVSystem) Set(key, value string) {
	node := kv.hashRing.GetNode(key)
	kv.kvStores[node.ID].Set(key, value)
}

func (kv *KVSystem) Delete(key string) {
	node := kv.hashRing.GetNode(key)
	kv.kvStores[node.ID].Delete(key)
}

// DeleteNode drains the node's data back into the ring before removing it.
func (kv *KVSystem) DeleteNode(nodeID string) {
	allData := kv.kvStores[nodeID].GetAll()
	kv.hashRing.RemoveNode(nodeID)
	delete(kv.kvStores, nodeID)
	for key, value := range allData {
		kv.Set(key, value)
	}
}

func (kv *KVSystem) AddNode() {
	node := &Node{
		ID:   fmt.Sprintf("Node%d", len(kv.hashRing.Nodes)),
		Addr: fmt.Sprintf("192.168.1.%d", len(kv.hashRing.Nodes)+1),
	}
	kv.hashRing.AddNode(node)
	kv.kvStores[node.ID] = kvstorage.NewKVStore()
}
```

This simple consistent-hash-based KV store demonstrates how a few lines of Go code can underpin large-scale network services.
References
[1] Consistent hashing, Wikipedia: https://en.wikipedia.org/wiki/Consistent_hashing
[2] Example source code: https://github.com/hxzhouh/blog-example/tree/main/Distributed/consistent_hashing