
Master Consistent Hashing: Principles, Virtual Nodes, and Go Implementation

Consistent hashing is a cornerstone of distributed systems: it balances load, scales gracefully, and minimizes data migration. This article explains its fundamentals, the drawbacks of the basic ring algorithm, and the role of virtual nodes, then walks through a complete Go implementation with code for adding, removing, and locating nodes.


What Is Consistent Hashing?

Let's start with the definition from Wikipedia.

Consistent hashing is a special kind of hashing algorithm. With consistent hashing, changing the number of hash table slots requires remapping only K/n keys on average, where K is the number of keys and n is the number of slots. In traditional hash tables, by contrast, adding or removing a slot requires remapping nearly all keys. — Wikipedia

In distributed systems, consistent hashing appears everywhere—CDN, key‑value stores, load balancers—and is a foundational algorithm. It offers several advantages:

Load balancing: The algorithm distributes data evenly across nodes.

Scalability: When nodes are added or removed, only a portion of data needs to be remapped, making horizontal scaling easier.

Reduced data migration: Compared with traditional hashing, consistent hashing moves far less data when nodes change, lowering migration overhead and the instability and latency that come with it (see the sketch below).
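
To make the migration numbers concrete: with K = 1,000,000 keys and n = 10 nodes, growing to 11 nodes remaps only about one node's share of the keys (K/n ≈ 100,000) under consistent hashing, while modulo-based sharding moves almost everything. The following standalone sketch (not from the original article; it uses FNV-1a from the standard library purely for illustration) measures the modulo case:

package main

import (
    "fmt"
    "hash/fnv"
)

// shard maps a key to one of n buckets using plain modulo hashing.
func shard(key string, n uint32) uint32 {
    h := fnv.New32a()
    h.Write([]byte(key))
    return h.Sum32() % n
}

func main() {
    const keys = 1000000
    moved := 0
    // Count keys that land in a different bucket when the cluster
    // grows from 10 to 11 nodes.
    for i := 0; i < keys; i++ {
        k := fmt.Sprintf("key-%d", i)
        if shard(k, 10) != shard(k, 11) {
            moved++
        }
    }
    // Modulo hashing moves about 10/11 of all keys; consistent
    // hashing would move only about 1/11 of them.
    fmt.Printf("moved %d of %d keys (%.1f%%)\n", moved, keys, 100*float64(moved)/float64(keys))
}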

The goal of this article is to study consistent hashing and walk through a simple implementation.

Principles of Consistent Hashing

Basic Consistent Hash Algorithm

The simplest consistent-hash algorithm places the physical nodes directly on a ring, dividing the value space among them. A key is hashed with hash(x), and the resulting value falls into some node's interval (a minimal code sketch of this lookup appears at the end of this subsection). The value space most commonly spans [0, 2^32 − 1], arranged as a ring, and each node is assigned an interval within it.

(Figure: the hash ring divided into intervals owned by nodes A, B, and C)

Node A stores the hash range [0, 2^12); node B stores [2^12, 2^28); node C stores [2^28, 2^32), wrapping back around to 0. The basic algorithm has obvious drawbacks:

Random node placement makes it hard to achieve uniform distribution of hash values.

After dynamically adding nodes, maintaining uniformity becomes difficult.

Node addition or removal causes serious issues:

If a node fails, its load shifts entirely to an adjacent node.

When a new node joins, it can only share the load of one neighboring node.
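
Despite these drawbacks, the basic lookup itself is simple. Here is a minimal sketch of the basic ring without virtual nodes (illustrative only; basicRing and locate are invented names, and crc32 is assumed as the hash):

import (
    "hash/crc32"
    "sort"
)

// basicRing keeps one position per physical node, sorted ascending.
type basicRing struct {
    hashes []uint32          // sorted node positions on the ring
    nodes  map[uint32]string // position -> node name
}

// locate returns the first node clockwise from the key's hash,
// wrapping back to the start of the ring past the last node.
func (r *basicRing) locate(key string) string {
    h := crc32.ChecksumIEEE([]byte(key))
    i := sort.Search(len(r.hashes), func(i int) bool { return r.hashes[i] >= h })
    if i == len(r.hashes) {
        i = 0
    }
    return r.nodes[r.hashes[i]]
}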

Virtual Nodes

A famous aphorism, usually attributed to David Wheeler, says that all problems in computer science can be solved by another level of indirection. Consistent hashing follows the same principle.

If three physical nodes are imbalanced, we create many virtual nodes—e.g., A[a1, a2, …, a1024]—and map them onto the hash ring. Each virtual node owns a hash interval and forwards keys to its corresponding physical node.

(Figure: virtual nodes for A, B, and C interleaved around the hash ring)

Introducing enough virtual nodes balances data across the physical nodes (at the cost of some extra bookkeeping). If a node crashes, its data is redistributed roughly evenly across the remaining cluster; likewise, a new node takes a share of the load from all existing nodes.
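
One way to check this claim (a hypothetical test harness, not part of the linked repository; it assumes the fmt, hash/crc32, and sort imports used in the implementation below) is to hash many keys against rings with different virtual-node counts and compare the per-node totals:

// balance hashes n keys onto a ring where each physical node owns k
// virtual positions, and returns the number of keys per node.
func balance(nodes []string, k, n int) map[string]int {
    type vnode struct {
        hash uint32
        node string
    }
    ring := make([]vnode, 0, len(nodes)*k)
    for _, node := range nodes {
        for i := 0; i < k; i++ {
            h := crc32.ChecksumIEEE([]byte(fmt.Sprintf("%s-%d", node, i)))
            ring = append(ring, vnode{h, node})
        }
    }
    sort.Slice(ring, func(i, j int) bool { return ring[i].hash < ring[j].hash })
    counts := make(map[string]int)
    for i := 0; i < n; i++ {
        h := crc32.ChecksumIEEE([]byte(fmt.Sprintf("key-%d", i)))
        idx := sort.Search(len(ring), func(j int) bool { return ring[j].hash >= h })
        if idx == len(ring) {
            idx = 0
        }
        counts[ring[idx].node]++
    }
    return counts
}

With k = 1 the per-node counts are typically badly skewed; by k = 1024 each node converges toward roughly n/len(nodes) keys.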

Go Implementation

Full source code: https://github.com/hxzhouh/blog-example/tree/main/Distributed/consistent_hashing

The hash ring is defined below, with crc32.ChecksumIEEE as the default hash function. (The import block, the HashFunc type, and the VirtualNodesPerNode constant are filled in here so the snippet stands alone; 100 virtual nodes per physical node is an illustrative value.)

import (
    "fmt"
    "hash/crc32"
    "sort"
    "sync"
)

// HashFunc lets callers plug in their own hash function;
// crc32.ChecksumIEEE is the default.
type HashFunc func(data []byte) uint32

// VirtualNodesPerNode controls how many ring positions each physical
// node owns. 100 is an illustrative value; the full source may differ.
const VirtualNodesPerNode = 100

type VirtualNode struct {
    // virtual node: one position on the ring, owned by a physical node
    Hash uint32
    Node *Node
}
type Node struct {
    // physical node
    ID   string
    Addr string
}
type HashRing struct {
    Nodes        map[string]*Node // physical nodes by ID
    VirtualNodes []VirtualNode    // every virtual node, sorted by Hash
    mu           sync.Mutex
    hash         HashFunc
}
func NewHashRing(hash HashFunc) *HashRing {
    if hash == nil {
        hash = crc32.ChecksumIEEE
    }
    return &HashRing{
        Nodes:        make(map[string]*Node),
        VirtualNodes: make([]VirtualNode, 0),
        hash:         hash,
    }
}

Adding a node creates a set of virtual nodes and keeps them sorted:

func (hr *HashRing) AddNode(node *Node) {
    hr.mu.Lock()
    defer hr.mu.Unlock()
    hr.Nodes[node.ID] = node
    // Hash "<ID>-<i>" to scatter this node's virtual nodes around the ring.
    for i := 0; i < VirtualNodesPerNode; i++ {
        virtualNodeID := fmt.Sprintf("%s-%d", node.ID, i)
        hash := hr.hash([]byte(virtualNodeID))
        hr.VirtualNodes = append(hr.VirtualNodes, VirtualNode{Hash: hash, Node: node})
    }
    // Keep the ring sorted so lookups can binary-search it.
    sort.Slice(hr.VirtualNodes, func(i, j int) bool {
        return hr.VirtualNodes[i].Hash < hr.VirtualNodes[j].Hash
    })
}

Removing a node deletes its virtual nodes:

func (hr *HashRing) RemoveNode(nodeID string) {
    hr.mu.Lock()
    defer hr.mu.Unlock()
    delete(hr.Nodes, nodeID)
    // Rebuild the virtual-node list without this node's entries; the
    // survivors keep their relative order, so no re-sort is needed.
    var virtualNodes []VirtualNode
    for _, vn := range hr.VirtualNodes {
        if vn.Node.ID != nodeID {
            virtualNodes = append(virtualNodes, vn)
        }
    }
    hr.VirtualNodes = virtualNodes
}

Lookup finds the appropriate virtual node and then its physical node:

func (hr *HashRing) GetNode(key string) *Node {
    hr.mu.Lock()
    defer hr.mu.Unlock()
    if len(hr.VirtualNodes) == 0 {
        return nil
    }
    hash := hr.hash([]byte(key))
    // Binary-search for the first virtual node clockwise from the key's hash.
    idx := sort.Search(len(hr.VirtualNodes), func(i int) bool {
        return hr.VirtualNodes[i].Hash >= hash
    })
    // Past the last virtual node, wrap around to the first one.
    if idx == len(hr.VirtualNodes) {
        idx = 0
    }
    return hr.VirtualNodes[idx].Node
}
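
Putting the pieces together, a quick usage sketch (the node IDs and addresses are invented for illustration):

func main() {
    ring := NewHashRing(nil) // nil falls back to crc32.ChecksumIEEE
    ring.AddNode(&Node{ID: "A", Addr: "10.0.0.1"})
    ring.AddNode(&Node{ID: "B", Addr: "10.0.0.2"})
    ring.AddNode(&Node{ID: "C", Addr: "10.0.0.3"})

    fmt.Println(ring.GetNode("user:42").ID) // one of "A", "B", "C"

    // Removing a node only remaps the keys that node owned; every
    // other key keeps resolving to the same node as before.
    ring.RemoveNode("B")
    fmt.Println(ring.GetNode("user:42").ID)
}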

Example KV system using the hash ring:

type KVSystem struct {
    hashRing *HashRing
    kvStores map[string]*kvstorage.KVStore
}
func NewKVSystem(nodes int) *KVSystem {
    hashRing := NewHashRing(crc32.ChecksumIEEE)
    for i := 0; i < nodes; i++ {
        node := &Node{
            ID:   fmt.Sprintf("Node%d", i),
            Addr: fmt.Sprintf("192.168.1.%d", i+1),
        }
        hashRing.AddNode(node)
    }
    kvStores := make(map[string]*kvstorage.KVStore)
    for id := range hashRing.Nodes {
        kvStores[id] = kvstorage.NewKVStore()
    }
    return &KVSystem{hashRing: hashRing, kvStores: kvStores}
}
func (kv *KVSystem) Get(key string) (string, bool) {
    node := kv.hashRing.GetNode(key)
    return kv.kvStores[node.ID].Get(key)
}
func (kv *KVSystem) Set(key, value string) {
    node := kv.hashRing.GetNode(key)
    kv.kvStores[node.ID].Set(key, value)
}
func (kv *KVSystem) Delete(key string) {
    node := kv.hashRing.GetNode(key)
    kv.kvStores[node.ID].Delete(key)
}
func (kv *KVSystem) DeleteNode(nodeID string) {
    // Snapshot the node's data, drop the node, then re-Set every key
    // so it migrates to whichever surviving node now owns it.
    allData := kv.kvStores[nodeID].GetAll()
    kv.hashRing.RemoveNode(nodeID)
    delete(kv.kvStores, nodeID)
    for key, value := range allData {
        kv.Set(key, value)
    }
}
func (kv *KVSystem) AddNode() {
    // Deriving the ID from the current node count is fine for a demo,
    // but could collide with an existing ID after deletions.
    node := &Node{
        ID:   fmt.Sprintf("Node%d", len(kv.hashRing.Nodes)),
        Addr: fmt.Sprintf("192.168.1.%d", len(kv.hashRing.Nodes)+1),
    }
    kv.hashRing.AddNode(node)
    kv.kvStores[node.ID] = kvstorage.NewKVStore()
}
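
Finally, a short end-to-end sketch of the KV system (again illustrative; kvstorage comes from the linked repository):

func main() {
    kv := NewKVSystem(3) // creates Node0, Node1, Node2

    kv.Set("language", "Go")
    fmt.Println(kv.Get("language")) // Go true

    // DeleteNode re-Sets the removed node's keys, so they migrate
    // to the surviving nodes and stay readable.
    kv.DeleteNode("Node0")
    fmt.Println(kv.Get("language")) // still Go true
}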

This simple consistent-hash-based KV store captures, in a few dozen lines of Go, the core mechanism behind many large-scale network services. One caveat worth noting: KVSystem.AddNode extends the ring but does not migrate existing keys onto the new node, so a real system would also rehash the affected key ranges.

References

Consistent hashing [3]

Links

[1] Medium: https://medium.huizhou92.com/

[2] Full source code: https://github.com/hxzhouh/blog-example/tree/main/Distributed/consistent_hashing

[3] https://en.wikipedia.org/wiki/Consistent_hashing
