Fundamentals 17 min read

How Consistent Hashing Minimizes Data Movement in Scalable Systems

This article explains the principle of consistent hashing, shows how to build a 2^32‑range hash ring, map nodes and keys onto it, and demonstrates node addition and removal with a Python implementation that highlights reduced data reshuffling compared to simple modulo hashing.

ITPUB
ITPUB
ITPUB
How Consistent Hashing Minimizes Data Movement in Scalable Systems

1. Build an empty hash ring

Consistent hashing maps values onto a virtual circle ranging from 0 to 2^32‑1. The circle is created by taking the modulo of a 32‑bit unsigned integer, which aligns with Java's int range and provides a large address space for uniform distribution.

2. Construct node topology on the ring

Each physical node (identified by IP or hostname) is hashed with SHA‑1, converted to an integer, and reduced modulo 2^32. The resulting hash value becomes the node’s position on the ring, and a dictionary ring stores the mapping from hash to node while a sorted list sortedKeys maintains the clockwise order.

3. Map keys to the hash ring

Keys are processed the same way: hash → integer → modulo 2^32. The key’s hash is then located on the ring; the first node encountered when moving clockwise is the responsible node. This ensures each key is assigned to exactly one node.

4. Simulate node removal

When a node goes offline, only the keys that were mapped to the segment between the removed node and its immediate predecessor (counter‑clockwise) need to be remapped to the next clockwise node. All other segments remain untouched, dramatically reducing the amount of data movement.

5. Simulate node addition

Adding a new node inserts a new point on the ring. Only the keys that fall between the new node and its immediate predecessor need to be moved to the new node, leaving the rest of the system stable.

6. Model analysis

Compared with plain modulo hashing, consistent hashing limits the impact of a node change to a single segment, improving fault tolerance and scalability. The algorithm therefore offers a clear advantage for distributed storage, caching, and load‑balancing scenarios.

Code implementation

The following Python script implements the described logic, including node addition, removal, and key allocation. It prints the mapping of each test key before and after topology changes, demonstrating the minimal redistribution effect.

#!/usr/bin/env python
# -*- coding:utf8 -*-
# author: lzj
import hashlib
import sys
reload(sys)
sys.setdefaultencoding('utf-8')

class ConsistentHash(object):
    def __init__(self, nodes=None):
        '''
        a. Initialize the node list.
        b. ring maps hash values to nodes.
        c. sortedKeys keeps the hash values in ascending order, representing the ring.
        d. Add the initial nodes to the ring.
        '''
        self.nodes = nodes
        self.ring = {}
        self.sortedKeys = []
        self.addNodes(nodes)

    # nodes is a list of node identifiers
    def addNodes(self, nodes):
        if nodes:
            for node in nodes:
                nodeHashResult = hashlib.sha1(node).hexdigest()
                intNodeHashResult = int(nodeHashResult, 16)
                modIntNodeHashResult = intNodeHashResult % (2 ** 32)
                self.ring[modIntNodeHashResult] = node
                self.sortedKeys.append(modIntNodeHashResult)
                self.sortedKeys.sort()

    def removeNodes(self, nodes):
        '''
        Remove nodes from the ring and clean up their hash entries.
        '''
        if nodes:
            for node in nodes:
                nodeHashResult = hashlib.sha1(node).hexdigest()
                intNodeHashResult = int(nodeHashResult, 16)
                modIntNodeHashResult = intNodeHashResult % (2 ** 32)
                self.ring.pop(modIntNodeHashResult)
                self.sortedKeys.remove(modIntNodeHashResult)

    def getNode(self, modKeyHashResult):
        '''
        Find the first node whose hash is greater than the key's hash.
        If none is found, wrap around to the first node.
        '''
        position = 0
        for _modIntNodeHashResult in self.sortedKeys:
            position += 1
            if modKeyHashResult < _modIntNodeHashResult:
                return self.ring[_modIntNodeHashResult]
            else:
                continue
        if position == len(self.sortedKeys):
            return self.ring[self.sortedKeys[0]]

    def allocateKey(self, number):
        keyNodeMap = []
        # Simulate a number of keys
        for i in range(number):
            keyName = 'testKey' + str(i)
            keyHashResult = hashlib.sha1(keyName).hexdigest()
            intKeyHashResult = int(keyHashResult, 16)
            modKeyHashResult = intKeyHashResult % (2 ** 32)
            _node = self.getNode(modKeyHashResult)
            print('%s is allocateKey to %s' % (keyName, _node))
            keyNodeMap.append(keyName + '_' + _node)
        return keyNodeMap

# Simulate a cluster with four nodes
iniServers = ['192.168.1.1', '192.168.1.2', '192.168.1.3', '192.168.1.4']
print('Initialize hash ring and node topology')
h = ConsistentHash(iniServers)
print(h.ring)
print(h.sortedKeys)

print('Allocate 40 keys to the ring')
oldMap = h.allocateKey(40)

print('Add a new node and re‑allocate')
newNode = ['192.168.1.5']
h.addNodes(newNode)
print(h.ring)
print(h.sortedKeys)
addNodeMap = h.allocateKey(40)

print('Remove a node and re‑allocate')
removeNode = ['192.168.1.1']
h.removeNodes(removeNode)
print(h.ring)
print(h.sortedKeys)
removeNodeMap = h.allocateKey(40)
Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Sign in to view source
Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactadmin@besthub.devand we will review it promptly.

load balancingconsistent hashinghash ringNode Scaling
ITPUB
Written by

ITPUB

Official ITPUB account sharing technical insights, community news, and exciting events.

0 followers
Reader feedback

How this landed with the community

Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.