How Consistent Hashing Minimizes Data Movement in Scalable Systems
This article explains the principle of consistent hashing, shows how to build a 2^32‑range hash ring, map nodes and keys onto it, and demonstrates node addition and removal with a Python implementation that highlights reduced data reshuffling compared to simple modulo hashing.
1. Build an empty hash ring
Consistent hashing maps values onto a virtual circle ranging from 0 to 2^32‑1. The circle is created by taking the modulo of a 32‑bit unsigned integer, which aligns with Java's int range and provides a large address space for uniform distribution.
2. Construct node topology on the ring
Each physical node (identified by IP or hostname) is hashed with SHA‑1, converted to an integer, and reduced modulo 2^32. The resulting hash value becomes the node’s position on the ring, and a dictionary ring stores the mapping from hash to node while a sorted list sortedKeys maintains the clockwise order.
3. Map keys to the hash ring
Keys are processed the same way: hash → integer → modulo 2^32. The key’s hash is then located on the ring; the first node encountered when moving clockwise is the responsible node. This ensures each key is assigned to exactly one node.
4. Simulate node removal
When a node goes offline, only the keys that were mapped to the segment between the removed node and its immediate predecessor (counter‑clockwise) need to be remapped to the next clockwise node. All other segments remain untouched, dramatically reducing the amount of data movement.
5. Simulate node addition
Adding a new node inserts a new point on the ring. Only the keys that fall between the new node and its immediate predecessor need to be moved to the new node, leaving the rest of the system stable.
6. Model analysis
Compared with plain modulo hashing, consistent hashing limits the impact of a node change to a single segment, improving fault tolerance and scalability. The algorithm therefore offers a clear advantage for distributed storage, caching, and load‑balancing scenarios.
Code implementation
The following Python script implements the described logic, including node addition, removal, and key allocation. It prints the mapping of each test key before and after topology changes, demonstrating the minimal redistribution effect.
#!/usr/bin/env python
# -*- coding:utf8 -*-
# author: lzj
import hashlib
import sys
reload(sys)
sys.setdefaultencoding('utf-8')
class ConsistentHash(object):
def __init__(self, nodes=None):
'''
a. Initialize the node list.
b. ring maps hash values to nodes.
c. sortedKeys keeps the hash values in ascending order, representing the ring.
d. Add the initial nodes to the ring.
'''
self.nodes = nodes
self.ring = {}
self.sortedKeys = []
self.addNodes(nodes)
# nodes is a list of node identifiers
def addNodes(self, nodes):
if nodes:
for node in nodes:
nodeHashResult = hashlib.sha1(node).hexdigest()
intNodeHashResult = int(nodeHashResult, 16)
modIntNodeHashResult = intNodeHashResult % (2 ** 32)
self.ring[modIntNodeHashResult] = node
self.sortedKeys.append(modIntNodeHashResult)
self.sortedKeys.sort()
def removeNodes(self, nodes):
'''
Remove nodes from the ring and clean up their hash entries.
'''
if nodes:
for node in nodes:
nodeHashResult = hashlib.sha1(node).hexdigest()
intNodeHashResult = int(nodeHashResult, 16)
modIntNodeHashResult = intNodeHashResult % (2 ** 32)
self.ring.pop(modIntNodeHashResult)
self.sortedKeys.remove(modIntNodeHashResult)
def getNode(self, modKeyHashResult):
'''
Find the first node whose hash is greater than the key's hash.
If none is found, wrap around to the first node.
'''
position = 0
for _modIntNodeHashResult in self.sortedKeys:
position += 1
if modKeyHashResult < _modIntNodeHashResult:
return self.ring[_modIntNodeHashResult]
else:
continue
if position == len(self.sortedKeys):
return self.ring[self.sortedKeys[0]]
def allocateKey(self, number):
keyNodeMap = []
# Simulate a number of keys
for i in range(number):
keyName = 'testKey' + str(i)
keyHashResult = hashlib.sha1(keyName).hexdigest()
intKeyHashResult = int(keyHashResult, 16)
modKeyHashResult = intKeyHashResult % (2 ** 32)
_node = self.getNode(modKeyHashResult)
print('%s is allocateKey to %s' % (keyName, _node))
keyNodeMap.append(keyName + '_' + _node)
return keyNodeMap
# Simulate a cluster with four nodes
iniServers = ['192.168.1.1', '192.168.1.2', '192.168.1.3', '192.168.1.4']
print('Initialize hash ring and node topology')
h = ConsistentHash(iniServers)
print(h.ring)
print(h.sortedKeys)
print('Allocate 40 keys to the ring')
oldMap = h.allocateKey(40)
print('Add a new node and re‑allocate')
newNode = ['192.168.1.5']
h.addNodes(newNode)
print(h.ring)
print(h.sortedKeys)
addNodeMap = h.allocateKey(40)
print('Remove a node and re‑allocate')
removeNode = ['192.168.1.1']
h.removeNodes(removeNode)
print(h.ring)
print(h.sortedKeys)
removeNodeMap = h.allocateKey(40)Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
ITPUB
Official ITPUB account sharing technical insights, community news, and exciting events.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
