Testing Consistent Hashing: Real‑World Node Scaling, Load Balance & Avalanche Risks
This article experimentally validates consistent hashing by adding and removing nodes in a simulated cluster, examines key redistribution, reveals load‑imbalance and avalanche vulnerabilities, and proposes solutions such as virtual nodes and adaptive replica factors, providing Python code examples and detailed observations.
Node Expansion Test
A four‑node cluster (192.168.1.1‑192.168.1.4) is expanded with a fifth node (192.168.1.5). The script prints the number of keys that must be migrated and the specific key‑node pairs.
print '上线一个节点,需要迁移节点个数:{}'.format(len(list(set(addNodeMap).difference(set(oldMap)))))
print '需要迁移的key-node信息:',list(set(addNodeMap).difference(set(oldMap)))Result: three keys are moved to the new node 192.168.1.5.
上线一个节点,需要迁移节点个数:3
需要迁移的key-node信息: ['testKey15_192.168.1.5', 'testKey36_192.168.1.5', 'testKey23_192.168.1.5']Node Shrink Test
The cluster now has five nodes. Removing node 192.168.1.5 triggers a similar migration.
print '需要迁移节点个数:{}' .format(len(list(set(removeNodeMap).difference(set(addNodeMap)))) )
print '等待迁移的key信息:{}' .format(list(set(removeNodeMap).difference(set(addNodeMap))))Result: three keys are migrated back to node 192.168.1.2.
下线节点['192.168.1.5'],需要迁移节点个数:3
等待迁移的key信息:['testKey23_192.168.1.2', 'testKey36_192.168.1.2', 'testKey15_192.168.1.2']Conclusion from Basic Tests
The experiments confirm that adding or removing a single node only affects the keys whose hash falls in the interval between the departing/arriving node and its immediate neighbor on the hash ring. This matches the theoretical expectation of consistent hashing.
Discovering Issues
Issue 1 – Uneven Key Distribution
When different nodes are removed, the number of affected keys varies dramatically (e.g., 14 keys when removing 192.168.1.2 versus 3 keys when removing 192.168.1.5). This leads to load imbalance: some nodes become hot while others remain idle.
Root cause: too few physical nodes (few shards) for the given key set.
Issue 2 – Avalanche Risk
When a node goes down, all its keys are reassigned to the next node clockwise on the ring. If that node is already heavily loaded, the sudden influx can cause it to fail, triggering a cascade of failures across the cluster.
Proposed Solutions
Solution 1 – Physical Expansion
Adding more physical machines improves balance but may be cost‑inefficient for small key volumes.
Solution 2 – Virtual Nodes
Introduce multiple virtual nodes per physical machine. For example, with a replica factor of 2, each physical node appears twice on the ring (e.g., A‑01, A‑02). This increases the number of points on the ring, making key distribution more uniform without adding hardware.
class ConsistentHash(object):
def __init__(self, nodes=None, replicaFactor=10):
self.nodes = nodes
self.replicaFactor = replicaFactor
self.ring = {}
self.sortedKeys = []
self.addNodes(nodes)
def addNodes(self, nodes):
if nodes:
for node in nodes:
for number in range(self.replicaFactor):
replicaNode = "%s%s" % (node, number)
replicaNodeHashResult = hashlib.sha1(replicaNode).hexdigest()
intReplicaNodeHashResult = int(replicaNodeHashResult, 16)
modIntReplicaNodeHashResult = intReplicaNodeHashResult % (2 ** 32)
self.ring[modIntReplicaNodeHashResult] = node
self.sortedKeys.append(modIntReplicaNodeHashResult)
self.sortedKeys.sort()
def removeNodes(self, nodes):
if nodes:
for node in nodes:
for number in range(self.replicaFactor):
replicaNode = "%s%s" % (node, number)
replicaNodeHashResult = hashlib.sha1(replicaNode).hexdigest()
intReplicaNodeHashResult = int(replicaNodeHashResult, 16)
modIntReplicaNodeHashResult = intReplicaNodeHashResult % (2 ** 32)
self.ring.pop(modIntReplicaNodeHashResult)
self.sortedKeys.remove(modIntReplicaNodeHashResult)
def getNode(self, modKeyHashResult):
for _modIntNodeHashResult in self.sortedKeys:
if modKeyHashResult < _modIntNodeHashResult:
return self.ring[_modIntNodeHashResult]
return self.ring[self.sortedKeys[0]]
def allocateKey(self, number):
keyNodeMap = []
for i in range(number):
keyName = 'testKey' + str(i)
keyHashResult = hashlib.sha1(keyName).hexdigest()
intKeyHashResult = int(keyHashResult, 16)
modKeyHashResult = intKeyHashResult % (2 ** 32)
_node = self.getNode(modKeyHashResult)
keyNodeMap.append(keyName + '_' + _node)
return keyNodeMapRe‑evaluation with Virtual Nodes
Running the same removal tests after adding virtual nodes shows that keys are now redistributed across multiple physical nodes, mitigating both load imbalance and avalanche scenarios.
下线节点['192.168.1.1'],需要迁移节点个数:15
等待迁移的key信息:['testKey27_192.168.1.5', 'testKey15_192.168.1.5', ...]
下线节点['192.168.1.2'],需要迁移节点个数:6
等待迁移的key信息:['testKey2_192.168.1.1', 'testKey10_192.168.1.1', ...]Further Discussion
Even with virtual nodes, perfect uniformity is unattainable; the goal is relative balance. When key request rates differ, additional strategies such as per‑node replicaFactor tuning or dynamic key‑migration based on load metrics may be required. The optimal number of virtual nodes depends on hardware, workload, and acceptable computation overhead, and should be determined experimentally.
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
ITPUB
Official ITPUB account sharing technical insights, community news, and exciting events.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
