Fundamentals 9 min read

Consistent Hashing Explained Through the Tale of Han Xin’s Soldier Allocation

The article uses a historical story to illustrate basic hash‑based partitioning, its shortcomings when nodes change, and then introduces consistent hashing, hash rings, and virtual nodes as solutions for balanced load distribution and minimal data migration in distributed systems.

Wukong Talks Architecture
Wukong Talks Architecture
Wukong Talks Architecture
Consistent Hashing Explained Through the Tale of Han Xin’s Soldier Allocation

Using a fictional dialogue between Liu Bang, Han Xin, and Xiao He, the article introduces the problem of evenly distributing a large number of soldiers (data items) into groups (nodes) and explains how a simple hash algorithm works by taking the modulo of a soldier's numeric ID with the number of groups.

Han Xin's first skill: Hash algorithm Each soldier's six‑digit number is treated as a hash value; the remainder after dividing by the group count determines the group. This method is simple but causes massive data reshuffling when the number of groups changes.

One Skill: Hash Algorithm

Grouping

The hash algorithm assigns a soldier to a group by num % N , where N is the number of groups. For example, soldier 123456 belongs to group 0 because 123456 % 3 = 0.

Finding a Soldier

To locate soldier 666666, compute 666666 % 3 = 0, indicating the first group, then search within that group.

Drawbacks of Simple Hashing

When the group count changes (e.g., from 3 to 4), many soldiers are reassigned to different groups, breaking existing relationships and causing large data migration.

Second Skill: Consistent Hashing

Hash Ring

Consistent hashing also uses modulo operations but maps the entire 2^32 hash space onto a virtual ring. Nodes are placed on the ring, and each data item is assigned to the first node encountered clockwise.

Hash algorithm : modulo the number of nodes.

Consistent hash algorithm : modulo 2^32, creating a ring.

Soldier Allocation on the Ring

With three nodes (A, B, C) the ring is divided into three intervals (C‑A, A‑B, B‑C). A soldier whose hash falls into C‑A is stored on node A, and so on.

Adding a New Group

When a fourth node D is added, only the interval that D now covers (e.g., B‑D) needs to migrate its data from the previous node, while the rest of the data remains untouched, dramatically reducing migration.

Hash Ring Drawbacks

If nodes are few, the intervals on the ring can be uneven, leading to hot‑spot and cold‑spot problems in real‑world services.

Third Skill: Virtual Nodes

Virtual nodes are multiple logical points for each physical node, evenly spread around the ring. For example, 12 virtual nodes N1‑N12 are mapped to physical nodes A, B, C, D, ensuring a more uniform distribution and balanced load.

With virtual nodes, the previously small B‑D interval is split among several virtual nodes, making the data distribution much more even.

Summary

Simple hash algorithms cause large data migration when nodes are added or removed.

Consistent hashing reduces migration by assigning data based on a ring.

Few nodes lead to uneven interval sizes and load imbalance.

Virtual node mapping solves the uneven distribution problem.

More nodes mean less data to move; consistent hashing is ideal for load‑balancing scenarios.

backendDistributed Systemsload balancingHashingconsistent hashingvirtual nodes
Wukong Talks Architecture
Written by

Wukong Talks Architecture

Explaining distributed systems and architecture through stories. Author of the "JVM Performance Tuning in Practice" column, open-source author of "Spring Cloud in Practice PassJava", and independently developed a PMP practice quiz mini-program.

0 followers
Reader feedback

How this landed with the community

login Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.