Backend Development 14 min read

Data Distribution Algorithms: Modulo and Consistent Hash Implementations in Java

The article introduces a Java HashNodeService interface and demonstrates two data distribution algorithms—simple modulo hashing, which evenly spreads keys but remaps them heavily when nodes change, and consistent hashing with optional virtual nodes, which maintains balanced placement and high cache‑hit rates despite topology adjustments.

vivo Internet Technology
vivo Internet Technology
vivo Internet Technology
Data Distribution Algorithms: Modulo and Consistent Hash Implementations in Java

In distributed environments, data placement is often determined by a rule that generates a key and then applies an algorithm to decide the target node. This article introduces two common approaches: the modulo algorithm and consistent hashing.

Interface Definition

/**
 * 数据分布hash算法接口定义
 * @author xingchuan.qxc
 */
public interface HashNodeService {
    /**
     * 集群增加一个数据存储节点
     * @param node
     */
    public void addNode(Node node);
    /**
     * 数据存储时查找具体使用哪个节点来存储
     * @param key
     * @return
     */
    public Node lookupNode(String key);
    /**
     * hash的算法
     * @param key
     * @return
     */
    public Long hash(String key);
    /**
     * 模拟意外情况断掉一个节点,用于测试缓存命中率
     * @param node
     */
    public void removeNodeUnexpected(Node node);
}

1. Modulo Algorithm

The modulo algorithm hashes a user key (using CRC32) and then takes the remainder modulo the number of storage nodes to obtain an index.

/**
 * 取模数据分布算法实现
 * @author xingchuan.qxc
 */
public class NormalHashNodeServiceImpl implements HashNodeService {
    /** 存储节点列表 */
    private List
nodes = new ArrayList<>();
    @Override
    public void addNode(Node node) {
        this.nodes.add(node);
    }
    @Override
    public Node lookupNode(String key) {
        long k = hash(key);
        int index = (int) (k % nodes.size());
        return nodes.get(index);
    }
    @Override
    public Long hash(String key) {
        CRC32 crc32 = new CRC32();
        crc32.update(key.getBytes());
        return crc32.getValue();
    }
    @Override
    public void removeNodeUnexpected(Node node) {
        nodes.remove(node);
    }
}

Test code populates eight nodes, distributes 100,000 keys, and prints the distribution map.

HashNodeService nodeService = new NormalHashNodeServiceImpl();
Node addNode1 = new Node("xingchuan.node1", "192.168.0.11");
Node addNode2 = new Node("xingchuan.node2", "192.168.0.12");
Node addNode3 = new Node("xingchuan.node3", "192.168.0.13");
Node addNode4 = new Node("xingchuan.node4", "192.168.0.14");
Node addNode5 = new Node("xingchuan.node5", "192.168.0.15");
Node addNode6 = new Node("xingchuan.node6", "192.168.0.16");
Node addNode7 = new Node("xingchuan.node7", "192.168.0.17");
Node addNode8 = new Node("xingchuan.node8", "192.168.0.18");
nodeService.addNode(addNode1);
nodeService.addNode(addNode2);
nodeService.addNode(addNode3);
nodeService.addNode(addNode4);
nodeService.addNode(addNode5);
nodeService.addNode(addNode6);
nodeService.addNode(addNode7);
nodeService.addNode(addNode8);

//用于检查数据分布情况
Map
countmap = new HashMap<>();
Node node = null;
for (int i = 1; i <= 100000; i++) {
    String key = String.valueOf(i);
    node = nodeService.lookupNode(key);
    node.cacheString(key, "TEST_VALUE");
    String k = node.getIp();
    Integer count = countmap.get(k);
    if (count == null) {
        count = 1;
        countmap.put(k, count);
    } else {
        count++;
        countmap.put(k, count);
    }
}
System.out.println("初始化数据分布情况:" + countmap);

The output shows an almost even distribution across the eight nodes, confirming the algorithm’s load‑balancing property.

Drawbacks of Modulo

When the number of nodes changes, the modulo operation causes massive remapping of keys, leading to severe cache invalidation and data migration.

2. Consistent Hashing

Consistent hashing maps both nodes and keys onto a logical ring (e.g., 0‑4294967296). A key is stored on the first node encountered clockwise from its hash value, reducing the amount of data that needs to move when nodes are added or removed.

Basic implementation (without virtual nodes):

@Override
public void addNode(Node node) {
    nodeList.add(node);
    long crcKey = hash(node.getIp());
    nodeMap.put(crcKey, node);
}

@Override
public Node lookupNode(String key) {
    long crcKey = hash(key);
    Node node = findValidNode(crcKey);
    if(node == null){
        return findValidNode(0);
    }
    return node;
}

private Node findValidNode(long crcKey) {
    //顺时针找到最近的一个节点
    Map.Entry
entry = nodeMap.ceilingEntry(crcKey);
    if(entry != null){
        return entry.getValue();
    }
    return null;
}

@Override
public Long hash(String key) {
    CRC32 crc = new CRC32();
    crc.update(key.getBytes());
    return crc.getValue();
}

Test code is analogous to the modulo test but uses ConsistentHashNodeServiceImpl . The results demonstrate a more stable cache hit rate after adding a node (≈0.97) compared with the modulo case (≈0.11).

HashNodeService nodeService = new ConsistentHashNodeServiceImpl();
// add the same eight nodes as before …
// run the same distribution and hit‑rate checks as in the modulo test

To further improve balance, virtual nodes are introduced. Each real node is represented by 128 virtual nodes on the ring, and removal of a real node also removes its virtual replicas. The article includes diagrams (omitted here) illustrating the ring and virtual node placement.

With virtual nodes, the distribution becomes much more uniform and the cache hit rate after adding a node remains high (≈0.91).

Summary

The article explains how modulo hashing provides simple load balancing but suffers from massive data reshuffling on topology changes. Consistent hashing mitigates this issue, and the addition of virtual nodes further smooths data distribution, making it a preferred strategy for distributed cache and storage systems.

Distributed SystemsjavaLoad Balancingvirtual nodesConsistent Hashmodulo hashing
vivo Internet Technology
Written by

vivo Internet Technology

Sharing practical vivo Internet technology insights and salon events, plus the latest industry news and hot conferences.

0 followers
Reader feedback

How this landed with the community

login Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.