How Consistent Hashing Powers Distributed Memcached Caching

This article explains how memcached’s client‑side distribution works, illustrates the role of consistent hashing in assigning keys to servers, compares traditional modulo hashing with consistent hashing, discusses hash function choices, and provides complete Java and Python implementations for a scalable distributed cache.

MaGe Linux Operations
MaGe Linux Operations
MaGe Linux Operations
How Consistent Hashing Powers Distributed Memcached Caching

Background Introduction

Memcached is often called a “distributed” cache server, but the distribution logic resides entirely in the client library. The server only provides simple in‑memory storage; the client decides which server stores a given key based on the key’s hash and the known server list.

For example, with three servers (node1‑node3) and keys “tokyo”, “kanagawa”, “chiba”, “saitama”, “gunma”, the client hashes each key and routes the data to the appropriate server.

When adding a key, the client selects a server, stores the key/value pair, and later retrieves it by hashing the same key to locate the original server.

Basic Principles

Consistent Hashing maps each memcached node to a point on a 0‑2^32 ring, then maps each key to the ring and stores the key on the first node encountered clockwise. If the ring wraps past the maximum, the key is stored on the first node.

Adding a new server only moves keys that fall between the new server’s position and the next server counter‑clockwise, minimizing cache miss.

Virtual nodes are often used: each physical server is represented by many points on the ring (e.g., 100‑200) to achieve a more uniform distribution.

The hit‑rate after adding m servers to an existing n -server cluster can be approximated by (1 - n/(n+m)) * 100 %.

Hash Function Selection

Although the official memcached release does not include Consistent Hashing, third‑party libraries such as libketama (originating from last.fm) provide implementations. The Java version typically uses an MD5‑based hash.

/**
 * Calculates the ketama hash value for a string
 * @param s
 * @return
 */
public static Long md5HashingAlg(String key) {
    if(md5==null) {
        try {
            md5 = MessageDigest.getInstance("MD5");
        } catch (NoSuchAlgorithmException e) {
            log.error("++++ no md5 algorithm found");
            throw new IllegalStateException("++++ no md5 algorithm found");
        }
    }
    md5.reset();
    md5.update(key.getBytes());
    byte[] bKey = md5.digest();
    long res = ((long)(bKey[3]&0xFF) << 24) |
               ((long)(bKey[2]&0xFF) << 16) |
               ((long)(bKey[1]&0xFF) << 8) |
               (long)(bKey[0]&0xFF);
    return res;
}

Implementation in Java

The Java implementation focuses on two aspects: handling virtual nodes and returning the first matching node during lookup.

import java.util.Collection;
import java.util.SortedMap;
import java.util.TreeMap;

public class ConsistentHash<T> {
    private final HashFunction hashFunction;
    private final int numberOfReplicas;
    private final SortedMap<Integer, T> circle = new TreeMap<Integer, T>();

    public ConsistentHash(HashFunction hashFunction,
                          int numberOfReplicas, Collection<T> nodes) {
        this.hashFunction = hashFunction;
        this.numberOfReplicas = numberOfReplicas;
        for (T node : nodes) {
            add(node);
        }
    }

    public void add(T node) {
        for (int i = 0; i < numberOfReplicas; i++) {
            circle.put(hashFunction.hash(node.toString() + ":" + i), node);
        }
    }

    public void remove(T node) {
        for (int i = 0; i < numberOfReplicas; i++) {
            circle.remove(hashFunction.hash(node.toString() + ":" + i));
        }
    }

    public T get(Object key) {
        if (circle.isEmpty()) {
            return null;
        }
        int hash = hashFunction.hash(key);
        SortedMap<Integer, T> tailMap = circle.tailMap(hash);
        hash = tailMap.isEmpty() ? circle.firstKey() : tailMap.firstKey();
        return circle.get(hash);
    }
}

Implementation in Python

The Python version provides a similar ring with virtual nodes and lookup methods.

import md5

class HashRing(object):
    def __init__(self, nodes=None, replicas=3):
        """Manages a hash ring."""
        self.replicas = replicas
        self.ring = dict()
        self._sorted_keys = []
        if nodes:
            for node in nodes:
                self.add_node(node)

    def add_node(self, node):
        """Adds a node to the hash ring (including replicas)."""
        for i in xrange(0, self.replicas):
            key = self.gen_key('%s:%s' % (node, i))
            self.ring[key] = node
            self._sorted_keys.append(key)
        self._sorted_keys.sort()

    def remove_node(self, node):
        """Removes a node from the hash ring and its replicas."""
        for i in xrange(0, self.replicas):
            key = self.gen_key('%s:%s' % (node, i))
            del self.ring[key]
            self._sorted_keys.remove(key)

    def get_node(self, string_key):
        """Returns the node responsible for the given key."""
        return self.get_node_pos(string_key)[0]

    def get_node_pos(self, string_key):
        """Returns the node and its position on the ring."""
        if not self.ring:
            return None, None
        key = self.gen_key(string_key)
        nodes = self._sorted_keys
        for i in xrange(0, len(nodes)):
            node = nodes[i]
            if key <= node:
                return self.ring[node], i
        return self.ring[nodes[0]], 0

    def get_nodes(self, string_key):
        """Generator yielding nodes that can hold the key."""
        if not self.ring:
            yield None, None
        node, pos = self.get_node_pos(string_key)
        for key in self._sorted_keys[pos:]:
            yield self.ring[key]
        while True:
            for key in self._sorted_keys:
                yield self.ring[key]

    def gen_key(self, key):
        """Generates a long hash value using MD5."""
        m = md5.new()
        m.update(key)
        return long(m.hexdigest(), 16)
Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Sign in to view source
Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactadmin@besthub.devand we will review it promptly.

Backendconsistent hashingdistributed cachingMemcached
MaGe Linux Operations
Written by

MaGe Linux Operations

Founded in 2009, MaGe Education is a top Chinese high‑end IT training brand. Its graduates earn 12K+ RMB salaries, and the school has trained tens of thousands of students. It offers high‑pay courses in Linux cloud operations, Python full‑stack, automation, data analysis, AI, and Go high‑concurrency architecture. Thanks to quality courses and a solid reputation, it has talent partnerships with numerous internet firms.

0 followers
Reader feedback

How this landed with the community

Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.