Consistent Hashing and Jump Consistent Hash: Concepts, Implementation, and Use Cases
This article explains the fundamentals of consistent hashing, compares the classic ring‑based method with the more efficient jump consistent hash algorithm, provides reference implementations in C++, discusses their time‑complexity and practical trade‑offs, and shows how they are applied in systems such as Greenplum.
Consistent hashing is a crucial algorithm in distributed systems for smooth scaling and dynamic load balancing, minimizing the number of keys that need to be remapped when the number of buckets changes. The article first reviews the classic ring‑based consistent hashing scheme and then introduces its variant, jump consistent hash.
Ring‑Based Consistent Hashing
Originally proposed in 1997 by David Karger et al., the ring method maps the entire hash space onto a circular ring (typically [0, 2^32‑1]). Nodes (or physical servers) are placed on the ring by hashing their identifiers, and keys are stored on the first node encountered clockwise. Virtual (or shadow) nodes are used to improve balance, and the number of virtual nodes per physical node can be adjusted to control weight.
Complexity for a system with N buckets and K keys is:
Add/remove node – O(K/N + log N)
Add/remove key – O(log N)
To maintain availability during node addition/removal or failure, two simple strategies are suggested: forwarding the request to the next clockwise node (relay) or writing a duplicate copy to the next node (dual‑write).
Jump Consistent Hash
Introduced by Google in 2014 (John Lamping and Eric Veach), jump consistent hash achieves the same balance and monotonicity properties with a much simpler implementation—only five lines of code:
int32_t JumpConsistentHash(uint64_t key, int32_t num_buckets) {
int64_t b = 1, j = 0;
while (j < num_buckets) {
b = j;
key = key * 2862933555777941757ULL + 1;
j = (b + 1) * (double(1LL << 31) / double((key >> 33) + 1));
}
return b;
}The algorithm uses a linear‑congruential generator (LCG) to produce a pseudo‑random sequence that determines when a key jumps to a new bucket as the bucket count increases. By “jumping” over many intermediate bucket counts, the expected number of iterations becomes O(log n), which is asymptotically better than the O(n) of the naive approach.
Experimental results show that jump consistent hash yields a far smaller standard deviation in bucket load compared with the ring method, while requiring virtually no additional memory.
A notable limitation is that buckets can only be added or removed at the end of the bucket sequence; inserting or deleting in the middle would break the monotonicity property.
Availability during scaling can still be handled by relay (forwarding to the previous bucket) or dual‑write (writing to both the target bucket and its predecessor).
Application in Greenplum
Greenplum’s gpexpand tool originally performed a full reshuffle by converting hash distribution to random distribution, causing downtime and high data movement. Starting with GP v6, jump consistent hash was integrated, enabling online, high‑performance cluster expansion where only 1/4 of the data needs to be redistributed when expanding from three to four nodes.
GP v6 also adds a numsegments field to the catalog table gp_distribution_policy to indicate how many segments a table currently uses, ensuring queries continue to run on the original nodes until redistribution is complete.
The system can be switched back to the legacy modulo‑based hash via the gp_use_legacy_hashops configuration flag, which defaults to false.
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
Big Data Technology & Architecture
Wang Zhiwu, a big data expert, dedicated to sharing big data technology.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
