Understanding Consistent Hashing: Principles, Design, and Real-World Applications
This article explains the fundamentals of hash functions, outlines the key characteristics of a good hash algorithm, and dives deep into consistent hashing—its background, mechanism, desirable properties, fault tolerance, scalability, and the use of virtual nodes to solve data skew in distributed systems.
Hash Algorithm Overview
A hash function transforms input of arbitrary length into a fixed-length output called a hash value; it is a compression mapping. Hashing improves storage utilization and query efficiency and can serve as a building block for digital signatures, which makes it ubiquitous in internet applications.
What Is a Hash Algorithm?
A hash algorithm is a concept rather than a single fixed implementation; any algorithm that follows the hashing idea, converting input data into a compact marker that depends on every byte of the input, qualifies. A simple modulo operation is a basic example of a hash.
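As a minimal sketch of that idea (function name and the multiplier 31 are illustrative choices, not from the original article), the following folds every byte of a key into an integer and takes it modulo the bucket count:

```python
def simple_hash(key: str, buckets: int) -> int:
    """Fold every byte of the key into an integer, then reduce it modulo
    the bucket count, so the result depends on each byte of the input."""
    total = 0
    for byte in key.encode("utf-8"):
        # 31 is a commonly used multiplier (e.g. in Java's String.hashCode)
        total = (total * 31 + byte) % (2**32)
    return total % buckets

print(simple_hash("user:42", 8))  # a slot in the range 0..7
```

Because the result is taken modulo the bucket count, changing the number of buckets remaps almost every key, which is exactly the problem consistent hashing addresses below.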
Key Characteristics of a Good Hash Algorithm
Fast forward computation: the hash value can be computed quickly with limited resources.
Hard reverse computation: it is infeasible to recover the original input from its hash value.
Collision resistance: it is hard to find two different inputs that produce the same hash value.
Consistent Hashing Background
Consistent hashing was proposed in 1997 by Karger et al. at MIT to solve hotspot problems in distributed caches, with goals similar to those of CARP. It corrects the shortcomings of simple modulo-based hashing so that distributed hash tables (DHTs) can work well in P2P environments.
Consistent Hashing Mechanism
Hash each server node (for example, by its IP address or hostname) and place the resulting value on a ring of positions 0 to 2^32 − 1.
Hash each data key and map it onto the same ring.
Traverse the ring clockwise from the data point; the first server encountered stores the data.
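The three steps above can be sketched as a small ring class. This is an illustrative implementation, not the authors' reference code; MD5 is just one stable hash choice, and all names are made up for the example:

```python
import bisect
import hashlib

def ring_hash(value: str) -> int:
    """Map a string onto the 0 .. 2^32 - 1 ring (MD5 is one stable choice)."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16) % (2**32)

class ConsistentHashRing:
    def __init__(self, nodes=()):
        self._ring = []    # sorted hash positions of the nodes
        self._nodes = {}   # position -> node name
        for node in nodes:
            self.add_node(node)

    def add_node(self, node: str) -> None:
        pos = ring_hash(node)
        bisect.insort(self._ring, pos)
        self._nodes[pos] = node

    def remove_node(self, node: str) -> None:
        pos = ring_hash(node)
        self._ring.remove(pos)
        del self._nodes[pos]

    def get_node(self, key: str) -> str:
        """Walk clockwise from the key's position; the first node wins.
        Past the top of the ring, wrap around to the smallest position."""
        pos = ring_hash(key)
        idx = bisect.bisect_right(self._ring, pos) % len(self._ring)
        return self._nodes[self._ring[idx]]

ring = ConsistentHashRing(["cache-a", "cache-b", "cache-c"])
print(ring.get_node("user:1001"))  # one of the three nodes
```

The `% len(self._ring)` in `get_node` implements the wrap-around: a key hashed past the largest node position lands on the node with the smallest position.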
Desired Properties of Consistent Hashing
Balance: hash results are distributed as evenly as possible across the buckets, so every bucket is used.
Monotonicity: when a new bucket joins, keys already assigned to existing buckets either stay where they are or move only to the new bucket, never between old buckets.
Spread: in a distributed setting where clients may see different sets of buckets, the same key should map to as few different buckets as possible.
Load: across clients with different views, any single bucket should be responsible for as few distinct keys as possible, avoiding overload.
Smoothness: a small change in the number of servers causes a correspondingly small change in object placement.
Fault Tolerance and Scalability
When a node fails, only the data that was mapped between its predecessor and the failed node must be relocated, and it moves to the next node clockwise; all other data stays where it is. Likewise, when a node is added, only the keys that fall between the new node and its predecessor migrate to it, so the system scales with minimal disruption.
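This property is easy to check empirically. The sketch below (all node and key names are invented for the example; MD5 is an arbitrary stable hash choice) removes one node and counts how many keys change owner; only the keys that previously belonged to the removed node should move:

```python
import bisect
import hashlib

def h(s: str) -> int:
    """Map a string onto the 0 .. 2^32 - 1 ring."""
    return int(hashlib.md5(s.encode("utf-8")).hexdigest(), 16) % (2**32)

def owner(key: str, nodes) -> str:
    """Clockwise ring lookup for a given set of nodes."""
    ring = sorted((h(n), n) for n in nodes)
    positions = [p for p, _ in ring]
    idx = bisect.bisect_right(positions, h(key)) % len(ring)
    return ring[idx][1]

nodes = ["node-A", "node-B", "node-C", "node-D"]
keys = [f"key-{i}" for i in range(5000)]
before = {k: owner(k, nodes) for k in keys}

# Simulate node-B failing: recompute ownership without it.
survivors = [n for n in nodes if n != "node-B"]
after = {k: owner(k, survivors) for k in keys}

moved = sum(before[k] != after[k] for k in keys)
print(f"{moved} of {len(keys)} keys moved")
```

With a modulo-based scheme, by contrast, dropping from four buckets to three would remap roughly three quarters of the keys.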
Data Skew Problem and Virtual Nodes
Real‑world hash functions can produce clustered values, leading to uneven data distribution. Introducing virtual nodes—multiple hash positions per physical server—balances the load. Typically dozens of virtual nodes per server achieve near‑uniform distribution.
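One common (illustrative) way to create virtual nodes is to hash each server name with a numeric suffix, e.g. `"A#0"`, `"A#1"`, and so on; the suffix convention and all names below are assumptions for the sketch, not prescribed by the article:

```python
import bisect
import hashlib
from collections import Counter

def h(s: str) -> int:
    """Map a string onto the 0 .. 2^32 - 1 ring."""
    return int(hashlib.md5(s.encode("utf-8")).hexdigest(), 16) % (2**32)

def build_ring(nodes, vnodes=100):
    """Place `vnodes` virtual positions per physical server on the ring.
    Each virtual position remembers which physical server it belongs to."""
    ring = sorted((h(f"{n}#{i}"), n) for n in nodes for i in range(vnodes))
    positions = [p for p, _ in ring]
    owners = [n for _, n in ring]
    return positions, owners

def lookup(key, positions, owners):
    idx = bisect.bisect_right(positions, h(key)) % len(positions)
    return owners[idx]

positions, owners = build_ring(["A", "B", "C"], vnodes=200)
counts = Counter(lookup(f"key-{i}", positions, owners) for i in range(30000))
print(counts)  # with enough virtual nodes the per-server counts are close to even
```

Raising `vnodes` smooths the distribution further at the cost of a larger ring; lowering it toward 1 recovers the skew of plain consistent hashing.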
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
Architect's Alchemy Furnace
A comprehensive platform that combines Java development and architecture design, guaranteeing 100% original content. We explore the essence and philosophy of architecture and provide professional technical articles for aspiring architects.
