Fundamentals 6 min read

How Consistent Hashing Eliminates Data Migration in Scalable Systems

This article explains the principles of consistent hashing, its classic design with virtual nodes, the shortcomings of the original algorithm, and how Amazon's Dynamo and other extensions improve load balancing and reduce data movement in distributed environments.

MaGe Linux Operations
MaGe Linux Operations
MaGe Linux Operations
How Consistent Hashing Eliminates Data Migration in Scalable Systems

1. Introduction

Consistent Hashing, proposed by Karger at MIT in 1997, addresses service oscillation caused by server failures and scaling in distributed web systems. The algorithm has been widely applied and further developed.

2. Algorithm Design

1. Problem Scenario

Consider a service with six servers, each storing one‑sixth of the data. When Server1 fails, the service becomes unavailable until data is rehashed across the remaining five servers, causing large data migration and downtime.

2. Classic Consistent Hashing Algorithm

Karger introduced the concept of “virtual nodes”. All data is mapped to a set of virtual nodes that are then mapped to real servers. When a server fails, only the virtual nodes assigned to that server need to be remapped, avoiding a full rehash.

In the classic algorithm, the next real node after a failed server provides the service.

3. Algorithm Improvements

1. Issues with Classic Algorithm

The classic approach still has drawbacks:

When Server1 fails, Server2 must handle double the load, leading to imbalance if Server1 is permanently removed.

If all servers can handle double load, data could be written redundantly to primary‑backup pairs, eliminating the need for migration during failures.

2. Dynamo’s Practical Enhancements

Amazon’s Dynamo uses consistent hashing but stores the mapping of virtual nodes to real servers in a configuration system. When a virtual node becomes unavailable, it is reassigned to another server, reducing data movement and keeping load balanced.

Example mapping (illustrative): virtual nodes 0‑4/5 map to Server0, 10‑14/6 map to Server2, 15‑19/7 map to Server3, 20‑24/8 map to Server4, 24‑29/9 map to Server5.

4. Algorithm Extensions

Consistent hashing can be extended to address load balancing and optimal access strategies in distributed systems. Real‑world factors such as hardware differences, network bandwidth, ISP variations, and attacks cause heterogeneous performance, so dynamic virtual‑node mapping helps achieve optimal service.

For a system with two nodes, each containing three servers, administrators can adjust virtual‑node to server mappings based on observed response rates or latency.

5. References

(1) Consistent hashing (wiki)

(2) Consistent hashing

(3) Dynamo: Amazon’s Highly Available Key‑value Store

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Sign in to view source
Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactadmin@besthub.devand we will review it promptly.

load balancingconsistent hashingvirtual nodesDynamo
MaGe Linux Operations
Written by

MaGe Linux Operations

Founded in 2009, MaGe Education is a top Chinese high‑end IT training brand. Its graduates earn 12K+ RMB salaries, and the school has trained tens of thousands of students. It offers high‑pay courses in Linux cloud operations, Python full‑stack, automation, data analysis, AI, and Go high‑concurrency architecture. Thanks to quality courses and a solid reputation, it has talent partnerships with numerous internet firms.

0 followers
Reader feedback

How this landed with the community

Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.