Why Consistent Hashing Matters: Solving Cache Distribution and Scaling Issues

Consistent hashing replaces simple modulo‑based distribution to efficiently locate cached data across changing numbers of servers, using a hash ring and virtual nodes to ensure balanced load, minimize data movement, and improve reliability in distributed caching, load balancing, and database sharding scenarios.

Lobster Programming

Jun 11, 2024

In distributed systems, consistent hashing is introduced to address the limitations of simple modulo‑based hashing for cache server placement.

Why consistent hashing is needed

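Suppose three cache servers (A, B, C) sit behind a load balancer that spreads writes across them without regard to the key. A client then cannot know which server holds a given key and, in the worst case, must query all three. Routing by hash(key) % N fixes the lookup, since every client can compute the owning server directly, but it breaks down when the number of servers N changes: most keys suddenly map to a different server, so a large share of the cached data has to be relocated and the resulting burst of cache misses can overload whatever sits behind the cache.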
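The cost of changing N under modulo placement can be measured directly. The following is a minimal sketch, not a production routine: the helper names stable_hash and moved_fraction, the key names, and the choice of a truncated MD5 digest as the hash are all assumptions made for illustration.

```python
import hashlib


def stable_hash(key: str) -> int:
    """Deterministic 32-bit hash (assumption: first 4 bytes of MD5).

    Python's built-in hash() is salted per process for strings,
    so it is not suitable for placement decisions shared by clients.
    """
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:4], "big")


def moved_fraction(keys, old_n: int, new_n: int) -> float:
    """Fraction of keys whose server index changes when N goes old_n -> new_n."""
    moved = sum(1 for k in keys if stable_hash(k) % old_n != stable_hash(k) % new_n)
    return moved / len(keys)


# Hypothetical key set, used only to exercise the function.
keys = [f"key-{i}" for i in range(10_000)]
fraction = moved_fraction(keys, 3, 4)
print(f"{fraction:.0%} of keys change servers when N goes from 3 to 4")
```

For a uniform hash, a key keeps its server index across 3 and 4 servers only when its hash modulo 12 is 0, 1, or 2, so roughly three quarters of the keys are expected to move.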

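Consistent hashing maps both servers and data keys onto the same ring of 2^32 positions, numbered 0 through 2^32 - 1, with the last position wrapping around to the first. Each server is placed on the ring by hashing an identifier such as its IP address, and each key is placed by hashing the key itself. A key is stored on the first server encountered moving clockwise from the key’s position, so any client can locate the data without scanning all servers.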
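The clockwise lookup can be sketched with a sorted list and a binary search. This is an illustrative sketch under stated assumptions: the class name HashRing, the helper ring_position, the server labels, and the use of MD5 reduced modulo 2^32 are choices made here, not something the article prescribes.

```python
import bisect
import hashlib

RING_SIZE = 2**32  # positions 0 .. 2^32 - 1


def ring_position(value: str) -> int:
    """Map a string to a ring position (assumption: MD5 reduced modulo 2^32)."""
    digest = hashlib.md5(value.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % RING_SIZE


class HashRing:
    """One ring position per server; no virtual nodes."""

    def __init__(self, servers):
        # Sort servers by ring position so lookups can use binary search.
        self._points = sorted((ring_position(s), s) for s in servers)
        self._positions = [pos for pos, _ in self._points]

    def lookup(self, key: str) -> str:
        if not self._points:
            raise LookupError("ring has no servers")
        # First server at or after the key's position, moving clockwise.
        idx = bisect.bisect_left(self._positions, ring_position(key))
        if idx == len(self._points):
            idx = 0  # past the last server: wrap around to the first
        return self._points[idx][1]


ring = HashRing(["A", "B", "C"])
print(ring.lookup("user:42"))
```

Because the positions are kept sorted, each lookup costs a binary search over the servers rather than a query to every server.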

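However, with only one ring position per server, the arcs between neighboring servers can differ greatly in length, and when a server is removed its entire share of keys falls onto a single clockwise neighbor. Either effect can cause uneven data distribution and “hot spots”.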

Virtual nodes

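By assigning multiple virtual nodes to each physical server (e.g., hashing the server’s IP together with a different suffix for each virtual node, since hashing the same string repeatedly would always yield the same position), the ring becomes more uniformly populated. When a server goes offline, only the keys that map to its virtual nodes need to be remapped, each to the next clockwise virtual node owned by another server. Because those virtual nodes are scattered around the ring, the displaced keys are spread across the remaining servers instead of landing on one neighbor, and all other keys stay where they are.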
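A ring with virtual nodes can be sketched as follows. The class name VirtualNodeRing, the "server#i" naming scheme for virtual nodes, the default of 100 virtual nodes, and the sample IP addresses are assumptions chosen for illustration.

```python
import bisect
import hashlib


def ring_position(value: str) -> int:
    """32-bit ring position (assumption: first 4 bytes of MD5)."""
    return int.from_bytes(hashlib.md5(value.encode("utf-8")).digest()[:4], "big")


class VirtualNodeRing:
    def __init__(self, servers, vnodes: int = 100):
        self._vnodes = vnodes
        self._ring = []  # sorted list of (position, server) pairs
        for server in servers:
            self.add(server)

    def add(self, server: str) -> None:
        # Each virtual node hashes the server name plus a distinct suffix.
        for i in range(self._vnodes):
            bisect.insort(self._ring, (ring_position(f"{server}#{i}"), server))

    def remove(self, server: str) -> None:
        # Drop every virtual node that belongs to this server.
        self._ring = [(pos, s) for pos, s in self._ring if s != server]

    def lookup(self, key: str) -> str:
        if not self._ring:
            raise LookupError("ring has no servers")
        # "" sorts before any server name, so this finds the first
        # virtual node at or after the key's position.
        idx = bisect.bisect_left(self._ring, (ring_position(key), ""))
        return self._ring[idx % len(self._ring)][1]  # modulo wraps around


servers = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]  # hypothetical addresses
ring = VirtualNodeRing(servers)
keys = [f"key-{i}" for i in range(5000)]

before = {k: ring.lookup(k) for k in keys}
ring.remove("10.0.0.2")
after = {k: ring.lookup(k) for k in keys}

moved = [k for k in keys if before[k] != after[k]]
print(f"{len(moved)} of {len(keys)} keys moved after removing one server")
```

Every key in moved was previously owned by the removed server; keys owned by the other two servers keep their placement because their first clockwise virtual node is unchanged.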

Increasing the number of virtual nodes per server further smooths the distribution, reducing the risk of “hash ring skew”, where most keys fall into a small region of the ring and overload a single machine.
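One way to observe the smoothing effect is to compare the busiest server's load with the average load for several virtual node counts. This is a rough measurement sketch: the function name load_imbalance, the server labels, the key set, and the specific virtual node counts are assumptions, and the exact ratios depend on the hash function and the keys used.

```python
import bisect
import hashlib
from collections import Counter


def ring_position(value: str) -> int:
    """32-bit ring position (assumption: first 4 bytes of MD5)."""
    return int.from_bytes(hashlib.md5(value.encode("utf-8")).digest()[:4], "big")


def load_imbalance(servers, vnodes: int, keys) -> float:
    """Return (keys on the busiest server) / (mean keys per server)."""
    ring = sorted(
        (ring_position(f"{s}#{i}"), s) for s in servers for i in range(vnodes)
    )
    positions = [pos for pos, _ in ring]
    counts = Counter()
    for k in keys:
        # First virtual node clockwise from the key, wrapping at the end.
        idx = bisect.bisect_left(positions, ring_position(k)) % len(ring)
        counts[ring[idx][1]] += 1
    mean = len(keys) / len(servers)
    return max(counts.values()) / mean


servers = [f"server-{n}" for n in range(5)]  # hypothetical server names
keys = [f"key-{i}" for i in range(20_000)]
ratios = {v: load_imbalance(servers, v, keys) for v in (1, 10, 100, 1000)}
for vnodes, ratio in ratios.items():
    print(f"{vnodes:>4} virtual nodes per server: busiest/mean = {ratio:.2f}")
```

A ratio of 1.0 would mean a perfectly even split. The ratio is expected to approach 1.0 as the virtual node count grows, at the cost of a larger ring to store and search.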

In practice, consistent hashing with virtual nodes is widely used for load balancing, distributed cache partitioning, and database sharding.

Key takeaways:

Consistent hashing solves cache placement and scaling problems in distributed environments.

Virtual nodes ensure even data distribution and reduce the impact of server changes.
