How Dynamo Achieves High‑Availability in Distributed Key‑Value Stores
This article explains Dynamo, the decentralized key‑value storage system, covering its design goals, consistent‑hashing partitioning with virtual nodes, replication strategies, quorum‑based consistency, conflict resolution with vector clocks, hinted handoff, Merkle‑tree synchronization, and gossip‑based failure detection.
Introduction
Dynamo is a decentralized key‑value storage system designed for high availability: it aims to remain readable and writable at all times, even during failures. Since its publication in 2007 the paper has received over 4,000 citations and is regarded as a classic in distributed storage.
Positioning
Simple key‑value interface, best for small objects (usually < 1 MB).
Targets workloads with weak consistency requirements; does not support transactions.
Meets stringent latency targets for 99.9% of requests (its service‑level agreements are stated at the 99.9th percentile).
Does not address security concerns.
Overall Goal
The paper focuses on achieving always‑available storage despite network partitions or server failures, solving problems such as partitioning, high‑availability writes, temporary failures, replica synchronization, and failure detection.
Partitioning
Dynamo uses consistent hashing to enable scalable partitioning, but to mitigate load imbalance it introduces virtual nodes.
Virtual nodes map each physical node to multiple positions on the hash ring, balancing the load across nodes.
The paper discusses a sharding strategy (Strategy 3) where the key space is evenly divided and each node holds a fixed number of partitions, allowing flexible reassignment when nodes join or leave.
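The ring-with-virtual-nodes idea can be sketched in a few lines of Python. This is a toy illustration, not Dynamo's implementation: the MD5 hash, the vnode count, and the `name#index` naming scheme are arbitrary choices made here.

```python
import hashlib
from bisect import bisect_right

def ring_hash(key: str) -> int:
    """Map a string (key or virtual-node name) to a position on the ring."""
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

def build_ring(physical_nodes, vnodes_per_node=8):
    """Each physical node appears at several ring positions (virtual nodes),
    which spreads its load more evenly across the key space."""
    return sorted(
        (ring_hash(f"{node}#{i}"), node)
        for node in physical_nodes
        for i in range(vnodes_per_node)
    )

def owner(ring, key):
    """The first virtual node clockwise from the key's hash owns the key."""
    idx = bisect_right(ring, (ring_hash(key), ""))
    return ring[idx % len(ring)][1]
```

Because a joining physical node only inserts its own virtual nodes into the sorted list, only the key ranges adjacent to those positions move, which is the property that makes consistent hashing incremental.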
High Availability for Writes
Dynamo writes each key to multiple replicas. The coordinator node for a key stores it locally and replicates it to the next N−1 distinct nodes clockwise on the ring (e.g., N = 3 replicas in total); these N nodes form the key's preference list.
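Replica selection can be illustrated by walking the ring clockwise and collecting N distinct physical nodes. This is a hypothetical sketch reusing the toy-ring idea; the helper names are invented here.

```python
import hashlib
from bisect import bisect_right

def ring_hash(key: str) -> int:
    """Map a string to a position on the hash ring."""
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

# A small ring: 4 physical nodes, 8 virtual nodes each.
nodes = ["A", "B", "C", "D"]
ring = sorted((ring_hash(f"{n}#{v}"), n) for n in nodes for v in range(8))

def preference_list(ring, key, n=3):
    """Collect the first n DISTINCT physical nodes clockwise from the
    key's position; extra virtual nodes of an already-chosen host are
    skipped so replicas land on different machines."""
    start = bisect_right(ring, (ring_hash(key), ""))
    chosen, i = [], 0
    while len(chosen) < n and i < len(ring):
        node = ring[(start + i) % len(ring)][1]
        if node not in chosen:
            chosen.append(node)
        i += 1
    return chosen
```

The skip-duplicates step matters: without it, two virtual nodes of the same host could both end up holding a replica, defeating the point of replication.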
Replica Consistency
Read and write operations use get() and put() with a quorum‑like protocol. The parameters are:
R – minimum number of nodes that must respond to a read.
W – minimum number of nodes that must acknowledge a write.
N – total number of replicas.
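The interaction between these three parameters reduces to one inequality, sketched below (the function name is ours, not the paper's):

```python
def quorum_overlaps(n: int, r: int, w: int) -> bool:
    """R + W > N guarantees that every read quorum intersects every
    write quorum, so a read always reaches at least one replica that
    holds the most recent acknowledged write."""
    return r + w > n

# The configuration the paper cites as common: N=3, R=2, W=2.
common = quorum_overlaps(n=3, r=2, w=2)   # True: reads overlap writes
# Lowering R and W raises availability but gives up the overlap guarantee.
sloppy = quorum_overlaps(n=3, r=1, w=1)   # False: a read may miss a write
```

Tuning R and W trades latency and availability against consistency: a small W makes writes fast and highly available, a small R does the same for reads, and pushing both low breaks the overlap.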
Read/Write Process
The client first selects a coordinator from the key's preference list; the list is built by walking the ring and skipping virtual nodes whose physical host is already on it, so it names N distinct physical machines. For a read, the coordinator queries the other replicas and waits until R responses (counting its own) have arrived; for a write, it applies the update locally and waits for acknowledgments from W−1 other replicas.
Conflicts are resolved using vector clocks.
Example: client A writes D1 at node Sx, then updates it to D2 (which supersedes D1). After Sx fails, client A writes D3 via Sy; concurrently, client B writes D4 via Sz. The vector clocks of D3 and D4 are concurrent, so neither supersedes the other. A later read returns both versions, and the client reconciles them and writes the merged value back with a clock that dominates both.
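The version-divergence scenario above can be checked mechanically. A minimal sketch, representing a vector clock as a plain dict from node name to counter (the function names are ours):

```python
def dominates(a: dict, b: dict) -> bool:
    """True if clock a has seen every event that clock b has."""
    return all(a.get(node, 0) >= n for node, n in b.items())

def concurrent(a: dict, b: dict) -> bool:
    """Neither clock dominates the other: the versions conflict."""
    return not dominates(a, b) and not dominates(b, a)

def merged(a: dict, b: dict) -> dict:
    """Element-wise max: the clock attached to a reconciled value."""
    return {n: max(a.get(n, 0), b.get(n, 0)) for n in a.keys() | b.keys()}

# Clocks matching the example: two writes at Sx, then divergent updates.
d2 = {"Sx": 2}
d3 = {"Sx": 2, "Sy": 1}   # client A's update, coordinated by Sy
d4 = {"Sx": 2, "Sz": 1}   # client B's concurrent update, via Sz
```

Here `dominates(d3, d2)` holds (D3 supersedes D2) while `concurrent(d3, d4)` holds, so a reader must reconcile D3 and D4 and write the result back under `merged(d3, d4)`.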
Temporary Failures
For short‑term node outages Dynamo uses hinted handoff: a write intended for an unreachable replica (A) is accepted by another node (B) together with a hint naming A. B keeps the hinted data in a separate local store and hands it back to A once A recovers.
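The handoff flow can be sketched with two toy nodes (an illustration under invented names; real Dynamo persists hints durably and scans them periodically):

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.data = {}     # the node's regular key-value store
        self.hints = []    # (intended_node, key, value) held for others

    def put(self, key, value, hint_for=None):
        if hint_for is not None:
            # The intended replica is down: keep the write in a
            # separate hinted store instead of our own data.
            self.hints.append((hint_for, key, value))
        else:
            self.data[key] = value

    def deliver_hints(self, recovered):
        """When the intended node is reachable again, hand its data back
        and drop the delivered hints."""
        for target, key, value in [h for h in self.hints if h[0] is recovered]:
            recovered.put(key, value)
        self.hints = [h for h in self.hints if h[0] is not recovered]
```

The key design point is that B never merges hinted data into its own range; the hint exists only to restore A's replica, preserving the original placement once the outage ends.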
Replica Synchronization
When replicas diverge for longer periods (e.g., a node holding hints also fails, or a node leaves permanently), Dynamo synchronizes replicas using Merkle trees: comparing tree roots, and descending only into subtrees whose hashes differ, quickly isolates the key ranges that actually need transfer. The drawback is that trees must be recomputed when key ranges change as nodes join or leave.
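The tree comparison can be sketched as follows. For brevity this toy version jumps straight from a root mismatch to comparing leaf hashes; a real implementation descends level by level so that matching subtrees are never examined. All names here are ours.

```python
import hashlib

def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def merkle_levels(range_hashes):
    """Build the tree bottom-up from per-key-range byte strings; returns
    all levels, leaves first and the root level ([root]) last."""
    level = [_h(x) for x in range_hashes]
    levels = [level]
    while len(level) > 1:
        level = [_h(level[i] + (level[i + 1] if i + 1 < len(level) else b""))
                 for i in range(0, len(level), 2)]
        levels.append(level)
    return levels

def stale_ranges(replica_a, replica_b):
    """If the roots match, the replicas are in sync after one comparison;
    otherwise return indices of key ranges whose hashes differ."""
    ta, tb = merkle_levels(replica_a), merkle_levels(replica_b)
    if ta[-1] == tb[-1]:
        return []
    return [i for i, (x, y) in enumerate(zip(ta[0], tb[0])) if x != y]
```

In the common case where replicas are identical, synchronization costs a single root-hash exchange rather than a full key scan.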
Failure Detection
Dynamo employs a gossip protocol where each node maintains membership information for the entire ring and disseminates updates through periodic gossip exchanges.
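Membership dissemination can be mimicked with a toy gossip loop. This is a sketch under invented structures (a node is just a dict holding its membership view); real Dynamo exchanges versioned membership and partition metadata and reconciles by version number.

```python
import random

def gossip_round(nodes):
    """Each node pairs with one random peer per round; the pair exchange
    membership views and both keep the higher counter for every member,
    so information spreads epidemically through the cluster."""
    for node in nodes:
        peer = random.choice([n for n in nodes if n is not node])
        merged = {m: max(node["view"].get(m, 0), peer["view"].get(m, 0))
                  for m in node["view"].keys() | peer["view"].keys()}
        node["view"] = dict(merged)
        peer["view"] = dict(merged)
```

Because every exchange only ever merges views upward, the protocol is convergent: after a handful of rounds all nodes share the same membership map without any central coordinator.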
Conclusion
Dynamo provides comprehensive solutions for data distribution, fault tolerance, recovery, and metadata propagation in a decentralized storage system. Although its source code is not public, the design has proven robust over many years and remains a valuable textbook example of distributed storage.
360 Zhihui Cloud Developer
360 Zhihui Cloud is an enterprise open service platform that aims to "aggregate data value and empower an intelligent future," leveraging 360's extensive product and technology resources to deliver platform services to customers.