Databases 19 min read

How Amazon Dynamo Ensures High Availability with Consistent Hashing

The article explains Amazon’s Dynamo distributed key‑value store architecture, covering its decentralized design, consistent hashing with virtual nodes, replication strategies, quorum‑based read/write parameters, vector‑clock versioning, failure detection, hinted handoff, scaling mechanisms, and how these techniques together provide high availability, scalability, and reliability.

21CTO
21CTO
21CTO
How Amazon Dynamo Ensures High Availability with Consistent Hashing

1. System Overview

1.1 Amazon Platform Overview

Amazon platform is a service‑oriented architecture composed of hundreds of services, following principles of high decentralization, loose coupling, and full distribution. Architecture shown in Figure 1.

Figure 1 Amazon system architecture

1.2 Dynamo Overview

Dynamo is Amazon’s highly available distributed key‑value storage system that satisfies scalability, availability, and reliability. It meets the CAP theorem by using consistent hashing for partition tolerance, replication for availability, and vector clocks for consistency, allowing trade‑offs via the NWR model.

Dynamo has three conceptual layers:

Key‑Value: The key uniquely identifies a data object; the value holds the object’s entity. Operations are performed by key.

Node: A physical host containing three essential components—request coordinator, membership & failure detection, and a local persistence engine (implemented in Java). The engine supports various storage back‑ends, primarily Berkeley DB Transactional Data Store (BDB), with options like BDB Java Edition, MySQL, or in‑memory cache. In production, Dynamo typically uses BDB.

Instance: From an application perspective, a service composed of a set of nodes, possibly spread across different data centers, providing fault tolerance and reliability.

2. Background Conditions

2.1 System Assumptions and Requirements

(1) Query model: Key‑Value, not relational SQL; objects are usually smaller than 1 MB.

(2) ACID properties: Traditional relational databases use ACID but sacrifice availability. Dynamo adopts weak consistency (C) to achieve high availability, does not provide isolation (I), and allows only single‑key updates.

(3) Efficiency: Operates on inexpensive machines while meeting SLA latency and throughput, requiring trade‑offs among performance, cost, availability, and durability.

(4) Other assumptions: Dynamo is used internally within Amazon, assuming a trusted environment.

2.2 Service Level Agreement (SLA)

The SLA defines agreed metrics such as request rate and expected latency; for example, 99.9 % of responses must be under 300 ms at a peak load of 500 requests per second. Dynamo uses the 99.9 th percentile instead of average/median to ensure a good experience for most clients.

2.3 Design Considerations (Data Replication)

Traditional replication sacrifices availability for consistency during failures. Dynamo instead aims for “always writable” by performing conflict detection and resolution during reads, using optimistic replication and final consistency.

Coordination timing: conflicts are resolved either at write time or read time; Dynamo prefers read‑time coordination to avoid rejecting client writes.

Coordinator: Conflict resolution can be handled by the storage system (e.g., last‑write‑wins) or by the client application, which Dynamo chooses to allow application‑level merging.

3. Key Technologies

3.1 Data Partitioning

Hash algorithm: MD5 hashes the key to a 128‑bit identifier, determining the storage node. Dynamo uses consistent hashing to achieve incremental scalability. Virtual nodes are introduced to address uneven data distribution and heterogeneous node performance.

Figure 2 Partitioning and key replication on the Dynamo ring

3.2 Data Replication

Each data item is replicated to N nodes (default N = 3). A coordinator node stores the primary copy and forwards replicas to the next N‑1 clockwise nodes. The (N, R, W) quorum model is used, with Amazon recommending (3, 2, 2). R + W > N yields quorum‑like consistency; R and W are tuned to balance latency, availability, and consistency.

3.3 Version Merging

Multiple replicas may diverge; Dynamo stores each version with a vector clock. Vector clocks capture causal relationships; when concurrent versions are detected, the client merges them (e.g., shopping‑cart use case). Dynamo truncates vector clocks when they exceed a threshold (e.g., 10 entries) by removing the oldest entries.

Figure 3 Object version evolution over time

3.4 Failure Detection

Ring Membership: Nodes persist their position on the ring and gossip membership information every second.

External Discovery: Seed nodes help new nodes discover the ring to avoid split‑brain scenarios.

Failure Detection: A gossip‑style protocol monitors node liveness.

3.5 Failure Handling

Sloppy Quorum and Hinted Handoff ensure writes succeed even when some replicas are down, storing hints on healthy nodes and replaying them when the failed node recovers. Anti‑entropy using Merkle trees synchronizes replicas.

3.6 Scaling (Add/Remove Nodes)

When a new node joins, it receives token ranges and other nodes transfer the corresponding keys; removal reverses the process.

3.7 Read/Write Operations

Requests are coordinated by a request coordinator component, which creates a state machine handling identification of responsible nodes, sending the request, awaiting responses, possible retries, and packaging the response. Reads wait for the minimum number of responses (R); writes wait for W acknowledgments. Read‑repair may update stale replicas.

4. Problem Solving

4.1 Availability

Fully decentralized, no single point of failure, always writable.

4.2 Scalability

Consistent hashing with virtual nodes enables seamless scaling.

4.3 Reliability

Multiple replicas and vector‑clock version merging ensure data durability.

4.4 Configurability

The (N, W, R) model allows tuning between availability and consistency; recommended settings are (3, 2, 2).

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Sign in to view source
Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactadmin@besthub.devand we will review it promptly.

ReplicationConsistencykey-value storevector clocks
21CTO
Written by

21CTO

21CTO (21CTO.com) offers developers community, training, and services, making it your go‑to learning and service platform.

0 followers
Reader feedback

How this landed with the community

Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.