How Amazon Dynamo Ensures High Availability with Consistent Hashing
The article explains Amazon’s Dynamo distributed key‑value store architecture, covering its decentralized design, consistent hashing with virtual nodes, replication strategies, quorum‑based read/write parameters, vector‑clock versioning, failure detection, hinted handoff, scaling mechanisms, and how these techniques together provide high availability, scalability, and reliability.
1. System Overview
1.1 Amazon Platform Overview
Amazon platform is a service‑oriented architecture composed of hundreds of services, following principles of high decentralization, loose coupling, and full distribution. Architecture shown in Figure 1.
Figure 1 Amazon system architecture
1.2 Dynamo Overview
Dynamo is Amazon’s highly available distributed key‑value storage system that satisfies scalability, availability, and reliability. It meets the CAP theorem by using consistent hashing for partition tolerance, replication for availability, and vector clocks for consistency, allowing trade‑offs via the NWR model.
Dynamo has three conceptual layers:
Key‑Value: The key uniquely identifies a data object; the value holds the object’s entity. Operations are performed by key.
Node: A physical host containing three essential components—request coordinator, membership & failure detection, and a local persistence engine (implemented in Java). The engine supports various storage back‑ends, primarily Berkeley DB Transactional Data Store (BDB), with options like BDB Java Edition, MySQL, or in‑memory cache. In production, Dynamo typically uses BDB.
Instance: From an application perspective, a service composed of a set of nodes, possibly spread across different data centers, providing fault tolerance and reliability.
2. Background Conditions
2.1 System Assumptions and Requirements
(1) Query model: Key‑Value, not relational SQL; objects are usually smaller than 1 MB.
(2) ACID properties: Traditional relational databases use ACID but sacrifice availability. Dynamo adopts weak consistency (C) to achieve high availability, does not provide isolation (I), and allows only single‑key updates.
(3) Efficiency: Operates on inexpensive machines while meeting SLA latency and throughput, requiring trade‑offs among performance, cost, availability, and durability.
(4) Other assumptions: Dynamo is used internally within Amazon, assuming a trusted environment.
2.2 Service Level Agreement (SLA)
The SLA defines agreed metrics such as request rate and expected latency; for example, 99.9 % of responses must be under 300 ms at a peak load of 500 requests per second. Dynamo uses the 99.9 th percentile instead of average/median to ensure a good experience for most clients.
2.3 Design Considerations (Data Replication)
Traditional replication sacrifices availability for consistency during failures. Dynamo instead aims for “always writable” by performing conflict detection and resolution during reads, using optimistic replication and final consistency.
Coordination timing: conflicts are resolved either at write time or read time; Dynamo prefers read‑time coordination to avoid rejecting client writes.
Coordinator: Conflict resolution can be handled by the storage system (e.g., last‑write‑wins) or by the client application, which Dynamo chooses to allow application‑level merging.
3. Key Technologies
3.1 Data Partitioning
Hash algorithm: MD5 hashes the key to a 128‑bit identifier, determining the storage node. Dynamo uses consistent hashing to achieve incremental scalability. Virtual nodes are introduced to address uneven data distribution and heterogeneous node performance.
Figure 2 Partitioning and key replication on the Dynamo ring
3.2 Data Replication
Each data item is replicated to N nodes (default N = 3). A coordinator node stores the primary copy and forwards replicas to the next N‑1 clockwise nodes. The (N, R, W) quorum model is used, with Amazon recommending (3, 2, 2). R + W > N yields quorum‑like consistency; R and W are tuned to balance latency, availability, and consistency.
3.3 Version Merging
Multiple replicas may diverge; Dynamo stores each version with a vector clock. Vector clocks capture causal relationships; when concurrent versions are detected, the client merges them (e.g., shopping‑cart use case). Dynamo truncates vector clocks when they exceed a threshold (e.g., 10 entries) by removing the oldest entries.
Figure 3 Object version evolution over time
3.4 Failure Detection
Ring Membership: Nodes persist their position on the ring and gossip membership information every second.
External Discovery: Seed nodes help new nodes discover the ring to avoid split‑brain scenarios.
Failure Detection: A gossip‑style protocol monitors node liveness.
3.5 Failure Handling
Sloppy Quorum and Hinted Handoff ensure writes succeed even when some replicas are down, storing hints on healthy nodes and replaying them when the failed node recovers. Anti‑entropy using Merkle trees synchronizes replicas.
3.6 Scaling (Add/Remove Nodes)
When a new node joins, it receives token ranges and other nodes transfer the corresponding keys; removal reverses the process.
3.7 Read/Write Operations
Requests are coordinated by a request coordinator component, which creates a state machine handling identification of responsible nodes, sending the request, awaiting responses, possible retries, and packaging the response. Reads wait for the minimum number of responses (R); writes wait for W acknowledgments. Read‑repair may update stale replicas.
4. Problem Solving
4.1 Availability
Fully decentralized, no single point of failure, always writable.
4.2 Scalability
Consistent hashing with virtual nodes enables seamless scaling.
4.3 Reliability
Multiple replicas and vector‑clock version merging ensure data durability.
4.4 Configurability
The (N, W, R) model allows tuning between availability and consistency; recommended settings are (3, 2, 2).
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
21CTO
21CTO (21CTO.com) offers developers community, training, and services, making it your go‑to learning and service platform.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
