Understanding Distributed Consistency: From CAP Theory to Raft and Beyond
This article explains the core principles of distributed consistency, covering CAP, ACID, BASE, and detailed walkthroughs of 2PC, 3PC, Paxos, Raft, Gossip, NWR, Quorum, and Lease mechanisms, highlighting their trade‑offs, failure scenarios, and practical usage in modern systems.
Preface
Consistency in distributed systems is crucial and can be classified as weak or strong. Most modern consistency protocols adopt a special form of weak consistency called eventual consistency. This article starts from basic principles of distributed systems and then organizes several protocols and mechanisms that follow these principles, aiming for a clear and understandable presentation.
Basic Principles and Theory
CAP theorem (Consistency, Availability, Partition tolerance) states that a distributed system cannot simultaneously satisfy all three properties; it can only achieve two of them. Partition tolerance is a basic requirement for any distributed system, so most designs balance consistency and availability under this constraint.
ACID (Atomicity, Consistency, Isolation, Durability) describes the properties of transactions with strong consistency, typically used in single‑node databases. Applying ACID to distributed transactions sacrifices availability and belongs to CP systems.
BASE (Basically Available, Soft state, Eventually consistent) summarizes the practice of large‑scale internet distributed systems, using weak consistency to gain availability. BASE belongs to AP systems.
Two‑Phase Commit (2PC)
2PC is a strong‑consistency algorithm used to ensure atomicity across multiple data copies. It involves a coordinator and participants and proceeds in two phases:
Commit request (voting phase)
Coordinator sends transaction details to participants and asks if they can commit.
Participants execute the transaction locally, recording undo/redo logs.
Participants reply Yes or No.
Commit execution (execution phase)
If all participants replied Yes, the coordinator sends a commit request; otherwise it sends a rollback request.
Participants perform commit or rollback and send an Ack to the coordinator.
Coordinator decides whether the transaction is completed or aborted after receiving all Acks.
2PC provides strong consistency but suffers from single‑point failure of the coordinator, blocking due to synchronous waits, and possible data inconsistency if commit/abort messages are lost. Optimizations such as timeout detection and mutual inquiry can mitigate but not eliminate blocking.
Three‑Phase Commit (3PC)
3PC improves on 2PC by adding an extra preparation phase, similar to TCP’s three‑way handshake, increasing the chance of successful transaction execution and eliminating the blocking problem when the coordinator fails. However, it may still lead to data inconsistency.
Paxos
Paxos is a complete distributed consensus algorithm with roles Proposer, Acceptor, and Learner. It proceeds in two phases:
Prepare phase
Proposer selects a proposal number M and sends a Prepare request to a majority of Acceptors.
Acceptor replies with the highest proposal number N it has seen if M > N, and promises not to accept proposals numbered less than M.
Accept phase
If the proposer receives responses from a majority, it sends an Accept request for [M, V] where V is the highest value among responses.
Acceptor accepts the proposal if it has not already promised a higher number.
Learn phase (outside the selection process)
Proposer disseminates the chosen proposal to all Learners.
Paxos tolerates failures as long as a majority of nodes remain operational, making it suitable for building consistent state machines. Google’s Chubby lock service uses Paxos, while ZooKeeper implements a Paxos‑derived ZAB protocol.
Raft
Raft targets the same problem space as Paxos but is designed to be easier to understand and implement. It defines three roles: Leader, Follower, and Candidate.
All nodes start as Followers; a leader is elected, receives client requests, and replicates log entries to Followers. Once a majority acknowledges the entry, the leader commits it, achieving eventual consistency.
The leader periodically sends heartbeats to Followers. If a follower misses heartbeats, it triggers a new election by becoming a Candidate, incrementing its term, voting for itself, and requesting votes from other nodes. The election ends when the candidate wins, another node becomes leader, or a timeout occurs.
Raft is used in projects such as CockroachDB and TiKV, and many large‑scale distributed databases adopt it.
Gossip
Gossip is a decentralized protocol where no single leader coordinates the process. Each node stores a list of key‑value‑version triples and periodically selects a peer to exchange state, gradually converging to a consistent view. Nodes can join or leave easily, but the approach can suffer from high communication overhead and long convergence times.
NWR Mechanism
N, W, and R denote the number of replicas, the minimum number of successful writes, and the minimum number of successful reads, respectively. When W + R > N, reads and writes intersect, guaranteeing strong consistency. When W + R ≤ N, strong consistency cannot be assured.
Version ordering typically relies on version‑control algorithms such as vector clocks. Large values of W or R increase latency.
Quorum Mechanism
Quorum (often called NWR) assigns one vote to each replica. An operation must obtain at least Vr read votes or Vw write votes. The rules Vr + Vw > V and Vw > V/2 ensure that reads and writes do not overlap and that writes are serialized.
Lease Mechanism
A master assigns data to slaves with a lease time (e.g., one hour). Clients can read from a slave within the lease; after expiration they must query the master. Leases help balance availability and consistency and can resolve split‑brain scenarios by allowing the old master’s leases to expire, forcing it to step down.
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
Java Interview Crash Guide
Dedicated to sharing Java interview Q&A; follow and reply "java" to receive a free premium Java interview guide.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
