Understanding Consistency Models and Distributed Consensus Protocols
This article explains the fundamentals of distributed consistency, covering weak and strong consistency, the CAP theorem, ACID and BASE models, and detailed overviews of 2PC, 3PC, Paxos, Raft, Gossip, NWR, Quorum, and Lease mechanisms, highlighting their trade‑offs and practical use cases.
Consistency in distributed systems is divided into weak consistency and strong consistency. Most mainstream protocols adopt a special form of weak consistency called eventual consistency. The article starts from these basic principles and then introduces several protocols that follow them.
Basic Principles and Theory
The CAP (Consistency, Availability, Partition tolerance) theorem states that a distributed system cannot simultaneously satisfy all three properties; at most two can be guaranteed. In practice, partition tolerance (P) is a given over unreliable networks, so real systems balance C against A.
ACID (Atomicity, Consistency, Isolation, Durability) describes transaction properties with strong consistency, typically used in single-node databases, and is classified as a CP design.
BASE (Basically Available, Soft state, Eventually consistent) summarizes the practice of large-scale internet systems, trading strict consistency for availability, and is classified as an AP design.
2PC
Two‑Phase Commit (2PC) involves a coordinator and multiple participants . The process has two phases:
Commit request (voting phase): the coordinator sends the transaction to every participant and asks whether it can be committed. Each participant executes the transaction locally, records undo and redo logs, and replies Yes or No.
Commit execution (completion phase): if every participant voted Yes, the coordinator sends a commit request; otherwise it sends a rollback request. Participants carry out the request and reply with an Ack. The coordinator marks the transaction completed or aborted once all acknowledgments arrive.
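The two phases can be simulated in-process as a sketch; the `Participant` class and its state labels are hypothetical, and real implementations must also handle timeouts and coordinator failure:

```python
from dataclasses import dataclass

@dataclass
class Participant:
    name: str
    can_commit: bool = True
    state: str = "init"  # hypothetical state labels for illustration

    def vote(self, txn):
        # Phase 1: execute locally, write undo/redo logs, then vote.
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def finish(self, commit):
        # Phase 2: apply the coordinator's decision and acknowledge.
        self.state = "committed" if commit else "rolled_back"
        return "ack"

def two_phase_commit(participants, txn):
    # Phase 1: collect votes; any "No" forces a rollback.
    votes = [p.vote(txn) for p in participants]
    decision = all(votes)
    # Phase 2: broadcast commit or rollback and wait for all acks.
    acks = [p.finish(decision) for p in participants]
    assert all(a == "ack" for a in acks)
    return "committed" if decision else "rolled_back"
```

Note that between `vote` and `finish` a real participant holds locks and can do nothing but wait, which is exactly the blocking problem discussed below.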
2PC provides strong consistency and is widely used in relational databases (e.g., MySQL XA). However, it suffers from a single point of failure at the coordinator, blocking (participants hold locks while waiting for the coordinator's decision), and potential inconsistency when network failures prevent some participants from receiving the final decision.
3PC
Three-Phase Commit (3PC) improves on 2PC by splitting the voting step in two and adding timeouts, which reduces the blocking problem. It also uses a coordinator and participants, with three phases: canCommit, preCommit, and doCommit. If the coordinator fails after preCommit, participants are no longer stuck waiting: they can still reach a decision, by timing out and committing by default or by consulting each other.
3PC’s extra phase increases the chance of successful commits but can still lead to inconsistency under certain network partitions.
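The three phases can be sketched as below; the `Participant3PC` class is hypothetical, and the default-commit-on-timeout behavior is only noted in comments, since modelling real timers would obscure the flow:

```python
from dataclasses import dataclass

@dataclass
class Participant3PC:
    name: str
    can_commit: bool = True
    state: str = "init"  # hypothetical state labels for illustration

def three_phase_commit(participants, txn):
    # Phase 1 (canCommit): probe intent only; nothing is locked yet,
    # so an early abort is cheap.
    if not all(p.can_commit for p in participants):
        for p in participants:
            p.state = "aborted"
        return "aborted"
    # Phase 2 (preCommit): participants prepare the transaction and enter
    # a state from which, if the coordinator goes silent, they commit by
    # default after a timeout instead of blocking indefinitely.
    for p in participants:
        p.state = "pre_committed"
    # Phase 3 (doCommit): finalize the commit and acknowledge.
    for p in participants:
        p.state = "committed"
    return "committed"
```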
Paxos
Paxos is a complete distributed consensus algorithm with three roles: Proposer, Acceptor, and Learner. It proceeds in two main phases, Prepare and Accept, followed by a Learn phase in which the chosen value is disseminated.
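The acceptor side of the two phases can be sketched as follows; the class and the return-value conventions are illustrative, not taken from any particular implementation:

```python
class Acceptor:
    """A single Paxos acceptor's state machine (sketch)."""

    def __init__(self):
        self.promised = 0        # highest proposal number promised
        self.accepted_n = 0      # proposal number of the accepted value
        self.accepted_v = None   # accepted value, if any

    def prepare(self, n):
        # Phase 1b: promise to ignore proposals numbered below n, and
        # report any previously accepted value so the proposer must
        # re-propose it rather than a new one.
        if n > self.promised:
            self.promised = n
            return ("promise", self.accepted_n, self.accepted_v)
        return ("reject", self.promised, None)

    def accept(self, n, v):
        # Phase 2b: accept unless a higher-numbered promise was made
        # in the meantime.
        if n >= self.promised:
            self.promised = n
            self.accepted_n, self.accepted_v = n, v
            return "accepted"
        return "rejected"
```

A proposer that collects promises (and then acceptances) from a majority of such acceptors knows the value is chosen; any later proposal will learn of it via the reported `accepted_v`.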
Paxos offers strong fault tolerance: as long as a majority of nodes are alive, the system can continue to reach agreement. It underlies systems such as Google's Chubby and inspired ZooKeeper's ZAB protocol.
Raft
Raft is designed to be more understandable than Paxos while providing equivalent safety and performance. It defines three roles: Leader, Follower, and Candidate. Leaders handle client requests, replicate log entries to followers, and achieve consensus once a majority acknowledges.
Raft decomposes the problem into leader election, heartbeats, and log replication; a candidate (or a stale leader) reverts to follower as soon as it observes a message carrying a higher term.
Raft is used in projects such as CockroachDB and TiKV and is gaining adoption in many distributed databases.
Gossip
Gossip protocols differ from the previous algorithms by being fully decentralized—there is no leader. Each node periodically selects another node and exchanges state information, eventually converging to a consistent view.
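A push-style gossip round can be simulated as follows; the data model, in which each node holds a set of known updates, is a simplifying assumption:

```python
import random

def gossip_round(states, rng):
    # One round: every node pushes everything it knows to one random peer.
    # `states` maps node id -> set of known updates.
    nodes = list(states)
    for node in nodes:
        peer = rng.choice([n for n in nodes if n != node])
        states[peer] |= states[node]

def gossip_until_converged(states, rng, max_rounds=100):
    # Run rounds until every node holds the union of all updates.
    full = set().union(*states.values())
    for r in range(1, max_rounds + 1):
        gossip_round(states, rng)
        if all(s == full for s in states.values()):
            return r
    return None
```

Because each round roughly doubles the number of informed nodes, convergence typically takes on the order of log(N) rounds, which is the source of the latency trade-off noted below.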
While gossip eliminates single‑point failures, its high communication overhead and longer convergence times limit its practical use in many systems.
NWR Mechanism
The NWR model defines three parameters: N (number of replicas), W (minimum number of replicas that must acknowledge a write), and R (minimum number of replicas consulted on a read). When W + R > N, every read quorum overlaps every write quorum, so reads are guaranteed to see the latest acknowledged write; otherwise only eventual consistency is possible.
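The overlap condition is easy to check programmatically (a sketch):

```python
def nwr_reads_see_latest_write(n, w, r):
    # With W + R > N, any read set of R replicas must intersect any
    # write set of W replicas, so at least one replica consulted on a
    # read holds the latest acknowledged write.
    return w + r > n

# Common configurations for N = 3 replicas:
assert nwr_reads_see_latest_write(3, 2, 2)      # overlap guaranteed
assert not nwr_reads_see_latest_write(3, 1, 1)  # eventual consistency only
```

Tuning W down (and R up, or vice versa) shifts latency between the write and read paths while preserving the guarantee, which is how Dynamo-style stores expose the trade-off.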
Quorum Mechanism
Quorum is a voting scheme that balances data redundancy against consistency. Each of the V replicas holds one vote; a read must gather at least Vr votes and a write at least Vw votes, with Vr + Vw > V (read and write quorums overlap) and Vw > V/2 (no two writes to the same item can proceed concurrently).
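Both constraints can be expressed as a single predicate (a sketch; `v`, `vr`, and `vw` mirror the V, Vr, and Vw above):

```python
def quorum_valid(v, vr, vw):
    # Vr + Vw > V: every read quorum intersects every write quorum,
    # so a read always sees the newest committed version.
    # Vw > V/2: any two write quorums intersect, so conflicting
    # concurrent writes to the same item are impossible.
    return vr + vw > v and vw > v / 2

# With V = 5 replicas, Vr = 3 and Vw = 3 satisfies both conditions.
assert quorum_valid(5, 3, 3)
```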
Lease Mechanism
In a lease system, the master grants time-limited leases to slaves. While a lease is valid, clients may read from the slave; after it expires, they must query the master. Leases also help avoid split-brain: a stale master automatically loses its authority once its lease expires.
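A lease with a fallback to the master can be sketched as follows; the names and the injectable clock are assumptions made for testability, and a real system must also budget for clock skew when choosing the lease duration:

```python
import time

class Lease:
    """A time-limited read lease, as might be granted by a master (sketch)."""

    def __init__(self, duration_s, clock=time.monotonic):
        # A monotonic clock avoids surprises from wall-clock adjustments.
        self.clock = clock
        self.expires_at = clock() + duration_s

    def valid(self):
        return self.clock() < self.expires_at

def read(slave_data, lease, query_master):
    # Serve from the slave while the lease holds; otherwise fall back
    # to the master for an authoritative answer.
    if lease.valid():
        return slave_data
    return query_master()
```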