
Understanding Distributed Consistency: 2PC, 3PC, Paxos, Raft, Gossip & More

This article explains the fundamentals of distributed system consistency, covering weak and strong consistency, the CAP theorem, ACID vs BASE, and detailed overviews of protocols such as 2PC, 3PC, Paxos, Raft, Gossip, NWR, Quorum, and Lease mechanisms.

ITFLY8 Architecture Home

1. Introduction

Consistency is a central concern in distributed systems and is broadly classified as either weak or strong. Many large-scale systems adopt a particular form of weak consistency called eventual consistency, in which replicas may diverge temporarily but converge once updates stop. The following sections cover the basic principles and several protocols that implement them.

2. Basic Principles and Theory

The CAP theorem states that a distributed system cannot simultaneously guarantee Consistency, Availability, and Partition tolerance; at most two of the three can hold at once. Because network partitions cannot be ruled out in practice, partition tolerance is effectively mandatory, so the real trade-off is between Consistency and Availability while a partition lasts.

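ACID (Atomicity, Consistency, Isolation, Durability) describes the guarantees of a database transaction, which are straightforward to provide on a single node. Systems that preserve these guarantees across nodes give up availability during a partition and are therefore usually classed as CP systems.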

BASE (Basically Available, Soft state, Eventually consistent) is the counterpart for large‑scale internet systems, representing AP systems.

3. 2PC (Two‑Phase Commit)

2PC involves a coordinator and participants and proceeds in two phases:

Prepare phase (vote):

The coordinator sends the transaction request to all participants.

Participants execute the operation, record undo/redo logs, and reply Yes or No.

Commit phase (execute):

If all participants replied Yes, the coordinator sends a commit request; otherwise, it sends a rollback.

Participants commit or rollback and acknowledge the coordinator.
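The two phases above can be sketched in a few lines. This is a minimal in-memory illustration, not a production protocol: the `Participant` class, its `can_commit` switch, and the state strings are hypothetical names introduced here, and there is no networking, logging to disk, or timeout handling.

```python
from dataclasses import dataclass


@dataclass
class Participant:
    name: str
    can_commit: bool = True  # hypothetical switch used to simulate a No vote
    state: str = "init"

    def prepare(self) -> bool:
        # Execute tentatively, record undo/redo information, then vote.
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self) -> None:
        self.state = "committed"

    def rollback(self) -> None:
        self.state = "aborted"


def two_phase_commit(participants: list[Participant]) -> str:
    # Phase 1 (prepare): collect a vote from every participant.
    votes = [p.prepare() for p in participants]
    # Phase 2 (commit): commit only on a unanimous Yes, else roll back all.
    if all(votes):
        for p in participants:
            p.commit()
        return "committed"
    for p in participants:
        p.rollback()
    return "aborted"
```

A single No vote is enough to abort the whole transaction: `two_phase_commit([Participant("a"), Participant("b", can_commit=False)])` returns `"aborted"` and leaves both participants in the `"aborted"` state.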

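2PC provides strong consistency but has three well-known weaknesses: the coordinator is a single point of failure; participants block, holding their resources, while they wait for the coordinator's decision; and the system can become inconsistent if a network failure lets the commit request reach only some participants. Optimizations include timeout handling and mutual inquiry among participants, though blocking cannot be fully eliminated.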

4. 3PC (Three‑Phase Commit)

3PC improves on 2PC by adding a canCommit inquiry phase ahead of the prepare step and by giving participants timeouts, which reduces the chance of blocking. The three phases are:

canCommit (transaction inquiry): coordinator asks participants if they can commit.

preCommit: if all answer Yes, coordinator sends preCommit; participants log undo/redo and acknowledge.

doCommit: coordinator sends doCommit after receiving all acks; participants finalize or abort accordingly.

[Figure: 3PC diagram]

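3PC reduces blocking because participants can proceed on a timeout after the coordinator fails instead of waiting indefinitely. It can still lead to inconsistency, for example when some participants proceed on a timeout while others receive an abort.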
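The timeout behavior can be sketched as a small decision function. The default actions below follow the commonly described 3PC rule (abort before preCommit, commit after it); the article itself does not spell them out, so treat them as an assumption. The function name and state strings are hypothetical.

```python
def on_coordinator_timeout(state: str) -> str:
    """Default action of a participant when the coordinator goes silent.

    Assumed states (hypothetical names):
      "waiting_for_precommit": answered Yes to canCommit, no preCommit yet
      "pre_committed":         acknowledged preCommit, no doCommit yet
    """
    if state == "waiting_for_precommit":
        # Nothing has been promised to other participants yet: abort.
        return "abort"
    if state == "pre_committed":
        # preCommit implies every participant answered Yes in canCommit,
        # so the participant commits by default.
        return "commit"
    raise ValueError(f"unknown participant state: {state}")
```

The default commit in the `"pre_committed"` state is what removes the blocking, and it is also the source of the remaining inconsistency risk if the coordinator had in fact decided to abort.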

5. Paxos

Paxos is a complete distributed consensus algorithm with roles: Proposer, Acceptor, and Learner. It proceeds in two phases:

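Prepare: The Proposer selects a proposal number M and sends Prepare(M) to a majority of Acceptors. An Acceptor that has not already responded to a Prepare numbered M or higher replies with a promise not to accept any proposal numbered below M, together with the highest-numbered proposal it has already accepted, if any.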

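Accept: If the Proposer receives promises from a majority of Acceptors, it sends an Accept request for [M, V], where V is the value of the highest-numbered proposal reported in those responses, or any value of the Proposer's choosing if none was reported. An Acceptor accepts the request unless it has since promised a number higher than M.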

Once a majority of Acceptors has accepted a proposal, its value is chosen and Learners pick it up. Google's Chubby is built on Paxos; ZooKeeper's ZAB is a separate atomic broadcast protocol that shares many of its ideas.
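The Prepare and Accept phases can be illustrated with a single-decree sketch. This is a simplified in-memory model under stated assumptions: the `Acceptor` class and `propose` function are hypothetical names, proposal numbers start at 1 (0 means "nothing accepted yet"), messages are plain method calls, and there are no failures, retries, or Learners.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Acceptor:
    promised: int = 0                 # highest Prepare number promised so far
    accepted_n: int = 0               # number of the accepted proposal, 0 if none
    accepted_v: Optional[str] = None  # value of the accepted proposal

    def on_prepare(self, m: int):
        # Promise only if m is higher than anything promised before.
        if m > self.promised:
            self.promised = m
            return (self.accepted_n, self.accepted_v)
        return None  # reject

    def on_accept(self, m: int, v: str) -> bool:
        # Accept unless a higher-numbered Prepare has been promised since.
        if m >= self.promised:
            self.promised = m
            self.accepted_n, self.accepted_v = m, v
            return True
        return False


def propose(acceptors: list[Acceptor], m: int, value: str) -> Optional[str]:
    majority = len(acceptors) // 2 + 1
    # Phase 1: Prepare.
    replies = (a.on_prepare(m) for a in acceptors)
    promises = [r for r in replies if r is not None]
    if len(promises) < majority:
        return None
    # Adopt the value of the highest-numbered accepted proposal, if any.
    prev_n, prev_v = max(promises, key=lambda p: p[0])
    chosen = prev_v if prev_n > 0 else value
    # Phase 2: Accept.
    acks = sum(a.on_accept(m, chosen) for a in acceptors)
    return chosen if acks >= majority else None
```

With three fresh acceptors, `propose(accs, 1, "x")` returns `"x"`. A later `propose(accs, 2, "y")` also returns `"x"`, because the second proposer must adopt the value already accepted; this is the rule that keeps a chosen value from being overwritten.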

[Figure: Paxos diagram]

6. Raft

Raft offers fault tolerance comparable to Paxos but was designed to be easier to understand. Each node takes one of three roles: Leader, Follower, or Candidate.

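All nodes start as Followers, and one is elected Leader. The Leader receives client requests, appends them to its log, replicates the entries to the Followers, and commits an entry once a majority of nodes has stored it. Because every committed entry is applied in the same order on every node, Raft provides strong consistency rather than eventual consistency.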

Leaders send periodic heartbeats; if Followers miss heartbeats, they become Candidates and start a new election. The election succeeds when a candidate obtains a majority of votes.
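The two majority rules described above, one for committing log entries and one for winning an election, can be sketched as follows. The function names and the `match_index` list (the highest log index known to be stored on each node, leader included) are illustrative assumptions. Raft's additional condition that a leader only commits entries from its own term this way is omitted for brevity.

```python
def commit_index(match_index: list[int]) -> int:
    """Highest log index that is stored on a majority of nodes."""
    ordered = sorted(match_index, reverse=True)
    majority = len(match_index) // 2 + 1
    # The majority-th largest value is present on at least `majority` nodes.
    return ordered[majority - 1]


def wins_election(votes_received: int, cluster_size: int) -> bool:
    """A candidate wins once it holds votes from more than half the cluster."""
    return votes_received > cluster_size // 2
```

For a five-node cluster with `match_index = [5, 3, 3, 2, 1]`, `commit_index` returns 3: index 3 is stored on three nodes, while index 5 is stored only on the leader.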

Raft is used in projects such as CockroachDB and TiKV.

[Figure: Raft animation]

7. Gossip

Gossip is a decentralized protocol where every node is equal; there is no central coordinator. Each node stores a list of key‑value‑version entries and periodically selects another node to exchange state, eventually converging to a consistent view.
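The exchange described above can be simulated in memory. This is a rough sketch assuming a push-pull exchange in which the higher version of each key wins; the function names, the `Store` alias, and the use of a seeded `random.Random` are choices made here for illustration, not part of any particular gossip implementation.

```python
import random

# key -> (value, version); hypothetical representation of a node's state
Store = dict[str, tuple[str, int]]


def merge(local: Store, remote: Store) -> None:
    """Copy every entry from remote that is missing or newer than local's."""
    for key, (value, version) in remote.items():
        if key not in local or local[key][1] < version:
            local[key] = (value, version)


def gossip_round(nodes: list[Store], rng: random.Random) -> None:
    """Each node picks one random peer and the two exchange state."""
    for i, node in enumerate(nodes):
        peer = rng.choice([n for j, n in enumerate(nodes) if j != i])
        merge(node, peer)
        merge(peer, node)


def rounds_to_converge(nodes: list[Store], rng: random.Random,
                       max_rounds: int = 100):
    """Run rounds until all nodes hold identical state; None if they never do."""
    for round_no in range(1, max_rounds + 1):
        gossip_round(nodes, rng)
        if all(n == nodes[0] for n in nodes):
            return round_no
    return None
```

Starting from `[{"k": ("v1", 1)}, {}, {}, {"k": ("v2", 2)}, {}]`, every node ends up holding `("v2", 2)` for key `k`; the number of rounds needed depends on the random peer choices.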

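Gossip eliminates single points of failure, but it incurs redundant communication and converges only after several rounds, so it suits data that can tolerate brief staleness. It is commonly used for cluster membership and failure detection, for example in Cassandra and Redis Cluster.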

[Figure: Gossip diagram]

8. NWR Mechanism

NWR defines the numbers of replicas (N), required successful writes (W), and required successful reads (R). When W+R > N, reads are guaranteed to see the latest write (strong consistency). If W+R ≤ N, strong consistency cannot be guaranteed.
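The W+R > N rule is a pigeonhole argument: if the write set and the read set together contain more than N replicas, they must share at least one. The sketch below checks this by brute force over every possible pair of sets. The function names are hypothetical, and the enumeration is only practical for small N.

```python
from itertools import combinations


def satisfies_nwr(n: int, w: int, r: int) -> bool:
    """The rule from the text: W + R > N."""
    return w + r > n


def every_read_overlaps_write(n: int, w: int, r: int) -> bool:
    """Brute force: does every set of R replicas intersect every set of W?"""
    replicas = range(n)
    return all(
        set(write_set) & set(read_set)
        for write_set in combinations(replicas, w)
        for read_set in combinations(replicas, r)
    )
```

For N=3, W=2, R=2 both functions return `True`. For N=3, W=1, R=2 both return `False`, since a read of two replicas can miss the single replica that took the write.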

Version ordering is typically handled by algorithms such as vector clocks. Large W or R values increase latency.

9. Quorum Mechanism

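Quorum is essentially the NWR mechanism expressed as voting rules, where V is the total number of votes, Vr the votes required for a read, and Vw the votes required for a write: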

Vr + Vw > V

Vw > V/2

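The first rule ensures that every read quorum intersects every write quorum, so a read always includes at least one replica holding the latest write. The second ensures that any two write quorums intersect, so two conflicting writes cannot both succeed at once and writes are serialized.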
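The two rules translate directly into a configuration check. The function name `valid_quorum` is hypothetical; integer arithmetic (`2 * vw > v`) is used in place of `vw > v / 2` to avoid floating-point comparison.

```python
def valid_quorum(v: int, vr: int, vw: int) -> bool:
    """Check both quorum rules for V total votes."""
    reads_see_writes = vr + vw > v   # Vr + Vw > V
    writes_serialized = 2 * vw > v   # Vw > V/2
    return reads_see_writes and writes_serialized
```

With V=5, the setting Vr=3, Vw=3 passes both rules. The setting Vr=4, Vw=2 satisfies the first rule but fails the second, because two disjoint groups of two replicas could each accept a different write.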

10. Lease Mechanism

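In a lease mechanism, a master grants time-limited leases to slaves. While its lease is valid, a slave may serve client reads; once the lease expires, clients must query the master instead. Leases also help avoid split-brain scenarios: a master whose own lease has expired loses its authority, so a stale master cannot keep acting alongside its replacement. The scheme relies on the nodes' clocks advancing at roughly the same rate.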
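The read path described above can be sketched as follows. The `Lease` class and the `grant` and `read` helpers are hypothetical names introduced for illustration. The current time is passed in explicitly so the logic is easy to follow; a real system would typically take it from a monotonic clock such as `time.monotonic()`.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Lease:
    holder: str
    expires_at: float  # deadline on the holder's clock, in seconds

    def is_valid(self, now: float) -> bool:
        return now < self.expires_at


def grant(holder: str, duration_s: float, now: float) -> Lease:
    """Master side: issue a lease that lasts duration_s seconds from now."""
    return Lease(holder, now + duration_s)


def read(lease: Lease, now: float, local_value: str,
         ask_master: Callable[[], str]) -> str:
    """Serve locally while the lease is valid, otherwise go to the master."""
    if lease.is_valid(now):
        return local_value
    return ask_master()
```

For a lease granted at time 100 for 10 seconds, a read at time 105 is served from the local copy, while a read at time 110 or later falls through to `ask_master`.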
