Understanding Consistency Algorithms: Paxos, Raft, ZAB, and Gossip
This article explains why data consistency is essential in distributed systems, defines consistency, compares strong and eventual consistency, and walks through the design and operation of the major algorithms (Paxos, Multi‑Paxos, Raft, ZAB, and Gossip) with illustrative examples.
Why Consistency Is Needed
Data must not live on only a single node (host); otherwise that node becomes a single point of failure.
The copies of the data held by multiple nodes must stay identical.
Consistency algorithms exist to satisfy these two requirements.
Definition of Consistency Algorithm
Consistency means keeping data identical; in a distributed system it means that every node stores the same value for the same piece of data.
Classification of Consistency
Strong Consistency
The system guarantees that once a write is committed, every subsequent read across the cluster reflects it immediately.
Paxos
Raft (multi‑Paxos)
ZAB (multi‑Paxos)
Weak Consistency (Eventual Consistency)
The system does not guarantee immediate state change after a commit, but the state will eventually become consistent over time.
DNS systems
Gossip protocol
Examples of Consistency Algorithm Implementations
Google's Chubby distributed lock service, which uses Paxos.
etcd distributed key‑value store, which uses Raft.
ZooKeeper coordination service, an open‑source counterpart of Chubby, which uses ZAB.
Paxos Algorithm
Concept Introduction
Proposal: a modification request in the distributed system, represented as [proposal number N, value].
Client: the entity that submits proposals.
Proposer: receives requests from Clients and issues proposals on their behalf.
Acceptor: votes on proposals; it rejects any proposal whose number is not higher than the highest number it has already promised or accepted.
Learner: records proposals that have been accepted.
Basic Paxos Algorithm
Steps:
Proposer prepares a proposal with number N.
In the prepare phase, the Proposer asks a majority of Acceptors whether they have already promised a proposal number of N or higher; if they have, proposal N is abandoned (and may be retried with a larger number).
Acceptors then vote in the accept phase; an Acceptor accepts a proposal whose number is the highest it has seen, and once a majority accepts, the proposal is chosen.
Learner records the accepted proposal.
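The prepare/accept flow above can be sketched as a minimal single-value simulation. The class and function names are illustrative, and real Paxos deployments add retries, persistence, and messaging that this sketch omits:

```python
# Minimal single-value Paxos sketch: one Proposer round against a set
# of Acceptors, following the two phases described above.

class Acceptor:
    def __init__(self):
        self.promised = 0      # highest proposal number promised so far
        self.accepted = None   # (number, value) of the last accepted proposal

    def prepare(self, n):
        # Promise n only if it is higher than anything seen before.
        if n > self.promised:
            self.promised = n
            return True, self.accepted
        return False, None

    def accept(self, n, value):
        # Accept only proposals at least as new as the promise.
        if n >= self.promised:
            self.promised = n
            self.accepted = (n, value)
            return True
        return False

def propose(acceptors, n, value):
    # Phase 1: prepare with the Acceptors; need a majority of promises.
    promises = [a.prepare(n) for a in acceptors]
    granted = [acc for ok, acc in promises if ok]
    if len(granted) <= len(acceptors) // 2:
        return None  # proposal abandoned
    # If any Acceptor already accepted a value, adopt the newest one
    # instead of our own, so a chosen value is never overwritten.
    prior = [acc for acc in granted if acc is not None]
    if prior:
        value = max(prior)[1]
    # Phase 2: ask the Acceptors to accept [n, value].
    votes = sum(a.accept(n, value) for a in acceptors)
    return value if votes > len(acceptors) // 2 else None

acceptors = [Acceptor() for _ in range(5)]
print(propose(acceptors, 1, "v1"))  # "v1" is chosen by a majority
print(propose(acceptors, 2, "v2"))  # the prior value "v1" is adopted, not "v2"
```

The second call shows the safety property: once "v1" is chosen, a later proposal with a higher number re-proposes the chosen value rather than replacing it.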
Node Failures
If the Proposer fails, another node can be elected as Proposer.
If an Acceptor fails, the algorithm still works as long as a majority can be reached.
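A quick arithmetic check of that majority rule (cluster sizes here are illustrative):

```python
def failures_tolerated(n):
    # A majority quorum needs floor(n/2) + 1 nodes, so the cluster
    # keeps working after losing the remaining minority.
    return (n - 1) // 2

print(failures_tolerated(3))  # 1
print(failures_tolerated(5))  # 2
```

This is why clusters are usually sized with an odd number of nodes: going from 3 to 4 nodes adds cost without tolerating any additional failure.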
Potential Issue – Live‑Lock
If multiple Proposers continuously send proposals before previous ones achieve a majority, Acceptors may keep abandoning current proposals for newer ones, preventing any proposal from being committed.
Multi‑Paxos Algorithm
Improvement over Basic Paxos: the system has a single Proposer, called the Leader.
Steps:
If there is no Leader, the cluster elects one and declares it the current Leader (term M).
Acceptors only vote on proposals issued by the Leader with the latest term.
Other steps are the same as Basic Paxos.
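The leader rule in step 2 can be sketched as an extra check on the Acceptor side. The `latest_term` field is a hypothetical name used for illustration:

```python
class Acceptor:
    def __init__(self):
        self.latest_term = 0  # highest Leader term (M) this node has seen

    def accept(self, term, proposal):
        # Reject proposals from stale Leaders; adopt newer terms.
        if term < self.latest_term:
            return False
        self.latest_term = term
        return True

a = Acceptor()
print(a.accept(2, "set x=1"))  # True: the term-2 Leader is the latest
print(a.accept(1, "set x=2"))  # False: the term-1 Leader is stale
```

Because only one Leader exists per term, this check removes the competing-Proposer live-lock of Basic Paxos.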
Algorithm Optimization
Multi‑Paxos defines several distinct roles. In a real cluster they can be collapsed: one node acts as the Proposer (the Leader), while every node serves as both Acceptor and Learner. This consolidation is the starting point of the Raft algorithm.
Raft Algorithm
Paxos is hard to implement; Raft simplifies and improves upon it.
Concept Introduction
Leader: the node that issues proposals.
Follower: nodes that accept and replicate the Leader's proposals.
Candidate: nodes that compete to become Leader.
Raft splits the consistency problem into two sub‑problems: Leader election and log replication.
Leader Election
Each Follower runs a timer.
When the timer expires and no Leader exists, the Follower becomes a Candidate and starts an election, sending vote requests to other nodes.
Each node grants its vote to the first Candidate that requests it in a given term.
The Candidate that receives a majority becomes the new Leader (term M).
The Leader continuously sends heartbeats to prove it is alive; Followers reset their timers upon receiving a heartbeat.
If the Leader fails, Followers become Candidates and start a new election.
If two Candidates receive the same number of votes, neither wins; each waits a random delay before starting a new election, which makes a repeated tie unlikely.
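The election flow above can be sketched as a simplified, single-process simulation. How many timers expire together and how the remaining nodes split their votes are randomized here purely for illustration:

```python
import random

def elect(nodes):
    # Each term, one or two nodes whose randomized timers expired
    # first become Candidates and request votes from the others.
    term = 0
    while True:
        term += 1
        candidates = random.sample(nodes, k=random.choice([1, 2]))
        votes = {c: 1 for c in candidates}  # each Candidate votes for itself
        for n in nodes:
            if n not in candidates:
                # A node grants its vote to whichever Candidate asked first.
                votes[random.choice(candidates)] += 1
        winner = max(votes, key=votes.get)
        if votes[winner] > len(nodes) // 2:
            return winner, term  # majority reached
        # Split vote: no majority this term; random delays desynchronize
        # the timers, and a new term begins.

leader, term = elect(["A", "B", "C", "D"])
print(f"{leader} elected Leader in term {term}")
```

With four nodes, a two-Candidate round can split 2-2 and force a retry; the random choice of how many timers expire models the randomized timeouts that eventually break the tie.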
State Replication
The Leader receives a proposal from a Client and stores it as uncommitted.
The proposal is included in the Leader's next heartbeat.
Followers reply to the heartbeat.
After receiving replies from a majority of Followers, the Leader commits the proposal, writes it to its storage, and replies to the Client.
The Leader notifies Followers to commit the proposal, ensuring all nodes store identical data.
If a network partition occurs, each side may end up with its own Leader (split‑brain); the minority partition cannot gather a majority, so it cannot commit proposals.
When the partition heals, only the latest term Leader remains active; older Leaders step down to Followers, restoring consistency.
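The replication steps above can be sketched as follows. The "heartbeat" here is just a direct function call, and the class names are illustrative:

```python
class Follower:
    def __init__(self):
        self.log = []        # committed proposals
        self.pending = None  # proposal received but not yet committed

    def on_heartbeat(self, proposal):
        # Store the proposal as uncommitted and acknowledge the Leader.
        self.pending = proposal
        return True  # ack

    def on_commit(self):
        # The Leader has declared the proposal safe to commit.
        if self.pending is not None:
            self.log.append(self.pending)
            self.pending = None

class Leader:
    def __init__(self, followers):
        self.followers = followers
        self.log = []

    def replicate(self, proposal):
        # Attach the proposal to the next heartbeat and collect acks.
        acks = sum(f.on_heartbeat(proposal) for f in self.followers)
        # Commit once a majority of the whole cluster has acked
        # (the Leader itself counts toward the majority).
        if acks + 1 > (len(self.followers) + 1) // 2:
            self.log.append(proposal)
            for f in self.followers:
                f.on_commit()
            return True  # reply to the Client: committed
        return False

followers = [Follower() for _ in range(4)]
leader = Leader(followers)
leader.replicate("x = 1")
print(leader.log, followers[0].log)  # every node now stores ["x = 1"]
```

The two-step commit (replicate, then commit notification) is what guarantees that a proposal only becomes visible after a majority has durably received it.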
ZAB Algorithm
ZAB is an improvement over Multi‑Paxos, similar to Raft, with two main differences:
Raft calls the Leader's term "term"; ZAB calls it "epoch".
In Raft, the Leader sends heartbeats to Followers; in ZAB, Followers send heartbeats to the Leader.
Gossip Algorithm
Gossip is a peer‑to‑peer algorithm where every node is equal. Each node propagates data changes to a few other nodes, quickly spreading updates throughout the cluster.
Example: a cluster of 20 nodes starts up.
A node receives a data change and forwards it to four other nodes.
Those nodes repeat the process until all nodes are updated.
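The spreading process can be simulated with the numbers from the example (20 nodes, a fan-out of 4). The peer-selection scheme here is a simplification; real gossip implementations also track message age and de-duplicate:

```python
import random

def gossip_rounds(num_nodes=20, fanout=4):
    # One node learns of a change; each round, every informed node
    # forwards it to `fanout` randomly chosen peers.
    informed = {0}
    rounds = 0
    while len(informed) < num_nodes:
        rounds += 1
        for node in list(informed):
            peers = random.sample(range(num_nodes), fanout)
            informed.update(peers)
    return rounds

print(gossip_rounds())  # a small number of rounds covers all 20 nodes
```

Because each round roughly multiplies the number of informed nodes, the update reaches the whole cluster in a number of rounds that grows only logarithmically with cluster size.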