Understanding Consistency Protocols: Logical Clocks, Paxos, ZAB, and Raft
This article explains the fundamentals of distributed consistency protocols, covering Lamport's logical clocks, the basics and multi‑instance extensions of Paxos, ZooKeeper's ZAB algorithm, and the Raft protocol, and compares their designs, strengths, and practical engineering considerations.
During the Chinese New Year break, the author reviewed several consistency protocol papers and decided to explain them in plain language, addressing common questions such as how ZooKeeper's ZAB differs from Raft and how Paxos can be applied in real systems.
Logical Clock – Introduced by Lamport in 1978, a logical clock assigns a timestamp to each event to capture the "happens‑before" partial order in a distributed system. Each process increments its local counter on every internal event, attaches the counter to each message it sends, and on receipt sets its counter to max(local, received) + 1. A logical clock is not a consistency protocol in itself, but the idea underlies many consensus algorithms.
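The three clock rules can be sketched in a few lines of Python. This is a minimal illustration, not code from the original article; the class name `LamportClock` and its method names are hypothetical.

```python
class LamportClock:
    """Minimal logical clock (hypothetical names, for illustration only)."""

    def __init__(self) -> None:
        self.time = 0

    def tick(self) -> int:
        """Internal event: advance the local counter by one."""
        self.time += 1
        return self.time

    def send(self) -> int:
        """Sending counts as an event; the returned value travels with the message."""
        return self.tick()

    def receive(self, received: int) -> int:
        """On receipt, move past both the local and the received timestamp."""
        self.time = max(self.time, received) + 1
        return self.time


a, b = LamportClock(), LamportClock()
a.tick()            # a.time == 1
stamp = a.send()    # a.time == 2, the message carries 2
b.receive(stamp)    # b.time == max(0, 2) + 1 == 3
```

Because the receiver always jumps past the sender's timestamp, the receive event is ordered after the send event, which is exactly the "happens‑before" guarantee.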
Replicated State Machine – A common pattern for high‑availability services where each replica maintains a persistent log and applies commands in the same order, ensuring that reads return identical results across replicas. Consensus modules such as Paxos, ZAB, and Raft provide the ordering guarantees.
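As a rough sketch of the pattern, the following Python models each replica as a log plus a deterministic key-value state machine. The `Replica` class, the command tuple format, and the in-memory log are assumptions made for illustration; a real system persists the log and obtains the command order from a consensus module.

```python
from typing import Dict, List, Tuple

# (op, key, value); the value is ignored for "del". Hypothetical command format.
Command = Tuple[str, str, str]


class Replica:
    """One replica: an ordered log and the state derived from it."""

    def __init__(self) -> None:
        self.log: List[Command] = []
        self.state: Dict[str, str] = {}
        self.last_applied = 0  # number of log entries applied so far

    def append(self, command: Command) -> None:
        self.log.append(command)

    def apply_up_to(self, commit_index: int) -> None:
        """Apply committed entries strictly in log order."""
        while self.last_applied < commit_index:
            op, key, value = self.log[self.last_applied]
            if op == "set":
                self.state[key] = value
            elif op == "del":
                self.state.pop(key, None)
            self.last_applied += 1


# The consensus layer is assumed to have already agreed on this order.
ordered_commands: List[Command] = [
    ("set", "x", "1"), ("set", "y", "2"), ("del", "x", ""), ("set", "y", "3"),
]
replicas = [Replica() for _ in range(3)]
for replica in replicas:
    for command in ordered_commands:
        replica.append(command)
    replica.apply_up_to(len(ordered_commands))
```

Since every replica applies the same commands in the same order to a deterministic state machine, all three end up with identical state.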
Paxos – Lamport’s original algorithm (1990s) achieves agreement on a single value using a two‑phase prepare/accept process with Proposers and Acceptors. The basic version cannot directly implement a replicated log; extensions like Multi‑Paxos add a stable leader to handle a sequence of values. The article walks through a concrete example with three servers proposing values and shows how majority replies drive progress.
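The two phases can be sketched as follows. This is a simplified, single-process model rather than the article's worked example: messages are plain function calls, nothing is persisted, and the names `Acceptor` and `propose` are hypothetical.

```python
from typing import List, Optional, Tuple


class Acceptor:
    def __init__(self) -> None:
        self.promised = 0      # highest proposal number promised so far
        self.accepted_n = 0    # number of the accepted proposal (0 = none)
        self.accepted_value: Optional[str] = None

    def prepare(self, n: int) -> Optional[Tuple[int, Optional[str]]]:
        """Phase 1: promise to ignore proposals numbered below n."""
        if n > self.promised:
            self.promised = n
            return (self.accepted_n, self.accepted_value)
        return None  # rejected: a higher number was already promised

    def accept(self, n: int, value: str) -> bool:
        """Phase 2: accept unless a higher-numbered promise was made."""
        if n >= self.promised:
            self.promised = n
            self.accepted_n = n
            self.accepted_value = value
            return True
        return False


def propose(acceptors: List[Acceptor], n: int, value: str) -> Optional[str]:
    """Run one proposal round; return the chosen value, or None without a majority."""
    majority = len(acceptors) // 2 + 1
    promises = [p for p in (a.prepare(n) for a in acceptors) if p is not None]
    if len(promises) < majority:
        return None
    # A proposer must adopt the value of the highest-numbered accepted proposal.
    _, prev_value = max(promises, key=lambda p: p[0])
    if prev_value is not None:
        value = prev_value
    acks = sum(a.accept(n, value) for a in acceptors)
    return value if acks >= majority else None


acceptors = [Acceptor() for _ in range(3)]
first = propose(acceptors, n=1, value="X")   # "X" is chosen
second = propose(acceptors, n=2, value="Y")  # must re-propose "X", not "Y"
```

The second proposer asks for "Y" but ends up confirming "X": once a value has been chosen, later rounds can only propose that same value.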
Multi‑Paxos – Addresses three limitations of Basic‑Paxos: single‑value agreement, possible livelock with multiple proposers, and incomplete knowledge of the chosen value across acceptors. By electing a leader, using one Paxos instance per log index, and simplifying the prepare phase after leader election, Multi‑Paxos becomes practical for state‑machine replication.
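A rough sketch of these ideas: one accept round per log index, with a single prepare round when a leader takes over. It is an in-process model with hypothetical names (`SlotAcceptor`, `Leader`); it omits persistence, networking, and the no-op filling of log holes that a real implementation needs.

```python
from typing import Dict, List, Optional, Tuple

Accepted = Dict[int, Tuple[int, str]]  # log index -> (ballot, value)


class SlotAcceptor:
    def __init__(self) -> None:
        self.promised = 0
        self.accepted: Accepted = {}

    def prepare(self, ballot: int) -> Optional[Accepted]:
        """A single prepare covers every log index."""
        if ballot > self.promised:
            self.promised = ballot
            return dict(self.accepted)
        return None

    def accept(self, ballot: int, index: int, value: str) -> bool:
        if ballot >= self.promised:
            self.promised = ballot
            self.accepted[index] = (ballot, value)
            return True
        return False


class Leader:
    def __init__(self, ballot: int, acceptors: List[SlotAcceptor]) -> None:
        self.ballot = ballot
        self.acceptors = acceptors
        self.majority = len(acceptors) // 2 + 1
        self.next_index = 0
        self.elected = False

    def _accept(self, index: int, value: str) -> bool:
        acks = sum(a.accept(self.ballot, index, value) for a in self.acceptors)
        return acks >= self.majority

    def elect(self) -> bool:
        """Prepare once, then re-propose values earlier leaders got accepted."""
        replies = [r for r in (a.prepare(self.ballot) for a in self.acceptors)
                   if r is not None]
        if len(replies) < self.majority:
            return False
        recovered: Accepted = {}
        for reply in replies:
            for index, (ballot, value) in reply.items():
                if index not in recovered or ballot > recovered[index][0]:
                    recovered[index] = (ballot, value)
        for index in sorted(recovered):
            self._accept(index, recovered[index][1])
        self.next_index = max(recovered, default=-1) + 1
        self.elected = True
        return True

    def append(self, value: str) -> Optional[int]:
        """Steady state: only the accept phase runs for each new entry."""
        if not self.elected:
            return None
        index = self.next_index
        if not self._accept(index, value):
            self.elected = False  # a higher ballot exists, so step down
            return None
        self.next_index += 1
        return index


acceptors = [SlotAcceptor() for _ in range(3)]
old = Leader(1, acceptors)
old.elect()
old.append("a")
old.append("b")
new = Leader(2, acceptors)
new.elect()
idx = new.append("c")
stale = old.append("z")  # rejected: acceptors have promised ballot 2
chosen = [acceptors[0].accepted[i][1] for i in range(3)]
```

After the takeover, the new leader keeps the entries chosen under the old one and appends at the next free index, while the deposed leader's late write is refused.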
ZAB (ZooKeeper Atomic Broadcast) – ZooKeeper's purpose‑built protocol, created before Raft existed, provides leader election, a two‑phase commit for broadcasts, and log replication. It orders operations by transaction id (zxid), which combines an epoch number (similar to Raft's term) with a counter that increases within the epoch. ZAB allows stale reads on followers and supports strongly consistent reads via a sync‑read that performs a no‑op write before reading.
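To make the ordering concrete, here is a small sketch of how an epoch and a counter can be packed into one comparable integer. The 64-bit layout (epoch in the high 32 bits, counter in the low 32 bits) follows the commonly documented ZooKeeper convention and is not stated in the text above; the helper names `make_zxid` and `split_zxid` are hypothetical.

```python
from typing import Tuple

COUNTER_BITS = 32  # assumed layout: high 32 bits = epoch, low 32 bits = counter
COUNTER_MASK = (1 << COUNTER_BITS) - 1


def make_zxid(epoch: int, counter: int) -> int:
    """Pack (epoch, counter) into a single integer id."""
    return (epoch << COUNTER_BITS) | (counter & COUNTER_MASK)


def split_zxid(zxid: int) -> Tuple[int, int]:
    """Recover (epoch, counter) from a packed id."""
    return zxid >> COUNTER_BITS, zxid & COUNTER_MASK


last_of_epoch_1 = make_zxid(epoch=1, counter=500)
first_of_epoch_2 = make_zxid(epoch=2, counter=0)
newer_wins = first_of_epoch_2 > last_of_epoch_1  # True
```

With this packing, any transaction from a newer epoch compares greater than every transaction from an older one, so a plain integer comparison is enough to order proposals across leader changes.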
Raft – Published in 2014 and designed to be more understandable than Paxos. It separates leader election from log replication, uses randomized election timeouts to avoid split votes, and relies on a single RPC (AppendEntries) for both heartbeats and log entries. The article details term handling, commit rules, leader failure recovery, and differences such as a Raft leader's ability to overwrite conflicting log entries on followers versus ZAB's append‑only model.
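The commit rule is the part of Raft that is easiest to get wrong, so a small sketch may help. It follows the rule in the Raft paper: a leader advances its commit index to N only if a majority of servers store entry N and that entry belongs to the leader's current term. The function name and the list-based representation are assumptions made for illustration.

```python
from typing import List


def advance_commit_index(log_terms: List[int], match_index: List[int],
                         commit_index: int, current_term: int) -> int:
    """Return the leader's new commit index.

    log_terms[i] is the term of the entry at 1-based log index i + 1.
    match_index holds, for every server including the leader, the highest
    log index known to be stored on that server.
    """
    majority = len(match_index) // 2 + 1
    # Highest index that at least a majority of servers have stored.
    candidate = sorted(match_index, reverse=True)[majority - 1]
    # Only entries from the current term are committed by counting replicas.
    if candidate > commit_index and log_terms[candidate - 1] == current_term:
        return candidate
    return commit_index


log_terms = [1, 1, 2, 3]  # leader's log; the leader is in term 3
# Index 3 (term 2) is on a majority, but it is not from the current term.
stays = advance_commit_index(log_terms, [4, 4, 3, 2, 1],
                             commit_index=2, current_term=3)
# Index 4 (term 3) reaches a majority, committing it and everything before it.
moves = advance_commit_index(log_terms, [4, 4, 4, 2, 1],
                             commit_index=2, current_term=3)
```

In the first call the commit index stays at 2 even though index 3 is replicated on three of five servers; in the second it moves to 4, which also commits the older entries indirectly.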
The concluding section notes that many modern distributed systems and databases (e.g., Chubby, OceanBase, TiKV, PolarDB) use Paxos‑ or Raft‑based protocols, often with optimizations such as parallel commit, and anticipates further innovations in consensus algorithms.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
Big Data Technology & Architecture
Wang Zhiwu, a big data expert, dedicated to sharing big data technology.
