Understanding Distributed Consensus: Raft Leader Election and Log Replication
This article explains the fundamentals of distributed consistency, introduces key terminology, and provides an in‑depth look at Raft's leader election, term handling, log replication, failure scenarios, and comparisons with Paxos, ZAB, PacificA, and Viewstamped Replication.
3. Introduction – Distributed consistency protocols such as Paxos, Raft, ZAB, and PacificA aim to keep the state of multiple nodes identical despite network partitions and node failures. Raft, designed at Stanford by Diego Ongaro and John Ousterhout, was built for understandability: it decomposes the consensus problem into leader election, log replication, safety, and membership changes.
2. Terminology – The article defines essential concepts: Distributed Consensus, Distributed Consensus Algorithms, Paxos, Raft, ZAB, PacificA, Viewstamped Replication, Log Replication, Leader Election, Log, Entry, Term, Index, Node, Single/Multiple Node, Leader, Follower, Candidate, Vote, Heal, Step Down, Logical Clock, Quorum, and Split Brain.
3. What Is Distributed Consistency? – Consistency means all nodes agree on the same data in the same order. Because nodes communicate over unreliable networks, the protocol must keep replicas in agreement even when messages are delayed or lost and some nodes are down, while the system as a whole remains available.
4. Raft Election
4.1 What Is Leader Election – A Raft node is a Leader, Follower, or Candidate, and holds only one role at a time. Only the Leader can accept writes; a Follower that loses contact with the Leader becomes a Candidate and initiates an election.
4.2 Implementation Details – Two time‑outs govern elections: the Election Timeout (randomized per node, typically 150‑300 ms) and the shorter Heartbeat Timeout, the interval at which the Leader sends AppendEntries heartbeats. If a Follower does not receive an AppendEntries RPC within its election timeout, it increments its term, becomes a Candidate, and starts a new election. Randomizing the timeout makes it unlikely that several nodes time out simultaneously, reducing split‑vote scenarios.
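The timeout mechanics above can be sketched as follows. This is a minimal, illustrative model (the class and method names are my own, not from any particular Raft implementation), assuming the 150‑300 ms randomized range used by the raft.github.io visualization:

```python
import random

def election_timeout_ms():
    # Each node draws its own randomized election timeout.
    return random.uniform(150, 300)

class Follower:
    def __init__(self):
        self.term = 0
        self.deadline_ms = election_timeout_ms()

    def on_append_entries(self, leader_term):
        """A heartbeat from a current (or newer) leader resets the timer."""
        if leader_term >= self.term:
            self.term = leader_term
            self.deadline_ms = election_timeout_ms()

    def on_tick(self, elapsed_ms):
        """If no heartbeat arrives before the deadline, become a Candidate
        and start a new term for the new election."""
        self.deadline_ms -= elapsed_ms
        if self.deadline_ms <= 0:
            self.term += 1
            return "candidate"
        return "follower"
```

Because every node draws a different deadline, one node usually times out first, wins the election, and resumes heartbeats before the others expire.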
4.3 Term vs. Lease – A Raft term resembles a lease but is decentralized: it is a logical clock rather than a wall‑clock interval, and it increments only when a node's election timeout fires and it starts a new election.
4.4 Election Diagram
When nodes S1 and S2 are stopped, the remaining nodes (S3, S4, S5) still form a majority and can trigger elections; the candidate with the most up‑to‑date log (highest last term, then highest index) wins.
Only a candidate whose log is at least as up to date as each voter's (S5 in the example) can collect a majority of votes, ensuring the new Leader holds the most recent committed entries.
4.5 Election Summary – To become Leader a candidate must gather a majority of votes, and a voter only grants its vote to a candidate whose log is at least as up to date as its own: the higher last term wins; if last terms tie, the higher index wins; if both tie, the first candidate to request votes typically wins.
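The "at least as up to date" comparison in the summary above can be written as a small predicate. This is a sketch of the rule as described (the function name is illustrative):

```python
def log_up_to_date(cand_last_term, cand_last_index, my_last_term, my_last_index):
    """True if the candidate's log is at least as up to date as the voter's:
    compare the last entry's term first, then the index."""
    if cand_last_term != my_last_term:
        return cand_last_term > my_last_term
    return cand_last_index >= my_last_index
```

A voter that returns False here refuses the vote, which is what prevents a node with a stale log from becoming Leader.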
5. Raft Log Replication
5.1 What Is Log Replication – The Leader records client requests as log entries (Entries). Once a majority of Followers acknowledge the entry, the Leader commits it and replies to the client.
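The commit rule above reduces to a quorum count. A minimal sketch (helper names are my own):

```python
def majority(cluster_size):
    """Smallest number of nodes that forms a quorum."""
    return cluster_size // 2 + 1

def entry_committed(ack_count, cluster_size):
    """The Leader counts its own copy plus Follower acknowledgements;
    once a majority holds the entry, it is committed and the Leader
    can reply to the client."""
    return ack_count >= majority(cluster_size)
```

In a 5‑node cluster the Leader plus two Followers (3 of 5) are enough to commit, which is why the cluster tolerates two failed nodes.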
5.2 Implementation – The article shows scenarios where the Leader is isolated, causing writes to remain uncommitted, and later recovery steps that bring the cluster back to a consistent state. Images illustrate these state changes.
5.3 Replication During a Split Brain – When a network partition leaves a stale Leader on the minority side, the majority side elects a new Leader and continues operating, while the minority side can neither commit writes nor safely serve up‑to‑date reads. After the partition heals, the Leader with the smaller term steps down to Follower and its uncommitted entries are discarded, restoring consistency.
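The step‑down after healing can be sketched as below. This is an illustrative model only (class and field names are assumptions, and a real implementation overwrites the conflicting tail via AppendEntries consistency checks rather than deleting it eagerly):

```python
class StaleLeader:
    """A minority-side leader that discovers a higher term after the
    partition heals."""
    def __init__(self, term, log, commit_index):
        self.role = "leader"
        self.term = term
        self.log = log                    # list of (term, command) entries
        self.commit_index = commit_index  # entries before this are committed

    def on_higher_term(self, new_term):
        if new_term > self.term:
            self.term = new_term
            self.role = "follower"
            # Uncommitted entries from the stale term are dropped; the new
            # leader's log will replace them during replication.
            self.log = self.log[:self.commit_index]
```

Note that committed entries are never discarded: only the uncommitted tail written during the partition is lost, and those writes were never acknowledged to clients.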
6. Concept Comparison – A table compares Raft, Paxos, ZAB, PacificA, and Viewstamped Replication across roles (Leader/Proposer, Follower/Acceptor, Candidate/Learner, etc.), terms (Term/Ballot/Epoch/View), and indices.
7. Commonality Discussion
7.1 PacificA, HBase & Kafka – All rely on an external coordination service such as ZooKeeper for configuration management and shard location discovery.
7.2 Redis, Raft, ZAB & Paxos – Although these are different systems, they share the quorum (majority) principle and use a monotonically increasing value (Epoch, Ballot, or Term) to order decisions and reach agreement.
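The reason the quorum principle works across all these protocols is quorum intersection: any two majorities of the same cluster share at least one node, and that node carries the highest term/epoch it has seen. A one‑line check (illustrative):

```python
def quorums_intersect(cluster_size):
    # Two majorities of the same cluster must overlap in at least one node,
    # so a newly elected quorum always contains a member that knows the
    # latest agreed decision and its term/epoch.
    quorum = cluster_size // 2 + 1
    return quorum + quorum > cluster_size
```

This holds for every cluster size, which is why majority-based protocols never lose a committed decision across elections.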
8. Condensed Summary
8.1 Three Roles – Leader, Follower, Candidate.
8.2 Two RPCs – AppendEntries (heartbeat & log replication) and RequestVote (vote request).
8.3 Two Time‑outs – the Election Timeout (randomized to prevent split votes) and the Heartbeat Timeout (which must be shorter than the election timeout).
9. References
1) Diego Ongaro and John Ousterhout, "In Search of an Understandable Consensus Algorithm" – https://raft.github.io/raft.pdf
2) The Raft Consensus Algorithm – https://raft.github.io/
3) The Secret Lives of Data, Raft visualization – http://thesecretlivesofdata.com/raft/
4) PacificA – Microsoft's distributed storage replication framework
5) Distributed Consensus: Viewstamped Replication, Raft, Paxos
6) Chandra, Griesemer, and Redstone, "Paxos Made Live" – https://static.googleusercontent.com/media/research.google.com/en//archive/paxos_made_live.pdf
- END -
Big Data Technology Architecture
Exploring Open Source Big Data and AI Technologies