Overview of Distributed Consistency Algorithms: The Raft Protocol
This article explains the fundamentals of distributed consistency by introducing the Raft consensus algorithm, covering its roles, leader election, log replication, handling of split votes, random timeouts, and various failure scenarios such as leader crashes and network partitions.
As distributed systems become essential for high‑concurrency workloads and massive data processing, they offer advantages such as avoiding single points of failure and enabling horizontal scaling, but they also introduce the core challenge of keeping data consistent across nodes.
Among many consensus protocols, Paxos has a rigorous mathematical proof but is notoriously hard to understand, so easier‑to‑understand alternatives such as Raft, which was explicitly designed for understandability, are widely used; this article chooses Raft to illustrate distributed consistency.
In a Raft cluster each server assumes one of three roles: Leader, Follower, or Candidate. The Leader handles client requests, Followers replicate logs, and Candidates compete for leadership when they detect a timeout.
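The three roles and the legal moves between them can be sketched as a small state machine. This is an illustrative helper, not code from any Raft implementation; the `Role` enum and `TRANSITIONS` table are assumptions made for the sketch:

```python
from enum import Enum

class Role(Enum):
    FOLLOWER = "follower"
    CANDIDATE = "candidate"
    LEADER = "leader"

# Legal Raft role transitions (hypothetical helper for illustration):
TRANSITIONS = {
    (Role.FOLLOWER, Role.CANDIDATE),   # election timeout fires, start election
    (Role.CANDIDATE, Role.LEADER),     # won votes from a majority
    (Role.CANDIDATE, Role.FOLLOWER),   # discovered a current leader or higher term
    (Role.CANDIDATE, Role.CANDIDATE),  # split vote, retry with a new term
    (Role.LEADER, Role.FOLLOWER),      # observed a higher term, step down
}

def can_transition(src: Role, dst: Role) -> bool:
    """Return True if Raft permits moving from role src to role dst."""
    return (src, dst) in TRANSITIONS
```

Note that a Follower never jumps straight to Leader: it must pass through the Candidate role and win an election first.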
The leader election process works like a democratic vote: a Follower that does not receive a heartbeat within its election timeout becomes a Candidate, increments its term, and requests votes from other nodes. If a Candidate receives votes from a majority (> ½) it becomes the Leader and begins sending periodic heartbeats. Randomized election timeouts help avoid repeated split‑vote situations, as shown by examples of normal elections and tie‑vote scenarios.
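The two key ingredients above, a randomized election timeout and a strict‑majority vote count, can be sketched as follows. The 150–300 ms range matches the values commonly suggested for Raft, but the exact numbers here are illustrative assumptions:

```python
import random

def election_timeout_ms(base_ms: int = 150, spread_ms: int = 150) -> int:
    """Pick a randomized election timeout so nodes rarely time out together.

    Randomization is what makes repeated split votes unlikely: after a tie,
    the candidate with the shortest new timeout usually wins the next round.
    """
    return base_ms + random.randint(0, spread_ms)

def wins_election(votes_received: int, cluster_size: int) -> bool:
    """A candidate becomes Leader only with a strict majority (> 1/2).

    votes_received includes the candidate's own vote for itself.
    """
    return votes_received > cluster_size // 2
```

In a five‑node cluster, for example, three votes (the candidate plus two others) are enough, while two votes are not, which is exactly how a tie between two candidates each holding two votes forces a re‑election.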
After a Leader is elected, all client operations are sent to it. The Leader first records the operation in its local log (uncommitted), then replicates the entry to Followers. Once a majority of Followers acknowledge the entry, the Leader marks it as committed, applies it locally, and notifies Followers to apply it as well. The article outlines the five‑step log replication workflow.
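The commit rule at the heart of this workflow, an entry is committed only once the Leader plus a majority of Followers hold it, can be sketched like this. The class and method names are invented for illustration and do not come from the article:

```python
from dataclasses import dataclass, field

@dataclass
class Entry:
    term: int
    command: str
    committed: bool = False

@dataclass
class Leader:
    cluster_size: int
    log: list = field(default_factory=list)

    def handle_client(self, term: int, command: str) -> Entry:
        # Steps 1-2: append the operation to the local log, uncommitted,
        # then (not shown) replicate it to Followers via AppendEntries.
        entry = Entry(term, command)
        self.log.append(entry)
        return entry

    def on_follower_acks(self, entry: Entry, acks: int) -> bool:
        # Steps 3-5: once the Leader (1) plus acking Followers form a
        # majority, mark the entry committed and apply it; Followers are
        # then told to apply it in subsequent heartbeats.
        if 1 + acks > self.cluster_size // 2:
            entry.committed = True
        return entry.committed
```

In a five‑node cluster, acknowledgements from two Followers suffice (three of five servers hold the entry), so the Leader can respond to the client without waiting for the slowest nodes.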
Raft also defines handling for abnormal conditions: (1) a write reaches the Leader but is not replicated, (2) a write is replicated to Followers but the Leader crashes before sending ACK, (3) the Leader crashes after committing while some Followers remain uncommitted, and (4) network partitions causing split‑brain. In each case Raft ensures safety by requiring a new Leader to have the most up‑to‑date log and by using majority voting to prevent inconsistent commits.
Overall, the article provides a concise yet comprehensive introduction to Raft’s election and log replication mechanisms, its strategies for avoiding split votes, and its robustness against various failure scenarios, offering a solid foundation for studying other consensus algorithms.