
Overview of Distributed Consistency Algorithms: The Raft Protocol

This article explains the fundamentals of distributed consistency by introducing the Raft consensus algorithm, covering its roles, leader election, log replication, handling of split votes, random timeouts, and various failure scenarios such as leader crashes and network partitions.


As distributed systems become essential for high‑concurrency and massive data processing, they bring advantages like avoiding single‑point failures and enabling horizontal scaling, but also introduce the core challenge of maintaining data consistency across nodes.

Among the many consensus protocols, Paxos is rigorously proven but notoriously hard to understand and implement, so more approachable alternatives such as Raft, which was designed explicitly for understandability, are widely used; this article uses Raft to illustrate distributed consistency.

In a Raft cluster, each server assumes one of three roles: Leader, Follower, or Candidate. The Leader handles client requests and drives log replication, Followers passively replicate the Leader's log, and a Follower becomes a Candidate and competes for leadership when its election timeout expires without hearing from a Leader.
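The three roles and the per-node state they act on can be sketched in a few lines of Python. This is a minimal illustration, not a real implementation; the class and field names are assumptions chosen to mirror the article's terminology.

```python
from enum import Enum, auto

class Role(Enum):
    FOLLOWER = auto()
    CANDIDATE = auto()
    LEADER = auto()

class RaftNode:
    def __init__(self, node_id):
        self.node_id = node_id
        self.role = Role.FOLLOWER   # every node starts life as a Follower
        self.current_term = 0       # latest election term this node has seen
        self.voted_for = None       # whom this node voted for in current_term
        self.log = []               # list of (term, command) entries
        self.commit_index = -1      # index of the highest committed entry
```

A real node would also track timers and peer addresses, but these fields are enough to follow the election and replication walkthroughs below.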

The leader election process works like a democratic vote: a Follower that receives no heartbeat within its election timeout becomes a Candidate, increments its term, votes for itself, and requests votes from the other nodes. A Candidate that receives votes from a majority (more than half the cluster) becomes the Leader and begins sending periodic heartbeats. Randomized election timeouts make repeated split votes unlikely, as the article illustrates with examples of both normal elections and tie‑vote scenarios.
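The two election rules above, majority voting and randomized timeouts, can be sketched as follows. The 150-300 ms range is the commonly cited example from the Raft paper; the function names are illustrative, not a real library API.

```python
import random

ELECTION_TIMEOUT_MIN_MS = 150   # commonly cited range; an assumption here
ELECTION_TIMEOUT_MAX_MS = 300

def random_election_timeout():
    # Randomizing the timeout makes it unlikely that two Followers become
    # Candidates at the same moment, which avoids repeated split votes.
    return random.uniform(ELECTION_TIMEOUT_MIN_MS, ELECTION_TIMEOUT_MAX_MS)

def wins_election(votes_granted, cluster_size):
    # A Candidate becomes Leader only with a strict majority (> 1/2).
    return votes_granted > cluster_size // 2

# 3 of 5 votes (including the Candidate's own vote) is a majority:
assert wins_election(3, 5)
# 2 of 5 is a split vote; the Candidate waits out a fresh random timeout
# and retries with a higher term:
assert not wins_election(2, 5)
```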

After a Leader is elected, all client operations are sent to it. The Leader first records the operation in its local log (uncommitted), then replicates the entry to Followers. Once a majority of Followers acknowledge the entry, the Leader marks it as committed, applies it locally, and notifies Followers to apply it as well. The article outlines the five‑step log replication workflow.
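The five-step workflow can be modeled as a toy function; the signature and entry format are illustrative assumptions, not a real Raft API.

```python
def replicate(leader_log, follower_logs, entry, cluster_size):
    # Step 1: the client's command arrives at the Leader (as `entry`).
    leader_log.append(entry)                # step 2: append locally, uncommitted
    acks = 1                                # the Leader counts its own copy
    for f_log in follower_logs:             # step 3: AppendEntries to Followers
        f_log.append(entry)
        acks += 1                           # each Follower acknowledges
    # Step 4: once a majority holds the entry, the Leader commits and applies
    # it; step 5: Followers learn the commit index on the next heartbeat and
    # apply the entry to their own state machines.
    return acks > cluster_size // 2
```

For example, in a five-node cluster where only two Followers are reachable, the entry still commits: the Leader plus two Followers form a three-node majority.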

Raft also defines handling for abnormal conditions: (1) a write reaches the Leader but is not replicated, (2) a write is replicated to Followers but the Leader crashes before sending ACK, (3) the Leader crashes after committing while some Followers remain uncommitted, and (4) network partitions causing split‑brain. In each case Raft ensures safety by requiring a new Leader to have the most up‑to‑date log and by using majority voting to prevent inconsistent commits.
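The "most up-to-date log" requirement is enforced at vote time: a node grants its vote only to a Candidate whose log is at least as current as its own, comparing the last entry's term first and the log length second. A sketch of that comparison (parameter names are assumptions for illustration):

```python
def candidate_log_up_to_date(cand_last_term, cand_last_index,
                             my_last_term, my_last_index):
    # Compare last-entry terms first; a higher term always wins.
    if cand_last_term != my_last_term:
        return cand_last_term > my_last_term
    # With equal last terms, the longer (or equal) log is up-to-date.
    return cand_last_index >= my_last_index

# A Candidate with a higher last term wins even with a shorter log:
assert candidate_log_up_to_date(3, 4, 2, 9)
# With equal last terms, a shorter log is rejected:
assert not candidate_log_up_to_date(3, 4, 3, 5)
```

Because any committed entry already lives on a majority of nodes, and a Candidate also needs a majority of votes, at least one voter holds every committed entry and will reject any Candidate missing one. This is how Raft keeps the crash and split-brain scenarios above from losing committed writes.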

Overall, the article provides a concise yet comprehensive introduction to Raft’s election and log replication mechanisms, its strategies for avoiding split votes, and its robustness against various failure scenarios, offering a solid foundation for studying other consensus algorithms.

Tags: consistency, Raft, Leader Election, distributed consensus, Log Replication, Failure Handling
Written by

Big Data Technology Architecture

Exploring Open Source Big Data and AI Technologies
