Understanding the Raft Consensus Algorithm: States, Leader Election, Consistency, and Split‑Brain Handling
This article provides a comprehensive overview of the Raft consensus algorithm, detailing node roles (follower, candidate, leader), the leader election process, mechanisms ensuring cluster consistency, handling of node failures, and strategies for resolving split‑brain scenarios in distributed systems.
1: Raft States
Raft clusters have three possible roles for each node: follower (initial state), candidate (temporary role during elections), and leader (the sole node that handles client requests and replicates logs to followers). The term (or election term) is a monotonically increasing integer that all nodes track.
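The three roles and the term can be captured in a minimal state sketch. This is illustrative only; the names `Role` and `NodeState` are hypothetical, not from any particular Raft implementation:

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Optional

class Role(Enum):
    FOLLOWER = auto()   # initial state of every node
    CANDIDATE = auto()  # temporary role while campaigning for votes
    LEADER = auto()     # sole node handling client requests

@dataclass
class NodeState:
    role: Role = Role.FOLLOWER
    current_term: int = 0            # monotonically increasing election term
    voted_for: Optional[int] = None  # candidate voted for in current_term, if any

node = NodeState()  # a fresh node: follower, term 0, no vote cast
```

A node only ever moves between these three roles, and its term never decreases.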
2: Leader Election
When a Raft cluster starts, or when the current leader fails, nodes initiate an election. Each node begins as a follower with term 0 and starts a randomized election timeout (typically 150‑300 ms). The first node whose timeout expires becomes a candidate, increments its term, votes for itself, and requests votes from the other nodes. If it receives votes from a majority of the cluster, it becomes the leader and begins sending periodic heartbeats (AppendEntries RPCs) to maintain its authority.
If two candidates time out simultaneously, they may split votes; the election repeats with new random timeouts until a leader is chosen.
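The single‑round election logic above can be sketched as follows. This is a simplified model, not a real implementation: the 5‑node `CLUSTER`, the `grant_vote` callback standing in for RequestVote responses, and all function names are assumptions for illustration:

```python
import random

CLUSTER = [1, 2, 3, 4, 5]          # hypothetical 5-node cluster
MAJORITY = len(CLUSTER) // 2 + 1   # 3 votes needed to win

def election_timeout():
    # Randomized timeouts (150-300 ms) make simultaneous candidacies,
    # and therefore split votes, unlikely.
    return random.uniform(0.150, 0.300)

def run_election(candidate_id, term, grant_vote):
    """One election round: the candidate increments its term, votes for
    itself, and asks every peer for a vote. grant_vote(peer, term) models
    each peer's RequestVote reply."""
    term += 1
    votes = 1  # the candidate votes for itself
    for peer in CLUSTER:
        if peer != candidate_id and grant_vote(peer, term):
            votes += 1
    return term, votes >= MAJORITY

# If every reachable peer grants its vote, the candidate wins term 1.
term, won = run_election(1, 0, grant_vote=lambda peer, term: True)
```

If no candidate reaches a majority, `won` is false and the round simply repeats with fresh random timeouts, exactly as the article describes for split votes.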
Election Scenarios
Leader crash: Followers that miss heartbeats start a new election.
Follower crash: The leader continues sending heartbeats; the follower rejoins as a follower when it recovers.
3: Ensuring Cluster Consistency
Log replication is driven by the leader. When a client submits a command, the leader creates an uncommitted log entry, replicates it to a majority of followers, and waits for acknowledgments. Once a majority have stored the entry, the leader marks it committed, replies to the client, and notifies followers to commit.
The protocol defines several failure cases (leader crash before/after replication, partial replication, etc.) and guarantees that a new leader will always have the most up‑to‑date log, preventing data loss and preserving consistency.
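The commit rule described above, that an entry becomes committed only once a majority of nodes have stored it, can be sketched from the leader's side. The `Leader` class and `acks_from_followers` parameter are hypothetical simplifications (a real leader tracks per‑follower match indexes and retries replication):

```python
class Leader:
    def __init__(self, cluster_size):
        self.cluster_size = cluster_size
        self.log = []            # list of (term, command) entries
        self.commit_index = -1   # index of the highest committed entry

    def client_request(self, term, command, acks_from_followers):
        """Append an uncommitted entry, then commit it once a majority
        (the leader itself plus enough followers) has stored it."""
        self.log.append((term, command))
        stored = 1 + acks_from_followers        # the leader counts itself
        if stored >= self.cluster_size // 2 + 1:
            self.commit_index = len(self.log) - 1
            return True   # committed: safe to reply to the client
        return False      # not yet committed: entry stays uncommitted

leader = Leader(cluster_size=5)
```

In a 5‑node cluster, the leader plus two followers form a majority, so two acknowledgments are enough to commit; one is not.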
4: Handling Split‑Brain (Network Partition)
During a network partition, separate sub‑clusters may each elect a leader, creating a split‑brain. The partition with the majority of nodes continues to accept writes, while the minority cannot achieve a quorum.
When the network heals, the minority leader steps down (its term is lower), rolls back divergent logs, and synchronizes with the majority leader, restoring a single authoritative leader and consistent state across the cluster.
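The step‑down behavior follows from Raft's term rule: any node, leader included, that observes a higher term reverts to follower and adopts that term. A minimal sketch, with the node state modeled as a plain dict for illustration:

```python
def on_message(state, peer_term):
    """Raft's term rule: seeing a higher term forces any node back to
    follower. After a partition heals, this is what makes the stale
    minority leader step down to the majority leader."""
    if peer_term > state["term"]:
        state["term"] = peer_term
        state["role"] = "follower"
        state["voted_for"] = None
    return state

# A minority leader stuck at term 2 hears from the majority leader at term 3.
stale = {"role": "leader", "term": 2, "voted_for": None}
on_message(stale, peer_term=3)
```

Once demoted, the former minority leader truncates its divergent uncommitted entries and catches up from the majority leader's log, as described above.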
5: Summary
The article outlines Raft’s node roles, leader election mechanics, log replication for consistency, and the resolution of split‑brain situations, offering a high‑level understanding of how Raft achieves high availability and strong consistency in distributed systems.
Top Architect
Top Architect focuses on sharing practical architecture knowledge, covering enterprise, system, website, large‑scale distributed, and high‑availability architectures, as well as architecture evolution driven by internet technologies. Idea‑driven, sharing‑minded architects are welcome to exchange and learn together.