Understanding the Raft Consensus Algorithm: States, Leader Election, Consistency, and Split‑Brain Handling
This article provides a comprehensive overview of the Raft consensus algorithm, detailing node roles (follower, candidate, leader), the leader election process, mechanisms ensuring cluster consistency, handling of node failures, and strategies for resolving split‑brain scenarios in distributed systems.
1: Raft States
Raft clusters have three possible roles for each node: follower (initial state), candidate (temporary role during elections), and leader (the sole node that handles client requests and replicates logs to followers). The term (or election term) is a monotonically increasing integer that all nodes track.
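The three roles and the term can be captured in a minimal state sketch. This is illustrative only; the names `Role` and `NodeState` are hypothetical, not from any particular Raft implementation:

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Optional

class Role(Enum):
    FOLLOWER = auto()   # initial state of every node
    CANDIDATE = auto()  # temporary role while campaigning for votes
    LEADER = auto()     # sole node handling client requests

@dataclass
class NodeState:
    role: Role = Role.FOLLOWER
    current_term: int = 0            # monotonically increasing election term
    voted_for: Optional[int] = None  # candidate voted for in current_term, if any

node = NodeState()  # a fresh node: follower, term 0, no vote cast
```

A node only ever moves between these three roles, and its term never decreases.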
2: Leader Election
When a Raft cluster starts, or when the current leader fails, nodes initiate an election. Each node begins as a follower with term 0 and starts a randomized election timeout (typically 150‑300 ms). The first node whose timeout expires becomes a candidate, increments its term, votes for itself, and requests votes from the other nodes. If it receives votes from a majority of the cluster, it becomes the leader and begins sending periodic heartbeats (AppendEntries RPCs) to maintain its authority.
If two candidates time out simultaneously, they may split votes; the election repeats with new random timeouts until a leader is chosen.
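The single‑round election logic above can be sketched as follows. This is a simplified model, not a real implementation: the 5‑node `CLUSTER`, the `grant_vote` callback standing in for RequestVote responses, and all function names are assumptions for illustration:

```python
import random

CLUSTER = [1, 2, 3, 4, 5]          # hypothetical 5-node cluster
MAJORITY = len(CLUSTER) // 2 + 1   # 3 votes needed to win

def election_timeout():
    # Randomized timeouts (150-300 ms) make simultaneous candidacies,
    # and therefore split votes, unlikely.
    return random.uniform(0.150, 0.300)

def run_election(candidate_id, term, grant_vote):
    """One election round: the candidate increments its term, votes for
    itself, and asks every peer for a vote. grant_vote(peer, term) models
    each peer's RequestVote reply."""
    term += 1
    votes = 1  # the candidate votes for itself
    for peer in CLUSTER:
        if peer != candidate_id and grant_vote(peer, term):
            votes += 1
    return term, votes >= MAJORITY

# If every reachable peer grants its vote, the candidate wins term 1.
term, won = run_election(1, 0, grant_vote=lambda peer, term: True)
```

If no candidate reaches a majority, `won` is false and the round simply repeats with fresh random timeouts, exactly as the article describes for split votes.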
Election Scenarios
Leader crash: Followers that miss heartbeats start a new election.
Follower crash: The leader continues sending heartbeats; the follower rejoins as a follower when it recovers.
3: Ensuring Cluster Consistency
Log replication is driven by the leader. When a client submits a command, the leader creates an uncommitted log entry, replicates it to a majority of followers, and waits for acknowledgments. Once a majority have stored the entry, the leader marks it committed, replies to the client, and notifies followers to commit.
The protocol defines several failure cases (leader crash before/after replication, partial replication, etc.) and guarantees that a new leader will always have the most up‑to‑date log, preventing data loss and preserving consistency.
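The commit rule described above, that an entry becomes committed only once a majority of nodes have stored it, can be sketched from the leader's side. The `Leader` class and `acks_from_followers` parameter are hypothetical simplifications (a real leader tracks per‑follower match indexes and retries replication):

```python
class Leader:
    def __init__(self, cluster_size):
        self.cluster_size = cluster_size
        self.log = []            # list of (term, command) entries
        self.commit_index = -1   # index of the highest committed entry

    def client_request(self, term, command, acks_from_followers):
        """Append an uncommitted entry, then commit it once a majority
        (the leader itself plus enough followers) has stored it."""
        self.log.append((term, command))
        stored = 1 + acks_from_followers        # the leader counts itself
        if stored >= self.cluster_size // 2 + 1:
            self.commit_index = len(self.log) - 1
            return True   # committed: safe to reply to the client
        return False      # not yet committed: entry stays uncommitted

leader = Leader(cluster_size=5)
```

In a 5‑node cluster, the leader plus two followers form a majority, so two acknowledgments are enough to commit; one is not.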
4: Handling Split‑Brain (Network Partition)
During a network partition, separate sub‑clusters may each elect a leader, creating a split‑brain. The partition with the majority of nodes continues to accept writes, while the minority cannot achieve a quorum.
When the network heals, the minority leader steps down (its term is lower), rolls back divergent logs, and synchronizes with the majority leader, restoring a single authoritative leader and consistent state across the cluster.
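The step‑down behavior follows from Raft's term rule: any node, leader included, that observes a higher term reverts to follower and adopts that term. A minimal sketch, with the node state modeled as a plain dict for illustration:

```python
def on_message(state, peer_term):
    """Raft's term rule: seeing a higher term forces any node back to
    follower. After a partition heals, this is what makes the stale
    minority leader step down to the majority leader."""
    if peer_term > state["term"]:
        state["term"] = peer_term
        state["role"] = "follower"
        state["voted_for"] = None
    return state

# A minority leader stuck at term 2 hears from the majority leader at term 3.
stale = {"role": "leader", "term": 2, "voted_for": None}
on_message(stale, peer_term=3)
```

Once demoted, the former minority leader truncates its divergent uncommitted entries and catches up from the majority leader's log, as described above.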
5: Summary
The article outlines Raft’s node roles, leader election mechanics, log replication for consistency, and the resolution of split‑brain situations, offering a high‑level understanding of how Raft achieves high availability and strong consistency in distributed systems.
Top Architect
Top Architect focuses on sharing practical architecture knowledge, covering enterprise, system, website, large‑scale distributed, and high‑availability architectures, as well as architecture evolution driven by internet technologies. Idea‑driven, sharing‑minded architects are welcome to exchange and learn together.