
Why Distributed Consistency Algorithms Matter and How Raft Achieves Consensus

This article explains why distributed systems need consistency algorithms, compares weak and strong consistency, outlines the challenges of unreliable networks and clocks, and provides a detailed walkthrough of the Raft consensus protocol, its node states, state variables, RPCs, and a practical lab implementation for leader election.


Motivation for Distributed Consistency Algorithms

In a shared-nothing distributed system, maintaining a consistent global state is a consensus problem. Weak (eventual) consistency does not guarantee that updates propagate immediately, while strong consistency requires every read to observe the most recent committed write, as if the cluster held a single copy of the data. Network unreliability, clock drift, and hardware failures make state synchronization difficult, motivating fault-tolerant consensus algorithms such as Paxos and Raft.

Raft Consensus Protocol Overview

Raft achieves consensus through log replication. The client‑request flow is:

1. The leader receives the client request.

2. The leader appends the request as a log entry to its local log.

3. The leader replicates the entry to all followers.

4. The leader waits until a majority of followers acknowledge the entry.

5. Once the entry is committed, the leader applies it to its state machine and returns the result to the client.

6. The leader proceeds to the next request.

[Figure: Raft client request flow]
In a cluster of 2N+1 nodes, a majority is N+1.
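
To make the majority rule concrete, here is a minimal Go sketch (the function name isCommitted is only illustrative, not part of any Raft library): an entry counts as committed once it is stored on more than half of the nodes, the leader included.

```go
package main

import "fmt"

// isCommitted reports whether an entry stored on ackCount nodes (leader
// included) is committed in a cluster of clusterSize nodes. With
// clusterSize = 2N+1, "more than half" is exactly N+1 nodes.
func isCommitted(ackCount, clusterSize int) bool {
	return ackCount > clusterSize/2
}

func main() {
	// In a 5-node cluster (N = 2), the leader plus 2 followers form a majority.
	fmt.Println(isCommitted(2, 5)) // false: only 2 of 5 nodes hold the entry
	fmt.Println(isCommitted(3, 5)) // true: 3 of 5 nodes hold the entry
}
```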

Raft Node Roles

Follower

Initial state of every node.

If no heartbeat is received before the election timeout, the node becomes a Candidate.

Grants its vote to a Candidate whose term is at least as large as its own and whose log is at least as up-to-date.

Candidate

Starts an election: increments its currentTerm, votes for itself, and sends RequestVote RPCs to all other nodes.

If it receives votes from a majority, it becomes Leader; otherwise it retries after a timeout.

Reverts to Follower upon receiving an RPC with a higher term.

Leader

Sends periodic heartbeats (AppendEntries RPCs with no log entries).

Replicates client log entries to followers.

Steps down to Follower if it receives an RPC with a higher term.

[Figure: Raft state diagram]
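
The transitions in the diagram can be summarized in a few lines of Go. This is only a sketch of the bookkeeping (the Role type, node struct, and method names are made up for illustration); the actual timers and RPCs are omitted.

```go
package raft

// Role models the three node states: Follower, Candidate, Leader.
type Role int

const (
	Follower Role = iota
	Candidate
	Leader
)

// node holds just enough state to express the transitions described above.
type node struct {
	role        Role
	currentTerm int
}

// onElectionTimeout: a Follower (or a Candidate whose election timed out)
// starts a new election in a higher term.
func (n *node) onElectionTimeout() {
	n.role = Candidate
	n.currentTerm++ // vote for self and send RequestVote RPCs (omitted)
}

// onMajorityVotes: a Candidate that collects a majority becomes Leader.
func (n *node) onMajorityVotes() {
	if n.role == Candidate {
		n.role = Leader // begin sending periodic heartbeats (omitted)
	}
}

// onHigherTerm: any node that sees a higher term in an RPC steps down.
func (n *node) onHigherTerm(term int) {
	if term > n.currentTerm {
		n.currentTerm = term
		n.role = Follower
	}
}
```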

Raft State Variables

Persistent State (stored on stable storage before responding to RPCs)

currentTerm: latest term the node has seen.

votedFor: candidate ID that received this node's vote in the current term (may be null).

log[]: ordered list of log entries, each containing a client command and the term when the entry was received.

Volatile State (lost on crash)

commitIndex: index of the highest log entry known to be committed (replicated on a majority).

lastApplied: index of the highest log entry applied to the state machine.

Leader‑only Volatile State

nextIndex[]: for each follower, the index of the next log entry to send.

matchIndex[]: for each follower, the index of the highest log entry known to be replicated on that follower.

[Figure: Raft state variables]
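
In Go these variables map naturally onto a struct. The sketch below is illustrative rather than the lab's exact skeleton; the field names follow the Raft paper.

```go
package raft

// LogEntry pairs a client command with the term in which the leader received it.
type LogEntry struct {
	Term    int
	Command interface{}
}

// Raft groups the state variables listed above.
type Raft struct {
	// Persistent state: written to stable storage before answering any RPC.
	currentTerm int        // latest term this node has seen
	votedFor    int        // candidate ID voted for in currentTerm (-1 = none)
	log         []LogEntry // the replicated log

	// Volatile state on all servers (rebuilt after a crash).
	commitIndex int // highest log index known to be committed
	lastApplied int // highest log index applied to the state machine

	// Volatile state on leaders, reinitialized after each election.
	nextIndex  []int // per follower: index of the next entry to send
	matchIndex []int // per follower: highest index known to be replicated there
}
```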

Raft RPC Types

RequestVote RPC (leader election)

If the receiver’s term is higher than the candidate’s term, it rejects the vote.

If the candidate’s term is at least as large as the receiver’s term, the candidate’s log is at least as up‑to‑date as the receiver’s log, and the receiver has not yet voted in this term, it grants the vote.

[Figure: RequestVote RPC]
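
A receiver-side sketch of these rules, building on the Raft struct above (argument names follow the paper; this sketch uses 0-based log indices, while the paper uses 1-based; locking and persistence are left out for brevity):

```go
type RequestVoteArgs struct {
	Term         int // candidate's term
	CandidateID  int // candidate requesting the vote
	LastLogIndex int // index of the candidate's last log entry
	LastLogTerm  int // term of the candidate's last log entry
}

type RequestVoteReply struct {
	Term        int  // receiver's currentTerm, so the candidate can update itself
	VoteGranted bool // true if the receiver granted its vote
}

func (rf *Raft) RequestVote(args *RequestVoteArgs, reply *RequestVoteReply) {
	// Reject a candidate whose term is lower than ours.
	if args.Term < rf.currentTerm {
		reply.Term, reply.VoteGranted = rf.currentTerm, false
		return
	}
	// A higher term makes us adopt it and clears any vote from the old term.
	if args.Term > rf.currentTerm {
		rf.currentTerm, rf.votedFor = args.Term, -1
	}

	// Up-to-date check: a higher last term wins; equal last terms compare length.
	lastIdx := len(rf.log) - 1
	lastTerm := 0
	if lastIdx >= 0 {
		lastTerm = rf.log[lastIdx].Term
	}
	upToDate := args.LastLogTerm > lastTerm ||
		(args.LastLogTerm == lastTerm && args.LastLogIndex >= lastIdx)

	// Grant at most one vote per term.
	if (rf.votedFor == -1 || rf.votedFor == args.CandidateID) && upToDate {
		rf.votedFor = args.CandidateID
		reply.VoteGranted = true
	}
	reply.Term = rf.currentTerm
}
```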

AppendEntries RPC (log replication and heartbeat)

Rejects the request if the receiver’s term is higher than the leader’s term.

Verifies that prevLogIndex and prevLogTerm match the receiver’s log; otherwise rejects.

If a conflicting entry is found, deletes that entry and all following entries.

Appends any new entries not already present.

Updates commitIndex to min(leaderCommit, index of last new entry) when appropriate.

[Figure: AppendEntries RPC]
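
The receiver-side rules for AppendEntries can be sketched the same way against the struct above (0-based indices, locking omitted; the truncate-and-append step is simplified compared with the paper, which truncates only at the first actual conflict):

```go
type AppendEntriesArgs struct {
	Term         int        // leader's term
	LeaderID     int        // so followers can redirect clients
	PrevLogIndex int        // index of the entry immediately preceding Entries
	PrevLogTerm  int        // term of that entry
	Entries      []LogEntry // empty for a heartbeat
	LeaderCommit int        // leader's commitIndex
}

type AppendEntriesReply struct {
	Term    int
	Success bool
}

func (rf *Raft) AppendEntries(args *AppendEntriesArgs, reply *AppendEntriesReply) {
	reply.Term = rf.currentTerm

	// 1. Reject a leader whose term is stale.
	if args.Term < rf.currentTerm {
		return
	}
	// 2. Consistency check: we must hold an entry at PrevLogIndex with PrevLogTerm
	//    (PrevLogIndex == -1 means the leader is sending from the very start).
	if args.PrevLogIndex >= len(rf.log) ||
		(args.PrevLogIndex >= 0 && rf.log[args.PrevLogIndex].Term != args.PrevLogTerm) {
		return
	}
	// 3+4. Drop any conflicting suffix and append the leader's entries
	//      (simplified: the paper truncates only at the first real conflict).
	rf.log = append(rf.log[:args.PrevLogIndex+1], args.Entries...)

	// 5. Advance commitIndex to min(leaderCommit, index of last new entry).
	if args.LeaderCommit > rf.commitIndex {
		last := len(rf.log) - 1
		if args.LeaderCommit < last {
			rf.commitIndex = args.LeaderCommit
		} else {
			rf.commitIndex = last
		}
	}
	reply.Success = true
}
```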

Lab 2A: Implementing Raft Leader Election

The lab implements only the election phase of Raft, omitting log replication. Each node starts with a random election timeout. When the timeout expires without receiving a valid leader heartbeat, the node becomes a Candidate, increments its currentTerm, votes for itself, and sends RequestVote RPCs to all other nodes.
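
A rough sketch of that candidate transition is shown below, assuming the lab's usual fields and helpers (me, peers, a mutex mu, a role field, and a sendRequestVote RPC wrapper are all named here only for illustration):

```go
func (rf *Raft) startElection() {
	rf.mu.Lock()
	rf.role = Candidate
	rf.currentTerm++
	rf.votedFor = rf.me // vote for ourselves
	term := rf.currentTerm
	votes := 1
	rf.mu.Unlock()

	for peer := range rf.peers {
		if peer == rf.me {
			continue
		}
		go func(peer int) {
			args := RequestVoteArgs{Term: term, CandidateID: rf.me}
			var reply RequestVoteReply
			if !rf.sendRequestVote(peer, &args, &reply) {
				return // RPC failed or timed out
			}
			rf.mu.Lock()
			defer rf.mu.Unlock()
			// Ignore replies that belong to an older election.
			if rf.role != Candidate || rf.currentTerm != term {
				return
			}
			// A higher term in the reply forces us back to Follower.
			if reply.Term > rf.currentTerm {
				rf.currentTerm, rf.votedFor, rf.role = reply.Term, -1, Follower
				return
			}
			if reply.VoteGranted {
				votes++
				if votes > len(rf.peers)/2 {
					rf.role = Leader // heartbeats start from the ticker goroutine
				}
			}
		}(peer)
	}
}
```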

Key implementation points:

A node may vote for at most one candidate per term; once it has voted, it must reject other votes in the same term.

Network messages are asynchronous (often sent in separate goroutines), so a node must handle stale or out‑of‑order RPCs by comparing term numbers.

The election timeout should be a random value within a configurable range to reduce the probability of split votes.

After winning a majority, the node transitions to Leader and begins sending periodic AppendEntries heartbeats (empty entries) to assert its leadership.

Heartbeat logic typically runs in a ticker goroutine; careful lock management is required to avoid deadlocks when accessing shared state such as currentTerm, votedFor, and role.
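
One possible shape for that ticker goroutine is sketched below (lastHeartbeat, broadcastHeartbeat, and the timing constants are placeholder choices; killed() is the lab's shutdown check):

```go
import (
	"math/rand"
	"time"
)

func (rf *Raft) ticker() {
	for !rf.killed() {
		rf.mu.Lock()
		role := rf.role
		elapsed := time.Since(rf.lastHeartbeat) // reset whenever a valid heartbeat arrives
		rf.mu.Unlock()

		if role == Leader {
			rf.broadcastHeartbeat() // empty AppendEntries to every follower
			time.Sleep(100 * time.Millisecond)
			continue
		}

		// Randomized election timeout (here 300-600 ms) reduces split votes.
		timeout := time.Duration(300+rand.Intn(300)) * time.Millisecond
		if elapsed >= timeout {
			rf.startElection()
		}
		time.Sleep(20 * time.Millisecond)
	}
}
```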

Tags: distributed-systems, Raft, Consensus Algorithm, leader election, State Machine Replication