
A Beginner’s Guide to Paxos: From Consistency Problems to the Full Algorithm

This article explains what Paxos is, why distributed systems need strong consistency, compares it with Raft, walks through replication strategies, derives Paxos from majority read/write, details the classic Paxos phases, shows example executions, and discusses optimizations such as Multi‑Paxos and Fast‑Paxos.

High Availability Architecture

Preface

Paxos is an algorithm that guarantees strong consistency for replicated data in distributed systems.

Without Paxos a set of machines is merely "distributed"; with Paxos they form a true distributed system.

"There is only one consistency algorithm in the world, and that is Paxos…" – Mike Burrows, author of Google Chubby.

Other consistency algorithms can be seen as variants or extensions of Paxos. Raft is often mentioned as a more practical implementation of the same ideas.

The article is split into two parts: the first explains the problem and derives Paxos in plain language; the second gives a strict description of the Paxos protocol suitable for implementation.

Problems Distributed Systems Need to Solve

Paxos works by coordinating a group of machines so that they act as a single system where all replicas agree on the state. For example, in a three‑replica cluster, when an image is uploaded to one node it must be copied to the other two nodes to achieve a consistent state.

Most distributed storage systems rely on replication for reliability, and replication requires a consensus algorithm like Paxos to keep the replicas consistent.

Imperfect Replication Strategies

Master‑Slave Asynchronous Replication : Simple to implement but vulnerable to data loss if the master fails before the data is fully replicated.

Master‑Slave Synchronous Replication : Guarantees durability by acknowledging writes only after all replicas have stored the data, but any replica failure blocks writes.

Semi‑Synchronous Replication : Writes are acknowledged after being replicated to a majority of nodes, providing better availability, but a partially failed write can still leave the replicas in an inconsistent state.

Majority Read/Write : Requires a write to be stored on more than half of the nodes and a read to query a majority. This approach still suffers from write‑write conflicts and requires timestamps to resolve ambiguities.
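To make the majority read/write strategy concrete, here is a minimal sketch over in-memory "nodes". All names are illustrative, not from the article; each stored value carries a timestamp so a reader can pick the newest answer among a majority, which is exactly the ambiguity-resolution step the strategy requires.

```python
import time

NODES = [dict() for _ in range(3)]      # three replicas, modeled as dicts
QUORUM = len(NODES) // 2 + 1            # majority: 2 of 3

def write(key, value, replicas):
    """Store (value, timestamp) on the replicas; succeed on a majority."""
    ts = time.time()
    acks = 0
    for node in replicas:               # a real system may fail partway here
        node[key] = (value, ts)
        acks += 1
    return acks >= QUORUM

def read(key, replicas):
    """Query a majority and return the value with the newest timestamp."""
    answers = [node[key] for node in replicas[:QUORUM] if key in node]
    if not answers:
        return None
    value, _ts = max(answers, key=lambda pair: pair[1])
    return value

write("img", "v1", NODES)
print(read("img", NODES))               # "v1"
```

Note that if the loop in `write` crashes after updating only one node, a later majority read may or may not see the new value depending on which replicas it queries; this is the residual inconsistency that motivates Paxos.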

Deriving Paxos from Majority Read/Write

To eliminate the problems of majority read/write, Paxos introduces two phases. Phase‑1 (prepare) records a proposal number (round) on a majority of acceptors and learns any previously uncommitted value. Phase‑2 (accept) writes the chosen value if the proposer still holds the highest round.

Phase‑1 (prepare) messages:

request:
    rnd: int
response:
    last_rnd: int
    v: string
    vrnd: int

If a proposer receives responses from a quorum, it can proceed. If any response contains a non‑null value, the proposer must adopt the value with the highest round instead of its own value, thus preserving previously committed writes.

Acceptors compare the round in a Phase‑2 request with the round they recorded during Phase‑1; only matching rounds are allowed to write the value, preventing lost updates.
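The acceptor-side rules just described can be sketched as a small class. Names, return conventions, and the in-memory state are illustrative assumptions; persistence and networking are omitted.

```python
class Acceptor:
    def __init__(self):
        self.last_rnd = 0      # highest round seen in Phase-1 (prepare)
        self.v = None          # last accepted value
        self.vrnd = 0          # round in which self.v was accepted

    def prepare(self, rnd):
        """Phase-1: promise to ignore lower rounds; report any prior value."""
        if rnd <= self.last_rnd:
            return None                      # reject a stale proposer
        self.last_rnd = rnd
        return (self.v, self.vrnd)           # (None, 0) if nothing accepted

    def accept(self, rnd, value):
        """Phase-2: write value only if no newer prepare has intervened."""
        if rnd < self.last_rnd:
            return False                     # a newer round took over
        self.last_rnd = rnd
        self.v, self.vrnd = value, rnd
        return True

a = Acceptor()
assert a.prepare(1) == (None, 0)   # fresh acceptor: nothing accepted yet
assert a.accept(1, "x")            # same round: the write succeeds
assert a.prepare(2) == ("x", 1)    # a later proposer learns the accepted value
assert not a.accept(1, "y")        # the old round can no longer write
```

The rejection in `accept` is the mechanism that prevents lost updates: once an acceptor has promised a higher round in `prepare`, any in-flight Phase‑2 request from an older round fails.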

Paxos Algorithm Description

The classic Paxos protocol defines the following roles and concepts:

Proposer – the client that initiates a proposal.

Acceptor – the storage node that votes on proposals.

Quorum – a majority of acceptors required for both phases.

Each round is identified by a monotonically increasing number generated by the proposer. Acceptors store the last round they saw (last_rnd), the last accepted value (v), and the round of that value (vrnd).

Phase‑2 (accept) messages:

request:
    v: string
    rnd: int
response:
    ok: bool

If a proposer cannot obtain a quorum in Phase‑1 it must retry with a higher round; because a quorum is any majority of acceptors, Paxos keeps making progress as long as fewer than half of the nodes have failed.
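The fault-tolerance bound follows directly from the quorum size. A quick check (assuming crash faults and classic majority quorums):

```python
# With n acceptors, a quorum is any majority, so the system tolerates the
# failure of n - quorum(n) nodes while still being able to form a quorum.

def quorum(n: int) -> int:
    return n // 2 + 1

for n in (3, 5, 7):
    print(f"{n} acceptors: quorum {quorum(n)}, tolerates {n - quorum(n)} failures")
```

For example, a five-node cluster needs three acceptors for each phase and therefore survives two simultaneous failures.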

Example Executions

Several scenarios illustrate how Paxos resolves conflicts:

Two proposers X and Y compete; Y wins a higher round, causing X’s Phase‑2 to be rejected on some acceptors.

When X’s Phase‑2 fails, it retries with a higher round, learns the highest accepted value (e.g., Y’s value), and writes that value to achieve consensus.

Fast‑Paxos can commit in a single round if a larger quorum (greater than ¾ n) acknowledges the value; otherwise it falls back to classic Paxos.
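The two-proposer scenarios above can be traced with a toy simulation. The acceptor class here is a trimmed, illustrative implementation of the prepare/accept rules from the previous section; all names are assumptions, not from the article.

```python
class Acceptor:
    def __init__(self):
        self.last_rnd, self.v, self.vrnd = 0, None, 0
    def prepare(self, rnd):
        if rnd <= self.last_rnd:
            return None
        self.last_rnd = rnd
        return (self.v, self.vrnd)
    def accept(self, rnd, value):
        if rnd < self.last_rnd:
            return False
        self.last_rnd, self.v, self.vrnd = rnd, value, rnd
        return True

acceptors = [Acceptor() for _ in range(3)]

# X completes Phase-1 on a quorum with round 1 ...
assert all(a.prepare(1) == (None, 0) for a in acceptors[:2])
# ... but before X's Phase-2 arrives, Y runs Phase-1 with round 2.
assert all(a.prepare(2) == (None, 0) for a in acceptors[:2])
# X's Phase-2 is now rejected: the acceptors have promised round 2 to Y.
assert not any(a.accept(1, "X") for a in acceptors[:2])
# Y writes its value in Phase-2, reaching consensus on "Y".
assert all(a.accept(2, "Y") for a in acceptors[:2])
# X retries with a higher round, learns the accepted value with the
# highest vrnd, and must adopt it instead of its own value.
seen = [a.prepare(3) for a in acceptors[:2]]
value = max(seen, key=lambda pair: pair[1])[0]
assert value == "Y"
assert all(a.accept(3, value) for a in acceptors[:2])
```

The final step shows the key safety rule in action: X's retry succeeds, but only by repairing and re-writing Y's already-chosen value.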

Paxos Optimizations

Multi‑Paxos : Reduces the two‑round overhead by running Phase‑1 once for a series of consecutive proposals, effectively turning the protocol into a log replication mechanism similar to Raft.

Fast‑Paxos : Uses a larger quorum to achieve consensus in a single round. If the fast quorum is not reached, the protocol degrades to classic Paxos. The fast quorum must be larger than ¾ n to guarantee that any classic quorum will intersect with it.
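The intersection requirement can be verified by brute force for small clusters. This sketch assumes a classic quorum of ⌊n/2⌋+1 and the smallest fast quorum strictly larger than ¾ n, and checks that any classic quorum intersects the common part of any two fast quorums:

```python
from itertools import combinations
from math import floor

def check(n):
    """True if every classic quorum meets the intersection of any two fast quorums."""
    nodes = range(n)
    classic = n // 2 + 1                 # classic (majority) quorum size
    fast = floor(3 * n / 4) + 1          # smallest size strictly > 3n/4
    for qc in combinations(nodes, classic):
        for qf1 in combinations(nodes, fast):
            for qf2 in combinations(nodes, fast):
                if not set(qc) & set(qf1) & set(qf2):
                    return False
    return True

print(all(check(n) for n in range(3, 8)))  # True
```

Intuitively, two fast quorums overlap in more than n/2 nodes, so any majority quorum must touch that overlap; this is what lets a classic round recover a value chosen in a fast round.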

Other

Further reading, including the original papers and related articles, is listed below.

References:

Original Paxos tutorial (PDF/HTML)

Raft implementation

Leslie Lamport’s homepage

Byzantine Paxos

Classic Paxos paper

Fast Paxos paper

Written by High Availability Architecture (official account).