How Paxos Guarantees Consistency in Distributed Systems – A Deep Dive
This article explains the Paxos consensus algorithm, detailing its roles, three-phase execution process, key properties such as consistency, availability and fault tolerance, and showcases its practical applications in distributed databases, file systems, and coordination services.
Introduction
In distributed systems, achieving consensus among replicas is essential for consistency. Paxos, introduced by Leslie Lamport in the 1990s, provides a fault‑tolerant way for a majority of nodes to agree on a single value despite crashes or network partitions.
Roles
Proposer : generates a proposal with a unique, monotonically increasing proposal number and tries to obtain acceptance from a majority of acceptors.
Acceptor : promises not to accept proposals with lower numbers than the highest one it has promised, and may accept a proposal that matches its promise.
Learner : receives commit notifications and learns the final decided value.
Protocol Phases
1. Prepare Phase
The proposer selects a new proposal number (e.g., a timestamp combined with its node ID) and sends a Prepare request to a majority of acceptors.
Each acceptor compares the received number with any previously promised number. If the new number is higher, the acceptor replies with a Promise and includes the highest-numbered proposal it has already accepted, if any.
2. Accept Phase
After collecting a majority of promises, the proposer chooses a value:
If any acceptor reported an already‑accepted proposal, the proposer must adopt the value of the highest-numbered such proposal.
Otherwise it may propose its own value.
The proposer then sends an AcceptRequest (containing the proposal number and chosen value) to the same majority of acceptors.
An acceptor that receives an AcceptRequest whose number matches its most recent promise records the value as accepted and replies with an Accepted message.
3. Commit Phase
When the proposer receives Accepted responses from a majority, it considers the proposal committed.
The proposer broadcasts a Commit message to all learners (and optionally to all acceptors) indicating the final value.
Learners update their state with the committed value.
Key Properties
Safety (Consistency) : No two distinct values can be committed; all correct nodes eventually see the same value.
Liveness (Availability) : As long as a majority of nodes are reachable and can communicate, progress is possible.
Fault Tolerance : The algorithm tolerates up to ⌊(N‑1)/2⌋ crash failures in a system of N nodes.
Typical Deployments
Distributed Databases : Google Spanner uses Paxos to replicate data across data centers, ensuring strong consistency.
Distributed File Systems : Ceph’s metadata service employs a modified Paxos variant to keep file‑system metadata synchronized.
Coordination Services : Apache ZooKeeper implements the Zab protocol, which is derived from Paxos, for leader election and configuration management.
References
Lamport, L. (1998). The Part‑Time Parliament. ACM Transactions on Computer Systems , 16(2), 133‑169.
Chandra, T. D., Griesemer, R., & Redstone, J. (2007). Paxos Made Live – An Engineering Perspective. Proceedings of the 26th ACM Symposium on Principles of Distributed Computing.
Google Spanner Documentation. https://cloud.google.com/spanner/
Ceph Documentation. https://docs.ceph.com/docs/master/
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
Ops Development & AI Practice
DevSecOps engineer sharing experiences and insights on AI, Web3, and Claude code development. Aims to help solve technical challenges, improve development efficiency, and grow through community interaction. Feel free to comment and discuss.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
