Demystifying Paxos: How Distributed Systems Achieve Consensus
This article explains why Paxos is needed for consistency in distributed systems, details its roles and three-phase protocol, illustrates the algorithm with a real‑world analogy, and shows how Paxos underpins high‑availability database replication such as MySQL binlog synchronization.
Why Paxos Is Needed
Distributed systems replicate data across multiple machines to ensure high availability, but they must keep all replicas consistent despite concurrent client operations. Paxos solves this consistency problem by having the replicas agree on every update through majority voting instead of relying on a single lock holder, so no single machine is a point of failure.
Typical Uses of Paxos
Two main applications are:
Implementing global lock, naming, and configuration services (e.g., Google Chubby, Apache ZooKeeper).
Replicating user data across data centers (e.g., Google Megastore, Google Spanner).
Consider a distributed key-value store offering Put and Get operations. Multiple servers form a cluster, so a successful Put("a",1) must eventually be reflected on every server, and concurrent Put requests must be applied in the same order on all servers.
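To see why the ordering requirement matters, here is a minimal sketch that models each server as applying a list of Put operations to an empty map. The `apply_log` helper and the `("put", key, value)` tuple format are hypothetical, chosen only for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical log entry format: ("put", key, value)
Op = Tuple[str, str, int]


def apply_log(log: List[Op]) -> Dict[str, int]:
    """Apply Put operations in log order to an empty key-value map."""
    state: Dict[str, int] = {}
    for kind, key, value in log:
        if kind == "put":
            state[key] = value
    return state


# Two concurrent client requests for the same key.
put_a1 = ("put", "a", 1)
put_a2 = ("put", "a", 2)

# If every server applies the SAME agreed order, all replicas converge.
agreed_order = [put_a1, put_a2]
replicas = [apply_log(agreed_order) for _ in range(3)]
print(replicas)  # [{'a': 2}, {'a': 2}, {'a': 2}]

# If each server applies requests in its own arrival order, replicas diverge.
server1 = apply_log([put_a1, put_a2])
server2 = apply_log([put_a2, put_a1])
print(server1, server2)  # {'a': 2} {'a': 1}
```

Agreeing on that single order, one log position at a time, is exactly the job a consensus protocol such as Paxos performs.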
Paxos Algorithm Overview
The protocol involves three roles, often combined in each server:
Proposer: initiates a proposal.
Acceptor: votes on proposals.
Learner: learns the chosen value.
The algorithm proceeds in three phases:
1. Prepare Phase
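Proposer picks an epoch number higher than any it has used before and sends Prepare(epochNo) to at least a majority of Acceptors; the request carries no value yet.
If epochNo is higher than any epoch the Acceptor has already promised, the Acceptor promises to ignore lower-numbered proposals from now on and replies with the highest-numbered proposal it has already accepted (if any); otherwise it rejects the request.
Proposer must receive promises from a majority before moving to the Accept phase; otherwise it restarts the Prepare phase with a higher epoch number.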
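The Prepare phase can be sketched as follows, assuming in-memory Acceptors and synchronous calls instead of real network messages. The `Acceptor`, `Promise`, `on_prepare`, and `run_prepare` names are hypothetical; the promise rule follows the classic single-decree formulation (an epoch must be strictly higher than every epoch already promised).

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Promise:
    ok: bool
    # Highest-numbered proposal this Acceptor had accepted before promising: (epoch, value)
    accepted: Optional[Tuple[int, str]]


@dataclass
class Acceptor:
    promised: int = -1                          # highest epoch number promised so far
    accepted: Optional[Tuple[int, str]] = None  # last accepted (epoch, value), if any

    def on_prepare(self, epoch: int) -> Promise:
        # Promise only for an epoch strictly higher than every epoch promised before.
        if epoch > self.promised:
            self.promised = epoch
            return Promise(True, self.accepted)
        return Promise(False, None)


def run_prepare(acceptors: List[Acceptor], epoch: int) -> Optional[List[Promise]]:
    """Send Prepare(epoch) to every Acceptor; succeed only with a majority of promises."""
    majority = len(acceptors) // 2 + 1
    promises = [p for p in (a.on_prepare(epoch) for a in acceptors) if p.ok]
    return promises if len(promises) >= majority else None


acceptors = [Acceptor() for _ in range(5)]
print(run_prepare(acceptors, epoch=1) is not None)  # True: all five promise
print(run_prepare(acceptors, epoch=1) is not None)  # False: epoch 1 is no longer new
print(run_prepare(acceptors, epoch=2) is not None)  # True: retry with a higher epoch
```

Returning the previously accepted proposals along with each promise is what lets the Proposer discover a value that may already have been chosen.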
2. Accept Phase
After a successful Prepare, the Proposer sends Accept(epochNo, value). If none of the promises reported a previously accepted proposal, it may propose its own value; otherwise it must use the value of the highest-numbered accepted proposal among the replies.
An Acceptor accepts the proposal unless it has meanwhile promised a higher epoch number, in which case it rejects it.
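A minimal sketch of the Accept phase, again assuming in-memory objects rather than a network. The `choose_value` and `on_accept` names are hypothetical. Note that the Acceptor compares with `>=`: the Proposer's own epoch is the one the Acceptor just promised, so an equal epoch must be accepted.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Acceptor:
    promised: int = -1                          # highest epoch number promised so far
    accepted: Optional[Tuple[int, str]] = None  # last accepted (epoch, value), if any

    def on_accept(self, epoch: int, value: str) -> bool:
        # Accept unless a higher epoch has been promised since this Proposer's Prepare.
        if epoch >= self.promised:
            self.promised = epoch
            self.accepted = (epoch, value)
            return True
        return False


def choose_value(reported: List[Optional[Tuple[int, str]]], own_value: str) -> str:
    """Pick the value for Accept(epoch, value) from what the promising Acceptors reported."""
    prior = [p for p in reported if p is not None]
    if not prior:
        return own_value                          # no earlier acceptance: free choice
    return max(prior, key=lambda p: p[0])[1]      # adopt the highest-numbered accepted value


print(choose_value([None, None, None], "Paris"))                # Paris
print(choose_value([None, (3, "Rome"), (1, "Oslo")], "Paris"))  # Rome

acc = Acceptor(promised=5)
print(acc.on_accept(4, "Paris"))  # False: epoch 5 was already promised
print(acc.on_accept(5, "Rome"))   # True
```

Adopting an earlier value instead of insisting on its own is the rule that keeps a newer Proposer from overwriting a value that a majority may already have accepted.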
3. Learn Phase
Acceptors notify Learners of each proposal they accept. Once a Learner sees that the same proposal has been accepted by a majority of Acceptors, the value is chosen and can no longer change; the Learner can apply it, and Proposers stop sending further requests for that value.
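The Learner's job reduces to counting. The sketch below assumes each Acceptor reports `(acceptor_id, epoch, value)` after accepting; the `Learner` class and `on_accepted` method are hypothetical. Votes are only added up within one proposal (same epoch), and duplicate notifications from the same Acceptor are ignored.

```python
from collections import Counter
from typing import Optional, Set, Tuple


class Learner:
    """Tallies Accepted(acceptor_id, epoch, value) notifications from Acceptors."""

    def __init__(self, num_acceptors: int) -> None:
        self.majority = num_acceptors // 2 + 1
        self.votes: Counter = Counter()           # (epoch, value) -> distinct Acceptors
        self.seen: Set[Tuple[int, int]] = set()   # (acceptor_id, epoch) already counted
        self.chosen: Optional[str] = None

    def on_accepted(self, acceptor_id: int, epoch: int, value: str) -> Optional[str]:
        # Count each Acceptor once per proposal so duplicate messages are harmless.
        if (acceptor_id, epoch) in self.seen:
            return self.chosen
        self.seen.add((acceptor_id, epoch))
        self.votes[(epoch, value)] += 1
        # Votes only add up within the same proposal (same epoch).
        if self.chosen is None and self.votes[(epoch, value)] >= self.majority:
            self.chosen = value
        return self.chosen


learner = Learner(num_acceptors=5)
learner.on_accepted(0, 7, "Rome")
learner.on_accepted(0, 7, "Rome")  # duplicate delivery, ignored
learner.on_accepted(1, 7, "Rome")
print(learner.chosen)              # None: only 2 of 5 Acceptors so far
learner.on_accepted(2, 7, "Rome")
print(learner.chosen)              # Rome: 3 of 5 is a majority
```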
Intuitive Analogy
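Imagine ten travelers who want to agree on a destination, and five team leaders who communicate with the travelers only by message. The leaders act as Acceptors, the travelers as Proposers, and the message timestamps as epoch numbers. A traveler first asks the leaders for the right to propose (Prepare), then proposes a destination that a majority of leaders must approve (Accept), and finally everyone learns the agreed destination (Learn). Because any two majorities of the five leaders share at least one leader, a traveler who later gathers a majority always hears about a destination that a majority has already approved and must adopt it, so the group cannot end up with two different decisions.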
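The analogy can be replayed as a small single-decree simulation, assuming synchronous in-memory calls and a hand-picked order of events. The `Leader` class and `propose` function are hypothetical names; each traveler can reach only some of the leaders, but both reach a majority.

```python
from typing import List, Optional, Tuple


class Leader:
    """A team leader playing the Acceptor role."""

    def __init__(self) -> None:
        self.promised = -1
        self.accepted: Optional[Tuple[int, str]] = None

    def prepare(self, epoch: int) -> Tuple[bool, Optional[Tuple[int, str]]]:
        if epoch > self.promised:
            self.promised = epoch
            return True, self.accepted
        return False, None

    def accept(self, epoch: int, value: str) -> bool:
        if epoch >= self.promised:
            self.promised = epoch
            self.accepted = (epoch, value)
            return True
        return False


def propose(leaders: List[Leader], reachable: List[int], epoch: int, wish: str) -> Optional[str]:
    """One traveler (Proposer) runs Prepare, then Accept, against the leaders it can reach."""
    majority = len(leaders) // 2 + 1
    replies = [leaders[i].prepare(epoch) for i in reachable]
    granted = [acc for ok, acc in replies if ok]
    if len(granted) < majority:
        return None                                   # must retry with a newer timestamp
    prior = [acc for acc in granted if acc is not None]
    value = max(prior, key=lambda p: p[0])[1] if prior else wish
    acks = sum(leaders[i].accept(epoch, value) for i in reachable)
    return value if acks >= majority else None


leaders = [Leader() for _ in range(5)]
# Traveler A reaches leaders 0-2 (a majority) and wishes for "Beach".
first = propose(leaders, [0, 1, 2], epoch=1, wish="Beach")
# Traveler B later reaches leaders 2-4 with a newer timestamp and wishes for "Mountains".
second = propose(leaders, [2, 3, 4], epoch=2, wish="Mountains")
print(first, second)  # Beach Beach
```

Leader 2 belongs to both majorities, so traveler B learns about "Beach" during Prepare and proposes it instead of "Mountains".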
Applying Paxos to Database High Availability
Traditional master‑slave replication has three common modes:
Strong synchronous replication: the master waits for the slave to acknowledge the binlog before confirming the transaction, sacrificing availability if the network or the slave fails.
Asynchronous replication: the master returns success immediately, risking data loss if it fails before the slave has received the binlog.
Semi-synchronous replication: the master waits for at least one slave acknowledgment, a compromise that falls back to asynchronous replication when no acknowledgment arrives in time, for example under poor network conditions.
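The difference between the three modes is simply when the master may confirm a commit. The toy rule below is a simplification, not MySQL's actual implementation. The `Mode` enum and `can_confirm` function are hypothetical, and it assumes that synchronous mode waits for every slave and that semi-sync degrades on a timeout flag.

```python
from enum import Enum


class Mode(Enum):
    SYNC = "sync"            # wait for every slave
    ASYNC = "async"          # wait for nobody
    SEMI_SYNC = "semi_sync"  # wait for at least one slave


def can_confirm(mode: Mode, acks: int, num_slaves: int, timed_out: bool = False) -> bool:
    """Hypothetical rule: may the master confirm the transaction to the client yet?"""
    if mode is Mode.ASYNC:
        return True                    # confirm at once; unreplicated binlog can be lost
    if mode is Mode.SYNC:
        return acks == num_slaves      # keeps waiting while any slave is missing
    # Semi-sync: one ack is enough; after a timeout, degrade to asynchronous behaviour.
    return acks >= 1 or timed_out


print(can_confirm(Mode.SYNC, acks=1, num_slaves=2))                       # False
print(can_confirm(Mode.SEMI_SYNC, acks=1, num_slaves=2))                  # True
print(can_confirm(Mode.SEMI_SYNC, acks=0, num_slaves=2, timed_out=True))  # True
```

None of these rules says anything about which node becomes master after a failure, which is the gap consensus-based replication fills.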
All these modes share the challenge of electing a primary and handling node failures. Using Paxos (or Raft, a closely related consensus protocol designed to be easier to understand) for log replication addresses these issues by requiring a majority of nodes (⌊N/2⌋+1) to agree on each log entry, ensuring strong consistency while tolerating up to ⌊(N−1)/2⌋ node failures: one failure for N = 3, two for N = 5.
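The quorum arithmetic is easy to check directly. This sketch (with hypothetical helper names) computes the majority size and tolerated failures for small clusters, and brute-forces the property that makes majority voting safe: any two majorities share at least one node.

```python
from itertools import combinations


def majority(n: int) -> int:
    """Smallest quorum size: floor(N/2) + 1."""
    return n // 2 + 1


def max_failures(n: int) -> int:
    """Nodes that can fail while a majority is still alive: floor((N-1)/2)."""
    return n - majority(n)


def majorities_intersect(n: int) -> bool:
    """Brute-force check that any two minimal majorities of N nodes share a node."""
    quorums = [set(q) for q in combinations(range(n), majority(n))]
    return all(a & b for a in quorums for b in quorums)


for n in (3, 4, 5):
    print(n, majority(n), max_failures(n), majorities_intersect(n))
# 3 2 1 True
# 4 3 1 True
# 5 3 2 True
```

The N = 4 row shows why odd cluster sizes are common: a fourth node raises the quorum size without tolerating any extra failure.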
In practice, Paxos can replicate redo logs or binlogs, providing a fault‑tolerant, strongly consistent cluster. Major cloud providers (e.g., Alibaba Cloud) already employ Paxos or Raft for three‑node MySQL clusters.
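When a consensus protocol replicates a binlog, the leader can treat an entry as committed once a majority of nodes have durably stored it. The sketch below shows that calculation in the style of Raft's commit index. The `commit_index` function is hypothetical, it assumes the leader's own position is included in the list, and it omits the term checks a real Raft implementation also needs.

```python
from typing import List


def commit_index(match_index: List[int]) -> int:
    """
    Highest log index stored on a majority of nodes.
    match_index[i] is the last binlog entry node i (leader included) has durably written.
    """
    majority = len(match_index) // 2 + 1
    # After sorting in descending order, the entry at position majority-1
    # is held by at least `majority` nodes.
    return sorted(match_index, reverse=True)[majority - 1]


# Three-node cluster: leader has written up to entry 10, followers up to 9 and 7.
print(commit_index([10, 9, 7]))   # 9: entries up to 9 are on two of three nodes
# One follower lags far behind; the cluster still commits via the other follower.
print(commit_index([12, 12, 4]))  # 12
```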
Key prerequisites for Paxos‑based replication are:
A majority of replica nodes must be alive and able to communicate.
The network must not be partitioned so that no group of mutually reachable nodes forms a majority; if that happens, the service halts.
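Which side of a partition keeps working follows directly from the majority rule. This hypothetical `sides_that_can_serve` helper assumes the partition is given as sets of node IDs that can still talk to each other.

```python
from typing import List, Set


def sides_that_can_serve(n: int, partition: List[Set[int]]) -> List[Set[int]]:
    """Return the partition sides that still contain a majority of the N nodes."""
    majority = n // 2 + 1
    return [side for side in partition if len(side) >= majority]


# Five nodes split 3 / 2: the three-node side keeps serving, the other side stalls.
print(sides_that_can_serve(5, [{0, 1, 2}, {3, 4}]))    # [{0, 1, 2}]
# Five nodes split 2 / 2 / 1: no side has a majority, so the service halts.
print(sides_that_can_serve(5, [{0, 1}, {2, 3}, {4}]))  # []
```

At most one side can ever hold a majority, which is why a partition never produces two diverging primaries.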
Summary
Paxos solves distributed consistency with majority-based voting among the Proposer, Acceptor, and Learner roles. Its three-phase protocol ensures that only one value is chosen, even in the presence of failures, which makes it suitable for high-availability database replication and other consensus-critical services.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
dbaplus Community
Enterprise-level professional community for Database, BigData, and AIOps. Daily original articles, weekly online tech talks, monthly offline salons, and quarterly XCOPS&DAMS conferences—delivered by industry experts.
