Understanding the Paxos Consensus Algorithm and Its Use in Database High Availability
This article explains why Paxos is needed for distributed consistency, describes its three roles and phases, illustrates the protocol with a real‑world analogy, and shows how Paxos‑based log replication can provide strong, fault‑tolerant high‑availability for databases.
Paxos has come up in many recent discussions. After studying various sources, I summarize my understanding of the algorithm here to help others grasp the concept.
Paxos addresses the consistency problem in distributed systems, where data replicas on different machines must stay synchronized. Without a central lock, the replicas must agree on the order of concurrent client operations. Paxos or closely related protocols are used in services such as Google Chubby, Apache ZooKeeper (whose Zab protocol is Paxos-like), Google Megastore, and Google Spanner.
Consider a distributed key-value store with two operations, Put and Get, deployed on a cluster of three servers. The cluster must ensure that a successful Put("a",1) is reflected on every server, and that concurrent Put requests are applied in the same order on every server. This is the problem Paxos solves.
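To see why ordering matters, here is a minimal sketch in Python. The helper apply_ops and the two Put operations are hypothetical names introduced for illustration; they are not part of any real key-value store. Two replicas that receive the same concurrent Puts in different orders end up with different data.

```python
def apply_ops(ops):
    """Apply a sequence of (key, value) Put operations to an empty store."""
    store = {}
    for key, value in ops:
        store[key] = value
    return store


# Two clients issue concurrent Puts on the same key.
put_x = ("a", 1)
put_y = ("a", 2)

# Without agreement, each replica applies them in its own arrival order.
replica_1 = apply_ops([put_x, put_y])
replica_2 = apply_ops([put_y, put_x])
diverged = replica_1 != replica_2

# With an agreed order, every replica ends in the same state.
agreed = apply_ops([put_x, put_y]) == apply_ops([put_x, put_y])
```

Here replica_1 ends with a = 2 and replica_2 with a = 1, so a Get("a") would return different answers depending on which server handles it.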
The Paxos algorithm involves three roles: Proposers (which propose values, typically on behalf of clients), Acceptors (which vote on proposals), and Learners (which learn the chosen value). In practice, a fixed set of servers often plays all three roles.
Paxos proceeds in three phases:
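Prepare phase: The Proposer sends a Prepare request carrying a unique, increasing proposal number (the epoch) to a majority of Acceptors. An Acceptor rejects the request if the epoch is not higher than one it has already seen. Otherwise it promises to ignore proposals with lower epochs and returns the highest-numbered proposal it has already accepted, if any.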
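Accept phase: After receiving Prepare responses from a majority, the Proposer chooses the value to send. If any response reports an already accepted proposal, the Proposer must adopt the value of the highest-numbered one; only if none do may it propose its own value. It then sends an Accept request with its epoch and that value. An Acceptor accepts the request unless it has since promised a higher epoch.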
Learn phase: Acceptors notify Learners (or Learners poll) about accepted proposals; once a proposal is accepted by a majority, its value is considered chosen and Learners can apply it.
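The Prepare and Accept phases above can be sketched as a small single-process simulation. This is an illustrative sketch under simplifying assumptions, not a production implementation: there is no network, no message loss, and no persistence, and the names Acceptor and propose are hypothetical.

```python
from dataclasses import dataclass
from typing import Any, List, Optional, Tuple


@dataclass
class Acceptor:
    promised: int = -1          # highest proposal number promised so far
    accepted_n: int = -1        # number of the last accepted proposal
    accepted_value: Any = None  # value of the last accepted proposal

    def prepare(self, n: int) -> Optional[Tuple[int, Any]]:
        """Prepare phase: promise to ignore proposals numbered below n."""
        if n <= self.promised:
            return None  # reject: an equal or higher number was already seen
        self.promised = n
        return (self.accepted_n, self.accepted_value)

    def accept(self, n: int, value: Any) -> bool:
        """Accept phase: accept unless a higher number has been promised."""
        if n < self.promised:
            return False
        self.promised = n
        self.accepted_n = n
        self.accepted_value = value
        return True


def propose(acceptors: List[Acceptor], n: int, value: Any) -> Optional[Any]:
    """Run one round; return the chosen value, or None if the round failed."""
    majority = len(acceptors) // 2 + 1

    promises = [p for p in (a.prepare(n) for a in acceptors) if p is not None]
    if len(promises) < majority:
        return None

    # Adopt the value of the highest-numbered accepted proposal, if any.
    prev_n, prev_value = max(promises, key=lambda p: p[0])
    if prev_n >= 0:
        value = prev_value

    acks = sum(a.accept(n, value) for a in acceptors)
    return value if acks >= majority else None


acceptors = [Acceptor() for _ in range(3)]
first = propose(acceptors, n=1, value="Put(a,1)")   # chosen as proposed
second = propose(acceptors, n=2, value="Put(a,2)")  # must adopt the earlier value
stale = propose(acceptors, n=1, value="Put(a,3)")   # rejected: number too low
```

The second round illustrates the safety property: once "Put(a,1)" has been chosen, a later proposer with a higher number discovers it during Prepare and re-proposes it instead of its own value. To order many operations, a system would run one such instance per log position.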
To illustrate, I use a travel-planning analogy: a group of hikers (proposers) send time-stamped messages to team leaders (acceptors). Each leader responds only to the most recent message it has received, and a majority of leaders must agree before a proposal proceeds, mirroring the Prepare and Accept phases. The example shows how majority agreement leads to a final decision, just as Paxos reaches consensus.
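The core idea of Paxos is that a value is chosen once a majority of nodes, that is, at least floor(N/2) + 1 out of N, accept it. Because any two majorities share at least one node, a later round always learns of an earlier chosen value. This provides strong consistency even if some nodes fail, as long as a majority remain operational.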
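The majority arithmetic can be checked with a short Python sketch. The function names here are hypothetical helpers written for this illustration; the intersection check simply enumerates every pair of majority-sized groups.

```python
from itertools import combinations


def majority(n: int) -> int:
    """Smallest group size that is more than half of n nodes."""
    return n // 2 + 1


def tolerated_failures(n: int) -> int:
    """Nodes that can fail while a majority is still reachable."""
    return n - majority(n)


def quorums_intersect(n: int) -> bool:
    """Check that every pair of majority-sized groups shares a node."""
    groups = [set(g) for g in combinations(range(n), majority(n))]
    return all(a & b for a in groups for b in groups)


for n in (3, 4, 5):
    print(n, majority(n), tolerated_failures(n), quorums_intersect(n))
```

For 3, 4, and 5 nodes the majority sizes are 2, 3, and 3, tolerating 1, 1, and 2 failures respectively. A fourth node adds no extra fault tolerance over three, which is one reason odd cluster sizes are common.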
In practice, basic Paxos is often replaced by Multi‑Paxos for efficiency; Fast Paxos and other optimizations further reduce latency. Paxos is not a single protocol but a family of consensus protocols.
Turning to database high availability: traditional master-slave replication forces a trade-off. Fully synchronous replication sacrifices availability, asynchronous replication sacrifices consistency, and semi-synchronous replication is a compromise between the two. Paxos-based log replication can satisfy the key HA requirements (no data loss, continuous service, automatic leader election, and fault tolerance) by replicating redo or binlog entries to a majority of nodes before a write is considered committed.
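The majority-commit rule for log replication can be sketched as follows. This is a deliberately simplified illustration, assuming a fixed leader and omitting proposal numbers, leader election, and recovery; Replica and replicate are hypothetical names, not the API of any real database.

```python
from typing import List


class Replica:
    def __init__(self, name: str, up: bool = True):
        self.name = name
        self.up = up
        self.log: List[str] = []

    def append(self, entry: str) -> bool:
        """Persist a log entry and acknowledge it; a down node never acks."""
        if not self.up:
            return False
        self.log.append(entry)
        return True


def replicate(replicas: List[Replica], entry: str) -> bool:
    """Treat the entry as committed only if a majority acknowledges it."""
    acks = sum(r.append(entry) for r in replicas)
    return acks >= len(replicas) // 2 + 1


# One of three nodes is down: a majority (2 of 3) still acknowledges.
cluster = [Replica("n1"), Replica("n2"), Replica("n3", up=False)]
committed_one_down = replicate(cluster, "binlog-entry-1")

# Two of three nodes are down: no majority, so the write is not committed.
cluster[1].up = False
committed_two_down = replicate(cluster, "binlog-entry-2")
```

With one node down the write commits without waiting for the failed replica, unlike fully synchronous replication. With two nodes down the write is refused rather than acknowledged on a single copy, unlike asynchronous replication. In this sketch the surviving node still holds the uncommitted entry in its log; a real protocol would resolve such entries during recovery.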
With Paxos, a virtual IP can point to the current primary, and the protocol ensures that all replicas apply the same writes, enabling seamless failover. Major cloud providers (e.g., Alibaba Cloud) use Paxos or Raft for three‑node MySQL clusters.
Qunar Tech Salon
Qunar Tech Salon is a learning and exchange platform for Qunar engineers and industry peers. We share cutting-edge technology trends and topics, providing a free platform for mid-to-senior technical professionals to exchange and learn.
