Database High Availability Challenges and Paxos‑Based Solutions
This article examines the difficulties of achieving strong consistency and continuous availability in database replication, reviews the Paxos consensus protocol and its variants, and explains how Multi‑Paxos and clock‑driven leader election can be applied to build robust high‑availability architectures.
The article begins by highlighting why data consistency and continuous availability are critical for e‑commerce and internet‑finance services. It then explains the trade‑off between synchronous and asynchronous replication in traditional primary‑standby setups, using Oracle Data Guard's Maximum Protection, Maximum Performance, and Maximum Availability modes as examples.
It then outlines three fundamental requirements for database HA: no data loss, uninterrupted service, and automatic primary‑standby failover, and introduces the Paxos protocol as a way to satisfy these needs under the assumption that a majority of nodes remain reachable.
A concise review of Paxos follows, describing the two‑phase Prepare and Accept steps, the “two promises” made by acceptors, and the importance of persisting proposal IDs before responding.
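The acceptor side of the two phases can be sketched as follows. This is a minimal illustration, not the article's or OceanBase's implementation: the `Acceptor` class, its field names, and the `persist` placeholder are assumptions introduced here, and proposal IDs are assumed to be positive integers with 0 meaning "nothing accepted yet".

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Acceptor:
    """Hypothetical single-instance Paxos acceptor (illustrative names)."""
    promised_id: int = 0                  # highest proposal ID promised so far
    accepted_id: int = 0                  # ID of the accepted proposal, 0 if none
    accepted_value: Optional[str] = None  # value accepted under accepted_id

    def persist(self) -> None:
        # Placeholder: a real acceptor must flush its state to durable
        # storage here, before any reply leaves the node.
        pass

    def on_prepare(self, proposal_id: int) -> Tuple[bool, int, Optional[str]]:
        # Promise 1: ignore any Prepare whose ID is not greater than
        # the highest ID already promised.
        if proposal_id <= self.promised_id:
            return (False, self.accepted_id, self.accepted_value)
        self.promised_id = proposal_id
        self.persist()  # persist the promise before responding
        # Report the previously accepted proposal (if any) to the proposer.
        return (True, self.accepted_id, self.accepted_value)

    def on_accept(self, proposal_id: int, value: str) -> bool:
        # Promise 2: reject any Accept whose ID is lower than the
        # highest ID already promised.
        if proposal_id < self.promised_id:
            return False
        self.promised_id = proposal_id
        self.accepted_id = proposal_id
        self.accepted_value = value
        self.persist()  # persist the accepted value before responding
        return True
```

Persisting before replying matters because an acceptor that restarts and forgets a promise could later accept an older proposal, which would break the guarantees the proposer relied on.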
The “Basic Paxos” model is then presented: each log entry is treated as an independent Paxos instance identified by a monotonically increasing LogID. The article discusses how conflicts, timeouts, and recovery scenarios are handled, including the “maximum commit principle”.
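One common reading of the value-selection rule in Paxos recovery is that a proposer, after collecting Prepare responses from a majority, must adopt the value attached to the highest accepted proposal ID and may use its own value only if no acceptor reports one. The sketch below illustrates that rule under this assumption; the `choose_value` function and the `Promise` tuple shape are hypothetical names introduced for illustration, and the summary above does not confirm that this is exactly what the article means by the “maximum commit principle”.

```python
from typing import List, Optional, Tuple

# (accepted_id, accepted_value); accepted_id == 0 means nothing was accepted.
Promise = Tuple[int, Optional[str]]


def choose_value(promises: List[Promise], cluster_size: int,
                 own_value: str) -> Optional[str]:
    """Pick the value to send in the Accept phase for one log entry.

    Returns None when fewer than a majority of acceptors have promised,
    in which case the proposer must retry with a higher proposal ID.
    """
    if len(promises) <= cluster_size // 2:
        return None
    # Adopt the value with the highest accepted proposal ID, if any exists.
    best_id, best_value = max(promises, key=lambda p: p[0])
    return best_value if best_id > 0 else own_value
```

For example, with three nodes and promises `[(3, "x"), (7, "y")]`, the proposer must propose `"y"` rather than its own value, because `"y"` may already have been chosen by a majority.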
Next, the article moves to Multi‑Paxos in practice, showing how a designated leader simplifies log synchronization, reduces the number of Prepare phases, and enables automatic leader election, log confirmation, and replay optimizations such as sliding windows and “re‑confirmation” of uncertain logs.
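The way a stable leader reduces the number of Prepare phases can be sketched with an in-memory toy model. Everything here is an assumption made for illustration: the `LogAcceptor` and `Leader` classes, their method names, and the direct method calls standing in for network messages. The sketch also omits persistence, leader election, and the recovery of entries accepted under a previous leader.

```python
from typing import Dict, List


class LogAcceptor:
    """Hypothetical acceptor that shares one promise across all log entries."""

    def __init__(self) -> None:
        self.promised_id = 0
        self.log: Dict[int, str] = {}  # log_id -> accepted value

    def on_prepare(self, proposal_id: int) -> bool:
        if proposal_id <= self.promised_id:
            return False
        self.promised_id = proposal_id
        return True

    def on_accept(self, proposal_id: int, log_id: int, value: str) -> bool:
        if proposal_id < self.promised_id:
            return False
        self.promised_id = proposal_id
        self.log[log_id] = value
        return True


class Leader:
    """Hypothetical leader: one Prepare round, then Accept-only appends."""

    def __init__(self, proposal_id: int, acceptors: List[LogAcceptor]) -> None:
        self.proposal_id = proposal_id
        self.acceptors = acceptors
        self.next_log_id = 1
        self.prepare_rounds = 0

    def _majority(self, votes: int) -> bool:
        return votes > len(self.acceptors) // 2

    def become_leader(self) -> bool:
        # A single Prepare covers every log entry the leader writes afterwards.
        self.prepare_rounds += 1
        votes = sum(a.on_prepare(self.proposal_id) for a in self.acceptors)
        return self._majority(votes)

    def append(self, value: str) -> bool:
        # Steady state: only the Accept phase runs for each new entry.
        log_id = self.next_log_id
        votes = sum(a.on_accept(self.proposal_id, log_id, value)
                    for a in self.acceptors)
        if not self._majority(votes):
            return False  # a newer leader has taken over; stop writing
        self.next_log_id += 1
        return True
```

In this model, a leader with proposal ID 1 can append any number of entries after one `become_leader()` call. Once another node completes a Prepare with ID 2, every later `append` from the old leader is rejected, which is how the old leader learns it has been superseded.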
The article then analyzes a clock‑driven variant of the leader election protocol designed by Alibaba’s Yang Zhenkun. It details the three‑phase voting window, the timing calculations involving clock drift (Tdiff) and network latency (Tst), and the conditions under which the algorithm may suffer split‑brain or double‑leader situations.
The piece concludes with a Q&A section that compares ZooKeeper’s ZAB protocol with Paxos, discusses the feasibility of global Paxos deployments, contrasts Raft with Multi‑Paxos, and explains testing strategies for verifying Paxos implementations in OceanBase.
High Availability Architecture
Official account for High Availability Architecture.