Database High Availability Challenges and Paxos‑Based Solutions

This article examines the difficulties of achieving strong consistency and continuous availability in database replication, reviews the Paxos consensus protocol and its variants, and explains how Multi‑Paxos and clock‑driven leader election can be applied to build robust high‑availability architectures.

High Availability Architecture

Jan 18, 2016

The article begins by stressing why data consistency and continuous availability are critical for e‑commerce and internet‑finance services, and explains the trade‑off between synchronous and asynchronous replication in traditional primary‑standby setups, as exposed by Oracle Data Guard's Maximum Protection, Maximum Performance, and Maximum Availability modes.

It then outlines three fundamental requirements for database HA: no data loss, uninterrupted service, and automatic primary‑standby failover, and introduces the Paxos protocol as a way to satisfy these needs under the assumption that a majority of nodes remain reachable.
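The majority assumption can be made concrete with a little arithmetic. The sketch below is illustrative only (the function names are not from the article): it computes the majority size for a cluster, how many node failures that leaves room for, and checks by brute force that any two majorities share at least one node, which is the property that lets a later majority see what an earlier one agreed on.

```python
from itertools import combinations


def majority(n: int) -> int:
    """Smallest number of nodes that forms a majority of an n-node cluster."""
    return n // 2 + 1


def tolerated_failures(n: int) -> int:
    """Nodes that may be unreachable while a majority can still be formed."""
    return n - majority(n)


def majorities_always_intersect(n: int) -> bool:
    """Brute-force check that every pair of majority quorums overlaps."""
    quorums = [set(q) for q in combinations(range(n), majority(n))]
    return all(a & b for a in quorums for b in quorums)


for n in (3, 5):
    print(n, majority(n), tolerated_failures(n), majorities_always_intersect(n))
```

Under these definitions a three-node deployment survives one failure and a five-node deployment survives two.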

A concise review of Paxos follows, describing the two‑phase Prepare and Accept steps, the “two promises” made by acceptors, and the importance of persisting proposal IDs before responding.
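As a rough illustration of the acceptor side of the Prepare and Accept steps, here is a minimal single-instance sketch. It assumes the usual reading of the two promises (after answering a Prepare with ID n, the acceptor no longer answers Prepare requests with ID less than or equal to n, nor Accept requests with ID less than n). The class, its field names, and the no-op `persist` method are hypothetical stand-ins, not code from the article or from OceanBase.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Acceptor:
    """Single-instance Paxos acceptor; state is in memory for illustration."""
    promised_id: int = 0                  # highest proposal ID promised so far
    accepted_id: int = 0                  # ID of the accepted proposal, 0 = none
    accepted_value: Optional[str] = None

    def persist(self) -> None:
        # Placeholder (assumption): a real acceptor would write its state to
        # durable storage here, before any reply leaves the node.
        pass

    def on_prepare(self, proposal_id: int) -> Tuple[bool, int, Optional[str]]:
        if proposal_id <= self.promised_id:
            # Honors promise 1: stale or repeated Prepare requests are refused.
            return (False, self.accepted_id, self.accepted_value)
        self.promised_id = proposal_id
        self.persist()                    # persist the promise, then respond
        # The reply carries whatever proposal was accepted earlier, if any.
        return (True, self.accepted_id, self.accepted_value)

    def on_accept(self, proposal_id: int, value: str) -> bool:
        if proposal_id < self.promised_id:
            # Honors promise 2: Accept requests below the promised ID are refused.
            return False
        self.promised_id = proposal_id
        self.accepted_id = proposal_id
        self.accepted_value = value
        self.persist()                    # persist the acceptance, then respond
        return True
```

Persisting before replying matters because an acceptor that restarts and forgets a promise could later accept a proposal it had already promised to reject.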

The “Basic Paxos” model is presented next: each log entry is treated as an independent Paxos instance identified by a monotonically increasing LogID. The article also discusses how conflicts, timeouts, and recovery are handled, including the “maximum commit principle”.
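The per-LogID model and the recovery rule can be sketched as follows. This assumes the “maximum commit principle” corresponds to the standard Paxos rule that a proposer must adopt the value with the highest accepted proposal ID among the Prepare replies it collected, and may use its own value only when no reply carries one. The reply format, the function names, and the `"NOP"` filler are hypothetical, and the caller is assumed to have already gathered replies from a majority.

```python
from typing import Dict, List, Optional, Tuple

# (accepted_id, accepted_value); accepted_id 0 means nothing was accepted.
PrepareReply = Tuple[int, Optional[str]]


def choose_value(replies: List[PrepareReply], own_value: str) -> str:
    """Pick the value to propose for one Paxos instance."""
    best_id, best_value = max(replies, key=lambda r: r[0], default=(0, None))
    if best_id == 0 or best_value is None:
        return own_value      # no acceptor holds a value: free to propose ours
    return best_value         # otherwise adopt the highest-ID accepted value


def recover_log(
    replies_by_log_id: Dict[int, List[PrepareReply]], filler: str = "NOP"
) -> Dict[int, str]:
    """Run the rule independently for every LogID, in LogID order."""
    return {
        log_id: choose_value(replies, filler)
        for log_id, replies in sorted(replies_by_log_id.items())
    }


print(recover_log({1: [(4, "x"), (2, "y")], 2: [(0, None), (0, None)]}))
```

Each LogID is decided on its own, so a gap in the log can be filled with a filler entry without affecting the entries around it.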

Next, the article moves to Multi‑Paxos in practice, showing how a designated leader simplifies log synchronization, reduces the number of Prepare phases, and enables automatic leader election, log confirmation, and replay optimizations such as sliding windows and “re‑confirmation” of uncertain logs.
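To show why a designated leader reduces the number of Prepare phases, the following simplified sketch lets one Prepare round cover every LogID, after which each log entry needs only an Accept round. All class and method names are hypothetical. It deliberately omits the re-confirmation of entries accepted under a previous leader, persistence, networking, and sliding windows, so it is an illustration of the idea rather than the article's implementation.

```python
from typing import Dict, List, Tuple


class LogAcceptor:
    """Acceptor whose single promise covers all LogIDs (a simplification)."""

    def __init__(self) -> None:
        self.promised_id = 0
        self.accepted: Dict[int, Tuple[int, str]] = {}  # log_id -> (id, value)

    def on_prepare(self, proposal_id: int) -> bool:
        if proposal_id <= self.promised_id:
            return False
        self.promised_id = proposal_id
        return True

    def on_accept(self, proposal_id: int, log_id: int, value: str) -> bool:
        if proposal_id < self.promised_id:
            return False
        self.promised_id = proposal_id
        self.accepted[log_id] = (proposal_id, value)
        return True


class Leader:
    def __init__(self, proposal_id: int, acceptors: List[LogAcceptor]) -> None:
        self.proposal_id = proposal_id
        self.acceptors = acceptors
        self.majority = len(acceptors) // 2 + 1
        self.next_log_id = 1
        self.prepare_rounds = 0
        self.elected = False

    def elect(self) -> bool:
        """One Prepare round; success makes this node the leader."""
        self.prepare_rounds += 1
        grants = sum(a.on_prepare(self.proposal_id) for a in self.acceptors)
        self.elected = grants >= self.majority
        return self.elected

    def append(self, value: str) -> int:
        """Replicate one entry with an Accept round only (no Prepare)."""
        if not self.elected:
            raise RuntimeError("not the leader")
        log_id = self.next_log_id
        acks = sum(
            a.on_accept(self.proposal_id, log_id, value) for a in self.acceptors
        )
        if acks < self.majority:
            self.elected = False          # a newer leader has taken over
            raise RuntimeError("lost leadership")
        self.next_log_id += 1
        return log_id


acceptors = [LogAcceptor() for _ in range(3)]
leader = Leader(proposal_id=1, acceptors=acceptors)
leader.elect()
print([leader.append(v) for v in ("a", "b", "c")], leader.prepare_rounds)
```

In this sketch a second node that completes Prepare with a higher proposal ID causes the old leader's next Accept round to fail, which is how leadership changes hands without any extra mechanism.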

A clock‑driven variant of the leader election protocol designed by Alibaba’s Yang Zhenkun is analyzed, detailing its three‑phase voting window, timing calculations involving clock drift (Tdiff) and network latency (Tst), and the conditions under which the algorithm may suffer split‑brain or double‑leader situations.

The piece concludes with a Q&A section that compares ZooKeeper’s ZAB protocol with Paxos, discusses the feasibility of global Paxos deployments, contrasts Raft with Multi‑Paxos, and explains testing strategies for verifying Paxos implementations in OceanBase.
