How X‑Paxos Transforms Distributed Consensus for High‑Performance Databases
X‑Paxos is Alibaba’s high‑performance, independently designed Paxos library that extends the classic consensus algorithm with multi‑threaded architecture, pluggable logging, adaptive batching and pipelining, and flexible node roles, delivering strong consistency, high availability, and low latency for global distributed databases and services.
Paxos is a foundational distributed consensus algorithm, widely regarded as the de‑facto protocol for achieving strong consistency and high availability in distributed systems. X‑Paxos is Alibaba’s independent implementation of a high‑performance Paxos library, built to meet the demands of global deployment, high throughput, and the specific characteristics of Alibaba’s services.
Background – While Paxos has been studied for over 17 years, mature open‑source independent libraries remain scarce. Existing solutions such as Google’s internal implementations, Facebook’s undisclosed systems, and Apache ZooKeeper either lack high‑throughput state‑machine replication or do not provide a standalone library for rapid integration.
Vision – X‑Paxos aims to provide a production‑tested, highly reliable independent Paxos library that can be easily integrated into backend services to obtain strong consistency, high availability, and automatic disaster recovery, making the traditionally complex Paxos algorithm approachable for a wide range of applications.
Architecture
The overall architecture consists of four layers: network layer, service layer, algorithm module, and log module.
Network Layer – Built on Alibaba’s mature libeasy library, providing asynchronous networking and a customized reconnection mechanism suitable for distributed protocols.
Service Layer – A C++11‑based multithreaded asynchronous framework that offers event‑driven execution, timer callbacks, and a flexible worker model, eliminating the CPU bottleneck of single‑threaded designs.
Algorithm Module – Implements a unique‑proposer multi‑Paxos design, offering better performance than basic Paxos and supporting extensive functional and performance enhancements tailored to Alibaba’s workloads.
Log Module – Decoupled from the algorithm to allow pluggable high‑performance logging implementations; users can integrate existing WAL systems to avoid redundant storage and improve throughput.
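To make the decoupling of the log module concrete, here is a minimal sketch of what a pluggable log interface could look like. The names `PaxosLog`, `LogEntry`, and `InMemoryLog` are hypothetical and chosen for illustration only; they are not taken from the X‑Paxos source.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical types for illustration; these are not X-Paxos's real API.
struct LogEntry {
  std::uint64_t term = 0;
  std::uint64_t index = 0;
  std::string payload;
};

// The consensus algorithm would depend only on this abstract interface,
// so an existing WAL could be plugged in by implementing it.
class PaxosLog {
 public:
  virtual ~PaxosLog() = default;
  // Appends an entry and returns its 1-based index.
  virtual std::uint64_t append(std::uint64_t term, const std::string& payload) = 0;
  // Copies the entry at `index` into `out`; returns false if it does not exist.
  virtual bool get(std::uint64_t index, LogEntry* out) const = 0;
  // Index of the last entry, or 0 when the log is empty.
  virtual std::uint64_t lastIndex() const = 0;
  // Drops every entry whose index is >= `index`.
  virtual void truncateFrom(std::uint64_t index) = 0;
};

// Simplest possible backend: keeps entries in memory (useful for tests).
class InMemoryLog : public PaxosLog {
 public:
  std::uint64_t append(std::uint64_t term, const std::string& payload) override {
    LogEntry e;
    e.term = term;
    e.index = lastIndex() + 1;
    e.payload = payload;
    entries_.push_back(e);
    return e.index;
  }

  bool get(std::uint64_t index, LogEntry* out) const override {
    assert(out != nullptr);
    if (index == 0 || index > entries_.size()) return false;
    *out = entries_[static_cast<std::size_t>(index - 1)];
    return true;
  }

  std::uint64_t lastIndex() const override { return entries_.size(); }

  void truncateFrom(std::uint64_t index) override {
    const std::uint64_t keep = (index == 0) ? 0 : index - 1;
    if (keep < entries_.size()) entries_.resize(static_cast<std::size_t>(keep));
  }

 private:
  std::vector<LogEntry> entries_;  // entry with index i lives at entries_[i - 1]
};
```

A service that already maintains its own write-ahead log would supply a second implementation of the same interface that forwards to that log, which is how a design like this avoids writing the data twice.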
Feature Enhancements
Online node addition/removal and leader transfer.
Strategy‑based majority and weighted leader election, enabling user‑defined rules for disaster recovery.
Customizable node roles (Proposer/Acceptor/Learner) allowing trimmed‑down nodes for specific use cases.
Witness SDK that abstracts the Learner role as a data‑stream subscriber, facilitating downstream log consumption, backup, and configuration push.
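The article does not spell out how strategy‑based majority and weighted election are defined, so the following is only one plausible reading, written as a sketch. It assumes that some nodes can be marked as mandatory acknowledgers (`forceSync`) and that each node carries an election weight. The struct `Node` and the functions `quorumReached` and `preferredLeader` are hypothetical names, not X‑Paxos APIs.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical node descriptor; field names are assumptions for this sketch.
struct Node {
  std::uint64_t id;
  std::uint32_t electionWeight;  // higher weight = preferred as leader
  bool forceSync;                // this node's ack is mandatory for commit
  std::uint64_t lastLogIndex;
};

// A strategy-based majority under the assumed rule: a plain majority of
// acks, and in addition every forceSync node must have acknowledged.
bool quorumReached(const std::vector<Node>& nodes, const std::vector<bool>& acked) {
  assert(nodes.size() == acked.size());
  std::size_t acks = 0;
  for (std::size_t i = 0; i < nodes.size(); ++i) {
    if (acked[i]) {
      ++acks;
    } else if (nodes[i].forceSync) {
      return false;  // a mandatory node has not acknowledged yet
    }
  }
  return acks * 2 > nodes.size();
}

// Weighted election under the assumed rule: among nodes whose log reaches
// minLogIndex, prefer the highest weight and break ties by the lower id.
// Returns nullptr when no node is eligible.
const Node* preferredLeader(const std::vector<Node>& nodes, std::uint64_t minLogIndex) {
  const Node* best = nullptr;
  for (const Node& n : nodes) {
    if (n.lastLogIndex < minLogIndex) continue;
    if (best == nullptr || n.electionWeight > best->electionWeight ||
        (n.electionWeight == best->electionWeight && n.id < best->id)) {
      best = &n;
    }
  }
  return best;
}
```

In a disaster-recovery setup, rules of this kind would let an operator require an acknowledgment from a second region before commit and steer leadership toward the preferred data center, while the up-to-date-log check keeps the weight from overriding safety.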
Performance Optimizations
Adaptive batching and pipelining to maximize throughput over high‑latency networks. The relationship (M / R) × P = D guides the choice of batch size (M) and pipeline depth (P) for a given bandwidth (R) and propagation delay (D): each batch takes M / R to transmit, so P batches in flight keep the link busy for the whole delay D.
Multi‑threaded implementation that removes the single‑thread limitation of many Paxos libraries, achieving significantly higher per‑partition performance.
Locality‑aware content distribution that reduces load on the primary node and minimizes cross‑region bandwidth usage.
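The batching and pipelining relationship above can be turned into a small calculation. The sketch below solves (M / R) × P = D for either the pipeline depth or the batch size. The function names and the choice of units (bytes, bytes per second, seconds) are assumptions made for this example, and a real adaptive scheme would re-measure R and D at run time rather than take them as constants.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Pipeline depth P needed to fill the link: P = ceil(R * D / M).
// R in bytes/second, D in seconds, M in bytes (illustrative units).
std::size_t pipelineDepth(double bandwidthBytesPerSec, double delaySec, double batchBytes) {
  assert(bandwidthBytesPerSec > 0 && delaySec > 0 && batchBytes > 0);
  return static_cast<std::size_t>(
      std::ceil(bandwidthBytesPerSec * delaySec / batchBytes));
}

// Batch size M that fills the link at a fixed depth: M = R * D / P.
double batchSizeBytes(double bandwidthBytesPerSec, double delaySec, std::size_t depth) {
  assert(bandwidthBytesPerSec > 0 && delaySec > 0 && depth > 0);
  return bandwidthBytesPerSec * delaySec / static_cast<double>(depth);
}
```

As a worked case, take R = 125,000,000 bytes/s (roughly 1 Gbit/s) and D = 30 ms, so R × D = 3.75 MB must be in flight to keep the link busy. With 256 KiB batches (262,144 bytes), `pipelineDepth` gives ceil(14.3) = 15; conversely, fixing the depth at 15 gives a batch size of 250,000 bytes.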
new ThreadTimer(srv_->getThreadTimerService(), srv_, electionTimeout_,
                ThreadTimer::Oneshot, &Paxos::checkLeaderTransfer, this,
                targetId, currentTerm_.load(), log_->getLastLogIndex());
Correctness Verification
Integration with Jepsen to validate behavior under network partitions and failures.
Formal modeling with TLA+ to prove safety properties.
Automated random fault injection system and regression test suite for continuous reliability checks.
Competitor Analysis
Compared with XCOM (MySQL Group Replication) and phxpaxos, X‑Paxos demonstrates superior performance: over 100× higher throughput within a region and only a 3.5% throughput drop in cross‑region scenarios, whereas phxpaxos struggles under high latency.
Current Status and Future Work
X‑Paxos Phase 1 is already deployed in Alibaba’s AliSQL X‑Cluster and other internal services. Future directions include multi‑partition support built on a deeply shared asynchronous framework, strongly consistent reads across multiple nodes, and continued performance tuning for global deployments.
Alibaba Cloud Developer
Alibaba's official tech channel, featuring all of its technology innovations.
