How AliSQL X‑Cluster Achieves Strong Consistency and Global Scalability
AliSQL X‑Cluster is Alibaba's MySQL‑compatible distributed database built on the X‑Paxos consensus protocol. It provides strong consistency, multi‑region deployment, low‑cost replica types, asynchronous transaction commit and hotspot‑update optimizations. In the benchmarks summarized here it outperforms both native MySQL and Group Replication, and it supports online configuration changes and automatic failover.
Introduction
Since its inception MySQL has been popular for its simplicity, ease of use and open‑source nature, becoming a first‑choice database for many developers. Alibaba launched its "de‑IOE" initiative (moving away from IBM, Oracle and EMC products) in 2008, replacing commercial Oracle with a heavily customized MySQL branch called AliSQL. Rapid business growth and the need for cross‑region active‑active deployments exposed the limits of traditional master‑slave architectures, prompting the development of a new distributed solution.
AliSQL X‑Cluster Overview
AliSQL X‑Cluster (referred to as X‑Cluster) is a MySQL 5.7‑compatible distributed database product that provides strong data consistency and supports global deployment.
Core Consistency Protocol – X‑Paxos
X‑Paxos is Alibaba’s self‑developed high‑performance consensus protocol library that fills the gap left by existing open‑source solutions. By integrating X‑Paxos into the database kernel, X‑Cluster replaces the original replication module, enabling automatic leader election, log synchronization, and strong consistency across the cluster.
Architecture
X‑Cluster follows a single‑writer, multi‑reader model. At any moment only one node (the Leader) handles write operations, avoiding write‑write conflicts and delivering higher throughput. Each instance runs as a single process with X‑Paxos deeply integrated, using a customized MySQL binlog as the consensus log.
Transaction Commit and Replication
During the prepare phase, the Leader gathers the transaction's log and hands it to the X‑Paxos layer. Once a majority of nodes have persisted the log, X‑Paxos notifies the transaction that it may enter the commit phase. If the Leader fails during this window, prepared transactions are rolled back according to the state of the Paxos log. Followers receive the log, append it to their consensus log, and replay it through parallel group‑commit pipelines; multithreading, asynchronous batching and pipelining keep replication latency low.
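The majority rule described above can be sketched in a few lines. This is an illustration only, not X‑Paxos code: the `QuorumTracker` class, its method names and the contiguous‑prefix rule for advancing the commit point are assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class LogEntry:
    index: int
    payload: bytes
    acked_by: set = field(default_factory=set)  # node ids that persisted this entry


class QuorumTracker:
    """Hypothetical helper: tracks which entries a majority has persisted."""

    def __init__(self, node_ids):
        self.node_ids = set(node_ids)
        self.quorum = len(self.node_ids) // 2 + 1
        self.entries = {}
        self.commit_index = 0  # highest index known to be majority-persisted

    def append(self, index, payload, leader_id):
        entry = LogEntry(index, payload)
        entry.acked_by.add(leader_id)  # the Leader counts its own persisted copy
        self.entries[index] = entry

    def ack(self, index, node_id):
        """Record a follower's persistence ack and return the commit index."""
        if node_id in self.node_ids and index in self.entries:
            self.entries[index].acked_by.add(node_id)
        # Assumption: the commit point only advances over a contiguous prefix.
        while True:
            nxt = self.entries.get(self.commit_index + 1)
            if nxt is None or len(nxt.acked_by) < self.quorum:
                break
            self.commit_index += 1
        return self.commit_index
```

A transaction whose log sits at index `i` would be allowed into the commit phase once `commit_index >= i`.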
Failover
If a majority of nodes remain alive, the cluster continues serving requests. Leader failure automatically triggers a new election driven by X‑Paxos, which selects a new Leader based on configured priorities.
Optimization Features
Cross‑Region Deployment
X‑Cluster can be deployed across regions while maintaining strong consistency. Even if an entire city‑level data center fails, the cluster remains available as long as a majority of nodes survive. The design leverages X‑Paxos and asynchronous worker threads to keep latency low over high‑RTT networks.
Dynamic Configuration and Election
Cluster configuration can be changed online without service interruption. Supported operations include adding/removing nodes, changing node types (consistency or read‑only), adjusting node priorities, modifying the Leader, and updating read‑only replication sources. All changes are recorded via Paxos and applied safely.
Node Priorities
Nodes can be assigned weights to influence leader selection. During an election, higher‑weight nodes are preferred as the new Leader. A weight‑check mechanism can also trigger a leader transfer when a higher‑weight node becomes available again, so that a low‑spec node does not remain Leader after a failover.
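As a rough sketch of how weights might steer leadership, the example below picks the highest‑weight live node and decides whether a transfer is warranted. The function names and the name‑based tie‑break are hypothetical, and the sketch deliberately ignores the voting rounds and log up‑to‑dateness checks that a real consensus election performs.

```python
def pick_leader(weights, alive):
    """Return the live node with the highest weight, or None without a majority.

    weights: dict of node id -> weight (hypothetical configuration shape)
    alive:   set of node ids currently reachable
    """
    candidates = [node for node in weights if node in alive]
    if len(candidates) < len(weights) // 2 + 1:
        return None  # no majority, so no Leader can be elected
    # Assumption: ties on weight are broken by node id for determinism.
    return max(candidates, key=lambda node: (weights[node], node))


def should_transfer(weights, alive, current_leader):
    """Weight check: True if a strictly higher-weight live node exists."""
    best = pick_leader(weights, alive)
    return (
        best is not None
        and best != current_leader
        and weights[best] > weights[current_leader]
    )
```

For example, with weights `{"a": 9, "b": 5, "c": 1}`, node `b` takes over while `a` is down, and `should_transfer` turns true for `b` once `a` is reachable again.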
Strategy‑Based Majority Replication
Replication can be classified as strong or weak. Strong replication requires logs to be persisted on all strong‑replica nodes before the transaction is considered committed, providing a “max‑protection” mode that tolerates strong‑replica failures.
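One way to read this policy is as a commit predicate with two conditions. The sketch below assumes that a transaction needs both a plain majority and an acknowledgement from every strong replica; the function name and argument shapes are invented for illustration.

```python
def is_committed(acked, all_nodes, strong_nodes):
    """Return True when a log entry satisfies the strategy-based majority.

    acked:        node ids that have persisted the entry
    all_nodes:    every voting node in the cluster
    strong_nodes: subset configured for strong replication
    """
    all_nodes = set(all_nodes)
    acked = set(acked) & all_nodes  # ignore acks from unknown nodes
    has_majority = len(acked) >= len(all_nodes) // 2 + 1
    # Assumption: every strong replica must hold the entry, on top of the majority.
    return has_majority and set(strong_nodes) <= acked
```

With no strong replicas configured, the predicate reduces to the ordinary majority rule.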
Low‑Cost Replica Management
Replica types are divided into Normal (full data and state machine) and Log (stores only the consensus log). Log replicas are cheaper in storage and CPU, making them ideal for disaster‑recovery nodes.
Read‑Only Node Management
Read‑only nodes belong to a Learner Source Group. If their upstream consistency node fails, the group reassigns the read‑only node to another healthy node, ensuring continuous data replication.
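The reassignment step could look roughly like the following. The source does not say how a replacement upstream is chosen, so the least‑loaded policy here is purely an assumption, as are the function name and data shapes.

```python
from collections import Counter


def reassign_learners(sources, healthy):
    """Return a new learner -> upstream mapping that avoids failed upstreams.

    sources: dict of read-only node id -> current upstream node id
    healthy: collection of consistency nodes that are still alive
    """
    if not healthy:
        return dict(sources)  # nothing to fail over to; keep the mapping
    load = Counter(up for up in sources.values() if up in healthy)
    result = {}
    for learner, upstream in sorted(sources.items()):
        if upstream not in healthy:
            # Assumption: pick the healthy node currently serving the fewest learners.
            upstream = min(healthy, key=lambda node: (load[node], node))
            load[upstream] += 1
        result[learner] = upstream
    return result
```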
High‑Performance Consensus Log
The consensus log offers basic operations (Append, Get, Truncate, Purge) controlled by X‑Paxos, with indexing, caching and pre‑read mechanisms that dramatically improve log handling efficiency.
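To make the four operations concrete, here is a minimal in‑memory sketch. The exact semantics are assumed rather than taken from X‑Cluster: Truncate is taken to discard a suffix (for example, entries that never reached a majority) and Purge to discard an old prefix to reclaim space. The real log is binlog‑based and adds indexing, caching and pre‑reading, none of which is modelled here.

```python
class ConsensusLog:
    """Hypothetical in-memory log with 1-based, gap-free indexes."""

    def __init__(self):
        self._entries = []     # retained payloads, oldest first
        self._first_index = 1  # log index of self._entries[0]

    @property
    def last_index(self):
        return self._first_index + len(self._entries) - 1

    def append(self, payload):
        """Add an entry at the tail and return its index."""
        self._entries.append(payload)
        return self.last_index

    def get(self, index):
        if not self._first_index <= index <= self.last_index:
            raise IndexError(f"log index {index} is not retained")
        return self._entries[index - self._first_index]

    def truncate(self, index):
        """Drop the entry at `index` and everything after it."""
        keep = max(index - self._first_index, 0)
        del self._entries[keep:]

    def purge(self, index):
        """Drop every entry before `index`."""
        drop = min(max(index - self._first_index, 0), len(self._entries))
        del self._entries[:drop]
        self._first_index += drop
```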
Asynchronous Transaction Commit
Transaction processing is split into two stages: a waiting‑sync queue (awaiting Paxos majority) and a waiting‑commit queue (ready to be committed). Worker threads that would otherwise block on log synchronization are freed to handle new client requests, yielding a 10 % throughput gain in same‑city deployments and several‑fold improvements across regions.
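The two‑queue handoff can be sketched without real threads. In this hypothetical model a worker calls `submit` and returns at once, a consensus callback moves transactions forward as the majority‑persisted index advances, and a separate committer drains the second queue. All names are invented for the example.

```python
from collections import deque


class AsyncCommitPipeline:
    """Illustrative model of the waiting-sync and waiting-commit queues."""

    def __init__(self):
        self.waiting_sync = deque()    # (log_index, txn_id), in log order
        self.waiting_commit = deque()  # txn ids whose log reached a majority
        self.committed = []

    def submit(self, log_index, txn_id):
        """Worker thread parks the transaction and is free for new requests."""
        self.waiting_sync.append((log_index, txn_id))

    def on_majority_persisted(self, commit_index):
        """Consensus layer reports that entries up to commit_index are safe."""
        while self.waiting_sync and self.waiting_sync[0][0] <= commit_index:
            _, txn_id = self.waiting_sync.popleft()
            self.waiting_commit.append(txn_id)

    def run_committer(self):
        """Commit everything that is ready; return the commit order so far."""
        while self.waiting_commit:
            self.committed.append(self.waiting_commit.popleft())
        return list(self.committed)
```

Because the worker never waits on the network round trip, the benefit grows with RTT, which is consistent with the larger gains reported for cross‑region deployments.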
Hotspot Update Optimization
A new hotspot row lock allows concurrent updates on the same row. X‑Cluster batches hotspot updates, marks them with special log flags, and merges their logs, achieving up to 200× performance improvement for hotspot workloads.
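The log‑merging idea can be illustrated with a toy batcher. It assumes the hotspot updates are commutative deltas (such as inventory decrements) and uses an invented record layout with a `hotspot` flag; how X‑Cluster actually encodes its log flags is not described in the source.

```python
def merge_hotspot_updates(updates):
    """Collapse a batch of (row_key, delta) updates into one record per row.

    Returns records in first-seen order. A record is flagged as a hotspot
    when it merges more than one update (an assumption for this sketch).
    """
    merged = {}
    for row_key, delta in updates:
        record = merged.setdefault(
            row_key, {"row": row_key, "delta": 0, "merged": 0, "hotspot": False}
        )
        record["delta"] += delta
        record["merged"] += 1
        record["hotspot"] = record["merged"] > 1
    return list(merged.values())
```

Merging turns many log entries and lock acquisitions on one row into a single entry per batch, which is where the bulk of the saving would come from.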
Integrated Client‑Server Ecosystem
The X‑Driver client subscribes to server changes, automatically discovers leaders, and maintains up‑to‑date instance lists without external metadata services.
Backup and Data Subscription
Leveraging X‑Paxos’s globally unique log positions, backup and subscription services can reliably consume logs, providing real‑time data replication and robust failover handling.
Deployment Scenarios
Two classic deployment patterns are highlighted:
Same‑city three‑node deployment (2 data nodes + 1 log node) for zero data loss and data‑center‑level disaster recovery.
Cross‑region five‑node deployment (4 data nodes + 1 log node) for city‑level disaster recovery.
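The fault tolerance of both patterns follows from majority arithmetic, sketched below. The site layouts passed to `survives_any_site_loss` (three data centers with one node each, and a 2 + 2 + 1 split across three cities) are assumptions for illustration; the source gives only node counts.

```python
def quorum(n):
    """Smallest majority of n voting nodes."""
    return n // 2 + 1


def tolerated_failures(n):
    """Number of nodes that can fail while a majority still survives."""
    return n - quorum(n)


def survives_any_site_loss(layout):
    """True if losing any single site still leaves a majority.

    layout: dict of site name -> number of voting nodes at that site
    """
    total = sum(layout.values())
    return all(total - count >= quorum(total) for count in layout.values())
```

A three‑node cluster tolerates one failed node and a five‑node cluster tolerates two. Placing two of three nodes in the same data center would break data‑center‑level recovery, because losing that site removes the majority.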
Performance Evaluation
Benchmarks were conducted using Sysbench (insert/OLTP) on three‑node clusters in both same‑city and cross‑region networks. Compared with MySQL 5.7.19 and Group Replication, X‑Cluster showed:
More than double the throughput and ~55 % lower latency for insert workloads in same‑city tests.
~5 % higher throughput and ~70 % lower latency for OLTP workloads.
In cross‑region tests, X‑Cluster outperformed Group Replication by up to 5× in throughput and achieved roughly one‑quarter of its latency.
Correctness Assurance
Extensive gray‑box testing, fault injection (network partitions, I/O errors, node crashes) and Jepsen tests are used to validate linearizability, isolation and consistency. Continuous data and log verification, along with automated benchmark stress, ensure robustness before production release.
Comparison with Similar Solutions
Galera
Galera uses a Totem multicast protocol (P2P) for multi‑master writes, which suffers from increased latency as node count grows and can become unavailable during failover. It also relies on a dedicated gcache for incremental state transfer, incurring extra storage and compute overhead.
Group Replication
MySQL Group Replication, based on XCom and GTIDs, supports multi‑master writes but is limited to nine nodes, uses binlog replication, and still exhibits performance and stability issues in cross‑region deployments.
Conclusion
X‑Cluster delivers a strong‑consistency, high‑performance distributed database solution tailored for workloads with stringent data quality requirements. Its autonomous consensus layer, flexible deployment options, and extensive testing make it a compelling choice for large‑scale, globally distributed applications, with future plans to add multi‑shard Paxos, strong‑read capabilities and further optimizations.
Alibaba Cloud Developer
Alibaba's official tech channel, featuring all of its technology innovations.
