How AliSQL X‑Cluster Achieves Strong Consistency and Global Scalability

AliSQL X‑Cluster is Alibaba's MySQL‑compatible distributed database built around the X‑Paxos consensus protocol. It provides strong consistency, multi‑region deployment, low‑cost replica types, asynchronous transaction commit and hotspot‑update optimizations, delivers better performance than native MySQL and Group Replication, and offers flexible online configuration and robust failover mechanisms.

Alibaba Cloud Developer

Aug 9, 2017

Introduction

Since its inception, MySQL has been popular for its simplicity, ease of use and open‑source nature, becoming a first‑choice database for many developers. In 2008 Alibaba launched its "去IOE" ("de‑IOE") initiative to move away from IBM, Oracle and EMC products, replacing commercial Oracle databases with a heavily customized MySQL branch called AliSQL. Rapid business growth and the need for cross‑region active‑active deployments later exposed the limits of the traditional master‑slave architecture, prompting the development of a new distributed solution.

AliSQL X‑Cluster Overview

AliSQL X‑Cluster (referred to as X‑Cluster) is a MySQL 5.7‑compatible distributed database product that provides strong data consistency and supports global deployment.

Core Consistency Protocol – X‑Paxos

X‑Paxos is a high‑performance consensus protocol library developed in‑house at Alibaba to fill a gap left by existing open‑source solutions. Integrated into the database kernel, X‑Paxos replaces the original replication module and gives X‑Cluster automatic leader election, log synchronization, and strong consistency across the cluster.

Architecture

X‑Cluster follows a single‑writer, multi‑reader model. At any moment only one node (the Leader) handles write operations, avoiding write‑write conflicts and delivering higher throughput. Each instance runs as a single process with X‑Paxos deeply integrated, using a customized MySQL binlog as the consensus log.

Transaction Commit and Replication

During the prepare phase, the Leader collects the transaction's log and hands it to the X‑Paxos layer. Once a majority of nodes have persisted the log, X‑Paxos signals that the transaction may proceed to the commit phase. If the Leader fails during this window, the state of the Paxos log determines whether the in‑flight transaction is rolled back. Followers receive the log, append it to their own consensus log, and replay it through parallel group‑commit pipelines, using multithreading, asynchronous batching and pipelining to reduce latency.
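As a rough illustration of the majority rule described above, the sketch below computes the highest log index that a majority of nodes have persisted; transactions at or below that index may move to the commit phase. The function name and node labels are hypothetical and are not part of X‑Paxos.

```python
from typing import Dict


def majority_commit_index(persisted: Dict[str, int]) -> int:
    """Return the highest log index persisted on a majority of nodes.

    `persisted` maps a node name to the last log index that node has
    durably written (the Leader counts as one of the nodes).
    """
    indexes = sorted(persisted.values(), reverse=True)
    quorum = len(indexes) // 2 + 1
    # The quorum-th largest value is held by at least `quorum` nodes.
    return indexes[quorum - 1]


# Hypothetical three-node cluster: the Leader is ahead of both followers.
persisted = {"leader": 12, "follower_a": 10, "follower_b": 7}
print(majority_commit_index(persisted))  # prints 10
```

In this example, entries up to index 10 are on two of three nodes, so transactions covered by those entries can commit while entries 11 and 12 keep waiting.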

Failover

As long as a majority of nodes remain alive, the cluster continues serving requests. If the Leader fails, X‑Paxos automatically triggers an election and selects a new Leader according to the configured priorities.

Optimization Features

Cross‑Region Deployment

X‑Cluster can be deployed across regions while maintaining strong consistency. Even if an entire city‑level data center fails, the cluster remains available as long as a majority of nodes survive. The design leverages X‑Paxos and asynchronous worker threads to keep latency low over high‑RTT networks.

Dynamic Configuration and Election

Cluster configuration can be changed online without service interruption. Supported operations include adding and removing nodes, changing a node's type (consistency or read‑only), adjusting node priorities, transferring leadership to another node, and updating the replication source of read‑only nodes. All changes are recorded via Paxos and applied safely.

Node Priorities

Nodes can be assigned weights to influence leader selection. During elections, higher‑weight nodes are chosen earlier, and a weight‑check mechanism can trigger a leader transfer if a higher‑weight node becomes available, preventing undesirable failover to low‑spec nodes.
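The weight mechanism can be pictured with a small sketch. It assumes that an election needs a live majority and that the highest‑weight live node wins; a real election would also have to check that the candidate's log is up to date, which is ignored here. All names are hypothetical.

```python
from typing import Dict, Optional, Set


def pick_leader(weights: Dict[str, int], alive: Set[str]) -> Optional[str]:
    """Pick the highest-weight live node, or None without a live majority."""
    quorum = len(weights) // 2 + 1
    candidates = [node for node in weights if node in alive]
    if len(candidates) < quorum:
        return None
    # Ties are broken by node name so the result is deterministic.
    return max(candidates, key=lambda node: (weights[node], node))


def should_transfer(current: str, weights: Dict[str, int], alive: Set[str]) -> bool:
    """Weight check: True if a live node outranks the current leader."""
    best = pick_leader(weights, alive)
    return best is not None and weights[best] > weights[current]


weights = {"node_a": 9, "node_b": 5, "node_c": 1}
print(pick_leader(weights, {"node_b", "node_c"}))                # node_b
print(should_transfer("node_b", weights, set(weights)))          # True
```

Here node_b takes over while node_a is down, and the weight check flags a transfer back once node_a returns.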

Strategy‑Based Majority Replication

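Replication to each node can be classified as strong or weak. With strong replication, a transaction is considered committed only after its log has been persisted on all strong‑replica nodes; weak replicas carry no such requirement. This provides a “max‑protection” mode: because every strong replica holds all committed logs, committed data survives the failure of a strong replica.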
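One way to read this rule is as an extra condition on top of the ordinary majority check. The sketch below encodes that reading; whether X‑Cluster combines the two conditions exactly this way is an assumption, and the function and node names are hypothetical.

```python
from typing import Set


def is_committed(acked: Set[str], all_nodes: Set[str], strong: Set[str]) -> bool:
    """Return True if a log entry may be treated as committed.

    Assumed rule: a majority of all nodes has persisted the entry AND
    every strong replica is among the nodes that persisted it.
    """
    quorum = len(all_nodes) // 2 + 1
    has_majority = len(acked & all_nodes) >= quorum
    strong_satisfied = strong <= acked
    return has_majority and strong_satisfied


nodes = {"n1", "n2", "n3", "n4", "n5"}
strong = {"n1", "n4"}  # hypothetical: one strong replica per city

print(is_committed({"n1", "n2", "n3"}, nodes, strong))  # False: n4 missing
print(is_committed({"n1", "n2", "n4"}, nodes, strong))  # True
```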

Low‑Cost Replica Management

Replica types are divided into Normal (full data and state machine) and Log (stores only the consensus log). Log replicas are cheaper in storage and CPU, making them ideal for disaster‑recovery nodes.

Read‑Only Node Management

Read‑only nodes belong to a Learner Source Group. If their upstream consistency node fails, the group reassigns the read‑only node to another healthy node, ensuring continuous data replication.
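The reassignment step might look like the following sketch. It assumes a simple least‑loaded policy for choosing the new upstream node; the source does not describe the actual selection policy, and all names are hypothetical.

```python
from collections import Counter
from typing import Dict, Set


def reassign_learners(sources: Dict[str, str], healthy: Set[str]) -> Dict[str, str]:
    """Move read-only nodes off failed upstream nodes.

    `sources` maps each read-only node to its current upstream node.
    Nodes whose upstream is not in `healthy` are moved to the healthy
    node that currently feeds the fewest read-only nodes.
    """
    result = dict(sources)
    if not healthy:
        return result  # nothing to fail over to
    load = Counter(src for src in result.values() if src in healthy)
    for learner in sorted(result):
        if result[learner] not in healthy:
            target = min(sorted(healthy), key=lambda node: load[node])
            result[learner] = target
            load[target] += 1
    return result


sources = {"ro1": "n1", "ro2": "n1", "ro3": "n2"}
print(reassign_learners(sources, healthy={"n2", "n3"}))
# {'ro1': 'n3', 'ro2': 'n2', 'ro3': 'n2'}
```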

High‑Performance Consensus Log

The consensus log offers basic operations (Append, Get, Truncate, Purge) controlled by X‑Paxos, with indexing, caching and pre‑read mechanisms that dramatically improve log handling efficiency.
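To make the four operations concrete, here is a minimal in‑memory sketch. It assumes that Truncate drops a suffix of the log (for example, entries that conflict with a new Leader) and that Purge drops a prefix that is no longer needed; the real consensus log is built on the binlog and adds indexing, caching and pre‑reading, none of which is modeled here.

```python
from typing import List


class ConsensusLog:
    """Toy log with 1-based, monotonically increasing indexes."""

    def __init__(self) -> None:
        self._first = 1                  # index of the oldest retained entry
        self._entries: List[bytes] = []  # _entries[i] has index _first + i

    @property
    def last_index(self) -> int:
        return self._first + len(self._entries) - 1

    def append(self, payload: bytes) -> int:
        """Add an entry at the tail and return its index."""
        self._entries.append(payload)
        return self.last_index

    def get(self, index: int) -> bytes:
        if not self._first <= index <= self.last_index:
            raise IndexError(f"log index {index} is not retained")
        return self._entries[index - self._first]

    def truncate(self, index: int) -> None:
        """Drop the entry at `index` and everything after it."""
        if index < self._first:
            raise IndexError("cannot truncate into the purged prefix")
        del self._entries[index - self._first:]

    def purge(self, index: int) -> None:
        """Drop every entry before `index`."""
        drop = min(max(index - self._first, 0), len(self._entries))
        del self._entries[:drop]
        self._first += drop


log = ConsensusLog()
for payload in (b"a", b"b", b"c", b"d"):
    log.append(payload)
log.truncate(4)   # entry 4 is discarded
log.purge(2)      # entry 1 is discarded
print(log.get(2), log.last_index)  # b'b' 3
```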

Asynchronous Transaction Commit

Transaction processing is split into two stages: a waiting‑sync queue (awaiting Paxos majority) and a waiting‑commit queue (ready to be committed). Worker threads that would otherwise block on log synchronization are freed to handle new client requests, yielding a 10% throughput gain in same‑city deployments and several‑fold improvements across regions.
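The two‑queue idea can be sketched in a single‑threaded form as follows. The class and method names are hypothetical, and the real implementation runs inside the MySQL server with dedicated threads; the sketch only shows how transactions move from one queue to the other once the majority‑persisted index advances.

```python
from collections import deque
from typing import Deque, List, Tuple


class AsyncCommitPipeline:
    def __init__(self) -> None:
        # (log_index, txn_id) pairs still waiting for a Paxos majority.
        self.waiting_sync: Deque[Tuple[int, str]] = deque()
        # Transaction ids whose logs reached a majority.
        self.waiting_commit: Deque[str] = deque()
        self.committed: List[str] = []

    def submit(self, log_index: int, txn_id: str) -> None:
        """Called by a worker thread, which then returns immediately."""
        self.waiting_sync.append((log_index, txn_id))

    def on_majority_persisted(self, commit_index: int) -> None:
        """Move every transaction at or below commit_index forward."""
        while self.waiting_sync and self.waiting_sync[0][0] <= commit_index:
            _, txn_id = self.waiting_sync.popleft()
            self.waiting_commit.append(txn_id)

    def drain_commits(self) -> None:
        """Finish the commit phase for all ready transactions, in order."""
        while self.waiting_commit:
            self.committed.append(self.waiting_commit.popleft())


pipeline = AsyncCommitPipeline()
for index, txn in enumerate(["t1", "t2", "t3"], start=1):
    pipeline.submit(index, txn)
pipeline.on_majority_persisted(commit_index=2)
pipeline.drain_commits()
print(pipeline.committed, len(pipeline.waiting_sync))  # ['t1', 't2'] 1
```

The point of the design is that `submit` never waits: the thread that ran the transaction is free again while t3 is still in flight over the network.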

Hotspot Update Optimization

A new hotspot row lock allows concurrent updates on the same row. X‑Cluster batches hotspot updates, marks them with special log flags, and merges their logs, achieving up to 200× performance improvement for hotspot workloads.
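The benefit of batching can be illustrated with a simplified model. It assumes the hotspot updates are commutative increments (such as inventory decrements), so that many updates to one row collapse into a single merged record; how X‑Cluster actually flags and merges these log entries is not specified in the source, and the function below is purely hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def merge_hotspot_updates(updates: Iterable[Tuple[str, int]]) -> Dict[str, int]:
    """Collapse a batch of (row_id, delta) updates into one delta per row."""
    merged: Dict[str, int] = defaultdict(int)
    for row_id, delta in updates:
        merged[row_id] += delta
    return dict(merged)


batch = [("sku_1", -1), ("sku_1", -2), ("sku_2", -1), ("sku_1", -1)]
print(merge_hotspot_updates(batch))  # {'sku_1': -4, 'sku_2': -1}
```

Four row updates become two merged records in this example, which is the kind of reduction that lowers both lock contention and log volume on hot rows.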

Integrated Client‑Server Ecosystem

The X‑Driver client subscribes to server changes, automatically discovers leaders, and maintains up‑to‑date instance lists without external metadata services.

Backup and Data Subscription

Leveraging X‑Paxos’s globally unique log positions, backup and subscription services can reliably consume logs, providing real‑time data replication and robust failover handling.

Deployment Scenarios

Two classic deployment patterns are highlighted:

Same‑city three‑node deployment (2 data nodes + 1 log node) for zero data loss and data‑center‑level disaster recovery.

Cross‑region five‑node deployment (4 data nodes + 1 log node) for city‑level disaster recovery.
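The disaster‑recovery claims for both patterns follow from simple majority arithmetic, sketched below. The node placement (one node per data center in the same‑city case, a 2 + 2 + 1 split across three cities in the cross‑region case) is an assumption made for illustration, as is the premise that log nodes count toward the majority in the same way as data nodes.

```python
from typing import Dict


def quorum_size(total_nodes: int) -> int:
    """Smallest number of nodes that forms a majority."""
    return total_nodes // 2 + 1


def survives_site_loss(layout: Dict[str, int], lost_site: str) -> bool:
    """True if the nodes outside `lost_site` still form a majority."""
    total = sum(layout.values())
    remaining = total - layout[lost_site]
    return remaining >= quorum_size(total)


# Hypothetical placements: site name -> number of nodes at that site.
same_city = {"dc_a": 1, "dc_b": 1, "dc_c": 1}
cross_region = {"city_a": 2, "city_b": 2, "city_c": 1}

print(all(survives_site_loss(same_city, s) for s in same_city))        # True
print(all(survives_site_loss(cross_region, s) for s in cross_region))  # True
```

Under these assumptions, the three‑node cluster keeps a majority of 2 after losing any one data center, and the five‑node cluster keeps a majority of 3 after losing any one city.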

Performance Evaluation

Benchmarks were conducted using Sysbench (insert/OLTP) on three‑node clusters in both same‑city and cross‑region networks. Compared with MySQL 5.7.19 and Group Replication, X‑Cluster showed:

More than double the throughput and ~55% lower latency for insert workloads in same‑city tests.

~5% higher throughput and ~70% lower latency for OLTP workloads.

In cross‑region tests, X‑Cluster outperformed Group Replication by up to 5× in throughput and achieved roughly one‑quarter of its latency.

Correctness Assurance

Extensive gray‑box testing, fault injection (network partitions, I/O errors, node crashes) and Jepsen tests are used to validate linearizability, isolation and consistency. Continuous data and log verification, along with automated benchmark stress, ensure robustness before production release.

Comparison with Similar Solutions

Galera

Galera uses a Totem multicast protocol (P2P) for multi‑master writes, which suffers from increased latency as node count grows and can become unavailable during failover. It also relies on a dedicated gcache for incremental state transfer, incurring extra storage and compute overhead.

Group Replication

MySQL Group Replication, built on XCom and GTIDs, supports multi‑master writes but is limited to nine nodes, replicates via the binlog, and still exhibits performance and stability issues in cross‑region deployments.

Conclusion

X‑Cluster delivers a strong‑consistency, high‑performance distributed database solution tailored for workloads with stringent data quality requirements. Its autonomous consensus layer, flexible deployment options, and extensive testing make it a compelling choice for large‑scale, globally distributed applications. Planned future work includes multi‑shard Paxos, strong‑read capabilities and further optimizations.
