Understanding Ant Financial’s LDC Architecture: Unitization, CAP Analysis, and OceanBase Design
The article explains how Ant Financial scales Double‑11 payment traffic to hundreds of thousands of TPS by employing logical data centers (LDC), unit‑based system design (RZone, GZone, CZone), database sharding, CAP theorem analysis, Paxos‑based consensus, and the OceanBase distributed database, while also detailing disaster‑recovery and traffic‑shifting mechanisms.
Since the first Double‑11 shopping festival in 2009, Ant Financial has continuously pushed the limits of its payment system. Peak throughput grew from 2 × 10⁴ transactions per minute (roughly 333 TPS) in 2010 to 5.44 × 10⁵ transactions per second in 2019, roughly a 1,600‑fold increase.
The core of this scalability is the Logical Data Center (LDC), a logical abstraction that treats geographically distributed resources as a unified data center, enabling massive horizontal scaling through unit‑based architecture.
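Unitization splits the whole system into independent units of three kinds. RZone (Region Zone) units hold user‑specific data, sharded by user ID, so each unit serves a fixed group of users. GZone (Global Zone) holds globally shared data that cannot be split by user and exists as a single copy. CZone (City Zone) units are deployed per city for shared data that is read far more often than it is written and can tolerate a short write‑to‑read delay (about 100 ms), so each city can serve those reads locally.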
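To make the three zone types concrete, here is a minimal sketch of how a request layer might decide where a piece of data lives. The data categories, the unit names (`GZ00`, `CZ-<city>`), and the two‑digit shard key are hypothetical illustrations, not Ant's actual catalog; only the 100 ms figure comes from the text above.

```python
from enum import Enum

# Approximate write-to-read delay that CZone data is assumed to tolerate (from the text).
REPLICATION_LAG_MS = 100


class Zone(Enum):
    RZONE = "RZone"  # user-specific data, sharded by user ID
    GZONE = "GZone"  # globally shared data, a single copy
    CZONE = "CZone"  # shared, read-heavy data replicated to every city


# Hypothetical placement catalog: data category -> zone type.
PLACEMENT = {
    "payment_order": Zone.RZONE,
    "account_balance": Zone.RZONE,
    "global_config": Zone.GZONE,
    "shared_reference_data": Zone.CZONE,
}


def resolve(category: str, user_id: str, city: str) -> str:
    """Return a hypothetical name for the unit that should serve this data."""
    zone = PLACEMENT[category]
    if zone is Zone.RZONE:
        # Assumed shard key: the last two digits of the user ID.
        return f"RZone(shard={int(user_id[-2:]):02d})"
    if zone is Zone.GZONE:
        return "GZ00"  # the single global unit
    return f"CZ-{city}"  # the caller's local city unit


def local_read_is_safe(ms_since_write: int) -> bool:
    """A local CZone replica is assumed current once the replication lag has elapsed."""
    return ms_since_write >= REPLICATION_LAG_MS
```

The design point this illustrates is that only RZone data needs per‑user routing; GZone and CZone lookups do not depend on the user ID at all.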
The system evolved from a monolithic single application, to multiple distributed application instances, then to master‑slave database clusters, and finally to full sharding with both horizontal and vertical database partitioning.
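Database bottlenecks led to sharding, i.e. splitting both databases and tables. The logic that maps each user to a database then moved from the application layer to the gateway layer. Instead of every application instance holding connections to every shard, which makes connection counts explode, the gateway sends each request to the unit that owns the user's shard, and each unit accesses only its own shards.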
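The connection savings from gateway‑level routing are easy to see with a little arithmetic. The sketch below assumes a hypothetical deployment (100 shards keyed by the last two digits of the user ID, 5 units, 500 application instances spread evenly across units, 10 pooled connections per shard); none of these numbers come from Ant.

```python
# Hypothetical deployment sizes, chosen only to make the arithmetic visible.
NUM_SHARDS = 100      # assumed shard key: last two digits of the user ID (00-99)
NUM_UNITS = 5         # number of RZone units
APP_INSTANCES = 500   # application instances, spread evenly across units
POOL_SIZE = 10        # connections each instance keeps per database shard

SHARDS_PER_UNIT = NUM_SHARDS // NUM_UNITS


def shard_of(user_id: str) -> int:
    """Assumed sharding rule: the last two digits of the user ID."""
    return int(user_id[-2:])


def unit_of(user_id: str) -> str:
    """Gateway-side routing: each unit owns a contiguous block of shards."""
    return f"RZ{shard_of(user_id) // SHARDS_PER_UNIT:02d}"


# Routing in the application layer: every instance may need every shard.
app_layer_connections = APP_INSTANCES * NUM_SHARDS * POOL_SIZE
# Routing at the gateway: an instance only connects to its own unit's shards.
gateway_connections = APP_INSTANCES * SHARDS_PER_UNIT * POOL_SIZE

print(unit_of("10000042"))                         # RZ02
print(app_layer_connections, gateway_connections)  # 500000 100000
```

Under these assumptions the total drops by a factor equal to the number of units, and the connections landing on any one shard depend only on the instances in its own unit rather than on the whole fleet.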
Traffic shifting is handled by a custom reverse proxy (Spanner) and a global load balancer (GLSB), which route each request to the appropriate IDC and unit based on the user's ID range. The configuration uses rules such as:
RZ0* → a
RZ1* → b
RZ2* → c
RZ3* → d
Disaster recovery is organized into three levels: within a single machine room, within a city, and across cities. Each level relies on active‑active units and pre‑planned traffic‑cutover scripts that reassign ownership of data partitions.
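The `RZ0* → a` style rules and the cutover scripts can be modeled as an ordered table of wildcard patterns. The sketch below assumes shell‑style matching (Python's `fnmatch`) and treats `a` to `d` as opaque targets (for instance IDCs); `cut_over` is a hypothetical stand‑in for a pre‑planned cutover, not Spanner's or GLSB's real configuration format.

```python
from fnmatch import fnmatchcase

Rules = list[tuple[str, str]]

# The example rule table: RZone name pattern -> target.
RULES: Rules = [("RZ0*", "a"), ("RZ1*", "b"), ("RZ2*", "c"), ("RZ3*", "d")]


def target_for(zone: str, rules: Rules) -> str:
    """Return the target of the first rule whose pattern matches the zone name."""
    for pattern, target in rules:
        if fnmatchcase(zone, pattern):
            return target
    raise LookupError(f"no rule matches zone {zone!r}")


def cut_over(rules: Rules, failed: str, standby: str) -> Rules:
    """Hypothetical pre-planned cutover: redirect every rule that points at `failed`.

    A new table is returned instead of editing in place, so rolling back
    only requires reinstating the old table.
    """
    return [(pattern, standby if target == failed else target) for pattern, target in rules]


after = cut_over(RULES, failed="a", standby="b")
print(target_for("RZ05", RULES), target_for("RZ05", after))  # a b
```

Keeping the cutover as a pure table rewrite is what makes it scriptable in advance: the new table can be reviewed and rehearsed before an outage ever happens.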
CAP analysis shows that early architectures were CP (single‑master databases) or AP (horizontal scaling without strong consistency). Ant’s LDC aims to achieve AP with eventual consistency, using Paxos consensus to avoid split‑brain scenarios.
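Why a majority rule prevents split brain comes down to counting: two disjoint groups cannot each contain more than half of the nodes. This sketch checks that argument for small clusters and contrasts it with a hypothetical "half is enough" rule; it illustrates only the quorum arithmetic that Paxos relies on, not the Paxos protocol itself.

```python
def quorum(n: int) -> int:
    """Smallest strict majority of n voters: floor(n / 2) + 1."""
    return n // 2 + 1


def split_brain_possible(n: int, threshold: int) -> bool:
    """Can some split of n nodes into two sides let BOTH sides reach `threshold`?"""
    return any(k >= threshold and n - k >= threshold for k in range(n + 1))


# A "half is enough" rule allows split brain when n is even (2 + 2 of 4 nodes).
print(split_brain_possible(4, threshold=2))  # True
# A strict majority never does: both sides would need more than half.
print(any(split_brain_possible(n, quorum(n)) for n in range(1, 10)))  # False
```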
OceanBase, Ant’s in‑house distributed database, implements Paxos, so a write succeeds once ⌊N/2⌋ + 1 of its N replicas (a majority) have accepted it. This provides partition tolerance and high availability along with eventual consistency, which the article summarizes as "AP + C". The article also walks through an example of how conflicting writes are handled during a network partition.
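To see what the majority rule means for a write during a partition, here is a deliberately simplified simulation with five replicas split 3/2. The names `Replica` and `quorum_write` are hypothetical, and this is not OceanBase's implementation (real Paxos also uses ballot numbers and a prepare phase); it only shows why conflicting writes on both sides of a partition cannot both commit.

```python
from dataclasses import dataclass, field


def quorum(n: int) -> int:
    """Majority of n replicas: floor(n / 2) + 1."""
    return n // 2 + 1


@dataclass
class Replica:
    name: str
    log: list[str] = field(default_factory=list)


def quorum_write(value: str, reachable: list[Replica], total: int) -> bool:
    """Simplified majority write: commit only if a majority of all replicas is reachable."""
    if len(reachable) < quorum(total):
        return False  # minority side: reject instead of diverging
    for replica in reachable:
        replica.log.append(value)
    return True


replicas = [Replica(f"r{i}") for i in range(5)]
majority_side, minority_side = replicas[:3], replicas[3:]  # a 3 / 2 network partition

print(quorum_write("balance=90", majority_side, total=5))  # True
print(quorum_write("balance=50", minority_side, total=5))  # False
```

The client on the minority side gets an error instead of a silently divergent balance; once the partition heals, r3 and r4 can catch up from the majority's log.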
In summary, the high TPS of Double‑11 is enabled by:
RZone‑based sharding that isolates user groups.
OceanBase’s Paxos‑based consensus to prevent split brain.
CZone’s local reads for shared data that tolerates a write‑to‑read delay.
Robust disaster‑recovery and traffic‑shifting mechanisms.
These techniques, combined with operational practices such as peak‑shaving and pre‑warming, allow Ant Financial to sustain and further increase massive payment loads.
Laravel Tech Community
Specializing in Laravel development, we continuously publish fresh content and grow alongside the elegant, stable Laravel framework.