How Alipay Handles 540k TPS on Double 11: Inside the LDC Architecture
This article explains how Ant Financial’s Alipay scales to hundreds of thousands of transactions per second during Double 11. It covers logical data centers (LDC), the unitized architecture, the OceanBase database, and multi‑region disaster recovery, and it analyzes the design through the CAP theorem.
Since 2008, Ant Financial has continuously pushed the limits of its technology to handle Double 11 traffic. Peak payment throughput rose from 20,000 transactions per minute in 2010 to 544,000 transactions per second in 2019.
LDC and Unitization
LDC (Logical Data Center) presents physically distributed data centers as a single logical data center. This lets resources spread across sites be coordinated so that the system stays highly available while tolerating network partitions.
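Unitization splits a large internet system into self-contained units of three kinds:
- RZones (Region Zones) each own the data and services for a distinct segment of users.
- A GZone (Global Zone) holds global data and services that cannot be split, such as system configuration.
- CZones (City Zones) deploy frequently read, non-splittable data per city so that RZones can access it locally.

Because RZones are independent of one another, overall TPS grows roughly linearly as more of them are added. For example, if each unit handles 100k TPS, N units can handle N × 100k TPS.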
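The linear-scaling claim reduces to simple arithmetic. Below is a minimal Java sketch that assumes ideal linear scaling with no cross-unit overhead and uses the article's example figure of 100k TPS per unit. The class and method names are hypothetical.

```java
// Hypothetical helper illustrating ideal linear scaling of unitized capacity.
final class UnitCapacity {
    // Total capacity of N identical units, assuming no cross-unit overhead.
    static long totalTps(int units, long tpsPerUnit) {
        return units * tpsPerUnit;
    }

    // Smallest number of units whose combined capacity reaches targetTps (ceiling division).
    static int unitsNeeded(long targetTps, long tpsPerUnit) {
        return (int) ((targetTps + tpsPerUnit - 1) / tpsPerUnit);
    }

    public static void main(String[] args) {
        System.out.println(totalTps(4, 100_000));          // 400000
        System.out.println(unitsNeeded(544_000, 100_000)); // 6
    }
}
```

Under these assumptions, a 544k TPS peak would need at least six such units. A capacity plan would also reserve spare units for failover, which this sketch ignores.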
System Architecture Evolution
Early monolithic deployments suffered from single‑point failures. The first distributed step added horizontal scaling of application servers, but database bottlenecks remained.
Introducing master‑slave clusters alleviated read pressure but left write bottlenecks, leading to the adoption of sharding (horizontal and vertical) and eventually unitized deployments.
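Horizontal sharding keeps every user's rows on exactly one shard by routing on a key. Below is a minimal sketch that assumes a numeric user ID and a simple modulo rule. `ShardRouter` and the shard count are hypothetical and do not describe Alipay's actual scheme.

```java
// Hypothetical modulo-based horizontal sharding.
final class ShardRouter {
    private final int shardCount;

    ShardRouter(int shardCount) {
        if (shardCount <= 0) {
            throw new IllegalArgumentException("shardCount must be positive");
        }
        this.shardCount = shardCount;
    }

    // The same user ID always maps to the same shard, so one user's data lives on a single shard.
    // floorMod keeps the result in [0, shardCount) even for negative IDs.
    int shardFor(long userId) {
        return (int) Math.floorMod(userId, (long) shardCount);
    }

    public static void main(String[] args) {
        ShardRouter router = new ShardRouter(4);
        System.out.println(router.shardFor(10)); // 2
        System.out.println(router.shardFor(7));  // 3
    }
}
```

Write load then spreads across shards in proportion to how user IDs are distributed. Master‑slave replication alone cannot achieve this, because every write still lands on a single master.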
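For example, with user data split into four partitions a, b, c and d, each group of RZones owns one partition:
RZ0* -> a
RZ1* -> b
RZ2* -> c
RZ3* -> d
Traffic Routing and Disaster Recovery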
Traffic is first routed by region: a custom GSLB (Global Server Load Balancing) layer maps the client IP to the nearest IDC. The request then reaches that IDC's Spanner gateway, which forwards it to the correct RZone based on the user‑ID mapping.
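The two-step routing path described above can be sketched as follows. The sketch makes several assumptions. It uses a longest-prefix table as a stand-in for GSLB. It routes on the last two digits of the user ID, an illustrative rule rather than the documented one. The IDC names, RZone ranges, and `TwoLevelRouter` class are hypothetical. Real GSLB typically works through DNS and network topology, not string prefixes.

```java
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical two-step router: pick an IDC from the client IP, then an RZone from the user ID.
final class TwoLevelRouter {
    private final Map<String, String> idcByIpPrefix;           // stand-in for the GSLB table
    private final String defaultIdc;
    private final NavigableMap<Integer, String> rzoneBySuffix; // lowest UID suffix of each range -> RZone

    TwoLevelRouter(Map<String, String> idcByIpPrefix, String defaultIdc,
                   NavigableMap<Integer, String> rzoneBySuffix) {
        if (!rzoneBySuffix.containsKey(0)) {
            throw new IllegalArgumentException("suffix ranges must start at 0");
        }
        this.idcByIpPrefix = idcByIpPrefix;
        this.defaultIdc = defaultIdc;
        this.rzoneBySuffix = rzoneBySuffix;
    }

    // Step 1 (GSLB stand-in): the longest matching IP prefix wins; unknown clients go to the default IDC.
    String idcFor(String clientIp) {
        String best = defaultIdc;
        int bestLength = -1;
        for (Map.Entry<String, String> e : idcByIpPrefix.entrySet()) {
            if (clientIp.startsWith(e.getKey()) && e.getKey().length() > bestLength) {
                best = e.getValue();
                bestLength = e.getKey().length();
            }
        }
        return best;
    }

    // Step 2 (gateway stand-in): route on the user ID's last two digits (00-99).
    String rzoneFor(long userId) {
        int suffix = (int) Math.floorMod(userId, 100L);
        return rzoneBySuffix.floorEntry(suffix).getValue();
    }

    public static void main(String[] args) {
        TwoLevelRouter router = new TwoLevelRouter(
                Map.of("10.1.", "idc-a", "10.2.", "idc-b"),
                "idc-a",
                new TreeMap<>(Map.of(0, "RZ0", 25, "RZ1", 50, "RZ2", 75, "RZ3")));
        System.out.println(router.idcFor("10.2.3.4")); // idc-b
        System.out.println(router.rzoneFor(1042L));    // RZ1
    }
}
```

The RZone mapping is a sorted range table rather than a hash, so a contiguous block of suffixes can be moved between RZones by editing one entry.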
When a unit or data center fails, the system reassigns ownership of the affected data partitions to healthy units and updates the user‑ID‑to‑RZone mapping accordingly, so service continues.
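The ownership switch can be sketched as a single table update. This minimal version assumes one in-memory table and the four-partition layout (a to d) from the unitization example. The class and method names are hypothetical. A real failover would presumably also need to promote replicas in the takeover unit and propagate the new mapping to every gateway, which this sketch leaves out.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical ownership table: which RZone currently serves each data partition.
final class PartitionOwnership {
    private final Map<String, String> ownerByPartition = new HashMap<>();

    void assign(String partition, String rzone) {
        ownerByPartition.put(partition, rzone);
    }

    String ownerOf(String partition) {
        return ownerByPartition.get(partition);
    }

    // Move every partition owned by the failed RZone to a healthy RZone; returns how many moved.
    int failover(String failedRZone, String healthyRZone) {
        int moved = 0;
        for (Map.Entry<String, String> e : ownerByPartition.entrySet()) {
            if (e.getValue().equals(failedRZone)) {
                e.setValue(healthyRZone); // updating a value is not a structural change, so iteration stays valid
                moved++;
            }
        }
        return moved;
    }

    public static void main(String[] args) {
        PartitionOwnership table = new PartitionOwnership();
        table.assign("a", "RZ0");
        table.assign("b", "RZ1");
        table.assign("c", "RZ2");
        table.assign("d", "RZ3");
        int moved = table.failover("RZ1", "RZ0"); // RZ1 goes down; RZ0 takes over partition b
        System.out.println(moved + " " + table.ownerOf("b")); // 1 RZ0
    }
}
```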
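CAP Analysis of LDC
CAP states that a distributed system can guarantee at most two of three properties: Consistency, Availability, and Partition tolerance. Which pair a design keeps depends on whether partitions can occur and how the system behaves when they do:

```java
final class CapClassifier {
    // Decide which two CAP properties a design can guarantee.
    static String classify(boolean partitionPossible, boolean partitionAffectsSystem,
                           boolean availableDuringPartition) {
        if (!partitionPossible || !partitionAffectsSystem) {
            // No partition to tolerate, so consistency and availability can both hold.
            return "CA";
        }
        // A partition forces a trade-off: keep serving (AP) or reject requests to stay consistent (CP).
        return availableDuringPartition ? "AP" : "CP";
    }
}
```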
LDC achieves high availability and partition tolerance (AP) while providing eventual consistency through the Paxos consensus algorithm used in OceanBase.
OceanBase and Paxos
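OceanBase uses Paxos to reach consensus among its N replicas: an operation commits once a majority quorum of floor(N/2)+1 replicas agrees. During a partition, the side that still holds a majority keeps serving, and the remaining replicas converge to the same committed state after the partition heals.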
A write is accepted only when a majority of replicas confirms it. Two disjoint groups cannot both contain a majority, so at most one side of a partition can accept writes, which prevents split‑brain scenarios.
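The majority rule behind this guarantee fits in a few lines. The sketch below is only the quorum-counting rule, not the Paxos protocol itself: it has no proposal numbers, prepare/accept phases, or leader election. `MajorityQuorum` is a hypothetical name.

```java
// Hypothetical majority-quorum check, the counting rule Paxos-style replication relies on.
final class MajorityQuorum {
    // floor(n/2) + 1 replicas form a majority.
    static int quorumSize(int replicas) {
        return replicas / 2 + 1;
    }

    // One side of a partition may commit writes only if it still reaches a majority.
    static boolean canCommit(int reachableReplicas, int totalReplicas) {
        return reachableReplicas >= quorumSize(totalReplicas);
    }

    public static void main(String[] args) {
        int n = 5;
        System.out.println(quorumSize(n));   // 3
        System.out.println(canCommit(3, n)); // true: the 3-replica side of a 3/2 split keeps writing
        System.out.println(canCommit(2, n)); // false: the 2-replica side stops accepting writes
        // Check every possible two-way split: both sides can never reach quorum at once.
        for (int a = 0; a <= n; a++) {
            if (canCommit(a, n) && canCommit(n - a, n)) {
                throw new AssertionError("split brain possible");
            }
        }
    }
}
```

With an even replica count, such as a 2/2 split of four replicas, neither side reaches quorum. This is one reason odd replica counts are the usual choice.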
Key Takeaways
Unitized design (RZone/GZone/CZone) enables linear TPS scaling.
Paxos in OceanBase provides partition tolerance and eventual consistency.
Multi‑region traffic routing and configurable disaster‑recovery plans ensure high availability.
Overall, Alipay’s LDC architecture combines logical data centers, unitization, Paxos‑based consensus, and sophisticated traffic routing to sustain massive payment volumes during peak events.
This article has been distilled and summarized from source material, then republished for learning and reference.
Java High-Performance Architecture
Sharing Java development articles and resources, including SSM architecture and the Spring ecosystem (Spring Boot, Spring Cloud, MyBatis, Dubbo, Docker), Zookeeper, Redis, architecture design, microservices, message queues, Git, etc.
