How Alipay Handles 540k TPS on Double 11: Inside the LDC Architecture

This article explains how Ant Financial’s Alipay scales to hundreds of thousands of transactions per second during Double 11 by using logical data centers (LDC), unitized architecture, CAP theorem analysis, OceanBase database, and multi‑region disaster‑recovery strategies.

Java High-Performance Architecture

Since 2008, Ant Financial has continuously pushed its technology to handle Double 11 traffic, raising peak payment throughput from 20 k transactions per minute in 2010 to 544 k transactions per second (TPS) in 2019.

LDC and Unitization

An LDC (Logical Data Center) is a logical view layered over physically distributed data center resources. The resources are coordinated as a single whole, which lets the system pursue availability and partition tolerance together.

Unitization means splitting a large internet system into independent units. Alipay's design uses three zone types: RZone, GZone, and CZone. Each RZone serves a distinct user segment, so overall TPS can grow roughly linearly as more RZones are added.

For example, if each unit can handle 100 k TPS, N units can achieve N × 100 k TPS.
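
The linear-scaling arithmetic can be sketched as a small capacity calculation. This is a minimal sketch, assuming identical units and perfectly linear scaling; the class and method names are hypothetical, and the 100 k TPS per-unit figure is the illustrative number from the example above, not a measured value.

```java
// Sketch: back-of-the-envelope capacity planning for a unitized system.
// Assumes every unit has the same capacity and that scaling is perfectly linear.
final class UnitCapacity {
    // Combined TPS of `units` identical units.
    static long totalTps(int units, long tpsPerUnit) {
        return (long) units * tpsPerUnit;
    }

    // Smallest number of units whose combined capacity covers the target (ceiling division).
    static int unitsNeeded(long targetTps, long tpsPerUnit) {
        if (tpsPerUnit <= 0) throw new IllegalArgumentException("tpsPerUnit must be positive");
        return (int) ((targetTps + tpsPerUnit - 1) / tpsPerUnit);
    }

    public static void main(String[] args) {
        System.out.println(totalTps(6, 100_000));          // 600000
        System.out.println(unitsNeeded(544_000, 100_000)); // 6
    }
}
```

Under these assumptions, a 544 k TPS peak would need six units of 100 k TPS each. Real deployments would also reserve headroom for failover, which this sketch ignores.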

System Architecture Evolution

Early monolithic deployments suffered from single‑point failures. The first distributed step added horizontal scaling of application servers, but database bottlenecks remained.

Introducing master‑slave clusters alleviated read pressure but left write bottlenecks, leading to the adoption of sharding (horizontal and vertical) and eventually unitized deployments.

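In a unitized deployment, each group of RZones is tied to its own slice of data. An illustrative mapping, in which a to d appear to denote data partitions:

RZ0* -> a
RZ1* -> b
RZ2* -> c
RZ3* -> d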

Traffic Routing and Disaster Recovery

Traffic is first routed by region using a custom GSLB (Global Server Load Balancing) service that maps the client IP to the nearest IDC. The request then reaches that IDC's Spanner gateway, which forwards it to the correct RZone based on the user‑ID mapping.
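
The gateway's user-ID lookup can be sketched as a range table. This is a minimal sketch under stated assumptions: the shard rule (user ID modulo 100), the four equal ranges, and the names `RZoneRouter`, `shardOf`, and `route` are all hypothetical and chosen only for illustration; the article does not specify how Alipay derives the shard.

```java
import java.util.Map;
import java.util.TreeMap;

// Sketch of gateway-side routing: user ID -> shard -> RZone.
// The shard rule (userId mod 100) and the RZone names are illustrative assumptions.
final class RZoneRouter {
    // Key = first shard of a range, value = RZone that owns the range.
    private final TreeMap<Integer, String> rangeStartToZone = new TreeMap<>();

    RZoneRouter() {
        rangeStartToZone.put(0, "RZ0");   // shards 00-24
        rangeStartToZone.put(25, "RZ1");  // shards 25-49
        rangeStartToZone.put(50, "RZ2");  // shards 50-74
        rangeStartToZone.put(75, "RZ3");  // shards 75-99
    }

    // Maps a user ID to a shard number in [0, 99].
    static int shardOf(long userId) {
        return (int) Math.floorMod(userId, 100L);
    }

    String route(long userId) {
        // floorEntry returns the range whose start is the greatest key <= shard.
        Map.Entry<Integer, String> entry = rangeStartToZone.floorEntry(shardOf(userId));
        return entry.getValue();
    }

    public static void main(String[] args) {
        RZoneRouter router = new RZoneRouter();
        System.out.println(router.route(2088000000000012L)); // RZ0 (shard 12)
        System.out.println(router.route(2088000000000077L)); // RZ3 (shard 77)
    }
}
```

Keeping the table as ranges rather than one entry per shard means a range can be handed to another RZone by changing a single entry.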

In case of a failure, the system reassigns data‑partition ownership to healthy units and updates the user‑ID to RZone mapping, ensuring continuous service.
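
Failover by remapping can be sketched as follows. This is a minimal sketch, not Alipay's actual procedure: it assumes a table of user-ID shard ranges to RZones and hands the ranges of a failed RZone to the remaining healthy ones in round-robin order. The class `FailoverPlanner`, the range labels, and the zone names are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch of failover by remapping: ranges owned by a failed RZone are handed,
// round-robin, to the remaining healthy RZones. All names are hypothetical.
final class FailoverPlanner {
    // Returns a new range -> zone mapping; the input map is left untouched.
    static Map<String, String> reassign(Map<String, String> rangeToZone, String failedZone) {
        List<String> healthy = new ArrayList<>();
        for (String zone : rangeToZone.values()) {
            if (!zone.equals(failedZone) && !healthy.contains(zone)) healthy.add(zone);
        }
        if (healthy.isEmpty()) throw new IllegalStateException("no healthy RZone left");

        Map<String, String> updated = new LinkedHashMap<>();
        int next = 0;
        for (Map.Entry<String, String> e : rangeToZone.entrySet()) {
            if (e.getValue().equals(failedZone)) {
                updated.put(e.getKey(), healthy.get(next++ % healthy.size()));
            } else {
                updated.put(e.getKey(), e.getValue());
            }
        }
        return updated;
    }

    public static void main(String[] args) {
        Map<String, String> mapping = new LinkedHashMap<>();
        mapping.put("00-24", "RZ0");
        mapping.put("25-49", "RZ1");
        mapping.put("50-74", "RZ2");
        mapping.put("75-99", "RZ3");
        // RZ1 fails; its range moves to the first healthy zone.
        System.out.println(reassign(mapping, "RZ1"));
        // {00-24=RZ0, 25-49=RZ0, 50-74=RZ2, 75-99=RZ3}
    }
}
```

The sketch only rewrites the routing table. In a real system the new owner must also be able to reach the failed unit's data before traffic is switched, which is the data-partition reassignment the paragraph above refers to.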

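As a rule of thumb, a system can be classified under CAP with logic along these lines:

// Classifies a system as AP, CP, or AC; the flags are the caller's assessment.
static String classifyCap(boolean partitionPossible, boolean partitionDoesNotAffectAvailabilityOrConsistency,
        boolean availabilityPartitionTolerant, boolean consistencyPartitionTolerant,
        boolean hasAvailability, boolean hasConsistency) {
    if (!partitionPossible || partitionDoesNotAffectAvailabilityOrConsistency) {
        // partitions are either impossible or tolerated
        if (availabilityPartitionTolerant) return "AP";
        else if (consistencyPartitionTolerant) return "CP";
    } else {
        // partition exists but not handled
        if (hasAvailability && hasConsistency) return "AC";
    }
    return "NONE"; // none of the three classifications applies
}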

CAP Analysis of LDC

The CAP theorem states that a distributed system can guarantee at most two of Consistency, Availability, and Partition tolerance at the same time.

LDC achieves high availability and partition tolerance (AP) while providing eventual consistency through the Paxos consensus algorithm used in OceanBase.

OceanBase and Paxos

OceanBase uses Paxos to reach consensus among a quorum of ⌊N/2⌋ + 1 of its N replicas. During a network partition, the side that still holds a majority remains available, and the replicas converge to a single consistent state after the partition heals.

Writes are accepted only when a majority of nodes can confirm the operation, preventing split‑brain scenarios.
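
The majority rule can be sketched in a few lines. This is a minimal sketch of the quorum arithmetic only, not of Paxos itself and not of OceanBase's implementation; the class and method names are hypothetical.

```java
// Sketch of the majority rule behind a Paxos-style write; not OceanBase's actual code.
final class MajorityQuorum {
    // Majority of n replicas: floor(n / 2) + 1 (integer division floors for positive n).
    static int quorumSize(int replicas) {
        if (replicas <= 0) throw new IllegalArgumentException("replicas must be positive");
        return replicas / 2 + 1;
    }

    // A write commits only if the acknowledgements reach the majority.
    static boolean canCommit(int acks, int replicas) {
        return acks >= quorumSize(replicas);
    }

    public static void main(String[] args) {
        // 5 replicas split 3 / 2 by a network partition.
        System.out.println(quorumSize(5));   // 3
        System.out.println(canCommit(3, 5)); // true: the majority side keeps accepting writes
        System.out.println(canCommit(2, 5)); // false: the minority side must reject writes
    }
}
```

Because two disjoint groups of replicas cannot both contain a majority, at most one side of a partition can commit writes, which is why the rule rules out split brain.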

Key Takeaways

Unitized design (RZone/GZone/CZone) enables linear TPS scaling.

Paxos in OceanBase provides partition tolerance and eventual consistency.

Multi‑region traffic routing and configurable disaster‑recovery plans ensure high availability.

Overall, Alipay’s LDC architecture combines logical data centers, unitization, Paxos‑based consensus, and sophisticated traffic routing to sustain massive payment volumes during peak events.
