
How Ant Financial Scales to 540k TPS: Inside LDC Architecture, Unitization, and CAP Analysis

This article explains how Ant Financial’s payment system grew from 20,000 transactions per minute in 2010 to 540,000 TPS in 2019 by adopting logical data centers (LDC), unitized architecture (RZone, GZone, CZone), OceanBase’s Paxos‑based consensus, and sophisticated traffic steering and disaster‑recovery strategies.

ITFLY8 Architecture Home

Rapid Growth of Payment TPS Since 2009

Since the first Double 11 in 2009, Ant Financial has continuously pushed the limits of its technology. In 2010 the payment peak was 20,000 transactions per minute; by Double 11 2017 it had reached 256,000 transactions per second. The peak rose to 480,000 TPS in 2018 and to 544,000 TPS in 2019, a 1,360-fold increase over 2009.

What Lies Behind the Massive Payment TPS?

The key to handling such traffic is not just application-level optimizations like peak shaving, but the fundamental unit-based design called LDC (Logical Data Center). The rest of this article works through the following questions:

What is the most crucial design behind Alipay’s massive payments?

What is LDC and how does it achieve multi‑active and disaster‑recovery across regions?

What is the CAP theorem and how does it apply?

What is a split‑brain and its relation to CAP?

What is Paxos and what problem does it solve?

How does Paxos relate to CAP?

Can OceanBase escape the CAP limitation?

LDC and Unitization

LDC (Logical Data Center) is defined in contrast to the traditional IDC (Internet Data Center). It means that, regardless of how the machines are physically distributed, the whole data center operates as one coordinated, unified logical entity.

This design addresses the core challenge of distributed systems: achieving overall coordination (availability, partition tolerance) and uniformity (consistency).

Unitization is the inevitable trend for large‑scale internet systems. A simple analogy: each large internet company can be seen as an independent unit serving its own users without interfering with others.

Because a single company's TPS rarely exceeds a few hundred thousand, scaling beyond that requires splitting the system into multiple units.

In practice, the bottleneck is the database storage layer; even with horizontal scaling of servers, the database I/O limits the overall TPS.

Key Insight:

The sum of TPS from many independent companies can be huge, but each individual company struggles to increase its own TPS due to database bottlenecks.

Therefore, each unit (or "RZone") handles a specific user segment, allowing the overall system to scale linearly by adding more units.
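
As a back-of-the-envelope sketch of this linear-scaling argument, the snippet below estimates how many units a target peak would need. The per-unit ceiling of 10,000 TPS is an illustrative assumption, not a figure published by Ant; only the 544,000 TPS peak comes from this article.

```python
import math


def units_needed(target_tps: int, per_unit_tps: int) -> int:
    """Number of units required when each unit is capped by its own database."""
    if per_unit_tps <= 0:
        raise ValueError("per_unit_tps must be positive")
    return math.ceil(target_tps / per_unit_tps)


# Hypothetical per-unit ceiling; 544,000 TPS is the 2019 peak cited in the article.
PER_UNIT_TPS = 10_000
print(units_needed(544_000, PER_UNIT_TPS))  # 55
```

Doubling the target doubles the unit count, which is what linear scaling means here: capacity grows by adding units rather than by making any single database faster.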

System Architecture Evolution

Early on, all functions were packed into a single application, leading to a monolithic architecture (see Figure 1).

Monolithic architecture diagram

Horizontal scaling of application servers (Figure 2) alleviated CPU pressure but exposed the database as a single point of failure.

Horizontal scaling with multiple app servers

Introducing master‑slave replication reduced read pressure but left the write bottleneck unsolved (Figure 3).

Master‑slave replication

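Sharding (both horizontal table sharding and vertical database sharding) further increased capacity, but the number of database connections grew dramatically: every application server must hold connections to every shard, so the connection count grows as the product of the two (a Cartesian product of connections).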
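
The arithmetic behind this connection explosion can be sketched as follows. The cluster sizes and pool size are hypothetical values chosen only to show how the product grows; they are not Ant's real numbers.

```python
def total_db_connections(app_servers: int, shards: int, pool_size: int) -> int:
    """Connections held when every app server pools connections to every shard."""
    return app_servers * shards * pool_size


def connections_per_shard(app_servers: int, pool_size: int) -> int:
    """Connections a single shard must accept from the whole app tier."""
    return app_servers * pool_size


# Hypothetical cluster sizes, chosen only to show how the product grows.
print(total_db_connections(app_servers=500, shards=100, pool_size=10))  # 500000
print(connections_per_shard(app_servers=500, pool_size=10))             # 5000
```

Adding application servers raises the load on every shard at once, which is why scaling the application tier alone eventually stops helping.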

To solve this, the routing logic was moved from the database layer to the gateway layer, ensuring that a request from a user is always directed to the unit responsible for that user’s data partition.
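
A minimal sketch of gateway-level routing is shown below. It assumes the shard key is the last two digits of the user ID and that four units (named RZ0 to RZ3 here) each own a contiguous quarter of the key space. Both the key derivation and the table are illustrative assumptions, not Ant's actual configuration.

```python
import bisect

# Hypothetical routing table: (inclusive upper bound of shard key, unit name).
ROUTES = [(24, "RZ0"), (49, "RZ1"), (74, "RZ2"), (99, "RZ3")]
_UPPER_BOUNDS = [upper for upper, _ in ROUTES]


def shard_key(user_id: str) -> int:
    """Assumed shard key: the last two digits of the user ID (00-99)."""
    return int(user_id[-2:])


def route(user_id: str) -> str:
    """Return the unit that owns this user's data partition."""
    index = bisect.bisect_left(_UPPER_BOUNDS, shard_key(user_id))
    return ROUTES[index][1]


print(route("2088000000000017"))  # RZ0
print(route("2088000000000050"))  # RZ2
```

Because the decision is made once at the gateway, application servers inside a unit only need connections to that unit's own shards.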

Unitization in Practice

In Ant’s architecture, units are divided into three types (CRG architecture):

RZone (Region Zone): The smallest unit that can be independently deployed. Each RZone handles a specific user segment via sharding.

GZone (Global Zone): Holds globally shared data (e.g., system configuration). Only one active instance exists at a time; the others serve as disaster-recovery standbys.

CZone (City Zone): Deployed per city, it stores data that exhibits a "write-read time gap" (e.g., user profiles). It allows fast local reads while writes go through GZone.

The "write‑read time gap" means that after data is written, it is typically not read for a significant period (minutes to hours), allowing it to be cached locally in CZone.
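
The condition can be sketched as a simple freshness check: a local CZone copy is safe to read once the write has had time to replicate. The `Write` record, the key format, and the five-second replication lag below are hypothetical and serve only to illustrate the idea.

```python
from dataclasses import dataclass


@dataclass
class Write:
    key: str
    written_at: float  # seconds since epoch, when the write was accepted


def czone_copy_is_fresh(write: Write, read_at: float, replication_lag_s: float) -> bool:
    """True if the write had time to replicate before the read arrives."""
    return read_at - write.written_at >= replication_lag_s


# Hypothetical numbers: replication takes up to 5 s, reads come minutes later.
profile_update = Write(key="user:42:profile", written_at=1_000.0)
print(czone_copy_is_fresh(profile_update, read_at=1_600.0, replication_lag_s=5.0))  # True
print(czone_copy_is_fresh(profile_update, read_at=1_001.0, replication_lag_s=5.0))  # False
```

Data whose reads routinely arrive inside the replication window (the second case) is a poor fit for CZone under this reasoning.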

Data can be classified as:

User-streaming data: orders, comments, and behavior logs, which are naturally isolated per user.

User-shared data: account information and personal blogs, which are accessed by multiple users.

Non-user-specific data: products, system configs, and financial statistics, which are shared across all users.

RZone handles the first category. The second and third categories cannot be partitioned by user, so they belong in GZone, with CZone serving local reads when the write-read gap is large enough.

Traffic Steering and Disaster Recovery

Traffic steering ("flow diversion") is the foundation of disaster‑recovery. When a unit fails, its traffic is rerouted to a healthy unit ("cut‑flow"). Ant’s LDC defines three levels of disaster‑recovery:

Intra‑machine‑room unit disaster‑recovery.

Intra‑city disaster‑recovery.

Inter‑city disaster‑recovery.

For intra‑machine‑room failures, each RZone has two instances (A and B) that can take over each other’s load.
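
A minimal sketch of this A/B takeover, assuming traffic is split by per-instance weights: when one instance fails, its share is redistributed to the survivors. The instance names and the 50/50 split are illustrative.

```python
from typing import Dict


def failover_weights(weights: Dict[str, float], failed: str) -> Dict[str, float]:
    """Shift a failed instance's share of traffic onto the surviving instances."""
    survivors = {name: w for name, w in weights.items() if name != failed}
    total = sum(survivors.values())
    if total <= 0:
        raise RuntimeError("no healthy instance left to take over traffic")
    # Renormalize so the surviving weights sum to 1 again.
    return {name: w / total for name, w in survivors.items()}


# Hypothetical pair of instances for one RZone, each taking half the traffic.
normal = {"RZ0A": 0.5, "RZ0B": 0.5}
print(failover_weights(normal, failed="RZ0A"))  # {'RZ0B': 1.0}
```

Note that this only works if each instance is provisioned to absorb its partner's load, which the A/B design presupposes.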

For intra-city failures, traffic is manually re-mapped: the data partitions originally served by the failed RZones are reassigned to RZones in another IDC. Before the failure, the mapping from RZones to data partitions is:

RZ0* --> a
RZ1* --> b
RZ2* --> c
RZ3* --> d

After re‑mapping, the configuration becomes:

RZ0* --> /
RZ1* --> /
RZ2* --> a
RZ2* --> c
RZ3* --> b
RZ3* --> d

The user-ID-to-RZone mapping is updated accordingly, so that every user-ID range points at a surviving RZone:

[00-24] --> RZ2A(50%),RZ2B(50%)
[25-49] --> RZ3A(50%),RZ3B(50%)
[50-74] --> RZ2A(50%),RZ2B(50%)
[75-99] --> RZ3A(50%),RZ3B(50%)

These steps ensure that traffic is redirected to healthy units before the database mappings are changed, avoiding request failures.
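
The partition re-mapping can be sketched as a small table transformation. The takeover plan (RZ0 hands over to RZ2, RZ1 to RZ3) is inferred from the before and after tables above; the function name and data layout are hypothetical.

```python
from typing import Dict, List


def remap_partitions(
    partitions: Dict[str, List[str]], takeover: Dict[str, str]
) -> Dict[str, List[str]]:
    """Move every partition of a failed RZone to its designated healthy RZone."""
    after = {zone: list(parts) for zone, parts in partitions.items()}
    for failed_zone, healthy_zone in takeover.items():
        after[healthy_zone] = sorted(after[healthy_zone] + after[failed_zone])
        after[failed_zone] = []  # the failed zone no longer serves any partition
    return after


before = {"RZ0": ["a"], "RZ1": ["b"], "RZ2": ["c"], "RZ3": ["d"]}
plan = {"RZ0": "RZ2", "RZ1": "RZ3"}  # inferred from the tables above
print(remap_partitions(before, plan))
# {'RZ0': [], 'RZ1': [], 'RZ2': ['a', 'c'], 'RZ3': ['b', 'd']}
```

The result matches the re-mapped configuration listed above: RZ0 and RZ1 serve nothing, while RZ2 and RZ3 each serve two partitions.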

CAP Theorem Review

CAP states that a distributed system can satisfy at most two of Consistency, Availability, and Partition tolerance.

Consistency

All nodes must see the same data at the same time. Achieving this often requires atomic transactions.

Availability

The system must always be able to serve reads and writes.

Partition Tolerance

The system continues to operate despite network partitions.

In practice, many systems choose AP (high availability and partition tolerance) and settle for eventual consistency (BASE).

CAP Analysis of Various Architectures

Horizontal scaling with a single database instance provides AP but lacks consistency.

Single-instance database with transactions satisfies Consistency and Availability (CA) but not Partition tolerance.

Horizontal scaling + master‑slave replication offers Availability and (partial) Consistency, but Partition tolerance is not fully addressed because split‑brain can occur.
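
The split-brain scenario can be illustrated with a toy simulation. It assumes a naive failover rule in which a standby promotes itself whenever it stops receiving heartbeats; the class, node names, and values are hypothetical and do not model any specific product.

```python
class DbNode:
    def __init__(self, name: str, is_master: bool) -> None:
        self.name = name
        self.is_master = is_master
        self.log = []  # writes accepted by this node

    def on_heartbeat_timeout(self) -> None:
        # Naive failover: assume the peer is dead and take over.
        self.is_master = True

    def write(self, value: str) -> None:
        if not self.is_master:
            raise RuntimeError(f"{self.name} is not master")
        self.log.append(value)


def simulate_partition():
    primary = DbNode("db-1", is_master=True)
    standby = DbNode("db-2", is_master=False)
    # The network partitions: db-1 is still alive, but db-2 cannot see it.
    standby.on_heartbeat_timeout()
    # Clients on each side of the partition keep writing to "their" master.
    primary.write("balance=100")
    standby.write("balance=90")
    return primary, standby


primary, standby = simulate_partition()
print(primary.is_master and standby.is_master)  # True: two masters at once
print(primary.log == standby.log)               # False: histories have diverged
```

Neither node can tell a dead peer from an unreachable one, so both accept writes and the two histories can no longer be merged automatically.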

OceanBase (OB) with Paxos tolerates partitions by requiring only a majority quorum (⌊N/2⌋ + 1 of N replicas) for writes. The side of a partition that can still reach a quorum stays available, and consensus guarantees eventual consistency.
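
The quorum rule can be checked with a few lines of arithmetic. This is a sketch of majority quorums in general, not of OceanBase's implementation; the function names are hypothetical.

```python
from typing import List


def quorum(n: int) -> int:
    """Smallest number of replicas that forms a strict majority of n."""
    return n // 2 + 1


def writable_sides(partition_sizes: List[int]) -> List[bool]:
    """For each side of a network partition, can it still commit writes?"""
    total = sum(partition_sizes)
    return [size >= quorum(total) for size in partition_sizes]


print(quorum(5))               # 3
print(writable_sides([3, 2]))  # [True, False]
print(writable_sides([2, 2]))  # [False, False]
```

Since two disjoint groups cannot both contain more than half of the replicas, at most one side of any partition can commit writes, which is why a majority quorum rules out split-brain.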

OceanBase CAP Summary

Partition tolerance: Yes – nodes communicate and can tolerate partitions.

Availability under partition: Yes – writes succeed as long as a quorum is reachable.

Consistency under partition: Not immediate, but Paxos ensures a single committed value and eventual consistency.

Thus OB is effectively AP with eventual consistency.

Conclusion

The key takeaways for achieving massive payment TPS are:

RZone design based on user sharding, giving each user segment an exclusive unit.

OB’s Paxos‑based consensus prevents split‑brain during network partitions or disaster‑recovery.

CZone enables fast local reads for data with a write‑read time gap.

GZone handles globally shared data that cannot be localized.

Combined, these mechanisms could in principle let Ant's LDC CRG architecture sustain tens of millions of TPS, far beyond the 544,000 TPS record set in 2019.

Beyond architecture, operational techniques such as traffic pre‑heating, peak shaving, and coordinated efforts across many teams also contribute to Double 11 success.

