
Designing High‑Throughput Payment Systems: Ant Group’s LDC Architecture, CRG Zones, and CAP Analysis

The article explains how Ant Group’s Alipay handles massive Double‑11 payment traffic by using logical data centers (LDC), a unit‑based architecture with RZone, GZone, and CZone, traffic routing, disaster‑recovery strategies, and a CAP‑aware design built on the OceanBase distributed database.

Architecture Digest

Since the first Double‑11 in 2008, Ant Group’s payment peak has grown from 20,000 transactions per minute to over 540,000 transactions per second (TPS), repeatedly forcing the system to break through its existing technical limits.

The core solution is the Logical Data Center (LDC), a unit‑based design where each unit (RZone) owns a specific user shard, enabling horizontal scaling by adding more units.

Three zone types are defined: RZone (regional units handling user‑specific data), GZone (global single‑instance services), and CZone (city‑level units for data with a write‑read delay, allowing local reads).
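The unit routing that makes RZones work can be pictured as a user‑ID shard lookup. The sketch below is a minimal, hypothetical illustration: the shard count (100, i.e. the last two digits of the user ID) and the zone names are assumptions for demonstration, not Ant’s actual configuration.

```java
// Hypothetical sketch of RZone unit routing: each user belongs to one
// of 100 shards (last two digits of the user ID), and each RZone owns
// a contiguous range of shards. Adding RZones narrows each range,
// which is how the design scales horizontally.
public class RZoneRouter {
    static final int SHARDS = 100;                       // shard 00..99
    static final String[] RZONES = {"RZ00", "RZ01", "RZ02", "RZ03"};

    // shard = last two digits of the numeric user ID
    static int shardOf(long userId) {
        return (int) (userId % SHARDS);
    }

    // map the shard onto one of the RZones (25 shards per zone here)
    static String rzoneOf(long userId) {
        int shard = shardOf(userId);
        return RZONES[shard * RZONES.length / SHARDS];
    }

    public static void main(String[] args) {
        System.out.println(rzoneOf(2088_0000_1234_5678L)); // shard 78 -> RZ03
    }
}
```

Because the shard is derived from the user ID alone, any node in the system can compute a user’s home unit without a lookup service.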

Traffic is first routed by GSLB (global server load balancing) to the appropriate IDC; the Spanner gateway then forwards each request to the correct RZone. If a request involves another user’s data, an additional routing hop may cross IDC boundaries.
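The two‑hop routing can be sketched as follows. The IDC layout, zone names, and the payer/payee example are illustrative assumptions; the real gateway (Spanner in Ant’s stack) performs this forwarding at the RPC layer.

```java
import java.util.Map;

// Illustrative routing pipeline: GSLB picks the IDC hosting the payer's
// RZone, the gateway forwards to that RZone, and a payment touching a
// second user may add one more hop (possibly to another IDC).
public class TrafficRouter {
    // which IDC hosts which RZones (hypothetical layout)
    static final Map<String, String> RZONE_TO_IDC = Map.of(
        "RZ00", "IDC-Hangzhou", "RZ01", "IDC-Hangzhou",
        "RZ02", "IDC-Shanghai", "RZ03", "IDC-Shanghai");

    // same last-two-digit sharding as in the LDC design: 100 shards, 4 zones
    static String rzoneOf(long userId) {
        int shard = (int) (userId % 100);
        return "RZ0" + (shard * 4 / 100);
    }

    static String route(long payerId, long payeeId) {
        String payerZone = rzoneOf(payerId);
        String hops = "GSLB -> " + RZONE_TO_IDC.get(payerZone) + " -> " + payerZone;
        String payeeZone = rzoneOf(payeeId);
        if (!payeeZone.equals(payerZone)) {
            hops += " -> " + payeeZone;   // cross-user payment: extra hop
        }
        return hops;
    }

    public static void main(String[] args) {
        System.out.println(route(78L, 5L));
    }
}
```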

Disaster recovery is organized in three layers—same‑room, same‑city, and cross‑city—by reassigning data partitions and user‑to‑RZone mappings, ensuring continuity during failures.
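At each recovery layer, continuity ultimately comes down to rewriting the shard‑to‑RZone mapping so that a surviving zone takes over the failed zone’s partitions. A minimal sketch, assuming a simple in‑memory mapping and hypothetical zone names:

```java
import java.util.HashMap;
import java.util.Map;

// Failover sketch: reassign every shard owned by a failed RZone to a
// backup RZone by rewriting the shard -> zone mapping. The same idea
// applies at the same-room, same-city, and cross-city layers; only the
// choice of backup zone differs.
public class FailoverPlan {
    static Map<Integer, String> reassign(Map<Integer, String> shardToZone,
                                         String failedZone, String backupZone) {
        Map<Integer, String> updated = new HashMap<>(shardToZone);
        updated.replaceAll((shard, zone) ->
            zone.equals(failedZone) ? backupZone : zone);
        return updated;
    }

    public static void main(String[] args) {
        Map<Integer, String> plan = Map.of(0, "RZ00", 1, "RZ00", 2, "RZ01");
        // same-city failover: RZ00's shards move to RZ01
        System.out.println(reassign(plan, "RZ00", "RZ01"));
    }
}
```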

The article reviews the CAP theorem, clarifies consistency, availability, and partition tolerance, and provides a practical method to classify distributed systems.

Applying CAP to typical architectures shows that simple horizontal scaling in front of a single database yields AP characteristics, while master‑slave setups aim for CA (consistency and availability) but lack partition tolerance.

Ant’s LDC relies on the OceanBase distributed database, which uses Paxos consensus to achieve partition tolerance (P) and high availability (A) while providing eventual consistency (C) after partitions.

OceanBase requires only a quorum (N/2+1) for writes, allowing the system to remain available during network splits and to resolve write conflicts via consensus.
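The quorum arithmetic behind this is simple majority voting, which Paxos-family protocols rely on: with N replicas, any two sets of N/2+1 acknowledgements must overlap, so a committed write survives the loss of a minority. A minimal sketch (not OceanBase’s actual implementation):

```java
// Majority-quorum sketch: a write commits once N/2 + 1 replicas
// acknowledge it, so one side of a network split (the majority side)
// can keep accepting writes while the minority side stalls.
public class QuorumWrite {
    static int quorum(int replicas) {
        return replicas / 2 + 1;
    }

    // acks[i] == true means replica i acknowledged the proposal
    static boolean committed(boolean[] acks) {
        int count = 0;
        for (boolean a : acks) if (a) count++;
        return count >= quorum(acks.length);
    }

    public static void main(String[] args) {
        // 5 replicas, a split leaves 3 reachable: still a majority
        System.out.println(quorum(5));                                            // 3
        System.out.println(committed(new boolean[]{true, true, true, false, false})); // true
    }
}
```

This is also why replica groups use odd sizes: five replicas tolerate two failures, while six still tolerate only two.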

In conclusion, Ant’s high‑TPS capability stems from user‑sharded RZones, Paxos‑backed OceanBase for consistency, and CZone local reads, all orchestrated within the CRG architecture.

Tags: distributed systems · CAP theorem · database scaling · OceanBase · high TPS · LDC
Written by Architecture Digest

Focusing on Java backend development, covering application architecture from top-tier internet companies (high availability, high performance, high stability), big data, machine learning, Java architecture, and other popular fields.
