How Ant Financial Scales Payments with Distributed Architecture and OceanBase

The article summarizes Xu Wenqi's 2019 Alibaba Cloud Summit talk on Ant Financial's distributed architecture, covering the shift from monolithic to microservices, modular development, load‑balancing, database sharding, the distributed TA system, task scheduling, gray‑release, full‑link stress testing, and OceanBase high‑availability solutions.

21CTO

This article compiles the 2019 Alibaba Cloud Summit talk by senior technical expert Xu Wenqi on the practice of distributed architecture at Ant Financial.

1. Advantages and Concepts of Distributed Architecture

Traditional Monolithic Architecture

Typical startup projects begin with a monolithic architecture.

Advantages: fast development, testing, and deployment; a single WAR package can be released directly.

Disadvantages: slow compilation and startup, frequent code conflicts, painful merges, and unreliable releases.

2. Microservices vs. Monolith

As complexity grows, productivity in a monolith drops sharply, and decomposing it into services becomes worthwhile.

Microservices suit businesses whose requirements change unpredictably, because they allow continuous self-evolution and rapid adaptation.

3. Modular Development

Microservices start with top‑level business design, splitting modules by business lines and separating presentation, logic, and data layers from the monolith.

Key concerns include business continuity and data integrity during the split.

4. Load‑Balancing Advantages of Microservices

Traditional load balancers (LVS, F5) provide rate limiting, load distribution, and security.

In microservices, the gateway acts as the entry layer, offering lightweight load balancing, protocol conversion, and authentication, while service‑governance frameworks (e.g., Dubbo) handle registration, discovery, and isolation.
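The registration, discovery, and isolation roles described above can be sketched with a toy in-memory registry. This is not Dubbo's API: the class name `ServiceRegistry`, its methods, and the addresses are hypothetical, and round-robin is only one assumed balancing policy.

```python
import itertools
from collections import defaultdict


class ServiceRegistry:
    """Toy in-memory registry: providers register, consumers discover."""

    def __init__(self):
        self._instances = defaultdict(list)  # service name -> list of addresses
        self._cursors = {}                   # service name -> round-robin counter

    def register(self, service, address):
        if address not in self._instances[service]:
            self._instances[service].append(address)

    def deregister(self, service, address):
        # Isolation: take an unhealthy instance out of rotation.
        if address in self._instances[service]:
            self._instances[service].remove(address)

    def choose(self, service):
        """Pick the next registered instance in round-robin order."""
        instances = self._instances.get(service)
        if not instances:
            raise LookupError(f"no instance registered for {service!r}")
        counter = self._cursors.setdefault(service, itertools.count())
        return instances[next(counter) % len(instances)]
```

A caller would register each provider at startup, call `choose("account")` per request, and call `deregister` when a health check fails.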

5. Database Vertical Sharding

Vertical sharding separates databases by user, transaction, or accounting domains, relieving storage and access pressure; read‑write separation with master‑slave setups is also possible.
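As a minimal sketch of the routing this implies, the function below picks a database by business domain and then by read or write. The `TOPOLOGY` mapping, the database names, and `pick_database` are all hypothetical; the talk does not describe Ant Financial's actual routing layer.

```python
import random

# Hypothetical topology: one primary and optional read replicas per business domain.
TOPOLOGY = {
    "user":        {"primary": "user-db-primary",  "replicas": ["user-db-r1", "user-db-r2"]},
    "transaction": {"primary": "trade-db-primary", "replicas": ["trade-db-r1"]},
    "accounting":  {"primary": "acct-db-primary",  "replicas": []},
}


def pick_database(domain, is_write, rng=random):
    """Return the database that should serve a request for `domain`."""
    cluster = TOPOLOGY[domain]               # vertical sharding: one cluster per domain
    if is_write or not cluster["replicas"]:
        return cluster["primary"]            # writes, and reads with no replica, hit the primary
    return rng.choice(cluster["replicas"])   # reads are spread over the replicas
```

Reads served by replicas may lag the primary, so a caller that must read its own write would pass `is_write=True` in this sketch.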

6. Database Horizontal Sharding

Horizontal sharding splits large tables/databases by criteria such as transaction time, creating multiple tables or databases.

Complex cross‑table queries can be handled with Elasticsearch, ID‑based routing, or distributed massive‑database solutions like OceanBase.
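Two common routing rules can be sketched as follows: one table per month for time-based splits, and a modulo over the user ID for ID-based routing. The table naming scheme and the shard counts are assumptions for illustration, not values from the talk.

```python
from datetime import date


def table_for_date(base, trade_date):
    """Time-based split: one physical table per month, named <base>_YYYYMM."""
    return f"{base}_{trade_date:%Y%m}"


def shard_for_user(user_id, db_count=4, tables_per_db=8):
    """ID-based routing: map a numeric user ID to (database index, table index)."""
    slot = user_id % (db_count * tables_per_db)
    return slot // tables_per_db, slot % tables_per_db
```

A query that carries the sharding key goes straight to one table; a query without it has to fan out to every shard, which is where a search index or a natively distributed database becomes attractive.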

7. Distributed TA System Example

Traditional TA system: clearing runs serially and inefficiently, cannot scale linearly, and relies on large transactions that roll back entirely on failure.

The distributed TA system architecture comprises an access layer, a business service layer, a SOFAStack layer, LAAS, an operations toolchain, and governance control.

Key components:

Access layer: protocol conversion, access control, file transfer, operations console.

Business service layer: core services such as account, transaction, billing, clearing.

SOFAStack: Ant Financial's open-source middleware, covering the microservice framework, distributed transactions, scheduling, messaging, data proxy, tracing, and more.

The system addresses two challenges: clearing efficiently in a distributed setting, and staying correct under failure conditions.
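One way to see why small transactions help is to commit each batch separately, so a bad record rolls back only its own batch instead of the whole run. The sketch below uses SQLite purely for illustration; the `ledger` table, its columns, and `clear_in_batches` are hypothetical and say nothing about how the real TA system stores data.

```python
import sqlite3


def clear_in_batches(conn, records, batch_size):
    """Commit each batch separately; return the indexes of batches that failed."""
    failed = []
    for start in range(0, len(records), batch_size):
        batch = records[start:start + batch_size]
        try:
            with conn:  # one transaction per batch: commit on success, roll back on error
                conn.executemany(
                    "INSERT INTO ledger(account, amount) VALUES (?, ?)", batch
                )
        except sqlite3.IntegrityError:
            failed.append(start // batch_size)  # remember it for a later retry
    return failed


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ledger(account TEXT NOT NULL, amount INTEGER NOT NULL CHECK(amount >= 0))"
)
records = [("a", 10), ("b", 20), ("c", 30), ("d", -5), ("e", 40), ("f", 50)]
failed_batches = clear_in_batches(conn, records, batch_size=2)
```

Here the negative amount violates the `CHECK` constraint, so only the batch containing it is rolled back and reported; the other batches stay committed.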

8. Distributed Task Scheduling Platform

Features:

Custom sharding to fully utilize cluster resources.

Pause/resume/cancel tasks during execution.

Retry mechanism for failed tasks to ensure overall success.
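The three features above can be sketched in a few lines, assuming a single-process, cooperative model in which control flags are checked between shards. `split_into_shards`, `run_with_retry`, and `Job` are hypothetical names; a real platform would distribute shards across machines and persist progress.

```python
def split_into_shards(items, shard_count):
    """Custom sharding: deal items round-robin into `shard_count` shards."""
    return [items[i::shard_count] for i in range(shard_count)]


def run_with_retry(task, shard, max_attempts=3):
    """Run `task(shard)`, retrying on any exception up to `max_attempts` times."""
    for attempt in range(1, max_attempts + 1):
        try:
            return task(shard)
        except Exception:
            if attempt == max_attempts:
                raise  # give up and surface the last error


class Job:
    """Runs shards one by one; control flags are checked between shards."""

    def __init__(self, shards, task):
        self.pending = list(shards)
        self.task = task
        self.results = []
        self.paused = False
        self.cancelled = False

    def run(self):
        """Process pending shards until done, paused, or cancelled.

        Returns True only when every shard has been processed.
        """
        while self.pending and not (self.paused or self.cancelled):
            shard = self.pending[0]
            self.results.append(run_with_retry(self.task, shard))
            self.pending.pop(0)  # drop the shard only after it succeeded
        return not self.pending and not self.cancelled
```

Setting `job.paused = True` stops the loop at the next shard boundary, and calling `run()` again after clearing the flag resumes from the first unfinished shard.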

9. Clearing Task Scheduling

Architecture consists of task splitting (file generation and logical sharding), task execution (storing processed data into a ledger database), and core services (transaction, clearing, accounting, account).
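Logical sharding and execution can be sketched as below, assuming records are addressed by index and the ledger is a plain dictionary standing in for the ledger database. Both function names and the record layout `(account, amount)` are hypothetical.

```python
def logical_shards(total_records, shard_size):
    """Split record indexes [0, total_records) into half-open (start, end) ranges."""
    return [
        (start, min(start + shard_size, total_records))
        for start in range(0, total_records, shard_size)
    ]


def execute_shard(records, shard, ledger):
    """Process one shard and store the result in `ledger` (account -> balance)."""
    start, end = shard
    for account, amount in records[start:end]:
        ledger[account] = ledger.get(account, 0) + amount
```

Because each shard is only a pair of offsets, splitting is cheap and the shards can be handed to different workers without copying the file.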

10. Clearing Fault‑Tolerance and Reconciliation

The process runs through daily initialization, file import, clearing, profit calculation, share adjustment, export, secondary clearing, and profit export. Each step supports rollback and precise per-record verification, and a rollback can be scoped by file, by user, or to a backup point.
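The three rollback scopes and the per-record check can be sketched with an in-memory ledger. Everything here is an assumption for illustration: the `Entry` fields, the `ClearingLedger` class, and the idea of a backup point as a named snapshot.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    file_id: str
    user_id: str
    amount: int


class ClearingLedger:
    def __init__(self):
        self.entries = []
        self._backups = {}  # backup point name -> snapshot of entries

    def apply(self, entry):
        self.entries.append(entry)

    def backup(self, name):
        self._backups[name] = list(self.entries)

    def rollback_to_backup(self, name):
        self.entries = list(self._backups[name])

    def rollback_file(self, file_id):
        self.entries = [e for e in self.entries if e.file_id != file_id]

    def rollback_user(self, user_id):
        self.entries = [e for e in self.entries if e.user_id != user_id]

    def verify(self, expected):
        """Per-record check: return (missing, unexpected) entries."""
        want, have = Counter(expected), Counter(self.entries)
        return list((want - have).elements()), list((have - want).elements())
```

Scoping the rollback keeps a single bad file or user from forcing the whole day's clearing to be redone.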

11. Reliability and Stability Mechanisms

Gray Release

Steps: beta release, group release, gray traffic, full release. Gray release for clearing can target specific user segments, shortening rollout time.
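Targeting a user segment is often done by hashing the user ID into a stable bucket and comparing it with the current rollout percentage. The sketch below assumes that approach; the function names, the 100-bucket scheme, and the allowlist are hypothetical, and the talk does not say how Ant Financial selects gray users.

```python
import hashlib


def bucket(user_id, buckets=100):
    """Map a user ID to a stable bucket in [0, buckets)."""
    digest = hashlib.sha256(user_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % buckets


def use_new_version(user_id, gray_percent, allowlist=frozenset()):
    """Route allowlisted users, plus `gray_percent` percent of all users, to the new version."""
    if user_id in allowlist:
        return True
    return bucket(user_id) < gray_percent
```

Because the bucket depends only on the user ID, a user who sees the new version at 10% keeps seeing it as the percentage rises toward full release.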

Online Full‑Link Stress Testing

Stress tests run against production, with data access proxies directing test data to shadow tables so that production data remains unaffected. Benefits: a production-like environment that gives reliable results, and table-level isolation of test data.
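The shadow-table idea can be sketched as a proxy that rewrites the table name whenever the request context carries a stress-test flag. The `_shadow` suffix, the `stress_test` context key, and the in-memory storage are assumptions for illustration only.

```python
class DataAccessProxy:
    SHADOW_SUFFIX = "_shadow"  # hypothetical naming convention

    def __init__(self):
        self.tables = {}  # physical table name -> list of rows

    def _physical(self, table, ctx):
        # Stress-test traffic is redirected to the shadow table.
        return table + self.SHADOW_SUFFIX if ctx.get("stress_test") else table

    def insert(self, table, row, ctx):
        self.tables.setdefault(self._physical(table, ctx), []).append(row)

    def select_all(self, table, ctx):
        return list(self.tables.get(self._physical(table, ctx), []))
```

Application code keeps using the logical table name, so the same code path is exercised by test and production traffic while the rows land in different tables.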

OceanBase High‑Availability Mechanism

OceanBase is deployed with three replicas kept consistent through Paxos, providing strong consistency, continuous availability, automatic master-slave failover, and fault tolerance across machines, data centers, and cities without service interruption or data loss.

Compared with traditional master-slave solutions, OceanBase lets a write succeed as long as a majority of nodes are healthy, and it supports multi-active deployments across regions as well as gray upgrades.
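The majority rule on its own is simple arithmetic, sketched below. This is not Paxos and not OceanBase code: it omits proposal numbers, leader election, and recovery of uncommitted entries, and the replica dictionaries are a hypothetical stand-in for real nodes.

```python
def majority(replica_count):
    """Smallest number of replicas that forms a majority."""
    return replica_count // 2 + 1


def replicate(entry, replicas):
    """Append `entry` on every healthy replica; report whether a majority acknowledged.

    Each replica is a dict with a "healthy" flag and a "log" list.
    """
    acks = 0
    for replica in replicas:
        if replica["healthy"]:
            replica["log"].append(entry)
            acks += 1
    return acks >= majority(len(replicas))
```

With three replicas the write succeeds with one replica down and fails with two down, which is the difference from a master-slave pair where losing the master blocks writes until failover.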

OceanBase Deployment Options

Same‑city three‑data‑center deployment (latency 0.5‑2 ms).

Two-city three-center deployment (similar latency; if a node in one city fails, synchronization has to cross regions, which adds delay).
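Whether a layout keeps a majority after a failure can be checked mechanically. The sketch below assumes one replica per data center and uses made-up city and data-center names; the placements are illustrative and not Ant Financial's actual topology.

```python
def survives(placement, failed, level):
    """Return True if a majority of replicas remains after `failed` goes down.

    `placement` lists one (city, data_center) pair per replica; `level` is
    "city" or "data_center". Data-center names must be unique across cities.
    """
    index = 0 if level == "city" else 1
    alive = sum(1 for replica in placement if replica[index] != failed)
    return alive >= len(placement) // 2 + 1


# Hypothetical placements for the two options listed above.
same_city = [("city-a", "dc1"), ("city-a", "dc2"), ("city-a", "dc3")]
two_city = [("city-a", "dc1"), ("city-a", "dc2"), ("city-b", "dc3")]
```

Under these assumed placements, either layout tolerates the loss of any single data center, and the two-city layout additionally tolerates losing the city that holds only one replica.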

Same‑City Dual‑Active Disaster‑Recovery Architecture

The primary data center handles most traffic and the secondary handles a small portion. Features: same-site preference, zero application intrusion, development and deployment as if for a single site, and automatic failover.
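Same-site preference with automatic failover can be sketched as a routing rule: use a healthy endpoint in the caller's own data center when one exists, otherwise fall back to another site. The function name, the site labels, and the health set are hypothetical.

```python
def pick_endpoint(caller_site, endpoints, healthy):
    """Prefer a healthy endpoint in the caller's site; fail over to another site.

    `endpoints` maps site name -> list of addresses; `healthy` is a set of addresses.
    """
    local = [addr for addr in endpoints.get(caller_site, []) if addr in healthy]
    if local:
        return local[0]
    for site, addresses in endpoints.items():
        if site == caller_site:
            continue
        for addr in addresses:
            if addr in healthy:
                return addr
    raise LookupError("no healthy endpoint in any site")
```

If this decision lives in the middleware rather than in business code, applications stay unaware of the second site, which is one reading of "zero application intrusion".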

Overall, the distributed architecture and its supporting technologies enable Ant Financial to achieve high scalability, reliability, and rapid business evolution.
