How Ant Financial Scales Payments with Distributed Architecture and OceanBase
The article summarizes Xu Wenqi's 2019 Alibaba Cloud Summit talk on Ant Financial's distributed architecture, covering the shift from monolithic to microservices, modular development, load‑balancing, database sharding, the distributed TA system, task scheduling, gray‑release, full‑link stress testing, and OceanBase high‑availability solutions.
This article compiles the 2019 Alibaba Cloud Summit talk by senior technical expert Xu Wenqi on the practice of distributed architecture at Ant Financial.
1. Advantages and Concepts of Distributed Architecture
Traditional Monolithic Architecture
Typical startup projects begin with a monolithic architecture.
Advantages: Fast development, testing, and deployment; a single WAR package can be released directly.
Disadvantages: Slow compilation and startup, frequent code conflicts, painful merges, and releases that often fail.
2. Microservices vs. Monolith
As a system grows more complex, productivity on a monolith drops sharply, which makes decomposing it into services worthwhile.
Microservices suit businesses whose requirements change unpredictably: services can evolve independently, so the system keeps adapting quickly.
3. Modular Development
The move to microservices starts with top‑level business design: modules are split by business line, and the presentation, logic, and data layers are separated out of the monolith.
Key concerns include business continuity and data integrity during the split.
4. Load‑Balancing Advantages of Microservices
Traditional load balancers (LVS, F5) provide rate limiting, load distribution, and security.
In microservices, the gateway acts as the entry layer, offering lightweight load balancing, protocol conversion, and authentication, while service‑governance frameworks (e.g., Dubbo) handle registration, discovery, and isolation.
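The registration, discovery, and load-balancing duties described above can be illustrated with a minimal in-memory sketch. `ServiceRegistry` and its methods are hypothetical and are not Dubbo's API; a production registry would also track provider health and push membership changes to consumers.

```python
from collections import defaultdict


class ServiceRegistry:
    """Hypothetical in-memory registry: providers register, consumers discover and pick."""

    def __init__(self):
        self._providers = defaultdict(list)  # service name -> provider addresses
        self._cursors = {}                   # service name -> round-robin counter

    def register(self, service, address):
        if address not in self._providers[service]:
            self._providers[service].append(address)

    def deregister(self, service, address):
        if address in self._providers[service]:
            self._providers[service].remove(address)

    def discover(self, service):
        return list(self._providers[service])

    def pick(self, service):
        """Client-side round-robin load balancing over the registered providers."""
        providers = self._providers[service]
        if not providers:
            raise LookupError(f"no provider for {service}")
        n = self._cursors.get(service, 0)
        self._cursors[service] = n + 1
        return providers[n % len(providers)]
```

When a provider deregisters (for example during a release), subsequent `pick` calls simply stop returning it, which is the isolation behavior a governance framework automates.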
5. Database Vertical Sharding
Vertical sharding separates databases by user, transaction, or accounting domains, relieving storage and access pressure; read‑write separation with master‑slave setups is also possible.
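A sketch of how a data access layer might route queries after a vertical split, with read-write separation inside each domain. The domain names and database identifiers in `DOMAIN_DATABASES` are hypothetical placeholders.

```python
import random
from dataclasses import dataclass, field


@dataclass
class DataSource:
    master: str
    replicas: list = field(default_factory=list)


# Hypothetical vertical split: one database (master plus optional replicas) per domain.
DOMAIN_DATABASES = {
    "user": DataSource("user-master", ["user-replica-1"]),
    "trade": DataSource("trade-master", ["trade-replica-1", "trade-replica-2"]),
    "accounting": DataSource("acct-master"),
}


def route(domain, is_write):
    """Writes go to the domain's master; reads go to a replica when one exists."""
    ds = DOMAIN_DATABASES[domain]
    if is_write or not ds.replicas:
        return ds.master
    return random.choice(ds.replicas)
```

In a typical master-slave setup replicas apply changes asynchronously, so a read that must see a write just made is usually sent to the master as well.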
6. Database Horizontal Sharding
Horizontal sharding splits large tables/databases by criteria such as transaction time, creating multiple tables or databases.
Complex cross‑table queries can be handled with Elasticsearch, with ID‑based routing, or with a distributed database designed for massive data volumes, such as OceanBase.
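The two routing criteria mentioned above, transaction time and ID, can be combined as in this sketch. The monthly table naming (`trade_YYYYMM`), the `trade_db_N` database names, and `DB_COUNT` are assumptions for illustration.

```python
import zlib
from datetime import datetime

DB_COUNT = 4  # hypothetical number of physical databases


def table_for(trade_time: datetime) -> str:
    """Time-based split: one table per month, e.g. trade_202401."""
    return f"trade_{trade_time:%Y%m}"


def db_for(user_id: str) -> str:
    """ID-based routing: a stable hash of the user ID picks the database."""
    # zlib.crc32 is stable across processes, unlike Python's salted hash() for str.
    return f"trade_db_{zlib.crc32(user_id.encode()) % DB_COUNT}"


def locate(user_id: str, trade_time: datetime):
    """Return (database, table) for one user's transaction."""
    return db_for(user_id), table_for(trade_time)
```

A query that lacks the user ID, or that spans several months, has to fan out across shards; that is the kind of cross-table query the alternatives above are meant to handle.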
7. Distributed TA System Example
Traditional TA (transfer agent) system: clearing runs serially, so it is slow and cannot scale linearly, and it relies on large transactions that roll back entirely when any part fails.
The distributed TA system's architecture comprises an access layer, a business service layer, a SOFAStack layer, an IaaS layer, an operations toolchain, and governance and control.
Key components:
Access layer: protocol conversion, access control, file transfer, operations console.
Business service layer: core services such as account, transaction, billing, clearing.
SOFAStack: Ant Financial's open‑source middleware stack, providing the microservice framework, distributed transactions, scheduling, messaging, a data access proxy, tracing, and more.
The system tackles challenges of efficient distributed clearing and correctness under failure conditions.
8. Distributed Task Scheduling Platform
Features:
Custom sharding to fully utilize cluster resources.
Pause/resume/cancel tasks during execution.
Retry mechanism for failed tasks to ensure overall success.
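The three features above can be sketched as a single-process job object. `ShardedJob`, its methods, and the round-robin sharding rule are hypothetical; a real platform would dispatch shards to many nodes and persist job state.

```python
from enum import Enum


class State(Enum):
    RUNNING = "running"
    PAUSED = "paused"
    CANCELLED = "cancelled"
    DONE = "done"


class ShardedJob:
    """Hypothetical job split into shards; control commands take effect between shards."""

    def __init__(self, items, shard_count, handler, max_attempts=3):
        # Custom sharding: deal items round-robin so every shard gets a similar share.
        self.shards = [items[i::shard_count] for i in range(shard_count)]
        self.handler = handler
        self.max_attempts = max_attempts
        self.state = State.RUNNING
        self.completed = set()  # indexes of shards that finished
        self.failed = set()     # indexes of shards that exhausted their attempts

    def pause(self):
        if self.state is State.RUNNING:
            self.state = State.PAUSED

    def resume(self):
        if self.state is State.PAUSED:
            self.state = State.RUNNING

    def cancel(self):
        if self.state is not State.DONE:
            self.state = State.CANCELLED

    def _run_shard(self, index):
        # Retry the whole shard on failure; the handler must therefore be idempotent.
        for _ in range(self.max_attempts):
            try:
                for item in self.shards[index]:
                    self.handler(item)
                return True
            except Exception:
                continue
        return False

    def step(self):
        """Run the next pending shard, or mark the job DONE when none remain."""
        if self.state is not State.RUNNING:
            return
        pending = [i for i in range(len(self.shards))
                   if i not in self.completed and i not in self.failed]
        if not pending:
            self.state = State.DONE
            return
        index = pending[0]
        if self._run_shard(index):
            self.completed.add(index)
        else:
            self.failed.add(index)
```

Because a retried shard re-runs items that already succeeded, the handler has to tolerate duplicates, which is one reason clearing systems pair retries with per-record verification.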
9. Clearing Task Scheduling
Architecture consists of task splitting (file generation and logical sharding), task execution (storing processed data into a ledger database), and core services (transaction, clearing, accounting, account).
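Task splitting and execution might look like the sketch below. The `user_id,amount` record format, the CRC32-based shard assignment, and the dictionary standing in for the ledger database are all assumptions.

```python
import zlib
from collections import defaultdict
from decimal import Decimal


def split_file(lines, shard_count):
    """Task splitting: assign each 'user_id,amount' record to a logical shard by user ID."""
    shards = defaultdict(list)
    for line in lines:
        user_id, amount = line.strip().split(",")
        shard = zlib.crc32(user_id.encode()) % shard_count
        shards[shard].append((user_id, Decimal(amount)))
    return shards


def execute_shard(records, ledger):
    """Task execution: apply one shard's records to the ledger (a stand-in for the ledger DB)."""
    for user_id, amount in records:
        ledger[user_id] += amount
```

Sharding by user ID is a design choice here: all of a user's records land in the same shard, so shards can run in parallel without two workers updating the same account.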
10. Clearing Fault‑Tolerance and Reconciliation
The clearing pipeline runs through daily initialization, file import, clearing, profit calculation, share adjustment, export, secondary clearing, and profit export. Each step supports rollback and precise per‑record verification, and data can be rolled back by file, by user, or to a backup point.
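One way to make rollback by file, by user, or to a backup point cheap is to keep clearing results as an append-only journal and derive balances from it. The sketch below assumes that design; `ClearingJournal`, modelling a backup point as a journal sequence number, and `reconcile` are all hypothetical.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class Entry:
    seq: int
    file_id: str
    user_id: str
    amount: Decimal


class ClearingJournal:
    """Hypothetical append-only journal; balances are derived, so rollback drops entries."""

    def __init__(self):
        self.entries = []
        self.backup_points = {}  # name -> last sequence number at backup time
        self._seq = 0

    def apply(self, file_id, user_id, amount):
        self._seq += 1
        self.entries.append(Entry(self._seq, file_id, user_id, Decimal(amount)))

    def mark_backup_point(self, name):
        self.backup_points[name] = self._seq

    def rollback_file(self, file_id):
        self.entries = [e for e in self.entries if e.file_id != file_id]

    def rollback_user(self, user_id):
        self.entries = [e for e in self.entries if e.user_id != user_id]

    def rollback_to(self, name):
        limit = self.backup_points[name]
        self.entries = [e for e in self.entries if e.seq <= limit]

    def balance(self, user_id):
        return sum((e.amount for e in self.entries if e.user_id == user_id), Decimal(0))


def reconcile(expected, journal, file_id):
    """Per-record check: the file's (user, amount) records and its journal entries match one-to-one."""
    actual = sorted((e.user_id, e.amount) for e in journal.entries if e.file_id == file_id)
    return sorted(expected) == actual
```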
11. Reliability and Stability Mechanisms
Gray Release
Steps: beta release, group release, gray traffic, full release. Gray release for clearing can target specific user segments, shortening rollout time.
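Targeting a user segment during gray release can be sketched as a whitelist plus a percentage of stable hash buckets. The function name, the 100-bucket scheme, and the whitelist are assumptions, not the routing rule Ant Financial uses.

```python
import zlib


def in_gray(user_id, percent, whitelist=frozenset()):
    """Send a user to the new version if whitelisted or hashed into the first `percent` buckets."""
    if user_id in whitelist:
        return True
    # A stable hash keeps each user in the same bucket, so raising `percent`
    # only adds users to the gray group and never moves anyone back.
    return zlib.crc32(user_id.encode()) % 100 < percent
```

An early stage might use only a whitelist of internal testers (`percent=0`), later stages raise the percentage, and full release corresponds to `percent=100`.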
Online Full‑Link Stress Testing
A data access proxy routes test data to shadow tables, so production data is never affected. Benefits: tests run in a production‑like environment, which gives reliable results, and test data is isolated from real data at the table level.
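The proxy's routing decision can be sketched as follows, assuming each stress-test request carries a flag that the proxy can read. The `STRESS_TEST` context variable and the `_shadow` table suffix are hypothetical conventions.

```python
import contextvars

# Hypothetical per-request flag; a real system would propagate it across service calls.
STRESS_TEST = contextvars.ContextVar("stress_test", default=False)

SHADOW_SUFFIX = "_shadow"  # assumed naming convention for shadow tables


def physical_table(logical_table):
    """Data-access-proxy routing: test traffic is redirected to the shadow copy of the table."""
    return logical_table + SHADOW_SUFFIX if STRESS_TEST.get() else logical_table


def handle_request(is_test, table):
    """Set the flag for the duration of one request, then restore the previous value."""
    token = STRESS_TEST.set(is_test)
    try:
        return physical_table(table)
    finally:
        STRESS_TEST.reset(token)
```

Keeping the decision in the proxy means application code issues the same SQL against the same logical table names for both test and real traffic.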
OceanBase High‑Availability Mechanism
Based on Paxos three‑replica deployment, providing strong consistency, continuous availability, automatic master‑slave failover, and fault tolerance across machines, data centers, and cities without service interruption or data loss.
Unlike traditional master‑slave replication, OceanBase commits a write once a majority of replicas have persisted it, so it keeps accepting writes while a minority of nodes is down. It also supports multi‑active deployments across regions and gray upgrades.
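The majority rule that makes this possible is shown below. This sketch models only the quorum commit condition; it is not an implementation of Paxos (no leader election, proposal numbers, or recovery of uncommitted entries) and not OceanBase's code.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    name: str
    healthy: bool = True
    log: list = field(default_factory=list)


def majority(n: int) -> int:
    return n // 2 + 1


def replicate(replicas, record):
    """Majority-quorum write: commit only if a majority of all replicas persisted the record.

    Healthy replicas append and acknowledge; this sketch does not clean up
    entries left on a minority when the write fails to commit.
    """
    acks = 0
    for r in replicas:
        if r.healthy:
            r.log.append(record)
            acks += 1
    return acks >= majority(len(replicas))
```

With three replicas, losing any single one still leaves two acknowledgements, which is enough to commit; losing two is not.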
OceanBase Deployment Options
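Same‑city, three‑data‑center deployment (0.5‑2 ms latency between data centers).
Two‑city, three‑data‑center deployment: latency is similar in normal operation, but if a data center in the city that hosts two of the three fails, every write must synchronize with the remote city, which adds cross‑city latency.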
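The latency behavior of both options follows from the majority rule: a commit waits for the majority-th fastest acknowledgement, counting the leader's own local write. The 30 ms cross-city figure in the usage below is a hypothetical number chosen for illustration.

```python
def commit_latency_ms(ack_latencies_ms):
    """Latency of a majority-quorum commit: the majority-th smallest acknowledgement time.

    The list includes the leader's local write (about 0 ms); an unreachable
    replica can be represented as float("inf").
    """
    needed = len(ack_latencies_ms) // 2 + 1
    return sorted(ack_latencies_ms)[needed - 1]
```

For example, with the leader in city A, a second replica in city A at 1.5 ms, and a third in city B at an assumed 30 ms, `commit_latency_ms([0.0, 1.5, 30.0])` is 1.5; if the second city-A replica fails, `commit_latency_ms([0.0, float("inf"), 30.0])` becomes 30.0.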
Same‑City Dual‑Active Disaster‑Recovery Architecture
Primary data center handles most traffic; secondary handles a small portion. Features: same‑site preference, zero application intrusion, single‑site‑like development/deployment, automatic failover.
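Same-site preference with automatic failover can be sketched as a small routing function. The data-center names and the `healthy` map (standing in for health-check results) are hypothetical.

```python
def choose_datacenter(client_dc, healthy):
    """Serve from the caller's own data center; fail over to another healthy one if it is down.

    `healthy` maps data-center name -> bool (hypothetical health-check results).
    """
    if healthy.get(client_dc):
        return client_dc
    for dc, ok in healthy.items():
        if ok:
            return dc
    raise RuntimeError("no healthy data center")
```

Because the choice happens in the routing layer rather than in business code, applications are written as if there were a single site, which matches the "zero application intrusion" property above.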
Overall, the distributed architecture and its supporting technologies enable Ant Financial to achieve high scalability, reliability, and rapid business evolution.
21CTO
