
How Ant Financial Scales to 86,000 TPS: Cloud‑Native Operations Lessons

This article details Ant Financial's evolution from supporting 20,000 transactions per minute in 2010 to 86,000 transactions per second in 2015, describing its multi‑active architecture, financial‑grade operation platform, and organizational mechanisms that enable high‑availability, automated capacity management and fault handling in a cloud‑native environment.

Efficient Ops

Overall Operation System Composition

Ant Financial’s operation system is built on three core components: the operation architecture, the operation platform, and the organization mechanism.

Multi‑Active Architecture

The architecture moves beyond the traditional “two‑location, three‑center” model to a self‑developed “multi‑active across locations” design built from LDC (Logical Data Center) units. Each LDC is a closed, independently managed unit that keeps its real‑time data isolated from other units and communicates with them through unified, mostly asynchronous channels.

Key benefits include horizontal scalability without reliance on a single IDC, N+1 disaster‑recovery capability, elimination of single points of failure, and fine‑grained traffic control that enables automated load testing, gray‑release, and rapid failover.
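To make the unit‑based traffic control concrete, here is a minimal sketch of how requests might be pinned to an LDC and shifted to a backup during failover. It is an illustration under stated assumptions, not Ant Financial's implementation: the shard count, the unit names (`ldc-a`, `ldc-b`, `ldc-c`), the backup pairing, and the `UnitRouter` class are all hypothetical.

```python
NUM_SHARDS = 100  # assumption: users are split into 100 shards by user ID


class UnitRouter:
    """Routes each user shard to a home unit, with failover to a backup unit."""

    def __init__(self, units, backups):
        self.units = list(units)
        self.backups = dict(backups)  # unit -> backup unit (hypothetical pairing)
        self.down = set()             # units currently marked unavailable
        # Spread shards over units in contiguous, near-equal ranges.
        self.shard_to_unit = {
            s: self.units[s * len(self.units) // NUM_SHARDS]
            for s in range(NUM_SHARDS)
        }

    def mark_down(self, unit):
        self.down.add(unit)

    def mark_up(self, unit):
        self.down.discard(unit)

    def route(self, user_id):
        unit = self.shard_to_unit[user_id % NUM_SHARDS]
        seen = set()
        # Follow the backup chain until a healthy unit is found.
        while unit in self.down:
            if unit in seen or unit not in self.backups:
                raise RuntimeError("no healthy unit available")
            seen.add(unit)
            unit = self.backups[unit]
        return unit


router = UnitRouter(
    ["ldc-a", "ldc-b", "ldc-c"],
    {"ldc-a": "ldc-b", "ldc-b": "ldc-c", "ldc-c": "ldc-a"},
)
home = router.route(1005)       # shard 5 lives in ldc-a
router.mark_down("ldc-a")
rerouted = router.route(1005)   # the same user is now served by ldc-b
```

Because a user always maps to the same shard, one unit can serve that user's whole request path, and moving a shard range between units is a change to the routing table alone. The same table could in principle carry gray‑release or load‑test traffic to a chosen unit.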

Implemented in 2013 for same‑city LDCs and upgraded in 2015 to a fully distributed multi‑active architecture, the design supports real business traffic and ensures continuous availability.

Financial‑Grade Business Continuity & Automation

During the 2015 Double‑11 event, Ant Financial processed a peak of roughly 86,000 transactions per second, spanning hundreds of payment scenarios and tens of thousands of security rules. To manage that scale, the operation platform rests on three pillars:

Efficiency: platform‑based monitoring, change control, resource scheduling, and service registration.

Safety: automated business validation, big‑data rule engines, data reconciliation, dependency control, and capacity monitoring.

Intelligence: big‑data analytics for automatic fault analysis, capacity prediction, and intelligent decision‑making.

Two typical scenarios illustrate these capabilities:

Automated Capacity Management: full‑link stress testing with shadow LDCs, real‑time metric collection, capacity‑model analysis, and one‑click elastic scaling via the PaaS platform.

Automated Fault Handling: rapid detection through monitoring, root‑cause analysis via big‑data computation, automated notification, dependency graph generation, and either manual or fully automated remediation.
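As a rough illustration of how a dependency graph can narrow down a fault, the sketch below applies one simple heuristic: among the services that are currently alerting, the likely root causes are those whose own dependencies are all healthy. This heuristic, the function name `likely_root_causes`, and the service names are assumptions made for the example; the article does not describe the actual analysis Ant Financial performs.

```python
def likely_root_causes(depends_on, alerting):
    """Return alerting services none of whose direct dependencies are alerting.

    depends_on: dict mapping a service to the services it calls (hypothetical data).
    alerting:   iterable of services that monitoring has flagged.
    """
    alerting = set(alerting)
    return sorted(
        svc
        for svc in alerting
        if not any(dep in alerting for dep in depends_on.get(svc, ()))
    )


# Hypothetical call graph: checkout -> payment -> accounting-db, checkout -> risk
depends_on = {
    "checkout": ["payment", "risk"],
    "payment": ["accounting-db"],
    "risk": [],
    "accounting-db": [],
}

suspects = likely_root_causes(
    depends_on, ["checkout", "payment", "accounting-db"]
)
# suspects == ["accounting-db"]: the other two alerts are downstream symptoms
```

The heuristic has clear limits: if the alerting services form a dependency cycle it returns nothing, and it ignores timing and alert severity. It only shows why generating the dependency graph is a useful step before notification and remediation.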

Organization Mechanism

Ant Financial establishes three layers of IT risk management and a multi‑level organizational framework that ensures architecture, policies, and platform practices are consistently applied on the front lines, while feedback from operations drives continuous improvement.

Future Cloud Era

Building on years of operational experience, Ant Financial created the “Ant Financial Cloud” platform, which offers standardized development and operation models that abstract away infrastructure complexity, accelerate innovation, and enable rapid support for new financial services. The platform already powers Ant’s wealth‑management, insurance, Sesame Credit, and MYbank businesses, and aims within five years to help 1,000 small and mid‑size financial institutions transition to “new finance”.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.

Written by

Efficient Ops

This public account is maintained by Xiaotianguo and friends, regularly publishing widely-read original technical articles. We focus on operations transformation and accompany you throughout your operations career, growing together happily.
