How Alipay Guarantees Lightning‑Fast Mobile Payments: Inside Its Wireless Network Architecture
In this talk, Alipay engineers explain how their mobile app achieves reliable, low-latency wireless performance through a layered network architecture, custom protocols such as MMTP and MTLS, traffic-aware load balancing, and data-driven operations, illustrating strategies that can reduce the transaction failure rate by one-thousandth and scale to billions of daily payments.
Abstract: At the online fintech summit jointly held by Ant Financial and Alibaba Cloud, guest speaker Xinwu shared the Alipay app’s recent practices in wireless networking. This article, based on that talk, discusses how wireless performance can determine a large-scale app’s market position and how Alipay safeguards it.
1. Background Overview
A typical case: after a meal, a user scans to pay with Alipay, but an unstable network causes repeated failures and the user ends up paying in cash. Reducing the wireless failure rate by one-thousandth could add 100,000 successful transactions per day, saving 30-40 million per year.
Team Analogy: An app is like a tree; the roots are server‑side services, the branches are app functions, and the trunk—wireless networking—is the end‑to‑end connection that guarantees user experience.
Different apps have varying wireless requirements, so recommendations are based on user volume.
Wireless challenges appear along the entire end-to-end communication chain, from the device's request through the carrier access network and core network to the data center. They include high latency, low bandwidth, high error and packet-loss rates, instability, and even hijacking or tampering.
Business and performance challenges can be grouped into three parts:
Business differentiation demands. Network reach and interaction forms vary, with images and video becoming mainstream.
Complex mobile network environments. Wireless access is non-real-time and highly dynamic, spanning diverse carriers and 2G/3G/4G networks, and users connect anytime, anywhere.
Performance requirements. The network must handle massive traffic while remaining stable overall.
Core goal: stability, reliability, and speed, likened to a railway transport system.
In railway transport, tracks represent network architecture, trains represent protocols, schedules represent network policies, and dispatching represents network operation control.
2. Network Design Fundamentals
This section discusses Alipay’s network architecture and protocol choices.
Alipay’s network architecture:
Requests pass through the core and backbone to an LVS load balancer, then to an access gateway for protocol handling and encryption, onward to a service gateway, and finally to business systems and the DB.
Two types of service gateways: API gateway (request‑response) and push service gateway (incremental updates).
Key questions: how to handle terminal network anomalies and how to cope with massive traffic spikes.
Terminal control involves a terminal management module and a mobile control center, providing minute‑level disaster‑recovery scheduling.
Features include HTTPDNS, global scheduling, fine‑grained policy control, dedicated channels with security checks, and rapid decision making via push‑pull mechanisms.
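The HTTPDNS idea in the feature list can be illustrated with a minimal sketch: ask an HTTP-based lookup service first, fall back to ordinary DNS if that fails, and cache the answer for a short time. The class name `FallbackResolver`, the injected `http_lookup` and `local_lookup` callables, and the 60-second TTL are all assumptions made for illustration; the talk does not describe Alipay's actual client logic or service endpoints.

```python
import time
from typing import Callable, Dict, List, Tuple

# A lookup takes a host name and returns a list of IP address strings.
Lookup = Callable[[str], List[str]]


class FallbackResolver:
    """Hypothetical resolver: HTTP-based lookup first, local DNS as fallback."""

    def __init__(self, http_lookup: Lookup, local_lookup: Lookup,
                 ttl_seconds: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._http_lookup = http_lookup
        self._local_lookup = local_lookup
        self._ttl = ttl_seconds
        self._clock = clock
        # host -> (expiry time, addresses)
        self._cache: Dict[str, Tuple[float, List[str]]] = {}

    def resolve(self, host: str) -> List[str]:
        now = self._clock()
        cached = self._cache.get(host)
        if cached is not None and cached[0] > now:
            return cached[1]  # fresh cache hit, no lookup needed

        try:
            ips = self._http_lookup(host)
        except Exception:
            ips = []  # HTTP lookup unavailable; try the fallback
        if not ips:
            try:
                ips = self._local_lookup(host)
            except Exception:
                ips = []

        if ips:
            self._cache[host] = (now + self._ttl, ips)
            return ips
        # Both lookups failed: serve a stale entry if one exists.
        return cached[1] if cached is not None else []
```

Injecting the two lookups keeps the sketch independent of any real service; in practice `local_lookup` could wrap `socket.getaddrinfo`, and `http_lookup` would call whatever HTTPDNS endpoint the app is configured with.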
To handle traffic surges (e.g., Double 11), three architectural approaches are used:
Network overload protection. Cap concurrent connections, the rate of new connections, and packet volume to preserve user experience under overload.
Multi‑level gateways. A funnel from access gateway to business gateway to service gateway, filtering traffic from billions to millions to hundreds of thousands.
Lossy services. Reserve critical resources for critical business, assign a priority to each service, and factor in the on-device experience when degrading in multiple levels.
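Two of the approaches above, overload protection and lossy services, can be sketched together as an admission controller. It caps concurrent connections, limits the new-connection rate with a token bucket, and raises the minimum accepted priority as load grows. The class names, the three priority levels, and the 70% / 90% thresholds are illustrative assumptions, not figures from the talk.

```python
import time
from typing import Callable

# Hypothetical priority levels: higher means more critical.
BEST_EFFORT, NORMAL, CRITICAL = 0, 1, 2


class TokenBucket:
    """Limits the rate of new connections (tokens refill over time)."""

    def __init__(self, rate_per_second: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._rate = rate_per_second
        self._capacity = capacity
        self._tokens = capacity
        self._clock = clock
        self._last = clock()

    def try_take(self) -> bool:
        now = self._clock()
        refill = (now - self._last) * self._rate
        self._tokens = min(self._capacity, self._tokens + refill)
        self._last = now
        if self._tokens >= 1.0:
            self._tokens -= 1.0
            return True
        return False


class AdmissionController:
    """Caps connections and sheds low-priority work first under load."""

    def __init__(self, max_connections: int, bucket: TokenBucket) -> None:
        self._max = max_connections
        self._bucket = bucket
        self.active = 0

    def min_priority(self) -> int:
        load_pct = self.active * 100 // self._max
        if load_pct >= 90:
            return CRITICAL   # only critical business is admitted
        if load_pct >= 70:
            return NORMAL     # best-effort traffic is degraded first
        return BEST_EFFORT

    def admit(self, priority: int) -> bool:
        if self.active >= self._max:
            return False      # hard cap on concurrent connections
        if priority < self.min_priority():
            return False      # degraded: priority too low for current load
        if not self._bucket.try_take():
            return False      # new-connection rate exceeded
        self.active += 1
        return True

    def release(self) -> None:
        self.active = max(0, self.active - 1)
```

The priority check runs before a token is taken, so rejected low-priority requests do not use up rate budget that critical traffic may need.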
Choosing the right protocol is essential. Alipay’s protocol stack:
Transport layer uses TCP with a customized SSL/MTLS for mobile networks.
Presentation layer originally used Google’s SPDY, but Alipay created a custom MMTP protocol for finer control.
Application layer includes HTTP and mobile RPC protocols.
Alipay migrated from HTTP/SPDY to MMTP because its business scenarios require precise network control that HTTP/2 and SPDY cannot provide.
MMTP (Mobile Multi‑Transport Protocol): A TCP‑based custom application protocol designed for unstable wireless networks, enhancing reliability.
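Any custom application protocol that runs over TCP has to define its own message boundaries, because TCP delivers a byte stream rather than discrete messages. The sketch below shows one common way to do that, length-prefixed framing with a stream identifier. It is not MMTP's wire format, which the talk does not describe; the 9-byte header layout, the field names, and the frame type values are purely hypothetical.

```python
import struct
from typing import List, Tuple

# Hypothetical header: payload length (4 bytes), frame type (1 byte),
# stream id (4 bytes), all in network byte order with no padding.
HEADER = struct.Struct("!IBI")

Frame = Tuple[int, int, bytes]  # (frame_type, stream_id, payload)


def encode_frame(frame_type: int, stream_id: int, payload: bytes) -> bytes:
    """Prefix the payload with a fixed-size header."""
    return HEADER.pack(len(payload), frame_type, stream_id) + payload


class FrameDecoder:
    """Reassembles frames from arbitrarily split TCP reads."""

    def __init__(self) -> None:
        self._buf = bytearray()

    def feed(self, data: bytes) -> List[Frame]:
        self._buf.extend(data)
        frames: List[Frame] = []
        while len(self._buf) >= HEADER.size:
            length, frame_type, stream_id = HEADER.unpack_from(self._buf, 0)
            end = HEADER.size + length
            if len(self._buf) < end:
                break  # payload not fully received yet
            payload = bytes(self._buf[HEADER.size:end])
            frames.append((frame_type, stream_id, payload))
            del self._buf[:end]
        return frames
```

Because the decoder buffers partial input, it tolerates reads that split a frame at any byte, which is the normal case on unstable mobile links.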
Another highlight is the MTLS protocol, an improvement over traditional SSL/TLS for wireless environments.
Traditional SSL/TLS suffers from high handshake latency and poor performance on mobile networks; MTLS addresses these drawbacks.
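To see why handshake latency matters on mobile links, a back-of-the-envelope calculation helps. A TCP handshake costs one round trip, a full TLS 1.2 handshake costs two more, and a resumed TLS 1.2 session costs one. The function name and the 300 ms round-trip time below are illustrative assumptions; the talk does not say how MTLS reduces this cost, so the sketch only quantifies the problem with standard TLS.

```python
TCP_HANDSHAKE_ROUND_TRIPS = 1
TLS12_FULL_ROUND_TRIPS = 2      # full TLS 1.2 handshake
TLS12_RESUMED_ROUND_TRIPS = 1   # abbreviated handshake (session resumption)


def setup_delay_ms(rtt_ms: float, tls_round_trips: int) -> float:
    """Time spent on handshakes before the first request can be sent."""
    return (TCP_HANDSHAKE_ROUND_TRIPS + tls_round_trips) * rtt_ms


# Assumed round-trip time for a weak mobile link.
rtt = 300.0
full = setup_delay_ms(rtt, TLS12_FULL_ROUND_TRIPS)
resumed = setup_delay_ms(rtt, TLS12_RESUMED_ROUND_TRIPS)
print(f"full handshake: {full:.0f} ms, resumed: {resumed:.0f} ms")
```

At a 300 ms round-trip time, the full handshake path spends 900 ms before any payment request leaves the device, while resumption cuts that to 600 ms. Every round trip removed from the handshake is therefore directly visible to the user.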
3. Network Optimization Practices
Optimization is divided into network subtraction, addition, code tuning, business governance, and power/traffic control.
3.1 Network Subtraction
3.2 Network Addition – Repeating Essential Work
Key questions for handling the network:
When to establish a connection?
Which method to use?
How to keep the connection persistent?
How to detect link issues quickly, and how to handle restricted networks?
Address these via connection timing, strategy, persistence, timeout control, fake connections, and special network handling.
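Persistence and fake-connection detection can be sketched as a small heartbeat state machine. Here a "fake connection" is taken to mean a link that still looks established locally but no longer carries data, which is an interpretation of the term rather than a definition from the talk. The class name `LinkMonitor`, the action strings, and the 30-second / 5-second defaults are assumptions.

```python
import time
from typing import Callable, Optional


class LinkMonitor:
    """Decides when to send a heartbeat and when to give up on a link."""

    def __init__(self, heartbeat_interval: float = 30.0,
                 reply_timeout: float = 5.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._interval = heartbeat_interval
        self._timeout = reply_timeout
        self._clock = clock
        self._last_activity = clock()
        self._ping_sent_at: Optional[float] = None

    def on_data_received(self) -> None:
        """Any inbound data, including a heartbeat reply, proves the link works."""
        self._last_activity = self._clock()
        self._ping_sent_at = None

    def next_action(self) -> str:
        """Return "wait", "send_heartbeat", or "reconnect"."""
        now = self._clock()
        if self._ping_sent_at is not None:
            # A heartbeat is outstanding: no reply in time means the link
            # is treated as dead even if the socket still looks open.
            if now - self._ping_sent_at >= self._timeout:
                return "reconnect"
            return "wait"
        if now - self._last_activity >= self._interval:
            self._ping_sent_at = now
            return "send_heartbeat"
        return "wait"
```

A caller would poll `next_action()` from its event loop, send a heartbeat frame when asked, and tear down and re-establish the connection on `"reconnect"`. In a real client the interval and timeout would likely vary by network type and by power and traffic budget.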
Code Tuning
Continuous code optimization often yields greater performance gains than multiple strategic tweaks.
3.3 Business Governance
Terminal constraints must consider power and traffic consumption when applying any optimization strategy.
Tools and platforms for network optimization are also discussed.
Beyond architecture and strategy, a robust assurance mechanism is needed for orderly network operation.
4. Network Data Operations
Data‑driven operations combine network data collection, storage, analysis, and visualization to guide optimization.
4.1 Global View of Network Data Operations
First, identify every source of network data; comprehensive collection, storage, analysis, and interpretation of that data then enable targeted troubleshooting.
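As a small illustration of the analysis step, the sketch below aggregates collected request samples per network type into a success rate and a 95th-percentile latency. The record fields, the choice of these two metrics, and the nearest-rank percentile method are assumptions for illustration; the talk's own metric definitions are not given in the text.

```python
import math
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class RequestSample:
    """One collected data point (hypothetical schema)."""
    network_type: str   # e.g. "wifi", "4g", "3g"
    success: bool
    latency_ms: float


def percentile(values: List[float], pct: int) -> float:
    """Nearest-rank percentile of a non-empty list."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def summarize(samples: Iterable[RequestSample]) -> Dict[str, Dict[str, float]]:
    """Group samples by network type and compute per-group metrics."""
    groups: Dict[str, List[RequestSample]] = defaultdict(list)
    for sample in samples:
        groups[sample.network_type].append(sample)

    report: Dict[str, Dict[str, float]] = {}
    for network_type, group in groups.items():
        successes = sum(1 for s in group if s.success)
        report[network_type] = {
            "success_rate": successes / len(group),
            "p95_latency_ms": percentile([s.latency_ms for s in group], 95),
        }
    return report
```

Breaking the numbers down by network type (and, in practice, by carrier, region, or app version) is what turns raw collection into something that can point at a specific fault.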
How to evaluate network technology?
4.2 Core Metrics for Network Performance Evaluation
4.3 Case Studies
Case 1
Case 2
5. Conclusion
Key takeaways on how Alipay ensures wireless network performance for its app.
Future directions include IPv6, QUIC, vendor collaborations, and POP node acceleration.