How Alibaba’s AWCN Transforms Mobile App Network Performance in Weak Networks
This article details the evolution of Alibaba's unified mobile network library AWCN, explaining its MobileSDN architecture, multi‑protocol scheduling, connection management, vendor acceleration, and weak‑network mitigation techniques that together improve request reliability and user experience across billions of app sessions.
1. Introduction
Since 2013, Alibaba's wireless division has built the Ali Wireless Connection Network (AWCN) as a unified network library for the Taobao app, evolving through IPv6 campaigns and protocol upgrades to become a high‑performance, multi‑protocol, fault‑tolerant, observable network foundation for the entire group.
This article introduces the evolution of the Taobao app's unified network library. It describes how a MobileSDN architecture that spans monitoring through acceleration, weak‑network diagnostics, native multi‑channel technology, and a smooth migration off SPDY and onto large‑scale IPv6 and HTTP/3 improve the app's loading experience under weak networks.
2. Terminal Architecture Overview
2.1 MobileSDN Concept
Software‑Defined Networking (SDN) abstracts network resources into a virtualized system and separates forwarding from control. This lets applications participate in network management and reduces usage and operation costs.
In mobile apps, the goal is a unified network solution in which upper‑layer business code does not need to handle protocol forwarding or timeout degradation, so that the network stack is easy to use, observable, and delivers a good experience.
2.2 AWCN Terminal Network Architecture
Based on the above ideas, a north‑south integrated MobileSDN architecture is built to reduce integration/operation costs and improve browsing experience.
The following layers and their responsibilities are introduced:
Network Application: Unified RPC gateway (MTOP), push channel (ACCS), upload (AUS), download (TBDownloader), image loading (Phenix), remote config (Orange), etc.
Northbound Interface: Bridge for upper‑level calls, providing unified synchronous/asynchronous APIs and hook mechanisms.
Network Controller: Request strategy control center, responsible for end‑to‑end link scheduling and optimization.
Southbound Interface: Bridge between control plane and protocol forwarding, abstracting protocols for unified handling.
Protocol Forwarding: Unified adapters for HTTP/1.1, HTTP/2, HTTP/3, Alibaba’s HTTP/2+SSSL and H3‑XQUIC.
Network Performance Management: NPM collects device network status, signal strength, request statistics, latency diagnostics, and provides diagnostic tools.
2.3 Industry Analysis
Like Tencent WNS, WeChat Mars, Chromium Cronet, and Square OkHttp, AWCN pursues better IP scheduling, multi‑protocol management, timeout control, and network observability.
2.4 Scale Overview
AWCN is used across Alibaba’s ecosystem, supporting thousands of applications such as Taobao, Xianyu, Youku, Tmall, Lazada, Amap, UC Browser, Ele.me, and third‑party apps via EMAS and Umeng.
The focus of this article is the network controller and how it optimizes request chains to ensure high availability under complex network conditions.
3. Network Acceleration System Details
A complete request flow consists of DNS → connection → data send → first‑packet wait → data receive, with IP strategy, connection management, request management, and vendor acceleration each playing a role.
3.1 IP Strategy Scheduling
Traditional LocalDNS suffers from slow resolution, high failure rates, and lack of traffic scheduling. Alibaba Mobile Dispatch Center (AMDC) provides wireless DNS dispatch.
IP selection uses server‑side list delivery plus client‑side dynamic ranking based on success/failure/latency metrics. Cache replacement considers service pressure and updates TTL‑based records asynchronously.
Elimination: An IP that keeps failing within a 5‑minute window is disabled.
Update: Domains carry TTL; after expiration, asynchronous updates occur with priority‑based modes.
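The client‑side half of this scheme can be sketched as follows. This is a minimal illustration, not AWCN's implementation: the names IpStats, rank_ips, and needs_refresh are hypothetical, the failure threshold of 3 is an assumption (the article only gives the 5‑minute window), and the ranking key (success rate first, then average latency) is one plausible reading of "success/failure/latency metrics".

```python
from dataclasses import dataclass, field

FAILURE_WINDOW_S = 300        # the 5-minute window mentioned in the text
MAX_CONSECUTIVE_FAILURES = 3  # assumed threshold; the article does not state one


@dataclass
class IpStats:
    ip: str
    successes: int = 0
    failures: int = 0
    avg_latency_ms: float = 0.0
    # Timestamps of the current unbroken failure streak.
    recent_failures: list = field(default_factory=list)

    def record_success(self, latency_ms: float) -> None:
        self.successes += 1
        # Running mean over successful attempts.
        self.avg_latency_ms += (latency_ms - self.avg_latency_ms) / self.successes
        self.recent_failures.clear()  # a success breaks the streak

    def record_failure(self, now: float) -> None:
        self.failures += 1
        self.recent_failures.append(now)

    def is_disabled(self, now: float) -> bool:
        in_window = [t for t in self.recent_failures if now - t <= FAILURE_WINDOW_S]
        return len(in_window) >= MAX_CONSECUTIVE_FAILURES


def rank_ips(stats: list, now: float) -> list:
    """Return usable IPs, best first: higher success rate, then lower latency."""
    usable = [s for s in stats if not s.is_disabled(now)]

    def key(s: IpStats):
        total = s.successes + s.failures
        success_rate = s.successes / total if total else 1.0  # untried IPs get a chance
        return (-success_rate, s.avg_latency_ms)

    return [s.ip for s in sorted(usable, key=key)]


def needs_refresh(fetched_at: float, ttl_s: float, now: float) -> bool:
    """True once the TTL has elapsed; a caller would then refresh asynchronously."""
    return now - fetched_at >= ttl_s
```

Timestamps are passed in explicitly rather than read from the clock, which keeps the ranking deterministic and easy to test. A disabled IP becomes eligible again once its failures age out of the window.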
Case 1: Wi‑Fi Identifier Restrictions
Android 8+ requires location permission to obtain BSSID; iOS 14 requires network extension. Without BSSID, the existing storage key fails, affecting ~20 % of users. A new key based on AccessPoint information is introduced, requiring end‑to‑end AMDC coordination.
Performance gains: image CDN latency ↓ 4.439 %, P90 ↓ 1.932 %, P99 ↓ 2.230 %, P999 ↓ 2.668 %.
Case 2: Protocol Evolution for Complex Scheduling
When existing protocols cannot meet refined scheduling needs, a protocol overhaul is required. Data migration ensures smooth transition between old and new protocols, avoiding LocalDNS fallback.
3.2 Connection Management
Connection Establishment
Beyond serial and concurrent connections, hot‑domain pre‑connect and composite connections are provided.
Hot‑Domain Pre‑Connect
Composite Connection (IPv6 Scale)
In dual‑stack environments, Android lacks Happy Eyeballs fallback to IPv4, causing delays. The composite connection follows RFC 6555 to select the fastest link while preferring IPv6 and reducing backend pressure.
RFC 6555 describes the problem: when a server's IPv4 path works but its IPv6 path does not, a dual‑stack client experiences significant connection delay compared with an IPv4‑only client. The RFC specifies algorithms to reduce that delay.
The composite connection has two goals:
Dual‑Stack Experience: Select the fastest link, prioritizing IPv6.
Backend Pressure Reduction: Avoid sending duplicate requests to both addresses at the same time.
Performance: connection time ↓ 22.12 %, P99 connection time ↓ 60.19 %, request latency ↓ 1.23 %, P99 request latency ↓ 6.077 %.
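The idea behind the composite connection can be sketched with asyncio. This is a simplified illustration of the Happy Eyeballs pattern, not AWCN's code: composite_connect and simulated_connect are hypothetical names, the connectors are stand‑ins that only sleep, and the 250 ms head start is an assumed value. IPv6 gets a head start; IPv4 is attempted only if IPv6 is slow or fails, and only the winning connection would carry the request.

```python
import asyncio


async def simulated_connect(label, delay_s, fail=False):
    """Stand-in for a real socket connect: sleeps, then succeeds or fails."""
    await asyncio.sleep(delay_s)
    if fail:
        raise OSError(f"{label} unreachable")
    return label


async def composite_connect(connect_v6, connect_v4, head_start_s=0.25):
    """Prefer IPv6; start IPv4 only if IPv6 is slow or has failed."""
    v6_task = asyncio.ensure_future(connect_v6())
    done, _ = await asyncio.wait({v6_task}, timeout=head_start_s)
    if v6_task in done and v6_task.exception() is None:
        return v6_task.result()  # IPv6 won within its head start; IPv4 is never tried

    # IPv6 is slow or failed: bring in IPv4 and take whichever succeeds first.
    v4_task = asyncio.ensure_future(connect_v4())
    pending = {v4_task} if v6_task.done() else {v6_task, v4_task}
    try:
        while pending:
            done, pending = await asyncio.wait(
                pending, return_when=asyncio.FIRST_COMPLETED
            )
            for task in (v6_task, v4_task):  # IPv6 wins a tie
                if task in done and task.exception() is None:
                    return task.result()
        raise ConnectionError("both IPv6 and IPv4 attempts failed")
    finally:
        # Cancel the loser so only one connection stays open.
        for task in (v6_task, v4_task):
            if not task.done():
                task.cancel()
```

For example, asyncio.run(composite_connect(lambda: simulated_connect("v6", 0.5), lambda: simulated_connect("v4", 0.01))) lets the slow IPv6 attempt run for the head start and then falls through to IPv4. Because the request is sent only on the winning connection, the backend does not see duplicates.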
Connection Scheduling
Connections are classified as keep‑alive or regular.
Keep‑Alive: Persistent connections for push/pull scenarios (e.g., messaging).
Regular: Reclaimed when idle, suitable for RPC calls.
Keep‑Alive Detection
Heartbeat PING packets are sent at different intervals for foreground/background and for timeout scenarios.
Idle Recycling
Idle connections are closed after a timeout to save bandwidth and power.
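The two connection classes and their maintenance rules can be sketched as below. All names (Connection, should_ping, recycle_idle) are hypothetical, and every interval is an assumed placeholder, since the article does not give concrete values.

```python
from dataclasses import dataclass

# Illustrative values only; the article does not publish the real intervals.
HEARTBEAT_FOREGROUND_S = 45.0
HEARTBEAT_BACKGROUND_S = 270.0
IDLE_TIMEOUT_S = 60.0


@dataclass
class Connection:
    host: str
    keep_alive: bool     # True for push/messaging channels, False for regular RPC
    last_active: float   # timestamp of the last byte sent or received
    closed: bool = False


def heartbeat_interval(in_foreground: bool) -> float:
    """Ping more often in the foreground, less often in the background."""
    return HEARTBEAT_FOREGROUND_S if in_foreground else HEARTBEAT_BACKGROUND_S


def should_ping(conn: Connection, now: float, in_foreground: bool) -> bool:
    """Only live keep-alive connections are probed, and only when quiet long enough."""
    return (
        conn.keep_alive
        and not conn.closed
        and now - conn.last_active >= heartbeat_interval(in_foreground)
    )


def recycle_idle(pool: list, now: float) -> list:
    """Close regular connections idle past the timeout; return the survivors."""
    survivors = []
    for conn in pool:
        if not conn.keep_alive and now - conn.last_active >= IDLE_TIMEOUT_S:
            conn.closed = True
        else:
            survivors.append(conn)
    return survivors
```

Keep‑alive connections are exempt from idle recycling by construction, which matches the split between push channels and RPC calls described above.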
3.3 Request Management
Dynamic Timeout
Fine‑Grained Control: Independent timeout per request stage.
Dynamic Allocation: Adjust timeout based on domain, network type, and quality.
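One way to picture per‑stage, dynamically allocated timeouts is a base budget scaled by a few factors. This is a sketch under stated assumptions: StageTimeouts and timeouts_for are hypothetical names, the base values, the factor tables, and the example domains are invented for illustration, and quality is assumed to be a score in [0, 1] where 1 is best.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StageTimeouts:
    connect_s: float
    send_s: float
    first_byte_s: float
    receive_s: float


# Assumed baseline budgets per request stage, in seconds.
BASE = StageTimeouts(connect_s=3.0, send_s=5.0, first_byte_s=5.0, receive_s=10.0)

# Assumed multipliers; slower network types get more generous budgets.
NETWORK_FACTOR = {"wifi": 1.0, "5g": 1.0, "4g": 1.25, "3g": 2.0, "2g": 3.0}
# Hypothetical per-domain tuning, e.g. tighter for images, looser for uploads.
DOMAIN_FACTOR = {"img.example.com": 0.8, "upload.example.com": 2.0}


def timeouts_for(domain: str, network_type: str, quality: float) -> StageTimeouts:
    """Scale every stage's timeout by domain, network type, and link quality."""
    quality = min(max(quality, 0.0), 1.0)  # clamp to [0, 1]
    factor = (
        NETWORK_FACTOR.get(network_type, 1.5)  # unknown network: be cautious
        * DOMAIN_FACTOR.get(domain, 1.0)
        * (2.0 - quality)                      # worst quality doubles the budget
    )
    return StageTimeouts(
        connect_s=round(BASE.connect_s * factor, 2),
        send_s=round(BASE.send_s * factor, 2),
        first_byte_s=round(BASE.first_byte_s * factor, 2),
        receive_s=round(BASE.receive_s * factor, 2),
    )
```

Keeping the four stages separate means a slow first byte can trigger a switch to another path without waiting for a single whole‑request timeout to expire.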
Multi‑Path Competition & Selection
When a request is slow or times out, AWCN chooses the best alternative:
Transport Protocol: Switch from HTTP/3 (UDP) to HTTP/2/1.1 (TCP) if needed.
Underlying Framework: Fall back from custom TNET to system libraries when proprietary protocols cause issues.
Network Channel: Switch from Wi‑Fi to cellular when Wi‑Fi is unreachable.
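The three fallback dimensions above can be modeled as an ordered list of candidate paths. The sketch below is deliberately simplified: it tries paths one after another rather than racing them, request_with_fallback is a hypothetical name, and the two sender functions are fakes that stand in for real protocol stacks.

```python
from typing import Callable, Iterable, Tuple


def request_with_fallback(
    paths: Iterable[Tuple[str, Callable[[], bytes]]]
) -> Tuple[str, bytes]:
    """Try each (name, send) path in order; return the first success."""
    errors = []
    for name, send in paths:
        try:
            return name, send()
        except (TimeoutError, ConnectionError) as exc:
            errors.append(f"{name}: {exc!r}")  # remember why this path failed
    raise ConnectionError("all paths failed: " + "; ".join(errors))


def fake_h3_over_wifi() -> bytes:
    """Stand-in for an HTTP/3 request that times out on a poor Wi-Fi link."""
    raise TimeoutError("h3 timed out")


def fake_h2_over_wifi() -> bytes:
    """Stand-in for an HTTP/2 request that succeeds."""
    return b"ok"


# Usage: order the candidates from most to least preferred.
winner, body = request_with_fallback(
    [("h3-wifi", fake_h3_over_wifi), ("h2-wifi", fake_h2_over_wifi)]
)
```

The collected error strings make it possible to report which paths were attempted and why each failed, which ties back to the observability goal.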
3.4 Vendor Acceleration
System‑level network optimizations from vendors (bandwidth scheduling, flow acceleration, QoE feedback, weak‑network prediction, dual‑Wi‑Fi aggregation) are integrated via a generic vendor‑acceleration module that abstracts device‑specific capabilities at runtime.
OPPO integration is complete and undergoing large‑scale validation.
4. Taobao Weak‑Network Mitigation Practice
4.1 Metric Definition
The “1 second rule” measures request success; currently >95 % of requests finish within 1 s. Weak‑network requests are defined as errors or long‑tail latency beyond a threshold.
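As a small worked example of this metric, the snippet below classifies request samples and computes the share that finished within one second. The function names are hypothetical, and using 1000 ms as the long‑tail threshold is an assumption; the article does not state the exact threshold used for weak‑network classification.

```python
from typing import Iterable, Optional, Tuple


def is_weak_request(
    latency_ms: Optional[float], succeeded: bool, threshold_ms: float = 1000.0
) -> bool:
    """A request is 'weak' if it failed or its latency exceeded the threshold."""
    return (not succeeded) or latency_ms is None or latency_ms > threshold_ms


def one_second_rate(samples: Iterable[Tuple[Optional[float], bool]]) -> float:
    """Fraction of requests that succeeded within the threshold."""
    samples = list(samples)
    if not samples:
        return 0.0
    good = sum(1 for latency, ok in samples if not is_weak_request(latency, ok))
    return good / len(samples)


# Four samples: two fast successes, one slow success, one failure.
samples = [(120.0, True), (950.0, True), (1800.0, True), (None, False)]
rate = one_second_rate(samples)  # 2 of 4 meet the rule
```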
4.2 Diagnosis System
An NPM diagnostic suite collects network quality, signal strength, and latency data, and provides tools for root‑cause analysis across multiple scenarios.
4.3 Weak‑Network Techniques
4.3.1 Multi‑Channel Transmission
In weak Wi‑Fi, aggressive retries increase load and worsen latency. Multi‑channel transmission allows switching from Wi‑Fi to cellular, reducing timeout errors.
Scale Results
A/B testing shows a >30 % reduction in the request timeout rate during the Double‑11 peak.
4.3.2 Native HTTP/2 Support
Although HTTP/2/3 are widely used, ~10 % of traffic remains on HTTP/1.1 due to proprietary implementations (e.g., slight‑ssl) and AMDC dependencies.
Android 5.0+ already hosts OkHttp under com.android.okhttp, enabling native HTTP/2 without adding third‑party libraries.
A system bug caused IndexOutOfBoundsException in older OkHttp versions. It was fixed in 3.12.2+, but Alibaba’s source still uses 2.x. The workaround is to avoid the asynchronous OkHttp APIs and use synchronous calls with exception handling, bridging to a third‑party OkHttp when one is present.
Result: The H2 rollout on the Feeds interface reduced complaints by 23 % and request latency by 21.4 %, and raised the success rate by 0.3 pp.
4.4 Summary
Slow‑request mitigation has covered more than 10 billion page views (PV), meeting the annual target, and the MTOP timeout rate improved by about 50 % year over year.
5. Future Directions and Outlook
5.1 More Precise Network State Perception
Integrate vendor‑provided QoE feedback and user‑friendly diagnostics to let users understand and resolve network issues.
5.2 More Dynamic Intelligent Scheduling
Develop adaptive scheduling that reacts to real‑time network quality and predicts changes to apply optimal acceleration strategies.
5.3 Consistent Weak‑Network Interaction Experience
Standardize weak‑network UI behavior (retry prompts, placeholders, graceful degradation) across Taobao’s many business lines.
References
[1] RFC 6555: https://www.rfc-editor.org/rfc/rfc6555
