How Alibaba Scaled Double 11 with AI‑Driven Network Automation
Alibaba senior technologist Houyi explains how the company used AI‑powered network intelligence, automated fault detection, traffic scheduling, and smart routing to dramatically improve stability, reduce costs, and boost efficiency during the massive Double 11 shopping event.
Key Network Technologies for Double 11
Alibaba applied intelligent methods to strengthen stability, speed up fault discovery, enable automatic repair, and roll out configuration changes quickly. Reported breakthroughs include high‑performance gateways (ANAT throughput up 16×, LVS up 8×), 25G backbone deployment, and precise traffic evaluation.
Flow prediction on the Water Cube platform supported accurate capacity planning and end‑to‑end QoS optimization for latency‑sensitive services such as transactions and payments.
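As a rough illustration of how a traffic forecast can feed capacity planning, the sketch below projects next year's peak from past peaks and converts it into a link count. It is not Water Cube's method: the growth-ratio forecast, the function names, the 50% utilization target, and the spare-link count are all assumptions made for the example.

```python
import math

def forecast_peak(history_gbps):
    """Naive forecast: apply the mean year-over-year growth ratio to the last peak.

    Hypothetical stand-in for a real prediction model.
    """
    ratios = [b / a for a, b in zip(history_gbps, history_gbps[1:])]
    return history_gbps[-1] * (sum(ratios) / len(ratios))

def required_links(peak_gbps, link_gbps=25.0, target_utilization=0.5, spares=1):
    """Links needed to keep the forecast peak under the utilization target, plus spares."""
    usable_per_link = link_gbps * target_utilization
    return math.ceil(peak_gbps / usable_per_link) + spares

peak = forecast_peak([100, 150, 225])   # assumed historical peaks in Gbps
links = required_links(peak)
print(peak, links)
```

Keeping the utilization target well below 100% is one simple way to leave headroom for forecast error and for traffic shifted in during failures.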
Improving Fault Handling
Faults fall into two categories: change‑related and non‑change‑related. Automated change tools reduce the stability risk of deployments, while proactive inspections and real‑time vulnerability detection help engineers address non‑change faults before they affect services.
A fault‑feature library and predictive analytics enable early warning, rapid diagnosis, and automated remediation, forming a closed‑loop self‑recovery system.
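A fault‑feature library can be pictured as a table of known signatures, each mapped to a diagnosis and a remediation action. The sketch below is a minimal illustration under that assumption. The log formats, feature names, and action names are hypothetical and are not taken from Alibaba's system.

```python
import re
from dataclasses import dataclass

@dataclass(frozen=True)
class FaultFeature:
    name: str            # diagnosis label
    pattern: re.Pattern  # signature to look for in a log line
    remediation: str     # action a remediation system would be asked to run

# Hypothetical signatures; a real library would be far larger and curated.
FEATURE_LIBRARY = [
    FaultFeature("link_down", re.compile(r"Interface (\S+) changed state to down"), "isolate_port"),
    FaultFeature("bgp_session_drop", re.compile(r"BGP neighbor (\S+) Down"), "reroute_traffic"),
]

def diagnose(log_line):
    """Return (fault name, affected object, remediation) for the first matching feature."""
    for feature in FEATURE_LIBRARY:
        match = feature.pattern.search(log_line)
        if match:
            return feature.name, match.group(1), feature.remediation
    return None  # unknown fault: leave for manual analysis

print(diagnose("Interface eth1/3 changed state to down"))
```

Lines that match no feature return `None`, which is where a closed loop would hand off to engineers and, ideally, grow the library with a new signature.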
Intelligent Scheduling & Auto Isolation
Automatic BGP cut‑over achieved a 100% success rate, and auto‑isolation handled over 90% of port/link and board‑level anomalies with a 95% success rate, dramatically reducing manual intervention.
Beidou Fast Discovery
The Beidou fault‑identification engine processes billions of log entries daily, extracting millions of events, consolidating them into a few hundred complex incidents, and converting the most critical ones into work orders for automated or manual handling.
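The funnel from many events to a few incidents can be sketched as a grouping step. The example below merges events on the same device that occur close together in time. The event tuple layout, the 60-second window, and the grouping rule are assumptions for illustration, not Beidou's actual logic.

```python
from collections import defaultdict

def _incident(device, items):
    """Summarize a run of (timestamp, kind) events on one device."""
    return {
        "device": device,
        "start": items[0][0],
        "end": items[-1][0],
        "kinds": sorted({kind for _, kind in items}),
        "count": len(items),
    }

def consolidate(events, window_s=60):
    """Group (timestamp_s, device, kind) events into per-device incidents.

    Consecutive events on a device join the same incident when the gap
    between them is at most window_s seconds.
    """
    by_device = defaultdict(list)
    for ts, device, kind in sorted(events):
        by_device[device].append((ts, kind))

    incidents = []
    for device, items in by_device.items():
        current = [items[0]]
        for item in items[1:]:
            if item[0] - current[-1][0] <= window_s:
                current.append(item)
            else:
                incidents.append(_incident(device, current))
                current = [item]
        incidents.append(_incident(device, current))
    return incidents

events = [(0, "sw1", "link_down"), (30, "sw1", "bgp_down"),
          (500, "sw1", "link_down"), (10, "sw2", "fan_fail")]
for incident in consolidate(events):
    print(incident)
```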
End‑to‑End Diagnosis System “Paoding”
Paoding automates topology discovery, alarm aggregation, log retrieval, and command execution, shrinking fault localization time from 1–2 hours to about 3 minutes.
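One way topology-aware alarm aggregation can narrow down a fault is to keep only the alarming devices that have no alarming device upstream of them. The sketch below illustrates that idea on a tree-shaped topology. The device names, the `upstream` map, and the rule itself are assumptions for the example rather than a description of Paoding's internals.

```python
def has_alarming_ancestor(device, upstream, alarming):
    """Walk up the topology and report whether any upstream device is alarming."""
    seen = {device}
    node = upstream.get(device)
    while node is not None and node not in seen:  # 'seen' guards against loops
        if node in alarming:
            return True
        seen.add(node)
        node = upstream.get(node)
    return False

def root_candidates(upstream, alarming):
    """Alarming devices whose alarms are not explained by an upstream alarm."""
    return {d for d in alarming if not has_alarming_ancestor(d, upstream, alarming)}

# Hypothetical topology: each device maps to its upstream neighbor.
upstream = {"tor1": "agg1", "tor2": "agg1", "agg1": "core1",
            "tor3": "agg2", "agg2": "core1"}
alarming = {"tor1", "tor2", "agg1", "tor3"}
print(sorted(root_candidates(upstream, alarming)))
```

Here the alarms on `tor1` and `tor2` collapse into the one on `agg1`, so an engineer would start with two candidates instead of four.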
NetO Traffic Optimization
NetO uses SDN‑based SR‑TE to compute optimal paths, balance bandwidth, and minimize latency. The “Kuohai” decision engine optimizes for business goals by dynamically reallocating traffic to under‑utilized or lower‑cost links, which improves load balancing during failures and reduces transmission time for big‑data workloads.
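Path computation under a bandwidth constraint is often done by pruning links that cannot carry the demand and then running a shortest-path search on what remains (a constrained shortest path, or CSPF-style, approach). The sketch below shows that generic technique with latency as the cost. It is not NetO's or Kuohai's algorithm, and the link tuple layout and node names are assumptions.

```python
import heapq

def constrained_shortest_path(links, src, dst, demand_gbps):
    """Lowest-latency path from src to dst using only links with enough free bandwidth.

    links: iterable of (node_a, node_b, latency_ms, free_gbps), treated as bidirectional.
    Returns (total_latency_ms, [nodes...]) or None if no feasible path exists.
    """
    # Step 1: prune links that cannot carry the demand.
    graph = {}
    for a, b, latency, free in links:
        if free >= demand_gbps:
            graph.setdefault(a, []).append((b, latency))
            graph.setdefault(b, []).append((a, latency))

    # Step 2: Dijkstra on the pruned graph.
    best = {src: 0}
    heap = [(0, src, [src])]
    while heap:
        cost, node, path = heapq.heappop(heap)
        if node == dst:
            return cost, path
        if cost > best.get(node, float("inf")):
            continue  # stale heap entry
        for nxt, latency in graph.get(node, []):
            new_cost = cost + latency
            if new_cost < best.get(nxt, float("inf")):
                best[nxt] = new_cost
                heapq.heappush(heap, (new_cost, nxt, path + [nxt]))
    return None

links = [("A", "B", 5, 10), ("B", "C", 5, 2), ("A", "D", 8, 10), ("D", "C", 8, 10)]
print(constrained_shortest_path(links, "A", "C", demand_gbps=1))
print(constrained_shortest_path(links, "A", "C", demand_gbps=5))
```

With the small demand the search takes the short route through `B`. With the larger demand the `B`-`C` link is pruned for lack of free bandwidth, so traffic shifts to the longer but less loaded route through `D`.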
Alibaba Cloud Developer
Alibaba's official tech channel, featuring all of its technology innovations.
