
How Alibaba Scaled Double 11 with AI‑Driven Network Automation

Alibaba senior technologist Houyi explains how the company used AI‑powered network intelligence, automated fault detection, traffic scheduling, and smart routing to dramatically improve stability, reduce costs, and boost efficiency during the massive Double 11 shopping event.

Alibaba Cloud Developer

Key Network Technologies for Double 11

Alibaba applied intelligent automation to strengthen stability, speed up fault discovery, enable automatic repair, and roll out configuration changes quickly. It also reported breakthroughs in high‑performance gateways (ANAT throughput up 16×, LVS up 8×), 25G backbone deployment, and precise traffic evaluation.

Advanced flow‑prediction with the Water Cube platform allowed accurate capacity planning and end‑to‑end QoS optimization for latency‑sensitive services such as transactions and payments.
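The capacity-planning arithmetic behind such a forecast can be sketched in a few lines. This is only an illustration under stated assumptions: the source does not describe how Water Cube predicts traffic, so the year-over-year growth extrapolation, the safety margin, the target utilization, and the function names `forecast_peak` and `required_links` are all hypothetical.

```python
import math


def forecast_peak(history_gbps, safety_margin=0.2):
    """Extrapolate the next peak from the last two observed peaks.

    Assumption: growth repeats the most recent year-over-year ratio,
    and a safety margin is added on top. A real predictor would be
    far more elaborate.
    """
    growth = history_gbps[-1] / history_gbps[-2]
    return history_gbps[-1] * growth * (1 + safety_margin)


def required_links(predicted_peak_gbps, link_capacity_gbps, target_utilization=0.7):
    """Number of parallel links needed so that no link runs above the
    target utilization at the predicted peak."""
    usable_per_link = link_capacity_gbps * target_utilization
    return math.ceil(predicted_peak_gbps / usable_per_link)


# Illustrative numbers only, not figures from the article.
peak = forecast_peak([100.0, 150.0])
links = required_links(peak, link_capacity_gbps=100.0)
```

Keeping utilization well below 100% leaves headroom for bursts, which matters most for latency-sensitive flows such as transactions and payments.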

Improving Fault Handling

Faults fall into two categories: change‑related and non‑change‑related. Automated change tools reduce the stability risk of deployments, while proactive inspections and real‑time vulnerability detection help engineers address non‑change faults before they affect services.

A fault‑feature library and predictive analytics enable early warning, rapid diagnosis, and automated remediation, forming a closed‑loop self‑recovery system.
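A fault-feature library can be pictured as a table of known signatures, each paired with a remediation. The sketch below is a minimal illustration, not Alibaba's implementation: the `FaultFeature` type, the log patterns, and the remediation names are all hypothetical.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class FaultFeature:
    name: str         # hypothetical fault label
    pattern: str      # regex that identifies the fault in a log line
    remediation: str  # hypothetical name of the automated action


# Illustrative entries; real signatures would come from past incidents.
FEATURE_LIBRARY = [
    FaultFeature("link_flap", r"Interface (\S+) changed state to down", "isolate_port"),
    FaultFeature("bgp_session_drop", r"BGP neighbor (\S+) Down", "reroute_traffic"),
]


def diagnose(log_line):
    """Return (fault name, affected object, remediation) for the first
    matching feature, or None if the line matches nothing known."""
    for feature in FEATURE_LIBRARY:
        match = re.search(feature.pattern, log_line)
        if match:
            return feature.name, match.group(1), feature.remediation
    return None
```

For example, `diagnose("Interface eth1/3 changed state to down")` yields the `link_flap` feature with `eth1/3` as the affected object. Feeding the remediation name to an executor and confirming recovery afterward would close the loop described above.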

Intelligent Scheduling & Auto Isolation

Automatic BGP cut‑over achieved a 100% success rate, and auto‑isolation handled over 90% of port/link and board‑level anomalies with a 95% success rate, dramatically reducing manual intervention.
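One plausible safety rule for auto-isolation is to take a faulty port out of service only when a healthy peer can absorb its traffic. The sketch below illustrates that rule under assumptions of our own: the `Port` fields, the error-rate threshold, and the redundancy-group concept are hypothetical and not taken from the source.

```python
from dataclasses import dataclass


@dataclass
class Port:
    name: str
    error_rate: float  # fraction of errored packets (assumed metric)
    group: str         # redundancy group the port belongs to
    isolated: bool = False


def plan_isolation(ports, max_error_rate=0.01, min_healthy_per_group=1):
    """Return names of ports that are unhealthy AND safe to isolate,
    i.e. their redundancy group still has enough healthy ports."""
    healthy_count = {}
    for p in ports:
        if not p.isolated and p.error_rate <= max_error_rate:
            healthy_count[p.group] = healthy_count.get(p.group, 0) + 1

    to_isolate = []
    for p in ports:
        if p.isolated or p.error_rate <= max_error_rate:
            continue  # already isolated, or healthy
        if healthy_count.get(p.group, 0) >= min_healthy_per_group:
            to_isolate.append(p.name)
    return to_isolate


ports = [
    Port("a1", 0.20, "A"),
    Port("a2", 0.00, "A"),
    Port("b1", 0.50, "B"),  # no healthy peer, so it is left for a human
]
plan = plan_isolation(ports)
```

A guard like this is one reason an automated system might handle most anomalies but not all of them: cases without safe redundancy are escalated rather than isolated.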

Beidou Fast Discovery

The Beidou fault‑identification engine processes billions of log entries daily, extracting millions of events, consolidating them into a few hundred complex incidents, and converting the most critical ones into work orders for automated or manual handling.
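The funnel from logs to events to incidents to work orders can be sketched as three small stages. This is a toy illustration, assuming a log format of `<timestamp> <device> <message>`, a fixed keyword list, and a time-window grouping rule; none of these details come from the source, and the function names are hypothetical.

```python
from collections import defaultdict

KEYWORDS = ("DOWN", "CRC_ERROR", "FAN_FAIL")  # assumed event markers


def extract_events(log_lines):
    """Keep only lines carrying a known keyword.
    Returns (timestamp, device, keyword) tuples."""
    events = []
    for line in log_lines:
        ts, device, message = line.split(" ", 2)
        for kw in KEYWORDS:
            if kw in message:
                events.append((int(ts), device, kw))
                break
    return events


def consolidate(events, window=60):
    """Group events on the same device into one incident when each
    follows the previous one within `window` seconds."""
    by_device = defaultdict(list)
    for ev in sorted(events):
        by_device[ev[1]].append(ev)

    incidents = []
    for device, evs in by_device.items():
        current = [evs[0]]
        for ev in evs[1:]:
            if ev[0] - current[-1][0] <= window:
                current.append(ev)
            else:
                incidents.append({"device": device, "events": current})
                current = [ev]
        incidents.append({"device": device, "events": current})
    return incidents


def to_work_orders(incidents, min_events=2):
    """Escalate only incidents made up of several correlated events."""
    return [
        f"{inc['device']}: {len(inc['events'])} related events"
        for inc in incidents
        if len(inc["events"]) >= min_events
    ]


logs = [
    "100 sw1 port 1 DOWN",
    "130 sw1 CRC_ERROR on port 1",
    "500 sw1 FAN_FAIL",
    "120 sw2 heartbeat ok",
    "140 sw2 port 7 DOWN",
]
events = extract_events(logs)
incidents = consolidate(events)
orders = to_work_orders(incidents)
```

Each stage shrinks the volume: five log lines become four events, three incidents, and one work order, mirroring on a tiny scale the reduction from billions of log entries to a handful of actionable tickets.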

End‑to‑End Diagnosis System “Paoding”

Paoding automates topology discovery, alarm aggregation, log retrieval, and command execution, shrinking fault localization time from 1–2 hours to about 3 minutes.
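Combining topology with alarm aggregation lets a diagnosis tool collapse a storm of alarms onto the most upstream alarming devices. The sketch below shows one simple way to do that, assuming a tree-like topology expressed as a device-to-upstream map; the `localize` function and the device names are hypothetical, and the source does not describe Paoding's actual algorithm.

```python
def localize(upstream, alarming):
    """Return alarming devices that have no alarming ancestor.

    upstream: dict mapping each device to its upstream neighbor
              (assumed tree-like topology).
    alarming: set of devices currently raising alarms.
    """
    roots = []
    for dev in sorted(alarming):
        seen = {dev}  # guards against loops in bad topology data
        node = upstream.get(dev)
        has_alarming_ancestor = False
        while node is not None and node not in seen:
            if node in alarming:
                has_alarming_ancestor = True
                break
            seen.add(node)
            node = upstream.get(node)
        if not has_alarming_ancestor:
            roots.append(dev)
    return roots


topology = {
    "tor1": "agg1",
    "tor2": "agg1",
    "agg1": "core1",
    "tor3": "agg2",
    "agg2": "core1",
}
suspects = localize(topology, {"tor1", "tor2", "agg1", "tor3"})
```

Here the alarms on `tor1` and `tor2` are attributed to their shared upstream `agg1`, while `tor3` stands alone because nothing above it is alarming. Narrowing four alarms to two suspects is the kind of reduction that makes minute-scale localization feasible.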

NetO Traffic Optimization

NetO uses SDN‑based SR‑TE to compute optimal paths, balance bandwidth, and minimize latency. The “Kuohai” decision engine maximizes business goals by dynamically reallocating traffic to under‑utilized or lower‑cost links, improving load‑balancing during failures and reducing transmission time for big‑data workloads.
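A common building block for traffic-engineered path selection is a constrained shortest-path search: drop links that cannot carry the demand, then pick the lowest-latency route over what remains. The sketch below illustrates that general technique with Dijkstra's algorithm; it is not NetO's or Kuohai's actual logic, and the link tuple layout and function name are assumptions.

```python
import heapq


def constrained_shortest_path(links, src, dst, demand_gbps):
    """Lowest-latency path from src to dst using only links with at
    least `demand_gbps` of free bandwidth.

    links: iterable of (node_a, node_b, latency_ms, free_gbps),
           treated as bidirectional.
    Returns (total_latency_ms, [nodes...]) or None if no path fits.
    """
    graph = {}
    for a, b, latency_ms, free_gbps in links:
        if free_gbps >= demand_gbps:  # prune links that are too full
            graph.setdefault(a, []).append((b, latency_ms))
            graph.setdefault(b, []).append((a, latency_ms))

    heap = [(0.0, src, [src])]
    visited = set()
    while heap:
        cost, node, path = heapq.heappop(heap)
        if node == dst:
            return cost, path
        if node in visited:
            continue
        visited.add(node)
        for nxt, latency_ms in graph.get(node, []):
            if nxt not in visited:
                heapq.heappush(heap, (cost + latency_ms, nxt, path + [nxt]))
    return None


links = [
    ("A", "B", 5, 10),
    ("B", "D", 5, 2),   # short but nearly full
    ("A", "C", 8, 10),
    ("C", "D", 8, 10),
]
small_flow = constrained_shortest_path(links, "A", "D", demand_gbps=1)
large_flow = constrained_shortest_path(links, "A", "D", demand_gbps=5)
```

The small flow takes the short route through `B`, while the large flow is steered through `C` because the `B`-`D` link lacks headroom. In a segment-routing deployment, the resulting node list would be encoded as a segment list at the ingress router.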
