How Alibaba’s Qi Tian Platform Secures Large-Scale Cloud Networks
This article examines Alibaba Cloud’s Qi Tian integrated operation‑management platform, detailing the challenges of massive cloud network management and the innovative data‑fusion, automated change, intent‑aware monitoring, and multi‑plane self‑healing technologies that enable secure, high‑performance operation at million‑device scale.
Introduction
To implement the Ministry of Industry and Information Technology’s 2025 network security plan, the China Academy of Information and Communications Technology hosted a cloud‑service security conference in Hangzhou. At the conference, Qi Tian, Alibaba Cloud’s ultra‑large‑scale cloud computing network integration platform, won the “Cloud Service Operation Security Innovation Award”, and its team leader received a “Full‑Stack” expert certification.
Core Challenges
Large‑scale cloud networks must handle massive data volumes, device inventories in the millions, frequent topology changes, and heterogeneous equipment, all while maintaining real‑time monitoring and rapid fault recovery. This raises four challenges:
Balancing fine‑grained decision data needs with storage and compute costs.
Managing millions of devices with limited human resources.
Meeting sub‑millisecond monitoring requirements amid highly dynamic network topologies.
Detecting and repairing faults across diverse, multi‑plane device architectures efficiently.
Key Technologies
1. High‑Performance Data Management through Intelligence‑Fusion
Qi Tian unifies multi‑modal network data storage, employs a stateless cloud‑native analysis engine, and builds spatiotemporal knowledge graphs, achieving petabyte‑scale storage, million‑level virtual network modeling, and millisecond‑level data analysis.
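To make the idea of a spatiotemporal knowledge graph concrete, here is a minimal sketch in which every topology link carries a validity interval, so the graph can be queried "as of" a given time. This is an illustration only, not Qi Tian's implementation: the `Link` and `TemporalTopology` classes, the device names, and the timestamps are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass(frozen=True)
class Link:
    src: str
    dst: str
    start: float            # time the link appeared
    end: Optional[float]    # time the link was removed; None means still present


class TemporalTopology:
    """Hypothetical topology store: each link is valid on [start, end)."""

    def __init__(self) -> None:
        self._links: List[Link] = []

    def add_link(self, src: str, dst: str, start: float,
                 end: Optional[float] = None) -> None:
        self._links.append(Link(src, dst, start, end))

    def neighbors_at(self, node: str, t: float) -> Set[str]:
        """Return the nodes directly connected to `node` at time `t`."""
        result: Set[str] = set()
        for link in self._links:
            alive = link.start <= t and (link.end is None or t < link.end)
            if not alive:
                continue
            if link.src == node:
                result.add(link.dst)
            elif link.dst == node:
                result.add(link.src)
        return result


# Hypothetical data: vm-a moves from vswitch-1 to vswitch-2 at t = 100.
topo = TemporalTopology()
topo.add_link("vm-a", "vswitch-1", start=0.0, end=100.0)
topo.add_link("vm-a", "vswitch-2", start=100.0)
topo.add_link("vswitch-1", "router-1", start=0.0)

print(topo.neighbors_at("vm-a", 50.0))    # topology before the move
print(topo.neighbors_at("vm-a", 150.0))   # topology after the move
```

Keeping removed links with an end time, rather than deleting them, is what lets an analysis engine reconstruct the topology that existed when a past fault occurred. A linear scan is used here for brevity; a store at the scale the article describes would need indexing.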
2. Unattended Multi‑Tenant Dynamic Change
By orchestrating ultra‑high‑dimensional tasks, leveraging micro‑cluster caching, and applying collaborative multi‑metric evaluation, the system performs zero‑loss, zero‑downtime changes on million‑scale devices, dramatically reducing manual effort.
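One common way to make changes safe without an operator watching is to roll them out in batches and gate each batch on several health metrics at once. The sketch below shows that general pattern under stated assumptions; it is not Qi Tian's orchestration logic. The metric names, the thresholds in `LIMITS`, and the `apply`, `rollback`, and `measure` callbacks are hypothetical.

```python
from typing import Callable, Dict, List

Metrics = Dict[str, float]

# Hypothetical upper limits; a batch must satisfy all of them to pass.
LIMITS: Metrics = {"packet_loss_pct": 0.0, "latency_ms": 5.0}


def healthy(metrics: Metrics, limits: Metrics = LIMITS) -> bool:
    """True only if every limited metric is present and within its limit."""
    return all(metrics.get(name, float("inf")) <= limit
               for name, limit in limits.items())


def staged_change(devices: List[str], batch_size: int,
                  apply: Callable[[str], None],
                  rollback: Callable[[str], None],
                  measure: Callable[[List[str]], Metrics]) -> List[str]:
    """Apply a change batch by batch; roll back and stop at the first bad batch."""
    done: List[str] = []
    for i in range(0, len(devices), batch_size):
        batch = devices[i:i + batch_size]
        for device in batch:
            apply(device)
        if not healthy(measure(batch)):
            for device in batch:
                rollback(device)
            break
        done.extend(batch)
    return done


# Simulated run: any batch containing sw-3 reports packet loss.
state: Dict[str, str] = {}


def apply(device: str) -> None:
    state[device] = "new"


def rollback(device: str) -> None:
    state[device] = "old"


def measure(batch: List[str]) -> Metrics:
    loss = 0.2 if "sw-3" in batch else 0.0
    return {"packet_loss_pct": loss, "latency_ms": 1.0}


changed = staged_change(["sw-1", "sw-2", "sw-3", "sw-4"], 2,
                        apply, rollback, measure)
print(changed)  # ['sw-1', 'sw-2']
```

In this run the first batch passes the gate and stays changed, while the second batch is rolled back and the rollout halts, so a bad change never spreads beyond one batch.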
3. Intent‑Aware Adaptive High‑Precision Monitoring
Using user‑intent‑driven virtual network measurement and machine‑learning prediction, the platform attains packet‑level accuracy, millisecond timing, instance‑level traffic visibility, and user‑level alert precision.
4. Multi‑Plane Anomaly Detection and Full‑Link Self‑Healing
Combining formal verification, visual diagnostics, and a trained anomaly library, the system rapidly classifies and isolates faults across physical, virtual, and tenant planes, employing programmable NIC back‑pressure and software‑controlled traffic scheduling for swift recovery.
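As a rough illustration of how an anomaly library can assign a fault to a plane, the sketch below matches observed symptoms against known signatures using Jaccard similarity. The article does not describe Qi Tian's actual classifier, which is trained; this similarity lookup is a simplified stand-in, and the signature names, symptom labels, and the 0.5 threshold are all hypothetical.

```python
from dataclasses import dataclass
from typing import AbstractSet, FrozenSet, Optional


@dataclass(frozen=True)
class Signature:
    name: str
    plane: str                  # "physical", "virtual", or "tenant"
    symptoms: FrozenSet[str]


# Hypothetical anomaly library with one signature per plane.
LIBRARY = [
    Signature("link-flap", "physical",
              frozenset({"port_down", "crc_errors"})),
    Signature("vswitch-overload", "virtual",
              frozenset({"queue_drop", "high_cpu"})),
    Signature("acl-misconfig", "tenant",
              frozenset({"policy_deny", "conn_timeout"})),
]


def jaccard(a: AbstractSet[str], b: AbstractSet[str]) -> float:
    """Size of the intersection divided by size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def classify(observed: AbstractSet[str],
             min_score: float = 0.5) -> Optional[Signature]:
    """Return the best-matching signature, or None if nothing is close enough."""
    best = max(LIBRARY, key=lambda sig: jaccard(observed, sig.symptoms))
    return best if jaccard(observed, best.symptoms) >= min_score else None


match = classify({"queue_drop", "high_cpu", "conn_timeout"})
print(match.name, match.plane)   # vswitch-overload virtual
print(classify({"fan_fail"}))    # None
```

Returning `None` below the threshold matters as much as the match itself: an unrecognized fault should be escalated rather than forced into the nearest known category and "healed" with the wrong action.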
Conclusion & Outlook
After a decade of development, Qi Tian now powers Alibaba Cloud’s commercial network services for millions of customers and has supported major events such as the 20th Party Congress and the Paris Olympics. The work has produced over 40 patents and 20 high‑impact papers, and Gartner has recognized the platform for its unique network performance visualization. Future work will deepen the “intelligence‑fusion, operation‑as‑one” strategy, integrating AI to achieve autonomous, closed‑loop network management from perception to self‑optimizing policy execution.
Alibaba Cloud Infrastructure
For uninterrupted computing services