How AIOps Can Turn IT Operations into Fully Unmanned Systems
This article explains the challenges of traditional human‑centric IT operations, introduces a quantitative rating method for unmanned operations, describes how AIOps (AI for IT Operations) supplies the decision step between perception and action, and presents the intelligent fault detection and knowledge‑graph techniques that together chart a path toward fully autonomous operations.
Professor Pei Dan from Tsinghua University presents a comprehensive view on "Unmanned Operations based on AIOps," highlighting the growing mismatch between human‑driven decision making and the complexity of modern distributed systems.
Unmanned Operations Rating
The proposed rating uses the metric Cores per Op (CPO): the number of x86 CPU cores managed per Op, where one Op is one 40‑hour week of operations work. A higher CPO indicates a higher level of automation. The example calculations place traditional industries at Level 0 and internet companies at Levels 1–3, and show that CPO supports both horizontal (cross‑company) and vertical (same organization over time) comparisons.
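The CPO arithmetic can be sketched in a few lines of Python. This is a minimal illustration that assumes weekly operations effort is tracked in hours; the function name and all figures below are hypothetical, and the thresholds that map a CPO value to a level are not given in this summary, so they are left out.

```python
def cores_per_op(total_cores: int, weekly_ops_hours: float,
                 hours_per_op: float = 40.0) -> float:
    """Return Cores per Op (CPO).

    One Op is one 40-hour week of operations work, so the number of Ops
    is the weekly operations effort divided by 40.
    """
    if weekly_ops_hours <= 0:
        raise ValueError("weekly_ops_hours must be positive")
    ops = weekly_ops_hours / hours_per_op
    return total_cores / ops


# Hypothetical figures, for illustration only.
# Horizontal comparison: two organizations in the same week.
company_a = cores_per_op(total_cores=20_000, weekly_ops_hours=400)
company_b = cores_per_op(total_cores=20_000, weekly_ops_hours=4_000)

# Vertical comparison: company A a year later, same team, larger fleet.
company_a_next_year = cores_per_op(total_cores=80_000, weekly_ops_hours=400)

print(company_a, company_b, company_a_next_year)
```

Because the effort is normalized to 40‑hour units, part‑time or shared operations staff can be counted as fractions of an Op rather than as whole people.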
Achieving Unmanned Operations with AIOps
AIOps serves as the "brain" that consumes monitoring data (the "eyes") and drives automated actions (the "hands"). It also builds an operations knowledge graph for query and decision support. The architecture separates three modules: perception, decision, and execution, each requiring tailored AI algorithms.
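The three‑module split can be sketched as three small functions chained together. This is only an illustration of the shape of the loop: fixed limits and a lookup table stand in for the AI algorithms the talk calls for, and every metric name, limit, and action below is a hypothetical placeholder.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Observation:
    service: str
    metric: str
    value: float


@dataclass
class Action:
    service: str
    name: str


def perceive(samples: List[Observation],
             limits: Dict[str, float]) -> List[Observation]:
    """Perception ("eyes"): keep observations that exceed their limit."""
    return [o for o in samples if o.value > limits.get(o.metric, float("inf"))]


def decide(anomalies: List[Observation],
           playbook: Dict[str, str]) -> List[Action]:
    """Decision ("brain"): map each anomalous metric to a remediation."""
    actions = []
    for o in anomalies:
        name = playbook.get(o.metric)
        if name is not None:
            actions.append(Action(o.service, name))
    return actions


def execute(actions: List[Action],
            handlers: Dict[str, Callable[[str], str]]) -> List[str]:
    """Execution ("hands"): run the handler registered for each action."""
    return [handlers[a.name](a.service) for a in actions]


limits = {"cpu_util": 0.9, "error_rate": 0.05}
playbook = {"cpu_util": "scale_out", "error_rate": "rollback"}
handlers = {
    "scale_out": lambda service: f"scaled out {service}",
    "rollback": lambda service: f"rolled back {service}",
}
samples = [
    Observation("api", "cpu_util", 0.95),
    Observation("api", "error_rate", 0.01),
    Observation("web", "error_rate", 0.20),
]

print(execute(decide(perceive(samples, limits), playbook), handlers))
# ['scaled out api', 'rolled back web']
```

Keeping the modules behind separate interfaces means each stage can later be replaced by a learned model without touching the other two.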
Intelligent Fault Detection
Professor Pei’s lab began its single‑metric anomaly detection research with supervised methods (IMC 2015), moved to unsupervised approaches (WWW 2018), and then added further enhancements: temporal modeling, GAN‑based noise handling, clustering, transfer learning, and semi‑supervised techniques for short‑lifecycle services. These algorithms have been integrated into a unified "Intelligent Fault Discovery" system that automatically pinpoints fault locations, suggests remediation, and correlates new faults with historical incidents.
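To make "single‑metric anomaly detection" concrete, here is a deliberately simple statistical baseline: a rolling z‑score over one time series. It is not any of the lab's algorithms, which are learned models; the window size, threshold, and sample values are assumptions chosen for illustration.

```python
from statistics import mean, stdev
from typing import List, Sequence


def rolling_zscore_anomalies(series: Sequence[float],
                             window: int = 5,
                             threshold: float = 3.0) -> List[int]:
    """Return indices whose value deviates from the preceding `window`
    points by more than `threshold` sample standard deviations."""
    anomalies = []
    for i in range(window, len(series)):
        history = series[i - window:i]
        mu = mean(history)
        sigma = stdev(history)
        if sigma == 0:
            # Flat history: any change at all counts as a deviation.
            if series[i] != mu:
                anomalies.append(i)
            continue
        if abs(series[i] - mu) / sigma > threshold:
            anomalies.append(i)
    return anomalies


# Hypothetical metric samples with one obvious spike at index 6.
latency_ms = [10, 11, 10, 12, 11, 10, 50, 11, 10, 12]
print(rolling_zscore_anomalies(latency_ms))
# [6]
```

A baseline like this needs a hand‑tuned window and threshold for every metric, which is one reason to prefer learned detectors when the number of metrics is large.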
Operations Knowledge Graph
The knowledge graph automatically extracts entities (servers, containers, services, metrics, etc.), their attributes, and relationships from operational data, providing a centralized, queryable, and continuously updated view that surpasses traditional CMDB and expert‑knowledge approaches. Examples illustrate hardware‑software mappings, dynamic metric profiles, and fault‑propagation graphs that enable precise root‑cause analysis and capacity planning.
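A toy version of such a graph shows how dependency relationships can narrow a root‑cause search. This sketch assumes the graph is already available as a plain dictionary of "depends on" edges; the entity names, the hand‑written edges, and the candidate rule are hypothetical, whereas the system described in the talk extracts entities and relationships automatically from operational data.

```python
from collections import deque
from typing import Dict, List, Set

# Hypothetical entities: each key depends on the entities in its list.
DEPENDS_ON: Dict[str, List[str]] = {
    "checkout-service": ["payment-service", "container-a"],
    "payment-service": ["mysql-primary", "container-b"],
    "container-a": ["server-01"],
    "container-b": ["server-02"],
    "mysql-primary": ["server-02"],
    "server-01": [],
    "server-02": [],
}


def upstream_entities(graph: Dict[str, List[str]], start: str) -> List[str]:
    """Breadth-first walk over dependency edges, excluding `start`."""
    seen = {start}
    order = []
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for dep in graph.get(node, []):
            if dep not in seen:
                seen.add(dep)
                order.append(dep)
                queue.append(dep)
    return order


def root_cause_candidates(graph: Dict[str, List[str]],
                          alerting: str,
                          anomalous: Set[str]) -> List[str]:
    """Anomalous upstream entities with no anomalous dependency of
    their own, i.e. the deepest points of the fault-propagation path."""
    upstream = [e for e in upstream_entities(graph, alerting) if e in anomalous]
    return [e for e in upstream
            if not any(d in anomalous for d in graph.get(e, []))]


anomalous_now = {"payment-service", "mysql-primary", "server-02"}
print(root_cause_candidates(DEPENDS_ON, "checkout-service", anomalous_now))
# ['server-02']
```

Here the alert fires on the checkout service, but the walk attributes it to the shared server because every other anomalous entity depends on something that is itself anomalous.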
Conclusion and Outlook
The speaker emphasizes that while the journey to fully unmanned operations is demanding, the combination of quantitative rating, AIOps architecture, intelligent fault detection, and knowledge‑graph techniques provides a solid foundation. Ongoing research and community collaboration are essential to realize the transformative impact of AIOps across industries.
This article is compiled from Professor Pei Dan’s latest speech at the 10th GOPS Shanghai session.
