
How AIOps Can Turn IT Operations into Fully Unmanned Systems

This article explains the challenges of traditional human‑centric IT operations, introduces a quantitative unmanned‑operations rating method, describes how AIOps (AI for IT Operations) provides the perception‑decision‑action loop, and showcases intelligent fault detection and knowledge‑graph techniques that together enable a path toward fully autonomous operations.

Efficient Ops

Professor Pei Dan from Tsinghua University presents a comprehensive view on "Unmanned Operations based on AIOps," highlighting the growing mismatch between human‑driven decision making and the complexity of modern distributed systems.

Unmanned Operations Rating

The proposed rating uses the metric Cores per Op (CPO): the number of x86 CPU cores managed per Op, where one Op is one 40‑hour week of operations work. A higher CPO indicates a higher level of automation. The example calculations place traditional industries at Level 0 and internet companies at Levels 1–3, and the metric supports both horizontal (cross‑company) and vertical (over time) comparisons.
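As a rough illustration of the arithmetic behind CPO, the sketch below divides a core count by the operations effort expressed in 40-hour weeks. The function name, parameters, and all figures are hypothetical. The CPO thresholds that separate Level 0 through Level 3 are not given in this summary, so the sketch stops at the raw ratio.

```python
def cores_per_op(total_cores: int, ops_hours_per_week: float,
                 hours_per_op: float = 40.0) -> float:
    """Cores per Op: x86 cores managed per 40-hour week of operations work."""
    if ops_hours_per_week <= 0:
        raise ValueError("ops_hours_per_week must be positive")
    ops = ops_hours_per_week / hours_per_op  # effort expressed in Ops
    return total_cores / ops


# Hypothetical figures, for illustration only.
# Horizontal comparison: two different organizations at the same point in time.
cpo_company_a = cores_per_op(total_cores=20_000, ops_hours_per_week=50 * 40)
cpo_company_b = cores_per_op(total_cores=600_000, ops_hours_per_week=100 * 40)

# Vertical comparison: the same organization in two different years.
cpo_by_year = {
    2018: cores_per_op(total_cores=100_000, ops_hours_per_week=80 * 40),
    2019: cores_per_op(total_cores=200_000, ops_hours_per_week=80 * 40),
}

print(cpo_company_a, cpo_company_b, cpo_by_year)
```

Counting effort in hours rather than headcount lets part-time or shared operations staff enter the ratio proportionally, which is one plausible reading of the 40-hour definition.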

Achieving Unmanned Operations with AIOps

AIOps serves as the "brain" that consumes monitoring data (the "eyes") and drives automated actions (the "hands"). It also builds an operations knowledge graph for query and decision support. The architecture separates three modules: perception, decision, and execution, each requiring tailored AI algorithms.
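The perception, decision, and execution split described above can be sketched as three small functions wired into a loop. This is a minimal sketch under stated assumptions, not the architecture presented in the talk: the `Observation` and `Decision` types, the fixed threshold, and the "restart" action are all hypothetical placeholders for what would be learned models and real automation tooling.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class Observation:          # hypothetical monitoring sample (the "eyes")
    service: str
    metric: str
    value: float


@dataclass
class Decision:             # hypothetical action request (for the "hands")
    action: str
    target: str


def perceive(obs: Observation, threshold: float = 0.9) -> bool:
    """Perception: flag an observation as abnormal (fixed threshold as a stand-in)."""
    return obs.value > threshold


def decide(obs: Observation) -> Decision:
    """Decision: map an abnormal observation to an action (placeholder policy)."""
    return Decision(action="restart", target=obs.service)


def run_loop(observations: Iterable[Observation],
             execute: Callable[[Decision], None]) -> List[Decision]:
    """Execution: hand each decision to the supplied executor and record it."""
    taken: List[Decision] = []
    for obs in observations:
        if perceive(obs):
            decision = decide(obs)
            execute(decision)
            taken.append(decision)
    return taken


samples = [
    Observation("checkout", "cpu_utilization", 0.42),
    Observation("payment", "cpu_utilization", 0.97),
]
run_loop(samples, execute=lambda d: print(f"{d.action} -> {d.target}"))
```

Passing the executor in as a callable keeps the "hands" separate from the "brain", so a dry-run executor can be swapped in while the perception and decision logic is still being validated.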

Intelligent Fault Detection

Research from Professor Pei’s lab on single‑metric anomaly detection evolved from supervised methods (IMC 2015) to unsupervised approaches (WWW 2018), followed by further enhancements such as temporal modeling, GAN‑based noise handling, clustering, transfer learning, and semi‑supervised techniques for short‑lifecycle services. These algorithms have been integrated into a unified "Intelligent Fault Discovery" system that automatically pinpoints fault locations, suggests remediation, and correlates new faults with historical incidents.
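To make the idea of unsupervised single-metric anomaly detection concrete, here is a deliberately simple baseline: a trailing-window median with a median-absolute-deviation (MAD) score. It is a stand-in chosen for brevity and is not the method from the cited IMC 2015 or WWW 2018 work. The function name, window size, and threshold are assumptions.

```python
from statistics import median
from typing import List, Sequence


def mad_anomalies(series: Sequence[float], window: int = 20,
                  threshold: float = 3.5) -> List[int]:
    """Return indices whose value deviates strongly from the trailing window."""
    anomalies: List[int] = []
    for i in range(window, len(series)):
        history = series[i - window:i]
        med = median(history)
        mad = median([abs(x - med) for x in history])
        if mad == 0:
            # Flat history: treat any change from the median as anomalous.
            if series[i] != med:
                anomalies.append(i)
            continue
        # 0.6745 rescales MAD so the score is comparable to a z-score
        # under a normal distribution.
        score = 0.6745 * abs(series[i] - med) / mad
        if score > threshold:
            anomalies.append(i)
    return anomalies


# Synthetic metric: a small repeating pattern with one injected spike.
metric = [10.0, 11.0, 9.0, 10.0] * 10
metric[30] = 50.0
print(mad_anomalies(metric))
```

The median and MAD are used instead of mean and standard deviation so that a past spike inside the window does not inflate the baseline and mask later anomalies. No labels are needed, which is the property that distinguishes unsupervised detectors from the earlier supervised line of work.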

Operations Knowledge Graph

The knowledge graph automatically extracts entities (servers, containers, services, metrics, etc.), their attributes, and relationships from operational data, providing a centralized, queryable, and continuously updated view that surpasses traditional CMDB and expert‑knowledge approaches. Examples illustrate hardware‑software mappings, dynamic metric profiles, and fault‑propagation graphs that enable precise root‑cause analysis and capacity planning.
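A small sketch can show how entities and relationships in such a graph might support root-cause queries. It assumes the graph is stored as (source, relation, target) triples and that a separate detector supplies the set of currently anomalous entities. The entity names, relation names, and helper functions are hypothetical and do not describe the system presented in the talk.

```python
from collections import deque
from typing import Dict, Iterable, List, Set, Tuple

Triple = Tuple[str, str, str]

# Hypothetical (source, relation, target) triples extracted from operational data.
TRIPLES: List[Triple] = [
    ("checkout-service", "runs_in", "container-42"),
    ("container-42", "hosted_on", "server-7"),
    ("checkout-service", "calls", "payment-service"),
    ("payment-service", "runs_in", "container-43"),
    ("container-43", "hosted_on", "server-9"),
]


def build_graph(triples: Iterable[Triple]) -> Dict[str, List[Tuple[str, str]]]:
    """Adjacency list: entity -> list of (relation, dependency)."""
    graph: Dict[str, List[Tuple[str, str]]] = {}
    for src, rel, dst in triples:
        graph.setdefault(src, []).append((rel, dst))
        graph.setdefault(dst, [])
    return graph


def dependencies(graph: Dict[str, List[Tuple[str, str]]], start: str) -> List[str]:
    """Breadth-first walk over everything `start` depends on, nearest first."""
    seen: Set[str] = {start}
    order: List[str] = []
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for _rel, nxt in graph[node]:
            if nxt not in seen:
                seen.add(nxt)
                order.append(nxt)
                queue.append(nxt)
    return order


def candidate_root_causes(graph: Dict[str, List[Tuple[str, str]]], symptom: str,
                          anomalous: Set[str]) -> List[str]:
    """Dependencies of the symptomatic entity that are themselves anomalous."""
    return [n for n in dependencies(graph, symptom) if n in anomalous]


graph = build_graph(TRIPLES)
print(candidate_root_causes(graph, "checkout-service", {"container-43", "server-9"}))
```

Restricting the search to entities reachable from the symptom is what a fault-propagation view adds over a flat list of alerts: anomalies on unrelated servers are excluded from the candidate set.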

Conclusion and Outlook

The speaker emphasizes that while the journey to fully unmanned operations is demanding, the combination of quantitative rating, AIOps architecture, intelligent fault detection, and knowledge‑graph techniques provides a solid foundation. Ongoing research and community collaboration are essential to realize the transformative impact of AIOps across industries.

This article is compiled from Professor Pei Dan’s latest speech at the 10th GOPS Shanghai session.
Written by

Efficient Ops

This public account is maintained by Xiaotianguo and friends, regularly publishing widely-read original technical articles. We focus on operations transformation and accompany you throughout your operations career, growing together happily.
