Intelligent Operations: Machine‑Learning‑Based AIOps – Lecture Summary by Prof. Pei Dan
In this lecture, Prof. Pei Dan of Tsinghua University outlines the evolution of intelligent operations from rule‑based automation to machine‑learning‑driven AIOps, discusses data, feedback loops, and practical challenges, and calls for stronger collaboration between industry and academia to accelerate research and deployment.
Speaker Introduction – Pei Dan, an Associate Professor in Tsinghua University’s Computer Science department, gave a talk titled “Intelligent Operations Based on Machine Learning,” covering current challenges in AI‑driven operations and ideas for solving them.
Background – Operations (运维) is shifting from rule‑based automation to machine‑learning‑based approaches. The speaker highlighted collaborations with Baidu’s operations and search departments and emphasized the need for academic‑industry synergy.
Personal Experience – The speaker recounted his PhD, an internship at AT&T Research (which carries the Bell Labs heritage), six years of research, and 23 patents, along with extensive work applying big‑data analytics to large‑scale operations, covering network performance, IPTV, video, and more.
Why Operations Can Be “High‑End” – Pointing to top conferences such as SIGCOMM, he noted that roughly 40% of papers relate to operations, which shows the field’s academic relevance.
Lab Overview – NetMan – The NetMan lab focuses on Network Performance Management (NPM) and Application Performance Management (APM), collaborating with internet companies on automation, cloud‑based operation platforms, and big‑data analysis tools.
From Rules to Learning – Historically, operations relied on expert‑crafted rule sets for event correlation. This worked in relatively simple backbone networks but failed at scale in modern micro‑service environments (e.g., Baidu’s >100 product lines, thousands of services).
To overcome this, the team adopted machine‑learning techniques to automatically discover rules from massive logs and ticketing data, creating a closed feedback loop.
Key Requirements for Successful ML‑Driven Operations
Data – Massive logs from internet applications provide rich features.
Process Feedback – Incident tickets and operator annotations serve as labeled data.
Application – Operators become end‑users of the intelligent system, enabling a full modeling‑measurement‑decision‑control cycle.
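The three requirements above can be illustrated with a small sketch. This is not the speaker's system; it is a minimal example that assumes a single hypothetical log-derived feature (`error_rate`) and a boolean label taken from tickets or operator annotations. Instead of an expert hard-coding an alert threshold, the threshold is learned from labeled history and re-learned when an operator's annotation arrives. All names here (`Sample`, `learn_threshold`, `is_alert`) are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    error_rate: float   # hypothetical feature extracted from logs
    is_incident: bool   # label taken from a ticket or operator annotation


def learn_threshold(samples):
    """Pick the error-rate cutoff that misclassifies the fewest labeled samples."""
    values = sorted({s.error_rate for s in samples})
    # Candidate cutoffs: midpoints between neighboring observed values.
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])]
    if not candidates:
        candidates = values

    def errors(cutoff):
        # A sample is misclassified when the rule's verdict differs from its label.
        return sum((s.error_rate > cutoff) != s.is_incident for s in samples)

    return min(candidates, key=errors)


def is_alert(error_rate, cutoff):
    """The learned rule: alert when the error rate exceeds the cutoff."""
    return error_rate > cutoff


# Data + process feedback: log-derived features labeled by past tickets.
history = [
    Sample(0.01, False), Sample(0.02, False), Sample(0.03, False),
    Sample(0.08, True), Sample(0.10, True),
]
cutoff = learn_threshold(history)

# Application: an operator reviews an alert at error_rate 0.07 and marks it
# a false alarm. The annotation becomes a new labeled sample, and the rule
# is re-learned, which closes the feedback loop.
history.append(Sample(0.07, False))
updated_cutoff = learn_threshold(history)
```

With this toy data, the first learned cutoff falls between 0.03 and 0.08, so a reading of 0.07 raises an alert. After the operator's false-alarm annotation, the cutoff moves above 0.07 and the same reading no longer alerts. A production system would use many features and a richer model, but the loop of data, labels, and operator feedback has the same shape.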
With abundant data, mature algorithms, and open‑source systems, the speaker predicts rapid growth of intelligent operations in the coming years.
Call to Action – The speaker urged tighter collaboration: industry supplies real problems and data, while academia contributes time, algorithms, and students to co‑create solutions.
Additional resources were listed for readers interested in the latest research, including top‑tier conferences such as SIGCOMM and papers published by Google and Facebook.
Architects' Tech Alliance
Sharing project experiences, insights into cutting-edge architectures, focusing on cloud computing, microservices, big data, hyper-convergence, storage, data protection, artificial intelligence, industry practices and solutions.
