Intelligent Operations: Machine‑Learning‑Based AIOps – Lecture Summary by Prof. Pei Dan

In this lecture, Prof. Pei Dan of Tsinghua University outlines the evolution of intelligent operations from rule‑based automation to machine‑learning‑driven AIOps, discusses data, feedback loops, and practical challenges, and calls for stronger collaboration between industry and academia to accelerate research and deployment.

Architects' Tech Alliance

Speaker Introduction – Associate Professor Pei Dan from Tsinghua University’s Computer Science department presented a talk titled “Intelligent Operations Based on Machine Learning,” sharing current challenges and solution ideas for AI‑driven operations.

Background – Operations (运维) is shifting from rule‑based automation to machine‑learning‑based approaches. The speaker highlighted collaborations with Baidu’s operations and search departments and emphasized the need for academic‑industry synergy.

Personal Experience – The speaker described his PhD, internship at AT&T Research (Bell Labs heritage), six years of research, 23 patents, and extensive work in large‑scale operations using big‑data analytics, covering network performance, IPTV, video, and more.

Why Operations Can Be “High‑End” – Citing top conferences like SIGCOMM, he noted that a significant portion of papers (≈40%) relate to operations, indicating the field’s academic relevance.

Lab Overview – NetMan – The NetMan lab focuses on Network Performance Management (NPM) and Application Performance Management (APM), collaborating with internet companies on automation, cloud‑based operation platforms, and big‑data analysis tools.

From Rules to Learning – Historically, operations relied on expert‑crafted rule sets for event correlation. This worked in relatively simple backbone networks but failed at scale in modern micro‑service environments (e.g., Baidu’s >100 product lines, thousands of services).
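To make the rule-based approach concrete, here is a minimal sketch of expert-crafted event correlation. The event kinds, the rule table, and the `correlate` function are hypothetical illustrations and are not taken from the lecture.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    timestamp: float  # seconds since some reference point
    source: str       # device or service that emitted the event
    kind: str         # event type, e.g. "link_down"


# Hypothetical expert rules: (cause kind, symptom kind, max delay in seconds).
RULES = [
    ("link_down", "bgp_session_flap", 30.0),
    ("link_down", "packet_loss", 60.0),
]


def correlate(events, rules=RULES):
    """Return (cause, symptom) pairs that satisfy any hand-written rule."""
    ordered = sorted(events, key=lambda e: e.timestamp)
    matches = []
    for i, cause in enumerate(ordered):
        for symptom in ordered[i + 1:]:
            delay = symptom.timestamp - cause.timestamp
            for cause_kind, symptom_kind, max_delay in rules:
                if (cause.kind == cause_kind
                        and symptom.kind == symptom_kind
                        and delay <= max_delay):
                    matches.append((cause, symptom))
    return matches


events = [
    Event(0.0, "router-1", "link_down"),
    Event(10.0, "router-1", "bgp_session_flap"),
    Event(100.0, "router-2", "packet_loss"),  # too late to match any rule
]
matches = correlate(events)
```

Every new service or failure mode needs another hand-written entry in `RULES`, which hints at why this style becomes hard to maintain across thousands of services.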

To overcome this, the team adopted machine‑learning techniques to automatically discover rules from massive logs and ticketing data, creating a closed feedback loop.
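As a rough illustration of discovering rules from data instead of writing them by hand, the sketch below uses a simple support/confidence count. This is a stand-in technique chosen for brevity, not the method the speaker's team used. The data layout (time windows of log event kinds paired with a flag for whether an incident ticket was filed), the thresholds, and the `mine_rules` name are all assumptions.

```python
from collections import Counter


def mine_rules(windows, min_support=2, min_confidence=0.8):
    """Find log event kinds that reliably co-occur with incident tickets.

    windows: list of (set of event kinds seen in a time window,
                      True if a ticket was filed for that window).
    Returns [(kind, confidence)] sorted by descending confidence.
    """
    seen = Counter()              # windows in which each kind appeared
    seen_with_ticket = Counter()  # of those, windows that had a ticket
    for kinds, had_ticket in windows:
        for kind in kinds:
            seen[kind] += 1
            if had_ticket:
                seen_with_ticket[kind] += 1

    rules = []
    for kind, count in seen.items():
        confidence = seen_with_ticket[kind] / count
        if count >= min_support and confidence >= min_confidence:
            rules.append((kind, confidence))
    return sorted(rules, key=lambda r: (-r[1], r[0]))


# Hypothetical history: tickets act as labels for the log windows.
history = [
    ({"disk_full", "cron_ok"}, True),
    ({"disk_full"}, True),
    ({"cron_ok"}, False),
    ({"cron_ok"}, False),
    ({"oom"}, True),  # seen only once, below min_support
]
learned = mine_rules(history)
```

As more tickets arrive, rerunning the miner on the extended history refreshes the rule set, which is one simple way to picture the closed loop described above.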

Key Requirements for Successful ML‑Driven Operations

Data – Massive logs from internet applications provide rich features.

Process Feedback – Incident tickets and operator annotations serve as labeled data.

Application – Operators become end‑users of the intelligent system, enabling a full modeling‑measurement‑decision‑control cycle.
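The three requirements can be pictured as one loop: measurements come from the data, the system makes a decision, and operator feedback adjusts the model. The sketch below is a deliberately simple assumption-laden example (a single threshold nudged by operator labels). The `FeedbackDetector` class, its parameters, and the update rule are hypothetical and not drawn from the lecture.

```python
class FeedbackDetector:
    """Threshold alerting that is tuned by operator feedback."""

    def __init__(self, threshold=100.0, step=5.0):
        self.threshold = threshold  # current model: a single cutoff
        self.step = step            # how far one label moves the cutoff

    def decide(self, value):
        """Decision: alert when a measurement exceeds the threshold."""
        return value > self.threshold

    def feedback(self, value, was_real_incident):
        """Control: adjust the threshold using the operator's label."""
        alerted = self.decide(value)
        if alerted and not was_real_incident:
            self.threshold += self.step  # false alarm: be less sensitive
        elif not alerted and was_real_incident:
            self.threshold -= self.step  # missed incident: be more sensitive


detector = FeedbackDetector(threshold=100.0, step=5.0)
first_alert = detector.decide(103.0)             # measurement triggers an alert
detector.feedback(103.0, was_real_incident=False)  # operator marks it as noise
second_alert = detector.decide(103.0)            # same value no longer alerts
```

A production system would replace the single threshold with a learned model, but the shape of the loop (measure, decide, collect labels, update) stays the same.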

With abundant data, mature algorithms, and open‑source systems, the speaker predicts rapid growth of intelligent operations in the coming years.

Call to Action – The speaker urged tighter collaboration between the two sides: industry supplies real problems and data, while academia contributes time, algorithms, and students to co‑create solutions.

Additional resources were listed for readers interested in the latest research, including top‑tier conferences such as SIGCOMM and papers published by Google and Facebook.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.
