From Lean to AIOps: How AI is Transforming Modern Operations

This comprehensive guide walks through the evolution from Lean and Agile practices to DevOps and finally AIOps, explaining core concepts, key algorithms, the role of large language models, RAG‑based root‑cause analysis, and practical implementation steps for intelligent operations.

Ops Development Stories

Hello, I am Joker, an ops engineer and cloud‑native enthusiast.

Before diving into AIOps practice, let’s briefly introduce the related theory, covering the following aspects:

From Lean, Agile, DevOps to AIOps

What is AIOps

Large models and AIOps

AIOps algorithms and application cases

From Lean, Agile, DevOps to AIOps

Whether in product design, development, testing, or operations, all stages revolve around the application lifecycle. Concepts such as Lean, Agile and DevOps provide guiding principles.

What is Lean

Lean originated in manufacturing at Toyota, aiming to eliminate waste and optimize processes to maximize value delivery. Key elements include Value, Value Stream, Flow, Pull, Eliminate Waste, and Respect for People.

What is Agile Development

Waterfall Development

Before Agile became widespread, the Waterfall model required strictly sequential phases, which made it slow to respond to rapidly changing requirements.

Agile Development

Guided by Lean, Agile adopts a small‑step, fast‑feedback approach, delivering early value, embracing change, and iterating continuously.

However, Agile often overloads the operations side: ops teams work excessive overtime, stability declines, and “department walls” form between development and operations.

DevOps

DevOps combines Development and Operations into a unified workflow with shared responsibility for product quality. It is a methodology that uses automation tools to better support developers and improve overall software delivery efficiency.

AIOps

AIOps builds on DevOps by adding AI and machine‑learning techniques to achieve full‑process automation and high‑efficiency operations. It inherits DevOps’ principles, breaks departmental silos, and enhances observability, root‑cause analysis, and predictive problem solving.

The advantages of AIOps include continuous monitoring of data streams, breaking down operational data silos (the ops “island” problem), predicting issues so they can be resolved early, and fast root‑cause analysis using machine learning.

Implementing AIOps requires abstracting underlying data for efficient management and automation, including data collection, storage, deep‑learning modeling, and declarative interfaces.

Large Models and AIOps

Advances in large models like GPT simplify data analysis and modeling, lowering technical barriers. They enable natural‑language‑driven interactions, where user queries are transformed into function calls for metric retrieval.
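
To make the function‑call idea concrete, here is a minimal sketch. It assumes the model has already turned a question such as "What is the CPU usage of the order service?" into a JSON function call. The function name get_metric, its parameters, and the in‑memory FAKE_METRICS store are hypothetical stand‑ins for a real model and monitoring backend.

```python
import json

# Hypothetical in-memory metric store standing in for a real monitoring backend.
FAKE_METRICS = {
    ("order-service", "cpu_usage"): [41.0, 43.5, 88.2],
    ("order-service", "memory_usage"): [62.1, 62.4, 63.0],
}


def get_metric(service: str, metric: str) -> dict:
    """Return the latest and peak value of a metric (illustrative only)."""
    series = FAKE_METRICS.get((service, metric))
    if series is None:
        raise KeyError(f"no data for {service}/{metric}")
    return {"service": service, "metric": metric,
            "latest": series[-1], "max": max(series)}


# Registry of the functions the model is allowed to call.
TOOLS = {"get_metric": get_metric}


def dispatch(function_call_json: str) -> dict:
    """Execute a function call that the model emitted as JSON."""
    call = json.loads(function_call_json)
    name = call["name"]
    if name not in TOOLS:
        raise ValueError(f"unknown function: {name}")
    return TOOLS[name](**call["arguments"])


# Assumed model output for the question above.
model_output = json.dumps({
    "name": "get_metric",
    "arguments": {"service": "order-service", "metric": "cpu_usage"},
})
result = dispatch(model_output)
print(result)
```

Keeping an explicit registry of allowed functions means a malformed or unexpected call from the model fails with an error instead of running arbitrary code.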

Effective use of large models in AIOps relies on five layers: Model, Prompt templates, Chain calls, Agent, and Multi‑Agent collaboration.
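
The first four layers can be illustrated with a small sketch. Everything here is an assumption made for illustration: fake_model is a rule‑based stub rather than a real LLM, and the names TRIAGE_PROMPT, triage_chain, and triage_agent are hypothetical. Multi‑Agent collaboration would compose several such agents and is left out for brevity.

```python
import re
from string import Template
from typing import Callable


# Layer 1, Model: a stub standing in for a real LLM call (assumption).
def fake_model(prompt: str) -> str:
    match = re.search(r"error rate (\d+(?:\.\d+)?)%", prompt)
    rate = float(match.group(1)) if match else 0.0
    return "ANOMALY" if rate > 5.0 else "NORMAL"


# Layer 2, Prompt template: a reusable prompt with named slots.
TRIAGE_PROMPT = Template(
    "Service $service reports error rate $error_rate%. "
    "Answer ANOMALY or NORMAL."
)


# Layer 3, Chain: a fixed sequence of template -> model -> post-processing.
def triage_chain(model: Callable[[str], str], service: str, error_rate: float) -> bool:
    prompt = TRIAGE_PROMPT.substitute(service=service, error_rate=error_rate)
    return model(prompt).strip().upper() == "ANOMALY"


# Layer 4, Agent: chooses an action based on what the chain returned.
def triage_agent(model: Callable[[str], str], service: str, error_rate: float) -> str:
    if triage_chain(model, service, error_rate):
        return f"open incident for {service}"
    return "no action"


print(triage_agent(fake_model, "checkout", 12))
print(triage_agent(fake_model, "checkout", 1))
```

Passing the model in as a parameter keeps the chain and agent testable, since a stub can be swapped for a real client without changing the other layers.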

AIOps Algorithms and Use Cases

Common algorithms include statistical analysis (ANOVA, t‑test), time‑series forecasting (SARIMA, LSTM, Prophet), machine‑learning classification/regression (Decision Tree, SVM), and anomaly detection (Isolation Forest, DBSCAN). These enable comprehensive monitoring, fault prediction, and log clustering.
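
As a small illustration of anomaly detection on a metric, the sketch below flags outliers with a modified z‑score based on the median and the median absolute deviation (MAD). This is a much simpler statistical stand‑in for the Isolation Forest or DBSCAN mentioned above, not an implementation of either. The sample latencies, the function name mad_anomalies, and the 3.5 threshold are assumptions chosen for the example.

```python
from statistics import median


def mad_anomalies(values, threshold=3.5):
    """Return indices whose modified z-score exceeds the threshold."""
    if not values:
        return []
    center = median(values)
    mad = median([abs(v - center) for v in values])
    if mad == 0:
        # No spread to measure against; report nothing (a simplification).
        return []
    # 0.6745 rescales the MAD so the score is comparable to a standard z-score.
    return [i for i, v in enumerate(values)
            if 0.6745 * abs(v - center) / mad > threshold]


# Hypothetical request latencies in milliseconds with one obvious spike.
latencies_ms = [120, 118, 125, 122, 119, 121, 480, 123, 120, 117]
print(mad_anomalies(latencies_ms))  # -> [6]
```

The median and MAD are used instead of the mean and standard deviation because a single large spike in a short window inflates the standard deviation enough to hide itself.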

RAG‑Based Root‑Cause Analysis

RAG (Retrieval‑Augmented Generation) combines knowledge retrieval with LLM generation to quickly locate root causes. The workflow consists of a data preparation phase (collecting, cleaning, chunking, embedding, storing in vector databases such as FAISS, Milvus, Weaviate, Elasticsearch) and a query phase (embedding user query, retrieving similar cases, and generating structured RCA reports).
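
The two phases can be sketched end to end in a few lines. This is a toy under stated assumptions: embed is a bag‑of‑words counter standing in for a real embedding model, INDEX is a plain list standing in for a vector database, the incident texts are invented, chunking is skipped because each incident is short, and the final LLM call is omitted so the sketch stops at the assembled prompt.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding' (stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


# Data preparation phase: hypothetical past incidents, embedded and stored.
INCIDENTS = [
    "Database primary node down, connection timeout and replica lag observed",
    "Disk full on log volume, writes failing on the ingest service",
    "Expired TLS certificate caused handshake failures at the gateway",
]
INDEX = [(doc, embed(doc)) for doc in INCIDENTS]


def retrieve(query: str, k: int = 2) -> list:
    """Query phase, step 1: embed the query and return the k most similar cases."""
    q = embed(query)
    ranked = sorted(INDEX, key=lambda item: cosine(q, item[1]), reverse=True)
    return [doc for doc, _ in ranked[:k]]


def build_prompt(query: str, cases: list) -> str:
    """Query phase, step 2: assemble the prompt that would be sent to the LLM."""
    context = "\n".join(f"- {c}" for c in cases)
    return (f"Similar past incidents:\n{context}\n\n"
            f"Current symptoms: {query}\n"
            "List likely root causes as JSON.")


query = "db ping failed with connection timeout"
cases = retrieve(query, k=1)
prompt = build_prompt(query, cases)
print(prompt)
```

In a real deployment the toy pieces would be replaced by an embedding model and a vector store such as those listed in the article, while the retrieve‑then‑prompt shape stays the same.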

{
  "root_causes": [
    {
      "reason": "Database primary node down",
      "evidence": ["connection timeout", "db ping failed", "replica lag"],
      "probability": 0.92
    }
  ],
  "suggested_actions": ["Check primary DB status", "Switch to replica", "Restart service"]
}
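
Because a report like the one above is generated by a model, it is worth validating before anything acts on it. The sketch below parses the same JSON and returns the most probable root cause. The function name top_root_cause and the 0.5 minimum probability are assumptions for illustration, not part of the original workflow.

```python
import json

# The example report from the article, embedded as a string.
REPORT = """
{
  "root_causes": [
    {
      "reason": "Database primary node down",
      "evidence": ["connection timeout", "db ping failed", "replica lag"],
      "probability": 0.92
    }
  ],
  "suggested_actions": ["Check primary DB status", "Switch to replica", "Restart service"]
}
"""


def top_root_cause(report_json: str, min_probability: float = 0.5):
    """Validate the report and return the most probable root cause, or None."""
    report = json.loads(report_json)
    causes = report.get("root_causes", [])
    for cause in causes:
        p = cause.get("probability")
        if not isinstance(p, (int, float)) or not 0.0 <= p <= 1.0:
            raise ValueError(f"invalid probability: {p!r}")
    confident = [c for c in causes if c["probability"] >= min_probability]
    return max(confident, key=lambda c: c["probability"], default=None)


top = top_root_cause(REPORT)
print(top["reason"], top["probability"])
```

Returning None when no cause clears the threshold lets the caller escalate to a human instead of acting on a low‑confidence guess.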

Key Technical Components

Embedding models: BERT, RoBERTa, Sentence‑BERT, SimCSE

Vector databases: FAISS, Milvus, Weaviate, Elasticsearch

Large language models: Qwen (Tongyi Qianwen), ChatGLM, Llama‑3, Baichuan

Integration frameworks: LangChain, Haystack, FastRAG

Summary

The article outlines the progression from Lean, Agile, and DevOps to AIOps, explains each methodology’s principles, describes how AI enhances operations, presents common AIOps algorithms and case studies, and details a RAG‑based root‑cause analysis workflow with its supporting technologies.

Written by

Ops Development Stories

Maintained by a like‑minded team, covering both operations and development. Topics span Linux ops, DevOps toolchain, Kubernetes containerization, monitoring, log collection, network security, and Python or Go development. Team members: Qiao Ke, wanger, Dong Ge, Su Xin, Hua Zai, Zheng Ge, Teacher Xia.
