Tagged articles
307 articles
Page 1 of 4
Full-Stack DevOps & Kubernetes
Apr 22, 2026 · Operations

Avoid 90% of Kubernetes Ops Pitfalls: A Definitive Guide

This guide outlines the five most common Kubernetes operational pitfalls, offers step‑by‑step remediation practices, introduces three emerging trends (AI‑assisted troubleshooting, serverless clusters, and Tekton CI/CD), and provides three ready‑to‑copy kubectl commands to streamline daily management.

DevOps · Kubernetes · Operations
0 likes · 9 min read
Shuge Unlimited
Mar 17, 2026 · Operations

Exploring OpenClaw for K8s AIOps: Four Practical Scenarios from Concept to Deployment

This article analyzes how OpenClaw’s Skills, Subagent, and Cron capabilities can be leveraged to build Kubernetes AIOps solutions, presenting four detailed scenarios—fault diagnosis, resource optimization, security audit, and continuous health checks—while evaluating technical feasibility, security, reliability, cost, and a phased rollout plan.

Cloud Native · Kubernetes · OpenClaw
0 likes · 19 min read
Shuge Unlimited
Mar 15, 2026 · Operations

How OpenClaw Fixed a Self‑Upgraded, Unresponsive Instance in Just 3 Minutes

In a real‑world AIOps demo, the OpenClaw AI agent remotely diagnosed a failed upgrade, pinpointed its out‑of‑memory (OOM) cause, rolled back to a stable version, and restored service within three minutes. The article uses the case to illustrate the agent’s three core capabilities, its cost advantages, a feasibility analysis, and practical rollout guidance.

AI Agent · Auto‑Remediation · OpenClaw
0 likes · 13 min read
Efficient Ops
Feb 1, 2026 · Operations

How AI Agents Are Revolutionizing AIOps and Boosting Operational Efficiency

This article explains what AI agents are, outlines single‑agent and multi‑agent use cases in AIOps such as knowledge retrieval, tool guidance, fault diagnosis, and process automation, and lists the key technical skills needed to build and manage these intelligent operational assistants.

AI · Agent · Automation
0 likes · 8 min read
Raymond Ops
Jan 28, 2026 · Artificial Intelligence

From Alert Storms to Smart Ops: Unlocking AIOps for Modern IT Operations

This guide walks through the evolution from noisy alert storms to intelligent AIOps, covering AIOps fundamentals, why it matters now, core capabilities like anomaly detection, root‑cause analysis, capacity forecasting and self‑healing, a practical implementation roadmap, toolchain suggestions, common pitfalls, and future trends.

Capacity Prediction · Root Cause Analysis · aiops
0 likes · 22 min read
Alibaba Cloud Developer
Jan 12, 2026 · Operations

Why Traditional Monitoring Fails and How UModel Redefines Observability for AI‑Powered Ops

The article explains how legacy monitoring based on isolated metrics, traces, and logs cannot keep up with the massive, fragmented, and dynamic data of modern IT systems, and introduces UModel—a graph‑based observability model that bridges data, model, and engineering gaps to enable AI‑driven operations.

Graph Modeling · Observability · Operations
0 likes · 11 min read
Baidu Tech Salon
Jan 8, 2026 · Artificial Intelligence

How Baidu’s AI‑Powered Architecture Transforms Network Operations

This article systematically presents Baidu Intelligent Cloud’s three‑layer AI architecture for network intelligent operations, explains the AI base, core, and business layers, showcases the NetStudio digital engineer platform, and details real‑world use cases, performance gains, and a roadmap toward fully autonomous network management.

AI · Intelligent Automation · agentic workflows
0 likes · 26 min read
Alibaba Cloud Native
Jan 3, 2026 · Operations

Turning Chaotic Observability Data into Actionable Graphs with UModel

This article examines the evolution of IT observability, explains why traditional metrics, traces, and logs fall short for AI‑driven operations, and introduces UModel—a graph‑based universal observability model that structures fragmented data into a semantic runtime context for autonomous AIOps agents.

Cloud Native · Graph Modeling · Observability
0 likes · 12 min read
Ray's Galactic Tech
Dec 2, 2025 · Operations

Build an End‑to‑End AIOps Solution: Log Alerts and Automated Self‑Healing Ops

This guide walks through designing and implementing an intelligent operations workflow that transforms passive log monitoring into proactive alerting and automated remediation, covering core concepts, tech‑stack selection, step‑by‑step configuration of log collection, alert rules, webhook integration, Ansible automation, and best‑practice considerations for scaling and security.

Alerting · Ansible · Grafana
0 likes · 7 min read
Huya Tech Engineering
Nov 28, 2025 · Operations

How LLMs Accelerate Root‑Cause Diagnosis in Large‑Scale Microservices

The article models a massive microservice system as a dynamic multi‑layer graph, integrates large language models, and outlines three evolution stages: from manual expert debugging, to rule‑based AIOps, and finally to LLM‑driven cognitive reasoning. It details practical workflows, context engineering, and real‑world case studies that dramatically improve MTTR and accuracy.

Context Engineering · LLM · Microservices
0 likes · 20 min read
Alibaba Cloud Observability
Nov 10, 2025 · Cloud Native

How a Next‑Gen Cloud‑Native Observability Platform Boosted Ticketing Stability by 80%

A leading digital‑entertainment group tackled severe stability and monitoring challenges in its high‑traffic ticketing system by building a cloud‑native, full‑link observability platform on Alibaba Cloud, achieving an 80% improvement in fault detection speed, a 40% reduction in operational costs, and establishing data‑driven operations as the digital foundation for product growth.

Observability · Operations · aiops
0 likes · 15 min read
Efficient Ops
Oct 27, 2025 · Operations

How AI is Revolutionizing Observability and Intelligent Operations

At the GOPS Global Operations Conference in Shanghai, experts from finance, technology and energy sectors examined the challenges of observability, AIOps and intelligent agents, proposing metric standardization, digital‑twin fault simulation, and AI‑driven DevOps as key steps toward scalable, business‑value‑focused intelligent operations.

AI Ops · Digital Twin · Intelligent Operations
0 likes · 6 min read
Ops Community
Oct 27, 2025 · Operations

From Midnight Alerts to Peaceful Sleep: Building a Zabbix Monitoring System

After a costly midnight outage, the author shares how he designed a three‑layer Zabbix monitoring architecture—covering infrastructure, service, and business metrics—optimizing alert thresholds, automating discovery, and integrating with ITSM, ultimately reducing MTTR to minutes and enabling teams to sleep peacefully.

Alerting · Automation · ITSM
0 likes · 15 min read
Ops Community
Sep 24, 2025 · Operations

How Ops Engineers Can Stop Online Outages in Minutes: A Proven Emergency Playbook

This article explains why a solid incident‑response plan is critical, describes typical failure scenarios, and introduces the 3‑5‑10 rule for rapid diagnosis and mitigation. It provides ready‑to‑run scripts for system checks, traffic throttling, and service rollback, and shows how automation, AIOps, and chaos engineering can turn reactive firefighting into proactive resilience.

aiops · emergency plan · incident response
0 likes · 18 min read
Wukong Talks Architecture
Sep 22, 2025 · Databases

How AI‑Powered AIOps Transforms TiDB Database Operations

This article explores how integrating AI‑driven AIOps with the TiDB distributed database can automate monitoring, enable proactive anomaly detection, streamline root‑cause analysis, and optimize capacity planning, ultimately shifting database operations from manual firefighting to intelligent, data‑driven management.

Database operations · Root Cause Analysis · TiDB
0 likes · 12 min read
MaGe Linux Operations
Sep 12, 2025 · Operations

From Alert Storms to Intelligent Ops: A Practical AIOps Journey

This article explores how AIOps transforms traditional IT operations by using AI for anomaly detection, root‑cause analysis, capacity forecasting, and self‑healing, offering a step‑by‑step roadmap, real‑world code examples, toolchain recommendations, common pitfalls, and future trends for building intelligent, automated operations.

Root Cause Analysis · aiops · anomaly detection
0 likes · 24 min read
Efficient Ops
Aug 25, 2025 · Operations

How SOMM Is Revolutionizing Intelligent Ops with AIOps, SRE & FinOps

The China Academy of Information and Communications Technology (CAICT) introduced the SOMM (System Operation Maturity Model) framework, which emphasizes tool intelligence, refined management, and robust operation. The article details its AIOps, SRE, and FinOps assessment modules, evaluation criteria, and maturity levels, and showcases leading enterprises that have achieved top‑tier certifications.

FinOps · Maturity Model · SRE
0 likes · 8 min read
Alibaba Cloud Big Data AI Platform
Aug 5, 2025 · Operations

Inside Alibaba’s Tesla: Data‑Driven Ops for 100k+ Big Data Nodes

The article details how Alibaba’s Tesla SRE platform supports the massive offline and real‑time big‑data ecosystems through a layered, data‑driven operations framework—DataOps—integrating unified portals, configuration, job, workflow, and analytics platforms, enabling automated monitoring, intelligent decision‑making, and self‑healing capabilities across 100,000+ nodes.

Big Data · DataOps · Operations
0 likes · 20 min read
Alibaba Cloud Big Data AI Platform
Aug 5, 2025 · Operations

How Alibaba’s Open‑Source SREWorks Transforms Cloud‑Native Data Operations

Alibaba's SREWorks platform, now open‑source, combines cloud‑native architecture, DataOps and AIOps to address the growing complexity of big‑data and AI operations, offering a layered SaaS/PaaS/IaaS solution that streamlines delivery, monitoring, management, control, operation, and service for modern enterprises.

Cloud Native · DataOps · Operations
0 likes · 10 min read
Alibaba Cloud Big Data AI Platform
Aug 5, 2025 · Operations

How Alibaba Automates Hardware Fault Detection and Self‑Healing at Scale

This article explains how Alibaba’s massive MaxCompute platform tackles the growing challenge of hardware failures through predictive fault detection, automated removal of faulty servers from service, self‑healing workflows, and cluster rebalancing, closing the fault loop before the business is affected. It also details the underlying architecture and operational principles.

Alibaba Cloud · Operations Automation · aiops
0 likes · 14 min read
Alibaba Cloud Big Data AI Platform
Aug 4, 2025 · Operations

From Scripts to AIOps: How Alibaba’s Ops Evolved and What Skills You Need Today

Tracing Alibaba’s journey from manual, script‑based operations through tool‑centric and platform‑driven DevOps to the data‑focused DataOps era and emerging AIOps, the article outlines the shifting responsibilities, architectural challenges, and the multidisciplinary skill set required for modern operations engineers.

DataOps · Operations · Skill development
0 likes · 8 min read
Ops Development Stories
Jul 14, 2025 · Artificial Intelligence

Mastering AIOps: Prompt Engineering, Function Calling, RAG, Graph RAG, and Local LLM Deployment

This comprehensive guide explores AIOps techniques such as prompt engineering, chat completions, memory management, function calling, fine‑tuning, retrieval‑augmented generation (RAG), graph‑based RAG, and practical steps for deploying open‑source large language models locally, providing code examples and best‑practice recommendations for modern DevOps environments.

Function Calling · Graph RAG · RAG
0 likes · 47 min read
Efficient Ops
Jul 2, 2025 · Cloud Computing

How ICBC’s AI‑Native Data Center Is Redefining Cloud Computing for Finance

Amid the AI‑driven wave of large‑model technologies, Industrial and Commercial Bank of China’s data center has transformed its traditional infrastructure into an AI‑native computing hub, boosting operational efficiency, green sustainability, and autonomous control while supporting the financial sector’s shift toward intelligent, cognitive services.

AI-native · aiops · cloud computing
0 likes · 13 min read
Ops Development Stories
Jul 1, 2025 · Artificial Intelligence

From Lean to AIOps: How AI is Transforming Modern Operations

This comprehensive guide walks through the evolution from Lean and Agile practices to DevOps and finally AIOps, explaining core concepts, key algorithms, the role of large language models, RAG‑based root‑cause analysis, and practical implementation steps for intelligent operations.

Lean · RAG · Root Cause Analysis
0 likes · 19 min read
Efficient Ops
May 26, 2025 · Artificial Intelligence

How AI Agents Are Revolutionizing AIOps: Boosting Automation and Efficiency

This article explains how AI agents enhance large‑model capabilities for AIOps, detailing single‑agent use cases like knowledge retrieval, tool guidance, and fault diagnosis, as well as multi‑agent collaborations, required skills, and future prospects for autonomous operations.

AI · Agent · Automation
0 likes · 7 min read
dbaplus Community
Apr 24, 2025 · Operations

How Ctrip Built a Scalable Observability Platform and AIOps Engine for Millions of Metrics and Logs

This article details Ctrip's end‑to‑end observability platform—covering metrics, logging, and tracing—its architecture, data governance, AIOps capabilities, and practical case studies, while addressing challenges like data volume, alert noise, and metric explosion in a massive micro‑service environment.

Ctrip · aiops · cloud‑native
0 likes · 17 min read
Continuous Delivery 2.0
Mar 14, 2025 · Operations

The Birth of DevOps: Breaking the Collaboration Wall

This article traces the evolution of DevOps from its 2009 origin, through automation, security, FinOps, platform engineering, and the rise of AI-driven intelligent automation, highlighting future trends such as AI-native toolchains, cognitive collaboration, and sustainable practices that reshape how development and operations work together.

AI · DevOps · FinOps
0 likes · 7 min read
Efficient Ops
Feb 26, 2025 · Databases

Efficient Operations for Heterogeneous Databases: Insights from Guangdong Mobile

The article summarizes Lai Kunchi's presentation at the 24th GOPS Global Operations Conference, covering the current state and challenges of database development, Guangdong Mobile's database operation system, and future directions for managing heterogeneous databases in evolving business architectures.

Database operations · DevOps · SRE
0 likes · 3 min read
Alibaba Cloud Observability
Feb 17, 2025 · Operations

What’s Driving Observability in 2025? AIOps, OpenTelemetry, and eBPF Trends

The article outlines 2025 observability trends, covering the rise of AIOps platforms, AI‑driven prediction, OpenTelemetry becoming the de‑facto standard, unified telemetry platforms, the shift of observability left and right, eBPF’s role in platform engineering, and cost‑effective strategies for modern cloud‑native environments.

Observability · OpenTelemetry · aiops
0 likes · 10 min read
Alibaba Cloud Developer
Feb 13, 2025 · Operations

What Will Observability Look Like in 2025? Key Trends and Technologies

This article compiles predictions from multiple sources to outline ten common observability trends for 2025, covering AIOps platform evolution, AI‑driven prediction, OpenTelemetry adoption, unified monitoring, edge observability, shift‑left development, eBPF integration, log‑centric analytics, cost‑saving strategies, and proactive reliability.

2025 trends · OpenTelemetry · aiops
0 likes · 12 min read
Efficient Ops
Feb 6, 2025 · Operations

Inside Alipay’s Full‑Ecosystem Availability Monitoring: Architecture and Practices

At the 2024 GOPS Global Operations Conference in Shanghai, Alipay’s monitoring lead Tang Liang presented the challenges, architecture, risk‑prevention practices, and implementation details of the company’s full‑ecosystem availability monitoring system, highlighting its role in DevOps, SRE, and AIOps initiatives.

Availability · Cloud Native · DevOps
0 likes · 4 min read
Efficient Ops
Feb 5, 2025 · Operations

FAW‑Volkswagen’s Integrated Tech‑Ops Platform: Key Practices, Challenges & Future Roadmap

At the 24th GOPS Global Operations Conference in Shanghai, FAW‑Volkswagen’s tech‑ops lead presented a detailed case study covering the platform’s background, implementation roadmap and results, encountered challenges, and future plans, offering practical insights into integrated DevOps, AIOps, and cloud‑native operations.

Case Study · DevOps · FAW-Volkswagen
0 likes · 3 min read
DataFunSummit
Jan 31, 2025 · Artificial Intelligence

LLMOps: Building a Prompt‑Driven Engine for AI Operations

This article presents the concept of LLMOps—applying large language models to AIOps—by analyzing prompt challenges, introducing the LogPrompt engine for log analysis, describing a prompt‑learning data flywheel with CoachLM optimization, reporting experimental results, and outlining future multi‑modal directions.

CoachLM · Data Flywheel · LLMOps
0 likes · 16 min read
JD Tech Talk
Jan 26, 2025 · Operations

Evolution of Operations and the Application of Large Models in Modern IT Ops

This article reviews the transformation of IT operations from manual processes to automation, AIOps, and ChatOps, and examines how large language models enhance intelligent assistance, automated diagnosis, and log analysis to improve efficiency, reliability, and rapid incident resolution.

Automation · ChatOps · aiops
0 likes · 7 min read
JD Cloud Developers
Jan 26, 2025 · Operations

How Large Language Models are Transforming Modern IT Operations

This article traces the evolution of IT operations from manual tasks to automation, AIOps, and ChatOps, and explains how large language models boost efficiency, enable intelligent assistants, automated diagnosis, and smart log analysis for more reliable, automated Ops workflows.

ChatOps · aiops · large language models
0 likes · 7 min read
Efficient Ops
Jan 20, 2025 · Operations

Inside Qunar’s Pre‑Release Platform: Design, Practice, and Future Outlook

The article recaps Li Jingkang’s presentation at the 2024 GOPS Global Operations Conference, detailing the background, principles, design, and real‑world implementation of Qunar’s pre‑release platform, and outlines its future direction within DevOps, SRE, AIOps, and cloud‑native practices.

Cloud Native · DevOps · Operations
0 likes · 3 min read
Efficient Ops
Dec 2, 2024 · Operations

How AI‑Driven Parameter Governance Transforms DevOps Efficiency

This article explains how AI‑powered parameter governance, integrated with DevOps and AIOps practices, tackles the explosion of configuration parameters in large‑scale financial systems, streamlines design, auditing, detection, and deployment, and ultimately boosts operational efficiency and risk control.

Automation · DevOps · Operations
0 likes · 8 min read
21CTO
Nov 22, 2024 · Artificial Intelligence

How AI Can Erase Technical Debt and Reignite Developer Joy

Atlassian’s CTO explains how generative AI can eliminate outdated tools, reduce technical debt, streamline documentation, and automate alert handling, ultimately boosting developer productivity and satisfaction while restoring the fun of building innovative software.

AI · Developer Experience · aiops
0 likes · 8 min read
Alibaba Cloud Infrastructure
Nov 22, 2024 · Artificial Intelligence

AI and the Next-Generation Internet: Insights from Alibaba Cloud VP Cai Dezhi at the 2024 Wuzhen Summit

At the 2024 Wuzhen Summit, Alibaba Cloud R&D Vice President Cai Dezhi discussed the convergence of AI and next‑generation internet, outlining the “Network for AI” and “AI for Network” concepts, the HPN7.0 high‑performance network, AI‑driven operations, and the importance of open standards and protocol innovation to lower costs and enable widespread AI adoption.

AI · Next-Generation Internet · Open standards
0 likes · 4 min read
Efficient Ops
Oct 24, 2024 · Operations

How Migu’s AI‑Powered Observability Boosts Cloud Gaming Operations

During the 24th GOPS Global Operations Conference, Migu Interactive Entertainment’s Vice President Su Yi discussed how their AI‑driven AIOps observability framework, validated by ITU standards, enhances cloud gaming platform stability, accelerates issue detection, and supports China Mobile’s 5G‑based digital transformation.

AI · Digital Transformation · Observability
0 likes · 19 min read
Efficient Ops
Oct 19, 2024 · Operations

How Migu’s Cloud Gaming Platform Achieved Leading AIOps Observability Standards

In an interview, Migu Interactive Entertainment explains how its cloud gaming platform used AI, 5G, and standardized observability practices to pass both international and domestic AIOps assessments, and stresses the strategic importance of intelligent operations for business continuity in complex, distributed systems.

AI · Digital Transformation · Intelligent Operations
0 likes · 17 min read
Xiaohongshu Tech REDtech
Oct 9, 2024 · Operations

AIOps Implementation at Xiaohongshu: Fault Localization and Intelligent Operations

Xiaohongshu’s AIOps initiative builds a four‑layer framework that uses machine‑learning‑driven anomaly detection, causal analysis, and trace‑based fault localization to automatically identify root‑cause services in microservice environments. It achieves over 80% accuracy across 1,000 daily diagnoses and outlines future enhancements in change correlation and automated remediation.

DevOps · Fault Localization · Intelligent Operations
0 likes · 28 min read
DevOps
Aug 26, 2024 · Operations

The Evolution of Operations: From Manual Ops to AIOps and ChatOps

This article explores the progression of IT operations—from manual processes through automated DevOps, to AI‑driven AIOps and chat‑based ChatOps—examining concepts, advantages, tools, and future possibilities, while also reflecting on how these trends reshape the role of operations engineers.

AI · Automation · ChatOps
0 likes · 12 min read
Efficient Ops
Aug 4, 2024 · Artificial Intelligence

What the 2024 China AIOps Survey Reveals About Smart Operations Trends

The 2024 XOps Forum in Beijing showcased a new era of smart operations, unveiling a record‑breaking AIOps survey that highlights rapid investment growth, rising adoption of large language models, evolving maturity levels, and key challenges such as model accuracy and data quality across Chinese enterprises.

China · IT Operations · aiops
0 likes · 7 min read
DataFunSummit
Jul 15, 2024 · Operations

Intelligent Operations (AIOps) Insights, Planning, and Large‑Model Agent Practices at ByteDance

The article summarizes ByteDance's intelligent operations (AIOps) strategy, covering frontier concepts, a five‑level automation roadmap, large‑model applications for fault diagnosis and smart Q&A, and a comprehensive AIOps platform that accelerates algorithm deployment, improves efficiency, and reduces operational costs.

AI agents · Intelligent Operations · Operations Automation
0 likes · 21 min read
JD Cloud Developers
Jul 2, 2024 · Operations

How Large Language Models Are Transforming Modern IT Operations

From manual server management to automated scripts, AIOps, and ChatOps, this article traces the evolution of IT operations and demonstrates how large language models boost efficiency, enable intelligent assistants, automated diagnostics, and smart log analysis, aiming for rapid fault detection, localization, and resolution.

Automation · ChatOps · Operations
0 likes · 7 min read
ByteDance SYS Tech
Jun 30, 2024 · Operations

How Large‑Model AI Is Transforming Intelligent Operations (AIOps)

This article explores the latest concepts, planning roadmap, and practical applications of large‑model AI in intelligent operations, detailing AIOps use cases, system‑level automation, multi‑agent architectures, and how a dedicated platform accelerates deployment and efficiency across data‑center environments.

AI agents · Automation · Intelligent Operations
0 likes · 18 min read
Baidu Tech Salon
May 27, 2024 · Artificial Intelligence

Intelligent Agent Technology in Commercial Advertising Platforms: Architecture and Applications

The paper describes Baidu’s AI‑native advertising platform, which uses a multi‑agent architecture built on large language models. The architecture combines large‑small model collaboration, domain‑SOP‑driven coordination, and long‑term memory to enable natural‑language understanding, proactive planning, execution, and human‑like responses. Illustrated by the GBI analytics and JarvisBot operations use cases, the platform delivers gains in ad spend, accuracy, speed, and efficiency.

AI-native platforms · Business Intelligence · LLM applications
0 likes · 16 min read
vivo Internet Technology
May 15, 2024 · Databases

Challenges and New Technology Exploration in Vivo’s Database Operations Platform

At the 2024 XCOPS Intelligent Operations Management Annual Meeting in Guangzhou, Vivo’s Deng Song will discuss building a robust database operations platform, addressing availability threats, efficiency levers, 0‑to‑1 development strategies, and considerations of reliability, cost, and data privacy amid emerging AI and large‑model technologies.

Reliability · Tech Talk · aiops
0 likes · 3 min read
Efficient Ops
May 14, 2024 · Artificial Intelligence

How Large‑Model Agents Are Revolutionizing AIOps and Modern Operations

This article explores why large‑model Agent technology is essential for AIOps, explains single‑ and multi‑Agent architectures, memory and tool integration, and demonstrates practical applications such as anomaly detection, fault diagnosis, automated remediation, ChatOps, and future directions for intelligent, autonomous operations.

AI agents · LLM · Large Model
0 likes · 14 min read
ByteDance SYS Tech
May 9, 2024 · Operations

How Large‑Model Agents Transform AIOps: From Automation to Self‑Healing Operations

The presentation explains how large‑model agents empower AIOps by automating routine tasks, enhancing anomaly detection, fault diagnosis, and remediation, while outlining architectural components, multi‑agent collaboration, and future directions for building self‑healing, observability‑driven operations platforms.

Agent · Observability · Operations Automation
0 likes · 15 min read
DataFunSummit
Apr 21, 2024 · Operations

The Value, Challenges, and Future of AIOps in Modern Enterprises

AIOps leverages AI to automate IT monitoring, predict failures, and optimize resources, offering modern enterprises reduced operational workload and higher reliability, while facing challenges such as data governance, automation, hierarchical monitoring, and large‑model hallucinations that must be addressed for successful deployment.

IT Operations · Operations Automation · aiops
0 likes · 2 min read
Efficient Ops
Mar 10, 2024 · Databases

How Machine Learning Can Automate MySQL Index Optimization

This article explains how applying machine learning to database operations—specifically AIOps for MySQL—can automate index recommendation by parsing SQL, extracting semantic and statistical features, generating candidate index combinations, and training an XGBoost model to predict optimal indexes, reducing reliance on manual DBA work.

Index Optimization · SQL · aiops
0 likes · 10 min read
dbaplus Community
Feb 4, 2024 · Operations

How Ant Group Leverages SLO and AIOps for Fine‑Grained Operations

This article details Ant Group's practical implementation of Service Level Objectives (SLO) and AIOps to achieve fine‑grained operations, covering SLO fundamentals, health‑score architecture, GitOps‑based data pipelines, error‑budget alerting, AI‑driven anomaly detection, fault localization techniques, and real‑world case studies on dashboards, Kubernetes SLOs, and emergency response workflows.

Error Budget, Fault Localization, Kubernetes
0 likes · 38 min read
dbaplus Community
Jan 29, 2024 · Artificial Intelligence

How Meituan Uses AIOps to Revolutionize Incident Management

This article details Meituan's two‑year exploration of AIOps for incident management, covering the challenges of massive, real‑time operational data, the AI‑driven modules for risk prevention, fault detection, diagnosis, and similar‑incident recommendation, and future directions such as intelligent log detection and change recognition.

Operations, Root Cause Analysis, aiops
0 likes · 22 min read
Efficient Ops
Jan 17, 2024 · Operations

How China’s Telecom Giants Accelerate IT Efficiency with DevOps Maturity Assessments

In the context of digital transformation, six leading Chinese telecom operators applied the CAICT DevOps Capability Maturity Model to evaluate dozens of projects, achieving significant improvements in continuous delivery, technical operations, security, and AIOps, providing valuable references for the industry.

Continuous Delivery, DevOps, IT Operations
0 likes · 18 min read
Efficient Ops
Jan 9, 2024 · Operations

What Do 2023 DevOps & AIOps Assessments Reveal About China’s Digital Transformation?

Amid China's sweeping digital, networked, and intelligent transformation, over 100 leading enterprises across banking, finance, communications, manufacturing, and other sectors have participated in DevOps and AIOps maturity model evaluations, providing a comprehensive view of industry adoption, capability levels, and emerging best practices for 2023.

DevOps, Digital Transformation, Operations
0 likes · 15 min read
High Availability Architecture
Jan 9, 2024 · Operations

AIOps Practices for Incident Management at Meituan: From Risk Prevention to Post‑Operation

This article presents Meituan's two‑year exploration of AIOps in incident management, detailing risk‑prevention change detection, real‑time anomaly discovery, automated root‑cause diagnosis, multi‑dimensional KPI analysis, and similar‑event recommendation, while sharing architectural designs, algorithmic techniques, performance results, and future directions.

NLP, Operations, Root Cause Analysis
0 likes · 24 min read
Efficient Ops
Jan 8, 2024 · Operations

What Do 2023 DevOps & AIOps Assessments Reveal About China’s Digital Transformation?

Amid China's sweeping digital transformation, the China Academy of Information and Communications Technology (CAICT) reports that in 2023, 104 leading enterprises across banking, securities, insurance, telecom, manufacturing, and other sectors completed 336 DevOps maturity assessments, while 23 enterprises completed 45 AIOps assessments. The report highlights industry‑wide adoption of DevOps and AIOps standards and offers detailed breakdowns by sector, evaluation levels, and future guidance.

DevOps, Digital Transformation, Maturity Model
0 likes · 16 min read
Efficient Ops
Dec 26, 2023 · Operations

What Is ITU’s New AIOps Standard and How It Shapes Cloud Operations?

The article explains the ITU‑T Y.3550 AIOps standard and its requirements for AI‑driven cloud service development and operation. It also covers the Chinese AIOps maturity‑model series and the latest assessment results, which show dozens of enterprises adopting these intelligent‑operations capabilities.

AI, ITU standard, aiops
0 likes · 6 min read
Meituan Technology Team
Dec 21, 2023 · Operations

AIOps for Incident Management: Practices and Insights from Meituan

Meituan’s service‑operations team applies AIOps across prevention, detection, and post‑incident stages—using change‑risk analysis, real‑time graph‑based anomaly detection, similarity‑driven root‑cause diagnosis, and NLP‑powered incident recommendation—to achieve sub‑second detection, high precision, 28% faster fault handling, and plans for intelligent log and change recognition.

Operations, Root Cause Analysis, aiops
0 likes · 24 min read
Efficient Ops
Dec 18, 2023 · Artificial Intelligence

How Mobile Cloud Earned Top‑Tier AIOps Certification and What It Means for Intelligent Operations

The article details Mobile Cloud's successful third‑level AIOps assessment by the China Academy of Information and Communications Technology (CAICT). It explores the platform's architecture and intelligent operations capabilities, shares interview insights on challenges, benefits, and future plans, and presents industry‑wide AIOps maturity statistics.

IT Operations, Intelligent Operations, aiops
0 likes · 12 min read
Bilibili Tech
Dec 15, 2023 · Operations

Bilibili Alert Monitoring System: Design, Optimization, and Root‑Cause Analysis

Bilibili revamped its alert monitoring platform to meet rapid growth, focusing on effectiveness, timeliness, and coverage; it introduced a closed‑loop design and governance that cut weekly alerts by 90%, built a knowledge‑graph root‑cause system achieving 87.9% accuracy with sub‑minute latency, and integrated AIOps for ongoing refinement.

Alert Monitoring, Bilibili, Root Cause Analysis
0 likes · 21 min read
Alibaba Cloud Big Data AI Platform
Nov 29, 2023 · Operations

How AIOps and DataOps Transform Big Data Operations: Lessons from ABM Platform

This article examines the challenges of big‑data operations, explains how DataOps and AIOps complement each other, and details the ABM intelligent operations architecture, platform components, and real‑world use cases such as Flink hotspot detection, ChatOps assistants, and dynamic MaxCompute resource optimization.

Big Data Operations, DataOps, aiops
0 likes · 11 min read
Efficient Ops
Nov 8, 2023 · Operations

How Intelligent Operations (AIOps) Transforms IT Management and Self‑Healing

This article explains what intelligent operations (AIOps) are, outlines a four‑layer platform architecture, and showcases real‑world practices such as load‑balancing link repair, MySQL container self‑healing, composite service tracing, component‑based orchestration, and AI‑driven log analysis, concluding with future prospects.

Automation, IT Operations, Intelligent Operations
0 likes · 7 min read

How Transparent AI Boosts Trust in AIOps: Explainable Root‑Cause Solutions

This article examines the rapid growth of the Chinese IT operations market, explains why AIOps faces trust challenges due to opaque deep‑learning models, and presents AsiaInfo's transparent‑model and post‑hoc explanation engine together with three concrete explainable root‑cause analysis methods, concluding with future outlooks for trustworthy AIOps.

AI trust, Operations, Root Cause Analysis
0 likes · 13 min read
Didi Tech
Sep 5, 2023 · Operations

Observability and Stability Engineering in Didi Ride‑Hailing Platform

At Didi, observability and stability engineering combine automated, AI‑driven alarm generation, distributed tracing, and ChatOps‑based fault handling to manage microservice complexity, massive traffic spikes, and cross‑region operations. The article emphasizes systematic investment and the evolution toward AIOps, and it closes with a recruitment call for backend and test engineers.

Didi, Distributed Systems, Observability
0 likes · 16 min read
Ctrip Technology
Aug 3, 2023 · Operations

Intelligent Anomaly Detection for Ctrip Operations: LSTM Forecasting, Trend Analysis, Adaptive Thresholds, and Periodic Anomaly Filtering

The article describes Ctrip's AIOps approach to improving alert quality. It combines statistical methods with machine‑learning models, using LSTM forecasting, trend analysis, adaptive threshold calculation, and periodic anomaly detection based on dynamic time warping (DTW), and achieves significant gains in precision and fault‑recall rates.

LSTM, Time Series, adaptive threshold
0 likes · 12 min read
Efficient Ops
Jul 19, 2023 · Operations

What Do the Latest DevOps and AIOps Maturity Assessments Reveal About Chinese Enterprises?

The China Academy of Information and Communications Technology (CAICT) has released its latest DevOps, AIOps, and Identity Governance maturity model assessment results. These results showcase extensive industry adoption, highlight the impact of standardization and tool empowerment on digital transformation, and provide detailed statistics across banking, securities, insurance, telecom, and other sectors.

DevOps, Digital Transformation, Maturity Model
0 likes · 12 min read
Didi Tech
Jul 11, 2023 · Operations

DevOps Practices and Challenges at Didi Ride‑Hailing: From Development to Operations

Didi’s ride‑hailing R&D team tackles the efficiency and stability challenges of a large microservice ecosystem in three ways. It standardizes on a single Go stack, a common framework, and shared data models. It uses eBPF traffic recording for automated regression testing. It also applies AIOps alert filtering, knowledge‑graph root‑cause analysis, and a localization robot for rapid fault recovery. Looking ahead, the team is targeting full CI/CD automation with static analysis, service‑mesh observability, and chaos engineering.

CloudNative, Microservices, aiops
0 likes · 22 min read
Efficient Ops
Jun 25, 2023 · Operations

How to Build a Next‑Gen “Big Operations” System for Reliability and Observability

This article outlines the evolution from manual operations to DevOps and SRE‑driven “big operations,” detailing system reliability and continuity practices, observability concepts, and the development of AIOps maturity standards, offering a comprehensive guide for building stable, efficient, and secure operational frameworks.

DevOps, Observability, Operations
0 likes · 14 min read
Continuous Delivery 2.0
Jun 15, 2023 · Artificial Intelligence

AI‑Driven Software Engineering: From Requirements to Operations in the Era of Software Engineering 3.0

The article outlines how AI, especially large language models and ML‑DevOps, is reshaping software engineering from historical roots through requirement mining, design automation, intelligent coding, testing, and AIOps, culminating in the transformative impact of GPT‑4 on development practices.

AI, Design Automation, ML-DevOps
0 likes · 8 min read
Efficient Ops
Jun 7, 2023 · Artificial Intelligence

How Guangdong Mobile Scaled AIOps: From Manual Ops to Intelligent Automation

This article details the evolution of Guangdong Mobile's IT systems and operations. It explains the four‑domain architecture, chronicles the AIOps adoption timeline, and showcases intelligent anomaly detection, change assessment, fault diagnosis, and operations robots. It also shares practical promotion methods and the future outlook for AI‑driven IT operations.

Automation, Fault Diagnosis, IT Operations
0 likes · 19 min read
Efficient Ops
May 22, 2023 · Operations

What’s Driving China’s AIOps Evolution? Insights from the 2023 Survey

The 2023 China AIOps Status Survey, launched by CAICT and the Cloud Computing Open Source Industry Alliance, gathers input from over 60 enterprises to reveal current intelligent‑operations practices, observability adoption, generative AI prospects, and best‑practice case studies, while inviting participants to shape the upcoming report.

Industry Survey, Intelligent Operations, Observability
0 likes · 9 min read
Efficient Ops
May 16, 2023 · Operations

How China Mobile Built a Scalable AIOps Platform to Cut Incident Resolution Time

This article shares China Mobile IT Center's four‑year journey of designing, deploying, and refining a centralized AIOps platform that automates anomaly detection, fault diagnosis, and remediation. The platform cut complaint‑ticket handling time from ten hours to six while scaling to billions of AI model calls per month.

AI, aiops, incident management
0 likes · 18 min read
Efficient Ops
May 10, 2023 · Operations

Mastering XOps: From DevOps to FinOps – A Comprehensive Guide

This article presents a systematic overview of the emerging XOps ecosystem—including DevOps, BizDevOps, AIOps, FinOps, and SRE—detailing their relationships, maturity models, standards, and practical guidance for enterprises seeking to achieve efficient, secure, and data‑driven digital transformation.

BizDevOps, DevOps, FinOps
0 likes · 13 min read
Efficient Ops
Apr 25, 2023 · Operations

How Industrial Bank’s Cloud‑Native AIOps Earned Top‑Tier Assessment

The China Academy of Information and Communications Technology (CAICT) led the first AIOps maturity assessment for the industrial and financial sectors. That assessment recognized Industrial Bank’s cloud‑native intelligent operations project as a domestic leader. The article details the project's architecture, performance metrics, interview insights, and future AI‑driven operational strategies.

Cloud Native, Digital Transformation, IT Operations
0 likes · 10 min read
dbaplus Community
Apr 19, 2023 · Operations

How Cloud‑Native Fuels Operations Digital Transformation – Insights from China Mobile

This article summarizes Wang Xiaozheng’s talk at the 2023 China Data Intelligence Management Summit. It outlines the challenges of transforming operations in a cloud‑native environment and the core ideas behind digital operations. It then describes Zhejiang Mobile’s practical implementation across six “‑ability” dimensions and closes with an outlook on AIOps and metaverse‑driven collaboration.

Cloud Native, DevOps, Digital Transformation
0 likes · 10 min read