Tagged articles

AIOps

319 articles · Page 1 of 4
dbaplus Community
Jun 28, 2026 · Operations

Why Tencent Music Rejects AI Hype: Building an OpenClaw‑Powered Intelligent Ops Ecosystem

The article details Tencent Music's step‑by‑step evolution from manual alert handling to a three‑layer cloud‑native AIOps platform, describing data pipelines, dynamic 3‑sigma alerts, full‑link observability, and the OpenClaw sandbox with multi‑agent architecture that prioritises scenario‑driven, safe AI integration.

AI · AIOps · Cloud Native
0 likes · 17 min read
AI Agent Super App
Jun 24, 2026 · Operations

Will AI Replace Ops Engineers by 2025? From Automated Troubleshooting to One‑Click Deployments

The article examines how AI is reshaping operations—from instant fault detection and 47‑second incident resolution to natural‑language deployment scripts, predictive capacity planning, continuous security monitoring, and automated knowledge bases—while arguing that engineers will transition from fire‑fighters to system designers.

AIOps · Automation · capacity planning
0 likes · 15 min read
Alibaba Cloud Native
Jun 8, 2026 · Operations

From Alarm Storms to Proactive Immunity: Geely Auto’s Intelligent Operations Journey

Facing exploding alarm volumes, cross‑cloud data silos, and slow root‑cause resolution, Geely Auto partnered with Alibaba Cloud STAROps to build a three‑step data foundation that unified heterogeneous data, enabled AI‑driven insight, and transformed the ops team from reactive responders to proactive platform operators.

AIOps · Cloud Native · Data Unification
0 likes · 9 min read
Machine Heart
Jun 7, 2026 · Artificial Intelligence

How GoS Gives Agents a Shared Belief State for True Multi-Agent Collaboration

The paper introduces Graph of States (GoS), a neural‑symbolic framework that equips multi‑agent systems with an explicit, maintainable belief state, enabling backtracking and drill‑down during long‑horizon abductive tasks such as medical diagnosis and distributed‑system fault analysis, and demonstrates superior Match and Relevant scores over existing baselines.

AIOps · abductive reasoning · causal graph
0 likes · 11 min read
Alibaba Cloud Native
Jun 3, 2026 · Operations

How Ontology Can Help Enterprises Overcome Token‑Maxxing Costs

This article analyses why AI agents consume massive token budgets—showing that input tokens dominate costs, presenting data from academic papers, industry benchmarks, and Reddit traces, and demonstrating how ontology‑driven solutions like UModel and STAROps can dramatically reduce token usage in real‑world operations.

AIOps · Dependency Exploration · Ontology
0 likes · 15 min read
Mingyi World Elasticsearch
May 31, 2026 · Operations

Automating Easysearch Cluster Alerts and Root‑Cause Analysis with AIOps – Full Implementation Guide

This article walks through a practical AIOps solution that replaces brittle keyword rules for Easysearch Elasticsearch clusters with a three‑step pipeline—Filebeat log ingestion, Flask‑driven LLM analysis, and automated email alerts plus ES feedback—detailing configuration, code, pitfalls, and suitability.

AIOps · DeepSeek · Elasticsearch
0 likes · 12 min read
Alibaba Cloud Native
May 28, 2026 · Operations

Can Ontology Really Improve Your AIOps Agent?

The article explains how ontology—an explicit, unambiguous knowledge map—addresses the cognitive and data challenges of AIOps, describes the UModel framework that models entities, relationships, and telemetry, and shows how the STAROps agent built on UModel delivers more accurate, explainable, and trustworthy operations intelligence.

AIOps · Cloud Native · Knowledge Graph
0 likes · 16 min read
Full-Stack DevOps & Kubernetes
Apr 22, 2026 · Operations

Avoid 90% of Kubernetes Ops Pitfalls: A Definitive Guide

This guide outlines the five most common Kubernetes operational pitfalls, offers step‑by‑step remediation practices, introduces three emerging trends (AI‑assisted troubleshooting, serverless clusters, and Tekton CI/CD), and provides three ready‑to‑copy kubectl commands to streamline daily management.

AIOps · Operations · Serverless
0 likes · 9 min read
Shuge Unlimited
Mar 17, 2026 · Operations

Exploring OpenClaw for K8s AIOps: Four Practical Scenarios from Concept to Deployment

This article analyzes how OpenClaw’s Skills, Subagent, and Cron capabilities can be leveraged to build Kubernetes AIOps solutions, presenting four detailed scenarios—fault diagnosis, resource optimization, security audit, and continuous health checks—while evaluating technical feasibility, security, reliability, cost, and a phased rollout plan.

AIOps · Cloud Native · OpenClaw
0 likes · 19 min read
Shuge Unlimited
Mar 15, 2026 · Operations

How OpenClaw Fixed a Self‑Upgraded, Unresponsive Instance in Just 3 Minutes

In a real‑world AIOps demo, the OpenClaw AI agent remotely diagnosed a failed upgrade, pinpointed its OOM cause, rolled back to a stable version, and restored service within three minutes. The article uses the case to illustrate the agent’s three core capabilities and cost advantages, and adds a feasibility analysis and practical rollout guidance.

AI Agent · AIOps · Auto‑Remediation
0 likes · 13 min read
Raymond Ops
Jan 28, 2026 · Artificial Intelligence

From Alert Storms to Smart Ops: Unlocking AIOps for Modern IT Operations

This guide walks through the evolution from noisy alert storms to intelligent AIOps, covering AIOps fundamentals, why it matters now, core capabilities like anomaly detection, root‑cause analysis, capacity forecasting and self‑healing, a practical implementation roadmap, toolchain suggestions, common pitfalls, and future trends.

AIOps · Anomaly Detection · Root Cause Analysis
0 likes · 22 min read
Alibaba Cloud Developer
Jan 12, 2026 · Operations

Why Traditional Monitoring Fails and How UModel Redefines Observability for AI‑Powered Ops

The article explains how legacy monitoring based on isolated metrics, traces, and logs cannot keep up with the massive, fragmented, and dynamic data of modern IT systems, and introduces UModel—a graph‑based observability model that bridges data, model, and engineering gaps to enable AI‑driven operations.

AIOps · Graph Modeling · Observability
0 likes · 11 min read
Baidu Tech Salon
Jan 8, 2026 · Artificial Intelligence

How Baidu’s AI‑Powered Architecture Transforms Network Operations

This article systematically presents Baidu Intelligent Cloud’s three‑layer AI architecture for network intelligent operations, explains the AI base, core, and business layers, showcases the NetStudio digital engineer platform, and details real‑world use cases, performance gains, and a roadmap toward fully autonomous network management.

AI · AIOps · Cloud Computing
0 likes · 26 min read
Alibaba Cloud Native
Jan 3, 2026 · Operations

Turning Chaotic Observability Data into Actionable Graphs with UModel

This article examines the evolution of IT observability, explains why traditional metrics, traces, and logs fall short for AI‑driven operations, and introduces UModel—a graph‑based universal observability model that structures fragmented data into a semantic runtime context for autonomous AIOps agents.

AIOps · Cloud Native · Graph Modeling
0 likes · 12 min read
Ray's Galactic Tech
Dec 2, 2025 · Operations

Build an End‑to‑End AIOps Solution: Log Alerts and Automated Self‑Healing Ops

This guide walks through designing and implementing an intelligent operations workflow that transforms passive log monitoring into proactive alerting and automated remediation, covering core concepts, tech‑stack selection, step‑by‑step configuration of log collection, alert rules, webhook integration, Ansible automation, and best‑practice considerations for scaling and security.

AIOps · Alerting · Ansible
0 likes · 7 min read
Huya Tech Engineering
Nov 28, 2025 · Operations

How LLMs Accelerate Root‑Cause Diagnosis in Large‑Scale Microservices

By abstracting a massive microservice system as a dynamic multi‑layer graph and integrating large language models, the article outlines three evolution stages—from manual expert debugging to rule‑based AIOps and finally LLM‑driven cognitive reasoning—detailing practical workflows, context engineering, and real‑world case studies that dramatically improve MTTR and accuracy.

AIOps · LLM · Microservices
0 likes · 20 min read
Alibaba Cloud Observability
Nov 10, 2025 · Cloud Native

How a Next‑Gen Cloud‑Native Observability Platform Boosted Ticketing Stability by 80%

A leading digital‑entertainment group tackled severe stability and monitoring challenges in its high‑traffic ticketing system by building a cloud‑native, full‑link observability platform on Alibaba Cloud, achieving an 80% improvement in fault detection speed, a 40% reduction in operational costs, and establishing data‑driven operations as the digital foundation for product growth.

AIOps · Monitoring · Observability
0 likes · 15 min read
Efficient Ops
Oct 27, 2025 · Operations

How AI is Revolutionizing Observability and Intelligent Operations

At the GOPS Global Operations Conference in Shanghai, experts from finance, technology and energy sectors examined the challenges of observability, AIOps and intelligent agents, proposing metric standardization, digital‑twin fault simulation, and AI‑driven DevOps as key steps toward scalable, business‑value‑focused intelligent operations.

AI Ops · AIOps · Digital Twin
0 likes · 6 min read
Ops Community
Oct 27, 2025 · Operations

From Midnight Alerts to Peaceful Sleep: Building a Zabbix Monitoring System

After a costly midnight outage, the author shares how he designed a three‑layer Zabbix monitoring architecture—covering infrastructure, service, and business metrics—optimizing alert thresholds, automating discovery, and integrating with ITSM, ultimately reducing MTTR to minutes and enabling teams to sleep peacefully.

AIOps · Alerting · Automation
0 likes · 15 min read
Ops Community
Sep 24, 2025 · Operations

How Ops Engineers Can Stop Online Outages in Minutes: A Proven Emergency Playbook

This article outlines why a solid incident‑response plan is critical, describes typical failure scenarios, introduces the 3‑5‑10 rule for rapid diagnosis and mitigation, provides ready‑to‑run scripts for system checks, traffic throttling, and service rollback, and showcases automation, AIOps, and chaos‑engineering techniques that turn reactive firefighting into proactive resilience.

AIOps · Monitoring · emergency plan
0 likes · 18 min read
Wukong Talks Architecture
Sep 22, 2025 · Databases

How AI‑Powered AIOps Transforms TiDB Database Operations

This article explores how integrating AI‑driven AIOps with the TiDB distributed database can automate monitoring, enable proactive anomaly detection, streamline root‑cause analysis, and optimize capacity planning, ultimately shifting database operations from manual firefighting to intelligent, data‑driven management.

AIOps · Database operations · Root Cause Analysis
0 likes · 12 min read
MaGe Linux Operations
Sep 12, 2025 · Operations

From Alert Storms to Intelligent Ops: A Practical AIOps Journey

This article explores how AIOps transforms traditional IT operations by using AI for anomaly detection, root‑cause analysis, capacity forecasting, and self‑healing, offering a step‑by‑step roadmap, real‑world code examples, toolchain recommendations, common pitfalls, and future trends for building intelligent, automated operations.

AIOps · Anomaly Detection · Root Cause Analysis
0 likes · 24 min read
Efficient Ops
Aug 25, 2025 · Operations

How SOMM Is Revolutionizing Intelligent Ops with AIOps, SRE & FinOps

The China Academy of Information and Communications Technology introduced the SOMM (System Operation Maturity Model) framework, emphasizing tool intelligence, refined management, and robust operation, and detailed its AIOps, SRE, and FinOps assessment modules, evaluation criteria, and maturity levels, along with a showcase of leading enterprises that have achieved top‑tier certifications.

AIOps · FinOps · Maturity Model
0 likes · 8 min read
Alibaba Cloud Big Data AI Platform
Aug 5, 2025 · Operations

Inside Alibaba’s Tesla: Data‑Driven Ops for 100k+ Big Data Nodes

The article details how Alibaba’s Tesla SRE platform supports the massive offline and real‑time big‑data ecosystems through a layered, data‑driven operations framework—DataOps—integrating unified portals, configuration, job, workflow, and analytics platforms, enabling automated monitoring, intelligent decision‑making, and self‑healing capabilities across 100,000+ nodes.

AIOps · Big Data · DataOps
0 likes · 20 min read
Alibaba Cloud Big Data AI Platform
Aug 5, 2025 · Operations

How Alibaba’s Open‑Source SREWorks Transforms Cloud‑Native Data Operations

Alibaba's SREWorks platform, now open‑source, combines cloud‑native architecture, DataOps and AIOps to address the growing complexity of big‑data and AI operations, offering a layered SaaS/PaaS/IaaS solution that streamlines delivery, monitoring, management, control, operation, and service for modern enterprises.

AIOps · Cloud Native · DataOps
0 likes · 10 min read
Alibaba Cloud Big Data AI Platform
Aug 5, 2025 · Operations

How Alibaba Automates Hardware Fault Detection and Self‑Healing at Scale

This article explains how Alibaba’s massive MaxCompute platform tackles the growing challenge of hardware failures by using predictive detection, automated server offline, self‑healing workflows, and cluster rebalancing to close the fault loop before business impact, while detailing the underlying architecture and operational principles.

AIOps · Alibaba Cloud · Operations Automation
0 likes · 14 min read
Alibaba Cloud Big Data AI Platform
Aug 4, 2025 · Operations

From Scripts to AIOps: How Alibaba’s Ops Evolved and What Skills You Need Today

Tracing Alibaba’s journey from manual, script‑based operations through tool‑centric and platform‑driven DevOps to the data‑focused DataOps era and emerging AIOps, the article outlines the shifting responsibilities, architectural challenges, and the multidisciplinary skill set required for modern operations engineers.

AIOps · DataOps · Operations
0 likes · 8 min read
Ops Development Stories
Jul 14, 2025 · Artificial Intelligence

Mastering AIOps: Prompt Engineering, Function Calling, RAG, Graph RAG, and Local LLM Deployment

This comprehensive guide explores AIOps techniques such as prompt engineering, chat completions, memory management, function calling, fine‑tuning, retrieval‑augmented generation (RAG), graph‑based RAG, and practical steps for deploying open‑source large language models locally, providing code examples and best‑practice recommendations for modern DevOps environments.

AIOps · Function Calling · Graph RAG
0 likes · 47 min read
Efficient Ops
Jul 2, 2025 · Cloud Computing

How ICBC’s AI‑Native Data Center Is Redefining Cloud Computing for Finance

Amid the AI‑driven wave of large‑model technologies, Industrial and Commercial Bank of China’s data center has transformed its traditional infrastructure into an AI‑native computing hub, boosting operational efficiency, green sustainability, and autonomous control while supporting the financial sector’s shift toward intelligent, cognitive services.

AI-native · AIOps · Cloud Computing
0 likes · 13 min read
Ops Development Stories
Jul 1, 2025 · Artificial Intelligence

From Lean to AIOps: How AI is Transforming Modern Operations

This comprehensive guide walks through the evolution from Lean and Agile practices to DevOps and finally AIOps, explaining core concepts, key algorithms, the role of large language models, RAG‑based root‑cause analysis, and practical implementation steps for intelligent operations.

AIOps · Agile · Lean
0 likes · 19 min read
Efficient Ops
May 26, 2025 · Artificial Intelligence

How AI Agents Are Revolutionizing AIOps: Boosting Automation and Efficiency

This article explains how AI agents enhance large‑model capabilities for AIOps, detailing single‑agent use cases like knowledge retrieval, tool guidance, and fault diagnosis, as well as multi‑agent collaborations, required skills, and future prospects for autonomous operations.

AI · AIOps · Agent
0 likes · 7 min read
Continuous Delivery 2.0
Mar 14, 2025 · Operations

The Birth of DevOps: Breaking the Collaboration Wall

This article traces the evolution of DevOps from its 2009 origin, through automation, security, FinOps, platform engineering, and the rise of AI-driven intelligent automation, highlighting future trends such as AI-native toolchains, cognitive collaboration, and sustainable practices that reshape how development and operations work together.

AI · AIOps · FinOps
0 likes · 7 min read
Efficient Ops
Feb 26, 2025 · Databases

Efficient Operations for Heterogeneous Databases: Insights from Guangdong Mobile

The article summarizes Lai Kunchi's presentation at the 24th GOPS Global Operations Conference, covering the current state and challenges of database development, Guangdong Mobile's database operation system, and future directions for managing heterogeneous databases in evolving business architectures.

AIOps · Database operations · SRE
0 likes · 3 min read
Alibaba Cloud Observability
Feb 17, 2025 · Operations

What’s Driving Observability in 2025? AIOps, OpenTelemetry, and eBPF Trends

The article outlines 2025 observability trends, covering the rise of AIOps platforms, AI‑driven prediction, OpenTelemetry becoming the de‑facto standard, unified telemetry platforms, the shift of observability left and right, eBPF’s role in platform engineering, and cost‑effective strategies for modern cloud‑native environments.

AIOps · Observability · OpenTelemetry
0 likes · 10 min read
Alibaba Cloud Developer
Feb 13, 2025 · Operations

What Will Observability Look Like in 2025? Key Trends and Technologies

This article compiles predictions from multiple sources to outline ten common observability trends for 2025, covering AIOps platform evolution, AI‑driven prediction, OpenTelemetry adoption, unified monitoring, edge observability, shift‑left development, eBPF integration, log‑centric analytics, cost‑saving strategies, and proactive reliability.

2025 trends · AIOps · OpenTelemetry
0 likes · 12 min read
Efficient Ops
Feb 5, 2025 · Operations

FAW‑Volkswagen’s Integrated Tech‑Ops Platform: Key Practices, Challenges & Future Roadmap

At the 24th GOPS Global Operations Conference in Shanghai, FAW‑Volkswagen’s tech‑ops lead presented a detailed case study covering the platform’s background, implementation roadmap and results, encountered challenges, and future plans, offering practical insights into integrated DevOps, AIOps, and cloud‑native operations.

AIOps · Case Study · FAW-Volkswagen
0 likes · 3 min read
DataFunSummit
Jan 31, 2025 · Artificial Intelligence

LLMOps: Building a Prompt‑Driven Engine for AI Operations

This article presents the concept of LLMOps—applying large language models to AIOps—by analyzing prompt challenges, introducing the LogPrompt engine for log analysis, describing a prompt‑learning data flywheel with CoachLM optimization, reporting experimental results, and outlining future multi‑modal directions.

AIOps · CoachLM · Data Flywheel
0 likes · 16 min read
JD Tech Talk
Jan 26, 2025 · Operations

Evolution of Operations and the Application of Large Models in Modern IT Ops

This article reviews the transformation of IT operations from manual processes to automation, AIOps, and ChatOps, and examines how large language models enhance intelligent assistance, automated diagnosis, and log analysis to improve efficiency, reliability, and rapid incident resolution.

AIOps · Automation · ChatOps
0 likes · 7 min read
JD Cloud Developers
Jan 26, 2025 · Operations

How Large Language Models are Transforming Modern IT Operations

This article traces the evolution of IT operations from manual tasks to automation, AIOps, and ChatOps, and explains how large language models boost efficiency, enable intelligent assistants, automated diagnosis, and smart log analysis for more reliable, automated Ops workflows.

AIOps · ChatOps · large language models
0 likes · 7 min read
Efficient Ops
Jan 20, 2025 · Operations

Inside Qunar’s Pre‑Release Platform: Design, Practice, and Future Outlook

The article recaps Li Jingkang’s presentation at the 2024 GOPS Global Operations Conference, detailing the background, principles, design, and real‑world implementation of Qunar’s pre‑release platform, and outlines its future direction within DevOps, SRE, AIOps, and cloud‑native practices.

AIOps · Cloud Native · Operations
0 likes · 3 min read
Efficient Ops
Dec 2, 2024 · Operations

How AI‑Driven Parameter Governance Transforms DevOps Efficiency

This article explains how AI‑powered parameter governance, integrated with DevOps and AIOps practices, tackles the explosion of configuration parameters in large‑scale financial systems, streamlines design, auditing, detection, and deployment, and ultimately boosts operational efficiency and risk control.

AIOps · Automation · Operations
0 likes · 8 min read
21CTO
Nov 22, 2024 · Artificial Intelligence

How AI Can Erase Technical Debt and Reignite Developer Joy

Atlassian’s CTO explains how generative AI can eliminate outdated tools, reduce technical debt, streamline documentation, and automate alert handling, ultimately boosting developer productivity and satisfaction while restoring the fun of building innovative software.

AI · AIOps · Developer Experience
0 likes · 8 min read
Alibaba Cloud Infrastructure
Nov 22, 2024 · Artificial Intelligence

AI and the Next-Generation Internet: Insights from Alibaba Cloud VP Cai Dezhi at the 2024 Wuzhen Summit

At the 2024 Wuzhen Summit, Alibaba Cloud R&D Vice President Cai Dezhi discussed the convergence of AI and next‑generation internet, outlining the “Network for AI” and “AI for Network” concepts, the HPN7.0 high‑performance network, AI‑driven operations, and the importance of open standards and protocol innovation to lower costs and enable widespread AI adoption.

AI · AIOps · Network Architecture
0 likes · 4 min read
Efficient Ops
Nov 20, 2024 · Operations

How China’s Telecom Leaders Accelerate DevOps & AIOps Standards for Faster Delivery

The article outlines China’s 2024‑2027 information standard action plan, the rollout of ITU DevOps and AIOps assessments, and showcases dozens of telecom projects that achieved significant improvements in delivery speed, reliability, automation and observability through standardized DevOps, SRE and AIOps practices.

AIOps · SRE · Standardization
0 likes · 23 min read
Efficient Ops
Oct 24, 2024 · Operations

How Migu’s AI‑Powered Observability Boosts Cloud Gaming Operations

During the 24th GOPS Global Operations Conference, Migu Interactive Entertainment’s Vice President Su Yi discussed how their AI‑driven AIOps observability framework, validated by ITU standards, enhances cloud gaming platform stability, accelerates issue detection, and supports China Mobile’s 5G‑based digital transformation.

AI · AIOps · Observability
0 likes · 19 min read
Efficient Ops
Oct 19, 2024 · Operations

How Migu’s Cloud Gaming Platform Achieved Leading AIOps Observability Standards

An interview with Migu Interactive Entertainment reveals how its cloud gaming platform leveraged AI, 5G, and standardized observability practices to pass both international and domestic AIOps assessments, highlighting the strategic importance of intelligent operations for business continuity in complex, distributed systems.

AI · AIOps · Intelligent Operations
0 likes · 17 min read
Xiaohongshu Tech REDtech
Oct 9, 2024 · Operations

AIOps Implementation at Xiaohongshu: Fault Localization and Intelligent Operations

Xiaohongshu’s AIOps initiative builds a four‑layer framework that leverages machine‑learning‑driven anomaly detection, causal analysis, and trace‑based fault localization to automatically identify root‑cause services in micro‑service environments, achieving over 80% accuracy across 1,000 daily diagnoses while guiding future enhancements in change correlation and automated remediation.

AIOps · Anomaly Detection · Fault Localization
0 likes · 28 min read
DevOps
Aug 26, 2024 · Operations

The Evolution of Operations: From Manual Ops to AIOps and ChatOps

This article explores the progression of IT operations—from manual processes through automated DevOps, to AI‑driven AIOps and chat‑based ChatOps—examining concepts, advantages, tools, and future possibilities, while also reflecting on how these trends reshape the role of operations engineers.

AI · AIOps · Automation
0 likes · 12 min read
Efficient Ops
Aug 4, 2024 · Artificial Intelligence

What the 2024 China AIOps Survey Reveals About Smart Operations Trends

The 2024 XOps Forum in Beijing showcased a new era of smart operations, unveiling a record‑breaking AIOps survey that highlights rapid investment growth, rising adoption of large language models, evolving maturity levels, and key challenges such as model accuracy and data quality across Chinese enterprises.

AIOps · China · Cloud Computing
0 likes · 7 min read
DataFunSummit
Jul 15, 2024 · Operations

Intelligent Operations (AIOps) Insights, Planning, and Large‑Model Agent Practices at ByteDance

The article summarizes ByteDance's intelligent operations (AIOps) strategy, covering frontier concepts, a five‑level automation roadmap, large‑model applications for fault diagnosis and smart Q&A, and a comprehensive AIOps platform that accelerates algorithm deployment, improves efficiency, and reduces operational costs.

AI Agents · AIOps · Intelligent Operations
0 likes · 21 min read
JD Cloud Developers
Jul 2, 2024 · Operations

How Large Language Models Are Transforming Modern IT Operations

From manual server management to automated scripts, AIOps, and ChatOps, this article traces the evolution of IT operations and demonstrates how large language models boost efficiency, enable intelligent assistants, automated diagnostics, and smart log analysis, aiming for rapid fault detection, localization, and resolution.

AIOps · Automation · ChatOps
0 likes · 7 min read
ByteDance SYS Tech
Jun 30, 2024 · Operations

How Large‑Model AI Is Transforming Intelligent Operations (AIOps)

This article explores the latest concepts, planning roadmap, and practical applications of large‑model AI in intelligent operations, detailing AIOps use cases, system‑level automation, multi‑agent architectures, and how a dedicated platform accelerates deployment and efficiency across data‑center environments.

AI Agents · AIOps · Automation
0 likes · 18 min read
Baidu Tech Salon
May 27, 2024 · Artificial Intelligence

Intelligent Agent Technology in Commercial Advertising Platforms: Architecture and Applications

The paper describes Baidu’s AI‑native advertising platform that employs a multi‑agent architecture built on large‑language models—combining large‑small model collaboration, domain SOP‑driven coordination, and long‑term memory—to enable natural‑language understanding, proactive planning, execution and human‑like responses, illustrated by GBI analytics and JarvisBot operations, delivering higher consumption, accuracy, speed and efficiency.

AI-native platforms · AIOps · Business Intelligence
0 likes · 16 min read
vivo Internet Technology
May 15, 2024 · Databases

Challenges and New Technology Exploration in Vivo Database Operations Platform

At the 2024 XCOPS Intelligent Operations Management Annual Meeting in Guangzhou, Vivo’s Deng Song will discuss building a robust database operations platform, addressing availability threats, efficiency levers, 0‑to‑1 development strategies, and considerations of reliability, cost, and data privacy amid emerging AI and large‑model technologies.

AIOps · Platform · Reliability
0 likes · 3 min read
Efficient Ops
May 14, 2024 · Artificial Intelligence

How Large‑Model Agents Are Revolutionizing AIOps and Modern Operations

This article explores why large‑model Agent technology is essential for AIOps, explains single‑ and multi‑Agent architectures, memory and tool integration, and demonstrates practical applications such as anomaly detection, fault diagnosis, automated remediation, ChatOps, and future directions for intelligent, autonomous operations.

AI Agents · AIOps · LLM
0 likes · 14 min read
DataFunSummit
Apr 21, 2024 · Operations

The Value, Challenges, and Future of AIOps in Modern Enterprises

AIOps leverages AI to automate IT monitoring, predict failures, and optimize resources, offering modern enterprises reduced operational workload and higher reliability, while facing challenges such as data governance, automation, hierarchical monitoring, and large‑model hallucinations that must be addressed for successful deployment.

AIOps · IT Operations · Operations Automation
0 likes · 2 min read
Efficient Ops
Mar 10, 2024 · Databases

How Machine Learning Can Automate MySQL Index Optimization

This article explains how applying machine learning to database operations—specifically AIOps for MySQL—can automate index recommendation by parsing SQL, extracting semantic and statistical features, generating candidate index combinations, and training an XGBoost model to predict optimal indexes, reducing reliance on manual DBA work.

AIOps · Index Optimization · SQL
0 likes · 10 min read
dbaplus Community
Feb 4, 2024 · Operations

How Ant Group Leverages SLO and AIOps for Fine‑Grained Operations

This article details Ant Group's practical implementation of Service Level Objectives (SLO) and AIOps to achieve fine‑grained operations, covering SLO fundamentals, health‑score architecture, GitOps‑based data pipelines, error‑budget alerting, AI‑driven anomaly detection, fault localization techniques, and real‑world case studies on dashboards, Kubernetes SLOs, and emergency response workflows.

AIOps · Error Budget · Fault Localization
0 likes · 38 min read
dbaplus Community
Jan 29, 2024 · Artificial Intelligence

How Meituan Uses AIOps to Revolutionize Incident Management

This article details Meituan's two‑year exploration of AIOps for incident management, covering the challenges of massive, real‑time operational data, the AI‑driven modules for risk prevention, fault detection, diagnosis, and similar‑incident recommendation, and future directions such as intelligent log detection and change recognition.

AIOps · Anomaly Detection · Operations
0 likes · 22 min read
Efficient Ops
Jan 17, 2024 · Operations

How China’s Telecom Giants Accelerate IT Efficiency with DevOps Maturity Assessments

In the context of digital transformation, six leading Chinese telecom operators applied the CAICT DevOps Capability Maturity Model to evaluate dozens of projects, achieving significant improvements in continuous delivery, technical operations, security, and AIOps, providing valuable references for the industry.

AIOps · Continuous Delivery · IT Operations
0 likes · 18 min read
Efficient Ops
Jan 9, 2024 · Operations

What Do 2023 DevOps & AIOps Assessments Reveal About China’s Digital Transformation?

Amid China's sweeping digital, networked, and intelligent transformation, over 100 leading enterprises across banking, finance, communications, manufacturing, and other sectors have participated in DevOps and AIOps maturity model evaluations, providing a comprehensive view of industry adoption, capability levels, and emerging best practices for 2023.

AIOps · Operations · assessment
0 likes · 15 min read
High Availability Architecture
Jan 9, 2024 · Operations

AIOps Practices for Incident Management at Meituan: From Risk Prevention to Post‑Operation

This article presents Meituan's two‑year exploration of AIOps in incident management, detailing risk‑prevention change detection, real‑time anomaly discovery, automated root‑cause diagnosis, multi‑dimensional KPI analysis, and similar‑event recommendation, while sharing architectural designs, algorithmic techniques, performance results, and future directions.

AIOps · Anomaly Detection · Incident Management
0 likes · 24 min read
Efficient Ops
Jan 8, 2024 · Operations

What Do 2023 DevOps & AIOps Assessments Reveal About China’s Digital Transformation?

Amid China's sweeping digital transformation, the China Academy of Information and Communications Technology (CAICT) reports that in 2023, 104 leading enterprises across banking, securities, insurance, telecom, manufacturing, and other sectors completed 336 DevOps maturity assessments, and 23 enterprises completed 45 AIOps assessments. The article highlights industry‑wide adoption of DevOps and AIOps standards and offers detailed breakdowns by sector and evaluation level, along with future guidance.

AIOps · Maturity Model · Operations
0 likes · 16 min read
Efficient Ops
Dec 26, 2023 · Operations

What Is ITU’s New AIOps Standard and How It Shapes Cloud Operations?

The article explains the ITU‑T Y.3550 AIOps standard, its AI‑driven cloud service development and operation requirements, the Chinese AIOps maturity‑model series, and the latest assessment results showing dozens of enterprises adopting these intelligent‑operations capabilities.

AI · AIOps · ITU standard
0 likes · 6 min read
Meituan Technology Team
Dec 21, 2023 · Operations

AIOps for Incident Management: Practices and Insights from Meituan

Meituan’s service‑operations team applies AIOps across prevention, detection, and post‑incident stages—using change‑risk analysis, real‑time graph‑based anomaly detection, similarity‑driven root‑cause diagnosis, and NLP‑powered incident recommendation—to achieve sub‑second detection, high precision, 28% faster fault handling, and plans for intelligent log and change recognition.

AIOps · Anomaly Detection · Incident Management
0 likes · 24 min read
Efficient Ops
Dec 18, 2023 · Artificial Intelligence

How Mobile Cloud Earned Top‑Tier AIOps Certification and What It Means for Intelligent Operations

The article details Mobile Cloud's successful third‑level AIOps assessment by the China Academy of Information and Communications Technology (CAICT), explores the platform's architecture and intelligent operation capabilities, shares interview insights on challenges, benefits, and future plans, and presents industry‑wide AIOps maturity statistics.

AIOps · Cloud Computing · IT Operations
0 likes · 12 min read
Bilibili Tech
Dec 15, 2023 · Operations

Bilibili Alert Monitoring System: Design, Optimization, and Root‑Cause Analysis

Bilibili revamped its alert monitoring platform to meet rapid growth, focusing on effectiveness, timeliness, and coverage; it introduced a closed‑loop design and governance that cut weekly alerts by 90%, built a knowledge‑graph root‑cause system achieving 87.9% accuracy with sub‑minute latency, and integrated AIOps for ongoing refinement.

AIOps · Alert Monitoring · Bilibili
0 likes · 21 min read
Alibaba Cloud Big Data AI Platform
Nov 29, 2023 · Operations

How AIOps and DataOps Transform Big Data Operations: Lessons from ABM Platform

This article examines the challenges of big‑data operations, explains how DataOps and AIOps complement each other, and details the ABM intelligent operations architecture, platform components, and real‑world use cases such as Flink hotspot detection, ChatOps assistants, and dynamic MaxCompute resource optimization.

AIOps · Big Data Operations · DataOps
0 likes · 11 min read
Efficient Ops
Nov 8, 2023 · Operations

How Intelligent Operations (AIOps) Transforms IT Management and Self‑Healing

This article explains what intelligent operations (AIOps) are, outlines a four‑layer platform architecture, and showcases real‑world practices such as load‑balancing link repair, MySQL container self‑healing, composite service tracing, component‑based orchestration, and AI‑driven log analysis, concluding with future prospects.

AIOps · Automation · IT Operations
0 likes · 7 min read

How Transparent AI Boosts Trust in AIOps: Explainable Root‑Cause Solutions

This article examines the rapid growth of the Chinese IT operations market, explains why AIOps faces trust challenges due to opaque deep‑learning models, and presents AsiaInfo's transparent‑model and post‑hoc explanation engine together with three concrete explainable root‑cause analysis methods, concluding with future outlooks for trustworthy AIOps.

AI Trust · AIOps · Operations
0 likes · 13 min read
Didi Tech
Sep 5, 2023 · Operations

Observability and Stability Engineering in Didi Ride‑Hailing Platform

At Didi, observability and stability engineering combine automated, AI‑driven alarm generation, distributed tracing, and ChatOps‑based fault handling to manage micro‑service complexity, massive traffic spikes, and cross‑region operations. The article emphasizes systematic investment and AIOps evolution, and includes a recruitment call for backend and test engineers.

AIOps · Didi · Observability
0 likes · 16 min read