Tag

AI Ops

Efficient Ops
Apr 22, 2025 · Operations

How AI Agents Are Transforming IT Operations and Fault Management

This article explores how AI agents powered by large models can predict failures, perform root‑cause analysis, enhance knowledge‑based Q&A, automate change releases, and enable intelligent decision‑making, dramatically improving efficiency and reliability in modern IT operations.

AI Ops · Fault Prediction · automation
7 min read
Efficient Ops
Oct 19, 2024 · Operations

How Inner Mongolia Mobile Achieved Leading SRE Maturity – Lessons from the DevOps Assessment

The article explores the growing importance of system reliability in China, the national regulations driving SRE adoption, Inner Mongolia Mobile’s successful Level‑3 SRE assessment at the 2024 GOPS conference, and insights from Deputy GM Zhang Yongtao on practices, challenges, and future plans.

AI Ops · DevOps · IT Operations
14 min read
Architect
Feb 25, 2023 · Cloud Native

Deploying a K8s ChatGPT Bot with Robusta: A Step‑by‑Step Guide

This article walks through installing Robusta, configuring its Slack integration, adding Helm repositories, and deploying the Robusta platform on a Kubernetes cluster. It then creates a crash‑looping pod to trigger alerts and uses a ChatGPT bot to troubleshoot Prometheus alerts automatically, with complete code snippets and screenshots for each step.

AI Ops · ChatGPT · Helm
12 min read
Efficient Ops
Oct 21, 2020 · Operations

How AI Enables Unattended Cloud Server Management and Self‑Service Automation

This article explains how Alibaba Cloud leverages AI and data‑driven automation to provide unattended, self‑service management for ECS instances, reducing operational costs, improving incident response speed, and ensuring stable, efficient cloud server operations.

AI Ops · ECS · cloud computing
9 min read
Efficient Ops
Mar 14, 2019 · Cloud Native

How Alibaba Automates Cloud‑Native Operations at Massive Scale

This article explains Alibaba's intelligent, automated approach to managing large‑scale cloud‑native applications, covering challenges of scale, safety, and efficiency, and how AI‑driven decision making improves stability while reducing operational costs.

AI Ops · Alibaba · cloud automation
8 min read
58 Tech
Feb 21, 2019 · Artificial Intelligence

Threshold‑Free Business Metric Monitoring Using Machine Learning

This article describes how a machine‑learning‑driven monitoring system replaces fixed thresholds with personalized, anomaly‑based detection for business‑level metrics such as network traffic and access volume, detailing the architecture, sample labeling, model training, alarm grading, and operational benefits.

AI Ops · Anomaly Detection · alarm grading
8 min read
Alibaba Cloud Infrastructure
Sep 21, 2018 · Operations

Intelligent Operations Sessions at the 2018 Hangzhou Yunqi Conference

The 2018 Hangzhou Yunqi Conference featured a series of expert talks on intelligent operations, covering Alibaba's AI‑driven maintenance systems, robust supply‑chain optimization, data‑center automation, MSP transformation, and AI‑Ops practices, providing actionable insights for large‑scale infrastructure management.

AI Ops · Alibaba · Intelligent Operations
12 min read
Qunar Tech Salon
Jul 13, 2018 · Operations

Automated Network Failure Detection and Intelligent Switching System at Qunar

This article describes Qunar's automated network outage detection and intelligent traffic switching system, detailing the problem background, solution architecture, component functions, workflow, optimization steps, and future plans for more precise, multi‑level failover handling.

AI Ops · DNS · Failover
10 min read
Efficient Ops
Apr 26, 2018 · Operations

How 360 Detects Network Anomalies with AI‑Powered Time‑Series Algorithms

This article explains how 360’s network operations team uses time‑series analysis, statistical thresholds, EWMA, dynamic limits, and machine‑learning models such as K‑Means and Isolation Forest to automatically detect, locate, and remediate traffic anomalies across large‑scale data‑center egress links.

AI Ops · Anomaly Detection · machine learning
15 min read
Efficient Ops
Jan 8, 2018 · Operations

360° Intelligent IT Operations: From Scripts to AI‑Driven Automation

This article summarizes a GOPS 2017 Shanghai talk that outlines a comprehensive, data‑driven IT operations framework for large enterprises, covering the management system, business‑centric monitoring, big‑data log analysis, multi‑dimensional reporting, monitoring platform evolution, and autonomous fault‑healing with AI.

AI Ops · Big Data · IT Operations
17 min read
Efficient Ops
Dec 12, 2017 · Operations

Sogou’s AI‑Powered Ops: Smart Circuit Breaker, Fault Localization & Chatbot

This article examines the three major pain points faced by Sogou's operations engineers—worry cost, insufficient intelligence, and annoyance cost—and explains how the company applies AI through intelligent circuit breaking, fault localization, and a chatbot to streamline reliability and reduce manual effort.

AI Ops · Chatbot · Intelligent Monitoring
10 min read