Tagged articles
13 articles
TAL Education Technology
Jun 13, 2025 · Operations

How Large Language Models Are Revolutionizing Fault Localization

This article explores how the rapid rise of large language models and techniques like Retrieval‑Augmented Generation, Chain‑of‑Thought prompting, and multi‑agent architectures can dramatically improve the speed, accuracy, and automation of fault localization in modern operations environments.

Agent Architecture · CoT · Fault Localization
14 min read
Xiaohongshu Tech REDtech
Oct 9, 2024 · Operations

AIOps Implementation at Xiaohongshu: Fault Localization and Intelligent Operations

Xiaohongshu’s AIOps initiative builds a four‑layer framework that uses machine‑learning‑driven anomaly detection, causal analysis, and trace‑based fault localization to automatically identify root‑cause services in microservice environments. It achieves over 80% accuracy across 1,000 daily diagnoses, and the article outlines planned enhancements in change correlation and automated remediation.

DevOps · Fault Localization · Intelligent Operations
28 min read
Baidu Geek Talk
Jul 15, 2024 · Industry Insights

How AI Is Revolutionizing Physical Network Fault Localization

This article explains how Baidu Cloud evolved from manual and integrated network fault detection to AI-driven localization using large language models, detailing structured prompting, multi‑agent workflows, and real‑world comparisons that demonstrate improved accuracy and faster mitigation.

AI · Fault Localization · Infrastructure
14 min read
Baidu Intelligent Cloud Tech Hub
Jul 10, 2024 · Artificial Intelligence

How AI Transforms Physical Network Fault Localization: From Manual to LLM‑Powered Precision

This article explains how Baidu Cloud evolved its physical network fault‑location workflow—from manual analysis and integrated multi‑signal algorithms to AI‑driven reasoning with large language models—highlighting structured prompting, multi‑agent collaboration, and measurable improvements in accuracy and automation.

AI · Automation · Fault Localization
15 min read
dbaplus Community
Feb 4, 2024 · Operations

How Ant Group Leverages SLO and AIOps for Fine‑Grained Operations

This article details Ant Group's practical implementation of Service Level Objectives (SLO) and AIOps to achieve fine‑grained operations, covering SLO fundamentals, health‑score architecture, GitOps‑based data pipelines, error‑budget alerting, AI‑driven anomaly detection, fault localization techniques, and real‑world case studies on dashboards, Kubernetes SLOs, and emergency response workflows.

Error Budget · Fault Localization · Kubernetes
38 min read
vivo Internet Technology
Jan 4, 2023 · Artificial Intelligence

Root Cause Localization Algorithm and Its Implementation for Service Fault Diagnosis

The article describes a root‑cause localization algorithm implemented in vivo’s monitoring platform. It automatically analyzes latency spikes by splitting service timelines, computing variance, clustering the results with K‑means, and recursively tracing downstream services. The approach achieves over 85% accuracy for dependency failures but still requires human verification, and the article outlines future AI‑driven enhancements.

Fault Localization · K-Means · Root Cause Analysis
13 min read
Baidu Intelligent Testing
Dec 21, 2022 · Operations

Intelligent Test Localization Practices: Spectrum-Based Fault Localization, Error-Code Build System, Revenue‑Loss Decision, and UI Case Localization

This article presents a comprehensive overview of intelligent test localization techniques—including spectrum‑based fault localization, error‑code driven build‑system localization, commercial revenue‑loss decision making, and UI case‑level tracing—detailing their motivations, methodologies, algorithms, and practical applications within automated testing pipelines.

Automation · Fault Localization · Software Testing
10 min read
Baidu Geek Talk
Dec 20, 2022 · Industry Insights

How AI‑Powered Fault Localization Transforms Automated Testing at Scale

This article explores Baidu's intelligent testing practices, covering spectrum‑based root‑cause localization, error‑code driven build‑system diagnostics, revenue‑change stop‑loss decision workflows, and search UI case‑level tracing, illustrating how data, algorithms, and engineering combine to reduce manual effort and accelerate issue resolution.

Automated Testing · Fault Localization · Operations
10 min read
NetEase Game Operations Platform
Sep 19, 2022 · Artificial Intelligence

Applying AIOps to Game Operations: Roadmap, Anomaly Detection, and Fault Localization

This article describes NetEase's AIOps journey for game operations, explaining the Gartner definition of intelligent operations, the implementation roadmap, detailed anomaly‑detection techniques for business, performance, and log data, and a comprehensive fault‑localization workflow that combines resource, code, and historical analysis.

Fault Localization · AIOps · Anomaly Detection
12 min read
Xianyu Technology
Jul 28, 2020 · Operations

ShenTan: Automated Fault Localization System for Online Services

ShenTan is an automated fault‑localization platform for online services that pinpoints server‑side issues in under five seconds with developer‑level accuracy. It aggregates real‑time metrics, applies a decision‑tree model enriched by expert knowledge and dynamic thresholds, and presents results through an integrated alert and visualization system. Planned work includes broader endpoint coverage and multi‑tenant support.

Automation · Big Data · Fault Localization
12 min read
Xianyu Technology
Jul 23, 2019 · Operations

Automated Service Fault Localization System Architecture

The automated service fault localization system ingests massive real‑time instrumentation data, builds call‑chain graphs, and instantly pinpoints the exact component causing timeouts or other errors, achieving developer‑level accuracy within seconds instead of minutes while remaining simple, fast, and fully automated.

Big Data · Fault Localization · Operations
8 min read
Efficient Ops
Dec 12, 2017 · Operations

Sogou’s AI‑Powered Ops: Smart Circuit Breaker, Fault Localization & Chatbot

This article examines the three major pain points faced by Sogou's operations engineers—worry cost, insufficient intelligence, and annoyance cost—and explains how the company applies AI through intelligent circuit breaking, fault localization, and a chatbot to streamline reliability and reduce manual effort.

Chatbot · Fault Localization · Intelligent Monitoring
10 min read
ITFLY8 Architecture Home
Dec 1, 2016 · Operations

Why Distributed Tracing Systems Are Essential for Modern Microservices

As microservice architectures grow, service calls become increasingly complex, involving dozens of services and teams, making rapid fault localization and comprehensive data analysis critical; distributed tracing systems address these challenges by providing end‑to‑end visibility, low‑overhead instrumentation, and scalable monitoring across large‑scale applications.

Distributed Tracing · Fault Localization · Microservices
8 min read