Tag: alert management


Bilibili Tech
Sep 8, 2023 · Operations

Design, Implementation, and Governance of an Alert Management Platform

The article details Bilibili's comprehensive alert-management platform: its background, a comparison of cloud and self-built solutions, its closed-loop design, distributed architecture, rule configuration, noise reduction, automated root-cause analysis, and the governance practices that cut weekly alerts from 1,000 to under 80. It closes by outlining future enhancements.

DevOps · Observability · SRE
19 min read

DataFunSummit
Apr 15, 2023 · Operations

Observability and Intelligent Alert Management Practices

This presentation outlines the observability ecosystem, the role and value of alerts within it, the core functions of an intelligent alert management platform, best-practice recommendations, and a real-world case study of deploying a unified observability solution for a large state-owned investment group.

AIOps · IT Operations · Observability
11 min read

Efficient Ops
Sep 28, 2022 · Operations

How Event‑Driven Alert Centers Revolutionize Intelligent Operations

This article presents a comprehensive overview of an event‑centric intelligent alert analysis platform, covering its evolution, core challenges, the concept of alert events, AI‑driven correlation techniques, and the MC‑Stack platform that powers modern operations.

AIOps · alert management · event-driven monitoring
13 min read

Efficient Ops
Jun 1, 2022 · Operations

What Can Aircraft Monitoring Teach Us About Building Effective IT Operations Monitoring?

The article explores how aviation-grade monitoring concepts can inspire centralized, data-driven IT operations monitoring architectures that reduce missed alerts and false positives and improve response times. The concepts include multi-level alarm classification, diverse alert delivery methods, and comprehensive sensor coverage.

AIOps · alert management · centralized monitoring
33 min read

DeWu Technology
May 16, 2022 · Operations

NOC SLA Implementation for Consumer Trading Platform

To address growing production complexity and slow responses in past incidents, the consumer trading platform introduced a three-tier NOC SLA. It combines intelligent baselines powered by Facebook Prophet, streamlined alert rules, and an SOS-linked workflow. The changes increased detection frequency, cut response times for critical issues to under five minutes, and improved overall system reliability. The article also stresses that baselines and rules need ongoing maintenance.

NOC · SLA · alert management
13 min read

Efficient Ops
Jun 15, 2021 · Operations

Mastering IT Monitoring: Strategies, Challenges, and Best Practices

This article explores the fundamentals of IT monitoring and examines common challenges such as scalability, reliability, and alert fatigue. It compares four implementation approaches, ranging from open-source to fully custom solutions. It then presents practical techniques such as alert convergence, suppression, and automation for building a robust, adaptable monitoring platform.

alert management · automation · monitoring
19 min read

Sohu Tech Products
Oct 23, 2019 · Operations

Google SRE Weekly Alert Limits and Practical Strategies for Reducing Alert Fatigue

This article examines how Google SRE limits weekly alerts to ten and compares this with the practice of typical Chinese internet operations teams. It then offers practical strategies (on-call scheduling, alert escalation, automation, dashboard optimization, and team management) to dramatically reduce alert volume and improve incident response.

SRE · alert management · incident response
15 min read

Efficient Ops
Aug 12, 2019 · Operations

Mastering Alert Storms: The 5‑Level Maturity Model for Modern Ops

As cloud, container, and micro‑service architectures increase system complexity, this article explains why alert overload occurs, introduces a five‑level alert‑management maturity model, and shows how AIOps‑driven automation can transform chaotic notifications into efficient, self‑healing operations.

AIOps · Maturity Model · alert management
11 min read

Efficient Ops
Jul 11, 2016 · Operations

How Tencent's Intelligent Monitoring Transforms Ops Automation

Drawing on Tencent's extensive experience operating social platforms, this talk explores intelligent monitoring practices. These include active, passive, and side-channel monitoring techniques, full-link observability, data processing pipelines, and alert convergence. Together they improve reliability, availability, and user experience while reducing noise for ops teams.

Big Data · DevOps · alert management
22 min read