Solving Monitoring Pain Points: Unified Framework, Alert Prioritization, and Classification

The article discusses common monitoring challenges such as fragmented tooling and noisy alerts, and proposes solutions including consolidating to a single monitoring framework, prioritizing runtime exceptions, and classifying business alerts with codes and trace information to improve incident response.

Full-Stack Internet Architecture

Jun 19, 2021

Monitoring Pain Points

Monitoring never goes out of date. Earlier articles covered how to set up day-to-day monitoring quickly with log-based alerts, global exception handlers, and tools such as Cat, Prometheus, and Sentry.

Every company needs monitoring, from startups to large enterprises. Large companies tend to have more comprehensive coverage, while smaller ones may tolerate the occasional failure.

Pain Point 1: Multiple Monitoring Frameworks

Many organizations end up using a variety of tools (Sentry for exception alerts, log‑based alerts, Cat, SkyWalking, etc.), leading to duplicated alerts and confusion about which system to rely on.

The only upside, if it counts as one, is that a flood of alerts forces people to investigate quickly and to solve problems on their own initiative.

Pain Point 2: Excessive Alert Volume

More frameworks naturally generate more alerts. Without proper severity classification the alert channel becomes noisy, and teams start to ignore warnings, much like the villagers in "The Boy Who Cried Wolf."

How to Resolve the Pain Points

Unify the Monitoring System

First, organize the monitoring landscape and adopt a unified framework. In practice, a single solution may not cover every scenario, so a carefully controlled hybrid approach is acceptable.

The goal is to have one framework that handles the majority of cases; if needed, extend an open‑source solution with custom features.

Alert Prioritization

After unifying the system, the main issue becomes alert overload. Not every anomaly needs an alert, and alerts should be tiered.

Runtime exceptions (e.g., a NullPointerException) are top priority because they usually indicate bugs that must be fixed immediately. Business exceptions (e.g., out of stock, product taken down) are lower priority but still require attention.
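The two tiers can be sketched in Java. This is a minimal illustration under stated assumptions, not the article's implementation: the `Severity` enum, the `BusinessException` base class, and `AlertTriage.classify` are hypothetical names, and the sketch assumes every expected business failure extends one common base type.

```java
// Hypothetical severity tiers; the names are illustrative only.
enum Severity { P0_PAGE, P2_NOTIFY }

// Hypothetical base type for expected business failures
// (out of stock, product taken down, and so on).
class BusinessException extends RuntimeException {
    BusinessException(String message) {
        super(message);
    }
}

final class AlertTriage {
    private AlertTriage() {}

    // Business exceptions are checked first because, in this sketch,
    // they also extend RuntimeException.
    static Severity classify(Throwable t) {
        if (t instanceof BusinessException) {
            return Severity.P2_NOTIFY; // expected failure: report, but do not page
        }
        return Severity.P0_PAGE; // NullPointerException and other unexpected failures
    }
}
```

A global exception handler could call `AlertTriage.classify(e)` once per caught exception and pass the result to whatever alerting client is in use.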

Fine‑Grained Alert Classification

Business exceptions should be downgraded in severity but still reported. For critical flows such as order placement, escalate when failures spike within a short window (e.g., 100 failures in one minute).
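One way to detect a threshold such as "100 failures in one minute" is a sliding-window counter. The sketch below is an assumption about how that could be done in-process; `FailureRateTrigger` and its method names are hypothetical, and a real deployment might instead express the same rule in the monitoring system's own alert configuration.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Sliding-window counter: reports true once the number of failures
// inside the window reaches the threshold.
final class FailureRateTrigger {
    private final int threshold;
    private final long windowMillis;
    private final Deque<Long> timestamps = new ArrayDeque<>();

    FailureRateTrigger(int threshold, long windowMillis) {
        this.threshold = threshold;
        this.windowMillis = windowMillis;
    }

    // Records one failure at nowMillis and returns true if the window
    // now holds at least `threshold` failures.
    synchronized boolean recordFailure(long nowMillis) {
        timestamps.addLast(nowMillis);
        // Evict failures that are older than the window.
        while (!timestamps.isEmpty() && nowMillis - timestamps.peekFirst() >= windowMillis) {
            timestamps.removeFirst();
        }
        return timestamps.size() >= threshold;
    }
}
```

For the example in the text, the trigger would be created as `new FailureRateTrigger(100, 60_000)` and fed `System.currentTimeMillis()` on each order-placement failure. Passing the timestamp in explicitly keeps the class easy to test.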

When throwing a business exception, include a specific error code. The alert then carries this code, allowing responders to instantly recognize the issue (e.g., code 1001 = insufficient stock, code 1002 = risk‑check timeout).
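A coded business exception might look like the following sketch. The codes 1001 and 1002 come from the example above; the `BizCode` enum, the `CodedBusinessException` class, and the message format are illustrative assumptions.

```java
// Codes 1001 and 1002 are the examples from the text; the enum and
// class names are hypothetical.
enum BizCode {
    INSUFFICIENT_STOCK(1001, "insufficient stock"),
    RISK_CHECK_TIMEOUT(1002, "risk-check timeout");

    final int code;
    final String description;

    BizCode(int code, String description) {
        this.code = code;
        this.description = description;
    }
}

class CodedBusinessException extends RuntimeException {
    private final BizCode bizCode;

    CodedBusinessException(BizCode bizCode, String detail) {
        // Put the code at the front of the message so it is visible in the alert text.
        super("[" + bizCode.code + "] " + bizCode.description + ": " + detail);
        this.bizCode = bizCode;
    }

    BizCode bizCode() {
        return bizCode;
    }
}
```

Keeping the codes in one enum gives responders a single place to look up what a number means, and lets the alerting layer group or route by `bizCode()` instead of parsing message strings.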

Retain contextual data such as request parameters, response payload, and traceId so that the root cause can be identified quickly.
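As a rough sketch, the context could travel with the alert in a small value object. `AlertContext` and its fields are hypothetical; how the traceId is obtained (for example from a tracing agent or logging context) depends on the stack in use and is left out here.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical container for the context an alert should carry.
final class AlertContext {
    private final int code;
    private final String traceId;
    private final Map<String, String> requestParams;
    private final String responseBody;

    AlertContext(int code, String traceId, Map<String, String> requestParams, String responseBody) {
        this.code = code;
        this.traceId = traceId;
        // Copy so later changes to the caller's map do not alter the alert.
        this.requestParams = new LinkedHashMap<>(requestParams);
        this.responseBody = responseBody;
    }

    // Renders a single-line alert body suitable for a chat message or log line.
    String toAlertText() {
        return "code=" + code
                + " traceId=" + traceId
                + " request=" + requestParams
                + " response=" + responseBody;
    }
}
```

With the traceId in the alert text, a responder can jump straight from the notification to the matching trace or log lines.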

Conclusion

After the refactoring, only runtime exceptions or a surge of errors within a short window trigger SMS or phone alerts, which cuts the noise. Other business alerts are routed to chat groups (DingTalk, Feishu) and can be split by error code into critical and non-critical groups, so notifications are more precise and more likely to be read and acted on.
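The routing rule described here can be sketched as a small decision function. The `Channel` values, the `AlertRouter` class, and the idea of passing in a list of critical codes are assumptions for illustration; actually delivering to SMS, phone, DingTalk, or Feishu is outside the sketch.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical delivery channels.
enum Channel { PHONE_OR_SMS, CHAT_CRITICAL, CHAT_GENERAL }

final class AlertRouter {
    private final Set<Integer> criticalCodes = new HashSet<>();

    // criticalCodes: business error codes that belong in the critical chat group.
    AlertRouter(int... criticalCodes) {
        for (int code : criticalCodes) {
            this.criticalCodes.add(code);
        }
    }

    Channel route(boolean runtimeException, boolean surgeDetected, int bizCode) {
        // Runtime exceptions and error surges are the only cases that page someone.
        if (runtimeException || surgeDetected) {
            return Channel.PHONE_OR_SMS;
        }
        // Everything else goes to chat, split by error code.
        return criticalCodes.contains(bizCode) ? Channel.CHAT_CRITICAL : Channel.CHAT_GENERAL;
    }
}
```

For example, `new AlertRouter(1002).route(false, false, 1001)` sends an isolated insufficient-stock failure to the general chat group, while the same code during a detected surge is escalated to phone or SMS. Which codes count as critical is a team decision.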

Note: The discussion focuses on application‑level exception alerts; infrastructure alerts (CPU, memory, database) remain high‑priority and require separate handling and run‑books.
