Understanding Prometheus Alerting: Collection, Evaluation, Grouping, Suppression, Silencing, and Delay

This article explains how Prometheus collects metrics, evaluates alert rules, transitions through alert states, and uses grouping, suppression, silencing, and configurable delay parameters to reduce noise and ensure timely, actionable alerts in monitoring systems.

Aikesheng Open Source Community

Dec 30, 2018

Prometheus scrapes metrics from its targets every scrape_interval (default 1 minute) and stores the samples in its local time series database. Independently, it evaluates alerting rules every evaluation_interval (default 1 minute) and updates each alert's state based on the result.
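As a minimal sketch, the two intervals are set in the global section of the Prometheus configuration file. The 15s values, the rule file path, the job name, and the target address below are illustrative assumptions, not values from the original article.

```yaml
# prometheus.yml (fragment); all concrete values are illustrative assumptions
global:
  scrape_interval: 15s       # how often targets are scraped (1m if omitted)
  evaluation_interval: 15s   # how often alerting rules are evaluated (1m if omitted)

rule_files:
  - "rules/*.yml"            # hypothetical location of the alert rule files

scrape_configs:
  - job_name: mysql          # hypothetical job name
    static_configs:
      - targets: ["localhost:9104"]   # assumed exporter address
```

A shorter evaluation_interval makes state transitions more responsive at the cost of more rule evaluations. An alert can only change state when a rule evaluation runs.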

An alert is always in one of three states: inactive (the rule condition is not met), pending (the condition is met but has not yet held for the required duration), or firing (the condition has held for the required duration).

The following alert rule for MySQL uptime shows the for clause, which sets how long the condition must hold continuously before the alert moves from pending to firing:

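groups:
- name: example
  rules:
  - alert: mysql_uptime
    # Condition: the reported MySQL uptime is below 30.
    expr: mysql:server_status:uptime < 30
    # The condition must hold for 10s before the alert goes from pending to firing.
    for: 10s
    labels:
      level: "CRITICAL"
    annotations:
      detail: Database uptime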

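If for is omitted or set to 0, the pending state is skipped and the alert fires at the first evaluation where the condition is met. Because state changes happen only at evaluation time, a for value shorter than evaluation_interval (such as 10s with the default 1 minute) still means the alert fires no sooner than the evaluation after the one that made it pending.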
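The contrast can be sketched with two rules on the same condition, one without for and one with it. The group name, alert names, and job label below are hypothetical. The expression uses Prometheus's built-in up metric rather than the MySQL metric from the article.

```yaml
groups:
- name: for-clause-contrast            # hypothetical group name
  rules:
  # No "for": inactive -> firing at the first evaluation where the target is down.
  - alert: mysql_target_down_immediate # hypothetical alert name
    expr: up{job="mysql"} == 0         # "mysql" job label is an assumption
    labels:
      level: "CRITICAL"
  # "for: 5m": inactive -> pending -> firing, and only if the target
  # stays down at every evaluation for 5 minutes.
  - alert: mysql_target_down_sustained # hypothetical alert name
    expr: up{job="mysql"} == 0
    for: 5m
    labels:
      level: "CRITICAL"
```

The second form trades a slower first notification for fewer alerts caused by short-lived blips.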

Alert grouping is handled by Alertmanager: alerts that share the same grouping labels (for example, the same MySQL instance ID) are merged into one group, and a single notification is sent per group instead of one per alert.
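In Alertmanager, grouping is configured with group_by on a route. The sketch below assumes the MySQL instance ID is carried in a label named instance and uses a hypothetical receiver name. Neither comes from the original article.

```yaml
# alertmanager.yml (fragment); label and receiver names are assumptions
route:
  receiver: dba-team                    # hypothetical receiver
  # Alerts with the same alertname and instance values form one group
  # and are delivered together in one notification.
  group_by: ['alertname', 'instance']

receivers:
  - name: dba-team                      # notification integrations omitted
```

Grouping on fewer labels produces larger, less frequent notifications. Grouping on more labels produces smaller, more specific ones.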

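Alert suppression (inhibition, performed by Alertmanager's inhibitor) removes redundant notifications: while a higher-priority alert is firing, notifications for the lower-priority alerts it explains are suppressed. For example, MySQL alerts can be suppressed while the server hosting MySQL is itself down.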
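A minimal sketch of such a rule in the Alertmanager configuration might look as follows. The alert names and the host label are hypothetical. The source_matchers and target_matchers fields require a reasonably recent Alertmanager (older releases use source_match and target_match instead).

```yaml
# alertmanager.yml (fragment); alert names and the "host" label are assumptions
inhibit_rules:
  - source_matchers:
      - 'alertname = "server_down"'     # hypothetical higher-priority alert
    target_matchers:
      - 'alertname =~ "mysql_.*"'       # hypothetical lower-priority alerts
    # Inhibit only when both alerts carry the same value for these labels,
    # so one server going down does not mute MySQL alerts on other servers.
    equal: ['host']
```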

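Alert silencing (the silencer) mutes notifications for alerts that are expected during a known period, such as a scheduled batch job. A silence is defined in Alertmanager by a set of label matchers plus a start and end time. While it is active, matching alerts do not reach operators.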

Three timing parameters on an Alertmanager route control when notifications for a group are sent:

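group_wait: how long to wait after the first alert of a new group arrives before sending the initial notification, so that related alerts arriving shortly afterward are included (default 30s).

group_interval: how long to wait before sending a notification about changes to a group that has already been notified, such as new alerts joining it (default 5m).

repeat_interval: how long to wait before re‑sending a notification for alerts that are still firing and unchanged (default 4h).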

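In practice, group_wait lets alerts that start firing close together go out in one initial notification, group_interval paces follow‑up notifications when the contents of an already‑notified group change, and repeat_interval provides periodic reminders for problems that persist unchanged.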
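The three parameters are set on a route. The sketch below uses Alertmanager's default values. The receiver name and grouping label are hypothetical, and the timeline in the comments is an approximate reading of the behavior, not an exact schedule.

```yaml
# alertmanager.yml (fragment); receiver and label names are assumptions
route:
  receiver: dba-team            # hypothetical receiver
  group_by: ['instance']        # assumed grouping label
  group_wait: 30s               # delay before the first notification of a new group
  group_interval: 5m            # delay before notifying about changes to the group
  repeat_interval: 4h           # delay before repeating an unchanged notification

receivers:
  - name: dba-team              # notification integrations omitted

# Approximate timeline for one group under these settings:
#   t = 0       first alert arrives and a new group is created
#   t = 30s     initial notification, including any alerts that arrived meanwhile
#   t = 2m      another alert joins the group; nothing is sent yet
#   t = 5m30s   follow-up notification covering the changed group
#   about 4h after the last notification, if nothing changed: a reminder is sent
```

Shorter values give faster updates but more notifications. Tune them per route according to how urgent the alerts on that route are.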

In summary, Prometheus’s collection and evaluation cycles, combined with Alertmanager’s grouping, suppression, silencing, and delay settings, enable precise, low‑noise alerting that helps operators focus on the most critical incidents.
