
Why Does Prometheus Sometimes Fail to Trigger Alerts? Explained

Prometheus alerts may not fire even when metrics exceed thresholds, due to the "for" pending duration, sparse sampling, and Grafana's range queries. This article explains the underlying mechanisms, illustrates common pitfalls, and offers practical strategies for diagnosing missing or unexpected alerts.


Understanding the "for" parameter

Prometheus evaluates alerts based on a rule that includes a "for" duration, which acts as a pending period to filter out transient spikes.
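As a minimal sketch (the rule name, expression, and threshold below are illustrative, not from the source), an alert with a "for" clause stays in the pending state until the expression has been continuously true for the whole duration:

```yaml
groups:
  - name: example.rules
    rules:
      # Hypothetical rule: fires only if CPU usage stays above 90%
      # for 5 consecutive minutes; shorter spikes remain "pending".
      - alert: HighCpuUsage
        expr: instance:node_cpu_utilisation:rate5m > 0.9
        for: 5m
        labels:
          severity: warning
        annotations:
          message: CPU on {{ $labels.instance }} has been above 90% for 5 minutes.
```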

Why alerts sometimes don’t fire

Even if a metric stays above the threshold, the alert may not fire because the "for" period hasn't been satisfied: a single evaluation in which the expression falls below the threshold, or returns no sample at all, resets the pending timer, so with sparse sampling the alert may never reach the firing state.

Why alerts sometimes fire unexpectedly

Sampling interval impact

Prometheus stores data as (timestamp, value) points collected at "scrape_interval". Alert rules are evaluated at fixed intervals, producing sparse samples. Grafana's range queries use a "step" parameter, which can cause the chart to show points that the alert rule never sees.
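The two intervals on the Prometheus side are set in the server configuration; the values below are an illustrative sketch, not taken from the source:

```yaml
# Illustrative prometheus.yml fragment.
global:
  scrape_interval: 30s      # how often samples are collected from targets
  evaluation_interval: 30s  # how often alerting/recording rules are evaluated
```

A Grafana panel querying the same series with a smaller "step" evaluates the expression at timestamps the rule engine never used, which is one source of the chart-versus-alert mismatch.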

Because of this mismatch, charts may display a dip that the alert rule missed, leading to confusion about why an alert was or wasn’t triggered.

How to cope

Accept that Prometheus provides an approximation; use the built-in "ALERTS" metric to inspect the lifecycle of each alert. For deeper insight, create a Recording Rule to store the computed value and alert on that metric.
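A hedged sketch of the recording-rule approach (the rule and metric names are illustrative): persist the computed expression under its own series, then alert on the recorded series, so the chart and the alert read the exact same data.

```yaml
groups:
  - name: latency.rules
    rules:
      # Recording rule: store the computed p99 so it can be
      # graphed and alerted on identically.
      - record: job:request_latency_seconds:p99
        expr: histogram_quantile(0.99, sum(rate(request_latency_seconds_bucket[5m])) by (job, le))
      - alert: RequestLatencyHigh
        expr: job:request_latency_seconds:p99 > 4
        for: 10m
        labels:
          severity: critical
```

To inspect an alert's lifecycle, query the built-in series directly, e.g. ALERTS{alertname="RequestLatencyHigh"}; its "alertstate" label moves from "pending" to "firing".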

- alert: KubeAPILatencyHigh
  annotations:
    message: The API server has a 99th percentile latency of {{ $value }} seconds for {{ $labels.verb }} {{ $labels.resource }}.
  expr: |
    cluster_quantile:apiserver_request_latencies:histogram_quantile{job="apiserver",quantile="0.99",subresource!="log"} > 4
  for: 10m
  labels:
    severity: critical

Beyond the alert rule

After an alert fires, Alertmanager handles grouping, inhibition, silencing, deduplication, and noise reduction before notifying receivers; issues in this stage can also prevent notifications.
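A minimal Alertmanager routing sketch (receiver name, URL, and timings are illustrative) showing the grouping knobs that most commonly delay or swallow notifications:

```yaml
# Illustrative alertmanager.yml fragment; names and timings are examples.
route:
  receiver: ops-team
  group_by: ['alertname', 'cluster']
  group_wait: 30s       # wait before sending the first notification for a new group
  group_interval: 5m    # wait before notifying about new alerts added to a group
  repeat_interval: 4h   # re-notify for alerts that are still firing
receivers:
  - name: ops-team
    webhook_configs:
      - url: http://example.internal/alert-hook
```

If an alert fires in Prometheus but no notification arrives, these timers, along with silences and inhibition rules, are the next place to look.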

Source: https://aleiwu.com/post/prometheus-alert-why/

Tags: Monitoring, Observability, Alerting, Prometheus, Grafana
Written by Efficient Ops

This public account is maintained by Xiaotianguo and friends and regularly publishes widely read original technical articles. We focus on operations transformation and aim to accompany you throughout your operations career, growing together.
