Why Does Prometheus Sometimes Fail to Trigger Alerts? Explained
Prometheus alerts may not fire even when metrics exceed thresholds, because of the ‘for’ pending duration, sparse sampling, and the way Grafana’s range queries sample data. This article explains the underlying mechanisms, illustrates common pitfalls with diagrams, and offers practical strategies for diagnosing and resolving missing or unexpected alerts.
Understanding the "for" parameter
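Prometheus evaluates each alert rule on a schedule. A rule can include a "for" duration, which acts as a pending period: the condition must hold at every evaluation for that long before the alert fires, which filters out transient spikes.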
Why alerts sometimes don’t fire
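Even if a metric appears to stay above the threshold, the alert may not fire. The rule sees only the sparse samples taken at its evaluation times, and if any one of them falls below the threshold, the pending state resets and the "for" period starts over.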
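The pending behaviour can be sketched as a small simulation. This is a simplified model written for illustration, not Prometheus's actual implementation; the function names `simulate_alert` and `first_firing`, the 60-second evaluation interval, and the sample values are all assumptions.

```python
# Simplified model of an alert rule with a "for" duration.
# Names and sample values are illustrative, not Prometheus internals.

def simulate_alert(samples, threshold, for_seconds):
    """Return (timestamp, state) for each evaluation.

    samples: (timestamp_seconds, value) pairs, one per rule evaluation.
    """
    active_since = None  # time of the first evaluation in the current breach
    states = []
    for ts, value in samples:
        if value > threshold:
            if active_since is None:
                active_since = ts
            held = ts - active_since
            state = "firing" if held >= for_seconds else "pending"
        else:
            # One evaluation below the threshold resets the pending period.
            active_since = None
            state = "inactive"
        states.append((ts, state))
    return states


def first_firing(states):
    """Timestamp of the first 'firing' evaluation, or None if it never fires."""
    return next((ts for ts, state in states if state == "firing"), None)


# Evaluate every 60 s for 10 minutes, threshold 4, for: 5m (300 s).
steady = [(t, 5.0) for t in range(0, 600, 60)]
dipped = [(t, 3.0 if t == 180 else 5.0) for t in range(0, 600, 60)]

print(first_firing(simulate_alert(steady, 4, 300)))  # 300
print(first_firing(simulate_alert(dipped, 4, 300)))  # 540
```

In this model a single low sample at t=180 delays firing from 300 s to 540 s, even though nine of the ten evaluations were above the threshold.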
Why alerts sometimes fire
Sampling interval impact
Prometheus stores data as (timestamp, value) points collected every scrape_interval. Alert rules are evaluated at a fixed interval, so each rule sees only a sparse set of samples. Grafana’s range queries use a step parameter to pick their own sample times, which can cause the chart to show points that the alert rule never sees.
Because of this mismatch, charts may display a dip that the alert rule missed, leading to confusion about why an alert was or wasn’t triggered.
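The mismatch can be reproduced with a toy series. The sketch below assumes a 15-second scrape interval, a chart step of 15 seconds, and a rule evaluated every 60 seconds; `value_at` and `sample_on_grid` are hypothetical helpers, and the model ignores Prometheus's lookback window and staleness handling.

```python
from bisect import bisect_right

# Toy model: both a chart and a rule read "the latest sample at or before t",
# but on different time grids. Values and intervals are made up.

def value_at(series, t):
    """Latest sample value at or before time t, or None if there is none."""
    timestamps = [ts for ts, _ in series]
    i = bisect_right(timestamps, t)
    return series[i - 1][1] if i else None


def sample_on_grid(series, start, end, step):
    """Read the series at start, start + step, ... up to and including end."""
    return [(t, value_at(series, t)) for t in range(start, end + 1, step)]


# Scraped every 15 s; the value dips to 3.0 at t=90 and t=105 only.
series = [(t, 3.0 if t in (90, 105) else 5.0) for t in range(0, 241, 15)]

chart = sample_on_grid(series, 0, 240, 15)      # what a fine-grained chart draws
rule_view = sample_on_grid(series, 0, 240, 60)  # what the rule evaluates

print(min(v for _, v in chart))      # 3.0
print(min(v for _, v in rule_view))  # 5.0
```

Here the chart shows the dip, while every evaluation at t=0, 60, 120, 180, and 240 lands on a sample of 5.0, so an alert on "> 4" would keep counting toward its "for" period.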
How to cope
Accept that Prometheus provides an approximation. Use the built‑in ALERTS metric to inspect the lifecycle of each alert; its alertstate label shows whether the alert was pending or firing at each evaluation. For deeper insight, create a recording rule that stores the computed value, and alert on that metric.
- alert: KubeAPILatencyHigh
  annotations:
    message: The API server has a 99th percentile latency of {{ $value }} seconds for {{ $labels.verb }} {{ $labels.resource }}.
  expr: |
    cluster_quantile:apiserver_request_latencies:histogram_quantile{job="apiserver",quantile="0.99",subresource!="log"} > 4
  for: 10m
  labels:
    severity: critical
Beyond the alert rule
After an alert fires, Alertmanager handles grouping, inhibition, silencing, deduplication, and noise reduction before notifying receivers; issues in this stage can also prevent notifications.
Source: https://aleiwu.com/post/prometheus-alert-why/