Why Does Prometheus Sometimes Fail to Trigger Alerts? Explained
Prometheus alerts may not fire even when a metric appears to exceed its threshold. The usual causes are the ‘for’ pending duration, sparse sampling at rule evaluation time, and the way Grafana’s range queries draw charts. This article explains the underlying mechanisms, walks through the common pitfalls, and offers practical strategies for diagnosing missing or unexpected alerts.
Understanding the "for" parameter
An alerting rule can include a for duration, which acts as a pending period: the expression must hold at every consecutive evaluation for that long before the alert moves from pending to firing. This filters out transient spikes.
Why alerts sometimes don’t fire
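Even if a chart shows the metric staying above the threshold, the alert may not fire. If any single rule evaluation inside the for window sees a value at or below the threshold, the pending state is cleared and the timer starts over. Because evaluations are sparse, one brief dip that happens to land on an evaluation is enough.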
Why alerts sometimes fire
Sampling interval impact
Prometheus stores data as (timestamp, value) points collected every scrape_interval. Alert rules are evaluated at a separate fixed interval (evaluation_interval), so a rule only sees a sparse sampling of the series. Grafana’s range queries use their own step parameter, which can cause the chart to show points that the alert rule never sees.
Because of this mismatch, charts may display a dip that the alert rule missed, leading to confusion about why an alert was or wasn’t triggered.
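The two intervals involved are set in the Prometheus configuration file. The fragment below is a minimal sketch with illustrative values, not recommendations; the job name and target are hypothetical.

```yaml
# prometheus.yml (fragment); all values are illustrative assumptions
global:
  scrape_interval: 15s      # how often targets are scraped, i.e. the spacing of stored samples
  evaluation_interval: 1m   # how often alerting and recording rules are evaluated

scrape_configs:
  - job_name: apiserver               # hypothetical job name
    static_configs:
      - targets: ["localhost:8080"]   # hypothetical target
```

With values like these, a rule sees one instant value per minute, while a Grafana panel querying with a 15s step draws several points in the same minute. A dip shorter than a minute can therefore show up on the chart yet fall between two rule evaluations, or land on an evaluation while being hidden by a coarser chart step.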
How to cope
Accept that Prometheus provides an approximation; use the built‑in ALERTS metric to inspect the lifecycle of each alert. For deeper insight, create a Recording Rule to store the computed value and alert on that metric.
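As a rough sketch of the recording-rule approach, the rule file below stores an expression as its own series and alerts on that series. The group name, metric names, alert name, and underlying histogram are all hypothetical; only the file structure (groups, record, alert, expr, for) follows the standard Prometheus rule format.

```yaml
groups:
  - name: latency-example            # hypothetical group name
    interval: 30s                    # optional per-group evaluation interval
    rules:
      # Store the evaluated expression as its own series, so the values the
      # rule engine actually computed can be graphed afterwards.
      - record: job:request_latency_seconds:p99    # hypothetical metric name
        expr: |
          histogram_quantile(0.99,
            sum by (job, le) (rate(request_latency_seconds_bucket[5m])))
      # Alert on the recorded series instead of the raw expression.
      - alert: RequestLatencyHigh                   # hypothetical alert name
        expr: job:request_latency_seconds:p99 > 4
        for: 10m
        labels:
          severity: critical
```

Graphing job:request_latency_seconds:p99 shows the values at rule evaluation time rather than whatever a dashboard step happens to sample. Querying ALERTS{alertname="RequestLatencyHigh"} alongside it shows when the alert was pending and when it was firing, via the alertstate label.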
- alert: KubeAPILatencyHigh
  annotations:
    message: The API server has a 99th percentile latency of {{ $value }} seconds for {{ $labels.verb }} {{ $labels.resource }}.
  expr: |
    cluster_quantile:apiserver_request_latencies:histogram_quantile{job="apiserver",quantile="0.99",subresource!="log"} > 4
  for: 10m
  labels:
    severity: critical
Beyond the alert rule
After an alert fires, Alertmanager handles grouping, inhibition, silencing, deduplication, and noise reduction before notifying receivers; issues in this stage can also prevent notifications.
Source: https://aleiwu.com/post/prometheus-alert-why/
