Configuring Prometheus Alert Rules for Monitoring Kubernetes Pod Status
This article demonstrates how to set up Prometheus alerting rules to monitor Kubernetes Pod phases, explains the different Pod states, provides example alert expressions, and discusses practical solutions to avoid false alarms during deployments.
This article demonstrates how to set up Prometheus alerting rules to monitor the status of Kubernetes Pods.
Pod status types
1. Running – The Pod is actively running and its containers are responding to requests as expected.
2. Pending – The Pod has not yet been assigned to a node, often because the scheduler cannot find a suitable node or resources are insufficient.
3. Succeeded – All containers in the Pod have completed their tasks and exited successfully, typically for batch or scheduled jobs.
4. Failed – One or more containers in the Pod have terminated with an error, indicating a crash or startup failure.
5. Unknown – Prometheus cannot retrieve the Pod’s status, possibly due to API issues or because the Pod has been deleted.
Alert rule
The metric kube_pod_status_phase reports the current phase of each Pod; when the metric value is 1 for any phase other than Running or Succeeded , an alert should fire.
Example alert rule:
- name: kubernetes-pod
rules:
- alert: 'pod 状态监控'
annotations:
description: 'Pod 状态:{{ $labels.phase }}'
limit: '检测到Pod 状态异常'
expr: |
(kube_pod_status_phase{job="kube-state-metrics", phase !~ "Running|Succeeded"} != 0)
for: 2mThis rule triggers when the kube_pod_status_phase metric is 1 for any non‑Running/Succeeded phase for at least two minutes.
Problem handling
In production, Pods may stay in a non‑Running state during deployments, causing false alerts if the startup time exceeds the two‑minute window. Two naive solutions are to silence alerts during releases or to extend the alert waiting time, but both have drawbacks.
The recommended solution is to refine the PromQL expression to ignore Pods that have been created within the last ten minutes:
(kube_pod_status_phase{job="kube-state-metrics", phase !~ "Running|Succeeded"} != 0 and on (pod,namespace) kube_pod_created{job="kube-state-metrics"} < (time() - 600))This expression adds a condition that the Pod’s creation timestamp must be older than ten minutes before an alert is generated, effectively eliminating false positives during rollouts.
For deeper learning, the author promotes a paid tutorial series on Prometheus, but the technical content above stands on its own as a practical guide.
DevOps Operations Practice
We share professional insights on cloud-native, DevOps & operations, Kubernetes, observability & monitoring, and Linux systems.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.