Configuring Prometheus Alert Rules for Monitoring Kubernetes Pod Status

This article demonstrates how to set up Prometheus alerting rules to monitor Kubernetes Pod phases, explains the different Pod states, provides example alert expressions, and discusses practical solutions to avoid false alarms during deployments.

DevOps Operations Practice

May 9, 2024

Pod status types

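1. Running – The Pod has been bound to a node and all of its containers have been created. At least one container is running, or is starting or restarting. This phase alone does not guarantee that the containers are ready to serve requests.

2. Pending – The Pod has been accepted by the cluster, but one or more containers are not yet set up and ready to run. This covers time spent waiting to be scheduled (for example, when no node has enough resources) and time spent pulling images.

3. Succeeded – All containers in the Pod have terminated successfully and will not be restarted, which is typical for batch or scheduled jobs.

4. Failed – All containers in the Pod have terminated, and at least one of them terminated in failure (a non-zero exit code or termination by the system).

5. Unknown – The state of the Pod could not be obtained, typically because of an error communicating with the node where the Pod should be running.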

Alert rule

The kube-state-metrics metric kube_pod_status_phase exposes one series per Pod and phase: the value is 1 for the phase the Pod is currently in and 0 for every other phase. An alert should fire when the value is 1 for any phase other than Running or Succeeded.

Example alert rule:

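# One entry in the rule file's top-level "groups:" list.
- name: kubernetes-pod
  rules:
  - alert: 'Pod status monitoring'
    annotations:
      description: 'Pod status: {{ $labels.phase }}'
      limit: 'Abnormal Pod status detected'
    # Keep only phase series with a non-zero value, excluding Running and Succeeded.
    expr: |
      (kube_pod_status_phase{job="kube-state-metrics", phase !~ "Running|Succeeded"} != 0)
    # "for" is a field of the rule, not part of the expression:
    # fire only after the condition has held for two minutes.
    for: 2m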

This rule triggers when the kube_pod_status_phase metric is 1 for any non‑Running/Succeeded phase for at least two minutes.
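To check which series the expression above selects, one option is a promtool unit test that evaluates the expression directly. The following is a minimal sketch, not the author's setup: the file name, the namespace demo, and the Pod names web-0 and web-1 are hypothetical, and rule_files is left empty on the assumption that only the expression, not the alert rule, is being exercised.

```yaml
# pod-phase-expr-test.yaml (hypothetical file name)
# Run with: promtool test rules pod-phase-expr-test.yaml
rule_files: []            # assumption: no rule file is loaded, only the expression is tested
evaluation_interval: 1m

tests:
  - interval: 1m
    input_series:
      # Hypothetical Pod web-0 is stuck in Pending: 1 for its current phase ...
      - series: 'kube_pod_status_phase{job="kube-state-metrics", namespace="demo", pod="web-0", phase="Pending"}'
        values: '1+0x5'
      # ... and 0 for a phase it is not in.
      - series: 'kube_pod_status_phase{job="kube-state-metrics", namespace="demo", pod="web-0", phase="Running"}'
        values: '0+0x5'
      # Hypothetical Pod web-1 is healthy.
      - series: 'kube_pod_status_phase{job="kube-state-metrics", namespace="demo", pod="web-1", phase="Running"}'
        values: '1+0x5'
      - series: 'kube_pod_status_phase{job="kube-state-metrics", namespace="demo", pod="web-1", phase="Pending"}'
        values: '0+0x5'

    promql_expr_test:
      # Only the Pending series of web-0 should pass the filter.
      - expr: 'kube_pod_status_phase{job="kube-state-metrics", phase !~ "Running|Succeeded"} != 0'
        eval_time: 3m
        exp_samples:
          - labels: 'kube_pod_status_phase{job="kube-state-metrics", namespace="demo", pod="web-0", phase="Pending"}'
            value: 1
```

The zero-valued series matter here: without the != 0 comparison, the Pending series of the healthy Pod would also match the label selector.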

Problem handling

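In production, newly created Pods legitimately stay in a non‑Running phase (usually Pending) during a deployment, so the rule raises false alerts whenever startup takes longer than the two‑minute window set by for: 2m. Two naive fixes are to silence alerts during releases or to lengthen the waiting time. Both have drawbacks: silencing can hide real failures that occur during a release, and a longer wait delays every alert, including those for Pods that are genuinely broken.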

The recommended solution is to refine the PromQL expression to ignore Pods that have been created within the last ten minutes:

(kube_pod_status_phase{job="kube-state-metrics", phase !~ "Running|Succeeded"} != 0 and on (pod,namespace) kube_pod_created{job="kube-state-metrics"} < (time() - 600))

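This expression adds the condition that the Pod's creation timestamp, which kube_pod_created reports as a Unix time in seconds, must be more than ten minutes (600 seconds) in the past. Pods created during a rollout therefore cannot alert until they have had ten minutes to start, which removes false positives for any Pod that reaches Running within that time.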
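The effect of the age condition can be illustrated with another promtool unit test. This is a sketch under stated assumptions: the file name, the namespace demo, and the Pod name web-0 are hypothetical, and the test relies on promtool starting its test clock at Unix time 0, so a kube_pod_created value of 0 means "created at the start of the test".

```yaml
# pod-age-expr-test.yaml (hypothetical file name)
# Run with: promtool test rules pod-age-expr-test.yaml
rule_files: []            # assumption: only the expression is tested
evaluation_interval: 1m

tests:
  - interval: 1m
    input_series:
      # Hypothetical Pod web-0 stays Pending for the whole test.
      - series: 'kube_pod_status_phase{job="kube-state-metrics", namespace="demo", pod="web-0", phase="Pending"}'
        values: '1+0x20'
      # Creation timestamp 0: the Pod was created when the test clock started.
      - series: 'kube_pod_created{job="kube-state-metrics", namespace="demo", pod="web-0"}'
        values: '0+0x20'

    promql_expr_test:
      # At 5m the Pod is younger than ten minutes, so nothing is returned.
      - expr: '(kube_pod_status_phase{job="kube-state-metrics", phase !~ "Running|Succeeded"} != 0 and on (pod,namespace) kube_pod_created{job="kube-state-metrics"} < (time() - 600))'
        eval_time: 5m
        exp_samples: []
      # At 15m the Pod is older than ten minutes and still Pending, so it is returned.
      - expr: '(kube_pod_status_phase{job="kube-state-metrics", phase !~ "Running|Succeeded"} != 0 and on (pod,namespace) kube_pod_created{job="kube-state-metrics"} < (time() - 600))'
        eval_time: 15m
        exp_samples:
          - labels: 'kube_pod_status_phase{job="kube-state-metrics", namespace="demo", pod="web-0", phase="Pending"}'
            value: 1
```

If this expression is used in the alert rule together with for: 2m, a Pod that never starts would first alert roughly twelve minutes after creation.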

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
