Mastering Prometheus on Kubernetes: From Basics to Advanced Alerting
This guide provides a comprehensive walkthrough of Prometheus fundamentals, component architecture, deployment patterns, node‑exporter setup, Grafana integration, kube‑state‑metrics, and detailed Alertmanager configuration for Kubernetes monitoring.
Background: Prometheus is an open‑source monitoring and alerting system now hosted by the CNCF, widely used with Kubernetes for collecting metrics via exporters and the Pushgateway. It can scale to thousands of nodes and supports powerful querying with PromQL.
Prometheus Core Concepts
Prometheus stores time‑series data as metric name plus key‑value labels, allowing aggregation and slicing. Key features include:
Multi‑dimensional data model
PromQL query language for arithmetic and joins
Local storage without external dependencies
HTTP pull model for scraping metrics
Pushgateway for short‑lived jobs
Service discovery and static configuration
Visualization via Grafana or built‑in UI
Efficient storage (≈3.5 bytes per sample)
High‑availability via federation, remote storage, and multiple server instances
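The multi‑dimensional model and PromQL interact directly: the same metric can be aggregated along one label while preserving another. The queries below are illustrative and assume node‑exporter metric names:

```promql
# Per-mode CPU time rate summed over all instances
sum by (mode) (rate(node_cpu_seconds_total[5m]))

# The same metric sliced differently: per-instance idle CPU rate
avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m]))
```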
Prometheus Architecture
The ecosystem consists of Prometheus server (Retrieval, Storage, PromQL), client libraries, exporters, Alertmanager, and visualization tools like Grafana.
Deployment Modes
Three typical patterns are described:
Basic HA: Multiple Prometheus servers for redundancy, but no data sharing or persistence across instances.
HA + Remote Storage: Adds remote storage to retain data across server failures.
HA + Remote Storage + Federation: Uses federation to split collection duties across multiple Prometheus instances and aggregate results centrally.
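In the federation pattern, the central Prometheus scrapes the /federate endpoint of the lower‑level instances. A minimal sketch of that scrape job — the job name, match selector, and shard addresses are placeholders:

```yaml
scrape_configs:
  - job_name: 'federate'            # placeholder job name
    scrape_interval: 15s
    honor_labels: true              # keep the shards' original job/instance labels
    metrics_path: '/federate'
    params:
      'match[]':                    # series selectors to pull from each shard
        - '{job=~"kubernetes-.*"}'
    static_configs:
      - targets:                    # placeholder addresses of the lower-level servers
          - 'prometheus-shard-0:9090'
          - 'prometheus-shard-1:9090'
```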
Installing node‑exporter
kubectl create ns monitor-sa
kubectl apply -f node-export.yaml # creates DaemonSet
curl http://<host>:9100/metrics # verify the metrics endpoint
Key metrics such as node_cpu_seconds_total and node_load1 are demonstrated, with explanations of their types (counter vs. gauge).
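The type distinction matters when querying: a counter such as node_cpu_seconds_total only ever increases, so it is wrapped in rate(), while a gauge such as node_load1 is read directly. For example:

```promql
# Counter: per-instance CPU usage percentage over the last 5 minutes
100 - avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100

# Gauge: current 1-minute load average, usable as-is
node_load1
```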
Deploying Prometheus Server
# ServiceAccount and RBAC
kubectl create serviceaccount monitor -n monitor-sa
kubectl create clusterrolebinding monitor-clusterrolebinding \
--clusterrole=cluster-admin \
--serviceaccount=monitor-sa:monitor
# ConfigMap for prometheus.yml
kubectl apply -f prometheus-cfg.yaml
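The ConfigMap carries prometheus.yml. A minimal sketch of a scrape job that discovers node‑exporter via Kubernetes service discovery — the job name is illustrative, and the relabeling assumes node‑exporter listens on port 9100 as set up earlier:

```yaml
global:
  scrape_interval: 15s
scrape_configs:
  - job_name: 'kubernetes-node-exporter'   # illustrative job name
    kubernetes_sd_configs:
      - role: node                         # discover every node in the cluster
    relabel_configs:
      - source_labels: [__address__]
        regex: '(.*):10250'
        replacement: '${1}:9100'           # point scrapes at node-exporter's port
        target_label: __address__
        action: replace
```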
# Deployment (nodeName must match the node with /data directory)
kubectl apply -f prometheus-deploy.yaml
# Service (NodePort 30009)
kubectl apply -f prometheus-svc.yaml
Access the UI at http://<master_ip>:30009/graph.
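The Service manifest can be sketched as follows; the label selector is an assumption and must match the pod labels in prometheus-deploy.yaml:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: prometheus
  namespace: monitor-sa
spec:
  type: NodePort
  selector:
    app: prometheus          # assumption: must match the Deployment's pod labels
  ports:
    - port: 9090             # Prometheus default listen port
      targetPort: 9090
      nodePort: 30009        # fixed NodePort used throughout this guide
```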
Grafana Integration
# Load Grafana image
docker load -i heapster-grafana-amd64_v5_0_4.tar.gz
kubectl apply -f grafana.yaml
# Service (NodePort 30989)
kubectl get svc -n kube-system | grep grafana
Configure a Prometheus data source with URL http://prometheus.monitor-sa.svc:9090 and import dashboards (e.g., node_exporter.json).
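Instead of clicking through the Grafana UI, the data source can also be declared with Grafana's provisioning mechanism. A sketch, assuming the file is mounted under Grafana's provisioning directory (the path is an assumption):

```yaml
# e.g. mounted at /etc/grafana/provisioning/datasources/prometheus.yaml (assumed path)
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: http://prometheus.monitor-sa.svc:9090   # in-cluster service DNS name
    isDefault: true
```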
Kube‑state‑metrics
Provides Kubernetes object state metrics without storing them. Install by creating a ServiceAccount, RBAC, and deployment:
kubectl apply -f kube-state-metrics-rbac.yaml
kubectl apply -f kube-state-metrics-deploy.yaml
kubectl apply -f kube-state-metrics-svc.yaml
Alertmanager Configuration
Configure email alerts (SMTP) in alertmanager-cm.yaml and link it to Prometheus via prometheus-alertmanager-cfg.yaml. Example rule alerts include CPU usage thresholds for kube‑proxy, scheduler, controller‑manager, etcd, and node resource alerts.
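A minimal sketch of the email (SMTP) section of alertmanager-cm.yaml — the SMTP host, addresses, credential, and receiver name are all placeholders:

```yaml
global:
  resolve_timeout: 1m
  smtp_smarthost: 'smtp.example.com:465'   # placeholder SMTP server
  smtp_from: 'alerts@example.com'          # placeholder sender address
  smtp_auth_username: 'alerts@example.com'
  smtp_auth_password: 'app-password'       # placeholder credential
  smtp_require_tls: false
route:
  group_by: [alertname]
  group_wait: 10s
  group_interval: 10s
  repeat_interval: 10m
  receiver: default-receiver
receivers:
  - name: default-receiver                 # placeholder receiver name
    email_configs:
      - to: 'oncall@example.com'           # placeholder recipient
        send_resolved: true
```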
# Sample alert rule
- alert: kube-proxy_cpu_usage_high
  expr: rate(process_cpu_seconds_total{job=~"kubernetes-kube-proxy"}[1m]) * 100 > 80
  for: 2s
  labels:
    severity: warning
  annotations:
    description: "{{ $labels.instance }} {{ $labels.job }} CPU usage > 80%"
Deploy Alertmanager alongside Prometheus, expose it via NodePort 30066, and verify alerts in the web UI (http://<master_ip>:30066).
Common Troubleshooting
Change the kube‑scheduler and kube‑controller‑manager --bind-address from 127.0.0.1 to the master node IP in their static pod manifests, restart kubelet so the pods are recreated, and confirm the ports (e.g., 10251, 10252) are listening. Set the kube-proxy metrics bind address to 0.0.0.0:10249 and restart the kube-proxy pod.
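The bind-address change is a one-line sed substitution against the static pod manifest. The snippet below demonstrates it on a local copy of the relevant line; on a real master you would edit /etc/kubernetes/manifests/kube-scheduler.yaml in place, and 192.168.40.180 is a placeholder for your master node IP:

```shell
# Create a local stand-in for the manifest line (the real file is
# /etc/kubernetes/manifests/kube-scheduler.yaml on the master node).
cat > /tmp/kube-scheduler-snippet.yaml <<'EOF'
    - --bind-address=127.0.0.1
EOF

# Swap the loopback address for the master node IP (placeholder value).
sed -i 's/--bind-address=127.0.0.1/--bind-address=192.168.40.180/' \
  /tmp/kube-scheduler-snippet.yaml

# Show the result of the substitution.
cat /tmp/kube-scheduler-snippet.yaml
```

Because the control-plane components run as static pods, kubelet recreates them automatically once the manifest file changes.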
Applying Configuration Changes
kubectl delete -f alertmanager-cm.yaml
kubectl apply -f alertmanager-cm.yaml
kubectl delete -f prometheus-alertmanager-cfg.yaml
kubectl apply -f prometheus-alertmanager-cfg.yaml
kubectl delete -f prometheus-alertmanager-deploy.yaml
kubectl apply -f prometheus-alertmanager-deploy.yaml
These steps reload updated alerting rules and server settings.
Conclusion
The article equips readers with end‑to‑end instructions to set up a robust monitoring stack on Kubernetes, covering metric collection, storage, visualization, and alerting. By following the provided YAML manifests and command snippets, operators can achieve high availability, persistent data, and fine‑grained alerts for production clusters.