Mastering Prometheus on Kubernetes: From Basics to Advanced Alerting
This guide provides a comprehensive walkthrough of Prometheus fundamentals, component architecture, deployment patterns, node‑exporter setup, Grafana integration, kube‑state‑metrics, and detailed Alertmanager configuration for Kubernetes monitoring.
Background: Prometheus is an open‑source monitoring and alerting system now hosted by the CNCF, widely used with Kubernetes for collecting metrics via exporters and the Pushgateway. It can scale to thousands of nodes and supports powerful querying with PromQL.
Prometheus Core Concepts
Prometheus stores time‑series data as metric name plus key‑value labels, allowing aggregation and slicing. Key features include:
Multi‑dimensional data model
PromQL query language for arithmetic and joins
Local storage without external dependencies
HTTP pull model for scraping metrics
Pushgateway for short‑lived jobs
Service discovery and static configuration
Visualization via Grafana or built‑in UI
Efficient storage (≈3.5 bytes per sample)
High‑availability via federation, remote storage, and multiple server instances
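The multi‑dimensional model and PromQL interact directly: the same metric can be aggregated along one label while preserving another. The queries below are illustrative and assume node‑exporter metric names:

```promql
# Per-mode CPU time rate summed over all instances
sum by (mode) (rate(node_cpu_seconds_total[5m]))

# The same metric sliced differently: per-instance idle CPU rate
avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m]))
```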
Prometheus Architecture
The ecosystem consists of Prometheus server (Retrieval, Storage, PromQL), client libraries, exporters, Alertmanager, and visualization tools like Grafana.
Deployment Modes
Three typical patterns are described:
Basic HA: Multiple Prometheus servers for redundancy, but no data sharing or persistence across instances.
HA + Remote Storage: Adds remote storage to retain data across server failures.
HA + Remote Storage + Federation: Uses federation to split collection duties across multiple Prometheus instances and aggregate results centrally.
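In the federation pattern, the central Prometheus scrapes the /federate endpoint of the lower‑level instances. A minimal sketch of that scrape job — the job name, match selector, and shard addresses are placeholders:

```yaml
scrape_configs:
  - job_name: 'federate'            # placeholder job name
    scrape_interval: 15s
    honor_labels: true              # keep the shards' original job/instance labels
    metrics_path: '/federate'
    params:
      'match[]':                    # series selectors to pull from each shard
        - '{job=~"kubernetes-.*"}'
    static_configs:
      - targets:                    # placeholder addresses of the lower-level servers
          - 'prometheus-shard-0:9090'
          - 'prometheus-shard-1:9090'
```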
Installing node‑exporter
kubectl create ns monitor-sa
kubectl apply -f node-export.yaml # creates DaemonSet
curl http://<host>:9100/metrics # verify the metrics endpoint
Key metrics such as node_cpu_seconds_total and node_load1 are demonstrated, with explanations of their types (counter vs. gauge).
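The type distinction matters when querying: a counter such as node_cpu_seconds_total only ever increases, so it is wrapped in rate(), while a gauge such as node_load1 is read directly. For example:

```promql
# Counter: per-instance CPU usage percentage over the last 5 minutes
100 - avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100

# Gauge: current 1-minute load average, usable as-is
node_load1
```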
Deploying Prometheus Server
# ServiceAccount and RBAC
kubectl create serviceaccount monitor -n monitor-sa
kubectl create clusterrolebinding monitor-clusterrolebinding \
--clusterrole=cluster-admin \
--serviceaccount=monitor-sa:monitor
# ConfigMap for prometheus.yml
kubectl apply -f prometheus-cfg.yaml
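The ConfigMap carries prometheus.yml. A minimal sketch of a scrape job that discovers node‑exporter via Kubernetes service discovery — the job name is illustrative, and the relabeling assumes node‑exporter listens on port 9100 as set up earlier:

```yaml
global:
  scrape_interval: 15s
scrape_configs:
  - job_name: 'kubernetes-node-exporter'   # illustrative job name
    kubernetes_sd_configs:
      - role: node                         # discover every node in the cluster
    relabel_configs:
      - source_labels: [__address__]
        regex: '(.*):10250'
        replacement: '${1}:9100'           # point scrapes at node-exporter's port
        target_label: __address__
        action: replace
```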
# Deployment (nodeName must match the node with /data directory)
kubectl apply -f prometheus-deploy.yaml
# Service (NodePort 30009)
kubectl apply -f prometheus-svc.yaml
Access the UI at http://<master_ip>:30009/graph.
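The Service manifest can be sketched as follows; the label selector is an assumption and must match the pod labels in prometheus-deploy.yaml:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: prometheus
  namespace: monitor-sa
spec:
  type: NodePort
  selector:
    app: prometheus          # assumption: must match the Deployment's pod labels
  ports:
    - port: 9090             # Prometheus default listen port
      targetPort: 9090
      nodePort: 30009        # fixed NodePort used throughout this guide
```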
Grafana Integration
# Load Grafana image
docker load -i heapster-grafana-amd64_v5_0_4.tar.gz
kubectl apply -f grafana.yaml
# Service (NodePort 30989)
kubectl get svc -n kube-system | grep grafana
Configure a Prometheus data source with URL http://prometheus.monitor-sa.svc:9090 and import dashboards (e.g., node_exporter.json).
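Instead of clicking through the Grafana UI, the data source can also be declared with Grafana's provisioning mechanism. A sketch, assuming the file is mounted under Grafana's provisioning directory (the path is an assumption):

```yaml
# e.g. mounted at /etc/grafana/provisioning/datasources/prometheus.yaml (assumed path)
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: http://prometheus.monitor-sa.svc:9090   # in-cluster service DNS name
    isDefault: true
```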
Kube‑state‑metrics
Provides Kubernetes object state metrics without storing them. Install by creating a ServiceAccount, RBAC, and deployment:
kubectl apply -f kube-state-metrics-rbac.yaml
kubectl apply -f kube-state-metrics-deploy.yaml
kubectl apply -f kube-state-metrics-svc.yaml
Alertmanager Configuration
Configure email alerts (SMTP) in alertmanager-cm.yaml and link it to Prometheus via prometheus-alertmanager-cfg.yaml. Example rule alerts include CPU usage thresholds for kube‑proxy, scheduler, controller‑manager, etcd, and node resource alerts.
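A minimal sketch of the email (SMTP) section of alertmanager-cm.yaml — the SMTP host, addresses, credential, and receiver name are all placeholders:

```yaml
global:
  resolve_timeout: 1m
  smtp_smarthost: 'smtp.example.com:465'   # placeholder SMTP server
  smtp_from: 'alerts@example.com'          # placeholder sender address
  smtp_auth_username: 'alerts@example.com'
  smtp_auth_password: 'app-password'       # placeholder credential
  smtp_require_tls: false
route:
  group_by: [alertname]
  group_wait: 10s
  group_interval: 10s
  repeat_interval: 10m
  receiver: default-receiver
receivers:
  - name: default-receiver                 # placeholder receiver name
    email_configs:
      - to: 'oncall@example.com'           # placeholder recipient
        send_resolved: true
```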
# Sample alert rule
- alert: kube-proxy_cpu_usage_high
  expr: rate(process_cpu_seconds_total{job=~"kubernetes-kube-proxy"}[1m]) * 100 > 80
  for: 2s
  labels:
    severity: warning
  annotations:
    description: "{{ $labels.instance }} {{ $labels.job }} CPU usage > 80%"
Deploy Alertmanager alongside Prometheus, expose it via NodePort 30066, and verify alerts in the web UI (http://<master_ip>:30066).
Common Troubleshooting
Change the kube‑scheduler and kube‑controller‑manager --bind-address from 127.0.0.1 to the master node IP in their static pod manifests, restart kubelet so the pods are recreated, and confirm the ports (e.g., 10251, 10252) are listening. Set the kube-proxy metrics bind address to 0.0.0.0:10249 and restart the kube-proxy pod.
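The bind-address change is a one-line sed substitution against the static pod manifest. The snippet below demonstrates it on a local copy of the relevant line; on a real master you would edit /etc/kubernetes/manifests/kube-scheduler.yaml in place, and 192.168.40.180 is a placeholder for your master node IP:

```shell
# Create a local stand-in for the manifest line (the real file is
# /etc/kubernetes/manifests/kube-scheduler.yaml on the master node).
cat > /tmp/kube-scheduler-snippet.yaml <<'EOF'
    - --bind-address=127.0.0.1
EOF

# Swap the loopback address for the master node IP (placeholder value).
sed -i 's/--bind-address=127.0.0.1/--bind-address=192.168.40.180/' \
  /tmp/kube-scheduler-snippet.yaml

# Show the result of the substitution.
cat /tmp/kube-scheduler-snippet.yaml
```

Because the control-plane components run as static pods, kubelet recreates them automatically once the manifest file changes.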
Applying Configuration Changes
kubectl delete -f alertmanager-cm.yaml
kubectl apply -f alertmanager-cm.yaml
kubectl delete -f prometheus-alertmanager-cfg.yaml
kubectl apply -f prometheus-alertmanager-cfg.yaml
kubectl delete -f prometheus-alertmanager-deploy.yaml
kubectl apply -f prometheus-alertmanager-deploy.yaml
These steps reload updated alerting rules and server settings.
Conclusion
The article equips readers with end‑to‑end instructions to set up a robust monitoring stack on Kubernetes, covering metric collection, storage, visualization, and alerting. By following the provided YAML manifests and command snippets, operators can achieve high availability, persistent data, and fine‑grained alerts for production clusters.