Mastering Prometheus in Kubernetes: Practical Tips, Exporter Guide, and Common Pitfalls

This article shares practical experiences with Prometheus in Kubernetes, covering core principles, limitations, common exporters, metric selection, capacity planning, high‑availability strategies, query optimization, and integration with Grafana, offering actionable guidance for building reliable, scalable monitoring solutions.

Programmer DD

Prometheus is a modern open‑source monitoring system that has become the de‑facto standard in cloud‑native environments.

Key Principles

Monitoring is infrastructure; collect only the metrics you need so that resources are not wasted.

Only emit alerts that need to be handled.

A simple, reliable architecture is essential; the monitoring system must not fail when the business system does.

Limitations of Prometheus

Metric‑based only – not suitable for logs, events, or tracing.

Default pull model; plan network topology to avoid unnecessary forwarding.

No built‑in solution for horizontal scaling – choose between federation, Cortex, Thanos, etc.

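Availability over consistency: Prometheus tolerates occasional data loss so that queries keep succeeding.

Functions such as rate and histogram_quantile can produce unintuitive results because they extrapolate or interpolate rather than return raw observations; long‑range queries need down‑sampling to stay fast.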
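To illustrate why histogram_quantile can surprise, here is a minimal Python sketch of bucket interpolation. It is a simplified model, not Prometheus's actual implementation: it assumes cumulative buckets with positive upper bounds, a lower bound of 0 for the first bucket, and a final +Inf bucket. The bucket values are made up for illustration.

```python
import math


def histogram_quantile(q, buckets):
    """Estimate the q-quantile from cumulative (upper_bound, count) buckets.

    `buckets` must be sorted by upper bound and end with (math.inf, total).
    Simplified model: observations are assumed to be spread evenly inside
    each bucket, so the result is a linear interpolation.
    """
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    prev_bound, prev_count = 0.0, 0
    for bound, count in buckets:
        if count >= rank:
            if math.isinf(bound):
                # Rank falls in the +Inf bucket: report the highest finite bound.
                return prev_bound
            in_bucket = count - prev_count
            if in_bucket == 0:
                return prev_bound
            fraction = (rank - prev_count) / in_bucket
            return prev_bound + (bound - prev_bound) * fraction
        prev_bound, prev_count = bound, count
    return math.nan


# Hypothetical latency histogram: 100 requests in total.
example = [(0.1, 50), (0.5, 90), (1.0, 100), (math.inf, 100)]
p95 = histogram_quantile(0.95, example)  # roughly 0.75
```

Even if every request in the 0.5 to 1.0 bucket actually took 0.51 s, the estimate lands near 0.75 s, because only the bucket boundaries are known. Choosing bucket boundaries close to the latencies you care about narrows this error.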

Common Exporters in Kubernetes

cAdvisor (built into kubelet)

kubelet (port 10250 is the authenticated HTTPS port; 10255 is the legacy read‑only port, which many distributions disable)

apiserver (port 6443)

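scheduler (port 10251; newer Kubernetes releases serve metrics on the secure port 10259 instead)

controller‑manager (port 10252; newer Kubernetes releases serve metrics on the secure port 10257 instead)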

etcd (latency, storage metrics)

docker (experimental metrics‑addr)

kube‑proxy (port 10249)

kube‑state‑metrics (metadata of pods, deployments, etc.)

node‑exporter (CPU, memory, disk)

blackbox_exporter (network probes)

process‑exporter (process metrics)

nvidia‑exporter (GPU metrics)

node‑problem‑detector (node health)

Application exporters (MySQL, Nginx, MQ, …)
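All of the exporters above do the same basic job: they expose current values over HTTP in the Prometheus text format so that the server can pull them. The following Python sketch shows the idea using only the standard library. The metric name demo_jobs_processed_total, its value, and the serve helper are hypothetical; a real exporter would normally use an official client library instead.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical counter state: name -> (help text, current value).
COUNTERS = {"demo_jobs_processed_total": ("Jobs processed since start.", 42)}


def render_metrics(counters):
    """Render counters in the Prometheus text exposition format."""
    lines = []
    for name, (help_text, value) in sorted(counters.items()):
        lines.append(f"# HELP {name} {help_text}")
        lines.append(f"# TYPE {name} counter")
        lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"


class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_metrics(COUNTERS).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8000):
    """Block forever, serving /metrics on the given (arbitrary) port."""
    HTTPServer(("0.0.0.0", port), MetricsHandler).serve_forever()
```

Calling serve() and pointing a scrape job at that port's /metrics path would be enough for Prometheus to collect the counter.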

Grafana Dashboards for Core K8s Components

Using the metrics from the exporters above, Grafana can render dashboards for kubelet, apiserver, scheduler, controller‑manager, etc.

All‑in‑One Collector

Exporters can be launched as child processes of a main binary, or Telegraf can be used to aggregate multiple inputs into a single exporter.

Selecting Golden Metrics

Follow Google SRE’s “four golden signals” (latency, traffic, errors, saturation). Use the USE method (Utilization, Saturation, Errors) for resource‑centric metrics and the RED method (Rate, Errors, Duration) for service‑centric metrics.
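As a rough illustration of the RED method, the three signals can be derived from two snapshots of a service's counters. This Python sketch uses hypothetical counter names (requests_total, errors_total, duration_seconds_sum) and ignores counter resets, which PromQL's rate function would handle for you.

```python
def red_metrics(prev, curr, interval_seconds):
    """Derive Rate, Errors, and Duration from two counter snapshots.

    `prev` and `curr` are dicts of monotonically increasing counters taken
    `interval_seconds` apart. Counter resets are not handled in this sketch.
    """
    requests = curr["requests_total"] - prev["requests_total"]
    errors = curr["errors_total"] - prev["errors_total"]
    duration = curr["duration_seconds_sum"] - prev["duration_seconds_sum"]
    return {
        "rate_per_second": requests / interval_seconds,
        "error_ratio": errors / requests if requests else 0.0,
        "avg_duration_seconds": duration / requests if requests else 0.0,
    }


# Made-up snapshots taken 60 seconds apart.
before = {"requests_total": 1000, "errors_total": 10, "duration_seconds_sum": 200.0}
after = {"requests_total": 1600, "errors_total": 16, "duration_seconds_sum": 320.0}
red = red_metrics(before, after, 60)
```

With these numbers the service handled 10 requests per second, 1% of them failed, and the average request took 0.2 s.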

Version Compatibility

At the time of writing, Prometheus 2.16 was the latest stable release; the older 1.x versions are no longer recommended.

Memory and Storage Planning

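Prometheus keeps roughly the most recent 2 hours of samples in an in‑memory head block before compacting them to disk, so memory usage peaks around each compaction. Large query ranges and heavy aggregation add further memory pressure. To mitigate this, reduce the series count, increase the scrape interval, or offload long‑term data to a remote storage solution such as Thanos or VictoriaMetrics.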
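Series count is usually the biggest lever, because each unique combination of label values is a separate series. A quick upper bound for one metric is the product of the number of distinct values per label. The label names and counts in this Python sketch are invented for illustration.

```python
from math import prod


def max_series(label_cardinalities):
    """Upper bound on series for one metric: product of distinct values per label."""
    return prod(label_cardinalities.values())


# Hypothetical histogram metric with four labels.
labels = {"pod": 200, "path": 50, "status": 5, "le": 12}
worst_case = max_series(labels)  # 600000
```

The real count is normally lower, since not every combination occurs, but the estimate shows how quickly a single high‑cardinality label such as path multiplies the total.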

Disk usage can be estimated as ingested samples per second × bytes per sample × retention time in seconds. The average number of bytes per sample can be measured from Prometheus's own metrics:

rate(prometheus_tsdb_compaction_chunk_size_bytes_sum[1h]) / rate(prometheus_tsdb_compaction_chunk_samples_sum[1h])
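The disk estimate is plain arithmetic and can be sketched in Python. The input numbers here are assumptions chosen for illustration (100,000 samples per second, 1.5 bytes per sample, 15 days of retention); substitute values measured on your own server.

```python
def estimate_disk_bytes(samples_per_second, bytes_per_sample, retention_seconds):
    """Estimated TSDB size: ingestion rate x sample size x retention."""
    return samples_per_second * bytes_per_sample * retention_seconds


def samples_per_second(series_count, scrape_interval_seconds):
    """Ingestion rate if every series is scraped once per interval."""
    return series_count / scrape_interval_seconds


# Assumed workload: 1,000,000 series scraped every 10 s, kept for 15 days.
rate = samples_per_second(1_000_000, 10)              # 100000.0 samples/s
size = estimate_disk_bytes(rate, 1.5, 15 * 86_400)    # 194400000000.0 bytes
```

Under these assumptions the server needs about 194 GB of disk. Doubling the scrape interval or halving the series count halves the estimate.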

High‑Availability Solutions

Basic HA: two identical Prometheus instances behind a load balancer.

HA + remote write: replicate data to an external TSDB.

Federation: shard data by function and aggregate with a global node.

Thanos or VictoriaMetrics: deduplicate and query across multiple replicas.
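To make the deduplication idea concrete, the following Python sketch merges one series from two replicas. It is a deliberately simplified illustration and not the algorithm used by Thanos or VictoriaMetrics: it assumes both replicas scrape at the same interval, assigns each sample to an interval‑sized slot, and prefers the primary replica whenever both have a sample for a slot.

```python
def merge_replicas(primary, secondary, interval):
    """Merge (timestamp, value) samples from two replicas of one series.

    Each sample is assigned to a slot of `interval` seconds. The primary
    replica wins when both have data; the secondary fills the gaps.
    """
    slots = {}
    for ts, value in secondary:
        slots[ts // interval] = (ts, value)
    for ts, value in primary:
        slots[ts // interval] = (ts, value)  # primary overrides secondary
    return [slots[key] for key in sorted(slots)]


# Made-up data: the primary missed the scrape around t=30.
replica_a = [(0, 1.0), (15, 2.0), (45, 4.0)]
replica_b = [(1, 1.0), (16, 2.0), (31, 3.0), (46, 4.0)]
merged = merge_replicas(replica_a, replica_b, 15)
```

The merged series contains one sample per interval, with the t=31 sample taken from the second replica to cover the gap in the first.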

Alerting and Operator Wrappers

Alertmanager provides grouping, inhibition, and routing, but many teams build a UI‑driven wrapper to let non‑engineers configure alerts without writing PromQL. Grafana’s experimental alerting can be used for simple cases.
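Grouping is the part of this that most reduces notification noise: alerts that share the same values for a chosen set of labels are bundled into one notification. The Python sketch below illustrates the concept only. It is not Alertmanager's code, and the alert labels shown are hypothetical.

```python
from collections import defaultdict


def group_alerts(alerts, group_by):
    """Bucket alerts (dicts of labels) by the values of the `group_by` labels."""
    groups = defaultdict(list)
    for alert in alerts:
        key = tuple(alert.get(label, "") for label in group_by)
        groups[key].append(alert)
    return dict(groups)


# Hypothetical firing alerts.
firing = [
    {"alertname": "NodeDown", "cluster": "prod", "instance": "node-1"},
    {"alertname": "NodeDown", "cluster": "prod", "instance": "node-2"},
    {"alertname": "NodeDown", "cluster": "staging", "instance": "node-9"},
]
grouped = group_alerts(firing, ["alertname", "cluster"])
```

Here three alerts collapse into two notifications, one per cluster. In Alertmanager the equivalent choice is made with the group_by setting on a route.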

Logging and Events

Log collection is delegated to Fluentd/Fluent‑Bit/Filebeat and stored in Elasticsearch or object storage. Log‑to‑metric conversion can be done with mtail or grok. Kubernetes events should be persisted via tools like kube‑eventer or event‑exporter, optionally exposing them as Prometheus metrics.
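Log‑to‑metric conversion boils down to matching log lines against a pattern and incrementing counters. The Python sketch below shows the idea for access‑log status codes. The log format, the regular expression, and the metric name http_responses_total are assumptions made for illustration; tools like mtail express the same logic in their own rule syntax.

```python
import re
from collections import Counter

# Matches the quoted request followed by a 3-digit status code,
# e.g. "GET /api HTTP/1.1" 200 512
LINE_RE = re.compile(r'"\S+ \S+ \S+" (?P<status>\d{3}) ')


def count_status_codes(lines):
    """Count log lines per HTTP status code; unmatched lines are skipped."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.search(line)
        if match:
            counts[match.group("status")] += 1
    return counts


def render(counts):
    """Render the counts as Prometheus text-format sample lines."""
    return "".join(
        f'http_responses_total{{status="{status}"}} {n}\n'
        for status, n in sorted(counts.items())
    )


# Made-up access log lines.
sample_log = [
    '10.0.0.1 - - [01/Jan/2024:00:00:00 +0000] "GET /api HTTP/1.1" 200 512',
    '10.0.0.2 - - [01/Jan/2024:00:00:01 +0000] "GET /api HTTP/1.1" 200 498',
    '10.0.0.3 - - [01/Jan/2024:00:00:02 +0000] "POST /pay HTTP/1.1" 500 87',
    "unparseable line",
]
status_counts = count_status_codes(sample_log)
```

Calling render(status_counts) yields one sample line per status code, ready to be served from a /metrics endpoint.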

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.

Written by

Programmer DD

A tinkering programmer and author of "Spring Cloud Microservices in Action"
