Mastering Prometheus on Kubernetes: Practical Tips, Exporter Guide, and Capacity Planning
This article explores the history and principles of Prometheus monitoring, offers guidance on version selection, highlights its limitations, details common Kubernetes exporters, shows Grafana dashboard setups, and provides in‑depth strategies for exporter aggregation, golden metrics, multi‑cluster scraping, GPU monitoring, timezone handling, memory optimization, capacity planning, and rate calculations.
Prometheus, a modern open‑source monitoring system, has become the de‑facto standard in cloud‑native environments, offering a mature solution for infrastructure observability.
Key principles include treating monitoring as infrastructure, emitting only actionable alerts, and keeping the architecture simple to avoid single points of failure.
Version selection
Use the latest stable release (e.g., 2.16) and avoid older 1.x versions; the experimental UI in 2.16 provides TSDB status and top labels/metrics.
Prometheus limitations
Metric‑based monitoring does not cover logs, events, or tracing.
It uses a pull model; plan network topology accordingly.
For clustering and scaling, choose between Federate, Cortex, Thanos, etc.
Prioritize availability over consistency; occasional data loss is acceptable.
Statistical functions (rate, histogram_quantile) can produce unintuitive results, and long‑range queries may lose precision.
Common exporters in Kubernetes
cAdvisor (built into kubelet)
kubelet (ports 10255/10250)
apiserver (port 6443)
scheduler (port 10251)
controller‑manager (port 10252)
etcd
docker (experimental metrics‑addr)
kube‑proxy (default 127.0.0.1:10249)
kube‑state‑metrics
node‑exporter
blackbox_exporter
process‑exporter
nvidia exporter (GPU metrics)
node‑problem‑detector (NPD)
application exporters (mysql, nginx, mq, etc.)
Custom exporters can be created to fill gaps, though managing many exporters adds operational overhead.
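Where no community exporter exists, a custom exporter is just an HTTP endpoint serving the Prometheus text exposition format. A minimal stdlib-only Python sketch (the metric name `app_queue_depth` and port 9402 are made up for illustration):

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

def render_metrics(metrics):
    """Render {name: (help_text, value)} in the Prometheus text format."""
    lines = []
    for name, (help_text, value) in metrics.items():
        lines.append(f"# HELP {name} {help_text}")
        lines.append(f"# TYPE {name} gauge")
        lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"

class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/metrics":
            self.send_response(404)
            self.end_headers()
            return
        # app_queue_depth is a hypothetical example metric
        body = render_metrics({"app_queue_depth": ("Current queue depth.", 42)})
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4")
        self.end_headers()
        self.wfile.write(body.encode())

# To serve for real: HTTPServer(("", 9402), MetricsHandler).serve_forever()
print(render_metrics({"app_queue_depth": ("Current queue depth.", 42)}))
```

In practice the official client libraries (prometheus_client and friends) handle registries, label escaping, and metric types for you; the sketch only shows how little the wire format demands.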
Kubernetes core component monitoring and Grafana panels
Metrics from the above exporters can be visualized in Grafana dashboards for components such as kubelet, apiserver, and others.
All‑in‑one collection approaches
Launch multiple exporter processes from a single main process, keeping them up‑to‑date with community releases.
Use Telegraf to aggregate various inputs into a single collector.
Node‑exporter does not monitor individual processes; process‑exporter or Telegraf's procstat input can fill this gap.
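As a sketch, a Telegraf configuration pairing the procstat input with a Prometheus-scrapable output might look like this (the nginx pattern and listen port are placeholders, not recommendations):

```toml
[[inputs.procstat]]
  # Match processes to watch by a name pattern (regular expression)
  pattern = "nginx"

[[outputs.prometheus_client]]
  # Expose the collected metrics for Prometheus to scrape
  listen = ":9273"
```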
Selecting golden metrics
Follow Google SRE’s four golden signals—latency, traffic, errors, saturation—and apply the USE (Utilization, Saturation, Errors) or RED (Rate, Errors, Duration) method depending on service type: USE suits resources such as nodes and disks, RED suits request-driven services.
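For a request-driven HTTP service, the RED signals map naturally onto PromQL. This assumes conventional metric names like `http_requests_total` (a counter with a `code` label) and an `http_request_duration_seconds` histogram; adjust to your own instrumentation:

```promql
# Rate: requests per second over the last 5 minutes
sum(rate(http_requests_total[5m]))

# Errors: share of 5xx responses
sum(rate(http_requests_total{code=~"5.."}[5m]))
  / sum(rate(http_requests_total[5m]))

# Duration: 99th-percentile latency from the histogram buckets
histogram_quantile(0.99,
  sum by (le) (rate(http_request_duration_seconds_bucket[5m])))
```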
cAdvisor label compatibility in Kubernetes 1.16
In Kubernetes 1.16, cAdvisor renamed the pod_name and container_name labels to pod and container. The old labels can be re‑added via metric_relabel_configs to keep existing queries and dashboards working.
metric_relabel_configs:
  - source_labels: [container]
    regex: (.+)
    target_label: container_name
    replacement: $1
    action: replace
  - source_labels: [pod]
    regex: (.+)
    target_label: pod_name
    replacement: $1
    action: replace
Scraping external or multi‑cluster Kubernetes
When Prometheus runs outside a cluster, configure kubernetes_sd_configs with the appropriate api_server, bearer_token_file, and TLS settings. Use __metrics_path__ rewrites to proxy through the apiserver or to reach kubelet ports directly.
- job_name: cluster-cadvisor
  honor_timestamps: true
  scrape_interval: 30s
  scheme: https
  kubernetes_sd_configs:
    - api_server: https://xx:6443
      role: node
      bearer_token_file: token/cluster.token
      tls_config:
        insecure_skip_verify: true
  relabel_configs:
    - source_labels: [__meta_kubernetes_node_name]
      target_label: __metrics_path__
      replacement: /api/v1/nodes/${1}/proxy/metrics/cadvisor
      action: replace
  metric_relabel_configs:
    - source_labels: [container]
      target_label: container_name
      replacement: $1
      action: replace
    - source_labels: [pod]
      target_label: pod_name
      replacement: $1
      action: replace
GPU metrics
cAdvisor exposes GPU metrics such as container_accelerator_duty_cycle, container_accelerator_memory_total_bytes, and container_accelerator_memory_used_bytes. For richer data, install the DCGM exporter (requires Kubernetes 1.13+).
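A quick sanity query against the cAdvisor accelerator metrics above might look like this (the pod name pattern is purely illustrative):

```promql
# Per-container GPU utilization in percent, for pods matching a name pattern
container_accelerator_duty_cycle{pod_name=~"train-.*"}
```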
Changing Prometheus display timezone
Prometheus stores timestamps in UTC and does not support timezone configuration; visualization tools like Grafana handle timezone conversion, and the newer Web UI (2.16) offers a local timezone option.
Collecting metrics behind a LoadBalancer
Use sidecar proxies on the backend services or configure the LB to forward specific paths to each backend, allowing Prometheus to scrape the underlying pods.
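If the backend addresses are known, one option is a static scrape job that bypasses the LB entirely; a sketch with placeholder addresses and port:

```yaml
# Scrape the backends directly instead of the LB VIP (hypothetical targets)
- job_name: backend-direct
  metrics_path: /metrics
  static_configs:
    - targets: ['10.0.0.11:8080', '10.0.0.12:8080']
```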
Prometheus memory consumption
Memory usage spikes during the 2‑hour block compaction phase and with large queries (e.g., wide‑range rate or heavy group aggregations). Mitigation strategies include sharding, reducing series count, evaluating high‑cost metrics, limiting query ranges, and avoiding expensive aggregations.
Capacity planning
Estimate disk usage as retention_time_seconds × ingested_samples_per_second × bytes_per_sample. Reduce series count or increase scrape intervals to lower storage needs. For remote‑write or Thanos setups, local disk can be minimal.
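The formula above is easy to turn into a quick calculation. This sketch assumes the commonly cited figure of roughly 1–2 bytes per sample after TSDB compression, using 2 as a conservative planning value; the cluster sizes are hypothetical:

```python
def disk_bytes(retention_seconds, samples_per_second, bytes_per_sample=2):
    # retention_time_seconds * ingested_samples_per_second * bytes_per_sample;
    # 2 bytes/sample is a conservative figure for the compressed TSDB.
    return retention_seconds * samples_per_second * bytes_per_sample

# Hypothetical cluster: 15 days of retention, 100k ingested samples/s
estimate = disk_bytes(15 * 24 * 3600, 100_000)
print(f"~{estimate / 1e9:.0f} GB")  # ~259 GB
```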
Apiserver performance impact
When using kubernetes_sd_configs, Prometheus discovery requests pass through the apiserver; in large clusters this can add noticeable apiserver load, so consider scraping nodes directly once targets are discovered.
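One way to keep the apiserver out of the scrape path is to use it for discovery only and rewrite the target address to each node's kubelet port. A sketch, assuming port 10250 is reachable from Prometheus:

```yaml
relabel_configs:
  # Discovery still goes through the apiserver, but scrapes hit the kubelet
  - source_labels: [__meta_kubernetes_node_address_InternalIP]
    target_label: __address__
    replacement: ${1}:10250
```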
Rate calculation logic
Counters are designed for rate-style functions; rate automatically handles counter resets. Use a range vector at least four times the scrape interval to ensure robustness against missing samples.
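For example, with a 30s scrape interval the range vector should be at least 2m:

```promql
# 2m range = 4 × 30s scrape interval; tolerates an occasional missed scrape
rate(http_requests_total[2m])
```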
Author: Xu Yason
Efficient Ops
This public account is maintained by Xiaotianguo and friends and regularly publishes original technical articles. We focus on operations transformation and hope to accompany you throughout your operations career.