Master Prometheus: Practical Tips, Exporter Strategies, and Scaling Challenges
This comprehensive guide explores Prometheus monitoring fundamentals, key design principles, exporter selection for Kubernetes, advanced configuration tricks, capacity planning, high‑cardinality pitfalls, HA architectures, and integration with Grafana, Alertmanager, and Thanos to help you build reliable cloud‑native observability pipelines.
Prometheus Overview and History
Monitoring has a long history. Prometheus, a newer-generation open-source monitoring system, has become the de facto standard in the cloud-native ecosystem, largely because its design has been so widely embraced.
Key Principles
Monitoring is infrastructure: collect only the metrics you need, because unnecessary collection wastes engineering time and storage (commercial B2B products are an exception).
Only emit alerts that need to be handled, and every alert must be acted upon.
Keep the architecture simple; monitoring must not fail even if the business system is down. Avoid "magic" systems like AI‑driven alerts unless truly needed.
Prometheus Limitations
Metric‑based only – not suitable for logs, events, or tracing.
Default pull model; plan network topology to avoid unnecessary forwarding.
There is no silver bullet for clustering and horizontal scaling; choose among federation, Cortex, Thanos, and similar options based on your needs.
Monitoring systems generally favor availability over strict consistency, tolerating some data loss to ensure query success.
Data accuracy is not guaranteed: functions such as rate and histogram_quantile estimate their results (rate extrapolates to the window boundaries, histogram_quantile interpolates within a bucket), and long-range queries rely on down-sampling, which further reduces precision.
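To make the last point concrete, here is a minimal Python sketch of the bucket interpolation that histogram_quantile relies on. It is a simplified illustration, not Prometheus's actual implementation: the function name, the input layout (a sorted list of cumulative buckets ending in +Inf, with at least one finite bucket), and the sample numbers are assumptions made for this example.

```python
import math


def histogram_quantile(q, buckets):
    """Estimate the q-quantile from cumulative histogram buckets.

    buckets: list of (upper_bound, cumulative_count), sorted by upper_bound,
    with math.inf as the last upper bound (hypothetical input layout).
    """
    if not 0 < q <= 1:
        raise ValueError("q must be in (0, 1]")
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total  # position of the requested observation
    prev_bound, prev_count = 0.0, 0.0
    for bound, count in buckets:
        if count >= rank:
            if math.isinf(bound):
                # The quantile falls in the +Inf bucket: the best available
                # answer is the highest finite upper bound.
                return buckets[-2][0]
            # Assume observations are spread evenly inside the bucket.
            fraction = (rank - prev_count) / (count - prev_count)
            return prev_bound + (bound - prev_bound) * fraction
        prev_bound, prev_count = bound, count
    return math.nan


# Request latencies in seconds: 50 requests <= 0.1s, 90 <= 0.5s, 100 <= 1.0s.
latency_buckets = [(0.1, 50), (0.5, 90), (1.0, 100), (math.inf, 100)]
print(histogram_quantile(0.95, latency_buckets))
```

With these buckets the estimate for the 95th percentile is roughly 0.75, yet the true value could lie anywhere between 0.5 and 1.0. The result is only as precise as the bucket boundaries allow.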
Kubernetes Exporters
As a CNCF project, Prometheus has a much richer exporter ecosystem than traditional agent-based monitoring systems such as Zabbix.
cAdvisor – integrated into the kubelet.
kubelet – ports 10255 (unauthenticated) and 10250 (authenticated).
apiserver – port 6443, monitor request count, latency, etc.
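scheduler – port 10251 (the insecure port in older releases; newer releases serve metrics on secure port 10259).
controller‑manager – port 10252 (the insecure port in older releases; newer releases use secure port 10257).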
etcd – monitor write/read latency, storage capacity.
docker – enable the experimental feature and set metrics-addr to expose metrics such as container creation time.
kube‑proxy – default 127.0.0.1:10249, can expose to 0.0.0.0 for external scraping.
kube‑state‑metrics – official project for pod, deployment metadata.
node‑exporter – collects host metrics such as CPU, memory, disk.
blackbox_exporter – network probing (DNS, ping, HTTP).
process‑exporter – process‑level metrics.
nvidia exporter – GPU metrics (requires K8s 1.13+).
node‑problem‑detector – detects node problems and reports them so that unhealthy nodes can be tainted.
Application exporters – MySQL, Nginx, MQ, etc., based on business needs.
Custom exporters can be written to meet specific requirements.
All‑in‑One Collection Component
Each exporter runs independently, which increases operational overhead when many exporters need to be maintained and upgraded. Two approaches reduce this complexity:
Launch a main process that spawns multiple exporter processes, keeping them up‑to‑date with community releases.
Use Telegraf to aggregate various inputs into a single process (N‑in‑1).
Node‑exporter lacks process monitoring; adding a Process‑Exporter or using Telegraf with the procstat input solves this.
Choosing Golden Metrics
Google's SRE book defines four golden signals: latency, traffic, errors, and saturation. In practice, apply the USE method (Utilization, Saturation, Errors) to resource-centric services and the RED method (Rate, Errors, Duration) to request-centric services.
K8s 1.16 cAdvisor Compatibility
In K8s 1.16, cAdvisor dropped the pod_name and container_name labels and replaced them with pod and container. Either update your queries and dashboards to the new names, or use metric_relabel_configs to copy the new labels back to the original names.
Prometheus Collecting External or Multi‑Cluster K8s
When Prometheus runs outside the cluster, you must supply credentials (a bearer token and, where needed, TLS certificates) and rewrite the target address, typically routing scrapes through the apiserver proxy. The key fields of such a job for cAdvisor collection are:
bearer_token_file: /path/to/token
__metrics_path__: /api/v1/nodes/${1}/proxy/metrics/cadvisor
${1}:10255/metrics/cadvisor
For endpoint-type services whose exporter exposes metrics at /metrics, set __metrics_path__ to /api/v1/namespaces/${1}/services/${2}:${3}/proxy/metrics.
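How the ${1} placeholder in __metrics_path__ gets filled in can be sketched in Python. This is a simplified model of a Prometheus replace relabel action, not the real implementation. The function name and the node name node-a are made up for illustration, and __meta_kubernetes_node_name is assumed to be the source label, as it is with node-role Kubernetes service discovery.

```python
import re


def relabel_replace(labels, source_labels, regex, target_label,
                    replacement, separator=";"):
    """Apply one simplified `replace` relabel action to a label dict."""
    # Join the source label values, as Prometheus does before matching.
    value = separator.join(labels.get(name, "") for name in source_labels)
    # Relabel regexes are anchored, so the whole value must match.
    match = re.fullmatch(regex, value)
    if match is None:
        return dict(labels)  # no match: labels are left unchanged
    # Translate ${1}-style references into Python's \g<1> syntax.
    template = re.sub(r"\$\{(\w+)\}", r"\\g<\1>", replacement)
    result = dict(labels)
    result[target_label] = match.expand(template)
    return result


# A discovered node target (hypothetical values).
target = {
    "__address__": "10.0.0.5:10250",
    "__meta_kubernetes_node_name": "node-a",
}
target = relabel_replace(
    target,
    source_labels=["__meta_kubernetes_node_name"],
    regex="(.+)",
    target_label="__metrics_path__",
    replacement="/api/v1/nodes/${1}/proxy/metrics/cadvisor",
)
print(target["__metrics_path__"])
```

The same mechanism would rewrite __address__ to the apiserver address, so that every node is scraped through the proxy path rather than directly.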
Version Selection
At the time of writing, Prometheus 2.16 was the latest stable release; the older 1.x versions are no longer recommended. Version 2.16 ships an experimental UI that supports local time zones.
Memory Consumption and Capacity Planning
Prometheus memory usage grows with the ingestion rate because recent data is held in memory in two-hour blocks before being flushed to disk. Large queries, such as broad group-by aggregations or rate over wide ranges, also add memory pressure.
Use the calculator from Robust Perception to estimate required RAM based on series count, scrape interval, and retention.
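The calculator covers memory. For the ingestion and disk side of capacity planning, a rough back-of-the-envelope estimate can be sketched as follows. It assumes the rule of thumb from the Prometheus storage documentation (retention time × samples per second × bytes per sample, at roughly 1 to 2 bytes per sample); the function names and the example numbers are hypothetical, and the sketch deliberately does not try to reproduce the memory calculator.

```python
def samples_per_second(active_series, scrape_interval_s):
    """Ingestion rate if every series is scraped once per interval."""
    return active_series / scrape_interval_s


def disk_bytes(active_series, scrape_interval_s, retention_days,
               bytes_per_sample=2.0):
    """Rough disk estimate: retention * ingestion rate * bytes per sample.

    bytes_per_sample=2.0 is an assumed upper-end value, not a measurement.
    """
    retention_s = retention_days * 24 * 3600
    rate = samples_per_second(active_series, scrape_interval_s)
    return retention_s * rate * bytes_per_sample


# Hypothetical deployment: 1.5 million series, 15s interval, 15 days retention.
rate = samples_per_second(1_500_000, 15)
size_gb = disk_bytes(1_500_000, 15, 15) / 1e9
print(f"{rate:.0f} samples/s, about {size_gb:.0f} GB on disk")
```

Treat the result as an order-of-magnitude figure; actual usage depends on compression, churn, and WAL size.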
Optimization strategies include:
Shard when series exceed 2 million, using Thanos, Cortex, or VictoriaMetrics for aggregation.
Remove unused metrics and labels; use the TSDB status page (2.14+) to identify the heaviest series.
Avoid large time‑range queries; keep step proportional to range.
Prefer label‑based filtering over joins; add necessary labels via relabel_configs instead of runtime joins.
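One way to keep the step proportional to the range, as recommended above, is to derive it from a target number of points per series. The sketch below is only an illustration: the function name and the defaults of 1,000 points and a 15-second minimum step are assumptions, not Prometheus settings.

```python
import math


def query_step_seconds(range_seconds, max_points=1000, min_step=15):
    """Pick a step so a range query returns at most max_points per series."""
    step = math.ceil(range_seconds / max_points)
    # Never go below the minimum step (for example, the scrape interval).
    return max(step, min_step)


one_hour = 3600
thirty_days = 30 * 24 * 3600
print(query_step_seconds(one_hour))     # 15
print(query_step_seconds(thirty_days))  # 2592
```

A 30-day query then returns about 1,000 points per series instead of the roughly 172,800 it would return at a fixed 15-second step.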
High Cardinality Issues
Labels with unbounded values (e.g., client IP) cause high cardinality, inflating storage and query cost. Use logs for such data instead of metrics.
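The effect of an unbounded label is easy to see with a little arithmetic: the worst-case number of series for one metric is the product of the distinct values of each label. A minimal sketch, with made-up label names and counts:

```python
from math import prod


def worst_case_series(label_value_counts):
    """Upper bound on series for one metric: product of label value counts."""
    return prod(label_value_counts.values())


bounded = {"method": 5, "status": 10, "instance": 20}
print(worst_case_series(bounded))  # 1000

# Adding a client IP label with 50,000 distinct values (hypothetical).
unbounded = {**bounded, "client_ip": 50_000}
print(worst_case_series(unbounded))  # 50000000
```

Real series counts are usually lower than this bound because not every combination occurs, but a single unbounded label still multiplies everything else.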
Prometheus Restart and Hot Reload
On restart, Prometheus replays the WAL into memory, so a larger WAL means a longer restart. Start Prometheus with --web.enable-lifecycle to allow hot reloads through an HTTP POST to the /-/reload endpoint, or use Prometheus Operator, which triggers reloads automatically.
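Triggering a hot reload from a script might look like the following sketch, which uses only the Python standard library. The helper names and the base URL http://localhost:9090 are assumptions for illustration, and the call only works when the lifecycle endpoints are enabled.

```python
import urllib.request


def build_reload_request(base_url):
    """Build the POST request for the /-/reload lifecycle endpoint."""
    url = base_url.rstrip("/") + "/-/reload"
    return urllib.request.Request(url, data=b"", method="POST")


def reload_prometheus(base_url, timeout=10):
    """Ask Prometheus to reload its configuration; True on HTTP 200.

    urlopen raises urllib.error.HTTPError for non-2xx responses, for
    example when the new configuration is invalid.
    """
    request = build_reload_request(base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status == 200


if __name__ == "__main__":
    # Assumed address of a local Prometheus instance.
    print(reload_prometheus("http://localhost:9090"))
```

A failed reload leaves the previous configuration in place, so it is worth checking the result rather than firing the request blindly.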
How Many Metrics Should an Application Expose?
A simple service typically exposes on the order of 120 metrics; even large services should stay below 10,000, with label cardinality carefully controlled.
Node‑Exporter Issues
Does not monitor processes; add Process‑Exporter or use Telegraf.
Only supports Unix-like systems; use wmi_exporter (since renamed windows_exporter) for Windows.
Prefer 0.16/0.17 or later, where metric names were changed to follow the naming conventions.
kube‑state‑metrics Issues
Combine it with cAdvisor metrics to enrich container data with pod metadata. It does not expose pod annotations because of high-cardinality concerns.
Relabel Configs vs Metric Relabel Configs
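relabel_configs is applied to targets before scraping; metric_relabel_configs is applied to the scraped samples before they are stored. Use the former to manipulate target labels and the latter to manipulate or drop metric labels and series.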
Prometheus Prediction Capabilities
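Use deriv to compute a gauge's rate of change and predict_linear to forecast its future value, e.g., to anticipate disk space depletion or memory pressure. Both fit a straight line to the samples in the range using simple linear regression.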
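The idea behind both functions can be sketched in a few lines of Python: fit a least-squares line through the samples, report its slope (deriv), or extend it into the future (predict_linear). This is a simplified illustration under assumed inputs, not Prometheus's implementation; the function signatures and the disk-space numbers are made up for the example.

```python
def linear_fit(samples, eval_time):
    """Least-squares line through (timestamp, value) samples.

    Timestamps are shifted so that x = 0 is the evaluation time.
    Needs at least two samples with distinct timestamps.
    """
    xs = [t - eval_time for t, _ in samples]
    ys = [v for _, v in samples]
    n = len(samples)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x  # fitted value at eval_time
    return slope, intercept


def deriv(samples, eval_time):
    """Per-second rate of change of a gauge."""
    return linear_fit(samples, eval_time)[0]


def predict_linear(samples, eval_time, seconds_ahead):
    """Fitted value seconds_ahead after the evaluation time."""
    slope, intercept = linear_fit(samples, eval_time)
    return intercept + slope * seconds_ahead


# Free disk space in GB, sampled once a minute (hypothetical values).
free_gb = [(0, 100.0), (60, 94.0), (120, 88.0)]
print(deriv(free_gb, eval_time=120))
print(predict_linear(free_gb, eval_time=120, seconds_ahead=600))
```

Here the disk loses about 0.1 GB per second, so the forecast ten minutes out is about 28 GB. An alert would typically fire when such a forecast drops below zero within the chosen horizon.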
Alertmanager Wrappers
Alertmanager itself rarely changes after deployment, but alerting rules change frequently. Build a UI layer that hides PromQL and YAML, integrates with existing notification channels through webhooks, and adds authentication and rate limiting.
Common HA Design Mistakes
Pushing metrics through a queue such as Kafka before they reach Prometheus adds latency, loses service discovery, and introduces a single point of failure. Prefer native pull-based scraping or sidecar exporters.
Prometheus‑Operator Scenarios
Prometheus Operator simplifies configuration through CRDs and provides ready-made Grafana dashboards, but it hides details that matter for troubleshooting. It also cannot be used for out-of-cluster deployments.
HA Solutions
Basic HA: two identical Prometheus instances behind a load balancer.
HA + Remote‑Write: write to a remote TSDB for durability.
Federation: shard data by function, aggregate with a global node.
Thanos/VictoriaMetrics: provide global query, deduplication, and multi‑region storage.
Container Logs and Events
Collect logs with Fluentd, Fluent Bit, or Filebeat and ship them to Elasticsearch, object storage, or Kafka, deploying the collector as a sidecar or a DaemonSet. For Kubernetes Events, use kube-eventer to push them to Elasticsearch, or event_exporter to expose them as Prometheus metrics.
Programmer DD
A tinkering programmer and author of "Spring Cloud Microservices in Action"
