Comprehensive Guide to Prometheus: Metrics Collection, Storage, Querying, Alerting and Visualization

This article provides a detailed overview of Prometheus, covering its architecture, metric exposure, scraping models, storage format, metric types, custom exporter implementation in Go, PromQL query language, built‑in functions, Grafana integration, and alerting with Alertmanager, offering practical code examples throughout.

IT Architects Alliance

Jun 27, 2022

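Prometheus is an open‑source monitoring system that collects and stores metrics from services as time series and offers basic built‑in visualization. It is primarily pull‑based: the server scrapes metrics over HTTP from each target, while short‑lived jobs that cannot be scraped can push their metrics through the Pushgateway.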

The system organizes monitored services as Jobs and their instances as Targets, which can be registered statically in the scrape_configs section of the Prometheus YAML file or discovered dynamically via service discovery mechanisms such as Consul, DNS, or Kubernetes.

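Metrics are stored as time series in Prometheus's built‑in TSDB and queried with PromQL. PromQL expressions evaluate to instant vectors or range vectors; functions such as rate and irate operate on range vectors, while aggregation operators such as sum accept by and without clauses to control which labels are kept. Quantiles can be estimated from histograms with histogram_quantile, whereas summaries expose quantiles that were already computed on the client.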
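To make the idea behind rate more concrete, here is a minimal Go sketch of a per‑second rate computed over the samples of one counter inside a range. It is a simplified illustration, not Prometheus's implementation: the sample type and the simpleRate function are hypothetical names, and the sketch skips the extrapolation to the window edges that the real rate function performs. It does show the key behaviour that a drop in a counter's value is treated as a reset.

```go
package main

import "fmt"

// sample is one scraped counter value at a timestamp in seconds.
type sample struct {
	ts    float64
	value float64
}

// simpleRate returns the average per-second increase across the samples.
// A drop in value is treated as a counter reset, so the whole new value
// counts as increase. Unlike PromQL's rate(), nothing is extrapolated.
func simpleRate(samples []sample) (float64, bool) {
	if len(samples) < 2 {
		return 0, false // at least two points are needed
	}
	increase := 0.0
	for i := 1; i < len(samples); i++ {
		delta := samples[i].value - samples[i-1].value
		if delta < 0 {
			delta = samples[i].value // counter restarted from zero
		}
		increase += delta
	}
	elapsed := samples[len(samples)-1].ts - samples[0].ts
	if elapsed <= 0 {
		return 0, false
	}
	return increase / elapsed, true
}

func main() {
	// The counter resets between t=10 and t=20 (50 -> 5).
	window := []sample{{0, 0}, {10, 50}, {20, 5}, {30, 25}}
	r, ok := simpleRate(window)
	fmt.Println(r, ok) // prints: 2.5 true
}
```

The increases are 50, 5 (after the reset) and 20, so 75 over 30 seconds gives 2.5 per second. A naive "last minus first" calculation would report 25/30 instead, which is why reset handling matters for counters.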

Prometheus provides an official client library for Go (client_golang); a basic exporter that serves the default registry's metrics can be created with the following code:

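package main

import (
    "log"
    "net/http"

    "github.com/prometheus/client_golang/prometheus/promhttp"
)

func main() {
    // Serve the default registry's metrics at /metrics.
    http.Handle("/metrics", promhttp.Handler())
    // ListenAndServe only returns on failure, so log the error and exit.
    log.Fatal(http.ListenAndServe(":8080", nil))
}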

Custom metrics such as counters, gauges, histograms, and summaries are defined with prometheus.NewCounter, prometheus.NewGauge, prometheus.NewHistogram, and prometheus.NewSummary respectively, then registered with prometheus.MustRegister. Labels can be added using NewCounterVec and With(prometheus.Labels{...}).

Prometheus configuration can be reloaded without restarting by starting the server with --web.enable-lifecycle and sending a POST request to http://localhost:9090/-/reload:

curl -v --request POST 'http://localhost:9090/-/reload'

Alerting is handled by Alertmanager, which groups, routes, and silences the alerts that Prometheus fires from its rule files. An example alert rule fires when all instances of a job have been down for one minute:

groups:
- name: simulator-alert-rule
  rules:
  - alert: HttpSimulatorDown
    expr: sum(up{job="http_srv"}) == 0
    for: 1m
    labels:
      severity: critical

Grafana can be used for richer visualizations by adding Prometheus as a data source, creating dashboards, and writing PromQL queries in the panel editor.

The article also discusses common pitfalls such as inappropriate histogram bucket definitions that lead to inaccurate quantile estimations, and provides guidance on choosing sensible bucket boundaries.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
