
Mastering Prometheus: From Metrics Collection to Alerting and Visualization

This comprehensive guide explains Prometheus' architecture, metric collection models, storage format, query language (PromQL), alerting workflow, configuration reload methods, metric types, custom exporters, and how to visualise data with Grafana, providing a complete end‑to‑end monitoring solution.

Efficient Ops

Mar 3, 2024

Introduction

Prometheus, named after the Greek titan who foresaw the future, is an open‑source monitoring system that collects, stores and visualises metrics to give insight into system health.

Overall Ecosystem

Prometheus provides a full stack covering metric exposition, scraping, storage, visualisation and alerting. Each monitored service is configured as a job with one or more targets (instances). Official client libraries let you expose custom metrics from your own code, and exporters exist for common components such as MySQL or Consul.

Short‑lived jobs, such as batch scripts that may exit before Prometheus gets a chance to scrape them, can push metrics to the Pushgateway, which Prometheus then scrapes like any other target.
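The Pushgateway accepts the same text exposition format that Prometheus scrapes, sent over plain HTTP to a path of the form /metrics/job/<job_name>. The sketch below uses only the Go standard library. A local httptest server stands in for a real Pushgateway (which listens on port 9091 by default), and the job name batch_backup and the metric backup_last_success_timestamp_seconds are hypothetical.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"net/url"
	"strings"
)

// pushMetrics POSTs a text-format payload to a Pushgateway-style endpoint.
// The gateway groups pushed series under /metrics/job/<job>; POST replaces
// only the metric names contained in the payload (PUT would replace the group).
func pushMetrics(gatewayURL, job, payload string) error {
	endpoint := gatewayURL + "/metrics/job/" + url.PathEscape(job)
	resp, err := http.Post(endpoint, "text/plain; version=0.0.4", strings.NewReader(payload))
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode/100 != 2 {
		return fmt.Errorf("push to %s failed: %s", endpoint, resp.Status)
	}
	return nil
}

// received is what the fake gateway saw for one request.
type received struct {
	Method, Path, Body string
}

// newFakeGateway starts a local HTTP server standing in for a Pushgateway
// and forwards each request it receives to the returned channel.
func newFakeGateway() (*httptest.Server, <-chan received) {
	ch := make(chan received, 8)
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		body, _ := io.ReadAll(r.Body)
		ch <- received{r.Method, r.URL.Path, string(body)}
		w.WriteHeader(http.StatusOK)
	}))
	return srv, ch
}

func main() {
	srv, got := newFakeGateway()
	defer srv.Close()

	// Hypothetical metric a nightly backup script might report before exiting.
	payload := "# TYPE backup_last_success_timestamp_seconds gauge\n" +
		"backup_last_success_timestamp_seconds 1709424000\n"
	if err := pushMetrics(srv.URL, "batch_backup", payload); err != nil {
		panic(err)
	}
	r := <-got
	fmt.Println(r.Method, r.Path)
}
```

Pushed series stay in the gateway until they are deleted or overwritten. This is why the Pushgateway suits "last run" style metrics rather than live process state.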

Metric Scraping Models

Pull model: Prometheus scrapes each target's HTTP metrics endpoint (/metrics by default) at a regular interval set by scrape_interval, which defaults to 1 minute.

Push model: monitored services push metrics to the Pushgateway, and Prometheus still pulls from the gateway.
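To make the pull model concrete, here is a minimal scraper sketch in Go. It uses only the standard library and is not how Prometheus itself is implemented. It fetches a target's metrics endpoint on a fixed interval and parses "series value" lines. For simplicity it assumes samples carry no explicit timestamps. The demo target and its http_requests_total series are hypothetical.

```go
package main

import (
	"bufio"
	"fmt"
	"net/http"
	"net/http/httptest"
	"strconv"
	"strings"
	"time"
)

// scrapeOnce performs a single pull: GET the target's metrics endpoint and
// parse "series value" lines of the text exposition format. Simplified:
// comment lines are skipped and samples are assumed to carry no timestamp.
func scrapeOnce(target string) (map[string]float64, error) {
	resp, err := http.Get(target)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()

	samples := make(map[string]float64)
	sc := bufio.NewScanner(resp.Body)
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" || strings.HasPrefix(line, "#") {
			continue
		}
		i := strings.LastIndex(line, " ")
		if i < 0 {
			return nil, fmt.Errorf("malformed sample line %q", line)
		}
		v, err := strconv.ParseFloat(line[i+1:], 64)
		if err != nil {
			return nil, fmt.Errorf("malformed sample line %q: %w", line, err)
		}
		samples[line[:i]] = v
	}
	return samples, sc.Err()
}

// scrapeLoop pulls the target every interval, the way scrape_interval drives Prometheus.
func scrapeLoop(target string, interval time.Duration, rounds int) error {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for i := 1; i <= rounds; i++ {
		samples, err := scrapeOnce(target)
		if err != nil {
			return err
		}
		fmt.Printf("scrape %d: %d samples\n", i, len(samples))
		if i < rounds {
			<-ticker.C
		}
	}
	return nil
}

// demoTarget is a local stand-in for an instrumented service.
func demoTarget() *httptest.Server {
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprint(w, "# TYPE http_requests_total counter\n"+
			"http_requests_total{code=\"200\"} 1027\n"+
			"http_requests_total{code=\"500\"} 3\n")
	}))
}

func main() {
	srv := demoTarget()
	defer srv.Close()
	// A short interval keeps the demo fast; real scrape intervals are seconds to minutes.
	if err := scrapeLoop(srv.URL+"/metrics", 50*time.Millisecond, 3); err != nil {
		panic(err)
	}
}
```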

Metric Storage and Query

Scraped metrics are stored in Prometheus' built‑in time‑series database. Queries are performed with PromQL, either via the built‑in Web UI or third‑party tools such as Grafana.
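Third‑party tools such as Grafana reach PromQL through Prometheus' HTTP API. For example, an instant query is a GET to /api/v1/query with a query parameter. The Go sketch below sends such a request and decodes the JSON envelope (status, data.resultType, data.result). To keep the example self‑contained, a fake server returns a canned response in that shape instead of a live Prometheus.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
	"net/url"
)

// queryResponse mirrors the parts of the /api/v1/query response used here.
type queryResponse struct {
	Status string `json:"status"`
	Data   struct {
		ResultType string `json:"resultType"`
		Result     []struct {
			Metric map[string]string `json:"metric"`
			Value  [2]interface{}    `json:"value"` // [unix_time, "sample value as string"]
		} `json:"result"`
	} `json:"data"`
}

// instantQuery runs a PromQL instant query against baseURL (e.g. http://localhost:9090).
func instantQuery(baseURL, promql string) (*queryResponse, error) {
	resp, err := http.Get(baseURL + "/api/v1/query?" + url.Values{"query": {promql}}.Encode())
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	var qr queryResponse
	if err := json.NewDecoder(resp.Body).Decode(&qr); err != nil {
		return nil, err
	}
	if qr.Status != "success" {
		return nil, fmt.Errorf("query failed with status %q", qr.Status)
	}
	return &qr, nil
}

// fakePrometheus answers one known query with a canned vector result.
func fakePrometheus() *httptest.Server {
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.URL.Path != "/api/v1/query" || r.URL.Query().Get("query") != `up{job="prometheus"}` {
			http.Error(w, `{"status":"error"}`, http.StatusBadRequest)
			return
		}
		w.Header().Set("Content-Type", "application/json")
		fmt.Fprint(w, `{"status":"success","data":{"resultType":"vector","result":[`+
			`{"metric":{"__name__":"up","instance":"localhost:9090","job":"prometheus"},"value":[1709424000,"1"]}]}}`)
	}))
}

func main() {
	srv := fakePrometheus()
	defer srv.Close()
	qr, err := instantQuery(srv.URL, `up{job="prometheus"}`)
	if err != nil {
		panic(err)
	}
	for _, r := range qr.Data.Result {
		fmt.Println(qr.Data.ResultType, r.Metric["instance"], r.Value[1])
	}
}
```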

Alerting

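Prometheus periodically evaluates alerting rules, which are PromQL expressions. When a rule's expression returns results, Prometheus fires an alert and sends it to Alertmanager. Alertmanager deduplicates and groups alerts, then routes them to receivers such as email or WeChat.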

Working Principle

Service Registration

Each monitored service is registered as a job with a list of targets. Registration can be static, with each target's host and port listed under scrape_configs. It can also be dynamic, using service‑discovery mechanisms such as Consul, DNS or Kubernetes. Example static config:

scrape_configs:
  - job_name: "prometheus"
    static_configs:
      - targets: ["localhost:9090"]

Dynamic example using Consul service discovery (another entry under scrape_configs):

- job_name: "node_export_consul"
  metrics_path: "/node_metrics"
  scheme: http
  consul_sd_configs:
    - server: localhost:8500
      services:
        - node_exporter
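Whether targets are listed statically or discovered through Consul, each one is ultimately scraped at scheme://<address><metrics_path>. When scheme and metrics_path are omitted, they default to http and /metrics. Prometheus also attaches job and instance labels to the scraped series. The Go sketch below shows only this URL and label derivation. The address 10.0.0.5:9100 is a hypothetical stand‑in for what Consul might return for node_exporter.

```go
package main

import (
	"fmt"
	"net/url"
)

// scrapeConfig holds the few scrape_configs fields this sketch needs.
type scrapeConfig struct {
	JobName     string
	Scheme      string   // defaults to "http"
	MetricsPath string   // defaults to "/metrics"
	Targets     []string // static targets, or addresses resolved by service discovery
}

// target is one scrape endpoint plus the job/instance labels attached to its series.
type target struct {
	URL, Job, Instance string
}

// resolveTargets applies the defaults and builds one scrape URL per address.
func resolveTargets(cfg scrapeConfig) []target {
	scheme, path := cfg.Scheme, cfg.MetricsPath
	if scheme == "" {
		scheme = "http"
	}
	if path == "" {
		path = "/metrics"
	}
	out := make([]target, 0, len(cfg.Targets))
	for _, addr := range cfg.Targets {
		u := url.URL{Scheme: scheme, Host: addr, Path: path}
		// By default the instance label is the target address.
		out = append(out, target{URL: u.String(), Job: cfg.JobName, Instance: addr})
	}
	return out
}

func main() {
	static := scrapeConfig{JobName: "prometheus", Targets: []string{"localhost:9090"}}
	// Hypothetical address standing in for a Consul-discovered node_exporter.
	consul := scrapeConfig{JobName: "node_export_consul", Scheme: "http",
		MetricsPath: "/node_metrics", Targets: []string{"10.0.0.5:9100"}}
	for _, cfg := range []scrapeConfig{static, consul} {
		for _, t := range resolveTargets(cfg) {
			fmt.Printf("%s -> job=%q instance=%q\n", t.URL, t.Job, t.Instance)
		}
	}
}
```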

Configuration Reload

After editing prometheus.yml, you can reload the configuration without restarting. Start Prometheus with --web.enable-lifecycle, then send an HTTP POST to /-/reload. Sending SIGHUP to the Prometheus process also triggers a reload.

prometheus --config.file=/usr/local/etc/prometheus.yml --web.enable-lifecycle

curl -v -X POST http://localhost:9090/-/reload

The reload handler is implemented in the web module and signals the main loop to reload the config.
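The same reload can be triggered from Go, for example by a deployment tool. In this sketch, the helper reloadConfig and the fake lifecycle endpoint are hypothetical. The fake accepts the POST only when "lifecycle" is enabled, mirroring the requirement for --web.enable-lifecycle. The sketch treats anything other than 200 OK as a failure.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// reloadConfig asks a Prometheus server to re-read its configuration file.
// This only works if the server was started with --web.enable-lifecycle.
func reloadConfig(baseURL string) error {
	resp, err := http.Post(baseURL+"/-/reload", "text/plain", nil)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return fmt.Errorf("reload failed: %s", resp.Status)
	}
	return nil
}

// fakePrometheus mimics the lifecycle endpoint: it accepts POST /-/reload
// only when lifecycle is true and rejects it otherwise.
func fakePrometheus(lifecycle bool) *httptest.Server {
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		switch {
		case r.URL.Path != "/-/reload":
			http.NotFound(w, r)
		case r.Method != http.MethodPost:
			http.Error(w, "method not allowed", http.StatusMethodNotAllowed)
		case !lifecycle:
			http.Error(w, "lifecycle API is not enabled", http.StatusForbidden)
		default:
			w.WriteHeader(http.StatusOK)
		}
	}))
}

func main() {
	for _, enabled := range []bool{true, false} {
		srv := fakePrometheus(enabled)
		err := reloadConfig(srv.URL)
		srv.Close()
		fmt.Printf("lifecycle enabled=%v -> err=%v\n", enabled, err)
	}
}
```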

Metric Types

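Prometheus stores every metric as time series, but the client libraries and exposition format define four metric types to aid interpretation:

Counter: a cumulative value that only increases, or resets to zero when the process restarts (e.g., request count).

Gauge: a value that can go up or down (e.g., memory usage).

Histogram: observations counted into cumulative buckets, plus _sum and _count series, typically for latency or response size. Quantiles are estimated at query time.

Summary: quantiles pre‑computed on the client over a sliding time window, plus _sum and _count series. These quantiles cannot be meaningfully aggregated across instances.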

Exporters and Custom Exporters

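Use community exporters for components like MySQL or Kafka, or write a custom exporter with the Go client library (client_golang). A minimal exporter serves the default registry over HTTP:

package main

import (
    "log"
    "net/http"

    "github.com/prometheus/client_golang/prometheus/promhttp"
)

func main() {
    // Expose every metric registered with the default registry at /metrics.
    http.Handle("/metrics", promhttp.Handler())
    // ListenAndServe only returns on error, so log it instead of ignoring it.
    log.Fatal(http.ListenAndServe(":8080", nil))
}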

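Create custom metrics (counter, gauge, histogram, summary) and register them with prometheus.MustRegister. For labelled metrics, use the vector constructors such as NewCounterVec and NewGaugeVec. The snippet below belongs inside main(), with github.com/prometheus/client_golang/prometheus imported:

// Each metric needs a unique Name and a Help string.
myCounter := prometheus.NewCounter(prometheus.CounterOpts{Name: "my_counter_total", Help: "custom counter"})
myGauge := prometheus.NewGauge(prometheus.GaugeOpts{Name: "my_gauge", Help: "custom gauge"})
myHistogram := prometheus.NewHistogram(prometheus.HistogramOpts{Name: "my_histogram", Help: "custom histogram", Buckets: []float64{0.1, 0.2, 0.3, 0.4, 0.5}})
// Objectives maps each quantile to its allowed absolute error.
mySummary := prometheus.NewSummary(prometheus.SummaryOpts{Name: "my_summary", Help: "custom summary", Objectives: map[float64]float64{0.5: 0.05, 0.9: 0.01, 0.99: 0.001}})
// Registration makes the metrics visible to promhttp.Handler().
prometheus.MustRegister(myCounter, myGauge, myHistogram, mySummary)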

PromQL Basics

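A PromQL expression evaluates to one of four types: string, scalar, instant vector (one sample per series at a single timestamp), or range vector (a range of samples per series). Examples:

Instant vector selector: go_gc_duration_seconds_count

Label matcher: go_gc_duration_seconds_count{instance="127.0.0.1:9600"}

Regex matcher (regexes are fully anchored, so this matches any instance starting with "localhost"): go_gc_duration_seconds_count{instance=~"localhost.*"}

Range vector selector (samples from the last 5 minutes): go_gc_duration_seconds_count[5m]

Common functions include rate(), the per‑second average increase over a range vector, adjusted for counter resets. irate() instead computes the per‑second increase from only the last two samples in the range. Aggregation operators include sum by (label) (...) and sum without (label) (...). Quantiles are estimated from histogram buckets with histogram_quantile(), e.g. histogram_quantile(0.9, rate(my_histogram_bucket[5m])).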
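The difference between rate() and irate() can be sketched in a few lines of Go. This is a simplified model, not Prometheus' implementation. Both functions sum counter increases and treat any drop in value as a counter reset. rate() divides by the time between the first and last samples in the window, while irate() looks only at the last two samples. The real rate() additionally extrapolates to the window boundaries, which this sketch omits. The sample values are hypothetical.

```go
package main

import (
	"fmt"
	"math"
)

// sample is one scraped counter value at a Unix timestamp in seconds.
type sample struct {
	T float64
	V float64
}

// increase sums the deltas between consecutive samples; a drop in value is
// treated as a counter reset, so the new value counts as fresh increase.
func increase(s []sample) float64 {
	var inc float64
	for i := 1; i < len(s); i++ {
		d := s[i].V - s[i-1].V
		if d < 0 {
			d = s[i].V
		}
		inc += d
	}
	return inc
}

// rate: average per-second increase across the whole window (no extrapolation).
func rate(s []sample) float64 {
	if len(s) < 2 {
		return math.NaN()
	}
	return increase(s) / (s[len(s)-1].T - s[0].T)
}

// irate: per-second increase between the last two samples only.
func irate(s []sample) float64 {
	if len(s) < 2 {
		return math.NaN()
	}
	last := s[len(s)-2:]
	return increase(last) / (last[1].T - last[0].T)
}

func main() {
	// Counter scraped every 15s over one minute; traffic picks up at the end.
	s := []sample{{0, 0}, {15, 30}, {30, 60}, {45, 90}, {60, 150}}
	fmt.Printf("rate=%.2f/s irate=%.2f/s\n", rate(s), irate(s))
}
```

Here rate() reports the smoothed 2.5/s across the whole minute, while irate() reacts to the final jump with 4/s. This is why irate() suits fast‑moving graphs and rate() suits alerting.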

Grafana Visualization

Connect Grafana to Prometheus as a data source, create dashboards, and write PromQL queries in panels to visualise metrics. Dashboards can be exported as JSON for reuse.

Alertmanager Configuration

Define alerting rules in a separate file (e.g., alert_rules.yml) and list that file under rule_files in prometheus.yml. The example rule below fires once no target of the job http_srv has been up (up == 1) for one minute:

groups:
- name: simulator-alert-rule
  rules:
  - alert: HttpSimulatorDown
    expr: sum(up{job="http_srv"}) == 0
    for: 1m
    labels:
      severity: critical

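Point Prometheus at Alertmanager in the alerting section of prometheus.yml. Then configure Alertmanager routes and receivers to deliver alerts to email, Slack, etc. Alerts can also be silenced temporarily through the Alertmanager Web UI.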

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.

Written by

Efficient Ops

This public account is maintained by Xiaotianguo and friends, regularly publishing widely-read original technical articles. We focus on operations transformation and accompany you throughout your operations career, growing together happily.
