Master Prometheus: From Metrics Collection to Alerting and Visualization

Prometheus is an open‑source monitoring solution that covers metric exposition, scraping, storage, querying, visualization, and alerting. This guide walks through its architecture, configuration, custom exporters, PromQL queries, Grafana integration, and alert management as an introduction for developers and operations engineers.

Open Source Linux

Dec 8, 2022

Introduction

Prometheus is an open‑source, full‑stack monitoring solution. It provides metric exposition, scraping, storage, querying, visualization, and alerting, allowing you to gain insight into system health and quickly locate problems.

Ecosystem Overview

Prometheus consists of components for exposing metrics, scraping them, storing them in a built‑in time‑series database, querying with PromQL, visualizing via its own Web UI or Grafana, and sending alerts through Alertmanager.

Metric Exposure

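In Prometheus terms, each endpoint that is scraped is an instance, and a group of instances serving the same purpose is a job. Services can expose metrics directly through a client library (SDK) or indirectly through an exporter (for example, the MySQL or Consul exporters). Short‑lived jobs that may exit before they are scraped can push their metrics to the Pushgateway, which Prometheus then scrapes.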
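To show what "exposing a metric" produces on the wire, here is a minimal sketch that renders a single counter in the Prometheus text exposition format using only the Go standard library. The function name renderCounter and the metric demo_requests_total are hypothetical, chosen for illustration; a real service would normally rely on a client library rather than formatting the text by hand.

```go
package main

import (
	"fmt"
	"strings"
)

// renderCounter formats one counter in the text exposition format:
// a HELP line, a TYPE line, then the sample itself.
// Hypothetical helper for illustration; it does no escaping or validation.
func renderCounter(name, help string, value float64) string {
	var b strings.Builder
	fmt.Fprintf(&b, "# HELP %s %s\n", name, help)
	fmt.Fprintf(&b, "# TYPE %s counter\n", name)
	fmt.Fprintf(&b, "%s %g\n", name, value)
	return b.String()
}

func main() {
	// This is the kind of body a /metrics endpoint would return.
	fmt.Print(renderCounter("demo_requests_total", "Requests served.", 42))
}
```

Serving this string from an HTTP handler registered on /metrics would be enough for Prometheus to scrape it.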

Metric Scraping

Prometheus primarily uses the pull model: it periodically scrapes an HTTP endpoint (/metrics by default) on each target. The default scrape interval is one minute; set scrape_interval in prometheus.yml to change it, as in the following configuration, which scrapes every 15 seconds:

global:
  scrape_interval: 15s
scrape_configs:
  - job_name: "prometheus"
    static_configs:
      - targets: ["localhost:9090"]
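To make the pull model more concrete, the following sketch shows roughly what a scraper does with the body it fetches from a target: skip comment lines and read one sample per line. The function parseExposition is a hypothetical, heavily simplified stand-in for Prometheus's real parser; it ignores timestamps and assumes label values contain no closing brace.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// parseExposition extracts "series -> value" pairs from a scraped body.
// Simplified on purpose: comments are skipped, optional timestamps are
// ignored, and label values are assumed not to contain '}'.
func parseExposition(body string) map[string]float64 {
	samples := make(map[string]float64)
	sc := bufio.NewScanner(strings.NewReader(body))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" || strings.HasPrefix(line, "#") {
			continue
		}
		// The series identifier ends at the closing brace, or at the
		// first space when the sample has no labels.
		end := strings.LastIndex(line, "}") + 1
		if end == 0 {
			end = strings.IndexByte(line, ' ')
			if end < 0 {
				continue
			}
		}
		fields := strings.Fields(line[end:])
		if len(fields) == 0 {
			continue
		}
		v, err := strconv.ParseFloat(fields[0], 64)
		if err != nil {
			continue
		}
		samples[line[:end]] = v
	}
	return samples
}

func main() {
	body := "# TYPE up gauge\nup 1\nhttp_requests_total{method=\"GET\"} 12\n"
	samples := parseExposition(body)
	fmt.Println("up =", samples["up"])
	fmt.Println("GET requests =", samples["http_requests_total{method=\"GET\"}"])
}
```

Prometheus repeats this fetch-and-parse cycle for every target once per scrape_interval and attaches a timestamp to each sample it stores.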

Storage and Query

Scraped metrics are stored as time series in an internal TSDB. PromQL queries these series either at a single point in time (an instant query) or across a time range at a fixed step (a range query).
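The difference between the two query modes is visible in the HTTP API: instant queries go to /api/v1/query and range queries go to /api/v1/query_range. The sketch below only builds the request URLs; the helper names and the localhost:9090 address are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"net/url"
	"strconv"
)

// instantQueryURL builds the URL for evaluating expr at one Unix timestamp.
func instantQueryURL(base, expr string, ts int64) string {
	q := url.Values{}
	q.Set("query", expr)
	q.Set("time", strconv.FormatInt(ts, 10))
	return base + "/api/v1/query?" + q.Encode()
}

// rangeQueryURL builds the URL for evaluating expr every stepSeconds
// between start and end (both Unix timestamps).
func rangeQueryURL(base, expr string, start, end int64, stepSeconds int) string {
	q := url.Values{}
	q.Set("query", expr)
	q.Set("start", strconv.FormatInt(start, 10))
	q.Set("end", strconv.FormatInt(end, 10))
	q.Set("step", strconv.Itoa(stepSeconds))
	return base + "/api/v1/query_range?" + q.Encode()
}

func main() {
	base := "http://localhost:9090" // assumed local Prometheus address
	fmt.Println(instantQueryURL(base, "up", 1700000000))
	fmt.Println(rangeQueryURL(base, "rate(http_requests_total[5m])", 1700000000, 1700003600, 15))
}
```

An instant query returns one value per series, while a range query returns one value per series per step, which is what a graph panel needs.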

Alerting

Prometheus evaluates alerting rules written as PromQL expressions and sends the resulting alerts to Alertmanager. Alertmanager then groups and silences them and routes them to receivers such as email or Slack.
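Grouping is easiest to picture as bucketing alerts by the values of a chosen set of labels, so that related alerts arrive as one notification. The sketch below illustrates that idea only; the alert type and the groupAlerts function are hypothetical and do not reflect Alertmanager's internal implementation.

```go
package main

import (
	"fmt"
	"strings"
)

// alert is a hypothetical, minimal alert: just a set of labels.
type alert map[string]string

// groupKey joins the values of the group-by labels into one string key.
func groupKey(a alert, groupBy []string) string {
	parts := make([]string, 0, len(groupBy))
	for _, name := range groupBy {
		parts = append(parts, name+"="+a[name])
	}
	return strings.Join(parts, ",")
}

// groupAlerts buckets alerts that share the same values for groupBy labels.
func groupAlerts(alerts []alert, groupBy []string) map[string][]alert {
	groups := make(map[string][]alert)
	for _, a := range alerts {
		k := groupKey(a, groupBy)
		groups[k] = append(groups[k], a)
	}
	return groups
}

func main() {
	alerts := []alert{
		{"alertname": "HighLatency", "instance": "a:8080"},
		{"alertname": "HighLatency", "instance": "b:8080"},
		{"alertname": "DiskFull", "instance": "a:8080"},
	}
	for key, members := range groupAlerts(alerts, []string{"alertname"}) {
		fmt.Printf("%s: %d alert(s)\n", key, len(members))
	}
}
```

Grouping by alertname here means two HighLatency alerts from different instances would be delivered together rather than as two separate messages.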

Configuration Details

Service Registration

Targets can be registered statically with static_configs in scrape_configs or discovered dynamically via service discovery (Consul, DNS, Kubernetes, etc.). Example dynamic registration through Consul:

scrape_configs:
  - job_name: "node_export_consul"
    metrics_path: /node_metrics
    scheme: http
    consul_sd_configs:
      - server: localhost:8500
        services:
          - node_exporter
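For Consul discovery to find anything, the service must first be registered with the Consul agent, typically by sending a JSON document to its /v1/agent/service/register endpoint with an HTTP PUT. The sketch below only builds that JSON body; the struct, the function name, and the ID, address, and port values are assumptions for illustration, and the fields shown are a small subset of what Consul accepts.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// serviceRegistration holds a minimal subset of the fields accepted by
// Consul's agent service registration endpoint.
type serviceRegistration struct {
	ID      string `json:"ID"`
	Name    string `json:"Name"`
	Address string `json:"Address"`
	Port    int    `json:"Port"`
}

// registrationBody returns the JSON payload for registering one service.
func registrationBody(id, name, address string, port int) (string, error) {
	b, err := json.Marshal(serviceRegistration{ID: id, Name: name, Address: address, Port: port})
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	// Name matches the "services" entry in the scrape config above;
	// the ID, address, and port are hypothetical.
	body, err := registrationBody("node-exporter-1", "node_exporter", "127.0.0.1", 9100)
	if err != nil {
		fmt.Println("marshal failed:", err)
		return
	}
	fmt.Println(body)
}
```

Once a service named node_exporter is registered, the consul_sd_configs entry above lets Prometheus pick up its address and port without editing prometheus.yml.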

Dynamic Registration Note

When using dynamic registration, make sure metrics_path matches the path where the target actually serves its metrics. Otherwise Prometheus receives a response it cannot parse as metrics and may report an "INVALID" start token error.

Configuration Reload

Start Prometheus with --web.enable-lifecycle to allow runtime reloads via POST /-/reload.

prometheus --config.file=/usr/local/etc/prometheus.yml --web.enable-lifecycle
curl -v --request POST 'http://localhost:9090/-/reload'

Metric Types

Prometheus defines four metric types:

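Counter: a monotonically increasing value that resets only when the process restarts (e.g., request counts).

Gauge: a value that can go up and down (e.g., memory usage).

Histogram: counts observations into configurable buckets and also tracks their sum and count; quantiles are estimated at query time with histogram_quantile().

Summary: computes configurable quantiles on the client side and also tracks the sum and count of observations.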
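Histogram buckets are cumulative: each bucket counts every observation less than or equal to its upper bound, which is why the exposed series carry an le label. The following sketch implements that counting rule with the standard library only; the histogram type here is a hypothetical illustration, not the client_golang implementation.

```go
package main

import "fmt"

// histogram is a minimal cumulative histogram for illustration.
type histogram struct {
	upperBounds []float64 // sorted ascending; the +Inf bucket is implicit
	counts      []uint64  // counts[i] = observations <= upperBounds[i]
	count       uint64    // total observations (the +Inf bucket)
	sum         float64   // sum of all observed values
}

func newHistogram(upperBounds []float64) *histogram {
	return &histogram{upperBounds: upperBounds, counts: make([]uint64, len(upperBounds))}
}

// observe records one value in every bucket whose bound is >= the value.
func (h *histogram) observe(v float64) {
	for i, ub := range h.upperBounds {
		if v <= ub {
			h.counts[i]++
		}
	}
	h.count++
	h.sum += v
}

func main() {
	h := newHistogram([]float64{0.1, 0.2, 0.3, 0.4, 0.5})
	for _, v := range []float64{0.05, 0.15, 0.6} {
		h.observe(v)
	}
	for i, ub := range h.upperBounds {
		fmt.Printf("demo_bucket{le=\"%g\"} %d\n", ub, h.counts[i])
	}
	fmt.Printf("demo_bucket{le=\"+Inf\"} %d\n", h.count)
}
```

Because the 0.6 observation exceeds every configured bound, it appears only in the implicit +Inf bucket, which always equals the total count.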

Exporting Metrics

Use the official client_golang library to expose metrics. A minimal exporter:

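package main

import (
	"log"
	"net/http"

	"github.com/prometheus/client_golang/prometheus/promhttp"
)

func main() {
	// Serve the metrics of the default registry on /metrics.
	http.Handle("/metrics", promhttp.Handler())
	// ListenAndServe only returns on failure, so log the error and exit.
	log.Fatal(http.ListenAndServe(":8080", nil))
}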

Custom counters, gauges, histograms, and summaries can be defined and registered:

// Counter names conventionally end in _total.
myCounter := prometheus.NewCounter(prometheus.CounterOpts{Name: "my_counter_total", Help: "custom counter"})
myGauge := prometheus.NewGauge(prometheus.GaugeOpts{Name: "my_gauge_num", Help: "custom gauge"})
// The client appends _bucket, _sum, and _count itself, so the base names omit those suffixes.
myHistogram := prometheus.NewHistogram(prometheus.HistogramOpts{Name: "my_histogram", Help: "custom histogram", Buckets: []float64{0.1, 0.2, 0.3, 0.4, 0.5}})
mySummary := prometheus.NewSummary(prometheus.SummaryOpts{Name: "my_summary", Help: "custom summary", Objectives: map[float64]float64{0.5: 0.05, 0.9: 0.01, 0.99: 0.001}})
// MustRegister panics if a metric is invalid or already registered.
prometheus.MustRegister(myCounter, myGauge, myHistogram, mySummary)

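Metrics with labels use NewCounterVec (or the equivalent for other types) to declare the label names, and With(prometheus.Labels{...}) to select the child metric for one specific label set. A labeled metric needs its own unique name to be registered alongside the unlabeled counter:

myCounterVec := prometheus.NewCounterVec(prometheus.CounterOpts{Name: "my_labeled_counter_total", Help: "custom counter with labels"}, []string{"label1", "label2"})
prometheus.MustRegister(myCounterVec)
// With selects (or creates) the child counter for this exact label set.
myCounterVec.With(prometheus.Labels{"label1": "1", "label2": "2"}).Inc()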

PromQL Basics

A PromQL expression evaluates to one of four types: instant vector, range vector, scalar, or string. Common functions and operators include rate() and irate() for per‑second rates of counters, sum() with a by or without clause for aggregation, and histogram_quantile() for percentile calculations.

# Example: QPS per path
sum(rate(demo_api_request_duration_seconds_count{job="demo",method="GET",status="200"}[5m])) by (path)
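To give a feel for what rate() computes, here is a simplified sketch: add up the increases between consecutive counter samples, treat any drop as a counter reset, and divide by the elapsed time. The sample type and simpleRate function are hypothetical, and the real rate() additionally extrapolates toward the edges of the time window, which this sketch leaves out.

```go
package main

import "fmt"

// sample is one counter reading at a point in time.
type sample struct {
	t float64 // timestamp in seconds
	v float64 // counter value
}

// simpleRate returns the per-second increase across the samples,
// treating any decrease as a counter reset. It does not extrapolate
// to window boundaries the way PromQL's rate() does.
func simpleRate(samples []sample) float64 {
	if len(samples) < 2 {
		return 0
	}
	increase := 0.0
	for i := 1; i < len(samples); i++ {
		delta := samples[i].v - samples[i-1].v
		if delta < 0 {
			// Counter reset: the counter restarted from zero,
			// so the new value itself is the increase.
			delta = samples[i].v
		}
		increase += delta
	}
	elapsed := samples[len(samples)-1].t - samples[0].t
	if elapsed <= 0 {
		return 0
	}
	return increase / elapsed
}

func main() {
	steady := []sample{{0, 0}, {30, 60}, {60, 120}}
	withReset := []sample{{0, 100}, {30, 130}, {60, 30}}
	fmt.Println("steady rate:", simpleRate(steady))
	fmt.Println("rate across a reset:", simpleRate(withReset))
}
```

Handling resets this way is what lets a counter-based query survive process restarts without showing a large negative spike.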

Grafana Visualization

In Grafana, add Prometheus as a data source by pointing it at the Prometheus server URL, then build dashboards whose panels are driven by PromQL queries.

Alertmanager

Define alert rules in a separate file and reference it from the rule_files section of prometheus.yml. The example rule below fires when no target of the http_srv job has been up for one minute:

groups:
- name: simulator-alert-rule
  rules:
  - alert: HttpSimulatorDown
    expr: sum(up{job="http_srv"}) == 0
    for: 1m
    labels:
      severity: critical

Configure Alertmanager with SMTP settings to send email notifications, and use its UI to silence alerts.

Summary

This guide provides a complete overview of Prometheus, covering its architecture, metric collection models, configuration, custom exporters, query language, visualization with Grafana, and alerting with Alertmanager, equipping developers and operations engineers to monitor and observe their systems effectively.
