
Master Prometheus: From Metrics Collection to Alerting and Visualization

Prometheus is an open‑source monitoring solution that covers metric exposition, scraping, storage, querying, visualization, and alerting. This guide walks through its architecture, configuration, custom exporters, PromQL queries, Grafana integration, and alert management, as an introduction for developers and ops engineers.

Open Source Linux

Introduction

Prometheus is an open‑source, full‑stack monitoring solution. It provides metric exposition, scraping, storage, querying, visualization, and alerting, allowing you to gain insight into system health and quickly locate problems.

Ecosystem Overview

Prometheus consists of components for exposing metrics, scraping them, storing them in a built‑in time‑series database, querying with PromQL, visualizing via its own Web UI or Grafana, and sending alerts through Alertmanager.

Metric Exposure

Each monitored service is a job. Services can expose metrics directly via a client SDK or through exporters (e.g., for MySQL or Consul). Short‑lived jobs that may exit before they can be scraped can push their metrics to the Pushgateway instead.
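To make "exposing metrics" concrete, here is a minimal sketch that renders a single counter in the Prometheus text exposition format by hand, using only the Go standard library. The metric name demo_requests_total and the helper renderCounter are hypothetical and exist only for illustration; in a real service a client library produces this output.

```go
package main

import (
	"fmt"
	"strconv"
)

// renderCounter formats one unlabelled counter sample in the text
// exposition format: a HELP line, a TYPE line, then "name value".
// Hypothetical helper for illustration, not part of any library.
func renderCounter(name, help string, value float64) string {
	return fmt.Sprintf("# HELP %s %s\n# TYPE %s counter\n%s %s\n",
		name, help, name, name, strconv.FormatFloat(value, 'g', -1, 64))
}

func main() {
	fmt.Print(renderCounter("demo_requests_total", "Total requests handled.", 42))
}
```

This is the kind of plain text a target returns when Prometheus scrapes it.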

Metric Scraping

Prometheus primarily uses the pull model: it periodically scrapes the /metrics endpoint of targets. The default scrape interval is one minute and can be configured via scrape_interval in prometheus.yml.

global:
  scrape_interval: 15s
scrape_configs:
  - job_name: "prometheus"
    static_configs:
      - targets: ["localhost:9090"]
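The pull model described above boils down to an HTTP GET issued by the monitoring side. The following sketch, assuming a throwaway in-process target that serves a made-up demo_up metric, performs one such scrape with the Go standard library. The names scrape and demoScrape are hypothetical; a real scraper would repeat the request on a timer at every scrape_interval and parse the response.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"time"
)

// scrape performs one pull: an HTTP GET against the target's metrics URL,
// returning the raw exposition text.
func scrape(client *http.Client, url string) (string, error) {
	resp, err := client.Get(url)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return "", fmt.Errorf("scrape %s: unexpected status %s", url, resp.Status)
	}
	body, err := io.ReadAll(resp.Body)
	return string(body), err
}

// demoScrape starts a temporary local target and scrapes it once.
// The target and its single metric are assumptions for this example.
func demoScrape() (string, error) {
	mux := http.NewServeMux()
	mux.HandleFunc("/metrics", func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprint(w, "demo_up 1\n")
	})
	target := httptest.NewServer(mux)
	defer target.Close()

	client := &http.Client{Timeout: 5 * time.Second}
	return scrape(client, target.URL+"/metrics")
}

func main() {
	body, err := demoScrape()
	if err != nil {
		fmt.Println("scrape failed:", err)
		return
	}
	fmt.Print(body)
}
```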

Storage and Query

Scraped metrics are stored as time‑series in an internal TSDB. PromQL is used to query these series, either instantly or over a range.
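Instant and range queries map to two endpoints of the Prometheus HTTP API, /api/v1/query and /api/v1/query_range. As a rough sketch, the helpers below (hypothetical names, standard library only) build the request URLs; the base address and timestamps are placeholder values.

```go
package main

import (
	"fmt"
	"net/url"
)

// instantQueryURL builds a URL that evaluates expr at a single point in time
// (the server's current time when no "time" parameter is given).
func instantQueryURL(base, expr string) string {
	q := url.Values{}
	q.Set("query", expr)
	return base + "/api/v1/query?" + q.Encode()
}

// rangeQueryURL builds a URL that evaluates expr at every step between
// start and end. start and end are Unix timestamps in seconds, passed as
// strings; step is a duration string such as "15s".
func rangeQueryURL(base, expr, start, end, step string) string {
	q := url.Values{}
	q.Set("query", expr)
	q.Set("start", start)
	q.Set("end", end)
	q.Set("step", step)
	return base + "/api/v1/query_range?" + q.Encode()
}

func main() {
	base := "http://localhost:9090" // placeholder address
	fmt.Println(instantQueryURL(base, "up"))
	fmt.Println(rangeQueryURL(base, "up", "1700000000", "1700003600", "15s"))
}
```

An instant query returns one sample per series, while a range query returns a list of samples per series, which is what graph panels consume.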

Alerting

Prometheus evaluates alerting rules written as PromQL expressions and sends the resulting alerts to Alertmanager. Alertmanager routes them to email, Slack, and other receivers, and supports grouping and silencing.

Configuration Details

Service Registration

Targets can be registered statically with static_configs under scrape_configs, or dynamically via service discovery (Consul, DNS, Kubernetes, etc.). Example dynamic registration through Consul service discovery:

scrape_configs:
  - job_name: "node_export_consul"
    metrics_path: /node_metrics
    scheme: http
    consul_sd_configs:
      - server: localhost:8500
        services:
          - node_exporter

Dynamic Registration Note

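When using dynamic registration, ensure metrics_path is set correctly. If the path points at something that does not return the metrics exposition format, the scrape typically fails and Prometheus may report an "INVALID" start token error.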
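As a small illustration of why metrics_path matters, the sketch below shows how a scrape URL is assembled from the scheme, a discovered address, and the path. The helper scrapeURL and the address 10.0.0.5:9100 are made up for this example.

```go
package main

import (
	"fmt"
	"net/url"
)

// scrapeURL combines the configured scheme, the address returned by service
// discovery, and metrics_path into the URL that gets scraped.
// Hypothetical helper for illustration only.
func scrapeURL(scheme, address, metricsPath string) string {
	u := url.URL{Scheme: scheme, Host: address, Path: metricsPath}
	return u.String()
}

func main() {
	// The same discovered address yields different URLs depending on metrics_path.
	fmt.Println(scrapeURL("http", "10.0.0.5:9100", "/node_metrics"))
	fmt.Println(scrapeURL("http", "10.0.0.5:9100", "/metrics"))
}
```

Service discovery supplies only the address, so a wrong path sends every scrape for that job to the wrong endpoint.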

Configuration Reload

Start Prometheus with --web.enable-lifecycle to allow runtime reloads via POST /-/reload.

prometheus --config.file=/usr/local/etc/prometheus.yml --web.enable-lifecycle
curl -v --request POST 'http://localhost:9090/-/reload'

Metric Types

Prometheus defines four metric types:

Counter : monotonically increasing (e.g., request counts).

Gauge : can go up and down (e.g., memory usage).

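Histogram : counts observations into configurable buckets and also exposes their sum and count.

Summary : quantiles computed on the client side, alongside the sum and count of observations.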
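Because a counter only ever increases, a drop in its value is interpreted as a restart of the process. The sketch below is a simplified illustration of that idea: it sums the growth across samples and treats any decrease as a reset to zero. It does not reproduce the boundary extrapolation that the real rate() function applies; the function names and sample values are invented.

```go
package main

import "fmt"

// increase sums the growth of a counter across consecutive samples.
// A drop between two samples is treated as a counter reset, so the value
// after the reset counts in full.
func increase(samples []float64) float64 {
	total := 0.0
	for i := 1; i < len(samples); i++ {
		delta := samples[i] - samples[i-1]
		if delta < 0 {
			// Counter reset: it restarted from zero and has reached samples[i].
			delta = samples[i]
		}
		total += delta
	}
	return total
}

// perSecondRate divides the increase by the window length in seconds.
func perSecondRate(samples []float64, windowSeconds float64) float64 {
	return increase(samples) / windowSeconds
}

func main() {
	// The drop from 160 to 20 simulates a process restart.
	samples := []float64{100, 130, 160, 20, 50}
	fmt.Println(increase(samples))
	fmt.Println(perSecondRate(samples, 60))
}
```

A gauge has no such rule, since going down is a legitimate change, which is why rate-style functions are meant for counters.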

Exporting Metrics

Use the official client_golang library to expose metrics. A minimal exporter:

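package main

import (
  "log"
  "net/http"

  "github.com/prometheus/client_golang/prometheus/promhttp"
)

func main() {
  // Serve the default registry on /metrics.
  http.Handle("/metrics", promhttp.Handler())
  // ListenAndServe only returns on failure, so surface the error.
  log.Fatal(http.ListenAndServe(":8080", nil))
}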

Custom counters, gauges, histograms, and summaries can be defined and registered:

myCounter := prometheus.NewCounter(prometheus.CounterOpts{Name: "my_counter_total", Help: "custom counter"})
myGauge := prometheus.NewGauge(prometheus.GaugeOpts{Name: "my_gauge_num", Help: "custom gauge"})
// Histograms expose <name>_bucket, <name>_sum and <name>_count series, so the base name should not end in _bucket.
myHistogram := prometheus.NewHistogram(prometheus.HistogramOpts{Name: "my_histogram", Help: "custom histogram", Buckets: []float64{0.1,0.2,0.3,0.4,0.5}})
// Summaries expose quantile series plus <name>_sum and <name>_count; they have no buckets.
mySummary := prometheus.NewSummary(prometheus.SummaryOpts{Name: "my_summary", Help: "custom summary", Objectives: map[float64]float64{0.5:0.05,0.9:0.01,0.99:0.001}})
prometheus.MustRegister(myCounter, myGauge, myHistogram, mySummary)

Metrics with labels use NewCounterVec (or similar) and With(prometheus.Labels{...}) to select the series for a given set of label values before updating it.

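// Use a name distinct from any unlabelled metric: registering two metrics with the same name but different labels fails.
myCounterVec := prometheus.NewCounterVec(prometheus.CounterOpts{Name: "my_labeled_counter_total", Help: "custom counter with labels"}, []string{"label1", "label2"})
prometheus.MustRegister(myCounterVec)
myCounterVec.With(prometheus.Labels{"label1":"1", "label2":"2"}).Inc()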
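Each distinct combination of label values becomes its own time series. The toy type below is not client_golang code; it is a hypothetical stand-in built on a plain map to show that effect and why label values with many possible values multiply the number of series.

```go
package main

import (
	"fmt"
	"strings"
)

// counterVec is a toy stand-in for a labelled counter: it keeps one value
// per combination of label values. Illustration only.
type counterVec struct {
	labelNames []string
	series     map[string]float64
}

func newCounterVec(labelNames ...string) *counterVec {
	return &counterVec{labelNames: labelNames, series: make(map[string]float64)}
}

// key joins label values with a separator that is unlikely to occur in them.
func key(labelValues []string) string {
	return strings.Join(labelValues, "\x00")
}

// inc increments the series identified by labelValues, which must be given
// in the same order as the label names.
func (c *counterVec) inc(labelValues ...string) error {
	if len(labelValues) != len(c.labelNames) {
		return fmt.Errorf("expected %d label values, got %d", len(c.labelNames), len(labelValues))
	}
	c.series[key(labelValues)]++
	return nil
}

// value returns the current value of one series (0 if it was never incremented).
func (c *counterVec) value(labelValues ...string) float64 {
	return c.series[key(labelValues)]
}

func main() {
	c := newCounterVec("label1", "label2")
	for _, lv := range [][]string{{"1", "2"}, {"1", "2"}, {"1", "3"}} {
		if err := c.inc(lv...); err != nil {
			fmt.Println(err)
			return
		}
	}
	// Two distinct label combinations were used, so two series exist.
	fmt.Println(len(c.series), c.value("1", "2"), c.value("1", "3"))
}
```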

PromQL Basics

PromQL expressions evaluate to instant vectors, range vectors, scalars, or strings. Common building blocks include the rate() and irate() functions, the sum aggregation with by (...) or without (...) clauses, and histogram_quantile() for percentile calculations.

# Example: QPS per path
sum(rate(demo_api_request_duration_seconds_count{job="demo",method="GET",status="200"}[5m])) by (path)
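To show what a percentile calculation over histogram buckets involves, here is a simplified sketch of the linear interpolation idea behind histogram_quantile(). It assumes cumulative buckets with positive upper bounds, a final +Inf bucket, and 0 < q <= 1; it is an approximation for illustration, not the actual Prometheus implementation, and the bucket values are invented.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// bucket is one cumulative histogram bucket: count is the number of
// observations less than or equal to upperBound.
type bucket struct {
	upperBound float64
	count      float64
}

// quantile estimates the q-quantile by finding the bucket that contains the
// target rank and interpolating linearly inside it. It sorts buckets in place.
func quantile(q float64, buckets []bucket) float64 {
	if len(buckets) < 2 {
		return math.NaN()
	}
	sort.Slice(buckets, func(i, j int) bool {
		return buckets[i].upperBound < buckets[j].upperBound
	})
	last := len(buckets) - 1
	if !math.IsInf(buckets[last].upperBound, 1) {
		return math.NaN() // the +Inf bucket is required
	}
	total := buckets[last].count
	if total == 0 {
		return math.NaN()
	}
	rank := q * total
	// First finite bucket whose cumulative count reaches the rank.
	b := sort.Search(last, func(i int) bool { return buckets[i].count >= rank })
	if b == last {
		// The rank falls into the +Inf bucket: report the highest finite bound.
		return buckets[last-1].upperBound
	}
	lower, below := 0.0, 0.0
	if b > 0 {
		lower = buckets[b-1].upperBound
		below = buckets[b-1].count
	}
	upper := buckets[b].upperBound
	inBucket := buckets[b].count - below
	return lower + (upper-lower)*(rank-below)/inBucket
}

// exampleBuckets returns made-up cumulative counts for a latency histogram.
func exampleBuckets() []bucket {
	return []bucket{
		{upperBound: 0.1, count: 10},
		{upperBound: 0.5, count: 30},
		{upperBound: 1, count: 40},
		{upperBound: math.Inf(1), count: 40},
	}
}

func main() {
	fmt.Println(quantile(0.5, exampleBuckets()))
}
```

Because the result is interpolated within a bucket, its accuracy depends on how well the bucket boundaries match the real distribution.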

Grafana Visualization

Add Prometheus as a data source in Grafana, then build dashboards using PromQL queries.

Alertmanager

Define alert rules in a separate file and reference it from prometheus.yml via rule_files. The example rule fires when a job has had no up targets for one minute:

groups:
- name: simulator-alert-rule
  rules:
  - alert: HttpSimulatorDown
    expr: sum(up{job="http_srv"}) == 0
    for: 1m
    labels:
      severity: critical

Configure Alertmanager with SMTP settings to send email notifications, and use its UI to silence alerts.

Summary

This guide provides a complete overview of Prometheus, covering its architecture, metric collection models, configuration, custom exporters, query language, visualization with Grafana, and alerting with Alertmanager, equipping developers and operations engineers to monitor and observe their systems effectively.
