Master Prometheus: From Metrics Collection to Alerting and Visualization
Prometheus is an open-source monitoring solution that covers metric exposition, scraping, storage, querying, visualization, and alerting. This guide walks through its architecture, configuration, custom exporters, PromQL queries, Grafana integration, and alert management as an introduction for developers and ops engineers.
Introduction
Prometheus is an open‑source, full‑stack monitoring solution. It provides metric exposition, scraping, storage, querying, visualization, and alerting, allowing you to gain insight into system health and quickly locate problems.
Ecosystem Overview
Prometheus consists of components for exposing metrics, scraping them, storing them in a built‑in time‑series database, querying with PromQL, visualizing via its own Web UI or Grafana, and sending alerts through Alertmanager.
Metric Exposure
Each monitored service is a Job. Services can expose metrics directly via an SDK or through exporters (e.g., MySQL, Consul). Short-lived jobs, which may exit before they are ever scraped, can push their metrics to the Pushgateway instead.
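To make "exposing metrics" concrete, here is a minimal sketch of the plain-text format a /metrics endpoint returns. It uses only the Go standard library rather than an SDK, and the metric name demo_requests_total and the value 42 are made up for illustration.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
)

// renderMetrics formats one counter in the Prometheus text exposition format.
// The metric name demo_requests_total is hypothetical.
func renderMetrics(total uint64) string {
	var b strings.Builder
	b.WriteString("# HELP demo_requests_total Total requests handled.\n")
	b.WriteString("# TYPE demo_requests_total counter\n")
	fmt.Fprintf(&b, "demo_requests_total %d\n", total)
	return b.String()
}

// metricsHandler serves the rendered text, as a /metrics endpoint would.
func metricsHandler(total uint64) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("Content-Type", "text/plain; version=0.0.4")
		fmt.Fprint(w, renderMetrics(total))
	}
}

func main() {
	// Exercise the handler in-process instead of opening a port.
	rec := httptest.NewRecorder()
	req := httptest.NewRequest(http.MethodGet, "/metrics", nil)
	metricsHandler(42)(rec, req)
	fmt.Print(rec.Body.String())
}
```

In practice an SDK or exporter generates this output for you; the sketch only shows what ends up on the wire.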
Metric Scraping
Prometheus primarily uses the Pull model: it periodically scrapes the /metrics endpoint of targets. The default scrape interval is one minute and can be configured via scrape_interval in prometheus.yml.
global:
  scrape_interval: 15s
scrape_configs:
  - job_name: "prometheus"
    static_configs:
      - targets: ["localhost:9090"]
Storage and Query
Scraped metrics are stored as time‑series in an internal TSDB. PromQL is used to query these series, either instantly or over a range.
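The difference between an instant query and a range query shows up directly in the Prometheus HTTP API: /api/v1/query takes a single timestamp, while /api/v1/query_range takes a start, an end, and a step. The sketch below only builds the request URLs; the helper names and the timestamps are illustrative assumptions, and the base address is the localhost:9090 target used in this guide.

```go
package main

import (
	"fmt"
	"net/url"
	"strconv"
)

// instantQueryURL builds a URL for /api/v1/query, which evaluates the
// expression at one point in time (given here as Unix seconds).
func instantQueryURL(base, expr string, atUnix int64) string {
	q := url.Values{}
	q.Set("query", expr)
	q.Set("time", strconv.FormatInt(atUnix, 10))
	return base + "/api/v1/query?" + q.Encode()
}

// rangeQueryURL builds a URL for /api/v1/query_range, which evaluates the
// expression at every step between start and end.
func rangeQueryURL(base, expr string, startUnix, endUnix, stepSeconds int64) string {
	q := url.Values{}
	q.Set("query", expr)
	q.Set("start", strconv.FormatInt(startUnix, 10))
	q.Set("end", strconv.FormatInt(endUnix, 10))
	q.Set("step", strconv.FormatInt(stepSeconds, 10))
	return base + "/api/v1/query_range?" + q.Encode()
}

func main() {
	base := "http://localhost:9090"
	fmt.Println(instantQueryURL(base, "up", 1700000000))
	fmt.Println(rangeQueryURL(base, "up", 1700000000, 1700003600, 15))
}
```

The built-in Web UI and Grafana issue equivalent requests, so you rarely need to construct them by hand.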
Alerting
Prometheus evaluates alerting rules written as PromQL expressions and sends the resulting alerts to Alertmanager. Alertmanager routes them to email, Slack, and other receivers, and supports silencing and grouping.
Configuration Details
Service Registration
Jobs can be registered statically with static_configs in scrape_configs or dynamically via service discovery (Consul, DNS, Kubernetes, etc.). Example dynamic registration through Consul:
scrape_configs:
  - job_name: "node_export_consul"
    metrics_path: /node_metrics
    scheme: http
    consul_sd_configs:
      - server: localhost:8500
        services:
          - node_exporter
Dynamic Registration Note
When using dynamic registration, ensure metrics_path matches the path the service actually serves metrics on; otherwise Prometheus scrapes a response it cannot parse and may report an "INVALID" start token error.
Configuration Reload
Start Prometheus with --web.enable-lifecycle to allow runtime reloads via POST /-/reload.
prometheus --config.file=/usr/local/etc/prometheus.yml --web.enable-lifecycle
curl -v --request POST 'http://localhost:9090/-/reload'
Metric Types
Prometheus defines four metric types:
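Counter: a monotonically increasing value that resets only when the process restarts (e.g., request counts).
Gauge: a value that can go up and down (e.g., memory usage).
Histogram: observations counted into cumulative buckets, plus a sum and a count; quantiles are estimated at query time.
Summary: quantiles computed on the client side, plus a sum and a count.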
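The "cumulative buckets" of a histogram are easiest to see in code. The following is a simplified stand-in written with the standard library only, not the client_golang implementation; the type and method names are hypothetical. Each observation increments every bucket whose upper bound is at least the observed value, and the implicit +Inf bucket counts everything.

```go
package main

import "fmt"

// histogram is a simplified model of a Prometheus histogram.
type histogram struct {
	upperBounds []float64 // sorted bucket upper bounds (the "le" label values)
	counts      []uint64  // cumulative counts; the last entry is the +Inf bucket
	sum         float64   // sum of all observed values
}

func newHistogram(bounds []float64) *histogram {
	return &histogram{upperBounds: bounds, counts: make([]uint64, len(bounds)+1)}
}

// observe records one value in every bucket it falls under.
func (h *histogram) observe(v float64) {
	h.sum += v
	for i, ub := range h.upperBounds {
		if v <= ub {
			h.counts[i]++
		}
	}
	h.counts[len(h.upperBounds)]++ // +Inf bucket
}

func main() {
	h := newHistogram([]float64{0.1, 0.5, 1})
	for _, v := range []float64{0.05, 0.3, 2} {
		h.observe(v)
	}
	for i, ub := range h.upperBounds {
		fmt.Printf("le=%g count=%d\n", ub, h.counts[i])
	}
	fmt.Printf("le=+Inf count=%d\n", h.counts[len(h.upperBounds)])
}
```

Because the buckets are cumulative, the count for a larger bound is never smaller than the count for a smaller one, which is what lets quantiles be estimated from bucket counts at query time.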
Exporting Metrics
Use the official client_golang library to expose metrics. A minimal exporter:
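package main

import (
	"log"
	"net/http"

	"github.com/prometheus/client_golang/prometheus/promhttp"
)

func main() {
	// Serve the default registry's metrics at /metrics.
	http.Handle("/metrics", promhttp.Handler())
	// ListenAndServe only returns on failure, so log the error and exit.
	log.Fatal(http.ListenAndServe(":8080", nil))
}
Custom counters, gauges, histograms, and summaries can be defined and registered: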
myCounter := prometheus.NewCounter(prometheus.CounterOpts{Name: "my_counter_total", Help: "custom counter"})
myGauge := prometheus.NewGauge(prometheus.GaugeOpts{Name: "my_gauge_num", Help: "custom gauge"})
// Base names omit the _bucket suffix: a histogram exposes <name>_bucket, <name>_sum, and <name>_count on its own.
myHistogram := prometheus.NewHistogram(prometheus.HistogramOpts{Name: "my_histogram", Help: "custom histogram", Buckets: []float64{0.1, 0.2, 0.3, 0.4, 0.5}})
mySummary := prometheus.NewSummary(prometheus.SummaryOpts{Name: "my_summary", Help: "custom summary", Objectives: map[float64]float64{0.5: 0.05, 0.9: 0.01, 0.99: 0.001}})
prometheus.MustRegister(myCounter, myGauge, myHistogram, mySummary)
Metrics with labels use NewCounterVec (or similar) and With(prometheus.Labels{...}) to set values.
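// A distinct name avoids a registration panic if the unlabeled my_counter_total is registered too.
myCounterVec := prometheus.NewCounterVec(prometheus.CounterOpts{Name: "my_labeled_counter_total", Help: "custom counter with labels"}, []string{"label1", "label2"})
prometheus.MustRegister(myCounterVec)
myCounterVec.With(prometheus.Labels{"label1": "1", "label2": "2"}).Inc()
PromQL Basics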
A PromQL expression evaluates to an instant vector, a range vector, a scalar, or a string. Common building blocks include the rate() and irate() functions, aggregation operators such as sum with by (...) or without (...), and histogram_quantile() for percentile calculations.
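To build intuition for rate() and irate(), the sketch below computes both over a handful of counter samples. It is a deliberately simplified model: the real rate() also extrapolates to the edges of the range window, which is omitted here. The sample type and function names are hypothetical.

```go
package main

import "fmt"

// sample is one scraped counter value at a Unix timestamp in seconds.
type sample struct {
	t float64
	v float64
}

// increaseBetween returns the counter increase from a to b, treating a
// decrease as a counter reset (the counter restarted from zero).
func increaseBetween(a, b sample) float64 {
	if b.v < a.v {
		return b.v
	}
	return b.v - a.v
}

// simpleRate averages the per-second increase over all samples in the window.
func simpleRate(s []sample) float64 {
	if len(s) < 2 || s[len(s)-1].t <= s[0].t {
		return 0
	}
	total := 0.0
	for i := 1; i < len(s); i++ {
		total += increaseBetween(s[i-1], s[i])
	}
	return total / (s[len(s)-1].t - s[0].t)
}

// simpleIrate uses only the last two samples, so it reacts faster to spikes.
func simpleIrate(s []sample) float64 {
	if len(s) < 2 || s[len(s)-1].t <= s[len(s)-2].t {
		return 0
	}
	a, b := s[len(s)-2], s[len(s)-1]
	return increaseBetween(a, b) / (b.t - a.t)
}

func main() {
	window := []sample{{0, 0}, {30, 30}, {60, 150}}
	fmt.Println("rate: ", simpleRate(window))  // average over the whole window
	fmt.Println("irate:", simpleIrate(window)) // slope of the last two samples
}
```

With these samples the average rate is lower than the instantaneous one because most of the increase happened in the last 30 seconds, which is why rate() suits alerting and slow-moving graphs while irate() suits zoomed-in, volatile ones.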
# Example: QPS per path
sum(rate(demo_api_request_duration_seconds_count{job="demo",method="GET",status="200"}[5m])) by (path)
Grafana Visualization
In Grafana, add Prometheus as a data source, then build dashboards whose panels run PromQL queries.
Alertmanager
Define alert rules in a separate file and reference it from prometheus.yml through rule_files. The example rule fires when the http_srv job has had no healthy targets for one minute:
groups:
  - name: simulator-alert-rule
    rules:
      - alert: HttpSimulatorDown
        expr: sum(up{job="http_srv"}) == 0
        for: 1m
        labels:
          severity: critical
Configure Alertmanager with SMTP settings to send email notifications, and use its UI to silence alerts.
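The for: 1m clause in the rule above means an alert does not fire the first time its expression is true. It first becomes pending and only fires once the condition has held for the whole duration. A rough model of that state machine, with hypothetical type and function names and timestamps in Unix seconds, might look like this:

```go
package main

import "fmt"

type alertState string

const (
	inactive alertState = "inactive"
	pending  alertState = "pending"
	firing   alertState = "firing"
)

// alert tracks how long a rule's condition has been continuously true.
type alert struct {
	forSeconds  int64 // the rule's "for" duration
	activeSince int64 // when the condition first became true; -1 if inactive
}

func newAlert(forSeconds int64) *alert {
	return &alert{forSeconds: forSeconds, activeSince: -1}
}

// eval is called once per rule evaluation with the current condition result.
func (a *alert) eval(now int64, conditionTrue bool) alertState {
	if !conditionTrue {
		a.activeSince = -1 // any false evaluation resets the timer
		return inactive
	}
	if a.activeSince < 0 {
		a.activeSince = now
	}
	if now-a.activeSince >= a.forSeconds {
		return firing
	}
	return pending
}

func main() {
	a := newAlert(60)
	fmt.Println(a.eval(0, true))   // condition just became true
	fmt.Println(a.eval(30, true))  // still inside the "for" window
	fmt.Println(a.eval(60, true))  // held for the full minute
	fmt.Println(a.eval(75, false)) // recovered
}
```

Only firing alerts are sent on to Alertmanager, so a longer for duration trades detection speed for fewer notifications about brief blips.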
Summary
This guide provides a complete overview of Prometheus, covering its architecture, metric collection models, configuration, custom exporters, query language, visualization with Grafana, and alerting with Alertmanager, equipping developers and operations engineers to monitor and observe their systems effectively.
Open Source Linux
Focused on sharing Linux/Unix content, covering fundamentals, system development, network programming, automation/operations, cloud computing, and related professional knowledge.
