Master Prometheus: From Basics to Advanced Monitoring, Alerting, and Grafana Integration
This comprehensive guide explains Prometheus fundamentals, its ecosystem, metric collection models, configuration, PromQL querying, custom exporters, Grafana visualization, and Alertmanager setup, providing step‑by‑step instructions and code examples for effective system monitoring and alerting.
Introduction
Prometheus is an open‑source monitoring solution that collects and stores time‑series metrics, providing insight into system health.
Ecosystem
The Prometheus ecosystem includes components for exposing metrics, scraping, storage, visualization, and alerting.
Metrics collection
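Each monitored service is a job made up of one or more targets (instances). A service exposes metrics either through a client SDK built into the application or through a standalone exporter (MySQL, Consul, etc.). The Pushgateway is used for short‑lived jobs.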
Pull model
Prometheus regularly pulls metrics from each target's /metrics endpoint; the interval is configured with scrape_interval.
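As a minimal sketch of how the interval might be set, the fragment below defines a global default and overrides it for one job. The job name "slow_service", its target address, and the interval values are assumptions chosen for illustration only.

```yaml
global:
  scrape_interval: 15s          # default interval applied to every job

scrape_configs:
  - job_name: "prometheus"
    metrics_path: /metrics      # this is already the default path
    static_configs:
      - targets: ["localhost:9090"]

  - job_name: "slow_service"    # hypothetical job, for illustration
    scrape_interval: 30s        # per-job override of the global default
    static_configs:
      - targets: ["localhost:8080"]
```

A shorter interval gives finer resolution at the cost of more samples to store, so the per-job override is usually reserved for targets that need something different from the default.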
Push model
Short‑lived services push their metrics to the Pushgateway, and Prometheus then pulls them from the Pushgateway like any other target.
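On the Prometheus side, the Pushgateway is scraped like any other target. The sketch below assumes a Pushgateway listening on its default port 9091 on the same host; the job name is arbitrary.

```yaml
scrape_configs:
  - job_name: "pushgateway"
    honor_labels: true          # keep the job/instance labels attached by the pushing service
    static_configs:
      - targets: ["localhost:9091"]   # assumed address of the Pushgateway
```

Setting honor_labels to true is commonly recommended here so that the labels sent by the pushing job are not overwritten with those of the Pushgateway itself.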
Storage and query
Metrics are stored in an internal TSDB and queried with PromQL via the web UI or Grafana.
Alerting
Prometheus evaluates alert rules written as PromQL expressions and sends firing alerts to Alertmanager, which groups them and routes them to receivers such as email or WeChat.
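A minimal alertmanager.yml that routes every alert to a single email receiver might look like the sketch below. The SMTP host, addresses, password, receiver name, and timing values are placeholders, not values from the original article.

```yaml
global:
  smtp_smarthost: "smtp.example.com:587"      # placeholder SMTP server
  smtp_from: "alertmanager@example.com"       # placeholder sender
  smtp_auth_username: "alertmanager@example.com"
  smtp_auth_password: "change-me"             # placeholder credential

route:
  receiver: "ops-email"                       # default receiver for all alerts
  group_by: ["alertname", "job"]              # alerts sharing these labels are batched together
  group_wait: 30s                             # wait before sending the first notification for a group
  group_interval: 5m                          # wait before notifying about new alerts in a group
  repeat_interval: 4h                         # resend interval while an alert keeps firing

receivers:
  - name: "ops-email"
    email_configs:
      - to: "ops@example.com"                 # placeholder recipient
        send_resolved: true                   # also notify when the alert clears
```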
How it works
Targets are registered either statically or dynamically; once the configuration is reloaded, Prometheus starts scraping them.
Static registration
<code>scrape_configs:
  - job_name: "prometheus"
    static_configs:
      - targets: ["localhost:9090"]
</code>
Dynamic registration
<code># listed under scrape_configs:
  - job_name: "node_export_consul"
    metrics_path: /node_metrics
    scheme: http
    consul_sd_configs:
      - server: localhost:8500
        services:
          - node_exporter
</code>
After editing the config, reload it without restarting: start Prometheus with the --web.enable-lifecycle flag and send an HTTP POST to /-/reload.
Metric model
Each time series is identified by a metric name and a set of labels, and each sample in it is a timestamp paired with a value. The metric types are Counter, Gauge, Histogram, and Summary.
Counter
Monotonically increasing values such as request counts.
Gauge
Values that can go up and down, e.g., memory usage.
Histogram and Summary
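Both capture the distribution of observed values such as request latency. A Histogram counts observations in cumulative buckets whose boundaries must be configured on the client side, and quantiles are computed at query time; a Summary computes quantiles on the client.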
PromQL
PromQL supports instant vectors, range vectors, and functions and operators like rate, irate, sum with by/without, and histogram_quantile.
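To show these functions in the same YAML format as the rest of the configuration, the sketch below wraps a few queries in a rule file as recording rules; each expr can also be pasted directly into the web UI. The metric names http_requests_total and http_request_duration_seconds_bucket, the group name, and the recorded series names are assumptions, not metrics from the original article.

```yaml
groups:
  - name: example_queries                     # hypothetical group name
    rules:
      # Per-second request rate averaged over 5 minutes, summed per job.
      - record: job:http_requests:rate5m
        expr: sum by (job) (rate(http_requests_total[5m]))

      # Same rate, but dropping only the instance label and keeping all others.
      - record: noinstance:http_requests:rate5m
        expr: sum without (instance) (rate(http_requests_total[5m]))

      # Instantaneous rate based on the last two samples in the 1-minute window.
      - record: instance:http_requests:irate1m
        expr: irate(http_requests_total[1m])

      # Estimated 95th percentile latency per job, computed from histogram buckets.
      - record: job:http_request_duration_seconds:p95
        expr: histogram_quantile(0.95, sum by (le, job) (rate(http_request_duration_seconds_bucket[5m])))
```

Note that the le label has to survive the aggregation in the last rule, because histogram_quantile needs the bucket boundaries to interpolate a quantile.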
Grafana visualization
Connect Grafana to Prometheus as a data source, create dashboards, and write PromQL queries to visualize metrics.
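Besides adding the data source through the Grafana UI, it can be declared in a provisioning file. The sketch below assumes Grafana's data source provisioning format and a Prometheus server reachable at localhost:9090; the file location (for example, under Grafana's provisioning/datasources directory) depends on the installation.

```yaml
apiVersion: 1

datasources:
  - name: Prometheus
    type: prometheus
    access: proxy                 # Grafana's backend queries Prometheus on the user's behalf
    url: http://localhost:9090    # assumed address of the Prometheus server
    isDefault: true
```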
Alerting configuration
Define alert rules in YAML, configure Alertmanager with receivers (e.g., email), and silence alerts via the UI.
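A minimal alert rule file could look like the sketch below. It relies on the built-in up metric, which is 1 when a scrape succeeds and 0 when it fails; the group name, alert name, duration, and severity label are illustrative choices. The file must also be listed under rule_files in the Prometheus configuration for it to be loaded.

```yaml
groups:
  - name: example_alerts                      # hypothetical group name
    rules:
      - alert: InstanceDown
        expr: up == 0                         # the target's last scrape failed
        for: 5m                               # condition must hold this long before the alert fires
        labels:
          severity: critical                  # illustrative label, usable for routing in Alertmanager
        annotations:
          summary: "Instance {{ $labels.instance }} is down"
          description: "{{ $labels.instance }} of job {{ $labels.job }} has been unreachable for more than 5 minutes."
```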
Efficient Ops
This public account is maintained by Xiaotianguo and friends, regularly publishing widely-read original technical articles. We focus on operations transformation and accompany you throughout your operations career, growing together happily.