Master Prometheus: From Metrics Collection to Alerts and Grafana Visualization
This comprehensive guide walks you through Prometheus fundamentals, including metric exposure, scraping, storage, querying with PromQL, custom exporter creation in Go, dynamic configuration reloading, and visualizing data with Grafana, while also covering alerting with Alertmanager and best practices for accurate histogram bucket design.
Introduction
Prometheus is an open‑source monitoring solution that collects metrics from services (Jobs) via a pull model, stores them in a time‑series database, and provides a powerful query language (PromQL) and alerting via Alertmanager.
Ecosystem Overview
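Applications can expose metrics directly through the Prometheus client libraries, while exporters expose metrics on behalf of common services such as MySQL or Kafka. Scrape targets are grouped into jobs in the scrape_configs section of prometheus.yml, either listed statically (static_configs) or discovered dynamically through service discovery. The default configuration scrapes Prometheus itself: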
scrape_configs:
  - job_name: "prometheus"
    static_configs:
      - targets: ["localhost:9090"]
Core Components
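Prometheus scrapes metrics and stores each sample in a time series identified by its metric name and labels, together with a timestamp and a value. It supports four metric types: Counter (a value that only increases, resetting to zero when the process restarts), Gauge (a value that can go up and down), Histogram (observations counted into configurable buckets), and Summary (quantiles computed on the client side). The article shows Go code for creating and registering each type, including labelled counters created with NewCounterVec; the snippet below creates and registers a plain counter.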
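A CounterVec keeps one independent counter per combination of label values, and each combination becomes its own time series. The stdlib-only Go sketch below models that behaviour with a hypothetical `labelledCounter` type. It is a toy for illustration, not the client library, which you would use through `prometheus.NewCounterVec` and `WithLabelValues(...).Inc()`.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
	"sync"
)

// labelledCounter is a toy stand-in for a CounterVec: one monotonically
// increasing value per combination of label values.
type labelledCounter struct {
	mu     sync.Mutex
	name   string
	labels []string           // label names, e.g. {"path", "code"}
	values map[string]float64 // key: label values joined by "\x00" (assumed absent from values)
}

func newLabelledCounter(name string, labels ...string) *labelledCounter {
	return &labelledCounter{name: name, labels: labels, values: map[string]float64{}}
}

// add increments the series selected by labelValues. Negative deltas are
// rejected because a counter must never decrease.
func (c *labelledCounter) add(delta float64, labelValues ...string) error {
	if delta < 0 {
		return fmt.Errorf("counter cannot decrease")
	}
	if len(labelValues) != len(c.labels) {
		return fmt.Errorf("expected %d label values, got %d", len(c.labels), len(labelValues))
	}
	c.mu.Lock()
	defer c.mu.Unlock()
	c.values[strings.Join(labelValues, "\x00")] += delta
	return nil
}

// exposition renders every series as name{label="value",...} value, sorted
// for stable output. %q approximates the real escaping rules for plain ASCII.
func (c *labelledCounter) exposition() string {
	c.mu.Lock()
	defer c.mu.Unlock()
	keys := make([]string, 0, len(c.values))
	for k := range c.values {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	var b strings.Builder
	for _, k := range keys {
		vals := strings.Split(k, "\x00")
		pairs := make([]string, len(vals))
		for i, v := range vals {
			pairs[i] = fmt.Sprintf("%s=%q", c.labels[i], v)
		}
		fmt.Fprintf(&b, "%s{%s} %g\n", c.name, strings.Join(pairs, ","), c.values[k])
	}
	return b.String()
}

func main() {
	c := newLabelledCounter("http_requests_total", "path", "code")
	c.add(1, "/", "200")
	c.add(2, "/login", "500")
	c.add(1, "/", "200")
	fmt.Print(c.exposition())
}
```

Every new label combination creates another series, which is why high-cardinality label values such as user IDs are usually avoided.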
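// Requires: import "github.com/prometheus/client_golang/prometheus"
// Create a counter, register it with the default registry, then add 23 to it.
myCounter := prometheus.NewCounter(prometheus.CounterOpts{Name: "my_counter_total", Help: "custom counter"})
prometheus.MustRegister(myCounter)
myCounter.Add(23)
PromQL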
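PromQL expressions evaluate to instant vectors, range vectors, scalars, or strings. Common functions include rate() (the per-second average increase of a counter over a range), irate() (the per-second rate from the last two samples in the range), sum() with by() or without() to aggregate across labels, and histogram_quantile() to estimate percentiles from histogram buckets. The queries below use them to calculate QPS and percentiles.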
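To see what rate() and irate() compute, the Go sketch below applies simplified versions to a handful of counter samples. `simpleRate` and `simpleIrate` are hypothetical helpers. Like Prometheus, they treat any drop in value as a counter reset. Unlike Prometheus, they do not extrapolate to the edges of the range window, so real query results can differ slightly.

```go
package main

import "fmt"

// sample is one scraped (timestamp, value) pair of a counter.
type sample struct {
	t float64 // Unix seconds
	v float64
}

// increase returns how much the counter grew between two samples,
// treating a drop as a reset to zero followed by growth to b.v.
func increase(a, b sample) float64 {
	if b.v < a.v {
		return b.v
	}
	return b.v - a.v
}

// simpleRate approximates rate(): total increase across all samples in the
// window divided by the time between the first and last sample.
func simpleRate(s []sample) float64 {
	if len(s) < 2 {
		return 0
	}
	total := 0.0
	for i := 1; i < len(s); i++ {
		total += increase(s[i-1], s[i])
	}
	return total / (s[len(s)-1].t - s[0].t)
}

// simpleIrate approximates irate(): only the last two samples matter.
func simpleIrate(s []sample) float64 {
	if len(s) < 2 {
		return 0
	}
	a, b := s[len(s)-2], s[len(s)-1]
	return increase(a, b) / (b.t - a.t)
}

func main() {
	// Four scrapes 15 s apart; the process restarts between the 3rd and 4th.
	s := []sample{{0, 100}, {15, 130}, {30, 160}, {45, 15}}
	fmt.Printf("rate=%.3f/s irate=%.3f/s\n", simpleRate(s), simpleIrate(s))
}
```

The averaging in rate() smooths short spikes, which suits alerting and long-term graphs; irate() reacts immediately but is noisier.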
rate(http_requests_total[5m])
sum(rate(http_requests_total[5m])) by (path)
histogram_quantile(0.99, rate(go_gc_pauses_seconds_total_bucket[5m]))
Visualization
Metrics can be visualized in Prometheus’s own UI or exported to Grafana. The guide shows how to add Prometheus as a data source in Grafana, create dashboards, and use PromQL queries in panels.
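Grafana's Prometheus data source sends panel queries to the Prometheus HTTP API (`/api/v1/query` for instant queries, `/api/v1/query_range` for graphs). The Go sketch below builds an instant-query URL and decodes a response of the documented shape. The helper names `buildQueryURL` and `firstValue` are hypothetical, and the JSON payload in `main` is a hand-written example rather than real output.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/url"
	"strconv"
)

// buildQueryURL builds an instant-query URL for the Prometheus HTTP API.
func buildQueryURL(base, promql string) string {
	return base + "/api/v1/query?" + url.Values{"query": {promql}}.Encode()
}

// queryResponse covers only the fields needed for an instant-vector result.
type queryResponse struct {
	Status string `json:"status"`
	Data   struct {
		ResultType string `json:"resultType"`
		Result     []struct {
			Metric map[string]string `json:"metric"`
			Value  [2]interface{}    `json:"value"` // [unix_seconds, "value as string"]
		} `json:"result"`
	} `json:"data"`
}

// firstValue decodes a response body and returns the first sample's value.
func firstValue(body []byte) (float64, error) {
	var r queryResponse
	if err := json.Unmarshal(body, &r); err != nil {
		return 0, err
	}
	if r.Status != "success" || len(r.Data.Result) == 0 {
		return 0, fmt.Errorf("no data (status %q)", r.Status)
	}
	s, ok := r.Data.Result[0].Value[1].(string)
	if !ok {
		return 0, fmt.Errorf("unexpected value type")
	}
	return strconv.ParseFloat(s, 64)
}

func main() {
	fmt.Println(buildQueryURL("http://localhost:9090", `sum(rate(http_requests_total[5m])) by (path)`))

	// Example payload in the documented format; a real client would obtain it
	// with http.Get on the URL printed above.
	body := []byte(`{"status":"success","data":{"resultType":"vector","result":[{"metric":{"job":"prometheus"},"value":[1700000000.0,"1"]}]}}`)
	v, err := firstValue(body)
	fmt.Println(v, err)
}
```

Sample values arrive as strings so that values such as NaN and +Inf survive the JSON encoding; that is why the sketch parses them with strconv.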
Alerting
Alertmanager receives the alerts fired by Prometheus alerting rules, groups and deduplicates them, and forwards notifications to receivers such as email or Slack. The example rule below fires when every target of the http_srv job has reported up == 0 (its last scrape failed) for one minute. The original article also gives the Alertmanager configuration (SMTP settings and receivers).
groups:
  - name: simulator-alert-rule
    rules:
      - alert: HttpSimulatorDown
        expr: sum(up{job="http_srv"}) == 0
        for: 1m
        labels:
          severity: critical
Best Practices
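When using Histograms, choose bucket boundaries that cover the expected value range. histogram_quantile() interpolates linearly inside a bucket, and every observation above the largest finite bound lands in the +Inf bucket, so an estimate that falls there is reported as that largest bound. Poorly placed buckets therefore produce misleading percentiles. The article demonstrates correcting the bucket definitions to obtain realistic P50 and P99 values.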
myHistogram := prometheus.NewHistogram(prometheus.HistogramOpts{
	Name:    "my_histogram", // the library appends _bucket, _sum and _count itself
	Help:    "custom histogram",
	Buckets: []float64{0.1, 0.2, 0.3, 0.4, 0.5},
})
prometheus.MustRegister(myHistogram)
myHistogram.Observe(0.3)
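The effect of bucket choice can be reproduced offline. The Go sketch below counts observations into cumulative buckets and estimates quantiles with the same linear-interpolation approach histogram_quantile() uses, assuming ascending positive bounds, at least one observation, and 0 < q <= 1. The latency values and the helpers `cumulativeCounts` and `bucketQuantile` are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
)

// cumulativeCounts turns raw observations into cumulative bucket counts:
// counts[i] = observations <= bounds[i], and the final entry is the +Inf bucket.
func cumulativeCounts(bounds, obs []float64) []float64 {
	counts := make([]float64, len(bounds)+1)
	for _, v := range obs {
		for i, b := range bounds {
			if v <= b {
				counts[i]++
			}
		}
		counts[len(bounds)]++ // +Inf counts every observation
	}
	return counts
}

// bucketQuantile estimates the q-quantile by interpolating linearly inside the
// first bucket whose cumulative count reaches rank = q * total.
func bucketQuantile(q float64, bounds, counts []float64) float64 {
	rank := q * counts[len(counts)-1]
	b := sort.Search(len(bounds), func(i int) bool { return counts[i] >= rank })
	if b == len(bounds) {
		// The rank falls in +Inf: the best available answer is the top finite bound.
		return bounds[len(bounds)-1]
	}
	lower, below := 0.0, 0.0 // the first bucket is assumed to start at 0
	if b > 0 {
		lower, below = bounds[b-1], counts[b-1]
	}
	return lower + (bounds[b]-lower)*(rank-below)/(counts[b]-below)
}

func main() {
	// Hypothetical latencies in seconds, mostly between 1 and 3 s.
	obs := []float64{1.2, 1.5, 1.8, 2.1, 2.4, 2.6, 2.9, 3.1, 3.5, 4.2}
	for _, bounds := range [][]float64{
		{0.1, 0.2, 0.3, 0.4, 0.5}, // the buckets above: far below the data
		{1, 2, 3, 4, 5},           // buckets that cover the observed range
	} {
		counts := cumulativeCounts(bounds, obs)
		fmt.Printf("bounds=%v p50=%.2f p99=%.2f\n", bounds,
			bucketQuantile(0.5, bounds, counts), bucketQuantile(0.99, bounds, counts))
	}
}
```

With the first bucket set, every observation lands in +Inf and both percentiles collapse to 0.5; with the second, P50 comes out at 2.5, which matches the median of the sample data. In client_golang, prometheus.LinearBuckets(1, 1, 5) produces the same {1, 2, 3, 4, 5} boundaries, and prometheus.ExponentialBuckets suits values that span several orders of magnitude.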
21CTO
21CTO (21CTO.com) offers developers community, training, and services, making it your go‑to learning and service platform.
