Master Prometheus: From Metrics Collection to Alerts and Grafana Visualization

This guide covers Prometheus fundamentals: exposing and scraping metrics, storage, querying with PromQL, writing a custom exporter in Go, reloading configuration dynamically, visualizing data in Grafana, alerting with Alertmanager, and designing histogram buckets that yield accurate percentiles.

21CTO

Jun 28, 2022

Introduction

Prometheus is an open-source monitoring system that pulls metrics over HTTP from the targets of configured jobs, stores them in a time-series database, and provides a powerful query language (PromQL) plus alerting through Alertmanager.

Ecosystem Overview

Metrics can be exposed directly by applications using the Prometheus client libraries, or via exporters for common services such as MySQL and Kafka. Scrape targets are grouped into jobs in the scrape_configs section of prometheus.yml, either listed statically or found through service discovery.

scrape_configs:
  - job_name: "prometheus"
    static_configs:
      - targets: ["localhost:9090"]
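
Whichever job a target belongs to, what Prometheus fetches on each scrape is a plain-text document served over HTTP, by convention at /metrics. The sketch below hand-renders one counter in that text format using only Go's standard library, to show roughly what a scrape returns. The names renderCounter and metricsHandler and the sample value are assumptions made for illustration; in practice the official client library generates this output for you.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"strconv"
)

// renderCounter formats a single unlabelled counter in the Prometheus
// text exposition format: a HELP line, a TYPE line, then the sample.
func renderCounter(name, help string, value float64) string {
	return fmt.Sprintf("# HELP %s %s\n# TYPE %s counter\n%s %s\n",
		name, help, name, name, strconv.FormatFloat(value, 'g', -1, 64))
}

// metricsHandler is what a service would mount at /metrics.
func metricsHandler(w http.ResponseWriter, r *http.Request) {
	w.Header().Set("Content-Type", "text/plain; version=0.0.4")
	fmt.Fprint(w, renderCounter("my_counter_total", "custom counter", 23))
}

func main() {
	// Simulate one scrape in memory instead of opening a real port.
	// A real service would call http.HandleFunc("/metrics", metricsHandler)
	// followed by http.ListenAndServe.
	rec := httptest.NewRecorder()
	metricsHandler(rec, httptest.NewRequest(http.MethodGet, "/metrics", nil))
	fmt.Print(rec.Body.String())
}
```

The target address in static_configs only has to point at a host and port that serves a document like this.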

Core Components

Prometheus scrapes metrics, stores them as time‑series (metric name, labels, timestamp, value), and supports four metric types: Counter, Gauge, Histogram, Summary. The article shows Go code examples for creating and registering each type, including labelled counters with NewCounterVec.

myCounter := prometheus.NewCounter(prometheus.CounterOpts{Name: "my_counter_total", Help: "custom counter"})
prometheus.MustRegister(myCounter)
myCounter.Add(23)
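
A labelled counter behaves like a family of independent counters, one per distinct combination of label values, and each combination becomes its own time series. The following is a minimal standard-library model of that idea, not the client library's implementation; the type counterVec and its methods are hypothetical names chosen for this sketch.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// counterVec is a toy model of a labelled counter: one running total
// per distinct combination of label values.
type counterVec struct {
	name       string
	labelNames []string
	values     map[string]float64 // keyed by the rendered label set
}

func newCounterVec(name string, labelNames ...string) *counterVec {
	return &counterVec{name: name, labelNames: labelNames, values: map[string]float64{}}
}

// inc adds 1 to the series identified by the given label values,
// creating the series on first use.
func (c *counterVec) inc(labelValues ...string) {
	if len(labelValues) != len(c.labelNames) {
		panic("label value count does not match label names")
	}
	pairs := make([]string, len(labelValues))
	for i, v := range labelValues {
		pairs[i] = fmt.Sprintf("%s=%q", c.labelNames[i], v)
	}
	c.values[strings.Join(pairs, ",")]++
}

// series returns one line per series, sorted for stable output.
func (c *counterVec) series() []string {
	out := make([]string, 0, len(c.values))
	for labels, v := range c.values {
		out = append(out, fmt.Sprintf("%s{%s} %g", c.name, labels, v))
	}
	sort.Strings(out)
	return out
}

func main() {
	requests := newCounterVec("http_requests_total", "path", "code")
	requests.inc("/login", "200")
	requests.inc("/login", "200")
	requests.inc("/login", "500")
	for _, line := range requests.series() {
		fmt.Println(line)
	}
}
```

Because every new label value creates another series, labels are best kept to values with a small, bounded set (status codes, paths) rather than unbounded ones such as user IDs.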

PromQL

PromQL expressions evaluate to instant vectors, range vectors, or scalars. Functions such as rate(), irate(), and histogram_quantile(), together with aggregation operators such as sum() and their by / without clauses, are used to calculate QPS, percentiles, and other statistics.

rate(http_requests_total[5m])
sum(rate(http_requests_total[5m])) by (path)
histogram_quantile(0.99, sum(rate(go_gc_pauses_seconds_total_bucket[5m])) by (le))
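
To make rate() less of a black box, here is a simplified sketch of the calculation: the increase of a counter across the window divided by the elapsed seconds, with counter resets (for example after a process restart) compensated for. It deliberately omits the extrapolation to the window boundaries that Prometheus performs, so results can differ slightly from the real function. The sample type and the simpleRate name are assumptions for this illustration.

```go
package main

import "fmt"

// sample is one scraped point of a counter: a timestamp in seconds
// and the counter value at that time.
type sample struct {
	ts    float64
	value float64
}

// simpleRate returns the average per-second increase across the samples.
// A drop in value is treated as a counter reset: the counter is assumed
// to have restarted from zero, so the new value counts as fresh increase.
// It returns 0 when fewer than two samples are available.
func simpleRate(samples []sample) float64 {
	if len(samples) < 2 {
		return 0
	}
	increase := 0.0
	for i := 1; i < len(samples); i++ {
		prev, cur := samples[i-1].value, samples[i].value
		if cur < prev {
			increase += cur // counter reset
		} else {
			increase += cur - prev
		}
	}
	elapsed := samples[len(samples)-1].ts - samples[0].ts
	if elapsed <= 0 {
		return 0
	}
	return increase / elapsed
}

func main() {
	// Scrapes every 15s; the process restarts between t=30 and t=45.
	window := []sample{{0, 100}, {15, 130}, {30, 160}, {45, 30}, {60, 60}}
	fmt.Println(simpleRate(window)) // 120 requests over 60s
}
```

The reset handling is why rate() should be applied to the raw counter before any sum(): once series are summed, a reset in one of them can no longer be detected.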

Visualization

Metrics can be graphed in Prometheus's built-in expression browser or in Grafana, which queries Prometheus as a data source. The guide shows how to add Prometheus as a data source in Grafana, create dashboards, and use PromQL queries in panels.

Alerting

Alertmanager receives alerts generated by Prometheus alerting rules, groups them, and forwards them to receivers such as email or Slack. The example rule fires when every target of the http_srv job has reported up as zero for one minute; the article also provides the Alertmanager configuration (SMTP settings, receivers).

groups:
- name: simulator-alert-rule
  rules:
  - alert: HttpSimulatorDown
    expr: sum(up{job="http_srv"}) == 0
    for: 1m
    labels:
      severity: critical
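
The for: 1m clause means the expression must stay true across consecutive rule evaluations for a full minute before the alert fires; until then the alert is pending, and a single false evaluation resets it. A small sketch of that state machine follows. The type and function names are hypothetical, and timestamps are plain seconds to keep the example short.

```go
package main

import "fmt"

type alertState string

const (
	inactive alertState = "inactive"
	pending  alertState = "pending"
	firing   alertState = "firing"
)

// alertTracker remembers when the rule's condition first became true.
type alertTracker struct {
	forSeconds  float64 // the rule's "for" duration
	active      bool
	activeSince float64
}

// evaluate records one rule evaluation at time now (in seconds) and
// returns the resulting alert state.
func (a *alertTracker) evaluate(now float64, conditionTrue bool) alertState {
	if !conditionTrue {
		a.active = false // any false evaluation resets the timer
		return inactive
	}
	if !a.active {
		a.active = true
		a.activeSince = now
	}
	if now-a.activeSince >= a.forSeconds {
		return firing
	}
	return pending
}

func main() {
	t := &alertTracker{forSeconds: 60}
	// Evaluations every 30s; the target recovers briefly at t=60.
	steps := []struct {
		now  float64
		down bool
	}{{0, true}, {30, true}, {60, false}, {90, true}, {120, true}, {150, true}}
	for _, s := range steps {
		fmt.Printf("t=%3.0fs down=%-5t state=%s\n", s.now, s.down, t.evaluate(s.now, s.down))
	}
}
```

In this run the brief recovery at t=60 restarts the clock, so the alert only reaches firing at t=150. That delay is the trade-off of a for clause: fewer alerts from short blips, at the cost of later notification.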

Best Practices

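When using histograms, choose bucket boundaries that cover the expected range of values. Percentiles are estimated by linear interpolation within a bucket, so buckets that are too coarse, or that end below the values actually observed, give inaccurate results. The article demonstrates correcting the bucket definitions to obtain realistic P50 and P99 values.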

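// The client library appends _bucket, _sum and _count to the name itself,
// so the base name must not end in _bucket.
myHistogram := prometheus.NewHistogram(prometheus.HistogramOpts{
    Name:    "my_histogram",
    Help:    "custom histogram",
    Buckets: []float64{0.1, 0.2, 0.3, 0.4, 0.5}, // upper bounds; +Inf is added automatically
})
prometheus.MustRegister(myHistogram)
myHistogram.Observe(0.3)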
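
Why bucket boundaries matter becomes clearer from how a quantile is estimated. Only cumulative bucket counts are stored, so the estimate is interpolated linearly inside the bucket that contains the requested rank, and anything beyond the last finite boundary is reported as that boundary. The sketch below is a simplified model of that estimation, not the Prometheus source; bucket, estimateQuantile, and the sample counts are assumptions for illustration.

```go
package main

import (
	"fmt"
	"math"
)

// inf is the upper bound of the implicit last bucket.
var inf = math.Inf(1)

// bucket mirrors one <name>_bucket series: the cumulative number of
// observations less than or equal to upperBound.
type bucket struct {
	upperBound float64
	count      float64
}

// estimateQuantile estimates quantile q (0 < q <= 1) from cumulative
// buckets that are sorted by upperBound and end with a +Inf bucket.
func estimateQuantile(q float64, buckets []bucket) float64 {
	if len(buckets) < 2 {
		return math.NaN()
	}
	total := buckets[len(buckets)-1].count
	if total == 0 {
		return math.NaN()
	}
	rank := q * total
	// Find the first bucket whose cumulative count reaches the rank.
	i := 0
	for i < len(buckets)-1 && buckets[i].count < rank {
		i++
	}
	if i == len(buckets)-1 {
		// The rank lies in the +Inf bucket: the best available
		// answer is the highest finite boundary.
		return buckets[len(buckets)-2].upperBound
	}
	lower, prevCount := 0.0, 0.0
	if i > 0 {
		lower, prevCount = buckets[i-1].upperBound, buckets[i-1].count
	}
	upper, count := buckets[i].upperBound, buckets[i].count
	return lower + (upper-lower)*(rank-prevCount)/(count-prevCount)
}

func main() {
	// 100 requests, 90 of them slower than 0.5s.
	narrow := []bucket{{0.1, 2}, {0.2, 4}, {0.3, 6}, {0.4, 8}, {0.5, 10}, {inf, 100}}
	// The same kind of workload recorded with boundaries that cover it.
	wide := []bucket{{0.5, 10}, {1, 30}, {2, 80}, {5, 100}, {inf, 100}}

	fmt.Println("narrow P50:", estimateQuantile(0.50, narrow))
	fmt.Println("narrow P99:", estimateQuantile(0.99, narrow))
	fmt.Println("wide   P50:", estimateQuantile(0.50, wide))
	fmt.Println("wide   P99:", estimateQuantile(0.99, wide))
}
```

With the narrow buckets both percentiles are capped at 0.5 even though most requests took longer, while the wider boundaries yield roughly 1.4 and 4.85. Boundaries should therefore extend past the slowest values you expect to care about.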
