
Mastering Prometheus Histograms: From Basics to Advanced Queries

This article explains the fundamentals of Prometheus Histogram metrics: the data format, the metric types, and how histograms work as cumulative time series. It then provides a Go example for collection and demonstrates practical queries for rates, bucket analysis, and quantile estimation to monitor service performance.

Efficient Ops

In modern microservice systems, the performance of every service on a request chain matters: a slowdown in one service drags down the entire chain. Effective optimization starts with collecting and observing the right metrics, and the Prometheus Histogram is a powerful but often misunderstood metric type.

1. Data Format

Prometheus stores all collected samples as time series in an in‑memory database and periodically writes them to disk. Each time series is uniquely identified by its metric name and a set of labels. Laid out side by side, the series form a matrix with one row per series and time on the X‑axis:

  ^
  │   . . . . . . . . . . . . . . . . .   . .   node_cpu_seconds_total{cpu="cpu0",mode="idle"}
  │     . . . . . . . . . . . . . . . . . . .   node_cpu_seconds_total{cpu="cpu0",mode="system"}
  │     . . . . . . . . . .   . . . . . . . .   node_load1{}
  │     . . . . . . . . . . . . . . . .   . .
  v
    <------------------ time ---------------->

Each sample consists of three parts:

Metric: the metric name together with its label set describing the sample.

Timestamp: a millisecond‑precision timestamp.

Value: a float64 representing the sample's value.
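
The three parts of a sample can be modeled as a small Go struct. This is only an illustrative sketch: the Sample type and its SeriesID method are hypothetical names invented here, not part of any Prometheus library, and the value is made up.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
	"time"
)

// Sample is a hypothetical illustration type, not a Prometheus API.
type Sample struct {
	Name        string            // metric name
	Labels      map[string]string // label set
	TimestampMs int64             // milliseconds since the Unix epoch
	Value       float64           // sample value
}

// SeriesID renders the metric name with its labels sorted by key,
// for example name{a="1",b="2"}.
func (s Sample) SeriesID() string {
	keys := make([]string, 0, len(s.Labels))
	for k := range s.Labels {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	pairs := make([]string, 0, len(keys))
	for _, k := range keys {
		pairs = append(pairs, fmt.Sprintf("%s=%q", k, s.Labels[k]))
	}
	return s.Name + "{" + strings.Join(pairs, ",") + "}"
}

func main() {
	s := Sample{
		Name:        "node_cpu_seconds_total",
		Labels:      map[string]string{"cpu": "cpu0", "mode": "idle"},
		TimestampMs: time.Now().UnixMilli(),
		Value:       12345.6, // made-up value for illustration
	}
	fmt.Println(s.SeriesID(), s.TimestampMs, s.Value)
}
```

Two samples belong to the same time series exactly when their metric name and label set match, which is why the identifier is built from both.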

2. Metric Types

Counter: stores monotonically increasing counts, e.g., total page views.

Gauge: stores values that can go up and down, e.g., free memory.

Summary: like a Histogram, samples observations, but computes configurable quantiles on the client side over a sliding time window, alongside a total count and a sum of the observed values.

Histogram: counts observations in configurable buckets and also exposes the total count and the sum of all observed values; quantiles are estimated later, on the server side, from the bucket counts.

While Counter and Gauge are straightforward, Histogram is more complex but essential for tracking latency and other distribution‑based metrics.

3. How Histograms Work

Buckets are cumulative – each bucket contains the count of observations less than or equal to its upper bound.

A Histogram is exposed as a group of time‑series (one _bucket series per upper bound, plus _sum and _count); Prometheus records a new sample of each at every scrape.

The series is cumulative over time – bucket counts never decrease, reflecting the total observations since the process started.

These properties mean that a Histogram provides the distribution of values, the total count of observations, and (through its _sum series) their sum.
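
The cumulative bucket rule can be sketched in a few lines of Go. This is a simplified model for illustration only: cumulativeHistogram is a hypothetical type, the observed values are made up, and a real client library may store its counts differently and only present them cumulatively when scraped.

```go
package main

import "fmt"

// cumulativeHistogram is a hypothetical, simplified model of a histogram.
type cumulativeHistogram struct {
	bounds []float64 // ascending upper bounds; the +Inf bucket is implicit
	counts []uint64  // counts[i] = observations <= bounds[i]; last entry is +Inf
	sum    float64   // sum of all observed values
}

func newCumulativeHistogram(bounds []float64) *cumulativeHistogram {
	return &cumulativeHistogram{bounds: bounds, counts: make([]uint64, len(bounds)+1)}
}

// Observe increments every bucket whose upper bound is >= v,
// which is what makes the buckets cumulative.
func (h *cumulativeHistogram) Observe(v float64) {
	for i, ub := range h.bounds {
		if v <= ub {
			h.counts[i]++
		}
	}
	h.counts[len(h.bounds)]++ // the +Inf bucket counts every observation
	h.sum += v
}

// Count returns the total number of observations (the +Inf bucket).
func (h *cumulativeHistogram) Count() uint64 { return h.counts[len(h.bounds)] }

func main() {
	h := newCumulativeHistogram([]float64{5, 10, 20, 50, 100})
	for _, v := range []float64{3, 7, 15, 42, 250} { // made-up observations
		h.Observe(v)
	}
	for i, ub := range h.bounds {
		fmt.Printf("le=%g count=%d\n", ub, h.counts[i])
	}
	fmt.Printf("le=+Inf count=%d sum=%g\n", h.Count(), h.sum)
}
```

Because every bucket includes all smaller ones, the +Inf bucket always equals the total count, and the observation of 250 appears only there.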

4. Go Example

package main

import (
    "log"
    "math/rand"
    "net/http"
    "time"

    "github.com/prometheus/client_golang/prometheus"
    "github.com/prometheus/client_golang/prometheus/promauto"
    "github.com/prometheus/client_golang/prometheus/promhttp"
)

func main() {
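    // Define a Histogram metric and register it with the default registry.
    histogram := promauto.NewHistogram(prometheus.HistogramOpts{
        Name: "histogram_showcase_metric",
        Help: "Simulated request latency in milliseconds.",
        // Upper bounds of the buckets; a +Inf bucket is added automatically.
        Buckets: []float64{5.0, 10.0, 20.0, 50.0, 100.0},
    })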

    go func() {
        for {
            // Simulate random latency between 0 and 100 ms
            histogram.Observe(rand.Float64() * 100.0)
            time.Sleep(1 * time.Second)
        }
    }()

    // Expose the metrics endpoint
    http.Handle("/metrics", promhttp.Handler())
    log.Fatal(http.ListenAndServe(":8080", nil))
}

The program continuously records random values to the histogram and exposes them at /metrics for Prometheus to scrape.

Histogram buckets illustration

5. Querying Histograms

Each histogram exposes a _count series (alongside _sum and one _bucket series per upper bound), which can be used to calculate request rates:

rate(histogram_showcase_metric_count[1m])
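
To build intuition for what this query returns, here is a deliberately simplified Go sketch of a per‑second rate over a window of counter samples. It is not Prometheus's implementation: the real rate() also extrapolates to the window boundaries. The counterSample type, the simpleRate function, and the sample values are assumptions made for illustration.

```go
package main

import "fmt"

// counterSample is a hypothetical (time, value) pair scraped from a counter.
type counterSample struct {
	tSeconds float64
	value    float64
}

// simpleRate returns the average per-second increase across the samples.
// A drop in value is treated as a counter reset (process restart), in which
// case the new value itself is taken as the increase since the reset.
func simpleRate(samples []counterSample) float64 {
	if len(samples) < 2 {
		return 0
	}
	increase := 0.0
	for i := 1; i < len(samples); i++ {
		delta := samples[i].value - samples[i-1].value
		if delta < 0 {
			delta = samples[i].value
		}
		increase += delta
	}
	elapsed := samples[len(samples)-1].tSeconds - samples[0].tSeconds
	if elapsed <= 0 {
		return 0
	}
	return increase / elapsed
}

func main() {
	// Made-up _count samples, scraped every 15 seconds over one minute.
	window := []counterSample{{0, 100}, {15, 115}, {30, 130}, {45, 145}, {60, 160}}
	fmt.Printf("%.2f observations per second\n", simpleRate(window))
}
```

With one observation per second, as in the example program, the result is about one observation per second regardless of how large the raw counter has grown.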

To observe the distribution, query the bucket series and compute per‑second rates:

rate(histogram_showcase_metric_bucket[1m])
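
Since the bucket series are cumulative, the rate for a single interval (for example, between 10 and 20) is the difference between adjacent buckets. The Go sketch below shows that subtraction; perBucket is a hypothetical helper and the rates are made‑up values, not output from a real server.

```go
package main

import "fmt"

// perBucket converts cumulative bucket values, ordered by ascending upper
// bound, into the value that falls inside each individual interval.
func perBucket(cumulative []float64) []float64 {
	out := make([]float64, len(cumulative))
	prev := 0.0
	for i, c := range cumulative {
		out[i] = c - prev
		prev = c
	}
	return out
}

func main() {
	bounds := []string{"5", "10", "20", "50", "100", "+Inf"}
	// Made-up per-second rates of the cumulative bucket series.
	rates := []float64{0.05, 0.10, 0.20, 0.50, 1.00, 1.00}
	for i, r := range perBucket(rates) {
		fmt.Printf("le=%s: %.2f observations per second in this interval\n", bounds[i], r)
	}
}
```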

Finally, use histogram_quantile to estimate quantiles, e.g., the 95th percentile:

histogram_quantile(0.95, rate(histogram_showcase_metric_bucket[1m]))
95th percentile result

6. Practical Advice

QPS (queries per second) can be derived from rate(histogram_showcase_metric_count[1m]).

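Latency monitoring should focus on p90, p95, and p99, which histogram_quantile estimates from the bucket series; the accuracy of each estimate depends on how closely the bucket boundaries bracket the true value.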

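Wrap the counter series in rate or irate to get per‑second changes: rate averages over the whole range window, while irate uses only the last two samples and is best suited to graphing fast‑moving counters.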

7. Conclusion

The article covered the definition, internal mechanics, and practical usage of Prometheus Histogram metrics. Mastering Histograms enables robust monitoring of backend services, helping teams identify performance bottlenecks and guide optimization efforts.

Written by

Efficient Ops

This public account is maintained by Xiaotianguo and friends, regularly publishing widely-read original technical articles. We focus on operations transformation and accompany you throughout your operations career, growing together happily.
