
Mastering Prometheus Metrics: Counters, Gauges, Histograms & Summaries Explained

This article introduces the fundamentals of metrics in IT monitoring, explains the structure of metric data points, explores dimensional metrics, and provides an in‑depth guide to Prometheus metric types—Counters, Gauges, Histograms, and Summaries—along with practical code examples and usage considerations in cloud‑native environments.

MaGe Linux Operations

Sep 13, 2023

Metrics are used to measure performance, consumption, efficiency, and many other software attributes over time. They enable engineers to monitor trends via alerts and dashboards (e.g., CPU or memory usage, request latency). Metrics have a long history in IT monitoring and are used alongside logs and tracing.

In its simplest form, a metric data point consists of three parts: a metric name, a timestamp, and a numeric value.

Over the past decade, as systems grew more complex, the concept of dimensional metrics emerged: a metric can include a set of labels (dimensions) that provide additional context. Monitoring systems that support dimensional metrics let engineers query a metric name and filter/group by labels to aggregate and analyze data across components and dimensions.
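To make filtering and grouping by labels concrete, here is a minimal sketch in plain Python that uses no monitoring library. The `Sample` type, the `make_sample` and `sum_by` helpers, the `host` label, and all sample values are hypothetical and exist only for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    """One data point: metric name, labels (dimensions), timestamp, value."""
    name: str
    labels: tuple  # sorted (key, value) pairs, so the sample stays hashable
    timestamp: float
    value: float


def make_sample(name, labels, timestamp, value):
    return Sample(name, tuple(sorted(labels.items())), timestamp, value)


def sum_by(samples, name, label, **filters):
    """Sum values of `name`, keep samples matching `filters`, group by `label`."""
    totals = defaultdict(float)
    for s in samples:
        labels = dict(s.labels)
        if s.name != name:
            continue
        if any(labels.get(k) != v for k, v in filters.items()):
            continue
        totals[labels.get(label, "")] += s.value
    return dict(totals)


# Hypothetical values, for illustration only.
samples = [
    make_sample("http_requests_total", {"api": "add_product", "host": "a"}, 1000.0, 40),
    make_sample("http_requests_total", {"api": "add_product", "host": "b"}, 1000.0, 60),
    make_sample("http_requests_total", {"api": "list_products", "host": "a"}, 1000.0, 5),
]

per_api = sum_by(samples, "http_requests_total", "api")               # group by api
on_host_a = sum_by(samples, "http_requests_total", "api", host="a")   # filter, then group
```

Grouping by `api` collapses the `host` dimension, while the `host="a"` filter narrows the data before aggregation. This is the same kind of operation a dimensional query language performs on the server side.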

Prometheus defines a metric exposition format and a remote write protocol, which have become de‑facto standards adopted by the community and many vendors. OpenMetrics, a CNCF project built on the Prometheus exposition format, provides a vendor‑agnostic standard for metric collection. OpenTelemetry, another CNCF project, aims to unify the collection of metrics, traces, and logs.

Prometheus Metric Types

Prometheus defines four metric types in its exposition format:

Counters

Gauges

Histograms

Summaries

Prometheus uses a pull model: it actively scrapes HTTP endpoints that expose metrics. These endpoints can be native to the monitored component or provided by one of the many Prometheus exporters. Client libraries are available for multiple programming languages.
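As a rough illustration of what "exposing metrics over HTTP" means, the sketch below serves a `/metrics` path using only the Python standard library. The `REQUESTS` dictionary, the `list_products` label value, the counts, and port 8000 are assumptions made for the example; a real service would normally rely on a client library rather than hand-rolled rendering.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical in-process counter state, keyed by the value of the "api" label.
REQUESTS = {"add_product": 3, "list_products": 7}


def render_metrics():
    """Render the counter in the Prometheus text exposition format."""
    lines = [
        "# HELP http_requests_total Total number of http api requests",
        "# TYPE http_requests_total counter",
    ]
    for api, count in sorted(REQUESTS.items()):
        lines.append(f'http_requests_total{{api="{api}"}} {count}')
    return "\n".join(lines) + "\n"


class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_metrics().encode("utf-8")
        self.send_response(200)
        # Content type used by the Prometheus text format.
        self.send_header("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8000):
    """Block forever, serving /metrics for a Prometheus server to scrape."""
    HTTPServer(("", port), MetricsHandler).serve_forever()
```

Calling `serve()` starts the endpoint. Prometheus would then be configured to scrape it, and the application never pushes anything itself.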

The pull model works well for Kubernetes clusters but can be harder to apply to workloads such as virtual machines, AWS Fargate tasks, or Lambda functions, where network policies may block inbound scrapes. Prometheus Agent Mode (released at the end of 2021) addresses some of these issues: an agent scrapes metrics close to the workload and forwards them to a central backend via remote write.

Prometheus can scrape both the Prometheus exposition format and the OpenMetrics format, using either a simple text format or a more efficient Protobuf format. The text format is human‑readable, allowing inspection via a browser or tools like curl.

Each unique combination of metric name and label set defines a time series; each timestamped float value is a sample within that series.

Metadata can be attached to metrics to define their type and provide descriptions, which tools like Grafana use to display additional context.
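The relationship between the text format, metadata, and time series can be sketched with a small parser. This is a simplified illustration, not the official parser: it ignores optional timestamps and label-value escaping, and the function name `parse_exposition` is made up for this example. The input reuses this article's counter example.

```python
import re

# Matches `name{labels} value` or `name value`. Optional timestamps and
# escaped characters inside label values are deliberately not handled.
SAMPLE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="([^"]*)"')


def parse_exposition(text):
    """Return (metadata, series) parsed from text-format metrics."""
    metadata = {}  # metric name -> {"help": ..., "type": ...}
    series = {}    # (metric name, frozenset of label pairs) -> float value
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("#"):
            parts = line.split(None, 3)
            if len(parts) == 4 and parts[1] in ("HELP", "TYPE"):
                metadata.setdefault(parts[2], {})[parts[1].lower()] = parts[3]
            continue
        match = SAMPLE_RE.match(line)
        if match:
            name, raw_labels, value = match.groups()
            labels = frozenset(LABEL_RE.findall(raw_labels or ""))
            # The (name, label set) pair is what identifies a time series.
            series[(name, labels)] = float(value)
    return metadata, series


text = '''# HELP http_requests_total Total number of http api requests
# TYPE http_requests_total counter
http_requests_total{api="add_product"} 4633433
'''
metadata, series = parse_exposition(text)
```

Each key in `series` combines the metric name with its full label set, so changing any label value produces a different time series.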

Counter

Counters are monotonically increasing values; they only go up (except when the process restarts, resetting to zero). They are useful for calculating deltas or rates.

Typical use case: counting total API requests.

# HELP http_requests_total Total number of http api requests
# TYPE http_requests_total counter
http_requests_total{api="add_product"} 4633433

The metric name is http_requests_total with label api="add_product" and value 4633433. By convention, the _total suffix marks a Counter.

When combined with PromQL’s rate function, you can compute per‑second request rates:

rate(http_requests_total{api="add_product"}[5m])

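To compute the absolute increase over a period, use increase. (irate is a different function: it computes a per‑second rate from only the last two samples in the range, so it is not an older form of increase.)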

increase(http_requests_total{api="add_product"}[5m])
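The idea behind rate and increase, including how a counter reset after a restart is tolerated, can be sketched as follows. This is a simplified model under stated assumptions: the real PromQL functions also extrapolate to the edges of the range window, which this sketch omits, and the sample values are hypothetical.

```python
def increase(samples):
    """Total increase across (timestamp, value) samples of a counter.

    A drop in value is treated as a restart: the counter is assumed to have
    reset to zero, so the whole post-reset value counts as new increase.
    """
    total = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        total += (cur - prev) if cur >= prev else cur
    return total


def rate(samples):
    """Average per-second increase over the time span covered by the samples."""
    elapsed = samples[-1][0] - samples[0][0]
    return increase(samples) / elapsed if elapsed > 0 else 0.0


# Hypothetical scrapes every 15 s; the process restarts before the fourth scrape.
window = [(0, 100.0), (15, 130.0), (30, 160.0), (45, 20.0), (60, 50.0)]

total_increase = increase(window)  # 30 + 30 + 20 + 30
per_second = rate(window)          # total increase divided by 60 seconds
```

Because the drop from 160 to 20 is counted as a reset rather than a negative change, the computed increase stays non-negative. This is why raw counter values are rarely graphed directly.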

Python example using the Prometheus client library:

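from prometheus_client import Counter

# Declare the counter once; 'api' is its only label.
api_requests_counter = Counter(
    'http_requests_total',
    'Total number of http api requests',
    ['api']
)
# Increment the series with api="add_product" by 1.
api_requests_counter.labels(api='add_product').inc()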

Gauge

Gauges represent values that can go up or down, such as temperature, CPU usage, memory usage, or queue length.

# HELP node_memory_used_bytes Total memory used in the node in bytes
# TYPE node_memory_used_bytes gauge
node_memory_used_bytes{hostname="host1.domain.com"} 943348382

This gauge shows that host host1.domain.com is using roughly 900 MiB of memory.

Functions like avg_over_time, max_over_time, min_over_time, and quantile_over_time are commonly used with Gauges. Example to compute average memory usage over the last 10 minutes:

avg_over_time(node_memory_used_bytes{hostname="host1.domain.com"}[10m])
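Conceptually, the `*_over_time` functions reduce all samples inside a time window to a single value. A minimal sketch, assuming samples are `(timestamp, value)` pairs; the readings below are hypothetical, and the exact window boundary semantics of a real Prometheus server may differ.

```python
def window(samples, now, seconds):
    """Values of (timestamp, value) samples within the last `seconds` before `now`."""
    return [v for t, v in samples if now - seconds < t <= now]


def avg_over_time(samples, now, seconds):
    values = window(samples, now, seconds)
    return sum(values) / len(values) if values else None


def max_over_time(samples, now, seconds):
    values = window(samples, now, seconds)
    return max(values) if values else None


# Hypothetical gauge readings in bytes, one per minute (t = 0, 60, ..., 240).
readings = [(60 * i, v) for i, v in enumerate([900e6, 920e6, 950e6, 940e6, 910e6])]

recent_avg = avg_over_time(readings, now=240, seconds=120)   # uses t = 180 and t = 240
overall_max = max_over_time(readings, now=240, seconds=300)  # uses every reading
```

Unlike a counter, a gauge needs no reset handling here: every sample is a meaningful reading on its own.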

Python example:

from prometheus_client import Gauge

memory_used = Gauge(
    'node_memory_used_bytes',
    'Total memory used in the node in bytes',
    ['hostname']
)
# set() overwrites the current value; a gauge also supports inc() and dec().
memory_used.labels(hostname='host1.domain.com').set(943348382)

Histogram

Histograms are useful for representing the distribution of measurements, such as request latency or response size. A histogram consists of:

A Counter for the number of observations (suffix _count).

A Counter for the sum of all observed values (suffix _sum).

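Multiple Counters for buckets (suffix _bucket), each with a le ("less than or equal") label giving the bucket's inclusive upper bound. Buckets are cumulative: each one counts every observation at or below its bound, and the final le="+Inf" bucket always equals _count.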
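The three components listed above can be sketched as a toy class. One detail worth seeing in code is that bucket counters are cumulative: an observation increments every bucket whose upper bound is greater than or equal to the value. The class name `SketchHistogram`, the bounds, and the observations are hypothetical; this is not how any particular client library is implemented.

```python
import bisect
import math


class SketchHistogram:
    """Toy histogram with cumulative buckets, mirroring the exposed series."""

    def __init__(self, bounds):
        # An implicit +Inf bucket catches every observation.
        self.bounds = sorted(bounds) + [math.inf]
        self.bucket_counts = [0] * len(self.bounds)
        self.count = 0   # corresponds to the _count series
        self.sum = 0.0   # corresponds to the _sum series

    def observe(self, value):
        self.count += 1
        self.sum += value
        # First bucket whose upper bound is >= value (le is inclusive).
        first = bisect.bisect_left(self.bounds, value)
        for i in range(first, len(self.bounds)):
            self.bucket_counts[i] += 1


hist = SketchHistogram([0.1, 0.5, 1.0])
for latency in (0.05, 0.3, 0.3, 0.8, 4.0):  # hypothetical observations in seconds
    hist.observe(latency)
```

After these five observations the `+Inf` bucket holds the same number as `count`, which is the invariant every exposed histogram satisfies.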

Example of a histogram for API response time:

# HELP http_request_duration_seconds Api requests response time in seconds
# TYPE http_request_duration_seconds histogram
http_request_duration_seconds_sum{api="add_product",instance="host1.domain.com"} 8953.332
http_request_duration_seconds_count{api="add_product",instance="host1.domain.com"} 27892
http_request_duration_seconds_bucket{api="add_product",instance="host1.domain.com",le="0.01"} 0
http_request_duration_seconds_bucket{api="add_product",instance="host1.domain.com",le="0.025"} 8
... (additional buckets) ...

Average request latency over the last 5 minutes can be calculated as:

rate(http_request_duration_seconds_sum{api="add_product",instance="host1.domain.com"}[5m]) / rate(http_request_duration_seconds_count{api="add_product",instance="host1.domain.com"}[5m])
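The division works because the time interval cancels out: (Δsum / Δt) / (Δcount / Δt) = Δsum / Δcount, the mean of the observations made inside the window. A small worked sketch, where the end-of-window values come from the exposition example above and the start-of-window values are hypothetical:

```python
def average_over_window(sum_start, count_start, sum_end, count_end):
    """Mean observed value between two scrapes of the _sum and _count series."""
    observations = count_end - count_start
    if observations <= 0:
        return None  # no new observations (or a reset) in the window
    return (sum_end - sum_start) / observations


# End values are from the example above; start values are hypothetical.
window_avg = average_over_window(8900.0, 27700, 8953.332, 27892)

# Dividing the raw series instead gives the average since process start.
lifetime_avg = 8953.332 / 27892
```

The lifetime average for the example series is about 0.321 seconds. The windowed average reflects only recent requests, which is usually what a dashboard or alert needs.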

Percentile (quantile) calculation uses histogram_quantile:

histogram_quantile(0.99, rate(http_request_duration_seconds_bucket{api="add_product",instance="host1.domain.com"}[5m]))
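The estimation performed by histogram_quantile can be approximated with linear interpolation inside the bucket that contains the requested rank. The sketch below is a simplified reimplementation for illustration, not the server's code; it assumes cumulative buckets sorted by upper bound, at least one finite bucket, and a lowest bucket that starts at 0. The bucket counts are hypothetical.

```python
import math


def histogram_quantile(q, buckets):
    """Estimate the q-quantile from cumulative (upper_bound, count) buckets.

    `buckets` must be sorted by upper bound and end with the +Inf bucket.
    """
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    prev_bound, prev_count = 0.0, 0.0
    for bound, count in buckets:
        if count >= rank:
            if math.isinf(bound):
                # Quantile falls in the +Inf bucket: report the highest finite bound.
                return buckets[-2][0]
            in_bucket = count - prev_count
            if in_bucket == 0:
                return prev_bound
            # Assume observations are spread evenly inside the bucket.
            return prev_bound + (bound - prev_bound) * (rank - prev_count) / in_bucket
        prev_bound, prev_count = bound, count
    return math.nan


# Hypothetical cumulative bucket counts.
buckets = [(0.1, 10), (0.5, 60), (1.0, 90), (math.inf, 100)]
p50 = histogram_quantile(0.5, buckets)
p99 = histogram_quantile(0.99, buckets)
```

Here the median lands in the 0.1 to 0.5 bucket and is interpolated, while the 99th percentile falls beyond the last finite bound and is capped at 1.0. This shows why bucket boundaries should bracket the latencies you care about.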

Python example with custom buckets:

from prometheus_client import Histogram

api_request_duration = Histogram(
    name='http_request_duration_seconds',
    documentation='Api requests response time in seconds',
    labelnames=['api', 'instance'],
    # Upper bounds in seconds; the client adds the +Inf bucket automatically.
    buckets=(0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1, 2.5, 5, 10, 25)
)
# Record one request that took 0.3672 seconds.
api_request_duration.labels(api='add_product', instance='host1.domain.com').observe(0.3672)

Summary

Summaries also measure distributions such as request duration and response size, but they compute quantiles on the client side. A summary consists of:

A Counter for the number of observations (suffix _count).

A Counter for the sum of all observed values (suffix _sum).

Optional quantile series computed on the client, each with a quantile label.

Example of a Summary for API response time:

# HELP http_request_duration_seconds Api requests response time in seconds
# TYPE http_request_duration_seconds summary
http_request_duration_seconds_sum{api="add_product",instance="host1.domain.com"} 8953.332
http_request_duration_seconds_count{api="add_product",instance="host1.domain.com"} 27892
http_request_duration_seconds{api="add_product",instance="host1.domain.com",quantile="0.5"} 0.232227334
http_request_duration_seconds{api="add_product",instance="host1.domain.com",quantile="0.99"} 2.829188272
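To show what "computing quantiles on the client" involves, here is a deliberately naive sketch that stores every observation and applies a nearest-rank quantile at collection time. Real client libraries typically use streaming estimators over a sliding time window instead of keeping all values; the `SketchSummary` class and the latencies are hypothetical.

```python
import math


class SketchSummary:
    """Toy summary that keeps every observation in memory (illustration only)."""

    def __init__(self, quantiles=(0.5, 0.99)):
        self.quantiles = quantiles
        self.values = []

    def observe(self, value):
        self.values.append(value)

    def collect(self):
        """Return the numbers a scrape would expose: count, sum, quantiles."""
        ordered = sorted(self.values)
        result = {"count": len(ordered), "sum": sum(ordered)}
        for q in self.quantiles:
            if ordered:
                # Nearest rank: smallest value with at least a fraction q
                # of the observations at or below it.
                index = max(0, math.ceil(q * len(ordered)) - 1)
                result[q] = ordered[index]
        return result


summary = SketchSummary()
for latency_ms in range(1, 101):  # hypothetical latencies: 1 ms to 100 ms
    summary.observe(latency_ms / 1000)
exposed = summary.collect()
```

The quantiles are fixed when the summary is declared, so a different percentile cannot be requested later at query time.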

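Python example (at the time of writing, the official Python client exposes only the _sum and _count series for a Summary, not quantiles):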

from prometheus_client import Summary
api_request_duration = Summary(
    'http_request_duration_seconds',
    'Api requests response time in seconds',
    ['api', 'instance']
)
api_request_duration.labels(api='add_product', instance='host1.domain.com').observe(0.3672)

Histogram vs. Summary

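Generally, Histograms are preferred because they are more flexible: bucket counters can be summed across instances before a quantile is computed on the server, which is crucial in cloud‑native environments where a service runs as many replicas. The trade‑off is that histogram quantiles are estimates whose accuracy depends on the bucket boundaries. Summaries are useful when accurate quantiles for a single instance are required and aggregation is not needed, because precomputed quantiles cannot be meaningfully averaged or summed.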
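The aggregation argument can be checked with a small experiment. The sketch below uses two hypothetical instances with very different traffic, a nearest-rank quantile helper, and made-up bucket bounds; it is an illustration of the principle rather than a benchmark.

```python
import math


def nearest_rank(values, q):
    """Nearest-rank q-quantile of a non-empty list of values."""
    ordered = sorted(values)
    return ordered[max(0, math.ceil(q * len(ordered)) - 1)]


# Hypothetical latencies: a busy, fast instance and a quiet, slow one.
instance_a = [0.1] * 100
instance_b = [1.0] * 10

# Summary-style: each instance reports its own median. Averaging them
# ignores how many requests each instance actually served.
averaged_median = (nearest_rank(instance_a, 0.5) + nearest_rank(instance_b, 0.5)) / 2

# The real median across all 110 requests.
true_median = nearest_rank(instance_a + instance_b, 0.5)


def cumulative_buckets(values, bounds):
    """Histogram-style cumulative counts, one per upper bound."""
    return [sum(1 for v in values if v <= b) for b in bounds]


bounds = [0.25, 0.5, 1.0, math.inf]
# Histogram-style: bucket counters from different instances can be added.
merged = [a + b for a, b in zip(cumulative_buckets(instance_a, bounds),
                                cumulative_buckets(instance_b, bounds))]
```

Averaging the two medians gives 0.55 seconds, although the true median over all requests is 0.1 seconds. The merged bucket counts are identical to the buckets of the combined data, so a quantile estimated from them reflects the whole fleet.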
