Understanding Prometheus Metric Types: Counters, Gauges, Histograms & Summaries

This article explains the fundamentals of metrics, the evolution of dimensional data, and provides a deep dive into Prometheus' four metric types—Counters, Gauges, Histograms, and Summaries—complete with practical code examples, query patterns, and a comparison of their strengths and trade‑offs.

dbaplus Community

Feb 16, 2023

Metrics are time‑series data points used to monitor performance, resource consumption, and many other software attributes. A basic metric consists of a name, a timestamp, and a numeric value; modern systems often add a set of labels (dimensions) to provide context.
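To make that structure concrete, here is a minimal sketch of a single data point and how it could be rendered as one line of the text exposition format. The Sample class and its to_text() method are hypothetical helpers written for this illustration, not part of any Prometheus library, and label-value escaping is deliberately left out.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Sample:
    """One data point of a metric (hypothetical helper, not a Prometheus API)."""
    name: str
    value: float
    labels: Dict[str, str] = field(default_factory=dict)
    timestamp_ms: Optional[int] = None  # milliseconds since the Unix epoch

    def to_text(self) -> str:
        # Render as: name{label="value",...} value [timestamp]
        label_part = ""
        if self.labels:
            pairs = ",".join(f'{k}="{v}"' for k, v in sorted(self.labels.items()))
            label_part = "{" + pairs + "}"
        line = f"{self.name}{label_part} {self.value}"
        if self.timestamp_ms is not None:
            line += f" {self.timestamp_ms}"
        return line


sample = Sample("http_requests_total", 4633433, {"api": "add_product"})
print(sample.to_text())
```

The timestamp is optional because a scraper normally stamps each sample with its own scrape time.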

Prometheus, a CNCF project, has become the de‑facto open‑source monitoring system and defines a standard exposition format and remote‑write protocol. OpenMetrics builds on this format to create a vendor‑neutral model, while OpenTelemetry aims to unify metrics, traces, and logs.

Prometheus Metric Types

1. Counter

A counter is a cumulative value that only increases, or resets to zero when the process restarts. It is typically used to count events such as API calls. The raw value is rarely useful on its own; in PromQL it is usually wrapped in rate() to get a per‑second rate or in increase() to get the growth over a time window, and both functions account for counter resets.

# HELP http_requests_total Total number of http api requests
# TYPE http_requests_total counter
http_requests_total{api="add_product"} 4633433

Python example:

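from prometheus_client import Counter
# Declare the counter once, with 'api' as its only label.
api_requests_counter = Counter('http_requests_total', 'Total number of http api requests', ['api'])
# Add 1 to the add_product series each time that API is called.
api_requests_counter.labels(api='add_product').inc()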
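The reset handling behind rate() and increase() can be sketched in a few lines of plain Python. This is a simplified illustration, not the actual Prometheus implementation: the real functions also extrapolate to the edges of the time window, which is skipped here. The function names simple_increase and simple_rate are made up for this example.

```python
from typing import List, Tuple

# Each sample is (timestamp_seconds, counter_value).
Samples = List[Tuple[float, float]]


def simple_increase(samples: Samples) -> float:
    """Total growth of a counter across the samples, tolerating resets."""
    total = 0.0
    for (_, prev), (_, curr) in zip(samples, samples[1:]):
        if curr >= prev:
            total += curr - prev
        else:
            # The counter went backwards: assume a restart and count from zero.
            total += curr
    return total


def simple_rate(samples: Samples) -> float:
    """Average per-second growth between the first and last sample."""
    if len(samples) < 2:
        return 0.0
    elapsed = samples[-1][0] - samples[0][0]
    return simple_increase(samples) / elapsed if elapsed > 0 else 0.0


# The process restarts between t=30 and t=45, so the counter drops to 10.
samples = [(0, 100), (15, 130), (30, 160), (45, 10), (60, 40)]
print(simple_increase(samples))
print(simple_rate(samples))
```

Treating every decrease as a reset is why a counter must never be decremented by application code: a genuine decrease would be counted as a restart and inflate the result.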

2. Gauge

Gauges represent values that can go up or down, such as memory usage or queue length. The current value is meaningful on its own, so a gauge can be graphed directly; functions such as avg_over_time() or max_over_time() summarize it over a time window.

# HELP node_memory_used_bytes Total memory used in the node in bytes
# TYPE node_memory_used_bytes gauge
node_memory_used_bytes{hostname="host1.domain.com"} 943348382

Python example:

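from prometheus_client import Gauge
# Declare the gauge once, with 'hostname' as its only label.
memory_used = Gauge('node_memory_used_bytes', 'Total memory used in the node in bytes', ['hostname'])
# Overwrite the current value; a gauge also offers inc() and dec().
memory_used.labels(hostname='host1.domain.com').set(943348382)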
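As a rough picture of what avg_over_time() and max_over_time() compute, the sketch below applies both to a handful of queue-length samples. The helper names mirror the PromQL functions but are plain Python written for this illustration, the sample data is invented, and the handling of window boundaries is an assumption that may differ from a given Prometheus version.

```python
from typing import List, Optional, Tuple

# Each sample is (timestamp_seconds, gauge_value).
Samples = List[Tuple[float, float]]


def values_in_window(samples: Samples, now: float, window: float) -> List[float]:
    """Values whose timestamp falls in the range (now - window, now]."""
    return [v for t, v in samples if now - window < t <= now]


def avg_over_time(samples: Samples, now: float, window: float) -> Optional[float]:
    values = values_in_window(samples, now, window)
    return sum(values) / len(values) if values else None


def max_over_time(samples: Samples, now: float, window: float) -> Optional[float]:
    values = values_in_window(samples, now, window)
    return max(values) if values else None


# Queue length scraped every 15 seconds (invented numbers).
queue_length = [(0, 5), (15, 12), (30, 6), (45, 3), (60, 9)]
print(avg_over_time(queue_length, now=60, window=45))
print(max_over_time(queue_length, now=60, window=45))
```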

3. Histogram

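Histograms record the distribution of observations by counting them in predefined buckets and exposing a sum and a count. The buckets are cumulative: each _bucket series, identified by its le ("less than or equal") label, counts every observation at or below that upper bound. Because buckets are plain counters, they can be aggregated across instances, and quantiles are estimated at query time with histogram_quantile().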

# HELP http_request_duration_seconds Api requests response time in seconds
# TYPE http_request_duration_seconds histogram
http_request_duration_seconds_sum{api="add_product",instance="host1.domain.com"} 8953.332
http_request_duration_seconds_count{api="add_product",instance="host1.domain.com"} 27892
http_request_duration_seconds_bucket{api="add_product",instance="host1.domain.com",le="0.1"} 8954
... (other buckets) ...

Python example with custom buckets:

from prometheus_client import Histogram
# Bucket upper bounds are in seconds; the client adds the +Inf bucket itself.
api_request_duration = Histogram(
    name='http_request_duration_seconds',
    documentation='Api requests response time in seconds',
    labelnames=['api', 'instance'],
    buckets=(0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1, 2.5, 5, 10, 25)
)
# Record one request that took 0.3672 seconds.
api_request_duration.labels(api='add_product', instance='host1.domain.com').observe(0.3672)
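What observe() does to the exposed series can be sketched without the client library. The function below is a simplified stand-in written for this illustration (observe_all is not a prometheus_client API) and uses only three buckets to keep the output short. It shows that one observation increments every bucket whose upper bound is at or above the value.

```python
import math
from typing import Dict, List, Union


def observe_all(observations: List[float],
                upper_bounds: List[float]) -> Dict[str, Union[float, int, dict]]:
    """Build cumulative bucket counts, sum and count for a list of observations."""
    bounds = sorted(upper_bounds) + [math.inf]  # the +Inf bucket always exists
    buckets = {bound: 0 for bound in bounds}
    for value in observations:
        for bound in bounds:
            if value <= bound:
                buckets[bound] += 1  # cumulative: every matching bucket is bumped
    return {"buckets": buckets, "sum": sum(observations), "count": len(observations)}


hist = observe_all([0.05, 0.3672, 0.08, 1.2], upper_bounds=[0.1, 0.5, 1.0])
for le, count in hist["buckets"].items():
    print(f'le="{le}" {count}')
print("sum", hist["sum"], "count", hist["count"])
```

The +Inf bucket always equals the total count, which is why the 1.2 second observation still appears somewhere even though it exceeds every finite bound.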

Typical query (average request duration over the last 5 minutes):

rate(http_request_duration_seconds_sum{api="add_product",instance="host1.domain.com"}[5m]) / rate(http_request_duration_seconds_count{api="add_product",instance="host1.domain.com"}[5m])

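Pros: flexible, and buckets can be aggregated across instances. Cons: bucket boundaries must be planned in advance, quantiles are approximations whose accuracy depends on bucket width, and computing them on the server at query time can be expensive.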
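Both the aggregation and the approximation can be seen in a small sketch. It sums the cumulative buckets of two instances and then estimates a quantile by linear interpolation inside the bucket that contains the target rank, which is the general idea behind histogram_quantile(). The functions merge_buckets and estimate_quantile are written for this illustration and skip details of the real implementation; the bucket counts are invented.

```python
import math
from typing import Dict

Buckets = Dict[float, int]  # upper bound (le) -> cumulative count


def merge_buckets(*instances: Buckets) -> Buckets:
    """Aggregate histograms that share the same bucket bounds by adding counts."""
    merged: Buckets = {}
    for buckets in instances:
        for le, count in buckets.items():
            merged[le] = merged.get(le, 0) + count
    return merged


def estimate_quantile(q: float, buckets: Buckets) -> float:
    """Estimate the q-quantile by interpolating linearly within one bucket."""
    if not 0 < q <= 1:
        raise ValueError("q must be in (0, 1]")
    bounds = sorted(buckets)
    total = buckets[bounds[-1]]  # the +Inf bucket holds the total count
    if total == 0:
        return math.nan
    rank = q * total
    lower_bound, lower_count = 0.0, 0  # assume the first bucket starts at 0
    for le in bounds:
        count = buckets[le]
        if count >= rank:
            if math.isinf(le):
                return lower_bound  # fall back to the highest finite bound
            in_bucket = count - lower_count
            return lower_bound + (le - lower_bound) * (rank - lower_count) / in_bucket
        lower_bound, lower_count = le, count
    return math.nan


host1 = {0.1: 50, 0.5: 90, 1.0: 100, math.inf: 100}
host2 = {0.1: 10, 0.5: 50, 1.0: 90, math.inf: 100}
merged = merge_buckets(host1, host2)
print(estimate_quantile(0.5, merged))
print(estimate_quantile(0.95, merged))
```

The estimate assumes observations are spread evenly inside a bucket, so narrower buckets around the latencies you care about give better answers.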

4. Summary

Summaries also expose a sum and a count, plus a set of quantiles configured in advance, but the quantile calculation happens on the client side. This yields more precise quantiles for a single instance, at the cost of extra memory and CPU in the application, and the resulting quantiles cannot be aggregated across instances.

# HELP http_request_duration_seconds Api requests response time in seconds
# TYPE http_request_duration_seconds summary
http_request_duration_seconds_sum{api="add_product",instance="host1.domain.com"} 8953.332
http_request_duration_seconds_count{api="add_product",instance="host1.domain.com"} 27892
http_request_duration_seconds{api="add_product",instance="host1.domain.com",quantile="0.5"} 0.232227334
... (other quantiles) ...

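Python example (the official Python client currently exposes only the _sum and _count series for a Summary, not quantiles; the metric name is reused here for comparison, so do not register it in the same process as the histogram above):

from prometheus_client import Summary
# Declare the summary once, with 'api' and 'instance' as labels.
api_request_duration = Summary('http_request_duration_seconds', 'Api requests response time in seconds', ['api', 'instance'])
# Record one request that took 0.3672 seconds.
api_request_duration.labels(api='add_product', instance='host1.domain.com').observe(0.3672)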

Pros: more accurate quantiles per instance. Cons: expensive client‑side computation, quantiles must be predefined, and quantiles cannot be aggregated across instances (only the sum and count can).
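A short numeric sketch shows why per-instance quantiles cannot simply be averaged. The latencies below are invented, and the median (the 0.5 quantile) is computed exactly with the standard library rather than by a streaming estimator as a real client would.

```python
import statistics

# Invented request latencies in seconds for two instances.
host1 = [0.1, 0.1, 0.1, 0.2, 0.2, 0.2, 0.3]  # busy, fast instance
host2 = [1.0, 2.0, 3.0]                      # quiet, slow instance

# What a dashboard would show if it averaged the two reported medians.
avg_of_medians = (statistics.median(host1) + statistics.median(host2)) / 2

# The median over all requests, which needs the raw observations.
true_median = statistics.median(host1 + host2)

print("average of per-instance medians:", avg_of_medians)
print("median across both instances:", true_median)
```

The averaged figure is several times larger than the real median because it ignores how many requests each instance served. Histogram buckets do not have this problem, since counts can be added before the quantile is estimated.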

Choosing Between Histogram and Summary

In most scenarios, histograms are preferred because they are flexible and support aggregation. Summaries are useful when accurate quantiles are required for a single instance, for example to verify a strict SLA, and the quantiles of interest are known in advance.
