Avoid These 6 Common Prometheus Mistakes When Getting Started

This guide translates and condenses six frequent errors new Prometheus users make—high‑cardinality labels, losing valuable tags during aggregation, using bare selectors, omitting the for field, choosing too‑short rate windows, and applying rate‑related functions to wrong metric types—offering practical fixes to improve monitoring reliability.

Efficient Ops
This article is translated from https://promlabs.com/blog/2022/12/11/avoid-these-6-mistakes-when-getting-started-with-prometheus. The author’s summary of common Prometheus pitfalls is presented here for review and self‑reflection.

Mistake 1: High‑Cardinality Explosion

Prometheus identifies each time series by its metric name and a set of labels. This is flexible, but it can cause severe performance problems (including out-of-memory crashes) when a label's values are unbounded. Adding a high‑cardinality label such as a unique user ID creates a separate series for each value.

Example of a low‑cardinality metric:

<code>http_requests_total{method="POST"}
http_requests_total{method="GET"}
http_requests_total{method="PUT"}
http_requests_total{method="DELETE"}</code>

Adding a user_id label creates many series:

<code>http_requests_total{method="POST",user_id="1"}
http_requests_total{method="POST",user_id="2"}
... (many more) ...
http_requests_total{method="GET",user_id="16434313"}</code>
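The growth can be estimated with simple arithmetic: the worst-case series count for one metric is the product of the number of distinct values of each label. The sketch below is only an illustration; the helper name `max_series` and the figure of one million users are assumptions, not values from the original article.

```python
from math import prod


def max_series(label_values: dict[str, int]) -> int:
    """Upper bound on series for one metric name.

    label_values maps a label name to its number of distinct values.
    The bound is the product of those counts.
    """
    return prod(label_values.values())


# Hypothetical numbers, chosen only for illustration.
methods_only = max_series({"method": 4})
with_user_id = max_series({"method": 4, "user_id": 1_000_000})

print(methods_only)   # 4
print(with_user_id)   # 4000000
```

In practice not every combination occurs, so the real count is usually lower, but the bound shows how a single unbounded label dominates everything else.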

When the number of distinct users is large, memory usage spikes and can lead to OOM. Avoid high‑cardinality values such as public IPs, email addresses, full HTTP request paths with dynamic IDs, and process IDs unless they form a limited set. Use placeholders (e.g., <code>/api/users/{user_id}/posts/{post_id}</code>) to reduce cardinality.
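One way to apply placeholders is to normalize the path before it is used as a label value. The following is a minimal sketch that assumes dynamic IDs are purely numeric path segments and replaces each with a generic `{id}` placeholder; the function name `normalize_path` is hypothetical. Many web frameworks expose the matched route template directly, which is usually a better source for this label when available.

```python
import re

# Assumption: dynamic IDs are path segments made only of digits.
_NUMERIC_SEGMENT = re.compile(r"/\d+(?=/|$)")


def normalize_path(path: str, placeholder: str = "{id}") -> str:
    """Replace every purely numeric path segment with a placeholder."""
    return _NUMERIC_SEGMENT.sub("/" + placeholder, path)


print(normalize_path("/api/users/42/posts/7"))  # /api/users/{id}/posts/{id}
print(normalize_path("/healthz"))               # /healthz
```

Segments such as `v2` are left alone because they are not purely numeric. UUIDs or slugs would need additional patterns.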

Mistake 2: Losing Valuable Labels During Aggregation

When writing alert rules, aggregations like <code>sum()</code> drop all labels by default, which can remove useful routing information such as the <code>job</code> label. Preserve needed labels with <code>sum by(job)</code>, or use <code>sum without(instance, type)</code> to exclude only unwanted labels.
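The difference between the three forms can be illustrated with a small simulation. This is a simplified model of the grouping behaviour, not Prometheus code: the function `sum_agg` and the sample series are assumptions made for the example, and the metric name label is not modelled.

```python
from collections import defaultdict


def sum_agg(series, by=None, without=None):
    """Sum sample values, grouping by the labels that survive aggregation.

    by:      keep only these labels
    without: drop only these labels
    neither: drop every label (a single output group)
    """
    groups = defaultdict(float)
    for labels, value in series:
        if by is not None:
            kept = {k: v for k, v in labels.items() if k in by}
        elif without is not None:
            kept = {k: v for k, v in labels.items() if k not in without}
        else:
            kept = {}
        groups[tuple(sorted(kept.items()))] += value
    return dict(groups)


# Hypothetical input series: (labels, current value).
series = [
    ({"job": "api", "instance": "a:9090", "type": "5xx"}, 3.0),
    ({"job": "api", "instance": "b:9090", "type": "4xx"}, 2.0),
    ({"job": "db", "instance": "c:9090", "type": "5xx"}, 1.0),
]

print(sum_agg(series))                                # one group, no labels left
print(sum_agg(series, by={"job"}))                    # one group per job
print(sum_agg(series, without={"instance", "type"}))  # same groups, job kept
```

With the plain sum, the result carries no <code>job</code> label, so an alert built on it has nothing to route on. The <code>without</code> form has the advantage that labels added later are preserved automatically.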

Mistake 3: Using Bare Selectors

Writing PromQL queries without restricting the selector (e.g., <code>rate(errors_total[5m]) > 10</code>) may pull in data from unrelated jobs that share the same metric name, causing false alerts and performance issues. Always scope queries with a label matcher such as <code>{job="my-job"}</code>.
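To see why a bare selector is risky, consider a simplified model of series selection. The sketch below supports only equality matchers, and the function `select`, the job names, and the series are all hypothetical.

```python
def select(series, metric, matchers=None):
    """Return the label sets whose metric name and equality matchers all match."""
    matchers = matchers or {}
    return [
        labels
        for labels in series
        if labels["__name__"] == metric
        and all(labels.get(k) == v for k, v in matchers.items())
    ]


# Two unrelated jobs that happen to export the same metric name.
all_series = [
    {"__name__": "errors_total", "job": "my-job", "instance": "a:8080"},
    {"__name__": "errors_total", "job": "other-job", "instance": "b:8080"},
    {"__name__": "requests_total", "job": "my-job", "instance": "a:8080"},
]

bare = select(all_series, "errors_total")
scoped = select(all_series, "errors_total", {"job": "my-job"})

print(len(bare))    # 2
print(len(scoped))  # 1
```

The bare selection includes the series from `other-job`, whose error semantics may be entirely different from the service the alert was written for.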

Mistake 4: Omitting the for Field in Alert Rules

The <code>for</code> field defines how long a condition must persist before an alert fires, helping to filter out transient spikes. Example without <code>for</code>:

<code>alert: InstanceDown
expr: up == 0</code>

Improved rule with <code>for</code>:

<code>alert: InstanceDown
expr: up == 0
for: 5m</code>

Adding <code>for</code> to most alerts makes them more robust, though it may increase detection latency.
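The effect of the hold duration can be sketched as a small state machine. This is a simplified model written for illustration, not Prometheus source code; the function `alert_states`, the 60 s evaluation interval, and the sample results are assumptions.

```python
def alert_states(condition_results, eval_interval_s, for_s):
    """Return the alert state at each rule evaluation.

    condition_results: one boolean per evaluation (True = expr returned a result)
    for_s:             hold duration in seconds (0 models a rule without `for`)
    """
    states = []
    active_since = None
    for i, is_true in enumerate(condition_results):
        now = i * eval_interval_s
        if not is_true:
            active_since = None  # condition cleared, the timer restarts
            states.append("inactive")
            continue
        if active_since is None:
            active_since = now
        states.append("firing" if now - active_since >= for_s else "pending")
    return states


blip = [False, True, False]  # a single failed evaluation

print(alert_states(blip, eval_interval_s=60, for_s=0))
# ['inactive', 'firing', 'inactive']
print(alert_states(blip, eval_interval_s=60, for_s=300))
# ['inactive', 'pending', 'inactive']
```

Without a hold duration the one-off blip fires immediately. With a five-minute hold it only reaches the pending state and never notifies anyone, at the cost of real outages being reported five minutes later.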

Mistake 5: Using Too‑Short Rate Windows

Rate functions need at least two samples within the window. If the window is shorter than the scrape interval, the function may return no data. Choose a window at least four times the scrape interval to handle occasional scrape failures and alignment issues.

For example, with a 15 s scrape interval, a 20 s window will often contain only one sample and return no data, while a window of 4× the interval (60 s, i.e. 1 min) provides reliable results.
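The reasoning can be checked by counting how many samples land inside a window. The sketch below assumes scrapes happen at exact multiples of the scrape interval and that the window is left-open, i.e. it covers (eval_time - window, eval_time]. Both function names are hypothetical.

```python
def samples_in_window(eval_time_s, window_s, scrape_interval_s, failed_scrapes=frozenset()):
    """Count scrape timestamps inside (eval_time_s - window_s, eval_time_s].

    Scrape k happens at k * scrape_interval_s; indexes in failed_scrapes
    produced no sample.
    """
    count = 0
    k = 0
    while k * scrape_interval_s <= eval_time_s:
        t = k * scrape_interval_s
        if t > eval_time_s - window_s and k not in failed_scrapes:
            count += 1
        k += 1
    return count


def recommended_window(scrape_interval_s, factor=4):
    """Rule of thumb from the text: window = 4 x scrape interval."""
    return factor * scrape_interval_s


# 15 s scrape interval, query evaluated at t = 100 s.
print(samples_in_window(100, 20, 15))                         # 1: too few for a rate
print(samples_in_window(100, 60, 15))                         # 4
print(samples_in_window(100, 60, 15, failed_scrapes={5, 6}))  # 2: still enough
print(recommended_window(15))                                 # 60
```

The 4× window still holds the two samples a rate calculation needs even after two consecutive failed scrapes, which is the margin the rule of thumb is meant to provide.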

Mistake 6: Applying Rate‑Related Functions to Wrong Metric Types

<code>rate()</code>, <code>irate()</code>, and <code>increase()</code> are designed for counter metrics, which only increase. Using them on gauges (e.g., memory usage) leads to incorrect results because decreases are interpreted as counter resets.

<code>deriv()</code> works on gauges but should not be used on counters, as it lacks reset compensation and can produce negative values.

To avoid these mistakes, verify metric types before applying functions, and consider tools like PromLens to help detect mismatches.
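The reset handling that makes these functions unsuitable for gauges can be illustrated with a simplified calculation. This sketch ignores the extrapolation that Prometheus applies at window boundaries; the function name `increase_with_resets` and the sample values are assumptions for the example.

```python
def increase_with_resets(samples):
    """Total increase across samples, treating every drop as a counter reset.

    After a drop, the counter is assumed to have restarted from zero, so the
    new value itself counts as the increase for that step.
    """
    total = 0.0
    for prev, cur in zip(samples, samples[1:]):
        total += cur - prev if cur >= prev else cur
    return total


counter = [100, 110, 5, 15]      # process restarted between 110 and 5
gauge = [500, 480, 450, 470]     # memory usage in MB, moving up and down

print(increase_with_resets(counter))  # 25.0
print(increase_with_resets(gauge))    # 950.0
print(gauge[-1] - gauge[0])           # -30
```

For the counter, reset compensation recovers the true increase of 25. For the gauge, every decrease is misread as a restart, which yields 950 when the actual net change was -30.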

Conclusion

The six points above highlight frequent pitfalls for newcomers to Prometheus and provide practical tips to improve monitoring setups.
