Master PromQL: From Basics to Advanced Query Techniques for Monitoring
This comprehensive guide walks you through PromQL fundamentals, data types, query expressions, selectors, operators, aggregation, and essential functions, illustrating each concept with real‑world monitoring scenarios and code examples to help you effectively query and analyze time‑series data in Prometheus.
PromQL From Beginner to Expert
Table of Contents
Data Types
Gauge Type
Counter Type
Time Series Data
Understanding Time Series Data
Query Types
Query Selectors
Operators
Arithmetic Operators
Comparison Operators
Logical/Set Operators
Vector Matching
Aggregation Operations
Functions
absent_over_time
increase
rate
irate
histogram_quantile
_over_time
count_gt_over_time
Conclusion
PromQL is an essential skill for anyone working in the Prometheus ecosystem. This article focuses on the query language itself and mixes in production scenarios to help you master it.
Data Types
Prometheus has four data types (officially called metric types): Gauge, Counter, Histogram, and Summary. Gauge and Counter are the most important; Histogram and Summary are conveniences for client‑side metric collection and can be viewed as combinations of Gauge and Counter series.
Gauge Type
A Gauge represents the current state of something; its value can go up or down and can be positive or negative. Examples include a VM instance status (0 for down, 1 for up), memory usage percentage, recent load, or the number of running processes. Gauges are useful when you care about the current value.
Counter Type
Counters are monotonically increasing values, such as the total packets received on a network interface; they only drop back to zero when the process or device restarts. The focus is on the increment or rate rather than the absolute value. The ifconfig output below shows cumulative packet counts; such values are typically sampled periodically (e.g., every 10 seconds), and the rate is calculated on the server side.
eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
inet 10.206.0.16 netmask 255.255.240.0 broadcast 10.206.15.255
inet6 fe80::5054:ff:fed2:a180 prefixlen 64 scopeid 0x20<link>
ether 52:54:00:d2:a1:80 txqueuelen 1000 (Ethernet)
RX packets 457952401 bytes 125894899868 (117.2 GiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 518040495 bytes 276312546157 (257.3 GiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
Time Series Data
PromQL queries time‑series data. Understanding the nature of time‑series data is a prerequisite for using PromQL effectively.
Understanding Time Series Data
Example: memory availability of five machines displayed as a line chart. Each point on the line is a data point (timestamp + value). Switching to Table view shows the latest values for each machine at a specific moment.
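When no sample exists at the exact evaluation timestamp, Prometheus uses the most recent sample within a lookback window. The window is set by the --query.lookback-delta flag (default 5m); with --query.lookback-delta=2m, as in this example, a sample up to 2 minutes old is still returned.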
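The lookback behaviour can be modelled in a few lines of Python. This is only a sketch: instant_value is a hypothetical helper, the 120-second default mirrors the 2m setting mentioned above, and details such as staleness markers and exact boundary handling are left out.

```python
from bisect import bisect_right

def instant_value(samples, eval_ts, lookback=120.0):
    """Return the newest (ts, value) at or before eval_ts, or None when that
    sample is older than the lookback window. samples must be sorted by ts."""
    timestamps = [ts for ts, _ in samples]
    idx = bisect_right(timestamps, eval_ts) - 1
    if idx < 0:
        return None  # no sample at or before the evaluation time
    ts, value = samples[idx]
    if eval_ts - ts >= lookback:
        return None  # newest sample is too old to be used
    return ts, value

# Hypothetical samples scraped every 10 seconds: (unix_seconds, value)
samples = [(0.0, 71.2), (10.0, 70.9), (20.0, 70.5)]
print(instant_value(samples, 25.0))   # the sample at t=20 is used
print(instant_value(samples, 200.0))  # nothing recent enough
```

This is why a Table view can show a value for "now" even though the last scrape happened a few seconds earlier, and why a series disappears from results once it has been silent for longer than the window.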
Query Types
Expressions like mem_available_percent{app="clickhouse"} are query expressions. An expression evaluates to one of four types: instant vector, range vector, scalar, or string. A plain selector returns an instant vector (one sample per series at the evaluation time); appending a range such as [1m] turns it into a range vector (every sample per series within that window).
Query Selectors
Selectors filter series by label matchers. Supported operators are = (exact match), != (not equal), =~ (regex match), and !~ (regex not match).
{__name__="mem_available_percent", app="clickhouse"}
The metric name is stored in the reserved __name__ label, so a regex matcher on __name__ can select metrics by name pattern.
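To make the four matcher operators concrete, here is a minimal Python sketch of how a selector could be evaluated against a set of series. The matches helper and the sample series are hypothetical; the sketch assumes regexes are fully anchored (which is how PromQL treats them) and uses Python's re module, whose syntax differs slightly from the RE2 syntax Prometheus uses.

```python
import re

def matches(labels, matchers):
    """matchers: list of (label, op, value) with op in =, !=, =~, !~.
    A label that is absent is treated as the empty string."""
    for name, op, value in matchers:
        actual = labels.get(name, "")
        if op == "=":
            ok = actual == value
        elif op == "!=":
            ok = actual != value
        elif op == "=~":
            ok = re.fullmatch(value, actual) is not None  # anchored match
        elif op == "!~":
            ok = re.fullmatch(value, actual) is None
        else:
            raise ValueError(f"unknown matcher operator: {op}")
        if not ok:
            return False
    return True

series = [
    {"__name__": "mem_available_percent", "app": "clickhouse"},
    {"__name__": "mem_available_percent", "app": "canal"},
    {"__name__": "mem_total", "app": "clickhouse"},
]
selector = [("__name__", "=~", "mem_.*"), ("app", "=", "clickhouse")]
selected = [s for s in series if matches(s, selector)]
print(selected)
```

Because of the anchoring, app=~"clickhouse" does not match a value such as clickhouse-01; the pattern would need to be "clickhouse.*".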
Offset
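The offset modifier shifts the evaluation time back to retrieve historical data, e.g., sum(http_requests_total{method="GET"} offset 1d) returns the sum as it was one day ago. Comparing it with the same expression without offset gives a day‑over‑day comparison.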
Operators
PromQL supports arithmetic (+, -, *, /, %, ^) and comparison (==, !=, >, <, >=, <=) operators, enabling server‑side calculations and alert logic.
Arithmetic Operators
+ (addition)
- (subtraction)
* (multiplication)
/ (division)
% (modulo)
^ (exponentiation)
Example: compute memory availability from raw metrics:
mem_available{app="clickhouse"} / mem_total{app="clickhouse"} * 100
If label sets differ (e.g., net_bytes_recv includes an interface label while mem_total does not), the operation yields no result:
net_bytes_recv{app="clickhouse"} / mem_total{app="clickhouse"}
Comparison Operators
Typical use: alert when memory availability drops below a threshold.
mem_available_percent{app="clickhouse"} < 60
This expression can be used directly in alert rules; if it returns results, an alert is triggered.
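The reason this works for alerting is that a comparison against a scalar acts as a filter: series that fail the comparison are dropped, and the ones that pass keep their original value. The Python sketch below models that, along with the bool modifier, which instead keeps every series and returns 0 or 1. The compare helper and host names are hypothetical.

```python
import operator

OPS = {"<": operator.lt, ">": operator.gt, "==": operator.eq,
       "!=": operator.ne, "<=": operator.le, ">=": operator.ge}

def compare(vector, op, threshold, bool_modifier=False):
    """vector: dict mapping a series identifier to its current value."""
    result = {}
    for series, value in vector.items():
        hit = OPS[op](value, threshold)
        if bool_modifier:
            result[series] = 1.0 if hit else 0.0  # keep every series as 0 or 1
        elif hit:
            result[series] = value  # filter mode: survivors keep their value
    return result

# Hypothetical memory availability percentages per host
mem = {"host-a": 72.0, "host-b": 41.5}
print(compare(mem, "<", 60))                      # {'host-b': 41.5}
print(compare(mem, "<", 60, bool_modifier=True))  # {'host-a': 0.0, 'host-b': 1.0}
```

An alert rule fires once per surviving series, so the filter form is the one normally used in rules; the bool form is handy when the 0/1 result feeds into further arithmetic.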
Logical/Set Operators
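and (intersection: keep left‑hand series that have a match on the right)
or (union: all left‑hand series plus right‑hand series without a match on the left)
unless (complement: keep left‑hand series that have no match on the right)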
Example using and to filter disks with high usage only on small disks:
disk_used_percent{app="clickhouse"} > 70
and
disk_total{app="clickhouse"} / 1024 / 1024 / 1024 < 500
Example using or for load alerts:
system_load1{app="clickhouse"} > 8
or
system_load5{app="clickhouse"} > 8
Example using unless to exclude disks smaller than 1 TB, so the absolute free‑space threshold only applies to large disks:
disk_free{app="clickhouse"} / 1024 / 1024 / 1024 < 300
unless
disk_total{app="clickhouse"} / 1024 / 1024 / 1024 < 1024
Vector Matching
Vector matching aligns series from the two sides of a binary operator based on their labels. By default all labels must be identical; the on and ignoring keywords restrict the label set used for matching.
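As a rough illustration of how on and ignoring change the pairing, the following Python sketch implements one‑to‑one matching for a division. The match_key and divide helpers are hypothetical; the metric names and numbers follow the example in the Prometheus documentation. The sketch assumes non‑zero denominators and at most one right‑hand series per key (real PromQL reports an error on duplicate matches instead of silently picking one).

```python
def match_key(labels, on=None, ignoring=None):
    """Build the label signature used to pair series from both sides."""
    items = {k: v for k, v in labels.items() if k != "__name__"}
    if on is not None:
        items = {k: v for k, v in items.items() if k in on}
    elif ignoring is not None:
        items = {k: v for k, v in items.items() if k not in ignoring}
    return tuple(sorted(items.items()))

def divide(left, right, on=None, ignoring=None):
    """left/right: lists of (labels, value). One-to-one matching only."""
    index = {match_key(lbls, on, ignoring): val for lbls, val in right}
    out = []
    for lbls, val in left:
        key = match_key(lbls, on, ignoring)
        if key in index:  # series without a partner are dropped
            out.append((dict(key), val / index[key]))
    return out

errors = [
    ({"__name__": "method_code:http_errors:rate5m", "method": "get", "code": "500"}, 24.0),
    ({"__name__": "method_code:http_errors:rate5m", "method": "post", "code": "500"}, 6.0),
]
requests = [
    ({"__name__": "method:http_requests:rate5m", "method": "get"}, 600.0),
    ({"__name__": "method:http_requests:rate5m", "method": "post"}, 120.0),
]
ratios = divide(errors, requests, ignoring={"code"})
print(ratios)
print(divide(errors, requests))  # [] because the code label has no counterpart
```

Without a modifier the left side carries a code label that the right side lacks, so nothing pairs up, which is the same effect as dividing net_bytes_recv by mem_total earlier.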
mysql_slave_status_slave_sql_running == 0
and ON (instance)
mysql_slave_status_master_server_id > 0
Example with ignoring from the Prometheus documentation:
method_code:http_errors:rate5m{code="500"}
/ ignoring(code)
method:http_requests:rate5m
group_left and group_right
These modifiers handle one‑to‑many or many‑to‑one matches. Example using group_left to attach label_version from kube_pod_labels to request rate vectors:
sum(rate(http_request_count{code=~"^(?:5..)$"}[5m])) by (pod)
*
on (pod) group_left(label_version) kube_pod_labels
Aggregation Operations
PromQL provides aggregation operators such as sum, min, max, avg, count, bottomk, topk, and quantile to compute statistics across series. An optional by (or without) clause controls which labels are used for grouping.
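A small Python sketch may help show what grouping does. The aggregate, bottomk, and avg helpers and the sample values are hypothetical; the point is that a by clause partitions series by the listed labels before reducing, while bottomk keeps whole series (labels included) rather than collapsing them.

```python
from collections import defaultdict

def aggregate(vector, func, by=()):
    """vector: list of (labels, value); func reduces a list of floats.
    Series are grouped by the labels named in `by` (none means one group)."""
    groups = defaultdict(list)
    for labels, value in vector:
        key = tuple((name, labels.get(name, "")) for name in by)
        groups[key].append(value)
    return {key: func(values) for key, values in groups.items()}

def bottomk(k, vector):
    """Return the k series with the smallest values, labels preserved."""
    return sorted(vector, key=lambda item: item[1])[:k]

def avg(values):
    return sum(values) / len(values)

# Hypothetical memory availability percentages
vector = [
    ({"app": "clickhouse", "ident": "a"}, 40.0),
    ({"app": "clickhouse", "ident": "b"}, 60.0),
    ({"app": "canal", "ident": "c"}, 80.0),
]
print(aggregate(vector, avg))               # one overall average
print(aggregate(vector, avg, by=("app",)))  # one average per app
print(bottomk(2, vector))                   # the two lowest series
```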
avg(mem_available_percent{app="clickhouse"})
bottomk(2, mem_available_percent{app="clickhouse"})
avg(mem_available_percent{app=~"clickhouse|canal"}) by (app)
Functions
Prometheus offers many functions; this article highlights a few.
absent_over_time
Returns a single element with value 1 when the range vector contains no samples, and nothing otherwise, which makes it useful for no‑data alerts.
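In Python terms, the behaviour for a single series could be sketched as follows. The helper is a hypothetical model in which None stands in for PromQL's empty result; the window is taken as the interval ending at the evaluation time.

```python
def absent_over_time(samples, eval_ts, window):
    """Return 1 if no sample falls inside (eval_ts - window, eval_ts],
    otherwise None (standing in for an empty query result)."""
    for ts, _ in samples:
        if eval_ts - window < ts <= eval_ts:
            return None  # data exists, so nothing is reported
    return 1

# Hypothetical load samples; the host stopped reporting after t=60
samples = [(0.0, 1.2), (60.0, 1.1)]
print(absent_over_time(samples, eval_ts=300.0, window=300.0))  # None
print(absent_over_time(samples, eval_ts=400.0, window=300.0))  # 1
```

Since an alert fires whenever its expression returns something, the value 1 only appears, and the alert only triggers, after the series has been silent for the whole window.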
absent_over_time(system_load_norm_1{ident="tt-fc-dev02.nj"}[5m])
increase
Calculates the increase over a range, applying extrapolation when necessary.
increase(net_bytes_recv{interface="eth0"}[1m])
Note: increase extrapolates from the first and last points in the window and scales the result to the requested interval, so it can return a non‑integer value even when the counter only takes integer values.
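The extrapolation can be approximated with the simplified Python model below. It is a sketch under stated assumptions, not the Prometheus implementation: it treats any drop in value as a counter reset, extends the measured interval to a window edge only when that edge is within about 1.1 sample intervals, and otherwise adds half a sample interval. The real code has extra guards (for example, against extrapolating a counter below zero) and details that vary between versions.

```python
def increase(samples, range_start, range_end):
    """samples: sorted (ts, value) pairs of one counter inside the window."""
    if len(samples) < 2:
        return None  # at least two points are needed
    # Sum the deltas, treating any drop as a counter reset (restart from 0).
    delta = 0.0
    prev = samples[0][1]
    for _, value in samples[1:]:
        delta += value if value < prev else value - prev
        prev = value
    first_ts, last_ts = samples[0][0], samples[-1][0]
    sampled = last_ts - first_ts
    avg_gap = sampled / (len(samples) - 1)
    threshold = avg_gap * 1.1
    extrapolated = sampled
    for gap in (first_ts - range_start, range_end - last_ts):
        # Extend to the window edge if it is close, else by half an interval.
        extrapolated += gap if gap < threshold else avg_gap / 2
    return delta * extrapolated / sampled

# Hypothetical counter scraped every 10 s, queried over a 60 s window
samples = [(5, 100), (15, 110), (25, 120), (35, 130), (45, 140), (55, 150)]
print(increase(samples, range_start=0, range_end=60))
```

Here the raw difference between the last and first sample is 50 over 50 seconds of data, and the model scales it up to cover the full 60‑second window.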
rate
Computes per‑second rate, essentially increase / interval.
rate(net_bytes_recv{interface="eth0"}[1m]) == bool increase(net_bytes_recv{interface="eth0"}[1m]) / 60.0
irate
Uses the two most recent points for a more sensitive rate.
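The difference in sensitivity is easy to see in a small Python sketch. The irate helper below is a hypothetical model that uses only the last two samples in the window and treats a drop as a counter reset; the sample data is made up to include a burst at the end.

```python
def irate(samples):
    """Per-second rate from the last two samples of a counter window."""
    if len(samples) < 2:
        return None
    (t0, v0), (t1, v1) = samples[-2], samples[-1]
    delta = v1 if v1 < v0 else v1 - v0  # a drop is treated as a counter reset
    return delta / (t1 - t0)

# Hypothetical counter: quiet for a while, then a burst in the last 10 s
samples = [(0, 0), (10, 100), (20, 200), (30, 200), (40, 1200)]
window_average = (samples[-1][1] - samples[0][1]) / (samples[-1][0] - samples[0][0])
print(irate(samples))   # 100.0 per second, dominated by the burst
print(window_average)   # 30.0 per second across the whole window
```

That responsiveness suits fast‑moving dashboards, while the smoother window average from rate is usually the safer choice for alert rules.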
histogram_quantile
Estimates quantiles from histogram buckets. Example calculating the 90th percentile latency:
histogram_quantile(0.9, rate(http_request_duration_seconds_bucket[10m]))
For per‑job quantiles:
histogram_quantile(0.9, sum by (job, le) (rate(http_request_duration_seconds_bucket[10m])))
_over_time Functions
Functions ending with _over_time operate on range vectors, e.g., avg_over_time computes the average over the specified window.
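The common pattern behind this family can be sketched in Python as "select the samples in the window, then reduce them". The helper names and sample values are hypothetical, and None stands in for an empty result when the window holds no samples.

```python
def window_values(samples, eval_ts, window):
    """Values of all samples with a timestamp in (eval_ts - window, eval_ts]."""
    return [v for ts, v in samples if eval_ts - window < ts <= eval_ts]

def avg_over_time(samples, eval_ts, window):
    values = window_values(samples, eval_ts, window)
    return sum(values) / len(values) if values else None

def max_over_time(samples, eval_ts, window):
    values = window_values(samples, eval_ts, window)
    return max(values) if values else None

# Hypothetical memory availability samples every 15 s
samples = [(0, 70.0), (15, 68.0), (30, 66.0), (45, 64.0), (60, 62.0)]
print(avg_over_time(samples, eval_ts=60, window=60))  # 65.0
print(max_over_time(samples, eval_ts=60, window=60))  # 68.0
```

Unlike the aggregation operators, which combine many series at one instant, these functions combine many samples of one series over time.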
avg_over_time(mem_available_percent{ident="10.3.4.5"}[1m])
count_gt_over_time
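Counts how many samples in a range exceed a threshold, which is useful for alerting only after a condition has occurred several times. Note that this function comes from MetricsQL (VictoriaMetrics); it is not part of upstream Prometheus PromQL.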
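A minimal Python model of the counting logic, assuming the window is the interval ending at the evaluation time. The helper mirrors the function name for readability but is a hypothetical sketch, and the sample values are invented.

```python
def count_gt_over_time(samples, eval_ts, window, threshold):
    """Number of samples in (eval_ts - window, eval_ts] whose value exceeds threshold."""
    return sum(
        1 for ts, value in samples
        if eval_ts - window < ts <= eval_ts and value > threshold
    )

# Hypothetical samples taken once a minute
samples = [(0, 3), (60, 12), (120, 15), (180, 9), (240, 11), (300, 14)]
hits = count_gt_over_time(samples, eval_ts=300, window=300, threshold=10)
print(hits)       # 4 samples above the threshold in the last 5 minutes
print(hits >= 3)  # True, so the alert condition would hold
```

Requiring several hits within the window filters out single spikes that a plain threshold comparison would alert on.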
count_gt_over_time(interface_status[5m], 10) >= 3
Conclusion
This article covered the core PromQL concepts, illustrated with production examples. For deeper exploration, refer to the official Prometheus documentation.
MaGe Linux Operations