Why Prometheus Uses TSDB: Mastering Scalable Monitoring and Queries
This article explains how Prometheus, a data‑driven monitoring system, leverages a time‑series database (TSDB) to handle massive metric volumes, perform efficient queries, and enable powerful calculations such as recording rules for pre‑computed results.
Background
For many people, the unknown and the uncontrollable trigger a subconscious urge to avoid them. The author felt the same on first encountering Prometheus, which can seem daunting because of its many concepts and steep learning curve.
Concepts: Instance, Job, Metric, Metric Name, Metric Label, Metric Value, Metric Type (Counter, Gauge, Histogram, Summary), DataType (Instant Vector, Range Vector, Scalar, String), Operator, Function
As Jack Ma put it, "Although Alibaba is the world’s largest retail platform, it is a data company, not a retail company." In the same way, Prometheus is fundamentally a data‑based monitoring system.
Daily Monitoring
Assume we need to monitor the request volume of each API on WebServerA, with dimensions such as service name (job), instance IP (instance), API name (handler), method, response code (code), and request count (value).
Example SQL‑like queries:
Query request counts where method="put" and code="200" (red box):
SELECT * FROM http_requests_total WHERE code="200" AND method="put" AND created_at BETWEEN 1495435700 AND 1495435710;
Query request counts where handler="prometheus" and method="post" (green box):
SELECT * FROM http_requests_total WHERE handler="prometheus" AND method="post" AND created_at BETWEEN 1495435700 AND 1495435710;
Query request counts where instance="10.59.8.110" and handler starts with "query" (green box):
SELECT * FROM http_requests_total WHERE handler LIKE "query%" AND instance="10.59.8.110" AND created_at BETWEEN 1495435700 AND 1495435710;
From these examples, daily monitoring consists of dimension‑based filters combined with time ranges. Monitoring 100 services, each with 10 instances, 20 APIs, and 4 methods, collecting data every 30 seconds and retaining it for 60 days yields about 13.8 billion data points. That volume is impractical for a general‑purpose relational database such as MySQL, which is why Prometheus uses a TSDB storage engine.
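The 13.8 billion figure can be checked with a few lines of Python. This sketch simply multiplies out the numbers given in the text; like the text, it leaves out the response‑code dimension, so a real deployment with several codes per API would have proportionally more series.

```python
# Figures taken from the text above.
services, instances, apis, methods = 100, 10, 20, 4
scrape_interval_s = 30
retention_days = 60

# Each distinct label combination is one time series.
series = services * instances * apis * methods
# Samples each series accumulates over the retention period.
samples_per_series = retention_days * 24 * 3600 // scrape_interval_s
total_points = series * samples_per_series

print(f"series:             {series:,}")              # 80,000
print(f"samples per series: {samples_per_series:,}")  # 172,800
print(f"total data points:  {total_points:,}")        # 13,824,000,000
```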
Storage Engine
A TSDB is well suited to monitoring data, whose workload has the following characteristics:
Enormous data volume
Predominantly write operations
Writes are mostly sequential, ordered by time
Rarely writes old data or updates existing data
Deletes are block‑based, removing whole time ranges
Data size typically exceeds memory; caching has little effect
Reads scan data in time order (ascending or descending)
High‑concurrency reads are common
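The workload traits listed above (in‑order appends, whole‑range deletes, ordered scans) map naturally onto time‑partitioned blocks. The following is a toy Python sketch under that assumption; the BlockStore class, its one‑hour block width, and the choice to reject out‑of‑order samples are illustrative, not Prometheus's actual storage code.

```python
class BlockStore:
    """Toy append-only store split into fixed-width time blocks (hypothetical)."""

    def __init__(self, block_span_s: int):
        self.block_span_s = block_span_s
        # Maps block start time -> list of (timestamp, value). Because samples
        # arrive in time order, dict insertion order is also ascending time order.
        self.blocks = {}
        self.last_ts = None

    def append(self, ts: int, value: float) -> None:
        # Writes are sequential; this sketch simply rejects out-of-order samples.
        if self.last_ts is not None and ts < self.last_ts:
            raise ValueError(f"out-of-order sample at {ts}")
        start = ts - ts % self.block_span_s
        self.blocks.setdefault(start, []).append((ts, value))
        self.last_ts = ts

    def delete_before(self, cutoff: int) -> int:
        # Retention removes whole blocks that end at or before the cutoff,
        # never individual samples.
        expired = [s for s in self.blocks if s + self.block_span_s <= cutoff]
        for s in expired:
            del self.blocks[s]
        return len(expired)

    def scan(self, start: int, end: int, descending: bool = False) -> list:
        # Ordered range read: skip blocks outside [start, end], then filter samples.
        out = []
        for block_start, samples in self.blocks.items():
            if block_start + self.block_span_s <= start or block_start > end:
                continue
            out.extend((t, v) for t, v in samples if start <= t <= end)
        return out[::-1] if descending else out


store = BlockStore(block_span_s=3600)   # one-hour blocks (assumed)
for ts in range(0, 4 * 3600, 600):      # a sample every 10 minutes for 4 hours
    store.append(ts, float(ts))
dropped = store.delete_before(7200)     # drops the blocks starting at 0 and 3600
print(dropped, list(store.blocks))      # 2 [7200, 10800]
print(store.scan(7000, 8000))           # [(7200, 7200.0), (7800, 7800.0)]
```

Dropping a whole block is a single dictionary delete, which is why block‑based deletion is cheap compared with removing samples one by one.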
How does a TSDB achieve this?
{"labels":[{"latency":"500"}],"samples":[{"timestamp":1473305798,"value":0.9}]}
Raw data consists of two parts: labels (the monitoring dimensions) and samples (a timestamp and a value). The labels uniquely identify a time series (series_id); the samples hold the actual metric values.
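To make the labels/samples split concrete, the following Python sketch parses the raw record above and derives a stable series identifier from its labels. Sorting the label pairs means the same label set always yields the same key; the truncated SHA‑1 used as series_id is an assumption for illustration, not the hashing scheme of any particular TSDB.

```python
import hashlib
import json

raw = '{"labels":[{"latency":"500"}],"samples":[{"timestamp":1473305798,"value":0.9}]}'

def series_key(labels: list) -> str:
    # Merge the single-pair dicts and sort by label name, so the same label
    # set always produces the same canonical key regardless of input order.
    merged = {}
    for pair in labels:
        merged.update(pair)
    return ",".join(f"{name}={value}" for name, value in sorted(merged.items()))

def series_id(labels: list) -> str:
    # Hypothetical series_id: first 16 hex digits of a SHA-1 over the canonical key.
    return hashlib.sha1(series_key(labels).encode("utf-8")).hexdigest()[:16]

record = json.loads(raw)
key = series_key(record["labels"])
sid = series_id(record["labels"])
samples = [(s["timestamp"], s["value"]) for s in record["samples"]]

print(key)      # latency=500
print(samples)  # [(1473305798, 0.9)]
print(sid)
```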
TSDB stores series using timeseries:doc:: as the key and builds three indexes to accelerate queries: Series, Label Index, and Time Index.
Example with label latency:
Series
Stores each series' label key‑value pairs in lexical order, together with an index of time windows that point to data blocks, so that queries can quickly skip records outside the requested time range.
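A minimal Python sketch of such a series entry, assuming fixed‑width time windows: the labels are kept as lexically sorted pairs, and two binary searches over the window start times return only the data blocks that overlap a query range. The SeriesEntry class, the block names, and the one‑hour window width are hypothetical.

```python
import bisect

class SeriesEntry:
    """Hypothetical series record: sorted labels plus a time-window -> block index."""

    def __init__(self, labels: dict, window_span: int):
        # Keep label pairs in lexical order so equal label sets compare equal.
        self.labels = tuple(sorted(labels.items()))
        self.window_span = window_span
        self.window_starts = []  # ascending start time of each window
        self.window_blocks = []  # data-block reference for each window

    def add_window(self, start: int, block_ref: str) -> None:
        # Windows are appended in time order, matching the append-mostly workload.
        if self.window_starts and start <= self.window_starts[-1]:
            raise ValueError("windows must be added in ascending order")
        self.window_starts.append(start)
        self.window_blocks.append(block_ref)

    def blocks_for(self, t0: int, t1: int) -> list:
        # Window [s, s + span) overlaps [t0, t1] iff s > t0 - span and s <= t1.
        # Two binary searches find that slice; all other windows are skipped.
        lo = bisect.bisect_right(self.window_starts, t0 - self.window_span)
        hi = bisect.bisect_right(self.window_starts, t1)
        return self.window_blocks[lo:hi]


entry = SeriesEntry({"method": "put", "code": "200", "job": "WebServerA"}, window_span=3600)
for i, start in enumerate(range(0, 5 * 3600, 3600)):
    entry.add_window(start, f"block-{i}")

print(entry.labels)                  # (('code', '200'), ('job', 'WebServerA'), ('method', 'put'))
print(entry.blocks_for(4000, 8000))  # ['block-1', 'block-2']
```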
Label Index
Each label name is stored under an index:label: key whose entry lists all of that label's values, along with references to the starting position of each corresponding series.
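The label index is essentially an inverted index. Below is a minimal Python sketch assuming integer series ids assigned in increasing order, so each postings list stays sorted and two lists can be intersected with a linear merge. It answers the second and third example queries from the Daily Monitoring section; the class, method names, and sample series are hypothetical.

```python
def intersect(a: list, b: list) -> list:
    # Linear merge of two sorted postings lists.
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out

class LabelIndex:
    """Hypothetical inverted index: label name -> label value -> sorted series ids."""

    def __init__(self):
        self.postings = {}

    def add(self, series_id: int, labels: dict) -> None:
        # Ids arrive in increasing order, so appending keeps every list sorted.
        for name, value in labels.items():
            self.postings.setdefault(name, {}).setdefault(value, []).append(series_id)

    def values(self, name: str) -> list:
        # All known values of one label name.
        return sorted(self.postings.get(name, {}))

    def equal(self, name: str, value: str) -> list:
        # Postings for an exact name="value" matcher.
        return self.postings.get(name, {}).get(value, [])

    def prefix(self, name: str, prefix: str) -> list:
        # Union of postings for every value of `name` starting with `prefix`.
        ids = set()
        for value, plist in self.postings.get(name, {}).items():
            if value.startswith(prefix):
                ids.update(plist)
        return sorted(ids)


idx = LabelIndex()
idx.add(1, {"handler": "prometheus", "method": "post", "instance": "10.59.8.110"})
idx.add(2, {"handler": "query", "method": "get", "instance": "10.59.8.110"})
idx.add(3, {"handler": "query_range", "method": "get", "instance": "10.59.8.111"})
idx.add(4, {"handler": "prometheus", "method": "get", "instance": "10.59.8.111"})

print(idx.values("handler"))  # ['prometheus', 'query', 'query_range']
# handler="prometheus" AND method="post"
print(intersect(idx.equal("handler", "prometheus"), idx.equal("method", "post")))  # [1]
# instance="10.59.8.110" AND handler starts with "query"
print(intersect(idx.equal("instance", "10.59.8.110"), idx.prefix("handler", "query")))  # [2]
```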
Time Index
Data is stored with index:timeseries:: keys pointing to files for specific time intervals.
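Assuming each data file covers a fixed, aligned interval, finding the files for a query reduces to arithmetic on the timestamps. The Python sketch below uses two‑hour intervals, mirroring the two‑hour blocks described in Prometheus's storage documentation; the file‑naming scheme and helper names are hypothetical. It resolves the time range used in the SQL‑like queries earlier.

```python
INTERVAL_S = 2 * 3600  # assumed width of the interval each data file covers

def interval_start(ts: int) -> int:
    # Align a Unix timestamp down to the start of its interval.
    return ts - ts % INTERVAL_S

def files_for_range(t0: int, t1: int) -> list:
    # Every aligned interval that touches [t0, t1] maps to one (hypothetical) file.
    return [f"data-{start}.blk" for start in range(interval_start(t0), t1 + 1, INTERVAL_S)]

print(files_for_range(1495435700, 1495435710))  # ['data-1495432800.blk']
print(files_for_range(0, 7200))                 # ['data-0.blk', 'data-7200.blk']
```

A ten‑second query window touches exactly one file here, so all other files can be ignored without being opened.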
Data Computation
The robust storage engine enables powerful data computation, distinguishing Prometheus from other monitoring services. Users can query different metric series, apply basic operators and advanced functions, and perform matrix operations on metric series.
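As a rough illustration of applying functions and aggregation operators to series, the Python sketch below computes a simplified per‑second rate for each counter series and then sums the results by handler, similar in spirit to sum by (handler) (rate(http_requests_total[1m])) in PromQL. It is a simplification: a drop in a counter's value is treated as a reset to zero, but PromQL's extrapolation to the window edges is omitted, and the sample data is invented.

```python
def simple_rate(samples: list):
    """Per-second increase of a counter across its (timestamp, value) samples.

    A drop in value is treated as a counter reset to zero. Unlike PromQL's
    rate(), no extrapolation to the range boundaries is performed.
    """
    if len(samples) < 2:
        return None
    increase = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        increase += (cur - prev) if cur >= prev else cur
    return increase / (samples[-1][0] - samples[0][0])

def sum_by(series: list, label: str) -> dict:
    # Aggregate per-series values that share the same value of `label`.
    totals = {}
    for labels, value in series:
        if value is not None:
            totals[labels[label]] = totals.get(labels[label], 0.0) + value
    return totals


raw = [
    ({"handler": "query", "instance": "10.59.8.110"}, [(0, 100), (30, 130), (60, 160)]),
    ({"handler": "query", "instance": "10.59.8.111"}, [(0, 50), (30, 80), (60, 20)]),  # reset
    ({"handler": "prometheus", "instance": "10.59.8.110"}, [(0, 10), (30, 16), (60, 22)]),
]
rates = [(labels, simple_rate(samples)) for labels, samples in raw]
by_handler = sum_by(rates, "handler")
print(by_handler)
```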
This capability makes Prometheus comparable to a “data warehouse + compute platform” for monitoring, indicating the future direction of monitoring systems.
One Computation, Many Queries
Such powerful computation consumes significant resources, so pre‑computing results is often faster than evaluating raw expressions each time. Prometheus provides Recording Rules to pre‑compute frequently used or expensive expressions and store them as new time series, achieving one computation, many queries.
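The payoff of recording rules can be shown with a toy evaluator in Python: the expensive expression runs once per evaluation tick and its result is stored as a new series, while any number of reads just fetch the stored points. In real Prometheus, rules are declared in a rules file with record and expr fields and evaluated at a configured interval; the classes below are hypothetical stand‑ins, and the record name only follows Prometheus's level:metric:operations naming convention.

```python
class RecordingRule:
    """Hypothetical stand-in for one recording rule."""

    def __init__(self, record: str, expr):
        self.record = record  # name of the new, pre-computed series
        self.expr = expr      # callable performing the expensive computation

class RuleEngine:
    def __init__(self, rules: list):
        self.rules = rules
        self.store = {}  # record name -> list of (timestamp, value)

    def evaluate(self, ts: int) -> None:
        # Compute each rule once per tick and persist the result as a time series.
        for rule in self.rules:
            self.store.setdefault(rule.record, []).append((ts, rule.expr(ts)))

    def query(self, record: str) -> list:
        # Readers (dashboards, alerts) fetch stored points; expr is not re-run.
        return self.store.get(record, [])


calls = {"expr": 0}

def expensive_expr(ts: int) -> float:
    calls["expr"] += 1
    return sum(i * i for i in range(100_000)) / 1e9  # placeholder for a costly aggregation

engine = RuleEngine([RecordingRule("job:http_requests:rate5m", expensive_expr)])
for ts in (0, 60, 120):  # three evaluation ticks, assuming a 60 s interval
    engine.evaluate(ts)
for _ in range(1_000):   # many reads of the pre-computed series
    engine.query("job:http_requests:rate5m")

print(calls["expr"])                                   # 3
print(len(engine.query("job:http_requests:rate5m")))  # 3
```

However many reads arrive, the expression is evaluated only once per tick: one computation, many queries.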
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
MaGe Linux Operations
Founded in 2009, MaGe Education is a top Chinese high‑end IT training brand. Its graduates earn 12K+ RMB salaries, and the school has trained tens of thousands of students. It offers high‑pay courses in Linux cloud operations, Python full‑stack, automation, data analysis, AI, and Go high‑concurrency architecture. Thanks to quality courses and a solid reputation, it has talent partnerships with numerous internet firms.
