Why Prometheus’s TSDB Makes Massive Monitoring Data Manageable
The article explains how Prometheus, a data‑driven monitoring system, handles massive time‑series data using its TSDB storage engine, detailing concepts, query examples, storage characteristics, indexing mechanisms, and the benefits of pre‑computing rules for efficient monitoring at scale.
Background
For many people, the unknown and uncontrollable provoke subconscious avoidance. When I first encountered Prometheus I felt the same; its many concepts and high entry barrier can be daunting for beginners.
Key concepts: Instance, Job, Metric, Metric Name, Metric Label, Metric Value, Metric Type (Counter, Gauge, Histogram, Summary), DataType (Instant Vector, Range Vector, Scalar, String), Operator, Function.
As Jack Ma has said, although Alibaba is the world’s largest retail platform, it is fundamentally a data company. In the same way, Prometheus is essentially a data‑driven monitoring system.
Daily Monitoring
Assume we need to monitor the request count of each API on WebServerA. The dimensions are service name (job), instance IP (instance), API name (handler), method, and response code (code); the measured value is the request count (value).
Example SQL‑like queries:
SELECT * FROM http_requests_total WHERE code="200" AND method="put" AND created_at BETWEEN 1495435700 AND 1495435710;
SELECT * FROM http_requests_total WHERE handler="prometheus" AND method="post" AND created_at BETWEEN 1495435700 AND 1495435710;
SELECT * FROM http_requests_total WHERE handler="query" AND instance="10.59.8.110" AND created_at BETWEEN 1495435700 AND 1495435710;
These examples show that routine monitoring involves dimension‑based queries combined with time ranges. Monitoring 100 services, each with 10 instances, 20 APIs, and 4 methods, collecting data every 30 seconds and retaining it for 60 days, would generate about 13.8 billion data points, which is infeasible for relational databases like MySQL. Therefore Prometheus uses a TSDB storage engine.
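The 13.8 billion figure can be checked with a quick back‑of‑the‑envelope calculation. The sketch below uses only the numbers quoted above and assumes, as the quoted total implies, that the response code dimension is not counted; the variable names are illustrative.

```python
# Rough estimate of stored data points, using the figures from the text.
services = 100
instances_per_service = 10
apis_per_instance = 20
methods_per_api = 4
scrape_interval_s = 30
retention_days = 60

# Each unique combination of dimensions is one time series.
series = services * instances_per_service * apis_per_instance * methods_per_api

# Number of samples a single series accumulates over the retention window.
samples_per_series = retention_days * 24 * 60 * 60 // scrape_interval_s

total_samples = series * samples_per_series
print(f"{series:,} series x {samples_per_series:,} samples = {total_samples:,}")
```

That is 80,000 series with 172,800 samples each, or roughly 13.8 billion rows if every sample were stored as a row in a relational table.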
Storage Engine
A TSDB fits the monitoring use case well, because monitoring data has the following characteristics:
Massive data volume.
Predominantly write operations.
Writes are mostly sequential, ordered by time.
Writes rarely modify old data; data is written shortly after collection.
Deletes are block‑based, removing whole time ranges.
Data size exceeds memory; caching has little effect.
Reads are typically ordered (ascending or descending).
High‑concurrency reads are common.
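Two of these characteristics, time‑ordered appends and block‑based deletes, can be illustrated with a toy store. This is a minimal sketch, not Prometheus's implementation: the BlockStore class, the BLOCK_SPAN constant, and the two‑hour block width are assumptions made for the example.

```python
BLOCK_SPAN = 7200  # hypothetical block width in seconds (2 hours)


class BlockStore:
    """Toy append-only store that groups samples into fixed time blocks."""

    def __init__(self):
        self.blocks = {}  # block start time -> list of (timestamp, value)

    def append(self, ts, value):
        # Samples arrive roughly in time order, so new data lands in the
        # newest block and older blocks are never rewritten.
        start = ts - ts % BLOCK_SPAN
        self.blocks.setdefault(start, []).append((ts, value))

    def drop_older_than(self, cutoff):
        """Delete every block that ends at or before cutoff; return the count."""
        expired = [s for s in self.blocks if s + BLOCK_SPAN <= cutoff]
        for s in expired:
            del self.blocks[s]
        return len(expired)


store = BlockStore()
for ts in range(0, 4 * BLOCK_SPAN, 30):  # one sample every 30 seconds
    store.append(ts, 1.0)

removed = store.drop_older_than(2 * BLOCK_SPAN)
print(removed, sorted(store.blocks))
```

Retention then costs one deletion per expired block rather than one per sample, which is the point of deleting whole time ranges.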
How does TSDB achieve this?
{
  "labels": [{
    "latency": "500"
  }],
  "samples": [{
    "timestamp": 1473305798,
    "value": 0.9
  }]
}
Raw data consists of two parts: labels (monitoring dimensions) and samples (timestamp and value). Labels uniquely identify a time series (series_id), while samples hold the actual measurements.
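To make the split between labels and samples concrete, here is a minimal sketch in which a series id is derived from the sorted label set and samples are appended under that id. Hashing the labels with SHA‑1 is an assumption chosen for illustration, not a description of how Prometheus computes its ids; the series_id function is hypothetical.

```python
import hashlib


def series_id(labels):
    """Derive a stable id from a label set, independent of key order."""
    canonical = ",".join(f'{k}="{v}"' for k, v in sorted(labels.items()))
    return hashlib.sha1(canonical.encode("utf-8")).hexdigest()[:16]


# The same labels in a different order identify the same series.
a = series_id({"__name__": "http_requests_total", "method": "put", "code": "200"})
b = series_id({"code": "200", "method": "put", "__name__": "http_requests_total"})
# Changing any label value yields a different series.
c = series_id({"__name__": "http_requests_total", "method": "post", "code": "200"})

# Samples are stored per series as (timestamp, value) pairs.
samples = {}
samples.setdefault(a, []).append((1473305798, 0.9))
print(a == b, a == c, samples[a])
```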
TSDB stores values using timeseries:doc:: as the key and builds three indexes to accelerate queries: Series, Label Index, and Time Index.
Series
Series stores all label key‑value pairs in lexical order and indexes time windows to quickly skip irrelevant blocks during queries.
Label Index
Each label creates an index:label: key that stores a list of all its values and references the start of the corresponding series.
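The idea behind a label index can be sketched as an inverted index: each label/value pair maps to the set of series that carry it, and a query intersects those sets. The LabelIndex class and its methods below are hypothetical names for illustration and simplify the real on‑disk format considerably.

```python
from collections import defaultdict


class LabelIndex:
    """Toy inverted index from (label, value) pairs to series ids."""

    def __init__(self):
        self.postings = defaultdict(set)

    def add(self, sid, labels):
        for pair in labels.items():
            self.postings[pair].add(sid)

    def select(self, **matchers):
        """Return ids of series that match every label=value pair."""
        sets = [self.postings.get(pair, set()) for pair in matchers.items()]
        return set.intersection(*sets) if sets else set()


index = LabelIndex()
index.add(1, {"handler": "query", "method": "get", "instance": "10.59.8.110"})
index.add(2, {"handler": "query", "method": "post", "instance": "10.59.8.111"})
index.add(3, {"handler": "prometheus", "method": "post", "instance": "10.59.8.110"})

matched = index.select(handler="query", instance="10.59.8.110")
print(matched)
```

A query with several label conditions therefore never scans all series; it only touches the postings sets for the pairs it names.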
Time Index
Data is keyed by index:timeseries::, pointing to files for specific time intervals.
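A time index lets a query open only the files whose time window overlaps the requested range. The sketch below assumes fixed two‑hour windows and made‑up file names; BLOCKS and blocks_for_range are hypothetical.

```python
# (min_time, max_time, file), sorted by min_time. File names are made up.
BLOCKS = [
    (0, 7200, "block-000"),
    (7200, 14400, "block-001"),
    (14400, 21600, "block-002"),
]


def blocks_for_range(start, end):
    """Return files whose window [min_time, max_time) overlaps [start, end]."""
    return [name for lo, hi, name in BLOCKS if lo <= end and start < hi]


# A query that straddles a block boundary touches two files.
print(blocks_for_range(7000, 7300))
# A query inside one window touches only that file.
print(blocks_for_range(15000, 15010))
```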
Data Computation
The powerful storage engine enables advanced calculations. Prometheus can select different metric series, apply basic operators and rich functions, and perform matrix operations on metric series.
This capability makes Prometheus comparable to a combined data warehouse and compute platform, illustrating the future direction of monitoring.
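As an illustration of operating on metric series, the sketch below divides one set of labeled samples by another, pairing samples whose label sets match after ignoring a chosen label. It loosely mirrors the idea of PromQL's ignoring modifier but is a toy model with hypothetical helper names (key, divide), not Prometheus's evaluator.

```python
def key(labels, ignoring=()):
    """Build a hashable match key from a label set, skipping ignored labels."""
    return tuple(sorted((k, v) for k, v in labels.items() if k not in ignoring))


def divide(left, right, ignoring=()):
    """Element-wise left / right for samples whose label sets match."""
    rhs = {key(lbls, ignoring): val for lbls, val in right}
    out = []
    for lbls, val in left:
        k = key(lbls, ignoring)
        if k in rhs and rhs[k] != 0:
            out.append((dict(k), val / rhs[k]))
    return out


# Error count per handler and total request count per handler.
errors = [({"handler": "query", "code": "500"}, 5.0)]
total = [({"handler": "query"}, 100.0), ({"handler": "prometheus"}, 40.0)]

ratio = divide(errors, total, ignoring=("code",))
print(ratio)
```

Samples without a partner on the other side (here, the prometheus handler) simply drop out of the result.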
One Calculation, Many Queries
Such computational power consumes resources, so pre‑computing results is often faster than evaluating expressions on each dashboard refresh or alert evaluation. Prometheus provides Recording Rules to compute expensive expressions in advance and store them as new time series, achieving “one calculation, many queries”.
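The "one calculation, many queries" idea can be sketched as follows: an expensive expression is evaluated once per interval, the result is appended to a new series, and readers fetch the stored value instead of re‑evaluating. The store, the record function, and the series name job:http_requests:sum are hypothetical; real recording rules are declared in Prometheus rule files rather than in application code.

```python
store = {}  # series name -> list of (timestamp, value)
calls = 0   # counts how often the expensive expression actually runs


def expensive_sum(ts):
    """Stand-in for a costly aggregation over many raw series."""
    global calls
    calls += 1
    return sum(range(1000))


def record(name, expr, ts):
    """Evaluate expr once and append the result as a sample of a new series."""
    store.setdefault(name, []).append((ts, expr(ts)))


# The rule is evaluated once per interval.
for ts in (0, 30, 60):
    record("job:http_requests:sum", expensive_sum, ts)

# Ten dashboard refreshes read the stored series without re-evaluating.
reads = [store["job:http_requests:sum"][-1] for _ in range(10)]
print(calls, reads[0])
```

The expression runs three times regardless of how many dashboards or alerts read the result.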
MaGe Linux Operations
