
Unlocking Prometheus: How TSDB Powers Scalable Monitoring and Fast Queries

This article demystifies Prometheus by explaining its core concepts, daily monitoring queries, the role of its TSDB storage engine, how series, label, and time indexes enable fast time‑series queries, and how pre‑computed recording rules boost performance for dashboards and alerts.

Efficient Ops

Background

The unknown is intimidating for many people, and the author felt the same when first encountering Prometheus. It can seem overwhelming to beginners because it introduces many concepts at once.

Concepts: Instance, Job, Metric, Metric Name, Metric Label, Metric Value, Metric Type (Counter, Gauge, Histogram, Summary), DataType (Instant Vector, Range Vector, Scalar, String), Operator, Function

As Jack Ma said, Alibaba is a data company rather than a retail company; similarly, Prometheus is fundamentally a data‑driven monitoring system.

Daily Monitoring

Assume we need to monitor the request volume of each API on WebServerA. The dimensions include job (service name), instance (IP), handler (API name), method, code (status), and value (request count).

Example SQL‑like queries:

<code>SELECT * FROM http_requests_total WHERE code = '200' AND method = 'put' AND created_at BETWEEN 1495435700 AND 1495435710;</code>
<code>SELECT * FROM http_requests_total WHERE handler = 'prometheus' AND method = 'post' AND created_at BETWEEN 1495435700 AND 1495435710;</code>
<code>SELECT * FROM http_requests_total WHERE handler LIKE 'query%' AND instance = '10.59.8.110' AND created_at BETWEEN 1495435700 AND 1495435710;</code>

From these examples we see that daily monitoring mainly involves dimension-based queries combined with time ranges. If we monitor 100 services, each with 10 instances, 20 APIs, and 4 methods, that is 80,000 time series. Collecting data every 30 seconds and retaining it for 60 days yields 172,800 samples per series, or about 13.8 billion rows in total, which is impractical for a typical relational database. Therefore Prometheus uses a time-series database (TSDB) as its storage engine.
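The estimate above can be checked with a few lines of arithmetic. This is a minimal sketch; the variable names are illustrative, and like the figure in the text it leaves the status code dimension out of the series count.

```python
# Back-of-the-envelope sizing for the monitoring scenario described above.
services = 100
instances_per_service = 10
apis_per_instance = 20
methods_per_api = 4

scrape_interval_s = 30
retention_days = 60

# Each unique (service, instance, API, method) combination is one time series.
series_count = services * instances_per_service * apis_per_instance * methods_per_api

# Number of samples one series accumulates over the retention window.
samples_per_day = 24 * 60 * 60 // scrape_interval_s
samples_per_series = samples_per_day * retention_days

total_samples = series_count * samples_per_series

print(f"series:             {series_count:,}")
print(f"samples per series: {samples_per_series:,}")
print(f"total samples:      {total_samples:,}")
```

Adding the status code as a further dimension would multiply the series count again, so the figure is best read as a lower bound.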

Storage Engine

As Prometheus's storage engine, TSDB is a good fit for monitoring data, a workload with the following characteristics:

Massive data volume

Mostly write operations

Writes are almost sequential, ordered by time

Writes rarely modify old data; data is written shortly after collection

Deletion is block‑based, removing whole time ranges

Data size usually exceeds memory; only a small, irregular subset is cached

Read operations are typically sequential, in ascending or descending time order

High‑concurrency reads are common

How does TSDB achieve this?

<code>{
  "labels": [{"latency": "500"}],
  "samples": [{"timestamp": 1473305798, "value": 0.9}]
}
</code>

Raw data consists of two parts: labels (the monitoring dimensions) and samples (a timestamp and a value). The metric name and its labels together uniquely identify a time series, represented by a series_id; the samples store the actual measurements.

<code>series
│ server{latency="500"}
│ server{latency="300"}
│ server{}
</code>
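To illustrate how a metric name plus its labels can identify a series, here is a minimal sketch. The hashing scheme and the `series_id` helper are assumptions made for illustration, not how Prometheus actually derives its internal series references.

```python
import hashlib


def series_id(metric_name, labels):
    """Derive a stable identifier from a metric name and its labels.

    Hypothetical scheme for illustration only.
    """
    # Sort the labels so that insertion order does not affect the identity.
    parts = [metric_name] + [f'{k}="{v}"' for k, v in sorted(labels.items())]
    key = ",".join(parts)
    return hashlib.sha1(key.encode("utf-8")).hexdigest()[:16]


# The three series from the listing above each get their own identifier.
a = series_id("server", {"latency": "500"})
b = series_id("server", {"latency": "300"})
c = series_id("server", {})

# Samples are then stored per series as (timestamp, value) pairs.
samples = {a: [(1473305798, 0.9)]}
```

Sorting the labels before hashing is the key design point: the same label set must always map to the same series, regardless of the order in which the labels arrive.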

TSDB stores values using a timeseries key and builds three indexes—Series, Label Index, and Time Index—to accelerate common label‑and‑time‑range queries.

Series

A series entry has two parts. One stores all of the series' label key-value pairs in lexical order; the other stores an index of time windows pointing to data blocks, allowing queries to skip irrelevant records.

Label Index

Each label is stored under an index:label: key, containing a list of all its values and references to the starting position of the corresponding series.
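The label index behaves much like an inverted index. The sketch below is an in-memory toy model under that assumption; the `LabelIndex` class and its methods are hypothetical and do not mirror the on-disk format.

```python
from collections import defaultdict


class LabelIndex:
    """Toy inverted index: label name -> label value -> set of series ids."""

    def __init__(self):
        self._postings = defaultdict(lambda: defaultdict(set))

    def add(self, series_id, labels):
        # Register the series under every label pair it carries.
        for name, value in labels.items():
            self._postings[name][value].add(series_id)

    def values(self, name):
        # All known values of one label, in lexical order.
        return sorted(self._postings.get(name, {}))

    def select(self, **matchers):
        # Intersect the posting sets of every label=value matcher.
        result = None
        for name, value in matchers.items():
            ids = self._postings.get(name, {}).get(value, set())
            result = set(ids) if result is None else result & ids
        return result or set()


idx = LabelIndex()
idx.add(1, {"handler": "query", "method": "get", "code": "200"})
idx.add(2, {"handler": "query", "method": "put", "code": "200"})
idx.add(3, {"handler": "prometheus", "method": "post", "code": "200"})

matched = idx.select(code="200", method="put")
print(matched)
```

A query with several label conditions becomes a set intersection over small posting sets instead of a scan over every series.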

Time Index

Data is stored under an index:timeseries:: key that points to the file containing the data for a specific time interval.
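The benefit of a time index is that a range query only has to open the windows that overlap the requested interval. The following sketch assumes fixed-size windows and an in-memory dictionary standing in for block files; the `TimeIndex` class is hypothetical.

```python
def window_start(ts, window_s):
    """Align a timestamp to the start of its time window."""
    return ts - ts % window_s


class TimeIndex:
    """Toy time index: window start -> samples stored for that window."""

    def __init__(self, window_s):
        self.window_s = window_s
        self.blocks = {}

    def append(self, ts, value):
        start = window_start(ts, self.window_s)
        self.blocks.setdefault(start, []).append((ts, value))

    def blocks_for(self, start_ts, end_ts):
        # Only windows overlapping [start_ts, end_ts] are candidates.
        first = window_start(start_ts, self.window_s)
        last = window_start(end_ts, self.window_s)
        return [w for w in range(first, last + 1, self.window_s) if w in self.blocks]

    def query(self, start_ts, end_ts):
        out = []
        for w in self.blocks_for(start_ts, end_ts):
            out.extend((t, v) for t, v in self.blocks[w] if start_ts <= t <= end_ts)
        return out


tidx = TimeIndex(window_s=100)
for ts, value in [(10, 0.5), (150, 0.9), (250, 0.7), (320, 0.4)]:
    tidx.append(ts, value)

touched = tidx.blocks_for(140, 260)
result = tidx.query(140, 260)
print(touched, result)
```

Here the query for the range 140 to 260 touches two of the four windows and never reads the others, which is also why deleting a whole time range can be done by dropping entire blocks.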

Data Computation

The powerful storage engine enables Prometheus to perform matrix operations on metric series using basic operators and a rich set of functions, effectively turning the monitoring system into a data‑warehouse‑plus‑computation platform.

One Calculation, Many Queries

Because such computation can be resource‑intensive, pre‑computing results (recording rules) is often faster than evaluating the original expression each time, especially for dashboards and alerting rules that repeatedly query the same expression.
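The "one calculation, many queries" idea can be modelled in a few lines. This is a toy sketch, not Prometheus's rule engine: the `RecordingRule` class, the rule name `job:http_requests:sum`, and the sample data are all made up for illustration.

```python
class RecordingRule:
    """Evaluates an expression on a schedule and stores each result (toy model)."""

    def __init__(self, name, expr):
        self.name = name
        self.expr = expr      # callable: evaluation timestamp -> value
        self.results = {}     # evaluation timestamp -> precomputed value
        self.evaluations = 0

    def evaluate(self, ts):
        # The expensive expression runs once per evaluation interval.
        self.results[ts] = self.expr(ts)
        self.evaluations += 1

    def read(self, ts):
        # Dashboards and alerts read the stored value; nothing is recomputed.
        return self.results[ts]


# Hypothetical raw request counts per handler, keyed by timestamp.
raw = {
    "query": {0: 3.0, 30: 5.0},
    "prometheus": {0: 1.0, 30: 2.0},
}


def total_requests(ts):
    # Stand-in for an expensive aggregation across many series.
    return sum(series[ts] for series in raw.values())


rule = RecordingRule("job:http_requests:sum", total_requests)
for ts in (0, 30):
    rule.evaluate(ts)

# One hundred dashboard refreshes cost one hundred cheap lookups.
dashboard_reads = [rule.read(30) for _ in range(100)]
print(rule.evaluations, dashboard_reads[0])
```

The aggregation runs twice in total, no matter how many panels or alerting rules read the result afterwards.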
