
Why Prometheus’s TSDB Makes Monitoring Scalable: A Deep Dive

This article explains how Prometheus’s time‑series database handles massive monitoring data, illustrates practical query examples, and shows why its storage engine and pre‑computation features enable efficient, high‑performance observability for large‑scale services.

Efficient Ops

Background

Many beginners feel overwhelmed by Prometheus because it introduces numerous concepts, such as Instance, Job, Metric, Metric Name, Metric Label, Metric Value, Metric Type (Counter, Gauge, Histogram, Summary), DataType (Instant Vector, Range Vector, Scalar, String), Operator, and Function. Much as Alibaba is, at heart, a data company, Prometheus is at its core a data-driven monitoring system, so these concepts are easiest to grasp by looking at how it stores and queries data.

Daily Monitoring

Suppose we want to monitor every API request handled by a web server (e.g., WebServerA). The relevant dimensions are the service name (job), instance IP (instance), API name (handler), HTTP method (method), and response code (code); the measured value is the request count.

Example SQL-like queries on the http_requests_total metric:

<code>SELECT * FROM http_requests_total WHERE code='200' AND method='put' AND created_at BETWEEN 1495435700 AND 1495435710;</code>
<code>SELECT * FROM http_requests_total WHERE handler='prometheus' AND method='post' AND created_at BETWEEN 1495435700 AND 1495435710;</code>
<code>SELECT * FROM http_requests_total WHERE handler LIKE 'query%' AND instance='10.59.8.110' AND created_at BETWEEN 1495435700 AND 1495435710;</code>
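In Prometheus itself these WHERE clauses become PromQL label matchers, for example http_requests_total{code="200", method="put"}, with the time range supplied separately by the query. The Python sketch below is a minimal, hypothetical model of how equality (=) and regex (=~) matchers pick series out of a set of label sets. The sample series are invented for illustration, and re.fullmatch only approximates PromQL's fully anchored RE2 regexes.

```python
import re

# Hypothetical label sets for a few http_requests_total series (values omitted).
SERIES = [
    {"__name__": "http_requests_total", "job": "WebServerA", "instance": "10.59.8.110",
     "handler": "query", "method": "get", "code": "200"},
    {"__name__": "http_requests_total", "job": "WebServerA", "instance": "10.59.8.110",
     "handler": "query_range", "method": "get", "code": "200"},
    {"__name__": "http_requests_total", "job": "WebServerA", "instance": "10.59.8.111",
     "handler": "prometheus", "method": "post", "code": "500"},
    {"__name__": "http_requests_total", "job": "WebServerA", "instance": "10.59.8.111",
     "handler": "api", "method": "put", "code": "200"},
]


def matches(labels, matchers):
    """Return True if the label set satisfies every (name, op, value) matcher."""
    for name, op, value in matchers:
        actual = labels.get(name, "")  # a missing label behaves like an empty value
        if op == "=":
            ok = actual == value
        elif op == "=~":
            ok = re.fullmatch(value, actual) is not None  # anchored on both ends
        else:
            raise ValueError(f"unsupported operator {op!r}")
        if not ok:
            return False
    return True


def select(matchers):
    """Return every series whose labels satisfy all matchers."""
    return [s for s in SERIES if matches(s, matchers)]


# Rough equivalent of the third query: handler=~"query.*", instance="10.59.8.110"
query_series = select([("handler", "=~", "query.*"), ("instance", "=", "10.59.8.110")])
```

Because the regex is anchored, a matcher of handler=~"query" selects only the handler named exactly "query", not "query_range"; the trailing .* plays the role of SQL's % wildcard.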

When monitoring 100 services, each with 10 instances, 20 APIs, and 4 methods, scraped every 30 seconds and retained for 60 days, the total number of samples reaches roughly 13.8 billion rows, which is impractical for a traditional relational database. Prometheus therefore uses a time-series database (TSDB) as its storage engine.
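The 13.8 billion figure can be checked with a few lines of arithmetic. This sketch counts only the four listed dimensions; each additional response code per API would multiply the series count further.

```python
services, instances, apis, methods = 100, 10, 20, 4
scrape_interval_s = 30
retention_days = 60

# Every distinct label combination is its own time series.
series = services * instances * apis * methods

# One sample per scrape, kept for the whole retention period.
samples_per_series = retention_days * 24 * 3600 // scrape_interval_s

total_samples = series * samples_per_series
print(f"{series:,} series x {samples_per_series:,} samples = {total_samples:,}")
```

This works out to 80,000 series with 172,800 samples each, or 13,824,000,000 samples in total.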

Storage Engine

A TSDB is designed around the characteristics of monitoring data:

Massive data volume

Predominantly write‑heavy workload

Writes are mostly sequential, ordered by timestamp

Rarely updates old data; writes occur shortly after collection

Deletion is performed in block ranges, not individual points

Data size typically exceeds memory, limiting cache effectiveness

Read operations are ordered (ascending or descending) scans

High‑concurrency reads are common
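Several of these properties (ordered appends, no in-place updates, deletion by whole block) can be illustrated with a toy in-memory series. This is a hypothetical sketch, not Prometheus's storage code: the 2-hour block span is an assumption chosen for illustration, and rejecting every non-increasing timestamp is a simplification.

```python
from collections import defaultdict

BLOCK_SECONDS = 2 * 3600  # assumed block span for this sketch


class AppendOnlySeries:
    """Hypothetical series: in-order appends, retention by whole blocks."""

    def __init__(self):
        self.blocks = defaultdict(list)  # block start time -> [(ts, value), ...]
        self.last_ts = None

    def append(self, ts, value):
        # Writes arrive in timestamp order; this sketch rejects anything older.
        if self.last_ts is not None and ts <= self.last_ts:
            raise ValueError(f"out-of-order sample at {ts}")
        self.blocks[ts - ts % BLOCK_SECONDS].append((ts, value))
        self.last_ts = ts

    def drop_before(self, cutoff):
        # Retention removes entire blocks that end at or before the cutoff,
        # never individual points inside a block.
        for start in [s for s in self.blocks if s + BLOCK_SECONDS <= cutoff]:
            del self.blocks[start]
```

Because a block is only dropped once it lies entirely before the cutoff, deletion is a cheap metadata operation rather than a scan over individual samples.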

TSDB stores each time series as two parts: labels (the dimensions) and samples (timestamp and value pairs). The label set uniquely identifies a time series (series_id).

<code>{
  "labels": [{"latency": "500"}],
  "samples": [{"timestamp": 1473305798, "value": 0.9}]
}
</code>
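Because the label set is the series identity, a store can canonicalize the labels (for example by sorting them by name) and map each distinct set to one ID. The registry below is a hypothetical sketch that hands out sequential IDs; it uses __name__, the label under which Prometheus stores the metric name, but does not reproduce Prometheus's actual ID scheme.

```python
def series_key(labels):
    """Canonical form of a label set: (name, value) pairs sorted by name."""
    return tuple(sorted(labels.items()))


class SeriesRegistry:
    """Hypothetical registry assigning one series_id per unique label set."""

    def __init__(self):
        self.ids = {}      # canonical label set -> series_id
        self.samples = {}  # series_id -> [{"timestamp": ..., "value": ...}, ...]

    def get_or_create(self, labels):
        key = series_key(labels)
        if key not in self.ids:
            self.ids[key] = len(self.ids) + 1  # sequential IDs in this sketch
            self.samples[self.ids[key]] = []
        return self.ids[key]

    def append(self, labels, timestamp, value):
        sid = self.get_or_create(labels)
        self.samples[sid].append({"timestamp": timestamp, "value": value})
        return sid
```

Label order does not matter: {"__name__": "server", "latency": "500"} and {"latency": "500", "__name__": "server"} resolve to the same series, while latency="300" creates a new one.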

A simplified series diagram:

<code>series
│ server{latency="500"}
│ server{latency="300"}
│ server{}
│ ...
│ <--- time --->
</code>

TSDB builds three auxiliary indexes to accelerate queries:

Series Index

Stores all label key‑value pairs in lexical order and maps them to their time‑series identifiers.

Label Index

For each label name, an index entry lists all of that label's values and points to where the corresponding series start.

Time Index

Maps a time range to the file blocks that contain the data for that interval.
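To make the three indexes concrete, here is a hypothetical in-memory model: a posting map from each label pair to the series IDs that carry it, a per-label sorted list of values, and a list of blocks keyed by time range. Answering a query then means intersecting posting sets and keeping only the blocks that overlap the requested range. Prometheus's on-disk index format is considerably more involved; the class and method names here are illustrative only.

```python
from bisect import insort
from collections import defaultdict


class ToyIndex:
    """Hypothetical in-memory versions of the three indexes described above."""

    def __init__(self):
        self.postings = defaultdict(set)       # (name, value) -> {series_id}
        self.label_values = defaultdict(list)  # name -> sorted list of values
        self.blocks = []                       # (min_t, max_t, block_name), sorted

    def add_series(self, series_id, labels):
        for name, value in labels.items():
            if not self.postings[(name, value)]:  # first series with this pair
                insort(self.label_values[name], value)
            self.postings[(name, value)].add(series_id)

    def add_block(self, min_t, max_t, name):
        self.blocks.append((min_t, max_t, name))
        self.blocks.sort()

    def lookup(self, matchers):
        """Intersect the posting sets of all equality matchers in a dict."""
        sets = [self.postings.get((n, v), set()) for n, v in matchers.items()]
        return set.intersection(*sets) if sets else set()

    def blocks_for(self, start, end):
        """Names of blocks whose [min_t, max_t] overlaps [start, end]."""
        return [name for lo, hi, name in self.blocks if lo <= end and hi >= start]
```

A query for handler="query" and method="get" over a short window touches only the matching series and the one or two blocks covering that window, which is why these lookups stay fast as total data grows.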

Data Computation

Built on this storage engine, Prometheus can apply PromQL's built-in operators and functions to many series at once, operating on instant vectors and on matrices (range vectors). This effectively turns the monitoring system into a combined data warehouse and compute platform.
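As a rough illustration of what a query such as sum by (handler) (rate(http_requests_total[1m])) computes, the sketch below turns a range vector (several samples per series) into an instant vector (one value per series) and then aggregates it by a subset of labels. simple_rate is a simplified stand-in: it corrects for counter resets but, unlike PromQL's rate(), does not extrapolate to the window edges. The sample data is invented.

```python
from collections import defaultdict


def simple_rate(samples):
    """Per-second increase of a counter over [(ts, value), ...] (no extrapolation)."""
    if len(samples) < 2:
        return None
    increase = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        # A drop in value means the counter restarted from zero.
        increase += cur - prev if cur >= prev else cur
    return increase / (samples[-1][0] - samples[0][0])


def sum_by(vector, by):
    """Aggregate an instant vector [(labels, value)], keeping only the `by` labels."""
    groups = defaultdict(float)
    for labels, value in vector:
        key = tuple((name, labels.get(name, "")) for name in by)
        groups[key] += value
    return dict(groups)


# Hypothetical range vector: one list of (ts, value) samples per series.
range_vector = [
    ({"handler": "query", "instance": "a"}, [(0, 0), (30, 30), (60, 60)]),
    ({"handler": "query", "instance": "b"}, [(0, 100), (30, 10), (60, 40)]),  # reset
    ({"handler": "api", "instance": "a"}, [(0, 5), (60, 65)]),
]
instant_vector = [(labels, simple_rate(s)) for labels, s in range_vector]
result = sum_by(instant_vector, ["handler"])
```

Here the two "query" series contribute 1.0 and about 0.67 requests per second, so the aggregated result for handler="query" is about 1.67.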

Prometheus matrix operation diagram

One Calculation, Many Queries

Because evaluating expensive expressions over and over consumes CPU and memory, Prometheus encourages the use of recording rules to pre-compute frequently needed or expensive expressions at a regular interval. The results are stored as new time series, so a single computation can serve many queries, such as dashboard refreshes and alert evaluations.
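The benefit of recording rules is easiest to see by counting work. In the hypothetical sketch below, an expensive expression is evaluated once per interval and written to a new series named in the level:metric:operations style that the Prometheus documentation recommends; dashboards and alerts then read the stored value instead of recomputing it. The RecordingRule class and the function standing in for the PromQL expression are invented for illustration.

```python
class RecordingRule:
    """Hypothetical sketch: evaluate an expression once per interval, store the result."""

    def __init__(self, record, expr):
        self.record = record      # name of the new, precomputed series
        self.expr = expr          # callable(ts) -> float; stands in for a PromQL expression
        self.evaluations = 0      # how many times the expensive work actually ran

    def evaluate(self, ts, store):
        self.evaluations += 1
        store.setdefault(self.record, []).append((ts, self.expr(ts)))


def latest(store, name):
    """Cheap read of the most recent precomputed sample."""
    return store[name][-1][1]


def expensive_expr(ts):
    # Stand-in for something like sum by (job) (rate(http_requests_total[5m])).
    return ts / 1000.0


store = {}
rule = RecordingRule("job:http_requests:rate5m", expensive_expr)
reads = 0
for ts in (0, 60, 120):  # three evaluation intervals
    rule.evaluate(ts, store)
    # Two dashboard panels and an alert all read the stored value; none recompute it.
    for _consumer in ("dashboard_panel_1", "dashboard_panel_2", "alert"):
        value = latest(store, rule.record)
        reads += 1
```

Nine reads are served by only three evaluations; with more dashboards or alerts the read count grows while the computation cost stays fixed per interval.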

Prometheus recording rules diagram