Why Prometheus’s TSDB Makes Monitoring Scalable: A Deep Dive

This article explains how Prometheus’s time‑series database handles massive monitoring data, from basic concepts and query examples to storage engine design, indexing strategies, and powerful data computation techniques such as recording rules.

Open Source Linux

Background

For many people, the unknown and uncontrollable provoke subconscious avoidance; I felt the same when I first encountered Prometheus. Beginners find Prometheus overwhelming because it introduces many concepts at once, which makes the entry barrier high.

Concepts: Instance, Job, Metric, Metric Name, Metric Label, Metric Value, Metric Type (Counter, Gauge, Histogram, Summary), DataType (Instant Vector, Range Vector, Scalar, String), Operator, Function

As Mr. Ma said, "Although Alibaba is the world’s largest retail platform, it is not a retail company but a data company." Prometheus is similar: it is fundamentally a data-driven monitoring system.

Daily Monitoring

Assume we need to monitor the request volume of each API on WebServerA, with dimensions such as service name (job), instance IP (instance), API name (handler), method (method), response code (code), and request count (value).

Common query operations using SQL:

SELECT * FROM http_requests_total WHERE code = '200' AND method = 'put' AND created_at BETWEEN 1495435700 AND 1495435710;
SELECT * FROM http_requests_total WHERE handler = 'prometheus' AND method = 'post' AND created_at BETWEEN 1495435700 AND 1495435710;
SELECT * FROM http_requests_total WHERE handler LIKE 'query%' AND instance = '10.59.8.110' AND created_at BETWEEN 1495435700 AND 1495435710;

These examples show that daily monitoring relies on dimension-based queries combined with time ranges. Suppose we monitor 100 services, each with 10 instances, 20 APIs, and 4 methods, collecting a sample every 30 seconds and retaining 60 days of data. That is 80,000 series with 172,800 samples each, or about 13.8 billion rows, which is impractical to store and query in a relational database such as MySQL. Therefore, Prometheus uses a time-series database (TSDB) as its storage engine.
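The 13.8 billion figure can be checked with a few lines of arithmetic. This is only a back-of-the-envelope sketch: the constant names are made up for illustration, and, like the estimate above, it ignores the response code dimension.

```python
# Back-of-the-envelope row count for the monitoring scenario described above.
# All constant names are illustrative; the numbers come from the text.
SERVICES = 100
INSTANCES_PER_SERVICE = 10
APIS_PER_INSTANCE = 20
METHODS_PER_API = 4
SCRAPE_INTERVAL_SECONDS = 30
RETENTION_DAYS = 60


def estimate_rows() -> tuple[int, int, int]:
    """Return (series count, samples per series, total rows)."""
    series = SERVICES * INSTANCES_PER_SERVICE * APIS_PER_INSTANCE * METHODS_PER_API
    samples_per_series = RETENTION_DAYS * 24 * 60 * 60 // SCRAPE_INTERVAL_SECONDS
    return series, samples_per_series, series * samples_per_series


series, samples_per_series, total = estimate_rows()
print(f"{series:,} series x {samples_per_series:,} samples = {total:,} rows")
# prints: 80,000 series x 172,800 samples = 13,824,000,000 rows
```

Adding the response code as a further dimension would multiply the series count again, which only strengthens the argument.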

Storage Engine

A TSDB is designed around the characteristics of monitoring data:

Enormous data volume

Predominantly write operations

Writes are almost sequential, ordered by time

Writes rarely modify old data; data is written shortly after collection

Deletion is performed in block ranges, not individual timestamps

Data size generally exceeds memory, making caching ineffective

Read operations are typical sequential reads (ascending or descending)

High‑concurrency reads are common

How does a TSDB meet these requirements?

{
  "labels": [{
    "latency": "500"
  }],
  "samples": [{
    "timestamp": 1473305798,
    "value": 0.9
  }]
}

Raw data consists of two parts: labels (monitoring dimensions) and samples (timestamp and metric value). Labels uniquely identify a time series (series_id), while samples store the actual measurements.
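The split between labels and samples can be sketched in a few lines. This is a simplified in-memory model, not Prometheus's actual implementation: the `SeriesStore` class and its methods are hypothetical names, and the series ID is assumed to be a plain counter.

```python
from dataclasses import dataclass, field


@dataclass
class Series:
    labels: tuple                                  # sorted (name, value) pairs
    samples: list = field(default_factory=list)    # (timestamp, value) pairs


class SeriesStore:
    """Hypothetical store: one series per distinct label set."""

    def __init__(self):
        self._ids = {}      # label set -> series_id
        self._series = []   # series_id -> Series

    def append(self, labels: dict, timestamp: int, value: float) -> int:
        # Sorting makes the key independent of the order labels were given in.
        key = tuple(sorted(labels.items()))
        series_id = self._ids.get(key)
        if series_id is None:
            # First sample for this label set: allocate a new series.
            series_id = len(self._series)
            self._ids[key] = series_id
            self._series.append(Series(labels=key))
        self._series[series_id].samples.append((timestamp, value))
        return series_id

    def samples(self, series_id: int) -> list:
        return list(self._series[series_id].samples)


store = SeriesStore()
a = store.append({"latency": "500"}, 1473305798, 0.9)
b = store.append({"latency": "500"}, 1473305828, 0.7)
c = store.append({"latency": "300"}, 1473305798, 0.4)
print(a, b, c)  # the first two samples share a series, the third does not
```

Two samples with identical labels land in the same series, while changing any label value creates a new one.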

series
^
│ ... server{latency="500"}
│ ... server{latency="300"}
│ ... server{}
│
v
<-------- time -------->

TSDB stores values under keys of the form timeseries:doc::. To accelerate the most common query pattern, label filters combined with a time range, it builds three indexes: Series, Label Index, and Time Index.

Example with label latency:

Series

One part stores all label key‑value pairs in lexical order; the other part stores an index of time windows pointing to data blocks, allowing fast skipping of irrelevant time ranges during queries.

Label Index

Each label is stored as index:label: with a list of all its values, and each value references the starting position of the corresponding series.

Time Index

Data is stored with keys like index:timeseries:: that point to the files containing data for specific time intervals.
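To make the interplay of the label index and the time index concrete, here is a minimal in-memory sketch. It is not the real on-disk layout: the `TinyTSDB` class, the one-hour `BLOCK_SECONDS` window, and the dictionary-based indexes are all assumptions chosen for illustration.

```python
from collections import defaultdict

BLOCK_SECONDS = 3600  # assumed window size; purely illustrative


class TinyTSDB:
    """Hypothetical store with an inverted label index and a time-window index."""

    def __init__(self):
        self.series_labels = []                  # series_id -> labels dict
        self.series_ids = {}                     # sorted label tuple -> series_id
        self.label_index = defaultdict(set)      # (name, value) -> {series_id}
        # window start -> series_id -> [(timestamp, value)]
        self.blocks = defaultdict(lambda: defaultdict(list))

    def append(self, labels, ts, value):
        key = tuple(sorted(labels.items()))
        sid = self.series_ids.get(key)
        if sid is None:
            sid = len(self.series_labels)
            self.series_ids[key] = sid
            self.series_labels.append(dict(key))
            for pair in key:                     # register the series under each label pair
                self.label_index[pair].add(sid)
        window = ts - ts % BLOCK_SECONDS         # time index: pick the block for this sample
        self.blocks[window][sid].append((ts, value))

    def query(self, matchers, start, end):
        # Step 1: label index. Intersect the series sets of every label filter.
        postings = [self.label_index.get(pair, set()) for pair in matchers.items()]
        sids = set.intersection(*postings) if postings else set(range(len(self.series_labels)))
        # Step 2: time index. Visit only the windows that overlap [start, end].
        result = {sid: [] for sid in sids}
        first = start - start % BLOCK_SECONDS
        for window in range(first, end + 1, BLOCK_SECONDS):
            block = self.blocks.get(window)
            if block is None:
                continue
            for sid in sids:
                for ts, value in block.get(sid, ()):
                    if start <= ts <= end:
                        result[sid].append((ts, value))
        return result


db = TinyTSDB()
db.append({"handler": "query", "method": "put", "code": "200"}, 1495435700, 10.0)
db.append({"handler": "query", "method": "put", "code": "200"}, 1495435705, 12.0)
db.append({"handler": "query", "method": "post", "code": "200"}, 1495435705, 3.0)
db.append({"handler": "query", "method": "put", "code": "200"}, 1495443000, 40.0)  # later block
hits = db.query({"code": "200", "method": "put"}, 1495435700, 1495435710)
print(hits)
```

The query never touches the later block or the `post` series: the label index narrows the candidate series first, and the time index then skips every window outside the requested range.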

Data Computation

The powerful storage engine enables advanced data computation. Prometheus can select different metric series, apply basic operators, and use rich functions to perform matrix operations on metric series, as illustrated in the diagram below.

Thus, Prometheus’s capabilities rival a combination of a data warehouse and a compute platform, pointing to the future direction of monitoring in the big‑data era.

One Calculation, Many Queries

Such powerful computation consumes significant resources. Therefore, pre‑computing results is often faster than evaluating raw expressions each time, especially for dashboards and alerting rules that repeatedly evaluate the same expressions. Prometheus provides recording rules to compute expensive expressions in advance and store the results as new time series, achieving “one calculation, many queries”.
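In Prometheus itself, recording rules are declared in rule files with a `record` name and an `expr` expression and are evaluated on a fixed interval. The idea behind them can be sketched in plain Python. The `RuleEvaluator` class below is hypothetical, and the "expensive expression" is reduced to a simple sum grouped by one label.

```python
class RuleEvaluator:
    """Hypothetical evaluator: compute once, store the result, serve many reads."""

    def __init__(self, raw_series):
        self.raw_series = raw_series   # list of (labels dict, current value)
        self.recorded = {}             # rule name -> {group key: aggregated value}
        self.evaluations = 0           # how many times the raw data was scanned

    def evaluate(self, record, group_by):
        # The costly step: scan every raw series and aggregate by one label.
        self.evaluations += 1
        totals = {}
        for labels, value in self.raw_series:
            key = labels[group_by]
            totals[key] = totals.get(key, 0.0) + value
        self.recorded[record] = totals   # store the result as a new, small series

    def query(self, record):
        # Dashboards and alerts read the stored result; no raw scan happens here.
        return self.recorded[record]


raw = [
    ({"job": "web", "handler": "query"}, 10.0),
    ({"job": "web", "handler": "prometheus"}, 5.0),
    ({"job": "api", "handler": "query"}, 2.0),
]
evaluator = RuleEvaluator(raw)
evaluator.evaluate(record="job:http_requests:sum", group_by="job")

# Three dashboard panels read the same precomputed series.
answers = [evaluator.query("job:http_requests:sum") for _ in range(3)]
print(answers[0], evaluator.evaluations)
```

The rule name follows the `level:metric:operations` naming convention recommended for recording rules; the raw data is scanned once no matter how many panels read the result.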
