How Prometheus V2 Stores Time‑Series Data: Disk Formats and Query Mechanics
This article provides an in‑depth analysis of Prometheus V2's storage architecture, detailing the on‑disk block layout, chunk and index formats, the inverted index structure, memory representations, and the step‑by‑step query process that locates matching time‑series data.
Background
Prometheus is a leading cloud‑native time‑series database used for monitoring. While its overall architecture remains stable, the storage engine has evolved through several versions. The article examines the storage format of Prometheus V2 (v2.25.2) and explains how queries locate the required data.
Directory Structure
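Data is organized into blocks, each identified by a ULID and covering a two-hour time range by default. A block directory contains chunks (segment files holding the compressed samples, capped at 512 MiB each by default), index (the inverted index), and meta.json (block metadata, including minTime / maxTime). Alongside the blocks sit chunks_head, which holds the memory-mapped chunks of the in-memory head block (segment files of up to 128 MiB), and wal, the write-ahead log.
Block – read-only once written; holds chunks, index, and meta.json.
chunks_head – chunks of the head block that are already full. A head chunk is cut once it holds 120 samples or reaches the end of its time range (two hours at most); the full chunk is flushed here and memory-mapped, while a new chunk accepts writes in memory.
wal – write-ahead log that makes recent, not yet persisted data durable; writes are appended in batches and replayed on restart.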
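To make the role of meta.json concrete, here is a minimal Go sketch that parses the minTime / maxTime fields and checks whether a block can contain data for a query range. The type blockMeta, the helper names, the sample ULID, and the half-open range convention are assumptions made for illustration, not Prometheus's own code.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// blockMeta mirrors only the meta.json fields this sketch needs.
type blockMeta struct {
	ULID    string `json:"ulid"`
	MinTime int64  `json:"minTime"` // milliseconds since the Unix epoch
	MaxTime int64  `json:"maxTime"`
}

// parseMeta decodes the raw contents of a meta.json file.
func parseMeta(raw []byte) (blockMeta, error) {
	var m blockMeta
	err := json.Unmarshal(raw, &m)
	return m, err
}

// overlaps reports whether the block may hold samples in [mint, maxt].
// Assumption: the block covers the half-open range [MinTime, MaxTime).
func (m blockMeta) overlaps(mint, maxt int64) bool {
	return m.MinTime <= maxt && mint < m.MaxTime
}

func main() {
	// Made-up metadata for a block covering the first two hours after the epoch.
	raw := []byte(`{"ulid":"01EXAMPLEULID0000000000000","minTime":0,"maxTime":7200000}`)
	m, err := parseMeta(raw)
	if err != nil {
		panic(err)
	}
	fmt.Println(m.ULID, m.overlaps(3600000, 9000000))
}
```

A query only needs to open the blocks for which overlaps returns true, which is why time partitioning keeps range queries cheap.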
Data Format
Chunk data lives in segment files. Each file starts with a header (magic number, version, and padding), followed by a sequence of chunk entries. The entry layout below is the one used in chunks_head files, where every entry also carries the series ref and time range of its chunk; in a persisted block, an entry holds only len, encoding, data, and CRC32, and the time range is kept in the index.
┌─────────────────────┬───────────────────────┬───────────────────────┬───────────────────┬───────────────┬──────────────┬────────────────┐
│ series ref <8 byte> │ mint <8 byte, uint64> │ maxt <8 byte, uint64> │ encoding <1 byte> │ len <uvarint> │ data <bytes> │ CRC32 <4 byte> │
└─────────────────────┴───────────────────────┴───────────────────────┴───────────────────┴───────────────┴──────────────┴────────────────┘
Key points:
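A block spans two hours by default, and compaction later merges adjacent blocks into larger ones. series ref identifies the time series that a chunk belongs to. The index, in turn, points at each chunk through a chunk reference that combines a file ID with a byte offset inside that file, which is how a series found in the index leads to its data.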
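The entry layout can be exercised with a small round-trip in Go. This is a sketch under stated assumptions: integers are written big-endian, the checksum uses the Castagnoli polynomial, and it covers every byte that precedes it. The names chunkRecord, encodeRecord, and decodeRecord are hypothetical.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
	"hash/crc32"
)

const headerLen = 8 + 8 + 8 + 1 // series ref, mint, maxt, encoding

// Assumption: CRC32 with the Castagnoli polynomial.
var castagnoli = crc32.MakeTable(crc32.Castagnoli)

type chunkRecord struct {
	SeriesRef  uint64
	MinT, MaxT int64
	Encoding   byte
	Data       []byte
}

// encodeRecord serializes one entry in the field order shown in the diagram.
func encodeRecord(r chunkRecord) []byte {
	hdr := make([]byte, headerLen+binary.MaxVarintLen64)
	binary.BigEndian.PutUint64(hdr[0:], r.SeriesRef)
	binary.BigEndian.PutUint64(hdr[8:], uint64(r.MinT))
	binary.BigEndian.PutUint64(hdr[16:], uint64(r.MaxT))
	hdr[24] = r.Encoding
	n := binary.PutUvarint(hdr[headerLen:], uint64(len(r.Data)))
	buf := append(hdr[:headerLen+n], r.Data...)
	var sum [4]byte
	// Assumption: the checksum covers all bytes written so far.
	binary.BigEndian.PutUint32(sum[:], crc32.Checksum(buf, castagnoli))
	return append(buf, sum[:]...)
}

// decodeRecord parses one entry and verifies its checksum.
func decodeRecord(b []byte) (chunkRecord, error) {
	if len(b) < headerLen {
		return chunkRecord{}, errors.New("record too short")
	}
	r := chunkRecord{
		SeriesRef: binary.BigEndian.Uint64(b[0:]),
		MinT:      int64(binary.BigEndian.Uint64(b[8:])),
		MaxT:      int64(binary.BigEndian.Uint64(b[16:])),
		Encoding:  b[24],
	}
	dataLen, n := binary.Uvarint(b[headerLen:])
	if n <= 0 || dataLen > uint64(len(b)) {
		return chunkRecord{}, errors.New("bad data length")
	}
	start := headerLen + n
	end := start + int(dataLen)
	if end+4 > len(b) {
		return chunkRecord{}, errors.New("truncated record")
	}
	r.Data = b[start:end]
	if crc32.Checksum(b[:end], castagnoli) != binary.BigEndian.Uint32(b[end:]) {
		return chunkRecord{}, errors.New("checksum mismatch")
	}
	return r, nil
}

func main() {
	raw := encodeRecord(chunkRecord{SeriesRef: 7, MinT: 1000, MaxT: 2000, Encoding: 1, Data: []byte{1, 2, 3}})
	rec, err := decodeRecord(raw)
	fmt.Println(len(raw), rec.SeriesRef, rec.MinT, rec.MaxT, err)
}
```

Because len is a variable-length integer, an entry has no fixed size; a reader has to decode the length before it can locate the data and the checksum.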
Index Format
The on‑disk index is an inverted index with the following major sections (from bottom to top):
TOC – stores offsets of other sections.
Postings Offset Table – maps label name/value pairs to offsets of posting lists.
Postings N – actual posting lists containing series references.
Series – one entry per series, holding its label set and, for each of its chunks, the time range and the reference into the chunk files.
During a query, the engine first looks up each label name/value pair in the Postings Offset Table to get the offset of its posting list, then reads the series references from that list.
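The idea behind the inverted index can be shown with an in-memory toy in Go. It is only a sketch: the types label and invertedIndex and their methods are hypothetical, and the real index stores byte offsets into the index file rather than Go slices.

```go
package main

import (
	"fmt"
	"sort"
)

// label is one name/value pair, for example job="api".
type label struct{ Name, Value string }

// invertedIndex maps each label pair to the IDs of the series that carry it.
type invertedIndex map[label][]uint64

// add registers a series ID under each of its labels.
func (ix invertedIndex) add(id uint64, lbls ...label) {
	for _, l := range lbls {
		ix[l] = append(ix[l], id)
	}
}

// finalize sorts every posting list so lists can later be merged in one pass.
func (ix invertedIndex) finalize() {
	for l := range ix {
		p := ix[l]
		sort.Slice(p, func(i, j int) bool { return p[i] < p[j] })
	}
}

// postings returns the sorted series IDs for one label pair, or nil.
func (ix invertedIndex) postings(name, value string) []uint64 {
	return ix[label{Name: name, Value: value}]
}

func main() {
	ix := invertedIndex{}
	ix.add(2, label{"job", "api"}, label{"env", "prod"})
	ix.add(1, label{"job", "api"}, label{"env", "dev"})
	ix.finalize()
	fmt.Println(ix.postings("job", "api"), ix.postings("env", "prod"))
}
```

Keeping every posting list sorted is the property that later allows several lists to be combined in a single linear pass.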
Memory Structures
At runtime, Prometheus maintains two main structures:
DB – holds an array of read-only Block objects and a Head for the data currently being written.
Head – contains MemPostings (in-memory posting lists) and a stripeSeries map of memSeries objects, each holding its mmapped chunks and a writable head chunk.
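The name stripeSeries points at lock striping: the series map is split into shards, each guarded by its own lock, so that writers touching different series rarely contend. The Go sketch below illustrates only that general technique; the stripe count, the field names, and the simplified memSeries are assumptions and do not reproduce the Prometheus types.

```go
package main

import (
	"fmt"
	"sync"
)

// stripeCount is a made-up value chosen to keep the example small.
const stripeCount = 16

// memSeries is a simplified stand-in for the per-series in-memory state.
type memSeries struct {
	ref    uint64
	labels string
}

// stripe is one shard of the map with its own lock.
type stripe struct {
	mu     sync.RWMutex
	series map[uint64]*memSeries
}

type stripedSeries struct {
	stripes [stripeCount]stripe
}

func newStripedSeries() *stripedSeries {
	s := &stripedSeries{}
	for i := range s.stripes {
		s.stripes[i].series = make(map[uint64]*memSeries)
	}
	return s
}

// set stores a series in the stripe selected by its reference.
func (s *stripedSeries) set(ms *memSeries) {
	st := &s.stripes[ms.ref%stripeCount]
	st.mu.Lock()
	st.series[ms.ref] = ms
	st.mu.Unlock()
}

// get returns the series for ref, or nil if it is unknown.
func (s *stripedSeries) get(ref uint64) *memSeries {
	st := &s.stripes[ref%stripeCount]
	st.mu.RLock()
	defer st.mu.RUnlock()
	return st.series[ref]
}

func main() {
	s := newStripedSeries()
	var wg sync.WaitGroup
	for ref := uint64(0); ref < 100; ref++ {
		wg.Add(1)
		go func(ref uint64) { // concurrent writers mostly hit different stripes
			defer wg.Done()
			s.set(&memSeries{ref: ref, labels: fmt.Sprintf(`{id="%d"}`, ref)})
		}(ref)
	}
	wg.Wait()
	fmt.Println(s.get(42).labels)
}
```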
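For on-disk blocks, only a sample of the Postings Offset Table (every 32nd entry, plus the first and last) is kept in memory, which keeps memory use low; the entries in between are read from the index file when a lookup needs them.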
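The sampling scheme trades a short scan for a much smaller in-memory table. The following Go sketch shows the principle on plain slices: a binary search over the sampled entries finds the nearest starting point, and a forward scan over the full table finishes the lookup. All names here are hypothetical, and the full table is a slice standing in for the mmapped index file.

```go
package main

import (
	"fmt"
	"sort"
)

const sampleEvery = 32

// entry is one row of the full table: a label value and the offset
// of its posting list.
type entry struct {
	Value  string
	Offset uint64
}

// sampled is one in-memory entry pointing back into the full table.
type sampled struct {
	Value string
	Pos   int
}

// buildTable creates a sorted table of n made-up entries for the demo.
func buildTable(n int) []entry {
	t := make([]entry, n)
	for i := range t {
		t[i] = entry{Value: fmt.Sprintf("v%04d", i), Offset: uint64(i * 10)}
	}
	return t
}

// sample keeps every 32nd entry plus the last one (index 0 is the first).
func sample(table []entry) []sampled {
	var out []sampled
	for i, e := range table {
		if i%sampleEvery == 0 || i == len(table)-1 {
			out = append(out, sampled{Value: e.Value, Pos: i})
		}
	}
	return out
}

// lookup binary-searches the sample for the last entry <= value,
// then scans the full table forward from that position.
func lookup(table []entry, idx []sampled, value string) (uint64, bool) {
	i := sort.Search(len(idx), func(i int) bool { return idx[i].Value > value })
	if i == 0 {
		return 0, false // value sorts before the first entry
	}
	for p := idx[i-1].Pos; p < len(table) && table[p].Value <= value; p++ {
		if table[p].Value == value {
			return table[p].Offset, true
		}
	}
	return 0, false
}

func main() {
	table := buildTable(100)
	idx := sample(table)
	off, ok := lookup(table, idx, "v0050")
	fmt.Println(len(idx), off, ok)
}
```

With a sampling factor of 32, a lookup scans at most 32 entries of the full table after the binary search, while the in-memory structure shrinks by roughly the same factor.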
Query Process
Query execution consists of two phases:
Label matching – for each label matcher, fetch the matching posting lists; negative matchers (!= and !~) are inverted and handled as subtractions. The sorted lists are then intersected lazily, in the manner of a mergesort merge step, by intersectPostings.
Chunk retrieval – once the matching series and their chunk references are known, chunk data is read directly through mmap, relying on the OS page cache.
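The label-matching phase boils down to set operations on sorted lists of series references. The Go sketch below is an eager version on slices, written for readability; the lazy variant named above works on iterators and avoids materializing intermediate results. The function names intersect and without are hypothetical.

```go
package main

import "fmt"

// intersect returns the IDs present in both sorted lists, walking them
// in lockstep like the merge step of mergesort.
func intersect(a, b []uint64) []uint64 {
	var out []uint64
	i, j := 0, 0
	for i < len(a) && j < len(b) {
		switch {
		case a[i] < b[j]:
			i++
		case a[i] > b[j]:
			j++
		default:
			out = append(out, a[i])
			i++
			j++
		}
	}
	return out
}

// without returns the IDs of sorted list a that do not occur in sorted
// list b. This is how a negative matcher can be applied as a subtraction.
func without(a, b []uint64) []uint64 {
	var out []uint64
	j := 0
	for _, v := range a {
		for j < len(b) && b[j] < v {
			j++
		}
		if j < len(b) && b[j] == v {
			continue
		}
		out = append(out, v)
	}
	return out
}

func main() {
	job := []uint64{1, 2, 3, 5, 8} // series with job="api"
	env := []uint64{2, 3, 4, 8, 9} // series with env="prod"
	canary := []uint64{3}          // series with track="canary"

	// Selector: {job="api", env="prod", track!="canary"}
	fmt.Println(without(intersect(job, env), canary)) // prints [2 8]
}
```

Both functions run in time linear in the total length of their inputs, which only works because posting lists are stored in sorted order.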
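In the index reader, this lookup works in three steps: the posting offsets are loaded from the Postings Offset Table, a binary search over the in-memory entries finds the nearest one, and a decoder then reads label values and offsets from the table until it reaches the requested value.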
Summary
The analysis shows that Prometheus stores time‑series data in time‑partitioned blocks, uses a compact on‑disk inverted index to map label pairs to series, and relies on mmap for efficient data access. Some advanced details, such as dictionary compression of labels and compaction strategies, are omitted for brevity.
IT Architects Alliance
