Comparative Analysis of VictoriaMetrics and Thanos for Large‑Scale Metric Storage
This article examines the migration from Thanos to VictoriaMetrics for large‑scale metric storage, detailing background challenges, VictoriaMetrics architecture and storage engine, data write and read processes, and a comparative analysis of performance, scalability, and operational costs between the two systems.
Background
Since 2022, the company has promoted metric‑based observability across many systems, and the number of metrics has grown from a few million to nearly 100 million. Thanos, used as a Prometheus extension for long‑term storage, began to show bottlenecks: high resource consumption, long query latency, and occasional service unavailability.
After extensive optimization attempts, the team evaluated open‑source alternatives and decided to migrate the monitoring storage from Thanos to VictoriaMetrics.
VictoriaMetrics Overview
VictoriaMetrics is a fast, efficient, and horizontally scalable time‑series database that can serve as Prometheus long‑term storage. It offers low CPU, memory, and storage usage while maintaining high query speed. Key advantages include full Prometheus compatibility, high performance, high compression ratio, easy operation, high availability, global query capability, and a strong community.
VictoriaMetrics Technical Process
Architecture
Storage Engine Features
VictoriaMetrics' storage engine is built on three core concepts: the LSM tree, the SSTable, and the TSID (Time Series ID). Together they provide high‑throughput writes, efficient indexing, and compact storage.
LSM Tree (Log‑Structured Merge‑Tree)
The LSM tree is optimized for high‑write workloads by converting random writes into sequential writes. New data is first written to an in‑memory Memtable, which is flushed to an immutable SSTable file once it reaches a size threshold.
Write flow: Data arrives, is stored in the Memtable, then flushed to disk as an SSTable.
Compaction: Periodic background merges combine multiple SSTable files into larger ones, reducing fragmentation and improving query performance.
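The write flow and compaction described above can be sketched in a few lines. This is an illustration of the LSM idea only, not VictoriaMetrics' actual code; the class name, the tiny flush threshold, and the in‑memory "SSTables" are all stand‑ins for real on‑disk structures.

```python
MEMTABLE_LIMIT = 4  # flush threshold (deliberately tiny, for illustration)

class TinyLSM:
    def __init__(self):
        self.memtable = {}   # mutable in-memory write buffer
        self.sstables = []   # immutable sorted (key, value) lists, oldest first

    def put(self, key, value):
        self.memtable[key] = value
        if len(self.memtable) >= MEMTABLE_LIMIT:
            self.flush()

    def flush(self):
        # Freeze the memtable into a sorted, immutable "SSTable".
        self.sstables.append(sorted(self.memtable.items()))
        self.memtable = {}

    def compact(self):
        # Background merge: combine all SSTables into one, newer values win.
        merged = {}
        for table in self.sstables:
            merged.update(dict(table))
        self.sstables = [sorted(merged.items())]

    def get(self, key):
        if key in self.memtable:
            return self.memtable[key]
        # Search newest SSTable first, mirroring LSM read order.
        for table in reversed(self.sstables):
            for k, v in table:
                if k == key:
                    return v
        return None
```

Note how every `put` is a cheap in‑memory operation; the expensive sorted write happens only at flush time, which is the core trade‑off that makes LSM trees write‑friendly.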
SSTable (Sorted String Table)
An immutable file containing ordered key‑value pairs. It enables fast point and range queries through multi‑level indexes, Bloom filters, and sparse indexes.
Write and storage: When the Memtable is flushed, a new SSTable is created; updates generate new files rather than modifying existing ones.
Index and lookup: Built‑in indexes allow rapid key location.
Merge and compression: Background merges deduplicate data and apply compression algorithms to reduce space.
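The sparse‑index lookup mentioned above can be sketched as follows. This is an illustration of the technique, not VictoriaMetrics' file format: real SSTables layer Bloom filters and block compression on top of this idea, and the block size here is deliberately tiny.

```python
import bisect

BLOCK_SIZE = 4  # entries per "block" covered by one sparse-index entry

def build_sstable(items):
    """items: dict -> (sorted entries, sparse index of every BLOCK_SIZE-th key)."""
    entries = sorted(items.items())
    index = [(entries[i][0], i) for i in range(0, len(entries), BLOCK_SIZE)]
    return entries, index

def lookup(entries, index, key):
    # Binary-search the sparse index to find the block that may hold the key...
    keys = [k for k, _ in index]
    pos = bisect.bisect_right(keys, key) - 1
    if pos < 0:
        return None  # key sorts before the first entry
    start = index[pos][1]
    # ...then scan only that block instead of the whole table.
    for k, v in entries[start:start + BLOCK_SIZE]:
        if k == key:
            return v
    return None
```

Because the index holds only one key per block, it stays small enough to keep in memory even for very large files, while each lookup touches a single block on disk.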
TSID (Time Series ID)
TSID is a unique hash generated from a series' label set. It ensures distinct identification, efficient indexing, and balanced distribution across storage nodes.
Uniqueness: Guarantees each series is uniquely identifiable.
Efficient indexing: Enables fast lookups based on TSID.
Distributed storage optimization: Helps evenly spread data and avoid hotspots.
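The "unique hash of the label set" idea can be shown in a short sketch. This is illustrative only: VictoriaMetrics' real TSID has additional structure (metric‑group and instance fields), and the use of SHA‑256 here is an assumption made for brevity.

```python
import hashlib

def tsid(labels: dict) -> int:
    # Canonical order so {"a": "1", "b": "2"} and {"b": "2", "a": "1"}
    # produce the same ID for the same series.
    canon = ",".join(f"{k}={v}" for k, v in sorted(labels.items()))
    digest = hashlib.sha256(canon.encode()).digest()
    return int.from_bytes(digest[:8], "big")  # 64-bit series ID
```

The canonical ordering step is what guarantees uniqueness per label set: any ingestion path that sees the same labels, in any order, derives the same TSID.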
Data Ingestion Process
Data Reception (vminsert)
All incoming metrics are sent to the vminsert component, which parses the data and converts it to an internal format.
The TSID is calculated by hashing the series' label set.
Consistent hashing selects an appropriate vmstorage node for storage, ensuring uniform data distribution.
Data Storage (vmstorage)
Writes are first recorded in a Write‑Ahead Log (WAL) and stored in an in‑memory Memtable organized by TSID.
When the Memtable reaches a threshold, it is flushed to disk as an immutable SSTable file.
Background processes periodically merge multiple SSTable files to reduce file count and improve query speed.
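The WAL‑then‑Memtable ordering above is what makes unflushed writes crash‑safe, and it can be sketched in a few lines. This is an illustration only: a real WAL is an append‑only on‑disk file with fsync, not the in‑memory list used here.

```python
import json

class WalStore:
    def __init__(self):
        self.wal = []       # stands in for the append-only log file
        self.memtable = {}  # samples grouped by TSID

    def write(self, tsid, timestamp, value):
        record = json.dumps({"tsid": tsid, "ts": timestamp, "v": value})
        self.wal.append(record)  # 1. log first, so the write survives a crash
        self.memtable.setdefault(tsid, []).append((timestamp, value))  # 2. memtable

    def recover(self):
        # Rebuild the memtable purely from the WAL, as after a restart.
        rebuilt = {}
        for line in self.wal:
            r = json.loads(line)
            rebuilt.setdefault(r["tsid"], []).append((r["ts"], r["v"]))
        return rebuilt
```

Once a Memtable is flushed to an SSTable, the corresponding WAL segment can be discarded, which keeps recovery time bounded.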
Data Retrieval Process
Query Request (vmselect)
Users (e.g., via Grafana) send PromQL queries to vmselect, which parses the query and determines the required TSIDs.
Based on TSIDs, vmselect contacts the relevant vmstorage nodes.
Data Lookup and Processing (vmstorage)
In‑memory lookup: each vmstorage node first checks its Memtable for matching data.
Disk lookup: If needed, it searches SSTable files using multi‑level indexes and Bloom filters.
Parallel processing: Queries are executed in parallel across multiple storage nodes.
Data Merge and Aggregation (vmselect)
Results from multiple vmstorage nodes are merged, de‑duplicated by TSID, and aggregated (e.g., sum, average) in real time.
The final result set is returned to the user.
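The merge‑and‑aggregate step can be sketched as below. The shape of the per‑node payloads is a hypothetical simplification, but the logic mirrors the description: union partial results, de‑duplicate samples per TSID (replicas may return the same sample), and then aggregate.

```python
def merge_results(node_results):
    """node_results: list of {tsid: [(timestamp, value), ...]}, one per node."""
    merged = {}
    for result in node_results:
        for tsid, samples in result.items():
            # A set de-duplicates samples returned by more than one replica.
            merged.setdefault(tsid, set()).update(samples)
    # Return each series' samples in time order.
    return {tsid: sorted(s) for tsid, s in merged.items()}

def aggregate_sum(merged, timestamp):
    """Sum every series' value at one timestamp (a PromQL-style sum())."""
    return sum(v for samples in merged.values()
               for t, v in samples if t == timestamp)
```

Because each node returns only its own shard of series, the merge is mostly a disjoint union; de‑duplication matters only where replication makes two nodes answer for the same TSID.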
VictoriaMetrics vs Thanos
Architectural Comparison
Thanos consists of Sidecar, Store Gateway, Compactor, Querier, and Receive components, providing high availability but a complex architecture.
VictoriaMetrics has a simpler design with three components: vminsert, vmstorage, and vmselect, offering easier management and higher performance.
Data Write
Thanos: Prometheus writes locally, Sidecar uploads data to object storage, and optional remote write to Receive adds latency and complexity.
VictoriaMetrics: vminsert writes directly to vmstorage using TSID, LSM‑based Memtable, WAL, and periodic SSTable flushes, resulting in low‑latency, high‑throughput writes.
Data Read
Thanos: Querier aggregates data from multiple Store Gateways and object storage, incurring higher latency due to remote accesses.
VictoriaMetrics: vmselect queries vmstorage nodes directly, leveraging efficient indexes and parallelism for fast responses.
Operational Cost
Thanos: Requires multiple components and external object storage, leading to higher hardware, storage, and maintenance costs.
VictoriaMetrics: Fewer components and efficient compression lower hardware and storage requirements, simplifying operations and reducing cost.
Summary
Architecture: Thanos is complex; VictoriaMetrics is streamlined.
Write Path: VictoriaMetrics offers lower latency and higher throughput.
Read Path: VictoriaMetrics provides superior query performance, especially under high concurrency.
Cost: VictoriaMetrics has lower resource and operational expenses.
Soul Technical Team
Technical practice sharing from Soul