Thanos vs VictoriaMetrics: Which Prometheus Long‑Term Storage Wins?
This article compares Thanos and VictoriaMetrics as Prometheus long‑term storage solutions. It evaluates their architectures, write and read paths, reliability, data consistency, performance, scalability, high availability, and cost to help you choose the best fit for your monitoring stack.
1. Architecture
Thanos
Thanos consists of several core components:
Sidecar: runs alongside each Prometheus instance, uploads completed blocks (data older than two hours) to object storage such as S3 or GCS, and serves recent data to the Query component.
Store Gateway: serves data stored in object storage to the Query component.
Query: implements the Prometheus query API, aggregates results from Sidecars and Store Gateways, and returns them to clients such as Grafana.
Compact: merges uploaded blocks into larger ones to improve query efficiency and reduce storage size.
Ruler: evaluates recording and alerting rules against global data, can generate new metrics, and can optionally upload the results to object storage.
Receiver (experimental): implements the remote‑write API so that Prometheus instances can push data directly.
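The Compact step can be pictured as grouping adjacent small blocks into one larger, time‑aligned block. The sketch below is a simplified illustration, not the Thanos compactor's actual planner. The `Block` type and `plan_compaction` function are hypothetical, and the 8‑hour target range is an assumption chosen for the example. The sketch ignores overlapping blocks, downsampling, and deletion of source blocks.

```python
from dataclasses import dataclass

HOUR_MS = 3600 * 1000  # block time ranges are expressed in milliseconds


@dataclass(frozen=True)
class Block:
    """Hypothetical stand-in for a TSDB block covering [min_time, max_time)."""
    min_time: int
    max_time: int


def plan_compaction(blocks, target_range_ms):
    """Group blocks that fit entirely inside the same aligned window of
    target_range_ms and return one merged block per window that has at
    least two members. Simplified: gaps and overlaps are not handled."""
    windows = {}
    for block in sorted(blocks, key=lambda b: b.min_time):
        start = block.min_time - (block.min_time % target_range_ms)
        if block.max_time <= start + target_range_ms:
            windows.setdefault(start, []).append(block)

    merged = []
    for start in sorted(windows):
        members = windows[start]
        if len(members) > 1:
            merged.append(Block(min(b.min_time for b in members),
                                max(b.max_time for b in members)))
    return merged


# Five hypothetical 2-hour blocks uploaded by a Sidecar.
uploaded = [Block(i * 2 * HOUR_MS, (i + 1) * 2 * HOUR_MS) for i in range(5)]
print(plan_compaction(uploaded, 8 * HOUR_MS))
```

The first four blocks collapse into a single 8‑hour block. The fifth block stays on its own until its window has enough neighbours to merge.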
VictoriaMetrics
VictoriaMetrics cluster edition includes three core components:
vmstorage: stores time‑series data.
vminsert: receives data from Prometheus via the remote‑write API and distributes it across vmstorage nodes.
vmselect: queries vmstorage nodes, aggregates the results, and returns them to clients such as Grafana.
Each component can be scaled independently on suitable hardware.
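The vminsert step, "distributes it across vmstorage nodes", can be made concrete with a minimal sketch. It assumes simple hash‑modulo sharding on a series' sorted labels. This is an illustrative assumption, not vminsert's real routing algorithm, which also has to handle unavailable nodes and replication. The node names and helper functions are hypothetical.

```python
import hashlib


def series_key(labels):
    """Canonical series identity: label pairs sorted by label name."""
    return ",".join(f"{name}={value}" for name, value in sorted(labels.items()))


def pick_storage_node(labels, nodes):
    """Route a series to one storage node with hash-modulo sharding.
    Assumption for illustration only; vminsert uses its own routing."""
    digest = hashlib.sha256(series_key(labels).encode("utf-8")).digest()
    return nodes[int.from_bytes(digest[:8], "big") % len(nodes)]


nodes = ["vmstorage-1", "vmstorage-2", "vmstorage-3"]  # hypothetical node names
series = {"__name__": "http_requests_total", "job": "api", "instance": "10.0.0.5:9100"}
print(pick_storage_node(series, nodes))
```

Every sample of a given series lands on the same node, so a query for arbitrary series has to ask all vmstorage nodes and merge the answers, which is the job vmselect performs. Plain modulo hashing also remaps most series when a node is added. Production systems therefore usually prefer some form of consistent hashing.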
2. Write Path Comparison
Configuration and Operational Complexity
Thanos requires disabling local TSDB block compaction in every Prometheus instance, deploying and configuring a Sidecar next to each instance, and running a Compactor for each object‑storage bucket.
VictoriaMetrics only needs a remote_write section in the Prometheus configuration. No Sidecar is required, and local compaction settings stay unchanged.
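The entire VictoriaMetrics write‑side change fits in a few lines of prometheus.yml. The helper below generates that fragment. The path layout and default port 8480 follow the VictoriaMetrics cluster documentation as we understand it, so check them against your version. `vminsert-lb` is a hypothetical load balancer in front of the vminsert nodes, and tenant 0 is assumed.

```python
def vminsert_write_url(host, tenant_id=0, port=8480):
    """Remote-write endpoint of a VictoriaMetrics cluster vminsert node."""
    return f"http://{host}:{port}/insert/{tenant_id}/prometheus/api/v1/write"


def remote_write_yaml(url):
    """Minimal remote_write section to add to prometheus.yml."""
    return f"remote_write:\n  - url: {url}\n"


print(remote_write_yaml(vminsert_write_url("vminsert-lb")))
```

By contrast, on the Thanos side local compaction is typically disabled by setting the Prometheus flags `--storage.tsdb.min-block-duration` and `--storage.tsdb.max-block-duration` to the same value (2h). A Sidecar process with its own object‑storage configuration must also run next to each instance.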
Reliability and Availability
Thanos uploads data in two‑hour blocks, so a local disk failure can lose up to two hours of recent data per Prometheus instance. Sidecar uploads also compete with query handling for resources, which can hurt performance.
VictoriaMetrics receives each sample via remote write in near real time, so a disk failure on the Prometheus host loses at most a few seconds of data.
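The gap between "up to two hours" and "a few seconds" is easier to grasp in samples. The arithmetic below uses a hypothetical ingestion rate of 100,000 samples/s and assumes "a few seconds" means 5 s. Only the two‑hour window comes from the text.

```python
def samples_at_risk(samples_per_second, loss_window_seconds):
    """Worst-case samples lost if the local disk dies at the end of the window."""
    return samples_per_second * loss_window_seconds


RATE = 100_000             # hypothetical ingestion rate of one Prometheus, samples/s
THANOS_WINDOW = 2 * 3600   # data not yet uploaded: up to one 2-hour block
VM_WINDOW = 5              # "a few seconds" of unsent remote-write data, assumed 5 s

thanos_loss = samples_at_risk(RATE, THANOS_WINDOW)
vm_loss = samples_at_risk(RATE, VM_WINDOW)
print(thanos_loss, vm_loss, thanos_loss // vm_loss)
```

This prints `720000000 500000 1440`. Under these assumptions, the exposure differs by three orders of magnitude regardless of the absolute rate.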
Data Consistency
Thanos’ Compactor and Store Gateway can introduce eventual consistency issues when blocks are overwritten or deleted.
VictoriaMetrics provides strong consistency for stored data.
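A toy model shows where the eventual consistency comes from. A Store Gateway works from a periodically refreshed list of blocks, so if the Compactor deletes a block between refreshes, the gateway still references it. All classes here are hypothetical. In practice, Thanos mitigates the problem by delaying deletion of compacted source blocks, which this sketch deliberately omits.

```python
class Bucket:
    """Toy object-storage bucket holding block IDs."""

    def __init__(self, blocks):
        self.blocks = set(blocks)


class StoreGatewayView:
    """Toy Store Gateway that only sees the bucket as of its last sync."""

    def __init__(self, bucket):
        self.bucket = bucket
        self.known = set()

    def sync(self):
        self.known = set(self.bucket.blocks)

    def read_all(self):
        missing = self.known - self.bucket.blocks
        if missing:
            raise LookupError(f"blocks deleted since last sync: {sorted(missing)}")
        return sorted(self.known)


def compact(bucket, sources, merged):
    """Toy compactor: upload the merged block, then delete its sources immediately."""
    bucket.blocks.add(merged)
    bucket.blocks -= set(sources)


bucket = Bucket({"b1", "b2"})
gateway = StoreGatewayView(bucket)
gateway.sync()
compact(bucket, ["b1", "b2"], "b12")
try:
    gateway.read_all()          # stale view: b1 and b2 are gone
except LookupError as err:
    print("query failed:", err)
gateway.sync()
print(gateway.read_all())       # ['b12']
```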
Performance
Thanos’ write performance is good, but heavy queries can slow down Sidecar uploads, and Compactor activity can consume object‑storage bandwidth.
VictoriaMetrics adds minimal CPU overhead to Prometheus, and the storage side can be given as much CPU as it needs to keep up with ingestion.
Scalability
Thanos relies on the scalability of object storage, and Sidecar upload speed depends on the storage service.
VictoriaMetrics scales by adding more vminsert and vmstorage nodes.
3. Read Path Comparison
Configuration and Operational Complexity
Thanos requires the Sidecars' Store API endpoints, the Store Gateways, and the Query components to be deployed and wired together.
VictoriaMetrics exposes a ready‑to‑use Prometheus query API; Grafana only needs a data source pointing to vmselect.
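The helpers below build the two URLs a VictoriaMetrics cluster user typically needs: the Grafana data source and an instant query. The path layout and default port 8481 follow the VictoriaMetrics cluster documentation as we understand it. The host name `vmselect` and tenant 0 are assumptions, and the helper names are hypothetical.

```python
from urllib.parse import urlencode


def vmselect_datasource_url(host, tenant_id=0, port=8481):
    """Prometheus-compatible base URL to configure as a Grafana data source."""
    return f"http://{host}:{port}/select/{tenant_id}/prometheus"


def instant_query_url(host, promql, tenant_id=0):
    """Instant query against the standard Prometheus /api/v1/query endpoint."""
    base = vmselect_datasource_url(host, tenant_id)
    return f"{base}/api/v1/query?{urlencode({'query': promql})}"


print(vmselect_datasource_url("vmselect"))
print(instant_query_url("vmselect", "up"))
```

Because the base URL ends in the standard Prometheus API, existing Grafana dashboards can usually be pointed at it without changes.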
Reliability and Availability
Thanos Query must connect to all Sidecars and Store Gateways, which can be problematic across data centers.
VictoriaMetrics queries stay within the cluster, offering higher reliability and faster startup.
Data Consistency
Both systems can return partial results when some nodes are unavailable, but in VictoriaMetrics partial responses are optional and less likely to be used.
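Both query layers follow a fan‑out‑and‑merge design. The sketch below uses hypothetical backend names and an `allow_partial` switch that stands in for each system's partial‑response setting. It shows what "partial results" means for the caller: either data from the healthy backends plus a flag marking the answer as incomplete, or an error when partial answers are disallowed.

```python
from dataclasses import dataclass, field


@dataclass
class FanOutResult:
    series: list = field(default_factory=list)
    failed: list = field(default_factory=list)

    @property
    def is_partial(self):
        return bool(self.failed)


def fan_out(backends, query, allow_partial=True):
    """Query every backend and merge the results.

    backends maps a backend name to a callable(query) -> list of series.
    Real systems issue these calls concurrently, so latency tracks the
    slowest backend; they run sequentially here for clarity.
    """
    result = FanOutResult()
    for name, fetch in backends.items():
        try:
            result.series.extend(fetch(query))
        except ConnectionError:
            result.failed.append(name)
    if result.failed and not allow_partial:
        raise RuntimeError(f"unavailable backends: {result.failed}")
    return result


def healthy(query):
    return [f"{query}{{zone='a'}}"]


def down(query):
    raise ConnectionError("store unreachable")


res = fan_out({"store-a": healthy, "store-b": down}, "up")
print(res.series, res.is_partial, res.failed)
```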
Performance
Thanos Query performance is limited by the slowest Sidecar or Store Gateway.
VictoriaMetrics query performance scales with the number of vmselect and vmstorage instances and is generally faster.
Scalability
Thanos Query is stateless and can be horizontally scaled, but the underlying Prometheus + Sidecar pair can become a bottleneck.
VictoriaMetrics allows independent scaling of vmselect and vmstorage, with optimizations for low‑bandwidth environments.
4. High‑Availability Comparison
Thanos achieves HA by running multiple Query instances in different zones; if a zone fails, only partial results may be returned.
VictoriaMetrics can replicate data across clusters in multiple zones, continuing to receive data and return full query results even when a zone is down.
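A toy model of the multi‑zone setup shows why a zone outage still yields full query results: every zone holds a complete copy, so any healthy zone can answer. Everything here is hypothetical. It assumes each sample is sent to every zone, for example through one remote_write entry per zone, which is one common way to set this up.

```python
class ZoneCluster:
    """Toy VictoriaMetrics cluster living in one availability zone."""

    def __init__(self, name):
        self.name = name
        self.up = True
        self.samples = []

    def write(self, sample):
        if not self.up:
            raise ConnectionError(f"zone {self.name} is down")
        self.samples.append(sample)


def replicate_write(clusters, sample):
    """Send the sample to every zone; return how many zones accepted it.
    A real remote-write client would buffer and retry the failed zone."""
    accepted = 0
    for cluster in clusters:
        try:
            cluster.write(sample)
            accepted += 1
        except ConnectionError:
            pass
    return accepted


def query_first_healthy(clusters):
    """Serve the query from the first reachable zone's full copy."""
    for cluster in clusters:
        if cluster.up:
            return list(cluster.samples)
    raise RuntimeError("no zone available")


zones = [ZoneCluster("a"), ZoneCluster("b")]
replicate_write(zones, "s1")
zones[0].up = False                 # zone a fails
replicate_write(zones, "s2")
print(query_first_healthy(zones))   # ['s1', 's2'] from zone b
```

The sketch leaves out recovery. When zone a comes back, it is missing `s2` until the writer's retries catch it up, which is why the buffering noted in the comment matters in practice.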
5. Managed Cost Comparison
Thanos
GCS: $4‑$36 per TB depending on storage class; network egress $10/TB internal, $80‑$? external.
S3: $4‑$23 per TB; network egress $2‑$10/TB internal, $50‑$90/TB external; $0.10 per million API calls.
Costs depend on data volume, egress traffic, and API usage.
VictoriaMetrics
GCE HDD: $40/TB, SSD: $240/TB.
AWS EBS HDD: $45/TB, SSD: $125/TB.
VictoriaMetrics compresses data up to 10× more efficiently than Thanos, reducing storage cost.
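To see how the two pricing models interact, the arithmetic below compares a hypothetical workload. The prices (S3 at $23/TB, S3 egress at $10/TB, $0.10 per million API calls, EBS SSD at $125/TB) come from the figures above and are assumed to be monthly. The 50 TB data size, 10 TB of egress, and 500 million API calls are invented for illustration. Provisioning headroom on block storage is not modeled.

```python
def thanos_monthly_cost(stored_tb, storage_price, egress_tb, egress_price,
                        api_calls_millions, price_per_million_calls):
    """Object storage: capacity + egress traffic + API requests."""
    return (stored_tb * storage_price
            + egress_tb * egress_price
            + api_calls_millions * price_per_million_calls)


def vm_monthly_cost(thanos_equivalent_tb, compression_advantage, disk_price):
    """Block storage: only provisioned capacity, shrunk by the compression advantage."""
    return thanos_equivalent_tb / compression_advantage * disk_price


# Hypothetical workload; prices taken from the S3 and AWS EBS figures above.
thanos = thanos_monthly_cost(50, 23, egress_tb=10, egress_price=10,
                             api_calls_millions=500, price_per_million_calls=0.10)
vm_best = vm_monthly_cost(50, compression_advantage=10, disk_price=125)
vm_none = vm_monthly_cost(50, compression_advantage=1, disk_price=125)
# Compression advantage at which SSD capacity cost equals S3 capacity cost.
break_even = 125 / 23
print(round(thanos, 2), vm_best, vm_none, round(break_even, 2))
```

This prints `1300.0 625.0 6250.0 5.43`. Because the 10× figure is an upper bound, the outcome depends heavily on the compression advantage actually achieved. With SSD volumes, the advantage must exceed roughly 5.4× before VictoriaMetrics' capacity cost drops below S3 Standard.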
Summary
VictoriaMetrics ingests data through the standard remote_write protocol and stores it on block storage, whereas Thanos requires disabling local compaction and running a Sidecar to upload blocks to object storage.
VictoriaMetrics provides a built‑in global query API without extra components, whereas Thanos needs Sidecar, Store Gateway, and Query, making large deployments more complex.
Deploying VictoriaMetrics on Kubernetes is straightforward; Thanos deployment and configuration are considerably more involved.
Efficient Ops
This public account is maintained by Xiaotianguo and friends and regularly publishes widely read original technical articles. It focuses on the transformation of operations work and aims to accompany readers throughout their operations careers.