Why Switch from Prometheus to Thanos? Boost Metric Retention & Cut Costs

This article explains the limitations of a traditional Prometheus‑based monitoring stack for Kubernetes, demonstrates how integrating Thanos improves metric retention, scalability, and storage cost, and provides a complete multi‑cluster deployment example with Terraform and Helm configurations.

Open Source Linux

Kubernetes Prometheus Monitoring Stack

When deploying Kubernetes infrastructure for customers, a monitoring stack is installed on each cluster. The typical stack consists of:

Prometheus – collects metrics

Alertmanager – routes and sends the alerts that Prometheus fires from its alerting rules

Grafana – visualises dashboards

The simplified architecture is shown below:

Limitations of the Classic Stack

As the number of clusters grows, the architecture does not scale well. Specific pain points include:

Each cluster runs its own Grafana instance, making maintenance cumbersome.

Prometheus stores data on local disks, forcing a trade‑off between storage size and retention period; long‑term storage on cloud block volumes can become very expensive.

High‑availability setups duplicate data, further increasing storage requirements.
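To make the storage versus retention trade-off concrete, here is a rough sizing sketch. It uses the capacity-planning rule of thumb from the Prometheus storage documentation (disk ≈ retention in seconds × ingested samples per second × bytes per sample, at roughly 1 to 2 bytes per sample). The ingestion rate and replica count are hypothetical values chosen for illustration, not measurements from the clusters described in this article.

```python
def estimate_disk_bytes(retention_days, samples_per_second,
                        bytes_per_sample=2.0, replicas=1):
    """Rough local-disk estimate for Prometheus.

    Follows the rule of thumb:
    retention_seconds * samples_per_second * bytes_per_sample,
    multiplied by the number of HA replicas that each keep a full copy.
    """
    retention_seconds = retention_days * 24 * 60 * 60
    return retention_seconds * samples_per_second * bytes_per_sample * replicas


# Hypothetical ingestion rate for one cluster (an assumption, not a measurement).
SAMPLES_PER_SECOND = 100_000

if __name__ == "__main__":
    for days in (15, 90, 365):
        single = estimate_disk_bytes(days, SAMPLES_PER_SECOND)
        ha_pair = estimate_disk_bytes(days, SAMPLES_PER_SECOND, replicas=2)
        print(f"{days:>3} days: {single / 1e9:,.0f} GB single instance, "
              f"{ha_pair / 1e9:,.0f} GB for an HA pair")
```

The estimate grows linearly with both retention and replica count, which is why long retention on per-cluster block volumes gets expensive quickly.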

Solution: Thanos

Thanos is an open-source project that extends Prometheus with high availability and long-term storage. It offers effectively unlimited retention by writing metrics to object storage (e.g., S3, MinIO, or Rook).

Key components:

Thanos Sidecar – runs alongside Prometheus and uploads completed metric blocks to object storage (every two hours by default), so Prometheus needs only short local retention and becomes effectively stateless.

Thanos Store – provides a gateway that reads metrics from object storage.

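Thanos Compactor – compacts and down-samples the blocks in object storage and applies retention policies; down-sampling mainly speeds up long-range queries, while retention rules keep storage costs in check.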

Thanos Query – the central query layer exposing a PromQL‑compatible endpoint.

Thanos Query Frontend – splits large queries into smaller ones and caches results.
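The query-splitting idea behind the Query Frontend can be sketched as follows. This is only an illustration of the technique (cutting a long time range into interval-aligned pieces that can be run and cached independently), not Thanos' actual implementation. The 24-hour interval and the function name split_range are assumptions made for this example.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def split_range(start, end, interval=timedelta(hours=24)):
    """Split [start, end) into pieces aligned to multiples of `interval`.

    Aligned pieces are identical across repeated queries, so their
    results are good candidates for caching.
    """
    pieces = []
    cursor = start
    while cursor < end:
        # Find the next interval boundary strictly after the cursor.
        intervals_elapsed = (cursor - EPOCH) // interval
        next_boundary = EPOCH + (intervals_elapsed + 1) * interval
        piece_end = min(next_boundary, end)
        pieces.append((cursor, piece_end))
        cursor = piece_end
    return pieces


if __name__ == "__main__":
    start = datetime(2024, 1, 1, 18, 0, tzinfo=timezone.utc)
    end = datetime(2024, 1, 3, 6, 0, tzinfo=timezone.utc)
    for piece_start, piece_end in split_range(start, end):
        print(piece_start.isoformat(), "->", piece_end.isoformat())
```

Because only the first and last pieces depend on the exact query window, a dashboard that is refreshed repeatedly can reuse the cached results for every full interval in between.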

Multi‑Cluster Architecture

The demo deploys two EKS clusters: an observer cluster and an observee cluster. Terraform modules from the particuleio/teks repository provision the infrastructure, using the official kube-prometheus-stack and Bitnami Thanos charts.

. 
├── env_tags.yaml
├── eu-west-1
│   ├── clusters
│   │   └── observer
│   │       ├── eks
│   │       │   ├── kubeconfig
│   │       │   └── terragrunt.hcl
│   │       ├── eks-addons
│   │       │   └── terragrunt.hcl
│   │       └── vpc
│   │           └── terragrunt.hcl
│   └── region_values.yaml
└── eu-west-3
    ├── clusters
    │   └── observee
    │       ├── cluster_values.yaml
    │       ├── eks
    │       │   ├── kubeconfig
    │       │   └── terragrunt.hcl
    │       ├── eks-addons
    │       │   └── terragrunt.hcl
    │       └── vpc
    │           └── terragrunt.hcl
    └── region_values.yaml

Both clusters run the kube-prometheus-stack with the Thanos sidecar enabled. TLS certificates are generated so that the observer cluster trusts the sidecars of the observee clusters.

Verification

Example commands to list pods, ingresses, and view logs:

kubectl -n monitoring get pods
kubectl -n monitoring get ingress
kubectl -n monitoring logs -f thanos-tls-querier-observee-query-687dd88ff5-nzpdh
kubectl -n monitoring port-forward thanos-tls-querier-observee-query-687dd88ff5-nzpdh 10902

The TLS querier can successfully query metrics from the observee clusters, and the central Thanos Query aggregates data from all stores.
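With the port-forward above in place, the querier can also be checked over HTTP. The sketch below assumes the querier's HTTP port is reachable on localhost:10902 and uses the Prometheus-compatible /api/v1/query endpoint together with the dedup parameter of the Thanos Query API. The helper names and the example PromQL expression are illustrative, not taken from the original setup.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def build_query_url(base_url, promql, dedup=True):
    """Build an instant-query URL for a Thanos Query (or Prometheus) endpoint."""
    params = {"query": promql, "dedup": "true" if dedup else "false"}
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode(params)}"


def instant_query(base_url, promql, dedup=True, timeout=10):
    """Run an instant query and return the decoded JSON response."""
    with urlopen(build_query_url(base_url, promql, dedup), timeout=timeout) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # Assumes `kubectl port-forward ... 10902` is running in another terminal.
    base = "http://localhost:10902"
    print(build_query_url(base, "count(up) by (job)"))
    # Uncomment once the port-forward is active:
    # print(json.dumps(instant_query(base, "count(up) by (job)"), indent=2))
```

Running the same query with dedup=False is a quick way to see whether series from replicated Prometheus instances are being merged by the query layer.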

Grafana Visualization

Grafana dashboards (including the default Kubernetes dashboard) are configured to use the Thanos Query Frontend as a data source, providing a unified view across clusters.

Conclusion

Thanos adds considerable complexity but solves the key problems of metric retention, scalability, and cost. A fairly complete AWS implementation is provided in the teks repository, with Terraform modules that abstract most of the heavy lifting. Future work includes support for other cloud providers and deeper customisation.

For more details, see the original blog post: https://particule.io/en/blog/thanos-monitoring/ and the referenced GitHub repositories.
