Why Switch from Prometheus to Thanos? Boost Metric Retention & Cut Costs

This article explains the limitations of a traditional Prometheus‑based monitoring stack for Kubernetes, shows how integrating Thanos extends metric retention, improves scalability, and lowers storage cost, and walks through a multi‑cluster deployment example built with Terraform and Helm.

Open Source Linux

Aug 26, 2021

Kubernetes Prometheus Monitoring Stack

When deploying Kubernetes infrastructure for customers, it is standard practice to install a monitoring stack on each cluster. The typical stack consists of:

Prometheus – collects metrics

Alertmanager – routes and sends the alerts that Prometheus raises from metric queries

Grafana – visualises dashboards

The simplified architecture is shown below:

Limitations of the Classic Stack

As the number of clusters grows, the architecture does not scale well. Specific pain points include:

Each cluster runs its own Grafana instance, making maintenance cumbersome.

Prometheus stores data on local disks, forcing a trade‑off between storage size and retention period; long‑term storage on cloud block volumes can become very expensive.

High‑availability setups duplicate data, further increasing storage requirements.
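
To make the storage versus retention trade‑off concrete, the rule of thumb from the Prometheus storage documentation (retention time × ingested samples per second × bytes per sample) can be sketched as follows. The ingestion rate, the 2 bytes per sample, and the replica count are illustrative assumptions, not measurements from the clusters described in this article.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def local_disk_bytes(retention_days, samples_per_second, bytes_per_sample=2.0, replicas=1):
    """Rough local-disk estimate for Prometheus TSDB data.

    Rule of thumb: retention seconds * samples/s * bytes/sample.
    `replicas` models an HA setup in which every replica stores a full copy.
    """
    retention_seconds = retention_days * SECONDS_PER_DAY
    return retention_seconds * samples_per_second * bytes_per_sample * replicas


if __name__ == "__main__":
    # Hypothetical workload: 100,000 samples/s, HA pair of two replicas.
    for days in (15, 90, 365):
        size_gb = local_disk_bytes(days, 100_000, replicas=2) / 1e9
        print(f"{days:>3} days retention: ~{size_gb:,.0f} GB of block storage")
```

Under these assumptions, going from 15 days to a full year of retention multiplies the required volume size by roughly 24, and the HA pair doubles it again, which is the cost pressure the list above describes.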

Solution: Thanos

Thanos is an open‑source, highly available Prometheus system with long‑term storage capabilities. It adds “infinite” storage by writing metrics to object storage (e.g., S3, MinIO, or Rook).

Key components:

Thanos Sidecar – runs alongside Prometheus and uploads each completed two‑hour block of metrics to object storage, so Prometheus only needs short local retention and becomes effectively stateless.

Thanos Store – provides a gateway that reads metrics from object storage.

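Thanos Compactor – compacts blocks in object storage and down‑samples older data so that long‑range queries stay fast; it also applies retention policies, which is where storage cost is controlled. (Deduplication of replicated series is handled at query time by Thanos Query.)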

Thanos Query – the central query layer exposing a PromQL‑compatible endpoint.

Thanos Query Frontend – splits large queries into smaller ones and caches results.
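
As a rough illustration of what the Compactor's down‑sampling does, the sketch below collapses raw samples into fixed windows and keeps a few aggregates per window. It is a simplified model in plain Python, not Thanos code: the function name and the sample data are hypothetical, and the 5‑minute window is chosen to match the finer of the two resolutions (5 minutes and 1 hour) that Thanos produces.

```python
def downsample(samples, window_seconds=300):
    """Aggregate (timestamp, value) samples into fixed windows.

    Returns one dict per window holding count, sum, min and max.
    Windows are aligned to multiples of `window_seconds`.
    """
    windows = {}  # insertion-ordered, keyed by window start
    for ts, value in sorted(samples):
        start = ts - (ts % window_seconds)
        agg = windows.get(start)
        if agg is None:
            windows[start] = {"start": start, "count": 1, "sum": value,
                              "min": value, "max": value}
        else:
            agg["count"] += 1
            agg["sum"] += value
            agg["min"] = min(agg["min"], value)
            agg["max"] = max(agg["max"], value)
    return list(windows.values())


if __name__ == "__main__":
    # Hypothetical gauge scraped every 30 seconds for 10 minutes.
    raw = [(t, float(t % 7)) for t in range(0, 600, 30)]
    for window in downsample(raw):
        print(window)
```

Here twenty raw samples collapse into two windows. Keeping count and sum alongside min and max means an average can still be derived for each window when a dashboard zooms out over months of data.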

Multi‑Cluster Architecture

The demo deploys two EKS clusters: an observer cluster and an observee cluster. Terraform modules from the particuleio/teks repository provision the infrastructure, and the monitoring components are installed from the official kube‑prometheus‑stack chart and the Bitnami Thanos chart.

. 
├── env_tags.yaml
├── eu-west-1
│   ├── clusters
│   │   └── observer
│   │       ├── eks
│   │       │   ├── kubeconfig
│   │       │   └── terragrunt.hcl
│   │       ├── eks-addons
│   │       │   └── terragrunt.hcl
│   │       └── vpc
│   │           └── terragrunt.hcl
│   └── region_values.yaml
└── eu-west-3
    ├── clusters
    │   └── observee
    │       ├── cluster_values.yaml
    │       ├── eks
    │       │   ├── kubeconfig
    │       │   └── terragrunt.hcl
    │       ├── eks-addons
    │       │   └── terragrunt.hcl
    │       └── vpc
    │           └── terragrunt.hcl
    └── region_values.yaml

Both clusters run the kube‑prometheus‑stack with Thanos sidecar enabled. TLS certificates are generated so that the observer cluster trusts the sidecars of the observee clusters.

Verification

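Example commands to list pods and ingresses, follow the TLS querier's logs, and port‑forward to it (the generated pod name suffix will differ in your cluster):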

kubectl -n monitoring get pods
kubectl -n monitoring get ingress
kubectl -n monitoring logs -f thanos-tls-querier-observee-query-687dd88ff5-nzpdh
kubectl -n monitoring port-forward thanos-tls-querier-observee-query-687dd88ff5-nzpdh 10902

The TLS querier can successfully query metrics from the observee clusters, and the central Thanos Query aggregates data from all stores.
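
With the port‑forward above running, the querier's Prometheus‑compatible HTTP API can also be exercised from a script. The sketch below assumes the API is reachable at http://localhost:10902 and uses the standard /api/v1/query instant‑query endpoint. The helper names and the `up` query are illustrative, the parser assumes a vector result, and the labels returned depend on how the clusters are configured.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "http://localhost:10902"  # assumption: matches the port-forward above


def instant_query_url(promql, base_url=BASE_URL):
    """Build the URL for a Prometheus-style instant query."""
    return f"{base_url}/api/v1/query?" + urllib.parse.urlencode({"query": promql})


def extract_series(payload):
    """Return (labels, value) pairs from an instant-query JSON response."""
    if payload.get("status") != "success":
        raise RuntimeError(f"query failed: {payload.get('error', 'unknown error')}")
    return [(item["metric"], float(item["value"][1]))
            for item in payload["data"]["result"]]


def run_query(promql):
    """Send the query and decode the response (needs the port-forward)."""
    with urllib.request.urlopen(instant_query_url(promql), timeout=10) as resp:
        return extract_series(json.load(resp))


# Usage, with the port-forward active:
#   for labels, value in run_query("up"):
#       print(labels, value)
```

If series from both clusters come back, the querier is reaching the remote sidecar as well as the local stores.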

Grafana Visualization

Grafana dashboards (including the default Kubernetes dashboard) are configured to use the Thanos Query Frontend as a data source, providing a unified view across clusters.
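
Because dashboards send their range queries through the Query Frontend, a wide time range is split into smaller sub‑queries that can be cached individually. The sketch below is a simplified model of that splitting rather than the Thanos implementation: the function name is hypothetical, the split interval is assumed to be 24 hours, and sub‑ranges are treated as half‑open intervals.

```python
def split_range(start, end, interval=24 * 60 * 60):
    """Split [start, end) (Unix seconds) into sub-ranges aligned to `interval`.

    Each sub-range ends at the next multiple of `interval`, so the same
    sub-ranges recur across dashboard refreshes and can be served from cache.
    """
    if end < start:
        raise ValueError("end must not be before start")
    ranges = []
    cursor = start
    while cursor < end:
        boundary = cursor - (cursor % interval) + interval
        ranges.append((cursor, min(boundary, end)))
        cursor = boundary
    return ranges or [(start, end)]


if __name__ == "__main__":
    day = 24 * 60 * 60
    # Hypothetical three-day dashboard window starting six hours into a day.
    for sub_start, sub_end in split_range(6 * 3600, 6 * 3600 + 3 * day):
        print(sub_start, sub_end)
```

In this example a three‑day window that starts mid‑day yields four sub‑queries: a partial day at each edge and two full days in between. The full‑day sub‑queries are the ones most likely to be reused from cache on the next refresh.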

Conclusion

Thanos adds considerable complexity but solves the key problems of metric retention, scalability, and cost. A fairly complete AWS implementation is provided in the teks repository, with Terraform modules that abstract most of the heavy lifting. Future work includes support for other cloud providers and deeper customisation.

For more details, see the original blog post: https://particule.io/en/blog/thanos-monitoring/ and the referenced GitHub repositories.
