Why Switch from Prometheus to Thanos? Boost Metric Retention & Cut Costs
This article explains the limitations of a traditional Prometheus‑based monitoring stack for Kubernetes, demonstrates how integrating Thanos improves metric retention, scalability, and storage cost, and provides a complete multi‑cluster deployment example with Terraform and Helm configurations.
Kubernetes Prometheus Monitoring Stack
When deploying Kubernetes infrastructure for customers, a monitoring stack is installed on each cluster. The typical stack consists of:
Prometheus – collects metrics
Alertmanager – sends alerts based on queries
Grafana – visualises dashboards
The simplified architecture (diagram in the original post): Prometheus scrapes cluster workloads and evaluates alerting rules, firing alerts are routed through Alertmanager, and Grafana queries Prometheus for its dashboards.
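As a reference point, this per-cluster stack is commonly installed from the community Helm chart; a deployment sketch, assuming Helm and cluster access are already configured (the release and namespace names are arbitrary choices):

```shell
# Add the community chart repository (bundles Prometheus, Alertmanager, Grafana)
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update

# One release per cluster -- this is exactly the per-cluster footprint
# that becomes hard to manage as the number of clusters grows
helm install monitoring prometheus-community/kube-prometheus-stack \
  --namespace monitoring --create-namespace
```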
Limitations of the Classic Stack
As the number of clusters grows, the architecture does not scale well. Specific pain points include:
Each cluster runs its own Grafana instance, making maintenance cumbersome.
Prometheus stores data on local disks, forcing a trade‑off between storage size and retention period; long‑term storage on cloud block volumes can become very expensive.
High‑availability setups duplicate data, further increasing storage requirements.
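The disk-versus-retention trade-off can be made concrete with the rough sizing rule from the Prometheus operational docs: needed disk ≈ ingestion rate × retention × bytes per sample, with roughly 1–2 bytes per sample after compression. A minimal sketch:

```python
def local_storage_bytes(samples_per_second: float,
                        retention_days: float,
                        bytes_per_sample: float = 2.0) -> float:
    """Rough Prometheus local TSDB sizing:
    disk ~= ingestion rate * retention * bytes per sample."""
    return samples_per_second * retention_days * 86_400 * bytes_per_sample

if __name__ == "__main__":
    # 100k samples/s kept for 30 days at ~2 bytes/sample, per replica;
    # an HA pair doubles this, since each replica stores its own copy.
    gb = local_storage_bytes(100_000, 30) / 1e9
    print(f"{gb:.1f} GB")  # -> 518.4 GB
```

On expensive cloud block volumes, stretching retention from 30 to 365 days scales this figure linearly, which is the cost pressure Thanos relieves by moving long-term data to object storage.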
Solution: Thanos
Thanos is an open‑source, highly available Prometheus system with long‑term storage capabilities. It adds “infinite” storage by writing metrics to object storage (e.g., S3, MinIO, or Rook).
Key components:
Thanos Sidecar – runs alongside Prometheus and uploads completed TSDB blocks to object storage (every two hours by default), making Prometheus effectively stateless.
Thanos Store – provides a gateway that reads metrics from object storage.
Thanos Compactor – deduplicates and down‑samples data to reduce storage costs.
Thanos Query – the central query layer exposing a PromQL‑compatible endpoint.
Thanos Query Frontend – splits large queries into smaller ones and caches results.
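With kube‑prometheus‑stack, the sidecar is enabled through the Prometheus spec; an illustrative values fragment (the secret name and key are assumptions, and the exact field layout varies between chart versions):

```yaml
# values.yaml for kube-prometheus-stack (illustrative)
prometheus:
  prometheusSpec:
    # Setting this block injects the Thanos sidecar into the Prometheus pod
    thanos:
      objectStorageConfig:
        # Secret holding the Thanos objstore config (bucket, endpoint, credentials)
        name: thanos-objstore-config
        key: objstore.yml
```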
Multi‑Cluster Architecture
The demo deploys two EKS clusters: an observer cluster and an observee cluster. Terraform modules from the particuleio/teks repository provision the infrastructure, using the official kube‑prometheus‑stack and Bitnami Thanos Helm charts.
.
├── env_tags.yaml
├── eu-west-1
│ ├── clusters
│ │ └── observer
│ │ ├── eks
│ │ │ ├── kubeconfig
│ │ │ └── terragrunt.hcl
│ │ ├── eks-addons
│ │ │ └── terragrunt.hcl
│ │ └── vpc
│ │ └── terragrunt.hcl
│ └── region_values.yaml
└── eu-west-3
├── clusters
│ └── observee
│ ├── cluster_values.yaml
│ ├── eks
│ │ ├── kubeconfig
│ │ └── terragrunt.hcl
│ ├── eks-addons
│ │ └── terragrunt.hcl
│ └── vpc
│ └── terragrunt.hcl
└── region_values.yaml

Both clusters run the kube‑prometheus‑stack with the Thanos sidecar enabled. TLS certificates are generated so that the observer cluster trusts the sidecars of the observee clusters.
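On the observer side, the cross-cluster querier dials the observee sidecars over TLS. With the Bitnami Thanos chart this can be expressed through extra Thanos Query flags; a sketch (the store address, mount path, and file names are assumptions to adapt to how the certificates are mounted):

```yaml
# Bitnami Thanos chart values (illustrative) for a TLS-enabled querier
query:
  # gRPC endpoints of the remote (observee) Thanos sidecars
  stores:
    - thanos-sidecar.observee.example.com:443
  extraFlags:
    - --grpc-client-tls-secure           # dial stores over TLS
    - --grpc-client-tls-cert=/certs/tls.crt
    - --grpc-client-tls-key=/certs/tls.key
    - --grpc-client-tls-ca=/certs/ca.crt
```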
Verification
Example commands to list pods, ingresses, and view logs:
kubectl -n monitoring get pods
kubectl -n monitoring get ingress
kubectl -n monitoring logs -f thanos-tls-querier-observee-query-687dd88ff5-nzpdh
kubectl -n monitoring port-forward thanos-tls-querier-observee-query-687dd88ff5-nzpdh 10902

The TLS querier can successfully query metrics from the observee clusters, and the central Thanos Query aggregates data from all stores.
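With the port-forward in place, the querier's Prometheus-compatible HTTP API can be exercised directly; a sketch (the local port matches the forward above, the `up` query is just an example):

```shell
# Thanos Query serves the standard Prometheus query API on its HTTP port
curl -s 'http://localhost:10902/api/v1/query?query=up' | jq '.status'

# With dedup enabled, series that differ only by replica labels
# (e.g. from an HA Prometheus pair) are merged at query time
curl -s 'http://localhost:10902/api/v1/query?query=up&dedup=true' \
  | jq '.data.result | length'
```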
Grafana Visualization
Grafana dashboards (including the default Kubernetes dashboard) are configured to use the Thanos Query Frontend as a data source, providing a unified view across clusters.
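Because Thanos exposes the Prometheus query API, the frontend is wired in as an ordinary Prometheus-type data source; an illustrative Grafana provisioning fragment (the service name and port are assumptions):

```yaml
# Grafana data source provisioning (illustrative)
apiVersion: 1
datasources:
  - name: Thanos
    type: prometheus             # Thanos speaks the Prometheus query API
    access: proxy
    url: http://thanos-query-frontend.monitoring.svc:9090
    isDefault: true
```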
Conclusion
Thanos adds considerable complexity but solves the key problems of metric retention, scalability, and cost. A fairly complete AWS implementation is provided in the teks repository, with Terraform modules that abstract most of the heavy lifting. Future work includes support for other cloud providers and deeper customisation.
For more details, see the original blog post: https://particule.io/en/blog/thanos-monitoring/ and the referenced GitHub repositories.
Open Source Linux
Focused on sharing Linux/Unix content, covering fundamentals, system development, network programming, automation/operations, cloud computing, and related professional knowledge.