Tagged articles

Thanos

35 articles

Jun 12, 2026 · Operations

End‑to‑End Prometheus Monitoring: Deployment, Tuning, HA & Troubleshooting

This guide walks through the complete Prometheus monitoring lifecycle—from binary, Docker, and Kubernetes deployments to Ansible‑driven node_exporter rollout, SNMP switch and router monitoring, alert routing via WeChat, SMS and email, production‑grade tuning, high‑availability designs, and systematic troubleshooting.

Alertmanager · Ansible · Kubernetes

0 likes · 25 min read

MaGe Linux Operations

Mar 30, 2026 · Cloud Native

How to Scale Prometheus to Thousands of Nodes with Thanos: A Deep Dive

This article examines the storage, query performance, high‑availability, and high‑cardinality challenges of running Prometheus on a thousand‑node Kubernetes cluster and presents a complete, step‑by‑step Thanos‑based architecture, capacity‑planning models, configuration examples, and operational best practices for reliable horizontal scaling.

Kubernetes · Monitoring · Observability

0 likes · 34 min read

Raymond Ops

Dec 22, 2025 · Operations

Build a High‑Availability Prometheus Monitoring System from Scratch: Pitfalls & Performance Tuning

This guide walks you through constructing a production‑grade, highly available Prometheus monitoring stack. It covers architecture choices, sharding strategies, and common pitfalls such as memory bloat, query latency, and storage growth, and provides concrete tuning steps, Kubernetes deployment examples, and advanced optimisation techniques.

Alerting · High Availability · Kubernetes

0 likes · 11 min read

Soul Technical Team

Jan 24, 2025 · Operations

Migration from Thanos to VictoriaMetrics: Architecture, Plan, Issues, and Benefits

This article details the end‑to‑end migration from Thanos to VictoriaMetrics, covering background analysis, architectural comparison, a phased migration plan, encountered configuration and performance issues, resolution strategies, and the resulting performance, cost, and scalability improvements for the monitoring system.

Monitoring · Thanos · VictoriaMetrics

0 likes · 16 min read

Efficient Ops

Dec 11, 2024 · Operations

Thanos vs VictoriaMetrics: Which Prometheus Storage Solution Wins for Scale and Cost?

This article compares Thanos and VictoriaMetrics as long‑term storage solutions for Prometheus, evaluating their architecture, write and read paths, reliability, consistency, performance, scalability, high‑availability, and hosting costs to help you choose the most suitable option for your monitoring stack.

Long‑term Storage · Monitoring · Thanos

0 likes · 18 min read

Soul Technical Team

Sep 2, 2024 · Databases

Comparative Analysis of VictoriaMetrics and Thanos for Large‑Scale Metric Storage

This article examines the migration from Thanos to VictoriaMetrics for large‑scale metric storage, detailing background challenges, VictoriaMetrics architecture and storage engine, data write and read processes, and a comparative analysis of performance, scalability, and operational costs between the two systems.

Monitoring · Observability · Performance

0 likes · 15 min read

Efficient Ops

Aug 5, 2024 · Operations

Thanos vs VictoriaMetrics: Which Prometheus Long‑Term Storage Wins?

This article compares Thanos and VictoriaMetrics as Prometheus long‑term storage solutions, evaluating their architectures, write and read paths, reliability, data consistency, performance, scalability, high‑availability, and cost to help you choose the best fit for your monitoring stack.

Cloud · Thanos · VictoriaMetrics

0 likes · 17 min read

Alibaba Cloud Native

Jul 10, 2024 · Cloud Native

Migrate Self‑Hosted Prometheus + Thanos to Alibaba Cloud Managed Service

This guide explains how to move from a self‑built open‑source Prometheus + Thanos monitoring stack to Alibaba Cloud's fully managed Prometheus service, covering typical deployment scenarios, migration requirements, step‑by‑step procedures for metric collection, visualization, and alerting, and key considerations for each environment.

Alibaba Cloud · Monitoring · Thanos

0 likes · 15 min read

DevOps Operations Practice

May 19, 2024 · Operations

High‑Availability Solutions for Prometheus Monitoring

Prometheus, a leading monitoring system, can achieve high availability through several common architectures—including dual-node with external storage, federated mode with external storage, and multi-node clusters combined with Thanos and object storage—each offering data persistence and load distribution to enhance system stability and performance.

External Storage · High Availability · Thanos

0 likes · 3 min read

Alibaba Cloud Native

Apr 8, 2024 · Cloud Native

How to Build a Global View for Multiple Prometheus Instances – Community and Alibaba Cloud Solutions

This article explains why a global view is needed when Prometheus metrics are scattered across many instances, compares community approaches such as Federation, Thanos, and Remote Write, and details Alibaba Cloud's Global Aggregation Instance and Remote Write solutions with configuration examples and a real‑world case study.

Federation · Global View · Monitoring

0 likes · 25 min read

Practical DevOps Architecture

Mar 15, 2024 · Operations

Comprehensive Practical Guide to Prometheus Configuration, Optimization, and Source Code Development

This multi‑chapter guide provides in‑depth, hands‑on instruction for configuring and optimizing all Prometheus components, exploring Kubernetes monitoring, source‑code analysis, custom exporter development, high‑availability setups, service discovery, resource‑efficient scraping, and integrating Thanos for long‑term storage.

Kubernetes · Monitoring · Observability

0 likes · 4 min read

dbaplus Community

Jul 10, 2023 · Operations

Why Most Logging and Metrics Strategies Fail – and How to Fix Them

The author reflects on the shortcomings of current logging, metrics, and tracing practices, explains why they become costly and unscalable, and offers concrete recommendations—including log level discipline, structured logging, metric aggregation, and the use of tools like Prometheus, Cortex, and Thanos—to build a more efficient observability stack.

Logging · Metrics · Observability

0 likes · 18 min read

Efficient Ops

Jun 13, 2023 · Cloud Native

Boost Kubernetes Monitoring: Why Switch from Prometheus to Thanos for Scalable, Cost‑Effective Metrics

This article explores the limitations of a Prometheus‑based monitoring stack and demonstrates how adopting a Thanos‑based architecture improves metric retention, enables multi‑cluster querying, and reduces overall infrastructure costs while providing a scalable, cloud‑native solution.

Cloud‑native · Kubernetes · Monitoring

0 likes · 15 min read

Efficient Ops

Apr 12, 2023 · Operations

Building Highly Available Prometheus Monitoring with Thanos: A Practical Guide

This article explains why native Prometheus HA solutions fall short for large, multi‑region clusters and shows how to use Thanos components—including sidecar, query, store gateway, and compactor—to achieve long‑term storage, unlimited scaling, a global view, and non‑intrusive integration with existing Prometheus deployments.

High Availability · Kubernetes · Monitoring

0 likes · 22 min read

ITPUB

Nov 27, 2022 · Operations

Designing a Scalable, High‑Availability Monitoring System with Prometheus and Thanos

This article explores the challenges of building a fault‑tolerant monitoring platform, compares open‑source solutions, details why Prometheus is preferred, and shows how to achieve high availability and horizontal scaling using Thanos, remote‑write, hash‑ring sharding, and Kubernetes integration.

Thanos · cloud-native · high-availability

0 likes · 18 min read

Java Architect Essentials

Aug 23, 2022 · Cloud Native

Implementing Multi‑Cluster Monitoring with Prometheus and Thanos on Kubernetes

This article explains the limitations of a standard Prometheus monitoring stack on Kubernetes and demonstrates how to migrate to a Thanos‑based solution for long‑term metric retention, reduced infrastructure cost, and scalable multi‑cluster observability using Terraform and cloud‑native components.

Cloud Native · Kubernetes · Monitoring

0 likes · 15 min read

Ops Development Stories

Aug 5, 2022 · Cloud Native

Boost Kubernetes Reliability with 4 Essential Open‑Source Monitoring Tools

This article introduces four open‑source CNCF projects (Prometheus, Jaeger, OpenTelemetry, and Thanos) that together provide metrics, alerts, tracing, and long‑term storage to improve observability, reduce downtime, and streamline troubleshooting for workloads running on Kubernetes.

Jaeger · Kubernetes · Observability

0 likes · 9 min read

Java Captain

Jun 1, 2022 · Operations

Migrating from Prometheus to Thanos for Scalable, Cost‑Effective Monitoring on Kubernetes

This article explains the limitations of a traditional Prometheus monitoring stack, demonstrates how Thanos provides unlimited long‑term storage and lower infrastructure costs, and walks through a complete multi‑cluster deployment on Kubernetes using Terraform and AWS.

Kubernetes · Observability · Terraform

0 likes · 16 min read

MaGe Linux Operations

May 2, 2022 · Operations

How to Build a Scalable, Highly‑Available Monitoring Stack with Thanos, Prometheus & Grafana

Learn how to design a resilient, scalable monitoring solution for multi‑cluster Kubernetes environments using Thanos, Prometheus, and Grafana, covering architecture, data ingestion, querying, long‑term storage on S3, cost savings, and practical deployment tips.

Monitoring · Observability · Thanos

0 likes · 10 min read

MaGe Linux Operations

Jan 22, 2022 · Cloud Native

Boost Kubernetes Monitoring: Migrate from Prometheus to Thanos for Scalable Low‑Cost Metrics

This article examines the limitations of a standard Prometheus‑based monitoring stack on Kubernetes, explains how adopting Thanos improves metric retention and reduces infrastructure costs, and provides a detailed multi‑cluster deployment guide with Terraform, TLS configuration, and Grafana visualization.

Kubernetes · Observability · Terraform

0 likes · 16 min read

Open Source Linux

Nov 21, 2021 · Operations

Building a Scalable Prometheus Monitoring Stack with Thanos on Kubernetes

This article explains how to design and deploy a robust monitoring solution using Prometheus, Thanos, Pushgateway, and Alertmanager on Kubernetes, covering metric collection, naming conventions, query language, high‑availability strategies, and practical YAML configurations for a production‑grade observability platform.

Alertmanager · Kubernetes · Pushgateway

0 likes · 20 min read

Efficient Ops

Nov 16, 2021 · Operations

How to Build a Scalable Prometheus Monitoring System with Thanos on Kubernetes

This article explains why monitoring is essential for production stability, compares white‑box and black‑box approaches, and provides a step‑by‑step guide to deploying Prometheus, configuring scrape targets, using Pushgateway and Alertmanager, and scaling the solution with Thanos in a Kubernetes environment.

Alertmanager · Monitoring · Observability

0 likes · 21 min read

Open Source Linux

Aug 26, 2021 · Cloud Native

Why Switch from Prometheus to Thanos? Boost Metric Retention & Cut Costs

This article explains the limitations of a traditional Prometheus‑based monitoring stack for Kubernetes, demonstrates how integrating Thanos improves metric retention, scalability, and storage cost, and provides a complete multi‑cluster deployment example with Terraform and Helm configurations.

Cloud Native · Kubernetes · Observability

0 likes · 15 min read

MaGe Linux Operations

Jul 18, 2021 · Cloud Native

Boost Kubernetes Monitoring: Why Switch from Prometheus to Thanos

This article examines the limitations of a traditional Prometheus monitoring stack on Kubernetes, explains how adopting a Thanos‑based architecture improves metric retention and reduces infrastructure costs, and provides a detailed multi‑cluster deployment guide with Terraform, code snippets, and visualizations.

Kubernetes · Terraform · Thanos

0 likes · 15 min read

Efficient Ops

Apr 18, 2021 · Operations

How to Build a Scalable Prometheus Monitoring System with Thanos on Kubernetes

This article explains why monitoring is essential for production stability, compares white‑box and black‑box approaches, details the advantages of Prometheus, walks through its architecture, metric types, query language, and high‑availability strategies with Thanos, and provides practical Kubernetes deployment manifests and configuration tips.

DevOps · Kubernetes · Observability

0 likes · 21 min read

MaGe Linux Operations

Apr 3, 2021 · Operations

Designing a Scalable, High‑Availability Monitoring System with Prometheus & Thanos

This article explores the challenges of building a reliable monitoring platform, compares open‑source solutions such as Elasticsearch, Nagios, Zabbix and Prometheus, and details how to achieve high availability and horizontal scaling using Prometheus, Thanos, sharding, remote‑write, and Kubernetes orchestration.

High Availability · Observability · Thanos

0 likes · 22 min read

dbaplus Community

Mar 30, 2021 · Operations

How to Build a Scalable Prometheus Monitoring Stack on Kubernetes with Thanos

This article explains why monitoring is essential for production stability, introduces Prometheus fundamentals, metric naming conventions, query types, and high‑availability solutions such as Thanos federation, then walks through a complete Kubernetes deployment including StatefulSets, RBAC, Pushgateway, Alertmanager, and Ingress configuration.

Alertmanager · DevOps · Kubernetes

0 likes · 20 min read

Efficient Ops

Nov 25, 2020 · Operations

How to Build a Scalable, Highly‑Available Prometheus Monitoring Stack with Thanos

This article explains why standard Prometheus HA solutions fall short for large, multi‑region deployments, and walks through using Thanos—its components, configuration, and best‑practice tips—to achieve long‑term storage, unlimited scaling, a global view, and non‑intrusive monitoring across 300+ clusters.

Kubernetes · Observability · Thanos

0 likes · 24 min read

Efficient Ops

Nov 3, 2020 · Operations

How to Build a Scalable Prometheus Monitoring System with Thanos on Kubernetes

This article explains why monitoring is essential, compares white‑box and black‑box approaches, details Prometheus features, metric naming, query language, and high‑availability challenges, and shows how to extend Prometheus with Thanos, Pushgateway, Alertmanager, and Kubernetes deployments for a robust observability stack.

Alertmanager · Kubernetes · Observability

0 likes · 20 min read

Aikesheng Open Source Community

Oct 26, 2020 · Operations

Debugging Persistent Active Alerts in Thanos Ruler: Queue Bottleneck Analysis and maxBatchSize Tuning

The article analyzes a persistent active alert observed via Thanos Ruler's HTTP interface, identifies the buffering queue bottleneck as the root cause, and proposes adjusting the maxBatchSize parameter to prevent alert delay and automatic resolution failures.

Alerting · Alertmanager · BufferQueue

0 likes · 8 min read

Cloud Native Technology Community

Apr 21, 2020 · Cloud Native

Deploying Thanos on Kubernetes: Architecture, Deployment Options, and Practical Guide

This article explains the Thanos architecture, compares Sidecar and Receiver deployment modes, walks through object‑storage configuration, and provides complete Kubernetes YAML examples for Prometheus, Thanos Sidecar, Query, Store Gateway, Ruler, Compact, and Receiver to build a large‑scale cloud‑native monitoring system.

Cloud Native · Kubernetes · Thanos

0 likes · 27 min read

Cloud Native Technology Community

Apr 8, 2020 · Operations

Decoding Thanos Architecture: From Query to Compact for Scalable Monitoring

This article provides a detailed analysis of Thanos' architecture, explaining each core component—Query, Sidecar, Store Gateway, Ruler, Compact, and the upcoming Receiver—how they enable global view, high availability, and long‑term storage for distributed Prometheus deployments, and discusses design trade‑offs and optimization strategies.

Cloud Native · Long‑term Storage · Monitoring

0 likes · 12 min read

Aikesheng Open Source Community

Dec 25, 2019 · Operations

Deploying Thanos for Unified Prometheus Monitoring and Long‑Term Storage

This guide explains the background, key features, architecture, and step‑by‑step deployment of Thanos—including Sidecar, Store, Query, Compact, Bucket, Rule, and Check components—to provide a unified, high‑availability Prometheus monitoring view with unlimited historical data storage using object storage.

Cloud Native · Long‑term Storage · Monitoring

0 likes · 9 min read

360 Tech Engineering

Dec 23, 2019 · Cloud Native

Using Thanos and Prometheus for Scalable Monitoring in OpenStack and Ceph Clusters

The article explains how Thanos combined with Prometheus provides a cloud‑native, highly available solution for long‑term metric storage and fast querying to address the exponential growth of monitoring data in large OpenStack and Ceph deployments.

Cloud Native · Monitoring · OpenStack

0 likes · 7 min read

360 Zhihui Cloud Developer

Dec 17, 2019 · Operations

How Thanos + Prometheus Solve Large‑Scale OpenStack Monitoring Challenges

This article explains how the Thanos and Prometheus combination provides long‑term, highly available monitoring for massive OpenStack and Ceph clusters, detailing its features, architecture, key components, practical deployment issues, and the operational problems it resolves.

Ceph · Monitoring · Observability

0 likes · 8 min read
