
How to Build a Scalable, Highly‑Available Prometheus Monitoring Stack with Thanos

This article explains why standard Prometheus HA solutions fall short for large, multi‑region deployments, and walks through using Thanos—its components, configuration, and best‑practice tips—to achieve long‑term storage, unlimited scaling, a global view, and non‑intrusive monitoring across 300+ clusters.


Background

In the "High‑Availability Prometheus: FAQ" article we briefly mentioned Prometheus HA solutions. After trying federation and Remote Write, we chose Thanos as the monitoring companion, using its global view to manage monitoring data from multiple regions and over 300 clusters. This article introduces Thanos components and our experience.

Prometheus Official HA Options

HA: two identical Prometheus instances behind a load balancer.

HA + Remote storage: multiple Prometheus replicas write to remote storage via Remote Write.

Federation: shards collect different data and a global node stores the unified view.

Even with the official multi‑replica and federation setups, problems remain: Prometheus local storage has no data synchronization between replicas, so consistency is hard to guarantee. Typical issues include:

Replica A may lose data during a crash, causing gaps when load‑balanced requests hit A.

Different start times or clocks cause mismatched timestamps across replicas.

Federation still has a single‑point Global node; each layer may need double‑replication.

Sensitive alerts should avoid triggering from the Global node due to potential transmission delays.

Current Practices

Most Prometheus clustering solutions ensure data consistency from storage and query perspectives:

Storage side: use Remote Write with an adapter that elects a leader so only one replica pushes data to the TSDB, guaranteeing no data loss and a single shared remote store.

Storage side (alternative): each replica writes to its own TSDB and synchronizes the two stores.

Query side: solutions like Thanos or VictoriaMetrics keep two copies of data but de‑duplicate and join results at query time. Thanos stores data in object storage via Sidecar; VictoriaMetrics uses its own server.

Actual Requirements

Our cluster size keeps growing, bringing more monitoring types and volumes: master/node monitoring, process monitoring, core component performance, pod resources, kube‑stats, K8s events, plugin monitoring, etc. Beyond HA, we need a global view with the following requirements:

Long‑term storage: about one month of data, tens of gigabytes per day, low maintenance cost, disaster recovery, preferably on cloud TSDB or object storage.

Unlimited scaling: 300+ clusters, thousands of nodes, tens of thousands of services. Sharding by function or tenant is required.

Global view: a single Grafana dashboard showing all regions, clusters, and pods.

Non‑intrusive: no modifications to existing Prometheus instances; the solution should be a thin wrapper that follows upstream releases.

After evaluating open‑source options (Cortex, Thanos, VictoriaMetrics, StackDriver) and commercial products, we selected Thanos because it meets long‑term storage, unlimited scaling, global view, and non‑intrusive requirements.

Thanos Architecture

Default mode: Sidecar.

Besides the sidecar mode, Thanos also offers a less common Receive mode.

Thanos consists of the following components (as listed on the official site):

Bucket

Check

Compactor

Query

Rule

Sidecar

Store

Receive (optional)

Downsample (optional)

All components are built into a single binary; different functionalities are enabled via command‑line flags.

Components and Configuration

The following steps show how to combine Thanos components for a quick HA Prometheus setup (based on the Quick Start, with recommended configurations as of January 2020).

Step 1: Verify Existing Prometheus

Deploy a single‑node Prometheus (pod or host). Example launch command:

<code>./prometheus \
  --config.file=prometheus.yml \
  --log.level=info \
  --storage.tsdb.path=data/prometheus \
  --web.listen-address='0.0.0.0:9090' \
  --storage.tsdb.max-block-duration=2h \
  --storage.tsdb.min-block-duration=2h \
  --storage.tsdb.wal-compression \
  --storage.tsdb.retention.time=2h \
  --web.enable-lifecycle</code>

Key points:

Enable --web.enable-lifecycle for hot‑reloading of the configuration.

Set retention to 2 hours; Prometheus will generate a block every 2 hours, which the Thanos Sidecar uploads to object storage.
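With the lifecycle endpoint enabled, configuration changes can be applied without a restart. A quick sketch using Prometheus's standard lifecycle and health endpoints (run against your own instance's address):

```shell
# Reload prometheus.yml in place (requires --web.enable-lifecycle)
curl -X POST http://localhost:9090/-/reload

# Confirm the server is still healthy after the reload
curl -s http://localhost:9090/-/healthy
```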

The Prometheus configuration (prometheus.yml) must declare external_labels to identify the region and replica:

<code>global:
  scrape_interval: 60s
  evaluation_interval: 60s
  external_labels:
    region: 'A'
    replica: 0

scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['0.0.0.0:9090']

  - job_name: 'demo-scrape'
    metrics_path: '/metrics'
    # ... other configs ...
</code>

Requirements for Prometheus:

Version 2.2.1 or higher.

Declare external_labels.

Enable --web.enable-admin-api and --web.enable-lifecycle.

Step 2: Deploy Sidecar

The Sidecar runs in the same pod as Prometheus and provides two functions:

Exposes Prometheus Remote Read as Thanos Store API, allowing Query to fetch data without direct Prometheus API calls.

Optionally uploads each TSDB block (every 2 hours) to object storage, enabling long‑term retention.

Sidecar launch command:

<code>./thanos sidecar \
  --prometheus.url="http://localhost:9090" \
  --objstore.config-file=./conf/bos.yaml \
  --tsdb.path=/home/work/opdir/monitor/Prometheus/data/Prometheus/</code>

If using object storage (e.g., GCS, AWS S3), provide the bucket configuration:

<code>type: GCS
config:
  bucket: ""
  service_account: ""
</code>
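For S3‑compatible object storage, the bucket configuration follows the same shape. The field names below come from the Thanos objstore configuration format; the bucket name, endpoint, and credentials are placeholders:

```yaml
type: S3
config:
  bucket: "thanos-metrics"
  endpoint: "s3.us-east-1.amazonaws.com"
  access_key: "<ACCESS_KEY>"
  secret_key: "<SECRET_KEY>"
```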

Deploy a Sidecar for each Prometheus replica (A, B, C).

Step 3: Deploy Query Component

Query provides the Prometheus HTTP v1 API and can query across multiple Store APIs (Sidecars and Store Gateways).

<code>./thanos query \
  --http-address="0.0.0.0:8090" \
  --store=replica0:10901 \
  --store=replica1:10901 \
  --store=replica2:10901 \
  --store=127.0.0.1:19914
</code>

The --store flags point to the Sidecar instances (default gRPC port 10901) and, optionally, a Store Gateway (here on port 19914).

The Query UI looks similar to Prometheus, allowing you to hide the underlying Prometheus instances.

Two important checkboxes:

Deduplication: removes duplicate series returned by multiple replicas.

Partial response: returns data from the replicas that are still available when some are down, trading consistency for availability.
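Deduplication keys off the replica label declared in external_labels. A sketch of enabling it at query time (--query.replica-label is the upstream flag name; the store addresses mirror the example above):

```shell
./thanos query \
  --http-address="0.0.0.0:8090" \
  --query.replica-label=replica \
  --store=replica0:10901 \
  --store=replica1:10901 \
  --store=replica2:10901
```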

Step 4: Deploy Store Gateway

The Store Gateway reads persisted blocks from object storage and serves them via the Store API, enabling queries of historic data beyond the recent 2‑hour window kept locally.

<code>./thanos store \
  --data-dir=./thanos-store-gateway/tmp/store \
  --objstore.config-file=./thanos-store-gateway/conf/bos.yaml \
  --http-address=0.0.0.0:19904 \
  --grpc-address=0.0.0.0:19914 \
  --index-cache-size=250MB \
  --sync-block-duration=5m \
  --min-time=-2w \
  --max-time=-1h
</code>

The Store Gateway can be scaled horizontally; each instance can fetch the same bucket data.

Step 5: Visualize Data with Grafana

With multi‑region, multi‑replica data aggregated, you can create a single Grafana dashboard showing metrics such as ETCD performance per region, node/exporter stats, pod resource usage, and various Kubernetes components.

Receive Mode (Optional)

Receive mode uses Remote Write directly, avoiding the 2‑hour window limitation of the sidecar. It is useful when network policies prevent sidecar‑to‑Prometheus communication or when a fully external data path is required.
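A minimal sketch of pointing Prometheus at a Receive endpoint via Remote Write; /api/v1/receive and port 19291 are the Thanos Receive defaults, and the hostname is a placeholder:

```yaml
remote_write:
  - url: "http://thanos-receive.example.com:19291/api/v1/receive"
```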

Additional Topics

Prometheus Block Compression

When using the Sidecar, set --storage.tsdb.min-block-duration and --storage.tsdb.max-block-duration to the same value (2h) to disable Prometheus's internal compaction, which would otherwise cause upload failures.

Store‑Gateway Resource Consumption

The Store Gateway can be memory‑intensive due to index caching. Flags such as --index-cache-size, --sync-block-duration, --min-time, and --max-time tune the cache size and the queryable time window.

Compactor Component

Compactor merges old blocks and performs down‑sampling for long‑range queries. It does not reduce disk usage; instead, it creates higher‑level aggregates.
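A sketch of a Compactor invocation with per‑resolution retention; the flag names are upstream's, and the retention values and paths are illustrative:

```shell
./thanos compact \
  --data-dir=./thanos-compact/tmp \
  --objstore.config-file=./conf/bos.yaml \
  --retention.resolution-raw=30d \
  --retention.resolution-5m=90d \
  --retention.resolution-1h=180d \
  --wait   # run continuously instead of exiting after one pass
```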

Query De‑duplication Logic

Query de‑duplicates series based on the label named by --query.replica-label. When replicas return differing values for the same series, Thanos picks a single replica's stream using a penalty‑based algorithm that favors the replica with fewer gaps.

References

Thanos official site

Percona performance analysis

Design introduction

GitHub issue

Katacoda tutorial

Comparison with VictoriaMetrics

Video overview

Tags: monitoring, observability, high availability, Kubernetes, Prometheus, Thanos
Written by Efficient Ops

This public account is maintained by Xiaotianguo and friends, regularly publishing widely-read original technical articles. We focus on operations transformation and accompany you throughout your operations career, growing together happily.
