Unlock Scalable Cloud‑Native Alerting with Grafana Mimir: Architecture & Setup

This article explains the current state of cloud‑native alerting, introduces Grafana Mimir as a horizontally scalable, multi‑tenant storage for Prometheus, details its architecture and components, and provides step‑by‑step guidance for installing, configuring, and operating Mimir in Kubernetes environments.

Open Source Linux

Current Landscape of Cloud‑Native Alerting

In cloud‑native ecosystems, Kubernetes is increasingly deployed in production, and monitoring systems must collect system, business, and database metrics in real time. The most common stack combines Prometheus, exporters, Grafana, and Alertmanager. Alert rules, however, are traditionally maintained by hand‑editing files in an editor such as vim. This article introduces a visual alerting solution that brings Prometheus and Alertmanager together through Grafana Mimir.

What Mimir Does

Mimir provides horizontally scalable, highly available, multi‑tenant long‑term storage for Prometheus. Its architecture consists of several services that work together to ingest, store, and query metrics.

Mimir architecture diagram

Mimir Components and Their Functions

Optional: alertmanager, ruler, overrides‑exporter, query‑scheduler

Required: compactor, distributor, ingester, querier, query‑frontend, store‑gateway

The following sections describe each component.

Compactor (Data Compressor, Stateless)

The compactor merges blocks to improve query performance and reduce storage usage. It compacts multiple blocks of a tenant into a single optimized block, updates the bucket index used by queriers, store‑gateways, and rulers, and deletes blocks that fall outside the retention period.

How It Works

Compaction runs at fixed intervals for each tenant. Vertical compaction merges all of a tenant's blocks that ingesters uploaded for the same time range (2‑hour ranges by default) into a single block, deduplicating samples that replication wrote to several blocks. Horizontal compaction then merges blocks with adjacent time ranges into larger ones, reducing the total number of blocks and the index size.
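As a rough illustration of the two stages described above, the following Python sketch models blocks as plain dictionaries. The block layout and the function names vertical_compact and horizontal_compact are hypothetical and exist only for this example; Mimir's real compactor works on TSDB blocks in object storage.

```python
from collections import defaultdict

TWO_HOURS_MS = 2 * 60 * 60 * 1000


def vertical_compact(blocks):
    """Merge blocks that cover the same time range, dropping duplicate samples.

    Each block is a dict with 'min_t', 'max_t' and 'samples', where 'samples'
    is a set of (series, timestamp, value) tuples. This layout is hypothetical.
    """
    by_range = defaultdict(set)
    for block in blocks:
        # Set union removes samples that appear in more than one block.
        by_range[(block["min_t"], block["max_t"])] |= block["samples"]
    return [
        {"min_t": lo, "max_t": hi, "samples": samples}
        for (lo, hi), samples in sorted(by_range.items())
    ]


def horizontal_compact(blocks):
    """Merge blocks with adjacent time ranges into one larger block."""
    ordered = sorted(blocks, key=lambda b: b["min_t"])
    merged = set()
    for block in ordered:
        merged |= block["samples"]
    return {"min_t": ordered[0]["min_t"], "max_t": ordered[-1]["max_t"], "samples": merged}


# Three overlapping blocks for the first 2-hour range, plus one for the next range.
replica = {("cpu", 1000, 0.5), ("cpu", 2000, 0.7)}
uploaded = [
    {"min_t": 0, "max_t": TWO_HOURS_MS, "samples": set(replica)},
    {"min_t": 0, "max_t": TWO_HOURS_MS, "samples": set(replica)},
    {"min_t": 0, "max_t": TWO_HOURS_MS, "samples": set(replica) | {("mem", 1500, 42.0)}},
    {"min_t": TWO_HOURS_MS, "max_t": 2 * TWO_HOURS_MS,
     "samples": {("cpu", TWO_HOURS_MS + 1000, 0.9)}},
]
after_vertical = vertical_compact(uploaded)
after_horizontal = horizontal_compact(after_vertical)
```

In this toy run the four uploaded blocks become two after the vertical stage and one after the horizontal stage.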

Scaling

Compaction concurrency is controlled by -compactor.compaction-concurrency. Tenant‑sharding can be enabled with -compactor.compactor-tenant-shard-size to limit the number of compactors that handle a given tenant.

Compaction Algorithm

Mimir uses a split‑and‑merge algorithm that overcomes TSDB index limitations and prevents unbounded growth for large tenants. The split phase creates N × M small blocks, which the merge phase reduces to M larger blocks.
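The block arithmetic of the split and merge phases can be sketched in a few lines of Python. This is a toy model under stated assumptions: blocks are sets of series names, groups are formed round‑robin, and series are assigned to shards with a CRC32 hash. The helper names (shard_of, split_stage, merge_stage) are hypothetical and are not Mimir APIs.

```python
import zlib
from collections import defaultdict


def shard_of(series, m_shards):
    """Map a series name to a shard with a stable hash (assumed for this sketch)."""
    return zlib.crc32(series.encode("utf-8")) % m_shards


def split_stage(source_blocks, n_groups, m_shards):
    """Divide blocks into N groups; each group emits M split blocks (N * M total)."""
    split_blocks = []
    for g in range(n_groups):
        group = source_blocks[g::n_groups]  # simple round-robin grouping
        shards = defaultdict(set)
        for block in group:
            for series in block:
                shards[shard_of(series, m_shards)].add(series)
        for s in range(m_shards):
            split_blocks.append({"group": g, "shard": s, "series": shards[s]})
    return split_blocks


def merge_stage(split_blocks, m_shards):
    """For each shard, merge its N split blocks into one block (M total)."""
    merged = []
    for s in range(m_shards):
        series = set()
        for block in split_blocks:
            if block["shard"] == s:
                series |= block["series"]
        merged.append({"shard": s, "series": series})
    return merged


# Four source blocks holding 40 distinct series in total.
source = [{f"series_{i}" for i in range(b, 40, 4)} for b in range(4)]
split_blocks = split_stage(source, n_groups=2, m_shards=3)
merged_blocks = merge_stage(split_blocks, m_shards=3)
```

With N = 2 and M = 3, the split stage yields six blocks and the merge stage yields three, each holding a disjoint subset of the series.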

Deletion Process

After a successful compaction, the source blocks are first soft‑deleted (marked for deletion) and then hard‑deleted after a configurable delay. The delay gives queriers and store‑gateways time to discover the new compacted blocks before the old ones are removed.
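The two‑step deletion can be pictured with a small in‑memory model. The Bucket class below is hypothetical and only mirrors the idea of a deletion mark followed by a delayed removal; times are passed in explicitly as seconds so the behavior is deterministic.

```python
from dataclasses import dataclass, field


@dataclass
class Bucket:
    """Toy object-store bucket with soft and hard deletion of blocks."""
    blocks: set = field(default_factory=set)
    deletion_marks: dict = field(default_factory=dict)  # block id -> mark time (s)

    def mark_for_deletion(self, block_id, now_s):
        # Soft delete: the block stays in the bucket, only a marker is recorded.
        if block_id in self.blocks:
            self.deletion_marks.setdefault(block_id, now_s)

    def hard_delete_expired(self, now_s, delay_s):
        # Hard delete: remove blocks whose marker is at least delay_s old.
        expired = sorted(
            b for b, marked_at in self.deletion_marks.items()
            if now_s - marked_at >= delay_s
        )
        for block_id in expired:
            self.blocks.discard(block_id)
            del self.deletion_marks[block_id]
        return expired


bucket = Bucket(blocks={"old-1", "old-2", "compacted"})
bucket.mark_for_deletion("old-1", now_s=0)
bucket.mark_for_deletion("old-2", now_s=0)
removed_early = bucket.hard_delete_expired(now_s=3600, delay_s=12 * 3600)
removed_late = bucket.hard_delete_expired(now_s=13 * 3600, delay_s=12 * 3600)
```

The 12‑hour delay here is just an example value chosen for the sketch.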

Distributor (Data Distributor)

The distributor is a stateless component that receives time‑series data from Prometheus or Grafana Agent, validates it, and forwards it to multiple ingesters according to a configurable replication factor (default 3). It also enforces per‑tenant limits on request rate and ingestion rate.

Validation

Metric metadata and labels must follow Prometheus format.

Metadata length must not exceed -validation.max-metadata-length.

Label count per series must not exceed -validation.max-label-names-per-series.

Label name length must not exceed -validation.max-length-label-name.

Label value length must not exceed -validation.max-length-label-value.

Sample timestamps must not be further in the future than -validation.create-grace-period allows.
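The label and timestamp checks listed above can be sketched as a single validation function. This is an illustration only: the Limits dataclass, its field names, and the example values are assumptions for this sketch, and the metadata checks are left out for brevity.

```python
import re
from dataclasses import dataclass

# Classic Prometheus label-name pattern.
LABEL_NAME_RE = re.compile(r"[a-zA-Z_][a-zA-Z0-9_]*")


@dataclass(frozen=True)
class Limits:
    """Per-tenant validation limits (example values, not authoritative)."""
    max_label_names_per_series: int = 30
    max_length_label_name: int = 1024
    max_length_label_value: int = 2048
    create_grace_period_s: float = 600.0


def validate_sample(labels, timestamp_s, now_s, limits):
    """Return a list of violations; an empty list means the sample is accepted."""
    errors = []
    if len(labels) > limits.max_label_names_per_series:
        errors.append("too many labels")
    for name, value in labels.items():
        if not LABEL_NAME_RE.fullmatch(name):
            errors.append(f"invalid label name: {name!r}")
        if len(name) > limits.max_length_label_name:
            errors.append(f"label name too long: {name!r}")
        if len(value) > limits.max_length_label_value:
            errors.append(f"label value too long for {name!r}")
    if timestamp_s > now_s + limits.create_grace_period_s:
        errors.append("timestamp too far in the future")
    return errors


limits = Limits()
accepted = validate_sample({"__name__": "up", "job": "node"}, 1000.0, 1000.0, limits)
from_future = validate_sample({"__name__": "up"}, 5000.0, 1000.0, limits)
bad_name = validate_sample({"bad-name": "x"}, 1000.0, 1000.0, limits)
```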

Rate Limiting

Two limits apply per tenant: request rate (max requests per second) and ingestion rate (max samples per second). Exceeding limits results in HTTP 429 responses.
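A minimal sketch of how two per‑tenant limits could combine into a single accept‑or‑429 decision is shown below. It assumes a token‑bucket limiter with a burst size; the class and function names are hypothetical, and the clock is passed in explicitly so the example is deterministic.

```python
class TokenBucket:
    """Simple token bucket: refills at `rate` tokens per second up to `burst`."""

    def __init__(self, rate, burst):
        self.rate = float(rate)
        self.burst = float(burst)
        self.tokens = float(burst)
        self.last_s = 0.0

    def allow(self, now_s, cost=1):
        elapsed = max(0.0, now_s - self.last_s)
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
        self.last_s = now_s
        if cost <= self.tokens:
            self.tokens -= cost
            return True
        return False


def handle_push(request_limiter, ingestion_limiter, now_s, num_samples):
    """Return the HTTP status a distributor-like front door would send."""
    if not request_limiter.allow(now_s, 1):
        return 429  # request rate exceeded
    if not ingestion_limiter.allow(now_s, num_samples):
        return 429  # ingestion (samples per second) rate exceeded
    return 200


requests = TokenBucket(rate=1, burst=2)     # 1 request/s, burst of 2
samples = TokenBucket(rate=10, burst=20)    # 10 samples/s, burst of 20
statuses = [
    handle_push(requests, samples, now_s=0.0, num_samples=15),  # within both limits
    handle_push(requests, samples, now_s=0.0, num_samples=10),  # too many samples
    handle_push(requests, samples, now_s=0.0, num_samples=1),   # too many requests
    handle_push(requests, samples, now_s=1.0, num_samples=10),  # buckets refilled
]
```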

High‑Availability Tracker

When HA pairs are configured for remote write, the distributor deduplicates incoming series from the pair, ensuring only one copy is stored.
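One way to picture this deduplication is a tracker that accepts samples from a single elected replica per cluster and drops the rest. The text above only says that the distributor deduplicates data from HA pairs; the election rule and the failover timeout in this sketch are assumptions made for illustration, and the HATracker class is hypothetical.

```python
class HATracker:
    """Accept writes from one elected replica per cluster, drop the others."""

    def __init__(self, failover_timeout_s):
        self.failover_timeout_s = failover_timeout_s
        self.elected = {}  # cluster -> (replica, last time it was seen, in seconds)

    def accept(self, cluster, replica, now_s):
        current = self.elected.get(cluster)
        if current is None:
            # First replica seen for this cluster becomes the elected one.
            self.elected[cluster] = (replica, now_s)
            return True
        leader, last_seen_s = current
        if replica == leader:
            self.elected[cluster] = (leader, now_s)
            return True
        if now_s - last_seen_s > self.failover_timeout_s:
            # The elected replica went quiet: fail over to the other one.
            self.elected[cluster] = (replica, now_s)
            return True
        return False  # duplicate data from the non-elected replica


tracker = HATracker(failover_timeout_s=30)
decisions = [
    tracker.accept("prod", "replica-a", now_s=0),
    tracker.accept("prod", "replica-b", now_s=1),
    tracker.accept("prod", "replica-a", now_s=5),
    tracker.accept("prod", "replica-b", now_s=40),
    tracker.accept("prod", "replica-a", now_s=41),
]
```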

Ingester (Data Receiver)

The ingester is a stateful component that buffers incoming series in memory, periodically writes them to long‑term storage, and serves recent data to queriers. It supports replication, write‑ahead logs, and zone‑aware replication to avoid data loss.

Write De‑Amplification

The ingester batches and compresses samples in memory before flushing them to storage. This write de‑amplification avoids writing each sample to long‑term storage individually and lowers the overall TCO.
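The effect of the batching described above can be shown with a toy buffer that counts storage writes. The BufferingIngester class is hypothetical and leaves out compression, the WAL, and replication; it only illustrates that many pushes result in a single upload.

```python
class BufferingIngester:
    """Hold samples in memory and upload them as one block per flush."""

    def __init__(self):
        self.head = {}     # series -> list of (timestamp, value) held in memory
        self.uploads = []  # each entry stands for one write to long-term storage

    def push(self, series, timestamp, value):
        self.head.setdefault(series, []).append((timestamp, value))

    def flush(self):
        if not self.head:
            return None
        block = {series: sorted(samples) for series, samples in self.head.items()}
        self.uploads.append(block)
        self.head = {}
        return block


ingester = BufferingIngester()
for t in range(1000):
    ingester.push("cpu", t, t * 0.1)
ingester.flush()
storage_writes = len(ingester.uploads)
samples_in_block = len(ingester.uploads[0]["cpu"])
```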

Failure Handling

Replication

Write‑ahead log (WAL)

Write‑behind log (WBL) when out‑of‑order is enabled

Querier (Query Engine)

The querier is a stateless component that evaluates PromQL expressions, fetching recent samples from ingesters and historic blocks from the store‑gateway. It maintains an up‑to‑date view of bucket metadata, either by downloading bucket indexes or scanning the bucket.

Bucket Index

When enabled (default), the querier lazily downloads the bucket index on the first query, caches it, and periodically refreshes it, reducing API calls and startup latency.
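The lazy‑load‑then‑refresh behavior can be sketched as a small cache wrapper. The LazyBucketIndex class and the fake_fetch callable are hypothetical. As a simplification, this sketch refreshes on access once the cached copy is older than the interval, rather than refreshing in the background.

```python
class LazyBucketIndex:
    """Load a tenant's bucket index on first use and refresh it when stale."""

    def __init__(self, fetch, refresh_interval_s):
        self._fetch = fetch  # callable that downloads the index
        self._refresh_interval_s = refresh_interval_s
        self._index = None
        self._loaded_at_s = None

    def get(self, now_s):
        stale = (
            self._loaded_at_s is None
            or now_s - self._loaded_at_s >= self._refresh_interval_s
        )
        if stale:
            self._index = self._fetch()
            self._loaded_at_s = now_s
        return self._index


downloads = []


def fake_fetch():
    # Stand-in for an object-storage download; records each call.
    downloads.append(1)
    return {"blocks": ["block-a", "block-b"]}


index = LazyBucketIndex(fake_fetch, refresh_interval_s=900)
downloads_before_first_query = len(downloads)
index.get(now_s=0)
index.get(now_s=60)
downloads_after_two_queries = len(downloads)
index.get(now_s=1000)
downloads_after_refresh = len(downloads)
```

Nothing is downloaded until the first query arrives, and the second query within the refresh interval is served from the cached copy.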

Query‑Frontend

The query‑frontend provides the same API as the querier and adds queuing, request splitting, and result caching. It helps avoid OOM errors for large queries by splitting them into smaller time‑range sub‑queries and executing them in parallel.

Queueing

Ensures failed large queries can be retried, balances load across queriers, and fairly schedules queries among tenants.
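Fair scheduling among tenants can be illustrated with a round‑robin over per‑tenant queues, so that one tenant with many queued queries cannot starve another. This is a simplified sketch; the FairQueue class is hypothetical and does not reproduce Mimir's actual scheduler.

```python
from collections import OrderedDict, deque


class FairQueue:
    """Round-robin over per-tenant FIFO queues."""

    def __init__(self):
        self._queues = OrderedDict()  # tenant -> deque of pending queries

    def enqueue(self, tenant, query):
        self._queues.setdefault(tenant, deque()).append(query)

    def dequeue(self):
        if not self._queues:
            return None
        tenant, queue = next(iter(self._queues.items()))
        query = queue.popleft()
        del self._queues[tenant]
        if queue:
            self._queues[tenant] = queue  # tenant moves to the back of the rotation
        return tenant, query


fq = FairQueue()
for q in ("a1", "a2", "a3"):
    fq.enqueue("tenant-a", q)
fq.enqueue("tenant-b", "b1")
order = [fq.dequeue() for _ in range(5)]
```

Even though tenant-a queued three queries first, tenant-b's single query is served second.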

Splitting

By default, queries are split into 24‑hour intervals, executed in parallel downstream, and results are merged.
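The splitting step can be sketched as follows. The example assumes integer Unix timestamps in seconds and sub‑ranges that break on 24‑hour boundaries; the function names and the thread pool are choices made for this illustration and are not the query‑frontend's implementation.

```python
from concurrent.futures import ThreadPoolExecutor

DAY_S = 24 * 60 * 60


def split_by_interval(start_s, end_s, interval_s=DAY_S):
    """Split the inclusive range [start_s, end_s] at interval boundaries."""
    if end_s < start_s:
        raise ValueError("end must not be before start")
    parts = []
    cur = start_s
    while cur <= end_s:
        next_boundary = (cur // interval_s + 1) * interval_s
        part_end = min(end_s, next_boundary - 1)
        parts.append((cur, part_end))
        cur = part_end + 1
    return parts


def run_split_query(execute, start_s, end_s, interval_s=DAY_S):
    """Run the sub-queries in parallel and concatenate their results in order."""
    sub_ranges = split_by_interval(start_s, end_s, interval_s)
    with ThreadPoolExecutor(max_workers=4) as pool:
        partials = list(pool.map(lambda r: execute(*r), sub_ranges))
    merged = []
    for partial in partials:
        merged.extend(partial)
    return merged


def fake_execute(sub_start_s, sub_end_s):
    # Stand-in for a downstream querier call; returns the range it was asked for.
    return [(sub_start_s, sub_end_s)]


three_days = run_split_query(fake_execute, 0, 3 * DAY_S - 1)
across_midnight = split_by_interval(DAY_S - 10, DAY_S + 10)
```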

Caching

Results are cached (via Memcached) and reused for subsequent queries. Aligning a query's start and end times with its step parameter improves the cache hit rate but may break PromQL conformance, because the returned timestamps can differ from those requested.
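To see why alignment helps caching, consider two dashboard refreshes issued a few seconds apart. The sketch below rounds start and end down to a multiple of the step, which is an assumption about how alignment works made for this example; the cache_key layout is hypothetical.

```python
def align_to_step(start_s, end_s, step_s):
    """Round start and end down to multiples of the step."""
    return start_s - (start_s % step_s), end_s - (end_s % step_s)


def cache_key(tenant, query, start_s, end_s, step_s):
    """Build a cache key from the aligned time range (hypothetical layout)."""
    aligned_start, aligned_end = align_to_step(start_s, end_s, step_s)
    return f"{tenant}:{query}:{aligned_start}:{aligned_end}:{step_s}"


# Two refreshes of the same panel, seven and eight seconds apart.
first = cache_key("tenant-a", "rate(http_requests_total[5m])", 1003, 2007, 60)
second = cache_key("tenant-a", "rate(http_requests_total[5m])", 1010, 2015, 60)
same_entry = first == second
```

Without alignment the two requests would have different time ranges and could not share a cache entry; the trade‑off is that the evaluated timestamps are no longer exactly those the client asked for.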

Store‑Gateway (Data Store Gateway)

The store‑gateway is a stateful component that queries long‑term storage blocks on the read path. It can operate with bucket index enabled or disabled, and supports block sharding, replication, and various caching strategies (in‑memory, Memcached).

Alertmanager

Mimir’s Alertmanager adds multi‑tenant support and horizontal scalability to Prometheus Alertmanager. It deduplicates, groups, and routes alerts to notification channels such as email, PagerDuty, or OpsGenie.

Overrides‑Exporter

This component exposes per‑tenant limit metrics, allowing operators to monitor resource usage against configured quotas.

Query‑Scheduler

An optional stateless component that maintains a queue of pending queries and distributes work to available queriers.

Ruler

Evaluates recording and alerting rules defined in PromQL for each tenant, supporting rule namespaces.

Installation

Mimir can be installed by downloading the binary from the official site or deploying it directly in a Kubernetes cluster. The example below shows a minimal configuration without multi‑tenant mode.

alertmanager:
  external_url: http://127.0.0.1:8080/alertmanager
  sharding_ring:
    replication_factor: 2
ingester:
  ring:
    replication_factor: 1
multitenancy_enabled: false
ruler:
  alertmanager_url: http://127.0.0.1:8080/alertmanager
  external_url: http://127.0.0.1:8080/ruler
  query_frontend:
    address: 127.0.0.1:9095
  query_stats_enabled: true
  rule_path: ./ruler/
ruler_storage:
  filesystem:
    dir: ./rules-storage
store_gateway:
  sharding_ring:
    replication_factor: 1
target: all,alertmanager,ruler

Start the service with:

/usr/local/mimir/mimir-darwin-amd64 --config.file /usr/local/mimir/mimir.yaml

Configure Alertmanager

Prepare Configuration File

global:
  resolve_timeout: 5m
  http_config:
    follow_redirects: true
    enable_http2: true
  smtp_from: [email protected]
  smtp_hello: mimir
  smtp_smarthost: smtp.qq.com:587
  smtp_auth_username: [email protected]
  smtp_require_tls: true
route:
  receiver: email
  group_by:
    - alertname
  continue: false
  routes:
    - receiver: email
      group_by:
        - alertname
      matchers:
        - severity="info"
      mute_time_intervals:
        - 夜间
      continue: true
  group_wait: 10s
  group_interval: 5s
  repeat_interval: 6h
inhibit_rules:
  - source_match:
      severity: warning
    target_match:
      severity: warning
    equal:
      - alertname
      - instance
receivers:
  - name: email
    email_configs:
      - send_resolved: true
        to: [email protected]
        from: [email protected]
        hello: mimir
        smarthost: smtp.qq.com:587
        auth_username: [email protected]
        headers:
          From: [email protected]
          Subject: '{{ template "email.default.subject" . }}'
          To: [email protected]
        html: '{{ template "email.default.html" . }}'
        text: '{{ template "email.default.html" . }}'
        require_tls: true
templates:
  - email.default.html
mute_time_intervals:
  - name: 夜间
    time_intervals:
      - times:
          - start_time: "00:00"
            end_time: "08:45"
          - start_time: "21:30"
            end_time: "23:59"

Upload Configuration to Mimir

After Mimir starts, the Alertmanager configuration is empty. Load the file with:

mimirtool alertmanager load ./alertmanager.yaml --address http://127.0.0.1:8080 --id anonymous

Configure Grafana Alertmanager and Prometheus Data Sources

Use Grafana UI to point the Alertmanager data source to the Mimir endpoint and add Prometheus as a data source.

Add Alert Rules

Create alerting rules in Grafana, which will be stored in Mimir and evaluated by the ruler.

Configure Multi‑Tenant Mode

Set multitenancy_enabled: true in the configuration file.

Upload an Alertmanager configuration for each tenant. The --id value identifies the tenant (the instance ID, which usually matches the node name).

Load the configuration with:

mimirtool alertmanager load ./alertmanager.yaml --address http://127.0.0.1:8080 --id <instance_id>
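Once multi‑tenancy is enabled, HTTP clients that talk to Mimir directly identify their tenant with the X-Scope-OrgID header. The sketch below only builds such a request with Python's standard library and does not send it. The tenant name team-a and the helper tenant_request are made up for illustration, and the path assumes the Alertmanager API is served under /alertmanager as in the configuration shown earlier.

```python
import urllib.request

TENANT_HEADER = "X-Scope-OrgID"


def tenant_request(base_url, path, tenant_id):
    """Build (but do not send) a request scoped to a single tenant."""
    request = urllib.request.Request(base_url.rstrip("/") + path)
    request.add_header(TENANT_HEADER, tenant_id)
    return request


request = tenant_request(
    "http://127.0.0.1:8080",
    "/alertmanager/api/v2/alerts",  # assumed path under the /alertmanager prefix
    "team-a",                       # hypothetical tenant ID
)
# To send it: urllib.request.urlopen(request)
```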

Monitoring Mimir Status

After starting the services, you can view the Mimir UI in a browser to check overall service health, node readiness, cluster membership, and multi‑tenant information.

Mimir status UI
Running status
Cluster nodes
Multi‑tenant view