Unlock Scalable Cloud‑Native Alerting with Grafana Mimir: Architecture, Components, and Setup
This article explains how Grafana Mimir extends Prometheus and Alertmanager to provide a horizontally scalable, highly available, multi‑tenant monitoring solution for Kubernetes, covering its architecture, key components, compression mechanisms, deployment steps, and configuration of Alertmanager and multi‑tenant support.
Cloud‑Native Alerting Landscape
In cloud‑native environments, Kubernetes is widely used in production, and monitoring stacks built from Prometheus, exporters, Grafana, and Alertmanager are common. However, editing alert rules has traditionally meant hand‑editing configuration files in vim. Grafana Mimir combines Prometheus and Alertmanager to provide a visual, multi‑tenant alerting solution.
What Is Mimir?
Mimir offers horizontally scalable, highly available, multi‑tenant long‑term storage for Prometheus. Its architecture is illustrated below.
Metrics ingestion: Prometheus or compatible remote‑write clients send data to Mimir directly or via Grafana Agent.
Strong scalability: clusters grow by adding instances without manual sharding.
Grafana integration: users create alerts, rules and dashboards that query Mimir.
Mimir Components and Their Roles
| Type | Component Name |
| --- | --- |
| Optional | alertmanager, ruler, overrides-exporter, query-scheduler |
| Required | compactor, distributor, ingester, querier, query-frontend, store-gateway |
The following sections describe each component.
Compactor (stateless)
The compactor merges data blocks to improve query performance and reduce storage costs.
Merges a tenant's blocks into a single, larger optimized block, deduplicating samples and shrinking the index.
Keeps the per‑tenant bucket index up to date; queriers, store‑gateways, and rulers read this index to discover blocks in the bucket.
Deletes blocks that fall outside the configured retention period.
How It Works
Compaction runs at a fixed interval for each tenant. Vertical compaction first merges overlapping blocks written within the same time range (2 hours by default), deduplicating samples. Horizontal compaction then merges blocks with adjacent time ranges into larger blocks, reducing the total block count (and index overhead) without significantly changing the total data size.
Scaling
Compaction concurrency is controlled by -compactor.compaction-concurrency. Tenant sharding is configured with -compactor.compactor-tenant-shard-size.
Compression Algorithm
Mimir uses a split‑and‑merge algorithm that overcomes TSDB index limits and avoids unbounded growth for large tenants. The process consists of a split stage (optional) and a merge stage.
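As a rough illustration of the two stages, here is a toy Python model (not Mimir's actual Go implementation; `shard_of`, `split_stage`, and `merge_stage` are hypothetical names, and "blocks" are modeled as plain lists of series IDs). The split stage shards each source block's series so no single output index grows unbounded; the merge stage then combines split blocks that belong to the same shard:

```python
import hashlib

def shard_of(series_id: str, shard_count: int) -> int:
    """Assign a series to a shard by hashing its ID (stand-in for Mimir's sharding)."""
    digest = hashlib.sha256(series_id.encode()).digest()
    return int.from_bytes(digest[:4], "big") % shard_count

def split_stage(block: list[str], shard_count: int) -> dict[int, list[str]]:
    """Split one source block into up to `shard_count` smaller blocks."""
    shards: dict[int, list[str]] = {}
    for series in block:
        shards.setdefault(shard_of(series, shard_count), []).append(series)
    return shards

def merge_stage(split_blocks: list[dict[int, list[str]]]) -> dict[int, list[str]]:
    """Merge split blocks belonging to the same shard, deduplicating series."""
    merged: dict[int, set[str]] = {}
    for shards in split_blocks:
        for shard, series_list in shards.items():
            merged.setdefault(shard, set()).update(series_list)
    return {shard: sorted(s) for shard, s in merged.items()}

# Two source blocks sharing one series; after split+merge each shard holds
# each series exactly once.
blocks = [["cpu{host=a}", "mem{host=a}"], ["cpu{host=a}", "cpu{host=b}"]]
result = merge_stage([split_stage(b, shard_count=2) for b in blocks])
```

Because the shard assignment is deterministic, a series always lands in the same final block, so the merge stage can deduplicate safely.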
Deletion Process
After successful compaction, original blocks are soft‑deleted (marked) and later hard‑deleted after a configurable delay, ensuring queries see the new compacted blocks before the old ones disappear.
Distributor (stateless)
The distributor receives time‑series from Prometheus or Grafana Agent, validates them, applies tenant limits, batches them, and forwards them to ingesters with configurable replication (default 3).
Validation
Metric metadata and labels follow the Prometheus exposition format.
Metadata length, label count, label name/value length, and sample timestamps are checked against limits such as -validation.max-metadata-length and -validation.max-label-names-per-series.
Rate Limiting
Two limits per tenant: request rate (max requests per second) and ingestion rate (max samples per second). Exceeding limits returns HTTP 429.
High‑Availability Tracker
When Prometheus HA pairs are configured, the distributor deduplicates incoming series to avoid double‑counting.
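The essential idea, electing one replica per (tenant, cluster) pair and dropping samples from the other until it fails over, can be sketched as follows (a toy Python model with a hypothetical `HATracker` class, not Mimir's implementation):

```python
class HATracker:
    """Accept samples only from the elected replica of each HA pair."""

    def __init__(self, failover_timeout: float) -> None:
        self.failover_timeout = failover_timeout
        # (tenant, cluster) -> (elected replica, last time it was seen)
        self.elected: dict[tuple[str, str], tuple[str, float]] = {}

    def accept(self, tenant: str, cluster: str, replica: str, now: float) -> bool:
        key = (tenant, cluster)
        current = self.elected.get(key)
        # Elect this replica if none is elected, it is already elected,
        # or the elected replica has been silent past the failover timeout.
        if current is None or current[0] == replica or now - current[1] > self.failover_timeout:
            self.elected[key] = (replica, now)
            return True
        return False  # duplicate sample from the standby replica: dropped

# Replica "a" is elected first; "b" is deduplicated until "a" goes silent
# longer than the failover timeout, at which point "b" takes over.
tracker = HATracker(failover_timeout=30.0)
accepted = [
    tracker.accept("tenant", "cluster", replica, t)
    for replica, t in [("a", 0.0), ("b", 1.0), ("a", 2.0), ("b", 40.0)]
]
```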
Sharding & Replication
Series are sharded and replicated across ingesters using a consistent hash ring. The replication factor is set via -ingester.ring.replication-factor (default 3).
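How a consistent hash ring picks the replica set for a series can be illustrated with a small Python sketch (hypothetical `Ring` class for illustration only; Mimir's real ring also tracks instance health, zones, and token ownership):

```python
import bisect
import hashlib

def _hash(key: str) -> int:
    """Map a string onto the ring's 64-bit token space."""
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")

class Ring:
    """Each ingester owns many tokens; a series is written to the first
    `replication_factor` distinct ingesters found walking clockwise
    from the series' hash."""

    def __init__(self, ingesters: list[str], tokens_per_ingester: int = 64) -> None:
        self.ring = sorted(
            (_hash(f"{ing}/{i}"), ing)
            for ing in ingesters
            for i in range(tokens_per_ingester)
        )

    def replicas(self, series_key: str, replication_factor: int = 3) -> list[str]:
        start = bisect.bisect(self.ring, (_hash(series_key), ""))
        chosen: list[str] = []
        for offset in range(len(self.ring)):
            _, ing = self.ring[(start + offset) % len(self.ring)]
            if ing not in chosen:
                chosen.append(ing)
                if len(chosen) == replication_factor:
                    break
        return chosen

ring = Ring([f"ingester-{n}" for n in range(1, 6)])
owners = ring.replicas('up{instance="node-1"}')  # three distinct ingesters
```

Because the mapping is deterministic, the same series always goes to the same replica set, and adding an instance only moves the series adjacent to its new tokens.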
Ingester (stateful)
Ingesters hold incoming series in memory and periodically flush them to long‑term storage (every 2 hours by default). Batching writes in memory reduces write amplification against object storage; ingesters also support replication, a write‑ahead log (WAL) for crash recovery, and zone‑aware replication.
Querier (stateless)
Queriers evaluate PromQL expressions by reading recent data from ingesters and historic blocks from the store‑gateway. They maintain an up‑to‑date view of bucket metadata, either via bucket index download or bucket scanning.
Query‑Frontend (stateless)
The query‑frontend exposes the same API as queriers and adds request queuing, query splitting (into 24‑hour intervals by default), and result caching (Memcached). Setting -query-frontend.align-queries-with-step=true additionally aligns query start and end times with the step parameter to improve cache hit rates, at the cost of slightly less accurate results.
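The splitting step, cutting one long range query at day boundaries into partial queries that can be cached and executed in parallel, can be sketched like this (an illustrative Python model; `split_query_range` is a hypothetical name):

```python
def split_query_range(
    start_ms: int,
    end_ms: int,
    interval_ms: int = 24 * 3600 * 1000,  # default split interval: 24 hours
) -> list[tuple[int, int]]:
    """Split [start_ms, end_ms) into interval-aligned partial query ranges."""
    partials: list[tuple[int, int]] = []
    cursor = start_ms
    while cursor < end_ms:
        boundary = (cursor // interval_ms + 1) * interval_ms  # next day boundary
        partials.append((cursor, min(boundary, end_ms)))
        cursor = min(boundary, end_ms)
    return partials

# A 2.5-day query starting at noon is cut into three day-aligned pieces;
# the middle, fully-aligned piece is the one most likely to hit the cache.
DAY_MS = 24 * 3600 * 1000
partials = split_query_range(DAY_MS // 2, 2 * DAY_MS + DAY_MS // 2)
```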
Store‑Gateway (stateful)
The store‑gateway queries long‑term storage blocks for both queriers and rulers, using bucket index or scanning to keep its view current. It supports in‑memory and Memcached caching for metadata and block data.
Alertmanager (optional)
Mimir Alertmanager adds multi‑tenant support and horizontal scaling to Prometheus Alertmanager, deduplicating and routing alerts to email, PagerDuty, OpsGenie, etc.
Overrides‑Exporter (optional)
Exports per‑tenant limit metrics so operators can monitor resource usage.
Query‑Scheduler (optional)
Queues queries and distributes workload among available queriers.
Ruler (optional)
Evaluates recording and alerting rules defined in PromQL for each tenant.
Installation
Download the Mimir binary from the official site or deploy it in a Kubernetes cluster. The example below shows a non‑multi‑tenant configuration.
```yaml
alertmanager:
  external_url: http://127.0.0.1:8080/alertmanager
  sharding_ring:
    replication_factor: 2
ingester:
  ring:
    replication_factor: 1
multitenancy_enabled: false
ruler:
  alertmanager_url: http://127.0.0.1:8080/alertmanager
  external_url: http://127.0.0.1:8080/ruler
  query_frontend:
    address: 127.0.0.1:9095
  query_stats_enabled: true
  rule_path: ./ruler/
ruler_storage:
  filesystem:
    dir: ./rules-storage
store_gateway:
  sharding_ring:
    replication_factor: 1
target: all,alertmanager,ruler
```

Start the service:

```shell
/usr/local/mimir/mimir-darwin-amd64 --config.file /usr/local/mimir/mimir.yaml
```

Viewing Status
Open the homepage in a browser to see service health.
Check running status, readiness, node list, and multi‑tenant view via the provided UI screenshots.
Configuring Alertmanager
Prepare Configuration File
```yaml
global:
  resolve_timeout: 5m
  http_config:
    follow_redirects: true
    enable_http2: true
  smtp_from: [email protected]
  smtp_hello: mimir
  smtp_smarthost: smtp.qq.com:587
  smtp_auth_username: [email protected]
  smtp_require_tls: true
route:
  receiver: email
  group_by:
    - alertname
  continue: false
  routes:
    - receiver: email
      group_by:
        - alertname
      matchers:
        - severity="info"
      mute_time_intervals:
        - 夜间
      continue: true
  group_wait: 10s
  group_interval: 5s
  repeat_interval: 6h
inhibit_rules:
  - source_match:
      severity: warning
    target_match:
      severity: warning
    equal:
      - alertname
      - instance
receivers:
  - name: email
    email_configs:
      - send_resolved: true
        to: [email protected]
        from: [email protected]
        hello: mimir
        smarthost: smtp.qq.com:587
        auth_username: [email protected]
        headers:
          From: [email protected]
          Subject: '{{ template "email.default.subject" . }}'
          To: [email protected]
        html: '{{ template "email.default.html" . }}'
        text: '{{ template "email.default.html" . }}'
        require_tls: true
templates:
  - email.default.html
mute_time_intervals:
  - name: 夜间  # "night" interval referenced by the route above
    time_intervals:
      - times:
          - start_time: "00:00"
            end_time: "08:45"
          - start_time: "21:30"
            end_time: "23:59"
```

Upload Alertmanager Config to Mimir
```shell
mimirtool alertmanager load ./alertmanager.yaml --address http://127.0.0.1:8080 --id anonymous
```

Configure Grafana Alertmanager and Prometheus Data Sources
Follow the UI screenshots to add the Alertmanager endpoint and Prometheus (Mimir) data source.
Add Alert Rules
Use Grafana’s UI to create alerting rules that reference the uploaded Alertmanager configuration.
Configuring Multi‑Tenant Mode
Set multitenancy_enabled: true in the Mimir config file.
Prepare an Alertmanager configuration for each tenant; the tenant ID (instance_id below) can be, for example, the node name.
Load each tenant's configuration with:

```shell
mimirtool alertmanager load ./alertmanager.yaml --address http://127.0.0.1:8080 --id instance_id
```
MaGe Linux Operations
Founded in 2009, MaGe Education is a top Chinese high‑end IT training brand. Its graduates earn 12K+ RMB salaries, and the school has trained tens of thousands of students. It offers high‑pay courses in Linux cloud operations, Python full‑stack, automation, data analysis, AI, and Go high‑concurrency architecture. Thanks to quality courses and a solid reputation, it has talent partnerships with numerous internet firms.