Understanding Prometheus: Architecture, Data Model, and Alerting Explained
This article provides a comprehensive overview of Prometheus, covering its open‑source monitoring architecture, multi‑dimensional data model, query language, storage mechanisms, service discovery, alerting workflow with Alertmanager, and visualization using Grafana, all illustrated with key diagrams and configuration examples.
Introduction
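Prometheus is an open‑source monitoring system inspired by Google’s Borgmon. Its main features are a multi‑dimensional data model, the flexible PromQL query language, autonomous single‑server nodes that use local storage and do not depend on distributed storage, HTTP‑based pull collection, push support for short‑lived jobs through the Pushgateway, target discovery through service discovery or static configuration, dashboards through Grafana, and simple deployment as a single binary or container image.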
Architecture
Key components include Server, Exporters, Pushgateway, PromQL, Alertmanager, and Web UI.
Modules
Retrieval : periodically scrapes target endpoints for metrics.
Storage : writes scraped samples to the built‑in time‑series database.
PromQL : query language used by Prometheus and integrated tools like Grafana.
Jobs / Exporters : expose metrics via HTTP APIs for Prometheus to pull.
Pushgateway : allows short‑lived jobs to push metrics when they cannot be scraped.
Service discovery : discovers targets dynamically from DNS, Kubernetes, Consul, etc., or via static files.
Alertmanager : receives alerts from Prometheus, deduplicates, groups, silences and routes them.
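To make the module list concrete, here is a minimal sketch of a prometheus.yml that touches retrieval, the Pushgateway, and Alertmanager. All hostnames, job names, and file paths are assumptions chosen for illustration; only the configuration keys come from Prometheus itself.

```yaml
# prometheus.yml: illustrative sketch, not a production configuration.
# Hostnames, job names, and paths below are hypothetical.
global:
  scrape_interval: 15s       # Retrieval: how often targets are scraped
  evaluation_interval: 15s   # how often recording and alerting rules run

rule_files:
  - rules/*.yml              # rule files evaluated by the server

alerting:
  alertmanagers:
    - static_configs:
        - targets: ['alertmanager.example.internal:9093']

scrape_configs:
  # An exporter scraped directly over HTTP.
  - job_name: 'node'
    static_configs:
      - targets: ['node1.example.internal:9100']

  # Short-lived jobs push to the Pushgateway, which Prometheus then scrapes.
  # honor_labels keeps the job/instance labels that the pushing job set.
  - job_name: 'pushgateway'
    honor_labels: true
    static_configs:
      - targets: ['pushgateway.example.internal:9091']
```

Static targets are used here only to keep the sketch short; any of the service discovery mechanisms could replace static_configs without changing the rest of the file.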
How Prometheus Works
The Prometheus server scrapes metrics from statically configured or dynamically discovered targets.
Scraped samples are stored locally; the server evaluates alerting rules against them and sends firing alerts to Alertmanager.
Alertmanager processes alerts (deduplication, grouping, inhibition, routing).
Metrics can be queried via the API, Prometheus console, or Grafana.
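The rule evaluation step in this workflow depends on alerting rules loaded by the server. The following is a minimal sketch of one rule file. The group name, the 5m hold time, and the severity label are assumptions for illustration; the up series is the one Prometheus records for every scrape (1 on success, 0 on failure).

```yaml
# rules/example_alerts.yml: hypothetical file name and rule values.
groups:
  - name: example_alerts
    rules:
      - alert: InstanceDown
        expr: up == 0          # the last scrape of this target failed
        for: 5m                # must hold for 5 minutes before firing
        labels:
          severity: critical   # assumed label, usable for routing later
        annotations:
          summary: "Instance {{ $labels.instance }} of job {{ $labels.job }} is down"
```

While the expression is true but the for duration has not yet elapsed, the alert is pending; only once it is firing does Prometheus send it to Alertmanager.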
Data Model and Types
All stored data are time‑series consisting of a metric name, a set of labels, and sample values (float + timestamp).
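Metric name must match [a-zA-Z_:][a-zA-Z0-9_:]*; colons are reserved for user‑defined recording rules.
Label name must match [a-zA-Z_][a-zA-Z0-9_]*, and names beginning with __ are reserved for internal use; label values may contain any Unicode characters.
A sample is a float64 value paired with a millisecond‑precision timestamp.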
Metric Types
Counter : cumulative value that only increases, or resets to zero when the process restarts (e.g., request count).
Gauge : value that can go up or down (e.g., temperature).
Histogram : samples observations into configurable buckets and exposes cumulative counters <basename>_bucket{le="…"}, plus <basename>_sum and <basename>_count.
Summary : similar to a histogram, but computes quantiles on the client side and exposes them as <basename>{quantile="…"}, plus <basename>_sum and <basename>_count.
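How the metric types are queried differs by type. As a sketch, the recording rules below apply rate() to a counter and histogram_quantile() to histogram buckets. The metric names http_requests_total and http_request_duration_seconds_bucket, and the recorded series names, are assumptions and do not come from the original article.

```yaml
# rules/example_recording.yml: hypothetical file and metric names.
groups:
  - name: example_recording_rules
    rules:
      # Counter: per-second request rate over 5 minutes, summed per job.
      # rate() accounts for counter resets.
      - record: job:http_requests:rate5m
        expr: sum by (job) (rate(http_requests_total[5m]))

      # Histogram: estimated 95th percentile latency per job.
      # The le label must be kept so the buckets can be interpolated.
      - record: job:http_request_duration_seconds:p95_5m
        expr: histogram_quantile(0.95, sum by (job, le) (rate(http_request_duration_seconds_bucket[5m])))
```

Because histogram quantiles are estimated on the server from bucket counters, they can be aggregated across instances in this way; quantiles precomputed by a summary generally cannot be meaningfully aggregated.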
Jobs and Instances
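An instance is a single scrape target, typically one process; a job is a collection of instances with the same purpose, scraped under one configuration. The example scrape configuration in this section defines a MySQL job whose targets are read from a file.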
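The MySQL job in this section reads its targets from the file mysql_no_product.yml and relabels on the mysql_host and mysql_port labels. The original does not show that file, so the following is only a guess at its shape, using the standard file‑based service discovery format; the hostname and port are hypothetical.

```yaml
# mysql_no_product.yml: assumed contents, not taken from the original.
- targets: ['db1.example.internal:3306']
  labels:
    mysql_host: db1.example.internal   # read by the first relabel rule
    mysql_port: '3306'                 # label values must be strings
```

Each entry becomes one instance of the job, and the labels attached here are available to relabel_configs before the scrape. The job definition itself follows.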
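- job_name: 'qa-mysql'
  scrape_timeout: 20s
  scrape_interval: 1m
  # Targets and their labels are loaded from a file.
  file_sd_configs:
    - files:
        - mysql_no_product.yml
  relabel_configs:
    # Labels named __param_<name> become URL query parameters of the scrape request.
    - source_labels: ['mysql_host']
      target_label: __param_mysql_host
    - source_labels: ['mysql_port']
      target_label: __param_mysql_port
    # As written, this rule copies __address__ onto itself; setups like this
    # usually also set a replacement pointing at the exporter's address.
    - source_labels: ['__address__']
      target_label: __address__
Alerting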
Alertmanager receives alerts from Prometheus and supports grouping, inhibition, and silencing to reduce noise.
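Grouping and inhibition are set in the Alertmanager configuration file, while silences are created at runtime through the Alertmanager UI or API. A minimal sketch of such a file follows; the receiver names, webhook URLs, timings, and the severity label are assumptions for illustration.

```yaml
# alertmanager.yml: illustrative sketch with hypothetical receivers.
route:
  receiver: default
  group_by: ['alertname', 'job']   # alerts sharing these labels form one notification
  group_wait: 30s                  # wait before sending the first notification for a group
  group_interval: 5m               # wait before notifying about new alerts in a group
  repeat_interval: 4h              # wait before resending an unresolved notification
  routes:
    - matchers:
        - severity="critical"
      receiver: oncall

inhibit_rules:
  # Mute warning alerts while a critical alert with the same
  # alertname and instance is firing.
  - source_matchers:
      - severity="critical"
    target_matchers:
      - severity="warning"
    equal: ['alertname', 'instance']

receivers:
  - name: default
    webhook_configs:
      - url: 'http://alert-sink.example.internal/hook'
  - name: oncall
    webhook_configs:
      - url: 'http://oncall.example.internal/hook'
```

The matchers syntax shown here requires a reasonably recent Alertmanager release; older versions use match and match_re instead.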
Visualization
Grafana provides rich dashboards for visualizing Prometheus metrics, offering more features than the built‑in web UI.
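Grafana needs Prometheus registered as a data source before dashboards can query it. One way to do this is Grafana's data source provisioning file, sketched below; the file name and the Prometheus URL are assumptions.

```yaml
# provisioning/datasources/prometheus.yml: hypothetical file name and URL.
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy        # the Grafana backend proxies queries to Prometheus
    url: http://prometheus.example.internal:9090
    isDefault: true
```

Dashboard panels then send PromQL expressions to this data source, the same expressions that work in the built‑in web UI.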
Programmer DD
A tinkering programmer and author of "Spring Cloud Microservices in Action"
