An Overview of the Prometheus Monitoring System
Prometheus is an open‑source monitoring and alerting toolkit originally developed by SoundCloud and now a CNCF project. It offers a multidimensional data model, a flexible query language, pull‑based data collection, four metric types (counter, gauge, summary, histogram), local and remote storage, service discovery, and integration with Grafana for visualization.
Prometheus
Introduction
Prometheus is an open‑source monitoring and alerting system originally developed by SoundCloud.
It is written in Go and was inspired by Borgmon, Google’s internal monitoring system.
In 2016 it joined the Cloud Native Computing Foundation (CNCF) as its second hosted project, after Kubernetes.
Features
Multidimensional data model.
Flexible query language.
Supports both local and remote storage.
Defines an open metric data standard.
Pull‑based data collection over HTTP.
Static file and dynamic service discovery.
Easy to maintain.
Supports sharding, sampling and federation.
Architecture
Core Components
Server – periodically scrapes metrics from targets.
Target – exposes an HTTP endpoint for the server to scrape.
Alertmanager – receives alerts from the server and handles notification.
Grafana – visualizes collected metrics.
Exporters – expose third‑party service metrics to Prometheus.
Metrics
Metric Definition
<metric name>{<label name>=<label value>, ...}
Example
http_request_total{status="200",method="POST"}
{__name__="http_request_total",status="200",method="POST"}
Both lines represent the same metric; label names starting with a double underscore (__) are reserved for internal use.
The metric name http_request_total indicates the total number of HTTP requests.
The label status="200" identifies requests that returned HTTP status code 200.
The label method="POST" identifies requests made with the POST method.
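The equivalence of the two notations can be sketched in Python. The helper names to_label_set and format_series are illustrative only and do not belong to any Prometheus client library.

```python
def to_label_set(name, labels):
    """Fold the metric name into the label set under the reserved __name__ label."""
    merged = dict(labels)
    merged["__name__"] = name
    return merged


def format_series(label_set):
    """Render a label set in the <metric name>{<label>=<value>, ...} notation."""
    labels = {k: v for k, v in label_set.items() if k != "__name__"}
    body = ",".join(f'{k}="{v}"' for k, v in sorted(labels.items()))
    return f'{label_set["__name__"]}{{{body}}}'


series = to_label_set("http_request_total", {"status": "200", "method": "POST"})
print(series)                 # the name is just another label
print(format_series(series))  # labels are sorted for a stable rendering
```

Treating the name as one more label is what lets a single label set identify a time series.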
Metric Types
Counter
Monotonically increasing values (e.g., request counts).
Resets to zero when the instrumented process restarts.
Often used with the rate() function, which computes per‑second rates and compensates for such resets.
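A minimal sketch of a per‑second rate over counter samples, assuming that any drop in value means the counter restarted from zero. This is a simplification for illustration: the real rate() function also extrapolates to the boundaries of the query window, which is omitted here. The function name simple_rate is hypothetical.

```python
def simple_rate(samples):
    """Per-second increase over (timestamp_seconds, value) pairs, oldest first.

    A drop in value is treated as a counter reset: the counter is assumed to
    have restarted from zero, so the new value itself counts as the increase.
    """
    if len(samples) < 2:
        return None  # at least two samples are needed
    increase = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        increase += cur - prev if cur >= prev else cur
    elapsed = samples[-1][0] - samples[0][0]
    return increase / elapsed if elapsed > 0 else None


# The counter restarts between t=15 and t=30 (130 -> 10).
samples = [(0, 100.0), (15, 130.0), (30, 10.0), (45, 40.0)]
print(simple_rate(samples))
```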
Gauge
Values that can go up or down (e.g., CPU or memory usage).
Suited to point‑in‑time measurements of current state.
Summary
Provides quantiles of observed values (e.g., request latency).
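Also exposes the sum and count of observations, but no per‑bucket counts, so its quantiles cannot be re‑aggregated across instances the way histogram buckets can.
Quantiles are computed on the client, which makes observations more resource‑intensive than with a histogram.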
Histogram
Counts observations in configurable buckets; each bucket is cumulative and is identified by its upper bound in the label le="<upper bound>".
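The bucket behavior can be sketched as follows. The Histogram class here is a hypothetical stand‑in written for illustration, not the class of any client library; the bucket bounds are arbitrary example values.

```python
import bisect
import math


class Histogram:
    """Toy histogram with cumulative le buckets (illustrative only)."""

    def __init__(self, bounds):
        # An implicit +Inf bucket catches every observation.
        self.bounds = sorted(bounds) + [math.inf]
        self.counts = [0] * len(self.bounds)  # per-bucket, not yet cumulative
        self.total = 0.0

    def observe(self, value):
        # First bucket whose upper bound is >= value (le means "less or equal").
        idx = bisect.bisect_left(self.bounds, value)
        self.counts[idx] += 1
        self.total += value

    def cumulative(self):
        """Return (le, count) pairs where each count includes all lower buckets."""
        running, out = 0, []
        for bound, count in zip(self.bounds, self.counts):
            running += count
            out.append(("+Inf" if bound == math.inf else str(bound), running))
        return out


h = Histogram([0.1, 0.5, 1.0])
for latency in (0.05, 0.3, 0.5, 2.0):
    h.observe(latency)
for le, count in h.cumulative():
    print(f'le="{le}" {count}')
```

Because the buckets are cumulative, the +Inf bucket always equals the total number of observations.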
Data Samples
Samples are stored as time‑series in an in‑memory database and periodically flushed to disk.
Each time‑series is identified by a metric name and a set of label pairs.
Sample Composition
Metric name and associated label set.
Timestamp with millisecond precision.
Floating‑point value (float64).
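The composition of a sample can be modeled as a small record. This is a sketch under the assumption that a series is keyed by its full label set, including the metric name; the Sample type and make_sample helper are hypothetical names, and the timestamps and values are made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    labels: frozenset   # (label name, label value) pairs, including __name__
    timestamp_ms: int   # milliseconds since the Unix epoch
    value: float        # float64 sample value


def make_sample(name, labels, timestamp_ms, value):
    pairs = {**labels, "__name__": name}
    return Sample(frozenset(pairs.items()), int(timestamp_ms), float(value))


a = make_sample("http_request_total", {"status": "200"}, 1700000000000, 41)
b = make_sample("http_request_total", {"status": "200"}, 1700000015000, 42)
c = make_sample("http_request_total", {"status": "500"}, 1700000015000, 3)

# Samples sharing a label set belong to the same time series.
series = {}
for s in (a, b, c):
    series.setdefault(s.labels, []).append((s.timestamp_ms, s.value))
print(len(series))
```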
Data Collection
Prometheus uses a pull‑based model, unlike push‑based systems.
Pull Model
Real‑time – Periodic scraping; latency depends on the scrape interval.
State Persistence – Targets must retain their metric data until it is scraped; the server remains stateless toward targets, which enables simple, decoupled configuration.
Control – The server decides what to scrape and how often.
Configuration Complexity – Targets can be configured in batches or discovered automatically; they need not know the server.
Push Model
Real‑time – Data is pushed to the monitoring system immediately.
State Persistence – Targets are stateless; the server maintains target state.
Control – Targets dictate the reporting frequency and content.
Configuration Complexity – Each target must be configured with the server address.
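The pull model can be sketched in a few lines. This is a simplified illustration, not the Prometheus implementation: the text format handled here is only the "name value" subset of the exposition format, the helper names are hypothetical, and the fetch function is injectable so the example runs without a network. Recording a failed scrape as up = 0 mirrors the synthetic up metric that Prometheus attaches to each target.

```python
import urllib.request


def render_metrics(values):
    """Target side: render current values as 'name value' lines."""
    return "".join(f"{name} {value}\n" for name, value in sorted(values.items()))


def parse_metrics(text):
    """Server side: parse 'name value' lines, skipping blanks and # comments."""
    out = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, _, value = line.rpartition(" ")
        out[name] = float(value)
    return out


def http_fetch(url, timeout=5):
    """Default fetcher: a plain HTTP GET against the target's endpoint."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def scrape_once(targets, fetch=http_fetch):
    """Pull from every target; a failed scrape marks the target as down."""
    results = {}
    for url in targets:
        try:
            results[url] = {"up": 1, "metrics": parse_metrics(fetch(url))}
        except Exception:
            results[url] = {"up": 0, "metrics": {}}
    return results


# In-memory stand-in for real targets (addresses are examples only).
state = {"http://10.10.10.10:8080/metrics": {"http_request_total": 42}}


def fake_fetch(url):
    return render_metrics(state[url])  # unknown targets raise, i.e. are "down"


result = scrape_once(list(state) + ["http://10.10.10.11:8080/metrics"], fetch=fake_fetch)
print(result)
```

Note that the targets never learn the server's address; only the server holds the target list.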
Service Discovery
Static Configuration
Traditional method using static files; suitable for fixed environments.
Requires explicit target definitions, e.g., "targets": ["10.10.10.10:8080"].
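In Prometheus's file‑based discovery format, a target file is a JSON list of groups, each with a targets list and optional labels. The sketch below reads such a document; the function name load_target_groups and the env label are assumptions made for the example.

```python
import json


def load_target_groups(text):
    """Flatten a file-based discovery document into (address, labels) pairs."""
    targets = []
    for group in json.loads(text):
        labels = group.get("labels", {})
        for address in group["targets"]:
            targets.append((address, dict(labels)))
    return targets


example = '[{"targets": ["10.10.10.10:8080"], "labels": {"env": "prod"}}]'
print(load_target_groups(example))
```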
Dynamic Discovery
Ideal for cloud environments with auto‑scaling.
Supported by container orchestration platforms (e.g., Kubernetes).
Prometheus watches the API for changes and updates its target list accordingly.
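Updating the target list amounts to reconciling the current set against what discovery reports. A minimal sketch, assuming targets are identified by address strings; the reconcile function and the addresses are illustrative and do not reflect Prometheus internals.

```python
def reconcile(current, discovered):
    """Compare the active target set with a fresh discovery result."""
    current, discovered = set(current), set(discovered)
    return {
        "added": sorted(discovered - current),    # start scraping these
        "removed": sorted(current - discovered),  # stop scraping these
        "active": sorted(discovered),
    }


before = ["10.10.10.10:8080", "10.10.10.11:8080"]
after = ["10.10.10.11:8080", "10.10.10.12:8080"]  # one instance replaced by auto-scaling
print(reconcile(before, after))
```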
Data Storage
Local Storage
Built‑in time‑series database writes data to local disk.
Remote Storage
Used for large‑scale data retention.
Supports back‑ends such as OpenTSDB, InfluxDB, and Elasticsearch through remote read/write adapters.
Data Query
PromQL and HTTP APIs allow flexible querying and visualization.
Grafana, PromDash (since deprecated in favor of Grafana), and the built‑in console templates provide charting capabilities.
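An instant query goes to the /api/v1/query endpoint with the PromQL expression in the query parameter. The sketch below only builds the URL and parses a response body, so it runs without a server. The base URL assumes a server on the default port 9090, and the response shown is a hand‑written example in the documented shape, not real output.

```python
import json
import urllib.parse


def instant_query_url(base_url, promql):
    """Build the URL for an instant query against the HTTP API."""
    params = urllib.parse.urlencode({"query": promql})
    return f"{base_url.rstrip('/')}/api/v1/query?{params}"


def parse_vector(body):
    """Extract (labels, value) pairs from an instant-vector response."""
    payload = json.loads(body)
    if payload.get("status") != "success":
        raise ValueError(payload.get("error", "query failed"))
    data = payload["data"]
    if data["resultType"] != "vector":
        raise ValueError("expected an instant vector")
    # Each value is a [timestamp, "string value"] pair.
    return [(item["metric"], float(item["value"][1])) for item in data["result"]]


url = instant_query_url("http://localhost:9090", "rate(http_request_total[5m])")
print(url)

# Hand-written example response (illustrative values).
body = json.dumps({
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [
            {"metric": {"method": "POST", "status": "200"},
             "value": [1700000000.0, "1.5"]},
        ],
    },
})
print(parse_vector(body))
```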
DevOps Cloud Academy
Exploring industry DevOps practices and technical expertise.