
Prometheus Overview: Architecture, Metrics, Data Collection, and Storage

This article provides a comprehensive overview of Prometheus, an open‑source monitoring and alerting system, covering its origins, key features, architecture, core components, metric types, data collection methods, service discovery, storage options, and query capabilities.

DevOps Cloud Academy

Introduction

Prometheus is an open‑source monitoring and alerting system originally developed at SoundCloud. It is written in Go and was inspired by Google's internal Borgmon monitoring system. In 2016 it joined the Cloud Native Computing Foundation (CNCF) under the Linux Foundation as the foundation's second hosted project, after Kubernetes.

Features

Multi‑dimensional data model

Flexible query language

Support for both local and remote storage

Open metric data standard

HTTP Pull‑based data collection

Static file and dynamic discovery mechanisms

Easy maintenance

Support for data sharding, sampling, and federation deployments

Architecture Design

Core Components

Server: periodically scrapes metrics from targets.

Target: exposes an HTTP endpoint for the Server to scrape.

AlertManager: receives alerts from the Server and handles notification routing.

Grafana: visualizes monitoring data.

Exporters: expose metrics of third‑party services to Prometheus.
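As a sketch of how these components fit together, a minimal prometheus.yml might wire the Server to one target and one AlertManager (all addresses and job names here are hypothetical):

```yaml
global:
  scrape_interval: 15s          # how often the Server scrapes each target

scrape_configs:
  - job_name: "node"            # hypothetical job scraping a node_exporter
    static_configs:
      - targets: ["10.10.10.10:9100"]

alerting:
  alertmanagers:                # where fired alerts are sent
    - static_configs:
        - targets: ["10.10.10.11:9093"]
```

The Server pulls each target's /metrics endpoint on the configured interval and forwards fired alerts to AlertManager for routing.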

Monitoring Metrics

Metric Definition

<metric name>{<label name>=<label value>, ...}

Metric name: must consist of letters, digits, underscores, or colons and match the regex [a-zA-Z_:][a-zA-Z0-9_:]*; colons are reserved for user‑defined recording rules and should not be used in exporter‑exposed names.

Label: key‑value pair that adds dimensionality for filtering and aggregation.

Example

http_request_total{status="200",method="POST"}
{__name__="http_request_total",status="200",method="POST"}

Both forms identify the same time‑series; label names beginning with a double underscore (__) are reserved for internal use.

Metric name http_request_total counts total HTTP requests.

Label status="200" indicates HTTP status code 200.

Label method="POST" indicates the request method.
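These labels can be used directly in a PromQL selector; assuming http_request_total is a counter, the query below returns the per‑second rate of successful POST requests:

```promql
rate(http_request_total{status="200", method="POST"}[5m])
```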

Metric Types

Counter

Monotonically increasing values (e.g., request counts, uptime).

Can only increase, or reset to zero (e.g., when the process restarts); often used with rate() to compute the per‑second rate of change.
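Two common reset‑aware queries over a counter (metric name reused from the earlier example):

```promql
# Per-second rate over the last 5 minutes; rate() compensates for
# counter resets, so restarts do not produce negative spikes.
rate(http_request_total[5m])

# Absolute growth over the last hour.
increase(http_request_total[1h])
```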

Gauge

Values that can go up or down (e.g., CPU or memory usage).

Most real‑time monitoring data are Gauges.
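Because Gauge values are meaningful as‑is, they are typically aggregated directly rather than rated; for example (node_memory_Active_bytes is a node_exporter metric, used here for illustration):

```promql
# Average active memory across all instances of each job.
avg by (job) (node_memory_Active_bytes)
```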

Summary

Provides quantile information for a distribution (e.g., request latency).

Quantiles are precomputed on the client side, so a Summary's quantiles cannot be meaningfully aggregated across instances; a Histogram is usually preferred when aggregation is needed.

Client‑side quantile calculation is more CPU‑intensive than a plain Counter or Gauge, but the accompanying _sum and _count series still allow averages to be derived.
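On the /metrics endpoint, a Summary appears as quantile‑labeled series plus _sum and _count (metric name and values below are illustrative):

```text
http_request_duration_seconds{quantile="0.5"} 0.052
http_request_duration_seconds{quantile="0.9"} 0.564
http_request_duration_seconds{quantile="0.99"} 2.372
http_request_duration_seconds_sum 8953.332
http_request_duration_seconds_count 27892
```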

Histogram

Counts observations in configurable cumulative buckets, using the le label to mark each bucket's upper bound; quantiles can then be estimated server‑side.
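A common pattern for turning bucketed counts into a quantile estimate (metric name assumed):

```promql
# Estimated 95th-percentile latency over 5 minutes, computed from
# the cumulative le buckets of a histogram.
histogram_quantile(0.95, rate(http_request_duration_seconds_bucket[5m]))
```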

Data Samples

Prometheus stores collected samples as time‑series in an in‑memory database and periodically persists them to disk.

Each time‑series is identified by a metric name and a set of label pairs.

Sample Composition

Metric: name and associated label set describing the sample.

Timestamp: millisecond‑precision time of collection.

Value: a 64‑bit floating‑point number representing the metric value.
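Put together, a single sample in the text exposition format looks like this (value and millisecond timestamp are illustrative; the timestamp is optional and normally assigned by the Server at scrape time):

```text
http_request_total{status="200",method="POST"} 1027 1395066363000
```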

Data Collection

Prometheus primarily uses a Pull model, unlike Push‑based systems.

Pull Model

Real‑time

Periodic scraping; latency depends on scrape interval, generally less real‑time than Push.

State Persistence

Targets must be able to serve data; the Server remains stateless.

Control

The Server decides what to scrape and how often.

Configuration Complexity

Targets can be discovered via static files or service‑discovery mechanisms, keeping configuration simple and decoupled.

Push Model

Real‑time

Data is sent immediately to the monitoring system, offering lower latency.

State Persistence

Targets are stateless; the Server must maintain target state.

Control

Targets dictate what and when to push.

Configuration Complexity

Each target must be configured with the Server’s address.

Service Discovery

Static Configuration

Traditional method using a static file listing target addresses (e.g., "targets": ["10.10.10.10:8080"]).
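With file‑based discovery (file_sd_configs), the target list lives in a separate file that Prometheus re‑reads on change, so targets can be edited without a restart; a sketch of such a file (addresses and labels hypothetical):

```yaml
- targets:
    - "10.10.10.10:8080"
  labels:
    env: "production"  # attached to every series scraped from these targets
```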

Dynamic Discovery

Suited for cloud environments with auto‑scaling.

Integrates with container orchestration platforms (e.g., Kubernetes) by listening to API changes and updating the target list automatically.
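A sketch of a Kubernetes scrape job: Prometheus watches the API server for pod changes, and relabeling keeps only pods that opt in via a conventional (not built‑in) annotation:

```yaml
scrape_configs:
  - job_name: "kubernetes-pods"
    kubernetes_sd_configs:
      - role: pod          # discover every pod via the Kubernetes API
    relabel_configs:
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
        action: keep       # drop pods without prometheus.io/scrape=true
        regex: "true"
```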

Data Storage

Local Storage

Built‑in time‑series database writes data to local disk.

Remote Storage

Used for large‑scale data retention.

Supports back‑ends such as OpenTSDB, InfluxDB, Elasticsearch via adapters.
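Remote back‑ends are wired in through the remote‑write (and remote‑read) protocol; the adapter endpoint below is hypothetical:

```yaml
remote_write:
  - url: "http://remote-adapter.example.com:9201/write"

remote_read:
  - url: "http://remote-adapter.example.com:9201/read"
```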

Data Query

Prometheus provides PromQL and HTTP APIs for querying collected data.

Visualization options include Grafana, Prometheus's built‑in expression browser, and console templates (the earlier PromDash dashboard is deprecated).
