
An Overview of the Prometheus Monitoring System

Prometheus, an open‑source monitoring and alerting toolkit originally developed by SoundCloud and now a CNCF project, offers multidimensional data models, flexible queries, pull‑based data collection, various metric types (counter, gauge, summary, histogram), local and remote storage, service discovery, and integrates with Grafana for visualization.

DevOps Cloud Academy

Prometheus

Introduction

Prometheus is an open‑source monitoring and alerting system originally developed by SoundCloud.

It is written in Go and was inspired by Borgmon, Google's internal monitoring system.

In 2016 it joined the Cloud Native Computing Foundation (CNCF) as its second hosted project, after Kubernetes.

Features

Multidimensional data model.

Flexible query language.

Supports both local and remote storage.

Defines an open metric data standard.

Pull‑based data collection over HTTP.

Static file and dynamic service discovery.

Easy to maintain.

Supports sharding, sampling and federation.

Architecture

Core Components

Server – periodically scrapes metrics from targets.

Target – exposes an HTTP endpoint for the server to scrape.

Alertmanager – receives alerts from the server and handles notification.

Grafana – visualizes collected metrics.

Exporters – expose third‑party service metrics to Prometheus.

Metrics

Metric Definition

<metric name>{<label name>=<label value>, ...}

Example

http_request_total{status="200",method="POST"}
{__name__="http_request_total",status="200",method="POST"}

Both lines identify the same time series: the metric name is itself stored under the reserved label __name__. Label names beginning with two underscores (__) are reserved for internal use.

The metric name http_request_total indicates the total number of HTTP requests.

The label status="200" selects requests that returned HTTP status code 200.

The label method="POST" selects requests made with the POST method.
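The equivalence of the two notations can be sketched in a few lines of Python. This is a minimal illustration, not Prometheus's own parser: the function name parse_series is hypothetical, and the regular expression does not unescape label values or cover every edge case.

```python
import re

# One label pair such as status="200". Escaped characters inside the quoted
# value are matched but not unescaped (a simplification for this sketch).
LABEL_RE = re.compile(r'([A-Za-z_][A-Za-z0-9_]*)\s*=\s*"((?:[^"\\]|\\.)*)"')


def parse_series(text: str) -> dict[str, str]:
    """Turn name{l="v",...} or {__name__="name",...} into one label dict."""
    name, _, rest = text.strip().partition("{")
    labels: dict[str, str] = {}
    if name.strip():
        # The short notation is shorthand for the reserved __name__ label.
        labels["__name__"] = name.strip()
    for match in LABEL_RE.finditer(rest.rstrip("}")):
        labels[match.group(1)] = match.group(2)
    return labels


short_form = parse_series('http_request_total{status="200",method="POST"}')
long_form = parse_series('{__name__="http_request_total",status="200",method="POST"}')
print(short_form == long_form)
```

Both calls yield the same dictionary, which is why the two lines name the same series.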

Metric Types

Counter

Monotonically increasing values (e.g., request counts).

Resets to zero when the instrumented process restarts; rate() and related functions detect such resets and compensate for them.

Often used with the rate() function to compute per‑second rates.
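The idea behind a per‑second rate over a counter can be sketched as follows. This is a simplified illustration under stated assumptions: the function name simple_rate is hypothetical, and Prometheus's actual rate() additionally extrapolates to the edges of the query window, which this sketch does not attempt.

```python
def simple_rate(samples: list[tuple[float, float]]) -> float:
    """Per-second increase over (timestamp_seconds, value) pairs, oldest first.

    A drop in value is treated as a counter reset: the counter is assumed to
    have restarted from zero, so the whole new value counts as an increase.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    increase = 0.0
    previous = samples[0][1]
    for _, value in samples[1:]:
        if value < previous:
            increase += value  # counter restarted from 0
        else:
            increase += value - previous
        previous = value
    elapsed = samples[-1][0] - samples[0][0]
    if elapsed <= 0:
        raise ValueError("timestamps must be increasing")
    return increase / elapsed


# Scraped every 15 s; the process restarts between t=30 and t=45.
window = [(0, 100), (15, 130), (30, 160), (45, 10), (60, 40)]
print(simple_rate(window))
```

Without the reset handling, the drop from 160 to 10 would register as a large negative change and distort the result.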

Gauge

Values that can go up or down (e.g., CPU or memory usage).

Well suited to point‑in‑time measurements such as current memory usage or queue length.

Summary

Provides quantiles of observed values (e.g., request latency).

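Quantiles are calculated inside the instrumented application, so they cannot be meaningfully aggregated across instances.

Also exposes the running sum and count of observations (the _sum and _count series), but no per‑bucket counts.

Client‑side quantile calculation generally costs the application more than incrementing histogram buckets.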
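One practical consequence of quantiles being computed per application instance is that averaging them does not give the quantile of the combined traffic. The sketch below uses made‑up latency values and a simple nearest‑rank quantile; the helper names are hypothetical and this is not how any Prometheus client library estimates quantiles.

```python
import math


def quantile(values: list[float], q: float) -> float:
    """Nearest-rank quantile of a non-empty list (0 < q <= 1)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q * len(ordered)))
    return ordered[rank - 1]


def compare(a: list[float], b: list[float], q: float) -> tuple[float, float]:
    """Return (average of per-instance quantiles, quantile of merged data)."""
    averaged = (quantile(a, q) + quantile(b, q)) / 2
    merged = quantile(a + b, q)
    return averaged, merged


# Hypothetical latencies in milliseconds: a busy fast instance, a quiet slow one.
instance_a = [10.0] * 90
instance_b = [100.0] * 10
print(compare(instance_a, instance_b, 0.9))
```

Here the averaged figure lands far from the 90th percentile of all 100 requests, because the average ignores how many observations each instance contributed.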

Histogram

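Counts observations in configurable buckets, each labeled with its inclusive upper bound, e.g. le="0.5". Buckets are cumulative: each one counts every observation less than or equal to its bound, and the le="+Inf" bucket counts all observations.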
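A minimal sketch of bucket counting, assuming cumulative buckets keyed by their upper bound. The Histogram class and the metric name req_seconds are hypothetical and stand in for what a real client library provides.

```python
import math


class Histogram:
    """Toy histogram with cumulative buckets (illustration only)."""

    def __init__(self, bounds: list[float]):
        # The +Inf bucket always exists and ends up equal to the total count.
        self.bounds = sorted(bounds) + [math.inf]
        self.bucket_counts = [0] * len(self.bounds)
        self.total = 0.0
        self.count = 0

    def observe(self, value: float) -> None:
        self.total += value
        self.count += 1
        # Cumulative: bump every bucket whose upper bound covers the value.
        for i, upper in enumerate(self.bounds):
            if value <= upper:
                self.bucket_counts[i] += 1

    def expose(self, name: str) -> list[str]:
        """Render the series as text lines, one per bucket plus sum and count."""
        lines = []
        for upper, n in zip(self.bounds, self.bucket_counts):
            le = "+Inf" if math.isinf(upper) else f"{upper:g}"
            lines.append(f'{name}_bucket{{le="{le}"}} {n}')
        lines.append(f"{name}_sum {self.total}")
        lines.append(f"{name}_count {self.count}")
        return lines


h = Histogram([0.1, 0.5, 1])
for seconds in (0.05, 0.3, 0.3, 2.0):
    h.observe(seconds)
print("\n".join(h.expose("req_seconds")))
```

Because each bucket is a plain counter, buckets from several instances can be summed before quantiles are estimated on the server side.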

Data Samples

Samples are stored as time‑series in an in‑memory database and periodically flushed to disk.

Each time‑series is identified by a metric name and a set of label pairs.

Sample Composition

Metric name and associated label set.

Timestamp with millisecond precision.

Floating‑point value (float64).
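The three parts of a sample map naturally onto a small record type. The sketch below is an illustrative model only, with hypothetical names (Sample, make_sample, group_by_series); it says nothing about how Prometheus lays out data internally.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    series_id: frozenset  # (label name, label value) pairs, including __name__
    timestamp_ms: int     # milliseconds since the Unix epoch
    value: float          # float64 in Prometheus; Python floats are 64-bit too


def make_sample(name: str, labels: dict[str, str], timestamp_ms: int, value: float) -> Sample:
    pairs = dict(labels)
    pairs["__name__"] = name
    # A frozenset makes the identity independent of label order.
    return Sample(frozenset(pairs.items()), int(timestamp_ms), float(value))


def group_by_series(samples: list[Sample]) -> dict[frozenset, list[tuple[int, float]]]:
    """Collect (timestamp, value) points under the series they belong to."""
    series: dict[frozenset, list[tuple[int, float]]] = {}
    for s in samples:
        series.setdefault(s.series_id, []).append((s.timestamp_ms, s.value))
    return series


points = [
    make_sample("http_request_total", {"status": "200", "method": "POST"}, 1700000000000, 10),
    make_sample("http_request_total", {"method": "POST", "status": "200"}, 1700000015000, 12),
    make_sample("http_request_total", {"status": "500", "method": "POST"}, 1700000015000, 1),
]
print(len(group_by_series(points)))
```

Changing any label value, as in the third sample, produces a different series even though the metric name is the same.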

Data Collection

Prometheus uses a pull‑based model, unlike push‑based systems.

Pull Model

Real‑time

Periodic scraping; latency depends on scrape interval.

State Persistence

Each target keeps the current values of its own metrics until they are scraped; the scraper holds no per‑target session state.

Enables simple, decoupled configuration.

Control

The server decides what and how often to scrape.

Configuration Complexity

Can be batch‑configured or discovered automatically; targets need not know the server.

Push Model

Real‑time

Data is pushed immediately to the monitoring system.

State Persistence

Targets are stateless; the server maintains target state.

Control

Targets dictate the reporting frequency and content.

Configuration Complexity

Each target must be configured with the server address.
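The pull side of this comparison can be sketched with the Python standard library: a target that serves its current metric values over HTTP, and a scraper that fetches them when it chooses. The metric name demo_requests_total, the helper names, and the simplified line parsing are assumptions for illustration; a real target would normally use an official client library.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Ignore any proxy settings from the environment; the target here is local.
_OPENER = urllib.request.build_opener(urllib.request.ProxyHandler({}))


class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        # The target only reports its current state; it knows nothing
        # about who scrapes it or how often.
        body = (
            "# TYPE demo_requests_total counter\n"
            f"demo_requests_total {self.server.requests_total}\n"
        ).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the example quiet


def start_target(port: int = 0) -> HTTPServer:
    """Start a target on localhost; port 0 lets the OS pick a free port."""
    server = HTTPServer(("127.0.0.1", port), MetricsHandler)
    server.requests_total = 0
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def scrape(url: str, timeout: float = 2.0) -> dict[str, float]:
    """Fetch one scrape and parse 'name value' lines (no timestamps handled)."""
    with _OPENER.open(url, timeout=timeout) as resp:
        text = resp.read().decode("utf-8")
    values = {}
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue
        name, _, value = line.rpartition(" ")
        values[name] = float(value)
    return values
```

A caller would start the target, then call scrape("http://127.0.0.1:<port>/metrics") on whatever interval it likes. The target never needs the scraper's address, which is the decoupling described above.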

Service Discovery

Static Configuration

Traditional method using static files; suitable for fixed environments.

Requires explicit target definitions, e.g., "targets": ["10.10.10.10:8080"].

Dynamic Discovery

Ideal for cloud environments with auto‑scaling.

Supported by container orchestration platforms (e.g., Kubernetes).

Prometheus watches the API for changes and updates its target list accordingly.
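Conceptually, each update from the discovery source is reconciled against the targets already known. The sketch below shows only that idea with plain sets; the function names are hypothetical and this is not Prometheus's discovery code.

```python
def reconcile(current: set[str], discovered: set[str]) -> tuple[set[str], set[str]]:
    """Return (targets to start scraping, targets to stop scraping)."""
    added = discovered - current
    removed = current - discovered
    return added, removed


def apply_update(current: set[str], discovered: set[str]) -> set[str]:
    """Return the new target set after applying one discovery update."""
    added, removed = reconcile(current, discovered)
    return (current | added) - removed


known = {"10.0.0.1:8080", "10.0.0.2:8080"}
# Hypothetical update after autoscaling replaced one instance.
latest = {"10.0.0.2:8080", "10.0.0.3:8080"}
print(reconcile(known, latest))
```

Targets that appear in neither set difference are left alone, so stable instances keep being scraped without interruption.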

Data Storage

Local Storage

Built‑in time‑series database writes data to local disk.

Remote Storage

Used for large‑scale data retention.

Supports back‑ends such as OpenTSDB, InfluxDB, Elasticsearch via adapters.

Data Query

PromQL and HTTP APIs allow flexible querying and visualization.

Grafana is the usual choice for dashboards; PromDash (now deprecated) and Prometheus's own console templates can also render charts.
