Introduction to Prometheus and Grafana for Monitoring and Alerting

This article provides a comprehensive overview of using Prometheus and Grafana for metric collection, storage, querying with PromQL, visualization, and alerting, including exporter integration, metric types, high‑availability setups, and practical examples for modern microservice architectures.

360 Tech Engineering

Jan 7, 2020

Monitoring and alerting are fundamental for service stability and performance optimization, especially in modern microservice architectures.

Prometheus is an open‑source system that combines a time‑series database, metric collection, and alerting. Paired with Grafana, it provides powerful visualization and alerting capabilities.

Prometheus collects metrics by scraping HTTP endpoints exposed by exporters or by instrumented applications. Client libraries exist for languages such as Java (Micrometer) and Go (client_golang), while PHP can push metrics through the Pushgateway instead.
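A push to the Pushgateway is an HTTP request whose body is metric text in the exposition format. The following is a minimal Python sketch of that idea, not the article's own code. The gateway address `http://pushgateway.example:9091` and the job name `nightly_task` are hypothetical, and the sketch only builds the request, leaving the actual send to the caller.

```python
import urllib.request
from urllib.parse import quote


def build_push_request(gateway_url, job, metrics_text):
    """Build (but do not send) a request that pushes metrics for one job.

    The Pushgateway accepts exposition-format text on /metrics/job/<job>.
    PUT replaces all metrics previously pushed for that job grouping.
    """
    url = f"{gateway_url.rstrip('/')}/metrics/job/{quote(job, safe='')}"
    return urllib.request.Request(
        url,
        data=metrics_text.encode("utf-8"),
        method="PUT",
        headers={"Content-Type": "text/plain; version=0.0.4"},
    )


# Hypothetical gateway address and job name, for illustration only.
# The body must end with a newline.
request = build_push_request(
    "http://pushgateway.example:9091",
    "nightly_task",
    "# TYPE task_execute_count counter\ntask_execute_count 10\n",
)
print(request.get_method(), request.full_url)
# Sending it would be: urllib.request.urlopen(request, timeout=5)
```

Prometheus then scrapes the Pushgateway like any other target, so the pushed values stay visible until they are overwritten or deleted.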

Metrics are stored in a local TSDB by default, but remote storage (e.g., InfluxDB, Elasticsearch, TimescaleDB) can be configured via the remote_write and remote_read interfaces.

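Four metric types are supported, each serving different analysis needs: Counter (a value that only increases, such as a request count), Gauge (a value that can rise and fall, such as load average), Histogram (observations counted in cumulative buckets, plus a sum and a count), and Summary (quantiles calculated on the client side, plus a sum and a count).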

Sample metric exposition format:

# HELP task_execute_count task execution count
# TYPE task_execute_count counter
task_execute_count{task="test1",instance="host1.huajiao.com"} 10
task_execute_count{task="test1",instance="host2.huajiao.com"} 20

# HELP system_load_average_1m 1‑minute load average
# TYPE system_load_average_1m gauge
system_load_average_1m{application="system-java"} 0.06

# HELP task_consume_all interface latency histogram
# TYPE task_consume_all histogram
task_consume_all_bucket{le="10"} 100
task_consume_all_bucket{le="20"} 300
task_consume_all_bucket{le="+Inf"} 400
task_consume_all_sum 10000
task_consume_all_count 400

# HELP go_gc_duration_seconds GC duration summary
# TYPE go_gc_duration_seconds summary
go_gc_duration_seconds{quantile="0"} 3.326e-05
...

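PromQL is the query language used to retrieve and compute metrics. Expressions select instant vectors (one sample per series at a single timestamp) or range vectors (a window of samples per series), and combine them with functions and aggregation operators such as rate, histogram_quantile, sum, and topk.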
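To make histogram_quantile less of a black box, the sketch below reimplements its core idea in Python: find the bucket that contains the requested rank, then interpolate linearly inside it. It is a simplified illustration and not the Prometheus source. The bucket counts are cumulative values chosen for the example, and edge cases that Prometheus handles (such as repairing non-monotonic buckets) are left out.

```python
import math


def histogram_quantile(q, buckets):
    """Estimate the q-quantile from cumulative histogram buckets.

    buckets is a list of (upper_bound, cumulative_count) pairs and must
    include the +Inf bucket.
    """
    if not 0 <= q <= 1:
        raise ValueError("q must be within [0, 1]")
    buckets = sorted(buckets)
    if len(buckets) < 2 or buckets[-1][0] != math.inf:
        raise ValueError("need at least two buckets, ending with +Inf")
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    # First bucket whose cumulative count reaches the rank.
    index = next(i for i, (_, count) in enumerate(buckets) if count >= rank)
    if index == len(buckets) - 1:
        # Rank falls in the +Inf bucket: report the highest finite bound.
        return buckets[-2][0]
    upper, count = buckets[index]
    if index == 0:
        if upper <= 0:
            return upper
        lower, below = 0.0, 0  # the first bucket is assumed to start at 0
    else:
        lower, below = buckets[index - 1]
    if count == below:
        return lower  # empty first bucket at q == 0, nothing to interpolate
    return lower + (upper - lower) * (rank - below) / (count - below)


# Illustrative cumulative counts: 100 observations <= 10, 300 <= 20, 400 total.
example = [(10.0, 100), (20.0, 300), (math.inf, 400)]
print(histogram_quantile(0.5, example))  # 15.0
print(histogram_quantile(0.9, example))  # 20.0
```

The second result shows why bucket boundaries matter: once the rank lands in the +Inf bucket, the estimate is capped at the largest finite boundary.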

Grafana integrates with Prometheus as a data source, enabling dashboards for QPS, latency distribution, and more.
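Behind a dashboard panel, the data source sends PromQL to the Prometheus HTTP API, and graph panels typically use range queries. The Python sketch below builds such a request URL for a QPS-style expression. The server address `http://prometheus.example:9090` and the helper name `range_query_url` are hypothetical, and the expression is only an example built on the counter shown earlier.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def range_query_url(base_url, promql, start, end, step_seconds):
    """Build a URL for the /api/v1/query_range endpoint.

    start and end are Unix timestamps in seconds; step_seconds is the
    resolution of the returned series.
    """
    params = urlencode(
        {"query": promql, "start": start, "end": end, "step": step_seconds}
    )
    return f"{base_url.rstrip('/')}/api/v1/query_range?{params}"


# Per-second execution rate over 5-minute windows, summed across instances.
url = range_query_url(
    "http://prometheus.example:9090",  # hypothetical server address
    "sum(rate(task_execute_count[5m]))",
    start=1578355200,
    end=1578358800,
    step_seconds=15,
)
print(url)
```

Fetching this URL returns JSON with one list of timestamped values per series, which is what a graph panel plots.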

Alerting can be handled either by Prometheus alerting rules, whose firing alerts are routed and delivered by Alertmanager, or by Grafana’s built‑in alerting. In both cases, webhook notifications are a common delivery method.
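A webhook receiver is simply an HTTP endpoint that accepts a JSON POST. The Python sketch below, using only the standard library, shows one way to handle an Alertmanager-style payload, in which each entry of `alerts` carries a `status`, `labels`, and `annotations`. The alert name, summary text, and port are hypothetical, and the server is defined but not started.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def summarize_alerts(payload):
    """Turn a webhook payload into one readable line per alert."""
    lines = []
    for alert in payload.get("alerts", []):
        labels = alert.get("labels", {})
        annotations = alert.get("annotations", {})
        lines.append(
            f"[{alert.get('status', 'unknown')}] "
            f"{labels.get('alertname', 'unnamed')}: "
            f"{annotations.get('summary', 'no summary')}"
        )
    return lines


class WebhookHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        try:
            payload = json.loads(self.rfile.read(length) or b"{}")
        except json.JSONDecodeError:
            self.send_response(400)
            self.end_headers()
            return
        if isinstance(payload, dict):
            for line in summarize_alerts(payload):
                print(line)  # replace with a chat or paging integration
        self.send_response(200)
        self.end_headers()


def serve(port=9000):
    """Start the receiver; the port is an arbitrary choice."""
    HTTPServer(("", port), WebhookHandler).serve_forever()


# Hypothetical payload, trimmed to the fields used above.
sample = {
    "status": "firing",
    "alerts": [
        {
            "status": "firing",
            "labels": {"alertname": "HighTaskLatency"},
            "annotations": {"summary": "p99 above 20"},
        }
    ],
}
print(summarize_alerts(sample))
```

Keeping the formatting in a plain function separate from the HTTP handler makes the notification logic easy to test without starting a server.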

For high availability, remote storage and multiple Prometheus instances are recommended. The article describes a TimescaleDB‑based solution with a PostgreSQL adapter, including a Docker run command and configuration snippets.

Operational tips include defining rule files that pre‑compute frequently used queries (recording rules) and setting the global evaluation_interval, which controls how often those rules are evaluated.

The article concludes that Prometheus is gaining popularity and often offers a more suitable, out‑of‑the‑box solution for modern production environments compared to legacy systems like Nagios.
