Master Prometheus: From Basics to Full-Scale Monitoring Deployment

This guide walks through Prometheus fundamentals, architecture, components, service discovery, Docker-based deployment, exporter integration, Alertmanager configuration, Grafana visualization, PromQL queries, and Consul service discovery, providing a complete end‑to‑end monitoring solution for cloud‑native environments.

MaGe Linux Operations

Prometheus Overview

Prometheus is an open‑source monitoring and alerting system with a time‑series database, originally developed by SoundCloud and now a CNCF project.

Implemented in Go, Prometheus collects metrics by pulling them over HTTP from exporters, and it can monitor thousands of nodes.

System Architecture

Basic Principle

Prometheus periodically scrapes HTTP endpoints (exporters) exposed by monitored components; no SDK is required. Common exporters exist for Varnish, HAProxy, Nginx, MySQL, and system metrics.

Workflow:

Prometheus server scrapes metrics from configured jobs and exporters. Short‑lived jobs can push their metrics to Pushgateway, which Prometheus then scrapes, and metrics can also be pulled from other Prometheus servers.

Metrics are stored locally and alert rules are evaluated; alerts are sent to Alertmanager.

Alertmanager processes alerts (deduplication, grouping, routing) and sends notifications.

Grafana visualizes the collected data.

Key Features

Multi‑dimensional data model.

Powerful query language (PromQL).

Standalone server without external storage dependencies.

HTTP‑based pull collection.

Pushgateway for metric pushes.

Service discovery or static configuration.

Rich visualizations via Grafana and other tools.

Components

Prometheus Server – data collection, storage, and PromQL support.

Alertmanager – handles alerts.

Pushgateway – intermediate gateway for push metrics.

Exporters – expose component metrics over HTTP.

Grafana – web UI for dashboards.

Service Discovery

Because Prometheus pulls metrics, maintaining static target lists by hand becomes cumbersome as the number of targets grows. Service discovery (SD) mechanisms such as Azure, Consul, DNS, EC2, and Kubernetes automate target discovery. Most of this guide uses static configuration; Consul‑based discovery is covered near the end.

Deploying Prometheus Server

1. Using the Official Image

Create prometheus.yml and rules.yml in /root/prometheus/conf/ on the host (the directory mounted into the container below), then run:

$ docker run -d -p 9090:9090 --name=prometheus \
  -v /root/prometheus/conf/:/etc/prometheus/ \
  prom/prometheus
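As a rough sketch of what a prometheus.yml for this setup might contain, the fragment below wires together the pieces described in the workflow: a scrape job, a rule file, and an Alertmanager endpoint. The intervals and the Alertmanager address are assumptions chosen for illustration, not values from the original setup.

```yaml
# /root/prometheus/conf/prometheus.yml on the host,
# visible as /etc/prometheus/prometheus.yml inside the container.
global:
  scrape_interval: 15s       # how often targets are scraped (assumed value)
  evaluation_interval: 15s   # how often alert rules are evaluated (assumed value)

rule_files:
  - /etc/prometheus/rules.yml   # the rules.yml created alongside this file

alerting:
  alertmanagers:
    - static_configs:
        - targets: ['alertmanager.example:9093']   # hypothetical Alertmanager address

scrape_configs:
  # Prometheus scraping its own metrics endpoint.
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']
```

Each exporter added later in this guide becomes one more entry under scrape_configs.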

2. Building a Custom Image

Pull the base image and unpack the binary package:

$ docker pull zhanganmin2017/prometheus:v2.9.0

Directory layout:

prometheus-2.9.0/
├── conf
│   ├── CentOS7-Base-163.repo
│   ├── container-entrypoint
│   ├── epel-7.repo
│   ├── prometheus-start.conf
│   ├── prometheus-start.sh
│   ├── prometheus.yml
│   ├── rules
│   │   └── service_down.yml
│   └── supervisord.conf
├── Dockerfile
└── package
    ├── console_libraries
    ├── consoles
    ├── LICENSE
    ├── NOTICE
    ├── prometheus
    ├── prometheus.yml
    └── promtool

Create prometheus-start.sh to launch Prometheus via Supervisor, and a prometheus-start.conf for Supervisor configuration.

#!/bin/bash
/bin/prometheus \
  --config.file=/data/prometheus/prometheus.yml \
  --storage.tsdb.path=/data/prometheus/data \
  --web.console.libraries=/data/prometheus/console_libraries \
  --web.enable-lifecycle \
  --web.console.templates=/data/prometheus/consoles

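The Supervisor configuration (prometheus-start.conf) defines how the process is started. With autostart=false, Prometheus does not start together with the container and has to be started through supervisorctl.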

[program:prometheus]
command=sh /etc/supervisord.d/prometheus-start.sh
autostart=false
startsecs=10
autorestart=false
startretries=0
user=root
redirect_stderr=true
stdout_logfile=/data/prometheus/prometheus.log
stopasgroup=true
killasgroup=true

Dockerfile (simplified):

FROM centos:7
MAINTAINER [email protected]
RUN rm -rf /etc/yum.repos.d/*.repo
ADD conf/CentOS7-Base-163.repo /etc/yum.repos.d/
ADD conf/epel-7.repo /etc/yum.repos.d/
RUN yum install -y openssh-server openssh-clients net-tools vim supervisor && yum clean all
RUN ssh-keygen -q -N "" -t rsa -f /etc/ssh/ssh_host_rsa_key && \
    ssh-keygen -q -N "" -t ecdsa -f /etc/ssh/ssh_host_ecdsa_key && \
    ssh-keygen -q -N "" -t ed25519 -f /etc/ssh/ssh_host_ed25519_key && \
    sed -i 's/#UseDNS yes/UseDNS no/g' /etc/ssh/sshd_config
ENV LANG=zh_CN.UTF-8
RUN echo "export LANG=zh_CN.UTF-8" >> /etc/profile.d/lang.sh && \
    ln -sf /usr/share/zoneinfo/Asia/Shanghai /etc/localtime && \
    localedef -c -f UTF-8 -i zh_CN zh_CN.utf8
COPY package/prometheus /bin/prometheus
COPY package/promtool /bin/promtool
COPY package/console_libraries/ /usr/local/src/console_libraries/
COPY package/consoles/ /usr/local/src/consoles/
COPY conf/prometheus.yml /usr/local/src/prometheus.yml
COPY conf/rules/ /usr/local/src/rules/
RUN echo "root:123456" | chpasswd
ADD conf/supervisord.conf /etc/supervisord.conf
ADD conf/prometheus-start.conf /etc/supervisord.d/prometheus-start.conf
ADD conf/container-entrypoint /container-entrypoint
ADD conf/prometheus-start.sh /etc/supervisord.d/prometheus-start.sh
RUN chmod +x /container-entrypoint
CMD ["/container-entrypoint"]

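Build the image and start a container. The run command references the image through a private registry (192.168.166.229/1an/prometheus:v2.9.0), so tag and push the built image there first, or adjust the image name. The last two commands open a shell in the container and start Prometheus: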

$ docker build -t zhanganmin2017/prometheus:v2.9.0 .
$ docker run -itd -h prometheus139-210 -m 8g \
  --cpuset-cpus=28-31 --name=prometheus139-210 \
  --network trust139 --ip=10.1.133.28 \
  -v /data/works/prometheus139-210:/data \
  192.168.166.229/1an/prometheus:v2.9.0
$ docker exec -it prometheus139-210 /bin/bash
$ supervisorctl start prometheus

Access the UI at IP:9090.

Deploying Exporters

1. Host Monitoring (node‑exporter)

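node‑exporter is meant to monitor the host itself, so running it in a Docker container is not recommended. If you do containerize it, use the host network and PID namespaces and mount the host root filesystem read‑only: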

$ docker run -d \
  --net="host" \
  --pid="host" \
  -v "/:/host:ro,rslave" \
  quay.io/prometheus/node-exporter \
  --path.rootfs=/host

Add the target to prometheus.yml and reload.
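A minimal sketch of the corresponding scrape job is shown below. It assumes node‑exporter listens on its default port 9100 and reuses the host address 192.168.16.173 that appears in the Consul example later in this guide; the appname label is added because the alert rule further down reads `$labels.appname`.

```yaml
scrape_configs:
  - job_name: 'node-exporter'
    static_configs:
      - targets: ['192.168.16.173:9100']   # host address assumed for illustration
        labels:
          appname: 'node-exporter'          # extra label attached to every series from this target
```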

2. Container Monitoring (cadvisor‑exporter)

$ docker run -d -h cadvisor139-216 --name=cadvisor139-216 --network trust139 -m 8g \
  --cpus=4 --ip=10.1.139.216 \
  --volume=/:/rootfs:ro \
  --volume=/var/run:/var/run:rw \
  --volume=/sys:/sys:ro \
  --volume=/var/lib/docker/:/var/lib/docker:ro \
  --volume=/dev/disk/:/dev/disk:ro \
  google/cadvisor:latest

Add the cadvisor job to prometheus.yml and reload.
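The cadvisor job might look like the sketch below. It assumes cAdvisor serves metrics on its default port 8080 at the container IP used in the run command; the longer scrape interval is an illustrative choice, since cAdvisor can expose a large number of series per container.

```yaml
scrape_configs:
  - job_name: 'cadvisor'
    scrape_interval: 30s                  # per-job override of the global interval (assumed value)
    static_configs:
      - targets: ['10.1.139.216:8080']    # container IP from the run command, default cAdvisor port
```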

3. Redis Monitoring (redis‑exporter)

$ docker run -d -h redis_exporter139-218 --name redis_exporter139-218 \
  --network trust139 --ip=10.1.139.218 -m 8g -p 9121:9121 \
  oliver006/redis_exporter --redis.password 123456

Configure the job in prometheus.yml and reload.
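One possible form of the job, assuming Prometheus can reach the exporter at the container IP and published port from the run command above:

```yaml
scrape_configs:
  - job_name: 'redis-exporter'
    static_configs:
      - targets: ['10.1.139.218:9121']   # redis_exporter container IP and port
```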

4. Application Monitoring (jmx‑exporter)

Download jmx_prometheus_javaagent-0.11.0.jar and a suitable config file, then add to JVM startup:

CATALINA_OPTS="-javaagent:/app/tomcat-8.5.23/lib/jmx_prometheus_javaagent-0.11.0.jar=12345:/app/tomcat-8.5.23/conf/config.yaml"

Add port 12345 on the application host as a target in prometheus.yml.
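The config.yaml passed to the Java agent controls which MBeans are exported and how they are named. The original does not show its contents, so the following is only a minimal catch‑all sketch; a production config would normally use narrower patterns to limit the number of series.

```yaml
# config.yaml for jmx_prometheus_javaagent (illustrative, not the original author's file)
lowercaseOutputName: true         # emit metric names in lower case
lowercaseOutputLabelNames: true   # emit label names in lower case
rules:
  - pattern: ".*"                 # export every MBean attribute the agent can read
```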

5. Process Monitoring (process‑exporter)

$ wget https://github.com/ncabatoff/process-exporter/releases/download/v0.5.0/process-exporter-0.5.0.linux-amd64.tar.gz
$ tar -xzvf process-exporter-0.5.0.linux-amd64.tar.gz
$ cd process-exporter-0.5.0.linux-amd64

Create process-name.yaml to define which processes are tracked:

process_names:
  - name: "{{.Matches}}"
    cmdline:
      - 'redis-shake'

Start the exporter in the background:

$ ./process-exporter -config.path process-name.yaml &

Add the exporter (port 9256) to prometheus.yml and reload.
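A sketch of the job for this exporter follows. The hostname is hypothetical; replace it with the machine where process‑exporter was started.

```yaml
scrape_configs:
  - job_name: 'process-exporter'
    static_configs:
      - targets: ['process-host.example:9256']   # hypothetical host, port 9256 as stated above
```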

Deploying Alertmanager

1. Overview

Alertmanager receives alerts from Prometheus, deduplicates, groups, routes, silences, and forwards them to receivers such as email, WeChat, PagerDuty, etc.

2. Configuration

global:
  resolve_timeout: 2m
  smtp_smarthost: smtp.163.com:25
  smtp_from: [email protected]
  smtp_auth_username: [email protected]
  smtp_auth_password: zxxx

templates:
  - '/data/alertmanager/template/wechat.tmpl'

route:
  group_by: ['alertname_wechat']
  group_wait: 1s
  group_interval: 1s
  receiver: 'wechat'
  repeat_interval: 1h
  routes:
    - receiver: wechat
      match_re:
        severity: wechat

receivers:
  - name: 'email'
    email_configs:
      - to: '[email protected]'
        send_resolved: true
  - name: 'wechat'
    wechat_configs:
      - corp_id: 'wwd402ce40b4720f24'
        to_party: '2'
        agent_id: '1000002'
        api_secret: '9nmYa4p12OkToCbh_oNc'
        send_resolved: true
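Routing decisions are made by comparing the labels on an incoming alert with the matchers on each child route; alerts that match no child route fall through to the top‑level receiver. The variation below is a sketch under assumed values, not the original configuration: it groups by the standard alertname label, uses longer grouping timers, and sends alerts labelled severity: Warning (the value used by the host rule later in this guide) to WeChat while everything else goes to email.

```yaml
route:
  group_by: ['alertname']
  group_wait: 30s          # wait before sending the first notification for a new group
  group_interval: 5m       # wait before notifying about new alerts added to a group
  repeat_interval: 1h
  receiver: 'email'        # default receiver when no child route matches
  routes:
    - receiver: 'wechat'
      match:
        severity: Warning  # exact match on the alert's severity label
```

Both receiver names must exist in the receivers section, as they do in the configuration above.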

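Run the Alertmanager container. The templates path in alertmanager.yml must point at a location inside the container; with the mounts below, the template directory is /etc/alertmanager/template rather than /data/alertmanager/template: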

$ docker run -d -p 9093:9093 --name alertmanager \
  -m 8g --cpus=4 \
  -v /opt/alertmanager.yml:/etc/alertmanager/alertmanager.yml \
  -v /opt/template:/etc/alertmanager/template \
  prom/alertmanager:latest

Access UI at IP:9093.

Alert Rules (PromQL)

Example host‑monitoring rule (host_sys.yml):

groups:
- name: Host
  rules:
  - alert: HostMemoryUsage
    expr: (node_memory_MemTotal_bytes - (node_memory_MemFree_bytes + node_memory_Buffers_bytes + node_memory_Cached_bytes)) / node_memory_MemTotal_bytes * 100 > 90
    for: 1m
    labels:
      name: Memory
      severity: Warning
    annotations:
      summary: "{{ $labels.appname }}"
      description: "Host memory usage exceeds 90%."
      value: "{{ $value }}"
  # Additional CPU, Load, Disk, DiskIO, Network rules omitted for brevity

Similar rule files are created for containers, Redis, and process monitoring.
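As an illustration of what such a rule file could look like, the sketch below defines one generic rule based on the built‑in `up` metric and one Redis rule based on `redis_up`, which redis_exporter exposes. The group name, alert names, and thresholds are assumptions rather than the original author's rules.

```yaml
groups:
- name: ServiceDown
  rules:
  - alert: TargetDown
    expr: up == 0                 # Prometheus sets up to 0 when a scrape fails
    for: 1m
    labels:
      severity: Warning
    annotations:
      summary: "{{ $labels.job }} target {{ $labels.instance }} is unreachable"
  - alert: RedisDown
    expr: redis_up == 0           # exporter is reachable but cannot talk to Redis
    for: 1m
    labels:
      severity: Warning
    annotations:
      summary: "Redis behind {{ $labels.instance }} is down"
```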

Grafana Visualization

Run Grafana container:

$ docker run -d -h grafana139-211 -m 8g \
  --network trust139 --ip=10.2.139.211 \
  --cpus=4 --name=grafana139-211 \
  -e "GF_SERVER_ROOT_URL=http://10.2.139.211" \
  -e "GF_SECURITY_ADMIN_PASSWORD=passwd" \
  grafana/grafana

Access Grafana at IP:3000 (user: admin, password: passwd). Add Prometheus as a data source, then import dashboards by ID (e.g., Node‑exporter 8919, cAdvisor 193, JMX‑exporter 8563, Redis‑exporter 2751, Process‑exporter 249).
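Instead of adding the data source through the UI, Grafana can also load it from a provisioning file at startup. The sketch below assumes the file is placed under /etc/grafana/provisioning/datasources/ in the container and that Prometheus is reachable at the container IP used earlier in this guide; adjust the URL to your environment.

```yaml
# e.g. /etc/grafana/provisioning/datasources/prometheus.yml (file name is illustrative)
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy                    # Grafana's backend issues the queries
    url: http://10.1.133.28:9090     # Prometheus address assumed from the run command above
    isDefault: true
```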

Consul Service Discovery

Deploy a Consul cluster using Docker (3 servers, 1 client). Register services via HTTP API, e.g.:

curl -X PUT -d '{"id":"192.168.16.173","name":"node-exporter","address":"192.168.16.173","port":9100,"tags":["DEV"],"checks":[{"http":"http://192.168.16.173:9100/","interval":"5s"}]}' http://172.17.0.4:8500/v1/agent/service/register

Prometheus configuration for Consul SD:

- job_name: 'consul'
  consul_sd_configs:
    - server: '192.168.16.173:8900'
      services: []
  relabel_configs:
    - source_labels: [__meta_consul_service]
      regex: 'consul'
      action: drop
    - source_labels: [__meta_consul_service]
      target_label: appname
    - source_labels: [__meta_consul_service_address]
      target_label: instance
    - source_labels: [__meta_consul_tags]
      target_label: job

Reload Prometheus and verify discovered targets in the UI.
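Consul joins a service's tags into a single string wrapped in the tag separator (a comma by default), so a service tagged DEV is exposed as `,DEV,` in `__meta_consul_tags`. A hedged variation of the job above that scrapes only DEV‑tagged services could look like this; the job name is hypothetical and the server address is taken from the registration example, so treat both as assumptions.

```yaml
scrape_configs:
  - job_name: 'consul-dev'
    consul_sd_configs:
      - server: '172.17.0.4:8500'       # Consul agent HTTP API, as used in the curl example
    relabel_configs:
      # Keep only targets whose tag list contains DEV.
      - source_labels: [__meta_consul_tags]
        regex: '.*,DEV,.*'
        action: keep
      # Copy the Consul service name into the appname label.
      - source_labels: [__meta_consul_service]
        target_label: appname
```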
