Deploying Prometheus on Kubernetes with Operator, Grafana, and Alertmanager

This guide walks through setting up a complete Prometheus monitoring stack on a Kubernetes cluster, covering both traditional YAML deployments and the Prometheus Operator, configuring services, integrating Grafana dashboards, and enabling Alertmanager notifications including WeChat alerts.

dbaplus Community

Jun 15, 2020

Prometheus Operator Architecture

The Prometheus Operator introduces custom resource definitions (CRDs) such as Prometheus, ServiceMonitor, and Alertmanager, and manages the lifecycle of the resources created from them. It watches these resources and creates the underlying objects (StatefulSet, Service, ConfigMap, and so on) needed to run Prometheus and Alertmanager instances.
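
To make the relationship between these resources concrete, here is a minimal sketch of a Prometheus resource and a ServiceMonitor, assuming the operator and its CRDs are already installed in the cluster. The resource names, the team: ops label, and the replica count are hypothetical and are not taken from this guide's manifests. The service account, the app: prometheus label, and the webui port name reuse values from the manifests in this guide.

```yaml
# Hypothetical example: names and the "team: ops" label are illustrative assumptions.
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: example
  namespace: monitoring
spec:
  replicas: 1
  serviceAccountName: prometheus-k8s
  # Pick up every ServiceMonitor that carries this label
  serviceMonitorSelector:
    matchLabels:
      team: ops
---
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: example-app
  namespace: monitoring
  labels:
    team: ops
spec:
  # Scrape Services in this namespace that carry the label below
  selector:
    matchLabels:
      app: prometheus
  endpoints:
  - port: webui      # name of the Service port, not its number
    interval: 30s
```

With resources like these, the operator generates the scrape configuration itself, so no hand-written scrape_configs entry is needed for the selected Services.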

Traditional Kubernetes Deployment (YAML)

All components are deployed into a dedicated monitoring namespace.

# ns-monitoring.yaml
apiVersion: v1
kind: Namespace
metadata:
  name: monitoring

Create RBAC for Prometheus:

# prometheus-rbac.yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: prometheus-k8s
  namespace: monitoring
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: prometheus
rules:
- apiGroups: [""]
  resources: ["nodes","services","endpoints","pods"]
  verbs: ["get","list","watch"]
- apiGroups: [""]
  resources: ["configmaps"]
  verbs: ["get"]
- nonResourceURLs: ["/metrics"]
  verbs: ["get"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: prometheus
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: prometheus
subjects:
- kind: ServiceAccount
  name: prometheus-k8s
  namespace: monitoring

Prometheus configuration (ConfigMap):

# prometheus-core-cm.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-core
  namespace: monitoring
data:
  prometheus.yaml: |
    global:
      scrape_interval: 15s
      scrape_timeout: 15s
      evaluation_interval: 15s
    alerting:
      alertmanagers:
      - static_configs:
        - targets: ["10.254.127.110:9093"]
    rule_files:
      - "/etc/prometheus-rules/*.yml"
    scrape_configs:
      - job_name: 'kubernetes-apiservers'
        kubernetes_sd_configs:
        - role: endpoints
        scheme: https
        tls_config:
          ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        # bearer_token_file is a scrape-config field, not part of tls_config
        bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
        relabel_configs:
        - source_labels: [__meta_kubernetes_namespace,__meta_kubernetes_service_name,__meta_kubernetes_endpoint_port_name]
          action: keep
          regex: default;kubernetes;https

Alert rules (ConfigMap):

# prometheus-rules-cm.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-rules
  namespace: monitoring
data:
  node-up.yml: |
    groups:
    - name: server_rules
      rules:
      - alert: MachineDown
        expr: up{component="node-exporter"} != 1
        for: 1m
        labels:
          severity: "warning"
        annotations:
          summary: "Machine {{ $labels.instance }} is down"
          description: "Machine {{ $labels.instance }} has been down for >1m"
  cpu-usage.yml: |
    groups:
    - name: cpu_rules
      rules:
      - alert: HighCPUUsage
        expr: 100 - (avg by (instance) (irate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 85
        for: 1m
        labels:
          severity: "warning"
        annotations:
          summary: "CPU usage high on {{ $labels.instance }}"
          description: "CPU usage {{ $value }}% on {{ $labels.instance }}"
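
Rules like these can be checked before they are loaded into the cluster. The following is a minimal sketch of a promtool rule unit test for the MachineDown alert. It assumes the node-up.yml content has been saved as a local file next to the test, and that a more recent promtool is available than the one in the v2.0.0 image used in this guide, since rule unit tests were added in a later 2.x release. The test file name node-up-test.yml and the instance value node-1:9100 are hypothetical.

```yaml
# node-up-test.yml (hypothetical file name)
rule_files:
  - node-up.yml

evaluation_interval: 15s

tests:
  - interval: 15s
    input_series:
      # The target is up for two samples, then stays down
      - series: 'up{component="node-exporter", instance="node-1:9100"}'
        values: '1 1 0 0 0 0 0 0 0 0'
    alert_rule_test:
      # The target goes down at 30s, so with "for: 1m" the alert
      # should be firing well before this evaluation time
      - eval_time: 2m15s
        alertname: MachineDown
        exp_alerts:
          - exp_labels:
              severity: warning
              component: node-exporter
              instance: "node-1:9100"
            exp_annotations:
              summary: "Machine node-1:9100 is down"
              description: "Machine node-1:9100 has been down for >1m"
```

The test would be run with promtool test rules node-up-test.yml. It exercises the for: 1m clause and the annotation templates without needing a live node-exporter.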

Prometheus Service:

# prometheus-service.yaml
apiVersion: v1
kind: Service
metadata:
  name: prometheus
  namespace: monitoring
  labels:
    app: prometheus
    component: core
  annotations:
    prometheus.io/scrape: 'true'
spec:
  ports:
  - port: 9090
    targetPort: 9090
    protocol: TCP
    name: webui
  selector:
    app: prometheus
    component: core

Prometheus Deployment (example):

# prometheus-deploy.yaml
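apiVersion: apps/v1  # extensions/v1beta1 Deployments were removed in Kubernetes 1.16
kind: Deployment
metadata:
  name: prometheus-core
  namespace: monitoring
  labels:
    app: prometheus
    component: core
spec:
  replicas: 1
  selector:  # required by apps/v1; must match the pod template labels
    matchLabels:
      app: prometheus
      component: core
  template: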
    metadata:
      labels:
        app: prometheus
        component: core
    spec:
      serviceAccountName: prometheus-k8s
      nodeSelector:
        kubernetes.io/hostname: 192.168.10.2
      containers:
      - name: prometheus
        image: zqdlove/prometheus:v2.0.0
        args:
        - '--storage.tsdb.retention=15d'
        - '--config.file=/etc/prometheus/prometheus.yaml'
        - '--storage.tsdb.path=/home/prometheus_data'
        - '--web.enable-lifecycle'
        ports:
        - name: webui
          containerPort: 9090
        resources:
          requests:
            cpu: 20000m
            memory: 20000M
          limits:
            cpu: 20000m
            memory: 20000M
        volumeMounts:
        - name: data
          mountPath: /home/prometheus_data
        - name: config-volume
          mountPath: /etc/prometheus
        - name: rules-volume
          mountPath: /etc/prometheus-rules
        - name: time
          mountPath: /etc/localtime
      volumes:
      - name: data
        hostPath:
          path: /home/cdnadmin/prometheus_data
      - name: config-volume
        configMap:
          name: prometheus-core
      - name: rules-volume
        configMap:
          name: prometheus-rules
      - name: time
        hostPath:
          path: /etc/localtime

Ingress for external access:

# prometheus_Ingress.yaml
apiVersion: networking.k8s.io/v1  # requires Kubernetes 1.19+; extensions/v1beta1 Ingress was removed in 1.22
kind: Ingress
metadata:
  name: traefik-prometheus
  namespace: monitoring
spec:
  rules:
  - host: prometheus.test.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: prometheus
            port:
              number: 9090

Kubernetes Service Discovery in Prometheus

Static scrape example:

scrape_configs:
- job_name: 'prometheus'
  static_configs:
  - targets: ['localhost:9090','localhost:9100']
    labels:
      group: 'prometheus'

Kubernetes endpoints discovery for Prometheus itself:

scrape_configs:
- job_name: monitoring/kube-prometheus/0
  scrape_interval: 30s
  scrape_timeout: 10s
  metrics_path: /metrics
  scheme: http
  kubernetes_sd_configs:
  - role: endpoints
    namespaces:
      names: [monitoring]
  relabel_configs:
  - source_labels: [__meta_kubernetes_service_label_app]
    regex: prometheus
    action: keep
  - source_labels: [__meta_kubernetes_endpoint_port_name]
    regex: http
    action: keep
  - source_labels: [__meta_kubernetes_namespace]
    target_label: namespace
  - source_labels: [__meta_kubernetes_pod_name]
    target_label: pod
  - source_labels: [__meta_kubernetes_service_name]
    target_label: service

Discovery jobs for the kubelet, kube-apiserver, kube-state-metrics, and node-exporter are defined in the same way: each uses role: endpoints together with relabeling rules that add labels such as job, instance, and namespace.
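
As an illustration of such a job, the sketch below scrapes every Service endpoint whose Service carries the prometheus.io/scrape: 'true' annotation (the same annotation set on the Prometheus Service earlier) and copies the Service labels onto the scraped series. The job name is hypothetical. The sketch also assumes that the node-exporter Service is labelled component: node-exporter, which is what the MachineDown rule's up{component="node-exporter"} selector relies on.

```yaml
scrape_configs:
- job_name: 'kubernetes-service-endpoints'   # hypothetical job name
  kubernetes_sd_configs:
  - role: endpoints
  relabel_configs:
  # Keep only endpoints whose Service is annotated prometheus.io/scrape: 'true'
  - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
    action: keep
    regex: "true"
  # Copy every Service label (app, component, ...) onto the target
  - action: labelmap
    regex: __meta_kubernetes_service_label_(.+)
  - source_labels: [__meta_kubernetes_namespace]
    target_label: namespace
  - source_labels: [__meta_kubernetes_service_name]
    target_label: service
```

Driving discovery from an annotation means that a new exporter only needs an annotated Service, with no change to the Prometheus configuration.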

Grafana Deployment

# grafana-service.yaml
apiVersion: v1
kind: Service
metadata:
  name: grafana
  namespace: monitoring
  labels:
    app: grafana
    component: core
spec:
  ports:
  - port: 3000
  selector:
    app: grafana
    component: core

# grafana-deploy.yaml
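apiVersion: apps/v1  # extensions/v1beta1 Deployments were removed in Kubernetes 1.16
kind: Deployment
metadata:
  name: grafana-core
  namespace: monitoring
  labels:
    app: grafana
    component: core
spec:
  replicas: 1
  selector:  # required by apps/v1; must match the pod template labels
    matchLabels:
      app: grafana
      component: core
  template: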
    metadata:
      labels:
        app: grafana
        component: core
    spec:
      nodeSelector:
        kubernetes.io/hostname: 192.168.10.2
      containers:
      - name: grafana-core
        image: zqdlove/grafana:v5.0.0
        resources:
          limits:
            cpu: 10000m
            memory: 32000Mi
          requests:
            cpu: 10000m
            memory: 32000Mi
        env:
        - name: GF_AUTH_BASIC_ENABLED
          value: "true"
        - name: GF_AUTH_ANONYMOUS_ENABLED
          value: "false"
        readinessProbe:
          httpGet:
            path: /login
            port: 3000
        volumeMounts:
        - name: grafana-persistent-storage
          mountPath: /var
        - name: grafana
          mountPath: /etc/grafana
      volumes:
      - name: grafana-persistent-storage
        emptyDir: {}
      - name: grafana
        hostPath:
          path: /etc/grafana
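
Once Grafana is running, it needs a data source that points at Prometheus. One option, sketched here under the assumption that the Grafana version in use supports file-based provisioning (introduced in the 5.x series), is to place a data source definition under the mounted /etc/grafana directory. The file name and the data source name are hypothetical. The URL is built from the prometheus Service in the monitoring namespace defined earlier.

```yaml
# /etc/grafana/provisioning/datasources/prometheus.yaml (hypothetical file name)
apiVersion: 1
datasources:
- name: Prometheus
  type: prometheus
  access: proxy          # Grafana's backend proxies the queries
  url: http://prometheus.monitoring.svc:9090
  isDefault: true
```

The same data source can also be added by hand in the Grafana UI. The provisioning file simply makes the setup reproducible.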

Grafana Ingress (optional):

# grafana-ingress.yaml
apiVersion: networking.k8s.io/v1  # requires Kubernetes 1.19+; extensions/v1beta1 Ingress was removed in 1.22
kind: Ingress
metadata:
  name: traefik-grafana
  namespace: monitoring
spec:
  rules:
  - host: grafana.test.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: grafana
            port:
              number: 3000

Alertmanager Deployment and Integration

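# Alertmanager deployment via Helm (Helm 2 syntax)
helm install --name alertmanager alertmanager/ --namespace monitoring
# Helm 3 removed the --name flag; the equivalent command is:
# helm install alertmanager alertmanager/ --namespace monitoring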

Prometheus alerting section linking to Alertmanager:

alerting:
  alert_relabel_configs:
  - separator: ;
    regex: prometheus_replica
    action: labeldrop
  alertmanagers:
  - kubernetes_sd_configs:
    - role: endpoints
      namespaces:
        names: [monitoring]
    scheme: http
    path_prefix: /
    timeout: 10s
    relabel_configs:
    - source_labels: [__meta_kubernetes_service_name]
      regex: kube-prometheus-alertmanager
      action: keep
    - source_labels: [__meta_kubernetes_endpoint_port_name]
      regex: http
      action: keep

The sample rule in node-up.yml (part of the prometheus-rules ConfigMap shown above) fires when a node-exporter target has been down for more than one minute.

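WeChat Notification via Alertmanager

After creating an enterprise WeChat account, add the following to alertmanager.yml to enable WeChat alerts. These settings belong in the Alertmanager configuration, not in the Prometheus configuration. wechat_configs is a built-in Alertmanager receiver type, so no separate webhook service is required: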

global:
  wechat_api_url: 'https://qyapi.weixin.qq.com/cgi-bin/'
  wechat_api_secret: 'YOUR_SECRET'
  wechat_api_corp_id: 'YOUR_CORP_ID'

route:
  receiver: 'wechat'
  group_by: ['alertname']
  group_wait: 10s
  group_interval: 10s
  repeat_interval: 1h

receivers:
- name: 'wechat'
  wechat_configs:
  - send_resolved: true
    to_party: '1'
    agent_id: '1000002'
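
The single route above sends every alert to the same receiver. As a sketch of how the severity label set in the alert rules could drive routing, the variant below adds a child route for a severity: critical value. That value, the wechat-oncall receiver, and department ID '2' are hypothetical and do not appear in this guide's rules. The sketch uses the match field, which newer Alertmanager releases still accept but deprecate in favour of matchers.

```yaml
route:
  receiver: 'wechat'             # default receiver for everything else
  group_by: ['alertname']
  group_wait: 10s
  group_interval: 10s
  repeat_interval: 1h
  routes:
  - match:
      severity: critical         # hypothetical severity value
    receiver: 'wechat-oncall'    # hypothetical second receiver
    repeat_interval: 15m         # remind more often for critical alerts

receivers:
- name: 'wechat'
  wechat_configs:
  - send_resolved: true
    to_party: '1'
    agent_id: '1000002'
- name: 'wechat-oncall'
  wechat_configs:
  - send_resolved: true
    to_party: '2'                # hypothetical department ID
    agent_id: '1000002'
```

Alerts that match no child route fall back to the top-level receiver, so the existing warning rules keep their current behaviour.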

Summary

This guide demonstrates two ways to run Prometheus on a Kubernetes cluster: a manual YAML-based approach and the operator-based approach using Helm. It covers namespace creation, RBAC, ConfigMaps for Prometheus configuration and alert rules, Services, Deployments, DaemonSets for node-exporter, and Ingress resources for external access. It also provides service discovery configurations for static targets and Kubernetes objects, a Grafana deployment for visualisation, and an Alertmanager setup with a WeChat receiver for alert notifications.
