Deploying Prometheus on Kubernetes with Operator, Grafana, and Alertmanager
This guide walks through setting up a complete Prometheus monitoring stack on a Kubernetes cluster, covering both traditional YAML deployments and the Prometheus Operator, configuring services, integrating Grafana dashboards, and enabling Alertmanager notifications including WeChat alerts.
Prometheus Operator Architecture
The Prometheus Operator introduces custom resources (defined through CRDs) such as Prometheus, ServiceMonitor, and Alertmanager, and manages their lifecycle. The operator watches these resources and creates the underlying StatefulSets, Services, ConfigMaps, and related objects needed to run Prometheus and Alertmanager instances.
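As a rough illustration of how these custom resources fit together, the sketch below declares a Prometheus instance that picks up any ServiceMonitor carrying a given label. It assumes the operator and its CRDs are already installed; the resource names (k8s, example-app), the team: monitoring label, and the port name web are hypothetical and not taken from the original setup.

```yaml
# Sketch only: names, labels, and the port name are illustrative assumptions.
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: k8s                        # hypothetical instance name
  namespace: monitoring
spec:
  replicas: 1
  serviceAccountName: prometheus-k8s
  # Select every ServiceMonitor labelled team=monitoring.
  serviceMonitorSelector:
    matchLabels:
      team: monitoring
---
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: example-app                # hypothetical application
  namespace: monitoring
  labels:
    team: monitoring               # must match serviceMonitorSelector above
spec:
  # Scrape Services that carry this label.
  selector:
    matchLabels:
      app: example-app
  endpoints:
    - port: web                    # name of the Service port to scrape (assumed)
      interval: 30s
```

With resources like these, the operator generates the scrape configuration itself, so there is no hand-written prometheus.yaml to maintain for the selected Services.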
Traditional Kubernetes Deployment (YAML)
All components are deployed into a dedicated monitoring namespace.
# ns-monitoring.yaml
apiVersion: v1
kind: Namespace
metadata:
  name: monitoring
Create RBAC for Prometheus:
# prometheus-rbac.yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: prometheus-k8s
  namespace: monitoring
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: prometheus
rules:
  - apiGroups: [""]
    resources: ["nodes","services","endpoints","pods"]
    verbs: ["get","list","watch"]
  - apiGroups: [""]
    resources: ["configmaps"]
    verbs: ["get"]
  - nonResourceURLs: ["/metrics"]
    verbs: ["get"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: prometheus
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: prometheus
subjects:
  - kind: ServiceAccount
    name: prometheus-k8s
    namespace: monitoring
Prometheus configuration (ConfigMap):
# prometheus-core-cm.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-core
  namespace: monitoring
data:
  prometheus.yaml: |
    global:
      scrape_interval: 15s
      scrape_timeout: 15s
      evaluation_interval: 15s
    alerting:
      alertmanagers:
        - static_configs:
            - targets: ["10.254.127.110:9093"]
    rule_files:
      - "/etc/prometheus-rules/*.yml"
    scrape_configs:
      - job_name: 'kubernetes-apiservers'
        kubernetes_sd_configs:
          - role: endpoints
        scheme: https
        tls_config:
          ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
        relabel_configs:
          # Keep only the "https" port of the "kubernetes" Service in the "default" namespace.
          - source_labels: [__meta_kubernetes_namespace,__meta_kubernetes_service_name,__meta_kubernetes_endpoint_port_name]
            action: keep
            regex: default;kubernetes;https
Alert rules (ConfigMap):
# prometheus-rules-cm.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-rules
  namespace: monitoring
data:
  node-up.yml: |
    groups:
      - name: server_rules
        rules:
          - alert: MachineDown
            expr: up{component="node-exporter"} != 1
            for: 1m
            labels:
              severity: "warning"
            annotations:
              summary: "Machine {{ $labels.instance }} is down"
              description: "Machine {{ $labels.instance }} has been down for >1m"
  cpu-usage.yml: |
    groups:
      - name: cpu_rules
        rules:
          - alert: HighCPUUsage
            expr: 100 - (avg by (instance) (irate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 85
            for: 1m
            labels:
              severity: "warning"
            annotations:
              summary: "CPU usage high on {{ $labels.instance }}"
              description: "CPU usage {{ $value }}% on {{ $labels.instance }}"
Prometheus Service:
# prometheus-service.yaml
apiVersion: v1
kind: Service
metadata:
  name: prometheus
  namespace: monitoring
  labels:
    app: prometheus
    component: core
  annotations:
    prometheus.io/scrape: 'true'
spec:
  ports:
    - port: 9090
      targetPort: 9090
      protocol: TCP
      name: webui
  selector:
    app: prometheus
    component: core
Prometheus Deployment (example):
# prometheus-deploy.yaml
# apps/v1 replaces extensions/v1beta1, which no longer serves Deployments as of Kubernetes 1.16.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: prometheus-core
  namespace: monitoring
  labels:
    app: prometheus
    component: core
spec:
  replicas: 1
  # apps/v1 requires an explicit selector that matches the pod template labels.
  selector:
    matchLabels:
      app: prometheus
      component: core
  template:
    metadata:
      labels:
        app: prometheus
        component: core
    spec:
      serviceAccountName: prometheus-k8s
      # Pin the pod to the node that holds the hostPath data directory.
      nodeSelector:
        kubernetes.io/hostname: 192.168.10.2
      containers:
        - name: prometheus
          image: zqdlove/prometheus:v2.0.0
          args:
            - '--storage.tsdb.retention=15d'
            - '--config.file=/etc/prometheus/prometheus.yaml'
            - '--storage.tsdb.path=/home/prometheus_data'
            - '--web.enable-lifecycle'
          ports:
            - name: webui
              containerPort: 9090
          resources:
            requests:
              cpu: 20000m
              memory: 20000M
            limits:
              cpu: 20000m
              memory: 20000M
          volumeMounts:
            - name: data
              mountPath: /home/prometheus_data
            - name: config-volume
              mountPath: /etc/prometheus
            - name: rules-volume
              mountPath: /etc/prometheus-rules
            - name: time
              mountPath: /etc/localtime
      volumes:
        - name: data
          hostPath:
            path: /home/cdnadmin/prometheus_data
        - name: config-volume
          configMap:
            name: prometheus-core
        - name: rules-volume
          configMap:
            name: prometheus-rules
        - name: time
          hostPath:
            path: /etc/localtime
Ingress for external access:
# prometheus_Ingress.yaml
# networking.k8s.io/v1 replaces extensions/v1beta1, which no longer serves Ingress as of Kubernetes 1.22.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: traefik-prometheus
  namespace: monitoring
spec:
  rules:
    - host: prometheus.test.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: prometheus
                port:
                  number: 9090
Kubernetes Service Discovery in Prometheus
Static scrape example:
scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090','localhost:9100']
        labels:
          group: 'prometheus'
Kubernetes endpoints discovery for Prometheus itself:
scrape_configs:
  - job_name: monitoring/kube-prometheus/0
    scrape_interval: 30s
    scrape_timeout: 10s
    metrics_path: /metrics
    scheme: http
    kubernetes_sd_configs:
      - role: endpoints
        namespaces:
          names: [monitoring]
    relabel_configs:
      - source_labels: [__meta_kubernetes_service_label_app]
        regex: prometheus
        action: keep
      - source_labels: [__meta_kubernetes_endpoint_port_name]
        regex: http
        action: keep
      - source_labels: [__meta_kubernetes_namespace]
        target_label: namespace
      - source_labels: [__meta_kubernetes_pod_name]
        target_label: pod
      - source_labels: [__meta_kubernetes_service_name]
        target_label: service
Typical discovery jobs for kubelet metrics, kube-apiserver, kube-state-metrics, and node-exporter are defined similarly, using role: endpoints and relabeling rules that add labels such as job, instance, and namespace.
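For instance, a node-exporter job following the same pattern might look like the sketch below. It assumes node-exporter is exposed through a Service named node-exporter in the monitoring namespace, with a port named metrics and a component label on the Service; none of these names appear in the original configuration, so adjust them to your environment.

```yaml
# Sketch only: Service name, port name, and labels are assumptions.
scrape_configs:
  - job_name: 'node-exporter'
    kubernetes_sd_configs:
      - role: endpoints
        namespaces:
          names: [monitoring]
    relabel_configs:
      # Keep only endpoints that belong to the (assumed) node-exporter Service.
      - source_labels: [__meta_kubernetes_service_name]
        regex: node-exporter
        action: keep
      # Keep only the (assumed) metrics port.
      - source_labels: [__meta_kubernetes_endpoint_port_name]
        regex: metrics
        action: keep
      - source_labels: [__meta_kubernetes_namespace]
        target_label: namespace
      # Record which node the backing pod runs on.
      - source_labels: [__meta_kubernetes_pod_node_name]
        target_label: node
      # Copy the Service's "component" label so that selectors such as
      # up{component="node-exporter"} can match these targets.
      - source_labels: [__meta_kubernetes_service_label_component]
        target_label: component
```

The two keep actions narrow the discovered endpoints to a single Service and port. The remaining rules only attach labels and do not filter anything.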
Grafana Deployment
# grafana-service.yaml
apiVersion: v1
kind: Service
metadata:
  name: grafana
  namespace: monitoring
  labels:
    app: grafana
    component: core
spec:
  ports:
    - port: 3000
  selector:
    app: grafana
    component: core
# grafana-deploy.yaml
# apps/v1 replaces extensions/v1beta1, which no longer serves Deployments as of Kubernetes 1.16.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: grafana-core
  namespace: monitoring
  labels:
    app: grafana
    component: core
spec:
  replicas: 1
  # apps/v1 requires an explicit selector that matches the pod template labels.
  selector:
    matchLabels:
      app: grafana
      component: core
  template:
    metadata:
      labels:
        app: grafana
        component: core
    spec:
      nodeSelector:
        kubernetes.io/hostname: 192.168.10.2
      containers:
        - name: grafana-core
          image: zqdlove/grafana:v5.0.0
          resources:
            limits:
              cpu: 10000m
              memory: 32000Mi
            requests:
              cpu: 10000m
              memory: 32000Mi
          env:
            - name: GF_AUTH_BASIC_ENABLED
              value: "true"
            - name: GF_AUTH_ANONYMOUS_ENABLED
              value: "false"
          readinessProbe:
            httpGet:
              path: /login
              port: 3000
          volumeMounts:
            - name: grafana-persistent-storage
              mountPath: /var
            - name: grafana
              mountPath: /etc/grafana
      volumes:
        # emptyDir data is lost when the pod is rescheduled.
        - name: grafana-persistent-storage
          emptyDir: {}
        - name: grafana
          hostPath:
            path: /etc/grafana
Grafana Ingress (optional):
# grafana-ingress.yaml
# networking.k8s.io/v1 replaces extensions/v1beta1, which no longer serves Ingress as of Kubernetes 1.22.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: traefik-grafana
  namespace: monitoring
spec:
  rules:
    - host: grafana.test.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: grafana
                port:
                  number: 3000
Alertmanager Deployment and Integration
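# alertmanager deployment via Helm (Helm 2 syntax; alertmanager/ is a local chart directory)
# With Helm 3 the release name is positional: helm install alertmanager alertmanager/ --namespace monitoring
helm install --name alertmanager alertmanager/ --namespace monitoring
Prometheus alerting section linking to Alertmanager: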
alerting:
  alert_relabel_configs:
    - separator: ;
      regex: prometheus_replica
      action: labeldrop
  alertmanagers:
    - kubernetes_sd_configs:
        - role: endpoints
          namespaces:
            names: [monitoring]
      scheme: http
      path_prefix: /
      timeout: 10s
      relabel_configs:
        - source_labels: [__meta_kubernetes_service_name]
          regex: kube-prometheus-alertmanager
          action: keep
        - source_labels: [__meta_kubernetes_endpoint_port_name]
          regex: http
          action: keep
The sample alert rule (node-up.yml in the ConfigMap shown above) fires when a node-exporter target has been down for more than one minute.
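One way to check that the MachineDown rule behaves as described is a promtool rule unit test. The sketch below assumes a promtool release that includes the test rules subcommand (newer than the v2.0.0 image used in this guide) and that the rule group has been saved locally as node-up.yml. The file name node-up-test.yml and the instance value node1:9100 are made up for the test.

```yaml
# node-up-test.yml (hypothetical file name); run with: promtool test rules node-up-test.yml
rule_files:
  - node-up.yml
evaluation_interval: 1m
tests:
  - interval: 1m
    input_series:
      # The target is up for two samples, then down from minute 2 onward.
      - series: 'up{component="node-exporter", instance="node1:9100"}'
        values: '1 1 0 0 0 0'
    alert_rule_test:
      # By minute 5 the condition has held for longer than the rule's "for: 1m".
      - eval_time: 5m
        alertname: MachineDown
        exp_alerts:
          - exp_labels:
              severity: warning
              component: node-exporter
              instance: node1:9100
            exp_annotations:
              summary: "Machine node1:9100 is down"
              description: "Machine node1:9100 has been down for >1m"
```

A test like this catches label or threshold mistakes before the rules ConfigMap is applied to the cluster.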
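WeChat Notification via Alertmanager
After creating a WeChat Work (enterprise WeChat) account, add the following to alertmanager.yml to enable WeChat alerts. These settings belong in the Alertmanager configuration, not in the Prometheus configuration. Prometheus only needs its alerting section to point at Alertmanager.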
global:
  wechat_api_url: 'https://qyapi.weixin.qq.com/cgi-bin/'
  wechat_api_secret: 'YOUR_SECRET'
  wechat_api_corp_id: 'YOUR_CORP_ID'
route:
  receiver: 'wechat'
  group_by: ['alertname']
  group_wait: 10s
  group_interval: 10s
  repeat_interval: 1h
receivers:
  - name: 'wechat'
    wechat_configs:
      - send_resolved: true
        to_party: '1'
        agent_id: '1000002'
Summary
This guide demonstrates two ways to run Prometheus on a Kubernetes cluster: a manual YAML-based approach and the operator-based approach using Helm. It covers namespace creation, RBAC, ConfigMaps for Prometheus configuration and alert rules, Services, Deployments, DaemonSets for node-exporter, and Ingress resources for external access. It also provides service discovery configurations for static targets and Kubernetes objects, a Grafana deployment for visualisation, and an Alertmanager setup (including a WeChat receiver) for alert notifications.
dbaplus Community