
Full‑Stack Monitoring of Kubernetes with Prometheus and Grafana (Part 4)

This guide walks through setting up Prometheus and Grafana to monitor a Kubernetes cluster and all business pods, covering the deployment of kube‑state‑metrics, the required RBAC objects, service definitions, and detailed Prometheus scrape configurations for both kube‑state‑metrics and cAdvisor.

Linux Cloud-Native Ops Stack

The article explains how to monitor a Kubernetes cluster and every business pod by deploying Prometheus and Grafana. It starts by deploying kube-state-metrics, which watches the Kubernetes API and exposes the state of cluster objects (Deployments, Pods, Nodes, and so on) as Prometheus metrics.

First, a ServiceAccount, ClusterRole, ClusterRoleBinding, Deployment, and Service are defined. The YAML manifests are:

apiVersion: v1
kind: ServiceAccount
metadata:
  name: kube-state-metrics
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: kube-state-metrics
rules:
- apiGroups: [""]
  resources:
  - nodes
  - pods
  - services
  - endpoints
  - replicationcontrollers
  - limitranges
  - persistentvolumeclaims
  - persistentvolumes
  - namespaces
  - resourcequotas
  verbs: ["list", "watch"]
- apiGroups: ["apps"]
  resources:
  - deployments
  - daemonsets
  - statefulsets
  - replicasets
  verbs: ["list", "watch"]
- apiGroups: ["batch"]
  resources:
  - jobs
  - cronjobs
  verbs: ["list", "watch"]
- apiGroups: ["autoscaling"]
  resources:
  - horizontalpodautoscalers
  verbs: ["list", "watch"]
- apiGroups: ["networking.k8s.io"]
  resources:
  - ingresses
  - networkpolicies
  verbs: ["list", "watch"]
- apiGroups: ["storage.k8s.io"]
  resources:
  - storageclasses
  - volumeattachments
  verbs: ["list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: kube-state-metrics
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: kube-state-metrics
subjects:
- kind: ServiceAccount
  name: kube-state-metrics
  namespace: kube-system
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: kube-state-metrics
  namespace: kube-system
  labels:
    app: kube-state-metrics
spec:
  replicas: 1
  selector:
    matchLabels:
      app: kube-state-metrics
  template:
    metadata:
      labels:
        app: kube-state-metrics
    spec:
      serviceAccountName: kube-state-metrics
      containers:
      - name: kube-state-metrics
        # DaoCloud mirror of registry.k8s.io, used for faster pulls from mainland China
        image: k8s.m.daocloud.io/kube-state-metrics/kube-state-metrics:v2.13.0
        ports:
        - name: http-metrics
          containerPort: 8080
        resources:
          requests:
            cpu: 100m
            memory: 256Mi
          limits:
            cpu: 500m
            memory: 512Mi
---
apiVersion: v1
kind: Service
metadata:
  name: kube-state-metrics
  namespace: kube-system
  labels:
    app: kube-state-metrics
spec:
  type: ClusterIP
  ports:
  - name: http-metrics
    port: 8080
    targetPort: 8080
    protocol: TCP
  selector:
    app: kube-state-metrics
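
One point worth checking, offered as a hedged note rather than as part of the original article: the ClusterRole above grants a narrower set of resources than kube-state-metrics watches by default. If the default collectors stay enabled, the pod will typically log "forbidden" list/watch errors for the resources it cannot read, while the permitted ones keep working. The sketch below shows extra rules that could be appended to the `rules:` list. It assumes the default collector set of v2.13.0 and is believed to match a subset of what the upstream example manifests grant.

```yaml
# Assumption: appended under `rules:` of the kube-state-metrics ClusterRole above.
# Only needed if the corresponding default collectors stay enabled.
- apiGroups: [""]
  resources:
  - configmaps
  - secrets
  verbs: ["list", "watch"]
- apiGroups: ["policy"]
  resources:
  - poddisruptionbudgets
  verbs: ["list", "watch"]
- apiGroups: ["certificates.k8s.io"]
  resources:
  - certificatesigningrequests
  verbs: ["list", "watch"]
- apiGroups: ["coordination.k8s.io"]
  resources:
  - leases
  verbs: ["list", "watch"]
- apiGroups: ["admissionregistration.k8s.io"]
  resources:
  - mutatingwebhookconfigurations
  - validatingwebhookconfigurations
  verbs: ["list", "watch"]
```

Cluster-wide list/watch on secrets is a broad permission. An alternative is to limit the enabled collectors with the `--resources` flag of kube-state-metrics so that they match the role as written.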

After deploying these resources, the article adds Prometheus scrape jobs. The first job collects metrics from the kube-state-metrics service:

- job_name: 'kubernetes-kube-state-metrics'
  scrape_interval: 15s
  scrape_timeout: 10s
  metrics_path: /metrics
  scheme: http
  kubernetes_sd_configs:
  # Discover Services, limited to the kube-system namespace.
  - role: service
    namespaces:
      names: ["kube-system"]
  relabel_configs:
  # Keep only the kube-state-metrics Service.
  - source_labels: [__meta_kubernetes_service_name]
    action: keep
    regex: ^kube-state-metrics$
  # Keep only its metrics port.
  - source_labels: [__meta_kubernetes_service_port_number]
    action: keep
    regex: ^8080$
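
For comparison, and as a quick way to debug discovery problems, the same endpoint could be scraped without service discovery. With `role: service`, Prometheus builds each target address from the Service's DNS name and port, so the two keep rules above should leave a single target equivalent to the static one sketched here. The job name is hypothetical. The host name is derived from the Service name and namespace in the manifest above and assumes Prometheus runs inside the cluster with the default cluster DNS.

```yaml
scrape_configs:
# Hypothetical job name; not part of the original article.
- job_name: 'kube-state-metrics-static'
  scrape_interval: 15s
  scrape_timeout: 10s
  metrics_path: /metrics
  scheme: http
  static_configs:
  # <service>.<namespace>.svc:<port>, taken from the Service manifest above.
  - targets: ['kube-state-metrics.kube-system.svc:8080']
```

If this static job is healthy while the discovered one shows no targets, the cause is more likely in discovery permissions or the relabel rules than in kube-state-metrics itself.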

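The second job scrapes cAdvisor metrics for each node over HTTPS through the Kubernetes API proxy. Each discovered node's address is rewritten to the API server (kubernetes.default.svc:443) and its metrics path to that node's proxy endpoint, so Prometheus never connects to the kubelet directly: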

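- job_name: 'kubernetes-cadvisor'
  scrape_interval: 15s
  scrape_timeout: 10s
  # Overridden per target by the __metrics_path__ relabel rule below.
  metrics_path: /metrics/cadvisor
  scheme: https
  tls_config:
    ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
    # Note: this disables certificate verification, so ca_file has no effect while it is true.
    insecure_skip_verify: true
  bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
  kubernetes_sd_configs:
  - role: node
  relabel_configs:
  # Send every scrape to the API server instead of the kubelet.
  - target_label: __address__
    replacement: kubernetes.default.svc:443
  # Build the per-node proxy path from the node name.
  - source_labels: [__meta_kubernetes_node_name]
    regex: (.+)
    target_label: __metrics_path__
    replacement: /api/v1/nodes/${1}/proxy/metrics/cadvisor
  # Copy Kubernetes node labels onto the scraped series.
  - action: labelmap
    regex: __meta_kubernetes_node_label_(.+)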
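
Scraping through the API server proxy only works if the identity behind the bearer token is allowed to `get` the `nodes/proxy` subresource, and node discovery itself needs `list` and `watch` on nodes. This part of the article does not show the Prometheus RBAC objects, so the following is a minimal sketch under assumptions: the ServiceAccount name `prometheus`, the namespace `monitoring`, and the object name `prometheus-node-proxy` are all hypothetical and should be replaced with the values used in the actual deployment.

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: prometheus-node-proxy   # hypothetical name
rules:
# Needed by the `role: node` and `role: service` discovery used above.
- apiGroups: [""]
  resources:
  - nodes
  - services
  verbs: ["get", "list", "watch"]
# Needed for /api/v1/nodes/<node>/proxy/metrics/cadvisor.
- apiGroups: [""]
  resources:
  - nodes/proxy
  verbs: ["get"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: prometheus-node-proxy   # hypothetical name
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: prometheus-node-proxy
subjects:
- kind: ServiceAccount
  name: prometheus        # assumption: adjust to the actual ServiceAccount
  namespace: monitoring   # assumption: adjust to the actual namespace
```

A 403 response on the cAdvisor targets in the Prometheus targets page usually points to a missing rule of this kind.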

Once the scrape jobs have been added to the Prometheus configuration, the Prometheus server (referred to as “p8s” in the original text) must be reloaded or restarted to apply them.
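
Prometheus re-reads its configuration when it receives SIGHUP, or when an HTTP POST is sent to `/-/reload`. The HTTP endpoint is only available if the server was started with `--web.enable-lifecycle`. The fragment below sketches the relevant container arguments. The container name and config path are assumptions, since this part of the article does not show the Prometheus Deployment.

```yaml
# Fragment of a Prometheus pod spec (hypothetical; adapt to the real Deployment).
containers:
- name: prometheus
  args:
  # Assumed config path; use whatever path the configuration is mounted at.
  - --config.file=/etc/prometheus/prometheus.yml
  # Enables POST /-/reload (and /-/quit) on the web port.
  - --web.enable-lifecycle
```

If the configuration is mounted from a ConfigMap, the updated file can take a short while to appear inside the pod, so a reload sent immediately after editing the ConfigMap may still read the old content.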

The article also includes three screenshots illustrating the configuration steps and the resulting monitoring dashboards; these are embedded as images.

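The article also lists the template identifiers 14249 and 15661 without further explanation. Given the Grafana context, these are most likely dashboard IDs to import from grafana.com, although the source does not say so explicitly.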
