How to Monitor Kubernetes with OpenTelemetry Collector: Step‑by‑Step Helm Deployment
This guide walks through installing OpenTelemetry Collector on a Kubernetes cluster using Helm, configuring DaemonSet and Deployment collectors, integrating Prometheus for metrics, and customizing receivers, processors, and exporters to achieve comprehensive observability of nodes, pods, containers, and cluster resources.
Kubernetes has become a widely adopted industry tool, increasing demand for observability solutions. OpenTelemetry provides various tools to help Kubernetes users monitor their clusters and services.
We will use the OpenTelemetry Collector Helm chart to install two collectors: a DaemonSet for node, pod, and container metrics and logs, and a Deployment for cluster‑level metrics and events.
<code>$ helm repo add open-telemetry https://open-telemetry.github.io/opentelemetry-helm-charts
$ helm repo update</code>
First, deploy a Prometheus instance to scrape metrics. Add the Prometheus community repo and install the kube-prometheus-stack chart with a custom prometheus-values.yaml that disables the default exporters and enables remote write.
<code>$ helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
$ helm repo update
$ helm upgrade --install prometheus prometheus-community/kube-prometheus-stack -f prometheus-values.yaml --namespace kube-otel --create-namespace</code>
After installation, expose Prometheus and Grafana via an Ingress at grafana.k8s.local (default username admin, password prom-operator).
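The contents of prometheus-values.yaml are not shown in this guide; a minimal sketch along these lines, assuming the standard kube-prometheus-stack value keys, disables the bundled exporters (the collector will gather those metrics instead), accepts remote-write, and publishes the Grafana Ingress host used above:

```yaml
# prometheus-values.yaml -- a hedged sketch, not a tested configuration
# Disable the bundled exporters that the OpenTelemetry Collector replaces
nodeExporter:
  enabled: false
kubeStateMetrics:
  enabled: false
prometheus:
  prometheusSpec:
    # Accept metrics pushed over the Prometheus remote-write protocol
    enableRemoteWriteReceiver: true
grafana:
  ingress:
    enabled: true
    hosts:
      - grafana.k8s.local
```

Adjust these keys to match the chart version you install; run `helm show values prometheus-community/kube-prometheus-stack` to confirm the current schema.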
Metrics Collection
Create an otel-collector-ds-values.yaml file that defines the collector mode, tolerations, service, receivers, processors, exporters, and pipelines.
<code># otel-collector-ds-values.yaml
mode: daemonset
tolerations:
  - key: node-role.kubernetes.io/control-plane
    effect: NoSchedule
service:
  enabled: true
config:
  receivers:
    hostmetrics:
      collection_interval: 10s
      root_path: /hostfs
      scrapers:
        filesystem:
          exclude_fs_types:
            fs_types:
              - autofs
              - binfmt_misc
              - bpf
              - cgroup2
              - configfs
              - debugfs
              - devpts
              - devtmpfs
              - fusectl
              - hugetlbfs
              - iso9660
              - mqueue
              - nsfs
              - overlay
              - proc
              - procfs
              - pstore
              - rpc_pipefs
              - securityfs
              - selinuxfs
              - squashfs
              - sysfs
              - tracefs
            match_type: strict
          exclude_mount_points:
            match_type: regexp
            mount_points:
              - /dev/*
              - /proc/*
              - /sys/*
              - /run/k3s/containerd/*
              - /var/lib/docker/*
              - /var/lib/kubelet/*
              - /snap/*
    kubeletstats:
      auth_type: serviceAccount
      collection_interval: 20s
      endpoint: "${env:K8S_NODE_NAME}:10250"
      extra_metadata_labels:
        - container.id
    otlp:
      protocols:
        grpc:
          endpoint: "${env:MY_POD_IP}:4317"
        http:
          endpoint: "${env:MY_POD_IP}:4318"
    prometheus:
      config:
        scrape_configs:
          - job_name: opentelemetry-collector
            scrape_interval: 10s
            static_configs:
              - targets: ["${env:MY_POD_IP}:8888"]
  processors:
    memory_limiter:
      check_interval: 5s
      limit_percentage: 80
      spike_limit_percentage: 25
    metricstransform:
      transforms:
        - action: update
          include: .+
          match_type: regexp
          operations:
            - action: add_label
              new_label: k8s.cluster.id
              new_value: abcd1234
            - action: add_label
              new_label: k8s.cluster.name
              new_value: youdian-k8s
    k8sattributes:
      extract:
        metadata:
          - k8s.namespace.name
          - k8s.deployment.name
          - k8s.statefulset.name
          - k8s.daemonset.name
          - k8s.cronjob.name
          - k8s.job.name
          - k8s.node.name
          - k8s.pod.name
          - k8s.pod.uid
          - k8s.pod.start_time
      filter:
        node_from_env_var: K8S_NODE_NAME
      passthrough: false
      pod_association:
        - sources:
            - from: resource_attribute
              name: k8s.pod.ip
        - sources:
            - from: resource_attribute
              name: k8s.pod.uid
        - sources:
            - from: connection
    batch: {}
  exporters:
    logging:
      loglevel: debug
    prometheus:
      endpoint: 0.0.0.0:9090
      metric_expiration: 180m
      resource_to_telemetry_conversion:
        enabled: true
  service:
    pipelines:
      metrics:
        receivers: [otlp, hostmetrics, kubeletstats, prometheus]
        processors: [memory_limiter, metricstransform, k8sattributes, batch]
        exporters: [prometheus]
</code>
Deploy the collector DaemonSet with the values file:
<code>$ helm upgrade --install opentelemetry-collector open-telemetry/opentelemetry-collector -f otel-collector-ds-values.yaml --namespace kube-otel --create-namespace
$ kubectl get pods -n kube-otel</code>
OTLP Receiver
The OTLP receiver listens on ports 4317 (gRPC) and 4318 (HTTP) to ingest traces, metrics, and logs in OTLP format.
<code>receivers:
  otlp:
    protocols:
      grpc:
        endpoint: ${env:MY_POD_IP}:4317
      http:
        endpoint: ${env:MY_POD_IP}:4318</code>
Hostmetrics Receiver
Collects host‑level metrics (here: filesystem, load, memory, and network) every 10 seconds from the host filesystem mounted at /hostfs. The filesystem scraper excludes many virtual and container‑related filesystem types and mount points.
<code>receivers:
  hostmetrics:
    collection_interval: 10s
    root_path: /hostfs
    scrapers:
      filesystem:
        exclude_fs_types:
          fs_types: [autofs, binfmt_misc, bpf, cgroup2, configfs, debugfs, devpts, devtmpfs, fusectl, hugetlbfs, iso9660, mqueue, nsfs, overlay, proc, procfs, pstore, rpc_pipefs, securityfs, selinuxfs, squashfs, sysfs, tracefs]
          match_type: strict
        exclude_mount_points:
          match_type: regexp
          mount_points: [/dev/*, /proc/*, /sys/*, /run/k3s/containerd/*, /var/lib/docker/*, /var/lib/kubelet/*, /snap/*]
      load: null
      memory: null
      network: null</code>
Kubeletstats Receiver
Fetches metrics from the kubelet API (default secure endpoint on port 10250) for nodes, pods, and containers. Authentication can be set to serviceAccount or tls, and extra metadata labels such as container.id can be added.
<code>receivers:
  kubeletstats:
    auth_type: serviceAccount
    collection_interval: 10s
    endpoint: "${env:K8S_NODE_NAME}:10250"
    insecure_skip_verify: true
    extra_metadata_labels:
      - container.id
    metric_groups:
      - node
      - pod</code>
Prometheus Receiver
Acts as a Prometheus‑compatible scraper, supporting the full scrape_config syntax. The example configuration adds a job that scrapes the collector’s own metrics on port 8888 and a Kubernetes service‑discovery job for pod metrics.
<code>receivers:
  prometheus:
    config:
      scrape_configs:
        - job_name: opentelemetry-collector
          scrape_interval: 10s
          static_configs:
            - targets: ["${env:MY_POD_IP}:8888"]
        - job_name: k8s
          kubernetes_sd_configs:
            - role: pod
          relabel_configs:
            - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
              regex: "true"
              action: keep
          metric_relabel_configs:
            - source_labels: [__name__]
              regex: "(request_duration_seconds.*|response_duration_seconds.*)"
              action: keep</code>
Processors
Batch groups spans, metrics, or logs to improve compression and reduce outbound connections. It should be placed after memory_limiter and any sampling processors.
<code>processors:
  batch: {}
  batch/2:
    send_batch_size: 10000
    timeout: 10s</code>
Memory Limiter prevents out‑of‑memory crashes by checking usage at a configurable interval and applying soft and hard limits.
<code>processors:
  memory_limiter:
    check_interval: 5s
    limit_percentage: 80
    spike_limit_percentage: 25</code>
K8sattributes extracts Kubernetes metadata (namespace, deployment, pod name, UID, etc.) and associates telemetry with pods using IP, UID, or connection information.
<code>processors:
  k8sattributes:
    extract:
      metadata:
        - k8s.namespace.name
        - k8s.deployment.name
        - k8s.statefulset.name
        - k8s.daemonset.name
        - k8s.cronjob.name
        - k8s.job.name
        - k8s.node.name
        - k8s.pod.name
        - k8s.pod.uid
        - k8s.pod.start_time
    filter:
      node_from_env_var: K8S_NODE_NAME
    passthrough: false
    pod_association:
      - sources:
          - from: resource_attribute
            name: k8s.pod.ip
      - sources:
          - from: resource_attribute
            name: k8s.pod.uid
      - sources:
          - from: connection</code>
Metricstransform adds the labels k8s.cluster.id=abcd1234 and k8s.cluster.name=youdian-k8s to every metric.
<code>processors:
  metricstransform:
    transforms:
      - action: update
        include: .+
        match_type: regexp
        operations:
          - action: add_label
            new_label: k8s.cluster.id
            new_value: abcd1234
          - action: add_label
            new_label: k8s.cluster.name
            new_value: youdian-k8s</code>
Exporters
Logging writes telemetry to stdout for debugging.
<code>exporters:
  logging:
    loglevel: debug</code>
Prometheus exposes metrics at 0.0.0.0:9090/metrics with a 180‑minute expiration and converts resource attributes to labels.
<code>exporters:
  prometheus:
    endpoint: 0.0.0.0:9090
    metric_expiration: 180m
    resource_to_telemetry_conversion:
      enabled: true</code>
ServiceMonitor for Prometheus
Create a ServiceMonitor so Prometheus scrapes the collector’s metrics endpoint.
<code>apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: otel-prom
  namespace: kube-otel
  labels:
    release: prometheus
spec:
  endpoints:
    - interval: 10s
      port: prom
      path: metrics
  selector:
    matchLabels:
      component: agent-collector
      app.kubernetes.io/instance: opentelemetry-collector</code>
After applying the ServiceMonitor, Prometheus displays collector metrics, which can also be visualized in Grafana.
Metrics include the custom k8s.cluster.id and k8s.cluster.name labels added by the metricstransform processor.
Grafana can query these metrics for dashboards.
Optionally, deploy the collector in Deployment mode to collect additional cluster‑wide metrics.
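That second values file is not shown in this guide. A minimal sketch, assuming the clusterMetrics and kubernetesEvents presets exposed by the opentelemetry-collector Helm chart (which wire up the k8s_cluster and k8sobjects receivers), might look like this:

```yaml
# otel-collector-deploy-values.yaml -- hedged sketch for a cluster-level collector
mode: deployment
# A single replica avoids duplicating cluster-wide metrics and events
replicaCount: 1
presets:
  # Collects cluster-level metrics (node/pod conditions, resource quotas, ...)
  clusterMetrics:
    enabled: true
  # Pulls Kubernetes events via the k8sobjects receiver
  kubernetesEvents:
    enabled: true
```

Install it as a separate release, e.g. `helm upgrade --install opentelemetry-collector-cluster open-telemetry/opentelemetry-collector -f otel-collector-deploy-values.yaml --namespace kube-otel`, and verify the preset names against the chart version you use.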
Ops Development Stories
Maintained by a like‑minded team, covering both operations and development. Topics span Linux ops, DevOps toolchain, Kubernetes containerization, monitoring, log collection, network security, and Python or Go development. Team members: Qiao Ke, wanger, Dong Ge, Su Xin, Hua Zai, Zheng Ge, Teacher Xia.