
How to Monitor Kubernetes with OpenTelemetry Collector: Step‑by‑Step Helm Deployment

This guide walks through installing OpenTelemetry Collector on a Kubernetes cluster using Helm, configuring DaemonSet and Deployment collectors, integrating Prometheus for metrics, and customizing receivers, processors, and exporters to achieve comprehensive observability of nodes, pods, containers, and cluster resources.


Kubernetes has become a de facto industry standard, and with its adoption the demand for observability solutions has grown. OpenTelemetry provides a range of tools to help Kubernetes users monitor their clusters and services.

We will use the OpenTelemetry Collector Helm chart to install two collectors: a DaemonSet for node, pod, and container metrics and logs, and a Deployment for cluster‑level metrics and events.

<code>$ helm repo add open-telemetry https://open-telemetry.github.io/opentelemetry-helm-charts
$ helm repo update</code>

First, deploy a Prometheus instance to scrape metrics. Add the Prometheus community repo and install the kube-prometheus-stack chart with a custom prometheus-values.yaml that disables the default exporters and enables remote-write.
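
A minimal prometheus-values.yaml along these lines could work; the keys below follow the kube-prometheus-stack chart and should be verified against your chart version:

<code># prometheus-values.yaml (sketch)
# Disable the default exporters so the OpenTelemetry Collector
# supplies node and cluster metrics instead.
kubeStateMetrics:
  enabled: false
nodeExporter:
  enabled: false
prometheus:
  prometheusSpec:
    # Accept metrics pushed via the remote-write protocol.
    enableRemoteWriteReceiver: true</code>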

<code>$ helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
$ helm repo update
$ helm upgrade --install prometheus prometheus-community/kube-prometheus-stack -f prometheus-values.yaml --namespace kube-otel --create-namespace</code>

After installation, expose Prometheus and Grafana via an Ingress at grafana.k8s.local (username admin, password prom-operator).
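
One way to expose the UIs is through the chart's own Ingress settings; a sketch, assuming an Ingress controller is already running (host names other than grafana.k8s.local are illustrative):

<code># Ingress settings in prometheus-values.yaml (sketch)
grafana:
  ingress:
    enabled: true
    hosts:
      - grafana.k8s.local
prometheus:
  ingress:
    enabled: true
    hosts:
      - prometheus.k8s.local   # illustrative host name</code>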

Metrics Collection

Create an otel-collector-ds-values.yaml file that defines the collector mode, tolerations, service, receivers, processors, exporters, and pipelines.

<code># otel-collector-ds-values.yaml
mode: daemonset

tolerations:
  - key: node-role.kubernetes.io/control-plane
    effect: NoSchedule

service:
  enabled: true

config:
  receivers:
    hostmetrics:
      collection_interval: 10s
      root_path: /hostfs
      scrapers:
        filesystem:
          exclude_fs_types:
            fs_types:
              - autofs
              - binfmt_misc
              - bpf
              - cgroup2
              - configfs
              - debugfs
              - devpts
              - devtmpfs
              - fusectl
              - hugetlbfs
              - iso9660
              - mqueue
              - nsfs
              - overlay
              - proc
              - procfs
              - pstore
              - rpc_pipefs
              - securityfs
              - selinuxfs
              - squashfs
              - sysfs
              - tracefs
            match_type: strict
          exclude_mount_points:
            match_type: regexp
            mount_points:
              - /dev/*
              - /proc/*
              - /sys/*
              - /run/k3s/containerd/*
              - /var/lib/docker/*
              - /var/lib/kubelet/*
              - /snap/*
    kubeletstats:
      auth_type: serviceAccount
      collection_interval: 20s
      endpoint: "${env:K8S_NODE_NAME}:10250"
      extra_metadata_labels:
        - container.id
    otlp:
      protocols:
        grpc:
          endpoint: "${env:MY_POD_IP}:4317"
        http:
          endpoint: "${env:MY_POD_IP}:4318"
    prometheus:
      config:
        scrape_configs:
          - job_name: opentelemetry-collector
            scrape_interval: 10s
            static_configs:
              - targets: ["${env:MY_POD_IP}:8888"]

  processors:
    memory_limiter:
      check_interval: 5s
      limit_percentage: 80
      spike_limit_percentage: 25
    metricstransform:
      transforms:
        - action: update
          include: .+
          match_type: regexp
          operations:
            - action: add_label
              new_label: k8s.cluster.id
              new_value: abcd1234
            - action: add_label
              new_label: k8s.cluster.name
              new_value: youdian-k8s
    k8sattributes:
      extract:
        metadata:
          - k8s.namespace.name
          - k8s.deployment.name
          - k8s.statefulset.name
          - k8s.daemonset.name
          - k8s.cronjob.name
          - k8s.job.name
          - k8s.node.name
          - k8s.pod.name
          - k8s.pod.uid
          - k8s.pod.start_time
      filter:
        node_from_env_var: K8S_NODE_NAME
      passthrough: false
      pod_association:
        - sources:
            - from: resource_attribute
              name: k8s.pod.ip
        - sources:
            - from: resource_attribute
              name: k8s.pod.uid
        - sources:
            - from: connection
    batch: {}

  exporters:
    logging:
      loglevel: debug
    prometheus:
      endpoint: 0.0.0.0:9090
      metric_expiration: 180m
      resource_to_telemetry_conversion:
        enabled: true

  service:
    pipelines:
      metrics:
        receivers: [otlp, hostmetrics, kubeletstats, prometheus]
        processors: [memory_limiter, metricstransform, k8sattributes, batch]
        exporters: [prometheus]
</code>

Deploy the collector DaemonSet with the values file:

<code>$ helm upgrade --install opentelemetry-collector open-telemetry/opentelemetry-collector -f otel-collector-ds-values.yaml --namespace kube-otel --create-namespace
$ kubectl get pods -n kube-otel</code>

OTLP Receiver

The OTLP receiver listens on ports 4317 (gRPC) and 4318 (HTTP) to ingest traces, metrics, and logs in OTLP format.

<code>receivers:
  otlp:
    protocols:
      grpc:
        endpoint: ${env:MY_POD_IP}:4317
      http:
        endpoint: ${env:MY_POD_IP}:4318</code>
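
The ${env:MY_POD_IP} and ${env:K8S_NODE_NAME} references rely on environment variables injected into the collector pod. Recent versions of the opentelemetry-collector chart set these automatically; if yours does not, they can be supplied through the Kubernetes downward API via the chart's extraEnvs value, for example:

<code># In otel-collector-ds-values.yaml (sketch)
extraEnvs:
  - name: MY_POD_IP
    valueFrom:
      fieldRef:
        fieldPath: status.podIP
  - name: K8S_NODE_NAME
    valueFrom:
      fieldRef:
        fieldPath: spec.nodeName</code>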

Hostmetrics Receiver

Collects host-level metrics such as CPU, disk, memory, and filesystem usage every 10 seconds from the host filesystem mounted at /hostfs. The filesystem scraper excludes many virtual and container-related filesystem types and mount points.

<code>receivers:
  hostmetrics:
    collection_interval: 10s
    root_path: /hostfs
    scrapers:
      filesystem:
        exclude_fs_types:
          fs_types: [autofs, binfmt_misc, bpf, cgroup2, configfs, debugfs, devpts, devtmpfs, fusectl, hugetlbfs, iso9660, mqueue, nsfs, overlay, proc, procfs, pstore, rpc_pipefs, securityfs, selinuxfs, squashfs, sysfs, tracefs]
          match_type: strict
        exclude_mount_points:
          match_type: regexp
          mount_points: [/dev/*, /proc/*, /sys/*, /run/k3s/containerd/*, /var/lib/docker/*, /var/lib/kubelet/*, /snap/*]
      load:
      memory:
      network:</code>

Kubeletstats Receiver

Fetches metrics from the kubelet API (default secure endpoint on port 10250) for nodes, pods, and containers. Authentication can be set to serviceAccount or tls. Extra metadata labels such as container.id can be added.

<code>receivers:
  kubeletstats:
    auth_type: serviceAccount
    collection_interval: 10s
    endpoint: "${env:K8S_NODE_NAME}:10250"
    insecure_skip_verify: true
    extra_metadata_labels:
      - container.id
    metric_groups:
      - node
      - pod</code>
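
For the tls auth mode, the receiver needs client certificates; a sketch with placeholder paths (mount the real certificates into the collector pod):

<code>receivers:
  kubeletstats:
    auth_type: "tls"
    # Placeholder paths; adjust to where the certificates are mounted.
    ca_file: /etc/otel/ca.crt
    cert_file: /etc/otel/apiserver.crt
    key_file: /etc/otel/apiserver.key
    endpoint: "${env:K8S_NODE_NAME}:10250"</code>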

Prometheus Receiver

Acts as a Prometheus-compatible scraper, supporting the full scrape_config syntax. The example configuration adds a job to scrape the collector's own metrics on port 8888 and a Kubernetes service-discovery job for pod metrics.

<code>receivers:
  prometheus:
    config:
      scrape_configs:
        - job_name: opentelemetry-collector
          scrape_interval: 10s
          static_configs:
            - targets: ["${env:MY_POD_IP}:8888"]
        - job_name: k8s
          kubernetes_sd_configs:
            - role: pod
          relabel_configs:
            - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
              regex: "true"
              action: keep
          metric_relabel_configs:
            - source_labels: [__name__]
              regex: "(request_duration_seconds.*|response_duration_seconds.*)"
              action: keep</code>
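
The k8s job above only keeps pods that opt in through the prometheus.io/scrape annotation. A hypothetical pod that would be scraped:

<code>apiVersion: v1
kind: Pod
metadata:
  name: example-app                # illustrative name
  annotations:
    prometheus.io/scrape: "true"   # matched by the relabel rule above
spec:
  containers:
    - name: app
      image: example/app:latest    # illustrative image
      ports:
        - containerPort: 8080</code>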

Processors

The batch processor groups spans, metrics, and logs to improve compression and reduce outbound connections. It should be placed after the memory_limiter and any sampling processors in the pipeline.

<code>processors:
  batch: {}
  batch/2:
    send_batch_size: 10000
    timeout: 10s</code>

Memory Limiter prevents out‑of‑memory crashes by checking usage at a configurable interval and applying soft and hard limits.

<code>processors:
  memory_limiter:
    check_interval: 5s
    limit_percentage: 80
    spike_limit_percentage: 25</code>

K8sattributes extracts Kubernetes metadata (namespace, deployment, pod name, UID, etc.) and associates telemetry with pods using IP, UID, or connection information.

<code>processors:
  k8sattributes:
    extract:
      metadata:
        - k8s.namespace.name
        - k8s.deployment.name
        - k8s.statefulset.name
        - k8s.daemonset.name
        - k8s.cronjob.name
        - k8s.job.name
        - k8s.node.name
        - k8s.pod.name
        - k8s.pod.uid
        - k8s.pod.start_time
    filter:
      node_from_env_var: K8S_NODE_NAME
    passthrough: false
    pod_association:
      - sources:
          - from: resource_attribute
            name: k8s.pod.ip
      - sources:
          - from: resource_attribute
            name: k8s.pod.uid
      - sources:
          - from: connection</code>

Metricstransform adds the labels k8s.cluster.id=abcd1234 and k8s.cluster.name=youdian-k8s to every metric.

<code>processors:
  metricstransform:
    transforms:
      - action: update
        include: .+
        match_type: regexp
        operations:
          - action: add_label
            new_label: k8s.cluster.id
            new_value: abcd1234
          - action: add_label
            new_label: k8s.cluster.name
            new_value: youdian-k8s</code>

Exporters

The logging exporter writes telemetry to stdout for debugging.

<code>exporters:
  logging:
    loglevel: debug</code>

The Prometheus exporter exposes metrics on 0.0.0.0:9090/metrics with a 180-minute expiration and converts resource attributes to labels.

<code>exporters:
  prometheus:
    endpoint: 0.0.0.0:9090
    metric_expiration: 180m
    resource_to_telemetry_conversion:
      enabled: true</code>

ServiceMonitor for Prometheus

Create a ServiceMonitor so Prometheus scrapes the collector's metrics endpoint.

<code>apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: otel-prom
  namespace: kube-otel
  labels:
    release: prometheus
spec:
  endpoints:
    - interval: 10s
      port: prom
      path: /metrics
  selector:
    matchLabels:
      component: agent-collector
      app.kubernetes.io/instance: opentelemetry-collector</code>

After applying the ServiceMonitor, Prometheus displays collector metrics, which can also be visualized in Grafana.

The metrics include the custom k8s.cluster.id and k8s.cluster.name labels added by the metricstransform processor.

Grafana can query these metrics for dashboards.

Optionally, deploy the collector in Deployment mode to collect additional cluster‑wide metrics.
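
A starting point for such a Deployment-mode values file, using the k8s_cluster and k8sobjects receivers (a sketch; the file name and pipeline wiring are illustrative):

<code># otel-collector-deploy-values.yaml (sketch)
mode: deployment
replicaCount: 1

config:
  receivers:
    # Cluster-level metrics (deployments, nodes, resource quotas, ...).
    k8s_cluster:
      collection_interval: 10s
    # Kubernetes events, collected as logs.
    k8sobjects:
      objects:
        - name: events
          mode: watch

  exporters:
    logging: {}

  service:
    pipelines:
      metrics:
        receivers: [k8s_cluster]
        exporters: [logging]
      logs:
        receivers: [k8sobjects]
        exporters: [logging]</code>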

Tags: Monitoring, observability, Kubernetes, OpenTelemetry, Prometheus, Helm
Written by Ops Development Stories

Maintained by a like-minded team covering both operations and development. Topics span Linux ops, the DevOps toolchain, Kubernetes containerization, monitoring, log collection, network security, and Python and Go development. Team members: Qiao Ke, wanger, Dong Ge, Su Xin, Hua Zai, Zheng Ge, Teacher Xia.
