How to Monitor Kubernetes with OpenTelemetry Collector: Step‑by‑Step Helm Deployment
This guide walks through installing OpenTelemetry Collector on a Kubernetes cluster using Helm, configuring DaemonSet and Deployment collectors, integrating Prometheus for metrics, and customizing receivers, processors, and exporters to achieve comprehensive observability of nodes, pods, containers, and cluster resources.
Kubernetes has become a widely adopted industry tool, increasing demand for observability solutions. OpenTelemetry provides various tools to help Kubernetes users monitor their clusters and services.
We will use the OpenTelemetry Collector Helm chart to install two collectors: a DaemonSet for node, pod, and container metrics and logs, and a Deployment for cluster‑level metrics and events.
<code>$ helm repo add open-telemetry https://open-telemetry.github.io/opentelemetry-helm-charts
$ helm repo update</code>
First, deploy a Prometheus instance to scrape metrics. Add the Prometheus community repo and install the kube-prometheus-stack chart with a custom prometheus-values.yaml that disables the default exporters and enables remote write.
<code>$ helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
$ helm repo update
$ helm upgrade --install prometheus prometheus-community/kube-prometheus-stack -f prometheus-values.yaml --namespace kube-otel --create-namespace</code>
After installation, expose Prometheus and Grafana via an Ingress at grafana.k8s.local (default username admin, password prom-operator).
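The contents of prometheus-values.yaml are not shown in this guide; a minimal sketch along these lines, assuming the standard kube-prometheus-stack value keys, disables the bundled exporters (the collector will gather those metrics instead), accepts remote-write, and publishes the Grafana Ingress host used above:

```yaml
# prometheus-values.yaml -- a hedged sketch, not a tested configuration
# Disable the bundled exporters that the OpenTelemetry Collector replaces
nodeExporter:
  enabled: false
kubeStateMetrics:
  enabled: false
prometheus:
  prometheusSpec:
    # Accept metrics pushed over the Prometheus remote-write protocol
    enableRemoteWriteReceiver: true
grafana:
  ingress:
    enabled: true
    hosts:
      - grafana.k8s.local
```

Adjust these keys to match the chart version you install; run `helm show values prometheus-community/kube-prometheus-stack` to confirm the current schema.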
Metrics Collection
Create an otel-collector-ds-values.yaml file that defines the collector mode, tolerations, service, receivers, processors, exporters, and pipelines.
<code># otel-collector-ds-values.yaml
mode: daemonset
tolerations:
  - key: node-role.kubernetes.io/control-plane
    effect: NoSchedule
service:
  enabled: true
config:
  receivers:
    hostmetrics:
      collection_interval: 10s
      root_path: /hostfs
      scrapers:
        filesystem:
          exclude_fs_types:
            fs_types:
              - autofs
              - binfmt_misc
              - bpf
              - cgroup2
              - configfs
              - debugfs
              - devpts
              - devtmpfs
              - fusectl
              - hugetlbfs
              - iso9660
              - mqueue
              - nsfs
              - overlay
              - proc
              - procfs
              - pstore
              - rpc_pipefs
              - securityfs
              - selinuxfs
              - squashfs
              - sysfs
              - tracefs
            match_type: strict
          exclude_mount_points:
            match_type: regexp
            mount_points:
              - /dev/*
              - /proc/*
              - /sys/*
              - /run/k3s/containerd/*
              - /var/lib/docker/*
              - /var/lib/kubelet/*
              - /snap/*
    kubeletstats:
      auth_type: serviceAccount
      collection_interval: 20s
      endpoint: "${env:K8S_NODE_NAME}:10250"
      extra_metadata_labels:
        - container.id
    otlp:
      protocols:
        grpc:
          endpoint: "${env:MY_POD_IP}:4317"
        http:
          endpoint: "${env:MY_POD_IP}:4318"
    prometheus:
      config:
        scrape_configs:
          - job_name: opentelemetry-collector
            scrape_interval: 10s
            static_configs:
              - targets: ["${env:MY_POD_IP}:8888"]
  processors:
    memory_limiter:
      check_interval: 5s
      limit_percentage: 80
      spike_limit_percentage: 25
    metricstransform:
      transforms:
        - action: update
          include: .+
          match_type: regexp
          operations:
            - action: add_label
              new_label: k8s.cluster.id
              new_value: abcd1234
            - action: add_label
              new_label: k8s.cluster.name
              new_value: youdian-k8s
    k8sattributes:
      extract:
        metadata:
          - k8s.namespace.name
          - k8s.deployment.name
          - k8s.statefulset.name
          - k8s.daemonset.name
          - k8s.cronjob.name
          - k8s.job.name
          - k8s.node.name
          - k8s.pod.name
          - k8s.pod.uid
          - k8s.pod.start_time
      filter:
        node_from_env_var: K8S_NODE_NAME
      passthrough: false
      pod_association:
        - sources:
            - from: resource_attribute
              name: k8s.pod.ip
        - sources:
            - from: resource_attribute
              name: k8s.pod.uid
        - sources:
            - from: connection
    batch: {}
  exporters:
    logging:
      loglevel: debug
    prometheus:
      endpoint: 0.0.0.0:9090
      metric_expiration: 180m
      resource_to_telemetry_conversion:
        enabled: true
  service:
    pipelines:
      metrics:
        receivers: [otlp, hostmetrics, kubeletstats, prometheus]
        processors: [memory_limiter, metricstransform, k8sattributes, batch]
        exporters: [prometheus]
</code>
Deploy the collector DaemonSet with the values file:
<code>$ helm upgrade --install opentelemetry-collector open-telemetry/opentelemetry-collector -f otel-collector-ds-values.yaml --namespace kube-otel --create-namespace
$ kubectl get pods -n kube-otel</code>
OTLP Receiver
The OTLP receiver listens on ports 4317 (gRPC) and 4318 (HTTP) to ingest traces, metrics, and logs in OTLP format.
<code>receivers:
  otlp:
    protocols:
      grpc:
        endpoint: ${env:MY_POD_IP}:4317
      http:
        endpoint: ${env:MY_POD_IP}:4318</code>
Hostmetrics Receiver
Collects host‑level metrics (here: filesystem, load, memory, and network) every 10 seconds from the host filesystem mounted at /hostfs. The filesystem scraper excludes many virtual and container‑related filesystem types and mount points.
<code>receivers:
  hostmetrics:
    collection_interval: 10s
    root_path: /hostfs
    scrapers:
      filesystem:
        exclude_fs_types:
          fs_types: [autofs, binfmt_misc, bpf, cgroup2, configfs, debugfs, devpts, devtmpfs, fusectl, hugetlbfs, iso9660, mqueue, nsfs, overlay, proc, procfs, pstore, rpc_pipefs, securityfs, selinuxfs, squashfs, sysfs, tracefs]
          match_type: strict
        exclude_mount_points:
          match_type: regexp
          mount_points: [/dev/*, /proc/*, /sys/*, /run/k3s/containerd/*, /var/lib/docker/*, /var/lib/kubelet/*, /snap/*]
      load: null
      memory: null
      network: null</code>
Kubeletstats Receiver
Fetches metrics from the kubelet API (default secure endpoint on port 10250) for nodes, pods, and containers. Authentication can be set to serviceAccount or tls, and extra metadata labels such as container.id can be added.
<code>receivers:
  kubeletstats:
    auth_type: serviceAccount
    collection_interval: 10s
    endpoint: "${env:K8S_NODE_NAME}:10250"
    insecure_skip_verify: true
    extra_metadata_labels:
      - container.id
    metric_groups:
      - node
      - pod</code>
Prometheus Receiver
Acts as a Prometheus‑compatible scraper, supporting the full scrape_config syntax. The example configuration adds a job that scrapes the collector’s own metrics on port 8888 and a Kubernetes service‑discovery job for pod metrics.
<code>receivers:
  prometheus:
    config:
      scrape_configs:
        - job_name: opentelemetry-collector
          scrape_interval: 10s
          static_configs:
            - targets: ["${env:MY_POD_IP}:8888"]
        - job_name: k8s
          kubernetes_sd_configs:
            - role: pod
          relabel_configs:
            - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
              regex: "true"
              action: keep
          metric_relabel_configs:
            - source_labels: [__name__]
              regex: "(request_duration_seconds.*|response_duration_seconds.*)"
              action: keep</code>
Processors
Batch groups spans, metrics, or logs to improve compression and reduce outbound connections. It should be placed after memory_limiter and any sampling processors.
<code>processors:
  batch: {}
  batch/2:
    send_batch_size: 10000
    timeout: 10s</code>
Memory Limiter prevents out‑of‑memory crashes by checking usage at a configurable interval and applying soft and hard limits.
<code>processors:
  memory_limiter:
    check_interval: 5s
    limit_percentage: 80
    spike_limit_percentage: 25</code>
K8sattributes extracts Kubernetes metadata (namespace, deployment, pod name, UID, etc.) and associates telemetry with pods using IP, UID, or connection information.
<code>processors:
  k8sattributes:
    extract:
      metadata:
        - k8s.namespace.name
        - k8s.deployment.name
        - k8s.statefulset.name
        - k8s.daemonset.name
        - k8s.cronjob.name
        - k8s.job.name
        - k8s.node.name
        - k8s.pod.name
        - k8s.pod.uid
        - k8s.pod.start_time
    filter:
      node_from_env_var: K8S_NODE_NAME
    passthrough: false
    pod_association:
      - sources:
          - from: resource_attribute
            name: k8s.pod.ip
      - sources:
          - from: resource_attribute
            name: k8s.pod.uid
      - sources:
          - from: connection</code>
Metricstransform adds the labels k8s.cluster.id=abcd1234 and k8s.cluster.name=youdian-k8s to every metric.
<code>processors:
  metricstransform:
    transforms:
      - action: update
        include: .+
        match_type: regexp
        operations:
          - action: add_label
            new_label: k8s.cluster.id
            new_value: abcd1234
          - action: add_label
            new_label: k8s.cluster.name
            new_value: youdian-k8s</code>
Exporters
Logging writes telemetry to stdout for debugging.
<code>exporters:
  logging:
    loglevel: debug</code>
Prometheus exposes metrics at 0.0.0.0:9090/metrics with a 180‑minute expiration and converts resource attributes to labels.
<code>exporters:
  prometheus:
    endpoint: 0.0.0.0:9090
    metric_expiration: 180m
    resource_to_telemetry_conversion:
      enabled: true</code>
ServiceMonitor for Prometheus
Create a ServiceMonitor so Prometheus scrapes the collector’s metrics endpoint.
<code>apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: otel-prom
  namespace: kube-otel
  labels:
    release: prometheus
spec:
  endpoints:
    - interval: 10s
      port: prom
      path: metrics
  selector:
    matchLabels:
      component: agent-collector
      app.kubernetes.io/instance: opentelemetry-collector</code>
After applying the ServiceMonitor, Prometheus displays collector metrics, which can also be visualized in Grafana.
Metrics include the custom k8s.cluster.id and k8s.cluster.name labels added by the metricstransform processor.
Grafana can query these metrics for dashboards.
Optionally, deploy the collector in Deployment mode to collect additional cluster‑wide metrics.
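That second values file is not shown in this guide. A minimal sketch, assuming the clusterMetrics and kubernetesEvents presets exposed by the opentelemetry-collector Helm chart (which wire up the k8s_cluster and k8sobjects receivers), might look like this:

```yaml
# otel-collector-deploy-values.yaml -- hedged sketch for a cluster-level collector
mode: deployment
# A single replica avoids duplicating cluster-wide metrics and events
replicaCount: 1
presets:
  # Collects cluster-level metrics (node/pod conditions, resource quotas, ...)
  clusterMetrics:
    enabled: true
  # Pulls Kubernetes events via the k8sobjects receiver
  kubernetesEvents:
    enabled: true
```

Install it as a separate release, e.g. `helm upgrade --install opentelemetry-collector-cluster open-telemetry/opentelemetry-collector -f otel-collector-deploy-values.yaml --namespace kube-otel`, and verify the preset names against the chart version you use.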
Ops Development Stories
Maintained by a like‑minded team, covering both operations and development. Topics span Linux ops, DevOps toolchain, Kubernetes containerization, monitoring, log collection, network security, and Python or Go development. Team members: Qiao Ke, wanger, Dong Ge, Su Xin, Hua Zai, Zheng Ge, Teacher Xia.