
How to Deploy and Monitor Kubernetes Networks with Kubenurse

This guide explains how to install Kubenurse as a DaemonSet in a Kubernetes cluster, configure its ingress and ServiceMonitor resources, and use Prometheus and Grafana to visualize comprehensive network health metrics such as latency, DNS errors, and API server connectivity.

Ops Development Stories

Introduction

In Kubernetes, networking is provided by third‑party CNI plugins, whose implementations can be complex and make troubleshooting difficult. Kubenurse is a project that monitors all network connections in a cluster and exposes metrics for Prometheus collection.

Kubenurse Overview

Kubenurse is deployed as a DaemonSet, so one pod runs on each node. After deployment, each pod sends a health check to <code>/alive</code> every 5 seconds, caches the result for 3 seconds, and performs network probes targeting the ingress, DNS, the API server, and kube‑proxy.

All checks generate metrics that can be used to detect:

- SDN network latency and errors
- Kubelet‑to‑Kubelet latency and errors
- Pod‑to‑API‑server communication issues
- Ingress round‑trip latency and errors
- Service round‑trip latency and errors (kube‑proxy)
- API server problems
- CoreDNS errors
- External DNS resolution errors (ingress URL)

The main metrics are:

- <code>kubenurse_errors_total</code>: error counter, grouped by type
- <code>kubenurse_request_duration</code>: request duration distribution, grouped by type

Metrics carry a <code>type</code> label indicating the probe target, such as <code>api_server_direct</code>, <code>api_server_dns</code>, <code>me_ingress</code>, <code>me_service</code>, and <code>path_$KUBELET_HOSTNAME</code>. Percentiles (P50, P90, P99) computed from the duration distribution allow precise assessment of cluster network health.
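As a sketch of how these metrics might be consumed, the following PrometheusRule computes the P99 request duration per check type and alerts on a sustained error rate. It assumes the Prometheus Operator's <code>PrometheusRule</code> CRD and that <code>kubenurse_request_duration</code> is exposed as a histogram (hence the <code>_bucket</code> suffix); the rule names and the <code>monitoring</code> namespace are illustrative:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: kubenurse-rules        # illustrative name
  namespace: monitoring
spec:
  groups:
  - name: kubenurse
    rules:
    # P99 request duration per check type over the last 5 minutes
    - record: kubenurse:request_duration:p99
      expr: histogram_quantile(0.99, sum(rate(kubenurse_request_duration_bucket[5m])) by (le, type))
    # Fire if any check type reports a sustained error rate
    - alert: KubenurseErrors
      expr: sum(rate(kubenurse_errors_total[5m])) by (type) > 0
      for: 5m
```

The <code>type</code> label lets you break both queries down by probe target (ingress, service, API server, or individual kubelets).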

Installation and Configuration

1. Clone the repository:

<code>git clone https://github.com/postfinance/kubenurse.git</code>

2. Edit <code>example/ingress.yaml</code> to set your domain:

<code>---
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  annotations:
    kubernetes.io/ingress.class: nginx
  name: kubenurse
  namespace: kube-system
spec:
  rules:
  - host: kubenurse-test.coolops.cn
    http:
      paths:
      - backend:
          serviceName: kubenurse
          servicePort: 8080
</code>
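Note that the manifest above uses the <code>extensions/v1beta1</code> Ingress API, which was removed in Kubernetes 1.22. On newer clusters, a roughly equivalent <code>networking.k8s.io/v1</code> manifest would look like this (a sketch; adjust the ingress class and host to your environment):

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: kubenurse
  namespace: kube-system
spec:
  ingressClassName: nginx
  rules:
  - host: kubenurse-test.coolops.cn
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: kubenurse
            port:
              number: 8080
```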

3. Update <code>example/daemonset.yaml</code> to use the same ingress host:

<code>---
apiVersion: apps/v1
kind: DaemonSet
metadata:
  labels:
    app: kubenurse
  name: kubenurse
  namespace: kube-system
spec:
  selector:
    matchLabels:
      app: kubenurse
  template:
    metadata:
      labels:
        app: kubenurse
      annotations:
        prometheus.io/path: "/metrics"
        prometheus.io/port: "8080"
        prometheus.io/scheme: "http"
        prometheus.io/scrape: "true"
    spec:
      serviceAccountName: nurse
      containers:
      - name: kubenurse
        env:
        - name: KUBENURSE_INGRESS_URL
          value: http://kubenurse-test.coolops.cn  # modify to your ingress URL, including the scheme
        - name: KUBENURSE_SERVICE_URL
          value: http://kubenurse.kube-system.svc.cluster.local:8080
        - name: KUBENURSE_NAMESPACE
          value: kube-system
        - name: KUBENURSE_NEIGHBOUR_FILTER
          value: "app=kubenurse"
        image: "postfinance/kubenurse:v1.2.0"
        ports:
        - containerPort: 8080
          protocol: TCP
      tolerations:
      - effect: NoSchedule
        key: node-role.kubernetes.io/master
        operator: Equal
      - effect: NoSchedule
        key: node-role.kubernetes.io/control-plane
        operator: Equal
</code>

4. Create a <code>ServiceMonitor</code> so Prometheus can scrape the metrics:

<code>apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: kubenurse
  namespace: monitoring
  labels:
    k8s-app: kubenurse
spec:
  jobLabel: k8s-app
  endpoints:
  - port: "8080-8080"  # must match the port name defined in the kubenurse Service
    interval: 30s
    scheme: http
  selector:
    matchLabels:
      app: kubenurse
  namespaceSelector:
    matchNames:
    - kube-system
</code>
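If you run Prometheus without the Prometheus Operator, the ServiceMonitor is not needed: the <code>prometheus.io</code> annotations already set on the DaemonSet pods can be picked up by an annotation-based scrape job instead. A minimal sketch (the job name is illustrative; the relabeling follows the common annotation convention):

```yaml
scrape_configs:
- job_name: kubenurse            # illustrative job name
  kubernetes_sd_configs:
  - role: pod
    namespaces:
      names: [kube-system]
  relabel_configs:
  # Keep only pods annotated with prometheus.io/scrape: "true"
  - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
    action: keep
    regex: "true"
  # Use the path from prometheus.io/path as the metrics path
  - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
    action: replace
    target_label: __metrics_path__
    regex: (.+)
  # Use the port from prometheus.io/port as the scrape target port
  - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
    action: replace
    regex: ([^:]+)(?::\d+)?;(\d+)
    replacement: $1:$2
    target_label: __address__
```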

5. Apply the manifests in the <code>example</code> directory:

<code>kubectl apply -f example/</code>

6. Wait until all pods are in the <code>Running</code> state:

<code># kubectl get all -n kube-system -l app=kubenurse
NAME                     READY   STATUS    RESTARTS   AGE
pod/kubenurse-fznsw      1/1     Running   0          17h
pod/kubenurse-n52rq      1/1     Running   0          17h
pod/kubenurse-nwtl4      1/1     Running   0          17h
pod/kubenurse-xp92p      1/1     Running   0          17h
pod/kubenurse-z2ksz      1/1     Running   0          17h

NAME               TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)    AGE
service/kubenurse  ClusterIP   10.96.229.244    <none>        8080/TCP   17h

NAME                         DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE   SELECTOR   AGE
daemonset.apps/kubenurse    5         5         5       5            5          <none>          17h
</code>

7. Verify that Prometheus is collecting the metrics and visualize them in Grafana.

References

- GitHub repository: https://github.com/postfinance/kubenurse
- Example manifests: https://github.com/postfinance/kubenurse/tree/master/examples

Written by Ops Development Stories

Maintained by a like‑minded team, covering both operations and development. Topics span Linux ops, DevOps toolchain, Kubernetes containerization, monitoring, log collection, network security, and Python or Go development. Team members: Qiao Ke, wanger, Dong Ge, Su Xin, Hua Zai, Zheng Ge, Teacher Xia.
