How to Deploy and Monitor Kubernetes Networks with Kubenurse
This guide explains how to install Kubenurse as a DaemonSet in a Kubernetes cluster, configure its ingress and ServiceMonitor resources, and use Prometheus and Grafana to visualize comprehensive network health metrics such as latency, DNS errors, and API server connectivity.
Introduction
In Kubernetes, networking is provided by third‑party CNI plugins, whose implementations can be complex and make troubleshooting difficult. Kubenurse is a project that monitors all network connections in a cluster and exposes metrics for Prometheus collection.
Kubenurse Overview
Kubenurse is deployed as a DaemonSet, so one pod runs on each node. After deployment it sends a health check to /alive every 5 seconds, caches results for 3 seconds, and performs network probes targeting ingress, DNS, the API server, and kube‑proxy.
Every check exposes metrics that can be used to detect:
SDN network latency and errors
Kubelet‑to‑Kubelet latency and errors
Pod‑to‑API server communication issues
Ingress round‑trip latency and errors
Service round‑trip latency and errors (kube‑proxy)
API server problems
CoreDNS errors
External DNS resolution errors (ingress URL)
The main metrics are:
kubenurse_errors_total: error counter grouped by type
kubenurse_request_duration: request duration distribution by type
Metrics are labeled with a type indicating the detection target, such as api_server_direct, api_server_dns, me_ingress, me_service, and path_$KUBELET_HOSTNAME. Percentiles (P50, P90, P99) computed from the request duration distribution allow precise assessment of cluster network health.
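As a rough sketch of how these metrics could be turned into percentiles and error rates, the helper functions below build two PromQL expressions and send them to the Prometheus HTTP API. The sketch assumes the histogram is exposed under the usual kubenurse_request_duration_bucket series with a type label, and that Prometheus is reachable at http://localhost:9090 (for example through a port-forward). The names PROM_URL, quantile_query, error_rate_query, and prom_query are illustrative and not part of Kubenurse.

```shell
#!/bin/sh
# Assumption: Prometheus is reachable at this address (e.g. via kubectl port-forward).
PROM_URL="${PROM_URL:-http://localhost:9090}"

# Build a PromQL expression for a latency quantile per check type.
# $1 = quantile (e.g. 0.99), $2 = rate window (defaults to 5m).
# Assumes the histogram series kubenurse_request_duration_bucket carries a "type" label.
quantile_query() {
  printf 'histogram_quantile(%s, sum by (le, type) (rate(kubenurse_request_duration_bucket[%s])))' \
    "$1" "${2:-5m}"
}

# Build a PromQL expression for the error rate per check type.
# $1 = rate window (defaults to 5m).
error_rate_query() {
  printf 'sum by (type) (rate(kubenurse_errors_total[%s]))' "${1:-5m}"
}

# Send an instant query to the Prometheus HTTP API and print the raw JSON response.
prom_query() {
  curl -sG "${PROM_URL}/api/v1/query" --data-urlencode "query=$1"
}
```

For example, prom_query "$(quantile_query 0.99)" would request the P99 latency per check type, and prom_query "$(error_rate_query)" the error rate. The same expressions can also be pasted into a Grafana panel.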
Installation and Configuration
1. Clone the repository:
<code>git clone https://github.com/postfinance/kubenurse.git</code>
2. Edit examples/ingress.yaml to set your domain:
<code>---
# Note: extensions/v1beta1 Ingress was removed in Kubernetes 1.22; newer clusters need networking.k8s.io/v1.
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  annotations:
    kubernetes.io/ingress.class: nginx
  name: kubenurse
  namespace: kube-system
spec:
  rules:
  - host: kubenurse-test.coolops.cn  # replace with your domain
    http:
      paths:
      - backend:
          serviceName: kubenurse
          servicePort: 8080
</code>
3. Update examples/daemonset.yaml to use the same ingress host:
<code>---
apiVersion: apps/v1
kind: DaemonSet
metadata:
  labels:
    app: kubenurse
  name: kubenurse
  namespace: kube-system
spec:
  selector:
    matchLabels:
      app: kubenurse
  template:
    metadata:
      labels:
        app: kubenurse
      annotations:
        prometheus.io/path: "/metrics"
        prometheus.io/port: "8080"
        prometheus.io/scheme: "http"
        prometheus.io/scrape: "true"
    spec:
      serviceAccountName: nurse
      containers:
      - name: kubenurse
        env:
        - name: KUBENURSE_INGRESS_URL
          value: kubenurse-test.coolops.cn # modify this; Kubenurse requests this URL, so a scheme (http:// or https://) is likely needed
        - name: KUBENURSE_SERVICE_URL
          value: http://kubenurse.kube-system.svc.cluster.local:8080
        - name: KUBENURSE_NAMESPACE
          value: kube-system
        - name: KUBENURSE_NEIGHBOUR_FILTER
          value: "app=kubenurse"
        image: "postfinance/kubenurse:v1.2.0"
        ports:
        - containerPort: 8080
          protocol: TCP
      tolerations:
      - effect: NoSchedule
        key: node-role.kubernetes.io/master
        operator: Equal
      - effect: NoSchedule
        key: node-role.kubernetes.io/control-plane
        operator: Equal
</code>
4. Create a ServiceMonitor so Prometheus can scrape the metrics:
<code>apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: kubenurse
  namespace: monitoring
  labels:
    k8s-app: kubenurse
spec:
  jobLabel: k8s-app
  endpoints:
  - port: "8080-8080"
    interval: 30s
    scheme: http
  selector:
    matchLabels:
      app: kubenurse
  namespaceSelector:
    matchNames:
    - kube-system
</code>
5. Apply all manifests:
<code>kubectl apply -f .</code>
6. Wait until all pods are in the Running state:
<code># kubectl get all -n kube-system -l app=kubenurse
NAME                  READY   STATUS    RESTARTS   AGE
pod/kubenurse-fznsw   1/1     Running   0          17h
pod/kubenurse-n52rq   1/1     Running   0          17h
pod/kubenurse-nwtl4   1/1     Running   0          17h
pod/kubenurse-xp92p   1/1     Running   0          17h
pod/kubenurse-z2ksz   1/1     Running   0          17h

NAME                TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)    AGE
service/kubenurse   ClusterIP   10.96.229.244   <none>        8080/TCP   17h

NAME                       DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE
daemonset.apps/kubenurse   5         5         5       5            5           <none>          17h
</code>
7. Verify that Prometheus is collecting the metrics and visualize them in Grafana.
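One way to check the raw metrics from the command line before looking at Prometheus is sketched below. It assumes kubectl is configured for the cluster, that the kubenurse Service listed in the output above exists in kube-system, and that local port 8080 is free. The function names filter_kubenurse and check_kubenurse_metrics are hypothetical.

```shell
#!/bin/sh
# Assumption: Kubenurse was deployed to this namespace, as in the manifests above.
NAMESPACE="${NAMESPACE:-kube-system}"

# Keep only Kubenurse samples from Prometheus text-format input on stdin.
# HELP/TYPE comment lines and metrics from other collectors are dropped.
filter_kubenurse() {
  grep '^kubenurse_'
}

# Wait for the DaemonSet rollout, then read /metrics through a temporary port-forward.
# Returns non-zero if the rollout times out or no kubenurse_ samples are found.
check_kubenurse_metrics() {
  kubectl -n "$NAMESPACE" rollout status daemonset/kubenurse --timeout=120s || return 1
  kubectl -n "$NAMESPACE" port-forward service/kubenurse 8080:8080 >/dev/null 2>&1 &
  pf_pid=$!
  sleep 2  # crude wait for the port-forward to come up; adjust if needed
  curl -s http://localhost:8080/metrics | filter_kubenurse
  status=$?
  kill "$pf_pid" 2>/dev/null
  return "$status"
}
```

Running check_kubenurse_metrics should print the kubenurse_ series if the pods are serving metrics. If the series appear here but not in Prometheus, the ServiceMonitor labels and port name are the first things to compare against the Service.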
References
GitHub repository: https://github.com/postfinance/kubenurse
Example manifests: https://github.com/postfinance/kubenurse/tree/master/examples
Ops Development Stories
Maintained by a like‑minded team, covering both operations and development. Topics span Linux ops, DevOps toolchain, Kubernetes containerization, monitoring, log collection, network security, and Python or Go development. Team members: Qiao Ke, wanger, Dong Ge, Su Xin, Hua Zai, Zheng Ge, Teacher Xia.