How to Deploy Kuberhealthy 2.x for Synthetic Monitoring and KPI Tracking
This guide walks through installing Kuberhealthy 2.x on a Kubernetes cluster using Helm, configuring built‑in and custom checks, exposing metrics via Prometheus, and defining key performance indicators such as availability, utilization, latency, and error rates with concrete PromQL queries.
In November 2019 the Kuberhealthy 2.0.0 operator was released at KubeCon, enabling developers to create custom synthetic monitoring checks as Kubernetes operators. The community quickly adopted the feature, and the following steps show how to install and use Kuberhealthy on a cluster.
1. Deploy Kuberhealthy
Ensure Helm 3 is installed. Then create a dedicated namespace and set the current context:
kubectl create namespace kuberhealthy
kubectl config set-context --current --namespace=kuberhealthy
Add the Kuberhealthy Helm repository:
helm repo add kuberhealthy https://comcast.github.io/kuberhealthy/helm-repos
Install Kuberhealthy with the appropriate Prometheus settings:
If using Prometheus Operator:
helm install kuberhealthy kuberhealthy/kuberhealthy \
--set prometheus.enabled=true,prometheus.enableAlerting=true,\
prometheus.enableScraping=true,prometheus.serviceMonitor=true
If using plain Prometheus (no operator):
helm install kuberhealthy kuberhealthy/kuberhealthy \
--set prometheus.enabled=true,prometheus.enableAlerting=true,\
prometheus.enableScraping=true
If not using Prometheus at all:
helm install kuberhealthy kuberhealthy/kuberhealthy
After the Helm command finishes, two Kuberhealthy pods (the controller and the check‑reaper) should be running, along with a JSON status endpoint and a /metrics endpoint.
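The three install variants differ only in their --set values, which can be easier to see when the command is assembled programmatically. The following is a minimal sketch, not part of Kuberhealthy or Helm: the function name and the mode labels ("operator", "plain", "none") are hypothetical, and the --set keys are copied from the commands above.

```python
from typing import List

# Base command shared by all three variants shown above.
BASE = ["helm", "install", "kuberhealthy", "kuberhealthy/kuberhealthy"]


def helm_install_args(mode: str) -> List[str]:
    """Return the argv for installing Kuberhealthy in the given mode.

    mode is a hypothetical label: "operator", "plain", or "none".
    """
    if mode == "none":
        return list(BASE)
    if mode not in ("operator", "plain"):
        raise ValueError(f"unknown mode: {mode!r}")
    values = [
        "prometheus.enabled=true",
        "prometheus.enableAlerting=true",
        "prometheus.enableScraping=true",
    ]
    if mode == "operator":
        # Only the Prometheus Operator variant adds the ServiceMonitor flag.
        values.append("prometheus.serviceMonitor=true")
    return BASE + ["--set", ",".join(values)]


print(" ".join(helm_install_args("operator")))
```

The resulting list can be passed to subprocess.run, which avoids shell quoting and line-continuation issues.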
2. Configure Additional Checks
Run kubectl get khchecks to list the default checks, which include:
daemonset : ensures a daemonset runs on every node.
deployment : creates a deployment, triggers a rolling update, and verifies service accessibility.
dns-status-internal : validates internal cluster DNS functionality.
Additional external checks can be added by applying YAML files from the external‑check registry.
3. View Check Status
Expose the Kuberhealthy service (e.g., set type: LoadBalancer) or use port‑forwarding:
kubectl port-forward svc/kuberhealthy 8080:80
The JSON status page returns a structure similar to:
{
  "OK": true,
  "Errors": [],
  "CheckDetails": {
    "kuberhealthy/daemonset": {"OK": true, "RunDuration": "22.5s", ...},
    "kuberhealthy/deployment": {"OK": true, "RunDuration": "29.1s", ...},
    "kuberhealthy/dns-status-internal": {"OK": true, "RunDuration": "2.4s", ...}
  },
  "CurrentMaster": "kuberhealthy-7cf79bdc86-m78qr"
}
You can filter results by namespace using the ?namespace= query parameter.
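A status document of this shape is straightforward to consume from a script. The sketch below builds the status URL with the optional ?namespace= filter and extracts the failing checks from an already-parsed document. It rests on several assumptions: the status page is served at the service root, each entry under CheckDetails carries its own "Errors" list (the example above elides those fields), and the sample data and helper names are invented for illustration.

```python
import json
from typing import Dict, List, Optional
from urllib.parse import urlencode


def status_url(base_url: str, namespace: Optional[str] = None) -> str:
    """Build the status URL, optionally filtered with ?namespace=."""
    base = base_url.rstrip("/") + "/"
    if namespace:
        return base + "?" + urlencode({"namespace": namespace})
    return base


def failing_checks(status: Dict) -> Dict[str, List[str]]:
    """Map each failing check name to its list of error strings."""
    failures = {}
    for name, detail in status.get("CheckDetails", {}).items():
        if not detail.get("OK", False):
            # "Errors" per check is an assumption; fall back to an empty list.
            failures[name] = detail.get("Errors") or []
    return failures


# Hypothetical sample document, not real cluster output.
sample = json.loads("""
{
  "OK": false,
  "Errors": ["hypothetical failure"],
  "CheckDetails": {
    "kuberhealthy/daemonset": {"OK": true, "Errors": []},
    "kuberhealthy/deployment": {"OK": false, "Errors": ["hypothetical failure"]}
  }
}
""")

print(status_url("http://localhost:8080", "kuberhealthy"))
print(failing_checks(sample))
```

With the port-forward above in place, the URL returned by status_url could be fetched with urllib.request.urlopen and the response body passed through json.load before calling failing_checks.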
4. Write Your Own Checks
Kuberhealthy is designed to be extensible: any container can be packaged as a custom check, written in any language, allowing teams to validate client libraries, simulate real user workflows, or perform high‑trust checks during normal operation.
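As a rough illustration of what such a container might run, the sketch below performs a placeholder probe and reports the result back to Kuberhealthy over HTTP. It assumes the external-check contract as commonly documented: Kuberhealthy injects the KH_REPORTING_URL and KH_RUN_UUID environment variables into the check pod and expects a JSON body with "OK" and "Errors" fields, with the run UUID sent in a kh-run-uuid header. Verify these names against the Kuberhealthy version you deploy. The run_check logic is purely hypothetical.

```python
import json
import os
import urllib.request
from typing import List


def build_report(errors: List[str]) -> bytes:
    """Encode a check result; the check passes only when there are no errors."""
    return json.dumps({"OK": not errors, "Errors": errors}).encode("utf-8")


def send_report(errors: List[str]) -> int:
    """POST the result to Kuberhealthy and return the HTTP status code."""
    request = urllib.request.Request(
        os.environ["KH_REPORTING_URL"],  # assumed: injected by Kuberhealthy
        data=build_report(errors),
        headers={
            "Content-Type": "application/json",
            "kh-run-uuid": os.environ["KH_RUN_UUID"],  # assumed: identifies this run
        },
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status


def run_check() -> List[str]:
    """Hypothetical check logic: return error strings (empty list = pass)."""
    errors = []
    if 1 + 1 != 2:  # placeholder for a real probe, e.g. an HTTP request
        errors.append("arithmetic sanity check failed")
    return errors
```

The container entrypoint would then call send_report(run_check()). Returning a list of error strings keeps the probe logic independent of the reporting transport, so it can be unit-tested without a cluster.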
5. Prometheus Integration Details
When Prometheus is enabled, the Kuberhealthy service receives the following annotations:
prometheus.io/path: /metrics
prometheus.io/port: "80"
prometheus.io/scrape: "true"
A typical scrape_config for Prometheus looks like:
scrape_configs:
  - job_name: 'kuberhealthy'
    scrape_interval: 1m
    honor_labels: true
    metrics_path: /metrics
    kubernetes_sd_configs:
      - role: service
        namespaces:
          names:
            - kuberhealthy
    relabel_configs:
      - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
        action: keep
        regex: true
With this configuration, the following metrics become available:
kuberhealthy_check
kuberhealthy_check_duration_seconds
kuberhealthy_cluster_states
kuberhealthy_running
6. Define Key Performance Indicators (KPIs)
Using the exposed metrics, teams can calculate several KPIs:
Availability : measured as Uptime / (Uptime + Downtime), where uptime and downtime are derived from the number of successful or failed deployment checks multiplied by the check run interval.
Utilization : reflects how many cluster resources (nodes, deployments, statefulsets, PVCs, services, pods, jobs) are in use, calculated from the total counts of those objects.
Latency (Duration) : average runtime of the deployment check, obtained via
avg(kuberhealthy_check_duration_seconds{check="kuberhealthy/deployment"}).
Errors / Alerts : total number of failed Kuberhealthy checks, each generating an alert.
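Example PromQL query for availability over the past 30 days, expressed as a fraction between 0 and 1 (multiply by 100 for a percentage):
1 - (sum(count_over_time(kuberhealthy_check{check="kuberhealthy/deployment",status="0"}[30d])) or vector(0))
/ sum(count_over_time(kuberhealthy_check{check="kuberhealthy/deployment"}[30d]))
The numerator counts failed samples (status="0"), defaulting to 0 when none exist, and the denominator counts all samples, so the result matches Uptime / (Uptime + Downtime). These queries enable operators to monitor cluster health, capacity, and reliability directly from Prometheus dashboards.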
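The availability definition can also be worked through numerically. The following is a minimal sketch under stated assumptions: the function name, the pass and failure counts, and the 600-second interval are all hypothetical values chosen for illustration, not figures from a real cluster.

```python
def availability(successes: int, failures: int, interval_seconds: float) -> float:
    """Availability = Uptime / (Uptime + Downtime).

    Uptime and downtime are the number of successful or failed check runs
    multiplied by the check run interval.
    """
    uptime = successes * interval_seconds
    downtime = failures * interval_seconds
    total = uptime + downtime
    if total == 0:
        raise ValueError("no check runs observed")
    return uptime / total


# Hypothetical month: 4310 passing and 10 failing runs at a 600 s interval.
print(f"{availability(4310, 10, 600.0):.4%}")
```

Because the interval multiplies both uptime and downtime, it cancels out whenever it is constant, which is why a count-based query is enough to express the same ratio.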
Cloud Native Technology Community
The Cloud Native Technology Community, part of the CNBPA Cloud Native Technology Practice Alliance, focuses on evangelizing cutting‑edge cloud‑native technologies and practical implementations. It shares in‑depth content, case studies, and event/meetup information on containers, Kubernetes, DevOps, Service Mesh, and other cloud‑native tech, along with updates from the CNBPA alliance.
