How to Deploy Kuberhealthy 2.x for Synthetic Monitoring and KPI Tracking

This guide walks through installing Kuberhealthy 2.x on a Kubernetes cluster using Helm, configuring built‑in and custom checks, exposing metrics via Prometheus, and defining key performance indicators such as availability, utilization, latency, and error rates with concrete PromQL queries.

Cloud Native Technology Community

Kuberhealthy 2.0.0 was announced at KubeCon in November 2019. The release turned Kuberhealthy into a Kubernetes operator for synthetic monitoring and let developers package their own checks as containers. The community quickly adopted the feature, and the following steps show how to install and use Kuberhealthy on a cluster.

1. Deploy Kuberhealthy

Ensure Helm 3 is installed. Then create a dedicated namespace and make it the default namespace for the current kubectl context:

kubectl create namespace kuberhealthy
kubectl config set-context --current --namespace=kuberhealthy

Add the Kuberhealthy Helm repository:

helm repo add kuberhealthy https://comcast.github.io/kuberhealthy/helm-repos

Install Kuberhealthy with the appropriate Prometheus settings:

If using Prometheus Operator:

helm install kuberhealthy kuberhealthy/kuberhealthy \
  --set prometheus.enabled=true \
  --set prometheus.enableAlerting=true \
  --set prometheus.enableScraping=true \
  --set prometheus.serviceMonitor=true

If using plain Prometheus (no operator):

helm install kuberhealthy kuberhealthy/kuberhealthy \
  --set prometheus.enabled=true \
  --set prometheus.enableAlerting=true \
  --set prometheus.enableScraping=true

If not using Prometheus at all:

helm install kuberhealthy kuberhealthy/kuberhealthy

After the Helm command finishes, kubectl get pods should show two Kuberhealthy pods running. These pods create, coordinate, and track the short-lived checker pods, and they serve both a JSON status page and a /metrics endpoint.

2. Configure Additional Checks

Run kubectl get khchecks to list the default checks, which include:

daemonset : deploys and tears down a daemonset to confirm that every node in the cluster can run pods.

deployment : creates a deployment, triggers a rolling update, and verifies service accessibility.

dns-status-internal : validates internal cluster DNS functionality.

Additional external checks can be added by applying YAML files from the external‑check registry.

3. View Check Status

Expose the Kuberhealthy service (e.g., set type: LoadBalancer) or use port-forwarding:

kubectl port-forward svc/kuberhealthy 8080:80

The JSON status page returns a structure similar to:

{
  "OK": true,
  "Errors": [],
  "CheckDetails": {
    "kuberhealthy/daemonset": {"OK": true, "RunDuration": "22.5s", ...},
    "kuberhealthy/deployment": {"OK": true, "RunDuration": "29.1s", ...},
    "kuberhealthy/dns-status-internal": {"OK": true, "RunDuration": "2.4s", ...}
  },
  "CurrentMaster": "kuberhealthy-7cf79bdc86-m78qr"
}

You can filter results by namespace using the ?namespace= query parameter.
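The status document is easy to consume from a script. The following is a minimal sketch, not part of Kuberhealthy itself, that lists the checks whose OK flag is false. The per-check "Errors" field and the sample payload are assumptions made for illustration; the function name failing_checks is hypothetical.

```python
import json


def failing_checks(status_json: str) -> dict:
    """Return {check name: list of error strings} for every check that is not OK."""
    status = json.loads(status_json)
    failures = {}
    for name, detail in status.get("CheckDetails", {}).items():
        if not detail.get("OK", False):
            # Assumption: each check detail carries its own "Errors" list.
            failures[name] = detail.get("Errors") or []
    return failures


# Illustrative payload shaped like the status page above (values are made up).
sample = json.dumps({
    "OK": False,
    "Errors": ["kuberhealthy/deployment: timed out"],
    "CheckDetails": {
        "kuberhealthy/daemonset": {"OK": True, "Errors": [], "RunDuration": "22.5s"},
        "kuberhealthy/deployment": {"OK": False, "Errors": ["timed out"], "RunDuration": "29.1s"},
    },
})

print(failing_checks(sample))  # {'kuberhealthy/deployment': ['timed out']}
```

In practice the input would be the body fetched from the port-forwarded status page rather than a hard-coded sample.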

4. Write Your Own Checks

Kuberhealthy is designed to be extensible: any container can be packaged as a custom check, written in any language. Teams can use custom checks to validate client libraries, simulate real user workflows, and build a high degree of confidence in service uptime.
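As a rough illustration of what such a container might run, the sketch below builds a pass/fail report and posts it back to Kuberhealthy. It assumes the external-check contract in which the checker pod receives a KH_REPORTING_URL and a KH_RUN_UUID environment variable and sends the UUID in a kh-run-uuid header; none of that comes from this article, so verify it against the Kuberhealthy documentation for your version. The run_check function is a hypothetical placeholder for real test logic.

```python
import json
import os
import urllib.request


def build_report(errors):
    """Build the JSON body for a check result; the check passes only with no errors."""
    errors = [str(e) for e in errors]
    return {"OK": not errors, "Errors": errors}


def send_report(report, url, run_uuid, timeout=10.0):
    """POST the report to the reporting URL and return the HTTP status code."""
    request = urllib.request.Request(
        url,
        data=json.dumps(report).encode("utf-8"),
        # Assumption: Kuberhealthy identifies the run by this header.
        headers={"Content-Type": "application/json", "kh-run-uuid": run_uuid},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status


def run_check():
    """Hypothetical check logic: return a list of error strings, empty on success."""
    return []


if __name__ == "__main__":
    report = build_report(run_check())
    url = os.environ.get("KH_REPORTING_URL")  # assumed to be injected into the checker pod
    if url:
        send_report(report, url, os.environ.get("KH_RUN_UUID", ""))
    else:
        # Outside a cluster there is nowhere to report, so just show the payload.
        print(json.dumps(report))
```

Keeping build_report separate from send_report makes the check logic testable without a running cluster.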

5. Prometheus Integration Details

When Prometheus is enabled, the Kuberhealthy service receives the following annotations:

prometheus.io/path: /metrics
prometheus.io/port: "80"
prometheus.io/scrape: "true"

A typical scrape_config for Prometheus looks like:

scrape_configs:
- job_name: 'kuberhealthy'
  scrape_interval: 1m
  honor_labels: true
  metrics_path: /metrics
  kubernetes_sd_configs:
  - role: service
    namespaces:
      names:
      - kuberhealthy
  relabel_configs:
  - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
    action: keep
    regex: true

With this configuration, the following metrics become available:

kuberhealthy_check
kuberhealthy_check_duration_seconds
kuberhealthy_cluster_states
kuberhealthy_running
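To sanity-check the endpoint before wiring up dashboards, the text served at /metrics can be parsed directly. The sketch below handles only simple sample lines of the Prometheus text format and is not a full parser. The sample text is illustrative: the check and status labels mirror the ones used in this article's queries, and the values are made up.

```python
import re

# metric name, optional {labels}, value, optional timestamp
SAMPLE_LINE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(?P<labels>.*)\})?'
    r'\s+(?P<value>\S+)(?:\s+-?\d+)?$'
)
LABEL = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')


def parse_samples(text):
    """Return a list of (metric name, labels dict, float value) tuples."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines, comments, HELP and TYPE lines
        match = SAMPLE_LINE.match(line)
        if match is None:
            continue  # ignore anything this simple sketch cannot read
        labels = dict(LABEL.findall(match.group("labels") or ""))
        samples.append((match.group("name"), labels, float(match.group("value"))))
    return samples


def failed_checks(samples):
    """Names of checks whose kuberhealthy_check series carries status="0"."""
    return [
        labels["check"]
        for name, labels, _ in samples
        if name == "kuberhealthy_check" and labels.get("status") == "0"
    ]


# Illustrative scrape output, not captured from a real cluster.
SAMPLE = """
# TYPE kuberhealthy_check gauge
kuberhealthy_check{check="kuberhealthy/daemonset",status="1"} 1
kuberhealthy_check{check="kuberhealthy/deployment",status="0"} 0
kuberhealthy_check_duration_seconds{check="kuberhealthy/deployment"} 29.1
kuberhealthy_running 1
"""

print(failed_checks(parse_samples(SAMPLE)))  # ['kuberhealthy/deployment']
```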

6. Define Key Performance Indicators (KPIs)

Using the exposed metrics, teams can calculate several KPIs:

Availability : measured as Uptime / (Uptime + Downtime), where uptime and downtime are derived from the number of successful or failed deployment checks multiplied by the check run interval.

Utilization : reflects how many cluster resources (nodes, deployments, statefulsets, PVCs, services, pods, jobs) are in use, calculated from the total counts of those objects.

Latency (Duration) : average runtime of the deployment check, obtained with the query:

avg(kuberhealthy_check_duration_seconds{check="kuberhealthy/deployment"})

Errors / Alerts : total number of failed Kuberhealthy checks, each generating an alert.
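The availability definition above reduces to simple arithmetic once the run counts are known. The following sketch works through it with made-up numbers: a hypothetical 10-minute run interval over 30 days gives 4320 runs, of which 5 are assumed to have failed.

```python
def availability(successful_runs, failed_runs, run_interval_seconds):
    """Uptime / (Uptime + Downtime), with each run standing for one interval of time."""
    uptime = successful_runs * run_interval_seconds
    downtime = failed_runs * run_interval_seconds
    total = uptime + downtime
    if total == 0:
        raise ValueError("no check runs recorded")
    return uptime / total


# Hypothetical month: 4315 successful and 5 failed runs, one run every 600 seconds.
print(f"{availability(4315, 5, 600):.4%}")  # 99.8843%
```

Because every run is weighted by the same interval, the interval cancels out and the result equals the share of successful runs.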

Example PromQL query for availability over the past 30 days:

1 - (sum(count_over_time(kuberhealthy_check{check="kuberhealthy/deployment",status="0"}[30d])) or vector(0))
  / sum(count_over_time(kuberhealthy_check{check="kuberhealthy/deployment"}[30d]))

These queries enable operators to monitor cluster health, capacity, and reliability directly from Prometheus dashboards.
