Unified Multi‑Cluster Monitoring with KubeDoor 1.0: Alerts, Metrics & Best Practices
KubeDoor 1.0 introduces a new architecture for unified multi‑Kubernetes monitoring, offering components for master and agent, flexible deployment options, Helm‑based installation, configurable storage and alerting settings, and detailed guidance on integrating with existing Prometheus/VictoriaMetrics setups while providing automatic peak‑usage data collection.
KubeDoor 1.0 Release Overview
KubeDoor 1.0 provides a brand‑new architecture that supports multiple Kubernetes clusters, unified monitoring, remote‑write aggregation, centralized alert rules, and best‑practice visualizations.
Component Overview
Master side (installed in the kubedoor namespace)
kubedoor-master: connects to agents and exposes API interfaces.
kubedoor-web: front‑end UI integrated with Nginx.
kubedoor-dash: Grafana dashboard.
kubedoor-alarm: receives alerts from Alertmanager, handles notifications and storage.
kubedoor-collect: scheduled jobs that pull high‑peak resource data via API.
Infrastructure (default deployed together with master)
alertmanager: alert routing service.
vmalert: evaluates alert rules, forwards triggered alerts to Alertmanager.
VictoriaMetrics: time‑series database.
ClickHouse: column‑oriented database.
Agent side (installed in the kubedoor namespace)
kubedoor-agent: connects to master, calls Kubernetes API.
vmagent: replaces Prometheus for metric collection.
KubeStateMetrics: gathers Kubernetes metrics.
NodeExporter: collects host‑level metrics.
Deployment Options
Fresh deployment (no existing monitoring system): ideal when the target cluster lacks any monitoring stack.
Independent ClickHouse & VictoriaMetrics deployment: use Docker Compose to run the databases on the host.
Integration with existing multi‑K8s monitoring: configure Helm values to point to your existing Prometheus/vmagent and remote‑write endpoints.
Master‑only deployment: when a complete monitoring system already exists and you only need KubeDoor’s UI and alert aggregation.
Key Configuration Variables (master: values-master.yaml)
storageClass: specify the storage class for ClickHouse and VictoriaMetrics persistent volumes.
CK_PASSWORD: password for the ClickHouse default user (the default value can be kept).
external_labels_key: label key added to every metric for cluster identification; must match the agent side.
MSG_TYPE/MSG_TOKEN: IM notification type and bot token for alert notifications.
vm_single: credentials and retention settings for VictoriaMetrics single‑node mode.
nginx_auth: basic‑auth credentials for the web UI and agent‑master communication.
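Putting the variables above together, a values-master.yaml excerpt might look like the sketch below. The key names come from this guide, but the concrete values are placeholders and the exact nesting may differ in the actual chart, so check the file shipped with the release.

```yaml
# Illustrative values-master.yaml excerpt (placeholder values; verify
# key nesting against the chart's bundled file).
storageClass: local-path            # storage class for ClickHouse / VictoriaMetrics PVs
CK_PASSWORD: "change-me"            # ClickHouse default-user password
external_labels_key: origin_prometheus   # must match the agent side
MSG_TYPE: wechat                    # IM notification type
MSG_TOKEN: "your-bot-token"         # IM bot token
vm_single:
  retention: 30d                    # VictoriaMetrics single-node retention (example)
nginx_auth:
  username: kubedoor                # basic auth for web UI and agent-master traffic
  password: kubedoor
```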
Installation Commands
```shell
# Download and unpack the Helm chart
wget https://StarsL.cn/kubedoor/kubedoor-1.1.0.tgz
tar -zxvf kubedoor-1.1.0.tgz
cd kubedoor

# Master dry-run
helm install kubedoor . --namespace kubedoor --create-namespace \
  --values values-master.yaml --dry-run --debug

# Master install
helm install kubedoor . --namespace kubedoor --create-namespace \
  --values values-master.yaml

# Agent dry-run
helm install kubedoor-agent . --namespace kubedoor --create-namespace \
  --values values-agent.yaml --dry-run --debug

# Agent install
helm install kubedoor-agent . --namespace kubedoor --create-namespace \
  --values values-agent.yaml
```
Accessing the Web UI
Open http://<NodeIP>:<NodePort> in a browser. The default username and password are both kubedoor.
Alert Flow
Metrics are stored in VictoriaMetrics. vmalert reads the data, evaluates rules, and sends triggered alerts to Alertmanager, which routes them to kubedoor-alarm; kubedoor-alarm records the alerts and sends notifications. KubeDoor installs its own Alertmanager instance in the kubedoor namespace, avoiding conflicts with any existing Alertmanager.
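The Alertmanager-to-kubedoor-alarm hop in this flow is a standard webhook receiver. The sketch below shows what such a routing config could look like; the webhook path is hypothetical, so consult the Alertmanager config bundled with the chart for the real endpoint.

```yaml
# Sketch of an Alertmanager config that forwards all alerts to
# kubedoor-alarm via webhook (URL path is hypothetical).
route:
  receiver: kubedoor-alarm
  group_by: ['alertname', 'namespace']
receivers:
  - name: kubedoor-alarm
    webhook_configs:
      - url: http://kubedoor-alarm.kubedoor/alert   # hypothetical path
```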
Agent Configuration Details (agent: values-agent.yaml)
ws: WebSocket endpoint for master‑agent communication, e.g., ws://kubedoor-master.kubedoor. Use a reachable external address for cross‑cluster setups.
MSG_TYPE/MSG_TOKEN: bot type and token for sending operation notifications.
OSS_URL: OSS endpoint where Java services upload dump/JFR/JStack data.
external_labels_key & external_labels_value: must match the labels configured in your Prometheus/vmagent for multi‑cluster identification.
remoteWriteUrl: full URL for vmagent remote‑write to VictoriaMetrics, e.g., http://monit:[email protected]:8428/api/v1/write.
monit.enable and related enable flags: set to true only if you need vmagent; otherwise keep them false to avoid duplication with existing Prometheus instances.
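The agent settings above might combine into an excerpt like the following. Key names are taken from this guide; the concrete values (cluster label, OSS endpoint) are placeholders and the real chart may nest keys differently.

```yaml
# Illustrative values-agent.yaml excerpt (placeholder values; verify
# against the chart's bundled file).
ws: ws://kubedoor-master.kubedoor          # use an external address cross-cluster
MSG_TYPE: wechat
MSG_TOKEN: "your-bot-token"
OSS_URL: "http://oss.example.com/bucket"   # hypothetical OSS endpoint
external_labels_key: origin_prometheus     # must match Prometheus/vmagent labels
external_labels_value: prod-cluster-1
remoteWriteUrl: http://monit:[email protected]:8428/api/v1/write
monit:
  enable: false   # keep false if an existing Prometheus already collects metrics
```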
Data Collection
Once automatic collection is enabled, KubeDoor gathers the previous day’s peak usage at 01:00 daily and writes the data from the highest‑consumption day of the last ten days into the control table. Manual collection can also be triggered from the UI; repeated runs do not duplicate data.
If the system is newly installed and the current day’s peak period has already passed, data for that day cannot be collected until the next day’s peak window.
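The "highest‑consumption day of the last ten days" selection can be sketched as a simple max over daily peaks. The helper below is a hypothetical illustration of that rule, not KubeDoor's actual code.

```python
from datetime import date, timedelta

def top_consumption_day(daily_peaks: dict) -> tuple:
    """Return the (day, peak) pair with the highest peak usage.

    daily_peaks maps a date to that day's peak resource usage, e.g. the
    values collected at 01:00 for each of the last ten days.
    (Hypothetical helper illustrating the selection rule.)
    """
    return max(daily_peaks.items(), key=lambda kv: kv[1])

# Example: ten days of peak CPU usage (in cores).
today = date(2025, 1, 10)
peaks = {today - timedelta(days=i): usage
         for i, usage in enumerate([2.1, 3.4, 2.8, 5.9, 3.0,
                                    2.2, 4.1, 3.3, 2.7, 3.9])}
day, peak = top_consumption_day(peaks)
print(day, peak)  # 2025-01-07 5.9 (the busiest of the ten days)
```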
Prometheus/vmagent Job Requirements
Ensure the following metrics are exposed for proper analysis:
container_cpu_usage_seconds_total
container_memory_working_set_bytes
container_spec_cpu_quota
kube_pod_container_info
kube_pod_container_resource_limits
kube_pod_container_resource_requests
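As one way to see why these metrics matter, a vmalert rule can combine them into a usage-versus-limit ratio. The rule below is an illustrative example (rule name and threshold are invented, not one of KubeDoor's shipped rules):

```yaml
# Example vmalert rule built from the required metrics (illustrative
# name and threshold; not a KubeDoor-shipped rule).
groups:
  - name: kubedoor-examples
    rules:
      - alert: ContainerCPUNearLimit
        expr: |
          sum(rate(container_cpu_usage_seconds_total[5m])) by (namespace, pod)
            /
          sum(kube_pod_container_resource_limits{resource="cpu"}) by (namespace, pod)
            > 0.9
        for: 10m
        annotations:
          summary: "Pod CPU usage above 90% of its limit"
```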
Repository
GitHub: https://github.com/CassInfra/KubeDoor/tree/main
Ops Development Stories
Maintained by a like‑minded team, covering both operations and development. Topics span Linux ops, DevOps toolchain, Kubernetes containerization, monitoring, log collection, network security, and Python or Go development. Team members: Qiao Ke, wanger, Dong Ge, Su Xin, Hua Zai, Zheng Ge, Teacher Xia.