Deploying and Managing a VictoriaMetrics Cluster on Kubernetes for Scalable Monitoring
This guide explains the architecture of VictoriaMetrics, details a step‑by‑step Helm deployment on a Kubernetes cluster, covers data collection from multiple clusters, storage persistence, Grafana dashboard setup, and alerting configuration using vmalert and webhook integration.
VictoriaMetrics is a fast, cost‑effective, and scalable time‑series database that can serve as long‑term storage for Prometheus and act as a data source for Grafana, making it suitable for unified monitoring across multiple Kubernetes clusters.
Architecture Overview – The system consists of three core components: vmselect (query and aggregation), vmstorage (stateful data storage), and vminsert (data ingestion). Each component can be horizontally scaled, with vmstorage being the only stateful service.
Cluster Deployment – The VictoriaMetrics cluster is installed with Helm on a Kubernetes 1.20 cluster (later chart versions require Kubernetes >= 1.23). The following commands add the Helm repo, update it, list the available chart versions, and install chart 0.10.5 with a custom node selector, retention period, and storage class:
# Add Helm repo
helm repo add vm https://victoriametrics.github.io/helm-charts/
# Update repo
helm repo update
# List all available chart versions
helm search repo vm/victoria-metrics-cluster -l
# Install with custom settings
helm install vmcluster /root/.cache/helm/repository/victoria-metrics-cluster-0.10.5.tgz \
--version 0.10.5 -n vm \
--set vmstorage.nodeSelector."directpv\/disk-type"=ssd-960 \
--set vmstorage.retentionPeriod=10d \
--set vmstorage.persistentVolume.storageClass=directpv-ssd-960
After deployment, the vmstorage pods (stateful) and the stateless vminsert and vmselect pods should be in the Running state.
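The same overrides can also be kept in a values file instead of a chain of --set flags, which is easier to review and reuse. The following is a minimal sketch: the file name vmcluster-values.yaml is an arbitrary choice, and the keys simply mirror the --set paths used in the install command.

```shell
# Write the overrides from the install command into a values file.
# The file name is an assumption made for this sketch.
cat > vmcluster-values.yaml <<'EOF'
vmstorage:
  # Schedule vmstorage pods on nodes labelled for the SSD-backed directpv disks
  nodeSelector:
    directpv/disk-type: ssd-960
  # Keep metrics for 10 days
  retentionPeriod: 10d
  persistentVolume:
    storageClass: directpv-ssd-960
EOF
```

The file would then be passed to Helm with -f, for example helm install vmcluster vm/victoria-metrics-cluster --version 0.10.5 -n vm -f vmcluster-values.yaml, and kubectl get pods -n vm can be used afterwards to confirm that the pods reach the Running state.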
Data Collection – Multiple clusters write metrics to the central VictoriaMetrics instance via Prometheus remoteWrite. The example configuration adds an externalLabels entry to identify the source cluster and points the remote write URL to the vminsert service:
externalLabels:
  cluster: iaas-test
remoteWrite:
  - url: http://vmcluster-victoria-metrics-cluster-vminsert.vm.svc.cluster.local:8480/insert/0/prometheus/
Both in‑cluster (via the Service address) and out‑of‑cluster (via a NodePort or domain name) write methods are supported.
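The write URL above follows a fixed pattern: the vminsert Service name, port 8480, and the tenant (account) ID in the path. The read side used by Grafana and vmalert has the same shape, with vmselect and port 8481. The helper below is a small sketch of that pattern. The function names are hypothetical, and the Service naming scheme (release name followed by -victoria-metrics-cluster-vminsert or -vmselect) is inferred from the addresses used in this guide, so check it against kubectl get svc -n vm in your own cluster.

```shell
# Print the in-cluster write URL.
# Arguments: Helm release name, namespace, tenant/account ID (defaults to 0).
vm_insert_url() {
  printf 'http://%s-victoria-metrics-cluster-vminsert.%s.svc.cluster.local:8480/insert/%s/prometheus/\n' \
    "$1" "$2" "${3:-0}"
}

# Print the in-cluster query URL; same arguments as above.
vm_select_url() {
  printf 'http://%s-victoria-metrics-cluster-vmselect.%s.svc.cluster.local:8481/select/%s/prometheus/\n' \
    "$1" "$2" "${3:-0}"
}

vm_insert_url vmcluster vm 0   # remote write target for Prometheus
vm_select_url vmcluster vm 0   # data source URL for Grafana and vmalert
```

Writing each source cluster to the same tenant and telling clusters apart with the cluster external label, as this guide does, keeps every query on a single select URL.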
Storage Persistence – The deployment uses the directpv local‑disk storage class, specified via --set vmstorage.persistentVolume.storageClass=directpv-ssd-960, ensuring durable storage for long‑term metric retention.
Monitoring and Alerting – Grafana is configured with a VictoriaMetrics data source to visualize metrics. Alerting is handled by the vmalert component, installed via Helm:
helm install vmalert vm/victoria-metrics-alert \
--version 0.7.4 \
--set server.configMap="vmalert-alert-rules-config" \
--set server.datasource.url="http://vmcluster-victoria-metrics-cluster-vmselect.vm.svc.cluster.local:8481/select/0/prometheus/" \
--set alertmanager.enabled=true -n vm
A sample alert rule detects missing kube-controller-manager instances across clusters:
apiVersion: v1
data:
  alert-rules.yaml: |
    groups:
    - name: iass-custom.rules
      rules:
      - alert: KubeControllerManagerMiss
        annotations:
          message: KubeControllerManager has disappeared from Prometheus target discovery.
          runbook_url: https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubecontrollermanagerdown
        # Fires when fewer than the expected 3 instances are up in a cluster
        expr: sum(up{job="kube-controller-manager"}) by (cluster) < 3
        for: 1m
        labels:
          severity: critical
kind: ConfigMap
metadata:
  name: vmalert-alert-rules-config
  namespace: vm
Alert notifications are forwarded to a DingTalk webhook via an alertmanager ConfigMap, enabling centralized alert handling for multiple clusters.
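The alertmanager configuration itself is not shown in this guide. As a rough sketch of what the forwarding could look like, the snippet below writes a minimal Alertmanager configuration with a single webhook receiver. The receiver name, the timing values, and the webhook address are all assumptions: the URL stands in for a DingTalk adapter service running in the cluster, which is usually needed because Alertmanager sends its own generic JSON payload rather than DingTalk's message format.

```shell
# Write a minimal Alertmanager configuration with one webhook receiver.
# File name, receiver name, timings and webhook URL are placeholders for this sketch.
cat > alertmanager.yml <<'EOF'
route:
  receiver: dingtalk
  # Group notifications per alert and per source cluster
  group_by: ['alertname', 'cluster']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 4h
receivers:
  - name: dingtalk
    webhook_configs:
      # Hypothetical address of a DingTalk webhook adapter; replace with your own
      - url: http://dingtalk-webhook.vm.svc.cluster.local:8060/dingtalk/webhook1/send
        send_resolved: true
EOF
```

Grouping by the cluster label keeps alerts from different source clusters in separate notifications, which matches the per-cluster aggregation in the sample rule. How this content is mounted into the alertmanager pod depends on the chart version, so check the chart's values before wiring it in.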
Conclusion – VictoriaMetrics provides a highly scalable, cloud‑native solution for collecting, storing, and visualizing metrics from many Kubernetes clusters, with efficient compression, simple Helm‑based deployment, and robust alerting integration, making it well‑suited for long‑term monitoring needs.
TAL Education Technology
TAL Education is a technology-driven education company committed to the mission of 'making education better through love and technology'. The TAL technology team has always been dedicated to educational technology research and innovation. This is the external platform of the TAL technology team, sharing weekly curated technical articles and recruitment information.