Integrating TiDB Multi‑Cluster Monitoring with Prometheus, Consul, and VictoriaMetrics
This article presents a step‑by‑step solution for consolidating TiDB multi‑cluster monitoring by deploying Consul for service registration, configuring Prometheus to discover services via Consul, and optionally replacing Prometheus with VictoriaMetrics to achieve unified dashboards, scalable data collection, and easier health inspection across dozens or hundreds of instances.
TiDB clusters ship with a comprehensive monitoring stack, but its many components and very broad metric set can make it cumbersome to operate. This guide explains how to streamline monitoring for one or many clusters using Prometheus, Consul, and optionally VictoriaMetrics.
1. Deploy Consul
Download and install Consul, then create /etc/consul.d/server.json with the following content:
{
  "datacenter": "bjyt",
  "data_dir": "/data1/consul",
  "log_level": "INFO",
  "node_name": "consul-server",
  "server": true,
  "bootstrap_expect": 1,
  "bind_addr": "xx.xx.xx.xx",
  "client_addr": "xx.xx.xx.xx",
  "ui": true,
  "retry_join": ["xx.xx.xx.xx"],
  "retry_interval": "10s",
  "enable_debug": false,
  "rejoin_after_leave": true,
  "start_join": ["xx.xx.xx.xx"],
  "enable_syslog": true,
  "syslog_facility": "local0"
}
Start Consul:
nohup consul agent -config-dir=/etc/consul.d > /data/consul/consul.log &
Access the Consul UI at http://<ip>:8500/ (ACLs can be enabled for security).
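Before registering services, it can help to confirm that the agent is up and has elected a leader. The following is a minimal Python sketch, assuming the Consul HTTP API is reachable on port 8500 at the client_addr configured above; the function names are illustrative and not part of the original setup.

```python
import json
import urllib.request


def parse_leader(body):
    """Return the leader address from a /v1/status/leader response body, or None."""
    # The endpoint returns a JSON string such as "10.0.0.1:8300",
    # or an empty string when no leader has been elected yet.
    leader = json.loads(body)
    return leader or None


def consul_leader(consul_addr, timeout=3.0):
    """Ask a Consul agent (host:port) which server is the current leader."""
    url = f"http://{consul_addr}/v1/status/leader"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return parse_leader(resp.read().decode("utf-8"))
```

Calling consul_leader("<ip>:8500") should return a non-empty address once the single-server cluster defined by bootstrap_expect has bootstrapped.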
2. Register TiDB services in Consul
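Obtain each target's address and port from the Prometheus exporter API, then register each role (TiDB, TiKV, PD) with Consul via curl -X PUT -d '{...}' http://<ip>:8500/v1/agent/service/register. All three roles share the service name tidb so that a single Prometheus job can discover them, and the five tags (service, data center, environment, IP, port) are later parsed into labels: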
curl -X PUT -d '{"id":"tidb-exporter","name":"tidb","address":"xx.xx.xx.xx","port":10080,"tags":["tidb","shyc2","product","xx.xx.xx.xx","10080"],"checks":[{"http":"http://xx.xx.xx.xx:10080/metrics","interval":"5s"}]}' http://xx.xx.xx.xx:8500/v1/agent/service/register
curl -X PUT -d '{"id":"tikv-exporter","name":"tidb","address":"xx.xx.xx.xx","port":20180,"tags":["tidb","shyc2","product","xx.xx.xx.xx","20180"],"checks":[{"http":"http://xx.xx.xx.xx:20180/metrics","interval":"5s"}]}' http://xx.xx.xx.xx:8500/v1/agent/service/register
curl -X PUT -d '{"id":"pd-exporter","name":"tidb","address":"xx.xx.xx.xx","port":2379,"tags":["tidb","shyc2","product","xx.xx.xx.xx","2379"],"checks":[{"http":"http://xx.xx.xx.xx:2379/metrics","interval":"5s"}]}' http://xx.xx.xx.xx:8500/v1/agent/service/register
Verify registration via the Consul UI.
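With dozens or hundreds of instances, the curl calls above are easier to generate than to write by hand. The sketch below builds the same payload shape in Python and sends it with the standard library. The helper names are hypothetical, and the per-instance id format is an assumption: the original uses one fixed id per role, which would be overwritten if several instances of a role were registered on the same agent.

```python
import json
import urllib.request


def build_registration(role, ip, port, dc, env, service="tidb"):
    """Build a Consul registration payload shaped like the curl examples above."""
    return {
        # Assumption: include ip and port so each instance gets a unique id.
        "id": f"{role}-exporter-{ip}-{port}",
        "name": service,
        "address": ip,
        "port": port,
        # Tag order matters: the Prometheus relabel rules parse tags by position.
        "tags": [service, dc, env, ip, str(port)],
        "checks": [{"http": f"http://{ip}:{port}/metrics", "interval": "5s"}],
    }


def register(consul_addr, payload, timeout=3.0):
    """PUT the payload to the agent's service registration endpoint."""
    req = urllib.request.Request(
        f"http://{consul_addr}/v1/agent/service/register",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.status  # 200 on success
```

For example, register("<ip>:8500", build_registration("tikv", "<tikv-ip>", 20180, "shyc2", "product")) would mirror the second curl command, apart from the id.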
3. Configure Prometheus to discover services through Consul
Append the following job to prometheus.yml:
  - job_name: 'tidb'
    consul_sd_configs:
      - server: 'xx.xx.xx.xx:8500'
        services: ['tidb']
    relabel_configs:
      - source_labels: ['__meta_consul_tags']
        regex: ',(.*),(.*),(.*),(.*),(.*),'
        action: replace
        target_label: 'instance'
        replacement: '${1}_${4}_${5}'
      - source_labels: ['__meta_consul_tags']
        regex: ',(.*),(.*),(.*),(.*),(.*),'
        action: replace
        target_label: 'dc'
        replacement: '${2}'
      - source_labels: ['__meta_consul_tags']
        regex: ',(.*),(.*),(.*),(.*),(.*),'
        action: replace
        target_label: 'env'
        replacement: '${3}'
      - source_labels: ['__meta_consul_tags']
        regex: ',(.*),(.*),(.*),(.*),(.*),'
        action: replace
        target_label: 'service'
        replacement: '${1}'
      - source_labels: ['__meta_consul_service_address']
        regex: "(.*)"
        action: replace
        target_label: 'ip'
        replacement: '${1}'
      - source_labels: ['__meta_consul_tags']
        regex: ',(.*),(.*),(.*),(.*),(.*),'
        action: replace
        target_label: 'port'
        replacement: '${5}'
Grafana can then use the generated labels to build dashboards; the TiDB performance overview JSON (v6.1.0) can be imported for ready‑made panels.
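To see which labels the relabel rules above produce for a given registration, the tag parsing can be reproduced offline. This Python sketch assumes Prometheus's default tag separator (a comma, also added at the start and end of __meta_consul_tags) and uses a full match to mimic Prometheus's anchored regexes; the function name and sample values are illustrative.

```python
import re

# Same pattern as in the relabel_configs above.
TAGS_RE = re.compile(r",(.*),(.*),(.*),(.*),(.*),")


def labels_from_tags(tags, service_address):
    """Return the labels the relabel rules would set for one Consul service."""
    # Prometheus joins the tags with the separator and wraps the result in it.
    joined = "," + ",".join(tags) + ","
    # The 'ip' rule reads __meta_consul_service_address and always matches.
    labels = {"ip": service_address}
    match = TAGS_RE.fullmatch(joined)
    if match is None:
        # With action "replace", a non-matching regex leaves labels untouched.
        return labels
    service, dc, env, ip, port = match.groups()
    labels.update({
        "instance": f"{service}_{ip}_{port}",
        "dc": dc,
        "env": env,
        "service": service,
        "port": port,
    })
    return labels
```

The pattern relies on exactly five tags in a fixed order, so a registration with fewer tags would keep Prometheus's default instance label instead of the composed one.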
4. Optional: Replace Prometheus with VictoriaMetrics for large‑scale clusters
When the number of instances reaches the tens of thousands, Consul registration may time out; VictoriaMetrics offers lower memory usage and higher performance than Prometheus at that scale. Install VictoriaMetrics, run it as a systemd service, and let it scrape the targets itself by passing the existing prometheus.yml through -promscrape.config:
wget https://github.com/VictoriaMetrics/VictoriaMetrics/releases/download/v1.65.0/victoria-metrics-amd64-v1.65.0.tar.gz
tar -xvzf victoria-metrics-amd64-v1.65.0.tar.gz && \
mv victoria-metrics-prod /usr/local/bin/victoria-metrics-prod
cat /etc/systemd/system/victoria-metrics-prod.service
[Unit]
Description=For Victoria-metrics-prod Service
After=network.target
[Service]
ExecStart=/usr/local/bin/victoria-metrics-prod -promscrape.config=/data1/tidb/deploy/conf/prometheus.yml -httpListenAddr=0.0.0.0:8428 -promscrape.config.strictParse=false -storageDataPath=/data1/victoria -retentionPeriod=3
[Install]
WantedBy=multi-user.target
systemctl daemon-reload
systemctl restart victoria-metrics-prod.service
Update the Grafana data source to point to VictoriaMetrics (port 8428) and adjust dashboards accordingly.
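Once VictoriaMetrics is scraping, a quick health inspection across all clusters can be scripted against its Prometheus-compatible query API. The sketch below is a minimal Python example, assuming the instance listens on port 8428 as configured above and that targets are scraped under the job name tidb; the function names are hypothetical.

```python
import json
import urllib.parse
import urllib.request


def build_query_url(vm_addr, promql):
    """Build an instant-query URL for the Prometheus-compatible HTTP API."""
    return f"http://{vm_addr}/api/v1/query?" + urllib.parse.urlencode({"query": promql})


def down_targets(response_body):
    """Return the instance labels of all series whose latest `up` value is 0."""
    data = json.loads(response_body)
    down = []
    for series in data["data"]["result"]:
        _timestamp, value = series["value"]  # value is returned as a string
        if float(value) == 0.0:
            down.append(series["metric"].get("instance", "<unknown>"))
    return down


def fetch_down_targets(vm_addr, job="tidb", timeout=5.0):
    """Query `up` for one job and list the targets that failed their last scrape."""
    url = build_query_url(vm_addr, f'up{{job="{job}"}}')
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return down_targets(resp.read().decode("utf-8"))
```

Because the instance label is composed from the Consul tags, the returned names identify the service, IP, and port of each failing target.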
Conclusion
The article presents two practical solutions for aggregating TiDB multi‑cluster monitoring: a Consul‑based service registration combined with Prometheus, and an alternative high‑performance stack using VictoriaMetrics for very large deployments, enabling unified visual inspection of key metrics across all clusters.
360 Smart Cloud
Official service account of 360 Smart Cloud, dedicated to building a high-quality, secure, highly available, convenient, and stable one‑stop cloud service platform.