
Integrating TiDB Multi‑Cluster Monitoring with Prometheus, Consul, and VictoriaMetrics

This article presents a step‑by‑step solution for consolidating TiDB multi‑cluster monitoring by deploying Consul for service registration, configuring Prometheus to discover services via Consul, and optionally replacing Prometheus with VictoriaMetrics to achieve unified dashboards, scalable data collection, and easier health inspection across dozens or hundreds of instances.

360 Smart Cloud

TiDB clusters ship with a comprehensive monitoring stack, but its many components and very broad set of metrics make it cumbersome to operate, especially when each cluster has its own copy. This guide explains how to streamline monitoring for one or many clusters using Prometheus, Consul, and optionally VictoriaMetrics.

1. Deploy Consul

Download and install Consul, then create /etc/consul.d/server.json with the following content:

{
  "datacenter": "bjyt",
  "data_dir": "/data1/consul",
  "log_level": "INFO",
  "node_name": "consul-server",
  "server": true,
  "bootstrap_expect": 1,
  "bind_addr": "xx.xx.xx.xx",
  "client_addr": "xx.xx.xx.xx",
  "ui": true,
  "retry_join": ["xx.xx.xx.xx"],
  "retry_interval": "10s",
  "enable_debug": false,
  "rejoin_after_leave": true,
  "start_join": ["xx.xx.xx.xx"],
  "enable_syslog": true,
  "syslog_facility": "local0"
}

Start Consul:

nohup consul agent -config-dir=/etc/consul.d > /data/consul/consul.log 2>&1 &

Access the Consul UI at http://<ip>:8500/. ACLs can be enabled to restrict access.
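Before registering anything, it can help to confirm that the single-node server has elected itself leader. The following is a minimal sketch, assuming Consul's standard /v1/status/leader endpoint is reachable on the client address; the function names and the base URL in the usage comment are illustrative and not part of the original setup.

```python
import json
import urllib.request


def parse_leader(body: bytes):
    """Decode the body of /v1/status/leader.

    Consul answers with a JSON string such as "10.0.0.1:8300",
    or an empty string when no leader has been elected yet.
    """
    leader = json.loads(body.decode("utf-8"))
    return leader or None


def consul_leader(base_url: str, timeout: float = 3.0):
    """Ask a Consul agent which server is the Raft leader.

    base_url is a placeholder such as "http://<ip>:8500".
    """
    url = f"{base_url}/v1/status/leader"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return parse_leader(resp.read())


# Usage (hypothetical address, requires a running agent):
#   consul_leader("http://127.0.0.1:8500")
```

A return value of None suggests the agent is up but the cluster has not bootstrapped, which with bootstrap_expect set to 1 usually points at a bind_addr or data_dir problem.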

2. Register TiDB services in Consul

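Obtain the target information (address and port of every component) from Prometheus, then register each role (TiDB, TiKV, PD) with Consul by sending a PUT request to http://<ip>:8500/v1/agent/service/register. The five tags are positional (service, data center, environment, IP, port), because the relabeling rules in step 3 extract labels by position: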

curl -X PUT -d '{"id":"tidb-exporter","name":"tidb","address":"xx.xx.xx.xx","port":10080,"tags":["tidb","shyc2","product","xx.xx.xx.xx","10080"],"checks":[{"http":"http://xx.xx.xx.xx:10080/metrics","interval":"5s"}]}' http://xx.xx.xx.xx:8500/v1/agent/service/register

curl -X PUT -d '{"id":"tikv-exporter","name":"tidb","address":"xx.xx.xx.xx","port":20180,"tags":["tidb","shyc2","product","xx.xx.xx.xx","20180"],"checks":[{"http":"http://xx.xx.xx.xx:20180/metrics","interval":"5s"}]}' http://xx.xx.xx.xx:8500/v1/agent/service/register

curl -X PUT -d '{"id":"pd-exporter","name":"tidb","address":"xx.xx.xx.xx","port":2379,"tags":["tidb","shyc2","product","xx.xx.xx.xx","2379"],"checks":[{"http":"http://xx.xx.xx.xx:2379/metrics","interval":"5s"}]}' http://xx.xx.xx.xx:8500/v1/agent/service/register

Verify registration via the Consul UI.
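With dozens or hundreds of instances, issuing curl commands by hand becomes error-prone. The sketch below builds the same payload shape as the commands above and sends it with a PUT request. It rests on a few assumptions that are not in the original: the helper names are hypothetical, and the service ID includes the IP and port so that several instances registered on the same agent do not overwrite each other (Consul treats a repeated ID as an update).

```python
import json
import urllib.request


def build_registration(role, ip, port, dc, env, first_tag="tidb"):
    """Build a registration payload with five positional tags.

    Tag order mirrors the curl examples: service, dc, env, ip, port.
    The ID format is an assumption chosen to keep IDs unique.
    """
    return {
        "id": f"{role}-exporter-{ip}-{port}",
        "name": "tidb",
        "address": ip,
        "port": port,
        "tags": [first_tag, dc, env, ip, str(port)],
        "checks": [
            {"http": f"http://{ip}:{port}/metrics", "interval": "5s"}
        ],
    }


def register(consul_url, payload, timeout=5.0):
    """PUT the payload to /v1/agent/service/register; return the HTTP status."""
    req = urllib.request.Request(
        f"{consul_url}/v1/agent/service/register",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.status


# Usage (hypothetical addresses, requires a running agent):
#   for role, ip, port in [("tidb", "10.0.0.1", 10080),
#                          ("tikv", "10.0.0.2", 20180),
#                          ("pd", "10.0.0.3", 2379)]:
#       register("http://10.0.0.9:8500",
#                build_registration(role, ip, port, "shyc2", "product"))
```

Keeping payload construction separate from the HTTP call makes it easy to print and review the payloads before anything is sent to Consul.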

3. Configure Prometheus to discover services through Consul

Append the following job under scrape_configs in prometheus.yml:

  - job_name: 'tidb'
    consul_sd_configs:
      - server: 'xx.xx.xx.xx:8500'
        services: ['tidb']
    relabel_configs:
    - source_labels: ['__meta_consul_tags']
      regex: ',(.*),(.*),(.*),(.*),(.*),'
      action: replace
      target_label: 'instance'
      replacement: '${1}_${4}_${5}'
    - source_labels: ['__meta_consul_tags']
      regex: ',(.*),(.*),(.*),(.*),(.*),'
      action: replace
      target_label: 'dc'
      replacement: '${2}'
    - source_labels: ['__meta_consul_tags']
      regex: ',(.*),(.*),(.*),(.*),(.*),'
      action: replace
      target_label: 'env'
      replacement: '${3}'
    - source_labels: ['__meta_consul_tags']
      regex: ',(.*),(.*),(.*),(.*),(.*),'
      action: replace
      target_label: 'service'
      replacement: '${1}'
    - source_labels: ['__meta_consul_service_address']
      regex: "(.*)"
      action: replace
      target_label: 'ip'
      replacement: '${1}'
    - source_labels: ['__meta_consul_tags']
      regex: ',(.*),(.*),(.*),(.*),(.*),'
      action: replace
      target_label: 'port'
      replacement: '${5}'
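The relabeling rules depend on how Prometheus exposes Consul tags: __meta_consul_tags is the tag list joined by commas with a leading and trailing comma, and the regex must match the whole value. The following sketch imitates that behavior in Python to show which labels a registration would produce. It is only an approximation for checking tag layouts, not Prometheus's own implementation, and the function name is hypothetical.

```python
import re

# Same pattern as in the relabel_configs above.
TAG_REGEX = re.compile(r",(.*),(.*),(.*),(.*),(.*),")


def labels_from_tags(tags, service_address):
    """Approximate the labels the relabel rules would attach to a target."""
    # Prometheus surrounds the joined tag list with the separator.
    meta_tags = "," + ",".join(tags) + ","
    # The 'ip' rule uses regex (.*) on the service address, so it always matches.
    labels = {"ip": service_address}
    match = TAG_REGEX.fullmatch(meta_tags)  # relabel regexes are fully anchored
    if match is None:
        # On a non-match, 'replace' leaves the target labels untouched.
        return labels
    service, dc, env, ip, port = match.groups()
    labels.update(
        {
            "instance": f"{service}_{ip}_{port}",
            "dc": dc,
            "env": env,
            "service": service,
            "port": port,
        }
    )
    return labels


print(labels_from_tags(["tidb", "shyc2", "product", "10.0.0.1", "10080"], "10.0.0.1"))
```

Note that a service registered with fewer than five tags does not match the pattern and ends up with only the ip label, so the tag order and count used at registration time need to stay consistent.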

Grafana can then use the generated labels (service, dc, env, ip, port) to build dashboards and filter across clusters; the TiDB performance overview JSON (v6.1.0) can be imported for ready‑made panels.

4. Optional: Replace Prometheus with VictoriaMetrics for large‑scale clusters

When the number of instances reaches the tens of thousands, Consul registration may time out. VictoriaMetrics offers lower memory usage and higher performance at that scale. Install VictoriaMetrics, create a systemd service, and let it scrape the targets directly by reusing the existing prometheus.yml through -promscrape.config:

wget https://github.com/VictoriaMetrics/VictoriaMetrics/releases/download/v1.65.0/victoria-metrics-amd64-v1.65.0.tar.gz
tar -xvzf victoria-metrics-amd64-v1.65.0.tar.gz && \
mv victoria-metrics-prod /usr/local/bin/victoria-metrics-prod

cat /etc/systemd/system/victoria-metrics-prod.service
[Unit]
Description=For Victoria-metrics-prod Service
After=network.target

[Service]
ExecStart=/usr/local/bin/victoria-metrics-prod -promscrape.config=/data1/tidb/deploy/conf/prometheus.yml -httpListenAddr=0.0.0.0:8428 -promscrape.config.strictParse=false -storageDataPath=/data1/victoria -retentionPeriod=3

[Install]
WantedBy=multi-user.target

systemctl daemon-reload
systemctl restart victoria-metrics-prod.service

VictoriaMetrics exposes a Prometheus‑compatible query API, so update the Grafana data source (type Prometheus) to point to http://<ip>:8428 and adjust dashboards accordingly.
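A quick way to confirm that VictoriaMetrics is scraping the Consul-discovered targets is to run an instant query for the up metric against its Prometheus-compatible /api/v1/query endpoint. The sketch below is one possible health check under stated assumptions: the helper names and the address in the usage comment are hypothetical, and the service label is the one produced by the relabeling rules in step 3.

```python
import json
import urllib.parse
import urllib.request


def build_query_url(base_url, promql):
    """Build an instant-query URL for the Prometheus-compatible API."""
    return f"{base_url}/api/v1/query?" + urllib.parse.urlencode({"query": promql})


def count_up_by_service(response):
    """Count targets reporting up == 1, grouped by the 'service' label.

    'response' is the decoded JSON of an instant query for the 'up' metric.
    """
    counts = {}
    for sample in response["data"]["result"]:
        service = sample["metric"].get("service", "unknown")
        _timestamp, value = sample["value"]  # value arrives as a string
        if float(value) == 1.0:
            counts[service] = counts.get(service, 0) + 1
    return counts


def fetch_up(base_url, timeout=5.0):
    """Run the query against a live VictoriaMetrics (or Prometheus) instance."""
    url = build_query_url(base_url, 'up{job="tidb"}')
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


# Usage (hypothetical address, requires a running instance):
#   print(count_up_by_service(fetch_up("http://127.0.0.1:8428")))
```

Because the query API has the same shape in both systems, the same check can be pointed at Prometheus on its own port before the switch and at port 8428 afterwards to compare target counts.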

Conclusion

The article presents two practical solutions for aggregating TiDB multi‑cluster monitoring: a Consul‑based service registration combined with Prometheus, and an alternative high‑performance stack using VictoriaMetrics for very large deployments, enabling unified visual inspection of key metrics across all clusters.
