
Monitoring TiDB with Zabbix: Using HTTP Agent, Preprocessing, and Triggers

This guide explains how to collect TiDB metrics via its HTTP monitoring API, preprocess the data into JSON, create master and regular items in Zabbix, and configure triggers using Prometheus‑style expressions to achieve effective TiDB monitoring.

Aikesheng Open Source Community

To monitor TiDB with Zabbix, use an HTTP agent item to call TiDB's monitoring API and then preprocess the returned data. The required preprocessing steps (Prometheus pattern and Prometheus to JSON) were added in Zabbix 4.2; the example here uses Zabbix 5.0.5.

TiDB Monitoring API

Before starting, read the TiDB monitoring API documentation (https://docs.pingcap.com/zh/tidb/v5.1/tidb-monitoring-api).

Example request:

curl http://127.0.0.1:10080/metrics > /tmp/tidb_metrics

One of the alert rules from the TiDB docs is:

increase(tidb_session_schema_lease_error_total{type="outdated"}[15m]) > 0

The metric name tidb_session_schema_lease_error_total can be found in the exported metrics file; its format includes a HELP line, a TYPE line, and the metric value.
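
To make the exported format concrete, here is a minimal Python sketch of what a Prometheus-to-JSON conversion does with such a file. It is not Zabbix's implementation: the sample text, its HELP wording and its value are invented for illustration, the function name prometheus_to_json is hypothetical, and the output keeps only the name, value and labels fields that the JSONPath expressions in this guide rely on.

```python
import re

# Illustrative sample only: the HELP text and the value are made up,
# not copied from a real TiDB instance.
SAMPLE = """\
# HELP tidb_session_schema_lease_error_total illustrative help text
# TYPE tidb_session_schema_lease_error_total counter
tidb_session_schema_lease_error_total{type="outdated"} 0
"""

# metric_name{optional labels} value [optional timestamp]
LINE_RE = re.compile(
    r'^([A-Za-z_:][A-Za-z0-9_:]*)(?:\{(.*)\})?\s+(\S+)(?:\s+-?\d+)?$'
)
# label_name="label value"
LABEL_RE = re.compile(r'([A-Za-z_][A-Za-z0-9_]*)="((?:[^"\\]|\\.)*)"')


def prometheus_to_json(text):
    """Turn Prometheus text exposition lines into a list of dicts."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        # HELP and TYPE lines start with '#'; they carry no sample value.
        if not line or line.startswith("#"):
            continue
        match = LINE_RE.match(line)
        if match is None:
            continue
        name, raw_labels, value = match.groups()
        labels = dict(LABEL_RE.findall(raw_labels or ""))
        # The value stays a string, as in the sample data of the appendix.
        rows.append({"name": name, "value": value, "labels": labels})
    return rows


rows = prometheus_to_json(SAMPLE)
```

Each element of rows has the same shape as the objects in the appendix sample data, which is why a JSONPath filter on name and labels can pick out a single metric.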

Creating Items

An item is a single monitored metric. First create a master item of type HTTP agent that calls the TiDB /metrics endpoint, retrieves all metrics as plain text, and converts them with the Prometheus to JSON preprocessing step.

Then create regular (dependent) items that each extract a single metric from the master item using JSONPath preprocessing. Example JSONPath expression:

$[?(@.name=="tidb_session_schema_lease_error_total" && @.labels.type == "outdated")].value.first()

Because the metric type is Counter, add the “Change per second” preprocessing step to the item to get the per‑second growth; for Gauge metrics this step is unnecessary.
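
The per-second conversion can be sketched in Python as follows. This is an approximation for illustration, assuming the step computes (value - prev_value) / (time - prev_time) and yields nothing when the counter goes backwards (for example after a TiDB restart); the function name change_per_second is hypothetical.

```python
def change_per_second(prev_value, prev_time, value, time):
    """Per-second growth between two counter samples, or None if unusable.

    Times are in seconds. A decreasing counter is treated as a reset and
    produces no result (an assumption about how the step behaves).
    """
    elapsed = time - prev_time
    if elapsed <= 0:
        return None
    delta = value - prev_value
    if delta < 0:
        return None
    return delta / elapsed


# Counter went from 10 to 40 in 60 seconds: 30 / 60 = 0.5 per second.
rate = change_per_second(prev_value=10, prev_time=0, value=40, time=60)
```

A Gauge already reports the current level, so storing it as-is is what you want there.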

Creating Triggers

A trigger defines when an item’s value should raise an alarm. Translating the TiDB alert rule above into Zabbix trigger syntax, the expression becomes:

{TiDB by HTTP:tidb.session_schema_lease_error.outdate.rate.max(15m)}>0

This fires when the maximum per‑second increase of the metric over a 15‑minute window exceeds zero, indicating an error.
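
The logic of that expression can be illustrated with a small Python sketch. It only models the idea of max over a window compared with a threshold; the function name trigger_fires, the (timestamp, value) sample layout and the sample numbers are assumptions for illustration, not Zabbix internals.

```python
def trigger_fires(samples, now, window_seconds=15 * 60, threshold=0.0):
    """True if the largest value inside the window exceeds the threshold.

    samples is a list of (timestamp_seconds, per_second_rate) pairs.
    """
    recent = [v for (t, v) in samples if now - window_seconds < t <= now]
    return bool(recent) and max(recent) > threshold


# Hypothetical history: one lease error burst at t=600.
history = [(0, 0.0), (600, 0.02), (1200, 0.0)]
fires_at_1200 = trigger_fires(history, now=1200)  # burst is still in the window
fires_at_1800 = trigger_fires(history, now=1800)  # burst has aged out
```

Using max rather than the last value keeps the problem open for the full 15 minutes after a single error, which mirrors the increase(...[15m]) > 0 rule in the TiDB docs.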

Appendix – JSONPath Examples

Sample data:

[{"name":"tidb_server_handle_query_duration_seconds_sum","value":"100","labels":{"sql_type":"Begin"}},{"name":"tidb_server_handle_query_duration_seconds_sum","value":"50","labels":{"sql_type":"Commit"}}]

JSONPath to sum all values:

$[?(@.name=="tidb_server_handle_query_duration_seconds_sum")].value.sum()

JSONPath to get the first value for a specific label:

$[?(@.name=="tidb_server_handle_query_duration_seconds_sum" && @.labels.sql_type=="Commit")].value.first()

JSONPath to sum values for multiple label values:

$[?(@.name=="tidb_server_handle_query_duration_seconds_sum" && @.labels.sql_type =~ "Begin|Commit")].value.sum()
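
To check what these three expressions should return, the same selections can be written against the sample data in plain Python. This is only an equivalent sketch for verification, not how Zabbix evaluates JSONPath; values are converted to float because the sample stores them as strings.

```python
import json
import re

SAMPLE = json.loads(
    '[{"name":"tidb_server_handle_query_duration_seconds_sum",'
    '"value":"100","labels":{"sql_type":"Begin"}},'
    '{"name":"tidb_server_handle_query_duration_seconds_sum",'
    '"value":"50","labels":{"sql_type":"Commit"}}]'
)
NAME = "tidb_server_handle_query_duration_seconds_sum"

# Rows that pass the @.name == "..." part of the filter.
matching = [row for row in SAMPLE if row["name"] == NAME]

# Sum of all values.
total = sum(float(row["value"]) for row in matching)

# First value whose sql_type label equals "Commit".
commit_first = next(
    float(row["value"])
    for row in matching
    if row["labels"].get("sql_type") == "Commit"
)

# Sum of values whose sql_type label matches the regex Begin|Commit.
begin_or_commit = sum(
    float(row["value"])
    for row in matching
    if re.search("Begin|Commit", row["labels"].get("sql_type", ""))
)
```

With this sample, total and begin_or_commit are both 150.0 and commit_first is 50.0.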

Additional Tips

Memory usage trigger example:

{TiDB by HTTP:tidb.heap_bytes.min(5m)}>{$TIDB.HEAP.USAGE.MAX.WARN}

In Prometheus, the 99th percentile response time can be calculated with histogram_quantile(0.99, sum(rate(tidb_server_handle_query_duration_seconds_bucket[1m])) BY (le, instance)) > 1. Zabbix cannot process histograms directly, so as an alternative you can compute the average response time using calculated items.
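
One way to get that average is to divide the growth of the histogram's _sum series by the growth of its _count series over the same interval. The Python sketch below shows the arithmetic only; it assumes a tidb_server_handle_query_duration_seconds_count series is exported alongside _sum (the usual Prometheus histogram convention), and the function name and sample numbers are hypothetical.

```python
def average_duration(sum_prev, sum_now, count_prev, count_now):
    """Average seconds per query between two scrapes, or None if no queries.

    sum_*   : values of ..._duration_seconds_sum at the two scrapes
    count_* : values of ..._duration_seconds_count at the two scrapes
    """
    queries = count_now - count_prev
    if queries <= 0:
        # No new queries, or the counters were reset.
        return None
    return (sum_now - sum_prev) / queries


# Hypothetical scrapes: 30 seconds of total latency spread over 200 queries.
avg = average_duration(sum_prev=100.0, sum_now=130.0,
                       count_prev=1000, count_now=1200)
```

An average hides tail latency, so it is a weaker signal than the 99th percentile, but it can be built from two dependent items and one calculated item.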
