Monitoring TiDB with Zabbix: Using HTTP Agent, Preprocessing, and Triggers
This guide explains how to collect TiDB metrics via its HTTP monitoring API, preprocess the data into JSON, create master and regular items in Zabbix, and configure triggers using Prometheus‑style expressions to achieve effective TiDB monitoring.
If you want to monitor TiDB with Zabbix, you need to use the HTTP agent to call TiDB's monitoring API and then preprocess the returned data. The required functionality (Prometheus pattern or Prometheus to JSON) was added in Zabbix 4.2, so the example uses Zabbix 5.0.5.
TiDB Monitoring API
Before starting, read the TiDB monitoring API documentation (https://docs.pingcap.com/zh/tidb/v5.1/tidb-monitoring-api).
Example request:
curl http://127.0.0.1:10080/metrics > /tmp/tidb_metics
One of the alert rules from the TiDB docs is:
increase(tidb_session_schema_lease_error_total{type="outdated"}[15m]) > 0
The metric name tidb_session_schema_lease_error_total can be found in the exported metrics file; its format includes a HELP line, a TYPE line, and the metric value.
Creating Items
An item is a monitoring metric. First create a master item that calls the TiDB /metrics endpoint and retrieves all metrics as plain text.
Then create regular items that extract a single metric from the master item using JSONPath preprocessing. Example JSONPath expression:
$[?(@.name=="tidb_session_schema_lease_error_total" && @.labels.type == "outdated")].value.first()
Because the metric type is Counter , set the item type to “Change per second” to get the per‑second growth; for Gauge metrics this step is unnecessary.
Creating Triggers
A trigger defines when an item’s value should raise an alarm. Using the TiDB alert rule syntax, the trigger expression becomes:
{TiDB by HTTP:tidb.session_schema_lease_error.outdate.rate.max(15m)}>0
This fires when the maximum per‑second increase of the metric over a 15‑minute window exceeds zero, indicating an error.
Appendix – JSONPath Examples
Sample data:
[{"name":"tidb_server_handle_query_duration_seconds_sum","value":"100","labels":{"sql_type":"Begin"}},{"name":"tidb_server_handle_query_duration_seconds_sum","value":"50","labels":{"sql_type":"Commit"}}]JSONPath to sum all values:
$[?(@.name=="tidb_server_handle_query_duration_seconds_sum")].value.sum()
JSONPath to get the first value for a specific label:
$[?(@.name=="tidb_server_handle_query_duration_seconds_sum" && @.labels.sql_type=="Commit")].value.first()
JSONPath to sum values for multiple labels:
$[?(@.name=="tidb_server_handle_query_duration_seconds_sum" && @.labels.type =~ "Begin|Commit")].value.sum()
Additional Tips
Memory usage trigger example:
{TiDB by HTTP:tidb.heap_bytes.min(5m)}>{$TIDB.HEAP.USAGE.MAX.WARN}
99th percentile response time can be calculated in Prometheus with histogram_quantile(0.99, sum(rate(tidb_server_handle_query_duration_seconds_bucket[1m])) BY (le, instance)) > 1 , but Zabbix cannot process histograms directly, so you may compute average response time using calculated items.
Aikesheng Open Source Community
The Aikesheng Open Source Community provides stable, enterprise‑grade MySQL open‑source tools and services, releases a premium open‑source component each year (1024), and continuously operates and maintains them.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.