How to Monitor ClickHouse with Alibaba Cloud Prometheus: Metrics, Dashboards, and Alerts
This guide explains how to set up Alibaba Cloud Observability Prometheus edition to monitor ClickHouse, covering ClickHouse fundamentals, metric collection, dashboard templates, alert rules, troubleshooting steps, and deployment options for both ACK and ECS environments.
ClickHouse is a columnar DBMS designed for OLAP workloads, offering high compression and fast query performance. It supports full DBMS features such as DDL/DML, fine‑grained permissions, backup/recovery, and distributed cluster management, making it suitable for large‑scale analytical queries.
Core Concepts
Cluster: A set of ClickHouse server instances organized into one or more shards, each of which can have one or more replicas.
Shard: A subset of the cluster's data, used to scale storage and compute horizontally. Each shard is served by one or more replica servers.
Replica: A redundant copy of a shard's data, kept for high availability.
Database / Table / View: Logical objects within a cluster.
Monitoring Model
The monitoring model consists of metric collection, dashboard templates, and alert rules.
Metric Collection
Node‑Exporter gathers host resources (CPU, memory, disk, inode).
ClickHouse‑Exporter (compatible with Prometheus) scrapes system.metrics, system.events, and system.asynchronous_metrics tables.
Query logs (system.query_log) can be enabled via config.xml.
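As a rough illustration of what the query log section of config.xml looks like, the sketch below parses a sample fragment and reports its settings. The fragment's values (the 7500 ms flush interval and the `<clickhouse>` root element, which older releases spell `<yandex>`) and the helper name `query_log_settings` are assumptions made for this example, not values taken from this guide.

```python
import xml.etree.ElementTree as ET

# Example fragment; the element values are illustrative assumptions.
SAMPLE_CONFIG = """
<clickhouse>
    <query_log>
        <database>system</database>
        <table>query_log</table>
        <flush_interval_milliseconds>7500</flush_interval_milliseconds>
    </query_log>
</clickhouse>
"""


def query_log_settings(config_xml):
    """Return the <query_log> settings as a dict, or None if the section is absent."""
    root = ET.fromstring(config_xml)
    section = root.find("query_log")
    if section is None:
        return None
    return {child.tag: (child.text or "").strip() for child in section}


if __name__ == "__main__":
    print(query_log_settings(SAMPLE_CONFIG))
```

A check like this can be run against a server's configuration file before relying on system.query_log in dashboards or troubleshooting queries.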
Dashboard Templates
Two built‑in Grafana dashboards are provided:
arms-clickhouse-ecs for ECS deployments.
arms-clickhouse-k8s for ACK (Kubernetes) clusters.
They visualize host metrics, ClickHouse server metrics, MergeTree metrics, and message‑queue metrics.
Alert Rules
Pre‑defined thresholds (adjustable) include:
CPU, memory, disk, inode usage >90%.
Write failure rate >5%.
Running query count >95.
Connection count >4,000.
Failed query count >10.
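To make the threshold logic concrete, here is a minimal sketch that checks a set of sampled values against the defaults listed above. The dictionary keys and the function name `breached` are hypothetical labels chosen for this example; they are not the metric names used by the managed alert rules.

```python
# Default thresholds from the list above.
# The keys are hypothetical labels, not real Prometheus metric names.
DEFAULT_THRESHOLDS = {
    "cpu_usage_percent": 90,
    "memory_usage_percent": 90,
    "disk_usage_percent": 90,
    "inode_usage_percent": 90,
    "write_failure_rate_percent": 5,
    "running_queries": 95,
    "connections": 4000,
    "failed_queries": 10,
}


def breached(sample, thresholds=DEFAULT_THRESHOLDS):
    """Return the sorted names of metrics whose value is strictly above its threshold."""
    return sorted(
        name
        for name, limit in thresholds.items()
        if name in sample and sample[name] > limit
    )


if __name__ == "__main__":
    sample = {"cpu_usage_percent": 93.5, "connections": 4000, "failed_queries": 11}
    print(breached(sample))
```

Passing a different `thresholds` dictionary mirrors adjusting the defaults: a value equal to its limit does not fire, matching the strict ">" comparisons in the list.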
Installation
Kubernetes (ACK) Deployment
In the Prometheus for Container Service console, select the ClickHouse integration, configure exporter name, scrape URL, credentials, and interval (default 30 s). The exporter is deployed to the arms-prom namespace and a scrape job is created automatically.
ECS Deployment
Install Node‑Exporter first, then add ClickHouse‑Exporter via the same integration steps. Configuration parameters:
datasource : Instance name.
job : Name of the ClickHouse‑Exporter job.
namespace (K8s) or instance (ECS): Target pod or host for node metrics.
pod : Optional specific ClickHouse pod.
Troubleshooting
Check Prometheus target status; if Unhealthy, verify the exporter pod.
If only go_* metrics appear, inspect exporter logs for errors.
Confirm scrape URL and credentials are correct.
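The "only go_* metrics" symptom can be checked mechanically against a saved copy of the exporter's /metrics output. The sketch below counts samples per name prefix in Prometheus text exposition format. The `clickhouse` prefix is an assumption about how the exporter names its metrics, and the sample text is made up for illustration.

```python
from collections import Counter


def metric_prefixes(exposition_text):
    """Count metric samples per name prefix (the part before the first underscore)."""
    counts = Counter()
    for line in exposition_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blank, HELP, and TYPE lines
            continue
        # A sample line is either `name{labels} value` or `name value`.
        name = line.split("{", 1)[0].split(None, 1)[0]
        counts[name.split("_", 1)[0]] += 1
    return counts


def only_runtime_metrics(exposition_text, expected_prefix="clickhouse"):
    """Return True when samples exist but none carry the expected prefix."""
    counts = metric_prefixes(exposition_text)
    return bool(counts) and counts[expected_prefix] == 0


# Made-up exporter output showing the failure mode: runtime metrics only.
BROKEN = """# HELP go_goroutines Number of goroutines that currently exist.
# TYPE go_goroutines gauge
go_goroutines 12
go_threads 8
process_cpu_seconds_total 0.5
"""

if __name__ == "__main__":
    print(only_runtime_metrics(BROKEN))
```

When this reports that only runtime metrics are present, the exporter process is up but is not reading from ClickHouse, which points back at the scrape URL, the credentials, and the exporter logs.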
Practical Query Examples
CPU Spike
Identify heavy queries:
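SELECT query_id, user, elapsed, memory_usage, query FROM system.processes WHERE query NOT LIKE '%SYSTEM%' ORDER BY elapsed DESC LIMIT 10;
SHOW PROCESSLIST does not accept WHERE or ORDER BY clauses, so the filter runs against the system.processes table that backs it. Then review the ClickHouse configuration, logs, and hardware resources.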
High Memory
Query memory metrics:
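SELECT * FROM system.metrics WHERE metric ILIKE '%memory%';
Metric names are CamelCase (for example, MemoryTracking) and LIKE is case-sensitive, so the case-insensitive ILIKE is used here. Then check the memory-related configuration, overall system memory usage, and whether heavy queries can be optimized.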
Disk Usage
Check disk space:
df -h
Inspect table sizes:
SELECT database, table, sum(bytes) AS total_size FROM system.parts WHERE active GROUP BY database, table ORDER BY total_size DESC;
Clean up unnecessary data if needed.
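If the table-size query is run with FORMAT JSONEachRow appended, its output can be post-processed in a script. The sketch below parses that output and prints readable sizes. The sample rows and the helper names are invented for illustration; depending on version and settings, ClickHouse may emit 64-bit integers as quoted strings, so the parser accepts both forms.

```python
import json


def parse_table_sizes(json_each_row):
    """Parse FORMAT JSONEachRow output into (database, table, bytes) tuples."""
    rows = []
    for line in json_each_row.splitlines():
        if not line.strip():
            continue
        record = json.loads(line)
        # total_size may arrive as a quoted string or a number; int() handles both.
        rows.append((record["database"], record["table"], int(record["total_size"])))
    return rows


def human_size(num_bytes):
    """Format a byte count with binary units, capping at TiB."""
    size = float(num_bytes)
    for unit in ("B", "KiB", "MiB", "GiB", "TiB"):
        if size < 1024 or unit == "TiB":
            return f"{size:.2f} {unit}"
        size /= 1024


# Invented sample output, one JSON object per line.
SAMPLE = (
    '{"database":"default","table":"events","total_size":"3221225472"}\n'
    '{"database":"default","table":"users","total_size":1536}\n'
)

if __name__ == "__main__":
    for database, table, size in parse_table_sizes(SAMPLE):
        print(f"{database}.{table}: {human_size(size)}")
```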
Comparison with Self‑Hosted Prometheus
Self‑hosted stacks require separate installation and maintenance of Prometheus, Grafana, and Alertmanager across VPCs, increasing operational overhead. Alibaba Cloud Observability Prometheus edition provides one‑click integration, pre‑built dashboards, and managed alert groups, reducing deployment complexity.
References
https://github.com/ClickHouse/clickhouse_exporter
https://grafana.com/grafana/dashboards/882-clickhouse/
Alibaba Cloud Native
