Design and Implementation of a General Business Monitoring and Alert Engine Using Prometheus and ClickHouse
This article describes how a company replaced its Zabbix‑based monitoring with a scalable, Prometheus‑driven alert engine that leverages ClickHouse for storage, remote‑storage integration via Prom2Click, and materialized views to provide flexible, SQL‑based business metric alerts.
Background : The existing Zabbix monitoring system could handle infrastructure health but struggled with fine‑grained business metrics such as per‑customer bandwidth, prompting the need for a more flexible solution.
Early attempts : A lightweight custom alert program addressed basic needs but became costly to extend as new data dimensions and complex rules required additional development effort.
Desired engine characteristics :
Direct, low‑cost synchronization with business data sources, preferably via SQL.
Unified query syntax with pluggable aggregation operators.
Support for complex rule composition using query conditions and aggregations.
Configurable notification channels.
Horizontal scalability for massive monitoring workloads.
Technical research : Business data resides in ClickHouse, which offers powerful aggregation but lacks stored procedures. The team evaluated Prometheus, an open‑source monitoring system built around a time‑series database, with PromQL as its query language and Alertmanager as its notification component. It met most of the desired characteristics; the one gap was getting business data into it at low cost.
Architecture : An exporter was initially built to pull data from ClickHouse into Prometheus, but this required code changes for each new metric. The team then adopted Prometheus’s Remote Storage feature, allowing ClickHouse to serve as the backend storage via the open‑source Prom2Click project.
System construction – ClickHouse configuration :
<!-- Settings for the ReplicatedGraphiteMergeTree engine. Adjust the retention and rollup entries as needed. -->
<graphite_rollup>
    <path_column_name>tags</path_column_name>
    <time_column_name>ts</time_column_name>
    <value_column_name>val</value_column_name>
    <version_column_name>updated</version_column_name>
    <default>
        <function>avg</function>
        <retention>
            <age>0</age>
            <precision>10</precision>
        </retention>
        <retention>
            <age>86400</age>
            <precision>30</precision>
        </retention>
        <retention>
            <age>172800</age>
            <precision>300</precision>
        </retention>
    </default>
</graphite_rollup>
Middleware deployment : The customized prom2click RPM was added to the internal YUM repository, installed with yum install prom2click, configured via /usr/local/prom2click/etc/config.yml, and started with systemctl start prom2click.
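The retention entries in the graphite_rollup configuration above can be read as "once a data point is at least age seconds old, average it into buckets of precision seconds". The Python sketch below models that reading. It is a simplified illustration and not ClickHouse's implementation: in ClickHouse the rollup is applied during background merges, so the timing is not exact. The function names are hypothetical.

```python
# (age_seconds, precision_seconds) pairs copied from the <default> section.
RETENTIONS = [(0, 10), (86400, 30), (172800, 300)]


def precision_for_age(age_seconds, retentions=RETENTIONS):
    """Return the precision of the last retention entry whose age has been reached."""
    chosen = None
    for min_age, precision in sorted(retentions):
        if age_seconds >= min_age:
            chosen = precision
    return chosen


def rollup_avg(points, precision):
    """Average (timestamp, value) points into buckets of `precision` seconds.

    Mirrors <function>avg</function>: every bucket keeps one averaged value.
    """
    buckets = {}
    for ts, value in points:
        bucket_start = ts - ts % precision
        buckets.setdefault(bucket_start, []).append(value)
    return {start: sum(vals) / len(vals) for start, vals in sorted(buckets.items())}
```

Under this model, data younger than one day keeps 10-second resolution, data between one and two days old is averaged to 30 seconds, and older data to 5 minutes.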
Prometheus deployment : Prometheus and Alertmanager were installed with yum install prometheus2 alertmanager. After the default start, read_recent: true was set in the remote_read section of /etc/prometheus/prometheus.yml so that queries over recent time ranges are also served from remote storage, and the change was applied with systemctl reload prometheus.
System operation : Operators now define monitoring rules directly in Prometheus rather than writing custom code, simplifying the workflow.
Data transformation view : ClickHouse materialized views (MVs) copy business tables into a Prometheus‑compatible schema. The example MV exposes a metric named cdn_customer_flow with the dimensions customer, channel, view, and serverType. The metric can then be queried with PromQL, for example:
sum by (channel) (sum_over_time(cdn_customer_flow{serverType='0'}[5m]))
Alert rule configuration : A sample rule fires when any customer's bandwidth over the last 5 minutes exceeds 150 Mbps and has also changed by more than 10% relative to the preceding 5 minutes. Alertmanager then routes such alerts to the designated channels (e.g., a CDN alert email group).
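The two-part condition of the sample rule can be sketched in Python as below. This only illustrates the logic; it is not the rule the team deployed, which would be written in PromQL. Two assumptions are made here: "change rate" is read as a relative increase over the previous window, and a previous value of zero is treated as exceeding any relative threshold. The function and parameter names are hypothetical.

```python
def should_alert(current_mbps, previous_mbps, threshold_mbps=150.0, min_change=0.10):
    """Return True when both parts of the sample rule hold.

    current_mbps:  bandwidth over the last 5 minutes
    previous_mbps: bandwidth over the 5 minutes before that
    """
    if current_mbps <= threshold_mbps:
        return False  # absolute threshold not exceeded
    if previous_mbps <= 0:
        # Assumption: growth from zero counts as an unbounded relative change.
        return True
    change = (current_mbps - previous_mbps) / previous_mbps
    return change > min_change
```

Combining an absolute threshold with a relative change keeps steadily busy customers from alerting continuously, since only a jump past 150 Mbps satisfies both parts.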
Postscript : Future work aims to provide a UI that lets non‑technical users select dimensions, aggregation formulas, thresholds, and notification channels, generating the corresponding Prometheus rules automatically.
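As a rough idea of what such a generator might do, the Python sketch below turns a few user selections into the text of a Prometheus rule file. The function name, its parameters, and the channel label are hypothetical; only the rule-file keys (groups, name, rules, alert, expr, for, labels) come from Prometheus's rule format. String values are emitted with json.dumps, whose double-quoted output is also a valid YAML scalar.

```python
import json


def render_rule(alert, metric, dimension, aggregation, threshold, hold_for, channel):
    """Render a single-rule Prometheus rule file from UI-style selections."""
    # Example expression: sum by (customer) (cdn_customer_flow) > 150
    expr = f"{aggregation} by ({dimension}) ({metric}) > {threshold}"
    lines = [
        "groups:",
        f"- name: {json.dumps(alert + '-rules')}",
        "  rules:",
        f"  - alert: {json.dumps(alert)}",
        f"    expr: {json.dumps(expr)}",
        f"    for: {hold_for}",
        "    labels:",
        # Hypothetical label that an Alertmanager route could match on.
        f"      channel: {json.dumps(channel)}",
    ]
    return "\n".join(lines) + "\n"
```

For example, render_rule("CustomerBandwidthHigh", "cdn_customer_flow", "customer", "sum", 150, "5m", "cdn-email") produces a rule group whose alert carries a channel label for routing.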
Big Data Technology & Architecture
Wang Zhiwu, a big data expert, dedicated to sharing big data technology.
