Designing an Effective UI for Monitoring Alerts: Insights from Huolala
This article shares Huolala's experience designing a unified monitoring platform UI, covering the evolution from open‑source dashboards to a fully self‑developed solution, simplification of PromQL, computed metrics, log and trace integration, and the challenges of alert configuration and visualization.
1. Introduction
Monitoring and alerting systems are essential in modern IT environments; they help detect performance bottlenecks, service failures, and resource issues that can impact business continuity and user experience.
To meet the reliability demands of digital transformation, many observability tools have emerged, such as Nagios, Zabbix, StatsD, Prometheus, Grafana, DataDog, Dynatrace, and New Relic. As scale grows, companies often integrate multiple subsystems (logs, traces, metrics) into a one‑stop APM solution.
2. Evolution of Huolala Monitoring
2.1 Monitoring 1.0 – All Open‑Source UI
In 2018‑2019, Huolala used Zabbix for infrastructure monitoring and gradually migrated to Prometheus clusters split by business domain (infra, Java, PHP). Dashboards were built with Grafana, logs were handled by ELK, and there was no full‑stack tracing.
2.2 Monitoring 2.0 – Self‑Developed + Open‑Source UI
Standardized a Java SDK that uses bytecode enhancement to provide full‑stack tracing.
Adopted a VictoriaMetrics cluster to ingest Prometheus metrics.
Developed LalaMonitor to provide application dashboards (without self‑service dashboard configuration).
The front‑end UI was a Vue.js single‑page app: the back‑end generated the chart queries, parsed the results, and sent the data to the front‑end for rendering. Developers could click a small blue icon on each chart to create alert rules; the back‑end then generated the query expression and distributed it to the Prometheus nodes.
After a year and a half, LalaMonitor delivered application monitoring, tracing, and alerting capabilities that stabilized Huolala’s services.
2.3 Monitoring 3.0 – Fully Self‑Developed UI
Provide self‑service dashboard configuration that is as powerful as Grafana's yet simple to use.
Offer a leading alerting experience within the Prometheus ecosystem.
Strengthen inline handling of different data types and deliver richer unified application monitoring.
The new UI is built on the open‑source etrace‑ui SPA, using Ant Design, React, Chart.js, and Webpack.
3. Dashboard Challenges
3.1 Simplified PromQL Configuration
The native Prometheus UI offers only a plain text box: users must type metric names and then pick label keys and values, which demands solid PromQL knowledge. Huolala started by supporting two modes: a near‑native PromQL mode, and an Origin mode that structures metric, label, and function selection to lower the learning curve.
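To make the idea of a structured mode concrete, here is a minimal sketch of how form selections could be turned into a PromQL string. The `StructuredQuery` fields and the `to_promql` function are hypothetical names chosen for illustration; the article does not describe how Origin mode is actually implemented.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class StructuredQuery:
    """Hypothetical form state: what a user picks in a structured editor."""
    metric: str
    labels: Dict[str, str] = field(default_factory=dict)
    range_function: Optional[str] = None  # e.g. "rate"
    window: str = "5m"
    aggregation: Optional[str] = None     # e.g. "sum"
    group_by: Tuple[str, ...] = ()


def escape_label_value(value: str) -> str:
    # PromQL string literals escape backslashes and double quotes.
    return value.replace("\\", "\\\\").replace('"', '\\"')


def to_promql(q: StructuredQuery) -> str:
    """Assemble a PromQL expression from the structured selections."""
    matchers = ",".join(
        f'{key}="{escape_label_value(val)}"' for key, val in sorted(q.labels.items())
    )
    expr = q.metric + ("{" + matchers + "}" if matchers else "")
    if q.range_function:
        expr = f"{q.range_function}({expr}[{q.window}])"
    if q.aggregation:
        by_clause = " by (" + ", ".join(q.group_by) + ")" if q.group_by else ""
        expr = f"{q.aggregation}{by_clause} ({expr})"
    return expr
```

With a form like this, the user only chooses a metric, label values, and functions from drop-downs, and never has to type the expression by hand.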
3.2 Computed Metrics
Because Prometheus stores single‑value series, Huolala introduced a SQL‑like configuration mechanism (From, Select, Where, Group By, Limit) that automatically generates appropriate queries based on metric type, enabling users to build composite metrics with simple UI actions.
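The sketch below illustrates one way a SQL‑like configuration could be mapped to PromQL depending on metric type. The mapping rules (counters wrapped in `rate`, histograms queried through `histogram_quantile` on the `_bucket` series, Limit expressed as `topk`) are assumptions based on common Prometheus practice, and the `ComputedMetric` class and `build_query` function are hypothetical names, not Huolala's implementation.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class ComputedMetric:
    """Hypothetical SQL-like configuration captured by the UI."""
    from_metric: str                      # From
    metric_type: str                      # "counter", "gauge", or "histogram"
    select: str                           # Select: "sum", "avg", ... or "p99"
    where: Dict[str, str] = field(default_factory=dict)  # Where
    group_by: Tuple[str, ...] = ()        # Group By
    limit: Optional[int] = None           # Limit
    window: str = "1m"


def _selector(metric: str, where: Dict[str, str]) -> str:
    matchers = ",".join(f'{k}="{v}"' for k, v in sorted(where.items()))
    return metric + ("{" + matchers + "}" if matchers else "")


def _aggregate(op: str, by: Tuple[str, ...], expr: str) -> str:
    if by:
        return op + " by (" + ", ".join(by) + ") (" + expr + ")"
    return op + "(" + expr + ")"


def build_query(q: ComputedMetric) -> str:
    if q.metric_type == "counter":
        # Counters only make sense as a per-second rate over a window.
        inner = "rate(" + _selector(q.from_metric, q.where) + "[" + q.window + "])"
        expr = _aggregate(q.select, q.group_by, inner)
    elif q.metric_type == "gauge":
        expr = _aggregate(q.select, q.group_by, _selector(q.from_metric, q.where))
    elif q.metric_type == "histogram":
        # Select is expected to look like "p99"; "p99" becomes quantile 0.99.
        if not (q.select.startswith("p") and q.select[1:].isdigit()):
            raise ValueError("histogram select must look like 'p99'")
        buckets = "rate(" + _selector(q.from_metric + "_bucket", q.where) + "[" + q.window + "])"
        summed = _aggregate("sum", ("le",) + tuple(q.group_by), buckets)
        expr = "histogram_quantile(0." + q.select[1:] + ", " + summed + ")"
    else:
        raise ValueError("unknown metric type: " + q.metric_type)
    return f"topk({q.limit}, {expr})" if q.limit else expr
```

The point of the design is that the user states intent in familiar SQL terms, while the type‑specific PromQL details stay inside the generator.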
3.3 Application Dashboards
The application dashboard is a regular dashboard extended with an application variable. After selecting an app, developers can view metrics such as exceptions, HTTP, SOA, connection pools, and resource usage, with quick aggregation options (max, avg, p99, etc.).
4. Logs and Traces
4.1 ELK Logs
Huolala partitions ELK clusters by business type and integrates multiple log streams (business, access, cron, daemon) into Monitor. By implementing part of the Lucene syntax with ANTLR, the front‑end offers native query capabilities and fast dimension filters such as Exception, ECS, and Pod.
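As a rough illustration of how quick dimension filters can sit on top of a free‑form query, the sketch below appends the selected filters to a Lucene‑style query string as `AND` clauses. The field names (`exception`, `pod`) and the `add_filters` function are hypothetical; the article does not show the actual query composition or the ANTLR grammar.

```python
from typing import Dict


def _quote(value: str) -> str:
    # Inside a quoted Lucene phrase, backslashes and double quotes are escaped.
    return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'


def add_filters(user_query: str, filters: Dict[str, str]) -> str:
    """Combine the user's query with selected dimension filters."""
    base = user_query.strip()
    parts = ["(" + base + ")"] if base else []
    parts += [name + ":" + _quote(value) for name, value in filters.items()]
    # "*:*" is the Lucene match-all query, used when nothing is selected.
    return " AND ".join(parts) if parts else "*:*"
```

Wrapping the user's query in parentheses keeps any `OR` inside it from changing the meaning of the appended filters.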
4.2 ClickHouse Logs
Instead of full‑text search, ClickHouse stores standardized structured logs for analytical queries. Monitor leverages this to transform logs into metrics and display them on dashboards, addressing scenarios where Prometheus struggles (high cardinality, sub‑second resolution).
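The log‑to‑metric transformation amounts to bucketing structured records by time and counting them per dimension. The sketch below shows that aggregation on in‑memory records in plain Python; in the setup described above this work would be done by ClickHouse queries, and the record fields (`ts_ms`, `uri`) and the `logs_to_metric` function are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Mapping, Tuple


def logs_to_metric(
    records: Iterable[Mapping[str, object]],
    bucket_ms: int,
    dimension: str,
) -> Dict[Tuple[int, object], int]:
    """Count log records per (time bucket, dimension value).

    bucket_ms can be below 1000, which gives sub-second resolution.
    """
    counts: Counter = Counter()
    for record in records:
        ts = int(record["ts_ms"])
        bucket_start = ts - ts % bucket_ms  # floor to the bucket boundary
        counts[(bucket_start, record[dimension])] += 1
    return dict(counts)
```

Each `(bucket, dimension value)` pair then becomes one data point on a dashboard series, with no need to pre‑register label combinations as a Prometheus series would require.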
4.3 Traces
Logs automatically carry TraceID, enabling developers to jump from a metric spike to the corresponding trace view for root‑cause analysis.
5. Alerting Challenges
5.1 Metric‑Alert Correlation
Native Prometheus alerting lacks a dedicated subsystem and is simplistic. Grafana’s integrated alerting still faces pain points at medium‑to‑large scale, such as managing alerts per application, deriving alerts from metrics quickly, and previewing alert effects.
Huolala proposes two approaches: hide PromQL behind a UI that lets users select applications and metric types, or provide tenant‑aware, semantic‑rich metric definitions.
5.2 Trigger Conditions
Huolala introduces template expressions that can reference configured metrics and perform arithmetic, simplifying multi‑condition alerts (thresholds, ratios, etc.). These expressions can be nested for complex logic.
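To show how an expression that references configured metrics might be evaluated, here is a minimal sketch. It assumes metrics are referenced as `${A}`, `${B}`, and so on, and that conditions are combined with `and` / `or`; this syntax and the `evaluate` function are assumptions for illustration, since the article does not specify the expression grammar. The sketch borrows Python's parser and walks the syntax tree, allowing only arithmetic, comparisons, and boolean operators.

```python
import ast
import operator
import re
from typing import Mapping, Union

Number = Union[int, float]

_BINARY = {ast.Add: operator.add, ast.Sub: operator.sub,
           ast.Mult: operator.mul, ast.Div: operator.truediv}
_COMPARE = {ast.Gt: operator.gt, ast.GtE: operator.ge,
            ast.Lt: operator.lt, ast.LtE: operator.le}


def evaluate(expression: str, metrics: Mapping[str, Number]):
    """Evaluate a trigger expression against current metric values."""
    # Turn ${A} into the bare identifier A so the parser can read it.
    source = re.sub(r"\$\{(\w+)\}", r"\1", expression)
    return _eval(ast.parse(source, mode="eval").body, metrics)


def _eval(node: ast.AST, metrics: Mapping[str, Number]):
    if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
        return node.value
    if isinstance(node, ast.Name):
        return metrics[node.id]
    if isinstance(node, ast.BinOp) and type(node.op) in _BINARY:
        return _BINARY[type(node.op)](_eval(node.left, metrics), _eval(node.right, metrics))
    if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
        return -_eval(node.operand, metrics)
    if isinstance(node, ast.Compare) and len(node.ops) == 1 and type(node.ops[0]) in _COMPARE:
        return _COMPARE[type(node.ops[0])](
            _eval(node.left, metrics), _eval(node.comparators[0], metrics))
    if isinstance(node, ast.BoolOp):
        values = [_eval(value, metrics) for value in node.values]
        return all(values) if isinstance(node.op, ast.And) else any(values)
    # Anything else (function calls, attribute access, ...) is rejected.
    raise ValueError("unsupported expression element: " + type(node).__name__)
```

For example, an error‑ratio rule could be written as `${A} / ${B} * 100 > 5`, where `A` is the error count and `B` the request count, and parentheses allow expressions to be nested.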
5.3 Alert Card Messages
Alert cards now support Feishu Markdown, native Prom label variables, a custom T{} function for template rendering, user mentions, and built‑in functions such as weather(${city_id}) to enrich notifications.
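A minimal sketch of how such a card template might be rendered follows. It assumes `${name}` outside a `T{}` block is a label variable, and that the text inside `T{ ... }` is handed to an evaluator that returns a display string; in a real system that evaluator would query the metrics back end, while here it is a caller‑supplied function. The `render_card` function is a hypothetical name, not Huolala's API.

```python
import re
from typing import Callable, Mapping


def render_card(
    template: str,
    labels: Mapping[str, str],
    evaluate: Callable[[str], str],
) -> str:
    """Render T{...} blocks through `evaluate`, then fill in ${label} variables."""
    out = []
    i = 0
    while i < len(template):
        if template.startswith("T{", i):
            # Scan to the matching brace; the block may contain ${A} references.
            depth, j = 1, i + 2
            while j < len(template) and depth:
                if template[j] == "{":
                    depth += 1
                elif template[j] == "}":
                    depth -= 1
                j += 1
            if depth:
                raise ValueError("unclosed T{ block")
            out.append(evaluate(template[i + 2:j - 1].strip()))
            i = j
        else:
            out.append(template[i])
            i += 1
    text = "".join(out)
    # Unknown label variables are left as-is so missing labels stay visible.
    return re.sub(r"\$\{(\w+)\}", lambda m: labels.get(m.group(1), m.group(0)), text)
```

A brace‑depth scan is used instead of a regular expression because the body of a `T{}` block itself contains `${...}` references, so the first closing brace is not the end of the block.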
6. Future Plans
While this article focuses on UI design, upcoming posts will dive deeper into the underlying implementations of metrics, alerts, tracing, and logging.
Example alert card template (disk usage alert):
Host: ${host}
Appid: ${hll_appid}
fstype: ${fstype}
mount-point: ${mountpoint}
InstanceName: ${InstanceName}
Current disk usage: T{ ${A} }% > 90%
Growth over the past 2h: T{ ${A} - ( ${A} offset 2h ) }%
Growth over the past 6h: T{ ${A} - ( ${A} offset 6h ) }%
Growth over the past 1d: T{ ${A} - ( ${A} offset 1d ) }%
Growth over the past 3d: T{ ${A} - ( ${A} offset 3d ) }%
Growth over the past 7d: T{ ${A} - ( ${A} offset 7d ) }%
