
Designing an Effective UI for Monitoring Alerts: Insights from Huolala

This article shares Huolala's experience designing a unified monitoring platform UI, covering the evolution from open‑source dashboards to a fully self‑developed solution, simplification of PromQL, computed metrics, log and trace integration, and the challenges of alert configuration and visualization.

Huolala Tech

Sep 14, 2023


1. Introduction

Monitoring and alerting systems are essential in modern IT environments. They help detect performance bottlenecks, service failures, and resource issues that can affect business continuity and user experience.

To meet the reliability demands of digital transformation, many observability tools have emerged, such as Nagios, Zabbix, StatsD, Prometheus, Grafana, DataDog, Dynatrace, and New Relic. As scale grows, companies often integrate their log, trace, and metric subsystems into a one‑stop APM solution.

2. Evolution of Huolala Monitoring

2.1 Monitoring 1.0 – All Open‑Source UI

In 2018‑2019 Huolala used Zabbix for infrastructure monitoring and gradually migrated to separate Prometheus clusters for different business domains (infra, Java, PHP). Front‑end dashboards were built with Grafana, logs were handled with ELK, and there was no full‑stack tracing.

2.2 Monitoring 2.0 – Self‑Developed + Open‑Source UI

In this phase, Huolala:

Standardized a Java SDK that uses bytecode enhancement for full‑stack tracing.

Adopted a VictoriaMetrics cluster to ingest Prometheus metrics.

Developed LalaMonitor to provide application dashboards (without self‑service dashboard configuration).

The front end was a Vue.js single‑page app. The back end generated the chart queries and sent the parsed results to the front end for rendering. Developers could click a small blue icon on any chart to create an alert rule, and the back end then generated the corresponding query (QL) and distributed it to the Prometheus nodes.
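The following minimal Python sketch shows what such a chart-to-alert back-end step might look like. It wraps a chart's query in a comparison and emits one entry in the Prometheus alerting-rule format (`alert`, `expr`, `for`, `labels`, `annotations`). The `ChartQuery` type, the naming scheme, and the default severity are assumptions for illustration, not LalaMonitor's actual code. Writing the rule into rule files and distributing it to Prometheus nodes is not shown.

```python
from dataclasses import dataclass

ALLOWED_OPS = {">", ">=", "<", "<=", "==", "!="}

@dataclass
class ChartQuery:
    """Hypothetical shape of a chart definition held by the back end."""
    title: str
    promql: str

def alert_rule_from_chart(chart: ChartQuery, op: str, threshold: float,
                          for_duration: str = "1m") -> dict:
    """Build one Prometheus alerting-rule entry from a chart plus a threshold."""
    if op not in ALLOWED_OPS:
        raise ValueError(f"unsupported comparison: {op}")
    return {
        "alert": chart.title.replace(" ", ""),  # illustrative naming scheme
        # Parenthesize the chart query so the comparison applies to all of it.
        "expr": f"({chart.promql}) {op} {threshold:g}",
        "for": for_duration,
        "labels": {"severity": "warning"},      # assumed default severity
        "annotations": {"summary": f"{chart.title} {op} {threshold:g}"},
    }
```

Serializing the returned dict into the `rules:` list of a rule group yields a standard rules-file entry.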

After a year and a half, LalaMonitor delivered application monitoring, tracing, and alerting capabilities that stabilized Huolala’s services.

2.3 Monitoring 3.0 – Fully Self‑Developed UI

This phase set three goals:

Provide Grafana‑level dashboard self‑service configuration that is powerful yet simple.

Offer a leading alerting experience within the Prometheus ecosystem.

Strengthen in‑line handling of different data types to deliver richer, unified application monitoring.

The new UI is built on the open‑source etrace‑ui single‑page app, using Ant Design, React, Chart.js, and Webpack.

3. Dashboard Challenges

3.1 Simplified PromQL Configuration

The native Prometheus UI offers only a plain text box. Users must type a metric name and then work out the label keys and values, which demands solid PromQL knowledge. Huolala's UI therefore supports two modes: a near‑native PromQL mode, and an Origin mode that structures the selection of metric, labels, and function to lower the learning curve.

3.2 Computed Metrics

Each Prometheus time series holds a single value per sample, so metrics derived from several series (ratios, sums across labels) must be computed at query time. Huolala introduced a SQL‑like configuration mechanism (From, Select, Where, Group By, Limit) that automatically generates the appropriate query for each metric type. This lets users build composite metrics with simple UI actions.
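The mapping from SQL-like clauses to PromQL can be sketched as follows. The sketch assumes a simple type catalog that marks each metric as a counter or a gauge. Counters are wrapped in `rate()` before aggregation, `Group By` becomes a `by (...)` clause, and `Limit` becomes `topk`. The field names and the `ratio` helper are illustrative, not Huolala's schema.

```python
from __future__ import annotations
from dataclasses import dataclass, field

AGGREGATORS = {"sum", "avg", "max", "min", "count"}

@dataclass
class MetricSpec:
    from_metric: str                                     # FROM
    metric_type: str                                     # "counter" or "gauge" (assumed catalog)
    select: str = "sum"                                  # SELECT aggregation
    where: dict[str, str] = field(default_factory=dict)  # WHERE label = value
    group_by: list[str] = field(default_factory=list)    # GROUP BY labels
    limit: int | None = None                             # LIMIT -> topk
    window: str = "1m"                                   # rate() window for counters

def build_query(spec: MetricSpec) -> str:
    if spec.select not in AGGREGATORS:
        raise ValueError(f"unsupported aggregation: {spec.select}")
    matchers = ",".join(f'{k}="{v}"' for k, v in sorted(spec.where.items()))
    selector = f"{spec.from_metric}{{{matchers}}}" if matchers else spec.from_metric
    # Counters only increase, so convert them to a per-second rate before
    # aggregating; gauges can be aggregated as-is.
    if spec.metric_type == "counter":
        inner = f"rate({selector}[{spec.window}])"
    elif spec.metric_type == "gauge":
        inner = selector
    else:
        raise ValueError(f"unknown metric type: {spec.metric_type}")
    by = f" by ({', '.join(spec.group_by)})" if spec.group_by else ""
    expr = f"{spec.select}{by} ({inner})"
    if spec.limit is not None:
        expr = f"topk({spec.limit}, {expr})"
    return expr

def ratio(numerator: MetricSpec, denominator: MetricSpec) -> str:
    """Composite metric: divide two queries (assumes both share the same GROUP BY)."""
    return f"({build_query(numerator)}) / ({build_query(denominator)})"
```

Keeping the metric type in a catalog is what lets the UI hide the counter-versus-gauge distinction from users.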

3.3 Application Dashboards

The application dashboard is a regular dashboard extended with an application variable. After selecting an app, developers can view metrics such as exceptions, HTTP, SOA, connection pools, and resource usage, with quick aggregation options (max, avg, p99, etc.).
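The following Python sketch shows one plausible way to apply the application variable and the quick aggregations. Each panel stores a query template containing `$app` (Grafana-style variable syntax, assumed here), and the chosen aggregation wraps that template. The p99 option assumes the panel's template selects the per-second rate of a histogram's `_bucket` series, which is the input `histogram_quantile` expects.

```python
# Quick-aggregation options; {q} is replaced by the panel query after
# the $app variable has been substituted.
AGGREGATIONS = {
    "max": "max by (app) ({q})",
    "avg": "avg by (app) ({q})",
    # Expects q to be a per-second rate over histogram *_bucket series.
    "p99": "histogram_quantile(0.99, sum by (app, le) ({q}))",
}

def render_panel(template: str, app: str, aggregation: str) -> str:
    """Substitute the selected application, then wrap the chosen aggregation."""
    if aggregation not in AGGREGATIONS:
        raise ValueError(f"unknown aggregation: {aggregation}")
    query = template.replace("$app", app)  # naive substitution of the dashboard variable
    return AGGREGATIONS[aggregation].format(q=query)
```

Because the template is stored once per panel, switching applications or aggregations only re-renders the query and does not require a new dashboard.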

4. Logs and Traces

4.1 ELK Logs

Huolala partitions its ELK clusters by business type and integrates multiple log streams (business, access, cron, daemon) into Monitor. Huolala uses ANTLR to implement a subset of the Lucene query syntax, so the front end offers native‑style querying plus fast dimension filters such as Exception, ECS, and Pod.
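The sketch below shows how a free-text query and the fast filters might be combined into one Elasticsearch request body. It swaps Huolala's ANTLR-based parsing for Elasticsearch's own `query_string` query. The field names (`exception`, `pod`, `@timestamp`) are assumptions.

```python
from __future__ import annotations

def build_es_query(user_query: str, filters: dict[str, str], size: int = 100) -> dict:
    """Combine a Lucene-syntax query string with exact-match fast filters
    into an Elasticsearch search request body."""
    if user_query.strip():
        must = [{"query_string": {"query": user_query}}]
    else:
        must = [{"match_all": {}}]
    return {
        "size": size,
        "query": {
            "bool": {
                "must": must,
                # Filter context skips relevance scoring and is cacheable,
                # which suits dimensions like exception class or pod name.
                "filter": [{"term": {f: v}} for f, v in sorted(filters.items())],
            }
        },
        "sort": [{"@timestamp": {"order": "desc"}}],  # newest logs first
    }
```

Putting the dimension filters in `filter` rather than `must` keeps them from affecting relevance scores while still narrowing the result set.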

4.2 ClickHouse Logs

Instead of full‑text search, ClickHouse stores standardized structured logs for analytical queries. Monitor leverages this to transform logs into metrics and display them on dashboards, addressing scenarios where Prometheus struggles (high cardinality, sub‑second resolution).
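To illustrate turning logs into metrics, this hypothetical helper builds a ClickHouse query that counts log rows per time bucket and per high-cardinality key using `toStartOfInterval`. The table and column names are placeholders. The time range is bound through ClickHouse query parameters rather than string formatting.

```python
import re

# Only plain identifiers may be spliced into the SQL text; values go through
# ClickHouse query parameters instead.
_IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")

def log_metric_sql(table: str, time_col: str, group_col: str, step_seconds: int) -> str:
    """Count log rows per time bucket and per value of group_col."""
    for name in (table, time_col, group_col):
        if not _IDENT.fullmatch(name):
            raise ValueError(f"invalid identifier: {name!r}")
    if step_seconds <= 0:
        raise ValueError("step_seconds must be positive")
    return (
        f"SELECT toStartOfInterval({time_col}, INTERVAL {step_seconds} SECOND) AS bucket, "
        f"{group_col}, count() AS value "
        f"FROM {table} "
        # {start:DateTime} / {end:DateTime} are ClickHouse query parameters,
        # bound at execution time (e.g. param_start / param_end over HTTP).
        f"WHERE {time_col} >= {{start:DateTime}} AND {time_col} < {{end:DateTime}} "
        f"GROUP BY bucket, {group_col} "
        f"ORDER BY bucket"
    )
```

The result set has one row per (bucket, key) pair. The grouping key can be something like a URI or user ID, which would be too high-cardinality to store as a Prometheus label, and the dashboard can plot it directly.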

4.3 Traces

Logs automatically carry TraceID, enabling developers to jump from a metric spike to the corresponding trace view for root‑cause analysis.
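The following small sketch illustrates the metric-to-trace jump. Given the timestamp of a spike, it collects the distinct TraceIDs from error logs in a window around that time, which the UI could then list as links to the trace view. The `LogRecord` shape, the ±30 s window, and the ERROR-level filter are assumptions.

```python
from __future__ import annotations
from dataclasses import dataclass

@dataclass
class LogRecord:
    timestamp: float  # Unix seconds
    level: str
    trace_id: str     # empty when the line carries no trace context

def trace_ids_near(records: list[LogRecord], spike_ts: float,
                   window_s: float = 30.0, level: str = "ERROR",
                   limit: int = 20) -> list[str]:
    """Distinct TraceIDs from `level` logs within +/- window_s of spike_ts,
    in chronological order of first appearance."""
    seen: dict[str, None] = {}  # dicts keep insertion order
    for r in sorted(records, key=lambda r: r.timestamp):
        if r.level == level and r.trace_id and abs(r.timestamp - spike_ts) <= window_s:
            seen.setdefault(r.trace_id, None)
            if len(seen) >= limit:
                break
    return list(seen)
```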

5. Alerting Challenges

5.1 Metric‑Alert Correlation

Native Prometheus alerting lacks a dedicated subsystem and is simplistic. Grafana’s integrated alerting still faces pain points at medium‑to‑large scale, such as managing alerts per application, deriving alerts from metrics quickly, and previewing alert effects.

Huolala proposes two approaches: hide PromQL behind a UI that lets users select applications and metric types, or provide tenant‑aware, semantic‑rich metric definitions.
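The second approach can be sketched as a catalog of semantic metric definitions whose PromQL leaves a slot for tenant labels. The UI asks only for an application and a metric type, and the back end injects the labels. The catalog entry, metric name, and `app`/`env` labels are hypothetical.

```python
import re
from dataclasses import dataclass
from string import Template

@dataclass(frozen=True)
class MetricDefinition:
    title: str     # semantic name shown in the alert UI
    unit: str
    template: str  # PromQL with a $selector slot where tenant labels are injected

CATALOG = {
    "http_error_ratio": MetricDefinition(
        title="HTTP 5xx ratio",
        unit="%",
        template=(
            'sum(rate(http_requests_total{$selector,code=~"5.."}[1m]))'
            ' / sum(rate(http_requests_total{$selector}[1m])) * 100'
        ),
    ),
}

_SAFE_VALUE = re.compile(r"[A-Za-z0-9_.-]+")

def tenant_query(metric_key: str, app: str, env: str) -> str:
    """Resolve a semantic metric for one tenant (application + environment)."""
    for value in (app, env):
        if not _SAFE_VALUE.fullmatch(value):  # keep quotes/braces out of the query
            raise ValueError(f"unsafe label value: {value!r}")
    selector = f'app="{app}",env="{env}"'
    return Template(CATALOG[metric_key].template).substitute(selector=selector)
```

`string.Template` is used because its `$name` placeholders do not clash with the braces PromQL uses for label matchers.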

5.2 Trigger Conditions

Huolala introduces template expressions that can reference configured metrics and perform arithmetic, simplifying multi‑condition alerts (thresholds, ratios, etc.). These expressions can be nested for complex logic.
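The following is a minimal evaluator for such trigger expressions. The article does not specify the exact syntax, so the sketch assumes `${A}`-style references to configured metric results and Python-style `and`/`or` for combining conditions. It walks Python's `ast` and accepts only numbers, metric references, `+ - * /`, comparisons, and boolean operators, so arbitrary code cannot run.

```python
from __future__ import annotations
import ast
import operator

import re

_BIN = {ast.Add: operator.add, ast.Sub: operator.sub,
        ast.Mult: operator.mul, ast.Div: operator.truediv}
_CMP = {ast.Gt: operator.gt, ast.GtE: operator.ge, ast.Lt: operator.lt,
        ast.LtE: operator.le, ast.Eq: operator.eq, ast.NotEq: operator.ne}
_REF = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")

def evaluate(expr: str, values: dict[str, float]):
    """Evaluate e.g. '${A} / ${B} * 100 > 5' against metric results in `values`."""
    tree = ast.parse(_REF.sub(r"\1", expr), mode="eval")  # ${A} -> A

    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.Name):
            return values[node.id]  # KeyError for an unknown metric reference
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -walk(node.operand)
        if isinstance(node, ast.BinOp) and type(node.op) in _BIN:
            return _BIN[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.Compare):  # supports chains like 0 < ${A} < 10
            left = walk(node.left)
            for op, right_node in zip(node.ops, node.comparators):
                if type(op) not in _CMP:
                    raise ValueError(f"unsupported comparison: {type(op).__name__}")
                right = walk(right_node)
                if not _CMP[type(op)](left, right):
                    return False
                left = right
            return True
        if isinstance(node, ast.BoolOp):  # nested conditions; all operands are evaluated
            results = [walk(v) for v in node.values]
            return all(results) if isinstance(node.op, ast.And) else any(results)
        raise ValueError(f"unsupported syntax: {type(node).__name__}")

    return walk(tree)
```

Nesting falls out of the recursion: `(${A} > 100) and (${A} / ${B} > 0.05)` evaluates each parenthesized condition as its own subtree.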

5.3 Alert Card Messages

Alert cards now support Feishu Markdown, native Prometheus label variables, a custom T{} function that evaluates metric expressions inside the template, user mentions, and built‑in functions such as weather(${city_id}) to enrich notifications.

6. Future Plans

While this article focuses on UI design, upcoming posts will dive deeper into the underlying implementations of metrics, alerts, tracing, and logging.

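Example alert card template for a disk‑usage alert, using the Prometheus label variables and the T{} function described in 5.3:

```text
Host: ${host}
Appid: ${hll_appid}
fstype: ${fstype}
mount-point: ${mountpoint}
InstanceName: ${InstanceName}
Current disk usage: T{ ${A} }% > 90%
Growth over past 2h: T{ ${A} - ( ${A} offset 2h ) }%
Growth over past 6h: T{ ${A} - ( ${A} offset 6h ) }%
Growth over past 1d: T{ ${A} - ( ${A} offset 1d ) }%
Growth over past 3d: T{ ${A} - ( ${A} offset 3d ) }%
Growth over past 7d: T{ ${A} - ( ${A} offset 7d ) }%
```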


Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
