Design and Implementation of a Front-End Observability System for Business Monitoring
The article describes a unified front‑end observability platform that standardizes data‑point collection via a common SDK, automatically generates health and business dashboards, integrates real‑time monitoring and heatmaps, and has been adopted on 140 pages, delivering faster first‑screen loads, lower error and bounce rates, and higher conversion.
Background
As the front‑end business grows, it becomes increasingly important to detect and resolve issues promptly, optimize the user experience, and monitor business health in real time. On the business side, the goal is to verify core functions after each release, ensure that core APIs work correctly, and track page bounce rates. On the technical side, the aim is to monitor first‑paint times and to catch errors caused by releases, configuration changes, the end of promotional campaigns, or inventory shortages. The system should also help analyze user actions (clicks, interactions, navigation) to find opportunities to improve core metrics.
The observability system provides a closed‑loop chain: data‑point reporting, automatic dashboard construction, and monitoring.
Problem diagnosis and fault repair: Real‑time monitoring of error counts and locations enables rapid issue localization and fixing.
Performance monitoring: Tracks first‑screen load time, core API latency, and custom lifecycle timings to identify bottlenecks and improve conversion.
Business decision support: Analyzes clicks on core features to evaluate the impact of product iterations.
User behavior analysis: Visualizes click, interaction, and navigation patterns to inform product decisions.
Design and Implementation
Current Pain Points
Inconsistent data‑point scripts (3‑4 variants) leading to fragmented reporting.
Data is reported but rarely reviewed; issues are usually discovered only through customer feedback.
Poor visualization; teams write custom SQL or rely on third‑party tools, and cannot compare metrics across projects.
Weak monitoring; there is no easy way to check a project's status on demand.
Solutions
2.2.1 Unified data‑point SDK
A common SDK based on live‑web‑track standardizes reporting scripts and data tables. It reports exits that happen without any user interaction, page‑stay duration, custom events, and exposure and click events, and it also includes reusable components:
Image component: automatic compression, WebP conversion, lazy loading, error reporting.
HTTP request component: tags core APIs, reports request counts, success/failure, latency, and forwards error data to the monitoring platform.
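The SDK's code is not published with the article, so the following is only a minimal TypeScript sketch of two behaviours listed above, under stated assumptions: a page session that records stay duration and flags an exit without interaction, and a request wrapper that reports success/failure and latency for APIs tagged as core. All names (`PageSession`, `trackedRequest`, `ReportFn`, the `page_exit` and `core_api` event names) are hypothetical and not taken from live‑web‑track. The clock and transport are injected so the logic can run outside a browser.

```typescript
// Hypothetical event shape; the real live-web-track schema is not described in the article.
interface TrackEvent {
  name: string;
  payload: Record<string, string | number | boolean>;
}

type ReportFn = (event: TrackEvent) => void;
type Clock = () => number; // milliseconds

// Records how long a page stays open and whether the user interacted before leaving.
class PageSession {
  private readonly start: number;
  private interacted = false;
  private ended = false;

  constructor(
    private readonly page: string,
    private readonly report: ReportFn,
    private readonly now: Clock = Date.now,
  ) {
    this.start = now();
  }

  // In a browser this would be called from click/scroll/input listeners.
  markInteraction(): void {
    this.interacted = true;
  }

  // In a browser this would be called from a page-exit handler such as `pagehide`.
  end(): void {
    if (this.ended) return; // report each session at most once
    this.ended = true;
    this.report({
      name: "page_exit",
      payload: {
        page: this.page,
        stayMs: this.now() - this.start,
        noInteractionExit: !this.interacted, // the input a bounce-rate metric would count
      },
    });
  }
}

// Minimal transport type so the wrapper does not depend on DOM typings.
type Transport = (url: string) => Promise<{ ok: boolean; status: number }>;

// Performs a request and reports outcome and latency only for APIs tagged as core.
async function trackedRequest(
  url: string,
  transport: Transport,
  report: ReportFn,
  coreApis: ReadonlySet<string>,
  now: Clock = Date.now,
): Promise<{ ok: boolean; status: number }> {
  const q = url.indexOf("?");
  const path = q === -1 ? url : url.slice(0, q); // tag by path, ignoring the query string
  const isCore = coreApis.has(path);
  const t0 = now();
  try {
    const res = await transport(url);
    if (isCore) {
      report({ name: "core_api", payload: { api: path, ok: res.ok, status: res.status, latencyMs: now() - t0 } });
    }
    return res;
  } catch (err) {
    // Network failures are reported as failures, then re-thrown to the caller.
    if (isCore) {
      report({ name: "core_api", payload: { api: path, ok: false, status: 0, latencyMs: now() - t0 } });
    }
    throw err;
  }
}
```

In this sketch each `core_api` event represents one request, so request counts, success rates, and latency percentiles would be aggregated downstream from the same events rather than counted in the browser.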
2.2.2 Automatic core dashboard & health dashboard creation
The SDK automatically generates dashboards for core metrics and health indicators, so teams no longer build them by hand.
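The article does not describe how dashboards are generated. One plausible approach, sketched below under stated assumptions, is to derive chart definitions from each page's registration: a fixed set of general charts, one chart per core API and custom event, and the six health dimensions listed under the health dashboard. `PageConfig`, `ChartSpec`, and the metric identifiers are hypothetical.

```typescript
// Hypothetical page registration; the real configuration format is not described.
interface PageConfig {
  pageId: string;
  coreApis: string[];
  customEvents: string[];
}

// Hypothetical chart definition that a dashboard service could render.
interface ChartSpec {
  title: string;
  metric: string;
  filter: Record<string, string>;
  chart: "line" | "number";
}

// The six health dimensions named in the article (identifiers are illustrative).
const HEALTH_METRICS = [
  "core_conversion_rate",
  "bounce_rate",
  "fmp_p90",
  "js_error_rate",
  "core_api_success_rate",
  "core_api_latency_p90",
] as const;

function buildDashboards(cfg: PageConfig): { core: ChartSpec[]; health: ChartSpec[] } {
  const page = { page: cfg.pageId };
  // Every page gets the same health charts, filtered to that page.
  const health = HEALTH_METRICS.map((metric): ChartSpec => ({
    title: `${cfg.pageId} ${metric}`,
    metric,
    filter: page,
    chart: "line",
  }));
  const core: ChartSpec[] = [
    // General metrics shared by all pages ...
    { title: `${cfg.pageId} PV/UV`, metric: "pv_uv", filter: page, chart: "line" },
    { title: `${cfg.pageId} average stay`, metric: "stay_ms_avg", filter: page, chart: "number" },
    // ... plus one chart per registered core API and custom event.
    ...cfg.coreApis.map((api): ChartSpec => ({
      title: `API ${api}`,
      metric: "core_api_success_rate",
      filter: { ...page, api },
      chart: "line",
    })),
    ...cfg.customEvents.map((event): ChartSpec => ({
      title: `Event ${event}`,
      metric: "event_count",
      filter: { ...page, event },
      chart: "line",
    })),
  ];
  return { core, health };
}
```

Because the charts are derived from data the SDK already has, adding a core API or custom event to a page's registration would be enough to extend its dashboard.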
2.2.3 Unified visualization dashboards
Current approaches (custom SQL on the Guanyuan platform, or Polaris) suffer from steep learning curves, data latency, and limited chart options. A unified dashboard addresses these issues by providing real‑time visual analytics with a low barrier to entry.
2.2.4 Strengthened monitoring via Dejavu platform
Health and visualization dashboards are integrated with Dejavu for pre‑ and post‑release health checks.
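Dejavu's release checks are not documented in the article. As an illustration only, a pre/post‑release comparison can be expressed as a relative‑change test per metric, where the direction of "worse" depends on the metric. The metric names, tolerances, and `findRegressions` below are hypothetical.

```typescript
// A higher success rate is good; a higher error rate or load time is bad.
type Direction = "higherIsBetter" | "lowerIsBetter";

interface MetricRule {
  direction: Direction;
  tolerance: number; // maximum allowed relative worsening, e.g. 0.1 = 10 %
}

// Hypothetical rules; the real thresholds are not published.
const RULES: Record<string, MetricRule> = {
  js_error_rate: { direction: "lowerIsBetter", tolerance: 0.2 },
  core_api_success_rate: { direction: "higherIsBetter", tolerance: 0.01 },
  fmp_p90: { direction: "lowerIsBetter", tolerance: 0.1 },
};

interface Regression {
  metric: string;
  before: number;
  after: number;
  change: number; // relative change, (after - before) / before
}

// Compares metric snapshots taken before and after a release.
function findRegressions(
  before: Record<string, number>,
  after: Record<string, number>,
  rules: Record<string, MetricRule> = RULES,
): Regression[] {
  const out: Regression[] = [];
  for (const [metric, rule] of Object.entries(rules)) {
    const b = before[metric];
    const a = after[metric];
    // Missing data and zero baselines are skipped; a zero baseline would need an absolute threshold.
    if (b === undefined || a === undefined || b === 0) continue;
    const change = (a - b) / b;
    const worsening = rule.direction === "lowerIsBetter" ? change : -change;
    if (worsening > rule.tolerance) out.push({ metric, before: b, after: a, change });
  }
  return out;
}
```

Using relative change keeps a single tolerance meaningful across metrics with very different scales, such as an error rate near 1 % and a load time in milliseconds.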
Implementation Results
3.1 System Unification
Unified data‑point reporting method.
Unified reporting dataset.
Unified dashboards.
3.2 Wide Adoption
140 pages have adopted the SDK, delivering:
Standardized data‑point collection and dashboard monitoring.
Page health monitoring on 81 pages (average health score 87; 56% of these pages score above 80).
Visualization dashboards (heatmaps, etc.) deployed across multiple teams.
Core business improvements: a 33.3% reduction in first‑screen time, a 12 percentage‑point (pp) drop in front‑end error rate, a 4.03pp reduction in bounce rate, and a 2.03pp increase in core conversion rate, among other optimizations.
3.3 Co‑construction
At the end of 2023, a collaboration with the Front‑End Foundations team integrated the heatmaps and health dashboards into Dejavu, expanding their coverage.
3.4 Patent
Application No.: CN202311033775.4
Dashboard Design
The visualization suite includes:
Core business dashboard (general and custom metrics).
Health dashboard covering six dimensions: core conversion rate, bounce rate, FMP‑P90, JS error rate, core API success rate, core API latency‑P90.
Error monitoring board (ARM platform) showing MID, URL, error type, stack trace, etc.
Heatmap board for click density.
Sankey diagram for traffic flow and conversion paths.
Retention board (daily, next‑day, 3‑day, 7‑day, weekly, etc.).
User journey tracking for deep single‑user experience analysis.
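The article names the six health dimensions and reports an average health score of 87, but it does not give the scoring formula. The sketch below shows one plausible way to combine dimensions into a 0 to 100 score, assuming linear scoring between hypothetical "good" and "bad" thresholds and equal weights. It also shows the nearest‑rank percentile, one common definition of a P90 such as FMP‑P90 or core API latency‑P90.

```typescript
// Nearest-rank percentile: the smallest sample such that at least p % of samples are <= it.
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) throw new Error("no samples");
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p * sorted.length) / 100); // 1-based rank
  return sorted[Math.max(rank, 1) - 1];
}

// Maps a value linearly onto 0..100 between a "bad" and a "good" threshold.
// Works in both directions: for latency good < bad, for success rate good > bad.
function scoreMetric(value: number, good: number, bad: number): number {
  const t = (value - bad) / (good - bad);
  return 100 * Math.min(1, Math.max(0, t));
}

interface Dimension {
  value: number;
  good: number; // hypothetical threshold that earns a full score
  bad: number; // hypothetical threshold that earns zero
}

// Equal-weight average of per-dimension scores; the weighting is an assumption.
function healthScore(dims: Dimension[]): number {
  if (dims.length === 0) throw new Error("no dimensions");
  const total = dims.reduce((sum, d) => sum + scoreMetric(d.value, d.good, d.bad), 0);
  return Math.round(total / dims.length);
}
```

For example, an FMP‑P90 of 2000 ms scored between a good threshold of 1000 ms and a bad threshold of 3000 ms earns 50 points, and averaging it with a dimension at its good threshold (100 points) gives a score of 75.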
Future Plans
Planned enhancements include broader support for custom formulas, Enterprise WeChat (WeCom) alerts for errors and health‑score drops, more business‑oriented visual boards (retention, heatmap, Sankey), and continued co‑building with other Bilibili teams.
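As a sketch of the planned alerting, assuming delivery through a WeCom group robot, whose webhook accepts a JSON body of the form `{"msgtype": "text", "text": {"content": ...}}`, a health‑drop alert could be built as follows. The alert rule (a floor of 80 or a drop of more than 5 points), `HealthAlert`, and `sendIfNeeded` are hypothetical, and the HTTP POST is injected so no real webhook key is needed.

```typescript
// Text-message body accepted by a WeCom group-robot webhook.
interface WeComTextMessage {
  msgtype: "text";
  text: { content: string };
}

interface HealthAlert {
  page: string;
  previousScore: number;
  currentScore: number;
}

// Hypothetical rule: alert below a floor, or on a drop larger than maxDrop points.
function shouldAlert(a: HealthAlert, floor = 80, maxDrop = 5): boolean {
  return a.currentScore < floor || a.previousScore - a.currentScore > maxDrop;
}

function buildAlert(a: HealthAlert): WeComTextMessage {
  return {
    msgtype: "text",
    text: { content: `[Health alert] ${a.page}: score ${a.previousScore} -> ${a.currentScore}` },
  };
}

// Injected transport; in production this would POST JSON to the robot's webhook URL.
type PostJson = (url: string, body: string) => Promise<void>;

// Sends an alert only when the rule fires; resolves to whether a message was sent.
async function sendIfNeeded(a: HealthAlert, webhookUrl: string, post: PostJson): Promise<boolean> {
  if (!shouldAlert(a)) return false;
  await post(webhookUrl, JSON.stringify(buildAlert(a)));
  return true;
}
```

Keeping the rule separate from message construction makes it easy to reuse the same check for error‑count alerts with a different threshold.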
Bilibili Tech
Provides introductions and tutorials on Bilibili-related technologies.