How Prism Transformed Front‑End Monitoring at Scale: Architecture, Challenges & Insights

This article details the design, challenges, and solutions behind Prism, a self‑built front‑end monitoring platform that collects multi‑device SDK data, processes it through Kafka, Flink and ClickHouse, visualizes metrics, integrates with A/B testing, and outlines future enhancements for broader enterprise adoption.

Xingsheng Youxuan Technology Community

1. Product Overview

Prism is a front‑end monitoring system developed by the Experience Technology front‑end performance team, offering performance evaluation, quality assessment, error alerts, and custom event tracking for more than 100 company projects.

2. Background

Industry studies have shown that page latency directly affects revenue, so as a product grows, decisions need to be backed by data. Without quantitative data, decisions are prone to bias, which makes comprehensive A/B testing and small‑traffic (gradual rollout) mechanisms essential.

3. Exploration Process

3.1 Challenges

Building a full‑stack front‑end monitoring platform involves developing SDKs for multiple client platforms, unifying data formats across them, learning technologies outside the front‑end stack, processing data under high concurrency, and keeping the clusters stable.

3.2 Solutions

3.2.1 Data Collection

Prism collects data through an intrusive front‑end SDK, meaning one that is embedded in the application code. It captures a rich set of dimensions per platform: web (dwell time, request errors, page errors), app (device, network, version, OS, request errors), and mini‑program data.
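As a rough illustration of how such an SDK core might be organized, the sketch below queues events and hands them to an injected transport in batches, so the same logic could sit behind web, app, and mini‑program wrappers. The event shape, the Collector class, and the batch size are assumptions made for this example; the article does not describe Prism's actual SDK API.

```typescript
// Hypothetical event shape; field names are assumptions, not Prism's real schema.
type Platform = "web" | "app" | "miniprogram";
type EventType = "page_error" | "request_error" | "stay_time" | "custom";

interface MonitorEvent {
  project: string;
  platform: Platform;
  type: EventType;
  timestamp: number; // epoch milliseconds
  payload: Record<string, unknown>;
}

// The transport is injected so each platform can supply its own way of sending data.
type Transport = (batch: MonitorEvent[]) => void;

class Collector {
  private queue: MonitorEvent[] = [];
  private readonly project: string;
  private readonly platform: Platform;
  private readonly send: Transport;
  private readonly batchSize: number;

  constructor(project: string, platform: Platform, send: Transport, batchSize: number = 10) {
    this.project = project;
    this.platform = platform;
    this.send = send;
    this.batchSize = batchSize;
  }

  // Record one event and flush automatically once the batch is full.
  track(type: EventType, payload: Record<string, unknown>, now: number = Date.now()): void {
    this.queue.push({
      project: this.project,
      platform: this.platform,
      type,
      timestamp: now,
      payload,
    });
    if (this.queue.length >= this.batchSize) {
      this.flush();
    }
  }

  // Hand all queued events to the transport and clear the queue.
  flush(): void {
    if (this.queue.length === 0) return;
    const batch = this.queue;
    this.queue = [];
    this.send(batch);
  }
}
```

Batching keeps the number of network requests low on busy pages; a real SDK would also flush on page hide or app background, which is omitted here.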

3.2.2 Data Ingestion

Collected data follows a unified format. It is received by a multi‑node Node.js service behind a CLB load balancer, which filters out dirty data and writes the rest to Kafka topics for downstream processing.
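The filtering and routing step could look something like the following sketch. It validates each incoming record against an assumed unified format, drops anything malformed, and groups the rest by destination topic. The field names and the one‑topic‑per‑platform naming (prism-web and so on) are hypothetical, and the actual Kafka producer call is left out.

```typescript
// Hypothetical unified record; the real Prism schema is not given in the article.
interface RawEvent {
  project?: unknown;
  platform?: unknown;
  type?: unknown;
  timestamp?: unknown;
}

interface CleanEvent {
  project: string;
  platform: "web" | "app" | "miniprogram";
  type: string;
  timestamp: number; // epoch milliseconds
}

const PLATFORMS = ["web", "app", "miniprogram"] as const;

// Returns a typed event, or null when the record is dirty and should be dropped.
function validate(raw: RawEvent): CleanEvent | null {
  const { project, type, timestamp } = raw;
  if (typeof project !== "string" || project.length === 0) return null;
  if (typeof type !== "string" || type.length === 0) return null;
  if (typeof timestamp !== "number" || !Number.isFinite(timestamp) || timestamp <= 0) return null;
  const platform = PLATFORMS.find((p) => p === raw.platform);
  if (platform === undefined) return null;
  return { project, platform, type, timestamp };
}

// Assumed topic naming: one topic per platform.
function topicFor(event: CleanEvent): string {
  return `prism-${event.platform}`;
}

// Groups valid events by topic; a real service would pass each group to a Kafka producer.
function route(batch: RawEvent[]): Map<string, CleanEvent[]> {
  const byTopic = new Map<string, CleanEvent[]>();
  for (const raw of batch) {
    const event = validate(raw);
    if (event === null) continue;
    const topic = topicFor(event);
    const list = byTopic.get(topic) ?? [];
    list.push(event);
    byTopic.set(topic, list);
  }
  return byTopic;
}
```

Rejecting dirty records before they reach Kafka keeps the downstream jobs simpler, since they can assume every message already matches the unified format.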

3.2.3 Data Cleaning

The cleaning pipeline is built on Kafka + Flink + ClickHouse. The first version used Spark; as data volume and cross‑team coordination needs grew, the team replaced it with Flink jobs written in Scala, which delivered better performance.
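The production jobs run on Flink in Scala, and the article does not show them. Purely to illustrate the kind of logic such a job tends to contain, here is a plain TypeScript sketch of a one‑minute tumbling‑window aggregation that turns cleaned events into rows suitable for a columnar store. It is not Flink code, and the event fields, row shape, and window size are all assumptions.

```typescript
// Hypothetical cleaned event as it might arrive from a Kafka topic.
interface StreamEvent {
  project: string;
  type: string;
  userId: string;
  timestamp: number; // epoch milliseconds
}

// Hypothetical aggregated row, one per project, event type, and window.
interface WindowRow {
  project: string;
  type: string;
  windowStart: number; // epoch milliseconds, aligned to the window size
  count: number;
  affectedUsers: number;
}

const WINDOW_MS = 60 * 1000; // assumed one-minute tumbling window

function aggregate(events: StreamEvent[]): WindowRow[] {
  const buckets = new Map<string, { row: WindowRow; users: Set<string> }>();
  for (const e of events) {
    // Align the event to the start of its window.
    const windowStart = Math.floor(e.timestamp / WINDOW_MS) * WINDOW_MS;
    const key = `${e.project}|${e.type}|${windowStart}`;
    let bucket = buckets.get(key);
    if (bucket === undefined) {
      bucket = {
        row: { project: e.project, type: e.type, windowStart, count: 0, affectedUsers: 0 },
        users: new Set<string>(),
      };
      buckets.set(key, bucket);
    }
    bucket.row.count += 1;
    bucket.users.add(e.userId);
    bucket.row.affectedUsers = bucket.users.size; // distinct users seen in this window
  }
  // Emit rows ordered by window start.
  return Array.from(buckets.values(), (b) => b.row).sort((a, b) => a.windowStart - b.windowStart);
}
```

A streaming engine would emit each window as it closes rather than processing a finished array, but the keying and counting follow the same idea.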

3.2.4 Backend Service & Visualization

Node.js services expose 40+ metrics and 20+ OpenAPI endpoints covering daily active users, API performance, custom events, and error details. The Prism platform visualizes this data as charts for the various products.
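To make two of these metrics concrete, the sketch below derives an active‑user count and a 95th‑percentile API latency from a list of samples. In practice such numbers would more likely be computed by queries against ClickHouse; the in‑memory version here, along with the ApiSample shape and function names, is an assumption for illustration only.

```typescript
// Hypothetical per-request sample for one day.
interface ApiSample {
  userId: string;
  durationMs: number;
}

// Nearest-rank percentile: the smallest value with at least p% of the data at or below it.
function percentile(values: number[], p: number): number {
  if (values.length === 0) return NaN;
  const sorted = values.slice().sort((a, b) => a - b);
  const rank = Math.ceil((p * sorted.length) / 100);
  const index = Math.min(sorted.length - 1, Math.max(0, rank - 1));
  return sorted[index] ?? NaN;
}

// Summarizes one day of samples into an active-user count and a p95 latency.
function summarize(samples: ApiSample[]): { activeUsers: number; p95Ms: number } {
  const users = new Set<string>();
  const durations: number[] = [];
  for (const s of samples) {
    users.add(s.userId);
    durations.push(s.durationMs);
  }
  return { activeUsers: users.size, p95Ms: percentile(durations, 95) };
}
```

Percentiles are usually preferred over averages for latency metrics because a small number of very slow requests would otherwise be hidden.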

3.2.5 Alert Service

Alerts are sent via Enterprise WeChat bots using configurable webhook URLs and custom rules (max affected users, error count, time thresholds). Clicking a notification shows error details for rapid debugging.
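A minimal sketch of the rule check and message construction might look like this. The rule fields mirror the thresholds mentioned above, but their names, the ErrorWindow shape, and the detail URL are hypothetical. The message uses the markdown payload shape accepted by Enterprise WeChat group bots; actually sending it (an HTTP POST of this JSON to the configured webhook URL) is left out.

```typescript
// Hypothetical rule; field names are assumptions based on the thresholds named in the text.
interface AlertRule {
  maxAffectedUsers: number;
  maxErrorCount: number;
  windowMinutes: number;
}

// Hypothetical aggregate of one error over the rule's time window.
interface ErrorWindow {
  project: string;
  message: string;
  errorCount: number;
  affectedUsers: number;
}

// Markdown message body for an Enterprise WeChat group bot.
interface BotMessage {
  msgtype: "markdown";
  markdown: { content: string };
}

// Fire when either threshold is exceeded within the window.
function shouldAlert(rule: AlertRule, w: ErrorWindow): boolean {
  return w.errorCount > rule.maxErrorCount || w.affectedUsers > rule.maxAffectedUsers;
}

// Build the payload; detailUrl is assumed to point at the error detail page.
function buildBotMessage(rule: AlertRule, w: ErrorWindow, detailUrl: string): BotMessage {
  const content = [
    `**Prism alert: ${w.project}**`,
    `Error: ${w.message}`,
    `Count: ${w.errorCount} in the last ${rule.windowMinutes} min`,
    `Affected users: ${w.affectedUsers}`,
    `[View error details](${detailUrl})`,
  ].join("\n");
  return { msgtype: "markdown", markdown: { content } };
}
```

Including the detail link in the message body is what lets a recipient jump from the notification straight to the error record.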

3.2.6 Overall Architecture & Maintenance

The system integrates Alertmanager, Prometheus, Grafana, Node.js, and WeChat bots. Metrics are scraped by Prometheus, visualized in Grafana, and alerts trigger automated restart scripts.
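For Prometheus to scrape a Node.js service, the service has to expose its metrics in the Prometheus text exposition format. A production service would normally rely on a client library for this; the hand‑rolled renderer below is only a sketch of what that output looks like, and the metric name used in the usage note is made up for the example.

```typescript
interface Sample {
  labels: Record<string, string>;
  value: number;
}

interface Metric {
  name: string;
  help: string;
  type: "counter" | "gauge";
  samples: Sample[];
}

// Escape backslashes, double quotes, and newlines as the exposition format requires for label values.
function escapeLabel(value: string): string {
  return value.replace(/\\/g, "\\\\").replace(/"/g, '\\"').replace(/\n/g, "\\n");
}

// Render metrics as the text body of a /metrics response.
function render(metrics: Metric[]): string {
  const lines: string[] = [];
  for (const m of metrics) {
    lines.push(`# HELP ${m.name} ${m.help}`);
    lines.push(`# TYPE ${m.name} ${m.type}`);
    for (const s of m.samples) {
      const labels = Object.keys(s.labels)
        .map((k) => `${k}="${escapeLabel(s.labels[k] ?? "")}"`)
        .join(",");
      lines.push(labels.length > 0 ? `${m.name}{${labels}} ${s.value}` : `${m.name} ${s.value}`);
    }
  }
  return lines.join("\n") + "\n";
}
```

For example, a counter named prism_ingest_events_total with one sample labeled node="a" and value 3 would render as a HELP line, a TYPE line, and the line prism_ingest_events_total{node="a"} 3. Alertmanager rules can then fire on such series and trigger the restart scripts.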

4. Data Output Capabilities

4.1 Integration with A/B Testing (Picasso)

Picasso, the company’s A/B testing platform, relies on Prism’s data for experiment analysis. Prism’s metrics support multiple marketing experiments, handling massive data spikes during peak periods.

4.2 Cross‑Department Data Collaboration

4.2.1 Data Sharing via OpenAPI

Prism provides OpenAPI endpoints for external consumption, enabling other platforms (e.g., the simulation platform) to retrieve API call details for testing and regression.

4.2.2 Custom Data Promotion

More than 20 products have integrated custom events, notably the “优选优咪” operations and warehouse apps, using Prism data to monitor employee behavior and guide product iteration.

5. Future Plans

Prism now covers about 90% of front‑end projects, but its UI and interactions still need refinement. The team plans to migrate backend services to the company’s data‑warehouse platform, enhance data depth, and continue learning from industry‑leading monitoring tools.
