Building a Scalable Frontend Performance Monitoring System at 哈啰

This article details 哈啰's front‑end performance monitoring architecture, covering the background of rapid growth, a three‑step optimization workflow, data collection, cleaning, aggregation, visualization, and practical techniques like pre‑rendering and offline packages to dramatically improve page load metrics.

dbaplus Community

Background

Rapid growth of business volume and feature breadth caused severe front‑end performance degradation, leading to poor user experience and time‑consuming root‑cause analysis.

Performance Optimization Workflow

The workflow is abstracted into three consistent stages: data collection, data cleaning, and aggregation & display.

Data Collection

The SDK collects performance data through four functional modules. Depending on the scenario, modules such as cache can be omitted or merged.

Information acquisition

Information assembly

Cache module

Reporting module

Information acquisition

Key metrics captured by the SDK:

Exception capture (window.onerror, unhandledrejection, script errors)

Page load time (white‑screen, first‑screen, total load, interactive time)

Page frame rate

Network conditions (effective type, RTT, downlink)

Native method invocation details (e.g., bridge calls)
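
As an illustration of the exception-capture item above, the following is a minimal sketch rather than the SDK's actual code. The names ListenerHost, CapturedException, and installExceptionCapture are hypothetical; in a browser the host would be window, and it is passed in here so the sketch also runs outside a browser.

```typescript
// Minimal shape of the object we attach listeners to (window in a browser).
interface ListenerHost {
  addEventListener(type: string, listener: (evt: any) => void): void;
}

// Hypothetical normalized exception entry; the field names are assumptions.
interface CapturedException {
  kind: "js-error" | "unhandled-rejection";
  message: string;
  source?: string;
  line?: number;
  column?: number;
}

function installExceptionCapture(
  host: ListenerHost,
  onCapture: (entry: CapturedException) => void
): void {
  // Uncaught runtime errors (the listener form of window.onerror).
  host.addEventListener("error", (evt) => {
    onCapture({
      kind: "js-error",
      // Fall back to a generic label when the event carries no message.
      message: String(evt?.message ?? "Script error."),
      source: evt?.filename,
      line: evt?.lineno,
      column: evt?.colno,
    });
  });

  // Promise rejections that no handler caught.
  host.addEventListener("unhandledrejection", (evt) => {
    const reason = evt?.reason;
    onCapture({
      kind: "unhandled-rejection",
      message: reason instanceof Error ? reason.message : String(reason),
    });
  });
}
```

In a page, installExceptionCapture(window, handler) would forward each entry to the assembly step; frame rate, network conditions, and bridge calls would need their own collectors.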

Page load metrics definition

White‑screen time: time to render the first element with a non‑zero importance score.

First‑screen time: time to render the element with the highest importance score.

Total load time: time until all resources finish loading (window.onload).

Interactive time: DOM ready time plus any custom JS markers indicating readiness.
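
The last two definitions can be sketched from Navigation Timing values. This is an illustrative sketch only: NavTimingLike, LoadMetrics, and computeLoadMetrics are hypothetical names, and treating the interactive time as the later of DOM ready and the latest custom marker is an assumption about how the two are combined. White-screen and first-screen times are left out because the article does not say how the importance score is computed.

```typescript
// Subset of Navigation Timing fields used here; in a browser they come from
// performance.getEntriesByType("navigation")[0].
interface NavTimingLike {
  startTime: number;
  domContentLoadedEventEnd: number;
  loadEventEnd: number;
}

interface LoadMetrics {
  totalLoadMs: number;
  interactiveMs: number;
}

// readyMarks are timestamps (same clock as the timing entry) recorded by
// custom JS markers when the page considers itself usable.
function computeLoadMetrics(nav: NavTimingLike, readyMarks: number[] = []): LoadMetrics {
  const domReadyMs = nav.domContentLoadedEventEnd - nav.startTime;
  // Assumption: a custom marker can only push the interactive time later.
  const latestMarkMs = readyMarks.reduce(
    (latest, mark) => Math.max(latest, mark - nav.startTime),
    0
  );
  return {
    totalLoadMs: nav.loadEventEnd - nav.startTime,
    interactiveMs: Math.max(domReadyMs, latestMarkMs),
  };
}
```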

Information assembly

Collected data is normalized into a unified model with three categories:

Common information: type, timestamp, logId, etc.

Business information: performance metrics (page load times, exception details) plus business identifiers such as businessId, pageSessionId, pagePath.

User base information: app name, version, OS, device model, and other user identifiers.
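
One way to picture the unified model is as three nested records. The sketch below is an assumption about shape, not the SDK's real schema: the identifiers named in the article (type, timestamp, logId, businessId, pageSessionId, pagePath) are kept, while the remaining key names, the numeric-only metrics map, and the logId format are hypothetical.

```typescript
interface CommonInfo {
  type: string;
  timestamp: number;
  logId: string;
}

interface BusinessInfo {
  businessId: string;
  pageSessionId: string;
  pagePath: string;
  // Simplification: numeric metrics only; exception details would need their own field.
  metrics: { [metric: string]: number };
}

interface UserBaseInfo {
  appName: string;
  appVersion: string;
  os: string;
  deviceModel: string;
}

interface MonitorLog {
  common: CommonInfo;
  business: BusinessInfo;
  user: UserBaseInfo;
}

let logSeq = 0;

// Wraps business and user data with the common envelope.
function assembleLog(
  type: string,
  business: BusinessInfo,
  user: UserBaseInfo,
  now: number = Date.now()
): MonitorLog {
  logSeq += 1;
  return {
    // Hypothetical logId scheme: session id + timestamp + per-page counter.
    common: { type, timestamp: now, logId: `${business.pageSessionId}-${now}-${logSeq}` },
    business,
    user,
  };
}
```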

Cache & Reporting

After assembly, logs are written to a local cache (e.g., localStorage or IndexedDB). The SDK merges new logs with cached entries and triggers upload based on configurable strategies:

Size‑based (upload when cache size exceeds logMaxSize)

Count‑based (upload when number of entries ≥ logMaxCount)

Time‑based (periodic timer maxTimeLog)

Failed uploads are retried; on permanent failure the error is bubbled to the caller for custom handling.
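
The three upload triggers can be sketched as a small buffer. This is a simplified illustration, not the SDK's implementation: LogBuffer and UploadPolicy are hypothetical names, an in-memory array stands in for localStorage or IndexedDB, size is approximated by character count, the unit of maxTimeLog is assumed to be milliseconds, and retry handling is omitted. The option names logMaxSize, logMaxCount, and maxTimeLog come from the article.

```typescript
interface UploadPolicy {
  logMaxSize: number;  // upload when the serialized size exceeds this
  logMaxCount: number; // upload when this many entries are buffered
  maxTimeLog: number;  // upload when this many ms have passed since the last upload
}

class LogBuffer {
  private entries: string[] = [];
  private bytes = 0;
  private lastFlush: number;
  private policy: UploadPolicy;
  private upload: (batch: string[]) => void;
  private now: () => number;

  constructor(
    policy: UploadPolicy,
    upload: (batch: string[]) => void,
    now: () => number = Date.now
  ) {
    this.policy = policy;
    this.upload = upload;
    this.now = now;
    this.lastFlush = now();
  }

  add(log: object): void {
    const serialized = JSON.stringify(log);
    this.entries.push(serialized);
    this.bytes += serialized.length; // character count as a rough size proxy
    if (this.shouldFlush()) this.flush();
  }

  // Meant to be called from a periodic timer so time-based uploads fire while idle.
  tick(): void {
    if (this.entries.length > 0 && this.shouldFlush()) this.flush();
  }

  flush(): void {
    const batch = this.entries;
    this.entries = [];
    this.bytes = 0;
    this.lastFlush = this.now();
    if (batch.length > 0) this.upload(batch);
  }

  private shouldFlush(): boolean {
    return (
      this.bytes > this.policy.logMaxSize ||
      this.entries.length >= this.policy.logMaxCount ||
      this.now() - this.lastFlush >= this.policy.maxTimeLog
    );
  }
}
```

Injecting the clock and the upload function keeps the trigger logic testable without timers or network access; a real SDK would also persist entries so they survive a page close.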

Data Cleaning Pipeline

1. Log server (Node.js) exposes an HTTP endpoint; front‑end SDK posts JSON logs.

2. The server writes each log to a Kafka topic.

3. Flink consumes the Kafka stream, performs real‑time cleansing (e.g., filtering malformed entries, deduplication, sampling), and writes the cleaned records to ClickHouse.

4. To mitigate Kafka back‑pressure and Flink‑ClickHouse write latency, a parallel sink writes the same stream to Hive. In case of ClickHouse data loss, the hourly Hive partition can be re‑imported, providing a coarse‑grained fault‑tolerance backup.

5. Auxiliary business data are stored in PostgreSQL for joins during later analysis.
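
The cleansing rules in step 3 (filtering malformed entries, deduplication, sampling) can be illustrated as a per-batch function. This sketch shows only the record-level logic in TypeScript; it is not a Flink job, and cleanseBatch, hashToUnit, the required fields, and the hash-based sampling scheme are all assumptions made for illustration.

```typescript
interface RawLogEntry {
  logId?: unknown;
  type?: unknown;
  timestamp?: unknown;
}

interface CleanLogEntry {
  logId: string;
  type: string;
  timestamp: number;
}

// Maps a string to a stable value in [0, 1) so sampling is deterministic per logId.
function hashToUnit(text: string): number {
  let h = 0;
  for (let i = 0; i < text.length; i++) {
    h = (h * 31 + text.charCodeAt(i)) >>> 0;
  }
  return h / 4294967296;
}

// sampleRate is the fraction of valid, unique entries to keep (0 to 1).
function cleanseBatch(
  raw: RawLogEntry[],
  sampleRate: number,
  seenIds: Set<string> = new Set<string>()
): CleanLogEntry[] {
  const kept: CleanLogEntry[] = [];
  for (const entry of raw) {
    const { logId, type, timestamp } = entry;
    // 1. Drop malformed entries.
    if (
      typeof logId !== "string" ||
      typeof type !== "string" ||
      typeof timestamp !== "number" ||
      !Number.isFinite(timestamp)
    ) {
      continue;
    }
    // 2. Drop duplicates by logId.
    if (seenIds.has(logId)) continue;
    seenIds.add(logId);
    // 3. Keep only the sampled share.
    if (hashToUnit(logId) >= sampleRate) continue;
    kept.push({ logId, type, timestamp });
  }
  return kept;
}
```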

Aggregation & Visualization

Node services query ClickHouse using SQL‑like statements, compute derived metrics (e.g., white‑screen rate and "second‑open" rate, the share of pages that open within one second), and expose the results via a REST API.
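
As a rough illustration of such derived metrics, the sketch below computes two rates from per-page-load samples. The article does not give the formulas, so both definitions are assumptions: the second-open rate is taken as the share of loads whose first-screen time is at most one second, and the white-screen rate as the share whose white-screen time exceeds a threshold. PageLoadSample, DerivedRates, and deriveRates are hypothetical names, and in production this arithmetic would more likely live in the ClickHouse query itself.

```typescript
interface PageLoadSample {
  whiteScreenMs: number;
  firstScreenMs: number;
}

interface DerivedRates {
  secondOpenRate: number;
  whiteScreenRate: number;
}

function deriveRates(
  samples: PageLoadSample[],
  secondOpenLimitMs = 1000,
  whiteScreenLimitMs = 1000
): DerivedRates {
  // Avoid dividing by zero when no samples arrived in the window.
  if (samples.length === 0) return { secondOpenRate: 0, whiteScreenRate: 0 };
  const fast = samples.filter((s) => s.firstScreenMs <= secondOpenLimitMs).length;
  const blank = samples.filter((s) => s.whiteScreenMs > whiteScreenLimitMs).length;
  return {
    secondOpenRate: fast / samples.length,
    whiteScreenRate: blank / samples.length,
  };
}
```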

The front‑end dashboard consumes the API and renders real‑time charts, daily reports, and custom analyses such as Lighthouse performance snapshots.

Optimization Practices

Pre‑rendering

During the build, a headless browser (e.g., Puppeteer) loads each target page, captures the fully rendered HTML, and uploads it to OSS as a static file. Serving this static HTML directly lets the client paint meaningful content before any JavaScript executes, which shortens the white‑screen and first‑screen phases. In the reported case, white‑screen time dropped from over 1.6 s to about 0.76 s without adding backend load.

Offline packages (APP scenario)

For H5 pages embedded in native apps, an offline bundle containing all static assets is generated. At runtime the client checks whether a matching bundle exists; if not, it fetches resources from the network and caches them locally. The build pipeline integrates pre‑rendered HTML and the offline assets into a single deliverable, enabling instant page display on subsequent launches.
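
The runtime lookup described above amounts to an offline-first resolver. The following is a minimal sketch under simplifying assumptions: OfflineResourceResolver and ResolvedResource are hypothetical names, the bundle is modeled as an in-memory Map from URL to content, and the network fetch is injected as a synchronous function, whereas a real client would do this asynchronously in native code.

```typescript
interface ResolvedResource {
  body: string;
  from: "offline" | "network";
}

class OfflineResourceResolver {
  private bundle: Map<string, string>;
  private fetchRemote: (url: string) => string;

  constructor(bundle: Map<string, string>, fetchRemote: (url: string) => string) {
    this.bundle = bundle;
    this.fetchRemote = fetchRemote;
  }

  resolve(url: string): ResolvedResource {
    // 1. Serve from the offline bundle when a matching entry exists.
    const local = this.bundle.get(url);
    if (local !== undefined) return { body: local, from: "offline" };
    // 2. Otherwise fetch from the network and cache for the next launch.
    const body = this.fetchRemote(url);
    this.bundle.set(url, body);
    return { body, from: "network" };
  }
}
```

A resource that misses once is served locally afterwards, which is what makes subsequent launches fast.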

Performance Analysis

Real‑time performance charts display stage‑level timings (DNS lookup, TCP handshake, HTML download, parsing, script execution, rendering). Engineers focus on the longest‑lasting stages and apply targeted optimizations such as:

DNS pre‑fetching to reduce DNS lookup latency.

CDN + OSS for static resource delivery.

Webpack bundle splitting and compression to shrink download size.

Pre‑rendering/SSR to shorten white‑screen and first‑screen times.
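
Stage-level timings like those in the charts can be approximated from Navigation Timing fields. The sketch below is illustrative: StageTimingInput, StageDuration, stageDurations, and slowestStage are hypothetical names, the field names mirror the browser's PerformanceNavigationTiming entry, and the mapping of the last three stages onto parsing, script execution, and rendering is only approximate.

```typescript
interface StageTimingInput {
  domainLookupStart: number;
  domainLookupEnd: number;
  connectStart: number;
  connectEnd: number;
  responseStart: number;
  responseEnd: number;
  domInteractive: number;
  domContentLoadedEventEnd: number;
  loadEventStart: number;
}

interface StageDuration {
  stage: string;
  ms: number;
}

function stageDurations(t: StageTimingInput): StageDuration[] {
  return [
    { stage: "dnsLookup", ms: t.domainLookupEnd - t.domainLookupStart },
    { stage: "tcpHandshake", ms: t.connectEnd - t.connectStart },
    { stage: "htmlDownload", ms: t.responseEnd - t.responseStart },
    // Approximation: HTML parsing until the DOM becomes interactive.
    { stage: "domParsing", ms: t.domInteractive - t.responseEnd },
    // Approximation: deferred scripts and DOMContentLoaded handlers.
    { stage: "scriptExecution", ms: t.domContentLoadedEventEnd - t.domInteractive },
    // Approximation: remaining sub-resources and rendering until load fires.
    { stage: "resourcesAndRender", ms: t.loadEventStart - t.domContentLoadedEventEnd },
  ];
}

// Returns the stage with the largest duration, or undefined for an empty list.
function slowestStage(stages: StageDuration[]): StageDuration | undefined {
  let slowest: StageDuration | undefined;
  for (const s of stages) {
    if (slowest === undefined || s.ms > slowest.ms) slowest = s;
  }
  return slowest;
}
```

Ranking stages this way points at where to start: a dominant dnsLookup suggests pre-fetching, while a dominant htmlDownload or resourcesAndRender suggests CDN delivery or bundle splitting.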

Key Takeaways

A unified monitoring system that collects, cleans, and visualizes front‑end performance data is essential for data‑driven optimization.

Modular SDK design (acquisition → assembly → cache → reporting) allows flexible adaptation to different business scenarios.

Real‑time stream processing (Kafka → Flink → ClickHouse) provides low‑latency metrics while Hive serves as a backup for fault tolerance.

High‑impact levers include concurrency, caching, resource compression, and workflow‑level improvements such as pre‑rendering and offline bundles.

Metrics‑driven bottleneck identification enables precise, measurable performance gains without unnecessary code‑level micro‑optimizations.

Illustrative Diagrams

Figures in the original (images not reproduced here): Data Collection Diagram, Performance Optimization Process, Monitoring System Flow, Log Server Architecture, Data Cleaning Pipeline.