Building a Scalable Frontend Performance Monitoring System at 哈啰
This article details 哈啰's front‑end performance monitoring architecture, covering the background of rapid growth, a three‑step optimization workflow, data collection, cleaning, aggregation, visualization, and practical techniques like pre‑rendering and offline packages to dramatically improve page load metrics.
Background
Rapid growth of business volume and feature breadth caused severe front‑end performance degradation, leading to poor user experience and time‑consuming root‑cause analysis.
Performance Optimization Workflow
The workflow is abstracted into three consistent stages: data collection, data cleaning, and aggregation & display.
Data Collection
The SDK collects performance data through four functional modules. Depending on the scenario, modules such as cache can be omitted or merged.
Information acquisition
Information assembly
Cache module
Reporting module
Information acquisition
Key metrics captured by the SDK:
Exception capture (window.onerror, unhandledrejection, script errors)
Page load time (white‑screen, first‑screen, total load, interactive time)
Page frame rate
Network conditions (effective type, RTT, downlink)
Native method invocation details (e.g., bridge calls)
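The exception-capture item above could be wired up roughly as follows. This is a minimal sketch, not the SDK's actual code: `installErrorCapture`, `CapturedError`, and `ListenerHost` are hypothetical names, and the listener host is injected so the sketch also runs outside a browser (in a page, `window` would be passed).

```typescript
// Hypothetical shape of a captured exception; the field names are assumptions.
interface CapturedError {
  kind: "js-error" | "unhandled-rejection";
  message: string;
  timestamp: number;
}

// Minimal listener surface. In a browser, `window` satisfies this shape.
interface ListenerHost {
  addEventListener(type: string, handler: (ev: any) => void): void;
}

function installErrorCapture(
  host: ListenerHost,
  onCapture: (err: CapturedError) => void,
  now: () => number = Date.now,
): void {
  // Uncaught script errors surface as "error" events
  // (the same errors window.onerror would receive).
  host.addEventListener("error", (ev) => {
    onCapture({
      kind: "js-error",
      message: String(ev?.message ?? "Script error."),
      timestamp: now(),
    });
  });
  // Promise rejections with no handler surface as "unhandledrejection" events.
  host.addEventListener("unhandledrejection", (ev) => {
    const reason = ev?.reason;
    onCapture({
      kind: "unhandled-rejection",
      message: reason instanceof Error ? reason.message : String(reason),
      timestamp: now(),
    });
  });
}
```

Injecting the host and the clock keeps the capture logic testable without a real page.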
Page load metrics definition
White‑screen time: time to render the first element with a non‑zero importance score.
First‑screen time: time to render the element with the highest importance score.
Total load time: time until all resources finish loading (window.onload).
Interactive time: DOM ready time plus any custom JS markers indicating readiness.
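Two of the definitions above map directly onto Navigation Timing fields, so they can be sketched as a pure function. The sketch assumes the input mirrors a subset of `PerformanceNavigationTiming` (milliseconds since navigation start) and that "custom JS markers" are timestamps recorded by business code. `computeLoadMetrics` and `customReadyMarks` are hypothetical names. White‑screen and first‑screen time are left out because the article does not say how the importance score is computed.

```typescript
// Subset of PerformanceNavigationTiming fields, in ms since navigation start.
interface NavTimingSubset {
  domContentLoadedEventEnd: number;
  loadEventEnd: number;
}

interface LoadMetrics {
  totalLoadMs: number;
  interactiveMs: number;
}

// customReadyMarks: hypothetical timestamps (ms since navigation start)
// that business code records when the page declares itself ready.
function computeLoadMetrics(
  nav: NavTimingSubset,
  customReadyMarks: number[] = [],
): LoadMetrics {
  const domReady = nav.domContentLoadedEventEnd;
  return {
    // Total load time: the load event has finished (window.onload).
    totalLoadMs: nav.loadEventEnd,
    // One reading of "DOM ready plus custom markers": the page counts as
    // interactive once DOM ready and every custom marker have been reached.
    interactiveMs: Math.max(domReady, ...customReadyMarks),
  };
}
```

In a browser, the timing entry would typically come from `performance.getEntriesByType("navigation")[0]`.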
Information assembly
Collected data is normalized into a unified model with three categories:
Common information: type, timestamp, logId, etc.
Business information: performance metrics (page load times, exception details) plus business identifiers such as businessId, pageSessionId, pagePath.
User base information: app name, version, OS, device model, and other user identifiers.
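The three categories above could be expressed as types along these lines. The field names `type`, `timestamp`, `logId`, `businessId`, `pageSessionId`, and `pagePath` come from the article; everything else (the interface names, `metrics`, `appName`, `assembleLog`, the logId format) is an assumption made for illustration.

```typescript
interface CommonInfo {
  type: string;      // e.g. "performance" or "exception" (values are assumptions)
  timestamp: number; // ms since epoch
  logId: string;
}

interface BusinessInfo {
  businessId: string;
  pageSessionId: string;
  pagePath: string;
  metrics: Record<string, number>; // hypothetical container for load timings
}

interface UserBaseInfo {
  appName: string;
  appVersion: string;
  os: string;
  deviceModel: string;
}

interface MonitorLog {
  common: CommonInfo;
  business: BusinessInfo;
  user: UserBaseInfo;
}

// Assemble one normalized log entry. `now` and `logId` can be injected,
// which keeps the function deterministic under test.
function assembleLog(
  type: string,
  business: BusinessInfo,
  user: UserBaseInfo,
  now: number = Date.now(),
  logId: string = `${now}-${Math.random().toString(36).slice(2, 10)}`,
): MonitorLog {
  return { common: { type, timestamp: now, logId }, business, user };
}
```

Keeping the common envelope separate from the business payload lets downstream stages route and deduplicate logs without understanding each metric.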
Cache & Reporting
After assembly, logs are written to a local cache (e.g., localStorage or IndexedDB). The SDK merges new logs with cached entries and triggers upload based on configurable strategies:
Size‑based (upload when cache size exceeds logMaxSize)
Count‑based (upload when number of entries ≥ logMaxCount)
Time‑based (periodic timer maxTimeLog)
Failed uploads are retried; on permanent failure the error is bubbled to the caller for custom handling.
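The three upload triggers can be sketched as a pure decision function. The option names `logMaxSize`, `logMaxCount`, and `maxTimeLog` come from the article; their units are assumptions here (characters of serialized JSON, entry count, and milliseconds). `mergeLogs` and `pickUploadTrigger` are hypothetical names, and retry handling is omitted.

```typescript
interface UploadConfig {
  logMaxSize: number;  // assumed unit: characters of serialized JSON
  logMaxCount: number; // number of cached entries
  maxTimeLog: number;  // assumed unit: ms between periodic uploads
}

type UploadTrigger = "size" | "count" | "time" | null;

// Merge newly assembled logs behind the cached ones so order is preserved.
function mergeLogs(cached: string[], incoming: string[]): string[] {
  return [...cached, ...incoming];
}

// Decide whether the cached entries should be uploaded now, and why.
function pickUploadTrigger(
  entries: string[],
  cfg: UploadConfig,
  lastUploadAt: number,
  now: number,
): UploadTrigger {
  if (entries.length === 0) return null; // nothing to send
  const size = entries.reduce((sum, e) => sum + e.length, 0);
  if (size > cfg.logMaxSize) return "size";
  if (entries.length >= cfg.logMaxCount) return "count";
  if (now - lastUploadAt >= cfg.maxTimeLog) return "time";
  return null;
}
```

Separating the decision from the actual network call means the same logic can back a localStorage cache, an IndexedDB cache, or no cache at all.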
Data Cleaning Pipeline
1. Log server (Node.js) exposes an HTTP endpoint; front‑end SDK posts JSON logs.
2. The server writes each log to a Kafka topic.
3. Flink consumes the Kafka stream, performs real‑time cleansing (e.g., filtering malformed entries, deduplication, sampling), and writes the cleaned records to ClickHouse.
4. To mitigate Kafka back‑pressure and Flink‑ClickHouse write latency, a parallel sink writes the same stream to Hive. In case of ClickHouse data loss, the hourly Hive partition can be re‑imported, providing a coarse‑grained fault‑tolerance backup.
5. Auxiliary business data are stored in PostgreSQL for joins during later analysis.
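The cleansing rules named in step 3 (filtering malformed entries, deduplication, sampling) can be illustrated outside Flink. The sketch below is a TypeScript stand-in for that logic, not a Flink job: `RawLog`, `parseLog`, `sampleBucket`, and `cleanLogs` are hypothetical names, and deduplicating by `logId` and sampling by a hash of `logId` are assumptions.

```typescript
interface RawLog {
  logId: string;
  type: string;
  timestamp: number;
}

// Return the parsed log, or null when the line is malformed.
function parseLog(line: string): RawLog | null {
  try {
    const obj = JSON.parse(line);
    if (
      obj !== null && typeof obj === "object" &&
      typeof obj.logId === "string" &&
      typeof obj.type === "string" &&
      typeof obj.timestamp === "number" && Number.isFinite(obj.timestamp)
    ) {
      return obj as RawLog;
    }
  } catch {
    // Not valid JSON: treated as malformed below.
  }
  return null;
}

// Deterministic bucket in [0, 99] so the same logId is always sampled the same way.
function sampleBucket(logId: string): number {
  let h = 0;
  for (let i = 0; i < logId.length; i++) {
    h = (h * 31 + logId.charCodeAt(i)) % 100;
  }
  return h;
}

// Filter malformed lines, drop duplicate logIds, then keep roughly samplePercent% of logs.
function cleanLogs(lines: string[], samplePercent: number = 100): RawLog[] {
  const seen = new Set<string>();
  const out: RawLog[] = [];
  for (const line of lines) {
    const log = parseLog(line);
    if (log === null) continue;         // malformed
    if (seen.has(log.logId)) continue;  // duplicate
    seen.add(log.logId);
    if (sampleBucket(log.logId) < samplePercent) out.push(log);
  }
  return out;
}
```

In a streaming job the `seen` set would need to be bounded (for example by a time window); the unbounded set here is only suitable for a batch-sized illustration.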
Aggregation & Visualization
Node services query ClickHouse using SQL‑like statements, compute derived metrics (e.g., white‑screen rate, second‑open rate), and expose the results via a REST API.
The front‑end dashboard consumes the API and renders real‑time charts, daily reports, and custom analyses such as Lighthouse performance snapshots.
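Derived metrics of this kind reduce to simple aggregations over rows returned by the query layer. The article does not define its formulas, so the sketch below rests on assumptions: "second‑open rate" is taken to mean the share of page loads whose first‑screen time is at most 1000 ms, and a nearest‑rank percentile is added as a typical companion metric. `secondOpenRate` and `percentile` are hypothetical names.

```typescript
// Share of samples at or under the threshold (assumed definition of second-open rate).
function secondOpenRate(firstScreenMs: number[], thresholdMs: number = 1000): number {
  if (firstScreenMs.length === 0) return 0;
  const fast = firstScreenMs.filter((t) => t <= thresholdMs).length;
  return fast / firstScreenMs.length;
}

// Nearest-rank percentile: the smallest sample such that at least p% of
// samples are less than or equal to it. Expects 0 < p <= 100.
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) return NaN;
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(rank, 1) - 1];
}
```

In practice such aggregations are often pushed down into the database query; computing them in the Node service, as sketched here, is simpler to reason about but moves more rows over the wire.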
Optimization Practices
Pre‑rendering
During the build, a headless browser (e.g., Puppeteer) loads each target page, captures the fully rendered HTML, and uploads the static HTML to OSS. The static HTML is served directly to the client, so meaningful content can be painted before the page's JavaScript has downloaded and run, which shortens the white‑screen and first‑screen phases. In the reported case, white‑screen time dropped from >1.6 s to ~0.76 s without adding backend load.
Offline packages (APP scenario)
For H5 pages embedded in native apps, an offline bundle containing all static assets is generated. At runtime the client checks whether a matching bundle exists; if so, it loads the assets from local storage, and if not, it fetches resources from the network and caches them locally. The build pipeline integrates pre‑rendered HTML and the offline assets into a single deliverable, enabling instant page display on subsequent launches.
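The runtime lookup for offline bundles can be sketched as a small resolver. The manifest format, the version check, and the names `OfflineManifest`, `ResolvedResource`, and `resolveResource` are all assumptions for illustration; the article does not describe how bundles are matched.

```typescript
// Hypothetical manifest: maps a resource URL to a file path inside the bundle.
interface OfflineManifest {
  version: string;
  files: Record<string, string>;
}

type ResolvedResource =
  | { source: "offline"; path: string }
  | { source: "network"; url: string };

// Prefer the local bundle when it matches the expected version and contains
// the requested URL; otherwise fall back to the network.
function resolveResource(
  url: string,
  manifest: OfflineManifest | null,
  expectedVersion: string,
): ResolvedResource {
  if (manifest !== null && manifest.version === expectedVersion) {
    if (Object.prototype.hasOwnProperty.call(manifest.files, url)) {
      return { source: "offline", path: manifest.files[url] };
    }
  }
  return { source: "network", url };
}
```

A caller that receives a `network` result would fetch the resource and then store it locally, so the next launch can resolve it offline.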
Performance Analysis
Real‑time performance charts display stage‑level timings (DNS lookup, TCP handshake, HTML download, parsing, script execution, rendering). Engineers focus on the longest‑lasting stages and apply targeted optimizations such as:
DNS pre‑fetching to resolve hostnames ahead of time and cut DNS lookup latency.
CDN + OSS for static resource delivery.
Webpack bundle splitting and compression to shrink download size.
Pre‑rendering/SSR to shorten white‑screen and first‑screen times.
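The stage-level breakdown described above can be approximated from Navigation Timing fields. This sketch assumes the input mirrors a subset of `PerformanceNavigationTiming`; the stage names and boundaries are one plausible mapping, not necessarily the one used by the dashboard, and `stageDurations` and `slowestStage` are hypothetical names.

```typescript
// Subset of PerformanceNavigationTiming fields, in ms since navigation start.
interface StageTimingInput {
  domainLookupStart: number;
  domainLookupEnd: number;
  connectStart: number;
  connectEnd: number;
  requestStart: number;
  responseStart: number;
  responseEnd: number;
  domInteractive: number;
  loadEventEnd: number;
}

// Split a navigation into consecutive stages (durations in ms).
function stageDurations(t: StageTimingInput): Record<string, number> {
  return {
    dns: t.domainLookupEnd - t.domainLookupStart,
    tcp: t.connectEnd - t.connectStart,
    ttfb: t.responseStart - t.requestStart,       // server wait until first byte
    htmlDownload: t.responseEnd - t.responseStart,
    domParsing: t.domInteractive - t.responseEnd, // includes blocking scripts
    loadRemaining: t.loadEventEnd - t.domInteractive,
  };
}

// Name of the longest stage: the first place to look for an optimization.
function slowestStage(stages: Record<string, number>): string | null {
  let worst: string | null = null;
  for (const stage of Object.keys(stages)) {
    if (worst === null || stages[stage] > stages[worst]) worst = stage;
  }
  return worst;
}
```

For example, a navigation dominated by `dns` or `tcp` points toward prefetching or a CDN, while one dominated by `htmlDownload` or `domParsing` points toward bundle splitting, compression, or pre‑rendering.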
Key Takeaways
A unified monitoring system that collects, cleans, and visualizes front‑end performance data is essential for data‑driven optimization.
Modular SDK design (acquisition → assembly → cache → reporting) allows flexible adaptation to different business scenarios.
Real‑time stream processing (Kafka → Flink → ClickHouse) provides low‑latency metrics while Hive serves as a backup for fault tolerance.
High‑impact levers include concurrency, caching, resource compression, and workflow‑level improvements such as pre‑rendering and offline bundles.
Metrics‑driven bottleneck identification enables precise, measurable performance gains without unnecessary code‑level micro‑optimizations.
dbaplus Community
Enterprise-level professional community for Database, BigData, and AIOps. Daily original articles, weekly online tech talks, monthly offline salons, and quarterly XCOPS&DAMS conferences—delivered by industry experts.
