How to Build a Real‑Time Page Performance Monitoring System
This article explains why page performance monitoring is crucial for user experience and SEO, then walks through the design of a complete end‑to‑end monitoring system in three parts: front‑end data reporting via the Navigation Timing API, server‑side log collection and storage with Nginx (including sampling and aggregation), and visual dashboards.
Background
Why monitor page performance? Poor performance hurts revenue because users may abandon a slow page, especially on mobile where tolerance for latency is low. Slow loading also harms SEO; high bounce rates lead Google to lower rankings. Since performance degrades over iterations, a continuous monitoring system is needed to evaluate, alert, and guide optimization.
Existing tools like GTmetrix provide static analysis but cannot reflect real‑world user conditions, regional speeds, or functional timings such as time to first click or ad display. Therefore we embed JavaScript on pages to collect real user data, report it to a server, aggregate, process, and visualize it.
Design of the Monitoring System
The system consists of three parts:
Front‑end reporting: how to record timing points, how to report the data, and how to sample it
Data processing and storage
Data presentation
Front‑end Reporting
We inject a JavaScript snippet to capture performance metrics that reflect user experience, such as white‑screen time, first‑screen time, and time to interactive.
Determining the Start Point
The start point is when the user presses Enter after entering the URL. Modern browsers provide the Navigation Timing API to obtain this timestamp.
In Chrome, open the console and inspect `performance.timing` to see a list of timestamps measured in milliseconds since the Unix epoch. A zero value indicates the event did not occur.
The `navigationStart` property marks the moment the browser begins the request (i.e., the user hits Enter or refreshes the page).
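As a sketch, the common latency metrics can be derived from these timestamps by subtracting `navigationStart`; the helper below runs on any object shaped like `performance.timing`:

```javascript
// Derive common latency metrics from a Navigation Timing object.
// Timestamps are milliseconds since the Unix epoch; each delta is
// measured from navigationStart (the user hitting Enter / refresh).
function computeMetrics(t) {
  return {
    dns: t.domainLookupEnd - t.domainLookupStart,
    tcp: t.connectEnd - t.connectStart,
    firstByte: t.responseStart - t.navigationStart,
    domReady: t.domContentLoadedEventEnd - t.navigationStart,
    load: t.loadEventEnd - t.navigationStart,
  };
}

// In a browser: computeMetrics(performance.timing)
```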
White‑Screen Time
White‑screen time is the interval until the first visual element appears. It is not simply the time to first byte because the page may still be blank while header resources load.
Three scenarios are considered:
Static pages without JavaScript rendering: white‑screen ends after header resources load. A script placed at the end of the `<head>` can log the time.
Pages built with frameworks like Vue or React: rendering occurs after JavaScript execution or asynchronous data fetching, so white‑screen ends after the loading indicator disappears.
<code><!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<meta http-equiv="X-UA-Compatible" content="ie=edge">
<!-- Header resources -->
<link rel="stylesheet" href="style.css">
<title>Document</title>
<script>
// Record white‑screen end time
var time = +new Date() - performance.timing.navigationStart;
</script>
</head>
<body>
</body>
</html></code>
First‑Screen Time
First‑screen time is the moment when all resources required for the initial viewport are fully rendered. For image‑heavy pages, it is after the last image loads; for data‑driven pages, it is after data insertion.
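One way to approximate this for image‑heavy pages is to record the load time of every image in the initial viewport and take the latest one. A minimal sketch (the tracker name and wiring are illustrative; in a browser you would call `recordImageLoad(Date.now())` from each first‑screen image's onload handler):

```javascript
// Track first-screen time: the first screen is considered rendered
// once the last image inside the initial viewport has loaded.
function createFirstScreenTracker(navigationStart) {
  var lastImageLoad = navigationStart;
  return {
    // Call from each first-screen image's onload handler.
    recordImageLoad: function (now) {
      if (now > lastImageLoad) lastImageLoad = now;
    },
    // Latest first-screen image load, relative to navigationStart.
    firstScreenTime: function () {
      return lastImageLoad - navigationStart;
    },
  };
}
```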
Reporting Method
After measuring timestamps, the data must be sent to the backend with minimal impact on the page. An `<img>` tag with a GET request is used because it avoids CORS issues and works across browsers.
Its advantages: no AJAX cross‑origin problems (it can request different origins), and universal support even in old browsers.
<code>var i = new Image();
i.onload = i.onerror = i.onabort = function () {
i = i.onload = i.onerror = i.onabort = null;
};
i.src = url;</code>
Modern browsers also support `navigator.sendBeacon`, which sends small data asynchronously and even works when the page is closed. The final strategy prefers `sendBeacon` when available, otherwise falls back to the `<img>` method.
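Putting the two transports together, a minimal sketch of a reporting function (the injectable `nav` and `ImageCtor` parameters are only there to keep the sketch testable outside a browser):

```javascript
// Report metrics, preferring navigator.sendBeacon and falling back to
// an <img> GET request. nav and ImageCtor default to the browser
// globals and can be injected for testing.
function report(url, data, nav, ImageCtor) {
  if (nav === undefined) nav = typeof navigator !== 'undefined' ? navigator : null;
  if (ImageCtor === undefined) ImageCtor = typeof Image !== 'undefined' ? Image : null;
  var query = Object.keys(data)
    .map(function (k) { return encodeURIComponent(k) + '=' + encodeURIComponent(data[k]); })
    .join('&');
  if (nav && nav.sendBeacon) {
    nav.sendBeacon(url, query);
  } else if (ImageCtor) {
    var img = new ImageCtor();
    img.onload = img.onerror = img.onabort = function () {
      img = img.onload = img.onerror = img.onabort = null; // release references
    };
    img.src = url + '?' + query;
  }
  return query; // returned so callers (and tests) can inspect what was sent
}
```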
<code>navigator.sendBeacon(url, data ? $.param(data) : null);</code>
Sampling
Because the volume of reported data is huge, sampling is applied on the client side. The sampling rate is indicated by a `rate` parameter (e.g., `rate=10` for 1/10 sampling).
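A sketch of the client‑side sampling decision, assuming `rate=10` means roughly one in ten page views reports:

```javascript
// Decide whether this page view should report its data, given a
// sampling rate of 1/rate (rate = 10 keeps roughly 10% of views).
// The optional `random` argument replaces Math.random() in tests.
function shouldSample(rate, random) {
  var r = random === undefined ? Math.random() : random;
  return Math.floor(r * rate) === 0;
}
```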
Data Collection and Storage
An Nginx server records each reporting request in logs, capturing request headers, IP, parameters, etc. Logs are rotated every five minutes using a custom configuration rather than the daily `logrotate` interval.
<code>if ($time_iso8601 ~ "^(\d{4})-(\d{2})-(\d{2})T(\d{2}):(\d{1})[0-4]") {
set $logname $1-$2-$3-$4-$50;
}
if ($time_iso8601 ~ "^(\d{4})-(\d{2})-(\d{2})T(\d{2}):(\d{1})[5-9]") {
set $logname $1-$2-$3-$4-$55;
}
access_log logs/stat.y.qq.com.sp.access.$logname.log spdata;
log_format spdata '$time_local ~|^ $http_x_forwarded_for ~|^ $request ~|^ $http_referer ~|^ $status ~|^ $http_user_agent ~|^ $cookie_ptisp ~|^ $cookie_uin';</code>The log fields include timestamp (5‑minute bucket), IP, reported data, product ID, project ID, page ID, measurement points, sampling rate, referer, parsed user‑agent info, ISP, and user ID.
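On the ingestion side, each line can be split back into named fields using the ` ~|^ ` separator from the log_format above; a sketch (the record field names are illustrative):

```javascript
// Split one access-log line into named fields using the " ~|^ "
// separator defined in the spdata log_format.
function parseLogLine(line) {
  var fields = line.split(' ~|^ ');
  return {
    time: fields[0],
    ip: fields[1],
    request: fields[2],
    referer: fields[3],
    status: fields[4],
    userAgent: fields[5],
    isp: fields[6],
    uin: fields[7],
  };
}
```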
Data Ingestion
To reduce server load, reporting machines and ingestion servers are separated. The ingestion server periodically pulls log files from the reporting machines for processing.
Database Design
Given billions of daily page views, data is partitioned by date into separate tables. Three tables are used:
Statistics table: stores 5‑minute average latency per page.
Raw data table: holds the original records.
Index table: provides fast lookup into the raw data.
The statistics table enables quick queries for trends, while the raw and index tables support complex multi‑dimensional queries (e.g., by country, ISP, network type). Each raw table is kept under ten million rows to maintain MySQL performance.
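The statistics table's 5‑minute averages can be sketched as a simple aggregation over raw records (the `timestamp`/`latency` field names are illustrative):

```javascript
// Aggregate raw latency records into 5-minute buckets, producing the
// average latency per bucket (what the statistics table stores).
function aggregateByBucket(records) {
  var sums = {};
  records.forEach(function (r) {
    var bucket = Math.floor(r.timestamp / 300000) * 300000; // 5 min in ms
    if (!sums[bucket]) sums[bucket] = { total: 0, count: 0 };
    sums[bucket].total += r.latency;
    sums[bucket].count += 1;
  });
  var result = {};
  Object.keys(sums).forEach(function (b) {
    result[b] = sums[b].total / sums[b].count;
  });
  return result;
}
```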
Threshold Alerts
If a data interface becomes slow, the system triggers an alert when the 5‑minute average exceeds a configurable threshold (default 10 seconds), notifying developers to investigate.
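A sketch of the threshold check, with the notifier injected so the logic stays testable; the 10‑second default comes from the text above:

```javascript
// Trigger an alert when a 5-minute average exceeds the threshold
// (default 10 seconds). notify receives the alert message.
function checkThreshold(avgMs, notify, thresholdMs) {
  var limit = thresholdMs === undefined ? 10000 : thresholdMs;
  if (avgMs > limit) {
    notify('5-min average ' + avgMs + 'ms exceeds ' + limit + 'ms');
    return true;
  }
  return false;
}
```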
Data Presentation
The UI shows a bar chart of all monitoring points for a page, daily trends for a single point, and multi‑dimensional analysis tables.
Overall Page Overview
The chart helps developers quickly locate bottlenecks.
Detail of a Monitoring Point
Average latency
Request count
Slow‑user proportion
Latency distribution
Additional dimensions for analysis include country, province, ISP, network type, and operating system.
Abnormal Data Handling
Outliers (e.g., a single report taking >30 minutes) can distort averages. Points exceeding 10 minutes are filtered out to keep charts reliable.
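A sketch of the outlier filter, assuming latencies in milliseconds and the 10‑minute cutoff described above:

```javascript
// Drop outliers before averaging: any point slower than 10 minutes is
// treated as abnormal (backgrounded tab, broken clock, etc.).
var OUTLIER_MS = 10 * 60 * 1000;

function filteredAverage(latencies) {
  var kept = latencies.filter(function (ms) { return ms <= OUTLIER_MS; });
  if (kept.length === 0) return 0;
  return kept.reduce(function (a, b) { return a + b; }, 0) / kept.length;
}
```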
Conclusion
We described a three‑layer monitoring system covering front‑end reporting, data collection and storage, and visualization. Continuous performance monitoring is essential for delivering a smooth user experience.
References
https://fex.baidu.com/blog/2014/05/build-performance-monitor-in-7-days/
https://www.qcloud.com/community/article/655542
http://javascript.ruanyifeng.com/bom/performance.html
QQ Music Frontend Team