How to Build a Front‑End User Behavior Tracing System for Faster Issue Diagnosis
This article explains the design and implementation of a front‑end user behavior tracing system, covering common external network problems, the importance of collecting runtime environment, data, JS errors, and interaction logs, and detailing SDK data collection, reporting strategies, server processing, and query platform visualization.
Current Situation Analysis
When diagnosing external network issues, the hardest cases are those that cannot be reproduced or that appear only intermittently. Without packet captures, breakpoints, or logs from the user's device, we have to work from screenshots and brief user descriptions, guessing and ruling out causes one by one, and often end up with a generic suggestion to clear the cache or reinstall the app.
Diagnosis is slow because clues are scarce and because users, who usually lack technical background, may omit details or give misleading information.
Common Causes of External Network Issues
The backend returns abnormal data or empty fields.
Pages lack proper fault‑tolerance for edge cases, causing errors.
User network environment or app version problems.
Missing parameters when navigating from a previous entry point.
Issues triggered by specific user operation steps.
Although we have script exception monitoring (e.g., Sentry), many user‑reported external issues are not caused by script exceptions and therefore cannot trigger automatic reports. A secondary reporting mechanism is needed for these scenarios.
Importance of User Behavior Trace
Collecting the following data greatly improves the ability to locate external issues:
Page runtime environment.
Data loaded by the page.
Page JS error information.
User operation logs (timeline).
By linking these data points with timestamps, we can create a clear timeline similar to a crime‑scene video, making analysis and problem localization much easier.
Design Overview
What to Report: Content and Protocol
Each page visit is treated as a basic query unit. For a user who visits page A three times, three records are stored, each containing multiple child records that share common base information.
const log = {
  baseInfo: {},
  childLogs: [{...}, {...}, ...]
};
Base Information
The baseInfo field records the page's runtime environment, such as browser, OS, and other contextual data.
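As a rough illustration of what such a baseInfo object might hold, the sketch below reads a few standard browser properties. The field names (Furl, FuserAgent, and so on) and the helper name collectBaseInfo are assumptions made for this example; the article does not list the actual schema.

```javascript
// Sketch only: builds a baseInfo object for one page visit.
// All field names here are hypothetical, not the article's real schema.
function collectBaseInfo(win) {
  const nav = win.navigator || {};
  const screen = win.screen || {};
  const loc = win.location || {};
  return {
    Furl: loc.href || '',
    Freferrer: (win.document && win.document.referrer) || '',
    FuserAgent: nav.userAgent || '',
    Flanguage: nav.language || '',
    Fonline: nav.onLine !== false,
    Fscreen: `${screen.width || 0}x${screen.height || 0}`,
    FenterTime: Date.now(),
  };
}

// In a browser:
// const log = { baseInfo: collectBaseInfo(window), childLogs: [] };
```

Passing the window object in as a parameter keeps the function easy to exercise outside a browser.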
Child Record Types
Type 1: AJAX Communication
Records all AJAX requests to help determine whether backend data is the root cause of an issue.
Type 2: User Interaction
Records click events and DOM attributes associated with user actions.
Type 3: Error Reporting
Records JavaScript errors and manually thrown exceptions.
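One plausible way to collect Type 3 records is to listen for the browser's standard error and unhandledrejection events. The sketch below assumes a report callback supplied by the SDK and uses hypothetical field names; it is not the article's actual implementation.

```javascript
// Sketch only: turns uncaught errors and unhandled promise rejections
// into child records. `report` and the field names are assumptions.
function installErrorCapture(target, report) {
  target.addEventListener('error', (event) => {
    report({
      Ftype: 'error',
      Fmessage: event.message || String(event.error || 'unknown error'),
      Fsource: event.filename || '',
      Fline: event.lineno || 0,
      Fcolumn: event.colno || 0,
      Fstack: (event.error && event.error.stack) || '',
      FtimeStamp: Date.now(),
    });
  });

  target.addEventListener('unhandledrejection', (event) => {
    const reason = event.reason;
    report({
      Ftype: 'unhandledrejection',
      Fmessage: reason instanceof Error ? reason.message : String(reason),
      Fstack: (reason instanceof Error && reason.stack) || '',
      FtimeStamp: Date.now(),
    });
  });
}

// In a browser:
// installErrorCapture(window, (record) => log.childLogs.push(record));
```

Manually thrown exceptions that are caught by application code never reach these events, so they would need an explicit call to the same report callback.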
How to Report: SDK Data Collection and Reporting Strategy
Data is collected by loading a JavaScript SDK on the page. Collection is performed only for logged‑in users; unauthenticated pages are ignored.
Data Collection Methods
A unique FtraceId (UUID) is generated when the user enters a page and is shared by all subsequent child records.
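A minimal sketch of generating the FtraceId and stamping it onto every child record might look like the following. The helper names createTraceId and createPageTrace are hypothetical, and the Math.random fallback is an assumption for environments where crypto.randomUUID is unavailable (it requires a secure context in browsers).

```javascript
// Sketch only: one trace ID per page visit, shared by all child records.
function createTraceId() {
  const c = globalThis.crypto;
  if (c && typeof c.randomUUID === 'function') {
    return c.randomUUID();
  }
  // Fallback: RFC 4122 version-4 layout built from Math.random (not cryptographically strong).
  return 'xxxxxxxx-xxxx-4xxx-yxxx-xxxxxxxxxxxx'.replace(/[xy]/g, (ch) => {
    const r = Math.floor(Math.random() * 16);
    const v = ch === 'x' ? r : (r & 0x3) | 0x8;
    return v.toString(16);
  });
}

function createPageTrace() {
  const FtraceId = createTraceId();
  return {
    FtraceId,
    // Every child record carries the same FtraceId plus its own timestamp.
    makeChild(type, payload) {
      return { FtraceId, Ftype: type, FtimeStamp: Date.now(), ...payload };
    },
  };
}

// Usage: call once when the page loads, then create child records from it.
// const trace = createPageTrace();
// const record = trace.makeChild('click', { FdomPath: 'body > button' });
```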
AJAX Hook
hookAjax({
  open: this.handleOpen,
  onreadystatechange: this.handleStage
});
During the open phase we capture request time, method, and parameters (excluding our own reporting requests). In the send phase we capture POST bodies. The readyStateChange phase records response time, HTTP status, and response data, attaching all collected fields to the current xhr object.
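The three phases described above can be sketched by patching XMLHttpRequest.prototype directly. This is not the hookAjax helper used in the snippet, whose internals the article does not show; hookXhr, onRecord, isOwnReport, and the field names are all assumptions made for illustration.

```javascript
// Sketch only: records one entry per completed XHR.
// `XhrClass` is normally window.XMLHttpRequest; it is a parameter so the
// function can be exercised with a stand-in class.
function hookXhr(XhrClass, onRecord, isOwnReport = () => false) {
  const proto = XhrClass.prototype;
  const rawOpen = proto.open;
  const rawSend = proto.send;

  // open phase: remember method, URL, and request time on the xhr object.
  proto.open = function (method, url, ...rest) {
    this._trace = {
      Fmethod: String(method).toUpperCase(),
      Furl: String(url),
      FrequestTime: Date.now(),
    };
    return rawOpen.call(this, method, url, ...rest);
  };

  // send phase: capture the body, then wait for the response.
  proto.send = function (body) {
    const trace = this._trace;
    if (trace && !isOwnReport(trace.Furl)) {
      trace.Fbody = typeof body === 'string' ? body : null;
      this.addEventListener('readystatechange', () => {
        // Ignore intermediate states and listeners left over from a reused xhr.
        if (this.readyState !== 4 || this._trace !== trace) return;
        // responseText throws unless responseType is '' or 'text'.
        const textual = this.responseType === '' || this.responseType === 'text';
        onRecord({
          ...trace,
          FresponseTime: Date.now(),
          Fstatus: this.status,
          Fresponse: textual ? this.responseText : null,
        });
      });
    }
    return rawSend.call(this, body);
  };

  // Returns a function that restores the original methods.
  return function unhook() {
    proto.open = rawOpen;
    proto.send = rawSend;
  };
}

// In a browser:
// hookXhr(XMLHttpRequest, (r) => childLogs.push(r), (url) => url.includes('/trace/'));
```

Skipping the SDK's own reporting URL matters here: without that check, every upload would generate a new record and trigger another upload.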
User Interaction Tracking
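$(document).on('click', '.js_qm_trace', e => {
  const target = e.currentTarget;
  const FtimeStamp = getNowDate();
  // jQuery does not copy the non-standard `path` property onto its event object,
  // so read the propagation path from the native event instead.
  const nativeEvent = e.originalEvent || e;
  const path = nativeEvent.composedPath ? nativeEvent.composedPath() : nativeEvent.path;
  const FdomPath = _getDomPath(path);
  let Fattr = null, FtraceContent = null;
  if (target.hasAttributes()) {
    const processed = _processAttrMap(target.attributes);
    Fattr = processed.Fattr;
    FtraceContent = processed.FtraceContent;
  }
  // ...report action...
});
Reporting Strategy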
Collected data is first cached locally using IndexedDB (large capacity, asynchronous, supports custom indexes). When the user and page URL are on a whitelist, cached data is uploaded; otherwise, data is uploaded on demand. Errors bypass the cache and are reported immediately.
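The decision logic in this strategy can be sketched independently of IndexedDB. In the example below the store is an in-memory stand-in with the same asynchronous shape; createReporter, createMemoryStore, the send callback, and the isWhitelisted check are hypothetical names, and a real SDK would back the store with IndexedDB.

```javascript
// Sketch only: cache-first reporting with an immediate path for errors.
function createReporter({ store, send, isWhitelisted }) {
  // Upload everything currently cached (the "on demand" path).
  async function flush() {
    const batch = await store.takeAll();
    if (batch.length > 0) {
      await send(batch);
    }
  }

  async function record(entry) {
    if (entry.Ftype === 'error') {
      // Errors bypass the cache and are reported immediately.
      await send([entry]);
      return;
    }
    await store.add(entry);
    if (isWhitelisted()) {
      // Whitelisted user and page: upload cached data right away.
      await flush();
    }
  }

  return { record, flush };
}

// In-memory stand-in for an IndexedDB-backed store (assumption for this sketch).
function createMemoryStore() {
  let items = [];
  return {
    async add(entry) { items.push(entry); },
    async takeAll() { const out = items; items = []; return out; },
  };
}
```

For users who are not on the whitelist, records stay in the local store until something calls flush, for example after support staff ask the user to submit their logs.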
Server‑Side Data Processing
Data is posted to an Nginx server, which logs the request body using a custom log format.
http {
    # Log only the raw request body, one entry per line.
    log_format trace '$request_body';

    server {
        location /trace/ {
            # $request_body is only available when the body is read into a
            # memory buffer, hence the large buffer size.
            client_body_buffer_size 1000m;
            client_max_body_size 1000m;
            proxy_pass http://127.0.0.1:6699/env/;
            access_log /data/qmtrace/log/access.log trace;
        }
    }

    server {
        listen 6699;
        location /env/ {
            client_max_body_size 1000m;
            alias /data/qmtrace/;
        }
    }
}
A cron job runs every five minutes to rotate the access log, rename it with a timestamp, and signal Nginx to reopen the log file. A Node.js script then parses the rotated logs and stores the records in a database.
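The parsing step might look roughly like the sketch below. With the default log_format escaping, Nginx writes double quotes, backslashes, and bytes outside printable ASCII as \xXX sequences, so each line has to be decoded before it can be parsed. The sketch assumes each request body is a single JSON document; the function names are hypothetical, and the database write is left out because the article does not describe it.

```javascript
// Sketch only: decode Nginx's \xXX escapes back into the original UTF-8 text.
function decodeNginxEscapes(line) {
  const bytes = [];
  for (let i = 0; i < line.length; i++) {
    const hex = line.slice(i + 2, i + 4);
    if (line[i] === '\\' && line[i + 1] === 'x' && /^[0-9A-Fa-f]{2}$/.test(hex)) {
      bytes.push(parseInt(hex, 16));
      i += 3; // skip the rest of the \xXX sequence
    } else {
      // Unescaped characters in the log are plain ASCII.
      bytes.push(line.charCodeAt(i));
    }
  }
  return new TextDecoder('utf-8').decode(Uint8Array.from(bytes));
}

// Parse the text of one rotated log file into records.
function parseTraceLog(text) {
  const records = [];
  let skipped = 0;
  for (const line of text.split('\n')) {
    // Nginx logs "-" when the variable is empty.
    if (line === '' || line === '-') continue;
    try {
      records.push(JSON.parse(decodeNginxEscapes(line)));
    } catch (err) {
      skipped += 1; // malformed or truncated body
    }
  }
  return { records, skipped };
}
```

In a Node.js script the text would come from reading the rotated file, and records would then be written to the database in batches.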
Data Presentation
The internal query platform allows searching by user UIN and page URL. Results are displayed as a list of trace IDs on the left and a detailed timeline view on the right, showing the sequence of user actions during a single page visit.
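Before rendering, the query platform needs to group stored records by trace ID and order each group by time. A minimal sketch, assuming every record carries an FtraceId and a numeric FtimeStamp in milliseconds (the article does not specify the timestamp format), could be:

```javascript
// Sketch only: group child records by trace ID and sort each group by time.
function buildTimelines(records) {
  const byTrace = new Map();
  for (const record of records) {
    if (!byTrace.has(record.FtraceId)) {
      byTrace.set(record.FtraceId, []);
    }
    byTrace.get(record.FtraceId).push(record);
  }
  // One entry per page visit: the left-hand list shows FtraceId,
  // the right-hand panel shows the sorted events.
  return [...byTrace.entries()].map(([FtraceId, events]) => ({
    FtraceId,
    events: events.slice().sort((a, b) => a.FtimeStamp - b.FtimeStamp),
  }));
}
```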
Conclusion
We described what to report (content and protocol), how to report (SDK collection and reporting strategy), server‑side processing, and data visualization, building an initial user behavior tracing system that significantly improves efficiency in handling external network issues. The system is extensible and can be refined further.
