Ensuring Frontend System Stability through Monitoring and Automated Inspection
This article explains how modern front‑end teams ensure system stability and high‑quality operation by implementing comprehensive monitoring and automated inspection, covering background, significance, architecture, real‑time and scheduled checks, performance metrics, alert strategies, error handling, custom reporting, and future improvement plans.
Modern front‑end applications serve complex scenarios, and user experience is critical, so keeping them stable and sustainable is a key challenge.
Background
Complexity increases with SPA and PWA adoption, higher performance expectations, diverse devices and networks, and agile CI/CD practices that require rapid issue response.
Significance
Performance monitoring improves user experience.
Error tracking enables quick resolution.
User behavior analysis guides product iteration.
Business metric monitoring ensures core flow stability.
Alert systems allow rapid response to anomalies.
Monitoring Categories
Monitoring is split into two parts: real‑time monitoring (integrated with the SGM platform, covering 100+ applications with alert mechanisms) and scheduled task inspection (automated cron jobs that report results and trigger alerts).
Overall Architecture
Real‑time Monitoring
Integration with the SGM monitoring platform provides multi‑channel alerts (e.g., DingTalk, email, phone). The alert strategy balances precision against sensitivity and relies on hierarchical alert levels, regular rule optimization, clear responsibility assignment, and team training.
Web‑end performance metrics include LCP, CLS, FCP, FID, and TTFB with thresholds (e.g., LCP ≤ 2.5 s). Alerts were tuned by temporarily raising the LCP threshold to 5 s to reduce noise while still tracking performance impact.
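The threshold tuning described above can be sketched as a small check with per‑metric overrides. This is only an illustration: `findBreaches` and `DEFAULT_THRESHOLDS` are hypothetical names, and apart from the LCP value the defaults are the commonly published "good" bounds for these metrics rather than figures from the team's SGM configuration.

```javascript
// Commonly published "good" upper bounds (assumed defaults, not SGM settings).
// Times are in milliseconds; CLS is a unitless score.
const DEFAULT_THRESHOLDS = {
  LCP: 2500,
  FCP: 1800,
  FID: 100,
  TTFB: 800,
  CLS: 0.1,
};

// Hypothetical helper: returns every metric in `sample` that exceeds its threshold.
// `overrides` relaxes or tightens individual thresholds without editing the defaults.
function findBreaches(sample, overrides = {}) {
  const thresholds = { ...DEFAULT_THRESHOLDS, ...overrides };
  return Object.keys(sample)
    .filter((name) => name in thresholds && sample[name] > thresholds[name])
    .map((name) => ({ name, value: sample[name], threshold: thresholds[name] }));
}

const sample = { LCP: 3200, CLS: 0.05, FCP: 1500, FID: 40, TTFB: 600 };
console.log(findBreaches(sample));                // LCP exceeds the 2.5 s default
console.log(findBreaches(sample, { LCP: 5000 })); // nothing exceeds the relaxed 5 s limit
```

Keeping the relaxed value as an override rather than editing the default makes it easy to revert once the noisy pages have been optimised.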
Page Performance
Monitoring covers the full page‑load lifecycle, white‑screen detection, and configuration of URLs (supporting regex) to capture first‑content‑fulfilment times. Metrics such as white‑screen time help identify and optimise pages that affect first‑impression experience.
JSError Monitoring
Error keywords are configured to match console messages; thresholds are set based on QPS because appropriate degradation strategies keep pages functional despite errors. Cross‑origin “Script error” is handled by enabling CORS and adding a Vue error handler.
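As a rough sketch of keyword matching combined with a traffic‑scaled threshold, consider the following. The keyword list, the `ratio` and `minCount` parameters, and the function names are all assumptions made for illustration; the article does not describe how SGM evaluates these rules internally.

```javascript
// Hypothetical keyword list; real keywords would be configured in the monitoring console.
const ERROR_KEYWORDS = ["TypeError", "is not defined", "Script error"];

// True when the console message contains any configured keyword.
function matchesKeyword(message, keywords = ERROR_KEYWORDS) {
  return keywords.some((keyword) => message.includes(keyword));
}

// Scale the alert limit with traffic: alert only when matched errors exceed
// `ratio` of the requests seen in the window, with a floor of `minCount`
// so that low-traffic pages do not alert on a single error.
function shouldAlert(messages, qps, windowSeconds, { ratio = 0.01, minCount = 5 } = {}) {
  const matched = messages.filter((message) => matchesKeyword(message)).length;
  const limit = Math.max(minCount, Math.ceil(qps * windowSeconds * ratio));
  return matched > limit;
}
```

With these defaults, seven matching errors in a 60‑second window trigger an alert on a page serving 1 QPS but not on one serving 100 QPS, which reflects the idea that a page with a working degradation path can tolerate a small error rate.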
<script src="http://xxxdomain.com/home.js" crossorigin></script>

Vue.config.errorHandler = (err, vm, info) => {
  if (err) {
    try {
      console.error(err);
      window.__sgm__.error(err);
    } catch (e) {}
  }
};
API Request Monitoring
Alerts focus on HTTP status codes and business error codes. Data‑collection parameters are configured, and a standardized mapping of error codes is applied across services.
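One way to combine the two signals is to check the HTTP status first and the business code second. The sketch below assumes a response envelope with a `code` field and a success value of `"0"`; neither is stated in the article, and `classifyResponse` is a hypothetical helper.

```javascript
// Assumed success code; the article does not name one.
const SUCCESS_CODE = "0";

// Classify a response by HTTP status first, then by business error code.
// `body` is the parsed JSON payload (hypothetical shape: { code, data }).
function classifyResponse(httpStatus, body) {
  if (httpStatus >= 500) return { level: "alert", reason: `HTTP ${httpStatus}` };
  if (httpStatus >= 400) return { level: "warn", reason: `HTTP ${httpStatus}` };

  const code = body && body.code != null ? String(body.code) : undefined;
  if (code === undefined) return { level: "warn", reason: "missing business code" };
  if (code !== SUCCESS_CODE) return { level: "alert", reason: `business code ${code}` };
  return { level: "ok", reason: "success" };
}
```

A request that returns HTTP 200 with a non‑success business code is still classified as an alert, which is why a standardized code mapping across services matters.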
{
  "50000X": "Program exception, internal",
  "500001": "Program exception, upstream",
  "500002": "Program exception, xx",
  "...": "..."
}
Resource Errors
Monitoring includes loading failures of CSS, JS, and images. Degradation strategies are applied for image errors, and non‑essential image‑error collection can be disabled.
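A minimal sketch of how such failures can be captured in the browser follows. Resource load errors do not bubble, so the listener is registered in the capture phase. The `report` callback, the `collectImages` switch, and the `fallbackImage` option are hypothetical names used for illustration and are not part of the SGM SDK.

```javascript
// Pure helper: summarise a failed resource from the element that fired the error.
function describeResourceError(target) {
  const tag = String(target.tagName || "").toLowerCase();
  const url = target.src || target.href || "";
  const kind = { img: "image", script: "js", link: "css" }[tag] || "other";
  return { kind, url };
}

// Registers a capture-phase listener; does nothing outside a browser.
function installResourceErrorListener(report, { collectImages = true, fallbackImage } = {}) {
  if (typeof window === "undefined") return;
  window.addEventListener(
    "error",
    (event) => {
      const target = event.target;
      // Runtime JS errors target `window`; only element targets are resource failures.
      if (!target || target === window || !target.tagName) return;
      const info = describeResourceError(target);
      if (info.kind === "image") {
        // Degrade once to a placeholder; the flag prevents a retry loop.
        if (fallbackImage && !target.dataset.fallbackApplied) {
          target.dataset.fallbackApplied = "1";
          target.src = fallbackImage;
        }
        if (!collectImages) return; // non-essential image errors are not reported
      }
      report(info);
    },
    true // capture phase, because resource error events do not bubble
  );
}
```

Calling `installResourceErrorListener(console.warn, { collectImages: false })` would keep CSS and JS failures visible while silencing image noise.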
Custom Reporting
Key business nodes report detailed request/response data, user‑behavior traces, and specific failure cases such as address‑selection errors in H5 pages embedded within apps. Custom logs are used to locate and resolve issues quickly.
Mini‑Program Monitoring
Combines SGM monitoring with official mini‑program analysis tools to capture performance, JavaScript errors, and resource‑loading issues specific to mini‑programs.
Native App Monitoring
Basic monitoring is provided by mPaaS (crashes), Zhulong (startup time, first‑screen time, stutter), and SGM (network, WebView, native pages). Business monitoring is applied to login, product‑detail, and order pages, with custom SDKs reporting abnormal flows.
Scheduled Inspection
Implemented through the UI “Woodpecker” platform and custom Node.js scripts. The tool checks link validity, hover and click interactions, and reports results. Example configuration snippets are shown below.
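As a rough illustration of the link‑validity step, a Node.js script could filter the links collected from a page by a pattern and request each one. This sketch assumes `urlPattern` is a regular‑expression source string applied to collected links, which the article does not confirm. `checkLinks` is a hypothetical function, and the global `fetch` default assumes Node.js 18 or later.

```javascript
// Check every link that matches `urlPattern` and record whether it responds successfully.
// `fetchImpl` is injectable so the check can be exercised without network access.
async function checkLinks(links, urlPattern, fetchImpl = globalThis.fetch) {
  const pattern = new RegExp(urlPattern);
  const targets = links.filter((link) => pattern.test(link));
  const results = [];
  for (const url of targets) {
    try {
      const response = await fetchImpl(url, { method: "GET", redirect: "follow" });
      results.push({ url, ok: response.ok, status: response.status });
    } catch (err) {
      // Network-level failure (DNS, timeout, refused connection).
      results.push({ url, ok: false, status: 0, error: String(err && err.message) });
    }
  }
  return results;
}
```

A scheduled job could call `checkLinks` for each inspected page and raise an alert when any entry has `ok: false`.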
{
  "cookieThor": "",
  "urlPattern": "pro\\.jd\\.com",
  "urls": ["https://b.jd.com/s?entry=newuser"]
}

{
  "cookieThor": "",
  "urlPattern": "///pro.jd.com/",
  "urls": [{
    "url": "https://b.jd.com/s?entry=newuser",
    "clickElements": [{
      "item": ".recommendation-product-wrapper .jdb-sku-wrapper"
    }]
  }]
}
Problems Discovered
JS errors in closed environments, resource‑loading failures, and business‑flow exceptions were identified. Mitigations include try‑catch wrappers, queue mechanisms for deferred reporting, and refined alert thresholds.
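The try‑catch wrapper and the deferred‑reporting queue can be combined in one small helper, sketched below. It assumes the SDK object exposes an `error` method, as `window.__sgm__.error` does in the handler shown earlier. `createDeferredReporter` and `maxQueue` are hypothetical names.

```javascript
// Buffers reports until the SDK is available, then flushes them in order.
function createDeferredReporter(getSdk, maxQueue = 100) {
  const queue = [];

  function flush() {
    const sdk = getSdk();
    if (!sdk) return false; // SDK not loaded yet; keep entries queued
    while (queue.length > 0) {
      const err = queue.shift();
      try {
        sdk.error(err);
      } catch (e) {
        // Reporting must never break the page, so a failed entry is dropped.
      }
    }
    return true;
  }

  function report(err) {
    if (queue.length >= maxQueue) queue.shift(); // bound memory by dropping the oldest entry
    queue.push(err);
    flush();
  }

  return { report, flush, pending: () => queue.length };
}

// In a browser, the SDK is looked up lazily on each flush.
const reporter = createDeferredReporter(() =>
  typeof window !== "undefined" ? window.__sgm__ : undefined
);
```

Errors raised before the SDK has loaded, for example in a closed environment where the script arrives late, stay in the queue and are sent on the next successful `flush`.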
Summary
Before monitoring, issues were discovered reactively via user feedback; after integration, proactive detection, performance optimisation, error‑rate reduction, and stable releases were achieved.
Future Planning
Goals: bring more than 90 % of applications above a 90‑point performance score, deepen custom exception reporting (e.g., button‑visibility errors), and upgrade inspection tools for broader coverage and smarter automation.
JD Retail Technology
Official platform of JD Retail Technology, delivering insightful R&D news and a deep look into the lives and work of technologists.