
Real-time Monitoring of H5 Pages Using Headless Browser and Puppeteer

This article describes a real‑time monitoring solution for large numbers of H5 pages that combines Python's Requests library for data crawling with a headless Chrome browser driven by Puppeteer to detect resource errors, API failures, and DOM anomalies, automatically alerting stakeholders.

转转QA

Background : The platform hosts many H5 pages published via a backend system, but occasional 404 errors, missing styles, or API failures are not captured by existing alarms, leading to poor user experience and potential churn.

Technical Introduction : The solution relies on two main technologies: the Python Requests library for crawling activity links and the Node.js Puppeteer library to launch a headless Chrome browser for monitoring.

Project Overview : The workflow includes crawling activity URLs, storing them, creating scheduled tasks, binding URLs to tasks, triggering monitoring every five minutes, and sending alerts when anomalies are detected.

Monitoring Types :

Real‑time status‑code monitoring of every resource requested by a page; errors (4xx/5xx) trigger alerts.

API response monitoring; a non‑zero result code returned by an API generates an alert.

DOM analysis; empty page titles cause alerts.
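
The three checks above can be read as simple rules over what the headless browser observes. The Python sketch below only illustrates those rules; the original system evaluates them inside Puppeteer. The function names and the JSON `code` field are assumptions made for illustration.

```python
from typing import Optional


def check_resource(url: str, status: int) -> Optional[str]:
    """Rule 1: any 4xx/5xx status on a requested resource is an anomaly."""
    if 400 <= status <= 599:
        return f"resource error {status}: {url}"
    return None


def check_api(url: str, body: dict) -> Optional[str]:
    """Rule 2: a non-zero result code in the API response is an anomaly.

    Assumes the API returns JSON with a numeric `code` field (hypothetical
    name). A missing field is treated as success in this sketch.
    """
    code = body.get("code", 0)
    if code != 0:
        return f"API error code {code}: {url}"
    return None


def check_title(url: str, title: Optional[str]) -> Optional[str]:
    """Rule 3: an empty or missing page title is an anomaly."""
    if not (title or "").strip():
        return f"empty page title: {url}"
    return None
```

Each function returns a message when its rule fires and `None` otherwise, so the caller can collect all non-`None` results for one page into a single alert.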

Implementation Process :

1. Data Source Acquisition : Use Requests to call the backend API, filter active activity links, extract needed fields, and insert them into the database.
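
A minimal sketch of this step, assuming the backend API returns JSON with a `data` list whose items carry `id`, `name`, `url`, and `status` fields. These field names and the `"active"` marker are hypothetical, since the article does not describe the API.

```python
from typing import Any, Dict, List


def fetch_activities(api_url: str) -> List[Dict[str, Any]]:
    """Call the backend API and return its activity list.

    `api_url` and the `data` key are assumptions; adjust to the real API.
    """
    import requests  # third-party; imported here so the filter below runs without it

    resp = requests.get(api_url, timeout=10)
    resp.raise_for_status()  # fail loudly on 4xx/5xx from the backend itself
    return resp.json().get("data", [])


def extract_active_links(activities: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Keep active activities and project only the fields to be stored."""
    rows = []
    for item in activities:
        if item.get("status") != "active":  # assumed "active" marker
            continue
        url = item.get("url")
        if not url:  # an activity without a link cannot be monitored
            continue
        rows.append({
            "activity_id": item.get("id"),
            "name": item.get("name", ""),
            "url": url,
        })
    return rows
```

Keeping the filter separate from the HTTP call makes it possible to check the filtering logic against a saved API response without touching the network.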

2. Task Creation : After obtaining the data source, automatically clear previous auto‑crawled link tables, insert new links, create tasks, associate activity URLs, set start time to now, end time to 24 hours later, mark status as running, and record task identifiers.
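
The task-creation step might look like the following sketch. It uses SQLite from the Python standard library purely for illustration; the article does not name the database, and the table and column names (`auto_links`, `tasks`, `url`, and so on) are assumptions.

```python
import sqlite3
from datetime import datetime, timedelta

SCHEMA = """
CREATE TABLE IF NOT EXISTS auto_links (
    id INTEGER PRIMARY KEY,
    url TEXT NOT NULL
);
CREATE TABLE IF NOT EXISTS tasks (
    id INTEGER PRIMARY KEY,
    url TEXT NOT NULL,
    start_time TEXT NOT NULL,
    end_time TEXT NOT NULL,
    status TEXT NOT NULL
);
"""


def rebuild_tasks(conn, urls, now):
    """Replace the auto-crawled links and create one running 24-hour task per link.

    Returns the identifiers of the newly created tasks.
    """
    conn.executescript(SCHEMA)
    end = now + timedelta(hours=24)
    task_ids = []
    with conn:  # one transaction: commit on success, roll back on error
        conn.execute("DELETE FROM auto_links")  # clear the previous crawl
        for url in urls:
            conn.execute("INSERT INTO auto_links (url) VALUES (?)", (url,))
            cur = conn.execute(
                "INSERT INTO tasks (url, start_time, end_time, status) "
                "VALUES (?, ?, ?, 'running')",
                (url, now.isoformat(), end.isoformat()),
            )
            task_ids.append(cur.lastrowid)  # record the task identifier
    return task_ids
```

Older tasks are left in place in this sketch; they simply expire once their end time passes.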

3. Task Execution : Every five minutes a scheduler scans the task table, discards expired tasks, selects runnable tasks, retrieves the associated activity URL, and launches Puppeteer to load the page.
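
One scan of the task table can be sketched as below. The task dictionaries, the `monitor.js` script name, and the helper names are hypothetical; the article only states that the scan runs every five minutes and hands each runnable URL to Puppeteer.

```python
import subprocess
import time
from datetime import datetime


def select_runnable(tasks, now):
    """Split one scan into runnable URLs and the ids of expired tasks."""
    runnable, expired = [], []
    for task in tasks:
        if task["status"] != "running":
            continue
        if now >= task["end_time"]:
            expired.append(task["id"])  # the caller would mark these as finished
        elif now >= task["start_time"]:
            runnable.append(task["url"])
    return runnable, expired


def launch_monitor(url):
    # `monitor.js` is a hypothetical Node.js script that loads the URL with Puppeteer.
    # Raises subprocess.TimeoutExpired if the page check exceeds two minutes.
    subprocess.run(["node", "monitor.js", url], timeout=120, check=False)


def run_forever(load_tasks, monitor=launch_monitor, interval_seconds=300):
    """Scan the task table every five minutes (300 s) and monitor each runnable URL."""
    while True:
        runnable, _expired = select_runnable(load_tasks(), datetime.now())
        for url in runnable:
            monitor(url)
        time.sleep(interval_seconds)
```

In production a cron entry or a job scheduler would typically replace the `while True` loop; the loop is shown only to make the five-minute cadence explicit.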

4. Monitoring : While the headless browser runs, the system records resource status codes, API responses, and DOM content. Detected errors generate email notifications to pre‑configured recipients.
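
The notification part could be sketched with the Python standard library as follows. The subject format, addresses, and SMTP host are placeholders, and the article does not say which mail mechanism the original system uses.

```python
import smtplib
from email.message import EmailMessage


def build_alert(url, anomalies, sender, recipients):
    """Compose one email that lists every anomaly found on a page."""
    msg = EmailMessage()
    msg["Subject"] = f"[H5 monitor] anomalies detected on {url} ({len(anomalies)})"
    msg["From"] = sender
    msg["To"] = ", ".join(recipients)
    msg.set_content("\n".join(anomalies))
    return msg


def send_alert(msg, host="localhost", port=25):
    """Send the message through an SMTP relay (host and port are assumptions)."""
    with smtplib.SMTP(host, port) as smtp:
        smtp.send_message(msg)
```

Grouping all anomalies for a page into one message keeps a broken page from producing a separate email per failed resource.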

Manual Activity Monitoring : For activities added manually, links are stored in a separate table, tasks are created with explicit start/end times, cookies can be injected for authenticated pages, and the same Puppeteer‑based monitoring is applied.
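
For the cookie injection mentioned above, one possible approach is to convert a raw `Cookie` header string into the list of objects that Puppeteer's `page.setCookie` accepts (`name`, `value`, `domain`, `path`). The helper name and the way the result reaches the Node.js side are assumptions; the article does not describe this detail.

```python
import json
from http.cookies import SimpleCookie


def to_puppeteer_cookies(cookie_header, domain, path="/"):
    """Turn 'a=1; b=2' into cookie objects shaped for Puppeteer's page.setCookie."""
    jar = SimpleCookie()
    jar.load(cookie_header)
    return [
        {"name": name, "value": morsel.value, "domain": domain, "path": path}
        for name, morsel in jar.items()
    ]


def to_json(cookies):
    """Serialize the cookies, for example to pass them to the Node.js monitor."""
    return json.dumps(cookies)
```

The cookies must be set before the page is loaded, otherwise the first request goes out unauthenticated.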

Project Outcomes : The system has identified seven critical bugs, including missing CSS resources and backend API exceptions, all of which were promptly fixed by the front‑end or back‑end teams.

Future Directions :

Platformization – integrate the monitoring system into a unified task platform.

Task systemization – standardize creation, definition, and lifecycle management of monitoring tasks.

Report generation – aggregate abnormal request data into periodic reports.

Extended coverage – add more types of exception monitoring.
