
Design and Implementation of a General H5 User Behavior Tracking and Data Warehouse Model

This article presents a comprehensive H5 (HTML5) tracking solution that details the planning of event‑collection points, the full data‑warehouse modeling process—including schema design, retention calculations, and SQL implementations—and the automatic data‑capture mechanisms needed to improve user‑behavior analysis efficiency across the product lifecycle.

Sohu Tech Products

The article begins by describing the need for a more efficient H5 user‑behavior analysis pipeline, introducing the concept of "埋点" (event tracking) and explaining why H5 pages are widely used in mobile web apps.

It then outlines a universal analysis model that covers three core themes: basic analysis (PV/UV, session duration, new vs. returning users), page analysis (page‑level PV, UV, dwell time), and retention analysis (N‑day and specific‑day retention). The model is illustrated with lifecycle diagrams and sample metric definitions.
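As a rough illustration of the basic metrics named above, the sketch below computes PV, UV, and average dwell time from a list of page-visit records. The record shape (`userId`, `page`, `enterTs`, `exitTs`) and the function name `summarize` are assumptions made for this example, not the article's actual schema.

```javascript
// Hypothetical record shape: { userId, page, enterTs, exitTs }, timestamps in ms.
function summarize(visits) {
  const users = new Set();
  let totalDwellMs = 0;
  for (const v of visits) {
    users.add(v.userId);
    // Guard against out-of-order timestamps producing negative durations.
    totalDwellMs += Math.max(0, v.exitTs - v.enterTs);
  }
  const pv = visits.length; // every visit record counts as one page view
  return {
    pv,
    uv: users.size, // distinct users
    avgDwellMs: pv === 0 ? 0 : totalDwellMs / pv,
  };
}

const sample = [
  { userId: 'u1', page: '/home', enterTs: 0, exitTs: 4000 },
  { userId: 'u1', page: '/detail', enterTs: 4000, exitTs: 6000 },
  { userId: 'u2', page: '/home', enterTs: 1000, exitTs: 4000 },
];
const stats = summarize(sample);
console.log(stats); // { pv: 3, uv: 2, avgDwellMs: 3000 }
```

Page-level metrics would follow the same pattern after grouping the records by `page`.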

Section 3 details the tracking scheme, starting with business goals such as automatically collecting PV/UV without manual instrumentation. The automatic collection mechanism relies on three rule‑based scenarios: URL changes (SPA routing), focus/blur events, and visibilitychange for tab switches. Each rule is implemented via JavaScript event listeners.

Key code snippets include the override of pushState and replaceState to emit custom events, and simple listeners for focus/blur and visibilitychange:

// Wrap a History API method so each call also dispatches a window event
// named after the method in lowercase ('pushstate' or 'replacestate').
function resetHistoryFun(type) {
  const originMethod = window.history[type];
  return function () {
    const rs = originMethod.apply(this, arguments);
    const e = new Event(type.toLowerCase());
    e.arguments = arguments;
    window.dispatchEvent(e);
    return rs;
  };
}
window.history.pushState = resetHistoryFun('pushState');
window.history.replaceState = resetHistoryFun('replaceState');

// focus / blur: the window gains or loses input focus.
window.addEventListener('focus', () => console.log('page gained focus'));
window.addEventListener('blur', () => console.log('page lost focus'));

// visibilitychange: the tab is hidden or becomes visible again.
document.addEventListener('visibilitychange', () => {
  if (document.hidden) console.log('page left');
  else console.log('page entered');
});

The tracking design specifies two event types (page‑enter and page‑exit) to capture both start and end of a session, enabling accurate dwell‑time calculation and avoiding duplicate reports.
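One way to picture the enter/exit pairing is a small state holder that remembers the active page and ignores repeated signals. This is a minimal sketch under assumptions: the class name `PageSessionTracker`, the event type strings `page_enter` and `page_exit`, and the `report` callback are all hypothetical and do not come from the article.

```javascript
// Hypothetical tracker: pairs each page-enter with exactly one page-exit.
class PageSessionTracker {
  constructor(report) {
    this.report = report; // callback receiving event objects (assumed transport)
    this.current = null;  // { url, enterTs } while a page is active
  }

  enter(url, ts) {
    // Ignore repeated enter signals for the page that is already active,
    // e.g. focus and visibilitychange firing close together.
    if (this.current && this.current.url === url) return;
    // A different URL means the previous page ended without an exit signal.
    if (this.current) this.exit(ts);
    this.current = { url, enterTs: ts };
    this.report({ type: 'page_enter', url, ts });
  }

  exit(ts) {
    if (!this.current) return; // already exited: do not report twice
    const { url, enterTs } = this.current;
    this.current = null;
    this.report({ type: 'page_exit', url, ts, dwellMs: ts - enterTs });
  }
}

const sent = [];
const tracker = new PageSessionTracker((e) => sent.push(e));
tracker.enter('/home', 1000);
tracker.enter('/home', 1005);   // duplicate signal, ignored
tracker.exit(4000);
tracker.exit(4001);             // duplicate signal, ignored
tracker.enter('/detail', 4100);
tracker.enter('/list', 9100);   // route change closes '/detail' first
console.log(sent.map((e) => e.type));
```

In a browser, `enter` could plausibly be called from the `focus`, visible `visibilitychange`, and custom `pushstate`/`replacestate` listeners, and `exit` from `blur` and hidden `visibilitychange`; the dwell time then falls out of the paired timestamps.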

Section 4 describes the data‑warehouse solution, presenting a multi‑layer architecture (detail layer DW, light aggregation DMA, thematic DMT, and indicator DA). It explains how to unify app‑id and user‑id mappings, generate active‑user fact tables, and implement retention tags using bitmap techniques. Representative SQL snippets illustrate the ETL logic for merging daily increments, identifying new users, and calculating N‑day retention:

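-- Flag each user in the current increment (day/hour partition) as new (1)
-- when no matching row exists in table_XXX_df, otherwise as returning (0).
SELECT if(b.unique_id IS NULL, 1, 0) AS is_new
FROM (
  SELECT * FROM table_XXX_hi WHERE day='${today}' AND hour='${etl_hour}'
) a
LEFT JOIN (
  -- Keep one row per (unique_id, appid): the one with the earliest active_date.
  SELECT unique_id, appid FROM (
    SELECT unique_id, appid, row_number() over(partition by unique_id, appid order by active_date) rn
    FROM table_XXX_df WHERE day='${etl_date}'
  ) t WHERE rn=1
) b ON a.unique_id = b.unique_id AND a.appid = b.appid;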

Retention is tracked per user as a binary string that is periodically converted to hexadecimal to keep storage compact; the original article gives a SQL example of this conversion.
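The binary-string idea can be sketched outside SQL as follows. The encoding here is an assumption made for illustration: character `i` is `'1'` if the user was active `i` days after their first active day. The article's actual bit layout and conversion schedule are not shown in this summary, and the helper names are hypothetical.

```javascript
// Assumed encoding: bits[i] === '1' means "active i days after first activity"
// (index 0 is the first active day itself).
function appendDay(bits, active) {
  return bits + (active ? '1' : '0');
}

function isRetainedOnDay(bits, n) {
  return bits[n] === '1';
}

// Pack the binary string into hex. A leading '1' sentinel bit is added so
// that leading zeros survive the round trip.
function toHex(bits) {
  return BigInt('0b1' + bits).toString(16);
}

function fromHex(hex) {
  return BigInt('0x' + hex).toString(2).slice(1); // drop the sentinel bit
}

let bits = '';
for (const active of [true, true, false, false, false, false, false, true]) {
  bits = appendDay(bits, active);
}
const packed = toHex(bits);
console.log(bits);                     // 11000001
console.log(packed);                   // 1c1
console.log(isRetainedOnDay(bits, 1)); // true  (next-day retention)
console.log(isRetainedOnDay(bits, 7)); // true  (7-day retention)
console.log(fromHex(packed) === bits); // true
```

Hex uses one character per four days instead of one per day, which is where the storage saving comes from.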

Compatibility notes state that the solution depends on the window object and does not support IE6‑IE8. Error handling is performed via try/catch blocks, and any failures are reported as special event types.
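The try/catch approach might look like the wrapper below, which keeps a tracking failure from breaking the host page and turns it into a reportable event. The function name `guarded`, the `track_error` event type, and the `report` callback are hypothetical names chosen for this sketch.

```javascript
// Wrap a tracking step so that an exception is reported, not rethrown.
function guarded(name, fn, report) {
  return function (...args) {
    try {
      return fn.apply(this, args);
    } catch (err) {
      try {
        report({
          type: 'track_error', // assumed name for the special event type
          source: name,
          message: err instanceof Error ? err.message : String(err),
        });
      } catch (_) {
        // Reporting itself failed; swallow so the page is unaffected.
      }
      return undefined;
    }
  };
}

const errors = [];
const safeParse = guarded('parsePayload', (s) => JSON.parse(s), (e) => errors.push(e));
const ok = safeParse('{"a":1}');  // parsed normally
const bad = safeParse('{oops');   // throws inside, reported instead
console.log(ok, bad, errors.length);
```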

Privacy considerations are addressed by minimizing collected identifiers, allowing user consent, and ensuring data‑processing transparency.

Finally, the article presents data‑visualization concepts using MySQL‑backed dashboards, showing sample reports for overall app overview, user retention, and page‑level analysis, and outlines future extensions such as funnel analysis, attribution models, and path tracking.
