Designing a Unified User Behavior Data Collection System for Mobile and Web Applications

The article explains how to build a unified user‑behavior data collection platform that standardizes event definitions, front‑end reporting, and back‑end storage using Kafka pipelines and Elasticsearch, enabling comprehensive analysis of user interactions across Android, iOS, and web clients.

Big Data Technology & Architecture

Jun 3, 2020

This article focuses on the design of a user‑behavior data collection system that captures interactions from Android apps, iOS apps, and web pages. It explains why relying solely on backend databases limits insight and how front‑end reporting can provide richer conversion and scoring data.

Three core questions are addressed: what data to collect, how the front‑end should collect it, and how the back‑end should store it. The data model includes mandatory fields such as uuid, event_time, page, and element, with a flexible attrs object for additional metadata.

What to collect: The system targets page‑view (browse) and click events, recording the page identifier, element ID, and related input data. User and time dimensions are added via a UUID and timestamp.

{
    "uuid": "2b8c376e-bd20-11e6-9ebf-525499b45be6",
    "event_time": "2016-12-08T18:08:12",
    "page": "www.example.com/poster.html",
    "element": "register",
    "attrs": {
        "title": "test",
        "user_id": 1234
    }
}
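A collector will usually want to reject malformed events before they travel further. The following is a minimal sketch of such a check, based on the mandatory fields named above. The helper name `validateEvent`, the overridable field list, and the choice to default `attrs` to an empty object are assumptions for illustration, not part of the original design.

```javascript
// Mandatory fields named in the data model above.
const REQUIRED_FIELDS = ["uuid", "event_time", "page", "element"];

// Hypothetical helper: reports which mandatory fields are missing or empty.
// `requiredFields` can be overridden, e.g. if page views carry no element.
function validateEvent(raw, requiredFields = REQUIRED_FIELDS) {
  const missing = requiredFields.filter(
    (field) => typeof raw[field] !== "string" || raw[field].length === 0
  );
  // attrs is free-form metadata; fall back to an empty object (assumption).
  const hasAttrs =
    raw.attrs !== null && typeof raw.attrs === "object" && !Array.isArray(raw.attrs);
  const attrs = hasAttrs ? raw.attrs : {};
  return { ok: missing.length === 0, missing: missing, event: { ...raw, attrs: attrs } };
}

const result = validateEvent({
  uuid: "2b8c376e-bd20-11e6-9ebf-525499b45be6",
  event_time: "2016-12-08T18:08:12",
  page: "www.example.com/poster.html",
  element: "register",
  attrs: { title: "test", user_id: 1234 },
});
console.log(result.ok, result.missing);
```

Returning the list of missing fields, rather than throwing, lets the caller decide whether to drop the event or log it for debugging.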

Front‑end collection: Instead of instrumenting each element by hand, the article proposes a lightweight “hook‑based” approach. A single global click listener looks for custom attributes (user_action_id and user_action_relation) that mark elements for reporting, gathers the related data, and sends it to the backend.

$(document).ready(function() {
    // Report a page view once the DOM is ready.
    pvUpload({page: getPageUrl()}, $.extend({title: getTitle()}, getUrlParams()));

    // One delegated listener handles clicks on every tracked element.
    $(document).on('click', function(event) {
        // Walk up from the click target to the nearest element marked for tracking.
        var $ua = $(event.target).closest('[user_action_id]');
        if ($ua.length === 0) {
            return;
        }
        var userActionId = $ua.attr('user_action_id');
        // Collect data from elements linked to this action via user_action_relation.
        var relationData = {};
        $('[user_action_relation="' + userActionId + '"]').each(function() {
            var $el = $(this);
            var id = $el.attr('id');
            relationData['r_' + id + '_element'] = this.tagName;
            // .val() reads the current value of form fields; .text() covers other elements.
            relationData['r_' + id + '_text'] =
                $el.is('input, textarea, select') ? $el.val() : $el.text();
        });
        clickUpload({page: getPageUrl(), element: userActionId},
                    $.extend({title: getTitle()}, getUrlParams(), relationData));
    });
});

HTML markup simply adds the custom attributes to elements that need to be tracked:

<div>
    <textarea id="answer" cols="30" rows="10" user_action_relation="answer-submit"></textarea>
    <button user_action_id="answer-submit">Submit</button>
</div>
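To make the reported shape concrete, here is a rough sketch of the payload a click on the button above might produce. The helper `buildClickPayload` and the `{ id, tagName, value }` snapshot format are hypothetical; only the `r_<id>_element` and `r_<id>_text` key pattern is taken from the listener code. It is written as a pure function so it can be run outside a browser.

```javascript
// Hypothetical pure helper: merges base attrs with data from related elements.
// `related` is a list of { id, tagName, value } snapshots taken from the DOM.
function buildClickPayload(page, elementId, baseAttrs, related) {
  const attrs = { ...baseAttrs };
  for (const el of related) {
    attrs["r_" + el.id + "_element"] = el.tagName;
    attrs["r_" + el.id + "_text"] = el.value;
  }
  return { page: page, element: elementId, attrs: attrs };
}

// A click on the button with "hello" typed into the textarea (sample values).
const payload = buildClickPayload(
  "www.example.com/poster.html",
  "answer-submit",
  { title: "test" },
  [{ id: "answer", tagName: "TEXTAREA", value: "hello" }]
);
console.log(JSON.stringify(payload));
```

Keeping the related data as flat keys inside attrs means the event keeps the same top-level fields regardless of how many related elements a page defines.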

Back‑end storage: Reported events are first pushed into a Kafka queue, which decouples ingestion from processing. Consumers enrich each event with client type, event type, IP, and User‑Agent, then assign an event_id that maps the event to a human‑readable name. The final JSON is stored in Elasticsearch, which handles the semi‑structured attrs field efficiently.

{
    "uuid": "2b8c376e-bd20-11e6-9ebf-525499b45be6",
    "event_time": "2016-12-08T18:08:12",
    "page": "www.example.com/poster.html",
    "element": "register",
    "client_type": 0,
    "event_type": 0,
    "user_agent": "Mozilla/5.0 ...",
    "ip": "59.174.196.123",
    "timestamp": 1481218631,
    "event_id": 12,
    "attrs": {
        "title": "test",
        "user_id": 1234
    }
}
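The enrichment step a consumer performs could look roughly like the sketch below. The article does not list the actual codes or lookup rules, so the code tables, the `page#element` lookup key, the shape of the `request` object, and the reading of `timestamp` as server receipt time in epoch seconds are all assumptions made for illustration.

```javascript
// Hypothetical code tables; the real values are not given in the article.
const CLIENT_TYPES = { web: 0, android: 1, ios: 2 };
const EVENT_TYPES = { click: 0, view: 1 };

// Hypothetical enrichment: returns a new object and leaves the input untouched.
// `request` carries metadata captured by the ingestion endpoint (assumed shape).
// `eventIds` is a Map from "page#element" to a configured numeric event_id.
function enrichEvent(event, request, eventIds) {
  const key = event.page + "#" + event.element;
  return {
    ...event,
    client_type: CLIENT_TYPES[request.client],
    event_type: EVENT_TYPES[request.eventType],
    user_agent: request.userAgent,
    ip: request.ip,
    timestamp: request.receivedAtSeconds, // assumed: server receipt time
    event_id: eventIds.has(key) ? eventIds.get(key) : null, // null when unmapped
  };
}

const eventIds = new Map([["www.example.com/poster.html#register", 12]]);
const enriched = enrichEvent(
  {
    uuid: "2b8c376e-bd20-11e6-9ebf-525499b45be6",
    event_time: "2016-12-08T18:08:12",
    page: "www.example.com/poster.html",
    element: "register",
    attrs: { title: "test", user_id: 1234 },
  },
  { client: "web", eventType: "click", userAgent: "Mozilla/5.0 ...", ip: "59.174.196.123", receivedAtSeconds: 1481218631 },
  eventIds
);
console.log(JSON.stringify(enriched, null, 2));
```

Leaving event_id as null for unmapped events, instead of rejecting them, keeps the raw data available until the mapping is configured.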

Elasticsearch mapping templates are defined to dynamically handle string fields and timestamps, and bulk APIs are used for high‑throughput insertion.
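As an illustration of the bulk step, the sketch below builds the newline-delimited JSON body that the Elasticsearch `_bulk` endpoint accepts: one action line followed by one document line per event, with a trailing newline. The helper name `buildBulkBody` and the index name `user-events` are hypothetical, and older Elasticsearch versions also expect a `_type` in the action line or URL, which this sketch omits.

```javascript
// Hypothetical helper: serializes events into a _bulk request body (NDJSON).
function buildBulkBody(indexName, events) {
  const lines = [];
  for (const event of events) {
    lines.push(JSON.stringify({ index: { _index: indexName } })); // action line
    lines.push(JSON.stringify(event)); // document source line
  }
  // The bulk API requires the body to end with a newline.
  return lines.length === 0 ? "" : lines.join("\n") + "\n";
}

const body = buildBulkBody("user-events", [
  { uuid: "2b8c376e-bd20-11e6-9ebf-525499b45be6", element: "register", event_id: 12 },
  { uuid: "2b8c376e-bd20-11e6-9ebf-525499b45be6", element: "answer-submit", event_id: null },
]);
console.log(body);
```

The resulting string would be sent as the body of a POST to `_bulk` with the `Content-Type: application/x-ndjson` header; batching many events per request is what gives the throughput benefit.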

The article concludes with a reminder to configure event‑name mappings.

Written by

Big Data Technology & Architecture

Wang Zhiwu, a big data expert, dedicated to sharing big data technology.
