
Design and Architecture of WMDA: A Comprehensive User Behavior Analysis Platform

The article details WMDA, a no‑code and manual‑code data collection platform for PC, mobile and app that supports real‑time and offline user behavior analysis, describing its functional model, behavior taxonomy, five‑layer architecture, tracking techniques, circle‑selection, data services, streaming and batch processing pipelines, and related technologies such as Storm, Spark, Druid and Roaring Bitmap.

58 Tech

WMDA is a user behavior analysis platform developed by 58 Group. Loading its SDK once enables zero‑code data collection across PC, mobile web, and app channels, and manual code‑point tracking remains available for customized data. As an in‑house platform, it also keeps behavior data inside the company.

The product model divides functionality into projects, applications, metrics, and analysis, where users create isolated projects, manage web/iOS/Android apps, select key metrics, and obtain overview, real‑time, single‑chart, retention, funnel, and segmentation insights.

Its behavior model defines a three‑level hierarchy: session → page view → event. A session ends after 30 minutes of inactivity on web, or after the app has spent 30 seconds in the background. Page views generate events, and every click is counted as an event.
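The web session rule above can be sketched as a gap-based split over one user's events. This is a minimal illustration, not WMDA's implementation: the `Event` fields and the `split_sessions` name are hypothetical, and the app rule (30 seconds in background) is not modelled because it needs foreground/background signals from the SDK.

```python
from dataclasses import dataclass
from typing import List

WEB_TIMEOUT_S = 30 * 60  # 30 minutes of inactivity, as stated in the text


@dataclass
class Event:
    """Hypothetical event record; field names are assumptions."""
    ts: int    # seconds since epoch
    page: str  # page the event happened on
    kind: str  # "pageview" or "click"


def split_sessions(events: List[Event],
                   timeout_s: int = WEB_TIMEOUT_S) -> List[List[Event]]:
    """Group one user's events into sessions by inactivity gap."""
    sessions: List[List[Event]] = []
    for ev in sorted(events, key=lambda e: e.ts):
        # Continue the current session only if the gap is below the timeout.
        if sessions and ev.ts - sessions[-1][-1].ts < timeout_s:
            sessions[-1].append(ev)
        else:
            sessions.append([ev])
    return sessions
```

In this sketch a gap of exactly 30 minutes starts a new session; whether the boundary is inclusive in WMDA is not stated in the article.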

The architecture follows a five‑layer design: data collection (SDKs for JS, iOS, Android), data transmission (collection service, data cleaning, fingerprinting, then Flume to Kafka), data modeling/storage (ETL to HDFS, real‑time stream to Storm), data statistics/analysis (Storm for real‑time; offline batch via Spark, Azkaban, OLAP, Bitmap, clustering), and data visualization (Vue‑based UI built on the WF framework).

For tracking, the article compares code‑point, visual, and no‑code approaches: code‑point tracking is precise but costly, visual tracking is low‑cost but limited, and no‑code tracking provides full coverage and the ability to look back over already collected data at the expense of flexibility. WMDA therefore combines no‑code tracking with optional manual code points.

Circle‑selection lets users define which elements to analyze by combining attributes: Domain, Path, Query, XPath, Value, Index, and Link for web; AppPage, ElemPath, Value, and Index for app. On web, it leverages the DOM tree and XPath. On app, it relies on QR‑code‑triggered screenshots, element coordinates, and MySQL storage to support interactive selection.
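One way to picture how a circled web element could be matched against incoming clicks is a rule that pins some attributes and leaves the rest as wildcards. The sketch below assumes exact-equality matching and uses hypothetical key names (`domain`, `path`, `xpath`, ...) and helper names; the article does not describe WMDA's actual matching logic.

```python
from typing import Dict
from urllib.parse import urlparse


def describe_click(url: str, xpath: str, value: str, index: int) -> Dict[str, object]:
    """Build the attribute set for one click (key names are assumptions)."""
    parts = urlparse(url)
    return {
        "domain": parts.netloc,
        "path": parts.path,
        "query": parts.query,
        "xpath": xpath,
        "value": value,
        "index": index,
    }


def matches(rule: Dict[str, object], click: Dict[str, object]) -> bool:
    """A rule matches when every attribute it pins equals the click's value.

    Attributes missing from the rule act as wildcards.
    """
    return all(click.get(key) == expected for key, expected in rule.items())


click = describe_click(
    "https://example.com/list?city=bj", "/html/body/div[2]/a", "Post", 3
)
any_link_in_list = {"domain": "example.com", "path": "/list",
                    "xpath": "/html/body/div[2]/a"}
first_link_only = dict(any_link_in_list, index=1)
```

Here `any_link_in_list` matches the click, while `first_link_only` does not because it pins `index` to a different position. Dropping or adding attributes is what widens or narrows the selection.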

The data collection service handles user registration (UUID generation per platform), configuration distribution, time correction, information enrichment (IP‑based location, path data), and spider detection based on User‑Agent.
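Two of these steps are easy to illustrate. The sketch below is an assumption-laden example, not the service's code: the keyword list in `SPIDER_PATTERN` is illustrative, and `correct_time` shows one common correction scheme (shifting client timestamps by the difference between server receive time and client send time, ignoring network delay), which the article does not confirm.

```python
import re

# Illustrative keyword list; a real deployment would maintain a longer one.
SPIDER_PATTERN = re.compile(r"bot|spider|crawler|slurp", re.IGNORECASE)


def is_spider(user_agent: str) -> bool:
    """Flag a request as spider traffic based on its User-Agent string."""
    return bool(SPIDER_PATTERN.search(user_agent))


def correct_time(event_ts_ms: int, client_send_ts_ms: int,
                 server_recv_ts_ms: int) -> int:
    """Shift a client-side timestamp onto the server clock.

    Assumes the upload arrives almost instantly, so the gap between the
    client's send time and the server's receive time is pure clock skew.
    """
    skew_ms = server_recv_ts_ms - client_send_ts_ms
    return event_ts_ms + skew_ms
```

For example, if a device clock runs one minute slow, `correct_time(1000, 5000, 65000)` moves the event from 1000 to 61000 on the server clock.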

The real‑time analysis system uses Storm + HBase, processing data in 5‑minute windows with two overlapping windows to mitigate latency, writing completed windows to HBase for downstream use.
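The article does not spell out how the two windows interact. One plausible reading is that the previous 5‑minute window stays open alongside the current one, so slightly late events still land in the right bucket, and a window is written out only when a newer one pushes it out. The sketch below follows that reading in plain Python; `TwoWindowCounter` and the `sink` callback (standing in for the HBase write) are hypothetical, and none of this is Storm API.

```python
from collections import defaultdict
from typing import Callable, Dict

WINDOW_S = 5 * 60  # 5-minute windows, as stated in the text


class TwoWindowCounter:
    """Counts events per window while keeping at most `max_open` windows open."""

    def __init__(self, sink: Callable[[int, Dict[str, int]], None],
                 max_open: int = 2) -> None:
        self.sink = sink          # called with (window_start, counts) on flush
        self.max_open = max_open
        self.open: Dict[int, Dict[str, int]] = {}

    def add(self, ts: int, metric: str) -> bool:
        """Count one event; return False if it is too late to be counted."""
        start = ts - ts % WINDOW_S
        if start not in self.open:
            # Older than every open window: treat the event as too late.
            if self.open and start < min(self.open):
                return False
            self.open[start] = defaultdict(int)
            # Opening a new window flushes the oldest ones beyond the limit.
            while len(self.open) > self.max_open:
                oldest = min(self.open)
                self.sink(oldest, dict(self.open.pop(oldest)))
        self.open[start][metric] += 1
        return True
```

With this design an event that arrives up to one window late is still counted, at the cost of results being written roughly one window later than they would be with a single window.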

The offline analysis system employs Spark for ETL, Azkaban for scheduling, and a suite of sub‑systems (OLAP, Bitmap, clustering) to compute single‑chart, funnel, retention, and segmentation results, following a Lambda architecture for fault tolerance.
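To make the funnel and retention outputs concrete, here is a small sketch that treats each step or day as a bitmap of user IDs and derives the results with bitwise AND. Python integers stand in for the compressed bitmaps a real system would use, all function names are hypothetical, and the funnel ignores the time order of steps within the period, which is a simplification.

```python
from typing import Iterable, List


def to_bitmap(user_ids: Iterable[int]) -> int:
    """Set bit `uid` for every user ID (small non-negative integers)."""
    bm = 0
    for uid in user_ids:
        bm |= 1 << uid
    return bm


def cardinality(bm: int) -> int:
    """Number of users present in the bitmap."""
    return bin(bm).count("1")


def funnel(steps: List[int]) -> List[int]:
    """Users remaining after each step: running AND over the step bitmaps."""
    counts: List[int] = []
    survivors = steps[0] if steps else 0
    for step in steps:
        survivors &= step
        counts.append(cardinality(survivors))
    return counts


def retention(cohort: int, later_day: int) -> float:
    """Share of the cohort that shows up again on a later day."""
    size = cardinality(cohort)
    return cardinality(cohort & later_day) / size if size else 0.0


viewed = to_bitmap([1, 2, 3, 4])
clicked = to_bitmap([2, 3, 4, 9])
paid = to_bitmap([3, 9])
```

With these inputs, `funnel([viewed, clicked, paid])` gives `[4, 3, 1]`: user 9 paid but never viewed, so only user 3 completes all three steps.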

OLAP is implemented with Druid, comprising Real‑Time Nodes, Historical Nodes, Coordinators, Brokers, and an Indexing Service, with HDFS serving as deep storage and segments managed via ZooKeeper.

Bitmap calculations for funnel, retention, and clustering use the Roaring Bitmap library, employing ArrayContainer for sparse blocks and BitmapContainer for dense blocks to achieve high compression and fast operations.
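The container choice can be illustrated with a toy model. Roaring splits each 32‑bit value into a 16‑bit chunk key and a 16‑bit low part, stores a chunk as a sorted array while it holds at most 4096 values, and switches to a fixed 8 KiB bitmap beyond that. The `TinyRoaring` class below is a hypothetical teaching sketch, not the Roaring Bitmap library's API, and it omits run containers and set operations.

```python
from bisect import bisect_left
from typing import Dict, Iterable, Union

ARRAY_MAX = 4096  # above this, 8 KiB of bits is smaller than 2 bytes per value

Container = Union[list, bytearray]  # sorted list = array, bytearray = bitmap


class TinyRoaring:
    def __init__(self, values: Iterable[int] = ()) -> None:
        self.containers: Dict[int, Container] = {}
        for v in values:
            self.add(v)

    def add(self, value: int) -> None:
        high, low = value >> 16, value & 0xFFFF
        c = self.containers.get(high)
        if c is None:
            self.containers[high] = [low]          # new chunks start sparse
        elif isinstance(c, bytearray):
            c[low >> 3] |= 1 << (low & 7)          # dense: set one bit
        else:
            i = bisect_left(c, low)
            if i == len(c) or c[i] != low:
                c.insert(i, low)                   # sparse: keep array sorted
                if len(c) > ARRAY_MAX:
                    self.containers[high] = self._to_bitmap(c)

    @staticmethod
    def _to_bitmap(lows: list) -> bytearray:
        bits = bytearray(8192)                     # 65536 bits
        for low in lows:
            bits[low >> 3] |= 1 << (low & 7)
        return bits

    def __contains__(self, value: int) -> bool:
        high, low = value >> 16, value & 0xFFFF
        c = self.containers.get(high)
        if c is None:
            return False
        if isinstance(c, bytearray):
            return bool(c[low >> 3] & (1 << (low & 7)))
        i = bisect_left(c, low)
        return i < len(c) and c[i] == low

    def kind(self, high: int) -> str:
        """Report which container type currently backs chunk `high`."""
        return "bitmap" if isinstance(self.containers[high], bytearray) else "array"
```

For instance, after `r = TinyRoaring(range(5000))` and `r.add(70000)`, chunk 0 is backed by a bitmap because it holds more than 4096 values, while chunk 1 holds a single value and stays an array.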

WMDA is still in early deployment, having processed billions of events in testing, and is positioned to become a key tool for improving product quality and operational efficiency across 58’s services.

The article concludes with recruitment notices for storage, cloud platform, and big‑data engineering positions at 58 Group.

Tags: Big Data, data collection, Real-time Streaming, user behavior analysis, Druid, Roaring Bitmap, offline analytics
Written by

58 Tech

Official tech channel of 58, a platform for tech innovation, sharing, and communication.
