How WeChat’s WeAnalysis Powers Scalable User Segmentation with Big Data Architecture
This article explains the design and implementation of the user‑profile system in WeChat's WeAnalysis. It covers the basic tag and user‑group modules, multi‑source data ingestion, ETL processing, storage choices such as TDSQL and ClickHouse, bitmap handling, query performance, and the service APIs that together provide flexible, high‑performance user segmentation.
Background
WeAnalysis is the official data‑analysis platform for WeChat Mini‑Programs. Its profile insight module provides basic tag analysis and customizable user‑group capabilities to meet diverse analytical needs.
System Design Goals
Usability : Zero learning curve for merchants, ready‑to‑use out of the box.
Stability : Reliable tag data and timely generation of user‑group packages with fast query response.
Completeness : Rich tags, flexible rules, and comprehensive functionality supporting preset tags, user‑group tags, platform behavior, and custom reported data.
Overall Architecture
The system is divided into two main modules: the basic tag module and the user‑group module. Data flows from multiple sources (user attributes, group tags, platform behavior, custom reports) through ETL and pre‑computation, then into offline storage (TDW/HDFS) and finally into online stores (TDSQL for pre‑computed results, ClickHouse for detailed behavior).
Data Sources
Four sources feed the system: user attributes (e.g., gender, region), group tags (active, churned), platform behavior (visits, shares, transactions), and custom reported events uploaded by merchants.
Processing Pipeline
Data Ingestion & ETL : Raw data is cleaned, encoded, and aggregated. String dimensions are converted to integer IDs to reduce storage and improve query speed.
Tag Encoding & Storage : Tags are stored in vertical tables; each tag value is assigned a unique code. Bitmap (RoaringBitmap) structures represent tag‑to‑user mappings.
Online Storage : Pre‑computed results are written to TDSQL; detailed behavior and bitmap data are stored in ClickHouse using the groupBitmap aggregate function.
Data Import : Spark jobs generate bitmaps of user IDs, serialize them to Base64 strings, and load them into ClickHouse tables with a materialized bitmap column.
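The encoding, bitmap, and import steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the WeAnalysis implementation: a plain Python integer stands in for a RoaringBitmap, the Base64 payload is a raw byte dump rather than the serialization ClickHouse actually expects, and every name here (`DictionaryEncoder`, `build_tag_bitmaps`, `to_base64`, `from_base64`, `members`) is hypothetical.

```python
import base64
from collections import defaultdict


class DictionaryEncoder:
    """Assigns a stable integer code to each distinct string value."""

    def __init__(self):
        self._codes = {}

    def encode(self, value: str) -> int:
        # The first occurrence gets the next free code; later ones reuse it.
        return self._codes.setdefault(value, len(self._codes))


def build_tag_bitmaps(rows, encoder):
    """rows: iterable of (user_id, tag_value). Returns {tag_code: bitmap}.

    Each bitmap is a Python int whose bit i is set when user i has the tag
    (a stand-in for a RoaringBitmap).
    """
    bitmaps = defaultdict(int)
    for user_id, tag_value in rows:
        bitmaps[encoder.encode(tag_value)] |= 1 << user_id
    return dict(bitmaps)


def to_base64(bitmap: int) -> str:
    # Dump the integer as big-endian bytes, then Base64-encode for text transport.
    raw = bitmap.to_bytes((bitmap.bit_length() + 7) // 8 or 1, "big")
    return base64.b64encode(raw).decode("ascii")


def from_base64(text: str) -> int:
    return int.from_bytes(base64.b64decode(text), "big")


def members(bitmap: int):
    """List the user IDs whose bits are set."""
    return [i for i in range(bitmap.bit_length()) if bitmap >> i & 1]


rows = [(1, "male"), (2, "female"), (5, "male"), (7, "female"), (9, "male")]
encoder = DictionaryEncoder()
tag_bitmaps = build_tag_bitmaps(rows, encoder)
encoded = {code: to_base64(bm) for code, bm in tag_bitmaps.items()}
```

Decoding `encoded[0]` with `from_base64` and passing the result to `members` recovers the user IDs tagged "male". In a real pipeline the integer codes would come from a shared dictionary table so that every job maps the same string to the same ID.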
Storage Choices
TDSQL provides up to 192 TB per instance with fast bulk import (≈40 min for 100 M rows). ClickHouse, combined with RoaringBitmap, offers efficient bitmap operations, high compression, and sub‑second query latency for large user groups.
Service Layer
Online services expose profile APIs via the svrk‑javamesh RPC framework, with a middleware layer handling traffic control, async calls, monitoring, and parameter validation.
Query Performance
Local‑node execution and hash‑based sharding ensure queries run on a single machine, avoiding distributed joins. Numeric ID encoding yields >2× speedup over string‑based queries. Benchmarks show up to 5 × 10⁴ QPS for typical queries, with sampling used for very large apps to keep latency acceptable.
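A rough sketch of why hash‑based sharding removes the need for distributed joins: if every row for a given user lands on the same shard, each shard can intersect tag sets locally and the coordinator only has to add up the counts. The shard count, the CRC32 hash, and the function names below are assumptions made for illustration; the source does not say which hash function or cluster layout WeAnalysis uses.

```python
import zlib

NUM_SHARDS = 4  # hypothetical cluster size


def shard_of(user_id: int) -> int:
    # Deterministic hash, so every row for a user lands on the same shard.
    return zlib.crc32(str(user_id).encode()) % NUM_SHARDS


def load(rows):
    """rows: iterable of (user_id, tag_code).

    Returns one dict per shard, mapping tag_code -> set of user IDs.
    """
    shards = [dict() for _ in range(NUM_SHARDS)]
    for user_id, tag_code in rows:
        shards[shard_of(user_id)].setdefault(tag_code, set()).add(user_id)
    return shards


def local_count(shard, required_tags):
    # The intersection runs entirely on one shard; no data crosses nodes.
    sets = [shard.get(tag, set()) for tag in required_tags]
    return len(set.intersection(*sets)) if sets else 0


def group_size(shards, required_tags):
    # The coordinator only sums per-shard counts.
    return sum(local_count(shard, required_tags) for shard in shards)


# Tag 1: even user IDs. Tag 2: user IDs divisible by 3.
rows = [(u, 1) for u in range(1000) if u % 2 == 0]
rows += [(u, 2) for u in range(1000) if u % 3 == 0]
shards = load(rows)
both = group_size(shards, [1, 2])
```

Because a user never appears on two shards, the per‑shard counts are disjoint and their sum equals the global count, whatever the hash distribution looks like.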
User‑Group Features
Real‑time Estimation : Calculates current group size based on defined rules.
Batch Creation : Nightly Spark jobs compute daily groups for all merchants, reading once and writing once to minimize resource usage.
Tracking & Analysis : Offline jobs export group members, join with metric tables, and store results for online analysis (e.g., activity, transaction trends).
AB Experiment Targeting : Groups can be used as experiment cohorts for controlled tests.
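Real‑time estimation of a group's size can be pictured as evaluating a rule tree over tag membership. The sketch below uses Python sets in place of bitmaps and a tuple‑based rule format invented for this example; the actual rule syntax and evaluation engine are not described in the source.

```python
def evaluate(rule, tag_users, all_users):
    """Recursively evaluate a rule tree to the set of matching user IDs.

    rule is one of (hypothetical format):
      ("tag", name)             users carrying that tag
      ("and", rule, rule, ...)  intersection
      ("or", rule, rule, ...)   union
      ("not", rule)             complement within all_users
    """
    op = rule[0]
    if op == "tag":
        return tag_users.get(rule[1], set())
    if op == "not":
        return all_users - evaluate(rule[1], tag_users, all_users)
    parts = [evaluate(r, tag_users, all_users) for r in rule[1:]]
    if op == "and":
        return set.intersection(*parts)
    if op == "or":
        return set.union(*parts)
    raise ValueError(f"unknown operator: {op}")


tag_users = {
    "active": {1, 2, 3, 4, 5},
    "churned": {6, 7},
    "shared": {2, 4, 6},
    "purchased": {4, 5, 7},
}
all_users = set(range(1, 9))

# Active users who shared or purchased, excluding churned users.
rule = (
    "and",
    ("tag", "active"),
    ("or", ("tag", "shared"), ("tag", "purchased")),
    ("not", ("tag", "churned")),
)
estimated_size = len(evaluate(rule, tag_users, all_users))
```

With bitmaps, the same tree maps onto bitwise AND, OR, and AND‑NOT operations, which is what makes the estimate cheap enough to compute while a merchant is still editing the rule.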
Key Takeaways
The architecture balances flexibility (rich, customizable tags) with performance (bitmap storage, local query execution) and scalability (supporting billions of daily events). By leveraging Spark for heavy ETL, ClickHouse for fast bitmap queries, and TDSQL for bulk storage, WeAnalysis delivers a robust, low‑latency user‑segmentation platform.
21CTO
