Scaling U‑App Analytics to Billions of Events with Flink, MaxCompute & Hologres
UMeng+’s U‑App analytics platform processes nearly a trillion daily logs by combining real‑time Flink streams, offline MaxCompute batches, and Alibaba Cloud Hologres OLAP, employing multi‑engine architecture, smart sampling, and Roaring Bitmap techniques to deliver fast, cost‑effective, high‑concurrency user behavior and profiling analysis.
UMeng+ Overview
UMeng+’s mission is "data intelligence, driving business growth". It offers a one‑stop solution for mobile developers and enterprises, including statistical analysis, performance monitoring, push messaging, and intelligent authentication. By June 2023 it had served 2.7 million mobile apps and 9.8 million websites.
Its statistical products U‑App, U‑Mini and U‑Web provide basic reports and customizable user‑behavior analysis, helping developers understand user needs, optimize features, and improve experience.
Technical Architecture
U‑App processes close to a trillion daily logs, requiring a balance of data freshness, query latency, cost, and system stability.
The architecture uses Flink for real‑time (T+0) computation and MaxCompute for offline batch (T+1) processing, exporting results to an HBase‑like store for high‑concurrency queries. For OLAP queries, Alibaba Cloud Hologres provides sub‑second analysis; when data exceeds memory limits, the workload is shifted to offline batch processing.
Four‑Layer Architecture
Data Service : parses DSL into a DAG, applies smart sampling and query queuing to reduce overload and ensure smooth queries.
Data Compute : abstracts analysis models for behavior analysis and profiling.
Data Storage : uses a User‑Event model to support detailed behavior analysis.
Core Components : MaxCompute for batch, Flink for streaming, Hologres for OLAP.
Multi‑Engine Design Rationale
Cost, stability, high availability, and fault tolerance require routing queries between high‑performance OLAP and robust but higher‑latency batch engines.
Cold‑hot data separation stores recent hot data for one month; older data is automatically routed to offline computation.
Future extensibility: the fast‑evolving OLAP landscape demands flexible integration of new engines.
Hologres Multi‑Dimensional Analysis
Hologres was chosen for its compute‑storage separation, PostgreSQL compatibility, deep integration with MaxCompute, and strong performance.
Elastic compute‑storage separation reduces cost while supporting flexible scaling.
Rich ecosystem and UDF support simplify development.
Direct read/write with MaxCompute enables federated queries and mixed hot‑cold data queries.
High‑performance C/C++ engine delivers PB‑scale data queries in seconds.
User Behavior Analysis
U‑App provides event, funnel, retention, and path analysis models, enabling fine‑grained insight into user behavior for precise operation.
Challenges include massive daily log volume (≈ 1 trillion entries), diverse app data sizes, schema‑free custom events, and the need for high‑concurrency query performance.
Data Storage Layer Indexing
Clustering key on appkey and event_name to reduce scan volume.
Distribution key on device ID for orthogonal data placement and reduced shuffle.
Appropriate shard count to balance write and query performance.
Bitmap indexes on frequently filtered columns to speed filtering.
Using Hologres v1.3 JSONB column store compresses JSON data 25‑50 % and speeds queries 5‑10 ×.
Engine Execution Layer Functions
Custom analysis functions (e.g., windowFunnel, retention) were developed in C and integrated into Hologres, replacing costly multi‑JOIN implementations.
-- Not using windowFunnel
WITH log(user_id, event_time, event_name) AS (
SELECT user_id, event_time, event_name FROM event_log
WHERE ds >= '20231220' AND ds <= '20231220'
)
SELECT ARRAY[
COUNT(DISTINCT step1.user_id) FILTER (WHERE step1.event_time IS NOT NULL),
COUNT(DISTINCT step2.user_id) FILTER (WHERE step2.event_time IS NOT NULL),
COUNT(DISTINCT step3.user_id) FILTER (WHERE step3.event_time IS NOT NULL)
]
FROM (
SELECT user_id, event_time FROM log WHERE event_name = 'session_start'
) step1
LEFT JOIN (
SELECT user_id, event_time FROM log WHERE event_name = 'add_cart'
) step2 ON (step1.user_id = step2.user_id AND step1.event_time < step2.event_time AND step2.event_time - step1.event_time <= 3600)
LEFT JOIN (
SELECT user_id, event_time FROM log WHERE event_name = 'order_pay'
) step3 ON (step1.user_id = step3.user_id AND step2.event_time < step3.event_time AND step3.event_time - step1.event_time <= 3600);
-- Using windowFunnel
WITH step_detail AS (
SELECT user_id,
windowFunnel(3600, 'default', event_time,
event_name = 'session_start',
event_name = 'add_cart',
event_name = 'order_pay') AS step
FROM event_log
WHERE ds >= '20231220' AND ds <= '20231220'
GROUP BY user_id
)
SELECT CASE step
WHEN 0 THEN 'total'
WHEN 1 THEN 'session_start'
WHEN 2 THEN 'add_cart'
WHEN 3 THEN 'order_pay'
END AS stage,
SUM(count_user) OVER (ORDER BY step DESC) AS cumulative
FROM (
SELECT step, COUNT(1) AS count_user FROM step_detail GROUP BY step
) t;Performance tests show the funnel function is 5‑10 × faster and uses 10‑25 % less memory than the traditional JOIN approach.
Query Layer Smart Sampling and Queuing
Smart sampling prevents OOM for large queries by estimating scan size and applying a sampling rate based on thresholds. Query queuing, implemented with Redis, controls concurrency by separating waiting and active queues.
Tag and Crowd Computation with Roaring Bitmap
Hologres Roaring Bitmap functions enable efficient multi‑dimensional tag and crowd calculations, reducing storage 5‑10 × and achieving sub‑second query latency for billions of IDs.
-- Hologres tag table
BEGIN;
CREATE TABLE IF NOT EXISTS rb_tag_table (
name TEXT NOT NULL,
value TEXT NOT NULL DEFAULT '',
bucket BIGINT NOT NULL,
bitmap ROARINGBITMAP
);
CALL set_table_property('rb_tag_table', 'orientation', 'column');
CALL set_table_property('rb_tag_table', 'distribution_key', 'bucket');
CALL set_table_property('rb_tag_table', 'clustering_key', 'name,value');
COMMIT;
-- Hologres crowd table
BEGIN;
CREATE TABLE IF NOT EXISTS rb_crowd_table (
crowd_id TEXT NOT NULL,
bucket BIGINT NOT NULL,
bitmap ROARINGBITMAP
);
CALL set_table_property('rb_crowd_table', 'orientation', 'column');
CALL set_table_property('rb_crowd_table', 'distribution_key', 'bucket');
CALL set_table_property('rb_crowd_table', 'clustering_key', 'crowd_id');
COMMIT;
-- Insert tags from MaxCompute external table
INSERT INTO rb_tag_table
SELECT 'age' AS name, age_v1 AS value, bucket,
rb_build_agg(oneid) AS bitmap
FROM (
SELECT age_v1, id % 64 AS bucket, CAST(id / 64 AS INT) AS oneid
FROM tag_foreign_table
WHERE age_v1 <> ''
) t1
GROUP BY age_v1, bucket;
-- Tag‑based crowd query example
SELECT SUM(rb_cardinality(bitmap)) AS total
FROM rb_tag_table
WHERE name = 'city' AND value = '0';
-- Crowd‑based tag aggregation example
SELECT t1.value AS value, SUM(t1.size) AS total
FROM (
SELECT tag.bucket, tag.value,
rb_cardinality(rb_and(tag.bitmap, crowd.bitmap)) AS size
FROM (
SELECT bucket, value, bitmap
FROM rb_tag_table
WHERE name = 'brand' AND value IN ('xiaomi','oppo')
) tag
JOIN (
SELECT bucket, bitmap
FROM rb_crowd_table
WHERE crowd_id = 'crowd_01'
) crowd ON tag.bucket = crowd.bucket
) t1
GROUP BY t1.value;Conclusion and Outlook
Hologres now powers multiple UMeng+ products, delivering fast, flexible, multi‑dimensional analysis for behavior insight and audience segmentation. Future directions include leveraging materialized views, hot‑cold data separation, and exploring Apache Paimon Streaming LakeHouse to further enhance real‑time OLAP capabilities while balancing performance, cost, and stability.
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
Alibaba Cloud Big Data AI Platform
The Alibaba Cloud Big Data AI Platform builds on Alibaba’s leading cloud infrastructure, big‑data and AI engineering capabilities, scenario algorithms, and extensive industry experience to offer enterprises and developers a one‑stop, cloud‑native big‑data and AI capability suite. It boosts AI development efficiency, enables large‑scale AI deployment across industries, and drives business value.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
