How RoaringBitmap Cut User Profile Analysis from Minutes to Seconds
This article explains how Alibaba's user growth platform leveraged RoaringBitmap in Hologres to accelerate massive user profiling, reducing analysis time from several minutes to around ten seconds by redesigning bitmap storage, optimizing data pipelines, and employing efficient SQL and scheduling techniques.
Business Introduction
Alibaba's user growth team built a technology platform that processes billions of records daily and supports media advertising platforms, A/B testing, and user operations services. Profiling with MaxCompute alone was slow, which prompted a switch to Hologres with RoaringBitmap.
1. Bitmap Data Structure
Bitmap stores data using individual bits, offering high space and compute efficiency. Each element's presence is marked by setting the corresponding bit to 1.
Use one bit to represent a value for a given key.
If the element exists, set the bit to 1; otherwise, set it to 0.
Example: user IDs 3 and 4 are basketball fans; the bitmap marks these IDs with bits set to 1.
Bitmap can transform feature‑user relationships into bit arrays, enabling fast set operations without full table scans.
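The idea can be sketched in a few lines of Python, using an ordinary integer as the bit array. This is only an illustration of the data structure, not the platform's implementation; the `crowd` group and the helper names are made up for the example.

```python
# Minimal bitmap sketch: a Python int serves as an arbitrarily long bit array.

def build_bitmap(user_ids):
    """Set bit i to 1 for every user ID i in user_ids."""
    bitmap = 0
    for uid in user_ids:
        bitmap |= 1 << uid
    return bitmap

def members(bitmap):
    """Return the user IDs whose bits are set, in ascending order."""
    return [i for i in range(bitmap.bit_length()) if (bitmap >> i) & 1]

basketball_fans = build_bitmap([3, 4])   # the example from the text
crowd = build_bitmap([1, 3, 5])          # hypothetical user group

both = basketball_fans & crowd           # AND = intersection
either = basketball_fans | crowd         # OR  = union

print(bin(basketball_fans))  # 0b11000
print(members(both))         # [3]
print(members(either))       # [1, 3, 4, 5]
```

The set operations are single bitwise instructions over the whole array, which is why no row-by-row scan is needed.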
2. RoaringBitmap
RoaringBitmap compresses sparse bitmaps by splitting each 32‑bit integer into a high 16‑bit chunk key and a low 16‑bit value, giving up to 2^16 chunks, each stored in one of three container types:
Array Container : Stores the low 16‑bit values in a sorted array; optimal for sparse chunks (up to 4,096 values).
Bitmap Container : Uses a fixed 65,536‑bit (8 KB) bit array; optimal for dense chunks (more than 4,096 values).
Run Container : Applies run‑length encoding for consecutive integers, storing start value and length.
The algorithm dynamically selects the best container, providing fast set operations (AND, OR, XOR) and excellent query performance.
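The container choice described above can be sketched as follows. This is a simplified model under stated assumptions: the size estimates (2 bytes per array value, 8,192 bytes per bitmap, 4 bytes per run) and the function names are illustrative, and real RoaringBitmap libraries use more detailed rules for when to convert between containers.

```python
from collections import defaultdict

ARRAY_MAX = 4096      # an array container holds at most 4,096 values
BITMAP_BYTES = 8192   # 65,536 bits

def split(value):
    """Split a 32-bit integer into (high 16 bits, low 16 bits)."""
    return value >> 16, value & 0xFFFF

def count_runs(sorted_lows):
    """Count maximal runs of consecutive integers in a sorted list."""
    runs = 0
    prev = None
    for v in sorted_lows:
        if prev is None or v != prev + 1:
            runs += 1
        prev = v
    return runs

def choose_container(lows):
    """Pick the container with the smallest estimated size (simplified heuristic)."""
    lows = sorted(set(lows))
    if len(lows) <= ARRAY_MAX:
        candidates = [("array", 2 * len(lows))]
    else:
        candidates = [("bitmap", BITMAP_BYTES)]
    candidates.append(("run", 4 * count_runs(lows)))  # start + length per run
    return min(candidates, key=lambda c: c[1])[0]

def container_types(values):
    """Group values by their high 16 bits and report each chunk's container type."""
    chunks = defaultdict(list)
    for v in values:
        high, low = split(v)
        chunks[high].append(low)
    return {high: choose_container(lows) for high, lows in sorted(chunks.items())}

values = [1, 5, 9]                                   # sparse chunk
values += range(65536, 65536 + 10000)                # one long consecutive run
values += [2 * 65536 + 2 * k for k in range(5000)]   # dense, no consecutive values
print(container_types(values))  # {0: 'array', 1: 'run', 2: 'bitmap'}
```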
3. Practical Use of RoaringBitmap in the Platform
Profiling workflow:
All tag data are stored in MaxCompute, along with a bitmap index.
Business users select tags and user groups; the platform builds bitmap indexes for the selected crowd.
If both tag and crowd have bitmap indexes, analysis is performed using bitmap operations.
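The last step of this workflow can be illustrated with a small sketch: once a tag and a crowd both have bitmap indexes, the distribution of the crowd over the tag's values is one AND plus a bit count per value. The tag values, user IDs, and function names below are hypothetical, and plain Python integers stand in for RoaringBitmap.

```python
def to_bitmap(user_ids):
    """Build a bitmap (as a Python int) with one bit set per user ID."""
    bitmap = 0
    for uid in user_ids:
        bitmap |= 1 << uid
    return bitmap

def cardinality(bitmap):
    """Number of set bits, i.e. number of users in the bitmap."""
    return bin(bitmap).count("1")

# Hypothetical bitmap index for one tag: tag value -> bitmap of user IDs.
tag_index = {
    "basketball": to_bitmap([3, 4, 8]),
    "football": to_bitmap([1, 2, 5]),
    "tennis": to_bitmap([6, 7]),
}
# Hypothetical crowd selected by a business user.
crowd = to_bitmap([2, 3, 4, 5, 9])

def profile(tag_index, crowd_bitmap):
    """For each tag value, count the crowd members that carry it."""
    return {value: cardinality(bm & crowd_bitmap) for value, bm in tag_index.items()}

print(profile(tag_index, crowd))  # {'basketball': 2, 'football': 2, 'tennis': 0}
```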
4. Solution Comparison
Earlier profiling relied on MaxCompute SQL, which incurred high latency. Directly importing data into Hologres still took > 3 minutes due to data volume. RoaringBitmap, natively supported by Hologres, dramatically reduced query time to seconds.
The performance chart shows that 75% of analyses finish within 10 seconds, with some cases achieving a speedup of more than 20×.
5. Core Process Details
UID bucket design splits a 64‑bit user ID into a high‑order bucket (44 bits) and a low‑order part (20 bits) to fit Hologres' 32‑bit RoaringBitmap functions.
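A minimal sketch of this split, assuming the 20‑bit low part stated above: the high-order bits become the bucket, the low 20 bits go into that bucket's bitmap, and an intersection count is the sum of per-bucket intersection counts. Function names are illustrative, and Python integers stand in for the 32‑bit RoaringBitmaps.

```python
from collections import defaultdict

LOW_BITS = 20
LOW_MASK = (1 << LOW_BITS) - 1  # 1048575

def split_uid(uid):
    """Split a 64-bit user ID into (bucket, low 20 bits)."""
    return uid >> LOW_BITS, uid & LOW_MASK

def join_uid(bucket, low):
    """Inverse of split_uid."""
    return (bucket << LOW_BITS) | low

def build_bucketed(uids):
    """Build one bitmap per bucket; each bitmap only holds the low 20 bits."""
    buckets = defaultdict(int)
    for uid in uids:
        bucket, low = split_uid(uid)
        buckets[bucket] |= 1 << low
    return dict(buckets)

def and_cardinality(a, b):
    """Intersection size: sum the per-bucket AND counts over shared buckets."""
    return sum(bin(a[k] & b[k]).count("1") for k in a.keys() & b.keys())

print(split_uid(5_242_887))  # (5, 7)

tag_users = build_bucketed([7, 5_242_887, 9_000_000_000])
crowd = build_bucketed([7, 8, 9_000_000_000])
print(and_cardinality(tag_users, crowd))  # 2
```

Because two IDs can only be equal when their buckets are equal, bitmaps from different buckets never need to be compared, which keeps every individual bitmap within the 32‑bit range.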
Asynchronous result handling returns a task ID when the query exceeds the timeout, allowing the client to poll for completion.
Task scheduling uses a DAG‑based workflow to orchestrate bitmap construction, synchronization to Hologres, and query execution.
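The ordering constraint behind such a workflow can be sketched with Python's standard `graphlib` module (Python 3.9+). The task names and the exact dependency edges are assumptions for illustration; the source only states that bitmap construction, synchronization, and query execution are orchestrated as a DAG.

```python
from graphlib import TopologicalSorter

# Hypothetical task graph: task -> set of tasks it depends on.
dag = {
    "build_tag_bitmap": set(),
    "build_crowd_bitmap": set(),
    "sync_to_hologres": {"build_tag_bitmap", "build_crowd_bitmap"},
    "run_profile_query": {"sync_to_hologres"},
}

def run(dag, action):
    """Execute every task after all of its dependencies; return the order used."""
    order = []
    sorter = TopologicalSorter(dag)
    sorter.prepare()  # raises CycleError if the graph is not a DAG
    while sorter.is_active():
        for task in sorter.get_ready():  # tasks whose dependencies are all done
            action(task)
            order.append(task)
            sorter.done(task)
    return order

order = run(dag, lambda task: print("running", task))
```

Tasks returned together by `get_ready()` have no dependency on each other, so a real scheduler could run them in parallel, for example the two bitmap builds here.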
6. Key SQL Snippets
public class AccelerateConfigDTO {
    private Boolean isAccelerationCompleted;
    private Boolean isAccelerationEnabled;
    private String lastDs;
}

-- MaxCompute SQL to create bitmap table
CREATE TABLE IF NOT EXISTS demo_table (
    field_value STRING COMMENT 'tag value',
    bucket      BIGINT COMMENT 'bucket',
    bitmap      BINARY COMMENT 'uid bitmap'
) PARTITIONED BY (ds STRING, label_id STRING) LIFECYCLE 365;

INSERT OVERWRITE TABLE demo_table PARTITION (ds='${bizdate}', label_id='${label_id}')
SELECT
    COALESCE(${label_field}, 'NULL') AS field_value,
    -- high-order bits of the UID: the bucket
    SHIFTRIGHT(CAST(COALESCE(${uidField}, '0') AS BIGINT), 20) AS bucket,
    -- low 20 bits of the UID (1048575 = 2^20 - 1), aggregated into one bitmap per group
    ENCODE(mc_rb_build_agg(CAST(COALESCE(${uidField}, '0') AS BIGINT) & 1048575), 'utf-8') AS bitmap
FROM ${dataSource}.${dataTable}
WHERE ds = MAX_PT('${dataSource}.${dataTable}')
  AND CAST(COALESCE(${uidField}, '0') AS BIGINT) > 0
GROUP BY
    COALESCE(${label_field}, 'NULL'),
    SHIFTRIGHT(CAST(COALESCE(${uidField}, '0') AS BIGINT), 20);

-- Hologres SQL for profiling
SELECT
    ${label_alias} AS "${label_name}",
    -- per-bucket intersection sizes, summed across buckets; "人数" means "number of users"
    SUM(rb_and_cardinality(t1.bitmap, t2.bitmap)) AS "人数"
FROM (
    SELECT bucket, bitmap
    FROM public.holo_crowd_table
    WHERE crowd_id = '${crowd_id}'
) t1
JOIN (
    SELECT field_value, bucket, bitmap
    FROM public.holo_table
    WHERE ds = '${label_ds}' AND label_id = '${label_id}'
) t2 ON t1.bucket = t2.bucket
GROUP BY ${label_alias};

Conclusion
By adopting RoaringBitmap in Hologres, the platform reduced user‑profile analysis from minutes to seconds for billions of users, achieving speedups of tens of times along with significant compute‑resource savings.
Alibaba Cloud Big Data AI Platform
The Alibaba Cloud Big Data AI Platform builds on Alibaba’s leading cloud infrastructure, big‑data and AI engineering capabilities, scenario algorithms, and extensive industry experience to offer enterprises and developers a one‑stop, cloud‑native big‑data and AI capability suite. It boosts AI development efficiency, enables large‑scale AI deployment across industries, and drives business value.
