Big Data 18 min read

Accelerating User Profile Analysis with Hologres RoaringBitmap

The article explains how Hologres RoaringBitmap compresses user ID sets into efficient bitmap indexes, splits 64‑bit IDs into buckets, syncs them from MaxCompute, and enables sub‑second user portrait queries that previously took minutes, dramatically improving performance and scalability.

DaTaobao Tech
DaTaobao Tech
DaTaobao Tech
Accelerating User Profile Analysis with Hologres RoaringBitmap

This article introduces the use of Hologres RoaringBitmap to speed up user portrait analysis, turning multi‑minute queries into sub‑second responses for massive data volumes.

Bitmap principle : A bitmap stores a set of IDs as bits, offering high space and compute efficiency. RoaringBitmap compresses sparse bitmaps by dividing the 32‑bit space into 2^16 chunks and using three container types (Array, Bitmap, Run) to store the low 16 bits.

RoaringBitmap containers :

Array Container – optimal for sparse data (< 4096 values). Bitmap Container – optimal for dense data (> 4096 values). Run Container – uses run‑length encoding for consecutive values.

System architecture : The platform extracts tag and crowd data from MaxCompute, builds bitmap indexes, syncs them to Hologres, and runs analysis queries using RoaringBitmap functions. A DAG scheduler orchestrates daily bucket construction and data synchronization.

UID bucket design : Because Hologres RoaringBitmap functions only support 32‑bit integers, 64‑bit user IDs are split into a high‑level (44 bits) and a low‑level (20 bits). The low‑level part is stored in the bitmap, while the high‑level part determines the bucket, dramatically reducing I/O.

Key SQL snippets :

-- MaxCompute table creation
CREATE TABLE IF NOT EXISTS demo_table(
field_value STRING COMMENT 'tag value',
bucket BIGINT COMMENT 'bucket',
bitmap BINARY COMMENT 'uid bitmap'
) PARTITIONED BY (ds STRING, label_id STRING) LIFECYCLE 365;
-- Insert bitmap data
INSERT OVERWRITE TABLE demo_table PARTITION(ds='${bizdate}', label_id='${label_id}')
SELECT COALESCE(${label_field}, 'NULL') AS field_value,
SHIFTRIGHT(CAST(COALESCE(${uidField}, '0') AS BIGINT), 20) AS bucket,
ENCODE(mc_rb_build_agg(CAST(COALESCE(${uidField}, '0') AS BIGINT) & 1048575), 'utf-8') AS bitmap
FROM ${dataSource}.${dataTable}
WHERE ds=MAX_PT('${dataSource}.${dataTable}') AND CAST(COALESCE(${uidField}, '0') AS BIGINT) > 0
GROUP BY COALESCE(${label_field}, 'NULL'), SHIFTRIGHT(CAST(COALESCE(${uidField}, '0') AS BIGINT), 20);
-- Hologres foreign table for bitmap sync
CREATE FOREIGN TABLE IF NOT EXISTS public.foreign_table(
field_value TEXT,
bucket INT8,
bitmap BYTEA,
ds TEXT,
label_id TEXT
) SERVER odps_server OPTIONS (project_name '${projectName}', table_name 'foreign_table_${index}');
-- Hologres internal table
CREATE TABLE IF NOT EXISTS public.holo_table(
field_value TEXT,
bucket INT8,
bitmap ROARINGBITMAP,
ds TEXT,
label_id TEXT,
PRIMARY KEY (ds, label_id, bucket, field_value)
) PARTITION BY LIST(ds);
-- Analysis query
SELECT ${label_alias} AS "${label_name}",
SUM(rb_and_cardinality(t1.bitmap, t2.bitmap)) AS "count"
FROM (SELECT bucket, bitmap FROM public.holo_crowd_table WHERE crowd_id='${crowd_id}') t1
JOIN (SELECT field_value, bucket, bitmap FROM public.holo_table WHERE ds='${label_ds}' AND label_id='${label_id}') t2
ON t1.bucket = t2.bucket
GROUP BY ${label_alias};

Java DTO example used for acceleration configuration:

public class AccelerateConfigDTO {
private Boolean isAccelerationCompleted;
private Boolean isAccelerationEnabled;
private String lastDs;
}

Performance results : Over 75% of queries finish within 10 seconds, with some cases dropping from minutes to seconds. The bitmap solution accounts for ~60% of total latency, while the remaining time is spent in ODPS (MaxCompute) when the bitmap cannot fully cover the workload.

Conclusion : By leveraging Hologres RoaringBitmap, the platform achieved order‑of‑magnitude speedups for user portrait analysis, reduced resource consumption, and provided a scalable solution for future data growth.

performancebig dataSQLHologresRoaringBitmapbitmap index
DaTaobao Tech
Written by

DaTaobao Tech

Official account of DaTaobao Technology

0 followers
Reader feedback

How this landed with the community

login Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.