
How RoaringBitmap Supercharged Lazada’s Selection Platform and Cut Processing Time by 99%

This article explains how Lazada’s internal selection platform leveraged Hologres and the RoaringBitmap compression algorithm to dramatically reduce storage costs, accelerate set operations, and break the 200,000‑item pool limit, achieving up to a 99% speed improvement in scheduling.

Alibaba Cloud Big Data AI Platform

Platform Overview

Lazada’s selection platform aggregates merchants and products across the entire network. Using Hologres’ RoaringBitmap capability, it broke the 200k‑item pool limit, cut the scheduling time for more than 6,000 pools from 12 hours to 1 hour, and cut a single pool’s processing time from 90 seconds to 2 seconds.

The platform serves internal operators. It selects and supplies business entities (products, merchants) according to rule‑based criteria and feeds them into marketing channels such as flash sales and JFY (Just For You).

Hologres and RoaringBitmap Basics

Hologres is a real‑time data‑warehouse engine supporting massive data ingestion, updates, and analytics with standard SQL (PostgreSQL compatible). It offers PB‑scale multidimensional analysis, low‑latency serving, and tight integration with MaxCompute, Flink, and DataWorks.

Traditional bitmaps use one bit per possible value, which wastes space when the data is sparse. RoaringBitmap compresses sparse bitmaps by partitioning 32‑bit integers on their high 16 bits into up to 65,536 buckets (containers); each container then stores only the low 16 bits of the values that fall into it, and containers with no values are never allocated.
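To make the high/low split concrete, here is a minimal sketch in plain Java. It only illustrates the arithmetic and does not reproduce the RoaringBitmap library's internals; the class and method names (ContainerSplit, containerKey, lowBits, combine) are made up for this example.

```java
public class ContainerSplit {
    /** Container key: the high 16 bits of a 32-bit value, read as unsigned (0..65535). */
    static int containerKey(int value) {
        return value >>> 16;
    }

    /** Position inside the container: the low 16 bits of the value (0..65535). */
    static int lowBits(int value) {
        return value & 0xFFFF;
    }

    /** Rebuilds the original 32-bit value from its container key and low bits. */
    static int combine(int key, int low) {
        return (key << 16) | low;
    }

    public static void main(String[] args) {
        int[] samples = {0, 65535, 65536, 100000, 150000, 199999};
        for (int v : samples) {
            System.out.println(v + " -> container " + containerKey(v)
                    + ", low bits " + lowBits(v));
        }
    }
}
```

For example, 150000 falls into container 2 with low bits 18928, and every value in the range 100000 to 199999 falls into containers 1 through 3, so a set covering that range touches only three of the 65,536 possible containers.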

Performance Tests

The following Java snippets compare intersection and difference operations between RoaringBitmap and standard Java lists.

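import java.util.ArrayList;
import java.util.List;
import org.roaringbitmap.RoaringBitmap;

public class IntersectionTest {
    public static void main(String[] args) {
        // Bitmap 1 holds [100000, 200000); bitmap 2 holds [150000, 200000).
        RoaringBitmap rbAnd1 = new RoaringBitmap();
        for (int k = 100000; k < 200000; k++) {
            rbAnd1.add(k);
        }
        RoaringBitmap rbAnd2 = new RoaringBitmap();
        for (int k = 150000; k < 200000; k++) {
            rbAnd2.add(k);
        }
        long start = System.currentTimeMillis();
        rbAnd1.and(rbAnd2); // in place: rbAnd1 becomes the intersection
        System.out.println("roaringBitmap and elapsed: " + (System.currentTimeMillis() - start) + " ms");

        // The same two ranges as lists of boxed Integers.
        List<Integer> list1 = new ArrayList<>();
        for (int k = 100000; k < 200000; k++) {
            list1.add(k);
        }
        List<Integer> list2 = new ArrayList<>();
        for (int k = 150000; k < 200000; k++) {
            list2.add(k);
        }
        start = System.currentTimeMillis();
        list1.retainAll(list2); // in place: list1 keeps only elements also in list2
        System.out.println("list and elapsed: " + (System.currentTimeMillis() - start) + " ms");
    }
}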

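Result: RoaringBitmap intersection took 1 ms, while the list took 3,056 ms. Much of the gap comes from ArrayList.retainAll calling contains on the second list for every element, which is a linear scan each time.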

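import java.util.ArrayList;
import java.util.List;
import org.roaringbitmap.RoaringBitmap;

public class DifferenceTest {
    public static void main(String[] args) {
        // Bitmap 1 holds [100000, 200000); bitmap 2 holds [150000, 200000).
        RoaringBitmap rbAnd1 = new RoaringBitmap();
        for (int k = 100000; k < 200000; k++) {
            rbAnd1.add(k);
        }
        RoaringBitmap rbAnd2 = new RoaringBitmap();
        for (int k = 150000; k < 200000; k++) {
            rbAnd2.add(k);
        }
        long start = System.currentTimeMillis();
        rbAnd1.andNot(rbAnd2); // in place: rbAnd1 becomes the difference
        System.out.println("roaringBitmap andNot elapsed: " + (System.currentTimeMillis() - start) + " ms");

        // The same two ranges as lists of boxed Integers.
        List<Integer> list1 = new ArrayList<>();
        for (int k = 100000; k < 200000; k++) {
            list1.add(k);
        }
        List<Integer> list2 = new ArrayList<>();
        for (int k = 150000; k < 200000; k++) {
            list2.add(k);
        }
        start = System.currentTimeMillis();
        list1.removeAll(list2); // in place: list1 drops every element found in list2
        System.out.println("list andNot elapsed: " + (System.currentTimeMillis() - start) + " ms");
    }
}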

Result: RoaringBitmap difference took 1 ms, while the list took 3,350 ms.
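Part of the reason bitmap set operations are so cheap is that they reduce to bitwise operations over whole machine words instead of per‑element lookups. The sketch below shows the same intersection and difference with java.util.BitSet from the JDK. BitSet is an uncompressed bitmap, used here only as a stand‑in to illustrate the idea; it is not how RoaringBitmap stores its data, and the class and helper names are invented for this example.

```java
import java.util.BitSet;

public class BitSetSetOps {
    /** Returns a BitSet with every bit in [from, to) set. */
    static BitSet range(int from, int to) {
        BitSet bits = new BitSet(to);
        bits.set(from, to);
        return bits;
    }

    /** Intersection: a bitwise AND, leaving both inputs unchanged. */
    static BitSet intersection(BitSet a, BitSet b) {
        BitSet result = (BitSet) a.clone();
        result.and(b);
        return result;
    }

    /** Difference (a minus b): a bitwise AND-NOT, leaving both inputs unchanged. */
    static BitSet difference(BitSet a, BitSet b) {
        BitSet result = (BitSet) a.clone();
        result.andNot(b);
        return result;
    }

    public static void main(String[] args) {
        BitSet a = range(100000, 200000);
        BitSet b = range(150000, 200000);
        System.out.println("intersection size: " + intersection(a, b).cardinality());
        System.out.println("difference size: " + difference(a, b).cardinality());
    }
}
```

With the ranges used in the tests above, both results contain 50,000 values: 150000 to 199999 for the intersection and 100000 to 149999 for the difference.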

Practical Implementation

Multi‑value fields in the product wide table were originally stored as arrays, causing high CPU usage during intersection queries. Replacing these arrays with RoaringBitmap fields reduced storage size by 30‑40%, decreased file count, and lowered Hologres CPU utilization by about 30%.

Table creation example for a RoaringBitmap column:

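-- The roaringbitmap extension must be enabled once per database before the type can be used.
CREATE EXTENSION roaringbitmap;

BEGIN;
CREATE TABLE public.test (
    product_id bigint NOT NULL,
    product_categories roaringbitmap,  -- multi-value field, previously an array
    ds text NOT NULL,                  -- partition key
    PRIMARY KEY (product_id, ds)
) PARTITION BY LIST (ds);
-- Hybrid row-column storage, distributed by product_id.
CALL set_table_property('public.test', 'orientation', 'column,row');
CALL set_table_property('public.test', 'distribution_key', 'product_id');
END;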

Selection Pool Operations

The selection pool stores the IDs of entities that meet the rule criteria. To avoid data redundancy and improve scheduling, the pool data was moved into Hologres RoaringBitmap fields. Because the RoaringBitmap type in Hologres only holds 32‑bit integers while entity IDs are 64‑bit, a bucket‑based scheme was used: the high 34 bits of an ID become the bucket number, and the low 30 bits are stored in that bucket's bitmap.
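The bucket scheme can be sketched in plain Java as follows. This is an illustration of the bit arithmetic only, under the assumption that IDs are non‑negative 64‑bit values; the class and method names are hypothetical, and grouping the offsets into a sorted set per bucket merely stands in for the per‑bucket bitmap that the platform stores in Hologres.

```java
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeMap;
import java.util.TreeSet;

public class PoolIdBuckets {
    static final int LOW_BITS = 30;                    // bits kept inside the bitmap
    static final long LOW_MASK = (1L << LOW_BITS) - 1; // 0x3FFFFFFF

    /** Bucket number: the high 34 bits of the 64-bit ID. */
    static long bucketOf(long id) {
        return id >>> LOW_BITS;
    }

    /** Value stored in the bucket's bitmap: the low 30 bits, which always fit in an int. */
    static int offsetOf(long id) {
        return (int) (id & LOW_MASK);
    }

    /** Rebuilds the original ID from a bucket number and an offset. */
    static long restore(long bucket, int offset) {
        return (bucket << LOW_BITS) | (offset & LOW_MASK);
    }

    /** Groups IDs by bucket; each sorted set stands in for one 32-bit bitmap. */
    static Map<Long, SortedSet<Integer>> groupByBucket(long[] ids) {
        Map<Long, SortedSet<Integer>> buckets = new TreeMap<>();
        for (long id : ids) {
            buckets.computeIfAbsent(bucketOf(id), k -> new TreeSet<>()).add(offsetOf(id));
        }
        return buckets;
    }

    public static void main(String[] args) {
        long[] ids = {5L, (1L << 30) + 7, 3_000_000_000L};
        groupByBucket(ids).forEach((bucket, offsets) ->
                System.out.println("bucket " + bucket + " -> " + offsets));
    }
}
```

Set operations between two pools then run bucket by bucket: only bitmaps that share a bucket number need to be intersected or subtracted, and restore maps the surviving offsets back to full IDs.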

Overall benefits after the migration:

Data flow efficiency: 6,000+ pool schedules reduced from 12 h to 1 h; single pool time from 90 s to 2 s.

System stability: Full GC occurrences dropped by 88%.

Storage cost: Eliminated redundancy, saving ~75 GB.

Business breakthrough: Pool size limit increased from 200,000 to 500,000 items.

Conclusion

Adopting RoaringBitmap in Lazada’s selection platform dramatically improved both space and time efficiency, enabling the platform to overcome previous capacity constraints and delivering a clear example of how advanced data‑structure techniques can empower large‑scale e‑commerce operations.
