Big Data 13 min read

Speed Up Blacklist Filtering to Milliseconds with Bitmaps and Multithreading

The article examines performance bottlenecks in marketing systems when filtering blacklisted accounts, proposes multithreaded processing and bitmap-based set operations—including Java BitSet and RoaringBitmap—to dramatically reduce processing time from tens of minutes to milliseconds, and discusses implementation details, memory efficiency, and broader applications such as indexing and Redis.

JD Cloud Developers
JD Cloud Developers
JD Cloud Developers
Speed Up Blacklist Filtering to Milliseconds with Bitmaps and Multithreading

Background

In marketing systems, customer complaints hinder business growth; typically high‑risk accounts are filtered into a blacklist and combined with frequency‑control strategies to reduce complaints, improve efficiency, lower costs, and enhance quality.

Marketing platforms use big‑data modeling via a Customer Data Platform (CDP) to create target audiences, with blacklists also maintained in the CDP. For a group of ~300,000 accounts, the end‑to‑end reach process took over an hour, and blacklist filtering alone consumed nearly half an hour, which is unacceptable.

Performance Optimization

Introducing Multithreading

Since RPC calls involve I/O, splitting a large account set into multiple tasks submitted to a thread pool dramatically speeds up processing. Benchmarks show single‑threaded handling of 250k and 500k accounts takes ~30 min and ~1 h respectively, while 25 threads reduce these to about 1 min and 2 min.

Introducing Bitmap Optimization

CDP stores audience and blacklist groups as bitmap files. By downloading these bitmaps via the CDP SDK, a simple bitwise AND‑NOT operation removes blacklist members from the target audience.

DataLoader dataLoader = new DataLoader(token, bitMapBaseUrl);
ABitmap customerBitmap = dataLoader.loadGroup(customerGroupCode);
ABitmap blacklistBitmap = dataLoader.loadGroup(blacklistGroupCode);
customerBitmap.andNot(blacklistBitmap);

Bitmap storage is space‑efficient (≈2 MB for 500k accounts) and the AND‑NOT operation runs in ~80 ms, a dramatic improvement over the previous half‑hour.

Bitmap Basics

Principles

A bitmap uses a single bit to represent the presence (1) or absence (0) of a value, enabling massive space savings. For example, storing 4 billion numbers as 64‑bit longs would require ~30 GB, whereas as bits it needs only ~0.5 GB.

In Java, a long array (e.g., long[] words = new long[(nBits‑1)/64 + 1]) holds the bits, where each element represents 64 consecutive values. Index and bit position are computed via division and modulo (or bit shifts).

Adding a value involves setting the corresponding bit with a left‑shifted 1 and OR‑ing it into the array; removing uses a left‑shifted 1, NOT, and AND; searching checks the bit with an AND operation.

The Java BitSet class implements these ideas, using bit‑shifts instead of division/modulo for speed.

RoaringBitmap

Standard BitSet can waste memory; storing a single value like 200 000 000 may consume ~23 MB. Compressed bitmaps such as RoaringBitmap achieve far better compression (e.g., the same value uses only 144 B) and can be hundreds of times faster. RoaringBitmap splits a 32‑bit integer into high‑16 and low‑16 bits. High‑16 bits index a container; low‑16 bits are stored either in an ArrayContainer (for < 4096 elements) or a BitmapContainer (for ≥ 4096). A RunContainer also exists but is less common.

For 64‑bit values, Roaring64NavigableMap stores high‑32 and low‑32 bits similarly; CDP’s audience bitmaps are based on this implementation.

Bitmap Application Scenarios

Bitmaps efficiently represent massive boolean sets, support fast bitwise operations (AND, OR, XOR), and enable constant‑time existence checks. They are useful for de‑duplication, sorting, counting, and membership queries on billions of items, as well as in interview problems.

Java Use Cases

Re‑implementing Collection.removeIf with a bitmap reduces overhead by marking elements to delete in the first pass and physically removing them in the second pass.

Bitmap Indexes

Bitmap indexes excel for low‑cardinality columns in data warehouses, accelerating queries via bitwise logic, though they are unsuitable for high‑cardinality or frequently updated data.

Redis Bitmaps

Redis stores bitmaps as strings, enabling compact representation of user states, daily sign‑ins, active‑user counts, and supporting rich bit‑operations for features like A/B testing, while noting string length limits and memory considerations.

Bloom Filters

When dealing with non‑numeric keys (e.g., URLs, usernames), Bloom filters map elements to bits using multiple hash functions, offering high space efficiency and fast membership queries with a controllable false‑positive rate, useful for web crawling, cache protection, and similar scenarios.

Conclusion

By introducing bitmap data structures and multithreaded processing, blacklist filtering in marketing systems can be reduced from minutes‑hour scale to milliseconds, delivering substantial memory savings and performance gains for large‑scale data workloads.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Sign in to view source
Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactadmin@besthub.devand we will review it promptly.

Performance OptimizationBitmapmultithreading
JD Cloud Developers
Written by

JD Cloud Developers

JD Cloud Developers (Developer of JD Technology) is a JD Technology Group platform offering technical sharing and communication for AI, cloud computing, IoT and related developers. It publishes JD product technical information, industry content, and tech event news. Embrace technology and partner with developers to envision the future.

0 followers
Reader feedback

How this landed with the community

Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.