Fundamentals 12 min read

How Bitmaps and Bloom Filters Supercharge Massive Data Lookups

This article explains the bitmap concept, demonstrates how to store and query billions of integers using bit-level operations, introduces BitSet for dynamic bit vectors, and describes Bloom filters as probabilistic structures for fast, memory‑efficient membership testing in large‑scale data scenarios.

Programmer DD
Programmer DD
Programmer DD
How Bitmaps and Bloom Filters Supercharge Massive Data Lookups

1. BitMap

The basic idea of a bitmap is to use a single bit to mark the presence of a value, with the key being the element itself, which greatly reduces storage space.

For example, to check whether a number m exists among 2 billion random integers on a 32‑bit system with 4 GB memory, storing each integer as an int would require about 7.45 GB, while storing a single bit per integer needs only about 0.233 GB.

To represent a set such as {1,2,4,6}, each number corresponds to a bit position; 1 means present, 0 means absent.

Adding a Number

To insert the number 5, compute 5/32 = 0 and 5%32 = 5, then set the 5th bit of tmp[0]:

In code this is b[0] = b[0] | (1<<5).

Removing a Number

To delete the number 6, clear its bit:

The operation is b[0] = b[0] & (~(1<<6)).

Searching

To test whether a number exists, check the corresponding bit; e.g., b[0] & (1<<3) is non‑zero if 3 is present.

2. Bitmap Applications

Bitmaps are useful for fast sorting, searching, and deduplication of large data sets when the key space is dense.

High computational efficiency without comparisons or shifts.

Low memory usage; for N = 10 000 000, only N/8 ≈ 1.25 MB is needed.

Drawbacks include inability to handle duplicate keys and advantage only when data is relatively dense.

3. BitSet

Java's BitSet implements a growable bit vector where each bit holds a boolean value. Bits can be set, cleared, or queried, and logical operations can combine multiple BitSet instances.

Typical Usage

int wordIndex = wordIndex(bitIndex);
words[wordIndex] |= (1L << bitIndex);

4. Bloom Filters

A Bloom filter is a probabilistic data structure that tests set membership with high speed and low memory consumption. It can definitively say an element is not present, or possibly present.

The core is a bit array; K hash functions map each element to K positions, which are set to 1 on insertion. Membership is checked by verifying that all K bits are 1.

Choose K hash functions.

Initialize an n‑bit array to 0.

On insertion, hash the key K times and set the corresponding bits to 1.

On query, hash the key K times; if all bits are 1, the element may be present, otherwise it is definitely absent.

Adding Bloom Filter to a Project (Maven)

<dependency>
    <groupId>com.google.guava</groupId>
    <artifactId>guava</artifactId>
    <version>28.1-jre</version>
</dependency>

Swap Two Variables Without a Temporary Variable

1 // Method 1
2 a = a + b;
3 b = a - b;
4 a = a - b;
5 
6 // Method 2
7 a = a ^ b;
8 b = a ^ b;
9 a = a ^ b;
Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Sign in to view source
Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactadmin@besthub.devand we will review it promptly.

algorithmMemory OptimizationBitmapBloom Filterdata-structures
Programmer DD
Written by

Programmer DD

A tinkering programmer and author of "Spring Cloud Microservices in Action"

0 followers
Reader feedback

How this landed with the community

Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.