Fundamentals 12 min read

Understanding Bitmap, BitSet, and Bloom Filter: Memory‑Efficient Data Structures for Large‑Scale Data

This article explains the bitmap concept, demonstrates how to store and query billions of integers using bitwise operations, introduces BitSet and Bloom Filter implementations, and provides practical Java code snippets for adding, removing, and checking elements while highlighting their memory‑saving advantages.

Top Architect
Top Architect
Top Architect
Understanding Bitmap, BitSet, and Bloom Filter: Memory‑Efficient Data Structures for Large‑Scale Data

Bitmap uses a single bit to represent the presence of an element, allowing massive space savings when storing large sets of integers. For example, storing 2 billion integers as 32‑bit ints would require about 7.45 GB, whereas a bit‑wise representation needs only about 0.23 GB.

Each bit corresponds to a number: 0 means absent, 1 means present. An integer array int tmp[1+N/32] can hold bits for numbers up to N , where the index is M/32 and the position within the int is M%32 .

Adding an Element

To insert a number i , compute i/32 for the array index and i%32 for the bit position, then set the bit with array[idx] |= (1 << pos) . The article shows the example of inserting 5, resulting in 86 | (1<<5) = 118 .

Removing an Element

To delete a number, clear its bit: array[idx] &= ~(1 << pos) . For removing 6, the operation becomes b[0] = b[0] & (~(1<<6)) .

Searching for an Element

Check existence by testing the bit: b[0] & (1<<3) – a non‑zero result means the number exists.

Bitmap Applications

Bitmap is useful for fast sorting, deduplication, and lookup of large, dense key ranges. Sorting can be done by setting bits for each element and then scanning the bit array, achieving O(n) time. Deduplication works similarly by counting bits set to a specific state.

Advantages

High computational efficiency without comparisons or shifts.

Low memory consumption (e.g., 10 million items need only 1.25 MB).

Disadvantages

Cannot handle duplicate values.

Benefits appear only when the data set is dense.

BitSet

Java's BitSet is a growable bit vector where each bit represents a non‑negative integer. It supports setting, clearing, and querying bits. Core operations include:

int wordIndex = wordIndex(bitIndex);
words[wordIndex] |= (1L << bitIndex);

Images in the original article illustrate the internal layout of a BitSet stored as a long array.

Bloom Filter

A Bloom filter is a probabilistic bit‑vector structure that quickly tests set membership with low memory usage. It uses k hash functions to set k bits on insertion; a query checks whether all those bits are 1, indicating possible presence.

Typical use cases include email spam detection, URL deduplication, and cache‑penetration protection.

Bloom Filter Process

Choose k hash functions.

Initialize an n -bit array to 0.

On insertion, hash the key with each function and set the corresponding bits to 1.

On query, hash the key and verify that all k bits are 1; if any is 0, the element is definitely absent.

Example Maven dependency for Guava's Bloom filter implementation:

<dependency>
    <groupId>com.google.guava</groupId>
    <artifactId>guava</artifactId>
    <version>28.1-jre</version>
</dependency>

The article concludes that bitmap‑based structures are valuable for fast retrieval of contiguous or near‑contiguous key ranges, especially when memory efficiency is critical.

JavaMemory OptimizationbitmapData Structurebloom filterBitSet
Top Architect
Written by

Top Architect

Top Architect focuses on sharing practical architecture knowledge, covering enterprise, system, website, large‑scale distributed, and high‑availability architectures, plus architecture adjustments using internet technologies. We welcome idea‑driven, sharing‑oriented architects to exchange and learn together.

0 followers
Reader feedback

How this landed with the community

login Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.