Understanding BitMap, BitSet, and Bloom Filter: Principles, Operations, and Applications
This article explains the principles of BitMap, BitSet, and Bloom Filter data structures, illustrating how they use bits to efficiently store, add, delete, and query large sets of integers, and discusses their practical applications such as fast sorting, deduplication, and membership testing.
The article introduces three bit‑based data structures—BitMap, BitSet, and Bloom Filter—and shows how they exploit individual bits to represent the presence or absence of integer values, achieving significant memory savings.
BitMap uses one bit per possible value. For example, storing 2 billion integers with 32‑bit ints would require about 7.45 GB, whereas a BitMap needs only ~0.233 GB (2 billion ÷ 8 ÷ 1024³). Adding a number is done by left‑shifting 1 to the target position and applying a bitwise OR:
int index = i / 32; // array index<br/>int offset = i % 32; // bit position<br/>bitmap[index] |= (1 << offset);Removing a number clears the corresponding bit with a bitwise AND of the negated mask: bitmap[index] &= ~(1 << offset); Checking existence simply tests the bit: boolean exists = (bitmap[index] & (1 << offset)) != 0; Typical uses of BitMap include:
Fast sorting of a dense range of integers by setting bits and then scanning them in order (O(n) time).
Deduplication of massive integer streams by counting bits set to 1.
Constant‑time membership queries.
Advantages are high speed and low memory; drawbacks are inability to handle duplicate values and reduced benefit when data is sparse.
BitSet is a Java class that implements a dynamically growing bit vector. Each bit can be set, cleared, or queried, and the underlying storage is a long array. A typical operation looks like:
int wordIndex = wordIndex(bitIndex);<br/>words[wordIndex] |= (1L << bitIndex);This mirrors the manual BitMap logic but provides a convenient API.
Bloom Filter is a probabilistic data structure for set membership testing. It uses multiple hash functions to map an element to several bit positions in a bit array, setting those bits on insertion. A query checks whether all those bits are 1; if any is 0, the element is definitely absent, otherwise it may be present.
The workflow consists of:
Choose k hash functions.
Initialize an n -bit array to 0.
On insertion, hash the element k times and set the corresponding bits.
On query, hash the element k times and verify that all bits are 1.
A typical Maven dependency for a Bloom Filter implementation (Google Guava) is:
<dependency><br/> <groupId>com.google.guava</groupId><br/> <artifactId>guava</artifactId><br/> <version>28.1-jre</version><br/></dependency>Bloom Filters are widely used for large‑scale scenarios where memory is limited and occasional false positives are acceptable, such as cache‑penetration protection, URL deduplication, and spam detection.
In summary, BitMap, BitSet, and Bloom Filter demonstrate how bit‑level manipulation can dramatically reduce memory usage while providing fast operations for storage, insertion, deletion, and membership testing in big‑data contexts.
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
Java Architect Essentials
Committed to sharing quality articles and tutorials to help Java programmers progress from junior to mid-level to senior architect. We curate high-quality learning resources, interview questions, videos, and projects from across the internet to help you systematically improve your Java architecture skills. Follow and reply '1024' to get Java programming resources. Learn together, grow together.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
