Understanding Bloom Filters: Theory, Implementation, and Applications
Bloom filters are space‑efficient probabilistic structures that test set membership using multiple hash functions, offering fast, low‑memory checks with a controllable false‑positive rate, and can be implemented manually in Java, via Guava’s library, or deployed at scale with RedisBloom for distributed applications.
Bloom filters are probabilistic data structures designed to test set membership efficiently, especially for massive data sets where a small false‑positive rate is acceptable.
What is a Bloom Filter?
A Bloom filter consists of a large bit array and multiple independent hash functions. When an element is added, each hash function maps the element to a bit position, which is set to 1. To query an element, the same hash functions are applied; if all corresponding bits are 1, the element is probably present, otherwise it is definitely absent.
Principle
Adding an element:
Compute hash values using several hash functions.
Set the bits at the resulting indices to 1.
Querying an element:
Compute the same hash values.
Check whether all bits at those indices are 1.
The structure is space‑efficient (e.g., 1 000 000 bits ≈ 122 KB) but does not support deletion and incurs a false‑positive probability that grows with the number of inserted items.
Typical Use Cases
Cache‑penetration protection – quickly reject requests for non‑existent keys.
Large‑scale deduplication – e.g., URL crawling, email spam lists, IP blacklists.
Membership checks in distributed systems where exact storage is prohibitive.
Java Manual Implementation
The following Java class demonstrates a simple Bloom filter implementation using a BitSet and custom hash functions.
import java.util.BitSet;
public class MyBloomFilter {
private static final int DEFAULT_SIZE = 2 << 24; // bit array size
private static final int[] SEEDS = new int[]{3, 13, 46, 71, 91, 134};
private BitSet bits = new BitSet(DEFAULT_SIZE);
private SimpleHash[] func = new SimpleHash[SEEDS.length];
public MyBloomFilter() {
for (int i = 0; i < SEEDS.length; i++) {
func[i] = new SimpleHash(DEFAULT_SIZE, SEEDS[i]);
}
}
public void add(Object value) {
for (SimpleHash f : func) {
bits.set(f.hash(value), true);
}
}
public boolean contains(Object value) {
boolean ret = true;
for (SimpleHash f : func) {
ret = ret && bits.get(f.hash(value));
}
return ret;
}
public static class SimpleHash {
private int cap;
private int seed;
public SimpleHash(int cap, int seed) { this.cap = cap; this.seed = seed; }
public int hash(Object value) {
int h;
return (value == null) ? 0 : Math.abs(seed * (cap - 1) & ((h = value.hashCode()) ^ (h >>> 16)));
}
}
}Test code:
String value1 = "https://javaguide.cn/";
String value2 = "https://github.com/Snailclimb";
MyBloomFilter filter = new MyBloomFilter();
System.out.println(filter.contains(value1)); // false
System.out.println(filter.contains(value2)); // false
filter.add(value1);
filter.add(value2);
System.out.println(filter.contains(value1)); // true
System.out.println(filter.contains(value2)); // trueUsing Guava’s Bloom Filter
Guava provides a well‑tested Bloom filter implementation. Example:
BloomFilter
filter = BloomFilter.create(
Funnels.integerFunnel(),
1500,
0.01);
System.out.println(filter.mightContain(1)); // false
System.out.println(filter.mightContain(2)); // false
filter.put(1);
filter.put(2);
System.out.println(filter.mightContain(1)); // true
System.out.println(filter.mightContain(2)); // trueGuava’s version is suitable for single‑node applications; for distributed scenarios, a Redis‑based filter is preferable.
Redis Bloom Filter (RedisBloom Module)
Redis 4.0+ supports modules; RedisBloom adds Bloom filter commands. Example Docker launch:
docker run -p 6379:6379 --name redis-redisbloom redislabs/rebloom:latest
docker exec -it redis-redisbloom bash
redis-cliCommon commands:
BF.ADD {key} {item} – create filter if needed and add an item.
BF.MADD {key} {item} [item ...] – add multiple items.
BF.EXISTS {key} {item} – check membership.
BF.MEXISTS {key} {item} [item ...] – batch check.
BF.RESERVE {key} {error_rate} {capacity} [EXPANSION expansion] – create a filter with specific parameters.
Sample usage:
BF.ADD myFilter java
BF.ADD myFilter javaguide
BF.EXISTS myFilter java // returns 1 (probably present)
BF.EXISTS myFilter github // returns 0 (definitely absent)References
Original article and additional resources are linked throughout the text, including the official Redis modules page and the RedisBloom GitHub repository.
Java Tech Enthusiast
Sharing computer programming language knowledge, focusing on Java fundamentals, data structures, related tools, Spring Cloud, IntelliJ IDEA... Book giveaways, red‑packet rewards and other perks await!
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.