Mastering Bloom Filters: Theory, Java Guava & Redisson Implementations
This article explores Bloom filters—how they efficiently test set membership, their underlying principles, error rates, and practical usage in Java via Guava and Redisson, including code examples, configuration tips, handling cache penetration, and strategies for element deletion and periodic rebuilding.
What is a Bloom Filter?
A Bloom filter is a clever and classic data structure used in many well‑known projects such as RocketMQ, HBase, Cassandra, LevelDB and RocksDB. For backend developers, understanding Bloom filters is essential.
1. Cache Penetration Example
Consider a product‑detail query service:
public Product queryProductById(Long id) {
// query cache
Product product = queryFromCache(id);
if (product != null) {
return product;
}
// query database
product = queryFromDataBase(id);
if (product != null) {
saveCache(id, product);
}
return product;
}If the product is absent from both cache and database, the service cannot write back to the cache, causing massive database pressure—an典型的缓存穿透 scenario.
To mitigate this, a short‑lived empty placeholder can be written to the distributed cache, but this consumes storage. The core problem is: How to check the existence of an element in a set with minimal cost? The Bloom filter solves this by balancing time and space.
2. Principle Analysis
A Bloom filter (Bloom, 1970) consists of a long binary vector and a set of random hash functions. It offers excellent space efficiency and query speed, at the cost of a false‑positive rate and difficulty deleting elements.
When an element is added, k hash functions map it to k bits in the vector and set those bits to 1. To test membership, the same k bits are checked: if any is 0, the element is definitely absent; if all are 1, the element is probably present.
In simple terms, a bit array of length m is initialized to 0. Each element is hashed k times, and the resulting positions (mod m ) are set to 1.
As more elements are inserted, more bits become 1, increasing the false‑positive probability p . The four core parameters are:
k: number of hash functions
m: length of the bit array
n: number of inserted elements
p: false‑positive rate
If m is too small, most bits quickly become 1, rendering the filter ineffective. Larger m reduces p . More hash functions lower p but increase query time.
The false‑positive probability can be derived as:
Probability that a specific bit is still 0 after k hashings:
Probability that a bit remains 0 after inserting n elements:
Probability that a bit is 1 after inserting n elements:
Overall false‑positive rate:
(approximated by
when m is large).
Given desired p and expected n , we can compute optimal m and k .
3. Guava Implementation
Google Guava provides a ready‑made Bloom filter.
1. Add Maven dependency
<dependency>
<groupId>com.google.guava</groupId>
<artifactId>guava</artifactId>
<version>31.0.1-jre</version>
</dependency>2. Create a Bloom filter
BloomFilter<Integer> filter = BloomFilter.create(
Funnels.integerFunnel(),
10000, // expected insertions
0.001); // false‑positive rate3. Add data
@PostConstruct
public void addProduct() {
logger.info("Initializing Bloom filter data");
filter.put(1L);
filter.put(2L);
filter.put(3L);
filter.put(4L);
logger.info("Initialization complete");
}4. Check existence
public boolean mayContain(Long id) {
return filter.mightContain(id);
}The core method in Guava is:
public <T extends @Nullable Object> boolean mightContain(
T object,
Funnel<? super T> funnel,
int numHashFunctions,
LockFreeBitArray bits) {
long bitSize = bits.bitSize();
byte[] bytes = Hashing.murmur3_128().hashObject(object, funnel).getBytesInternal();
long hash1 = lowerEight(bytes);
long hash2 = upperEight(bytes);
long combinedHash = hash1;
for (int i = 0; i < numHashFunctions; i++) {
if (!bits.get((combinedHash & Long.MAX_VALUE) % bitSize)) {
return false;
}
combinedHash += hash2;
}
return true;
}4. Redisson Implementation
Redisson offers a Bloom filter backed by Redis.
1. Add Maven dependency
<dependency>
<groupId>org.redisson</groupId>
<artifactId>redisson</artifactId>
<version>3.16.1</version>
</dependency>2. Configure Redisson client
@Configuration
public class RedissonConfig {
@Bean
public RedissonClient redissonClient() {
Config config = new Config();
config.useSingleServer().setAddress("redis://localhost:6379");
return Redisson.create(config);
}
}3. Initialize the filter
RBloomFilter<Long> bloomFilter = redissonClient.getBloomFilter("myBloomFilter");
// 10000 expected insertions, 0.001 false‑positive rate
bloomFilter.tryInit(10000, 0.001);
// add elements
bloomFilter.add(1L);
... // add 2L,3L,4L4. Check existence
public boolean mightContain(Long id) {
return bloomFilter.contains(id);
}Redisson stores the filter’s four core attributes in a Redis hash and uses a bitmap (bitset) to represent the bit array. The bitmap is a Redis string, limited to 2^32 bits, manipulated via GETBIT / SETBIT commands.
5. Practical Tips
Cache‑penetration scenario : Initialize the Bloom filter; on a request, if the filter reports absence, return “not found” immediately. If present, query the cache, fall back to the database, and write back to the cache.
Element‑deletion scenario : Standard Bloom filters cannot delete because bits are shared. Two common solutions are:
Counting Bloom filter – each bit becomes a small counter; insert increments, delete decrements (at the cost of extra space).
Periodic rebuild – schedule a job to rebuild the filter from scratch with the current dataset.
Rebuild steps:
Trigger a scheduled task to query all products.
Add product IDs to a new Bloom filter.
Swap the service mapping from the old filter to the new one.
After verification, delete the old filter.
6. Summary
A Bloom filter is a long binary vector combined with multiple random hash functions, used to test set membership with high space and time efficiency. It offers a controllable false‑positive rate ( p ) while guaranteeing no false negatives.
The four core parameters are k (hash count), m (bit array length), n (expected insertions) and p (false‑positive probability). In Java, Guava and Redisson provide ready‑to‑use implementations, each with simple Maven dependencies and concise APIs.
Although standard Bloom filters cannot delete elements, counting Bloom filters or periodic reconstruction can achieve the same effect when deletions are required.
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
Su San Talks Tech
Su San, former staff at several leading tech companies, is a top creator on Juejin and a premium creator on CSDN, and runs the free coding practice site www.susan.net.cn.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
