Bloom Filter: Principles, Guava & Redisson Implementations, and Practical Usage
This article explains the Bloom filter data structure, its mathematical foundations and false‑positive analysis, demonstrates how to implement it with Google Guava and Redisson in Java, and discusses practical considerations such as cache‑penetration mitigation, deletion strategies, and periodic rebuilding.
Bloom filter is a classic probabilistic data structure that efficiently tests whether an element belongs to a set using a bit array and multiple hash functions, offering superior space and query‑time performance; it is employed in many projects such as RocketMQ, HBase, Cassandra, LevelDB and RocksDB.
In backend services it is often used to solve cache‑penetration problems: before hitting the database, the filter quickly tells whether a key might exist, allowing the system to return a negative result without expensive database access.
Principle : when an element is added, k hash functions map it to k positions in a bit array of length m and set those bits to 1. During a query, if any of the k bits is 0 the element is definitely absent; if all are 1 the element may be present, with a false‑positive probability p . The relationships are:
k – number of hash functions
m – length of the bit array
n – number of inserted elements
p – false‑positive rate
Formulas (shown as images in the original article) compute p , the probability that a specific bit remains 0 after n insertions, and the overall false‑positive rate. Larger m reduces p , while more hash functions lower p but increase query cost.
Guava implementation
public Product queryProductById(Long id) {
// query cache
Product product = queryFromCache(id);
if (product != null) {
return product;
}
// query DB
product = queryFromDataBase(id);
if (product != null) {
saveCache(id, product);
}
return product;
}Dependency:
<dependency>
<groupId>com.google.guava</groupId>
<artifactId>guava</artifactId>
<version>31.0.1-jre</version>
</dependency>Creating the filter:
BloomFilter<Integer> filter = BloomFilter.create(
Funnels.integerFunnel(),
10000, // expected insertions
0.001); // false‑positive rateAdding data:
@PostConstruct
public void addProduct() {
logger.info("Initializing Bloom filter data");
filter.put(1L);
filter.put(2L);
filter.put(3L);
filter.put(4L);
logger.info("Initialization finished");
}Querying:
public boolean mayContain(Long id) {
return filter.mightContain(id);
}Redisson implementation
Dependency:
<dependency>
<groupId>org.redisson</groupId>
<artifactId>redisson</artifactId>
<version>3.16.1</version>
</dependency>Configuration bean:
@Configuration
public class RedissonConfig {
@Bean
public RedissonClient redissonClient() {
Config config = new Config();
config.useSingleServer().setAddress("redis://localhost:6379");
return Redisson.create(config);
}
}Using the Bloom filter:
RBloomFilter<Long> bloomFilter = redissonClient.getBloomFilter("myBloomFilter");
// expected 10000 elements, 0.001 false probability
bloomFilter.tryInit(10000, 0.001);
// add elements
bloomFilter.add(1L);
... // add moreChecking existence:
public boolean mightContain(Long id) {
return bloomFilter.contains(id);
}Practical considerations
Initialize the filter at service startup; for each request, check the filter first. If the element is not present, return “not found” without touching the database.
If the element may exist, query the cache, fall back to the database, and write back to the cache.
Standard Bloom filters do not support deletion because clearing a bit could affect other elements.
Two common work‑arounds: (a) use a Counting Bloom filter where each bit is a small counter, allowing decrements on delete (at the cost of extra space); (b) periodically rebuild the filter with fresh data and switch to the new instance.
In summary, Bloom filters provide a compact, fast way to test set membership with a controllable false‑positive rate. Java developers can leverage Guava for in‑process filters or Redisson for distributed filters backed by Redis. When deletions are required, consider a counting variant or a scheduled rebuild strategy.
IT Services Circle
Delivering cutting-edge internet insights and practical learning resources. We're a passionate and principled IT media platform.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.