Backend Development 19 min read

Mastering Bloom Filters: From Theory to Guava & Redisson Implementations

Explore the design and mechanics of Bloom filters, understand their role in preventing cache penetration, learn how to calculate optimal parameters, and see practical Java implementations using Google Guava and Redisson, including code snippets, performance considerations, and strategies for handling deletions.

Su San Talks Tech

Jun 26, 2023

Mastering Bloom Filters: From Theory to Guava & Redisson Implementations

Bloom filters are a classic data structure used in many high‑profile projects such as RocketMQ, HBase, Cassandra, LevelDB and RocksDB. For backend developers, mastering Bloom filters is essential for tackling cache‑penetration problems.

Cache Penetration

Consider a product‑detail service:

public Product queryProductById(Long id) {
    // query cache
    Product product = queryFromCache(id);
    if (product != null) {
        return product;
    }
    // query database
    product = queryFromDataBase(id);
    if (product != null) {
        saveCache(id, product);
    }
    return product;
}

If the product is absent from both cache and database, the service cannot write back to the cache, causing massive DB pressure— a typical cache‑penetration scenario. A common mitigation is to write a short‑lived empty placeholder into the cache, but this incurs storage cost.

The core question is: How to check whether an element belongs to a set with minimal cost? The answer is the Bloom filter, which balances time and space efficiently.

Principle Analysis

A Bloom filter (Bloom, 1970) consists of a long binary vector and a set of hash functions. It offers excellent space efficiency and query speed, at the expense of a false‑positive rate and difficulty deleting elements.

When an element is added, k hash functions map it to k positions in the bit array and set those bits to 1. To test membership, the same k bits are checked; if any is 0 the element is definitely absent, otherwise it is probably present.

Example: a bit array of length m = 8 with k = 3 . Elements x and y set bits {0,4,7} and {1,4,6} respectively. Overlap at bit 4 shows how false positives arise as more elements are inserted.

The four core properties are:

k: number of hash functions

m: length of the bit array

n: number of inserted elements

p: false‑positive probability

A small m quickly leads to all bits being 1, causing the filter to always return “maybe”. Larger m reduces p . More hash functions lower p but increase query time.

The false‑positive rate formula is:

Probability that a specific bit is 0 after k hashings:

Probability that a bit remains 0 after inserting n elements:

Probability that a bit is 1 after inserting n elements:

Overall false‑positive rate:

(approx.

when m is large).

Given desired p and expected n , we can compute optimal m and k .

Deletion support : Standard Bloom filters cannot delete elements because bits are shared among multiple items.

Time and space efficiency : Space O(m), insertion and query time O(k), both independent of the number of stored elements.

Hash function choices : Non‑cryptographic functions such as Murmur3, FNV series, and Jenkins are commonly used for their speed and good distribution.

Guava Implementation

Google Guava provides a ready‑made Bloom filter implementation.

1. Add Maven dependency:

<dependency>
    <groupId>com.google.guava</groupId>
    <artifactId>guava</artifactId>
    <version>31.0.1-jre</version>
</dependency>

2. Create a Bloom filter:

BloomFilter<Integer> filter = BloomFilter.create(
    Funnels.integerFunnel(),
    10000,          // expected insertions
    0.001           // false‑positive rate
);

3. Add data:

@PostConstruct
public void addProduct() {
    logger.info("Initializing Bloom filter data");
    filter.put(1L);
    filter.put(2L);
    filter.put(3L);
    filter.put(4L);
    logger.info("Bloom filter data initialization finished");
}

4. Check existence:

public boolean mayContain(Long id) {
    return filter.mightContain(id);
}

Guava’s internal methods compute the optimal bit array size and hash count, matching the formulas described earlier:

static <T> BloomFilter<T> create(Funnel<? super T> funnel, long expectedInsertions, double fpp, BloomFilter.Strategy strategy) {
    long numBits = optimalNumOfBits(expectedInsertions, fpp);
    int numHashFunctions = optimalNumOfHashFunctions(expectedInsertions, numBits);
    // ...
}

static long optimalNumOfBits(long n, double p) {
    if (p == 0) p = Double.MIN_VALUE;
    return (long)(-n * Math.log(p) / (Math.log(2) * Math.log(2)));
}

static int optimalNumOfHashFunctions(long n, long m) {
    return Math.max(1, (int)Math.round((double)m / n * Math.log(2)));
}

The mightContain method returns true if the element may be present (possible false positive) and false if it is definitely absent.

Redisson Implementation

Redisson offers a Redis‑backed Bloom filter.

1. Maven dependency:

<dependency>
    <groupId>org.redisson</groupId>
    <artifactId>redisson</artifactId>
    <version>3.16.1</version>
</dependency>

2. Configure Redisson client:

@Configuration
public class RedissonConfig {
    @Bean
    public RedissonClient redissonClient() {
        Config config = new Config();
        config.useSingleServer().setAddress("redis://localhost:6379");
        return Redisson.create(config);
    }
}

3. Initialize Bloom filter:

RBloomFilter<Long> bloomFilter = redissonClient
    .getBloomFilter("myBloomFilter");
// 10000 expected insertions, 0.001 false‑positive rate
bloomFilter.tryInit(10000, 0.001);
// add elements
bloomFilter.add(1L);
... // add 2L, 3L, 4L

4. Check existence:

public boolean mightContain(Long id) {
    return bloomFilter.contains(id);
}

Redisson stores the filter’s core attributes (size, hash iterations, expected insertions, false‑positive probability) in a Redis hash. Elements are represented by a bitmap (Redis string) where each bit can be set or cleared using SETBIT / GETBIT. The bitmap can hold up to 2^32 bits.

A simple test creates a bitmap key mybitset and sets bits 3, 5, 6, 8 to 1, confirming the binary representation matches expectations.

Practical Tips

Both Guava and Redisson make Bloom filter usage straightforward. In real‑world scenarios consider the following:

Cache‑penetration scenario : Initialize the filter at startup. For each request, check the filter; if the element is absent, return “not found” immediately. If present, query the cache, fall back to the database if needed, and write back to the cache.

Element‑deletion scenario : Standard Bloom filters cannot delete items because bits are shared. Two common workarounds are:

Counting Bloom filter – each bit is a small counter that can be incremented on insert and decremented on delete (at the cost of higher memory usage).

Periodic rebuild – regularly reconstruct the filter from the current data set, discarding stale entries.

Rebuilding steps:

Trigger a scheduled job to fetch all product IDs.

Insert IDs into a new Bloom filter.

Swap the old filter reference with the new one.

Update the service to use the new filter.

Delete the old filter at an appropriate time.

Conclusion

A Bloom filter is a long binary vector combined with multiple hash functions, used to test set membership with high space and time efficiency while tolerating a configurable false‑positive rate.

Its four core attributes are:

k – number of hash functions

m – length of the bit array

n – number of inserted elements

p – false‑positive probability

In the Java ecosystem, Guava and Redisson provide concise ways to create and use Bloom filters. Although standard Bloom filters cannot delete elements, counting filters or periodic reconstruction can achieve the desired delete semantics.

Many open‑source projects adopt Bloom filters because they are simple, efficient, and provide a practical trade‑off between accuracy and resource consumption.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.

backend Java Guava Data Structures bloom filter redisson Cache Penetration

Written by

Su San Talks Tech

Su San, former staff at several leading tech companies, is a top creator on Juejin and a premium creator on CSDN, and runs the free coding practice site www.susan.net.cn.

0 followers

Reader feedback

How this landed with the community

Rate this article

Was this worth your time?

Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.