Fundamentals 15 min read

Bloom Filter: Principles, Guava & Redisson Implementations, and Practical Usage

This article explains the Bloom filter data structure, its mathematical foundations and false‑positive analysis, demonstrates how to implement it with Google Guava and Redisson in Java, and discusses practical considerations such as cache‑penetration mitigation, deletion strategies, and periodic rebuilding.

IT Services Circle

May 2, 2023

Bloom Filter: Principles, Guava & Redisson Implementations, and Practical Usage

Bloom filter is a classic probabilistic data structure that efficiently tests whether an element belongs to a set using a bit array and multiple hash functions, offering superior space and query‑time performance; it is employed in many projects such as RocketMQ, HBase, Cassandra, LevelDB and RocksDB.

In backend services it is often used to solve cache‑penetration problems: before hitting the database, the filter quickly tells whether a key might exist, allowing the system to return a negative result without expensive database access.

Principle : when an element is added, k hash functions map it to k positions in a bit array of length m and set those bits to 1. During a query, if any of the k bits is 0 the element is definitely absent; if all are 1 the element may be present, with a false‑positive probability p. The relationships are:

k – number of hash functions

m – length of the bit array

n – number of inserted elements

p – false‑positive rate

Formulas (shown as images in the original article) compute p, the probability that a specific bit remains 0 after n insertions, and the overall false‑positive rate. Larger m reduces p, while more hash functions lower p but increase query cost.

Guava implementation

public Product queryProductById(Long id) {
    // query cache
    Product product = queryFromCache(id);
    if (product != null) {
        return product;
    }
    // query DB
    product = queryFromDataBase(id);
    if (product != null) {
        saveCache(id, product);
    }
    return product;
}

Dependency:

<dependency>
    <groupId>com.google.guava</groupId>
    <artifactId>guava</artifactId>
    <version>31.0.1-jre</version>
</dependency>

Creating the filter:

BloomFilter<Integer> filter = BloomFilter.create(
    Funnels.integerFunnel(),
    10000,          // expected insertions
    0.001);        // false‑positive rate

Adding data:

@PostConstruct
public void addProduct() {
    logger.info("Initializing Bloom filter data");
    filter.put(1L);
    filter.put(2L);
    filter.put(3L);
    filter.put(4L);
    logger.info("Initialization finished");
}

Querying:

public boolean mayContain(Long id) {
    return filter.mightContain(id);
}

Redisson implementation

Dependency:

<dependency>
    <groupId>org.redisson</groupId>
    <artifactId>redisson</artifactId>
    <version>3.16.1</version>
</dependency>

Configuration bean:

@Configuration
public class RedissonConfig {
    @Bean
    public RedissonClient redissonClient() {
        Config config = new Config();
        config.useSingleServer().setAddress("redis://localhost:6379");
        return Redisson.create(config);
    }
}

Using the Bloom filter:

RBloomFilter<Long> bloomFilter = redissonClient.getBloomFilter("myBloomFilter");
// expected 10000 elements, 0.001 false probability
bloomFilter.tryInit(10000, 0.001);
// add elements
bloomFilter.add(1L);
... // add more

Checking existence:

public boolean mightContain(Long id) {
    return bloomFilter.contains(id);
}

Practical considerations

Initialize the filter at service startup; for each request, check the filter first. If the element is not present, return “not found” without touching the database.

If the element may exist, query the cache, fall back to the database, and write back to the cache.

Standard Bloom filters do not support deletion because clearing a bit could affect other elements.

Two common work‑arounds: (a) use a Counting Bloom filter where each bit is a small counter, allowing decrements on delete (at the cost of extra space); (b) periodically rebuild the filter with fresh data and switch to the new instance.

In summary, Bloom filters provide a compact, fast way to test set membership with a controllable false‑positive rate. Java developers can leverage Guava for in‑process filters or Redisson for distributed filters backed by Redis. When deletions are required, consider a counting variant or a scheduled rebuild strategy.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.

Java Guava bloom filter redisson Cache Penetration probabilistic data structure

Written by

IT Services Circle

Delivering cutting-edge internet insights and practical learning resources. We're a passionate and principled IT media platform.

0 followers

Reader feedback

How this landed with the community

Rate this article

Was this worth your time?

Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.