Understanding Bloom Filters and Their Implementation with Google Guava and Redis

This article explains the principles of Bloom filters, their false‑positive behavior, and demonstrates how to implement them using Google Guava in Java and Redis ReBloom via Docker, including code examples and a practical membership‑filtering use case.

Big Data Technology & Architecture

Bloom filters are space‑efficient probabilistic data structures that can test whether an element is a member of a set, offering fast queries with a small false‑positive rate.

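A Bloom filter consists of a bit array and K independent hash functions. Adding an element hashes it K times and sets the corresponding bits to 1; querying an element hashes it the same way and reports a possible match only if all K bits are set.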

Because bits may be set by multiple elements, a query may return true for an element that was never added, which is the inherent false‑positive characteristic; however, a negative result is always correct.
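The mechanism described above can be sketched in a few lines of plain Java. This is only an illustration, not the Guava or Redis implementation: the class name SimpleBloomFilter is hypothetical, and instead of K truly independent hash functions it derives the K positions from two base hashes (double hashing), which is a common shortcut.

```java
import java.nio.charset.StandardCharsets;
import java.util.BitSet;

/** Minimal Bloom filter sketch: one bit array, K bit positions per element. */
class SimpleBloomFilter {
    private final BitSet bits;
    private final int size;      // number of bits in the array
    private final int hashCount; // K, the number of positions set per element

    SimpleBloomFilter(int size, int hashCount) {
        if (size <= 0 || hashCount <= 0) {
            throw new IllegalArgumentException("size and hashCount must be positive");
        }
        this.bits = new BitSet(size);
        this.size = size;
        this.hashCount = hashCount;
    }

    /** Sets the K bits that correspond to the element. */
    void put(String item) {
        for (int i = 0; i < hashCount; i++) {
            bits.set(position(item, i));
        }
    }

    /** False means definitely absent; true means possibly present. */
    boolean mightContain(String item) {
        for (int i = 0; i < hashCount; i++) {
            if (!bits.get(position(item, i))) {
                return false;
            }
        }
        return true;
    }

    // Assumption: the i-th position is derived as h1 + i * h2 (double hashing),
    // standing in for K independent hash functions.
    private int position(String item, int i) {
        int h1 = item.hashCode();
        int h2 = fnv1a(item);
        return Math.floorMod(h1 + i * h2, size);
    }

    // 32-bit FNV-1a over the UTF-8 bytes, used as the second base hash.
    private static int fnv1a(String s) {
        int h = 0x811C9DC5;
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            h ^= (b & 0xFF);
            h *= 0x01000193;
        }
        return h;
    }
}
```

An element that was added always passes mightContain, because put set exactly the bits the query inspects. An element that was never added can still pass if other elements happened to set all of its K bits, which is the false positive described above.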

Google's Guava library provides an in‑memory Bloom filter implementation. The article shows how to add the Guava dependency and create a BloomFilter<Integer> in a Spring service, populate it from a database, and expose a REST endpoint to check membership.

<dependency>
    <groupId>com.google.guava</groupId>
    <artifactId>guava</artifactId>
    <version>21.0</version>
</dependency>
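
// BloomFilterService.java
// Assumptions: the source does not show its imports. CollectionUtils is taken to be
// Spring's, and PostConstruct the javax.annotation one. User and UserMapper are the
// application's own classes.
import java.util.List;
import javax.annotation.PostConstruct;

import com.google.common.hash.BloomFilter;
import com.google.common.hash.Funnels;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Service;
import org.springframework.util.CollectionUtils;

@Service
public class BloomFilterService {
    @Autowired
    private UserMapper userMapper;

    // Stays null when no users exist at startup
    private BloomFilter<Integer> bf;

    /**
     * Create Bloom filter (default 3% error rate)
     */
    @PostConstruct
    public void initBloomFilter() {
        List<User> userList = userMapper.selectAllUser();
        if (CollectionUtils.isEmpty(userList)) {
            return;
        }
        // Size the filter for the number of users loaded at startup
        bf = BloomFilter.create(Funnels.integerFunnel(), userList.size());
        for (User user : userList) {
            bf.put(user.getId());
        }
    }

    /**
     * Check if an id might exist in the Bloom filter
     */
    public boolean userIdExists(int id) {
        // Without this guard an empty user table would cause a NullPointerException
        return bf != null && bf.mightContain(id);
    }
}

// BloomFilterController.java
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.RestController;

@RestController
public class BloomFilterController {
    @Autowired
    private BloomFilterService bloomFilterService;

    // Example request: /bloom/idExists?id=42
    @RequestMapping("/bloom/idExists")
    public boolean ifExists(@RequestParam("id") int id) {
        return bloomFilterService.userIdExists(id);
    }
}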
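
To give a feel for what the 3% default error rate costs in memory, the sketch below applies the standard Bloom filter sizing formulas, m = -n ln(p) / (ln 2)^2 bits and k = (m / n) ln 2 hash functions. The class and method names are hypothetical, and a real library may round these values differently.

```java
/** Standard Bloom filter sizing formulas (illustrative helper, not a library API). */
class BloomSizing {

    /** Bits needed for n expected insertions at false-positive probability p. */
    static long optimalBits(long n, double p) {
        if (n <= 0 || p <= 0.0 || p >= 1.0) {
            throw new IllegalArgumentException("need n > 0 and 0 < p < 1");
        }
        double ln2 = Math.log(2);
        return (long) Math.ceil(-n * Math.log(p) / (ln2 * ln2));
    }

    /** Number of hash functions that minimizes false positives for m bits and n insertions. */
    static int optimalHashes(long n, long m) {
        return Math.max(1, (int) Math.round((double) m / n * Math.log(2)));
    }
}
```

For one million expected ids at p = 0.03, optimalBits comes out at roughly 7.3 million bits, which is under 1 MB, and optimalHashes gives 5. Tightening the error rate to 0.01 raises the cost to about 9.6 bits per element and 7 hash functions.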

Redis offers a distributed Bloom filter via the ReBloom module. Once a container is running from the redislabs/rebloom Docker image, commands such as bf.add, bf.exists, bf.madd, and bf.mexists can be used. A filter can also be pre-allocated with bf.reserve to set its error_rate and initial_size; this must be done before the key is first written, because bf.reserve fails on a key that already exists.

# Docker installation
[root@localhost ~]# docker run -d -p 6379:6379 --name bloomfilter redislabs/rebloom

# Basic usage
[root@localhost ~]# docker exec -it bloomfilter /bin/bash
root@container:/data# redis-cli -p 6379
127.0.0.1:6379> bf.add urls www.taobao.com
(integer) 1
127.0.0.1:6379> bf.exists urls www.taobao.com
(integer) 1
127.0.0.1:6379> bf.madd urls www.baidu.com www.tianmao.com
1) (integer) 1
2) (integer) 1
127.0.0.1:6379> bf.mexists urls www.baidu.com www.tianmao.com
1) (integer) 1
2) (integer) 1

# Reserve a filter with custom parameters
127.0.0.1:6379> bf.reserve user 0.01 100
OK

The article compares the two implementations: Guava's filter lives in JVM memory and is lost on restart, making it unsuitable for distributed scenarios, while Redis's filter is persistent and scalable but incurs network latency.

Finally, a practical use case is presented: filtering non-member users in a membership lottery. The Bloom filter rejects about 99% of non-members outright (which corresponds to a 1% false-positive rate), and a database lookup then confirms actual membership for the requests that pass the filter.
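
The two-stage check in this use case might look like the following sketch. It is deliberately independent of Guava and Redis: the class name MemberGate and both predicates are hypothetical, with the first standing in for a Bloom filter's mightContain and the second for the database query.

```java
import java.util.function.IntPredicate;

/** Two-stage membership check: probabilistic filter first, authoritative lookup second. */
class MemberGate {
    private final IntPredicate mightBeMember; // stand-in for the Bloom filter query
    private final IntPredicate isMemberInDb;  // stand-in for the database lookup
    private int dbLookups = 0;

    MemberGate(IntPredicate mightBeMember, IntPredicate isMemberInDb) {
        this.mightBeMember = mightBeMember;
        this.isMemberInDb = isMemberInDb;
    }

    /** Returns true only for confirmed members. */
    boolean canJoinLottery(int userId) {
        if (!mightBeMember.test(userId)) {
            // A negative from the filter is always correct, so the database is skipped
            return false;
        }
        // A positive may be a false positive, so the database has the final say
        dbLookups++;
        return isMemberInDb.test(userId);
    }

    /** Number of requests that reached the database. */
    int dbLookups() {
        return dbLookups;
    }
}
```

Only requests that pass the filter reach the database, so the lookup count tracks real members plus the small share of false positives rather than the total number of requests.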

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.

Written by

Big Data Technology & Architecture

Wang Zhiwu, a big data expert, dedicated to sharing big data technology.
