How Redis Implements Bloom Filters to Prevent Cache Penetration
The article explains the probabilistic Bloom filter data structure, its initialization, hash‑based insertion and query process, illustrates collisions with concrete examples, and shows how Redis (via Redisson) supports Bloom filters with Java code to efficiently block cache‑penetration attacks.
A Bloom filter is a probabilistic data structure that answers, using minimal space and time, whether an element is definitely not in a set or possibly in it.
Its implementation consists of three steps: (1) initialize a fixed‑size bit array with all bits set to 0; (2) when adding an element, compute several hash functions, take each result modulo the array length, and set the corresponding bits to 1; (3) when querying, compute the same hashes and if any resulting bit is 0 the element is definitely absent, otherwise it may exist. Because hash collisions can occur, false positives are possible.
Concrete examples with an 8-bit array show how adding "Tom" and "John" sets bits 1, 5 and 3, 5 respectively; the shared bit 5 illustrates a hash collision and why multiple hash functions are needed. A query for "Eric" (bits 1, 2) returns definitely absent because bit 2 is still 0, while a query for "Jack" (bits 1, 3) finds both bits set and reports "possibly present", a false positive, since Jack was never added.
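The three steps above can be sketched in Java. The 8-bit array and the two toy hash functions below are illustrative only (real filters use far larger arrays and stronger, independent hashes), so the exact bit positions differ from the article's example, but the behavior is the same: a 0 bit means definitely absent, all-1 bits mean possibly present.

```java
import java.util.BitSet;

// Minimal Bloom filter sketch: an 8-bit array with two toy hash functions.
public class TinyBloom {
    static final int SIZE = 8;
    // Step 1: a fixed-size bit array, all bits initially 0
    static final BitSet bits = new BitSet(SIZE);

    // Two hypothetical hash functions, each reduced modulo the array length
    static int h1(String s) { return Math.floorMod(s.hashCode(), SIZE); }
    static int h2(String s) { return Math.floorMod(s.hashCode() * 31 + 7, SIZE); }

    // Step 2: on add, set the bit at every hash position
    static void add(String s) { bits.set(h1(s)); bits.set(h2(s)); }

    // Step 3: on query, if any bit is 0 the element is definitely absent;
    // if all bits are 1 it may exist (false positives are possible)
    static boolean mightContain(String s) { return bits.get(h1(s)) && bits.get(h2(s)); }

    public static void main(String[] args) {
        add("Tom");
        add("John");
        System.out.println(mightContain("Tom"));  // true: was added
        System.out.println(mightContain("Eric")); // false: at least one of its bits is 0
    }
}
```

With such a tiny array, any new string whose bits all happen to collide with already-set positions would also return true, which is exactly the false-positive case described above.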
Redis supports Bloom filters through the Redisson Java client. The following code creates a Bloom filter named userList, initializes it for an expected 1,000,000 elements with a 1% error rate, adds an element, and checks membership:
// Build a Bloom filter with a name
RBloomFilter<String> bloomFilter = redissonClient.getBloomFilter("userList");
// Initialize: expected 1,000,000 items, 1% false‑positive rate
bloomFilter.tryInit(1000000L, 0.01);
// Add an element
bloomFilter.add("1234567");
// Query membership
System.out.println(bloomFilter.contains("1234567")); // true
System.out.println(bloomFilter.contains("12345")); // false

In practice, before inserting data into a persistent store (e.g., MySQL), the element is first added to the Bloom filter; subsequent insertions check the filter first to avoid unnecessary database writes. For cache penetration, the workflow is: (1) create a Bloom filter and add each new key after it is stored; (2) on a read request, query the filter first, and if the key is absent, return immediately; (3) if the key may be present, proceed to the cache as usual. This dramatically reduces load from requests for non-existent keys.
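The read path described above can be sketched with in-memory stand-ins (a Set for the Bloom filter and Maps for the cache and database; in production these would be an RBloomFilter, Redis, and MySQL). The class and key names are hypothetical:

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Sketch of the cache-penetration read path with in-memory stand-ins.
public class ReadPath {
    static final Set<String> bloom = new HashSet<>();      // stand-in for the Bloom filter
    static final Map<String, String> cache = new HashMap<>(); // stand-in for Redis
    static final Map<String, String> db = new HashMap<>();    // stand-in for MySQL

    static void write(String key, String value) {
        db.put(key, value);
        bloom.add(key);                        // step 1: add the key after it is stored
    }

    static String read(String key) {
        if (!bloom.contains(key)) return null; // step 2: definitely absent, return immediately
        String v = cache.get(key);             // step 3: proceed to the cache
        if (v == null) {
            v = db.get(key);                   // cache miss: fall through to the database
            if (v != null) cache.put(key, v);
        }
        return v;
    }

    public static void main(String[] args) {
        write("user:1", "Tom");
        System.out.println(read("user:1"));   // Tom
        System.out.println(read("user:999")); // null: blocked by the filter, no DB access
    }
}
```

Note that a HashSet never yields false positives, so unlike a real Bloom filter this stand-in never lets a request for a missing key through to the cache.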
Overall, Bloom filters, despite their probabilistic nature, are highly effective in scenarios such as username existence checks, tracking viewed content, or preventing cache penetration when combined with Redis.
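As a sanity check on the tryInit parameters used earlier, the standard Bloom filter sizing formulas m = -n ln p / (ln 2)^2 (bit-array size) and k = (m / n) ln 2 (hash count) can be evaluated for 1,000,000 elements at a 1% false-positive rate:

```java
public class BloomSizing {
    // m: bits needed for n elements at false-positive rate p
    static long bitsNeeded(long n, double p) {
        return (long) Math.ceil(-n * Math.log(p) / (Math.log(2) * Math.log(2)));
    }

    // k: optimal number of hash functions for m bits and n elements
    static int hashCount(long n, long m) {
        return (int) Math.round((double) m / n * Math.log(2));
    }

    public static void main(String[] args) {
        long m = bitsNeeded(1_000_000L, 0.01);
        System.out.println(m);                        // about 9.6 million bits (~1.14 MB)
        System.out.println(hashCount(1_000_000L, m)); // 7 hash functions
    }
}
```

So the filter in the example costs roughly a megabyte of memory, which is why Bloom filters are considered cheap insurance against cache penetration.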
Java Backend Full-Stack