Using Bloom Filters to Prevent Cache Penetration in High‑Concurrency Systems
This article explains how cache penetration occurs in high‑traffic systems and demonstrates how Bloom filters—using minimal memory and fast hash checks—can efficiently prevent such attacks, including construction details, Redis BitMap experiments, and Java Redisson implementation examples.
In high‑traffic e‑commerce platforms, cache misses can cause massive request bursts to the database, leading to the phenomenon known as cache penetration, which can severely degrade system performance or even crash the database.
Bloom filters, introduced by Bloom in 1970, are space‑efficient probabilistic data structures that answer the question "Is an element possibly in a set?" using a bit array and multiple hash functions, offering fast lookups with a controllable false‑positive rate.
To build a Bloom filter, a binary array of length n is allocated. Each element (e.g., product IDs id1, id2, id3) is hashed three times; the resulting indices are set to 1. If a bit is already 1, it indicates a hash collision but does not affect correctness.
When checking cache, the workflow is: (1) query the cache; (2) if hit, return the data; (3) if miss, query the Bloom filter; (3‑a) if any of the three bits is 0, the item definitely does not exist and the request can be rejected early; (3‑b) if all bits are 1, the item may exist and the database is queried.
False positives can be reduced by (a) increasing the size of the bit array, which spreads hash values more sparsely, and (b) increasing the number of hash functions, which adds more independent checks.
A memory‑usage experiment using Redis BitMap demonstrates that storing 10 million flags requires only about 1.19 MB (10 000 000 / 8 / 1024 / 1024). The initialization code is:
redisTemplate.executePipelined(new RedisCallback<Long>() {
@Nullable
@Override
public Long doInRedis(RedisConnection connection) throws DataAccessException {
connection.openPipeline();
for (int offset = 10000000; offset >= 0; offset--) {
boolean value = offset % 2 == 0 ? true : false;
connection.setBit("bloom-filter-data-1".getBytes(), offset, value);
}
connection.closePipeline();
return null;
}
});
System.out.println("数据预热完成");In Java, the Redisson library provides a ready‑made Bloom filter. After adding the Maven dependency:
<dependency>
<groupId>org.redisson</groupId>
<artifactId>redisson</artifactId>
<version>3.11.1</version>
</dependency>the filter can be used as follows:
/**
* @author 微信公众号:微观技术
*/
@Test
public void test5() {
Config config = new Config();
config.useSingleServer().setAddress("redis://172.16.67.37:6379");
RedissonClient client = Redisson.create(config);
RBloomFilter<String> bloomFilter = client.getBloomFilter("test5-bloom-filter");
// Initialize with 1,000,000 slots and 1% false‑positive rate
bloomFilter.tryInit(1000000L, 0.01);
// Add data
bloomFilter.add("Tom哥");
// Check existence
System.out.println(bloomFilter.contains("微观技术")); // false
System.out.println(bloomFilter.contains("Tom哥")); // true (may be false‑positive)
}The output shows false for a definitely missing element and true for a possibly present one, illustrating the 1 % false‑positive rate.
Deletion is not straightforward because clearing a bit may affect other elements that share the same hash positions. Common strategies include periodically rebuilding the filter or maintaining a parallel counter array to track the number of elements mapped to each bit.
Typical application scenarios for Bloom filters include preventing cache penetration, deduplicating URLs in web crawlers, and fast membership checks in anti‑spam systems.
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
Full-Stack Internet Architecture
Introducing full-stack Internet architecture technologies centered on Java
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
