Backend Development 8 min read

Understanding Bloom Filters for Preventing Cache Penetration

This article explains the cache‑penetration problem in high‑traffic systems, introduces Bloom filters as a space‑efficient solution, details their construction and usage with Java and Redis examples, and discusses ways to reduce false positives and handle deletions.

Wukong Talks Architecture

Sep 30, 2021

Understanding Bloom Filters for Preventing Cache Penetration

In high‑concurrency environments, cache misses can lead to cache‑penetration attacks where massive requests hit the database; the article introduces Bloom filters as a low‑memory, fast‑lookup technique to mitigate this risk.

It describes the typical cache workflow—checking the cache, returning data if present, or querying the database on a miss—and highlights the vulnerability when many misses overload the database.

A Bloom filter is defined as a bit array with multiple hash functions that can quickly test set membership, offering high space efficiency and fast queries while having a small false‑positive rate and no native deletion support.

The construction process is illustrated with three product IDs (id1, id2, id3), showing how each ID is hashed three times to set bits in the array, handling hash collisions by leaving already‑set bits unchanged.

To use a Bloom filter, one hashes the item’s key, checks the corresponding bits, and if any bit is 0 the item is definitely absent; if all are 1 the item is possibly present, acknowledging the low false‑positive probability.

False positives can be reduced by increasing the bit‑array length or the number of hash functions, which spreads bits more sparsely and lowers collision chances.

An experiment using Redis BitMap demonstrates that storing ten million flags requires only about 1.19 MB, with sample Java code (shown below) that pipelines bit‑set operations for fast pre‑loading.

redisTemplate.executePipelined(new RedisCallback<Long>() {
    @Nullable
    @Override
    public Long doInRedis(RedisConnection connection) throws DataAccessException {
        connection.openPipeline();
        for (int offset = 10000000; offset >= 0; offset--) {
            boolean value = offset % 2 == 0 ? true : false;
            connection.setBit("bloom-filter-data-1".getBytes(), offset, value);
        }
        connection.closePipeline();
        return null;
    }
});
System.out.println("数据预热完成");

In Java, the Redisson library provides a ready‑made Bloom filter; after adding the Maven dependency, the example creates a filter with 1 % false‑positive rate, adds an element, and checks existence, printing false for a guaranteed miss and true for a possible hit.

/**
 * @author 微信公众号：微观技术
 */
@Test
public void test5() {
    Config config = new Config();
    config.useSingleServer().setAddress("redis://172.16.67.37:6379");
    RedissonClient client = Redisson.create(config);
    RBloomFilter<String> bloomFilter = client.getBloomFilter("test5-bloom-filter");
    // 初始化布隆过滤器，数组长度100W，误判率 1%
    bloomFilter.tryInit(1000000L, 0.01);
    // 添加数据
    bloomFilter.add("Tom哥");
    // 判断是否存在
    System.out.println(bloomFilter.contains("微观技术")); // false
    System.out.println(bloomFilter.contains("Tom哥"));   // true
}

Since Bloom filters cannot delete individual items without risking false deletions, the article suggests periodic recreation of the filter or augmenting it with a parallel counting array to safely handle removals.

Typical application scenarios include preventing cache penetration, deduplicating URLs in web crawlers, and filtering spam email addresses in massive mailing lists.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.

java Data Structures bloom-filter cache-penetration

Written by

Wukong Talks Architecture

Explaining distributed systems and architecture through stories. Author of the "JVM Performance Tuning in Practice" column, open-source author of "Spring Cloud in Practice PassJava", and independently developed a PMP practice quiz mini-program.

0 followers

Reader feedback

How this landed with the community

Rate this article

Was this worth your time?

Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.