Fundamentals 16 min read

Unlocking Fast Set Membership: Bloom & Cuckoo Filters Explained

This article introduces Bloom filters and Cuckoo filters, explains their probabilistic nature, false‑positive behavior, space‑time trade‑offs, provides Go and Java implementation examples, and discusses practical use cases such as Redis extensions and high‑traffic e‑commerce scenarios.

JD Cloud Developers

Nov 7, 2024

Unlocking Fast Set Membership: Bloom & Cuckoo Filters Explained

Bloom Filter

Bloom filters are probabilistic data structures used to test whether an element is a member of a set. They do not store the actual elements but a hashed representation, guaranteeing no false negatives while allowing false positives, which makes them space‑efficient and fast.

Internally a Bloom filter consists of a bit array. When an element is added, it is hashed and the bits at positions [hashval % nbits] are set to 1. To query an element, the same hash functions are applied and the corresponding bits are checked; if any bit is 0 the element is definitely absent.

Multiple hash functions reduce collision risk. The number of bits per element is fixed at creation; more bits lower the false‑positive probability. The fill‑rate of the filter also affects accuracy—over‑filled filters increase false positives.

Redis offers an expandable Bloom filter that creates a new, larger filter when capacity is reached, checking across all filters for membership. Insertion time is O(K) where K is the number of hash functions; checking an element in an expanded filter is O(K) or O(K·(n+1)) with n expanded filters.

type BloomFilter struct {
    m uint   // number of bits
    k uint   // number of hash functions
    b *bitset.BitSet
}

func NewWithEstimates(n uint, fp float64) *BloomFilter {
    m, k := EstimateParameters(n, fp)
    return New(m, k)
}

func EstimateParameters(n uint, p float64) (m uint, k uint) {
    m = uint(math.Ceil(-1 * float64(n) * math.Log(p) / math.Pow(math.Log(2), 2)))
    k = uint(math.Ceil(math.Log(2) * float64(m) / float64(n)))
    return
}

func New(m uint, k uint) *BloomFilter {
    return &BloomFilter{max(1, m), max(1, k), bitset.New(m)}
}

func (f *BloomFilter) Add(data []byte) *BloomFilter {
    h := baseHashes(data)
    for i := uint(0); i < f.k; i++ {
        f.b.Set(f.location(h, i))
    }
    return f
}

func (f *BloomFilter) Test(data []byte) bool {
    h := baseHashes(data)
    for i := uint(0); i < f.k; i++ {
        if !f.b.Test(f.location(h, i)) {
            return false
        }
    }
    return true
}

func (f *BloomFilter) location(h [4]uint64, i uint) uint {
    return uint(location(h, i) % uint64(f.m))
}

func location(h [4]uint64, i uint) uint64 {
    ii := uint64(i)
    return h[ii%2] + ii*h[2+(((ii+(ii%2))%4)/2)]
}

Using Bloom Filters with Redisson and Guava

Redisson provides a simple API for Bloom filters. Example (Java):

private void redisson() {
    RedissonClient redissonClient = Redisson.create();
    RBloomFilter<Object> bloomFilter = redissonClient.getBloomFilter("bloomFilter");
    // Initialize with capacity 1 billion and false‑positive rate 0.001
    bloomFilter.tryInit(1000000000, 0.001);
    Object obj = new Object();
    bloomFilter.add(obj); // add element
    boolean exist = bloomFilter.contains(obj); // check existence
}

Guava also offers a thread‑safe Bloom filter implementation (since version 23.0):

import com.google.common.hash.BloomFilter;
import com.google.common.hash.Funnels;

public class BloomFilterExample {
    public static void main(String[] args) {
        BloomFilter<Integer> bloomFilter = BloomFilter.create(Funnels.integerFunnel(), 3000000, 0.01);
        for (int i = 0; i < 3000000; i++) {
            bloomFilter.put(i);
        }
        for (int i = 0; i < 3001000; i++) {
            if (bloomFilter.mightContain(i)) {
                System.out.println(i + " might be in the filter.");
            } else {
                System.out.println(i + " is definitely not in the filter.");
            }
        }
    }
}

Cuckoo Filter

Cuckoo filters address Bloom filter limitations by supporting deletions. They use a bucket array where each bucket stores a small number of fingerprints (partial hashes). An element maps to two candidate buckets; if both are full, an existing element is evicted and re‑hashed, potentially triggering a chain of relocations.

Insertion time is O(1) amortized, but high load factors can degrade performance. Like Bloom filters, Cuckoo filters also produce false positives due to limited bucket space, fingerprint collisions, hash function properties, and load factor.

Key parameters (as used in the popular Golang implementation): each element has 2 candidate buckets, each bucket stores 4 fingerprints, and a fingerprint size of 8 bits yields a false‑positive rate between 0.00001 and 0.002 while keeping space usage high.

// Delete removes a fingerprint from the filter
func (cf *Filter) Delete(data []byte) bool {
    i1, fp := getIndexAndFingerprint(data, cf.bucketPow)
    if cf.delete(fp, i1) {
        return true
    }
    i2 := getAltIndex(fp, i1, cf.bucketPow)
    return cf.delete(fp, i2)
}

Deletion is safe only for elements that were previously inserted; otherwise it may unintentionally remove shared fingerprints, leading to false positives.

Typical Use Cases

Username existence check : preload registered usernames into a filter to quickly reject duplicates.

Ad targeting : store purchased items per user to filter irrelevant ads.

High‑traffic e‑commerce : during large promotions, cache completed orders in a Bloom filter to block invalid detail‑page requests and protect the database.

References

Redis – Bloom filter

GitHub – bloom (Go implementation)

Redisson – Bloom filter

Redis – Cuckoo filter

Linvon – Cuckoo filter paper

COOLSHELL – Cuckoo Filter: Design and Implementation

木鸟杂技 – Cuckoo hashing and Cuckoo filter

GitHub – cuckoofilter

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.

Golang Redis bloom filter probabilistic data structure Cuckoo Filter

Written by

JD Cloud Developers

JD Cloud Developers (Developer of JD Technology) is a JD Technology Group platform offering technical sharing and communication for AI, cloud computing, IoT and related developers. It publishes JD product technical information, industry content, and tech event news. Embrace technology and partner with developers to envision the future.

0 followers

Reader feedback

How this landed with the community

Rate this article

Was this worth your time?

Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.