Fundamentals 12 min read

Understanding Bloom Filter, Counting Bloom Filter, and Cuckoo Filter: Principles, Issues, and Optimizations

This article explains the motivation behind using probabilistic filters to reduce I/O, describes how Bloom filters, Counting Bloom filters, and Cuckoo filters work, analyzes their false‑positive and deletion problems, and presents practical optimizations such as multiple hash functions and multi‑slot buckets.

Top Architect
Top Architect
Top Architect
Understanding Bloom Filter, Counting Bloom Filter, and Cuckoo Filter: Principles, Issues, and Optimizations

In many backend systems I/O is a bottleneck, so filters are introduced to block unnecessary database queries before they reach the storage layer.

Bloom Filter uses a bitmap and several hash functions; when an element arrives its hash positions are set to 1, and a later query checks those bits to decide whether the element might exist.

The main issues of a standard Bloom filter are false positives—different elements may set overlapping bits—and the inability to delete elements because clearing bits could invalidate other entries.

Counting Bloom Filter replaces the bitmap with an array of counters, incrementing on insert and decrementing on delete, which solves the deletion problem but still suffers from false positives.

Cuckoo Filter was proposed to address both problems. It stores only a short fingerprint of each element in a cuckoo‑hash‑style table, allowing deletions and offering comparable false‑positive rates.

The basic cuckoo insertion uses two hash functions to map an element to two possible buckets; if both are occupied, one existing element is evicted (“cuckoo‑eviction”) and re‑inserted elsewhere, possibly triggering a chain of evictions.

When the table becomes too full, insertion may require many evictions, so a load‑factor threshold triggers table expansion. Additionally, hash collisions can cause eviction loops.

Optimizations include increasing the number of hash functions (e.g., three or four) and giving each bucket multiple slots, which dramatically improves space utilization and reduces eviction frequency.

In a cuckoo filter the fingerprint fp is stored instead of the full key. The two bucket indices are computed as:

fp = fingerprint(x)
 p1 = hash(x)
 p2 = p1 ^ hash(fp)  // XOR

Because p2 = p1 ^ hash(fp) , knowing one index and the fingerprint allows direct computation of the other, ensuring the two positions are distinct as long as hash(fp) != 0 . The table size is a power of two, so modulo operations reduce to masking low bits.

Data StructuresHashingBloom FilterCache OptimizationCuckoo filterCounting Bloom Filter
Top Architect
Written by

Top Architect

Top Architect focuses on sharing practical architecture knowledge, covering enterprise, system, website, large‑scale distributed, and high‑availability architectures, plus architecture adjustments using internet technologies. We welcome idea‑driven, sharing‑oriented architects to exchange and learn together.

0 followers
Reader feedback

How this landed with the community

login Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.