Fundamentals 13 min read

Understanding Bloom Filters, Counting Bloom Filters, and Cuckoo Filters

This article explains the motivation behind using filters to reduce database I/O, introduces Bloom filters and their bitmap implementation, discusses their limitations, then covers Counting Bloom filters and Cuckoo filters—including their hash structures, deletion support, performance trade‑offs, and practical improvements.

Top Architect
Top Architect
Top Architect
Understanding Bloom Filters, Counting Bloom Filters, and Cuckoo Filters

In many backend systems, database queries become a bottleneck when a large number of requests ask for data that does not exist, causing unnecessary I/O; filters are introduced to block such requests before they reach the database.

Bloom Filter

A Bloom filter uses a bitmap to record the presence of elements. When a request arrives, three hash functions map the element to three bitmap positions; if all three bits are set, the element is considered present, otherwise it is rejected.

Example: a bitmap initially all 0. After inserting an element, positions 1, 3, 6 are set to 1. A later query that maps to the same three positions is treated as a hit.

Problems of Bloom Filter

False positives: Different elements may set overlapping bits, causing a non‑existent element to appear present.

Deletion impossible: Removing an element would clear bits that might belong to other elements, leading to incorrect negatives.

Counting Bloom Filter

To support deletion, the bitmap is replaced by an array of counters; each insertion increments the counters at the hashed positions, and deletion decrements them. This eliminates the need to rebuild the filter but still suffers from false positives.

Cuckoo Filter

The Cuckoo filter, proposed in the paper "Cuckoo Filter: Better Than Bloom," stores only a fingerprint of each element and supports deletion. It improves query performance and space efficiency compared to Bloom filters.

查询性能弱 , 空间利用效率低 , 不支持反向操作(删除) , 不支持计数 are the main drawbacks of Bloom filters that Cuckoo filters aim to solve.

Why "Cuckoo"?

The name comes from the cuckoo bird’s habit of laying its eggs in other birds' nests, analogous to how elements are displaced between two possible locations in the data structure.

Cuckoo Hash Structure

Each element can be placed in one of two buckets determined by two hash functions. If both are occupied, one element is evicted ("鸠占鹊巢") and re‑inserted using its alternative bucket, possibly causing a chain of evictions.

p1 = hash1(x) % l
p2 = hash2(x) % l

If the eviction chain becomes too long, the table is considered full and must be resized.

Improvements

Increasing the number of hash functions (e.g., 3 or 4) reduces collisions and raises space utilization to about 95%.

Adding multiple slots per bucket (e.g., 2 slots) improves cache locality and query speed, achieving around 85% space utilization with high performance.

Combining both ideas—using 4 hash functions and 2 slots per bucket—can push space utilization close to 99% while maintaining fast operations.

Cuckoo Filter Design

The filter stores only a short fingerprint of each element. Given a fingerprint fp and one bucket index p1 , the alternate bucket is computed as p2 = p1 ^ hash(fp) , ensuring p1 != p2 when hash(fp) != 0 . The table size is a power of two, so modulo operations reduce to masking low bits.

fp = fingerprint(x)
p1 = hash(x)
p2 = p1 ^ hash(fp) // XOR

This property allows deletion without knowing the original element, because the fingerprint alone determines both possible locations.

Overall, the article provides a comparative overview of Bloom, Counting Bloom, and Cuckoo filters, their algorithms, advantages, limitations, and practical enhancements for high‑performance backend systems.

data structuresBloom Filterbackend optimizationhash functionsCuckoo filterCounting Bloom
Top Architect
Written by

Top Architect

Top Architect focuses on sharing practical architecture knowledge, covering enterprise, system, website, large‑scale distributed, and high‑availability architectures, plus architecture adjustments using internet technologies. We welcome idea‑driven, sharing‑oriented architects to exchange and learn together.

0 followers
Reader feedback

How this landed with the community

login Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.