Fundamentals 13 min read

Understanding Bloom Filter, Counting Bloom Filter, and Cuckoo Filter: Principles, Issues, and Optimizations

This article explains the concepts, advantages, and limitations of Bloom filters, Counting Bloom filters, and Cuckoo filters, illustrating how they reduce unnecessary I/O in backend systems and offering practical improvements such as multi‑hash functions and bucket designs to enhance space and time efficiency.

Top Architect
Top Architect
Top Architect
Understanding Bloom Filter, Counting Bloom Filter, and Cuckoo Filter: Principles, Issues, and Optimizations

Everyone knows that I/O is a bottleneck in computers, and many frameworks and hardware aim to reduce I/O operations; this article introduces filters, starting with a typical scenario where backend services query databases and may waste I/O on non‑existent data.

Bloom Filter (Bloom Filter) works by checking a bitmap before accessing the database. When a request arrives, three hash functions map the data to three bits; if all bits are 1 the item is considered present, otherwise it is rejected.

The bitmap initially contains all zeros. When data is inserted, the three hash values set the corresponding bits to 1. A later query recomputes the same three hashes; if any bit is 0 the item was never stored.

Problems of Bloom Filter

1. False positives : different items may set overlapping bits, causing a non‑existent item to appear present.

2. Deletion impossible : clearing bits may remove information for other items, and false positives make it unclear whether an item truly exists.

Counting Bloom Filter replaces the bitmap with an array of counters; each insertion increments the counters, each deletion decrements them, thus avoiding the deletion issue, though false positives remain.

Cuckoo Filter was proposed to overcome Bloom’s shortcomings. It stores only fingerprints (a few bits) of elements in a one‑dimensional array, allowing deletions and offering better space efficiency.

Key points of Cuckoo Filter:

Two hash functions map an element to two possible buckets.

Each bucket can hold multiple “slots”.

If both buckets are full, an existing fingerprint is evicted (“cuckoo‑eviction”) and re‑inserted elsewhere.

When the array becomes too crowded, the filter must be resized; otherwise, endless eviction loops can occur.

Optimizations include increasing the number of hash functions (e.g., 4 instead of 2) and adding multiple slots per bucket, which together can raise space utilization to over 99% while keeping high query speed.

Cuckoo Hash Structure (one‑dimensional array) stores full elements; the filter version stores only fingerprints, leading to similar false‑positive behavior as Bloom filters.

Fingerprint calculation and bucket location use a special hash function: fp = fingerprint(x) p1 = hash(x) p2 = p1 ^ hash(fp) // XOR Because the array size is a power of two, the modulo operation is implicit in the low bits.

Given a known fingerprint and one bucket index, the opposite bucket can be computed: p1 = p2 ^ hash(fp) This ensures that an element never maps to the same bucket twice, preventing self‑eviction loops.

Overall, Bloom, Counting Bloom, and Cuckoo filters provide trade‑offs between space, false‑positive rate, and support for deletions, and can be tuned with multiple hash functions and bucket capacities to suit backend caching and database access patterns.

BackendData StructuresHashingBloom FilterCache OptimizationCuckoo filterCounting Bloom Filter
Top Architect
Written by

Top Architect

Top Architect focuses on sharing practical architecture knowledge, covering enterprise, system, website, large‑scale distributed, and high‑availability architectures, plus architecture adjustments using internet technologies. We welcome idea‑driven, sharing‑oriented architects to exchange and learn together.

0 followers
Reader feedback

How this landed with the community

login Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.