Fundamentals 12 min read

Understanding Bloom Filters, Counting Bloom Filters, and Cuckoo Filters

The article explains how Bloom filters, Counting Bloom filters, and Cuckoo filters work, their hash‑based bitmap mechanisms, advantages and limitations such as false positives and deletion issues, and presents practical improvements and hash functions for efficient cache and database query optimization.

Top Architect
Top Architect
Top Architect
Understanding Bloom Filters, Counting Bloom Filters, and Cuckoo Filters

In many backend systems, I/O is a bottleneck, especially when repeated queries request data that does not exist in the database. To reduce unnecessary I/O, filter structures like Bloom filters are introduced.

Bloom Filter

A Bloom filter uses a bitmap and multiple hash functions. When an element arrives, three hash functions compute three positions in the bitmap and set those bits to 1. A later query hashes the same element; if all three bits are 1, the element is assumed to exist, otherwise it is rejected.

Because bits may be set by different elements, Bloom filters can produce false positives. They also cannot delete elements: clearing bits may affect other elements that share the same positions.

Counting Bloom Filter

To enable deletions, the bitmap is replaced by an array of counters. Insertion increments the counters at the hashed positions, and deletion decrements them. This avoids the need to recompute other elements' hashes but still suffers from false positives.

Cuckoo Filter

The Cuckoo filter, proposed in the paper "Cuckoo Filter: Better Than Bloom," stores only a short fingerprint of each element instead of the whole element. It supports deletion and offers higher space efficiency.

It uses two hash functions to map an element to two possible buckets; each bucket can hold multiple fingerprints (seats). If both buckets are full, an existing fingerprint is evicted ("cuckoo"), and the process repeats until a free seat is found or the table is deemed full and needs resizing.

Cuckoo Hashing Issues

If the table becomes too full, many evictions degrade insertion performance, and cycles can occur when multiple elements map to the same pair of buckets.

Improvements include increasing the number of hash functions (more possible buckets) and adding multiple seats per bucket, which raise space utilization to over 95% while keeping query speed high.

Hash Functions for Cuckoo Filters

Each element’s fingerprint fp is computed, then one position is p1 = hash(x) . The alternate position is derived without re‑hashing the full element: p2 = p1 ^ hash(fp) . This XOR operation ensures p1 != p2 as long as hash(fp) != 0 . The reverse calculation is also possible: p1 = p2 ^ hash(fp) .

Because the table size is a power of two, modulo operations are reduced to masking the low bits of the hash values.

Code Examples

p1 = hash1(x) % l
p2 = hash2(x) % l
fp = fingerprint(x)
p1 = hash(x)
p2 = p1 ^ hash(fp)  // XOR
p1 = p2 ^ hash(fp)

These structures provide efficient, probabilistic set membership tests that are widely used in caching, databases, and network systems.

Data StructuresAlgorithmsHashingBloom FilterCache OptimizationCuckoo filter
Top Architect
Written by

Top Architect

Top Architect focuses on sharing practical architecture knowledge, covering enterprise, system, website, large‑scale distributed, and high‑availability architectures, plus architecture adjustments using internet technologies. We welcome idea‑driven, sharing‑oriented architects to exchange and learn together.

0 followers
Reader feedback

How this landed with the community

login Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.