Fundamentals 13 min read

How Bloom, Counting Bloom, and Cuckoo Filters Cut Database I/O

To reduce costly database I/O, the article explains how Bloom filters, Counting Bloom filters, and Cuckoo filters work, detailing their bitmap/hash mechanisms, false‑positive behavior, deletion limitations, and practical enhancements such as multi‑hash functions and bucket‑level fingerprints.

Java Architect Essentials

Jun 13, 2023

How Bloom, Counting Bloom, and Cuckoo Filters Cut Database I/O

IO is a classic performance bottleneck in backend systems. When a request first checks a cache and, if missing, queries a database, many queries may target data that does not exist, causing unnecessary database I/O. Filters are introduced to block such useless queries.

1. Bloom Filter

A Bloom filter uses a bitmap and several hash functions. When an element is inserted, each hash maps to a bit position that is set to 1. To test membership, the same hash functions compute positions; if all bits are 1 the element is assumed present, otherwise it is definitely absent.

Problems of Bloom Filter

False positives: different elements may set overlapping bits, causing a non‑existent element to appear present.

Deletion is not supported: clearing bits may remove information for other elements, and false positives make it impossible to know whether an element truly exists before deletion.

2. Counting Bloom Filter

To enable deletions, the bitmap is replaced by an array of counters. Each hash increments the corresponding counter on insertion and decrements on deletion. This removes the need to recompute other elements’ bits, but false positives remain.

3. Cuckoo Filter

The Cuckoo filter, proposed in the paper "Cuckoo Filter: Better Than Bloom", addresses the deletion issue and improves space efficiency. It stores only a short fingerprint of each element instead of the full element.

Advantages over Bloom

Higher query performance because each lookup touches only two buckets.

Better space utilization (up to ~40% less than Bloom for the same false‑positive rate).

Supports deletion via the fingerprint mechanism.

Why is it called a "Cuckoo" filter? The name comes from the cuckoo bird’s habit of laying its eggs in other birds' nests, analogous to how the filter moves elements between buckets.

Cuckoo Hashing

Basic cuckoo hashing uses a one‑dimensional array and two hash functions. An element can reside in either of two positions; if both are occupied, one element is displaced ("kicked out") to its alternative location, possibly causing a cascade.

p1 = hash1(x) % l
p2 = hash2(x) % l

If the table becomes too full, insertion may loop indefinitely. A threshold is set; exceeding it triggers a table resize.

Another issue is a potential eviction cycle when multiple elements map to the same pair of buckets.

Optimizations

Increase the number of hash functions (e.g., three or four) to reduce collisions.

Allow multiple slots per bucket so that a bucket can hold several fingerprints before eviction is needed.

Combine both ideas (e.g., 4 hash functions with 2 slots per bucket) to achieve up to 99% space utilization while maintaining high query speed.

Cuckoo Filter Mechanics

Only the fingerprint (a few bits) of an element is stored. The filter uses two hash functions to compute two candidate buckets. The alternate bucket is derived from the primary bucket and the fingerprint using an XOR operation:

fp = fingerprint(x)
 p1 = hash(x)
 p2 = p1 ^ hash(fp) // XOR

Because the table size is a power of two, the modulo operation is implicit in the low‑order bits. Knowing any one of {p1, p2, fp} allows computation of the other, ensuring that the filter can locate the alternate bucket without the original element.

Deletion simply decrements the counter (or clears the fingerprint) in the occupied bucket, and insertion adds the fingerprint to a bucket with a free slot.

Overall, Bloom, Counting Bloom, and Cuckoo filters provide trade‑offs between false‑positive rate, memory usage, and support for deletions, making them valuable tools for reducing unnecessary database I/O in high‑throughput backend systems.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.

Data Structures Hashing bloom filter Cache Optimization Cuckoo Filter Counting Bloom Database I/O

Written by

Java Architect Essentials

Committed to sharing quality articles and tutorials to help Java programmers progress from junior to mid-level to senior architect. We curate high-quality learning resources, interview questions, videos, and projects from across the internet to help you systematically improve your Java architecture skills. Follow and reply '1024' to get Java programming resources. Learn together, grow together.

0 followers

Reader feedback

How this landed with the community

Rate this article

Was this worth your time?

Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.