Fundamentals 11 min read

How Bitmaps and Bloom Filters Slash Memory Usage for Massive Datasets

This article explains how using a bitmap can reduce the memory needed to store billions of integers from about 15 GB to under 500 MB, describes the bitmap concept, introduces Bloom filters, outlines their principles, advantages, common use cases, and provides Java and Redis code examples for implementation.

macrozheng
macrozheng
macrozheng
How Bitmaps and Bloom Filters Slash Memory Usage for Massive Datasets

Storing 4 billion unsigned integers directly would require roughly 14.9 GB of memory: 4*4000000000 /1024/1024/1024 = 14.9G Using a bitmap, each number occupies only one bit, reducing the requirement to about 476 MB: 4000000000 * 1 /8 /1024/1024 = 476M A bitmap is essentially a bit array where each position represents the presence (1) or absence (0) of an element. For example, to store the QQ number "907607222" in a bitmap, you set the bit at that index to 1. After inserting all numbers, scanning the bitmap yields the existing values.

Bitmap example
Bitmap example

The main advantage of a bitmap is its extreme space efficiency, making it ideal for deduplication, sorting, and as the foundation of Bloom filters.

What Is a Bloom Filter and How Does It Work?

A Bloom filter is a probabilistic data structure that tests whether an element is possibly in a set using a bit array and multiple hash functions. When adding an element, each hash maps the element to several bit positions, which are set to 1. To query, the same hashes are computed; if all corresponding bits are 1, the element may exist, otherwise it definitely does not.

Bloom filters guarantee no false negatives but can produce false positives due to hash collisions.

Bloom Filter Operations

1. Initialize – Specify the expected number of elements and desired false‑positive rate; the filter creates a bit array and a set of hash functions.

2. Add Element – Hash the element with each function and set the resulting bits to 1 (skip if already set).

3. Query Element – Hash the element and check the bits; if all are 1, the element may be present.

Advantages: fast membership checks with low memory overhead. Drawbacks: false positives and difficulty deleting elements because bits may be shared.

Typical Use Cases

Web crawlers – avoid revisiting already crawled URLs.

Cache systems – quickly test if a key might be cached, reducing cache‑penetration.

Distributed systems – reduce network traffic by pre‑filtering queries.

Spam filtering – detect known spam addresses.

Blacklist checks – block malicious IPs or phone numbers.

Implementation in Java

Google Guava:

import com.google.common.hash.BloomFilter;
import com.google.common.hash.Funnels;

public class BloomFilterExample {
    public static void main(String[] args) {
        // Create a Bloom filter for 100 elements with 1% false‑positive rate
        BloomFilter<String> bloomFilter = BloomFilter.create(Funnels.stringFunnel(), 100, 0.01);
        // Add elements
        bloomFilter.put("Lynn");
        bloomFilter.put("666");
        bloomFilter.put("八股文");
        // Query elements
        System.out.println(bloomFilter.mightContain("Lynn")); // true
        System.out.println(bloomFilter.mightContain("张三")); // false
    }
}

Apache Commons Collections:

import org.apache.commons.lang3.StringUtils;
import org.apache.commons.collections4.BloomFilter;
import org.apache.commons.collections4.functors.HashFunctionIdentity;

public class BloomFilterExample {
    public static void main(String[] args) {
        BloomFilter<String> bloomFilter = new BloomFilter<>(HashFunctionIdentity.hashFunction(StringUtils::hashCode), 100, 0.01);
        bloomFilter.put("Lynn");
        bloomFilter.put("666");
        bloomFilter.put("八股文");
        System.out.println(bloomFilter.mightContain("Lynn")); // true
        System.out.println(bloomFilter.mightContain("张三")); // false
    }
}

Using Redisson (Redis) with the Bloom module:

Config config = new Config();
config.useSingleServer().setAddress("redis://127.0.0.1:6379");
RedissonClient redisson = Redisson.create(config);
RBloomFilter<String> bloomFilter = redisson.getBloomFilter("myfilter");
bloomFilter.tryInit(100, 0.01);
bloomFilter.add("Lynn");
bloomFilter.add("666");
bloomFilter.add("八股文");
System.out.println(bloomFilter.contains("Lynn"));
System.out.println(bloomFilter.contains("张三"));
redisson.shutdown();

Using Jedis:

Jedis jedis = new Jedis("localhost");
jedis.bfCreate("myfilter", 100, 0.01);
jedis.bfAdd("myfilter", "Lynn");
jedis.bfAdd("myfilter", "666");
jedis.bfAdd("myfilter", "八股文");
System.out.println(jedis.bfExists("myfilter", "Lynn")); // true
System.out.println(jedis.bfExists("myfilter", "张三")); // false
jedis.close();
Bloom filter principle
Bloom filter principle
Bloom filter false positive
Bloom filter false positive
Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Sign in to view source
Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactadmin@besthub.devand we will review it promptly.

JavaMemory OptimizationredisBitmapData Structuresbloom-filter
macrozheng
Written by

macrozheng

Dedicated to Java tech sharing and dissecting top open-source projects. Topics include Spring Boot, Spring Cloud, Docker, Kubernetes and more. Author’s GitHub project “mall” has 50K+ stars.

0 followers
Reader feedback

How this landed with the community

Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.