How Bitmap and Bloom Filter Slash Memory Usage for Billions of IDs
This article explains how using a bitmap reduces the storage of 4 billion unsigned integers from 14.9 GB to about 476 MB, introduces the concept and benefits of bitmaps, and then details Bloom filter principles, advantages, limitations, common use cases, and Java/Redis implementation examples.
Hello everyone, I am your friend Architect Jun, a code‑writing architect.
Storing 4 billion unsigned ints directly would require 4 × 4 000 000 000 / 1024 / 1024 / 1024 ≈ 14.9 GB, which is far beyond a 1 GB limit.
Using a bitmap, each number occupies only 1 bit, so the required space becomes 4 000 000 000 × 1 / 8 / 1024 / 1024 ≈ 476 MB, a huge reduction.
For example, to store my QQ number "907607222" in a bitmap, locate position 907607222 and set that bit to 1.
After placing all 4 billion numbers in the bitmap, bits set to 1 indicate presence, and scanning the bitmap yields all stored numbers.
What Is a Bitmap? What Is It Used For?
A bitmap (BitMap) uses a single bit to mark an element; it is essentially a bit array where each position holds either 0 or 1.
For example, the bitmap can represent the numbers 1, 4, 6:
If we stored 1, 4, 6 as three unsigned ints, we would need 3 × 4 bytes = 12 bytes, i.e., 12 × 8 = 96 bits, whereas the bitmap uses only 3 bits.
The biggest advantage of a bitmap is space saving.
Bitmaps are especially suitable for deduplication, sorting, and are the basis of the famous Bloom filter.
However, a bitmap can only represent 0 or 1, so it is limited to true/false scenarios.
What Is a Bloom Filter? How Does It Work?
A Bloom filter is a data structure that quickly checks whether an element may exist in a set using a bit array.
It applies multiple hash functions to map an element to several bits and sets those bits to 1. When querying, if all corresponding bits are 1, the element is possibly present; otherwise it is definitely absent.
Thus, Bloom filters can definitively say an element does not exist, but due to hash collisions they cannot guarantee existence.
False positives occur when a non‑existent element hashes to bits already set by other elements.
To reduce false‑positive probability, one can lower hash collisions and add more hash functions.
The Bloom filter workflow includes:
1. Initialize the Bloom filter – specify the set size and desired false‑positive rate; internally it creates a bit array and multiple hash functions.
2. Add elements – hash the element with each function, set the resulting bits to 1 (skip if already set).
3. Query elements – hash the element, check if all corresponding bits are 1; if so, the element may exist, otherwise it definitely does not.
Advantages: fast membership checks with low memory footprint. Disadvantages: false positives, inability to delete elements easily, and the need to know capacity in advance.
Typical application scenarios
Web crawlers – filter already‑crawled URLs.
Cache systems – test if a key may be in cache, helping prevent cache penetration.
Distributed systems – reduce network load by checking membership before remote queries.
Spam filtering – detect known spam addresses.
Blacklist filtering – block malicious IPs or phone numbers.
How to use in Java
Using Google Guava:
import com.google.common.hash.BloomFilter;
import com.google.common.hash.Funnels;
public class BloomFilterExample {
public static void main(String[] args) {
// Create a Bloom filter for 100 elements with 1% false‑positive rate
BloomFilter<String> bloomFilter = BloomFilter.create(Funnels.stringFunnel(), 100, 0.01);
// Add elements
bloomFilter.put("Lynn");
bloomFilter.put("666");
bloomFilter.put("八股文");
// Query elements
System.out.println(bloomFilter.mightContain("Lynn")); // true
System.out.println(bloomFilter.mightContain("张三")); // false
}
}Using Apache Commons Collections:
import org.apache.commons.lang3.StringUtils;
import org.apache.commons.collections4.BloomFilter;
import org.apache.commons.collections4.functors.HashFunctionIdentity;
public class BloomFilterExample {
public static void main(String[] args) {
BloomFilter<String> bloomFilter = new BloomFilter<>(HashFunctionIdentity.hashFunction(StringUtils::hashCode), 100, 0.01);
bloomFilter.put("Lynn");
bloomFilter.put("666");
bloomFilter.put("八股文");
System.out.println(bloomFilter.mightContain("Lynn")); // true
System.out.println(bloomFilter.mightContain("张三")); // false
}
}Using Redis with Redisson:
Config config = new Config();
config.useSingleServer().setAddress("redis://127.0.0.1:6379");
RedissonClient redisson = Redisson.create(config);
RBloomFilter<String> bloomFilter = redisson.getBloomFilter("myfilter");
bloomFilter.tryInit(100, 0.01);
bloomFilter.add("Lynn");
bloomFilter.add("666");
bloomFilter.add("八股文");
System.out.println(bloomFilter.contains("Lynn"));
System.out.println(bloomFilter.contains("张三"));
redisson.shutdown();Using Jedis:
Jedis jedis = new Jedis("localhost");
jedis.bfCreate("myfilter", 100, 0.01);
jedis.bfAdd("myfilter", "Lynn");
jedis.bfAdd("myfilter", "666");
jedis.bfAdd("myfilter", "八股文");
System.out.println(jedis.bfExists("myfilter", "Lynn"));
System.out.println(jedis.bfExists("myfilter", "张三"));
jedis.close();Source: blog.csdn.net/songmulin/article/details/130814507
Java Architect Essentials
Committed to sharing quality articles and tutorials to help Java programmers progress from junior to mid-level to senior architect. We curate high-quality learning resources, interview questions, videos, and projects from across the internet to help you systematically improve your Java architecture skills. Follow and reply '1024' to get Java programming resources. Learn together, grow together.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
