How Bitmaps and Bloom Filters Slash Memory Usage for Massive Datasets
This article explains how using a bitmap can reduce the memory needed to store billions of integers from about 15 GB to under 500 MB, describes the bitmap concept, introduces Bloom filters, outlines their principles, advantages, common use cases, and provides Java and Redis code examples for implementation.
Storing 4 billion unsigned integers directly would require roughly 14.9 GB of memory: 4*4000000000 /1024/1024/1024 = 14.9G Using a bitmap, each number occupies only one bit, reducing the requirement to about 476 MB: 4000000000 * 1 /8 /1024/1024 = 476M A bitmap is essentially a bit array where each position represents the presence (1) or absence (0) of an element. For example, to store the QQ number "907607222" in a bitmap, you set the bit at that index to 1. After inserting all numbers, scanning the bitmap yields the existing values.
The main advantage of a bitmap is its extreme space efficiency, making it ideal for deduplication, sorting, and as the foundation of Bloom filters.
What Is a Bloom Filter and How Does It Work?
A Bloom filter is a probabilistic data structure that tests whether an element is possibly in a set using a bit array and multiple hash functions. When adding an element, each hash maps the element to several bit positions, which are set to 1. To query, the same hashes are computed; if all corresponding bits are 1, the element may exist, otherwise it definitely does not.
Bloom filters guarantee no false negatives but can produce false positives due to hash collisions.
Bloom Filter Operations
1. Initialize – Specify the expected number of elements and desired false‑positive rate; the filter creates a bit array and a set of hash functions.
2. Add Element – Hash the element with each function and set the resulting bits to 1 (skip if already set).
3. Query Element – Hash the element and check the bits; if all are 1, the element may be present.
Advantages: fast membership checks with low memory overhead. Drawbacks: false positives and difficulty deleting elements because bits may be shared.
Typical Use Cases
Web crawlers – avoid revisiting already crawled URLs.
Cache systems – quickly test if a key might be cached, reducing cache‑penetration.
Distributed systems – reduce network traffic by pre‑filtering queries.
Spam filtering – detect known spam addresses.
Blacklist checks – block malicious IPs or phone numbers.
Implementation in Java
Google Guava:
import com.google.common.hash.BloomFilter;
import com.google.common.hash.Funnels;
public class BloomFilterExample {
public static void main(String[] args) {
// Create a Bloom filter for 100 elements with 1% false‑positive rate
BloomFilter<String> bloomFilter = BloomFilter.create(Funnels.stringFunnel(), 100, 0.01);
// Add elements
bloomFilter.put("Lynn");
bloomFilter.put("666");
bloomFilter.put("八股文");
// Query elements
System.out.println(bloomFilter.mightContain("Lynn")); // true
System.out.println(bloomFilter.mightContain("张三")); // false
}
}Apache Commons Collections:
import org.apache.commons.lang3.StringUtils;
import org.apache.commons.collections4.BloomFilter;
import org.apache.commons.collections4.functors.HashFunctionIdentity;
public class BloomFilterExample {
public static void main(String[] args) {
BloomFilter<String> bloomFilter = new BloomFilter<>(HashFunctionIdentity.hashFunction(StringUtils::hashCode), 100, 0.01);
bloomFilter.put("Lynn");
bloomFilter.put("666");
bloomFilter.put("八股文");
System.out.println(bloomFilter.mightContain("Lynn")); // true
System.out.println(bloomFilter.mightContain("张三")); // false
}
}Using Redisson (Redis) with the Bloom module:
Config config = new Config();
config.useSingleServer().setAddress("redis://127.0.0.1:6379");
RedissonClient redisson = Redisson.create(config);
RBloomFilter<String> bloomFilter = redisson.getBloomFilter("myfilter");
bloomFilter.tryInit(100, 0.01);
bloomFilter.add("Lynn");
bloomFilter.add("666");
bloomFilter.add("八股文");
System.out.println(bloomFilter.contains("Lynn"));
System.out.println(bloomFilter.contains("张三"));
redisson.shutdown();Using Jedis:
Jedis jedis = new Jedis("localhost");
jedis.bfCreate("myfilter", 100, 0.01);
jedis.bfAdd("myfilter", "Lynn");
jedis.bfAdd("myfilter", "666");
jedis.bfAdd("myfilter", "八股文");
System.out.println(jedis.bfExists("myfilter", "Lynn")); // true
System.out.println(jedis.bfExists("myfilter", "张三")); // false
jedis.close();Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
macrozheng
Dedicated to Java tech sharing and dissecting top open-source projects. Topics include Spring Boot, Spring Cloud, Docker, Kubernetes and more. Author’s GitHub project “mall” has 50K+ stars.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
