Fundamentals 8 min read

How to Test Membership in 4 Billion Numbers Using Only 1 GB Memory

This article explains how to determine whether a given unsigned integer belongs to a set of 4 billion distinct numbers within a 1 GB memory limit, comparing a bitmap approach with a Bloom filter, providing detailed implementation steps and C++ code examples for both methods.

ITPUB
ITPUB
ITPUB
How to Test Membership in 4 Billion Numbers Using Only 1 GB Memory

Problem

We have 4 billion distinct unsigned 32‑bit integers stored unsorted. The task is to answer membership queries (does a given number belong to the set?) while using no more than 1 GB of RAM.

Approach Comparison

Bitmap (bit‑set) method – one bit for each possible value (0 … 2³²‑1). Exact answers, O(1) query time, memory usage = 2³² bits ≈ 512 MiB.

Bloom filter – multiple hash functions map each element to bits in a smaller array. Saves memory (≈10 % of bitmap) but introduces a configurable false‑positive probability.

Bitmap Solution

Because 512 MiB fits comfortably within the 1 GB limit, a plain bitmap is the simplest exact solution.

Memory Requirement

2³² bits = 4,294,967,296 bits ≈ 512 MiB.

Implementation Steps

Create an array of uint32_t elements large enough to hold 2³² bits (i.e., 2³² / 32 = 134,217,728 words).

For each input number n, set the corresponding bit:

index = n / 32;
offset = n % 32;
bitmap[index] |= (1U << offset);

To answer a query, test the same bit:

bool exists = (bitmap[index] & (1U << offset)) != 0;

Complete example (C++17):

#include <iostream>
#include <vector>
#include <cstdint>

const std::size_t WORD_COUNT = 1ULL << 30; // 2^32 / 32
std::vector<uint32_t> bitmap(WORD_COUNT, 0);

inline void setBit(uint32_t n) {
    std::size_t idx = n / 32;
    uint32_t off = n % 32;
    bitmap[idx] |= (1U << off);
}

inline bool checkBit(uint32_t n) {
    std::size_t idx = n / 32;
    uint32_t off = n % 32;
    return (bitmap[idx] & (1U << off)) != 0;
}

int main() {
    // Assume "numbers" holds the 4 billion values
    std::vector<uint32_t> numbers = {/* ... */};
    for (uint32_t v : numbers) setBit(v);

    uint32_t query = 123456789U;
    if (checkBit(query))
        std::cout << "Number exists." << std::endl;
    else
        std::cout << "Number does not exist." << std::endl;
    return 0;
}

Bloom Filter Alternative

If memory must be reduced further, a Bloom filter can be used at the cost of false positives.

Typical Parameters

Bit array size ≈ 100 million bits (≈12.5 MiB).

Number of hash functions = 5 (gives a false‑positive rate around 1 % for 4 billion items).

Implementation Steps

Allocate HASH_COUNT vectors of bool of length BIT_ARRAY_SIZE.

Initialize an array of std::hash<uint32_t> (or any fast 32‑bit hash) for the HASH_COUNT functions.

For each input value, compute each hash, take modulo BIT_ARRAY_SIZE, and set the corresponding bit.

To query, compute the same hashes; if any of the bits is 0 the element is definitely absent, otherwise it may be present.

Example (C++17):

#include <array>
#include <vector>
#include <functional>
#include <cstdint>
#include <iostream>

constexpr std::size_t BIT_ARRAY_SIZE = 100'000'000; // 100 M bits
constexpr std::size_t HASH_COUNT = 5;

std::array<std::vector<bool>, HASH_COUNT> filter;
std::array<std::hash<uint32_t>, HASH_COUNT> hasher;

inline void add(uint32_t x) {
    for (std::size_t i = 0; i < HASH_COUNT; ++i) {
        std::size_t idx = hasher[i](x) % BIT_ARRAY_SIZE;
        filter[i][idx] = true;
    }
}

inline bool possiblyContains(uint32_t x) {
    for (std::size_t i = 0; i < HASH_COUNT; ++i) {
        std::size_t idx = hasher[i](x) % BIT_ARRAY_SIZE;
        if (!filter[i][idx]) return false; // definitely not present
    }
    return true; // may be present (false positive possible)
}

int main() {
    for (std::size_t i = 0; i < HASH_COUNT; ++i) {
        filter[i] = std::vector<bool>(BIT_ARRAY_SIZE, false);
        hasher[i] = std::hash<uint32_t>();
    }
    // Insert the 4 billion numbers here …

    uint32_t query = 123456789U;
    if (possiblyContains(query))
        std::cout << "Number might exist (false positive possible)." << std::endl;
    else
        std::cout << "Number definitely does not exist." << std::endl;
    return 0;
}

Performance Considerations

Bitmap: O(1) query and update, exact answer, 512 MiB memory.

Bloom filter: O(k) query where k = number of hash functions (e.g., 5), memory ≈ 12.5 MiB, false‑positive probability configurable (≈1 % with the parameters above).

Conclusion

With a 1 GB memory budget, the bitmap method is the most straightforward and efficient way to achieve exact membership testing for 4 billion unsigned integers. A Bloom filter is an alternative when memory must be reduced further and a small false‑positive rate is acceptable.

Bitmaplarge datasetbloom filterC++membership test
ITPUB
Written by

ITPUB

Official ITPUB account sharing technical insights, community news, and exciting events.

0 followers
Reader feedback

How this landed with the community

Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.