Big Data 6 min read

Check a New Integer Among 4 Billion Records in Seconds Using Bitmap & Distributed Methods

An interviewee faces the challenge of determining whether a newly given integer exists within a set of 4 billion numbers, and the article explores efficient solutions—from naive disk‑I/O approaches to distributed processing and the memory‑saving bitmap technique—highlighting their performance trade‑offs and implementation details.

Java Backend Technology
Java Backend Technology
Java Backend Technology
Check a New Integer Among 4 Billion Records in Seconds Using Bitmap & Distributed Methods

In a mock interview, a candidate is asked: “Given 4 billion integers and a new integer, how would you determine whether the new integer is present among the 4 billion?”

The interviewer points out that loading the data from disk eight times would be extremely slow due to disk I/O, suggesting a distributed solution.

The teacher explains that the data can be split across eight machines; each machine loads its portion into memory and searches in parallel, reducing the query time to seconds.

He then introduces a more efficient “millisecond‑level” approach: using a bitmap (bit‑set) that represents the presence of each possible 32‑bit integer. By allocating 2³² bits (≈500 MB), each integer maps to a single bit; a bit set to 1 indicates the integer exists.

When a new integer arrives, the algorithm checks the corresponding bit; if it is 1 the integer is present, otherwise it is absent. This method eliminates repeated disk reads and achieves constant‑time lookups.

In class, another student proposes an alternative: sort the data externally and store continuous ranges as (start, length) pairs. A binary search on these ranges can locate the new integer, requiring about 1.6 GB of memory in the worst case.

Both solutions illustrate how large‑scale membership queries can be handled efficiently, either via bitmap indexing or range‑based binary search, and demonstrate the trade‑offs between memory usage and query latency.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Sign in to view source
Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactadmin@besthub.devand we will review it promptly.

algorithmBig DataBitmaplarge datasetdistributed computing
Java Backend Technology
Written by

Java Backend Technology

Focus on Java-related technologies: SSM, Spring ecosystem, microservices, MySQL, MyCat, clustering, distributed systems, middleware, Linux, networking, multithreading. Occasionally cover DevOps tools like Jenkins, Nexus, Docker, and ELK. Also share technical insights from time to time, committed to Java full-stack development!

0 followers
Reader feedback

How this landed with the community

Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.