Check a New Integer Among 4 Billion Records in Seconds Using Bitmap & Distributed Methods
An interviewee faces the challenge of determining whether a newly given integer exists within a set of 4 billion numbers, and the article explores efficient solutions—from naive disk‑I/O approaches to distributed processing and the memory‑saving bitmap technique—highlighting their performance trade‑offs and implementation details.
In a mock interview, a candidate is asked: “Given 4 billion integers and a new integer, how would you determine whether the new integer is present among the 4 billion?”
The interviewer points out that loading the data from disk eight times would be extremely slow due to disk I/O, suggesting a distributed solution.
The teacher explains that the data can be split across eight machines; each machine loads its portion into memory and searches in parallel, reducing the query time to seconds.
He then introduces a more efficient “millisecond‑level” approach: using a bitmap (bit‑set) that represents the presence of each possible 32‑bit integer. By allocating 2³² bits (≈500 MB), each integer maps to a single bit; a bit set to 1 indicates the integer exists.
When a new integer arrives, the algorithm checks the corresponding bit; if it is 1 the integer is present, otherwise it is absent. This method eliminates repeated disk reads and achieves constant‑time lookups.
In class, another student proposes an alternative: sort the data externally and store continuous ranges as (start, length) pairs. A binary search on these ranges can locate the new integer, requiring about 1.6 GB of memory in the worst case.
Both solutions illustrate how large‑scale membership queries can be handled efficiently, either via bitmap indexing or range‑based binary search, and demonstrate the trade‑offs between memory usage and query latency.
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
Java Backend Technology
Focus on Java-related technologies: SSM, Spring ecosystem, microservices, MySQL, MyCat, clustering, distributed systems, middleware, Linux, networking, multithreading. Occasionally cover DevOps tools like Jenkins, Nexus, Docker, and ELK. Also share technical insights from time to time, committed to Java full-stack development!
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
