How to Find the Top 1,000 Numbers in a Billion‑Element Array in Linear Time
In this interview case study, a candidate solves the challenge of extracting the largest 1,000 numbers from a billion‑element dataset using a quick‑select style partition algorithm, achieving O(n) time, and implements the solution in Java with concise code.
A recent interview at a major internet company presented the candidate with the problem: “How to find the top 1,000 largest numbers among 1 billion numbers?”
The candidate proposed using a divide‑and‑conquer method similar to quicksort’s partition: randomly pick a pivot t, partition the array into elements greater than t and those smaller than t.
If the larger side contains more than 1,000 elements, repeat the partition on that side; otherwise, continue on the smaller side to collect the remaining numbers.
The partition operation runs in O(n) time. The first partition costs n, the second costs n/2, the third n/4, and so on, summing to less than 2n, thus the overall complexity is O(n). (A rigorous proof can be found in algorithm textbooks.)
After understanding the algorithm, the candidate quickly wrote the Java implementation.
TopN.java
Main.java
The program produced the expected result, and the interviewer reviewed the solution positively.
During a post‑interview discussion, the candidate reflected that while the divide‑and‑conquer approach is theoretically optimal, in practice a heap‑based method can be faster for large datasets.
Java Backend Technology
Focus on Java-related technologies: SSM, Spring ecosystem, microservices, MySQL, MyCat, clustering, distributed systems, middleware, Linux, networking, multithreading. Occasionally cover DevOps tools like Jenkins, Nexus, Docker, and ELK. Also share technical insights from time to time, committed to Java full-stack development!
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
