Fundamentals 4 min read

How to Find the Top 1,000 Numbers in a Billion‑Element Array in Linear Time

In this interview case study, a candidate solves the challenge of extracting the largest 1,000 numbers from a billion‑element dataset using a quick‑select style partition algorithm, achieving O(n) time, and implements the solution in Java with concise code.

Java Backend Technology

Oct 18, 2018

How to Find the Top 1,000 Numbers in a Billion‑Element Array in Linear Time

A recent interview at a major internet company presented the candidate with the problem: “How to find the top 1,000 largest numbers among 1 billion numbers?”

The candidate proposed using a divide‑and‑conquer method similar to quicksort’s partition: randomly pick a pivot t, partition the array into elements greater than t and those smaller than t.

If the larger side contains more than 1,000 elements, repeat the partition on that side; otherwise, continue on the smaller side to collect the remaining numbers.

The partition operation runs in O(n) time. The first partition costs n, the second costs n/2, the third n/4, and so on, summing to less than 2n, thus the overall complexity is O(n). (A rigorous proof can be found in algorithm textbooks.)

After understanding the algorithm, the candidate quickly wrote the Java implementation.

TopN.java

Main.java

The program produced the expected result, and the interviewer reviewed the solution positively.

During a post‑interview discussion, the candidate reflected that while the divide‑and‑conquer approach is theoretically optimal, in practice a heap‑based method can be faster for large datasets.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.

algorithm big data Selection quickselect

Written by

Java Backend Technology

Focus on Java-related technologies: SSM, Spring ecosystem, microservices, MySQL, MyCat, clustering, distributed systems, middleware, Linux, networking, multithreading. Occasionally cover DevOps tools like Jenkins, Nexus, Docker, and ELK. Also share technical insights from time to time, committed to Java full-stack development!

0 followers

Reader feedback

How this landed with the community

Rate this article

Was this worth your time?

Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.