Why Median Matters: Solving the Data‑Stream Median Problem with Two Heaps
This article explains why the median is a crucial statistic for interpreting massive job‑application data, defines the median concept, introduces the classic LeetCode "Data Stream Median" problem, and shows how to maintain a dynamic median efficiently using a max‑heap and a min‑heap.
Why talk about the median?
Recent campus‑recruitment reports show that nearly 40% of graduates submit more than 50 applications, and some companies receive hundreds of thousands of resumes for a few thousand positions. In such a competitive environment, candidates are not competing for a single job but against thousands of other applicants for the same role, making the median a useful indicator of the “real” mainstream.
For example, a company received 1.19 million resumes for 1,730 positions, averaging 691 applicants per position – a figure that hides the extreme variance in individual application counts.
What is the median?
If the data count is odd, the median is the middle value after sorting.
If the data count is even, the median is the average of the two middle values.
In a sample of six students who applied to 80, 62, 57, 12, 5, and 3 positions, the sorted list is [3, 5, 12, 57, 62, 80]; the median is (12 + 57)/2 = 34.5, which better reflects the typical applicant than the arithmetic mean of 36.5.
Data‑stream median problem
The LeetCode problem "Offer 41. Data Stream Median" asks you to design a data structure that supports adding numbers one by one and retrieving the median at any time, mimicking the continuous flow of job applications.
Conventional approach is too slow
Storing all numbers and sorting each time gives O(1) insertion but O(n log n) median retrieval, which cannot handle thousands of updates per second.
Key idea: two heaps
Maintain a max‑heap ( maxHeap ) for the smaller half of the numbers and a min‑heap ( minHeap ) for the larger half. The heaps are kept balanced so that their sizes differ by at most one.
Algorithm details
If the heaps have different sizes, the larger heap’s top element is the median.
If the heaps are equal in size, the median is the average of the two tops.
When adding a new number, insert it into the appropriate heap and then rebalance by moving the top element of one heap to the other if necessary.
Implementation (Java)
class MedianFinder {
// maxHeap stores the smaller half (as a max‑heap)
PriorityQueue<Integer> maxHeap;
// minHeap stores the larger half (as a min‑heap)
PriorityQueue<Integer> minHeap;
public MedianFinder() {
maxHeap = new PriorityQueue<>((x, y) -> y - x); // max‑heap
minHeap = new PriorityQueue<>(); // min‑heap
}
public void addNum(int num) {
if (maxHeap.size() != minHeap.size()) {
// Insert into minHeap first, then move its smallest to maxHeap
minHeap.add(num);
maxHeap.add(minHeap.poll());
} else {
// Insert into maxHeap first, then move its largest to minHeap
maxHeap.add(num);
minHeap.add(maxHeap.poll());
}
}
public double findMedian() {
if (maxHeap.size() != minHeap.size()) {
return minHeap.peek(); // odd total, median is top of minHeap
} else {
return (maxHeap.peek() + minHeap.peek()) / 2.0; // even total
}
}
}The insertion time complexity is O(log n) and median retrieval is O(1), making this approach ideal for high‑throughput interview scenarios.
In the job‑search analogy, the median separates the bulk of applicants from the outliers; most candidates cluster around this middle line, and crossing it often determines whether one “gets on the boat.”
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
IT Services Circle
Delivering cutting-edge internet insights and practical learning resources. We're a passionate and principled IT media platform.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
