Fundamentals 12 min read

Why Median Matters: Solving the Data‑Stream Median Problem with Two Heaps

This article explains why the median is a crucial statistic for interpreting massive job‑application data, defines the median concept, introduces the classic LeetCode "Data Stream Median" problem, and shows how to maintain a dynamic median efficiently using a max‑heap and a min‑heap.

IT Services Circle
IT Services Circle
IT Services Circle
Why Median Matters: Solving the Data‑Stream Median Problem with Two Heaps

Why talk about the median?

Recent campus‑recruitment reports show that nearly 40% of graduates submit more than 50 applications, and some companies receive hundreds of thousands of resumes for a few thousand positions. In such a competitive environment, candidates are not competing for a single job but against thousands of other applicants for the same role, making the median a useful indicator of the “real” mainstream.

For example, a company received 1.19 million resumes for 1,730 positions, averaging 691 applicants per position – a figure that hides the extreme variance in individual application counts.

What is the median?

If the data count is odd, the median is the middle value after sorting.

If the data count is even, the median is the average of the two middle values.

In a sample of six students who applied to 80, 62, 57, 12, 5, and 3 positions, the sorted list is [3, 5, 12, 57, 62, 80]; the median is (12 + 57)/2 = 34.5, which better reflects the typical applicant than the arithmetic mean of 36.5.

Data‑stream median problem

The LeetCode problem "Offer 41. Data Stream Median" asks you to design a data structure that supports adding numbers one by one and retrieving the median at any time, mimicking the continuous flow of job applications.

Conventional approach is too slow

Storing all numbers and sorting each time gives O(1) insertion but O(n log n) median retrieval, which cannot handle thousands of updates per second.

Key idea: two heaps

Maintain a max‑heap ( maxHeap ) for the smaller half of the numbers and a min‑heap ( minHeap ) for the larger half. The heaps are kept balanced so that their sizes differ by at most one.

Algorithm details

If the heaps have different sizes, the larger heap’s top element is the median.

If the heaps are equal in size, the median is the average of the two tops.

When adding a new number, insert it into the appropriate heap and then rebalance by moving the top element of one heap to the other if necessary.

Implementation (Java)

class MedianFinder {
    // maxHeap stores the smaller half (as a max‑heap)
    PriorityQueue<Integer> maxHeap;
    // minHeap stores the larger half (as a min‑heap)
    PriorityQueue<Integer> minHeap;

    public MedianFinder() {
        maxHeap = new PriorityQueue<>((x, y) -> y - x); // max‑heap
        minHeap = new PriorityQueue<>(); // min‑heap
    }

    public void addNum(int num) {
        if (maxHeap.size() != minHeap.size()) {
            // Insert into minHeap first, then move its smallest to maxHeap
            minHeap.add(num);
            maxHeap.add(minHeap.poll());
        } else {
            // Insert into maxHeap first, then move its largest to minHeap
            maxHeap.add(num);
            minHeap.add(maxHeap.poll());
        }
    }

    public double findMedian() {
        if (maxHeap.size() != minHeap.size()) {
            return minHeap.peek(); // odd total, median is top of minHeap
        } else {
            return (maxHeap.peek() + minHeap.peek()) / 2.0; // even total
        }
    }
}

The insertion time complexity is O(log n) and median retrieval is O(1), making this approach ideal for high‑throughput interview scenarios.

In the job‑search analogy, the median separates the bulk of applicants from the outliers; most candidates cluster around this middle line, and crossing it often determines whether one “gets on the boat.”

Chart
Chart
Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Sign in to view source
Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactadmin@besthub.devand we will review it promptly.

algorithmstatisticsLeetCodeHeapdata streammedian
IT Services Circle
Written by

IT Services Circle

Delivering cutting-edge internet insights and practical learning resources. We're a passionate and principled IT media platform.

0 followers
Reader feedback

How this landed with the community

Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.