295. Find Median from Data Stream - Detailed Explanation

Free Coding Questions Catalog
Boost your coding skills with our essential coding questions catalog. Take a step towards a better tech career now!

Problem Statement

You need to design a data structure that supports the following two operations for a data stream of integers:

  1. addNum(int num): Add a number from the data stream to the data structure.
  2. findMedian(): Return the median of all elements so far.

The median is the middle value in an ordered integer list. If the size of the list is even, the median is the average of the two middle numbers.

Constraints:

  • There will be at least one element in the data stream when findMedian() is called.
  • The number of elements can be very large, so the operations should be optimized for efficient time complexity.

Examples

Example 1

Input:

["MedianFinder","addNum","addNum","findMedian","addNum","findMedian"]
[[],[1],[2],[],[3],[]]

Output:

[null,null,null,1.5,null,2.0]

Explanation:

  • After adding 1 and 2, the data stream is [1,2]. The median is (1 + 2) / 2 = 1.5.
  • After adding 3, the data stream becomes [1,2,3]. The median is 2.

Hints

  1. Two Heaps:

    • Use a max-heap to store the lower half of the numbers and a min-heap for the upper half.
    • The max-heap allows you to quickly retrieve the largest element of the lower half, while the min-heap gives you the smallest element of the upper half.
  2. Balancing Heaps:

    • Ensure that the heaps are balanced such that their sizes differ by at most one element.
    • If the total number of elements is odd, one of the heaps will have one extra element. The median is then the root of that heap.
    • If the total is even, the median is the average of the roots of both heaps.
  3. Alternative Approaches:

    • Using an ordered data structure (e.g., balanced BST or sorted list) is another option, but insertion would be O(n) in a sorted list or O(log n) in a balanced BST, which might be less efficient for frequent insertions compared to the heap method.

Approach 1: Two Heaps (Optimal)

Idea

  • Max-Heap: Stores the lower half of the data. In Python, use a min-heap with negated values.
  • Min-Heap: Stores the upper half of the data.
  • Balancing:
    • When adding a number, first add it to one heap and then rebalance the heaps to ensure that the size difference is at most 1.
  • Finding the Median:
    • If the total number of elements is odd, the heap with one extra element provides the median.
    • If even, the median is the average of the roots of both heaps.

Step-by-Step Walkthrough with Table

Suppose we add the following numbers in order: [1, 2, 3].

StepOperationMax-Heap (lower half)Min-Heap (upper half)Median Calculation
1addNum(1)[1][]Only one element → median = 1
2addNum(2)[1][2]Even count → median = (1 + 2) / 2 = 1.5
3addNum(3)[1, 2] (max-heap)[3]Odd count → median = max(max-heap) = 2 (or equivalently, min(min-heap) if heaps are balanced appropriately)

Note: In implementation, the max-heap is often implemented by storing negative values so that the “largest” value can be retrieved in O(1) time using a min-heap library.

Complexity Analysis

  • Time Complexity:
    • Insertion: O(log n) per addNum operation.
    • Finding the median: O(1).
  • Space Complexity: O(n), where n is the number of elements in the stream.

Python Implementation

Python3
Python3

. . . .

Java Implementation

Java
Java

. . . .

Approach 2: Sorted List / Balanced BST (Alternative)

Idea

Another approach is to maintain a sorted list (or use a balanced BST) that supports insertion in O(log n) and order statistics to find the median in O(1) or O(log n) time. However, in Python, maintaining a sorted list typically requires O(n) insertion time (unless using specialized libraries), which is less efficient than the two-heaps method. In Java, one might use a TreeSet with additional data structures to maintain order, but it is more complex.

Complexity

  • Time Complexity:
    • Using a sorted list: O(n) per insertion, O(1) for finding the median.
    • Using a balanced BST: O(log n) per insertion and O(log n) for finding the median.
  • Space Complexity: O(n).

This approach is not covered in full code here since the two-heaps method is typically preferred.

Common Mistakes & Edge Cases

  • Imbalanced Heaps:
    Failing to rebalance the two heaps can lead to incorrect median computation. Always ensure that the size difference between heaps is at most one.

  • Empty Data Structure:
    Ensure that findMedian() is not called when no numbers have been added (the problem guarantees that at least one number will be present).

  • Integer Division Pitfall (Java):
    When computing the average of two integers, be careful to use double division to avoid integer division errors.

  • Negative Numbers:
    The algorithm works for negative numbers as well. Ensure that the max-heap and min-heap logic handles negative values correctly.

  • Sliding Window Median:
    Given an array and a window size, compute the median for each sliding window.
  • Median Maintenance:
    Online algorithms for maintaining the median of a streaming dataset.
  • Order Statistics:
    Problems that require finding the kth smallest or largest element in a data stream.

Summary

The two-heaps approach is the most optimal solution for the "Find Median from Data Stream" problem, achieving O(log n) per insertion and O(1) per median retrieval. We also briefly mentioned an alternative approach using sorted lists or balanced BSTs, but the two-heaps method is generally preferred for its efficiency and simplicity.

TAGS
leetcode
CONTRIBUTOR
Design Gurus Team
-

GET YOUR FREE

Coding Questions Catalog

Design Gurus Newsletter - Latest from our Blog
Boost your coding skills with our essential coding questions catalog.
Take a step towards a better tech career now!
Explore Answers
Is an Intel interview difficult?
What is ISO standard for non-functional requirements?
What is Shopify coding?
Related Courses
Image
Grokking the Coding Interview: Patterns for Coding Questions
Grokking the Coding Interview Patterns in Java, Python, JS, C++, C#, and Go. The most comprehensive course with 476 Lessons.
Image
Grokking Modern AI Fundamentals
Master the fundamentals of AI today to lead the tech revolution of tomorrow.
Image
Grokking Data Structures & Algorithms for Coding Interviews
Unlock Coding Interview Success: Dive Deep into Data Structures and Algorithms.
Image
One-Stop Portal For Tech Interviews.
Copyright © 2025 Design Gurus, LLC. All rights reserved.
;