295. Find Median from Data Stream - Detailed Explanation
Problem Statement
You need to design a data structure that supports the following two operations for a data stream of integers:
- addNum(int num): Add a number from the data stream to the data structure.
- findMedian(): Return the median of all elements so far.
The median is the middle value in an ordered integer list. If the size of the list is even, the median is the average of the two middle numbers.
Constraints:
- There will be at least one element in the data stream when
findMedian()
is called. - The number of elements can be very large, so the operations should be optimized for efficient time complexity.
Examples
Example 1
Input:
["MedianFinder","addNum","addNum","findMedian","addNum","findMedian"]
[[],[1],[2],[],[3],[]]
Output:
[null,null,null,1.5,null,2.0]
Explanation:
- After adding 1 and 2, the data stream is [1,2]. The median is (1 + 2) / 2 = 1.5.
- After adding 3, the data stream becomes [1,2,3]. The median is 2.
Hints
-
Two Heaps:
- Use a max-heap to store the lower half of the numbers and a min-heap for the upper half.
- The max-heap allows you to quickly retrieve the largest element of the lower half, while the min-heap gives you the smallest element of the upper half.
-
Balancing Heaps:
- Ensure that the heaps are balanced such that their sizes differ by at most one element.
- If the total number of elements is odd, one of the heaps will have one extra element. The median is then the root of that heap.
- If the total is even, the median is the average of the roots of both heaps.
-
Alternative Approaches:
- Using an ordered data structure (e.g., balanced BST or sorted list) is another option, but insertion would be O(n) in a sorted list or O(log n) in a balanced BST, which might be less efficient for frequent insertions compared to the heap method.
Approach 1: Two Heaps (Optimal)
Idea
- Max-Heap: Stores the lower half of the data. In Python, use a min-heap with negated values.
- Min-Heap: Stores the upper half of the data.
- Balancing:
- When adding a number, first add it to one heap and then rebalance the heaps to ensure that the size difference is at most 1.
- Finding the Median:
- If the total number of elements is odd, the heap with one extra element provides the median.
- If even, the median is the average of the roots of both heaps.
Step-by-Step Walkthrough with Table
Suppose we add the following numbers in order: [1, 2, 3]
.
Step | Operation | Max-Heap (lower half) | Min-Heap (upper half) | Median Calculation |
---|---|---|---|---|
1 | addNum(1) | [1] | [] | Only one element → median = 1 |
2 | addNum(2) | [1] | [2] | Even count → median = (1 + 2) / 2 = 1.5 |
3 | addNum(3) | [1, 2] (max-heap) | [3] | Odd count → median = max(max-heap) = 2 (or equivalently, min(min-heap) if heaps are balanced appropriately) |
Note: In implementation, the max-heap is often implemented by storing negative values so that the “largest” value can be retrieved in O(1) time using a min-heap library.
Complexity Analysis
- Time Complexity:
- Insertion: O(log n) per addNum operation.
- Finding the median: O(1).
- Space Complexity: O(n), where n is the number of elements in the stream.
Python Implementation
Java Implementation
Approach 2: Sorted List / Balanced BST (Alternative)
Idea
Another approach is to maintain a sorted list (or use a balanced BST) that supports insertion in O(log n) and order statistics to find the median in O(1) or O(log n) time. However, in Python, maintaining a sorted list typically requires O(n) insertion time (unless using specialized libraries), which is less efficient than the two-heaps method. In Java, one might use a TreeSet
with additional data structures to maintain order, but it is more complex.
Complexity
- Time Complexity:
- Using a sorted list: O(n) per insertion, O(1) for finding the median.
- Using a balanced BST: O(log n) per insertion and O(log n) for finding the median.
- Space Complexity: O(n).
This approach is not covered in full code here since the two-heaps method is typically preferred.
Common Mistakes & Edge Cases
-
Imbalanced Heaps:
Failing to rebalance the two heaps can lead to incorrect median computation. Always ensure that the size difference between heaps is at most one. -
Empty Data Structure:
Ensure thatfindMedian()
is not called when no numbers have been added (the problem guarantees that at least one number will be present). -
Integer Division Pitfall (Java):
When computing the average of two integers, be careful to use double division to avoid integer division errors. -
Negative Numbers:
The algorithm works for negative numbers as well. Ensure that the max-heap and min-heap logic handles negative values correctly.
Related Problems
- Sliding Window Median:
Given an array and a window size, compute the median for each sliding window. - Median Maintenance:
Online algorithms for maintaining the median of a streaming dataset. - Order Statistics:
Problems that require finding the kth smallest or largest element in a data stream.
Summary
The two-heaps approach is the most optimal solution for the "Find Median from Data Stream" problem, achieving O(log n) per insertion and O(1) per median retrieval. We also briefly mentioned an alternative approach using sorted lists or balanced BSTs, but the two-heaps method is generally preferred for its efficiency and simplicity.
GET YOUR FREE
Coding Questions Catalog
