1825. Finding MK Average - Detailed Explanation
Problem Statement
You must implement a data structure that supports the following two operations:
-
addElement(int num):
Add an integer num to the stream. -
calculateMKAverage():
If the number of elements in the stream is less than m, return -1. Otherwise, consider the last m elements, remove the smallest k and the largest k elements, and return the floor of the average of the remaining numbers.
Example:
Let m = 8 and k = 2. Suppose the stream (in order) is:
[3, 1, 12, 5, 3, 4, 7, 8, 6, 2]
- When the first 8 elements have been processed (i.e. window =
[3, 1, 12, 5, 3, 4, 7, 8]
), we remove the 2 smallest (1 and 3) and the 2 largest (12 and 8). The remaining numbers are[5, 3, 4, 7]
whose sum is 19 and average is floor(19 / 4) = 4. - As new elements come in, the window “slides” and the data structure must update accordingly.
Constraints:
- 1 ≤ m ≤ 10⁵ (or similar large bounds)
- 1 ≤ k such that 2 * k < m
- There will be many calls to addElement and calculateMKAverage, so both operations must be efficient.
Intuition and Key Challenges
The main challenge is to support efficient insertion and deletion (from the sliding window) as well as the ability to quickly compute the sum of the “middle” group—that is, the set of elements after removing the smallest k and largest k.
A brute-force solution that sorts the window for each calculation would be too slow. Instead, we can maintain the window in three parts (groups):
- Lower Group: The k smallest elements.
- Middle Group: The next m - 2k* elements.
- Upper Group: The k largest elements.
When the window is full (contains exactly m elements), the MK Average is simply:
MK Average = floor(sum(middle group) / (m - 2*k))
The difficulty is to efficiently update these three groups as new elements are added and the oldest element is removed.
Data Structures
To achieve the above efficiently, we maintain:
-
A Queue (or Deque):
To store the last m elements in order (so that we know which element to remove when the window exceeds size m). -
Three Balanced BSTs / Multisets:
(Or similar ordered containers that support logarithmic insertion, deletion, and order-statistics operations.)- lower: Stores the smallest k elements in the current window.
- middle: Stores the next m - 2k* elements.
- upper: Stores the largest k elements.
For each of these groups, we may need to support duplicate values; hence, a multiset (or a tree with counts) is appropriate.
-
A Running Sum:
For the middle group to quickly compute its total.
Many solutions use libraries or custom implementations (like a SortedList in Python or TreeMap/TreeSet in Java with counts) to simulate multisets with order.
Operations and Rebalancing
Adding an Element (addElement)
-
Append to the Window:
Add the new element to the queue. -
Insert the New Element into One of the Three Groups:
- Decision:
Compare the new element with the largest element in the lower group and the smallest element in the upper group.- If it is less than or equal to the maximum of lower, it belongs in the lower group.
- Else if it is greater than or equal to the minimum of upper, it goes into the upper group.
- Otherwise, it goes into the middle group (and add its value to the running sum).
- Decision:
-
Rebalance the Groups (if necessary):
After insertion, the sizes of the groups might violate the invariant sizes (i.e. lower and upper must each have exactly k elements, and middle must have m - 2k* elements once the window is full).-
If the lower group has more than k elements:
Remove the largest element from lower and insert it into the middle group (update sumMiddle accordingly). -
If the lower group has fewer than k elements:
Remove the smallest element from middle and insert it into lower. -
Similarly for the upper group:
If the upper group has too many elements, move the smallest element from upper into middle; if too few, move the largest element from middle into upper.
-
Removing an Element (when window size > m)
-
Identify the Element to Remove:
This is the oldest element in the queue. Remove it from the queue. -
Remove the Element from Its Respective Group:
Check whether it is in lower, middle, or upper and remove it.- If it was in the middle group, subtract its value from the running sum.
-
Rebalance the Groups:
After removal, one or more groups may have fewer elements than required. For example, if the lower group is too small, move the smallest element from middle into lower; similarly, adjust the upper group.
Calculating MK Average (calculateMKAverage)
- Check Window Size:
If fewer than m elements have been added, return -1. - Return the Average:
Otherwise, return floor(sum(middle) / (m - 2*k)).
Code Implementations
Python
Java Implementation
Explanation Recap
-
Data Structures Used:
- A queue (or LinkedList) to maintain the last m elements.
- Three ordered containers (sorted lists in Python and TreeMaps in Java) to partition the window into the lower, middle, and upper groups.
- A running sum for the middle group to allow O(1) average calculation.
-
Operations:
- addElement:
Insert the new element into the proper group, then if the window exceeds m, remove the oldest element and rebalance the groups so that:lower
has exactly k smallest elements,upper
has exactly k largest elements,middle
has m - 2k* elements (with a maintained sum).
- calculateMKAverage:
If fewer than m elements exist, return -1; otherwise, return floor(sum(middle) / (m - 2*k)).
- addElement:
-
Rebalancing:
After every insertion (and possible deletion), the groups are rebalanced so that the invariants hold.
Both implementations aim to achieve efficient updates per operation, with overall time complexity per update roughly O(log m) (using balanced trees or binary search in sorted lists) and constant-time MK Average computation.
Happy coding and best of luck with your interview preparations!
Complexity Analysis
-
Time Complexity:
- Each call to addElement involves insertion into one of the multisets and possibly rebalancing. Each insertion, deletion, or rebalancing step typically takes O(log m) time.
- Thus, addElement runs in O(log m) on average.
- calculateMKAverage runs in O(1) time.
-
Space Complexity:
- The space is dominated by the storage of m elements plus the extra data structures, so O(m).
Step-by-Step Example
Suppose m = 8 and k = 2, and we add the following stream of elements:
[a, b, c, d, e, f, g, h, i, j]
(actual numeric values, for instance, might be given)
-
Step 1:
Until 8 elements have been added, simply insert each element into the data structure (initially, all new elements might be added into the middle group; then after 8 elements, perform a full rebalance to partition into lower, middle, and upper groups). -
Step 2:
After exactly 8 elements are in the window, ensure:- The lower group contains the 2 smallest,
- The upper group contains the 2 largest,
- The middle group contains the remaining 4 elements, and maintain the sum of the middle group.
-
Step 3:
When the 9th element is added, remove the oldest element (the first one) from whichever group it belongs to. Then insert the new element into the correct group and rebalance:- For instance, if the removed element was in the lower group, the lower group size becomes 1. Then, move the smallest element from middle into lower.
- Update the running sum of the middle group accordingly.
-
Step 4:
Every call to calculateMKAverage then returns the sum of the middle group divided by (m - 2*k).
Common Pitfalls
-
Incorrect Group Size Maintenance:
Failing to rebalance correctly can lead to an incorrect partitioning of the window. -
Handling Duplicates:
Since the stream may have duplicate values, the multiset implementation must correctly handle counts. -
Boundary Conditions:
Make sure to handle the case when the number of elements is less than m (return -1). -
Performance:
A naive approach that re-sorts the window for each calculation would be too slow. Efficient balancing is key.
Alternative Approaches
-
Balanced BST / Ordered Multiset with Augmented Data:
Some solutions use a single balanced tree that stores counts and subtree sums. With careful augmentation, you can compute the sum of any range in O(log m) time, but the implementation is more complex. -
Segment Trees or Fenwick Trees:
If the range of input numbers is bounded, a Fenwick tree (Binary Indexed Tree) or segment tree might be used to maintain frequency counts and prefix sums.
GET YOUR FREE
Coding Questions Catalog
