432. All O`one Data Structure - Detailed Explanation
Problem Statement
You are asked to design a data structure that supports the following operations in O(1) time complexity:
- `inc(key)`: Increments the count of the string `key` by 1. If the key does not exist, insert it with count = 1.
- `dec(key)`: Decrements the count of the string `key` by 1. If the key's count becomes 0, remove it from the data structure.
- `getMaxKey()`: Returns one of the keys with the maximal count. If no element exists, returns an empty string `""`.
- `getMinKey()`: Returns one of the keys with the minimal count. If no element exists, returns an empty string `""`.
Example 1
- Input:

```java
AllOne allOne = new AllOne();
allOne.inc("hello");
allOne.inc("hello");
allOne.getMaxKey(); // returns "hello"
allOne.getMinKey(); // returns "hello"
```
- Explanation:
"hello" is the only key in the data structure and its count is 2.
Example 2
- Input:

```java
AllOne allOne = new AllOne();
allOne.inc("a");
allOne.inc("b");
allOne.inc("b");
allOne.getMaxKey(); // returns "b"
allOne.getMinKey(); // returns "a"
```
- Explanation:
"b" has a count of 2, while "a" has a count of 1.
Example 3
- Input:

```java
AllOne allOne = new AllOne();
allOne.inc("a");
allOne.inc("b");
allOne.inc("b");
allOne.dec("b");
allOne.getMaxKey(); // returns "a" or "b" (both have count 1)
allOne.getMinKey(); // returns "a" or "b"
```
- Explanation:
After decrementing "b", both "a" and "b" have the same count.
Constraints
- All keys are non-empty strings.
- The operations inc, dec, getMaxKey, and getMinKey must all run in O(1) time.
- The data structure should be able to handle a large number of calls efficiently.
Hints Before Diving into the Solution
- Think about supporting O(1) operations: Consider combining hash maps to store the key counts with a doubly linked list (DLL) to quickly access the minimum and maximum counts.
- Bucket-based organization: One effective strategy is to group keys by their frequency. Each group (or bucket) represents keys with the same count. This way, you can maintain pointers to the buckets to retrieve the max and min keys in constant time.
Approach 1: Brute Force (Not Optimal)
Description
- Idea: Use a hash map (dictionary) to store the count for each key. For `inc` and `dec`, update the count in the hash map. For `getMaxKey` and `getMinKey`, iterate through the entire hash map to find the maximum and minimum counts.
Step-by-Step Walkthrough
- `inc(key)`:
  - If `key` is not in the hash map, add it with count 1.
  - Otherwise, increment its count by 1.
- `dec(key)`:
  - If `key` is in the hash map, decrement its count.
  - If the count becomes 0, remove the key.
- `getMaxKey()`:
  - Iterate over all keys to find one with the highest count.
- `getMinKey()`:
  - Iterate over all keys to find one with the lowest count.
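A minimal Python sketch of this baseline (the class name is illustrative):

```python
class AllOneBruteForce:
    """Brute-force baseline: O(1) inc/dec, but O(n) getMaxKey/getMinKey."""

    def __init__(self):
        self.counts = {}  # key -> current count

    def inc(self, key):
        self.counts[key] = self.counts.get(key, 0) + 1

    def dec(self, key):
        if key in self.counts:
            self.counts[key] -= 1
            if self.counts[key] == 0:
                del self.counts[key]  # count hit 0: drop the key entirely

    def getMaxKey(self):
        # Full scan of the map -- the O(n) step that disqualifies this approach.
        return max(self.counts, key=self.counts.get) if self.counts else ""

    def getMinKey(self):
        return min(self.counts, key=self.counts.get) if self.counts else ""
```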
Complexity Analysis
- inc / dec: O(1) per operation (hash map update).
- getMaxKey / getMinKey: O(n) per operation since you need to scan through all keys.
Conclusion
While the brute force approach is simple, it does not meet the O(1) requirement for all operations. This approach is useful for understanding the problem but is not acceptable for an interview solution.
Approach 2: Optimal Solution Using a Doubly Linked List and Hash Maps
Overview
To achieve all operations in O(1) time, we combine:
- A hash map (`key_to_node`): Maps each key to its corresponding bucket node in a doubly linked list.
- A doubly linked list (DLL) of buckets: Each bucket (node) contains a frequency count and a set of keys with that count. The list is kept sorted by frequency in ascending order.
Detailed Explanation
Data Structures
- Node (bucket):
  - `count`: The frequency shared by all keys in this bucket.
  - `keys`: A set (or list) containing all keys that have this frequency.
  - `prev` and `next`: Pointers to the previous and next bucket nodes.
- Hash map (`key_to_node`):
  - Maps a key to its corresponding bucket node.
- Doubly linked list:
  - Maintains the buckets in ascending order of frequency.
  - Uses two dummy nodes: one at the head (smallest count) and one at the tail (largest count) to simplify edge operations.
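A minimal Python sketch of the bucket node described above (field names follow the list):

```python
class Node:
    """One bucket in the DLL: all keys that currently share the same count."""

    def __init__(self, count):
        self.count = count  # frequency shared by every key in this bucket
        self.keys = set()   # keys currently at this frequency
        self.prev = None    # neighboring buckets, kept in ascending count order
        self.next = None
```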
Operation Details
- `inc(key)`:
  - If the key is new: insert it into the bucket for count 1. If no such bucket exists, create one and link it right after the head.
  - If the key exists: retrieve its current bucket, remove the key from that bucket, and add it to the bucket for count + 1. If the bucket for count + 1 doesn't exist, create it and insert it after the current bucket.
  - Remove the original bucket if it becomes empty.
- `dec(key)`:
  - If the key is not present: do nothing.
  - If the key's count is 1: remove it from the hash map and from its bucket.
  - If the key's count is greater than 1: move the key to the bucket for count - 1, creating that bucket just before the current one if necessary.
  - Remove the current bucket if it becomes empty.
- `getMaxKey()`:
  - Return any key from the last bucket (right before the tail dummy), since it contains the keys with the maximum frequency.
- `getMinKey()`:
  - Return any key from the first bucket (right after the head dummy), since it contains the keys with the minimum frequency.
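Every `inc` and `dec` reduces to at most one bucket splice-in, one splice-out, and a couple of hash-map updates, which is where the O(1) bound comes from. A sketch of the two splice operations (the helper names are illustrative; they reappear as private methods in the full implementation below):

```python
def insert_after(node, new_node):
    """Splice new_node into the list right after node -- O(1) pointer surgery."""
    new_node.prev = node
    new_node.next = node.next
    node.next.prev = new_node
    node.next = new_node

def remove(node):
    """Unlink a (now empty) bucket from the list -- O(1)."""
    node.prev.next = node.next
    node.next.prev = node.prev
```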
Visual Example
Imagine the following sequence of operations:
inc("apple")
→ Create bucket for count = 1: [apple]inc("banana")
→ Bucket for count = 1: [apple, banana]inc("apple")
→ Remove "apple" from count = 1 bucket and move to bucket for count = 2.
→ Buckets now:
Head → [banana] (count = 1) → [apple] (count = 2) → Tail
Here, getMinKey()
returns "banana" and getMaxKey()
returns "apple".
Complexity Analysis
- Time Complexity: All operations (`inc`, `dec`, `getMaxKey`, `getMinKey`) run in O(1) time, because the hash map gives direct access to a key's bucket and the doubly linked list keeps the buckets ordered by count.
- Space Complexity: O(n), where n is the number of distinct keys stored.
Python Implementation
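A runnable sketch of the bucket-list design described above. The public method names follow the problem statement; the private helper names are illustrative:

```python
class Node:
    """One bucket: all keys that currently share the same count."""

    def __init__(self, count):
        self.count = count
        self.keys = set()
        self.prev = None
        self.next = None


class AllOne:
    def __init__(self):
        # Sentinel head (smallest counts) and tail (largest counts).
        self.head = Node(0)
        self.tail = Node(0)
        self.head.next = self.tail
        self.tail.prev = self.head
        self.key_to_node = {}  # key -> bucket node that holds it

    def _insert_after(self, node, new_node):
        new_node.prev = node
        new_node.next = node.next
        node.next.prev = new_node
        node.next = new_node

    def _remove(self, node):
        node.prev.next = node.next
        node.next.prev = node.prev

    def inc(self, key):
        if key not in self.key_to_node:
            # New key: it belongs in the count-1 bucket right after the head.
            if self.head.next is self.tail or self.head.next.count != 1:
                self._insert_after(self.head, Node(1))
            self.head.next.keys.add(key)
            self.key_to_node[key] = self.head.next
        else:
            cur = self.key_to_node[key]
            # Ensure a bucket for count + 1 sits directly after the current one.
            if cur.next is self.tail or cur.next.count != cur.count + 1:
                self._insert_after(cur, Node(cur.count + 1))
            cur.next.keys.add(key)
            self.key_to_node[key] = cur.next
            cur.keys.remove(key)
            if not cur.keys:          # don't leave a stale empty bucket behind
                self._remove(cur)

    def dec(self, key):
        if key not in self.key_to_node:
            return                    # decrementing a missing key is a no-op
        cur = self.key_to_node[key]
        if cur.count == 1:
            del self.key_to_node[key]
        else:
            # Ensure a bucket for count - 1 sits directly before the current one.
            if cur.prev is self.head or cur.prev.count != cur.count - 1:
                self._insert_after(cur.prev, Node(cur.count - 1))
            cur.prev.keys.add(key)
            self.key_to_node[key] = cur.prev
        cur.keys.remove(key)
        if not cur.keys:
            self._remove(cur)

    def getMaxKey(self):
        if self.tail.prev is self.head:
            return ""                 # structure is empty
        return next(iter(self.tail.prev.keys))

    def getMinKey(self):
        if self.head.next is self.tail:
            return ""
        return next(iter(self.head.next.keys))
```

Storing a bucket's keys in a set makes both insertion and removal O(1), and `next(iter(...))` returns an arbitrary key from a bucket without scanning it.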
Common Mistakes and Pitfalls
- Incorrect handling of buckets: Not removing a bucket node after all its keys have been moved or removed. This leaves stale nodes in your doubly linked list.
- Failure to update the hash map: Forgetting to update `key_to_node` after moving a key from one bucket to another.
- Edge cases not addressed:
  - Decrementing a key that doesn't exist.
  - Handling the first insertion and deletion correctly when there is only one key.
Edge Cases to Consider
- Empty data structure: When calling `getMaxKey()` or `getMinKey()` on an empty structure, ensure you return an empty string.
- Single-key updates: When there is only one key and it is incremented or decremented, confirm that the pointers in the DLL update correctly.
- Multiple keys with the same count: The data structure should handle multiple keys having the same count without any conflict (see the checks below).
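A few sanity checks covering these cases, assuming the `AllOne` class from the Python implementation above:

```python
allOne = AllOne()
assert allOne.getMaxKey() == "" and allOne.getMinKey() == ""  # empty structure

allOne.inc("x")                       # single key: min and max coincide
assert allOne.getMaxKey() == "x" and allOne.getMinKey() == "x"
allOne.dec("x")                       # structure becomes empty again
assert allOne.getMinKey() == ""

allOne.dec("ghost")                   # decrementing a missing key is a no-op

allOne.inc("a")
allOne.inc("b")                       # two keys tied at count 1
assert allOne.getMaxKey() in ("a", "b")
assert allOne.getMinKey() in ("a", "b")
```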
Alternative Variations and Related Problems
Variations
- Tracking additional statistics: Modify the data structure to also return the total number of keys or the sum of all counts, as in the sketch below.
- Supporting additional operations: For example, adding a reset operation that resets the count for all keys.
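As a sketch of the first variation, one could wrap the `AllOne` implementation above with two extra counters; the subclass and method names here are hypothetical:

```python
class AllOneWithStats(AllOne):
    """Hypothetical extension: O(1) access to the key count and sum of counts."""

    def __init__(self):
        super().__init__()
        self.total = 0  # sum of counts across all keys

    def inc(self, key):
        super().inc(key)
        self.total += 1

    def dec(self, key):
        if key in self.key_to_node:   # only a real decrement changes the sum
            self.total -= 1
        super().dec(key)

    def numKeys(self):
        return len(self.key_to_node)  # number of distinct keys stored
```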
Related Problems
- LFU Cache (Least Frequently Used Cache): Uses a similar idea of frequency tracking along with a hash map and doubly linked list.
- LRU Cache (Least Recently Used Cache): The focus is on recency instead of frequency, but managing a doubly linked list for O(1) operations is a common strategy.
- Design HashMap / Design a Data Structure for Fast Retrieval: Problems that require managing data for fast access, insertion, and deletion.
Related Problems for Further Practice
- LFU Cache
- LRU Cache
- Design HashMap
- Design Twitter
- Insert Delete GetRandom O(1)