642. Design Search Autocomplete System - Detailed Explanation
Problem Statement
Description:
You need to design an autocomplete system that suggests at most three historical sentences based on the current prefix typed by the user. The system is initialized with a list of historical sentences along with the corresponding number of times they were typed (their frequencies). Each time the user inputs a character (except the special character '#'
), the system returns the top three historical sentences that have the current prefix. The ranking criteria are as follows:
- Higher frequency comes first.
- If two sentences have the same frequency, the sentence with the lexicographically smaller order comes first.
When the user types the character '#'
, it means the current sentence input is complete. In that case, the system should add the sentence to its historical records (or update its frequency if it already exists) and reset the input state for the next query.
Examples:
- Example 1:
-
Initialization:
- Sentences:
["i love you", "island", "iroman", "i love leetcode"]
- Times:
[5, 3, 2, 2]
- Sentences:
-
Input Sequence and Outputs:
-
Input:
'i'
Output:["i love you", "island", "i love leetcode"]
Explanation: All sentences starting with"i"
are considered. Among them,"i love you"
has the highest frequency (5), followed by"island"
(3), then"i love leetcode"
(2). (Note:"iroman"
also has frequency 2 but lexicographically,"i love leetcode"
comes before"iroman"
.) -
Input:
' '
(space)
Output:["i love you", "i love leetcode"]
Explanation: Now the prefix is"i "
. Only sentences that start with"i "
are valid. -
Input:
'a'
Output:[]
Explanation: No historical sentence begins with"i a"
. -
Input:
'#'
Output:[]
Explanation: The user has finished typing"i a"
. This sentence is added to the historical data with frequency 1.
-
-
Constraints:
-
The number of initial historical sentences and the number of input characters may be large.
-
The sentences consist of lowercase letters, spaces, and are non-empty.
-
The length of each sentence is limited (for example, up to 100 characters).
-
The system should handle repeated calls to the
input
function efficiently.
Hints
-
Hint 1:
Consider using a Trie (prefix tree) to store all historical sentences. Each node in the Trie can keep track of the sentences (or references to them) that pass through that node. This way, when a user types a prefix, you can traverse the Trie quickly to get a list of candidate sentences. -
Hint 2:
When returning suggestions, you must sort the candidate sentences by their frequency (in descending order) and lexicographical order (if frequencies are equal). Pre-storing or caching sorted lists in each Trie node can help speed up retrieval. -
Hint 3:
Think about how to update the Trie efficiently when a new sentence is added (or an existing sentence’s frequency is updated) after processing a complete query (when'#'
is input).
3. Approaches to Solve the Problem
1. Brute Force Approach
Idea:
For each input character (forming the current prefix), iterate over all historical sentences and check if they match the prefix. Then sort the matching sentences by frequency and lexicographical order, and return the top three.
Downside:
- This approach may become inefficient if the number of sentences is large or if the system is queried very frequently.
2. Trie-Based Approach (Optimal)
Idea:
-
Trie Construction:
Build a Trie where each node represents a character. At each node, store a list of sentences (or pointers/indices to them) that have the prefix corresponding to the path from the root to that node. -
Querying:
When a user inputs a character, traverse the Trie to the node corresponding to the current prefix. Retrieve the candidate sentences and sort them (or maintain a pre-sorted list) based on frequency and lexicographical order. Return the top three. -
Updating:
When the user types'#'
, add the newly completed sentence to the historical records (or update its frequency). Then update the Trie accordingly by traversing through each character of the sentence and updating the list at each node.
Benefits:
- The Trie-based approach allows for efficient prefix searches.
- With proper maintenance of the candidate list at each node, you can quickly retrieve suggestions without scanning all sentences every time.
Code Implementations
Python Code Implementation
Java Code Implementation
Complexity Analysis
- Time Complexity:
- Trie Construction: Inserting each sentence takes O(L) where L is the sentence length; overall construction is O(N * L) for N sentences.
- Querying: For each input character, traversing down the Trie takes O(1) per character. Retrieving and sorting candidate sentences depends on the number of sentences with the current prefix, but since we only need the top 3 suggestions, using a heap helps keep this efficient.
- Space Complexity:
- The Trie uses space proportional to the total number of characters in all sentences. In the worst case, this is O(N * L).
- Additional space is used for frequency maps and the candidate lists.
Common Pitfalls and Edge Cases
Common Pitfalls:
- Updating the Trie:
When a new sentence is added (after'#'
input), ensure that every node corresponding to the sentence’s characters is updated with the new frequency. - Handling Spaces:
The sentences may contain spaces. Ensure the Trie correctly distinguishes between space and other characters. - Sorting Tie-breakers:
When frequencies are equal, remember to sort lexicographically in ascending order.
Edge Cases:
-
Empty Input:
If the current input string is empty (for example, when starting or after a reset), handle it appropriately. -
No Matching Sentences:
If no sentence matches the current prefix, the system should return an empty list.
-
Repeated Sentences:
When a sentence is added repeatedly, its frequency should update correctly and affect the order of suggestions.
Alternative Variations and Related Problems
Alternative Variations:
- Using Hash Maps Only:
For smaller datasets, a simpler approach might scan all historical sentences for each query, though this is less efficient. - Dynamic Updates:
In systems with frequent updates, consider data structures that support fast insertions and deletions.
Related Problems for Further Practice:
-
Design Search Autocomplete System II:
Variations that include more complex ranking criteria or dynamic updates. -
Design Twitter:
Which requires designing a system with dynamic feed generation and updates. -
Trie-based Word Search:
Problems that use a Trie for searching words based on prefixes.
GET YOUR FREE
Coding Questions Catalog
