642. Design Search Autocomplete System - Detailed Explanation

Free Coding Questions Catalog
Boost your coding skills with our essential coding questions catalog. Take a step towards a better tech career now!

Problem Statement

Description:
You need to design an autocomplete system that suggests at most three historical sentences based on the current prefix typed by the user. The system is initialized with a list of historical sentences along with the corresponding number of times they were typed (their frequencies). Each time the user inputs a character (except the special character '#'), the system returns the top three historical sentences that have the current prefix. The ranking criteria are as follows:

  • Higher frequency comes first.
  • If two sentences have the same frequency, the sentence with the lexicographically smaller order comes first.

When the user types the character '#', it means the current sentence input is complete. In that case, the system should add the sentence to its historical records (or update its frequency if it already exists) and reset the input state for the next query.

Examples:

  • Example 1:
    • Initialization:

      • Sentences: ["i love you", "island", "iroman", "i love leetcode"]
      • Times: [5, 3, 2, 2]
    • Input Sequence and Outputs:

      1. Input: 'i'
        Output: ["i love you", "island", "i love leetcode"]
        Explanation: All sentences starting with "i" are considered. Among them, "i love you" has the highest frequency (5), followed by "island" (3), then "i love leetcode" (2). (Note: "iroman" also has frequency 2 but lexicographically, "i love leetcode" comes before "iroman".)

      2. Input: ' ' (space)
        Output: ["i love you", "i love leetcode"]
        Explanation: Now the prefix is "i ". Only sentences that start with "i " are valid.

      3. Input: 'a'
        Output: []
        Explanation: No historical sentence begins with "i a".

      4. Input: '#'
        Output: []
        Explanation: The user has finished typing "i a". This sentence is added to the historical data with frequency 1.

Constraints:

  • The number of initial historical sentences and the number of input characters may be large.

  • The sentences consist of lowercase letters, spaces, and are non-empty.

  • The length of each sentence is limited (for example, up to 100 characters).

  • The system should handle repeated calls to the input function efficiently.

Hints

  • Hint 1:
    Consider using a Trie (prefix tree) to store all historical sentences. Each node in the Trie can keep track of the sentences (or references to them) that pass through that node. This way, when a user types a prefix, you can traverse the Trie quickly to get a list of candidate sentences.

  • Hint 2:
    When returning suggestions, you must sort the candidate sentences by their frequency (in descending order) and lexicographical order (if frequencies are equal). Pre-storing or caching sorted lists in each Trie node can help speed up retrieval.

  • Hint 3:
    Think about how to update the Trie efficiently when a new sentence is added (or an existing sentence’s frequency is updated) after processing a complete query (when '#' is input).

3. Approaches to Solve the Problem

1. Brute Force Approach

Idea:
For each input character (forming the current prefix), iterate over all historical sentences and check if they match the prefix. Then sort the matching sentences by frequency and lexicographical order, and return the top three.

Downside:

  • This approach may become inefficient if the number of sentences is large or if the system is queried very frequently.

2. Trie-Based Approach (Optimal)

Idea:

  • Trie Construction:
    Build a Trie where each node represents a character. At each node, store a list of sentences (or pointers/indices to them) that have the prefix corresponding to the path from the root to that node.

  • Querying:
    When a user inputs a character, traverse the Trie to the node corresponding to the current prefix. Retrieve the candidate sentences and sort them (or maintain a pre-sorted list) based on frequency and lexicographical order. Return the top three.

  • Updating:
    When the user types '#', add the newly completed sentence to the historical records (or update its frequency). Then update the Trie accordingly by traversing through each character of the sentence and updating the list at each node.

Benefits:

  • The Trie-based approach allows for efficient prefix searches.
  • With proper maintenance of the candidate list at each node, you can quickly retrieve suggestions without scanning all sentences every time.

Code Implementations

Python Code Implementation

Python3
Python3

. . . .

Java Code Implementation

Java
Java

. . . .

Complexity Analysis

  • Time Complexity:
    • Trie Construction: Inserting each sentence takes O(L) where L is the sentence length; overall construction is O(N * L) for N sentences.
    • Querying: For each input character, traversing down the Trie takes O(1) per character. Retrieving and sorting candidate sentences depends on the number of sentences with the current prefix, but since we only need the top 3 suggestions, using a heap helps keep this efficient.
  • Space Complexity:
    • The Trie uses space proportional to the total number of characters in all sentences. In the worst case, this is O(N * L).
    • Additional space is used for frequency maps and the candidate lists.

Common Pitfalls and Edge Cases

Common Pitfalls:

  • Updating the Trie:
    When a new sentence is added (after '#' input), ensure that every node corresponding to the sentence’s characters is updated with the new frequency.
  • Handling Spaces:
    The sentences may contain spaces. Ensure the Trie correctly distinguishes between space and other characters.
  • Sorting Tie-breakers:
    When frequencies are equal, remember to sort lexicographically in ascending order.

Edge Cases:

  • Empty Input:
    If the current input string is empty (for example, when starting or after a reset), handle it appropriately.

  • No Matching Sentences:

    If no sentence matches the current prefix, the system should return an empty list.

  • Repeated Sentences:
    When a sentence is added repeatedly, its frequency should update correctly and affect the order of suggestions.

Alternative Variations:

  • Using Hash Maps Only:
    For smaller datasets, a simpler approach might scan all historical sentences for each query, though this is less efficient.
  • Dynamic Updates:
    In systems with frequent updates, consider data structures that support fast insertions and deletions.
  • Design Search Autocomplete System II:
    Variations that include more complex ranking criteria or dynamic updates.

  • Design Twitter:
    Which requires designing a system with dynamic feed generation and updates.

  • Trie-based Word Search:
    Problems that use a Trie for searching words based on prefixes.

TAGS
leetcode
CONTRIBUTOR
Design Gurus Team
-

GET YOUR FREE

Coding Questions Catalog

Design Gurus Newsletter - Latest from our Blog
Boost your coding skills with our essential coding questions catalog.
Take a step towards a better tech career now!
Explore Answers
What is the hardest thing to learn in coding?
Are Apple interviews difficult?
How to check the pandas version in Python?
Related Courses
Image
Grokking the Coding Interview: Patterns for Coding Questions
Grokking the Coding Interview Patterns in Java, Python, JS, C++, C#, and Go. The most comprehensive course with 476 Lessons.
Image
Grokking Modern AI Fundamentals
Master the fundamentals of AI today to lead the tech revolution of tomorrow.
Image
Grokking Data Structures & Algorithms for Coding Interviews
Unlock Coding Interview Success: Dive Deep into Data Structures and Algorithms.
Image
One-Stop Portal For Tech Interviews.
Copyright © 2025 Design Gurus, LLC. All rights reserved.
;