819. Most Common Word - Detailed Explanation
Problem Statement
Given a string paragraph and an array of strings banned, return the most frequent word in the paragraph that is not banned. The answer should be returned in lowercase.
- The paragraph may contain punctuation and mixed case letters.
- Words are defined as a sequence of letters, and punctuation should be ignored.
- It is guaranteed that at least one word that is not banned exists and that the answer is unique.
Example 1
Input:
paragraph = "Bob hit a ball, the hit BALL flew far after it was hit."
banned = ["hit"]
Output:
"ball"
Explanation:
After converting all words to lowercase and removing punctuation, the paragraph becomes:
["bob", "hit", "a", "ball", "the", "hit", "ball", "flew", "far", "after", "it", "was", "hit"]
The word "hit" appears 3 times but is banned. The word "ball" appears 2 times and is the most frequent non-banned word.
Example 2
Input:
paragraph = "a, a, a, a, b,b,b,c, c"
banned = ["a"]
Output:
"b"
Explanation:
After processing, the words are: ["a", "a", "a", "a", "b", "b", "b", "c", "c"]. With "a" banned, "b" is the most frequent word.
Constraints
- 1 ≤ paragraph.length ≤ 1000
- 0 ≤ banned.length ≤ 100
- 1 ≤ banned[i].length ≤ 10
- The answer is unique, and it is guaranteed that at least one word that is not banned exists.
Hints
-
String Processing:
- Convert the entire paragraph to lowercase to handle case-insensitivity.
- Use regular expressions (or other string manipulation techniques) to replace punctuation with spaces so that words are properly separated.
-
Frequency Count:
- Use a hash map (dictionary) to count the frequency of each word in the processed paragraph.
- Skip any word that is in the banned list.
-
Finding the Maximum:
- Iterate over the frequency map to find the word with the highest count among those not banned.
Approach 1: Dictionary (Hash Map) Based Frequency Count
Explanation
-
Normalize the Paragraph:
- Convert to lowercase.
- Remove punctuation (e.g.,
!?',;.
) by replacing them with spaces.
-
Split into Words:
- Split the processed paragraph into a list of words using whitespace as a delimiter.
-
Count Word Frequencies:
- Use a dictionary to count how many times each word appears.
-
Filter Banned Words:
- Iterate over the dictionary and ignore words present in the banned list.
-
Determine the Most Frequent Word:
- Track the word with the highest frequency that is not banned.
Complexity
-
Time Complexity: O(n + m)
-
O(n) for processing the paragraph (where n is the number of characters)
-
O(m) for iterating through the words (where m is the number of words)
-
-
Space Complexity: O(m + k)
- m for storing the words and their counts, and k for the banned words set.
Python Code
Java Code
Step-by-Step Walkthrough
-
Normalize the Input:
- Convert the paragraph to lowercase so that "Bob" and "bob" are treated the same.
- Replace all punctuation with spaces using a regular expression (Python) or
replaceAll
(Java).
-
Split the Paragraph:
- Split the resulting string into words based on whitespace.
-
Create a Banned Set:
- Convert the banned words array into a set to allow O(1) lookups.
-
Count Frequencies:
- Iterate over each word in the list.
- If the word is not in the banned set, update its frequency count in a dictionary (Python) or HashMap (Java).
-
Determine the Most Frequent Non-Banned Word:
- Traverse the frequency map and track the word with the highest frequency.
Common Mistakes and Edge Cases
-
Punctuation Handling:
- Failing to replace punctuation correctly can lead to words being miscounted (e.g., "ball," vs "ball").
-
Case Sensitivity:
- Not converting the paragraph to lowercase may cause duplicates in different cases.
-
Empty or All Banned Words:
- Ensure that the banned list is handled properly. The problem guarantees at least one valid word exists, but your solution should still be robust.
-
Multiple Spaces:
- Extra spaces from punctuation replacement can lead to empty strings when splitting. Ensure your splitting mechanism ignores these.
Alternative Variations
- Custom Separators:
- The problem could be modified to use different sets of delimiters for words.
- Top K Frequent Words:
- A variation might require returning the top K frequent non-banned words instead of just the most frequent one.
Related Problems
-
Top K Frequent Words (LeetCode 692):
- Find the K most frequent words in a paragraph or list of words.
-
- Involves categorizing words based on their character composition.
GET YOUR FREE
Coding Questions Catalog