616. Add Bold Tag in String - Detailed Explanation
Problem Statement
Description:
You are given a string s and an array of strings dict (or words). Your goal is to return a new string where every substring of s that exists in dict is wrapped in <b>
and </b>
tags. When two or more of these substrings overlap or are adjacent, they should be merged together and wrapped by only one pair of bold tags.
Example 1:
- Input:
- s =
"abcxyz123"
- dict =
["abc", "123"]
- s =
- Output:
"<b>abc</b>xyz<b>123</b>"
- Explanation:
The substrings"abc"
and"123"
are found in the string and are wrapped with bold tags.
Example 2:
- Input:
- s =
"aaabbcc"
- dict =
["aaa", "aab", "bc"]
- s =
- Output:
"<b>aaabbc</b>c"
- Explanation:
- Both
"aaa"
and"aab"
cover a large part of the beginning and overlap, and"bc"
occurs next. These intervals merge to form a single bold region. The final character remains un-bolded.
- Both
Constraints:
- 1 ≤ s.length ≤ 1000 (or larger, depending on the problem version)
- 1 ≤ dict.length ≤ 100
- 1 ≤ dict[i].length ≤ s.length
- s and dict[i] consist of English letters and/or digits.
Hints
-
Hint 1:
Think about how you can determine which characters in s should be bold. A common strategy is to mark the positions that are part of any word in dict. -
Hint 2:
Once you have marked the bold positions, you will need to merge adjacent or overlapping intervals so that you do not insert nested or redundant<b>
tags. -
Hint 3:
Consider iterating over s with a sliding window for each word in dict. For every occurrence, mark the indices (or create an interval) to later merge them and add tags.
Approach 1: Marking with a Boolean Array
Idea
-
Mark Bold Regions:
Create a boolean array (of length equal to s) that indicates whether each character should be bold. For each word in dict, search for its occurrences in s. For each occurrence starting at index i with length l, mark indices i through i + l - 1 astrue
. -
Merge Bold Regions:
Iterate through the boolean array and whenever you detect a sequence oftrue
values, that segment represents a region to be bolded. Insert<b>
at the start and</b>
at the end of this segment. -
Build the Result:
Combine the non-bold and bold parts to build the final string.
Walkthrough Example
Let’s use s = "aaabbcc"
and dict = ["aaa", "aab", "bc"]
.
-
Step 1:
Initialize a boolean arraybold = [false, false, false, false, false, false, false]
. -
Step 2:
For each word:- For
"aaa"
:- It is found at index 0 (substring
"aaa"
). Mark positions 0, 1, 2 →bold = [true, true, true, false, false, false, false]
.
- It is found at index 0 (substring
- For
"aab"
:- It is found at index 1 (substring
"aab"
). Mark positions 1, 2, 3 →bold = [true, true, true, true, false, false, false]
.
- It is found at index 1 (substring
- For
"bc"
:- It is found at index 4 (substring
"bc"
). Mark positions 4 and 5 →bold = [true, true, true, true, true, true, false]
.
- It is found at index 4 (substring
- For
-
Step 3:
Scan through the boolean array:- From index 0 to 5, all are
true
so these characters are merged into one bold segment. - At index 6, the value is
false
.
- From index 0 to 5, all are
-
Step 4:
Insert<b>
at index 0 and</b>
after index 5. The resulting string becomes:
"<b>aaabbc</b>c"
.
Python Code
Java Code
Approach 2: Merging Intervals
Idea
-
Find All Matching Intervals:
For each word in dict, scan s for its occurrences and record the start and end indices as intervals. -
Merge Overlapping Intervals:
Sort the intervals and merge them if they overlap or are consecutive. This yields a list of non-overlapping intervals that represent the bold regions. -
Insert Bold Tags:
Use the merged intervals to insert<b>
and</b>
in the appropriate positions in s.
Discussion
- Pros:
– This approach cleanly separates finding the positions and then merging them. - Cons:
– It requires extra space for storing intervals and involves sorting, which is O(k log k) (where k is the number of intervals).
Complexity Analysis
-
Time Complexity:
- For Approach 1 (Boolean Array):
- For each word, we search the string and mark positions. In the worst case (if many words are present and they occur often), this takes O(n * m) where n is the length of s and m is the number of words.
- Building the final string is O(n).
- Overall: Approximately O(n * m).
- For Approach 1 (Boolean Array):
-
Space Complexity:
- O(n) for the boolean array and O(n) for building the result.
-
For Approach 2 (Intervals):
- The interval collection and merging also take O(n * m) time in the worst case and O(k) space for storing intervals.
Step-by-Step Walkthrough and Visual Example
Using s = "aaabbcc"
and dict = ["aaa", "aab", "bc"]
with the Boolean array method:
-
Initialize:
- s =
"aaabbcc"
- bold = [false, false, false, false, false, false, false]
- s =
-
Mark Occurrences:
- For
"aaa"
: found at index 0 → mark indices 0, 1, 2.
bold becomes:[true, true, true, false, false, false, false]
- For
"aab"
: found at index 1 → mark indices 1, 2, 3.
bold becomes:[true, true, true, true, false, false, false]
- For
"bc"
: found at index 4 → mark indices 4 and 5.
bold becomes:[true, true, true, true, true, true, false]
- For
-
Build the Result String:
- Iterate over s using the bold array.
- From index 0 to 5, all positions are
true
, so wrap these in<b>
and</b>
. - Append the last character (index 6) normally.
- Final result:
"<b>aaabbc</b>c"
.
Common Mistakes and Edge Cases
Common Mistakes
-
Forgetting to Merge Overlapping Regions:
If you mark positions individually without merging, you might insert multiple nested<b>
tags. -
Incorrect Interval Boundaries:
Be careful with the start and end indices—ensure that you mark all characters that belong to a matching word. -
Handling Adjacent Bold Regions:
Two bold regions that touch must be merged into one continuous bold region.
Edge Cases
-
No Matches:
If none of the words in dict occur in s, return s without any modifications. -
Empty String:
If s is empty, return an empty string. -
Overlapping Matches:
Words in dict might overlap (for example,"ab"
and"bc"
in"abc"
), so the algorithm must correctly merge the bold regions.
Variations
-
Different Tag Wrapping:
Instead of bold tags, you may be required to wrap matching substrings with other tags or symbols. -
Case-Insensitive Matching:
Some variations might require matching words in a case-insensitive manner. -
Multiple Patterns with Priorities:
A more advanced variation might involve different patterns that have different formatting rules.
Related Problems for Further Practice
-
Merge Intervals:
A classic problem that involves merging overlapping intervals. -
Substring Search / Pattern Matching:
Problems that require finding occurrences of a pattern within a text. -
Text Formatting:
Problems that involve modifying a string by inserting markers based on certain conditions.
GET YOUR FREE
Coding Questions Catalog
