2707. Extra Characters in a String - Detailed Explanation

Free Coding Questions Catalog

Boost your coding skills with our essential coding questions catalog. Take a step towards a better tech career now!

Problem Statement

You are given a string s and an array of strings dictionary. You want to partition s into substrings such that each substring appears in dictionary. However, some characters in s might not be used in forming these dictionary words. Such characters are called extra characters. Your goal is to remove extra characters in order to form a valid segmentation of s where every remaining substring is a word from dictionary. Return the minimum number of extra characters left over after such a partitioning.

In other words, you need to choose a set of substrings (taken in order from s) that are in dictionary and minimize the total number of characters in s that are not covered by any chosen substring.

Example 1

Input:

s = "leetscode"
dictionary = ["leet", "code", "leetcode"]

Output:

Explanation:

One optimal segmentation is to form "leet" + "s" + "code".
The substrings "leet" and "code" are valid dictionary words.
The extra character is "s" which is not part of any dictionary word.
Thus, the minimum number of extra characters is 1.

Example 2

Input:

s = "sayhelloworld"
dictionary = ["hello", "world"]

Output:

Explanation:

An optimal segmentation is "say" + "hello" + "world".
Here, "hello" and "world" are valid dictionary words.
The extra characters are "s", "a", "y" (i.e. the substring "say" is extra since it is not in the dictionary).
Thus, the answer is 3.

Constraints

1 ≤ s.length ≤ 50
1 ≤ dictionary.length ≤ 50
1 ≤ dictionary[i].length ≤ 50
s and dictionary[i] consist of only lowercase English letters.

Hints

Dynamic Programming (DP) Formulation:
Think about defining a DP array where dp[i] represents the minimum number of extra characters from index i to the end of the string.
Choice at Each Index:
At any position i in s, you have two choices:
- Consider the character at s[i] as an extra character (and then add 1 to dp[i+1]).
- Try to match a dictionary word starting at index i. For every valid word w where s[i:j+1] == w, you can "cover" that part and set dp[i] = min(dp[i], dp[j+1]).
Dictionary Lookup:
Use a hash set for dictionary words so that you can quickly check if a substring is valid.
Bottom-Up vs. Top-Down:
A bottom-up DP solution (iterating from the end of s backwards) works naturally for this problem.

Approaches

Approach 1: Brute Force (Inefficient)

Idea:
Try every possible segmentation of s and count the extra characters for each segmentation.
Downside:
The number of segmentations is exponential in the length of s, making this approach impractical even for moderate-sized inputs.

Approach 2: Dynamic Programming (Optimal)

Idea:
Use a DP array where dp[i] gives the minimum extra characters needed for the substring starting at index i.
DP Recurrence:
For each index i from the end of s to the beginning:
- Initialize dp[i] = dp[i+1] + 1 (treating s[i] as extra).
- Then, for every index j from i to the end:
  - If the substring s[i:j+1] is in the dictionary, update dp[i] = min(dp[i], dp[j+1]).
Final Answer:
The answer is dp[0], representing the minimum extra characters for the entire string.
Why It Works:
The DP approach considers both keeping characters as extra and grouping valid dictionary words, ensuring that every possibility is considered in an efficient manner.

Step-by-Step Walkthrough (Example: "leetscode")

Consider s = "leetscode" and dictionary = ["leet", "code", "leetcode"].

Initialization:
- Let n = s.length = 9.
- Create a DP array dp of size n+1 where dp[n] = 0 (base case: no extra characters beyond the end).
Iteration from Right to Left:
- Index 8 (s[8] = 'e'):
  - Assume 'e' is extra: dp[8] = dp[9] + 1 = 1.
  - Check substrings starting at 8:
    Only "e" is possible; it is not in the dictionary.
- Index 7 (s[7] = 'd'):
  - Default: dp[7] = dp[8] + 1 = 2.
  - Check "de" (s[7:9]): not in dictionary.
- Index 6 (s[6] = 'o'):
  - Default: dp[6] = dp[7] + 1 = 3.
  - Check "ode" (s[6:9]): not in dictionary.
- Index 5 (s[5] = 'c'):
  - Default: dp[5] = dp[6] + 1 = 4.
  - Check "code" (s[5:9]): "code" is in dictionary, so dp[5] = min(4, dp[9]) = 0.
- Index 4 (s[4] = 's'):
  - Default: dp[4] = dp[5] + 1 = 1.
  - Check "scode": not in dictionary.
- Index 3 (s[3] = 't'):
  - Default: dp[3] = dp[4] + 1 = 2.
  - Check "tscode": not in dictionary.
- Index 2 (s[2] = 'e'):
  - Default: dp[2] = dp[3] + 1 = 3.
  - Check "etscode": not in dictionary.
- Index 1 (s[1] = 'e'):
  - Default: dp[1] = dp[2] + 1 = 4.
  - Check "eetscode": not in dictionary.
- Index 0 (s[0] = 'l'):
  - Default: dp[0] = dp[1] + 1 = 5.
  - Check "leet" (s[0:4]): "leet" is in dictionary, so dp[0] = min(5, dp[4]) = min(5, 1) = 1.
  - Also, check "leetscode" (s[0:9]): not in dictionary.
Result:
- dp[0] = 1 which is the minimum number of extra characters.

Code Implementation

Python Code

Python3

. . . .

Java Code

Java