2707. Extra Characters in a String - Detailed Explanation

Free Coding Questions Catalog
Boost your coding skills with our essential coding questions catalog. Take a step towards a better tech career now!

Problem Statement

You are given a string s and an array of strings dictionary. You want to partition s into substrings such that each substring appears in dictionary. However, some characters in s might not be used in forming these dictionary words. Such characters are called extra characters. Your goal is to remove extra characters in order to form a valid segmentation of s where every remaining substring is a word from dictionary. Return the minimum number of extra characters left over after such a partitioning.

In other words, you need to choose a set of substrings (taken in order from s) that are in dictionary and minimize the total number of characters in s that are not covered by any chosen substring.

Example 1

Input:

s = "leetscode"
dictionary = ["leet", "code", "leetcode"]

Output:

1

Explanation:

  • One optimal segmentation is to form "leet" + "s" + "code".
  • The substrings "leet" and "code" are valid dictionary words.
  • The extra character is "s" which is not part of any dictionary word.
  • Thus, the minimum number of extra characters is 1.

Example 2

Input:

s = "sayhelloworld"
dictionary = ["hello", "world"]

Output:

3

Explanation:

  • An optimal segmentation is "say" + "hello" + "world".
  • Here, "hello" and "world" are valid dictionary words.
  • The extra characters are "s", "a", "y" (i.e. the substring "say" is extra since it is not in the dictionary).
  • Thus, the answer is 3.

Constraints

  • 1 ≤ s.length ≤ 50
  • 1 ≤ dictionary.length ≤ 50
  • 1 ≤ dictionary[i].length ≤ 50
  • s and dictionary[i] consist of only lowercase English letters.

Hints

  1. Dynamic Programming (DP) Formulation:
    Think about defining a DP array where dp[i] represents the minimum number of extra characters from index i to the end of the string.

  2. Choice at Each Index:
    At any position i in s, you have two choices:

    • Consider the character at s[i] as an extra character (and then add 1 to dp[i+1]).
    • Try to match a dictionary word starting at index i. For every valid word w where s[i:j+1] == w, you can "cover" that part and set dp[i] = min(dp[i], dp[j+1]).
  3. Dictionary Lookup:
    Use a hash set for dictionary words so that you can quickly check if a substring is valid.

  4. Bottom-Up vs. Top-Down:
    A bottom-up DP solution (iterating from the end of s backwards) works naturally for this problem.

Approaches

Approach 1: Brute Force (Inefficient)

  • Idea:
    Try every possible segmentation of s and count the extra characters for each segmentation.

  • Downside:
    The number of segmentations is exponential in the length of s, making this approach impractical even for moderate-sized inputs.

Approach 2: Dynamic Programming (Optimal)

  • Idea:
    Use a DP array where dp[i] gives the minimum extra characters needed for the substring starting at index i.

  • DP Recurrence:
    For each index i from the end of s to the beginning:

    • Initialize dp[i] = dp[i+1] + 1 (treating s[i] as extra).
    • Then, for every index j from i to the end:
      • If the substring s[i:j+1] is in the dictionary, update dp[i] = min(dp[i], dp[j+1]).
  • Final Answer:
    The answer is dp[0], representing the minimum extra characters for the entire string.

  • Why It Works:
    The DP approach considers both keeping characters as extra and grouping valid dictionary words, ensuring that every possibility is considered in an efficient manner.

Step-by-Step Walkthrough (Example: "leetscode")

Consider s = "leetscode" and dictionary = ["leet", "code", "leetcode"].

  1. Initialization:

    • Let n = s.length = 9.
    • Create a DP array dp of size n+1 where dp[n] = 0 (base case: no extra characters beyond the end).
  2. Iteration from Right to Left:

    • Index 8 (s[8] = 'e'):
      • Assume 'e' is extra: dp[8] = dp[9] + 1 = 1.
      • Check substrings starting at 8:
        Only "e" is possible; it is not in the dictionary.
    • Index 7 (s[7] = 'd'):
      • Default: dp[7] = dp[8] + 1 = 2.
      • Check "de" (s[7:9]): not in dictionary.
    • Index 6 (s[6] = 'o'):
      • Default: dp[6] = dp[7] + 1 = 3.
      • Check "ode" (s[6:9]): not in dictionary.
    • Index 5 (s[5] = 'c'):
      • Default: dp[5] = dp[6] + 1 = 4.
      • Check "code" (s[5:9]): "code" is in dictionary, so dp[5] = min(4, dp[9]) = 0.
    • Index 4 (s[4] = 's'):
      • Default: dp[4] = dp[5] + 1 = 1.
      • Check "scode": not in dictionary.
    • Index 3 (s[3] = 't'):
      • Default: dp[3] = dp[4] + 1 = 2.
      • Check "tscode": not in dictionary.
    • Index 2 (s[2] = 'e'):
      • Default: dp[2] = dp[3] + 1 = 3.
      • Check "etscode": not in dictionary.
    • Index 1 (s[1] = 'e'):
      • Default: dp[1] = dp[2] + 1 = 4.
      • Check "eetscode": not in dictionary.
    • Index 0 (s[0] = 'l'):
      • Default: dp[0] = dp[1] + 1 = 5.
      • Check "leet" (s[0:4]): "leet" is in dictionary, so dp[0] = min(5, dp[4]) = min(5, 1) = 1.
      • Also, check "leetscode" (s[0:9]): not in dictionary.
  3. Result:

    • dp[0] = 1 which is the minimum number of extra characters.

Code Implementation

Python Code

Python3
Python3

. . . .

Java Code

Java
Java

. . . .

Complexity Analysis

  • Time Complexity:

    • For each index i, we consider every substring s[i:j+1] (where j goes from i to n-1).

    • In the worst case, this results in O(n²) substring checks.

    • Each substring lookup in the dictionary (using a hash set) takes O(1) on average.

    • Overall Time Complexity: O(n²).

  • Space Complexity:

    • The DP array uses O(n) space.
    • The dictionary set uses O(m) space where m is the total length of all dictionary words.
    • Overall Space Complexity: O(n + m).

Common Mistakes and Edge Cases

  1. Not Using an Efficient Lookup:

    • Using a list for dictionary lookup instead of a hash set can lead to slower performance.
  2. Incorrect DP Initialization:

    • Ensure that the base case dp[n] = 0 is set correctly; otherwise, the recursion will not terminate properly.
  3. Substring Boundaries:

    • Be careful with the indices when forming substrings. Off-by-one errors are common when iterating over indices.
  4. Edge Cases:

    • When s is already entirely composed of dictionary words, the answer should be 0.
    • When no dictionary word can be matched starting at some index, the algorithm must correctly count those characters as extra.
  • Word Break Problems:
    Similar techniques (using dynamic programming) are used in word break problems where you determine if a string can be segmented into a space-separated sequence of dictionary words.

  • Minimum Unrecognized Characters:
    Variations where you try to minimize the number of unrecognized or extra characters after segmenting a string.

  • Segmenting Strings with Constraints:
    Problems that involve partitioning a string into valid segments under various constraints (e.g., maximum segmentation cost).

  • Word Break
  • Word Break II
  • Longest Valid Parentheses
  • Minimum Edit Distance
  • Concatenated Words
TAGS
leetcode
CONTRIBUTOR
Design Gurus Team
-

GET YOUR FREE

Coding Questions Catalog

Design Gurus Newsletter - Latest from our Blog
Boost your coding skills with our essential coding questions catalog.
Take a step towards a better tech career now!
Explore Answers
Related Courses
Grokking the Coding Interview: Patterns for Coding Questions
Grokking the Coding Interview Patterns in Java, Python, JS, C++, C#, and Go. The most comprehensive course with 476 Lessons.
Grokking Modern AI Fundamentals
Master the fundamentals of AI today to lead the tech revolution of tomorrow.
Grokking Data Structures & Algorithms for Coding Interviews
Unlock Coding Interview Success: Dive Deep into Data Structures and Algorithms.
Image
One-Stop Portal For Tech Interviews.
Copyright © 2025 Design Gurus, LLC. All rights reserved.
;