2707. Extra Characters in a String - Detailed Explanation

Free Coding Questions Catalog
Boost your coding skills with our essential coding questions catalog. Take a step towards a better tech career now!

Problem Statement

You are given a string s and an array of strings dictionary. You want to partition s into substrings such that each substring appears in dictionary. However, some characters in s might not be used in forming these dictionary words. Such characters are called extra characters. Your goal is to remove extra characters in order to form a valid segmentation of s where every remaining substring is a word from dictionary. Return the minimum number of extra characters left over after such a partitioning.

In other words, you need to choose a set of substrings (taken in order from s) that are in dictionary and minimize the total number of characters in s that are not covered by any chosen substring.

Example 1

Input:

s = "leetscode"
dictionary = ["leet", "code", "leetcode"]

Output:

1

Explanation:

  • One optimal segmentation is to form "leet" + "s" + "code".
  • The substrings "leet" and "code" are valid dictionary words.
  • The extra character is "s" which is not part of any dictionary word.
  • Thus, the minimum number of extra characters is 1.

Example 2

Input:

s = "sayhelloworld"
dictionary = ["hello", "world"]

Output:

3

Explanation:

  • An optimal segmentation is "say" + "hello" + "world".
  • Here, "hello" and "world" are valid dictionary words.
  • The extra characters are "s", "a", "y" (i.e. the substring "say" is extra since it is not in the dictionary).
  • Thus, the answer is 3.

Constraints

  • 1 ≤ s.length ≤ 50
  • 1 ≤ dictionary.length ≤ 50
  • 1 ≤ dictionary[i].length ≤ 50
  • s and dictionary[i] consist of only lowercase English letters.

Hints

  1. Dynamic Programming (DP) Formulation:
    Think about defining a DP array where dp[i] represents the minimum number of extra characters from index i to the end of the string.

  2. Choice at Each Index:
    At any position i in s, you have two choices:

    • Consider the character at s[i] as an extra character (and then add 1 to dp[i+1]).
    • Try to match a dictionary word starting at index i. For every valid word w where s[i:j+1] == w, you can "cover" that part and set dp[i] = min(dp[i], dp[j+1]).
  3. Dictionary Lookup:
    Use a hash set for dictionary words so that you can quickly check if a substring is valid.

  4. Bottom-Up vs. Top-Down:
    A bottom-up DP solution (iterating from the end of s backwards) works naturally for this problem.

Approaches

Approach 1: Brute Force (Inefficient)

  • Idea:
    Try every possible segmentation of s and count the extra characters for each segmentation.

  • Downside:
    The number of segmentations is exponential in the length of s, making this approach impractical even for moderate-sized inputs.

Approach 2: Dynamic Programming (Optimal)

  • Idea:
    Use a DP array where dp[i] gives the minimum extra characters needed for the substring starting at index i.

  • DP Recurrence:
    For each index i from the end of s to the beginning:

    • Initialize dp[i] = dp[i+1] + 1 (treating s[i] as extra).
    • Then, for every index j from i to the end:
      • If the substring s[i:j+1] is in the dictionary, update dp[i] = min(dp[i], dp[j+1]).
  • Final Answer:
    The answer is dp[0], representing the minimum extra characters for the entire string.

  • Why It Works:
    The DP approach considers both keeping characters as extra and grouping valid dictionary words, ensuring that every possibility is considered in an efficient manner.

Step-by-Step Walkthrough (Example: "leetscode")

Consider s = "leetscode" and dictionary = ["leet", "code", "leetcode"].

  1. Initialization:

    • Let n = s.length = 9.
    • Create a DP array dp of size n+1 where dp[n] = 0 (base case: no extra characters beyond the end).
  2. Iteration from Right to Left:

    • Index 8 (s[8] = 'e'):
      • Assume 'e' is extra: dp[8] = dp[9] + 1 = 1.
      • Check substrings starting at 8:
        Only "e" is possible; it is not in the dictionary.
    • Index 7 (s[7] = 'd'):
      • Default: dp[7] = dp[8] + 1 = 2.
      • Check "de" (s[7:9]): not in dictionary.
    • Index 6 (s[6] = 'o'):
      • Default: dp[6] = dp[7] + 1 = 3.
      • Check "ode" (s[6:9]): not in dictionary.
    • Index 5 (s[5] = 'c'):
      • Default: dp[5] = dp[6] + 1 = 4.
      • Check "code" (s[5:9]): "code" is in dictionary, so dp[5] = min(4, dp[9]) = 0.
    • Index 4 (s[4] = 's'):
      • Default: dp[4] = dp[5] + 1 = 1.
      • Check "scode": not in dictionary.
    • Index 3 (s[3] = 't'):
      • Default: dp[3] = dp[4] + 1 = 2.
      • Check "tscode": not in dictionary.
    • Index 2 (s[2] = 'e'):
      • Default: dp[2] = dp[3] + 1 = 3.
      • Check "etscode": not in dictionary.
    • Index 1 (s[1] = 'e'):
      • Default: dp[1] = dp[2] + 1 = 4.
      • Check "eetscode": not in dictionary.
    • Index 0 (s[0] = 'l'):
      • Default: dp[0] = dp[1] + 1 = 5.
      • Check "leet" (s[0:4]): "leet" is in dictionary, so dp[0] = min(5, dp[4]) = min(5, 1) = 1.
      • Also, check "leetscode" (s[0:9]): not in dictionary.
  3. Result:

    • dp[0] = 1 which is the minimum number of extra characters.

Code Implementation

Python Code

Python3
Python3

. . . .

Java Code

Java
Java

. . . .

Complexity Analysis

  • Time Complexity:

    • For each index i, we consider every substring s[i:j+1] (where j goes from i to n-1).

    • In the worst case, this results in O(n²) substring checks.

    • Each substring lookup in the dictionary (using a hash set) takes O(1) on average.

    • Overall Time Complexity: O(n²).

  • Space Complexity:

    • The DP array uses O(n) space.
    • The dictionary set uses O(m) space where m is the total length of all dictionary words.
    • Overall Space Complexity: O(n + m).

Common Mistakes and Edge Cases

  1. Not Using an Efficient Lookup:

    • Using a list for dictionary lookup instead of a hash set can lead to slower performance.
  2. Incorrect DP Initialization:

    • Ensure that the base case dp[n] = 0 is set correctly; otherwise, the recursion will not terminate properly.
  3. Substring Boundaries:

    • Be careful with the indices when forming substrings. Off-by-one errors are common when iterating over indices.
  4. Edge Cases:

    • When s is already entirely composed of dictionary words, the answer should be 0.
    • When no dictionary word can be matched starting at some index, the algorithm must correctly count those characters as extra.
  • Word Break Problems:
    Similar techniques (using dynamic programming) are used in word break problems where you determine if a string can be segmented into a space-separated sequence of dictionary words.

  • Minimum Unrecognized Characters:
    Variations where you try to minimize the number of unrecognized or extra characters after segmenting a string.

  • Segmenting Strings with Constraints:
    Problems that involve partitioning a string into valid segments under various constraints (e.g., maximum segmentation cost).

  • Word Break
  • Word Break II
  • Longest Valid Parentheses
  • Minimum Edit Distance
  • Concatenated Words
TAGS
leetcode
CONTRIBUTOR
Design Gurus Team
-

GET YOUR FREE

Coding Questions Catalog

Design Gurus Newsletter - Latest from our Blog
Boost your coding skills with our essential coding questions catalog.
Take a step towards a better tech career now!
Explore Answers
Mastering complexity analysis for advanced coding questions
Is a Spotify interview hard?
Emphasizing result-driven outcomes in behavioral interview stories
Related Courses
Image
Grokking the Coding Interview: Patterns for Coding Questions
Grokking the Coding Interview Patterns in Java, Python, JS, C++, C#, and Go. The most comprehensive course with 476 Lessons.
Image
Grokking Modern AI Fundamentals
Master the fundamentals of AI today to lead the tech revolution of tomorrow.
Image
Grokking Data Structures & Algorithms for Coding Interviews
Unlock Coding Interview Success: Dive Deep into Data Structures and Algorithms.
Image
One-Stop Portal For Tech Interviews.
Copyright © 2025 Design Gurus, LLC. All rights reserved.
;