139. Word Break - Detailed Explanation
Problem Statement
Given a string s
and a dictionary of strings wordDict
, determine if s
can be segmented into a space-separated sequence of one or more dictionary words.
Note: The same dictionary word may be reused multiple times in the segmentation.
Example 1:
Input: s = "leetcode"
, wordDict = ["leet", "code"]
Output: true
Explanation: The string can be segmented as "leet code"
. The prefix "leet"
and suffix "code"
are both in the dictionary, so the whole string can be composed of dictionary words.
Example 2:
Input: s = "applepenapple"
, wordDict = ["apple", "pen"]
Output: true
Explanation: The string can be segmented as "apple pen apple"
. Here the word "apple"
from the dictionary is used twice (reuse of dictionary words is allowed).
Example 3:
Input: s = "catsandog"
, wordDict = ["cats", "dog", "sand", "and", "cat"]
Output: false
Explanation: Even though "cats"
and "and"
are valid dictionary words that appear in the string, there is no valid word to cover the remaining "og"
at the end. No complete segmentation of "catsandog"
into dictionary words is possible.
Constraints:
1 <= s.length <= 300
1 <= wordDict.length <= 1000
1 <= wordDict[i].length <= 20
s
andwordDict[i]
consist of only lowercase English letters- All words in
wordDict
are unique
Hints Before the Solution
-
Explore All Partitions: Try to break the string at every possible position. Consider every prefix of the string and check if it is a valid word in the dictionary. What happens to the remaining suffix if the prefix is valid?
-
Recursion & Backtracking: A brute force approach would involve recursion or backtracking. If a prefix is a valid word, recursively attempt to break the rest of the string. If you reach an empty suffix successfully, the segmentation is valid.
-
Reuse of Words: Remember that dictionary words can be reused. For example, after using a word, you can still use it again later in the segmentation.
-
Avoid Repeated Work: The naive approach will end up checking the same substrings repeatedly. Can you store results (memoization) to avoid recomputing solutions for the same suffix over and over?
-
Dynamic Programming: Consider a DP approach where you build up a solution by solving smaller subproblems. For instance, define a boolean DP array where
dp[i]
indicates whether the substrings[0..i-1]
(the firsti
characters) can be segmented using the dictionary. How would you formulate the recurrence fordp[i]
?
Multiple Approaches
Brute Force Approach
Idea: The brute force solution tries all possible ways to split the string. We can recursively check every prefix of the string to see if it’s in the dictionary, and then solve the remaining substring with the same approach. This is essentially a recursion with backtracking: at each position, decide to cut the string if the prefix is a dictionary word, and recursively check if the suffix can be broken down. If any sequence of valid cuts leads to consuming the entire string, return true. Otherwise, if we exhaust all options, return false.
How it works: We define a recursive function breakable(substring)
that returns true if substring
can be formed by dictionary words. Initially call breakable(s)
. For each recursive call on a substring:
-
If the substring is empty, return true (we have segmented the entire original string).
-
Otherwise, loop over all possible prefix lengths from 1 to
len(substring)
. For each prefix:- If the prefix is in
wordDict
, recursively check if the rest of the substring (suffix) can be segmented. - If the recursive call returns true for the suffix, propagate true upward (we found a valid break).
- If the prefix is in
-
If no prefix works out, return false for this substring.
This approach will explore every possible partitioning of the string. It works but is very inefficient for long strings because it recomputes results for overlapping suffixes many times.
Time Complexity: Exponential in the worst case. In the worst scenario, each partition yields two recursive paths (cut or not cut), leading to roughly 2^n possibilities. For example, a string like "aaaaaaa..."
with dictionary ["a","aa","aaa",...]
will cause many overlapping recursive calls. Thus, the time complexity is O(2^n) in the worst case (exponential growth).
Space Complexity: The recursion depth can go up to n in the worst case (e.g., cutting one character at a time), so space complexity due to the call stack is O(n). We also use O(n) space to create prefix substrings during recursion. If using a set for the dictionary, that uses O(m) space (where m is total characters in the dictionary).
Below is the brute force solution implemented in Python and Java:
Python Implementation (Brute Force):
Java Implementation (Brute Force):
Analysis: The brute force approach correctly finds a solution if it exists, but it is extremely slow for larger inputs due to repeated recalculation of subproblems. For instance, if a suffix of the string can be formed by dictionary words, that fact is rediscovered multiple times in a naive recursion. In the worst case with no valid segmentation, the algorithm may try all possible ways to split the string, leading to roughly 2^n calls. This becomes infeasible as n grows (for n=300, 2^{300} is astronomically large). The space usage is proportional to the recursion depth and any additional data structures (O(n) stack and O(n) for substring storage in worst case).
Optimized Approach (Dynamic Programming)
Idea: We can dramatically improve efficiency using Dynamic Programming (DP) to avoid recomputing overlapping subproblems. The key observation is that whether a substring can be segmented depends on smaller substrings. We define a boolean DP array dp
of length n+1
(if string length is n). dp[i]
will be True if the substring s[0:i]
(first i characters) can be segmented into dictionary words, and False otherwise. By building this up, we can determine the segmentability of the whole string.
DP Formulation:
-
Initialize
dp[0] = True
because the empty string is trivially segmented (no characters to match). This is the base case (having segmented 0 characters successfully). -
For each position
i
from 1 to n (inclusive), determinedp[i]
by checking all possible split pointsj
from 0 to I:-
If
dp[j]
is True (meaning the prefixs[0:j]
can be segmented) and the substrings[j:i]
(from j to i-1) is a word in the dictionary, then setdp[i] = True
. This means the first i characters can be segmented (split at j, with the segment after j being a valid word). We break out of the inner loop as soon as we find a valid split fori
. -
If no valid j yields a True, then
dp[i]
remains False.
-
-
In the end,
dp[n]
(True or False) tells us if the entire string can be segmented.
Using this method, each substring result is computed once and stored, so we never recompute the segmentation of the same suffix multiple times – this eliminates the redundant recursion paths of the brute force approach.
Optimization: To improve performance, we can use a hash set for quick dictionary lookups (O(1) average time for membership). We can also optimize the inner loop by limiting the length of the substring we check: for example, keep track of the maximum word length in the dictionary (maxWordLen
). When checking s[j:i]
, we only need to consider j
such that i-j <= maxWordLen
(since any segment longer than the longest dictionary word can’t be a valid word). This can significantly cut down the number of checks when wordDict
contains only short words. Another optimization is using memoization via recursion (top-down DP), or using BFS from the start index – these yield the same complexity as the bottom-up DP approach but with different implementations. Regardless of implementation, the DP strategy ensures we solve each subproblem (each index) only once.
Time Complexity: The DP solution runs in polynomial time. The basic DP has two nested loops: for each i
up to n, we iterate over j
from 0 to i, and substring checks or dictionary lookups inside. Checking substring membership in a set is O(substring_length) due to slicing, but since substring length ≤ n, that adds another factor. Overall, the worst-case time complexity is O(n^3) for the naive implementation (two loops and substring operation cost). In practice, using a set and optimizing the inner loop by word lengths leads to about O(n^2) checks. Given the constraints (n ≤ 300, each word ≤ 20 letters), this is efficient.
Space Complexity: We use O(n) space for the dp
array of size n+1. Additionally, storing the dictionary in a set costs O(m) space (where m is the total length of all words in wordDict
). The space is very reasonable for the given limits. If using recursion + memo, space is also O(n) for the memo plus recursion stack. The DP approach does not require additional recursion stack space.
Below is the dynamic programming solution in Python and Java:
Python Implementation (DP):
Java Implementation (DP):
Why this is efficient: Each index i
of the string is solved once. For each i
we scan backward to find a j
that had a valid segmentation and a valid word s[j:i]
. If the string is length 300, the outer loop runs 300 times, and the inner loop runs at most 300 times for each, totaling 90,000 checks. Each check involves a substring lookup in a set (O(1) average). In the worst-case scenario (e.g., a string that cannot be segmented at all and many dictionary words), it could approach the cubic time mentioned, but 300^3 is 27 million — still fine for modern computers. The DP ensures we never repeat a check for the same segment twice; contrast this with the brute force recursion that might explore the same suffix exponential times. Thus, DP brings the solution down to polynomial time, well within the problem’s limits.
Step-by-Step Walkthrough
Let’s walk through an example to see how the solution works step by step. Consider s = "leetcode"
and wordDict = ["leet", "code"]
. We will trace the dynamic programming approach:
-
Initial state:
dp = [True, False, False, False, False, False, False, False, False]
(length 9 for string length 8). Here,dp[0] = True
since an empty string is considered segmented, and all other positions are False initially (we haven’t found any valid segments yet). The indices 1 through 8 correspond to prefixes of"leetcode"
of that length. -
i = 1: Check prefix
s[0:1] = "l"
. The only possible split is at j=0 ("" + "l"
).dp[0]
is True, but"l"
is not in the dictionary.
No valid word ends at position 1, sodp[1]
remains False.
-
i = 2: Consider
s[0:2] = "le"
. We try splits:- j=0 (
"" + "le"
):dp[0]
is True,"le"
not in dict. - j=1 (
"l" + "e"
):dp[1]
is False, so even though"e"
might be a word (it isn’t), the prefix"l"
was not segmentable.
No success, sodp[2]
stays False.
- j=0 (
-
i = 3: Prefix
s[0:3] = "lee"
. Possible splits:- j=0:
dp[0]
True,"lee"
not in dict. - j=1:
dp[1]
False, skip. - j=2:
dp[2]
False, skip. No word ends at 3, sodp[3]
remains False.
- j=0:
-
i = 4: Prefix
s[0:4] = "leet"
. Try splits:- j=0:
dp[0]
True and"leet"
is in dict. Success! We setdp[4] = True
. (We found a valid segmentation:"leet"
+""
.) We break out early because we found a valid word ending at 4. Nowdp
looks like:[True, False, False, False, True, False, False, False, False]
. The prefix"leet"
can be segmented.
- j=0:
-
i = 5: Prefix
s[0:5] = "leetc"
. We try all j:- If j=4,
dp[4]
is True, and we check"c" = s[4:5]
."c"
is not in dict. - Other j (0,1,2,3) have dp False, so skip.
No valid word ends at 5, sodp[5]
stays False.
- If j=4,
-
i = 6: Prefix
"leetco"
. We check j from 0 to 5:- j=4:
dp[4]
True, check"co" = s[4:6]
, not in dict. - Other j have dp False.
dp[6]
remains False.
- j=4:
-
i = 7: Prefix
"leetcod"
. Check splits:- j=4:
dp[4]
True, check"cod" = s[4:7]
, not in dict. - Others j < 4 are False.
dp[7]
stays False.
- j=4:
-
i = 8: Prefix
"leetcode"
. Try splits:- j=4:
dp[4]
is True (we know"leet"
was a valid segment), and check"code" = s[4:8]
."code"
is in the dictionary . This means the substring from index 4 to 7 forms the word"code"
. Sincedp[4]
was True, now we can setdp[8] = True
. We found that"leet"
(prefix) and"code"
(suffix) are both valid words, covering the entire string. - We break out as soon as we find this match, no need to check other j.
- j=4:
Now dp = [True, False, False, False, True, False, False, False, True]
. The last cell dp[8]
is True, indicating the entire string "leetcode"
can be segmented into dictionary words. We return True.
For a visual summary of the DP table progression for "leetcode"
:
Index: 0 1 2 3 4 5 6 7 8
dp: T F F F T F F F T (T=True, F=False)
Here dp[4]
and dp[8]
are True (at indices 4 and 8 corresponding to lengths of valid segmented prefixes). This matches our reasoning: s[0:4] = "leet"
is valid, and s[0:8] = "leetcode"
is valid, whereas no other prefix of intermediate length is fully segmentable.
Now consider why Example 3 ("catsandog"
) returns false. The DP process (or recursion) would find some prefixes:
"cat"
(dp[3] True) and"cats"
(dp[4] True) are valid words from the start.- If we took
"cats"
, then at index 4 we have True. We might later find"and"
as a word at index 7 (dp[7]
True, corresponding to "catsand"). But then from index 7 to 9, the substring"og"
is leftover, which is not in the dictionary. No matter how you segment, the"og"
cannot be matched, leavingdp[9]
False. The algorithm tries all combinations ("cat"+"sandog"
,"cats"+"andog"
, etc.) and finds no full coverage, so the result is False. This aligns with the explanation that having some dictionary words match is not enough – every part of the string must be covered by a dictionary word.
Common Mistakes & Edge Cases
Common Pitfalls:
-
Greedy Choices: Assuming a greedy strategy (always taking the longest possible word first) will work can be a mistake. For example, in
"pineapplepen"
, a greedy approach might take"pineapple"
(if in dict) and then fail to match"pen"
. But a correct segmentation could be"pine"+"applepen"
. You must consider all possible breaks, not just the largest word first. -
Ignoring Reusability: Forgetting that dictionary words can be reused arbitrarily often. If one assumes each word can only be used once, they might incorrectly return False for cases like
"applepenapple"
(where"apple"
is used twice). The problem explicitly allows reuse of words. -
No Memoization: Writing a recursive solution without memoization (or DP) may pass simple tests but will time out on longer strings. It’s a classic mistake to recompute the segmentation of the same suffix multiple times, leading to exponential blow-up.
-
Off-by-One Errors in DP: When implementing the DP, it’s easy to make off-by-one mistakes with indices. Remember that
dp[i]
refers to substrings[0..i-1]
. Ensure that when you check a words[j:i]
, you pair it withdp[j]
(prefix ending just before the word starts). Also be careful to initializedp[0] = True
and iteratei
from 1 to n inclusive. -
Not Using a Set for Dictionary: Checking membership in the word list directly inside the double loop can lead to O(n * n * m) time, which may be too slow. Using a HashSet (or Python set) for
wordDict
makes lookups O(1) on average, significantly speeding up the checks. -
Missing Base Case: In a recursive solution, not handling the empty string case properly can cause errors. Ensure that when the remaining string becomes empty, you return True (meaning all previous segments formed valid words).
Edge Cases to Consider:
-
String of Length 1: If
s
is one character, the solution should simply check if that single character is in the dictionary. -
No Possible Segments: If none of the dictionary words even appear as a substring in
s
, the algorithm should quickly return False. For instance,s = "abcd"
andwordDict = ["ef","gh"]
clearly cannot be segmented. -
Whole String is a Word: If
s
exactly matches one of the dictionary words, the answer should be True (e.g.,s="apple"
,wordDict=["apple","pen"]
-> True). The DP should handle this by checkingdp[len(s)]
after possibly setting it true in one step. -
Dictionary with One-Character Words: e.g.,
s = "aaaa"
,wordDict=["a"]
. This should return True and the algorithm must handle repeated usage of the same word. (Brute force recursion without memo would explore many redundant paths here, but DP handles it efficiently.) -
Empty String or Empty Dictionary: By problem constraints,
s
has at least length 1 and dictionary has at least one word. But conceptually, ifs = ""
(empty string), it should return True (since no characters need to be matched). IfwordDict
is empty (no words), then any non-emptys
should return False. (These scenarios are usually not allowed by the constraints, but good to understand.) -
Overlapping Words: Two dictionary words might share prefixes (e.g.,
wordDict=["ab","abc","cd"]
ands="abcd"
). The algorithm must not stop at the wrong segmentation. In this example,"ab"+"cd"
is correct, whereas taking"abc"
first would leave"d"
which fails. The DP/recursion tries both, ensuring the correct result.
Alternative Variations & Related Problems
This Word Break problem is a classic example of using DP or DFS with memoization to solve a string segmentation task. There are several variations and related problems:
-
Word Break II (LeetCode 140) – Hard: Instead of just returning True/False, this variation asks you to return all possible sentences that can be formed by segmenting the string into dictionary words. The approach requires backtracking (DFS) with memoization to explore all segmentations. It’s more complex because the output can be exponentially large (all combinations). The DP approach from Word Break I can be adapted to check feasibility, but ultimately you need to collect combinations of words (often by building sentences in recursion).
-
Concatenated Words (LeetCode 472) – Given a list of words, find all words that are concatenations of at least two other words in the list. This can be solved by running a Word Break check on each word using the others as the dictionary. Efficient solutions sort the words by length and build up a set of valid words, using DP or DFS (similar to Word Break) for each word.
-
Decode Ways (LeetCode 91) – Although not about dictionary words, it’s a similar DP string problem. Given a string of digits, count how many ways it can be decoded into letters. This is conceptually like segmenting the string into valid codes (like '1'->'A', '26'->'Z'). The state transition is similar (if a prefix is valid, add ways from suffix).
-
Palindrome Partitioning (LeetCode 131) – Here the task is to split a string into substrings such that each substring is a palindrome. The approach uses backtracking or DP to try cuts at every position (similar in spirit to trying dictionary words as cuts). One could use recursion or DP to find all palindromic partitions.
-
BFS/DFS State Search for Segmentation: An alternate way to solve Word Break is using BFS or DFS on indices. For example, treat index 0 as a start state, and from each index, jump to a new index that is +length of a dictionary word if a word matches starting at the current index. This BFS will find a path to the end index n if a valid segmentation exists. The complexity is similar to the DP approach. This is essentially the same problem solved via graph traversal: each index is a node, and there's an edge from j to i if
s[j:i]
is a word. The Word Break problem reduces to finding a path from 0 to n in this implicit graph. This approach must mark visited indices to avoid infinite loops, but it works efficiently and is logically equivalent to the DP method.
GET YOUR FREE
Coding Questions Catalog
