408. Valid Word Abbreviation - Detailed Explanation
Problem Statement
Definition: An abbreviation of a string replaces one or more non-adjacent, non-empty substrings with their respective lengths (written as digits). The length numbers cannot have leading zeros. For example, the word "substitution"
can be abbreviated in many ways:
"s10n"
– here"ubstitutio"
(10 letters) is replaced by10
."sub4u4"
– here"stit"
(4 letters) and"tion"
(4 letters) are replaced by4
each."12"
– the entire word"substitution"
(12 letters) is replaced by12
."substitution"
– no letters replaced (the word itself).
The task: Given a string word
and an abbreviation abbr
, determine if abbr
is a valid abbreviation of word
. In other words, if we expand the abbreviation abbr
into its full form (replacing numbers with the actual letters they represent in word
), does it match exactly the original word
?
Examples:
-
Input:
word = "internationalization"
,abbr = "i12iz4n"
Output:true
Explanation: The abbreviation can be expanded as "i" + 12 letters + "iz" + 4 letters + "n", which reconstructs"internationalization"
. For instance,"i12iz4n"
corresponds to"i nternational iz atio n"
(skip the 12 letters"nternational"
and skip 4 letters"atio"
in the word). -
Input:
word = "apple"
,abbr = "a2e"
Output:false
Explanation:"a2e"
would expand to"appe"
(the"2"
skips"pp"
) which is missing the last letter"l"
. Since"appe" != "apple"
, the abbreviation is invalid). -
Input:
word = "apple"
,abbr = "5"
Output:true
Explanation: The abbreviation"5"
means skipping all 5 letters of"apple"
. It correctly represents the entire word (replacing"apple"
with its length5
).
Constraints:
1 <= word.length <= 20
(the original word has length up to 20).1 <= abbr.length <= 10
(the abbreviation has length up to 10).word
consists of lowercase English letters.abbr
consists of lowercase letters and digits.- Any number in
abbr
will fit in a 32-bit integer. (Also, abbreviation numbers cannot have leading zeros.)
Hints
- Hint 1: Think about how an abbreviation is interpreted. A digit (or sequence of digits) in
abbr
represents the count of letters to skip in the original word. For example,"a2b"
means "a then skip 2 letters then b". - Hint 2: Use two pointers to traverse the strings: one pointer for
word
and one forabbr
. Move them step by step, synchronizing the position in the word with the abbreviation. - Hint 3: When you encounter a letter in
abbr
, it should match the letter at the current position inword
. When you encounter a digit, you’ll need to convert the sequence of digit(s) into a number and advance the word-pointer by that many positions. - Hint 4: Be careful to handle multi-digit numbers. For instance, the abbreviation
"i18n"
for"internationalization"
has the number 18, not1
and8
separately. You may need to accumulate digits to form the full number. - Hint 5: Watch out for edge cases: a number segment in
abbr
should never start with'0'
(no leading zeros allowed), and the abbreviation should never skip more characters than the word has left.
Approach 1: Brute Force – Generate All Abbreviations
Idea: One straightforward (but not efficient) approach is to generate every possible valid abbreviation of the given word
and then check if abbr
is among them. This leverages the definition of abbreviation: for each letter in the word, we have a choice to either keep it or abbreviate (omit) it as part of a number count.
We can use backtracking or recursion to generate all abbreviations:
- At each position in
word
, decide whether to abbreviate that character or not. - If we abbreviate, we accumulate a count of how many characters we’ve skipped; if we reach the end of a skip sequence or decide to stop skipping, we append the number to the abbreviation string.
- If we do not abbreviate a character, we append the actual character to the abbreviation string and reset the skip count.
Example (Brute Force Concept): For word = "cat"
:
- Possible abbreviations include
"cat"
(no abbreviation),"c1t"
(abbreviate the second letter"a"
as1
),"1at"
(abbreviate the first letter"c"
),"2t"
(abbreviate"ca"
as2
),"c2"
(abbreviate"at"
), and"3"
(abbreviate the whole word as3
). We would generate all such combinations and then simply check if the givenabbr
matches one of them.
Complexity: This brute force approach is exponential in the length of word
. If n = word.length
, there are 2^n possible abbreviations in the worst case (each character either kept or abbreviated) – for n = 20
this could be about 1,048,576 possibilities. Generating and checking all of them is impractical for larger n
. Given the constraints here (word length ≤ 20), it might just be barely feasible, but it’s not efficient and not the intended solution. It also uses a lot of extra space to store the list of abbreviations (exponential space complexity).
Conclusion: We mention this approach for completeness (to understand the problem space), but we should look for a more direct and efficient method.
Approach 2: Two-Pointer Simulation (Optimized Solution)
The optimal solution is to simulate the abbreviation matching process in one pass using two pointers (or indices), one for the word
and one for abbr
. This approach will verify the abbreviation without explicitly constructing it or generating all possibilities. The idea is to iterate through abbr
and word
simultaneously and ensure they align correctly.
Key steps:
- Initialize two indices:
i = 0
forword
andj = 0
forabbr
. - Loop while
i < len(word)
andj < len(abbr)
:
a. Ifabbr[j]
is a letter:- It must match
word[i]
. Ifabbr[j] != word[i]
, then the abbreviation is invalid (return False). - If it matches, advance both pointers:
i++
(move to next letter in the word) andj++
(move to next character in the abbreviation).
b. Ifabbr[j]
is a digit: - If the digit is
'0'
, it's not allowed to be a leading zero in a number. (E.g.,"02"
is invalid for representing 2.) Return False immediately in this case. - Otherwise, compute the full number starting at this position in
abbr
. There might be multiple digit characters in a row forming a multi-digit number (e.g.,"12"
). Parse them as an integer. For example, ifabbr[j:j+2] = "12"
, interpret it as the number 12, not1
and then2
. - Move the
j
pointer forward past all digit characters that were part of this number. - Increase the
i
pointer by the value of the number to skip that many characters inword
. Essentially, we are saying "this number in the abbreviation represents X letters of the word that are abbreviated."
- It must match
- After the loop, we have consumed some or all of the characters from
word
andabbr
. For the abbreviation to be valid, both the entireword
and the entireabbr
should be consumed exactly by this process. This means:i
should now be atlen(word)
(we've matched or skipped all letters of the word), andj
should be atlen(abbr)
(we've processed the whole abbreviation).
If either of these pointers is not at the end, it means the abbreviation didn’t perfectly cover the word (either too short or too long), so return False. Otherwise, return True.
This two-pointer simulation runs in linear time, scanning each character of abbr
and word
at most once or a constant number of times, giving O(n + m) time complexity (with n = len(word)
, m = len(abbr)
). The space complexity is O(1), as we only use a few integer variables for indexing and parsing.
Why this works:
By reading through the abbreviation and advancing the index in the original word accordingly, we mimic the expansion of the abbreviation in-place. If at any point a letter doesn't match or a skip count goes beyond the word's length, we catch the discrepancy. If we finish and both indices align at the ends of their strings, it’s a perfect match.
Example Walkthrough (Valid case):
Consider word = "substitution"
(length 12) and abbr = "s10n"
. We will step through the algorithm:
- Initialize
word_ptr = 0
(pointing at s in "substitution") andabbr_ptr = 0
(pointing at s in "s10n").abbr[0]
='s'
(letter). Compare withword[0]
='s'
. They match.
Action: advance both pointers →word_ptr = 1
(now at u),abbr_ptr = 1
(now at 1).abbr[1]
='1'
(digit). This could be part of a number. Actually,abbr
has"10"
starting at position 1.- No leading zero issue (
'1'
is not'0'
), so continue to parse the number. - Read the digit(s):
"10"
forms the number 10. - Advance
abbr_ptr
past the digits: it was at index 1, it will move to index 3 (right after"10"
in the abbreviation). - Advance
word_ptr
by 10 positions: it was at index 1, soword_ptr = 1 + 10 = 11
. (We skipped the 10 letters"ubstitutio"
in the word.)
- No leading zero issue (
- Now,
word_ptr = 11
(pointing at n, which is the 12th character in "substitution"), andabbr_ptr = 3
(pointing at n in "s10n").abbr[3]
='n'
(letter). Compare withword[11]
='n'
. They match.
Action: advance both pointers →word_ptr = 12
,abbr_ptr = 4
.
- Now
word_ptr = 12
, which equalslen(word)
(we've reached end of the word), andabbr_ptr = 4
, which equalslen(abbr)
(reached end of abbreviation). Both strings are fully consumed with a consistent match. Result: Valid abbreviation (True
).
Notice how "s10n"
correctly abbreviated the word by skipping the middle 10 letters `"ubstitutio". We never explicitly built the expanded string "substitution", but the two-pointer method verified the match segment by segment.
Example Walkthrough (Invalid case):
Now consider word = "apple"
and abbr = "a2e"
, which we expect to be invalid (as shown in the examples):
word_ptr = 0
( a ),abbr_ptr = 0
( a ).abbr[0] = 'a'
(letter). Compare withword[0] = 'a'
. They match.
Action:word_ptr = 1
(pointing at p),abbr_ptr = 1
(pointing at 2).abbr[1] = '2'
(digit).- It's not
'0'
, so parse the number. The abbreviation has a single digit '2' here (which is the number 2). - Advance
abbr_ptr
past the digit →abbr_ptr = 2
. - Advance
word_ptr
by 2 →word_ptr = 1 + 2 = 3
. (We skip 2 characters in "apple": we skipped"pp"
which are indices 1 and 2.)
- It's not
- Now,
word_ptr = 3
(pointing at l, the 4th character in "apple"), andabbr_ptr = 2
(pointing at e in "a2e").abbr[2] = 'e'
(letter). Compare withword[3] = 'l'
. They do not match. At this point, we can already conclude the abbreviation is invalid. We expected the abbreviation to have an'l'
here to match the word, or the skip count was wrong.
- The algorithm would return False due to the mismatch. Indeed, after expanding
"a2e"
, we got"appe"
, which doesn’t equal `"apple".
This two-pointer approach cleanly handles the logic in a single pass through the strings, checking each part of the abbreviation against the word.
Code Implementation (Python)
Code Implementation (Java)
Both implementations follow the same logic. We iterate through abbr
, handling digit sequences and letters. If at any point a rule is violated (mismatch or invalid format), we return false
. If we successfully reach the end of both strings with everything matching up, we return true
.
Complexity Analysis
-
Brute Force Approach: Generating all abbreviations has a time complexity of O(2^N * N) in the worst case (because there are 2^N possible abbreviations for a word of length N, and handling each could take up to O(N) to construct or compare). For N = 20 (max length), 2^20 ≈ 1e6, which is very large. The space complexity is also exponential, as storing all abbreviations would take a huge amount of memory. This approach is not efficient for even moderate N and is mostly theoretical for this problem.
-
Two-Pointer Simulation: The optimized solution runs in O(N + M) time, where N is the length of
word
and M is the length ofabbr. We simply scan through each string once (each character of
wordand
abbr` is processed at most one time in the loop). The space complexity is O(1) (constant extra space because we only use a few pointers/counters regardless of input size. This is very efficient given the constraints (and would remain efficient even if the input sizes were larger).
In this specific problem, N ≤ 20 and M ≤ 10, so the algorithm is extremely fast. Even if N and M were larger (say in the thousands), the linear scan would be quite scalable.
Step-by-Step Solution Walkthrough with Examples
Let's visually walk through how the algorithm works on a couple of examples, illustrating pointer movements and checks at each step:
Example: word = "internationalization"
, abbr = "i12iz4n"
.
This abbreviation has multiple parts (letters and numbers). We expect this to be valid (as given in Example 1).
- Start with
word_ptr = 0
(pointing at i in the word) andabbr_ptr = 0
(pointing at i in the abbreviation).-
Step 1:
abbr[0] = 'i'
(a letter). Compare withword[0] = 'i'
. They match. Move both pointers:word_ptr → 1
(now at n),abbr_ptr → 1
(now at 1 in "12"). -
Step 2:
abbr[1] = '1'
(digit). We detect a number starting here. Parse the full number:"12"
which equals 12. Moveabbr_ptr
past the digits to index 3 (after"12"
). Moveword_ptr
forward by 12 positions: from 1 to 13. (We skipped the 12 letters"nternationaliz"
in the word.) -
Step 3: Now
word_ptr = 13
(pointing at i – the 14th character in the word "internationalization"), andabbr_ptr = 3
(pointing at i in "iz4n").abbr[3] = 'i'
(letter) should matchword[13] = 'i'
. It matches. Move both:word_ptr → 14
(now at z),abbr_ptr → 4
(now at z in "iz4n"). -
Step 4:
abbr[4] = 'z'
(letter). Compare withword[14] = 'z'
. Match! Move both:word_ptr → 15
(now at a),abbr_ptr → 5
(now at 4 in "4n"). -
Step 5:
abbr[5] = '4'
(digit). Parse number "4". Moveabbr_ptr
past it to 6 (after the "4"). Advanceword_ptr
by 4: from 15 to 19. (We skipped the 4 letters"atio"
in the word.) -
Step 6: Now
word_ptr = 19
(pointing at n, the last character of the word), andabbr_ptr = 6
(pointing at n, the last char of "i12iz4n").abbr[6] = 'n'
vsword[19] = 'n'
– match. Move both:word_ptr → 20
,abbr_ptr → 7
. -
End state:
word_ptr = 20
which islen(word)
, andabbr_ptr = 7
which islen(abbr)
. Both ended together, so the abbreviation is confirmed valid.
-
Throughout this process, every part of "i12iz4n"
correctly corresponded to segments of "internationalization"
. We effectively checked that "internationalization"
can be segmented as "i [12 letters] iz [4 letters] n"
, which it can.
Example: word = "apple"
, abbr = "a5"
(consider a case where abbreviation might overshoot).
We expect this to be invalid because "a5"
would mean "a" + 5 letters, but "apple" has only 5 letters in total. The abbreviation tries to represent 6 letters (1 letter 'a' + 5 skipped).
word_ptr = 0
(a),abbr_ptr = 0
(a).abbr[0] = 'a'
vsword[0] = 'a'
: match. Advanceword_ptr → 1
(now at p),abbr_ptr → 1
(now at 5).abbr[1] = '5'
(digit). Parse number = 5. Advanceabbr_ptr
to 2 (end of abbreviation), advanceword_ptr
by 5 → from 1 to 6.
- Now
abbr_ptr = 2
(end of abbr),word_ptr = 6
. Butlen(word) = 5
, soword_ptr
has moved beyond the end of the word (index 6 is out of bounds). This means the abbreviation tried to skip more characters than the word had available. In the algorithm, after the loop ends, we checkword_ptr == len(word)
. Hereword_ptr == 6
andlen(word) == 5
, so they are not equal. The condition fails, and we return False. The abbreviation"a5"
is invalid for"apple"
because it implies the word should have 6 characters (one matching 'a' and five more skipped) when it actually has only 5.
Through these examples, you can see how the two-pointer method faithfully reproduces the abbreviation-checking logic: matching letters and skipping ahead for numbers, and catching any inconsistencies.
Common Mistakes and Edge Cases
When implementing this solution, watch out for these common pitfalls:
-
Leading Zeros in Numbers: If the abbreviation contains a number with a leading zero (e.g.
"01"
or just"0"
), it should be immediately deemed invalid. A leading zero means the abbreviation is not properly formatted. In our algorithm, we handle this by checking if a digit segment starts with '0'. For example,word="substitution"
,abbr="s010n"
should return false due to"010"
having a leading zer. -
Skipping Too Far: If a number in
abbr
causes the word pointer to advance beyond the end ofword
(or not land exactly on the end when the abbreviation is finished), that's a mistake. Always ensure after processing the entire abbreviation that the word pointer is also at the end of the word. For instance, in the"a5"
vs"apple"
example above, not checking the end condition would miss the error of overshooting the word length. -
Not Handling Multi-digit Numbers Properly: When multiple digits occur in a row in
abbr
, make sure you combine them to form the full number. A common bug is to only take one digit at a time. For example, ifabbr = "a12b"
, you need to recognize the number is 12 (not 1 and then 2 separately). Forgetting to multiply the existing number by 10 before adding a new digit is a typical error when parsing numbers (e.g., constructing 12 as1*10 + 2
. -
Missing Pointer Increments: Be careful to increment the abbreviation pointer (
j
) correctly after processing a number. In the structure above, we advancej
inside the inner loop that parses the number. If you forget to advancej
or double-advance it, the loop can get stuck or skip characters. Similarly, ensure the word pointer (i
) moves forward by exactly the number of characters indicated. -
Checking Final State: Another mistake is forgetting to verify that both the word and abbreviation are fully consumed at the end. Even if all character checks passed, if one string has remaining characters, the abbreviation is not a perfect match. Always include the final condition
i == len(word) and j == len(abbr)
before returning true. -
Edge Cases: Consider minimal inputs and edge cases:
- The shortest word and abbreviation (e.g.,
word = "a"
,abbr = "1"
should return true, since "1" abbreviates the single letter). - A word with no abbreviation (abbr is exactly the same as word) should return true.
- An abbreviation that is purely numeric (e.g.,
word = "hello"
,abbr = "5"
) should return true if the number equals the word’s length, or false otherwise. - An abbreviation that is longer than the word (e.g.,
word = "hi"
,abbr = "1i1"
– here the abbreviation length is 3, which might still be valid actually "1i1" -> skip 1 (skip 'h'), then 'i', then skip 1 (skip nothing because word had only 2 letters)? Actually "1i1" on "hi" would skip 'h', match 'i', then try to skip 1 more letter which doesn't exist, so false). Always test scenarios where the abbreviation has extra numbers or letters that don’t align with the word length.
- The shortest word and abbreviation (e.g.,
Being mindful of these issues will help avoid bugs when coding the solution.
Alternative Variations
The problem as stated is about validating a single abbreviation against a single word. Some variations or related scenarios could require adjustments in approach:
-
Multiple Abbreviations or Words: If we had to check abbreviations for multiple words (e.g., a list of word-abbreviation pairs), the approach would remain the same for each pair, just repeated. There’s no significant change except wrapping the check in a loop.
-
Generating an Abbreviated Form: A different task might be to create an abbreviation for a given word (for example, producing the shortest or a standardized abbreviation). In that case, you might decide on a specific scheme (like always keep the first and last letter and replace the middle with the count, e.g., "internationalization" -> "i18n"). The logic would be different because you'd be constructing an abbreviation rather than validating one.
-
Allowing Adjacent Number Abbreviations: The definition we used does not allow adjacent abbreviated substrings (you wouldn't have two numbers back-to-back in a valid abbreviation, since you could combine them into one number). If a variation of the problem did allow something like that (treating them as separate skip commands), the validation logic would need to handle multiple number segments in a row. Our current solution inherently treats consecutive digits as one number, which is actually what we want for the given problem (so that "12" is one skip of 12, not two skips of 1 and 2). A different interpretation could complicate the logic.
-
Reconstructing the Word from Abbreviation: A twist on the problem could give you
abbr
and ask you to reconstruct the originalword
(assuming you had additional information like the length or some constraints). This would be an inverse problem. Without knowing the word, however, an abbreviation alone might be ambiguous (many different words could fit the same abbreviation pattern if letters are not given). Typically, you'd need the original word to validate an abbreviation uniquely.
Overall, the core logic of interpreting numbers as skips and letters as literal matches remains central to any such variation.
Related Problems for Further Practice
Here are a few related LeetCode problems that involve abbreviations or similar string manipulation concepts:
-
Generalized Abbreviation (LeetCode 320) – Generate all possible abbreviations of a given word. This is essentially the brute-force generation approach discussed, and it requires exploring all combinations of keeping or abbreviating characters. It’s a good exercise in backtracking (the result is a list of abbreviations).
-
Unique Word Abbreviation (LeetCode 288) – Design a class that is initialized with a dictionary of words and can check if an abbreviation is unique (not shared by other words in the dictionary). This involves hashing or storing abbreviations for quick lookup and handling duplicates.
-
Minimum Unique Word Abbreviation (LeetCode 411) – (Advanced) Given a target word and a list of words that cannot be abbreviated the same way, find the shortest abbreviation of the target word that is unique. This is a more complex problem that combines the ideas of generating abbreviations and pruning those that conflict with a given set.
GET YOUR FREE
Coding Questions Catalog
