408. Valid Word Abbreviation - Detailed Explanation

Free Coding Questions Catalog
Boost your coding skills with our essential coding questions catalog. Take a step towards a better tech career now!

Problem Statement

Definition: An abbreviation of a string replaces one or more non-adjacent, non-empty substrings with their respective lengths (written as digits). The length numbers cannot have leading zeros. For example, the word "substitution" can be abbreviated in many ways:

  • "s10n" – here "ubstitutio" (10 letters) is replaced by 10.
  • "sub4u4" – here "stit" (4 letters) and "tion" (4 letters) are replaced by 4 each.
  • "12" – the entire word "substitution" (12 letters) is replaced by 12.
  • "substitution" – no letters replaced (the word itself).

The task: Given a string word and an abbreviation abbr, determine if abbr is a valid abbreviation of word. In other words, if we expand the abbreviation abbr into its full form (replacing numbers with the actual letters they represent in word), does it match exactly the original word?

Examples:

  1. Input: word = "internationalization", abbr = "i12iz4n"
    Output: true
    Explanation: The abbreviation can be expanded as "i" + 12 letters + "iz" + 4 letters + "n", which reconstructs "internationalization". For instance, "i12iz4n" corresponds to "i nternational iz atio n" (skip the 12 letters "nternational" and skip 4 letters "atio" in the word).

  2. Input: word = "apple", abbr = "a2e"
    Output: false
    Explanation: "a2e" would expand to "appe" (the "2" skips "pp") which is missing the last letter "l". Since "appe" != "apple", the abbreviation is invalid).

  3. Input: word = "apple", abbr = "5"
    Output: true
    Explanation: The abbreviation "5" means skipping all 5 letters of "apple". It correctly represents the entire word (replacing "apple" with its length 5).

Constraints:

  • 1 <= word.length <= 20 (the original word has length up to 20).
  • 1 <= abbr.length <= 10 (the abbreviation has length up to 10).
  • word consists of lowercase English letters.
  • abbr consists of lowercase letters and digits.
  • Any number in abbr will fit in a 32-bit integer. (Also, abbreviation numbers cannot have leading zeros.)

Hints

  • Hint 1: Think about how an abbreviation is interpreted. A digit (or sequence of digits) in abbr represents the count of letters to skip in the original word. For example, "a2b" means "a then skip 2 letters then b".
  • Hint 2: Use two pointers to traverse the strings: one pointer for word and one for abbr. Move them step by step, synchronizing the position in the word with the abbreviation.
  • Hint 3: When you encounter a letter in abbr, it should match the letter at the current position in word. When you encounter a digit, you’ll need to convert the sequence of digit(s) into a number and advance the word-pointer by that many positions.
  • Hint 4: Be careful to handle multi-digit numbers. For instance, the abbreviation "i18n" for "internationalization" has the number 18, not 1 and 8 separately. You may need to accumulate digits to form the full number.
  • Hint 5: Watch out for edge cases: a number segment in abbr should never start with '0' (no leading zeros allowed), and the abbreviation should never skip more characters than the word has left.

Approach 1: Brute Force – Generate All Abbreviations

Idea: One straightforward (but not efficient) approach is to generate every possible valid abbreviation of the given word and then check if abbr is among them. This leverages the definition of abbreviation: for each letter in the word, we have a choice to either keep it or abbreviate (omit) it as part of a number count.

We can use backtracking or recursion to generate all abbreviations:

  • At each position in word, decide whether to abbreviate that character or not.
  • If we abbreviate, we accumulate a count of how many characters we’ve skipped; if we reach the end of a skip sequence or decide to stop skipping, we append the number to the abbreviation string.
  • If we do not abbreviate a character, we append the actual character to the abbreviation string and reset the skip count.

Example (Brute Force Concept): For word = "cat":

  • Possible abbreviations include "cat" (no abbreviation), "c1t" (abbreviate the second letter "a" as 1), "1at" (abbreviate the first letter "c"), "2t" (abbreviate "ca" as 2), "c2" (abbreviate "at"), and "3" (abbreviate the whole word as 3). We would generate all such combinations and then simply check if the given abbr matches one of them.

Complexity: This brute force approach is exponential in the length of word. If n = word.length, there are 2^n possible abbreviations in the worst case (each character either kept or abbreviated) – for n = 20 this could be about 1,048,576 possibilities. Generating and checking all of them is impractical for larger n. Given the constraints here (word length ≤ 20), it might just be barely feasible, but it’s not efficient and not the intended solution. It also uses a lot of extra space to store the list of abbreviations (exponential space complexity).

Conclusion: We mention this approach for completeness (to understand the problem space), but we should look for a more direct and efficient method.

Approach 2: Two-Pointer Simulation (Optimized Solution)

The optimal solution is to simulate the abbreviation matching process in one pass using two pointers (or indices), one for the word and one for abbr. This approach will verify the abbreviation without explicitly constructing it or generating all possibilities. The idea is to iterate through abbr and word simultaneously and ensure they align correctly.

Key steps:

  1. Initialize two indices: i = 0 for word and j = 0 for abbr.
  2. Loop while i < len(word) and j < len(abbr):
    a. If abbr[j] is a letter:
    • It must match word[i]. If abbr[j] != word[i], then the abbreviation is invalid (return False).
    • If it matches, advance both pointers: i++ (move to next letter in the word) and j++ (move to next character in the abbreviation).
      b. If abbr[j] is a digit:
    • If the digit is '0', it's not allowed to be a leading zero in a number. (E.g., "02" is invalid for representing 2.) Return False immediately in this case.
    • Otherwise, compute the full number starting at this position in abbr. There might be multiple digit characters in a row forming a multi-digit number (e.g., "12"). Parse them as an integer. For example, if abbr[j:j+2] = "12", interpret it as the number 12, not 1 and then 2.
    • Move the j pointer forward past all digit characters that were part of this number.
    • Increase the i pointer by the value of the number to skip that many characters in word. Essentially, we are saying "this number in the abbreviation represents X letters of the word that are abbreviated."
  3. After the loop, we have consumed some or all of the characters from word and abbr. For the abbreviation to be valid, both the entire word and the entire abbr should be consumed exactly by this process. This means:
    • i should now be at len(word) (we've matched or skipped all letters of the word), and
    • j should be at len(abbr) (we've processed the whole abbreviation).
      If either of these pointers is not at the end, it means the abbreviation didn’t perfectly cover the word (either too short or too long), so return False. Otherwise, return True.

This two-pointer simulation runs in linear time, scanning each character of abbr and word at most once or a constant number of times, giving O(n + m) time complexity (with n = len(word), m = len(abbr)). The space complexity is O(1), as we only use a few integer variables for indexing and parsing.

Why this works:

By reading through the abbreviation and advancing the index in the original word accordingly, we mimic the expansion of the abbreviation in-place. If at any point a letter doesn't match or a skip count goes beyond the word's length, we catch the discrepancy. If we finish and both indices align at the ends of their strings, it’s a perfect match.

Example Walkthrough (Valid case):
Consider word = "substitution" (length 12) and abbr = "s10n". We will step through the algorithm:

  • Initialize word_ptr = 0 (pointing at s in "substitution") and abbr_ptr = 0 (pointing at s in "s10n").
    1. abbr[0] = 's' (letter). Compare with word[0] = 's'. They match.
      Action: advance both pointers → word_ptr = 1 (now at u), abbr_ptr = 1 (now at 1).
    2. abbr[1] = '1' (digit). This could be part of a number. Actually, abbr has "10" starting at position 1.
      • No leading zero issue ('1' is not '0'), so continue to parse the number.
      • Read the digit(s): "10" forms the number 10.
      • Advance abbr_ptr past the digits: it was at index 1, it will move to index 3 (right after "10" in the abbreviation).
      • Advance word_ptr by 10 positions: it was at index 1, so word_ptr = 1 + 10 = 11. (We skipped the 10 letters "ubstitutio" in the word.)
    3. Now, word_ptr = 11 (pointing at n, which is the 12th character in "substitution"), and abbr_ptr = 3 (pointing at n in "s10n").
      • abbr[3] = 'n' (letter). Compare with word[11] = 'n'. They match.
        Action: advance both pointers → word_ptr = 12, abbr_ptr = 4.
    4. Now word_ptr = 12, which equals len(word) (we've reached end of the word), and abbr_ptr = 4, which equals len(abbr) (reached end of abbreviation). Both strings are fully consumed with a consistent match. Result: Valid abbreviation (True).

Notice how "s10n" correctly abbreviated the word by skipping the middle 10 letters `"ubstitutio". We never explicitly built the expanded string "substitution", but the two-pointer method verified the match segment by segment.

Example Walkthrough (Invalid case):
Now consider word = "apple" and abbr = "a2e", which we expect to be invalid (as shown in the examples):

  • word_ptr = 0 ( a ), abbr_ptr = 0 ( a ).
    1. abbr[0] = 'a' (letter). Compare with word[0] = 'a'. They match.
      Action: word_ptr = 1 (pointing at p), abbr_ptr = 1 (pointing at 2).
    2. abbr[1] = '2' (digit).
      • It's not '0', so parse the number. The abbreviation has a single digit '2' here (which is the number 2).
      • Advance abbr_ptr past the digit → abbr_ptr = 2.
      • Advance word_ptr by 2 → word_ptr = 1 + 2 = 3. (We skip 2 characters in "apple": we skipped "pp" which are indices 1 and 2.)
    3. Now, word_ptr = 3 (pointing at l, the 4th character in "apple"), and abbr_ptr = 2 (pointing at e in "a2e").
      • abbr[2] = 'e' (letter). Compare with word[3] = 'l'. They do not match. At this point, we can already conclude the abbreviation is invalid. We expected the abbreviation to have an 'l' here to match the word, or the skip count was wrong.
    4. The algorithm would return False due to the mismatch. Indeed, after expanding "a2e", we got "appe", which doesn’t equal `"apple".

This two-pointer approach cleanly handles the logic in a single pass through the strings, checking each part of the abbreviation against the word.

Code Implementation (Python)

Python3
Python3

. . . .

Code Implementation (Java)

Java
Java

. . . .

Both implementations follow the same logic. We iterate through abbr, handling digit sequences and letters. If at any point a rule is violated (mismatch or invalid format), we return false. If we successfully reach the end of both strings with everything matching up, we return true.

Complexity Analysis

  • Brute Force Approach: Generating all abbreviations has a time complexity of O(2^N * N) in the worst case (because there are 2^N possible abbreviations for a word of length N, and handling each could take up to O(N) to construct or compare). For N = 20 (max length), 2^20 ≈ 1e6, which is very large. The space complexity is also exponential, as storing all abbreviations would take a huge amount of memory. This approach is not efficient for even moderate N and is mostly theoretical for this problem.

  • Two-Pointer Simulation: The optimized solution runs in O(N + M) time, where N is the length of word and M is the length of abbr. We simply scan through each string once (each character of wordandabbr` is processed at most one time in the loop). The space complexity is O(1) (constant extra space because we only use a few pointers/counters regardless of input size. This is very efficient given the constraints (and would remain efficient even if the input sizes were larger).

In this specific problem, N ≤ 20 and M ≤ 10, so the algorithm is extremely fast. Even if N and M were larger (say in the thousands), the linear scan would be quite scalable.

Step-by-Step Solution Walkthrough with Examples

Let's visually walk through how the algorithm works on a couple of examples, illustrating pointer movements and checks at each step:

Example: word = "internationalization", abbr = "i12iz4n".
This abbreviation has multiple parts (letters and numbers). We expect this to be valid (as given in Example 1).

  • Start with word_ptr = 0 (pointing at i in the word) and abbr_ptr = 0 (pointing at i in the abbreviation).
    • Step 1: abbr[0] = 'i' (a letter). Compare with word[0] = 'i'. They match. Move both pointers: word_ptr → 1 (now at n), abbr_ptr → 1 (now at 1 in "12").

    • Step 2: abbr[1] = '1' (digit). We detect a number starting here. Parse the full number: "12" which equals 12. Move abbr_ptr past the digits to index 3 (after "12"). Move word_ptr forward by 12 positions: from 1 to 13. (We skipped the 12 letters "nternationaliz" in the word.)

    • Step 3: Now word_ptr = 13 (pointing at i – the 14th character in the word "internationalization"), and abbr_ptr = 3 (pointing at i in "iz4n"). abbr[3] = 'i' (letter) should match word[13] = 'i'. It matches. Move both: word_ptr → 14 (now at z), abbr_ptr → 4 (now at z in "iz4n").

    • Step 4: abbr[4] = 'z' (letter). Compare with word[14] = 'z'. Match! Move both: word_ptr → 15 (now at a), abbr_ptr → 5 (now at 4 in "4n").

    • Step 5: abbr[5] = '4' (digit). Parse number "4". Move abbr_ptr past it to 6 (after the "4"). Advance word_ptr by 4: from 15 to 19. (We skipped the 4 letters "atio" in the word.)

    • Step 6: Now word_ptr = 19 (pointing at n, the last character of the word), and abbr_ptr = 6 (pointing at n, the last char of "i12iz4n"). abbr[6] = 'n' vs word[19] = 'n' – match. Move both: word_ptr → 20, abbr_ptr → 7.

    • End state: word_ptr = 20 which is len(word), and abbr_ptr = 7 which is len(abbr). Both ended together, so the abbreviation is confirmed valid.

Throughout this process, every part of "i12iz4n" correctly corresponded to segments of "internationalization". We effectively checked that "internationalization" can be segmented as "i [12 letters] iz [4 letters] n", which it can.

Example: word = "apple", abbr = "a5" (consider a case where abbreviation might overshoot).
We expect this to be invalid because "a5" would mean "a" + 5 letters, but "apple" has only 5 letters in total. The abbreviation tries to represent 6 letters (1 letter 'a' + 5 skipped).

  • word_ptr = 0 (a), abbr_ptr = 0 (a).
    • abbr[0] = 'a' vs word[0] = 'a': match. Advance word_ptr → 1 (now at p), abbr_ptr → 1 (now at 5).
    • abbr[1] = '5' (digit). Parse number = 5. Advance abbr_ptr to 2 (end of abbreviation), advance word_ptr by 5 → from 1 to 6.
  • Now abbr_ptr = 2 (end of abbr), word_ptr = 6. But len(word) = 5, so word_ptr has moved beyond the end of the word (index 6 is out of bounds). This means the abbreviation tried to skip more characters than the word had available. In the algorithm, after the loop ends, we check word_ptr == len(word). Here word_ptr == 6 and len(word) == 5, so they are not equal. The condition fails, and we return False. The abbreviation "a5" is invalid for "apple" because it implies the word should have 6 characters (one matching 'a' and five more skipped) when it actually has only 5.

Through these examples, you can see how the two-pointer method faithfully reproduces the abbreviation-checking logic: matching letters and skipping ahead for numbers, and catching any inconsistencies.

Common Mistakes and Edge Cases

When implementing this solution, watch out for these common pitfalls:

  • Leading Zeros in Numbers: If the abbreviation contains a number with a leading zero (e.g. "01" or just "0"), it should be immediately deemed invalid. A leading zero means the abbreviation is not properly formatted. In our algorithm, we handle this by checking if a digit segment starts with '0'. For example, word="substitution", abbr="s010n" should return false due to "010" having a leading zer.

  • Skipping Too Far: If a number in abbr causes the word pointer to advance beyond the end of word (or not land exactly on the end when the abbreviation is finished), that's a mistake. Always ensure after processing the entire abbreviation that the word pointer is also at the end of the word. For instance, in the "a5" vs "apple" example above, not checking the end condition would miss the error of overshooting the word length.

  • Not Handling Multi-digit Numbers Properly: When multiple digits occur in a row in abbr, make sure you combine them to form the full number. A common bug is to only take one digit at a time. For example, if abbr = "a12b", you need to recognize the number is 12 (not 1 and then 2 separately). Forgetting to multiply the existing number by 10 before adding a new digit is a typical error when parsing numbers (e.g., constructing 12 as 1*10 + 2.

  • Missing Pointer Increments: Be careful to increment the abbreviation pointer (j) correctly after processing a number. In the structure above, we advance j inside the inner loop that parses the number. If you forget to advance j or double-advance it, the loop can get stuck or skip characters. Similarly, ensure the word pointer (i) moves forward by exactly the number of characters indicated.

  • Checking Final State: Another mistake is forgetting to verify that both the word and abbreviation are fully consumed at the end. Even if all character checks passed, if one string has remaining characters, the abbreviation is not a perfect match. Always include the final condition i == len(word) and j == len(abbr) before returning true.

  • Edge Cases: Consider minimal inputs and edge cases:

    • The shortest word and abbreviation (e.g., word = "a", abbr = "1" should return true, since "1" abbreviates the single letter).
    • A word with no abbreviation (abbr is exactly the same as word) should return true.
    • An abbreviation that is purely numeric (e.g., word = "hello", abbr = "5") should return true if the number equals the word’s length, or false otherwise.
    • An abbreviation that is longer than the word (e.g., word = "hi", abbr = "1i1" – here the abbreviation length is 3, which might still be valid actually "1i1" -> skip 1 (skip 'h'), then 'i', then skip 1 (skip nothing because word had only 2 letters)? Actually "1i1" on "hi" would skip 'h', match 'i', then try to skip 1 more letter which doesn't exist, so false). Always test scenarios where the abbreviation has extra numbers or letters that don’t align with the word length.

Being mindful of these issues will help avoid bugs when coding the solution.

Alternative Variations

The problem as stated is about validating a single abbreviation against a single word. Some variations or related scenarios could require adjustments in approach:

  • Multiple Abbreviations or Words: If we had to check abbreviations for multiple words (e.g., a list of word-abbreviation pairs), the approach would remain the same for each pair, just repeated. There’s no significant change except wrapping the check in a loop.

  • Generating an Abbreviated Form: A different task might be to create an abbreviation for a given word (for example, producing the shortest or a standardized abbreviation). In that case, you might decide on a specific scheme (like always keep the first and last letter and replace the middle with the count, e.g., "internationalization" -> "i18n"). The logic would be different because you'd be constructing an abbreviation rather than validating one.

  • Allowing Adjacent Number Abbreviations: The definition we used does not allow adjacent abbreviated substrings (you wouldn't have two numbers back-to-back in a valid abbreviation, since you could combine them into one number). If a variation of the problem did allow something like that (treating them as separate skip commands), the validation logic would need to handle multiple number segments in a row. Our current solution inherently treats consecutive digits as one number, which is actually what we want for the given problem (so that "12" is one skip of 12, not two skips of 1 and 2). A different interpretation could complicate the logic.

  • Reconstructing the Word from Abbreviation: A twist on the problem could give you abbr and ask you to reconstruct the original word (assuming you had additional information like the length or some constraints). This would be an inverse problem. Without knowing the word, however, an abbreviation alone might be ambiguous (many different words could fit the same abbreviation pattern if letters are not given). Typically, you'd need the original word to validate an abbreviation uniquely.

Overall, the core logic of interpreting numbers as skips and letters as literal matches remains central to any such variation.

Here are a few related LeetCode problems that involve abbreviations or similar string manipulation concepts:

  • Generalized Abbreviation (LeetCode 320) – Generate all possible abbreviations of a given word. This is essentially the brute-force generation approach discussed, and it requires exploring all combinations of keeping or abbreviating characters. It’s a good exercise in backtracking (the result is a list of abbreviations).

  • Unique Word Abbreviation (LeetCode 288) – Design a class that is initialized with a dictionary of words and can check if an abbreviation is unique (not shared by other words in the dictionary). This involves hashing or storing abbreviations for quick lookup and handling duplicates.

  • Minimum Unique Word Abbreviation (LeetCode 411) – (Advanced) Given a target word and a list of words that cannot be abbreviated the same way, find the shortest abbreviation of the target word that is unique. This is a more complex problem that combines the ideas of generating abbreviations and pruning those that conflict with a given set.

TAGS
LeetCode
CONTRIBUTOR
Design Gurus Team
-

GET YOUR FREE

Coding Questions Catalog

Design Gurus Newsletter - Latest from our Blog
Boost your coding skills with our essential coding questions catalog.
Take a step towards a better tech career now!
Explore Answers
1813. Sentence Similarity III - Detailed Explanation
Learn to solve Leetcode 1813. Sentence Similarity III with multiple approaches.
TikTok System Design Interview Questions – Key Concepts and Architecture Insights
Master TikTok System Design Interview Questions: Essential Strategies for Scalable Video, Real-Time Streaming, and Data-Driven Architectures
How do I practice DSA?
Related Courses
Image
Grokking the Coding Interview: Patterns for Coding Questions
Grokking the Coding Interview Patterns in Java, Python, JS, C++, C#, and Go. The most comprehensive course with 476 Lessons.
Image
Grokking Modern AI Fundamentals
Master the fundamentals of AI today to lead the tech revolution of tomorrow.
Image
Grokking Data Structures & Algorithms for Coding Interviews
Unlock Coding Interview Success: Dive Deep into Data Structures and Algorithms.
Image
One-Stop Portal For Tech Interviews.
Copyright © 2025 Design Gurus, LLC. All rights reserved.
;