395. Longest Substring with At Least K Repeating Characters - Detailed Explanation
Problem Statement
Given a string s and an integer k, find the length of the longest substring of s such that the frequency of every character in this substring is at least k. In other words, every character that appears in the substring must appear at least k times.
Example 1
- Input: s = "aaabb", k = 3
- Output: 3
- Explanation: The longest substring that satisfies the condition is "aaa" where the character 'a' appears 3 times.
Example 2
- Input: s = "ababbc", k = 2
- Output: 5
- Explanation: The longest substring is "ababb" where 'a' appears 2 times and 'b' appears 3 times.
Constraints
- s consists only of lowercase English letters.
- 1 ≤ s.length ≤ 10⁴ (or higher in some versions)
- 1 ≤ k ≤ 10⁴
Hints
-
Frequency Check:
First, count the frequency of every character in the string. If every character meets the frequency threshold k, then the entire string is the answer. -
Splitting on Invalid Characters:
If you find a character whose total frequency is less than k, then this character cannot be part of any valid substring. Use it as a delimiter to split the string and solve for each part recursively. -
Recursive Divide and Conquer:
By splitting the string around characters that are “invalid” (appear fewer than k times), you narrow down the problem and search for valid substrings in smaller segments.
Approaches Overview
Approach 1: Brute Force (Inefficient)
- Idea:
Check every possible substring and validate if each character in that substring appears at least k times. - Drawbacks:
With a string of length n, there are O(n²) substrings and checking each substring for frequency takes additional time, leading to an overall inefficient solution for larger inputs.
Approach 2: Divide and Conquer (Recursive Splitting)
- Idea:
- Count the frequency of each character in the current substring.
- If all characters satisfy the condition (frequency ≥ k), return the length of the substring.
- Otherwise, identify a character with frequency less than k. This character cannot be part of any valid solution, so split the string by this character.
- Recursively solve the problem for each resulting substring and return the maximum length.
- Benefits:
This method often prunes many unnecessary checks and can run much faster in practice compared to brute force.
Approach 3: Sliding Window (Optimized for Fixed Unique Count)
- Idea:
Use a sliding window to iterate through the string while maintaining counts for characters and dynamically adjusting the window to satisfy the condition of having a fixed number of unique characters. - Procedure:
- Try for all possible numbers of unique characters (from 1 up to 26).
- Within each iteration, use two pointers to form a window and adjust it while keeping track of the count of characters and the count of characters meeting the k threshold.
- Benefits:
This approach can be very efficient when the alphabet size is limited.
Detailed Step-by-Step Explanation (Divide and Conquer)
-
Count Frequencies:
For the given substring, count how many times each character appears. -
Check Validity:
If every character in the substring has a frequency ≥ k, then the substring is valid. Return its length. -
Find a Splitting Character:
Iterate over the substring and identify a character whose frequency is less than k. This character can never be part of a valid substring. -
Split and Recurse:
Use the identified character as a delimiter to split the substring into smaller segments.
Recursively call the function on each segment to find the maximum valid substring length. -
Combine Results:
The answer is the maximum length found among all segments.
Python Implementation (Divide and Conquer)
Java Implementation (Divide and Conquer)
Complexity Analysis
-
Time Complexity:
In the worst-case scenario, each recursive call splits the string at a character that occurs very infrequently. The divide and conquer approach can approach O(n²) in pathological cases, but on average, it performs much better due to early pruning. -
Space Complexity:
The space used is O(n) for the recursion call stack in the worst case and O(1) extra space for frequency counting (limited to 26 letters).
Step-by-Step Walkthrough and Visual Example
Consider the input s = "ababbc", k = 2:
- Initial Frequency Count:
- 'a': 2, 'b': 3, 'c': 1
- Identify Invalid Character:
- The character 'c' appears only 1 time (< k).
- Split the String:
- Splitting by 'c' gives substrings: "ababb" and "" (the empty string can be ignored).
- Recursive Check on "ababb":
- Frequency count for "ababb": 'a': 2, 'b': 3.
- All characters meet the condition (≥ 2).
- Return length of "ababb", which is 5.
- Final Answer:
- The maximum valid substring length is 5.
Common Mistakes
-
Not Splitting Correctly:
Failing to split the string at the right delimiter leads to checking parts of the string that are already invalid. -
Overlooking Base Cases:
Not handling the case where the string’s length is less than k can lead to incorrect results or infinite recursion. -
Inefficient Frequency Updates:
Recalculating frequencies for overlapping substrings without optimization can slow down the solution.
Edge Cases
-
String Length Less Than k:
Immediately return 0 because no substring can have every character repeated at least k times. -
Entire String Valid:
When every character appears at least k times, the whole string is the answer. -
Multiple Invalid Characters:
Ensure that the recursive splits handle cases where multiple different characters do not meet the frequency requirement.
Alternative Variations
-
Sliding Window Approach:
Instead of recursion, use a sliding window technique for a fixed number of unique characters. This method is particularly effective when the alphabet size is limited and can provide a good alternative solution. -
Optimizing Frequency Recalculation:
In certain scenarios, using additional data structures to cache frequency information for overlapping substrings can further optimize performance.
Related Problems
-
Longest Substring Without Repeating Characters (LeetCode 3):
Focuses on finding the longest substring with all unique characters using the sliding window technique. -
Longest Repeating Character Replacement (LeetCode 424): Uses a sliding window to find the longest substring where characters can be replaced to form a repeating sequence.
GET YOUR FREE
Coding Questions Catalog
