2168. Unique Substrings With Equal Digit Frequency - Detailed Explanation
Problem Statement
Given a digit string s
, we need to find the number of unique substrings of s
where every digit that appears in the substring occurs the same number of times. A substring is a contiguous sequence of characters within the string. The uniqueness requirement means that even if a qualifying substring appears in multiple places, it is only counted once.
Example 1:
Input: s = "1212"
Output: 5
Explanation: The substrings that meet the criteria are "1"
, "2"
, "12"
, "21"
, and "1212"
. Even though "12"
appears twice in different positions, it is counted only once as a unique substring. (Substring "1212"
qualifies because 1
and 2
each appear twice. "121"
does not qualify because 1
appears twice while 2
appears once.)
Example 2:
Input: s = "12321"
Output: 9
Explanation: Qualifying substrings are: "1"
, "2"
, "3"
, "12"
, "23"
, "32"
, "21"
, "123"
, and "321"
. Each of these substrings has all digits occurring with equal frequency. (For instance, "123"
has 1
, 2
, 3
each once. The full string "12321"
is not counted because 1
and 2
appear twice but 3
appears once.)
Example 3:
Input: s = "1122"
Output: 6
Explanation: Qualifying substrings are: "1"
, "2"
, "11"
, "22"
, "12"
, and "1122"
. Each of these has equal frequencies for all digits present. For example, "1122"
has two 1
s and two 2
s. Substrings like "112"
are not counted because 1
appears twice while 2
appears once.
Constraints:
1 <= s.length <= 1000
s
consists only of digits'0'
–'9'
. (No letters or other characters.)
Hints
- Frequency Condition: For a substring to be valid, every digit in that substring must appear the same number of times. What does this imply about a substring’s length and the number of distinct digits in it? (Think about how the length relates to the frequency of each distinct digit.)
- Brute Force Thought: You could try checking every possible substring. How would you verify if a substring meets the condition? Consider counting digits in that substring and seeing if the counts are all equal.
- Optimization Idea: Checking every substring by counting from scratch would be slow. Can you reuse computations? Perhaps using a prefix sum or running frequency tally could allow you to get the count of each digit in any substring quickly, without recounting from scratch each time.
Brute Force Approach
A straightforward approach is to generate all substrings of s
and check each one for the equal-frequency property. This is brute force because there are O(n²) substrings (where n is the length of s
), and checking each substring’s frequencies naively can be costly. Here's how the brute force solution can be outlined:
- Step 1: Enumerate all possible substrings of
s
. Use two nested loops: the outer loop picks a starting indexi
, and the inner loop picks an ending indexj
(withj >= i
). This gives substrings[i...j]
. - Step 2: For each substring, count the frequency of each digit (
0-9
) in that substring. - Step 3: Check if all the digits that appear in this substring have the same frequency. (Digits that do not appear can be ignored.)
- Step 4: Keep a set or other structure to record substrings that satisfy the condition. Using a set ensures we count each unique substring only once, even if it occurs multiple times in
s
. - Step 5: After checking all substrings, the answer is the size of this set of valid substrings.
Step-by-Step Example (Brute Force):
Consider s = "121"
. All substrings of "121"
are:
"1"
(from index 0) – contains only digit1
(frequency 1, trivially equal-frequency)."12"
(from indices 0–1) – contains1
and2
, each appearing once (equal-frequency ✔)."121"
(from indices 0–2) – contains1
twice and2
once (not equal-frequency ✘)."2"
(from index 1) – contains only2
(frequency 1, equal-frequency ✔)."21"
(from indices 1–2) – contains2
and1
, each appearing once (equal-frequency ✔)."1"
(from index 2) – contains1
(frequency 1, equal-frequency ✔).
Unique substrings among these that have equal digit frequency are: "1"
, "2"
, "12"
, and "21"
. The substring "1"
appears in two different places, but it is counted only once as a unique substring. So the output for "121"
would be 4 in this case.
Brute Force Python Implementation:
Brute Force Java Implementation:
Time Complexity: The brute force method generates all substrings and checks each one. There are about n*(n+1)/2
substrings (O(n²)). Counting frequencies in each substring takes up to O(n) time. Therefore, the worst-case time complexity is O(n³). For n = 1000
, this is on the order of 1 billion operations, which would be far too slow in practice.
Space Complexity: In the worst case, the set of unique substrings itself could be very large (up to O(n²) substrings stored). Each substring stored can be of length up to n. In a pessimistic scenario, this could use a lot of memory. The frequency count uses O(10) = O(1) space per substring (just 10 counters for digits). Overall, the dominant space usage is storing substrings in the set, which in the worst case is O(n² * n) characters (though typical inputs won’t hit this extreme). For example, if s
contains all distinct digits in various orders, many substrings might qualify and be unique.
Optimized Approach
The brute force approach is too slow for the upper constraint (n = 1000
). We need to optimize the checking of substrings and avoid redundant work. Here are key observations and improvements for an optimized solution:
- Reuse Frequency Calculations: Instead of counting digit frequencies from scratch for each substring, we can preprocess the string to quickly compute the frequency of any digit in any substring. A common technique is to use prefix sum arrays for frequencies. We create a prefix frequency table where
prefix[i][d]
represents how many times digitd
appears in the substrings[0..i-1]
(from the start up to but not including indexi
).- Using this table, the count of digit
d
in substrings[i..j]
can be obtained in O(1) time ascount_d(i,j) = prefix[j+1][d] - prefix[i][d]
.
- Using this table, the count of digit
- Check Equality Efficiently: With the frequency counts of a substring readily available, we need to check if all non-zero counts are equal. Since there are only 10 possible digits, we can simply scan through the 10 counts and compare. This check is O(10) = O(1) per substring, which is negligible.
- Avoid Duplicate Counting: We still need to avoid counting the same substring content more than once. We will use a set to store an identifier for each valid substring we find. In a simple implementation, we can store the substring itself (as done in brute force). To reduce memory usage, one optimization is to store a hash of the substring instead of the full string. A rolling hash (e.g., polynomial rolling hash) can uniquely identify substrings with very high probability, using much less space. For clarity, our example code will store the substrings, but using a hash set of integers (or a tuple of frequency counts) would be a memory optimization in a real scenario.
Approach Outline:
- Build Prefix Frequency Table: Create a 2D array
prefix
of size(n+1) x 10
(where n is the length ofs
). Initializeprefix[0]
with all zeros (no characters seen yet). For each indexi
in the string (1-indexed in the prefix table), copy the counts fromprefix[i-1]
and then increment the count for the digits[i-1]
. After this step,prefix[i]
will hold the frequencies of all digits ins[0..i-1]
. - Enumerate Substrings (Two Loops): Similar to brute force, use two nested loops to pick a start index
i
and end indexj
for substrings (0 <= i <= j < n
). This is O(n²) possibilities. - Compute Frequencies via Prefix: For a chosen
(i, j)
pair, compute the frequency of each digitd
ins[i..j]
bycount_d = prefix[j+1][d] - prefix[i][d]
. This operation is O(1) for eachd
, so O(10) per substring (constant). - Check Equal Frequency: Check the counts for digits
0-9
. Ignore any digit that has count 0 in this substring (it’s not present). Among the digits that have count > 0, ensure they all have the same value. We can do this by comparing each non-zero count to the first non-zero count found. If any differ, the substring doesn’t qualify. - Record Valid Substrings: If the substring is valid, record it in a set of seen substrings (or record its hash signature). This ensures uniqueness.
- Result: The answer is the size of the set after processing all substrings.
This approach still looks at all O(n²) substrings, but it significantly speeds up the check for each substring by using precomputed information, reducing the per-substring work to constant time. In practice, an O(n²) algorithm with n up to 1000 (and only a small constant factor for checking 10 digits) is efficient enough.
Example to Illustrate Optimization:
Consider s = "1212"
again. The prefix frequency table for this string (showing counts for digits 1
and 2
only, others are 0) would look like:
prefix[0] = [0, 0]
(counts of1
and2
before any character)prefix[1] = [1, 0]
(after reading first char "1": one1
, zero2
s)prefix[2] = [1, 1]
(after reading "12": one1
and one2
)prefix[3] = [2, 1]
(after reading "121": two1
s, one2
)prefix[4] = [2, 2]
(after reading "1212": two1
s, two2
s)
Now, to check any substring quickly:
- e.g. Substring
s[1..2]
which is"21"
: We takeprefix[3] - prefix[1]
. For digit1
:prefix[3][1] - prefix[1][1] = 2 - 1 = 1
. For digit2
:prefix[3][2] - prefix[1][2] = 1 - 0 = 1
. This yields frequencies {1
:1,2
:1}, which are equal. We would mark"21"
as a valid substring without having to recount from scratch.
By using the prefix table, we reused the counting work done during preprocessing for every substring check.
Optimized Python Implementation:
Optimized Java Implementation:
Time Complexity: Building the prefix table takes O(10 * n) = O(n) time. Then, we iterate over all substrings with two loops (≈ n²/2 substrings) and perform a constant amount of work (checking 10 counts) for each. This yields a time complexity of O(n²). For n = 1000
, n² is 1,000,000, and checking 10 digits for each is a 10x multiplier (10,000,000 operations), which is efficient in C++/Java and manageable in Python with optimizations. This is a dramatic improvement over the O(n³) brute force.
Space Complexity: The prefix table uses O(10 * n) space, which is O(n). We also use a set to store seen substrings or their hashes. In the worst case, this set could hold O(n²) substrings (in scenarios where many substrings qualify and are all distinct). Storing actual substrings can be memory heavy (as discussed before). However, using a hash (integer) representation for each substring can reduce per-entry space. In terms of Big-O, the worst-case space remains O(n²) due to potentially many valid substrings. Typically, the distribution of digits will limit how many substrings qualify, and using hashing can mitigate memory usage.
Note: In a real interview or contest setting, you might implement further optimizations:
- Use rolling hash or a tuple of frequency counts as the key in the set to avoid storing large strings.
- Prune some checks by noticing that the length of a valid substring must be divisible by the number of distinct digits in it (since each distinct digit's frequency times the number of distinct digits gives the substring length). This can skip checking some substrings that cannot possibly have equal frequencies. However, such pruning can be complex to implement and the above solution is already efficient enough for the given constraints.
Edge Cases and Common Mistakes
- Single Character String: If
s
has length 1 (e.g.,"5"
), the result should be 1, since the only substring is the string itself and trivially all (single) digit frequencies are equal. Ensure the solution handlesn=1
correctly. - All Characters Same: If
s
consists of the same digit repeated (e.g.,"1111"
or"000"
), then every substring is valid because any substring will have just that one digit. However, many substrings will be duplicates in terms of content. For example,s = "1111"
has substrings"1"
,"11"
,"111"
,"1111"
as the unique ones (any substring of length 2 is"11"
, etc.). A common mistake is to count duplicates separately if not using a set. Always ensure uniqueness by using a set or equivalent. - Ignoring Absent Digits: Only consider digits that appear in the substring when checking frequency equality. Digits that are not present can be ignored. A mistake would be to include zeros in the equality check. For instance, in substring
"11"
, we have two1
s and zero of every other digit. We consider it valid because the only digit present (1
) appears with equal frequency (there’s just one distinct digit). We do not require that0,2,3,...,9
also appear two times (that would be impossible). - Duplicate Substrings: Ensure that if the same qualifying substring appears in multiple places, it is counted once. This is why using a set of seen substrings is crucial. Forgetting this will overcount results. For example, in
"1212"
, the substring"12"
appears at indices [0..1] and [2..3]; without a uniqueness check, one might count it twice. - Performance Pitfalls: A common pitfall is to attempt a brute force solution without any optimization on frequency counting. Computing frequency from scratch for each substring (O(n) per substring) will time out on larger inputs. Using the prefix sum method or maintaining a running count is important. Another performance mistake can be not breaking out of loops early when a inequality is found in frequencies (in our code, we break out as soon as we detect two different counts).
- Off-by-One Errors: When using prefix sum arrays or slicing, it’s easy to get indices wrong. Be careful with the inclusive/exclusive nature of indices. For example, we used
prefix[j+1] - prefix[i]
to get counts for substrings[i..j]
. Forgetting the +1 or usingprefix[j]
instead would lead to incorrect counts.
Alternative Variations
- Letters Instead of Digits: The problem could be generalized to a string with alphabetic characters (e.g., lowercase letters
'a'
-'z'
) instead of digits. The condition would then be that every letter in the substring appears the same number of times. The solution approach would be similar: you would use a frequency array of size 26 for letters (or size 52 if considering both lowercase and uppercase), and the prefix sum technique would still apply. The complexity might increase because there are more possible characters, but for moderate string lengths it would still be manageable. - Mixed Character Types: If the string can contain any characters (letters, digits, symbols), you can extend the approach by using a hashmap or a large frequency array to count characters. The concept of checking equal frequencies remains the same. Using a dynamic structure like a hashmap for frequencies might slow things down a bit, but the overall approach of using prefix sums (perhaps with an index mapping for characters) would still work.
- Exact k Frequency (Fixed count): A twist on this problem is if you were asked to find substrings where each unique character appears exactly k times (for a given k). This is actually a known variant (related to “equal count substrings”). In that case, you would look for substrings of length
m * k
wherem
distinct characters each appear k times. The approach could involve fixingk
and using a sliding window of length that is a multiple of k to count how many valid substrings meet that criterion. - Almost Equal Frequency: Another variation could allow a tolerance, for example substrings where each distinct digit's frequency is either x or x+1. This is a more complex variant, but it shows up in problems like reorganizing strings or making frequencies almost equal. Solving that would require careful counting logic beyond exact equality.
- Subsequence vs Substring: This problem specifically deals with contiguous substrings. If one were to ask for subsequences (not necessarily contiguous) with equal digit frequency, the problem becomes much harder, as combinatorial explosion of subsequences would need different techniques (usually not feasible to brute force for large strings).
In summary, the idea of ensuring equal frequency has broad applications. The approach of counting frequencies and using prefix sums or hashing can be adapted to many of these variations with appropriate adjustments.
Related Problems
GET YOUR FREE
Coding Questions Catalog