362. Design Hit Counter - Detailed Explanation
Problem Statement:
Design a Hit Counter that counts the number of hits received in the past 5 minutes (i.e., past 300 seconds). The system should provide two functions:
- `hit(timestamp)`: Record a hit that occurs at the given `timestamp` (in seconds).
- `getHits(timestamp)`: Return the number of hits in the past 5 minutes from the given `timestamp` (i.e., count all hits that occurred from `timestamp - 299` to `timestamp`, inclusive).
Assume that:
- Timestamps are given in non-decreasing chronological order (each new call’s timestamp is >= previous call’s timestamp).
- The earliest timestamp starts at 1.
- It’s possible to have multiple hits at the same timestamp (multiple calls to `hit(t)` in the same second).
Example 1:
Operations: `hit(1)`, `hit(2)`, `hit(3)`, `getHits(4)`, `hit(300)`, `getHits(300)`, `getHits(301)`
- After hits at timestamps 1, 2, 3, `getHits(4)` counts hits in [4-299, 4] = [−295, 4], which includes 1, 2, 3. Output: 3.
- After a hit at 300, `getHits(300)` counts hits in [300-299, 300] = [1, 300], which includes 1, 2, 3, 300. Output: 4.
- `getHits(301)` counts hits in [301-299, 301] = [2, 301], which includes 2, 3, 300 (the hit at 1 is now 300 seconds old and no longer counted). Output: 3.
Example 2:
Operations: `hit(100)` (called 3 times), `getHits(100)`, `getHits(101)`, `getHits(400)`
- Three hits at timestamp 100.
- `getHits(100)` counts hits in [100-299, 100] = [−199, 100], which includes all three hits at 100. Output: 3.
- `getHits(101)` counts hits in [101-299, 101] = [−198, 101], which still includes the three hits at 100 (they occurred within the last 5 minutes). Output: 3.
- `getHits(400)` counts hits in [400-299, 400] = [101, 400]. The hits at 100 are now more than 5 minutes old (100 is outside [101, 400]), so they are not counted. Output: 0.
Example 3:
Operations: `hit(1)`, `hit(2)`, `hit(3)`, `getHits(300)`, `getHits(600)`, `hit(600)`, `getHits(600)`
- Hits at 1, 2, 3.
- `getHits(300)` counts hits in [300-299, 300] = [1, 300], which includes 1, 2, 3. Output: 3.
- `getHits(600)` (with no new hits since 3) counts hits in [600-299, 600] = [301, 600]. The hits at 1, 2, 3 are all expired (older than 5 minutes). Output: 0.
- A new hit at 600.
- `getHits(600)` now counts hits in [600-299, 600] = [301, 600] again, but this time it includes the hit at 600. Output: 1.
Constraints & Expectations:
- We need to handle a continuous stream of hits efficiently. Even if there are many hits, the operations should be fast.
- Memory should be managed so that we do not store an unbounded number of past hits unnecessarily (only the last 5 minutes of hits are relevant at any time).
- The solution should handle the case of no hits gracefully (return 0 for `getHits` if there have been no hits in the past 5 minutes).
Hints:
- Hint 1: Think about the sliding window of time (5 minutes = 300 seconds). How can you keep track of hits that fall into the last 300-second window and discard those that are older? What data structure allows adding new items and removing old items in order? (Consider using a FIFO structure like a queue.)
- Hint 2: A naive approach would be to record every hit and then, on each query, count how many fall in the range. However, iterating over all hits for each query is inefficient. Instead, try to update the count incrementally as new hits come in and old hits fall out of the 5-minute window.
- Hint 3 (Follow-up idea): If there could be an extremely large number of hits per second, storing every single hit might be inefficient. Can you group hits by timestamp to optimize memory and counting? (For example, store pairs of {timestamp, number_of_hits_at_that_timestamp} in your data structure.)
Approaches:
- Brute Force Approach:
- Idea: Use a list (or array) to record the timestamp of every hit. For `getHits(timestamp)`, iterate through the list and count how many hits fall within the last 300 seconds of the given timestamp.
- Why it's inefficient: If there are N hits recorded, each call to `getHits` may scan up to N elements to count relevant hits. In the worst case with many hits, this becomes very slow (linear time per query). As the number of hits grows, the time to answer a query grows too, which is not optimal. Also, if we never remove old hits, the list can grow indefinitely, wasting memory on hits that are no longer needed.
- Walkthrough Example: Suppose we have hits at times [10, 50, 300, 301] and we call `getHits(350)`. A brute-force method would scan the entire list: it checks 10 (older than 350-299 = 51, so not counted), checks 50 (older than 51, not counted), checks 300 (within [51, 350], counted), and checks 301 (within [51, 350], counted). It would return 2. If we then call `getHits(351)`, it would again scan all hits. As the list grows, scanning every element for each query is slow.
Python Implementation (Brute Force):
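A minimal Python sketch of this brute-force idea (store every timestamp, scan the whole list on each query):

```python
class HitCounter:
    def __init__(self):
        self.hits = []  # every hit timestamp ever recorded, in order

    def hit(self, timestamp: int) -> None:
        self.hits.append(timestamp)  # O(1) append

    def getHits(self, timestamp: int) -> int:
        # O(N) scan: count hits in the inclusive window [timestamp - 299, timestamp]
        return sum(1 for t in self.hits if timestamp - 299 <= t <= timestamp)
```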
Java Implementation (Brute Force):
Complexity Analysis (Brute Force):
- Time Complexity: Recording a hit is O(1) (just appending to a list). However, each `getHits` call is O(N) in the worst case, where N is the number of hits recorded, because we potentially check each hit. If many queries are made or N is large, this is very slow.
- Space Complexity: O(N) for storing all hit timestamps. This can become large if the system runs for a long time without cleaning up old hits.
- Optimized Approach (Using Queue/Deque):
- Idea: Use a queue to maintain only the hits from the last 300 seconds. The queue naturally preserves the order of hits (oldest in front, newest at back). For each new `hit(t)`, add the timestamp to the queue. For each `getHits(t)`, first remove any timestamps from the front of the queue that are older than 5 minutes from `t` (i.e., remove hits with timestamp <= `t - 300`). The remaining elements in the queue are all hits within the last 300 seconds, so the answer is the length of the queue.
- Why it’s efficient: The queue only holds hits that are still relevant (within the last 5 minutes). Old hits get removed as we process queries, so the queue holds at most 300 seconds' worth of hits. In the worst case where there is at least one hit every second, the queue holds at most 300 entries (or more if there are multiple hits per second and we don't group them). Each hit is enqueued once and dequeued at most once, making each operation very efficient on average.
- Walkthrough Example: Consider the sequence of operations from Example 1:
  - At t=1: call `hit(1)`. Queue = [1].
  - At t=2: call `hit(2)`. Queue = [1, 2].
  - At t=3: call `hit(3)`. Queue = [1, 2, 3].
  - At t=4: call `getHits(4)`. Before counting, remove hits older than 5 minutes from time 4 (remove hits ≤ 4-300 = -296). None qualify, so Queue stays [1, 2, 3]. Count = 3.
  - At t=300: call `hit(300)`. Queue = [1, 2, 3, 300].
  - At t=300: call `getHits(300)`. Remove hits older than 5 minutes from time 300 (remove hits ≤ 300-300 = 0). None are ≤ 0, so Queue stays [1, 2, 3, 300]. Count = 4.
  - At t=301: call `getHits(301)`. Remove hits older than 5 minutes from time 301 (remove hits ≤ 301-300 = 1). This removes the hit at timestamp 1. Now Queue = [2, 3, 300]. Count = 3.
This matches the expected outputs: 3, 4, and 3 for the `getHits` calls, respectively. Notice how the queue automatically drops the oldest hit when it falls out of the 300-second window, and we never iterate over all hits explicitly; we just pop from the front until the oldest timestamp is within the window.
Python Implementation (Optimized with Queue):
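A sketch of the queue-based approach in Python, using `collections.deque` for O(1) operations at both ends:

```python
from collections import deque


class HitCounter:
    def __init__(self):
        self.q = deque()  # timestamps of hits within the last 300 seconds

    def hit(self, timestamp: int) -> None:
        self.q.append(timestamp)  # newest hit goes to the back

    def getHits(self, timestamp: int) -> int:
        # Evict expired hits from the front: anything at time <= timestamp - 300
        # is 300 or more seconds old and falls outside the window.
        while self.q and self.q[0] <= timestamp - 300:
            self.q.popleft()
        return len(self.q)
```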
Java Implementation (Optimized with Queue):
Complexity Analysis (Optimized):
- Time Complexity: Both `hit` and `getHits` run in O(1) amortized time. Recording a hit is O(1). For `getHits`, removing expired hits is O(k), where k is the number of hits removed, but each hit is dequeued at most once in its lifetime, so the amortized cost per operation is O(1). This is much faster than the brute-force approach for large N.
- Space Complexity: O(H), where H is the number of hits in the last 5 minutes. With at most one hit per second, H is bounded by 300; without grouping, multiple hits per second can push it higher. Either way, old hits are constantly removed, so memory stays bounded by the 5-minute window of data, which is efficient.
Step-by-Step Walkthrough (Optimized Approach Visualization):
Let's visualize how the optimized queue approach works with a sequence of operations. We will use a queue to store recent hit timestamps and update it as hits come in or as we query the hit count.
Suppose we perform the following operations:
hit(1) -> hit(2) -> hit(3) -> getHits(4) -> hit(300) -> getHits(300) -> getHits(301)
We maintain a queue of timestamps of hits that are within the last 5 minutes of the current time:
- After `hit(1)`: Queue contains [1]. (Hit at 1 recorded)
- After `hit(2)`: Queue contains [1, 2]. (Hits at 1 and 2 recorded in order)
- After `hit(3)`: Queue contains [1, 2, 3]. (Hits at 1, 2, 3 in the queue)
- On `getHits(4)`: We check for hits older than 5 minutes from time 4. The window is [4-299, 4] = [-295, 4], so no hits are too old (all current hits 1, 2, 3 are within the last 300 seconds). Queue remains [1, 2, 3]. The number of hits in the queue is 3, so `getHits(4)` returns 3.
- After `hit(300)`: Queue contains [1, 2, 3, 300]. (We add the hit at 300; note that 1, 2, 3 are still in the queue, and 1 is now exactly 299 seconds older than 300.)
- On `getHits(300)`: Check for hits older than 5 minutes from time 300. The window is [300-299, 300] = [1, 300], so we keep hits with timestamp ≥ 1. The hit at 1 is exactly at the boundary (299 seconds older than 300, which is within 5 minutes). Therefore, no hits are removed yet (we remove hits with timestamp ≤ 0, and none qualify). Queue stays [1, 2, 3, 300]. The queue size is 4, so `getHits(300)` returns 4.
- On `getHits(301)`: Now the window is [301-299, 301] = [2, 301]. Any hit with timestamp < 2 is out of the 5-minute range. The hit at 1 is older than 5 minutes from 301 (301 - 1 = 300 seconds, so anything at 1 or earlier is 300 or more seconds ago). We remove 1 from the queue. Queue becomes [2, 3, 300]. All remaining hits (2, 3, 300) are within the last 5 minutes (between 2 and 301). The queue size is 3, so `getHits(301)` returns 3.
This step-by-step process shows how we incrementally maintain the count of hits. We never recount from scratch; we just drop off old hits as they expire and add new ones as they come, keeping the queue up to date. This makes each operation efficient.
Common Mistakes:
- Not removing expired hits: A common mistake is to record hits but forget to remove hits that are older than 5 minutes. This can cause `getHits` to return counts that are too high, and it also wastes memory. Always discard hits that fall outside the 300-second window when processing a query.
- Off-by-one errors in the time window: It’s easy to make a mistake with inclusive/exclusive bounds. For the last 5 minutes inclusive, we consider hits with timestamps in the range [timestamp-299, timestamp]. In implementation, removing hits with `timestamp <= current_time - 300` is correct: for a current time `t`, anything at time `t - 300` or earlier is exactly 5 minutes old or more, which should be excluded. In our example, a hit at time 1 is counted at time 300 (difference 299s) but not at time 301 (difference 300s). Be careful with these boundary conditions.
- Using the wrong data structure: Some might try a fixed-size array or a hash map without fully understanding how to evict old entries. A queue is conceptually simpler for this problem. If you use an array of size 300 (indexed by second modulo 300), it's easy to forget to reset a slot's count when reusing its index, or to forget to check whether an entry is stale (its associated timestamp is outside the current window). If you use a circular buffer, also store the timestamp of the last update for each slot so you know whether it is still valid.
- Not handling multiple hits in the same second: If multiple hits occur at the same timestamp and you choose to optimize by grouping (e.g., storing [timestamp, count] pairs), be careful to update the count rather than adding multiple separate entries for the same timestamp in that structure. If you don't account for this, you might either overcount or undercount. (In the simple queue approach above, we simply add multiple entries of the same timestamp, which works fine too.)
- Assuming queries only come after hits: In some designs, people might assume `getHits` will always be called after at least one hit. Make sure your code returns 0 for `getHits` if no hits have been recorded in the past 5 minutes, or even if no hits have been recorded at all.
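The circular-buffer alternative mentioned above can be sketched in Python as follows: 300 slots indexed by `timestamp % 300`, where each slot also records the timestamp that last wrote it so stale entries can be detected (the slot-array names here are illustrative):

```python
class HitCounter:
    def __init__(self):
        self.times = [0] * 300   # timestamp that last wrote each slot
        self.counts = [0] * 300  # number of hits at that timestamp

    def hit(self, timestamp: int) -> None:
        i = timestamp % 300
        if self.times[i] != timestamp:
            # The slot holds data from an older second: reset it before reuse.
            self.times[i] = timestamp
            self.counts[i] = 1
        else:
            self.counts[i] += 1

    def getHits(self, timestamp: int) -> int:
        # Sum only slots whose stored timestamp is still inside the
        # 300-second window (less than 300 seconds old).
        return sum(c for t, c in zip(self.times, self.counts)
                   if timestamp - t < 300)
```

This variant uses constant memory regardless of the hit rate, at the cost of a fixed 300-element scan per query.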
Edge Cases:
Consider these edge cases to ensure your solution handles them correctly:
- No hits at all: Calling `getHits` on a brand-new counter (or after a long period with no hits) should return 0.
- No hits in the last 5 minutes: If there were hits earlier but none in the recent 5-minute window, the function should also return 0 (and ideally, the data structure will have removed all old hits). For example, if hits occurred at times 1, 2, 3 and the timestamp is now 600, there are no recent hits to count.
- Continuous hits over a long time: The data structure should not grow indefinitely. For instance, if there is at least one hit every few seconds for hours, the queue approach should continuously drop old timestamps and not use more memory than necessary.
- Multiple hits at the exact same timestamp: e.g., `hit(10)` called 5 times when the timestamp is 10. `getHits(10)` should count all 5 hits. Our queue approach handles this by holding five entries of value 10 (or, if optimized by grouping, one entry with count 5). Ensure your logic accounts for all hits.
- Timestamp jumps forward significantly: If `getHits(t)` is called where `t` is much larger than the last recorded hit's time (e.g., last hit at 100, now `getHits(1000)`), the code should correctly remove all old hits. In our queue approach, the while-loop clears out everything, since all hits are older than `t - 300`, and the result is 0. Make sure such scenarios don't break the logic (an empty queue should be handled properly).
Alternative Variations and Related Problems:
- Follow-up (High Hit Rate): The LeetCode problem asks how to handle the case where there could be a very large number of hits per second. In that scenario, storing every single hit might use too much memory. A common improvement is to group hits by second: instead of storing a timestamp for each hit, store pairs like `(timestamp, hit_count_in_that_second)`. If multiple hits share the same timestamp, you increment the count for that timestamp. The queue then stores at most 300 entries (one per second in the window). This reduces memory usage when there are many hits in the same second, while the logic for adding and evicting remains similar.
- Sliding Window Average or Sum: Problems like “Moving Average from Data Stream” (LC 346) or “Maximum number of events in a time frame” use similar sliding-window techniques. They require maintaining a window of the last N elements or last T seconds of data and computing a statistic. The techniques from this hit counter (a queue or circular buffer) apply to those as well.
- Recent Calls Counter: LeetCode has a very similar problem called “Number of Recent Calls” (LC 933), where you count recent requests in the last 3000 milliseconds. The solution is almost identical: use a queue to store timestamps of calls and remove old ones. If you understood the Hit Counter, you can solve that one easily.
- Logger Rate Limiter: Another related problem (LC 359) involves limiting how often a message can be logged within a time frame. While the context is different (it’s about allowing/disallowing messages rather than counting hits), it also uses a time-window concept (in that case, a map of message -> last timestamp seen, or similar sliding-window logic). This is a good exercise in thinking about time-based data structures.
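The grouping follow-up above can be sketched in Python by storing `(timestamp, count)` pairs in the queue along with a running total (a sketch; the pair layout and the `total` field are one possible design):

```python
from collections import deque


class HitCounter:
    def __init__(self):
        self.q = deque()  # (timestamp, hits_at_that_timestamp) pairs
        self.total = 0    # running count of hits inside the window

    def hit(self, timestamp: int) -> None:
        if self.q and self.q[-1][0] == timestamp:
            # Same second as the newest entry: bump its count instead of
            # appending a duplicate entry.
            t, c = self.q.pop()
            self.q.append((t, c + 1))
        else:
            self.q.append((timestamp, 1))
        self.total += 1

    def getHits(self, timestamp: int) -> int:
        # Evict whole seconds that have expired, subtracting their counts.
        while self.q and self.q[0][0] <= timestamp - 300:
            _, c = self.q.popleft()
            self.total -= c
        return self.total
```

With grouping, the queue never holds more than 300 entries no matter how many hits arrive per second, and `getHits` returns the maintained total instead of the queue length.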