1661. Average Time of Process per Machine - Detailed Explanation
Problem Statement
You are given an array of logs where each log is represented as:
[machine_id, process_id, start_time, end_time]
Your task is to return a mapping (or dictionary) that gives, for each machine, the average processing time of its processes.
Example Input 1:
logs = [
["m1", "p1", 1, 4],
["m1", "p2", 2, 6],
["m2", "p3", 1, 3]
]
Output:
{"m1": 3.5, "m2": 2.0}
Explanation:
- For machine m1: Process times are (4 - 1) = 3 and (6 - 2) = 4. Average = (3 + 4) / 2 = 3.5.
- For machine m2: Process time is (3 - 1) = 2. Average = 2 / 1 = 2.0.
Example Input 2:
logs = [
["m1", "p1", 1, 2]
]
Output:
{"m1": 1.0}
Explanation:
- Only one process on m1 with duration (2 - 1) = 1.
Example Input 3:
logs = [
["m1", "p1", 2, 2],
["m2", "p2", 5, 5]
]
Output:
{"m1": 0.0, "m2": 0.0}
Explanation:
- Both processes have a duration of 0.
Constraints:
- The number of logs can be large (up to 10⁵).
- Start and end times are non-negative integers.
- It is guaranteed that for every log,
end_time >= start_time
.
Hints
- Hint 1: Think about how you would compute the average if you had to compute the total processing time and the count of processes for each machine.
- Hint 2: Consider using a hash map (or dictionary) where keys are machine identifiers and values are pairs of (total processing time, count of processes). Once you have these values, you can compute the average for each machine in one pass.
Brute Force Approach
One basic idea is to:
- Iterate over all unique machines.
- For each machine, loop through every log and sum up the processing times for logs belonging to that machine and count them.
- Finally, compute the average for each machine.
Drawback:
- This approach has a nested loop structure, leading to a time complexity of O(M * N) (where M is the number of unique machines and N is the total number of logs). This could be too slow for large inputs.
Optimal Approach
A more efficient solution is to use a single-pass aggregation:
- Initialize an empty dictionary (or hash map) where the key is the machine id.
- As you loop through the logs, for each log:
-
Compute the process duration as
end_time - start_time
. -
Update the dictionary by adding the duration to a running sum and incrementing a count for that machine.
-
- After processing all logs, iterate through the dictionary to compute the average for each machine by dividing the total time by the count.
Time Complexity:
- O(N) because you are making one pass through the logs and then one pass through the dictionary keys.
Space Complexity:
- O(M) where M is the number of unique machines, used for the dictionary.
Python Code
Java Code
Complexity Analysis
-
Time Complexity:
- Both the Python and Java optimal solutions iterate through the list of logs once, resulting in O(N) time, where N is the number of logs. Then they iterate over the unique machines (M keys) to compute averages, which is O(M). Since M ≤ N, the overall time is O(N).
-
Space Complexity:
- The extra space is used for the dictionary (or hash map) which stores at most M key-value pairs. Thus, the space complexity is O(M).
Step-by-Step Walkthrough and Visual Example
-
Initialization:
Create an empty dictionary (or hash map) to store statistics for each machine. -
Processing Logs:
For every log in the input:- Compute the duration as
end_time - start_time
. - If the machine already exists in the dictionary, update its total processing time and increment the count.
- Otherwise, add the machine with the initial duration and a count of 1.
Visual Example (using Test Case 1):
- Process log
["m1", "p1", 1, 4]
:- Duration = 3
- Dictionary becomes: { m1: {total: 3, count: 1} }
- Process log
["m1", "p2", 2, 6]
:- Duration = 4
- Update m1: {total: 7, count: 2}
- Process log
["m2", "p3", 1, 3]
:- Duration = 2
- Dictionary becomes: { m1: {total: 7, count: 2}, m2: {total: 2, count: 1} }
- Compute the duration as
-
Computing Averages:
For each machine, divide the total duration by the count:- m1: 7 / 2 = 3.5
- m2: 2 / 1 = 2.0
-
Return Result:
Return the mapping/dictionary with each machine’s average processing time.
Common Mistakes and Edge Cases
-
Common Mistakes:
- Forgetting to update the count when processing a log.
- Not handling the edge case where the start time equals the end time (resulting in zero duration).
- Using a nested loop to compute the sum for each machine, which may lead to higher time complexity.
-
Edge Cases:
- A log where the process duration is zero.
- Only one log is provided.
- All logs belong to the same machine.
- A large number of logs (ensure that the solution scales properly).
Alternative Variations
-
Variation 1:
Compute the average time per process type if logs include a process category in addition to machine id. This would involve grouping by both machine and process type. -
Variation 2:
Given a continuous stream of logs (data stream), maintain a running average for each machine without storing all the logs in memory. -
Variation 3:
Instead of average, compute the median processing time for each machine. This requires different data structures (such as heaps) to efficiently find the median.
Related Problems for Further Practice
Below are some additional LeetCode problems that can help you sharpen various skills—from working with arrays and hash maps to handling intervals and scheduling challenges:
-
Two Sum (LeetCode 1):
Practice using hash maps for fast lookups in an array. -
Longest Substring Without Repeating Characters (LeetCode 3):
Excellent for mastering the sliding window technique. -
Group Anagrams (LeetCode 49):
Develop strategies for grouping items using sorted strings or frequency counts. -
Valid Anagram (LeetCode 242):
Strengthen your ability to compare frequency counts in strings.
GET YOUR FREE
Coding Questions Catalog
