175. Combine Two Tables - Detailed Explanation
Problem Statement
We are given two tables: Person and Address. The Person table contains information about people, and the Address table contains address details that may correspond to a person. The goal is to combine the two tables so that each person’s information is accompanied by their address details (if available). If a person does not have a corresponding address, the address fields (city and state) should be returned as NULL. The final output must be sorted by personId
.
Below are the table schemas:
Person
Column Name | Type |
---|---|
personId | int |
firstName | varchar |
lastName | varchar |
Address
Column Name | Type |
---|---|
addressId | int |
personId | int |
city | varchar |
state | varchar |
Example 1
Person
personId | firstName | lastName |
---|---|---|
1 | John | Doe |
Address
addressId | personId | city | state |
---|---|---|---|
1 | 1 | New York | NY |
Output
personId | firstName | lastName | city | state |
---|---|---|---|---|
1 | John | Doe | New York | NY |
Explanation: John Doe has an address in New York, so his row is fully populated.
Example 2
Person
personId | firstName | lastName |
---|---|---|
1 | John | Doe |
2 | Jane | Smith |
Address
addressId | personId | city | state |
---|---|---|---|
1 | 2 | Boston | MA |
Output
personId | firstName | lastName | city | state |
---|---|---|---|---|
1 | John | Doe | NULL | NULL |
2 | Jane | Smith | Boston | MA |
Explanation: Jane Smith has an address while John Doe does not, so John’s address fields are NULL.
Example 3
Person
personId | firstName | lastName |
---|---|---|
1 | John | Doe |
2 | Jane | Smith |
3 | Bob | Lee |
Address
addressId | personId | city | state |
---|---|---|---|
1 | 1 | Chicago | IL |
2 | 3 | Seattle | WA |
Output
personId | firstName | lastName | city | state |
---|---|---|---|---|
1 | John | Doe | Chicago | IL |
2 | Jane | Smith | NULL | NULL |
3 | Bob | Lee | Seattle | WA |
Explanation: John and Bob have addresses, while Jane does not.
Constraints:
- The Person table always has at least one row.
- A person may or may not have an associated address in the Address table.
- The final output must be sorted by
personId
.
Hints
- Join on the Common Key: Identify that both tables share the column
personId
. This is the key to combine the tables. - LEFT JOIN Concept: Since some persons might not have an address, think of using a LEFT JOIN (or its equivalent when coding) so that every person appears in the output even if the address is missing.
Approaches Overview
There are two primary approaches to solve this problem when simulating the SQL join in code:
Approach 1: Brute Force (Nested Loop Join)
- Idea: For each row in the Person table, iterate through the Address table to find a matching
personId
. - Explanation:
- Loop through every person.
- For each person, loop through all addresses to check if the
personId
matches. - If a match is found, attach the address information; otherwise, set the address fields as NULL.
- Complexity: O(n * m), where n is the number of persons and m is the number of addresses.
- Drawbacks: This approach can be inefficient when dealing with large datasets.
Approach 2: Optimal Approach (Hash Join / Left Join)
-
Idea: Use a hash map (or dictionary) to map each
personId
from the Address table to its address details. -
Explanation:
- First, build a hash map from the Address table using
personId
as the key. - Then, iterate over the Person table and quickly look up the corresponding address in the hash map.
- If found, merge the details; if not, attach NULL for the address fields.
- First, build a hash map from the Address table using
-
Complexity: O(n + m) because building the map is O(m) and iterating through persons is O(n).
-
Advantages: Much more efficient for large datasets.
Below are detailed code implementations for the optimal approach in Python and Java.
Python Implementation (Optimal Approach)
Python Walkthrough
-
Building the Address Map:
A dictionary comprehension creates a mapping ofpersonId
to address details, allowing constant time lookup. -
Iterating through Persons:
For each person, we check if there is a corresponding address in the dictionary. If not, we default to{'city': None, 'state': None}
. -
Sorting and Output:
The result is sorted bypersonId
(ensuring correct order) and then printed row by row.
Java Implementation (Optimal Approach)
Java Walkthrough
-
Data Classes:
We define classes forPerson
,Address
, andJoinedRow
to encapsulate the table data. -
Building the Address Map:
AHashMap
is used to map eachpersonId
from the addresses to its corresponding address details. -
Performing the Join:
We iterate over the persons list and for each person, use the map to find the corresponding address. If an address is missing,null
values are used for city and state. -
Sorting and Printing:
The resulting list is sorted bypersonId
and printed to the console.
Complexity Analysis
- Brute Force Approach: O(n * m) time complexity (inefficient for large datasets).
- Optimal (Hash Join) Approach: O(n + m) time complexity, where n is the number of persons and m is the number of addresses. This is much more efficient.
Step-by-Step Walkthrough and Visual Example
- Initialization:
Imagine you have three persons and two addresses. Create a mapping for addresses usingpersonId
as the key. - Processing Each Person:
- For person with
personId = 1
, find the address from the map. - For person with
personId = 2
, no address exists; assign NULL to city and state. - For person with
personId = 3
, find the address from the map.
- For person with
- Final Output:
The joined table is then sorted bypersonId
and printed.
Common Mistakes
- Using an Inner Join Instead of a Left Join:
This would omit persons without an address. - Not Sorting the Result:
The final output must be sorted bypersonId
. - Inefficient Looping:
Using nested loops without a hash map can lead to poor performance with larger datasets.
Edge Cases
-
No Address Records:
All address fields should be NULL. -
Empty Person Table:
Although the problem guarantees at least one row, always consider handling empty inputs gracefully. -
Multiple Addresses for One Person:
This problem assumes a one-to-one relationship; if multiple addresses exist, decide whether to return one record or multiple records per person.
Alternative Variations
-
Inner Join Variation:
Only return persons who have an associated address. -
Aggregation Variation:
If a person has multiple addresses, you might need to aggregate the address details into a single row (for example, concatenating cities). -
Additional Filtering:
You could extend the query to filter results based on conditions in either table (e.g., only persons from a specific state).
Related Problems
-
Second Highest Salary (LeetCode 176): Focuses on retrieving the second highest salary from an employee table using SQL.
GET YOUR FREE
Coding Questions Catalog
