What is Manhattan Distance in machine learning?

In machine learning and data science, the Manhattan distance, also known as the L1 distance, taxicab distance, or city block distance, measures the distance between two points as the total distance traveled along axis-aligned grid lines. It is named after the grid-like street layout of the borough of Manhattan in New York City, where a car must follow the streets rather than cut diagonally across blocks.

Definition

The Manhattan distance between two points \( P \) and \( Q \) in an \( n \)-dimensional space is defined as the sum of the absolute differences of their corresponding coordinates. Mathematically, if \( P = (p_1, p_2, \ldots, p_n) \) and \( Q = (q_1, q_2, \ldots, q_n) \), the Manhattan distance \( D \) between \( P \) and \( Q \) is given by:

\[ D(P, Q) = \sum_{i=1}^{n} |p_i - q_i| \]

Example Calculation

Let's consider an example in a 2-dimensional space:

If \( P = (1, 2) \) and \( Q = (4, 6) \), the Manhattan distance \( D \) is calculated as follows:

\[ D(P, Q) = |1 - 4| + |2 - 6| = 3 + 4 = 7 \]
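
The definition and the worked example above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library; the function name `manhattan_distance` is an assumption made for this example, not part of any particular library.

```python
from typing import Sequence


def manhattan_distance(p: Sequence[float], q: Sequence[float]) -> float:
    """Return the sum of absolute coordinate differences between p and q."""
    if len(p) != len(q):
        raise ValueError("points must have the same number of dimensions")
    return sum(abs(a - b) for a, b in zip(p, q))


# Worked example from the text: P = (1, 2), Q = (4, 6)
print(manhattan_distance((1, 2), (4, 6)))  # 7
```

The same function works for any number of dimensions, as long as both points have the same length.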

Properties of Manhattan Distance

Non-Negativity: \( D(P, Q) \geq 0 \) and \( D(P, Q) = 0 \) if and only if \( P = Q \).
Symmetry: \( D(P, Q) = D(Q, P) \).
Triangle Inequality: \( D(P, R) \leq D(P, Q) + D(Q, R) \) for any points \( P \), \( Q \), and \( R \).
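
The three properties above can be spot-checked numerically. The sketch below tests them on randomly generated integer points (integers avoid floating-point rounding in the comparisons); it is a sanity check under those assumptions, not a proof, and the helper names are hypothetical.

```python
import random


def manhattan_distance(p, q):
    """Sum of absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(p, q))


def random_point(rng, dims=3):
    """Random integer point with coordinates in [-10, 10]."""
    return tuple(rng.randint(-10, 10) for _ in range(dims))


def check_metric_properties(trials=1000, seed=0):
    """Assert the three properties on `trials` random triples of points."""
    rng = random.Random(seed)
    for _ in range(trials):
        p, q, r = (random_point(rng) for _ in range(3))
        d_pq = manhattan_distance(p, q)
        assert d_pq >= 0                      # non-negativity
        assert (d_pq == 0) == (p == q)        # zero only for identical points
        assert d_pq == manhattan_distance(q, p)  # symmetry
        # triangle inequality: going via q is never shorter than going directly
        assert manhattan_distance(p, r) <= d_pq + manhattan_distance(q, r)
    return True


print(check_metric_properties())  # True
```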

Usage in Machine Learning

Manhattan distance is used in various machine learning algorithms and scenarios, including:

Nearest Neighbor Algorithms: In K-Nearest Neighbors (KNN), Manhattan distance can be used as a distance metric to find the closest neighbors to a data point.
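Clustering Algorithms: Manhattan distance can be used to assign points to the nearest cluster center. Standard K-means minimizes squared Euclidean distance, so the L1 variant is usually called K-medians, where each center is updated to the coordinate-wise median of its cluster rather than the mean.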
Optimization Problems: Manhattan distance is often used in optimization problems where the solution space is grid-based or where the cost function is defined by the sum of absolute differences.
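Feature Engineering: It can be used as a measure of dissimilarity between feature vectors, especially when features are sparse. Like Euclidean distance, it is sensitive to feature scale, so features measured on different scales are usually normalized first.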
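
As an illustration of the nearest-neighbor use case listed above, here is a minimal K-Nearest Neighbors classifier that ranks training points by Manhattan distance. It is a sketch for small in-memory data only; `knn_predict` and the toy dataset are assumptions made for this example.

```python
from collections import Counter


def manhattan_distance(p, q):
    """Sum of absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(p, q))


def knn_predict(train_points, train_labels, query, k=3):
    """Majority vote among the k training points closest to `query` in L1 distance."""
    ranked = sorted(
        zip(train_points, train_labels),
        key=lambda pair: manhattan_distance(pair[0], query),
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy data: two well-separated groups of 2D points
train_points = [(1, 1), (2, 1), (1, 2), (8, 8), (9, 8), (8, 9)]
train_labels = ["a", "a", "a", "b", "b", "b"]

print(knn_predict(train_points, train_labels, (2, 2)))  # a
print(knn_predict(train_points, train_labels, (7, 9)))  # b
```

Sorting every training point is fine for a demonstration, but larger datasets would normally rely on a library implementation with spatial indexing.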

Comparison with Euclidean Distance

Manhattan distance and Euclidean distance (L2 distance) are two common distance metrics, but they measure distance differently:

Manhattan Distance (L1 distance): Measures the distance as the sum of absolute differences along each dimension.
Euclidean Distance (L2 distance): Measures the straight-line (or "as-the-crow-flies") distance between two points, calculated as the square root of the sum of the squared differences.

For example, in a 2D space with \( P = (x_1, y_1) \) and \( Q = (x_2, y_2) \):

Manhattan Distance: \( D(P, Q) = |x_1 - x_2| + |y_1 - y_2| \)
Euclidean Distance: \( D(P, Q) = \sqrt{(x_1 - x_2)^2 + (y_1 - y_2)^2} \)
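
To make the difference concrete, the following sketch computes both distances for the points \( P = (1, 2) \) and \( Q = (4, 6) \) from the earlier example. The function names are assumptions chosen for this illustration.

```python
import math


def manhattan_distance(p, q):
    """L1 distance: sum of absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(p, q))


def euclidean_distance(p, q):
    """L2 distance: square root of the sum of squared coordinate differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


p, q = (1, 2), (4, 6)
print(manhattan_distance(p, q))  # 7
print(euclidean_distance(p, q))  # 5.0
```

The Manhattan distance is never smaller than the Euclidean distance between the same two points, and the two are equal only when the points differ in at most one coordinate.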

Conclusion

Manhattan distance is a simple yet widely used distance metric in machine learning and data science. Because it sums absolute differences instead of squaring them, a single large coordinate difference contributes linearly rather than quadratically, which makes it less sensitive to outliers than Euclidean distance and a common choice for high-dimensional or sparse data. Choosing a distance metric that suits the data can significantly affect the performance of distance-based algorithms.


CONTRIBUTOR

Design Gurus Team
