
Many machine learning algorithms, supervised or unsupervised, use distance metrics to learn patterns in the input data and make data-driven decisions. A good distance metric can significantly improve the performance of classification, clustering and information retrieval. Choosing a suitable metric is especially important when the algorithm needs to recognise similarities between items.

Distance between pairs of examples is measured and used to:

- Cluster nearby examples into a common class (unlabelled data), or
- Find a classifying surface in the space of examples that optimally separates different (labelled) collections of examples from other provided examples.

Three key approaches to measuring distance are Euclidean, Manhattan and Chebyshev.

**EUCLIDEAN DISTANCE**

This is the length of the straight line between two points.

For two points p = (p1, …, pn) and q = (q1, …, qn), the formula is:

d(p, q) = √( (p1 − q1)² + (p2 − q2)² + … + (pn − qn)² )
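As a minimal sketch (the function name is mine, not from the source):

```python
import math

def euclidean(p, q):
    # Straight-line (L2) distance: square root of the sum of
    # squared coordinate differences
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

print(euclidean((0, 0), (3, 4)))  # 5.0
```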

**MANHATTAN DISTANCE**

Also known as L1 or taxicab distance. The distance between two points is the sum of the absolute differences of their coordinates. On a grid, a move along one axis costs 1 unit, while a diagonal move costs 2.

The formula is:

d(p, q) = |p1 − q1| + |p2 − q2| + … + |pn − qn|
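A matching sketch (again, the function name is mine):

```python
def manhattan(p, q):
    # Sum of absolute coordinate differences (L1 / taxicab distance)
    return sum(abs(a - b) for a, b in zip(p, q))

print(manhattan((0, 0), (3, 4)))  # 7
```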

**CHEBYSHEV DISTANCE**

In Chebyshev distance, all 8 cells adjacent to a given point can be reached in one unit.

The formula is:

d(p, q) = max( |p1 − q1|, |p2 − q2|, …, |pn − qn| )
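And the corresponding sketch (function name mine):

```python
def chebyshev(p, q):
    # Maximum absolute coordinate difference (L-infinity distance)
    return max(abs(a - b) for a, b in zip(p, q))

print(chebyshev((0, 0), (3, 4)))  # 4
```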

**USING CHESS MOVES AS AN EXAMPLE**

In chess, the distance between squares on the chessboard for Rooks (Castles) is measured in Manhattan distance.

Kings and Queens use Chebyshev distance, and Bishops use Manhattan distance (between diagonal squares of the same colour). To get from one square to another, only Kings require a number of moves equal to the distance; Rooks (Castles), Queens and Bishops need only one or two moves.
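A small sketch of the chess point, using 0-based (file, rank) coordinates (the helper names and coordinates are mine, for illustration only):

```python
def king_moves(a, b):
    # A king needs max(|delta file|, |delta rank|) moves: Chebyshev distance
    return max(abs(a[0] - b[0]), abs(a[1] - b[1]))

def rook_distance(a, b):
    # Manhattan distance between the squares
    # (the rook itself needs at most 2 moves to cover it)
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

# From a1 = (0, 0) to h8 = (7, 7):
print(king_moves((0, 0), (7, 7)))     # 7 moves
print(rook_distance((0, 0), (7, 7)))  # 14 units
```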

**MINKOWSKI DISTANCE**

Minkowski distance is a metric in a normed vector space.

Given two points P and Q, the Minkowski distance of order p is defined as:

D(P, Q) = ( |P1 − Q1|ᵖ + |P2 − Q2|ᵖ + … + |Pn − Qn|ᵖ )^(1/p)

In the case of p = 2 we get Euclidean distance; in the case of p = 1, Manhattan distance; and as p → ∞, Chebyshev distance.
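A sketch showing how the general formula reduces to the earlier metrics (the function name is mine):

```python
def minkowski(p, q, order):
    # ( sum of |p_i - q_i|^order ) ** (1/order)
    return sum(abs(a - b) ** order for a, b in zip(p, q)) ** (1 / order)

print(minkowski((0, 0), (3, 4), 1))  # 7.0 — Manhattan
print(minkowski((0, 0), (3, 4), 2))  # 5.0 — Euclidean
```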

Learned models depend on the following factors:

- Distance metric between examples
- Choice of feature vectors
- Constraints on the complexity of the model:
  - Specified number of clusters
  - Complexity of the separating surface
  - Need to avoid over-fitting (e.g. each example becomes its own cluster, or the separating surface is overly complex)

**CLASSIFICATION APPROACHES**

Need to find boundaries in feature space that separate different classes of labelled examples.

- Look for simple surface (e.g. best line in a plane) that separates classes
- Look for more complex surfaces (subject to constraints) that separate classes
- Use voting systems (find “k” nearest training examples, use majority vote to select label)
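The voting approach in the last bullet can be sketched as a minimal k-nearest-neighbours classifier (the data and names here are mine, for illustration):

```python
from collections import Counter

def euclidean(p, q):
    # Straight-line distance between two feature vectors
    return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5

def knn_label(training, point, k=3):
    # training: list of (feature_vector, label) pairs.
    # Sort by distance to the query point, keep the k nearest,
    # and return the majority-vote label.
    nearest = sorted(training, key=lambda t: euclidean(t[0], point))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

train = [((1, 1), "A"), ((1, 2), "A"), ((2, 1), "A"),
         ((5, 5), "B"), ((6, 5), "B")]
print(knn_label(train, (1.5, 1.5)))  # "A"
```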

**ISSUES**

- How to avoid over-fitting of data
- How to measure performance
- How to select best features