
ML Confusion Matrices and Accuracy Measures: An Explanation

USE OF CONFUSION MATRICES

A confusion matrix is a summary of prediction results on a classification problem which helps you assess the kinds of errors a model makes. The numbers of correct and incorrect predictions are summarised with count values and broken down by class. This breakdown is the key to the confusion matrix: it shows the ways in which your classification model is confused when it makes predictions.

It gives you insight not only into how many errors your classifier is making but, more importantly, into the types of errors being made. It is this breakdown that overcomes the limitation of using classification accuracy alone.


How to Calculate a Confusion Matrix

  1. You need a test data-set or a validation data-set with expected outcome values.
  2. Make a prediction for each row in your test data-set.
  3. From the expected outcomes and predictions, count:

  • The number of correct predictions for each class.
  • The number of incorrect predictions for each class, organised by the class that was predicted.


These numbers are then organised into a table, or matrix, as follows:

  1. Expected down the side: each row of the matrix corresponds to an actual (expected) class.
  2. Predicted across the top: each column of the matrix corresponds to a predicted class.


The counts of correct and incorrect classifications are then filled into the table.
The total number of correct predictions for a class goes into the expected row for that class value and the predicted column for that same class value.

In the same way, the total number of incorrect predictions for a class goes into the expected row for that class value and the predicted column for the class that was (wrongly) predicted.
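
As a minimal sketch of the counting procedure just described (written in Python; the class labels and predictions below are hypothetical and not taken from the post's example):

    from collections import Counter

    def confusion_matrix(expected, predicted, labels):
        # Count each (expected, predicted) pair, then lay the counts out as
        # one row per expected class and one column per predicted class.
        counts = Counter(zip(expected, predicted))
        return {e: {p: counts[(e, p)] for p in labels} for e in labels}

    # Hypothetical two-class example.
    expected  = ["cat", "cat", "dog", "dog", "cat", "dog"]
    predicted = ["cat", "dog", "dog", "dog", "cat", "cat"]

    for actual, row in confusion_matrix(expected, predicted, ["cat", "dog"]).items():
        print(actual, row)
    # cat {'cat': 2, 'dog': 1}  <- two cats correct, one cat predicted as dog
    # dog {'cat': 1, 'dog': 2}  <- two dogs correct, one dog predicted as cat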



[Figure: example confusion matrices for Classifier 1 and Classifier 2, each built from the same 30 test predictions; the counts are used in the accuracy calculations below.]

TRAINING ACCURACY OF MODELS


Accuracy = (True Positive + True Negative) / (True Positive + True Negative + False Positive + False Negative)


Using the example above:

  • Classifier 1 Accuracy = (12 + 9) / (12 + 0 + 9 + 9) = 0.7
  • Classifier 2 Accuracy = (11 + 10) / (11 + 10 + 1 + 8) = 0.7

The score is the same, 0.7, for both models.
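
A small Python sketch of the same calculation (the counts are the totals from the example above; accuracy is unaffected by which of the remaining counts are false positives versus false negatives, so that split is assumed here purely for illustration):

    def accuracy(tp, tn, fp, fn):
        # Fraction of all predictions that were correct.
        return (tp + tn) / (tp + tn + fp + fn)

    # Counts from the worked example; the FP/FN split is assumed for illustration.
    print(accuracy(tp=12, tn=9, fp=0, fn=9))   # Classifier 1 -> 0.7
    print(accuracy(tp=11, tn=10, fp=1, fn=8))  # Classifier 2 -> 0.7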

This raises the question of which model is actually better, so some further accuracy measures follow.


PPV (POSITIVE PREDICTIVE VALUE)

PPV = True Positive / (True Positive + False Positive)

This indicates the % of positive predictions that are correct.


SENSITIVITY

S = True Positive / (True Positive + False Negative)

This indicates the % of actual positives that were correctly found.


SPECIFICITY

SP = True Negative / (True Negative + False Positive)

This indicates the % of actual negatives that were correctly rejected.
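
The three measures above can be sketched in Python as follows; the counts reuse Classifier 2's totals from the example, with the split into false positives and false negatives assumed purely for illustration:

    def ppv(tp, fp):
        # % of positive predictions that are correct.
        return tp / (tp + fp)

    def sensitivity(tp, fn):
        # % of actual positives correctly found.
        return tp / (tp + fn)

    def specificity(tn, fp):
        # % of actual negatives correctly rejected.
        return tn / (tn + fp)

    # Assumed split of Classifier 2's counts into TP, TN, FP, FN.
    tp, tn, fp, fn = 11, 10, 1, 8
    print(ppv(tp, fp))          # 11 / 12 ~= 0.92
    print(sensitivity(tp, fn))  # 11 / 19 ~= 0.58
    print(specificity(tn, fp))  # 10 / 11 ~= 0.91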
