A confusion matrix is a summary of prediction results on a classification problem. The number of correct and incorrect predictions is summarised with count values and broken down by each class. This is the key to the confusion matrix: it shows the ways in which your classification model is confused when it makes predictions.
It gives you insight not only into the errors being made by your classifier, but more importantly the types of errors that are being made. It is this breakdown that overcomes the limitation of using classification accuracy alone.
You need a test data-set or a validation data-set with expected outcome values.
Make a prediction for each row in your test data-set.
From the expected outcomes and predictions, count the number of correct and incorrect predictions for each class.
These numbers are then organised into a table, or a matrix as follows:
The counts of correct and incorrect classification are then filled into the table.
The total number of correct predictions for a class goes into the expected row for that class value and the predicted column for that same class value.
In the same way, the total number of incorrect predictions for a class goes into the expected row for that class value and the predicted column for the class value that was actually predicted.
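The counting procedure above can be sketched in a few lines of Python. The label lists here are made-up illustrative data, not an example from the text; in practice a library routine such as scikit-learn's confusion_matrix would usually be used instead:

```python
from collections import Counter

def confusion_matrix(expected, predicted, labels):
    """Count (expected, predicted) pairs into a nested dict:
    outer keys are expected class values (rows),
    inner keys are predicted class values (columns)."""
    counts = Counter(zip(expected, predicted))
    return {e: {p: counts[(e, p)] for p in labels} for e in labels}

# Hypothetical expected outcomes and model predictions
expected  = ["yes", "no", "yes", "yes", "no", "no"]
predicted = ["yes", "yes", "yes", "no", "no", "no"]

matrix = confusion_matrix(expected, predicted, ["yes", "no"])
# matrix["yes"]["yes"] holds correct "yes" predictions;
# matrix["yes"]["no"] holds "yes" rows wrongly predicted as "no".
```

Reading along a row shows how the examples of one true class were distributed across the predicted classes, which is exactly the breakdown the table describes.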
Accuracy = (True Positive + True Negative) / (True Positive + True Negative + False Positive + False Negative)
Using the example above:
The score is the same, 0.7, for both models.
This raises the question of which model is better, and motivates the other metrics that follow.
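The accuracy formula can be computed directly from the four cell counts. The counts below are illustrative (chosen so the result matches the 0.7 mentioned above), not the actual example the text refers to:

```python
def accuracy(tp, tn, fp, fn):
    # Fraction of all predictions that were correct
    return (tp + tn) / (tp + tn + fp + fn)

# Hypothetical counts: 3 true positives, 4 true negatives,
# 1 false positive, 2 false negatives -> 7 correct out of 10
score = accuracy(tp=3, tn=4, fp=1, fn=2)
print(score)  # 0.7
```

Because accuracy collapses all four counts into one number, two models with very different error profiles can score identically, which is why the breakdown matters.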
Precision (positive predictive value, PPV) = True Positive / (True Positive + False Positive)
Sensitivity (recall) = True Positive / (True Positive + False Negative)
This indicates the percentage of actual positives correctly found.
Specificity = True Negative / (True Negative + False Positive)
This indicates the percentage of actual negatives correctly rejected.
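The three metrics above can be sketched as small functions over the same four counts. The example counts are hypothetical, carried over from the illustrative accuracy calculation rather than from the text:

```python
def precision(tp, fp):
    # Positive predictive value: fraction of predicted positives
    # that really are positive
    return tp / (tp + fp)

def sensitivity(tp, fn):
    # Recall / true positive rate: fraction of actual positives
    # that the model found
    return tp / (tp + fn)

def specificity(tn, fp):
    # True negative rate: fraction of actual negatives
    # that the model correctly rejected
    return tn / (tn + fp)

# Hypothetical counts: tp=3, tn=4, fp=1, fn=2
print(precision(3, 1))    # 0.75
print(sensitivity(3, 2))  # 0.6
print(specificity(4, 1))  # 0.8
```

Two models with the same accuracy can differ sharply on these metrics, which is how the tie raised above is broken in practice.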