What are the various performance metrics in Statistical Machine Learning?
Performance metrics are an important part of any machine learning algorithm. Without accurate measurements, it can be difficult to determine the effectiveness of an algorithm. In this blog post, we’ll take a look at some of the most common performance metrics used in statistical machine learning.
- Confusion Matrix
- Classification Accuracy
- Precision / Positive Predictive Value (PPV)
- Sensitivity / Recall / True Positive Rate (TPR)
- Specificity / True Negative Rate (TNR)
- Negative Predictive Value (NPV)
- F1 Score
- F-beta Score
- Matthews Correlation Coefficient / Phi coefficient / Mean Square Contingency Coefficient
- Balanced Accuracy
1. Confusion Matrix
For binary classification, a confusion matrix is a 2x2 table that helps you understand how well a classification model is working. It is made up of four cells: true positives, true negatives, false positives and false negatives. True positives are the items the model correctly classified as positive, and true negatives are the items it correctly classified as negative. False positives and false negatives are the remaining two cells, and they represent the model's classification errors.
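As a minimal sketch (assuming scikit-learn is installed and using a small, made-up set of binary labels), the four cells can be read off like this:

```python
# Sketch of a binary confusion matrix with hypothetical labels.
from sklearn.metrics import confusion_matrix

y_true = [1, 0, 1, 1, 0, 0, 1, 0]  # actual classes (made up for illustration)
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]  # model predictions (made up)

# For labels [0, 1] the matrix is laid out as [[TN, FP], [FN, TP]].
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(f"TP={tp}, TN={tn}, FP={fp}, FN={fn}")  # TP=3, TN=3, FP=1, FN=1
```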
2. Classification Accuracy
Classification accuracy is a measure of how accurately a machine learning algorithm assigns a new observation to a class, compared to the true class of the observation. In other words, it is the percentage of observations that are correctly classified by the machine learning algorithm.
Several factors can affect classification accuracy. Some of these include the size and quality of the training data set, the type of machine learning algorithm used, and the complexity of the problem being solved.
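As a quick sketch, reusing the hypothetical labels from the confusion-matrix example above, accuracy is simply the share of predictions that are correct:

```python
# Accuracy = (TP + TN) / (TP + TN + FP + FN), with made-up labels.
from sklearn.metrics import accuracy_score

y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

print(accuracy_score(y_true, y_pred))  # 0.75 -> 6 of 8 predictions correct
```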
3. Precision / Positive Predictive Value
Precision is the fraction of the model's positive predictions that are actually correct: the ratio of true positives to the sum of true positives and false positives. A model with many true positives and no false positives is perfectly precise; as false positives increase, precision drops. The best possible precision is 1 and the worst is 0.
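A minimal sketch with the same hypothetical labels as above (3 true positives, 1 false positive):

```python
# Precision = TP / (TP + FP).
from sklearn.metrics import precision_score

y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

print(precision_score(y_true, y_pred))  # 3 / (3 + 1) = 0.75
```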
4. Sensitivity / Recall / True Positive Rate
Sensitivity measures the proportion of actual positives that the model correctly identifies. Some use cases place a high premium on sensitivity (recall).
High sensitivity is needed when missing a positive case is very costly. In the medical domain, for example, a model built to detect brain tumors should favour sensitivity: it may occasionally flag a perfectly healthy scan as suspicious (a false positive), but it should almost never label an actual tumor as fine (a false negative).
Low sensitivity means that many genuinely positive cases end up labeled as negative.
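A minimal sketch, again with the same hypothetical labels (3 true positives, 1 false negative):

```python
# Recall / sensitivity = TP / (TP + FN).
from sklearn.metrics import recall_score

y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

print(recall_score(y_true, y_pred))  # 3 / (3 + 1) = 0.75
```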
5. Specificity / True Negative Rate
The specificity of a test, also referred to as the true negative rate (TNR), is the proportion of genuinely negative samples that give a negative result on the test in question. For example, a scan that correctly identifies all healthy people as tumor-free is highly specific. Another scan that incorrectly identifies 30% of healthy people as having a tumor would be less specific, having a higher false positive rate (FPR). Also referred to as type I errors, false positives are the rejection of a true null hypothesis (the null hypothesis being that the sample is negative).
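scikit-learn has no dedicated specificity function, so one common sketch is to derive it from the confusion matrix (same hypothetical labels as before):

```python
# Specificity = TN / (TN + FP).
from sklearn.metrics import confusion_matrix

y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(tn / (tn + fp))  # 3 / (3 + 1) = 0.75
```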
6. Negative Predictive Value
The negative predictive value is the number of true negatives divided by the total number of negative predictions (true negatives plus false negatives). Suppose that out of 80 negative scans, 76 are true negatives (they don't have the tumor) and 4 are false negatives (they tested negative but actually have the tumor). The NPV would then be 76/80 = 95%: you can expect that 95% of people who test negative are actually negative for a brain tumor.
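The same arithmetic, written out with the counts from the example above:

```python
# NPV = TN / (TN + FN), using the example counts from the text.
tn, fn = 76, 4
print(tn / (tn + fn))  # 0.95 -> 95%
```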
7. F1 Score
The F1 score is the harmonic mean of the precision and recall. The more generic F beta score applies additional weights, valuing one of precision or recall more than the other.
The highest possible value of an F-score is 1.0, indicating perfect precision and recall, and the lowest possible value is 0, if either the precision or the recall is zero.
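A minimal sketch with the same hypothetical labels (precision and recall are both 0.75 here, so F1 is also 0.75):

```python
# F1 = 2 * precision * recall / (precision + recall).
from sklearn.metrics import f1_score

y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

print(f1_score(y_true, y_pred))  # 2 * 0.75 * 0.75 / (0.75 + 0.75) = 0.75
```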
8. F-beta Score
The F-beta score is the weighted harmonic mean of precision and recall, reaching its optimal value at 1 and its worst value at 0.
The beta parameter determines the weight of recall in the combined score.
The value of β is chosen according to the relative importance of false positives and false negatives:
- when FP and FN are both equally important, β = 1
- when FP is more important, β < 1 (e.g. 0.5)
- when FN is more important, β > 1 (e.g. 2)
Thus the F-beta score is commonly reported as the F1, F0.5, or F2 score.
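A small sketch with a different made-up label set in which precision (about 0.67) is higher than recall (0.5), so the F0.5 and F2 scores diverge:

```python
# F-beta = (1 + beta^2) * P * R / (beta^2 * P + R); beta weights recall.
from sklearn.metrics import fbeta_score

y_true = [1, 1, 1, 1, 0, 0, 0, 0]  # hypothetical labels
y_pred = [1, 1, 0, 0, 1, 0, 0, 0]  # hypothetical predictions

print(fbeta_score(y_true, y_pred, beta=0.5))  # ~0.625, leans on precision
print(fbeta_score(y_true, y_pred, beta=2.0))  # ~0.526, leans on recall
```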
9. Matthews Correlation Coefficient
The Matthews correlation coefficient is used in machine learning as a measure of the quality of binary and multiclass classifications. It takes into account true and false positives and negatives and is generally regarded as a balanced measure which can be used even if the classes are of very different sizes. The MCC is in essence a correlation coefficient value between -1 and +1. A coefficient of +1 represents a perfect prediction, 0 an average random prediction and -1 an inverse prediction.
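A minimal sketch with the same hypothetical labels used earlier:

```python
# MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)).
from sklearn.metrics import matthews_corrcoef

y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

print(matthews_corrcoef(y_true, y_pred))  # (9 - 1) / 16 = 0.5
```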
10. Balanced Accuracy
Balanced accuracy is a metric we can use to assess the performance of a classification model. It is the average of sensitivity (the true positive rate) and specificity (the true negative rate), which makes it particularly useful when the two classes are imbalanced, that is, when one class appears much more often than the other.
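A minimal sketch on a deliberately imbalanced, made-up label set (six negatives, two positives):

```python
# Balanced accuracy = (sensitivity + specificity) / 2.
from sklearn.metrics import balanced_accuracy_score

y_true = [0, 0, 0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 0, 0, 1, 1, 0]

print(balanced_accuracy_score(y_true, y_pred))  # (0.5 + 5/6) / 2 ≈ 0.667
```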
There are other statistically sound ways of measuring a model's performance that are equally important and effective, but the ones mentioned above are the most commonly used in everyday practice. As a data scientist, you should understand these metrics and have a clear idea of when and where to apply each of them.