The Classification Problem
Predictive Analytics week at the data school was nothing short of educational and intense. One problem that particularly caught my interest was presenting the results of a classification model. Our coach, Stephen Barr, repeated over and over again that accuracy alone does not fully represent the performance of a classification model; it has to be weighed against the business problem and four other measures. Stephen also stressed the importance of making sure the client properly understands what every score in the model evaluation means.
The Confusing-Confusion Matrix
One of our practice exercises during the week was building a classifier for the famous Kaggle Titanic data set. The confusion matrix below is what is typically used to evaluate the performance of a model, but it can easily mislead a client about the model's goodness of fit: an easy takeaway from this table is that 145/178 predictions were correct, which looks very acceptable. In a binary classification model, however, the business objective is essential for evaluation. What if we were trying to predict the likelihood of someone being infected with a disease, and the data set contained only 1 infected person out of 100 patients? Our model could classify all 100 as healthy and achieve 99% accuracy while completely missing the business objective. For that purpose, I built the dashboard below to help explain a classification model to a business client.
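The accuracy paradox in the disease example above can be sketched in a few lines of Python. The patient counts simply mirror the hypothetical 1-in-100 scenario:

```python
# Accuracy paradox: a "model" that predicts every patient as healthy
# on a data set with 1 infected patient out of 100 still scores 99%.
actual = [1] + [0] * 99   # 1 = infected, 0 = healthy
predicted = [0] * 100     # naive model: classify everyone as healthy

correct = sum(a == p for a, p in zip(actual, predicted))
accuracy = correct / len(actual)
print(f"Accuracy: {accuracy:.0%}")  # 99%

# Yet not a single infected patient was detected:
infected_found = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)
print(f"Infected patients detected: {infected_found}")  # 0
```

A 99% score with zero infected patients detected is exactly why accuracy alone cannot answer the business question.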
1. True Positive Rate
How many of the target class were correctly predicted?
2. True Negative Rate
How many of the secondary class were correctly predicted?
3. False Negative Rate
How many of the target class were misclassified?
4. False Positive Rate
How many of the secondary class were misclassified?
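The four rates above can be computed directly from the confusion-matrix counts. The cell values below are illustrative, chosen only so that the correct predictions total 145 out of 178 as in the matrix above; they are not the actual cells of the Titanic model:

```python
# Hypothetical confusion-matrix counts (target class = positive):
tp, fn = 50, 10   # target class: correctly predicted / misclassified
tn, fp = 95, 23   # secondary class: correctly predicted / misclassified

tpr = tp / (tp + fn)   # 1. True Positive Rate (sensitivity)
tnr = tn / (tn + fp)   # 2. True Negative Rate (specificity)
fnr = fn / (tp + fn)   # 3. False Negative Rate = 1 - TPR
fpr = fp / (tn + fp)   # 4. False Positive Rate = 1 - TNR

print(f"TPR={tpr:.2f}  TNR={tnr:.2f}  FNR={fnr:.2f}  FPR={fpr:.2f}")
```

Note that TPR and FNR split the target class between them (they sum to 1), as do TNR and FPR for the secondary class, which is why quoting one rate without its counterpart can be misleading.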
Explaining a classification model visually can be an effective way to make it accessible to clients. Other measures that I did not discuss in this blog include precision, prevalence, and the area under the ROC curve (AUC).
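For completeness, precision and prevalence follow from the same confusion-matrix counts; the cell values here are the same illustrative (hypothetical) ones, not results from the actual model:

```python
# Hypothetical counts: tp/fn = target class, tn/fp = secondary class
tp, fn, tn, fp = 50, 10, 95, 23

# Precision: of everything predicted as the target class, how much was right?
precision = tp / (tp + fp)

# Prevalence: what share of the data actually belongs to the target class?
prevalence = (tp + fn) / (tp + fn + tn + fp)

print(f"Precision:  {precision:.2f}")
print(f"Prevalence: {prevalence:.2f}")
```

Prevalence is worth showing a client early: in the disease example it is 1%, which is exactly what makes the 99%-accurate do-nothing model possible.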