DataScience With Python/R/SAS: Model Evaluation Metrics | Machine Learning | Scikit-Learn

Let's load the Pima Indians diabetes dataset and look at its features and target. We use the pandas library to load the data.
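A minimal sketch of the loading step, assuming the data is a header-less CSV with eight feature columns followed by a 0/1 outcome column. The column names and the helper `load_pima` are illustrative choices, not taken from the post, and the two rows in the demo are made-up placeholders rather than real records.

```python
import io

import pandas as pd

# Assumed column names: this sketch treats the raw file as having no header
# row, so these short names are an illustrative choice.
COL_NAMES = ["pregnant", "glucose", "bp", "skin", "insulin",
             "bmi", "pedigree", "age", "label"]


def load_pima(source):
    """Read the header-less CSV from a path, URL, or file-like object."""
    return pd.read_csv(source, header=None, names=COL_NAMES)


# Placeholder rows (made up) that only show the expected layout of the file.
demo_csv = io.StringIO(
    "1,100,70,20,80,25.0,0.5,30,0\n"
    "2,150,80,30,120,33.5,0.7,45,1\n"
)
pima = load_pima(demo_csv)

X = pima.drop(columns="label")  # feature matrix (8 columns)
y = pima["label"]               # target: 1 = diabetes, 0 = no diabetes
print(pima.head())
```

In practice you would pass the path of your local copy of the dataset to `load_pima` instead of the in-memory placeholder.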

Let's calculate the accuracy of the model:
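As a rough sketch of this step, accuracy can be computed with scikit-learn's `accuracy_score`. The label lists below are small illustrative values, not results from the Pima data, and the names `y_test` and `y_pred_class` are assumptions standing in for the true test labels and the model's predictions.

```python
from sklearn.metrics import accuracy_score

# Illustrative labels only (1 = diabetes, 0 = no diabetes). In a real run,
# y_test comes from the test split and y_pred_class from model.predict().
y_test = [1, 0, 0, 1, 0, 0, 1, 0, 0, 0]
y_pred_class = [0, 0, 0, 1, 0, 0, 0, 0, 1, 0]

# Accuracy = fraction of predictions that match the true labels.
accuracy = accuracy_score(y_test, y_pred_class)
print(accuracy)  # 7 of the 10 predictions match, so 0.7
```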

Classification accuracy is the easiest classification metric to understand. However, it does not tell you the underlying distribution of response values, and it does not tell you what "types" of errors your classifier is making.
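One way to see why the distribution of response values matters is to compare accuracy against the "null accuracy", that is, the score you would get by always predicting the most frequent class. The sketch below uses made-up labels and the standard library only; `null_accuracy` is a name chosen here for illustration.

```python
from collections import Counter

# Illustrative true labels only (1 = diabetes, 0 = no diabetes).
y_test = [1, 0, 0, 1, 0, 0, 1, 0, 0, 0]

# Distribution of the response values.
class_counts = Counter(y_test)
print(class_counts)

# Null accuracy: share of the most frequent class.
majority_class, majority_count = class_counts.most_common(1)[0]
null_accuracy = majority_count / len(y_test)
print(majority_class, null_accuracy)
```

With these labels, a classifier that always predicts class 0 already scores 0.7, so an accuracy near 0.7 would say very little about how well a model separates the two classes.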
Confusion Matrix: A confusion matrix gives you a more complete picture of how your classifier is performing.

Every observation in the testing set is counted in exactly one cell (box) of the matrix.
It's a 2x2 matrix because there are 2 response classes.

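The format shown here is not universal. Other references may swap the rows and columns or list the classes in a different order, so check how a matrix is labeled before reading it.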
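To make the layout concrete, here is a small sketch using scikit-learn's `confusion_matrix`, which puts actual classes in rows and predicted classes in columns, with classes in sorted order. The labels are illustrative values, not results from the Pima data, and the variable names are assumptions.

```python
from sklearn.metrics import confusion_matrix

# Illustrative labels only (1 = diabetes, 0 = no diabetes).
y_test = [1, 0, 0, 1, 0, 0, 1, 0, 0, 0]
y_pred_class = [0, 0, 0, 1, 0, 0, 0, 0, 1, 0]

# Pass the true labels first, then the predictions.
# Default layout: row = actual class, column = predicted class,
# classes sorted as 0 then 1, so the matrix reads [[TN, FP], [FN, TP]].
cm = confusion_matrix(y_test, y_pred_class)
print(cm)

# The labels argument changes the class order. With the positive class
# first, the same counts read [[TP, FN], [FP, TN]].
cm_pos_first = confusion_matrix(y_test, y_pred_class, labels=[1, 0])
print(cm_pos_first)
```

Both matrices hold the same four counts. Only the arrangement differs, which is why it is worth confirming the convention before interpreting any confusion matrix.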

Basic terminology:

True Positives (TP): we correctly predicted that they do have diabetes
True Negatives (TN): we correctly predicted that they don't have diabetes
False Positives (FP): we incorrectly predicted that they do have diabetes (a "Type I error")
False Negatives (FN): we incorrectly predicted that they don't have diabetes (a "Type II error")
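The four definitions above can be sketched directly as counts over pairs of actual and predicted labels. The labels are illustrative values only, and the variable names are chosen here for the example.

```python
# Illustrative labels only (1 = diabetes, 0 = no diabetes).
y_test = [1, 0, 0, 1, 0, 0, 1, 0, 0, 0]
y_pred_class = [0, 0, 0, 1, 0, 0, 0, 0, 1, 0]

pairs = list(zip(y_test, y_pred_class))

# Count each outcome straight from its definition.
TP = sum(1 for actual, pred in pairs if actual == 1 and pred == 1)
TN = sum(1 for actual, pred in pairs if actual == 0 and pred == 0)
FP = sum(1 for actual, pred in pairs if actual == 0 and pred == 1)  # Type I error
FN = sum(1 for actual, pred in pairs if actual == 1 and pred == 0)  # Type II error

print(TP, TN, FP, FN)

# Every observation falls into exactly one of the four groups,
# and accuracy is the share of correct predictions.
accuracy = (TP + TN) / len(pairs)
print(accuracy)
```

With scikit-learn, the same four numbers can be read off a binary confusion matrix with `confusion_matrix(y_test, y_pred_class).ravel()`, which returns them in the order TN, FP, FN, TP.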
