Classification introduction#

Classification is a predictive modeling problem in which a model predicts a categorical class label for a given input.

Examples of classification problems include:

  • Given an email, classify it as spam or not spam.

  • Given a handwritten character, classify it as one of the known characters.

  • Given recent user behavior, classify the user as likely to churn or not.

In the next sections, we’ll cover the primary building blocks of classification models. To do so, we mainly draw on content from Google’s excellent Machine Learning Crash Course.

Confusion matrix#

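We’ll start with the confusion matrix, which underlies the metrics we’ll use to evaluate classification models. It tabulates a model’s predictions against the actual labels as true positives, false positives, false negatives, and true negatives.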
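As a minimal sketch of how a binary confusion matrix is tallied, the snippet below counts the four outcomes by hand. The function name `confusion_counts` and the label lists are hypothetical and chosen only for illustration (1 = spam, 0 = not spam).

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn) for binary labels."""
    tp = fp = fn = tn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive:
            if actual == positive:
                tp += 1  # predicted positive, actually positive
            else:
                fp += 1  # predicted positive, actually negative
        else:
            if actual == positive:
                fn += 1  # predicted negative, actually positive
            else:
                tn += 1  # predicted negative, actually negative
    return tp, fp, fn, tn


# Hypothetical data: 1 = spam, 0 = not spam
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
print(f"TP={tp} FP={fp} FN={fn} TN={tn}")
```

The four counts always sum to the number of examples, which is a quick sanity check when building the matrix for your own data.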



Performance metrics#

Accuracy#

Accuracy is one metric for evaluating classification models: the fraction of all predictions that the model got right.
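A short sketch of the calculation, assuming accuracy is defined as the number of correct predictions divided by the total number of predictions. The helper `accuracy` and the labels are made up for illustration.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(1 for actual, predicted in zip(y_true, y_pred)
                  if actual == predicted)
    return correct / len(y_true)


# Hypothetical labels: 6 of the 8 predictions are correct
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

acc = accuracy(y_true, y_pred)
print(acc)
```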

Precision and recall#

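Precision and recall describe two different kinds of error: precision is the fraction of positive predictions that are actually positive, and recall is the fraction of actual positives that the model identifies.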
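The following sketch computes both metrics from the standard definitions, precision = TP / (TP + FP) and recall = TP / (TP + FN). The function name and the data are hypothetical; returning 0.0 when a denominator is zero is a convention assumed here, not a universal rule.

```python
def precision_recall(y_true, y_pred, positive=1):
    """Return (precision, recall) for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for a, p in pairs if a == positive and p == positive)
    fp = sum(1 for a, p in pairs if a != positive and p == positive)
    fn = sum(1 for a, p in pairs if a == positive and p != positive)
    # Assumed convention: report 0.0 when the metric is undefined
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical data: 4 actual positives, 3 positive predictions
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0, 0, 0, 0, 0]

precision, recall = precision_recall(y_true, y_pred)
print(precision, recall)
```

Here two of the three positive predictions are correct (precision 2/3), while only two of the four actual positives are found (recall 1/2).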

Next, check your understanding by answering these questions about accuracy, precision, and recall.

Read the interactive article Attacking discrimination with smarter machine learning to get a better understanding of the relevance of thresholds in classification problems.
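To make the role of the threshold concrete, here is a small sketch that turns model scores into class predictions at several thresholds and reports precision and recall for each. The scores, labels, and the helper `precision_recall_at` are invented for illustration and are not taken from the article.

```python
def precision_recall_at(y_true, scores, threshold):
    """Predict positive when score >= threshold; return (precision, recall)."""
    y_pred = [1 if s >= threshold else 0 for s in scores]
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)
    # Assumed convention: report 0.0 when the metric is undefined
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical model scores and true labels
scores = [0.95, 0.80, 0.60, 0.40, 0.30, 0.10]
y_true = [1, 1, 0, 1, 0, 0]

results = {t: precision_recall_at(y_true, scores, t) for t in (0.35, 0.5, 0.7)}
for t, (p, r) in results.items():
    print(f"threshold={t}: precision={p:.2f} recall={r:.2f}")
```

On this toy data, raising the threshold removes the false positive (higher precision), while lowering it recovers the missed positive (higher recall). On real data the two metrics typically trade off against each other in the same way.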

ROC and AUC#

An ROC curve (receiver operating characteristic curve) is a graph showing the performance of a classification model at all classification thresholds.
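One way to sketch this: sweep the threshold from the highest score to the lowest, record the false positive rate and true positive rate at each step, and approximate the area under the resulting curve (AUC) with the trapezoidal rule. The functions and data below are hypothetical, and the code assumes both classes are present in `y_true`.

```python
def roc_points(y_true, scores):
    """Return (fpr, tpr) points as the threshold sweeps from high to low."""
    positives = sum(y_true)
    negatives = len(y_true) - positives
    # Rank examples by descending score
    ranked = sorted(zip(scores, y_true), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    for i, (score, label) in enumerate(ranked):
        if label == 1:
            tp += 1
        else:
            fp += 1
        # Emit a point only after the last example of a group of tied scores
        if i + 1 == len(ranked) or ranked[i + 1][0] != score:
            points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Area under the curve via the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Hypothetical model scores and true labels
scores = [0.95, 0.80, 0.60, 0.40, 0.30, 0.10]
y_true = [1, 1, 0, 1, 0, 0]

points = roc_points(y_true, scores)
auc_value = auc(points)
print(points)
print(round(auc_value, 4))
```

An equivalent reading of AUC is the probability that a randomly chosen positive example is scored higher than a randomly chosen negative one. In this toy data, 8 of the 9 positive-negative pairs are ranked correctly.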

Check your understanding: ROC and AUC

Unbalanced data#

Finally, we’ll cover a common problem in machine learning tasks: unbalanced data.
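A brief sketch of why unbalanced data needs care: on a hypothetical dataset with 10 positives among 1,000 examples, a model that always predicts the negative class reaches high accuracy while finding none of the positives. All numbers are invented for illustration.

```python
# Hypothetical unbalanced dataset: 10 positives, 990 negatives
y_true = [1] * 10 + [0] * 990

# A trivial "model" that always predicts the majority (negative) class
y_pred = [0] * len(y_true)

correct = sum(1 for a, p in zip(y_true, y_pred) if a == p)
accuracy = correct / len(y_true)

tp = sum(1 for a, p in zip(y_true, y_pred) if a == 1 and p == 1)
fn = sum(1 for a, p in zip(y_true, y_pred) if a == 1 and p == 0)
recall = tp / (tp + fn)

print(f"accuracy={accuracy:.2f} recall={recall:.2f}")
```

This is why metrics such as precision, recall, and AUC are usually reported alongside accuracy when one class is rare.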