Classification introduction#

Classification refers to a predictive modeling problem where a categorical class label is predicted.

Examples of classification problems include:

Given an email, classify it as spam or not.
Given a handwritten character, classify it as one of the known characters.
Given recent user behavior, classify the user as likely to churn or not.

In the next sections, we’ll cover the primary building blocks of classification models. To do so, we mainly draw on content from Google’s excellent Machine Learning Crash Course.

Confusion matrix#

We’ll start with the confusion matrix, which underlies the metrics we’ll use to evaluate classification models.
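As a minimal sketch of what a confusion matrix holds for a binary problem, the snippet below counts true positives, false positives, false negatives, and true negatives. The function name `confusion_counts` and the labels are made up for illustration (1 = spam, 0 = not spam); they are not taken from the course material.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count the four confusion-matrix cells for a binary problem."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1  # predicted positive, actually positive
        elif pred == positive:
            fp += 1  # predicted positive, actually negative
        elif truth == positive:
            fn += 1  # predicted negative, actually positive
        else:
            tn += 1  # predicted negative, actually negative
    return {"tp": tp, "fp": fp, "fn": fn, "tn": tn}


# Hypothetical labels: 1 = spam, 0 = not spam.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

counts = confusion_counts(y_true, y_pred)
print(counts)  # {'tp': 3, 'fp': 1, 'fn': 1, 'tn': 3}
```

Every metric in the following sections can be derived from these four counts.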

Performance metrics#

Accuracy#

Accuracy is one metric for evaluating classification models:

Read: Accuracy
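Accuracy is the fraction of predictions the model got right, that is (TP + TN) / (TP + TN + FP + FN) for a binary problem. A minimal sketch, assuming the four confusion-matrix counts are already known (the counts used here are hypothetical):

```python
def accuracy(tp, fp, fn, tn):
    """Fraction of all predictions that were correct."""
    total = tp + fp + fn + tn
    if total == 0:
        raise ValueError("accuracy is undefined without any examples")
    return (tp + tn) / total


# Hypothetical counts: 6 of 8 predictions are correct.
print(accuracy(tp=3, fp=1, fn=1, tn=3))  # 0.75
```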

Precision and recall#

Learn about precision and recall:

Read: Precision and Recall
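Precision asks what share of positive predictions were correct, TP / (TP + FP); recall asks what share of actual positives were found, TP / (TP + FN). The sketch below uses hypothetical counts. Returning 0.0 when a denominator is zero is a convention chosen here for illustration, not the only possible one.

```python
def precision(tp, fp):
    """Share of positive predictions that were actually positive."""
    predicted_positive = tp + fp
    # Assumption: report 0.0 when the model made no positive predictions.
    return tp / predicted_positive if predicted_positive else 0.0


def recall(tp, fn):
    """Share of actual positives that the model identified."""
    actual_positive = tp + fn
    # Assumption: report 0.0 when the data contains no positives.
    return tp / actual_positive if actual_positive else 0.0


# Hypothetical counts: 8 true positives, 2 false positives, 8 false negatives.
print(precision(tp=8, fp=2))  # 0.8
print(recall(tp=8, fn=8))     # 0.5
```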

Next, check your understanding by answering these questions about accuracy, precision, and recall.

Read the interactive article “Attacking discrimination with smarter machine learning” to better understand why thresholds matter in classification problems.
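To make the role of the threshold concrete, the sketch below turns model scores into class labels at several thresholds and reports precision and recall for each. The scores, labels, and helper names are hypothetical, and the example is not taken from the article.

```python
def predict_with_threshold(scores, threshold):
    """Label an example positive (1) when its score reaches the threshold."""
    return [1 if score >= threshold else 0 for score in scores]


def precision_recall(y_true, y_pred):
    """Return (precision, recall), using 0.0 when a denominator is zero."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical model scores and true labels.
scores = [0.95, 0.80, 0.65, 0.40, 0.30, 0.10]
y_true = [1, 1, 0, 1, 0, 0]

for threshold in (0.25, 0.50, 0.75):
    y_pred = predict_with_threshold(scores, threshold)
    p, r = precision_recall(y_true, y_pred)
    print(f"threshold={threshold:.2f} precision={p:.2f} recall={r:.2f}")
```

On this toy data, raising the threshold from 0.25 to 0.75 increases precision from 0.60 to 1.00 while recall drops from 1.00 to 0.67, which illustrates the usual trade-off between the two.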

ROC and AUC#

An ROC curve (receiver operating characteristic curve) is a graph showing the performance of a classification model at all classification thresholds.

Read: ROC Curve and AUC
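An ROC curve plots the true positive rate against the false positive rate at each threshold, and AUC is the area under that curve. The following is a minimal sketch on hypothetical scores and labels, using one threshold per distinct score and the trapezoidal rule for the area; the function names are illustrative only.

```python
def roc_points(y_true, scores):
    """Return (FPR, TPR) points from (0, 0) to (1, 1), one per distinct score."""
    positives = sum(y_true)
    negatives = len(y_true) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("ROC needs at least one positive and one negative")
    points = [(0.0, 0.0)]  # threshold above every score: nothing predicted positive
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for t, s in zip(y_true, scores) if s >= threshold and t == 1)
        fp = sum(1 for t, s in zip(y_true, scores) if s >= threshold and t == 0)
        points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Area under a piecewise-linear curve via the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Hypothetical model scores and true labels (1 = positive class).
scores = [0.95, 0.80, 0.65, 0.40, 0.30, 0.10]
y_true = [1, 1, 0, 1, 0, 0]

curve = roc_points(y_true, scores)
print(curve)
print(round(auc(curve), 3))  # 0.889
```

The AUC of 8/9 here matches its ranking interpretation: in 8 of the 9 positive/negative pairs, the positive example receives the higher score.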

Check your understanding: ROC and AUC

Unbalanced data#

Finally, we’ll cover a common problem in machine learning tasks: unbalanced data.
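One reason unbalanced data is a problem is that accuracy can look excellent for a model that never detects the rare class. A minimal sketch with a made-up dataset of 10 positives and 990 negatives, and a hypothetical "model" that always predicts the majority class:

```python
# Hypothetical unbalanced dataset: 1% positives, 99% negatives.
y_true = [1] * 10 + [0] * 990

# A trivial baseline that always predicts the majority (negative) class.
y_pred = [0] * len(y_true)

correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
accuracy = correct / len(y_true)

tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
recall = tp / sum(y_true)  # share of actual positives that were found

print(f"accuracy={accuracy:.2f} recall={recall:.2f}")  # accuracy=0.99 recall=0.00
```

The baseline reaches 99% accuracy without identifying a single positive example, which is why precision, recall, and ROC/AUC are usually reported alongside accuracy on unbalanced data.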
