Chapter 14 Workflows
The following content is based on the excellent tidymodels documentation.
In this tutorial, we’ll explore the tidymodels packages recipes and workflows, which are designed to help you preprocess your data before training your model.
Recipes are built as a series of preprocessing steps, such as:
- converting qualitative predictors to indicator variables (also known as dummy variables),
- transforming data to be on a different scale (e.g., taking the logarithm of a variable),
- transforming whole groups of predictors together,
- extracting key features from raw variables (e.g., getting the day of the week out of a date variable),
and so on. If you are familiar with R’s formula interface, much of this may sound like what a formula already does. Recipes can do many of the same things, but they offer a much wider range of possibilities. This tutorial shows how to use recipes for modeling.
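As a rough illustration of the kinds of steps listed above, here is a minimal sketch of a recipe. The data frame `toy` and its columns are hypothetical and invented for this example only; they are not the flight data used later in the tutorial. The step functions (`step_log()`, `step_date()`, `step_rm()`, `step_dummy()`) come from the recipes package.

```r
library(tidymodels)

# Hypothetical toy data, made up purely for illustration
toy <- tibble(
  delay    = c(5, 30, 12, 45, 8, 20),
  carrier  = factor(c("AA", "UA", "AA", "DL", "UA", "DL")),
  distance = c(200, 1500, 350, 2400, 800, 1100),
  date     = as.Date("2013-01-01") + 0:5
)

toy_rec <- recipe(delay ~ ., data = toy) %>%
  # put a numeric predictor on a different scale
  step_log(distance) %>%
  # extract the day of the week from the raw date
  step_date(date, features = "dow") %>%
  # drop the raw date once the feature has been extracted
  step_rm(date) %>%
  # convert qualitative predictors to indicator (dummy) variables
  step_dummy(all_nominal_predictors())

# Estimate the steps on the data, then apply them to the same data
toy_prepped <- prep(toy_rec, training = toy)
baked <- bake(toy_prepped, new_data = NULL)
baked
```

Note that defining a recipe only records the steps; nothing is computed until `prep()` is called, and `bake()` then applies the prepared steps to a data set.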
To combine all of the steps discussed above with model building, we can use the workflows package. A workflow is an object that can bundle together your preprocessing, modeling, and post-processing requests.
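The idea of bundling preprocessing and modeling can be sketched as follows. This is only a small example under stated assumptions: the data frame `toy` is hypothetical, and a linear regression with the `"lm"` engine is chosen here simply because it is quick to fit, not because the tutorial prescribes it.

```r
library(tidymodels)

# Hypothetical toy data, made up purely for illustration
toy <- tibble(
  delay    = c(5, 30, 12, 45, 8, 20),
  carrier  = factor(c("AA", "UA", "AA", "DL", "UA", "DL")),
  distance = c(200, 1500, 350, 2400, 800, 1100)
)

# Preprocessing: a recipe that creates dummy variables
toy_rec <- recipe(delay ~ distance + carrier, data = toy) %>%
  step_dummy(all_nominal_predictors())

# Modeling: a parsnip model specification
lm_mod <- linear_reg() %>%
  set_engine("lm")

# Bundle both into a single workflow object
toy_wflow <- workflow() %>%
  add_model(lm_mod) %>%
  add_recipe(toy_rec)

# One call to fit() prepares the recipe and trains the model
toy_fit <- fit(toy_wflow, data = toy)

# predict() applies the same preprocessing to new data before predicting
preds <- predict(toy_fit, new_data = toy)
preds
```

Because the recipe travels with the model, there is no need to call `prep()` and `bake()` by hand when fitting or predicting.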
To use the code in this tutorial, you will need to install the following packages: nycflights13, skimr, and tidymodels.
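If any of these packages are not yet installed, one possible way to install only the missing ones is sketched below. The variable names `pkgs` and `to_install` are chosen for this example, and the code assumes a CRAN mirror is reachable from your R session.

```r
# Packages needed for this tutorial
pkgs <- c("nycflights13", "skimr", "tidymodels")

# Keep only the packages that cannot be found in the current library
is_installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
to_install <- pkgs[!is_installed]

# Install whatever is missing (does nothing if everything is present)
if (length(to_install) > 0) {
  install.packages(to_install)
}
```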
library(tidymodels)
# for flight data:
library(nycflights13)
# for variable summaries:
library(skimr)