Chapter 6 Model building

6.1 Model specification

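# Specify an ordinary least squares linear regression model
lm_spec <- 
  linear_reg() %>%        # model type: linear regression
  set_engine("lm") %>%    # fit with stats::lm()
  set_mode(mode = "regression")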

6.2 Evaluate models

We can use the resamples to estimate the performance of our two simple models. The fit_resamples() function fits each model on the 5 folds; we store the results as lm_1_res and lm_2_res, respectively.

fit_resamples() fits our model to each resample and evaluates it on the held-out set of that resample. The function only computes performance metrics across the resamples so that we can evaluate our models; the fitted models themselves are not stored.
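
To make this behaviour concrete, here is a minimal, self-contained sketch. It does not use the housing data from this chapter: the built-in mtcars data, the formula mpg ~ wt, the seed, and the object names toy_folds, toy_spec and toy_res are all assumptions chosen only for illustration.

```r
library(tidymodels)

# Toy data and folds (assumption: mtcars stands in for the chapter's data)
set.seed(123)
toy_folds <- vfold_cv(mtcars, v = 5)

# Same kind of specification as lm_spec above
toy_spec <-
  linear_reg() %>%
  set_engine("lm") %>%
  set_mode("regression")

# Fit on each fold and evaluate on the held-out part of that fold
toy_res <-
  toy_spec %>%
  fit_resamples(mpg ~ wt, resamples = toy_folds)

# The result has one row per fold; it holds metrics, not fitted models
names(toy_res)
collect_metrics(toy_res)
```

With default settings, the result should contain the columns splits, id, .metrics and .notes, but no column with the fitted model objects.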

6.3 Fit model 1

lm_1_res <-
  lm_spec %>% 
  fit_resamples(
    median_house_value ~ median_income, 
    resamples = cv_folds
    )

6.4 Fit model 2

lm_2_res <-
  lm_spec %>% 
  fit_resamples(
    median_house_value ~ total_rooms, 
    resamples = cv_folds
    )

6.5 Performance metrics

Now we can collect the performance metrics with collect_metrics():

lm_1_res %>% 
  collect_metrics()
## # A tibble: 2 x 6
##   .metric .estimator      mean     n   std_err .config             
##   <chr>   <chr>          <dbl> <int>     <dbl> <chr>               
## 1 rmse    standard   83548.        5 577.      Preprocessor1_Model1
## 2 rsq     standard       0.479     5   0.00913 Preprocessor1_Model1

lm_2_res %>% 
  collect_metrics()
## # A tibble: 2 x 6
##   .metric .estimator        mean     n   std_err .config             
##   <chr>   <chr>            <dbl> <int>     <dbl> <chr>               
## 1 rmse    standard   114847.         5 650.      Preprocessor1_Model1
## 2 rsq     standard        0.0174     5   0.00286 Preprocessor1_Model1

The metrics show the average performance across all 5 folds. Model 1 performs clearly better than model 2: its mean RMSE is lower (about 83,548 versus 114,847) and its mean R-squared is higher (0.479 versus 0.0174). We therefore choose model 1.
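
When comparing several models, it can help to put their metrics into one table. The following is only a sketch under stated assumptions: it uses the built-in mtcars data instead of the housing data, and the formulas, the seed, and the names cmp_folds, cmp_spec, res_a, res_b and comparison are hypothetical.

```r
library(tidymodels)

# Toy setup (assumption: mtcars replaces the chapter's data)
set.seed(123)
cmp_folds <- vfold_cv(mtcars, v = 5)
cmp_spec  <- linear_reg() %>% set_engine("lm")

# Two simple models, evaluated on the same folds
res_a <- cmp_spec %>% fit_resamples(mpg ~ wt,   resamples = cmp_folds)
res_b <- cmp_spec %>% fit_resamples(mpg ~ qsec, resamples = cmp_folds)

# Label each set of metrics and stack them into one tibble
comparison <-
  bind_rows(
    collect_metrics(res_a) %>% mutate(model = "mpg ~ wt"),
    collect_metrics(res_b) %>% mutate(model = "mpg ~ qsec")
  ) %>%
  select(model, .metric, mean, std_err) %>%
  arrange(.metric, model)

comparison
```

Because both models are evaluated on the same folds, the rows for each metric can be compared directly.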

If we are interested in the results for every fold, we can set the option summarize = FALSE:

lm_1_res %>% 
  collect_metrics(summarize = FALSE)
## # A tibble: 10 x 5
##    id    .metric .estimator .estimate .config             
##    <chr> <chr>   <chr>          <dbl> <chr>               
##  1 Fold1 rmse    standard   83900.    Preprocessor1_Model1
##  2 Fold1 rsq     standard       0.462 Preprocessor1_Model1
##  3 Fold2 rmse    standard   84527.    Preprocessor1_Model1
##  4 Fold2 rsq     standard       0.485 Preprocessor1_Model1
##  5 Fold3 rmse    standard   84928.    Preprocessor1_Model1
##  6 Fold3 rsq     standard       0.456 Preprocessor1_Model1
##  7 Fold4 rmse    standard   82157.    Preprocessor1_Model1
##  8 Fold4 rsq     standard       0.491 Preprocessor1_Model1
##  9 Fold5 rmse    standard   82227.    Preprocessor1_Model1
## 10 Fold5 rsq     standard       0.504 Preprocessor1_Model1
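
The summarized metrics can be reproduced from these per-fold values. The sketch below uses base R only and assumes that the mean is the simple average over the folds and that std_err is the standard deviation divided by the square root of the number of folds. The RMSE values are copied from the rounded output above, so the results agree with the summary only up to rounding; the names rmse_by_fold, n_folds, mean_rmse and se_rmse are chosen for illustration.

```r
# Per-fold RMSE values, copied from the (rounded) output above
rmse_by_fold <- c(Fold1 = 83900, Fold2 = 84527, Fold3 = 84928,
                  Fold4 = 82157, Fold5 = 82227)

n_folds   <- length(rmse_by_fold)

# Average performance across folds
mean_rmse <- mean(rmse_by_fold)

# Standard error of that average
se_rmse   <- sd(rmse_by_fold) / sqrt(n_folds)

round(mean_rmse)  # 83548, matching the mean reported by collect_metrics()
round(se_rmse)    # 577, matching the reported std_err
```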