Chapter 6 Model building
6.1 Model specification
lm_spec <-
  linear_reg() %>%
  set_engine("lm") %>%
  set_mode(mode = "regression")
6.2 Evaluate models
We can use the set of resamples to estimate the performance of our two simple models. The fit_resamples() function fits each model on each of the 5 folds, and we store the results as lm_1_res and lm_2_res, respectively.
fit_resamples() fits our model to each resample and evaluates it on the held-out set of that resample. The function is only used to compute performance metrics across a set of resamples in order to evaluate our models; the fitted models are not stored.
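The resampling idea described above can be sketched by hand in base R. This is only a conceptual illustration, not how fit_resamples() is implemented: the toy data frame, the variable names (toy, fold_id, rmse_per_fold) and the simulated relationship are all assumptions made for the example, not part of the housing data.

```r
# Conceptual sketch of 5-fold resampling in base R (not the tidymodels internals).
set.seed(123)

# Hypothetical toy data standing in for the real data set
n <- 100
toy <- data.frame(x = runif(n))
toy$y <- 2 + 3 * toy$x + rnorm(n, sd = 0.5)

# Randomly assign each row to one of k folds
k <- 5
fold_id <- sample(rep(seq_len(k), length.out = n))

# For each fold: fit on the other folds, evaluate on the held-out fold
rmse_per_fold <- sapply(seq_len(k), function(i) {
  analysis   <- toy[fold_id != i, ]   # rows used for fitting
  assessment <- toy[fold_id == i, ]   # held-out rows used for evaluation
  fit  <- lm(y ~ x, data = analysis)
  pred <- predict(fit, newdata = assessment)
  sqrt(mean((assessment$y - pred)^2)) # only the metric is kept, not the fit
})

rmse_per_fold        # one RMSE per fold
mean(rmse_per_fold)  # average performance across folds
```

As in the description above, each fitted model is discarded once its fold has been scored; only the per-fold metrics survive.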
6.3 Fit model 1
lm_1_res <-
  lm_spec %>%
  fit_resamples(
    median_house_value ~ median_income,
    resamples = cv_folds
  )
6.4 Fit model 2
lm_2_res <-
  lm_spec %>%
  fit_resamples(
    median_house_value ~ total_rooms,
    resamples = cv_folds
  )
6.5 Performance metrics
Now we can collect the performance metrics with collect_metrics():
lm_1_res %>%
  collect_metrics()
## # A tibble: 2 x 6
## .metric .estimator mean n std_err .config
## <chr> <chr> <dbl> <int> <dbl> <chr>
## 1 rmse standard 83548. 5 577. Preprocessor1_Model1
## 2 rsq standard 0.479 5 0.00913 Preprocessor1_Model1
lm_2_res %>%
  collect_metrics()
## # A tibble: 2 x 6
## .metric .estimator mean n std_err .config
## <chr> <chr> <dbl> <int> <dbl> <chr>
## 1 rmse standard 114847. 5 650. Preprocessor1_Model1
## 2 rsq standard 0.0174 5 0.00286 Preprocessor1_Model1
The metrics show the average performance across all folds. Model 1 performs clearly better than model 2: its RMSE is lower (83548 versus 114847) and its R² is higher (0.479 versus 0.0174), which is why we choose this model.
Note that if we are interested in the results of every split, we can use the option summarize = FALSE:
lm_1_res %>%
  collect_metrics(summarize = FALSE)
## # A tibble: 10 x 5
## id .metric .estimator .estimate .config
## <chr> <chr> <chr> <dbl> <chr>
## 1 Fold1 rmse standard 83900. Preprocessor1_Model1
## 2 Fold1 rsq standard 0.462 Preprocessor1_Model1
## 3 Fold2 rmse standard 84527. Preprocessor1_Model1
## 4 Fold2 rsq standard 0.485 Preprocessor1_Model1
## 5 Fold3 rmse standard 84928. Preprocessor1_Model1
## 6 Fold3 rsq standard 0.456 Preprocessor1_Model1
## 7 Fold4 rmse standard 82157. Preprocessor1_Model1
## 8 Fold4 rsq standard 0.491 Preprocessor1_Model1
## 9 Fold5 rmse standard 82227. Preprocessor1_Model1
## 10 Fold5 rsq standard 0.504 Preprocessor1_Model1
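As a rough check on how the summarized metrics relate to the per-fold values, the mean and standard error of the RMSE can be recomputed from the five fold estimates printed above. The vector name rmse_folds is introduced here only for illustration, and the values are copied from the rounded console output, so the results should agree with the summary table only approximately. The standard error is taken here as the standard deviation divided by the square root of the number of folds, which is an assumption about how the std_err column is computed.

```r
# Per-fold RMSE values copied (rounded) from the summarize = FALSE output above
rmse_folds <- c(83900, 84527, 84928, 82157, 82227)

# Average RMSE across the 5 folds; roughly 83548, as in the summary table
mean_rmse <- mean(rmse_folds)
mean_rmse

# Standard error of the mean: sd / sqrt(n); roughly 577
se_rmse <- sd(rmse_folds) / sqrt(length(rmse_folds))
se_rmse
```

Both values are consistent with the mean and std_err columns reported by collect_metrics() for model 1.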