15.5 Extending Machine Learning
How do I know which variables matter? One answer, variable importance, is sketched after the fitted model below. Here are examples of more complex machine learning methods:
- Random forests
- Gradient boosting
- LASSO
- SVM
Tradeoffs: machine learning as a “black box” (see here).
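As a point of comparison for the “black box” concern, here is a minimal sketch, assuming the don donation data used in the example below (the object name interp_fit is just a placeholder): a plain logistic regression gives coefficients you can read directly, which is the kind of interpretability the more flexible methods in this section give up.
## Not part of the main example: a simple, interpretable baseline model.
## Coefficient signs and sizes can be read straight from the summary.
interp_fit <- glm(donation ~ NetWorth + same_state, data = don, family = binomial)
summary(interp_fit)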
Here is a rough example using caret with one of these methods (gradient boosting) along with cross-validation. Be sure to look at the documentation before using this in your own work. The performance metrics (accuracy and Kappa) are described here.
library(caret)

## Establish type of training
fitControl <- trainControl(## 5-fold CV
                           method = "cv",
                           number = 5)

## Train model. Note the . means include all variables. Let's subset first
library(tidyverse)
don2 <- don %>% dplyr::select(donation, Edsum, same_state, sameparty, NetWorth, peragsen)

## Recode the outcome as a factor so caret treats this as classification
don2$donation <- as.factor(ifelse(don2$donation == 1, "Donated", "Not Donated"))

## Drop rows with missing values
don2 <- na.omit(don2)

mod_fit <- train(donation ~ ., data = don2,
                 method = "gbm",
                 verbose = FALSE,
                 trControl = fitControl)
mod_fit
Stochastic Gradient Boosting
52888 samples
5 predictor
2 classes: 'Donated', 'Not Donated'
No pre-processing
Resampling: Cross-Validated (5 fold)
Summary of sample sizes: 42310, 42311, 42310, 42311, 42310
Resampling results across tuning parameters:
interaction.depth n.trees Accuracy Kappa
1 50 0.9640561 0.00000000
1 100 0.9641506 0.06508933
1 150 0.9640750 0.10469663
2 50 0.9643208 0.07823384
2 100 0.9638292 0.08893512
2 150 0.9639049 0.08536604
3 50 0.9641318 0.07915004
3 100 0.9639238 0.07251411
3 150 0.9641128 0.07076335
Tuning parameter 'shrinkage' was held constant at a value of 0.1
Tuning parameter 'n.minobsinnode' was held constant at a value of 10
Accuracy was used to select the optimal model using the largest value.
The final values used for the model were n.trees = 50, interaction.depth =
2, shrinkage = 0.1 and n.minobsinnode = 10.
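To get at the opening question of which variables matter, one option is caret's varImp() function on the fitted model. A minimal sketch, assuming the mod_fit object trained above:
## Relative importance of each predictor in the boosted model.
## Importance describes the fitted model, not a causal effect.
varImp(mod_fit)
plot(varImp(mod_fit))
For gbm models this reports relative influence; other methods use different importance measures, so check the caret documentation for the method you use.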
There are countless other ways to use machine learning. See here.
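The other methods listed at the start of this section fit into the same workflow by changing train()'s method argument. A rough sketch, assuming the don2 data and fitControl object from the example above; the object names mod_rf, mod_lasso, and mod_svm are placeholders, and each method needs its backing package (e.g., randomForest, glmnet, kernlab) installed.
## Random forest
mod_rf <- train(donation ~ ., data = don2,
                method = "rf",
                trControl = fitControl)

## LASSO (glmnet with alpha = 1 gives the lasso penalty)
mod_lasso <- train(donation ~ ., data = don2,
                   method = "glmnet",
                   tuneGrid = expand.grid(alpha = 1,
                                          lambda = seq(0.001, 0.1, length.out = 10)),
                   trControl = fitControl)

## Support vector machine with a radial kernel
mod_svm <- train(donation ~ ., data = don2,
                 method = "svmRadial",
                 trControl = fitControl)
As before, check caret's list of available models and the documentation for each method's tuning parameters before relying on these in your own work.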