15.3 Cross-Validation
Cross-validation is an approach for addressing overfitting:
- Take data for which you know the answer – we call this “training data”.
- Randomly hold out a portion of the training data; this held-out portion becomes our “test” data.
- Develop a model based on the remaining training data.
- Test the accuracy of the model on the test data (out-of-sample data that was not used to train the model).
- Repeat the process, holding out different portions of the data.
Goal: See how well our model will generalize to new data (data the model hasn’t seen).
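As a concrete illustration, here is a minimal hold-out validation sketch, assuming a scikit-learn-style workflow in Python; the toy data `X`, `y` and the `LinearRegression` model are placeholders, not part of the original notes.

```python
# Minimal hold-out validation sketch; X, y and the model are toy placeholders.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))                     # toy "training data" features
y = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=0.1, size=100)

# Randomly hold out 25% of the data as the test set.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

model = LinearRegression().fit(X_train, y_train)  # develop the model on training data
preds = model.predict(X_test)                     # predict the out-of-sample data
print("hold-out MSE:", mean_squared_error(y_test, preds))
```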
15.3.1 k-fold cross-validation
Divide your data into \(k\) folds (how many depends on how much data you have; \(k = 5\) or \(k = 10\) are common choices).
- Fit your model to \(k-1\) of the folds.
- See how well your model predicts the data in the held-out \(k\)th fold.
- Repeat, leaving out a different fold each time, so every fold serves once as the test set (see the sketch after this list); averaging the \(k\) error estimates gives the cross-validated error.
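A hedged sketch of \(k\)-fold cross-validation, again assuming scikit-learn in Python; `KFold`, the toy data, and the model are illustrative assumptions.

```python
# k-fold cross-validation sketch; KFold, the data, and the model are illustrative.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import KFold

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=0.1, size=100)

k = 5
fold_mse = []
for train_idx, test_idx in KFold(n_splits=k, shuffle=True, random_state=0).split(X):
    model = LinearRegression().fit(X[train_idx], y[train_idx])  # fit to k-1 folds
    preds = model.predict(X[test_idx])                          # predict the kth fold
    fold_mse.append(mean_squared_error(y[test_idx], preds))

print("per-fold MSE:", fold_mse)
print("mean CV MSE:", np.mean(fold_mse))
```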
15.3.2 Leave-one-out cross-validation
Best suited to smaller datasets, since the model must be refit once per observation.
- Fit your model to all but one observation in your data.
- See how well your model predicts the left-out observation.
- Repeat until every observation has been left out exactly once (see the sketch after this list); the average prediction error is the leave-one-out estimate.
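A leave-one-out sketch in the same assumed scikit-learn style; `LeaveOneOut` and the toy data are illustrative, not the notes' own example.

```python
# Leave-one-out sketch; fits the model n times, leaving out one row per fit.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import LeaveOneOut

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 2))                      # small dataset, where LOOCV shines
y = X @ np.array([2.0, -1.0]) + rng.normal(scale=0.1, size=30)

squared_errors = []
for train_idx, test_idx in LeaveOneOut().split(X):
    model = LinearRegression().fit(X[train_idx], y[train_idx])
    pred = model.predict(X[test_idx])             # predict the single left-out row
    squared_errors.append((y[test_idx][0] - pred[0]) ** 2)

print("LOOCV MSE:", np.mean(squared_errors))
```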