8.5 Step 3: Iterate and Compare Models

When building predictive models, researchers often want to minimize the root mean squared error (RMSE), that is, the magnitude of the typical prediction error (the distance between the actual value of our outcome and the predicted value).

Example: Let’s compare the RMSE from two different models:

## Predicting Runs Scored with OBP
fit <- lm(RS ~ OBP, data = baseball)
sigma(fit)
## [1] 39.82189
## Predicting Runs Scored with Batting Average
fit2 <- lm(RS ~ BA, data = baseball)
sigma(fit2)
## [1] 51.48172

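The Oakland A’s noticed that OBP was a more precise predictor of runs scored than BA, and RMSE gives us one way to assess this: the OBP model's typical error is about 40 runs, compared with about 51 runs for the BA model.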
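To see what sigma() is computing, it can help to rebuild the number from the residuals by hand. The sketch below is only illustrative: it uses R's built-in mtcars data as a stand-in for the baseball data, and the names fit_demo, rmse_plain, and rmse_df are made up for this example. For an lm fit, sigma() returns the residual standard error, which divides the sum of squared residuals by n - p (observations minus estimated coefficients) rather than by n, so it is slightly larger than a plain RMSE but tells the same story when comparing models.

```r
# Illustrative only: built-in mtcars data, not the baseball data.
fit_demo <- lm(mpg ~ wt, data = mtcars)

res <- residuals(fit_demo)     # actual value minus predicted value
n   <- length(res)             # number of observations
p   <- length(coef(fit_demo))  # estimated coefficients (intercept + slope)

rmse_plain <- sqrt(mean(res^2))           # divides by n
rmse_df    <- sqrt(sum(res^2) / (n - p))  # divides by n - p

rmse_plain
rmse_df
sigma(fit_demo)  # same value as rmse_df
```

With a reasonably large sample the two versions are close, so either can be used to rank models as long as the same one is used throughout.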

8.5.1 Regression with Multiple Predictors

You can also add more than one predictor to a regression by joining the predictors with the + sign in the model formula.

## Predicting Runs Scored with OBP and Slugging Percentage
fit3 <- lm(RS ~ OBP + SLG, data = baseball)
sigma(fit3)
## [1] 25.12196

Notice that the RMSE dropped again, from about 39.8 with OBP alone to about 25.1 with OBP and SLG, so the typical prediction error is smaller.
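When iterating over several candidate models, it can be convenient to fit them in one pass and line up their errors side by side. The following is a minimal sketch of that idea, again assuming the built-in mtcars data in place of the baseball data; the list name formulas and the model labels are hypothetical.

```r
# Illustrative only: mtcars stands in for the baseball data.
formulas <- list(
  wt_only   = mpg ~ wt,
  hp_only   = mpg ~ hp,
  wt_and_hp = mpg ~ wt + hp   # two predictors joined with +
)

# Fit each model and collect its residual standard error from sigma().
errors <- sapply(formulas, function(f) sigma(lm(f, data = mtcars)))

sort(errors)  # smallest typical error first
```

The same pattern should carry over to the baseball models by swapping in formulas such as RS ~ OBP, RS ~ BA, and RS ~ OBP + SLG with data = baseball.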