4.1 Introducing OLS Regression

Regression describes how one variable depends on one or more other variables. Ordinary Least Squares (OLS) regression is a linear model with the matrix representation:

\(Y = \alpha + X\beta + \epsilon\)

Given values of the variables in \(X\), the model predicts the average value of an outcome variable \(Y\). For example, if \(Y\) is a measure of how wealthy a country is, \(X\) might contain measures of the country’s natural resources and/or features of its institutions (things that we think might contribute to how wealthy a country is). In this equation:

  • \(Y\) is the outcome variable (\(n \times 1\)).1
  • \(\alpha\) is a parameter representing the intercept,
  • \(\beta\) is a parameter representing the slope/marginal effect (\(k \times 1\)), and
  • \(\epsilon\) is the error term (\(n \times 1\)).
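
To make these pieces and their dimensions concrete, here is a minimal simulation sketch in Python (using NumPy; the sample size, variable values, and “true” parameter values are all made up for illustration) that generates data according to \(Y = \alpha + X\beta + \epsilon\):

```python
import numpy as np

rng = np.random.default_rng(42)

n, k = 500, 2                            # n observations, k explanatory variables
alpha = 1.0                              # "true" intercept (made up for illustration)
beta = np.array([2.0, -0.5])             # "true" slopes, one per column of X (k x 1)

X = rng.normal(size=(n, k))              # explanatory variables (n x k)
epsilon = rng.normal(size=n)             # error term (n x 1)

Y = alpha + X @ beta + epsilon           # outcome variable (n x 1)
```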

In OLS, we estimate a line of best fit to predict \(\hat{Y}\) values for different values of \(X\):

  • \(\hat{Y} = \hat{\alpha} + X\hat{\beta}\).
  • When you see a “hat” (as in \(\hat{\beta}\)) on top of a letter, that means it is an estimate of a parameter.
  • As we will see in the next section, in multiple regression this equation is sometimes written as just \(\hat{Y} = X\hat{\beta}\), where \(X\) is a matrix that includes several variables and \(\hat{\beta}\) is a vector of coefficients, one of which represents the intercept \(\hat{\alpha}\).
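
Continuing the illustrative simulation above, one common computational trick is to add a column of ones to \(X\) so that the intercept is estimated as just another coefficient, exactly as in the \(\hat{Y} = X\hat{\beta}\) shorthand. This sketch uses NumPy’s least-squares solver; it is only one of several ways you could compute the estimates:

```python
# Add a column of ones so the intercept is estimated as the first coefficient,
# matching the Y-hat = X beta-hat shorthand described above
X_design = np.column_stack([np.ones(n), X])        # (n x (k + 1))

# Solve the least-squares problem: minimize ||Y - X_design @ coef||^2
coef, *_ = np.linalg.lstsq(X_design, Y, rcond=None)

alpha_hat, beta_hat = coef[0], coef[1:]
Y_hat = X_design @ coef                            # fitted values, Y-hat

print(alpha_hat, beta_hat)   # close to the simulated alpha = 1.0, beta = (2.0, -0.5)
```

Here `coef[0]` plays the role of \(\hat{\alpha}\) and the remaining entries are the slope estimates \(\hat{\beta}\).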

We interpret linear regression coefficients as describing how a dependent variable is expected to change when a particular independent variable changes by a certain amount. Specifically:

  • “Associated with each one-unit increase in a variable \(x_1\), there is an estimated \(\hat{\beta}_1\) increase in \(y\), on average.”
  • If we have more than one explanatory variable (i.e., a multiple regression), we add the phrase “controlling for / holding constant other observed factors included in the model.”

We can think of the interpretation of a coefficient in multiple regression using an analogy to a set of light switches:

We ask: How much does the light in the room change when we flip one switch, while holding constant the position of all the other switches?
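
Continuing the same illustrative sketch, we can check this “one switch at a time” interpretation directly: increasing one explanatory variable by one unit while holding the others fixed changes the prediction by exactly that variable’s estimated coefficient.

```python
# Take one observation and increase x_1 by one unit, leaving everything else fixed
# (column 0 of X_design is the intercept column of ones; column 1 is x_1)
x_original = X_design[0].copy()
x_plus_one = x_original.copy()
x_plus_one[1] += 1.0

# The change in the prediction equals the estimated coefficient on x_1
change_in_prediction = x_plus_one @ coef - x_original @ coef
print(change_in_prediction, beta_hat[0])   # the two numbers match
```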

This would be a good place to review the Wheelan chapter and Gelman and Hill sections 3.1 and 3.2 to reinforce what a regression is and how to interpret regression results.


  1. Recall this notation means rows by columns: \(Y\) is a vector of length \(n\) (the number of observations), and since there is only one outcome measure, it has one column.↩︎