6.2 R code for fitting logistic regression
We can fit logistic regressions in R through glm()
. Let’s build on the ANES example from section 5.3 and analyze a dichotomized measure of participation where 1=participated in at least some form and 0=did not participate.
<- read.csv("https://raw.githubusercontent.com/ktmccabe/teachingdata/main/anesdems.csv")
anes $partbinary <- ifelse(anes$participation > 0, 1, 0) anes
We can then fit using glm
where family = binomial(link="logit")
<- glm(partbinary ~ female + edu + age + sexism, data=anes,
out.logit family = binomial(link="logit"))
The summary output includes the logit coefficients, standard errors, z-scores, and p-values.
summary(out.logit)
Call:
glm(formula = partbinary ~ female + edu + age + sexism, family = binomial(link = "logit"),
data = anes)
Deviance Residuals:
Min 1Q Median 3Q Max
-2.5668 0.3328 0.4475 0.6287 1.2936
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 1.016734 0.334656 3.038 0.00238 **
female -0.382087 0.151516 -2.522 0.01168 *
edu 0.321190 0.050945 6.305 2.89e-10 ***
age 0.008682 0.004046 2.146 0.03188 *
sexism -1.593694 0.336373 -4.738 2.16e-06 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 1361.5 on 1584 degrees of freedom
Residual deviance: 1252.9 on 1580 degrees of freedom
(355 observations deleted due to missingness)
AIC: 1262.9
Number of Fisher Scoring iterations: 5
6.2.1 Writing down the regression model
In the articles you write, you will describe the methods you use in detail, including the variables in the model and the type of regression (e.g., logistic regression). Sometimes you may want to go a step further and be very explicit about the model that you ran. We’ve already seen the regression equations for linear models. For the GLMs, they will look very similar, but we need to make the link/response function an explicit part of the equation.
For example, for logistic regression we have a few ways of writing it, including:
- \(\log \frac{\pi_i}{1-\pi_i} = \mathbf{x_i'}\beta\), or alternatively
- \(Pr(Y_i = 1 | \mathbf{x}_i) = logit^{-1}(\mathbf{x}_i'\beta) = \frac{exp(\mathbf{x_i'}\beta)}{(1 + exp(\mathbf{x_i'}\beta)}\)
(You can also write out the individual variable names.) There is a new R package equatiomatic that can also be used to help write the equations from regression models. It’s not perfect, but should get you there for most basic models.
## First time, you need to install one of these
#remotes::install_github("datalorax/equatiomatic")
#install.packages("equatiomatic")
## Each time after, run library
library(equatiomatic)
## Will output in latex code, though see package for details on options
extract_eq(out.logit, wrap = TRUE, terms_per_line = 3)
\[ \begin{aligned} \log\left[ \frac { P( \operatorname{partbinary} = \operatorname{1} ) }{ 1 - P( \operatorname{partbinary} = \operatorname{1} ) } \right] &= \alpha + \beta_{1}(\operatorname{female}) + \beta_{2}(\operatorname{edu})\ + \\ &\quad \beta_{3}(\operatorname{age}) + \beta_{4}(\operatorname{sexism}) \end{aligned} \]