6.2 R code for fitting logistic regression

We can fit logistic regressions in R through glm(). Let’s build on the ANES example from section 5.3 and analyze a dichotomized measure of participation where 1=participated in at least some form and 0=did not participate.

anes <- read.csv("https://raw.githubusercontent.com/ktmccabe/teachingdata/main/anesdems.csv")
anes$partbinary <- ifelse(anes$participation > 0, 1, 0)

We can then fit using glm where family = binomial(link="logit")

out.logit <- glm(partbinary ~ female + edu + age + sexism, data=anes,
                 family = binomial(link="logit"))

The summary output includes the logit coefficients, standard errors, z-scores, and p-values.

summary(out.logit)


Call:
glm(formula = partbinary ~ female + edu + age + sexism, family = binomial(link = "logit"), 
    data = anes)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-2.5668   0.3328   0.4475   0.6287   1.2936  

Coefficients:
             Estimate Std. Error z value Pr(>|z|)    
(Intercept)  1.016734   0.334656   3.038  0.00238 ** 
female      -0.382087   0.151516  -2.522  0.01168 *  
edu          0.321190   0.050945   6.305 2.89e-10 ***
age          0.008682   0.004046   2.146  0.03188 *  
sexism      -1.593694   0.336373  -4.738 2.16e-06 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 1361.5  on 1584  degrees of freedom
Residual deviance: 1252.9  on 1580  degrees of freedom
  (355 observations deleted due to missingness)
AIC: 1262.9

Number of Fisher Scoring iterations: 5

6.2.1 Writing down the regression model

In the articles you write, you will describe the methods you use in detail, including the variables in the model and the type of regression (e.g., logistic regression). Sometimes you may want to go a step further and be very explicit about the model that you ran. We’ve already seen the regression equations for linear models. For the GLMs, they will look very similar, but we need to make the link/response function an explicit part of the equation.

For example, for logistic regression we have a few ways of writing it, including:

\(\log \frac{\pi_i}{1-\pi_i} = \mathbf{x_i'}\beta\), or alternatively
\(Pr(Y_i = 1 | \mathbf{x}_i) = logit^{-1}(\mathbf{x}_i'\beta) = \frac{exp(\mathbf{x_i'}\beta)}{(1 + exp(\mathbf{x_i'}\beta)}\)

(You can also write out the individual variable names.) There is a new R package equatiomatic that can also be used to help write the equations from regression models. It’s not perfect, but should get you there for most basic models.

## First time, you need to install one of these
#remotes::install_github("datalorax/equatiomatic")
#install.packages("equatiomatic")

## Each time after, run library
library(equatiomatic)

## Will output in latex code, though see package for details on options
extract_eq(out.logit, wrap = TRUE, terms_per_line = 3)

\[ \begin{aligned} \log\left[ \frac { P( \operatorname{partbinary} = \operatorname{1} ) }{ 1 - P( \operatorname{partbinary} = \operatorname{1} ) } \right] &= \alpha + \beta_{1}(\operatorname{female}) + \beta_{2}(\operatorname{edu})\ + \\ &\quad \beta_{3}(\operatorname{age}) + \beta_{4}(\operatorname{sexism}) \end{aligned} \]