8.3 Fitting Ordinal Models in R

To fit an ordinal logit or probit model in R, we can use the polr function from the MASS package.

You may need to install this package the first time you use it.

install.packages("MASS")

When we are ready to run the model, we load the package and use the function polr. The first inputs are very similar to those of lm and glm. However, we add a method = argument, which can be “logistic” (the default) or “probit.” We also specify Hess = T to make sure we get the uncertainty estimates with the results.

  • By the way, why would that argument be called Hess?
library(MASS)
fit <- polr(as.factor(dissent) ~ treat, data= pg, Hess = T, method = "probit")
summary(fit)
Call:
polr(formula = as.factor(dissent) ~ treat, data = pg, Hess = T, 
    method = "probit")

Coefficients:
       Value Std. Error t value
treat 0.2645     0.1021   2.589

Intercepts:
    Value   Std. Error t value
1|2 -0.5102  0.0761    -6.7017
2|3 -0.0576  0.0740    -0.7784
3|4  0.2697  0.0742     3.6344

Residual Deviance: 1253.844 
AIC: 1261.844 
(116 observations deleted due to missingness)

To get p-values, we can use the coeftest() function, which becomes available when we load the AER package.

library(AER)
coeftest(fit)

z test of coefficients:

       Estimate Std. Error z value  Pr(>|z|)    
treat  0.264455   0.102143  2.5891 0.0096238 ** 
1|2   -0.510237   0.076135 -6.7017  2.06e-11 ***
2|3   -0.057570   0.073958 -0.7784 0.4363235    
3|4    0.269665   0.074198  3.6344 0.0002786 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Or we can calculate them manually, using the two-sided formula \(2 \times Pr(Z > |z|)\):

round(2*pnorm(abs(summary(fit)$coefficients[,3]), 
              lower.tail = F), digits=6)
   treat      1|2      2|3      3|4 
0.009624 0.000000 0.436323 0.000279 
For details on how to fit this model manually in R using optim, see the code below.
# Grab X and Y, listwise deletion
pgsub <- na.omit(cbind(pg$dissent, pg$treat))
Y <- pgsub[,1]
X <- pgsub[, 2]

## Log likelihood in R for this example
llikor <- function(par, Y, X){
  k <- 1 # number of X columns
  j <- 4 # number of outcome categories (so j - 1 = 3 cutpoints)
  beta <- par[1:k]
  zeta <- par[(k+1):length(par)]
  X <- cbind(X)
  beta <- cbind(beta)
  ## linear predictor
  eta <- X %*% beta
  ## indicator variables
  y1 <- ifelse(Y==1, 1, 0)
  y2 <- ifelse(Y==2, 1, 0)
  y3 <- ifelse(Y==3, 1, 0)
  y4 <- ifelse(Y==4, 1, 0)
  ## likelihood for each category
  l1 <- y1*log(pnorm(zeta[1] - eta))
  l2 <- y2*log(pnorm(zeta[2] - eta) - pnorm(zeta[1] - eta)) 
  l3 <- y3*log(pnorm(zeta[3] - eta) - pnorm(zeta[2] - eta)) 
  l4 <- y4*log(1 - pnorm(zeta[3] - eta))
  ## sum together
  llik <- sum(c(l1, l2, l3, l4))
  return(llik)
}

## Optimize
opt.out <- optim(par=c(.2, 1, 2, 3), fn=llikor, X=X, Y=Y, method="BFGS",
      control=list(fnscale=-1), hessian = T)

## Coefficients
opt.out$par[1]
[1] 0.2644911
coef(fit)
    treat 
0.2644546 
## Cutpoints
opt.out$par[2:4]
[1] -0.51019872 -0.05757855  0.26969063
fit$zeta
       1|2        2|3        3|4 
-0.5102370 -0.0575699  0.2696646 
## Standard errors
sqrt(diag(solve(- opt.out$hessian)))
[1] 0.10214310 0.07613473 0.07395763 0.07419776
sqrt(diag(vcov(fit)))
     treat        1|2        2|3        3|4 
0.10214309 0.07613503 0.07395770 0.07419775 
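
For completeness, we can also recover the z values and two-sided p-values from the manual fit, mirroring the earlier calculation:

## z values and p-values from the manual fit
se <- sqrt(diag(solve(-opt.out$hessian)))
z <- opt.out$par / se
round(2*pnorm(abs(z), lower.tail = F), digits=6)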

8.3.1 Quantities of Interest

In discussing the results of Table 3 from the ordered probit analysis in the paper, the authors note, “A shift of .26 probits implies, for example, that a health group respondent with a 30% chance of strongly agreeing to dissent would move to a 40% chance if assigned to the reconciliation program group. This is a large, but not implausibly large, shift in opinion.”
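
We can check this arithmetic directly: start with a 30% chance, shift by the estimated coefficient on the probit scale, and convert back to a probability.

pnorm(qnorm(0.30) + 0.2645) # roughly 0.40, as in the authors' example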

Similar to the binary case, we can convert our estimates into probabilities, but here we will do it for specific categories. Let’s first estimate the difference in the probability of being in the top category of dissent between the treatment and control conditions.

We will follow the formulas outlined above. For the top category (category 4 here) we have:

  • \(Pr(Y_i = C_4) = 1 - Pr(Y^* \leq \zeta_{4-1} | X_i)\) where our estimate of \(Y^* = X\hat \beta\).

To calculate this we take \(1 -\) pnorm\((\zeta_{4-1} - X\hat \beta)\)

  • Note that we use pnorm because this is a probit model. If it were a logistic model, we would use plogis instead.
  • We are estimating the probability that our observation is in category 4, when setting \(X\) to particular values. In this case, we will estimate the probability when an observation is in the treatment or control group.

Let’s find \(X\)

X <- model.matrix(fit)
head(X)
  (Intercept) treat
1           1     0
2           1     0
3           1     0
4           1     0
5           1     0
6           1     0

Let’s estimate the probability when someone is in the treatment group vs. control group. For this, we will set X to be 1 or 0. Because we only have one coefficient in our model, we actually don’t need the entire matrix. For example, once we set treat to be 1, all observations (rows) are identical to each other:

X[, "treat"] <- 1 # in treatment
head(X)
  (Intercept) treat
1           1     1
2           1     1
3           1     1
4           1     1
5           1     1
6           1     1

So instead of estimating the outcome for every single row and then taking the mean, we could just estimate the outcome for one row, given that all rows are identical. In regressions where you have other covariates, those covariates will likely take different values, making the rows distinct. This is a special case where the code below gives identical results.

Xt <- cbind(1,1) # A 1 for the intercept and 1 for the treatment
Xc <- cbind(1,0) # A 1 for the intercept and 0 for the treatment (being in the control)

By default, R will include an intercept in the model.matrix. We actually do not want this when calculating ordinal regression quantities of interest because the cutpoints serve the purpose of the intercept. So we will remove it from \(X\), using as.matrix() to make sure R will still treat the object like a matrix.

  • If you instead used the full \(X\) matrix, the only extra step you would need to do is to take the mean() of the probability estimates below (e.g., the mean of pp1.4). See the practice problem solution in 8.5 for an example.
X <- as.matrix(X[, -1])
head(X)
  [,1]
1    1
2    1
3    1
4    1
5    1
6    1
Xt <- as.matrix(Xt[, -1])
Xt
     [,1]
[1,]    1
Xc <- as.matrix(Xc[, -1])
Xc
     [,1]
[1,]    0
## probability of being in top category
cutpoint3 <- fit$zeta[3] # grab the top cutpoint J - 1, 4-1=3
b <- coef(fit) # coefficients

pp1.4 <- 1 - pnorm(cutpoint3 - Xt%*%b) # prob of top category for treatment
pp1.4
          [,1]
[1,] 0.4979215
pp0.4 <- 1 - pnorm(cutpoint3 - Xc%*%b) # prob of top category for control
pp0.4
          [,1]
[1,] 0.3937091
## difference
pp1.4 - pp0.4
          [,1]
[1,] 0.1042124

We find a 10 percentage point difference in the predicted probability of being in the top category of dissent between the treatment and control conditions.

If we were interested in other categories, we could similarly compare probabilities within those. For example, what about category 3?

  • Here we just apply a different formula from 8.2: \(Pr(Y_i = C_3) = Pr(Y^* \leq \zeta_{3} | X_i) - Pr(Y^* \leq \zeta_{2} | X_i)\) where our estimate of \(Y^* = X\hat \beta\).

To calculate this we take pnorm\((\zeta_{3} - X\hat \beta)\) - pnorm\((\zeta_{2} - X\hat \beta)\)

cutpoint3 <- fit$zeta[3]
cutpoint2 <- fit$zeta[2]

pp1.3 <- pnorm(cutpoint3 - Xt%*%b) - pnorm(cutpoint2 - Xt%*%b)
pp1.3
          [,1]
[1,] 0.1283614
pp0.3 <- pnorm(cutpoint3 - Xc%*%b) - pnorm(cutpoint2 - Xc%*%b)
pp0.3
          [,1]
[1,] 0.1292452
pp1.3 - pp0.3
             [,1]
[1,] -0.000883825

These represent the estimated probabilities of being in category 3, or the estimated difference in probability of being in category 3 for the treatment vs. control condition.

What about category 2 or category 1?

Try on your own, and then check the solution below.
  • Category 2: Here we just apply a different formula from 8.2: \(Pr(Y_i = C_2) = Pr(Y^* \leq \zeta_{2} | X_i) - Pr(Y^* \leq \zeta_{1} | X_i)\).
  • Category 1: Here we just apply a different formula from 8.2: \(Pr(Y_i = C_1) = Pr(Y^* \leq \zeta_{1} | X_i)\).
cutpoint1 <- fit$zeta[1]
cutpoint2 <- fit$zeta[2]

## category 2
pp1.2 <- pnorm(cutpoint2 - Xt%*%b) - pnorm(cutpoint1 - Xt%*%b)
pp1.2
          [,1]
[1,] 0.1544561
pp0.2 <- pnorm(cutpoint2 - Xc%*%b) - pnorm(cutpoint1 - Xc%*%b)
pp0.2
          [,1]
[1,] 0.1721029
pp1.2 - pp0.2
            [,1]
[1,] -0.01764679
## category 1 (bottom category)
pp1.1 <- pnorm(cutpoint1 - Xt%*%b)
pp1.1
          [,1]
[1,] 0.2192609
pp0.1 <- pnorm(cutpoint1 - Xc%*%b) 
pp0.1
          [,1]
[1,] 0.3049427
pp1.1 - pp0.1
            [,1]
[1,] -0.08568175

We can use our R shortcuts to get predicted probabilities for all categories at once. Here, to make sure we get probabilities (rather than predicted classes), we use type="probs".

## predict
prs <- predict(fit, newdata = data.frame(treat = c(0,1)), type = "probs")
prs
          1         2         3         4
1 0.3049427 0.1721029 0.1292452 0.3937091
2 0.2192609 0.1544561 0.1283614 0.4979215

The first row shows the probability of being in each category (1 through 4) for the first entry for the treat variable (0), implying that the observation was in the control group. The second row does the same thing for when the treat variable is 1, implying that the observation was in the treatment group. These should match up with our manual calculations.
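
As an aside, predict() can instead return the most likely category for each profile if we use type = "class":

## most likely category for the control and treatment profiles
predict(fit, newdata = data.frame(treat = c(0,1)), type = "class")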

The prediction package will also work to some extent, which is nice when you have covariates you want to hold at observed values. Note that we supply the variable and category labels exactly as they appear in the data.

  • It just so happens that in this case, both treat and our dissent outcomes have numbers for the categories. In other cases, you might have text as the category names, in which case you would enter those text values (e.g., “Somewhat Agree” or “Treatment condition”) as the variable or category labels.
## Prediction
library(prediction)
## specify which category
outp <- prediction(fit, at = list(treat = c(0,1)), 
                   category = "4")
summary(outp)$Prediction
[1] 0.3937091 0.4979215

Finally, the margins package should also work in many cases. You can compare this output with the pp1.4 - pp0.4 calculation from above.

library(margins)
marg.obj <- summary(margins(fit, variables = "treat", change = c(0, 1), 
                            vce="bootstrap", category = "4"))
marg.obj
 factor    AME     SE      z      p  lower  upper
  treat 0.1042 0.0388 2.6877 0.0072 0.0282 0.1802

How can we visualize the results?

  • Given that our independent variable of interest, treat, has just two categories, we could make a plot similar to the barplot at the beginning of the section; a minimal sketch follows after this list.
  • When our independent variable of interest has several categories, things can get a bit messier. We will do an example in a weekly tutorial.
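
For example, a minimal base R sketch using the predicted probabilities prs from above (the labels are our own choices):

## barplot of predicted probabilities by condition
barplot(t(prs), beside = TRUE,
        names.arg = c("Control", "Treatment"),
        legend.text = paste("Category", 1:4),
        ylab = "Predicted probability")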

How can we get uncertainty?

  • Like before, we can use bootstrap or simulation, or ask Zelig or margins to do it for us; a bootstrap sketch follows below.
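
For example, here is a minimal nonparametric bootstrap sketch for the treatment vs. control difference in the probability of the top category (the seed and the number of replicates are arbitrary choices):

## bootstrap the difference in probability of the top category
set.seed(1234)
boot.diff <- replicate(1000, {
  pg.star <- pg[sample(nrow(pg), replace = TRUE), ] # resample rows
  fit.star <- polr(as.factor(dissent) ~ treat, data = pg.star,
                   Hess = T, method = "probit")
  b.star <- coef(fit.star)
  ## with Xt = 1 and Xc = 0, the linear predictors are b.star and 0
  (1 - pnorm(fit.star$zeta[3] - b.star)) - (1 - pnorm(fit.star$zeta[3]))
})
quantile(boot.diff, c(0.025, 0.975)) # percentile confidence interval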

How can we incorporate covariates?

  • Like before, we would just have a matrix of \(X\) covariates instead of a single column for the treatment.
  • Like before, we could also calculate probabilities holding those covariates at observed values or setting them to designated values of interest; a hypothetical sketch follows below.
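
As a hypothetical sketch, suppose the model also included a covariate age (not in our actual data). We would set treat while leaving age at its observed values, and then average the probabilities:

## hypothetical: a model with an extra covariate held at observed values
fit2 <- polr(as.factor(dissent) ~ treat + age, data = pg, Hess = T,
             method = "probit")
X2 <- model.matrix(fit2)[, -1] # drop the intercept column
b2 <- coef(fit2)
X2[, "treat"] <- 1 # set everyone to treatment
mean(1 - pnorm(fit2$zeta[3] - X2 %*% b2)) # avg prob of top category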

How could we fit a logit version?

  • We just switch to method = "logistic", and then we should be careful to also use plogis() in place of pnorm() for the probability calculations; a quick sketch follows below.
  • Note: ordered logits also have the benefit of an odds ratio interpretation when we exponentiate the coefficients with exp(coef(fit)). See this post’s section on “Interpreting the odds ratio” for more information. Again, it is more common in political science to see probabilities than odds ratios, but different disciplines prefer different quantities of interest.
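
A quick sketch of the logit version (the object name fit.l is our own choice):

## ordered logit version of the same model
fit.l <- polr(as.factor(dissent) ~ treat, data = pg, Hess = T,
              method = "logistic")
## difference in prob of the top category, now with plogis
(1 - plogis(fit.l$zeta[3] - coef(fit.l))) - (1 - plogis(fit.l$zeta[3]))
## odds ratio for the treatment effect
exp(coef(fit.l))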