8.3 Fitting Ordinal Models in R
To fit an ordinal logit or probit model in R, we can use the polr function from the MASS package.
You may need to install this package in RStudio the first time you use it.
install.packages("MASS")
When we are ready to run the model, we load the package and use the polr function. The first inputs are very similar to lm and glm. However, we add a method argument, which can be “logistic” (the default) or “probit.” We also specify Hess = T to make sure we get the uncertainty estimates with the results.
- By the way, why would that argument be called Hess? (Hint: it asks R to store the Hessian of the log likelihood, from which the standard errors are computed.)
library(MASS)
fit <- polr(as.factor(dissent) ~ treat, data = pg, Hess = T, method = "probit")
summary(fit)
Call:
polr(formula = as.factor(dissent) ~ treat, data = pg, Hess = T,
method = "probit")
Coefficients:
Value Std. Error t value
treat 0.2645 0.1021 2.589
Intercepts:
Value Std. Error t value
1|2 -0.5102 0.0761 -6.7017
2|3 -0.0576 0.0740 -0.7784
3|4 0.2697 0.0742 3.6344
Residual Deviance: 1253.844
AIC: 1261.844
(116 observations deleted due to missingness)
To get p-values, we can use the AER package.
library(AER)
coeftest(fit)
z test of coefficients:
Estimate Std. Error z value Pr(>|z|)
treat 0.264455 0.102143 2.5891 0.0096238 **
1|2 -0.510237 0.076135 -6.7017 2.06e-11 ***
2|3 -0.057570 0.073958 -0.7784 0.4363235
3|4 0.269665 0.074198 3.6344 0.0002786 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Or we can calculate them manually:
round(2*pnorm(abs(summary(fit)$coefficients[,3]),
lower.tail = F), digits=6)
treat 1|2 2|3 3|4
0.009624 0.000000 0.436323 0.000279
For details on how to fit this manually in R using optim, see below.
# Grab X and Y, listwise deletion
pgsub <- na.omit(cbind(pg$dissent, pg$treat))
Y <- pgsub[, 1]
X <- pgsub[, 2]

## Log likelihood in R for this example
llikor <- function(par, Y, X){
  k <- 1 # X columns
  j <- 4 # categories
  beta <- par[1:k]
  zeta <- par[(k+1):length(par)]
  X <- cbind(X)
  beta <- cbind(beta)
  ## linear predictor
  eta <- X %*% beta
  ## indicator variables
  y1 <- ifelse(Y==1, 1, 0)
  y2 <- ifelse(Y==2, 1, 0)
  y3 <- ifelse(Y==3, 1, 0)
  y4 <- ifelse(Y==4, 1, 0)
  ## likelihood for each category
  l1 <- y1*log(pnorm(zeta[1] - eta))
  l2 <- y2*log(pnorm(zeta[2] - eta) - pnorm(zeta[1] - eta))
  l3 <- y3*log(pnorm(zeta[3] - eta) - pnorm(zeta[2] - eta))
  l4 <- y4*log(1 - pnorm(zeta[3] - eta))
  ## sum together
  llik <- sum(c(l1, l2, l3, l4))
  return(llik)
}
## Optimize
opt.out <- optim(par=c(.2, 1, 2, 3), fn=llikor, X=X, Y=Y, method="BFGS",
                 control=list(fnscale=-1), hessian = T)
## Coefficients
opt.out$par[1]
[1] 0.2644911
coef(fit)
treat
0.2644546
## Cutpoints
opt.out$par[2:4]
[1] -0.51019872 -0.05757855 0.26969063
fit$zeta
1|2 2|3 3|4
-0.5102370 -0.0575699 0.2696646
## Standard errors
sqrt(diag(solve(- opt.out$hessian)))
[1] 0.10214310 0.07613473 0.07395763 0.07419776
sqrt(diag(vcov(fit)))
treat 1|2 2|3 3|4
0.10214309 0.07613503 0.07395770 0.07419775
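As a quick check (a minimal sketch, assuming the opt.out and fit objects from above), the two approaches should also arrive at essentially the same maximized log likelihood:

## compare maximized log likelihoods
opt.out$value # from optim
logLik(fit)   # from polr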
8.3.1 Quantities of Interest
In discussing the results of Table 3 from the ordered probit analysis in the paper, the authors note, “A shift of .26 probits implies, for example, that a health group respondent with a 30% chance of strongly agreeing to dissent would move to a 40% chance if assigned to the reconciliation program group. This is a large, but not implausibly large, shift in opinion.”
Similar to the binary case, we can convert our estimates to probabilities, but here we will do it for specific categories. Let’s first estimate the difference in the probability of being in the top category of dissent between the treatment and control conditions.
We will follow the formulas outlined above. For the top category (category 4 here) we have:
- \(Pr(Y_i = C_4) = 1 - Pr(Y^* \leq \zeta_{4-1} | X_i)\), where our estimate of \(Y^* = X\hat \beta\). To calculate this, we take \(1 -\) pnorm\((\zeta_{4-1} - X\hat \beta)\).
- Note that we use pnorm because this is a probit model. If it were a logistic model, we would use plogis instead.
- We are estimating the probability that our observation is in category 4, when setting \(X\) to particular values. In this case, we will estimate the probability when an observation is in the treatment or control group.
Let’s find \(X\).
X <- model.matrix(fit)
head(X)
(Intercept) treat
1 1 0
2 1 0
3 1 0
4 1 0
5 1 0
6 1 0
Let’s estimate the probability when someone is in the treatment group vs. the control group. For this, we will set X to be 1 or 0. Because we only have one coefficient in our model, we actually don’t need the entire matrix. For example, once we set treat to be 1, all observations (rows) are identical to each other:
"treat"] <- 1 # in treatment
X[, head(X)
(Intercept) treat
1 1 1
2 1 1
3 1 1
4 1 1
5 1 1
6 1 1
So instead of estimating the outcome for every single row and then taking the mean, we could just estimate the outcome for one row, given that all rows are identical. In regressions where you have other covariates, those covariates will likely have different values, making the rows distinct. This is a special case where the below code would give identical results.
Xt <- cbind(1,1) # A 1 for the intercept and 1 for the treatment
Xc <- cbind(1,0) # A 1 for the intercept and 0 for the treatment (being in the control)
By default, R will include an intercept in the model.matrix. We actually do not want this when calculating ordinal regression quantities of interest because the cutpoints serve the purpose of the intercept. So we will remove this column from \(X\), but we will use as.matrix() to make sure R still treats the object like a matrix.
- If you instead used the full \(X\) matrix, the only extra step you would need is to take the mean() of the probability estimates below (e.g., the mean of pp1.4); a sketch of this appears after the pp1.4 - pp0.4 calculation below. See the practice problem solution in 8.5 for another example.
X <- as.matrix(X[, -1])
head(X)
[,1]
1 1
2 1
3 1
4 1
5 1
6 1
Xt <- as.matrix(Xt[, -1])
Xt
[,1]
[1,] 1
Xc <- as.matrix(Xc[, -1])
Xc
[,1]
[1,] 0
## probability of being in top category
cutpoint3 <- fit$zeta[3] # grab the top cutpoint J - 1, 4-1=3
b <- coef(fit) # coefficients

pp1.4 <- 1 - pnorm(cutpoint3 - Xt%*%b) # prob of top category for treatment
pp1.4
[,1]
[1,] 0.4979215
pp0.4 <- 1 - pnorm(cutpoint3 - Xc%*%b) # prob of top category for control
pp0.4
[,1]
[1,] 0.3937091
## difference
pp1.4 - pp0.4
[,1]
[1,] 0.1042124
We recover that there is a 10 percentage point difference in the predicted probability of being in the top category of dissent for those in the treatment vs. control conditions.
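As noted in the bullet above, if you instead kept the full \(X\) matrix, you would take the mean of the row-by-row estimates. Here is a minimal sketch of that equivalent calculation, assuming the fit object from above (Xfull is my own name; it gives the same answer here because all rows are identical):

## full-matrix version of the top-category probability for the treatment group
Xfull <- model.matrix(fit)[, -1, drop = FALSE] # drop the intercept column
Xfull[, "treat"] <- 1 # set everyone to treated
mean(1 - pnorm(fit$zeta[3] - Xfull %*% coef(fit))) # average predicted probability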
If we were interested in other categories, we could similarly compare probabilities within those. For example, what about category 3?
- Here we just apply a different formula from 8.2: \(Pr(Y_i = C_3) = Pr(Y^* \leq \zeta_{3} | X_i) - Pr(Y^* \leq \zeta_{2} | X_i)\) where our estimate of \(Y^* = X\hat \beta\).
To calculate this, we take pnorm\((\zeta_{3} - X\hat \beta)\) - pnorm\((\zeta_{2} - X\hat \beta)\).
cutpoint3 <- fit$zeta[3]
cutpoint2 <- fit$zeta[2]
pp1.3 <- pnorm(cutpoint3 - Xt%*%b) - pnorm(cutpoint2 - Xt%*%b)
pp1.3
[,1]
[1,] 0.1283614
pp0.3 <- pnorm(cutpoint3 - Xc%*%b) - pnorm(cutpoint2 - Xc%*%b)
pp0.3
[,1]
[1,] 0.1292452
pp1.3 - pp0.3
[,1]
[1,] -0.000883825
These represent the estimated probabilities of being in category 3, or the estimated difference in probability of being in category 3 for the treatment vs. control condition.
What about category 2 or category 1?
Try on your own, and then check the solution below.
- Category 2: Here we just apply a different formula from 8.2: \(Pr(Y_i = C_2) = Pr(Y^* \leq \zeta_{2} | X_i) - Pr(Y^* \leq \zeta_{1} | X_i)\).
- Category 1: Here we just apply a different formula from 8.2: \(Pr(Y_i = C_1) = Pr(Y^* \leq \zeta_{1} | X_i)\).
cutpoint1 <- fit$zeta[1]
cutpoint2 <- fit$zeta[2]
## category 2
pp1.2 <- pnorm(cutpoint2 - Xt%*%b) - pnorm(cutpoint1 - Xt%*%b)
pp1.2
[,1]
[1,] 0.1544561
pp0.2 <- pnorm(cutpoint2 - Xc%*%b) - pnorm(cutpoint1 - Xc%*%b)
pp0.2
[,1]
[1,] 0.1721029
pp1.2 - pp0.2
[,1]
[1,] -0.01764679
## category 1 (bottom category)
pp1.1 <- pnorm(cutpoint1 - Xt%*%b)
pp1.1
[,1]
[1,] 0.2192609
pp0.1 <- pnorm(cutpoint1 - Xc%*%b)
pp0.1
[,1]
[1,] 0.3049427
pp1.1 - pp0.1
[,1]
[1,] -0.08568175
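As an optional check, the four category probabilities within each condition should sum to 1; this uses only the quantities computed above:

## probabilities across all categories sum to 1
pp1.1 + pp1.2 + pp1.3 + pp1.4 # treatment group
pp0.1 + pp0.2 + pp0.3 + pp0.4 # control group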
We can use our R shortcuts for all categories. Here, to make sure we are in probabilities, we can use type = "probs".
## predict
prs <- predict(fit, newdata = data.frame(treat = c(0,1)), type = "probs")
prs
1 2 3 4
1 0.3049427 0.1721029 0.1292452 0.3937091
2 0.2192609 0.1544561 0.1283614 0.4979215
The first row shows the probability of being in each category (1 through 4) for the first entry for the treat variable (0), implying that the observation was in the control group. The second row does the same thing for when the treat variable is 1, implying that the observation was in the treatment group. These should match up with our manual calculations.
The prediction package will also work to some extent, which is nice when you have covariates you want to hold at observed values. Note that we put in whatever the names of the levels of our variables are.
- It just so happens that in this case, both treat and our dissent outcome have numbers for the categories. In other cases, you might have text as the category names, in which case you would enter those text values (e.g., “Somewhat Agree” or “Treatment condition”) as the variable or category labels.
## Prediction
library(prediction)
## specify which category
outp <- prediction(fit, at = list(treat = c(0,1)),
                   category = "4")
summary(outp)$Prediction
[1] 0.3937091 0.4979215
Finally, the margins package should also work in many cases. You can compare this output with the pp1.4 - pp0.4 calculation from above.
library(margins)
marg.obj <- summary(margins(fit, variables = "treat", change = c(0, 1),
                            vce="bootstrap", category = "4"))
marg.obj
factor AME SE z p lower upper
treat 0.1042 0.0388 2.6877 0.0072 0.0282 0.1802
How can we visualize the results?
- Given that our independent variable of interest, treat, has just two categories, we could make a plot similar to the barplot at the beginning of the section; a sketch follows this list.
- When our independent variable of interest has several categories, things can get a bit messier. We will do an example in a weekly tutorial.
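Here is a minimal barplot sketch, assuming the prs matrix from the predict() call above (the axis and legend labels are illustrative choices):

## predicted probabilities by category, grouped bars for control vs. treatment
barplot(prs, beside = TRUE,
        legend.text = c("Control", "Treatment"),
        xlab = "Dissent category", ylab = "Predicted probability")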
How can we get uncertainty?
- Like before, we can use bootstrap or simulation, or ask Zelig or margins to do it for us. A simulation sketch follows below.
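For example, here is a minimal simulation sketch for the pp1.4 - pp0.4 difference, assuming the fit object from above (the seed and number of draws are arbitrary choices):

## draw parameters from their estimated sampling distribution
set.seed(1234) # arbitrary seed for reproducibility
sims <- MASS::mvrnorm(1000, mu = c(coef(fit), fit$zeta), Sigma = vcov(fit))
## difference in prob. of top category, treatment vs. control, in each draw
diffs <- (1 - pnorm(sims[, "3|4"] - sims[, "treat"])) - (1 - pnorm(sims[, "3|4"]))
quantile(diffs, c(0.025, 0.975)) # a 95% interval around the point estimate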
How can we incorporate covariates?
- Like before, we would just have a matrix of \(X\) covariates instead of a single column for the treatment.
- Like before, we could also calculate probabilities holding those covariates at observed values or setting them to designated values of interest.
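For instance, here is a sketch with a hypothetical covariate (the age column is an assumption for illustration; it is not in the model above), holding it at observed values:

## hypothetical: probit with a covariate, covariate held at observed values
fit2 <- polr(as.factor(dissent) ~ treat + age, data = pg, Hess = T, method = "probit")
X2 <- as.matrix(model.matrix(fit2)[, -1]) # drop the intercept, as before
X2[, "treat"] <- 1 # set treatment; age stays at observed values
mean(1 - pnorm(fit2$zeta[3] - X2 %*% coef(fit2))) # avg prob of top category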
How could we fit a logit version?
- We just switch to method = "logistic", and then we should be careful to also use plogis() in place of pnorm for the probability calculation; see the sketch below.
- Note: ordered logits also have the benefit of having an odds ratio interpretation when we exponentiate the coefficients with exp(coef(fit)). See this post’s section on “Interpreting the odds ratio” halfway down the page for more information. Again, it is more common in political science to see probabilities than odds ratios, but different disciplines prefer different quantities of interest.
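A minimal sketch of the logit version, assuming the same pg data (fit.logit is my own object name):

## ordered logit: logistic is the default method
fit.logit <- polr(as.factor(dissent) ~ treat, data = pg, Hess = T)
exp(coef(fit.logit)) # odds ratio for treat
## prob of top category under treatment, now using plogis
1 - plogis(fit.logit$zeta[3] - 1 * coef(fit.logit))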