8.2 Likelihood Framework

In the binary case, we wanted to estimate \(Pr(Y_i = 1 | X)\). Our goal in an ordinal model is to estimate the probability that \(Y_i\) falls in a particular category \(C_j\), for \(j = 1, \dots, J\):

\(Pr(Y_i = C_j | X)\)

To do so, we are going to use the cumulative distribution functions to estimate the probability that \(Y_i\) is below a particular cutpoint \(\zeta_j\) or between two cutpoints \(\zeta_j\) and \(\zeta_{j+1}\).

Finding the predicted probability for a given category \(C_j\) can be written as follows: \(Pr(Y_i = C_j|X_i) = \Pr(Y^\ast \le \zeta_{j}) - \Pr(Y^\ast \le \zeta_{j-1})\), with the conventions \(\zeta_0 = -\infty\) and \(\zeta_J = +\infty\) so that the formula covers the bottom and top categories.

We can spell this out more explicitly for each \(j\) category:

  • \(Pr(Y_i = C_{J} | X_i) = 1 - Pr(Y^\ast\leq \zeta_{J - 1} | X_i)\)
  • \(Pr(Y_i = C_{3} | X_i) = Pr(Y^\ast \leq \zeta_{3} | X_i) - Pr(Y^\ast \leq \zeta_{2} | X_i)\)
  • \(Pr(Y_i = C_{2} | X_i) = Pr(Y^\ast \leq \zeta_{2} | X_i) - Pr(Y^\ast \leq \zeta_{1} | X_i)\)
  • \(Pr(Y_i = C_{1} | X_i) = Pr(Y^\ast \leq \zeta_{1} | X_i)\)

Just as in the binary case, we use \(\Phi()\) (pnorm) or \(\frac{e^{X\beta}}{1 + e^{X\beta}}\) (plogis) to convert linear predictors into probabilities. However, in this ordered case, we also have to include the estimate for the cutpoint \(\zeta_j\) when performing this operation. You can think of this as having a separate intercept for each category instead of just one intercept as in the binary case.

For example, in the ordinal logit case, the linear predictor is on the scale of the log cumulative odds (hence the name proportional-odds model). We can write our regression as:

\(\log \frac{P(Y \leq j | X)}{P(Y > j | X)} = (\zeta_j - \eta)\) where \(\eta = x_1\beta_1 + x_2\beta_2 + ... + x_k\beta_k\)

  • To get the cumulative probability, we apply the plogis function: \(\operatorname{logit}^{-1}(\zeta_j - \eta)\)
  • Same for probit, but we use pnorm: \(\operatorname{probit}^{-1}(\zeta_j - \eta) = \Phi(\zeta_j - \eta)\)
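The two link functions above differ only in which CDF is applied to \(\zeta_j - \eta\). A small sketch with hypothetical cutpoints, using `scipy.special.expit` as the analogue of R's `plogis` and `scipy.stats.norm.cdf` as the analogue of `pnorm`:

```python
import numpy as np
from scipy.stats import norm
from scipy.special import expit  # logistic CDF, the analogue of R's plogis

zeta = np.array([-1.0, 0.2, 1.5])  # hypothetical cutpoints
eta = 0.4                          # linear predictor

# Cumulative probabilities P(Y <= j | X) under each link.
p_logit = expit(zeta - eta)        # ordinal logit:  logit^{-1}(zeta_j - eta)
p_probit = norm.cdf(zeta - eta)    # ordinal probit: Phi(zeta_j - eta)

# Both are increasing in j because the cutpoints are increasing.
print(p_logit)
print(p_probit)
```

Either way, each \(\zeta_j\) shifts the same CDF, which is the "separate intercept per category" intuition from above.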

8.2.1 Likelihood

The likelihood of all observations, assuming independence, is:

\(\mathcal L(\beta, \zeta | Y) = \prod_{i=1}^{N} Pr(Y_i = C_j)\), where \(C_j\) is the category observed for observation \(i\). To incorporate all \(J\) categories explicitly, we can write:

\(\mathcal L(\beta, \zeta | Y) = \prod_{i=1}^{N} \prod_{j=1}^{J} \{ \Pr(Y^\ast \le \zeta_{j}) - \Pr(Y^\ast \le \zeta_{j-1})\}^{\mathbf 1(Y_i=C_j)}\) where \(\mathbf 1(Y_i=C_j)\) is an indicator for whether or not a given \(Y_i\) is observed in the \(j\)th category. Raising each term to the indicator (rather than multiplying by it) means that only the observed category contributes to observation \(i\)'s factor; the others enter as 1.

Note that here instead of estimating just \(\beta\), we now also estimate the cutpoints \(\zeta\).

Taking the log turns the products into sums, with the indicator exponent becoming a multiplier:

\[\begin{align*} \ell(\beta, \zeta | Y) &= \sum_{i=1}^{N} \sum_{j=1}^{J}\mathbf 1(Y_i=C_j) \log \{ \Pr(Y^\ast \le \zeta_{j}) - \Pr(Y^\ast \le \zeta_{j-1})\}\\ &= \sum_{i=1}^{N} \sum_{j=1}^{J} \mathbf 1(Y_i=C_j) \log\{\Phi(\zeta_j - \mathbf x_i'\beta) - \Phi(\zeta_{j-1} - \mathbf x_i'\beta)\} \end{align*}\] where the second line writes out the probit case.
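The log likelihood can be sketched directly from the last line. This is an illustrative implementation under hypothetical toy data, with `scipy.stats.norm.cdf` playing the role of \(\Phi\); the function name and data are our own, not from the text. Note that only the observed category's term survives the indicator, so we can index straight into the padded cutpoints instead of looping over \(j\).

```python
import numpy as np
from scipy.stats import norm

def ordered_probit_loglik(beta, zeta, X, y):
    """Log likelihood of an ordered probit model.

    y holds category indices 1..J; zeta has J - 1 increasing cutpoints.
    """
    eta = X @ beta
    # zeta_0 = -inf and zeta_J = +inf cover the bottom and top categories.
    cuts = np.concatenate(([-np.inf], zeta, [np.inf]))
    # Pr(Y_i = C_{y_i}) = Phi(zeta_{y_i} - eta_i) - Phi(zeta_{y_i - 1} - eta_i)
    upper = norm.cdf(cuts[y] - eta)
    lower = norm.cdf(cuts[y - 1] - eta)
    return np.sum(np.log(upper - lower))

# Hypothetical toy data: 5 observations, 2 covariates, 3 categories.
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 2))
y = np.array([1, 3, 2, 2, 1])    # observed categories, 1..J
beta = np.array([0.5, -0.3])
zeta = np.array([-0.5, 0.8])     # must be increasing

print(ordered_probit_loglik(beta, zeta, X, y))
```

In practice this function would be handed to an optimizer to estimate \(\beta\) and \(\zeta\) jointly, subject to the monotonic-cutpoint constraint discussed below.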

In addition to assuming independence of our observations, we assume each category has a nonzero probability of being observed and that the cutpoints are monotonically increasing: \(\zeta_j < \zeta_{j+1}\).