9.1 Overview of Nominal Data
Our goal in these models is generally to predict the probability of being in a particular category \(C_j\).
The model we will use to estimate this is the multinomial logit.
- This is a generalization of binary and ordered logit.
- Coefficients defined relative to baseline outcome category \(J\):
\(\log \frac{\pi_j}{\pi_J} = \eta_j\) where \(\eta_j = \alpha_j + X_1\beta_{1j} + X_2\beta_{2j} + \cdots + X_k\beta_{kj}\)
- Note that \(\pi_j\) (our probability) is now indexed by the outcome category \(j\), with \(J\) denoting the baseline category
- This means we have more than just one set of estimates for \(\hat \beta\), depending on the category comparison we make
- (Recall that in ordinal logit we had to assume that one set of coefficients was sufficient. Here, we have \(J-1\) sets of coefficients.)
- \(\beta_J = 0\) by design for identification (\(J\) represents the baseline category)
- \(\sum_{j=1}^{J} \pi_j = 1\): the probabilities of being in each category, together, must sum to 1
- \(Y_i = C_j\) when \(Y_{ij}^* = \max(Y_{i1}^*, Y_{i2}^*, \ldots, Y_{iJ}^*)\). The outcome belongs to the category with the highest latent value \(Y_{ij}^*\).
- The probability of \(Y_i\) being in a particular category \(j\) (for \(j = 1, \ldots, J-1\)) is:
\[\begin{align*} \pi_{ij} = Pr(Y_i = C_j | x_i) &= \frac{\exp(\mathbf x_i^T\beta_j)}{1 + \sum_{l=1}^{J-1} \exp(\mathbf x_i^T\beta_l)} \end{align*}\]
with the baseline probability \(\pi_{iJ} = \frac{1}{1 + \sum_{l=1}^{J-1} \exp(\mathbf x_i^T\beta_l)}\).
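The mapping from linear predictors to category probabilities above can be sketched numerically. In this hypothetical example (all values made up for illustration), `beta` holds one column of coefficients per non-baseline category, the baseline linear predictor is fixed at zero, and the softmax produces the \(\pi_{ij}\):

```python
import numpy as np

# Hypothetical setup: J = 3 categories, k = 2 predictors, 5 observations.
# beta has J - 1 = 2 columns (one set of coefficients per non-baseline
# category); the baseline category's coefficients are 0 by construction.
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 2))                 # 5 observations, 2 predictors
alpha = np.array([0.5, -0.2])               # intercepts for categories 1, 2
beta = np.array([[1.0, -0.5],
                 [0.3,  0.8]])              # k x (J-1) coefficient matrix

eta = alpha + X @ beta                      # linear predictors, n x (J-1)
# Append eta_J = 0 for the baseline category, then normalize so that
# pi_ij = exp(eta_ij) / sum_l exp(eta_il), matching the formula above.
eta_full = np.column_stack([eta, np.zeros(len(X))])
pi = np.exp(eta_full) / np.exp(eta_full).sum(axis=1, keepdims=True)

print(pi.sum(axis=1))                       # each row sums to 1
```

Note that including the baseline's \(\exp(0) = 1\) in the denominator is exactly the "\(1 +\)" term in the probability formula.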
9.1.1 Multinomial Likelihood
Here, similar to the ordinal case, the likelihood multiplies over all observations, and the log likelihood sums over all observations and all outcome categories, to represent the joint probability:
\(\mathcal L(\beta \mid Y) = \prod_{i=1}^N \prod_{j=1}^{J} \pi_{ij}^{\mathbf 1(Y_i=C_j)}\)
\(\mathcal l(\beta \mid Y) = \sum_{i=1}^N \sum_{j=1}^{J} \mathbf 1(Y_i=C_j){\log \pi_{ij}}\)
As in previous likelihoods, \(\pi\) is a function of \(X\) and \(\beta\), so maximizing this log likelihood yields \(\hat\beta\).
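The log likelihood can be evaluated directly from fitted probabilities: the indicator \(\mathbf 1(Y_i = C_j)\) selects only the log probability of each observation's observed category. A minimal sketch with made-up probabilities and outcomes (three categories):

```python
import numpy as np

# Hypothetical fitted probabilities (n x J) and observed outcomes.
pi = np.array([[0.7, 0.2, 0.1],
               [0.1, 0.6, 0.3],
               [0.2, 0.2, 0.6]])
y = np.array([0, 1, 2])        # observed category index for each observation

# Build the indicator matrix 1(Y_i = C_j), then apply
# l(beta | Y) = sum_i sum_j 1(Y_i = C_j) * log(pi_ij).
indicator = np.zeros_like(pi)
indicator[np.arange(len(y)), y] = 1.0
loglik = np.sum(indicator * np.log(pi))

# Equivalent shortcut: index the observed category's probability directly.
assert np.isclose(loglik, np.log(pi[np.arange(len(y)), y]).sum())
print(loglik)
```

Because the indicator zeroes out every term except the observed category, only one \(\log \pi_{ij}\) per observation contributes to the sum.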