9.1 Overview of Nominal Data
Our goal in these models is generally to predict the probability of being in a particular category \(C_j\).
The model we will use to estimate this is the multinomial logit.
- This is a generalization of binary and ordered logit.
- Coefficients defined relative to baseline outcome category \(J\):
\(\log \frac{\pi_j}{\pi_J} = \eta_j\) where \(\eta_j = \alpha_j + X_1\beta_{1j} + X_2\beta_{2j} + \cdots + X_k\beta_{kj}\)
- Note that \(\pi_j\) (our probability) is now indexed by the outcome category \(j\), with \(J\) denoting the baseline category
- This means we have more than just one set of estimates for \(\hat \beta\), depending on the category comparison we make
- (Recall that in ordinal logit we had to assume that one set of coefficients was sufficient. Here, we have \(J-1\) sets of coefficients.)
- \(\beta_J = 0\) by design for identification (\(J\) represents the baseline category)
- \(\sum_{j=1}^{J} \pi_j = 1\): the probabilities of being in each category, together, must sum to 1
- \(Y_i = C_j\) when \(Y_{ij}^* = \max(Y_{i1}^*, Y_{i2}^*, \ldots, Y_{iJ}^*)\). The outcome belongs to the category with the highest latent value \(Y_{ij}^*\).
- The probability of \(Y_i\) being in a particular category \(j\) (for \(j = 1, \ldots, J-1\)) is:
\[\begin{align*} \pi_{ij} = Pr(Y_i = C_j | x_i) &= \frac{\exp(\mathbf x_i^T\beta_j)}{1 + \sum_{l=1}^{J-1} \exp(\mathbf x_i^T\beta_l)} \end{align*}\]
with the baseline probability \(\pi_{iJ} = \frac{1}{1 + \sum_{l=1}^{J-1} \exp(\mathbf x_i^T\beta_l)}\).
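The mapping from linear predictors to category probabilities above can be sketched numerically. In this hypothetical example (all values made up for illustration), `beta` holds one column of coefficients per non-baseline category, the baseline linear predictor is fixed at zero, and the softmax produces the \(\pi_{ij}\):

```python
import numpy as np

# Hypothetical setup: J = 3 categories, k = 2 predictors, 5 observations.
# beta has J - 1 = 2 columns (one set of coefficients per non-baseline
# category); the baseline category's coefficients are 0 by construction.
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 2))                 # 5 observations, 2 predictors
alpha = np.array([0.5, -0.2])               # intercepts for categories 1, 2
beta = np.array([[1.0, -0.5],
                 [0.3,  0.8]])              # k x (J-1) coefficient matrix

eta = alpha + X @ beta                      # linear predictors, n x (J-1)
# Append eta_J = 0 for the baseline category, then normalize so that
# pi_ij = exp(eta_ij) / sum_l exp(eta_il), matching the formula above.
eta_full = np.column_stack([eta, np.zeros(len(X))])
pi = np.exp(eta_full) / np.exp(eta_full).sum(axis=1, keepdims=True)

print(pi.sum(axis=1))                       # each row sums to 1
```

Note that including the baseline's \(\exp(0) = 1\) in the denominator is exactly the "\(1 +\)" term in the probability formula.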
9.1.1 Multinomial Likelihood
Here, similar to the ordinal case, the likelihood multiplies over all observations, and the log likelihood sums over all observations and all outcome categories, to represent the joint probability:
\(\mathcal L(\beta \mid Y) = \prod_{i=1}^N \prod_{j=1}^{J} \pi_{ij}^{\mathbf 1(Y_i=C_j)}\)
\(\mathcal l(\beta \mid Y) = \sum_{i=1}^N \sum_{j=1}^{J} \mathbf 1(Y_i=C_j){\log \pi_{ij}}\)
As in previous likelihoods, \(\pi\) is a function of \(X\) and \(\beta\), so maximizing this log likelihood yields \(\hat\beta\).
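The log likelihood can be evaluated directly from fitted probabilities: the indicator \(\mathbf 1(Y_i = C_j)\) selects only the log probability of each observation's observed category. A minimal sketch with made-up probabilities and outcomes (three categories):

```python
import numpy as np

# Hypothetical fitted probabilities (n x J) and observed outcomes.
pi = np.array([[0.7, 0.2, 0.1],
               [0.1, 0.6, 0.3],
               [0.2, 0.2, 0.6]])
y = np.array([0, 1, 2])        # observed category index for each observation

# Build the indicator matrix 1(Y_i = C_j), then apply
# l(beta | Y) = sum_i sum_j 1(Y_i = C_j) * log(pi_ij).
indicator = np.zeros_like(pi)
indicator[np.arange(len(y)), y] = 1.0
loglik = np.sum(indicator * np.log(pi))

# Equivalent shortcut: index the observed category's probability directly.
assert np.isclose(loglik, np.log(pi[np.arange(len(y)), y]).sum())
print(loglik)
```

Because the indicator zeroes out every term except the observed category, only one \(\log \pi_{ij}\) per observation contributes to the sum.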