11.4 Tobit Model (aka Censored Regression Model)
Used for outcomes that are “top-coded” or “bottom-coded” (censored) at a threshold value
Example: income is top-coded as “above $250k”. With income measured in thousands of dollars, the observed outcome is
\[Y_i = \begin{cases} Y_i^*, & Y_i^* < 250 \\ \text{above 250}, & Y_i^* \geq 250 \end{cases}\]
We are interested in \(Y_i^*\), actual income, not censored income. Problem: \(Y_i^*\) is unobserved for the part of the sample at or above the threshold.
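To see why top-coding matters, here is a small Python simulation. It is a sketch under stated assumptions: the latent incomes are hypothetical lognormal draws (in $1000s), and top_code is an illustrative helper, not part of any package. Averaging the top-coded values, treating every censored case as exactly 250, understates the latent mean, because every true value above the cap is replaced by the cap itself.

```python
import random
import statistics

CAP = 250.0  # top-code threshold; income measured in $1000s


def top_code(y_star, cap=CAP):
    """Hypothetical helper: return (observed value, censored flag) for one latent income."""
    if y_star < cap:
        return y_star, False
    return cap, True  # we only learn that income is at least `cap`


random.seed(1)
# Hypothetical latent incomes in $1000s (lognormal, purely illustrative)
latent = [random.lognormvariate(4.3, 0.8) for _ in range(10_000)]
observed = [top_code(y) for y in latent]

share_censored = sum(c for _, c in observed) / len(observed)  # True counts as 1
naive_mean = statistics.mean(v for v, _ in observed)  # treats censored cases as 250
true_mean = statistics.mean(latent)
print(f"share censored: {share_censored:.3f}")
print(f"mean of top-coded data: {naive_mean:.1f} vs latent mean: {true_mean:.1f}")
```

The gap between the two means is exactly the information lost to censoring; a Tobit model recovers it by assuming a distribution for the unobserved upper tail.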
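- Example: want to use SAT scores as a measure of aptitude, but scores are capped between 200 and 800, so both tails are censored
- Example: want to measure support for a candidate through donations, but the legal maximum for campaign donations is $5000, so donations are top-coded
- Example: want to measure liking or disliking of candy bars by the number consumed, but consumption is bottom-coded at 0 (mild and strong dislike both show up as zero)
- Note: in the classic Tobit model, censoring happens at zero (from below)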
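In the classic zero-censored case, the model is usually written as a latent linear regression plus an observation rule. The notation here (covariates \(X_i\), coefficients \(\beta\), error variance \(\sigma^2\)) is introduced for illustration, and \(\Phi\) and \(\phi\) denote the standard normal CDF and PDF. Censored observations contribute the probability of falling at or below zero, and uncensored observations contribute the normal density:
\[Y_i^* = X_i\beta + \varepsilon_i, \quad \varepsilon_i \sim N(0, \sigma^2), \qquad Y_i = \begin{cases} Y_i^*, & Y_i^* > 0 \\ 0, & Y_i^* \leq 0 \end{cases}\]
\[L(\beta, \sigma) = \prod_{i:\, Y_i = 0} \left[1 - \Phi\!\left(\frac{X_i\beta}{\sigma}\right)\right] \prod_{i:\, Y_i > 0} \frac{1}{\sigma}\, \phi\!\left(\frac{Y_i - X_i\beta}{\sigma}\right)\]
The first product uses \(P(Y_i^* \leq 0) = P(\varepsilon_i \leq -X_i\beta) = 1 - \Phi(X_i\beta/\sigma)\).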
For elaboration in R, see this UCLA resource and the tobit() function in the AER package.
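As an illustration of what a routine like AER's tobit() estimates (this is not its API or its optimizer), the Python sketch below fits a zero-censored Tobit model by maximizing the log-likelihood with plain fixed-step gradient ascent. Censored observations contribute log P(Y* ≤ 0) and uncensored ones the normal log-density of Y*. The simulated data (one regressor, true intercept 0.5, slope 1, σ = 1), the step size, and the iteration count are all assumptions chosen for illustration.

```python
import math
import random

SQRT2 = math.sqrt(2.0)
LOG_SQRT_2PI = 0.5 * math.log(2.0 * math.pi)


def norm_pdf(z):
    """Standard normal density phi(z)."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def norm_sf(z):
    """Standard normal survival function 1 - Phi(z), computed via erfc."""
    return 0.5 * math.erfc(z / SQRT2)


def tobit_loglik(b0, b1, log_sigma, xs, ys):
    """Average log-likelihood of a Tobit model left-censored at zero."""
    sigma = math.exp(log_sigma)
    total = 0.0
    for x, y in zip(xs, ys):
        mu = b0 + b1 * x
        if y <= 0.0:
            # Censored: we only know Y* <= 0, probability 1 - Phi(mu / sigma)
            total += math.log(norm_sf(mu / sigma))
        else:
            # Uncensored: normal log-density of Y* evaluated at y
            r = (y - mu) / sigma
            total += -log_sigma - LOG_SQRT_2PI - 0.5 * r * r
    return total / len(xs)


def tobit_grad(b0, b1, log_sigma, xs, ys):
    """Analytic gradient of tobit_loglik w.r.t. (b0, b1, log_sigma)."""
    sigma = math.exp(log_sigma)
    g0 = g1 = gs = 0.0
    for x, y in zip(xs, ys):
        mu = b0 + b1 * x
        if y <= 0.0:
            z = mu / sigma
            mills = norm_pdf(z) / norm_sf(z)  # inverse Mills ratio phi(z) / (1 - Phi(z))
            g0 -= mills / sigma
            g1 -= mills * x / sigma
            gs += mills * z
        else:
            r = (y - mu) / sigma
            g0 += r / sigma
            g1 += r * x / sigma
            gs += r * r - 1.0
    n = len(xs)
    return g0 / n, g1 / n, gs / n


def fit_tobit(xs, ys, step=0.3, iters=1000):
    """Maximize the Tobit log-likelihood by fixed-step gradient ascent."""
    b0, b1, log_sigma = 0.0, 0.0, 0.0  # crude starting values
    for _ in range(iters):
        g0, g1, gs = tobit_grad(b0, b1, log_sigma, xs, ys)
        b0 += step * g0
        b1 += step * g1
        log_sigma += step * gs
    return b0, b1, math.exp(log_sigma)


def ols_slope(xs, ys):
    """Slope from a simple OLS regression of ys on xs."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


# Hypothetical data: Y* = 0.5 + 1.0 * x + e, e ~ N(0, 1), observed Y = max(0, Y*)
random.seed(0)
xs = [random.gauss(0.0, 1.0) for _ in range(1000)]
ys = [max(0.0, 0.5 + 1.0 * x + random.gauss(0.0, 1.0)) for x in xs]

b0_hat, b1_hat, sigma_hat = fit_tobit(xs, ys)
ols_b1 = ols_slope(xs, ys)
print(f"Tobit: b0={b0_hat:.3f}, b1={b1_hat:.3f}, sigma={sigma_hat:.3f}")
print(f"OLS slope on censored Y: {ols_b1:.3f}")
```

On data like these, the OLS slope on the censored outcome is attenuated toward zero, while the Tobit slope should land near the true value of 1. In practice you would use a proper optimizer (such as the R function mentioned above) and also compute standard errors.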
11.4.1 Tobit Model Assumptions
- Assume homoskedastic and normally distributed errors in the latent equation for \(Y_i^*\)
- When data are censored at zero (clumping at zero), assume the same underlying stochastic process determines both
  - whether the response is zero or positive
  - the value of a positive response
- Therefore, any variable that increases the probability of a non-zero value must also increase the mean of positive values.
- Should generally be used when the underlying dependent variable could take on negative values, so that observed zeros are censored values of \(Y_i^*\) rather than true zeros
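The same-process restriction above can be checked numerically. Assuming the latent model \(Y_i^* = X_i\beta + \varepsilon_i\) with normal errors (notation assumed here, with \(\sigma = 1\) for illustration), \(P(Y_i > 0) = \Phi(X_i\beta/\sigma)\) and \(E[Y_i \mid Y_i > 0] = X_i\beta + \sigma\,\phi(X_i\beta/\sigma)/\Phi(X_i\beta/\sigma)\). Both depend on the single index \(X_i\beta\), so a covariate that raises one must raise the other. The Python below is a minimal sketch with hypothetical helper names that evaluates both quantities on a grid of index values.

```python
import math


def norm_pdf(z):
    """Standard normal density phi(z)."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def norm_cdf(z):
    """Standard normal CDF Phi(z), computed via erfc."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def tobit_parts(xb, sigma=1.0):
    """Hypothetical helper: P(Y > 0) and E[Y | Y > 0] for a zero-censored Tobit with index xb."""
    z = xb / sigma
    p_positive = norm_cdf(z)
    mean_positive = xb + sigma * norm_pdf(z) / norm_cdf(z)  # truncated-normal mean
    return p_positive, mean_positive


grid = [-2.0, -1.0, 0.0, 1.0, 2.0]
results = [tobit_parts(xb) for xb in grid]
for xb, (p, m) in zip(grid, results):
    print(f"index {xb:+.1f}: P(Y > 0) = {p:.3f}, E[Y | Y > 0] = {m:.3f}")
```

Both columns rise together as the index grows. If covariates in your data plausibly push participation and intensity in opposite directions, the Tobit model cannot represent that, which points toward a two-part or hurdle alternative.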
Alternatives discussed in the count data section are the “two-part” and hurdle models, which are appropriate when the 0 is a “true zero” rather than a censored value.