11.4 Tobit Model (Censored Regression Model)

Deals with outcomes that are censored, i.e. “top-coded” or “bottom-coded” at a threshold value

Example: income (in thousands of dollars) is top-coded as “above $250k”

\[Y_i = \begin{cases} Y_i^*, & Y_i^* < 250 \\ \text{above 250}, & Y_i^* \geq 250 \end{cases}\]

We are interested in $Y_i^*$, actual income, rather than censored income $Y_i$. The problem: $Y_i^*$ is unobserved for part of the sample
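
A small simulation can make the problem concrete. The sketch below, in Python using only the standard library, generates a hypothetical latent income that depends linearly on one covariate, top-codes it at 250, and compares the OLS slope on the latent and on the top-coded outcome. The data-generating values (intercept 50, slope 25, error SD 40) and the helper `ols_slope` are assumptions chosen for illustration, not taken from these notes.

```python
import random


def ols_slope(x, y):
    """Slope from a simple OLS regression of y on x (with intercept)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return sxy / sxx


rng = random.Random(0)  # fixed seed so the run is reproducible
n = 5000
CAP = 250  # top-coding threshold, in thousands of dollars

# Hypothetical data-generating process for latent income Y*
x = [rng.uniform(0, 10) for _ in range(n)]
y_star = [50 + 25 * xi + rng.gauss(0, 40) for xi in x]

# Observed outcome: anything at or above the cap is recorded as the cap
y_obs = [min(v, CAP) for v in y_star]

share_censored = sum(v >= CAP for v in y_star) / n
slope_latent = ols_slope(x, y_star)
slope_obs = ols_slope(x, y_obs)

print(f"share top-coded: {share_censored:.3f}")
print(f"OLS slope on latent Y*: {slope_latent:.2f}")
print(f"OLS slope on top-coded Y: {slope_obs:.2f}")
```

Under these assumed values a sizeable share of the sample is top-coded, and the slope estimated from the top-coded outcome is smaller than the slope from the latent outcome. That attenuation is what motivates modeling the censoring explicitly.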

Example: want to use SAT as a measure of aptitude, but scores are capped between 200 and 800
Example: want to measure support for a candidate using campaign donations, but the legal maximum donation is $5000
Example: want to measure like-dislike of candy bars using candy bars consumed, but consumption is bottom-coded at 0
Note: in the classic Tobit model, censoring happens from below at zero

For elaboration in R, see this UCLA resource and tobit() in the AER package
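
For intuition about what such software maximizes, here is a minimal Python sketch of the log-likelihood for a Tobit model censored from below at zero with a single covariate, assuming normal, homoskedastic errors. The function name `tobit_loglik` and the simulated parameter values are hypothetical, and this is not the AER implementation: censored observations contribute the probability of falling at or below the threshold, and uncensored observations contribute the normal density.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, provides pdf() and cdf()


def tobit_loglik(b0, b1, sigma, x, y, lower=0.0):
    """Log-likelihood of a Tobit model left-censored at `lower`.

    Latent model (assumed): y* = b0 + b1 * x + e, with e ~ N(0, sigma^2).
    Observed outcome: y = max(lower, y*).
    """
    total = 0.0
    for xi, yi in zip(x, y):
        mu = b0 + b1 * xi
        if yi <= lower:
            # Censored: we only know that y* <= lower
            total += math.log(STD_NORMAL.cdf((lower - mu) / sigma))
        else:
            # Uncensored: density of y at its observed value
            z = (yi - mu) / sigma
            total += math.log(STD_NORMAL.pdf(z) / sigma)
    return total


# Simulate data from a hypothetical latent model, then censor at zero
rng = random.Random(1)
n = 2000
x = [rng.uniform(0, 3) for _ in range(n)]
y_star = [-1 + 2 * xi + rng.gauss(0, 2) for xi in x]
y = [max(0.0, v) for v in y_star]

ll_true = tobit_loglik(-1, 2, 2, x, y)  # at the simulated parameters
ll_other = tobit_loglik(0, 1, 2, x, y)  # at an arbitrary alternative
print(f"log-likelihood at simulated parameters: {ll_true:.1f}")
print(f"log-likelihood at alternative parameters: {ll_other:.1f}")
```

A maximum-likelihood fit searches over the coefficients and `sigma` for the values that make this quantity as large as possible. The sketch only evaluates the function at two parameter vectors and leaves out the optimization step.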

Assume homoskedastic and normally distributed errors
When data are censored at zero (clumping at zero), assume the same underlying stochastic process determines
- whether the response is zero or positive
- the value of a positive response

Implication: any variable which increases the probability of a non-zero value must also increase the mean of positive values.
Should generally be used when the latent dependent variable $Y_i^*$ could in principle take on negative values, so that observed zeros reflect censoring
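
The single-process assumption above can be written out for the classic model censored at zero. This is a sketch in standard notation; the symbols $X_i$, $\beta$, $\sigma$, $\varepsilon_i$, $\Phi$ (standard normal CDF) and $\phi$ (standard normal density) are introduced here and do not appear elsewhere in these notes.

\[Y_i^* = X_i'\beta + \varepsilon_i, \quad \varepsilon_i \sim N(0, \sigma^2), \qquad Y_i = \max(0, Y_i^*)\]

\[\Pr(Y_i > 0 \mid X_i) = \Phi\left(\frac{X_i'\beta}{\sigma}\right), \qquad E[Y_i \mid Y_i > 0, X_i] = X_i'\beta + \sigma \, \frac{\phi(X_i'\beta/\sigma)}{\Phi(X_i'\beta/\sigma)}\]

Both expressions are increasing in the same index $X_i'\beta$, so a covariate with a positive coefficient raises the probability of a positive outcome and the mean of the positive outcomes together. The model cannot represent a covariate that moves the two in opposite directions.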

An alternative discussed in the count data section: “two-part” and hurdle models, which are appropriate when the 0 is a “true zero” rather than a censored value