11.4 Tobit Model (Censored Regression Model)

The Tobit model deals with outcomes that are “top-coded” or “bottom-coded” at a threshold value.

Example: income (in thousands of dollars) is top-coded as “above $250k”

\[Y_i = \begin{cases} Y_i^*, & Y_i^* < 250 \\ \text{above 250}, & Y_i^* \geq 250 \end{cases}\]

We are interested in \(Y_i^*\), the actual income, not the censored income \(Y_i\). The problem is that \(Y_i^*\) is unobserved for part of the sample.

  • Example: we want to use SAT scores as a measure of aptitude, but scores are bounded between 200 and 800
  • Example: we want to measure support for a candidate, but the legal maximum for campaign donations is $5000
  • Example: we want to measure like-dislike of candy bars, but the number of candy bars consumed is bottom-coded at 0
  • Note: in the classic Tobit model, censoring happens at zero
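
The classic case noted above can be written compactly. The notation here (\(X_i\), \(\beta\), \(\varepsilon_i\), \(\sigma\)) is introduced for illustration, and the linear index with normal, homoskedastic errors is an assumption of the model rather than something the data guarantee:

\[Y_i^* = X_i\beta + \varepsilon_i, \quad \varepsilon_i \sim N(0, \sigma^2), \qquad Y_i = \max(0, Y_i^*)\]

Under these assumptions, the log-likelihood combines a density term for each uncensored observation with a probability term for each censored one, where \(\phi\) and \(\Phi\) are the standard normal density and distribution function:

\[\ell(\beta, \sigma) = \sum_{i:\, Y_i > 0} \log\left[\frac{1}{\sigma}\,\phi\left(\frac{Y_i - X_i\beta}{\sigma}\right)\right] + \sum_{i:\, Y_i = 0} \log \Phi\left(-\frac{X_i\beta}{\sigma}\right)\]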

For elaboration in R, see this UCLA resource and tobit() in the AER package

11.4.1 Tobit Model Assumptions

  • Assume homoskedastic and normally distributed errors
  • When data are censored at zero (clumping at zero), assume that the same underlying stochastic process determines
    • whether the response is zero or positive
    • as well as the value of a positive response
    • Consequently, any variable that increases the probability of a non-zero value must also increase the mean of positive values.
  • Should generally be used in cases where the underlying (latent) dependent variable could take on negative values, so that the observed zeros reflect censoring
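
A small simulation can make these assumptions concrete. The sketch below is illustrative only: the data-generating values (intercept 0.5, slope 1, unit error standard deviation), the helper names, and the simple compass-search optimizer are all assumptions made here, not something taken from the notes or from any package. It censors a latent outcome at zero, fits OLS to the censored outcome, and then maximizes the Tobit log-likelihood under normal, homoskedastic errors.

```python
import math
import random


def norm_cdf(z):
    """Standard normal CDF via the complementary error function."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def tobit_loglik(theta, x, y):
    """Log-likelihood for censoring from below at zero.

    theta = (intercept, slope, log(sigma)); log(sigma) keeps sigma positive.
    """
    b0, b1, log_sigma = theta
    sigma = math.exp(log_sigma)
    total = 0.0
    for xi, yi in zip(x, y):
        mu = b0 + b1 * xi
        if yi > 0:
            # Uncensored: normal density of the observed value.
            z = (yi - mu) / sigma
            total += -0.5 * math.log(2.0 * math.pi) - log_sigma - 0.5 * z * z
        else:
            # Censored: probability that the latent value is at or below zero.
            total += math.log(max(norm_cdf(-mu / sigma), 1e-300))
    return total


def ols(x, y):
    """Intercept and slope of a one-regressor least-squares fit."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return my - slope * mx, slope


def compass_search(f, start, step=0.5, tol=1e-4):
    """Derivative-free maximizer (a stand-in for a real optimizer)."""
    theta = list(start)
    best = f(theta)
    while step > tol:
        improved = False
        for j in range(len(theta)):
            for delta in (step, -step):
                trial = list(theta)
                trial[j] += delta
                val = f(trial)
                if val > best:
                    theta, best, improved = trial, val, True
        if not improved:
            step /= 2.0  # no coordinate move helped, so refine the grid
    return theta, best


# Hypothetical data-generating process (all values are assumptions).
random.seed(0)
n = 2000
true_b0, true_b1, true_sigma = 0.5, 1.0, 1.0
x = [random.gauss(0.0, 1.0) for _ in range(n)]
y_star = [true_b0 + true_b1 * xi + random.gauss(0.0, true_sigma) for xi in x]
y = [max(0.0, v) for v in y_star]  # observed outcome, censored at zero

share_censored = sum(1 for v in y if v == 0.0) / n
ols_b0, ols_b1 = ols(x, y)  # naive fit that ignores the censoring
theta_hat, _ = compass_search(
    lambda t: tobit_loglik(t, x, y), [ols_b0, ols_b1, 0.0]
)
tobit_b0, tobit_b1 = theta_hat[0], theta_hat[1]
tobit_sigma = math.exp(theta_hat[2])

print("share censored:", round(share_censored, 3))
print("OLS slope on censored outcome:", round(ols_b1, 3))
print("Tobit slope:", round(tobit_b1, 3))
```

With roughly a third of the simulated sample censored, the OLS slope on the censored outcome should come out noticeably smaller than the slope used to generate the data, while the Tobit estimate should land close to it. In practice a packaged routine would replace the hand-rolled optimizer.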

An alternative discussed in the count data section is the “two-part” and hurdle models, which are appropriate when the 0 is a “true zero” rather than a censored value.
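
Why a single process can be restrictive is visible in two quantities implied by the Tobit model. As a sketch, assume a latent outcome \(Y_i^* = X_i\beta + \varepsilon_i\) with \(\varepsilon_i \sim N(0, \sigma^2)\), censored at zero (this notation is introduced here for illustration), and let \(\phi\) and \(\Phi\) be the standard normal density and distribution function:

\[\Pr(Y_i > 0 \mid X_i) = \Phi\left(\frac{X_i\beta}{\sigma}\right), \qquad E[Y_i \mid Y_i > 0, X_i] = X_i\beta + \sigma\,\frac{\phi(X_i\beta/\sigma)}{\Phi(X_i\beta/\sigma)}\]

Both expressions are increasing in \(X_i\beta\), so a covariate with a positive coefficient raises the probability of a positive response and the mean of positive responses at the same time. A two-part or hurdle model relaxes this by letting separate coefficients govern each part.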