14.1 Survival Overview
Survival analysis is also known as event history analysis or duration analysis. With this type of data, we are generally interested in questions such as:
- How long does something last?
- Examples: length of conflict, length of peace agreement, length of congressional career
- What is the expected time to an event?
- And how does this duration differ across subgroups?
- How do covariates influence this duration?
There are two key components to survival data: time (e.g., days, months, years) and the event of interest or “status” (i.e., whether an event has occurred). Canonically, the event is death, but in political science it might be the onset of a conflict, the end of a conflict, the end of a regime, etc.
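A minimal sketch of how these two components are often stored, one row per unit. The object and column names (`conflicts`, `duration_months`, `ended`) are hypothetical, and the values are invented purely for illustration:

```r
# Hypothetical toy data: one row per conflict (values invented for illustration)
conflicts <- data.frame(
  id              = 1:5,
  duration_months = c(12, 30, 7, 48, 22),  # time component
  ended           = c(1, 1, 0, 1, 0)       # status: 1 = event observed, 0 = not observed
)

# Number of observations with and without an observed event
table(conflicts$ended)

# Durations for the observations whose event was observed
conflicts$duration_months[conflicts$ended == 1]
```

Any analysis of duration data needs both columns: the time alone does not tell us whether the event actually occurred.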
14.1.1 Survival and hazard functions
For example, we might call something “survival” analysis in a context where we were interested in the time to death of someone who has just had a particular medical diagnosis.
We will work with two primary functions. The first is the survival function \(S(y)\):
- This gives the probability that the duration of survival (time to event) is longer than some specified time (\(Pr(Y_i > y)\))
Time to failure \(Y_i\) as outcome: \(Y_i \geq 0\)
- \(S(y) = Pr(Y_i > y) = 1 - Pr(Y_i \leq y)\)
- where \(Pr(Y_i \leq y)\) is the CDF
- PDF \(f(y) = - \frac{d}{dy} S(y)\)
- \(S(y)\) is nonincreasing in \(y\); equivalently, the CDF is nondecreasing.
## Example of S(y) according to the Weibull distribution
## Note: It is 1-pweibull()
y <- seq(0, 100, 1)
Fy <- pweibull(y, shape=2, scale=50)
plot(x=y, y=(1-Fy), type="l")
More on the Weibull distribution here.
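The relationship between the PDF and the survival function can be checked numerically. The sketch below reuses the same shape and scale as the Weibull example above (an illustrative choice, not a substantive one) and compares a finite-difference approximation of \(-\frac{d}{dy} S(y)\) with R's built-in density `dweibull()`. The step size `eps` is an assumption chosen for convenience.

```r
# Same illustrative Weibull parameters as in the example above
shape <- 2
scale <- 50
S <- function(y) 1 - pweibull(y, shape = shape, scale = scale)

# Central finite-difference approximation to -dS/dy
y      <- seq(1, 100, 1)
eps    <- 1e-4
neg_dS <- -(S(y + eps) - S(y - eps)) / (2 * eps)

# Largest gap between the approximation and the closed-form density
max_gap <- max(abs(neg_dS - dweibull(y, shape = shape, scale = scale)))
max_gap  # should be very close to 0

# S(y) never increases as y grows
all(diff(S(y)) <= 0)
```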
The second is the hazard function \(h(y)\): given survival up until \(y\), the instantaneous rate at which something fails (i.e., at which the event occurs).
- \(h(y)=\frac{f(y)}{S(y)} =- \frac{d}{dy} \log S(y)\)
- \(S(y) = \exp\left(-\int_0^y h(t)\,dt\right)\)
Note: the hazard rate is a rate, not a probability (it can exceed 1), which makes it difficult to interpret directly. Still, a higher hazard rate means a greater risk of failure at that moment among those who have survived up to \(y\).
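To make the two hazard identities concrete, here is a small sketch that again assumes a Weibull distribution with shape 2 and scale 50 (illustrative parameters only). It computes \(h(y) = f(y)/S(y)\) at a few time points and then recovers \(S(y)\) from the hazard by numerical integration with `integrate()`.

```r
shape <- 2
scale <- 50
y <- c(10, 25, 50, 75)

f <- function(y) dweibull(y, shape = shape, scale = scale)      # PDF
S <- function(y) 1 - pweibull(y, shape = shape, scale = scale)  # survival function
h <- function(y) f(y) / S(y)                                    # hazard function

h(y)  # for this Weibull, the hazard rises with y because shape > 1

# Recover S(y) from the hazard: S(y) = exp(-integral from 0 to y of h(t) dt)
S_from_h <- sapply(y, function(u) exp(-integrate(h, lower = 0, upper = u)$value))
cbind(y, S = S(y), S_from_h)

# A hazard is not a probability: with shape = 2 and scale = 1, h(1) is 2
h_unit <- dweibull(1, shape = 2, scale = 1) / (1 - pweibull(1, shape = 2, scale = 1))
h_unit
```

The two survival columns should agree up to numerical integration error.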
14.1.2 Censoring
Right-censoring is common in survival data:
- E.g., the study period ends before an observation has experienced the event.
- Example: How many years does it take to finish grad school? If we cut off our period of study right now, you might be a censored observation. You are going to finish grad school at some point, but we have not observed that during our observation period.
- We can address this by assumption: given covariates, the hazard rates of those who are censored do not systematically differ from those who are not. Formally, \(Y_i \perp C_i \mid X_i\): the duration \(Y_i\) is independent of censoring status \(C_i\), conditional on covariates \(X_i\).
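A small simulation can illustrate what right-censoring does to the recorded data. This is only a sketch under stated assumptions: the true durations are drawn from a Weibull distribution with made-up parameters, and every unit is censored at the same fixed `cutoff`, so censoring is unrelated to the true duration by construction. The names `y_true`, `cutoff`, `time`, and `status` are hypothetical.

```r
set.seed(1)
n <- 1000

# Hypothetical true durations in years (Weibull chosen purely for illustration)
y_true <- rweibull(n, shape = 2, scale = 6)

# The study stops observing everyone after `cutoff` years
cutoff <- 7

time   <- pmin(y_true, cutoff)          # what we actually record
status <- as.integer(y_true <= cutoff)  # 1 = event observed, 0 = right-censored

mean(status == 0)  # share of censored observations
mean(time)         # naive mean of the recorded times
mean(y_true)       # mean of the true durations (not observable in practice)
```

Because censored observations are recorded at the cutoff rather than at their true duration, the naive mean of `time` cannot exceed the mean of `y_true`. Treating censored times as if they were completed durations therefore understates how long things last.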