3.7 Subsetting data in R
Subsetting Dataframes in R
Suppose we are interested in differences in callbacks among female applicants. One approach to estimating the treatment effect for female applicants only is to subset our data so that it includes only female names.
- To do this, we will assign a new data.frame object that keeps only those rows where sex == "female" and retains all columns
- Below are two approaches for this subsetting, one that uses brackets and one that uses the subset function
## option one
females <- resume[resume$sex == "female", ]
## option two using subset()- preferred
females <- subset(resume, sex == "female")
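To see that the two options select the same rows, here is a minimal sketch on a small made-up data frame. The object toy and its values are hypothetical stand-ins for illustration, not the resume data.

```r
# Hypothetical toy data standing in for the resume data (values are made up)
toy <- data.frame(
  sex  = c("female", "male", "female", "male"),
  race = c("black", "white", "white", "black"),
  call = c(0, 1, 1, 0)
)

# Option one: brackets. The condition goes before the comma (rows);
# leaving the slot after the comma empty keeps all columns.
females_bracket <- toy[toy$sex == "female", ]

# Option two: subset(). Column names can be used without the toy$ prefix.
females_subset <- subset(toy, sex == "female")

# Compare the two results, then count the rows that were kept
all.equal(females_bracket, females_subset)
nrow(females_subset)
```

One practical difference worth knowing: if the column used in the condition contains missing values, the bracket approach returns rows of NA for them, while subset() drops those rows.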
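Now that we have subset the data, estimating the ATE for female applicants only is straightforward.
Recall that the ATE = \(\bar{Y}(treatment) - \bar{Y}(control)\). Here the treatment group is applicants with Black-sounding names and the control group is applicants with white-sounding names.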
ate.females <- mean(females$call[females$race == "black"]) -
  mean(females$call[females$race == "white"])
ate.females
## [1] -0.03264689
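The same difference-in-means calculation can be checked by hand on a tiny example. The sketch below uses a made-up data frame, toy_females, with hypothetical values chosen so the arithmetic is easy to follow; it is not drawn from the resume data.

```r
# Hypothetical toy data: eight female applicants (values are made up)
toy_females <- data.frame(
  race = c("black", "black", "black", "black",
           "white", "white", "white", "white"),
  call = c(0, 0, 1, 0, 1, 0, 1, 0)
)

# Mean callback rate within each group
mean_black <- mean(toy_females$call[toy_females$race == "black"])  # 1/4 = 0.25
mean_white <- mean(toy_females$call[toy_females$race == "white"])  # 2/4 = 0.50

# Difference in means: treatment (black) minus control (white)
ate_toy <- mean_black - mean_white
ate_toy
## [1] -0.25
```

Because call is coded 0/1, each mean is simply the share of applicants in that group who received a callback, so the ATE is a difference in callback rates.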
3.7.1 Getting Booooooooolean
We can make this slightly more complex by adding more criteria. Let’s say we wanted to know the callback rate for just female Black (sounding) names.
- R allows us to use & (and) and | (or) to combine criteria
femaleblack <- subset(resume, sex == "female" & race == "black")
We could now find the callback rate for Black females using the tools from above:
mean(femaleblack$call)
## [1] 0.06627784
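To make the difference between & and | concrete, here is a minimal sketch on a small made-up data frame. The object toy and its values are hypothetical and serve only to illustrate the two operators.

```r
# Hypothetical toy data (values are made up)
toy <- data.frame(
  sex  = c("female", "male", "female", "male"),
  race = c("black", "white", "white", "black"),
  call = c(0, 1, 1, 0)
)

# & keeps rows where BOTH conditions are true (only row 1 here)
both <- subset(toy, sex == "female" & race == "black")
nrow(both)
## [1] 1

# | keeps rows where AT LEAST ONE condition is true (rows 1, 3, and 4 here)
either <- subset(toy, sex == "female" | race == "black")
nrow(either)
## [1] 3
```

Using & narrows the subset as criteria are added, while | widens it, so it is worth checking nrow() after subsetting to confirm the result is the size you expect.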