3.7 Subsetting data in R

Subsetting Dataframes in R

Suppose we are interested in differences in callback rates among female applicants. One approach to estimating the treatment effect for female applicants only is to subset our data to include only female names.

  • To do this, we will assign a new data.frame object that keeps only those rows where sex == "female" and retains all columns
  • Below are two approaches for this subsetting, one that uses brackets and one that uses the subset function
## option one
females <- resume[resume$sex == "female", ]
## option two, using subset() (preferred)
females <- subset(resume, sex == "female")
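
The two options return the same rows when the condition is never missing. They can differ when it is. A minimal sketch, assuming a small made-up data frame called toy (hypothetical values, not the real resume data, which may not contain any missing values):

```r
## toy data standing in for resume (hypothetical values, not the real data)
toy <- data.frame(
  sex  = c("female", "male", NA, "female"),
  race = c("black", "white", "black", "white"),
  call = c(0, 1, 0, 1)
)

## brackets: a row whose condition is NA comes back as an all-NA row
by_brackets <- toy[toy$sex == "female", ]
nrow(by_brackets)
## [1] 3

## subset(): rows where the condition is NA are dropped
by_subset <- subset(toy, sex == "female")
nrow(by_subset)
## [1] 2
```

This is one reason subset() can be the safer default: it keeps only rows where the condition is TRUE.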

Having subset the data, it is simpler to estimate the ATE for female applicants only.

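Recall that the ATE = \(\bar{Y}(\text{treatment}) - \bar{Y}(\text{control})\). Here the treatment group is the resumes with black-sounding names and the control group is the resumes with white-sounding names.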

ate.females <- mean(females$call[females$race == "black"]) -
  mean(females$call[females$race == "white"])
ate.females
## [1] -0.03264689
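
To see the difference-in-means calculation on numbers small enough to check by hand, here is a sketch using a made-up data frame called toy (hypothetical values, not the real resume data). It uses tapply() to compute the mean of call within each race group, which is an alternative to the bracket indexing above.

```r
## toy data (hypothetical values, not the real resume data)
toy <- data.frame(
  race = c("black", "black", "black", "black",
           "white", "white", "white", "white"),
  call = c(0, 0, 0, 1, 0, 1, 0, 1)
)

## callback rate within each group: the mean of a 0/1 variable is a proportion
rates <- tapply(toy$call, toy$race, mean)
rates

## difference in means: treatment (black) minus control (white)
ate.toy <- rates["black"] - rates["white"]
ate.toy
```

In this toy example, 1 of 4 black-sounding names and 2 of 4 white-sounding names receive a callback, so the difference is 0.25 - 0.5 = -0.25.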

3.7.1 Getting Booooooooolean

We can make this slightly more complex by adding more criteria. Let’s say we wanted to know the callback rate for just female applicants with black-sounding names.

  • R allows us to combine conditions using & (and) and | (or)
femaleblack <- subset(resume, sex == "female" & race == "black")

We could now find the callback rate for Black females using the tools from above:

mean(femaleblack$call)
## [1] 0.06627784
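
The difference between & and | is easiest to see on a few rows. A minimal sketch, again assuming a made-up data frame called toy (hypothetical values, not the real resume data):

```r
## toy data with one row per sex/race combination (hypothetical values)
toy <- data.frame(
  sex  = c("female", "female", "male", "male"),
  race = c("black", "white", "black", "white"),
  call = c(1, 0, 0, 1)
)

## & keeps rows where BOTH conditions are TRUE
both <- subset(toy, sex == "female" & race == "black")
nrow(both)
## [1] 1

## | keeps rows where AT LEAST ONE condition is TRUE
either <- subset(toy, sex == "female" | race == "black")
nrow(either)
## [1] 3
```

With &, only the female applicant with a black-sounding name is kept. With |, every row except the male applicant with a white-sounding name is kept.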