3.3 Application: Is there racial discrimination in the labor market?
Marianne Bertrand and Sendhil Mullainathan. 2004. “Are Emily and Greg more employable than Lakisha and Jamal? A field experiment on labor market discrimination.” American Economic Review 94(4): 991-1013.
“We perform a field experiment to measure racial discrimination in the labor market. We respond with fictitious resumes to help-wanted ads in Boston and Chicago newspapers.”
- Recruitment: Construct resumes to send to ads
- Randomization: To manipulate perceptions of race, each resume is randomly assigned one of two kinds of names:
  - Treatment: a very African American sounding name
  - Control: a very White sounding name
- Outcome: Does the resume receive a callback?
- Comparison: Callback rates for African American (sounding) names vs. White (sounding) names (the difference in means between groups)
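To make the comparison concrete, here is a minimal sketch of a difference in means in R. It assumes a tiny, made-up data frame called `toy` (a hypothetical name, and hypothetical values that are not the study's results), with the callback stored as 1 = yes and 0 = no. Because the outcome is 0/1, the mean within each group is that group's callback rate.

```r
# Hypothetical toy data: made-up values for illustration only,
# NOT the results of the Bertrand and Mullainathan study
toy <- data.frame(
  race = c("black", "black", "black", "black",
           "white", "white", "white", "white"),
  call = c(0, 0, 0, 1, 0, 1, 0, 1)
)

# The mean of a 0/1 variable is the proportion of 1s (the callback rate)
rate_black <- mean(toy$call[toy$race == "black"])
rate_white <- mean(toy$call[toy$race == "white"])

# Difference in means between the two groups
diff_in_means <- rate_black - rate_white
diff_in_means
```

With these made-up values, the callback rates are 0.25 and 0.5, so the difference is -0.25. The same three lines should work on the real data once it is loaded, provided the columns have the same names.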
Let’s load the data. Note: When we have variables that are text-based categories, we may want to tell R to treat these “strings” of text as factor variables, a type of variable that represents data as a set of nominal (unordered) or ordinal (ordered) categories. We do this with the stringsAsFactors argument.
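# Option 1: load from a local copy of the file in your working directory
resume <- read.csv("resume.csv", stringsAsFactors = TRUE)
# Option 2: load directly from the course data repository
resume <- read.csv("https://raw.githubusercontent.com/ktmccabe/teachingdata/main/resume.csv",
                   stringsAsFactors = TRUE)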
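To see what the stringsAsFactors argument changes, here is a small sketch that reads the same few rows twice. It assumes a short block of CSV text typed in by hand (three of the rows that appear in the data) and uses the `text` argument of read.csv() in place of a file. The object names `csv_text`, `as_text`, and `as_factor` are just for illustration.

```r
# A few rows typed in by hand so the example runs without the file
csv_text <- "firstname,sex,race,call
Allison,female,white,0
Lakisha,female,black,0
Jay,male,white,0"

# Same data, read two ways
as_text   <- read.csv(text = csv_text, stringsAsFactors = FALSE)
as_factor <- read.csv(text = csv_text, stringsAsFactors = TRUE)

class(as_text$race)     # "character": stored as plain text
class(as_factor$race)   # "factor": stored as categories
levels(as_factor$race)  # "black" "white"
```

Only the text columns are affected. The call column holds numbers either way.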
Variables and Description
- firstname: first name of the fictitious job applicant
- sex: sex of applicant (female or male)
- race: race of applicant (black or white)
- call: whether a callback was made (1 = yes, 0 = no)
The data contain 4870 resumes and 4 variables.
nrow(resume) # number of rows
## [1] 4870
ncol(resume) # number of columns
## [1] 4
dim(resume) # number of rows and columns
## [1] 4870 4
Note: These data look a little different from what we used last week. For example, the sex and race variables contain words, not numbers.
head(resume)
##   firstname    sex  race call
## 1   Allison female white    0
## 2   Kristen female white    0
## 3   Lakisha female black    0
## 4   Latonya female black    0
## 5    Carrie female white    0
## 6       Jay   male white    0
3.3.1 Variable classes
We can check the class of each variable with class(). Notice that we have a new type: the “factor” variable.
class(resume$firstname)
## [1] "factor"
class(resume$sex)
## [1] "factor"
class(resume$race)
## [1] "factor"
class(resume$call)
## [1] "integer"
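Checking one variable at a time gets tedious. As a sketch, sapply() can apply class() to every column at once. To keep the example self-contained, it rebuilds the six rows shown by head(resume) above in a data frame called `resume_head` (a hypothetical name). With the full data loaded, the same call on `resume` should work the same way.

```r
# Rebuild the six rows printed by head(resume) so this runs on its own
resume_head <- data.frame(
  firstname = c("Allison", "Kristen", "Lakisha", "Latonya", "Carrie", "Jay"),
  sex       = c("female", "female", "female", "female", "female", "male"),
  race      = c("white", "white", "black", "black", "white", "white"),
  call      = c(0L, 0L, 0L, 0L, 0L, 0L),
  stringsAsFactors = TRUE
)

# Apply class() to each column; returns one class per variable
classes <- sapply(resume_head, class)
classes

# str() gives a compact overview of classes and first values
str(resume_head)
```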
We have now encountered numeric, character, and factor vectors and/or variables in R. Note: This is simply how R understands them. Sometimes R can get it wrong. For example, if we write:
somenumbers <- c("1", "3", "4")
class(somenumbers)
## [1] "character"
Because we put our numbers in quotation marks, R thinks the values in somenumbers are text. The number “3” might as well be the word “blue” for all R knows. Fortunately, we can easily switch between classes.
somenumbers <- as.numeric(somenumbers)
class(somenumbers)
## [1] "numeric"
Here, we used as.numeric() to convert the character vector into a numeric vector, overwriting the original somenumbers.
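The conversion only works when the text actually looks like a number. As a quick sketch, here is what happens when one value is a word. The vector `mixed` is a made-up example. R normally prints a warning in this case, which the code silences with suppressWarnings() so the result is easier to read.

```r
# One value cannot be interpreted as a number
mixed <- c("1", "3", "blue")

# as.numeric() converts what it can and returns NA for the rest
converted <- suppressWarnings(as.numeric(mixed))
converted         # 1 3 NA

# Count how many values failed to convert
sum(is.na(converted))  # 1
```

It is worth checking for NA values after a conversion like this, since they signal entries R could not read as numbers.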
Rules of Thumb
- Usually, we want character variables to store text (e.g., open-ended survey responses).
- We want numeric variables to store numbers.
- Usually, we want factor variables to store categories.
  - Within R, factor variables assign a number to each category, which is given a label or level in the form of text.
  - Categories might be ordinal or “ordered” (e.g., Very likely, Somewhat likely, Not likely) or
  - Unordered (e.g., “male”, “female”)
  - R won’t know if a factor variable is ordered or unordered. Alas, we have to be smarter than R.
  - R might think you have a character variable when you want it to be a factor or the reverse.
    - That’s when as.factor() and as.character() are useful.
- Always check class() to find out the variable type.
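The rules of thumb above can be sketched with a short example. It assumes a made-up vector of survey responses called `likely` (hypothetical values) and shows the levels R assigns, the numbers stored underneath, and one way to tell R the categories are ordered: the factor() function with its `levels` and `ordered` arguments.

```r
# Made-up survey responses stored as text
likely <- c("Not likely", "Very likely", "Somewhat likely", "Very likely")
class(likely)                # "character"

# Convert text to an (unordered) factor
unordered <- as.factor(likely)
levels(unordered)            # levels are sorted alphabetically by default
as.integer(unordered)        # the number R stores for each category: 1 3 2 3
is.ordered(unordered)        # FALSE: R does not know these are ordered

# Tell R the order explicitly
ordered_likely <- factor(likely,
                         levels = c("Not likely", "Somewhat likely", "Very likely"),
                         ordered = TRUE)
is.ordered(ordered_likely)   # TRUE
ordered_likely[1] < ordered_likely[2]  # TRUE: "Not likely" is below "Very likely"

# Convert back to text when needed
back_to_text <- as.character(unordered)
class(back_to_text)          # "character"
```

In this example the alphabetical default happens to match the intended order. That will not generally be true, so listing the levels explicitly is the safer habit for ordinal categories.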