3.3 Application: Is there racial discrimination in the labor market?

Marianne Bertrand and Sendhil Mullainathan. 2004. “Are Emily and Greg more employable than Lakisha and Jamal? A field experiment on labor market discrimination.”

“We perform a field experiment to measure racial discrimination in the labor market. We respond with fictitious resumes to help-wanted ads in Boston and Chicago newspapers.”

Recruitment: Construct resumes to send to ads
Randomization: To manipulate perception of race, each resume is (randomly) assigned
Treatment: either a very African American sounding name
Control: or a very White sounding name
Outcome: Does the resume receive a callback?
Comparison: Callback rates for African American (sounding) names vs. White (sounding) names (the difference in means between groups)
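To make the comparison concrete before we load the real data, here is a minimal sketch of a difference in means. The data frame toy and its eight rows are made up for illustration and are not the study's data; the variable names race and call simply mirror the ones we will use.

```r
# Hypothetical toy data: eight made-up resumes, NOT the study's data
toy <- data.frame(
  race = c("black", "black", "black", "black",
           "white", "white", "white", "white"),
  call = c(0, 0, 0, 1, 0, 1, 0, 1)
)

# Callback rate in each group: the mean of a 0/1 variable is a proportion
rate_black <- mean(toy$call[toy$race == "black"])  # 0.25
rate_white <- mean(toy$call[toy$race == "white"])  # 0.5

# Difference in means between the two groups
diff_in_means <- rate_black - rate_white
diff_in_means  # -0.25
```

Because call is coded 0/1, each group mean is the share of resumes that received a callback, so the difference in means is a difference in callback rates.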

For a video explainer of the code in this section, see below. The video only discusses the code. Use the notes and lecture discussion for additional context. (On YouTube, you can speed up playback to 1.5x or 2x.)

Let’s load the data. Note: When we have variables that are text-based categories, we may want to tell R to treat these “strings” of text information as factor variables, a particular type of variable that represents data as a set of nominal (unordered) or ordinal (ordered) categories. We do this with the stringsAsFactors argument.

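# Option 1: load a local copy saved in your working directory
resume <- read.csv("resume.csv", stringsAsFactors = TRUE)

# Option 2: load the same file directly from the URL
resume <- read.csv("https://raw.githubusercontent.com/ktmccabe/teachingdata/main/resume.csv",
                   stringsAsFactors = TRUE)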
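To see what stringsAsFactors changes, we can read the same small CSV twice. This is only a sketch: the two-row CSV in csv_text is hypothetical and is passed inline through the text argument of read.csv(), so nothing needs to be downloaded. Note that since R 4.0.0 the default is stringsAsFactors = FALSE, which is why we set the argument explicitly.

```r
# Hypothetical two-row CSV supplied inline through the text argument
csv_text <- "firstname,call\nEmily,1\nJamal,0"

as_factor <- read.csv(text = csv_text, stringsAsFactors = TRUE)
as_text   <- read.csv(text = csv_text, stringsAsFactors = FALSE)

class(as_factor$firstname)  # "factor"
class(as_text$firstname)    # "character"
class(as_factor$call)       # "integer": numeric columns are unaffected
```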

Variables and Description

firstname: first name of the fictitious job applicant
sex: sex of applicant (female or male)
race: race of applicant (black or white)
call: whether a callback was made (1 = yes, 0 = no)

The data contain 4870 resumes and 4 variables.

nrow(resume) # number of rows

## [1] 4870

ncol(resume) # number of columns

## [1] 4

dim(resume) # number of rows and columns

## [1] 4870    4

Note: These data look a little different from what we used last week. For example, the sex and race variables contain words, not numbers.

head(resume)

##   firstname    sex  race call
## 1   Allison female white    0
## 2   Kristen female white    0
## 3   Lakisha female black    0
## 4   Latonya female black    0
## 5    Carrie female white    0
## 6       Jay   male white    0

3.3.1 Variable classes

We can check the class of each variable with class(). Notice that we have a new type: a “factor” variable.

class(resume$firstname)

## [1] "factor"

class(resume$sex)

## [1] "factor"

class(resume$race)

## [1] "factor"

class(resume$call)

## [1] "integer"
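Rather than calling class() once per variable, we could check every column in one step. The sketch below uses a small data frame, toy, rebuilt by hand from the six rows printed by head(resume), so it runs without the full file; with the real data, the same calls would apply to resume.

```r
# Toy data frame rebuilt from the six rows shown by head(resume)
toy <- data.frame(
  firstname = c("Allison", "Kristen", "Lakisha", "Latonya", "Carrie", "Jay"),
  sex = c("female", "female", "female", "female", "female", "male"),
  race = c("white", "white", "black", "black", "white", "white"),
  call = c(0L, 0L, 0L, 0L, 0L, 0L),
  stringsAsFactors = TRUE
)

sapply(toy, class)  # class of every column: factor, factor, factor, integer
levels(toy$race)    # the categories of a factor: "black" "white"
```

levels() lists the categories R has stored for a factor. By default they are sorted alphabetically.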

We have now encountered numeric, character, and factor vectors and/or variables in R. Note: This is simply how R understands them. Sometimes R can get it wrong. For example, if we write:

somenumbers <- c("1", "3", "4")
class(somenumbers)

## [1] "character"

Because we put our numbers in quotation marks, R thinks the values in somenumbers are text. The number “3” might as well be the word “blue” for all R knows. Fortunately, we can easily switch between classes.

somenumbers <- as.numeric(somenumbers)
class(somenumbers)

## [1] "numeric"

Here, we used as.numeric() to overwrite and change the character vector into a numeric vector.
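Two things are worth watching for when switching classes. The sketch below uses made-up vectors (mixed and f are hypothetical names) to show what happens when text cannot be read as a number, and what happens when as.numeric() is applied directly to a factor.

```r
# Text that is not a number becomes NA (R normally prints a warning,
# which we silence here with suppressWarnings())
mixed <- c("1", "3", "blue")
converted <- suppressWarnings(as.numeric(mixed))
converted                     # 1 3 NA

# A factor whose labels look like numbers
f <- factor(c("10", "20", "10"))
as.numeric(f)                 # 1 2 1: the internal category codes
as.numeric(as.character(f))   # 10 20 10: the values written on the labels
```

When a factor stores numbers as labels, converting to character first and then to numeric recovers the values we actually want.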

Rules of Thumb

Usually, we want character variables to store text (e.g., open-ended survey responses).
We want numeric variables to store numbers.
Usually, we want factor variables to store categories.
- Within R, a factor variable assigns a number to each category, and each number is given a text label, or “level.”
- Categories might be ordinal or “ordered” (e.g., Very likely, Somewhat likely, Not likely) or
- Unordered (e.g., “male”, “female”).
- By default, R won’t know whether a factor variable is ordered or unordered unless we tell it. Alas, we have to be smarter than R.
- R might think you have a character variable when you want it to be a factor, or the reverse.
  - That’s when as.factor() and as.character() are useful.
Always check class() to find out the variable type.
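As a sketch of these rules of thumb, the example below converts a character vector to a factor and then tells R the intended order of the categories. The vector rating and its low/medium/high labels are hypothetical, chosen because their alphabetical order differs from their meaningful order.

```r
# Hypothetical ratings stored as text
rating <- c("low", "high", "medium", "high")
class(rating)              # "character"

# as.factor() sorts the levels alphabetically and treats them as unordered
f_default <- as.factor(rating)
levels(f_default)          # "high" "low" "medium"
is.ordered(f_default)      # FALSE

# Stating the levels and ordered = TRUE gives R the order we intend
f_ordered <- factor(rating,
                    levels = c("low", "medium", "high"),
                    ordered = TRUE)
class(f_ordered)           # "ordered" "factor"
as.integer(f_ordered)      # 1 3 2 3: the number behind each category
f_ordered[1] < f_ordered[2]  # TRUE: "low" is below "high"

# as.character() takes us back to plain text
as.character(f_ordered)    # "low" "high" "medium" "high"
```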