11.5 Additional Descriptive Statistics

Is the length of speeches changing? The nchar() function returns the number of characters in a character string.

speeches$speechlength <- nchar(speeches$sotu_text)
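As a quick check of what nchar() returns, here is a minimal sketch on a made-up character vector. The object names `toy_text` and `toy_length` are hypothetical and not part of the speeches data.

```r
# Hypothetical toy strings, not taken from the speeches data
toy_text <- c("hello", "State of the Union")

# nchar() is vectorized: it returns one count per string,
# and spaces and punctuation count as characters
toy_length <- nchar(toy_text)
toy_length
```

Because nchar() works element by element, applying it to speeches$sotu_text gives one length per speech.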

Let’s plot the length of speeches over time and annotate with informative colors and labels.

plot(x=1:236, y=speeches$speechlength, 
     pch=15,
     xaxt="n",
     xlab="", 
     ylab="Number of Characters")

## add x axis
axis(1, 1:236, labels=speeches$year, las=3, cex.axis=.7)

We can add color to distinguish written from spoken speeches.

speechcolor <- ifelse(speeches$sotu_type == "written", "black", "green3")
plot(x=1:236, y= speeches$speechlength, 
     xaxt="n", pch=15,
     xlab="", 
     ylab = "Number of Characters",
     col = speechcolor)

## add x axis
axis(1, 1:236, labels=speeches$year, las=3, cex.axis=.7)

## add legend
legend("topleft", c("spoken", "written"), 
       pch=15, 
       col=c("green3", "black"), bty="n")

11.5.1 Dictionary Analysis

We can characterize the content of speeches in different ways. For example, we can see if speeches mention specific words, such as “terrorism.”

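  • The function grepl() lets you search for a pattern of text in a character string. The pattern comes first and the text second.
  • The function str_detect() from the stringr package works similarly, with the opposite order of inputs: the text comes first and the pattern second.

library(stringr) # provides str_detect() and str_count()
speeches$terrorism <- ifelse(grepl("terror", speeches$sotu_text), 1,0)
speeches$terrorism2 <- ifelse(str_detect(speeches$sotu_text,"terror"), 1,0)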
sort(tapply(speeches$terrorism, speeches$president, sum), 
     decreasing=T)[1:10]
##        George W. Bush    William J. Clinton          Barack Obama 
##                     8                     8                     7 
##         Ronald Reagan Franklin D. Roosevelt        Andrew Jackson 
##                     6                     4                     2 
##     Chester A. Arthur      Grover Cleveland        Harry S Truman 
##                     2                     2                     2 
##          Jimmy Carter 
##                     2
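To see the difference in argument order, here is a small sketch on made-up sentences. It assumes the stringr package is installed, and the names `toy`, `found_base`, and `found_stringr` are hypothetical.

```r
library(stringr)  # assumed to be installed

# Hypothetical toy texts, not taken from the speeches data
toy <- c("We will defeat terrorism.", "The economy is growing.")

found_base    <- grepl("terror", toy)       # pattern first, text second
found_stringr <- str_detect(toy, "terror")  # text first, pattern second

# Both return a logical vector with one TRUE/FALSE per string
identical(found_base, found_stringr)

# Same 1/0 coding as ifelse(..., 1, 0) above
as.numeric(found_base)
```

Note that the pattern "terror" also matches inside longer words such as “terrorism,” since both functions look for the pattern anywhere in the string.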

Detecting a word only tells us whether a speech mentions it at least once. We can also count how many times the word appears in each speech.

  • The function str_count() from the stringr package counts the number of times a piece of text appears in a character string

speeches$terrorismcount <- str_count(speeches$sotu_text, "terror")
sort(tapply(speeches$terrorismcount, speeches$president, sum), 
     decreasing=T)[1:10]
##        George W. Bush          Barack Obama    William J. Clinton 
##                   171                    37                    29 
##         Ronald Reagan Franklin D. Roosevelt     Lyndon B. Johnson 
##                    10                     6                     5 
##        Harry S Truman          Jimmy Carter        Andrew Jackson 
##                     3                     3                     2 
##     Chester A. Arthur 
##                     2
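The counting and summing steps can be sketched on a tiny made-up example. It assumes stringr is installed. The vectors `toy_text` and `toy_president` are hypothetical stand-ins for speeches$sotu_text and speeches$president.

```r
library(stringr)  # assumed to be installed

# Hypothetical toy data: three "speeches" by two "presidents"
toy_text      <- c("terror and terrorism", "terror", "no mention here")
toy_president <- c("A", "B", "A")

# str_count() returns the number of matches in each string
counts <- str_count(toy_text, "terror")
counts

# tapply() sums the counts within each president; sort() ranks them
totals <- sort(tapply(counts, toy_president, sum), decreasing = TRUE)
totals
```

The first toy text counts twice because "terror" also matches the start of “terrorism.” With grepl() or str_detect(), that same text would only register as a single mention.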

We can search for multiple words at once by separating them with the | operator, which means “or” in a pattern. This is often called a “dictionary analysis.”

speeches$warcount <- str_count(speeches$sotu_text, 
                               "terror|war|military|drone")
sort(tapply(speeches$warcount, speeches$president, sum), decreasing=T)[1:10]
##        Harry S Truman    Theodore Roosevelt Franklin D. Roosevelt 
##                   554                   481                   441 
##         James K. Polk          Jimmy Carter  Dwight D. Eisenhower 
##                   390                   348                   332 
##      William McKinley        George W. Bush      Grover Cleveland 
##                   324                   323                   257 
##      Ulysses S. Grant 
##                   233

What are possible limitations of this analysis?
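One possible limitation, sketched here on made-up sentences, is that the pattern matches pieces of unrelated words and is case sensitive. The example assumes stringr is installed. The names `toy`, `loose`, and `strict` are hypothetical, and the stricter pattern is one illustrative alternative, not the approach used above.

```r
library(stringr)  # assumed to be installed

# Hypothetical toy texts, not taken from the speeches data
toy <- c("The war on terror",
         "We move toward a reward",
         "War and peace")

# The pattern from the text: matches anywhere, case sensitive
loose <- str_count(toy, "terror|war|military|drone")
loose

# An illustrative stricter pattern: whole words only (\\b marks a
# word boundary), ignoring upper vs. lower case
strict <- str_count(
  toy,
  regex("\\b(terror|war|military|drone)\\b", ignore_case = TRUE)
)
strict
```

The loose pattern counts “toward” and “reward” as mentions of war and misses the capitalized “War.” The strict pattern avoids both problems, but whole-word matching would also stop "terror" from matching “terrorism,” so a stricter pattern is not automatically better.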