11.5 Additional Descriptive Statistics
Is the length of speeches changing? The nchar()
function tells you the number of characters in a string.
speeches$speechlength <- nchar(speeches$sotu_text)
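As a quick check of what nchar() counts, here is a minimal sketch on a made-up character vector (the vector toy is an assumption for illustration and is not part of the speeches data). Spaces and punctuation count as characters.

```r
# Toy strings (hypothetical, not from the speeches data)
toy <- c("Hello", "State of the Union", "")

# nchar() returns one count per element; spaces count as characters
toy_length <- nchar(toy)
toy_length
```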
Let’s plot the length of speeches over time and annotate with informative colors and labels.
plot(x=1:236, y= speeches$speechlength,
pch=15,
xaxt="n",
xlab="",
ylab = "Number of Characters")
## add x axis
axis(1, 1:236, labels=speeches$year, las=3, cex.axis=.7)
We can add color to distinguish written from spoken speeches.
speechcolor <- ifelse(speeches$sotu_type == "written", "black", "green3")
plot(x=1:236, y= speeches$speechlength,
xaxt="n", pch=15,
xlab="",
ylab = "Number of Characters",
col = speechcolor)
## add x axis
axis(1, 1:236, labels=speeches$year, las=3, cex.axis=.7)
## add legend
legend("topleft", c("spoken", "written"),
pch=15,
col=c("green3", "black"), bty="n")
11.5.1 Dictionary Analysis
We can characterize the content of speeches in different ways. For example, we can see if speeches mention specific words, such as “terrorism.”
- The function grepl() lets you search for a pattern of text in a character string
- The function str_detect() from the stringr package works similarly, with the opposite order of inputs (string first, then pattern)
library(stringr) # provides str_detect() and str_count()
speeches$terrorism <- ifelse(grepl("terror", speeches$sotu_text), 1,0)
speeches$terrorism2 <- ifelse(str_detect(speeches$sotu_text,"terror"), 1,0)
sort(tapply(speeches$terrorism, speeches$president, sum),
decreasing=T)[1:10]
## George W. Bush William J. Clinton Barack Obama
## 8 8 7
## Ronald Reagan Franklin D. Roosevelt Andrew Jackson
## 6 4 2
## Chester A. Arthur Grover Cleveland Harry S Truman
## 2 2 2
## Jimmy Carter
## 2
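To make the detect-then-sum logic concrete, the sketch below runs the same two steps on a small made-up data frame (the names toy, speaker, and text are hypothetical). It uses only base R, so it does not depend on stringr.

```r
# Hypothetical toy data: three speakers, four short texts
toy <- data.frame(
  speaker = c("A", "A", "B", "C"),
  text = c("the terror threat", "jobs and trade",
           "terrorism and terrorists", "roads and bridges"),
  stringsAsFactors = FALSE
)

# grepl() returns TRUE/FALSE per text; ifelse() recodes it as 1/0
toy$terror <- ifelse(grepl("terror", toy$text), 1, 0)

# tapply() sums the indicator within each speaker
mentions <- tapply(toy$terror, toy$speaker, sum)
sort(mentions, decreasing = TRUE)
```

Speaker B's text contains the pattern twice but still contributes only 1, because the indicator records whether a text mentions the pattern at all, not how often.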
Beyond whether a speech mentions a word at all, we can count how many times the word appears.
- The function str_count() from the stringr package counts the number of times a piece of text appears in a character string
speeches$terrorismcount <- str_count(speeches$sotu_text, "terror")
sort(tapply(speeches$terrorismcount, speeches$president, sum),
decreasing=T)[1:10]
## George W. Bush Barack Obama William J. Clinton
## 171 37 29
## Ronald Reagan Franklin D. Roosevelt Lyndon B. Johnson
## 10 6 5
## Harry S Truman Jimmy Carter Andrew Jackson
## 3 3 2
## Chester A. Arthur
## 2
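The difference between detecting and counting can be seen on a made-up example. The sketch below uses a base R stand-in for str_count() built from gregexpr() and regmatches(); the helper name count_matches is hypothetical and is not part of stringr.

```r
# Hypothetical helper: count non-overlapping matches of a pattern per string
count_matches <- function(x, pattern) {
  lengths(regmatches(x, gregexpr(pattern, x)))
}

# Made-up texts for illustration
toy_text <- c("terror, terrorism, and terrorists", "no match here")

detected <- as.numeric(grepl("terror", toy_text))  # 1 if present, 0 if not
counted  <- count_matches(toy_text, "terror")      # number of occurrences

detected
counted
```

The first text is flagged once by the indicator but contributes three matches to the count, which is why the per-president totals differ between the two approaches.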
We can search for several words at once by separating them with the | (“or”) operator in the pattern. This is often called a “dictionary analysis.”
speeches$warcount <- str_count(speeches$sotu_text,
"terror|war|military|drone")
sort(tapply(speeches$warcount, speeches$president, sum), decreasing=T)[1:10]
## Harry S Truman Theodore Roosevelt Franklin D. Roosevelt
## 554 481 441
## James K. Polk Jimmy Carter Dwight D. Eisenhower
## 390 348 332
## William McKinley George W. Bush Grover Cleveland
## 324 323 257
## Ulysses S. Grant
## 233
What are possible limitations of this analysis?
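One limitation can be checked directly: the pattern matches any substring, so "war" also matches inside words such as "toward" or "reward," and matching is case-sensitive. The sketch below illustrates this on a made-up sentence using base R. The helper count_matches is hypothetical, and the word-boundary pattern \\b is one possible refinement, not the method used in the analysis above.

```r
# Hypothetical helper: count matches of a pattern in each string
count_matches <- function(x, pattern, ...) {
  lengths(regmatches(x, gregexpr(pattern, x, ...)))
}

# Made-up sentence for illustration
toy_text <- "War was averted; we move toward peace and reward work."

# Plain substring matching counts "toward" and "reward" but misses "War"
loose <- count_matches(toy_text, "war")

# Word boundaries (\\b) plus ignore.case = TRUE count only the whole word
strict <- count_matches(toy_text, "\\bwar\\b", ignore.case = TRUE)

loose
strict
```

A related concern is that longer speeches have more opportunities to match, so raw counts may partly reflect speech length rather than emphasis; dividing by speechlength would be one way to explore that.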