8.6 Application: Predicting Campaign Donations
Can we predict campaign donations?
Data from Barber, Michael J., Brandice Canes‐Wrone, and Sharece Thrower. “Ideologically sophisticated donors: Which candidates do individual contributors finance?.” American Journal of Political Science 61.2 (2017): 271-288
load("donationdata.RData")
Variables
- donation: 1=made donation to senator, 0=no donation made
- total_donation: Dollar amount of donation made by donor to Senator
- sameparty: 1=self-identifies as being in the candidate’s party; 0 otherwise
- NetWorth: Donor’s net worth. 1=less than 250k, 2=250-500k; 3=500k-1m; 4=1-2.5m; 5=2.5-5m; 6=5-10m; 7=more than 10m
- IncomeLastYear: Donor’s household annual income in 2013. 1=less than 50k; 2=50-100k; 3=100-125k; 4=125-150k; 5=150-250k; 6=250-300k; 7=300-350k; 8=350-400k; 9=400-500k; 10=more than 500k
- peragsen: percent issue agreement between donor and senator
- per2agchal: percent issue agreement between donor and the senator’s challenger
- cook: Cook competitiveness score for the senator’s race. 1 = Solid Dem or Solid Rep; 2 = Likely
- matchcommf: 1=Senator committee matches donor’s profession as reported in FEC file; 0=otherwise
- Edsum: Donor’s self-described educational attainment. 1=less than high school; 2=high school; 3=some college; 4=2-year college degree; 5=4-year college degree; 6=graduate degree
Data represent information on past donors to campaigns across different states. The key dependent variable that we want to predict is total_donation: the total dollar amount a particular person in the data gave to their senator in the 2012 election campaign.
Can we predict how much someone donates to a U.S. Senate campaign?
- Choose approach: regression of donations on donor characteristics
- Check accuracy: calculate root-mean-squared error
- Iterate: try different regression model specifications
Let’s try a prediction based on a person’s income.
fit <- lm(total_donation ~ IncomeLastYear, data = donationdata)
From this, we can
- Plot the relationship
- Make specific predictions at different levels of income
- Check accuracy by calculating the prediction errors and RMSE
8.6.1 Visualizing the results
Note that the correlation is a bit weaker here.
plot(x=donationdata$IncomeLastYear,
y=donationdata$total_donation,
ylab= "Total Donation ($)",
xlab = "Income Last Year",
main = "Predicting Total Donations Using Income")
abline(fit, col="green4", lwd=2)
8.6.2 Step 1: Calculate Predictions
We can calculate predictions based on a level of income. Example: Level 5 of income represents an income of $150k-250k. What level of donation would we expect?
predict(fit, data.frame(IncomeLastYear = 5))
## 1
## 348.8581
## alternative using coef()
coef(fit)[1] + coef(fit)["IncomeLastYear"]*5
## (Intercept)
## 348.8581
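To predict at several income levels at once, we can give predict() a data frame with one row per level. Below is a minimal sketch on simulated toy data. The data frame toy, its made-up coefficients, and the name fit_toy are assumptions for illustration only; they are not the Barber et al. data, so the numbers will not match the output above.

```r
# Simulated stand-in for donationdata (NOT the real data)
set.seed(1)
toy <- data.frame(IncomeLastYear = sample(1:10, 200, replace = TRUE))
toy$total_donation <- 50 + 60 * toy$IncomeLastYear + rnorm(200, sd = 100)

fit_toy <- lm(total_donation ~ IncomeLastYear, data = toy)

# One row of newdata per income level we want a prediction for
income_levels <- data.frame(IncomeLastYear = c(1, 5, 10))
preds <- predict(fit_toy, newdata = income_levels)
preds

# Same predictions by hand: intercept + slope * income level
by_hand <- coef(fit_toy)[1] +
  coef(fit_toy)["IncomeLastYear"] * income_levels$IncomeLastYear
by_hand
```

With the real data, the same pattern would use fit in place of fit_toy.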
8.6.3 Step 2: Check Accuracy
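We can calculate the Root Mean Squared Error (RMSE). Here we use sigma(), which returns the model’s residual standard error. It divides by the residual degrees of freedom rather than the number of observations, so in a large sample it is very close to the RMSE.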
sigma(fit)
## [1] 915.2528
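To see what sigma() is doing, we can build the same quantity from the prediction errors ourselves. This is a sketch on simulated toy data (toy and fit_toy are made-up names and values, not the donation data), so only the relationships between the numbers matter, not the numbers themselves.

```r
# Simulated stand-in for donationdata (NOT the real data)
set.seed(1)
toy <- data.frame(IncomeLastYear = sample(1:10, 200, replace = TRUE))
toy$total_donation <- 50 + 60 * toy$IncomeLastYear + rnorm(200, sd = 100)

fit_toy <- lm(total_donation ~ IncomeLastYear, data = toy)

# Prediction errors: observed outcome minus predicted outcome
errors <- residuals(fit_toy)

# RMSE: square the errors, average them (divide by n), take the root
rmse <- sqrt(mean(errors^2))

# Residual standard error: same idea, but divide by n minus the
# number of estimated coefficients (here n - 2)
rse <- sqrt(sum(errors^2) / df.residual(fit_toy))

c(rmse = rmse, rse = rse, sigma = sigma(fit_toy))
```

The rse and sigma entries should agree, and rmse should be slightly smaller because it divides by n.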
8.6.4 Step 3: Iterate
YOUR TURN: Change the model and use sigma() to check whether the RMSE improves.
8.6.5 Adding Model Predictors
New Model Example
fitnew <- lm(total_donation ~ IncomeLastYear + NetWorth + sameparty,
             data = donationdata)
New Predictions: note how we add more variables
predict(fitnew, data.frame(IncomeLastYear = 5, NetWorth = 4, sameparty = 1))
## 1
## 406.9705
## alternative using coef()
coef(fitnew)[1] + coef(fitnew)["IncomeLastYear"]*5 +
coef(fitnew)["NetWorth"]*4 + coef(fitnew)["sameparty"]*1
## (Intercept)
## 406.9705
Root Mean Squared Error
sigma(fitnew)
## [1] 910.4256
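When iterating over several specifications, it can help to fit them in one pass and line up their RMSEs. A minimal sketch, again on simulated toy data: the data frame toy, the effect sizes used to generate it, and the list name specs are all assumptions for illustration, not part of the original analysis.

```r
# Simulated stand-in for donationdata (NOT the real data)
set.seed(2)
n <- 300
toy <- data.frame(
  IncomeLastYear = sample(1:10, n, replace = TRUE),
  NetWorth       = sample(1:7, n, replace = TRUE),
  sameparty      = rbinom(n, 1, 0.5)
)
toy$total_donation <- 40 * toy$IncomeLastYear + 30 * toy$NetWorth +
  200 * toy$sameparty + rnorm(n, sd = 150)

# Candidate model specifications, from simplest to richest
specs <- list(
  income_only   = total_donation ~ IncomeLastYear,
  plus_networth = total_donation ~ IncomeLastYear + NetWorth,
  plus_party    = total_donation ~ IncomeLastYear + NetWorth + sameparty
)

# Fit each specification and extract sigma() as the accuracy measure
rmse_by_spec <- sapply(specs, function(f) sigma(lm(f, data = toy)))

# Smaller is better
sort(rmse_by_spec)
```

With the real data, the same loop would use data = donationdata and whichever predictors you want to compare.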
When we have multiple predictors, this changes our interpretation of the coefficients slightly.
- We now interpret the slope as the change in the outcome expected with a 1-unit change in the independent variable, holding all other variables constant (or “controlling” for all other variables).
- For example, for a 1-unit increase in IncomeLastYear (moving up one income category), we would expect about a $68 increase in estimated donations, holding constant Net Worth and whether the person shared partisanship with the senator.
coef(fitnew)
## (Intercept) IncomeLastYear NetWorth sameparty
## -242.02780 67.96825 29.55847 190.92323
Think of this like a set of light switches. How does adjusting one light switch affect the light in the room, holding all the other switches constant?
When we make predictions with multiple variables, we have to tell R where we want to set each variable’s value.
predict(fitnew, data.frame(IncomeLastYear = 5, NetWorth = 4, sameparty = 1))
## 1
## 406.9705
See how the prediction changes if you shift IncomeLastYear but keep Net Worth and partisanship where they are. That’s the idea of “controlling” for the other variables!
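One way to try this: predict for two hypothetical donors who differ only by one income category, then take the difference. The sketch below uses simulated toy data (toy, fit_toy, and the effect sizes are made up for illustration and are not the donation data). In a linear model without interactions, the difference should equal the IncomeLastYear coefficient.

```r
# Simulated stand-in for donationdata (NOT the real data)
set.seed(2)
n <- 300
toy <- data.frame(
  IncomeLastYear = sample(1:10, n, replace = TRUE),
  NetWorth       = sample(1:7, n, replace = TRUE),
  sameparty      = rbinom(n, 1, 0.5)
)
toy$total_donation <- 40 * toy$IncomeLastYear + 30 * toy$NetWorth +
  200 * toy$sameparty + rnorm(n, sd = 150)

fit_toy <- lm(total_donation ~ IncomeLastYear + NetWorth + sameparty,
              data = toy)

# Two profiles: income level 5 vs. 6, everything else held fixed
profiles <- data.frame(IncomeLastYear = c(5, 6), NetWorth = 4, sameparty = 1)
preds <- predict(fit_toy, newdata = profiles)

# Change in the prediction from a 1-unit shift in income
income_shift <- diff(preds)
income_shift

# Compare with the estimated slope on IncomeLastYear
coef(fit_toy)["IncomeLastYear"]
```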
How could we keep improving the predictions?
Eventually, we would want to apply this prediction model in a real-world setting.
- How could campaigns use these types of prediction models?