8.6 Application: Predicting Campaign Donations

Can we predict campaign donations?

Data from Barber, Michael J., Brandice Canes‐Wrone, and Sharece Thrower. “Ideologically sophisticated donors: Which candidates do individual contributors finance?” American Journal of Political Science 61.2 (2017): 271-288.

load("donationdata.RData")

Variables

  • donation: 1=made donation to senator, 0=no donation made
  • total_donation: Dollar amount of donation made by donor to Senator
  • sameparty: 1=self-identifies as being in the candidate’s party; 0 otherwise
  • NetWorth: Donor’s net worth. 1=less than 250k; 2=250-500k; 3=500k-1m; 4=1-2.5m; 5=2.5-5m; 6=5-10m; 7=more than 10m
  • IncomeLastYear: Donor’s household annual income in 2013. 1=less than 50k; 2=50-100k; 3=100-125k; 4=125-150k; 5=150-250k; 6=250-300k; 7=300-350k; 8=350-400k; 9=400-500k; 10=more than 500k
  • peragsen: percent issue agreement between donor and senator
  • per2agchal: percent issue agreement between donor and the senator’s challenger
  • cook: Cook competitiveness score for the senator’s race. 1 = Solid Dem or Solid Rep; 2 = Likely
  • matchcommf: 1=Senator committee matches donor’s profession as reported in FEC file; 0=otherwise
  • Edsum: Donor’s self-described educational attainment. 1=less than high school; 2=high school; 3=some college; 4=2-year college degree; 5=4-year college degree; 6=graduate degree

Data represent information on past donors to campaigns across different states. The key dependent variable that we want to predict is total_donation: the total dollar amount a particular person in the data gave to their senator in the 2012 election campaign.

Can we predict how much someone donates to a U.S. Senate campaign?

  1. Choose approach: regression of donations on donor characteristics
  2. Check accuracy: calculate root-mean-squared error
  3. Iterate: try different regression model specifications

Let’s try a prediction based on a person’s income.

fit <- lm(total_donation ~ IncomeLastYear, data = donationdata)

From this, we can

  • Plot the relationship
  • Make specific predictions at different levels of income
  • Check accuracy by calculating the prediction errors and RMSE

8.6.1 Visualizing the results

Note that the correlation between income and total donations is a bit weaker here than in our earlier examples.

# Scatterplot of donations against income category
plot(x = donationdata$IncomeLastYear,
     y = donationdata$total_donation,
     ylab = "Total Donation ($)",
     xlab = "Income Last Year",
     main = "Predicting Total Donations Using Income")
# Overlay the fitted regression line
abline(fit, col = "green4", lwd = 2)

8.6.2 Step 1: Calculate Predictions

We can calculate predictions based on a level of income. Example: Level 5 of income represents an income of $150k-250k. What level of donation would we expect?

predict(fit, data.frame(IncomeLastYear = 5))
##        1 
## 348.8581
## alternative using coef()
coef(fit)[1] + coef(fit)["IncomeLastYear"]*5
## (Intercept) 
##    348.8581
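
To get predictions at several income levels at once, we can pass predict() a data frame with one row per level. The block below is a minimal sketch: it uses a small simulated data frame (toy, with made-up values, not the real donationdata) so that it runs on its own, and the names toy, fit_toy, and by_hand are introduced only for illustration.

```r
# Simulated stand-in for donationdata (made-up values, NOT the real data),
# so this block runs on its own. With the real data loaded, use fit instead.
set.seed(1)
n <- 200
toy <- data.frame(IncomeLastYear = sample(1:10, n, replace = TRUE))
toy$total_donation <- 50 + 60 * toy$IncomeLastYear + rnorm(n, sd = 300)

fit_toy <- lm(total_donation ~ IncomeLastYear, data = toy)

# One row per income level we want a prediction for
newdata <- data.frame(IncomeLastYear = c(1, 5, 10))
preds <- predict(fit_toy, newdata)

# Same predictions by hand: intercept + slope * income level
by_hand <- coef(fit_toy)[1] +
  coef(fit_toy)["IncomeLastYear"] * newdata$IncomeLastYear

# Side-by-side comparison of the two approaches
cbind(newdata, predicted = preds, by_hand = by_hand)
```

With the real data, the same pattern would be predict(fit, data.frame(IncomeLastYear = c(1, 5, 10))).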

8.6.3 Step 2: Check Accuracy

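We can calculate the root mean squared error (RMSE), which summarizes the typical size of the prediction errors in dollars. In R, sigma() returns the residual standard error, which divides by the residual degrees of freedom rather than the number of observations and is nearly identical to the RMSE when the sample is large relative to the number of predictors.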

sigma(fit)
## [1] 915.2528
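
To see what goes into this number, the sketch below computes the prediction errors and the RMSE by hand and compares them with sigma(). It uses simulated stand-in data (made-up values, not the real donationdata) so it can run on its own; toy, fit_toy, rmse, and rse are illustrative names.

```r
# Simulated stand-in for donationdata (made-up values, NOT the real data)
set.seed(1)
n <- 200
toy <- data.frame(IncomeLastYear = sample(1:10, n, replace = TRUE))
toy$total_donation <- 50 + 60 * toy$IncomeLastYear + rnorm(n, sd = 300)
fit_toy <- lm(total_donation ~ IncomeLastYear, data = toy)

# Prediction errors: actual minus predicted donation
errors <- toy$total_donation - fitted(fit_toy)  # same as residuals(fit_toy)

# RMSE: square the errors, average them, take the square root
rmse <- sqrt(mean(errors^2))

# Residual standard error: same idea, but divides by the residual
# degrees of freedom (n minus the number of estimated coefficients)
rse <- sqrt(sum(errors^2) / df.residual(fit_toy))

c(rmse = rmse, rse = rse, sigma = sigma(fit_toy))
```

The hand-computed residual standard error matches sigma(), and the RMSE is slightly smaller because it divides by n.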

8.6.4 Step 3: Iterate

YOUR TURN: Change the model and use sigma() to see whether the RMSE improves.

8.6.5 Adding Model Predictors

New Model Example

fitnew <- lm(total_donation ~ IncomeLastYear + NetWorth + sameparty, 
             data=donationdata)

New Predictions: note how we add more variables

predict(fitnew, data.frame(IncomeLastYear = 5, NetWorth = 4, sameparty = 1))
##        1 
## 406.9705
## alternative using coef()
coef(fitnew)[1] + coef(fitnew)["IncomeLastYear"]*5 + 
  coef(fitnew)["NetWorth"]*4 + coef(fitnew)["sameparty"]*1
## (Intercept) 
##    406.9705

Root Mean Squared Error

sigma(fitnew)
## [1] 910.4256
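
One way to organize the iteration step is to put the accuracy of each candidate model side by side. The block below is a sketch on simulated stand-in data (made-up values, not the real donationdata); rmse() is a hypothetical helper defined here, not a built-in R function.

```r
# Simulated stand-in for donationdata (made-up values, NOT the real data)
set.seed(2)
n <- 300
toy <- data.frame(
  IncomeLastYear = sample(1:10, n, replace = TRUE),
  NetWorth       = sample(1:7, n, replace = TRUE),
  sameparty      = rbinom(n, 1, 0.5)
)
toy$total_donation <- 50 + 60 * toy$IncomeLastYear + 30 * toy$NetWorth +
  150 * toy$sameparty + rnorm(n, sd = 300)

fit_small <- lm(total_donation ~ IncomeLastYear, data = toy)
fit_big   <- lm(total_donation ~ IncomeLastYear + NetWorth + sameparty,
                data = toy)

# Hypothetical helper (not from the text): RMSE from a model's residuals
rmse <- function(model) sqrt(mean(residuals(model)^2))

# Compare accuracy across the two specifications
data.frame(
  model = c("income only", "income + net worth + party"),
  rmse  = c(rmse(fit_small), rmse(fit_big)),
  sigma = c(sigma(fit_small), sigma(fit_big))
)
```

Keep in mind that when the errors are computed on the same data used to fit the model, adding predictors to a linear regression can never increase the RMSE, so a small drop is not by itself strong evidence of a better model.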

When we have multiple predictors, this changes our interpretation of the coefficients slightly.

  • We now interpret each slope as the expected change in the outcome for a 1-unit change in that independent variable, holding all other variables constant (or “controlling” for all other variables).
  • For example, for a 1-unit increase in IncomeLastYear (one step up the income scale), we would expect about a $68 increase in predicted donations, holding constant net worth and whether the donor shares the senator’s party.

coef(fitnew)
##    (Intercept) IncomeLastYear       NetWorth      sameparty 
##     -242.02780       67.96825       29.55847      190.92323

Think of this like a set of light switches: how does adjusting one switch affect the light in the room, holding all other switches constant?

When we make predictions with multiple variables, we have to tell R where we want to set each variable’s value.

predict(fitnew, data.frame(IncomeLastYear = 5, NetWorth = 4, sameparty = 1))
##        1 
## 406.9705

See how the prediction changes if you shift IncomeLastYear but keep Net Worth and partisanship where they are. That’s the idea of “controlling” for the other variables!
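
The sketch below makes this concrete: it predicts for two hypothetical donors who differ only by one step in IncomeLastYear, and the gap between the two predictions equals the IncomeLastYear coefficient. It runs on simulated stand-in data (made-up values, not the real donationdata); toy, fit_toy, and scenarios are illustrative names.

```r
# Simulated stand-in for donationdata (made-up values, NOT the real data)
set.seed(3)
n <- 300
toy <- data.frame(
  IncomeLastYear = sample(1:10, n, replace = TRUE),
  NetWorth       = sample(1:7, n, replace = TRUE),
  sameparty      = rbinom(n, 1, 0.5)
)
toy$total_donation <- 50 + 60 * toy$IncomeLastYear + 30 * toy$NetWorth +
  150 * toy$sameparty + rnorm(n, sd = 300)

fit_toy <- lm(total_donation ~ IncomeLastYear + NetWorth + sameparty,
              data = toy)

# Two scenarios: income moves from level 5 to level 6,
# while NetWorth and sameparty stay fixed
scenarios <- data.frame(IncomeLastYear = c(5, 6), NetWorth = 4, sameparty = 1)
preds <- predict(fit_toy, scenarios)

# The change in the prediction...
diff(preds)
# ...matches the slope on IncomeLastYear
coef(fit_toy)["IncomeLastYear"]
```

With the real model, the equivalent call would be predict(fitnew, data.frame(IncomeLastYear = c(5, 6), NetWorth = 4, sameparty = 1)).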

How could we keep improving the predictions?

Eventually, we would want to apply this prediction model in a real-world setting.

  • How could campaigns use these types of prediction models?