7.5 Additional R shortcuts

There are a few R packages that help generate these quantities of interest AND estimate uncertainty. If you understand what is going on underneath the packages, then you should feel free to use them to avoid manually coding up each process.

7.5.1 Prediction

We’ve already used the prediction package, but now let’s add uncertainty calculation to this. This package is useful for generating confidence intervals around single quantities of interest. The package below is better for constructing confidence intervals around the difference between quantities of interest.

library(prediction)
preout <- prediction(out.logit, at = list(Allies = c(0, 1)), 
           type = "response", calculate_se = T)
summary(preout)
 at(Allies) Prediction        SE      z         p    lower    upper
          0   0.003711 0.0002270 16.343 4.878e-60 0.003266 0.004156
          1   0.002760 0.0003135  8.804 1.322e-18 0.002145 0.003374

I believe prediction relies on the delta method for uncertainty. The others below will have options for simulations or bootstrapped standard errors.

7.5.2 Margins

Thomas Leeper developed a package called margins which is modeled after the Stata margins command. Here is some documentation for this package.

We first fit a regression model like normal.

out <- glm(Conflict ~  MajorPower + Contiguity + Allies + ForeignPolicy +
    BalanceOfPower + YearsSince, data=mids, family=binomial(link = "logit"))

Open the package and use the margins command. It is similar to prediction but we will specify the uncertainty with vce and tell it which variable we want the marginal effect, and what type of change in that variable we want to calculate the effect.

## install.packages("margins")
library(margins)
Warning: package 'margins' was built under R version 4.0.2
## Difference in Allies 0 vs. 1 holding other covariates at 0
## Using Delta Method for uncertainty
marg1 <- margins(out, vce = "delta",
                 at = list(MajorPower=0, Contiguity=0,
              ForeignPolicy=0, BalanceOfPower=0, YearsSince=0),
                 variables = "Allies",
                 change = c(0, 1),
              type="response") 
Warning in check_values(data, at): A 'at' value for 'BalanceOfPower' is outside
observed data range (0.000173599997651763,1)!
summary(marg1)
 factor MajorPower Contiguity ForeignPolicy BalanceOfPower YearsSince     AME
 Allies     0.0000     0.0000        0.0000         0.0000     0.0000 -0.0010
     SE       z      p   lower   upper
 0.0004 -2.3910 0.0168 -0.0018 -0.0002

Let’s try a second example, shifting the uncertainty estimate and the X covariates.

## Difference in Allies 0 vs. 1 holding MajorPower at 1, other covariates at observed values
## Using simulations for uncertainty
marg2 <- margins(out, vce = "simulation",
                 at = list(MajorPower =1),
                 variables = "Allies",
                 change = c(0, 1),
              type="response") 
summary(marg2)
 factor MajorPower     AME     SE       z      p   lower  upper
 Allies     1.0000 -0.0020 0.0010 -1.9576 0.0503 -0.0041 0.0000
## Manual equivalent of point estimate
X.marg1 <- model.matrix(out)
X.marg1[, "Allies"] <-1
X.marg1[, "MajorPower"] <- 1

X.marg0 <- model.matrix(out)
X.marg0[, "Allies"] <-0
X.marg0[, "MajorPower"] <- 1

BH <- coef(out)
ame <- mean(plogis(X.marg1 %*% BH) - plogis(X.marg0 %*% BH))

## Compare
ame
[1] -0.002039019
summary(marg2)$AME
      Allies 
-0.002039019 

7.5.3 Zelig

Zelig has a lot of built-in plotting functions. This can be a great package if you want to present results the way they have set up the package. They make it slightly harder to extract specific elements from the results. Documentation here.

## install.packages("Zelig")
library(Zelig)

z.out <- zelig(Conflict ~  MajorPower + Contiguity + Allies + ForeignPolicy +
    BalanceOfPower + YearsSince, data=mids, model = "logit", cite = F)

## Use function setx to designate values of X
## Let's set everything to 0
set.x <- setx(z.out, MajorPower=0, Contiguity=0, Allies=0,
              ForeignPolicy=0, BalanceOfPower=0, YearsSince=0)
s.out1 <- sim(z.out, set.x, num = 100) # should do more like 1000

## Expected value and 95% CI
s.out1

 sim x :
 -----
ev
            mean          sd         50%        2.5%       97.5%
[1,] 0.003755489 0.001072985 0.003499368 0.002113449 0.006454072
pv
     0 1
[1,] 1 0
## Change Allies to 1, and let's look at the difference between Allies vs. Non-Allies
set.x1 <- setx1(z.out, MajorPower=0, Contiguity=0, Allies=1,
              ForeignPolicy=0, BalanceOfPower=0, YearsSince=0)

## Let's look at the QOI that is the difference between these
s.out2 <- sim(z.out, set.x, set.x1, num = 50)

## Get mean and CI's from distribution
mean(s.out2$get_qi(qi="fd", xvalue="x1"))
[1] -0.0009859548
quantile(s.out2$get_qi(qi="fd", xvalue="x1"), c(0.025, 0.975))
         2.5%         97.5% 
-0.0018093458 -0.0002380693 

7.5.4 Using expand.grid

When we use the predict function in R, we specify a “`newdata” dataframe to indicate for which values of \(X\) we want to estimate values of the outcome. When we do this, we are actually building a dataframe, like the below:

df <- data.frame(MajorPower = 0, Allies=1, Contiguity = 0)
df
  MajorPower Allies Contiguity
1          0      1          0

In cases where we want to make predictions for multiple values of a given variable, we can feed a vector into the data.frame:

df <- data.frame(MajorPower = 0, Allies=c(0,1), Contiguity = 0)
df
  MajorPower Allies Contiguity
1          0      0          0
2          0      1          0

However, this becomes more tedious when we want to estimate for combinations of different variables (e.g., MajorPower and Allies as 0 or 1). You have to tell R precisely which rows should include which values. This means indicating four separate values for each variables to get all possible combinations.

df <- data.frame(MajorPower = c(0, 0, 1,1), Allies=c(0,1, 0, 1), Contiguity = 0)
df
  MajorPower Allies Contiguity
1          0      0          0
2          0      1          0
3          1      0          0
4          1      1          0

What expand.grid does is let you more quickly indicate that you want to estimate for all possibles combinations of the values of the variables you supply. E.g.:

df <- expand.grid(MajorPower = c(0, 1), Allies=c(0,1), Contiguity = 0)
df
  MajorPower Allies Contiguity
1          0      0          0
2          1      0          0
3          0      1          0
4          1      1          0

This can be a useful shortcut when you need to look at combinations of variables that have many different values.