13.3 Constructing your own weights
Sometimes you might have collected your own survey data, which you know is not representative, but no weights are provided for you.
If you think that weighting the data will be important, then you can construct your own weights! We can:
- Collect information on our respondents so that we can compare the sample to the population (e.g., by including demographic questions on your survey)
- Gather statistics on the population (e.g., Census data)
- Construct weights to indicate how many units in the population each unit \(i\) in our sample should represent
- One example of constructing your own weights is through a process called Raking. Here, we find weights so that the weighted distributions of your variables in your sample match the distributions of the variables in the general population very closely. How?
- Decide the variables on which you want to weight your data (e.g., gender, education, age)
- Find the proportion of your target population that falls into each subgroup of these variables (e.g., maybe .55 women, .45 men)
- Use a raking algorithm to adjust your sample data so that, when weighted, it reflects these population distributions. For example, perhaps your sample had 70% women. After raking, we would hope that when `svymean` is applied, the weighted proportion of women in your data is 55%.
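To make the raking steps concrete, here is a minimal base R sketch of the idea (iterative proportional fitting). It is only an illustration: the sample data, the population targets, and the helper name `rake_weights` are all hypothetical, and in practice you would likely rely on a package rather than this hand-rolled loop.

```r
# Hypothetical sample: 70% women and too many college graduates.
samp <- data.frame(
  gender = rep(c("woman", "man"), times = c(70, 30)),
  educ   = rep(c("college", "no_college", "college", "no_college"),
               times = c(40, 30, 20, 10))
)

# Assumed population targets (proportions sum to 1 within each variable).
targets <- list(
  gender = c(woman = 0.55, man = 0.45),
  educ   = c(college = 0.35, no_college = 0.65)
)

# Hypothetical helper: adjust weights one variable at a time until they settle.
rake_weights <- function(data, targets, max_iter = 100, tol = 1e-8) {
  w <- rep(1, nrow(data))                      # start from equal weights
  for (iter in seq_len(max_iter)) {
    w_old <- w
    for (v in names(targets)) {
      groups  <- as.character(data[[v]])
      # current weighted share of each category of variable v
      current <- tapply(w, groups, sum) / sum(w)
      # scale each weight by (target share / current share) for its category
      ratio <- as.numeric(targets[[v]][groups] / current[groups])
      w <- w * ratio
    }
    if (max(abs(w - w_old)) < tol) break       # stop once weights stop changing
  }
  w / mean(w)                                  # normalize so weights average 1
}

samp$weight <- rake_weights(samp, targets)

# Weighted shares should now sit very close to the targets.
share_women   <- weighted.mean(samp$gender == "woman", samp$weight)
share_college <- weighted.mean(samp$educ == "college", samp$weight)
```

Each pass fixes one variable's distribution and slightly disturbs the others, which is why the loop repeats until the weights stop moving.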
This R-bloggers tutorial goes through an example using the `survey` package and its `rake` function.
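As a rough sketch of what a call to `rake` in the `survey` package can look like, consider the example below. The data frame, the column names, and the population margins are made up for illustration; the `control` settings are simply one choice that tightens convergence beyond the default.

```r
library(survey)

# Hypothetical sample of 100 respondents (not real data).
samp <- data.frame(
  gender = factor(rep(c("woman", "man"), times = c(70, 30))),
  educ   = factor(rep(c("college", "no_college", "college", "no_college"),
                      times = c(40, 30, 20, 10)))
)
samp$base_weight <- 1  # every respondent starts with the same weight

# Assumed population margins: one data frame per variable, with a Freq
# column of counts (here the target proportions scaled to the sample size).
pop_gender <- data.frame(gender = c("woman", "man"),
                         Freq = c(0.55, 0.45) * nrow(samp))
pop_educ   <- data.frame(educ = c("college", "no_college"),
                         Freq = c(0.35, 0.65) * nrow(samp))

# Unweighted starting design.
unweighted <- svydesign(ids = ~1, weights = ~base_weight, data = samp)

# Rake on gender and education.
raked <- rake(
  design             = unweighted,
  sample.margins     = list(~gender, ~educ),
  population.margins = list(pop_gender, pop_educ),
  control            = list(maxit = 50, epsilon = 1e-6, verbose = FALSE)
)

# Weighted proportions after raking; compare them with the targets.
svymean(~gender, raked)
svymean(~educ, raked)

# Keep the raked weights for later analyses.
samp$weight <- weights(raked)
```

The weights stored in `samp$weight` can then be passed to a new `svydesign` call, or the `raked` design object can be used directly with functions such as `svymean`.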
In addition, Pew has released an R package with similar weighting capabilities described here.
Raking is not the only type of weighting that is possible. Pew describes and evaluates other weighting processes here.
Several pollsters interested in measuring vote choice and public opinion have started turning to a process called multilevel regression and post-stratification (MRP or, fondly, Mr. P).
- The “multilevel” part refers to the same multilevel modeling approach we discussed, using `lme4`. An R introduction to this is here with a companion paper here that has citations to political science examples using MRP.
- The technique has been around for a long time, but it is now starting to infiltrate mainstream polling in addition to political science.
- See Andy Gelman’s take on it here along with a debate with Nate Silver here.
- For example, CBS used MRP in their 2020 election tracking.
- Doug Rivers provided a short course on MRP at an AAPOR conference. Slides here.
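
To give a sense of the post-stratification half of MRP, here is a small base R sketch. It assumes the multilevel regression has already been fit (for example with `lme4::glmer`) and has produced a predicted outcome for every demographic cell. The cell table, the population counts, the predictions, and the helper name `poststratify` are all hypothetical.

```r
# Hypothetical post-stratification table: one row per demographic cell.
# `pred` stands in for predictions from a fitted multilevel model;
# `n_pop` stands in for census counts. All numbers are made up.
cells <- data.frame(
  state  = rep(c("A", "B"), each = 4),
  gender = rep(rep(c("woman", "man"), each = 2), times = 2),
  educ   = rep(c("college", "no_college"), times = 4),
  n_pop  = c(200, 350, 150, 300, 100, 250, 120, 230),
  pred   = c(0.62, 0.48, 0.55, 0.41, 0.58, 0.44, 0.50, 0.37)
)

# Hypothetical helper: average the cell predictions, weighting each cell
# by how many people in the population belong to it.
poststratify <- function(d) sum(d$n_pop * d$pred) / sum(d$n_pop)

# Estimate for the whole population and separately for each state.
overall  <- poststratify(cells)
by_state <- sapply(split(cells, cells$state), poststratify)
```

Because each cell is weighted by its population count rather than by how many respondents happened to fall into it, the same table can yield estimates for subgroups such as states even when the raw sample there is small.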