Experimental researchers are often interested in identifying causal effects among population subgroups or, to put it differently, in exploring potential moderators of treatment effects. But how should these surveys be designed? According to Montgomery, Nyhan, and Torres (2018), researchers should measure potential moderators before treatment, since conditioning on a post-treatment variable can lead to biased estimates. In contrast, Klar, Leeper, and Robison (2020) argue that measuring potential moderators before treatment could prime respondents and change the treatment effect being measured. Footnote 1
In this research note, we conduct a test of these competing perspectives in a short survey that randomizes the order of the experiment and measurement of the moderator variable.
“Welfare” versus “assistance to the poor” and spending views
In a now-classic survey experiment, Smith (1987) and Rasinski (1989) show that when a policy is described as “welfare” as opposed to “assistance to the poor,” respondents are more likely to say there is too much spending on the policy. Footnote 2 We analyze a new survey that allows us to replicate these basic results while also testing important hypotheses about the impact of question ordering.
We recruited 1,590 participants for a short survey that included the question wording experiment about spending views as well as a standard four-question battery of racial resentment items (Kinder and Sanders 1996). Footnote 3 Importantly, we randomly assign the order of the question wording experiment and the racial resentment battery. Footnote 4 We recode each item in the racial resentment battery so that 0 is the least and 1 the most racially resentful response, and we create a racial resentment scale by averaging the four items. This racial resentment scale serves as our potential moderator of interest. We pre-registered this study and our analysis plan with the OSF [https://doi.org/10.17605/OSF.IO/JYZBE].
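To make the design concrete, the following Python sketch mirrors this data structure and the two independent random assignments. All column names and simulated values are illustrative placeholders, not the authors' data.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
n = 1590

# Four racial resentment items, each already recoded to [0, 1], where 1 is
# the most racially resentful response (simulated placeholders).
items = ["rr1", "rr2", "rr3", "rr4"]
df = pd.DataFrame(rng.random((n, len(items))), columns=items)

# The moderator: the mean of the four recoded items.
df["resentment"] = df[items].mean(axis=1)

# Two independent randomizations: question wording and battery placement.
df["welfare"] = rng.integers(0, 2, n)  # 1 = "welfare" wording
df["rr_post"] = rng.integers(0, 2, n)  # 1 = battery asked after the experiment

# Spending views: 1 = "too much", 0 = "about the right amount",
# -1 = "too little" (placeholder values standing in for survey responses).
df["spending"] = rng.choice([-1, 0, 1], n)
```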
Table 1 presents a basic (naive) analysis of the experiment, ignoring possible issues with moderator order. Model 1 shows that the estimated effect of the “welfare” wording on spending views (1 = “too much”; −1 = “too little”; 0 = “about the right amount”) is .27 and highly significant. We might also suspect that racial resentment predicts respondents’ views on welfare spending (Federico 2004), and Model 2 confirms this. Footnote 5 Furthermore, Model 3 shows that the effect of the welfare wording treatment is much stronger for respondents with higher levels of racial resentment.
[Table 1 here. Notes: Linear regression coefficient estimates with standard errors in parentheses below. * p < 0.05; ** p < 0.01; *** p < 0.001.]
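Under those illustrative assumptions, Table 1's three specifications correspond to ordinary least squares fits like the following. This is a sketch of the specifications, not the authors' replication code, and it reuses the hypothetical data frame from the sketch above.

```python
import statsmodels.formula.api as smf

# Model 1: the wording effect alone.
m1 = smf.ols("spending ~ welfare", data=df).fit()

# Model 2: adding the racial resentment scale as a predictor.
m2 = smf.ols("spending ~ welfare + resentment", data=df).fit()

# Model 3: letting the wording effect vary with resentment.
m3 = smf.ols("spending ~ welfare * resentment", data=df).fit()

for m in (m1, m2, m3):
    print(m.params.round(2), m.bse.round(2), sep="\n")
```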
But Table 1’s results are a prime example of the thorny issues discussed above. Because Models 2 and 3 condition on racial resentment, we may be wary of measuring racial resentment post-treatment. But if the battery of racial resentment questions is asked before the main spending question, we may worry that respondents are primed by being asked questions about race and the ability of different groups to get ahead in society. Because our study randomized the order of these questions, we are able to directly estimate these question order effects.
Does asking about racial resentment change the question wording effect?
In Table 2, Model 1 presents the results of a linear regression predicting spending views with an indicator for the “welfare” wording, RRpost (an indicator for whether racial resentment was measured after, as opposed to before, the question wording experiment), and an interaction between these two variables. We see that the welfare wording has a strong effect on spending views, as in Table 1.
[Table 2 here. Notes: Linear regression coefficient estimates with standard errors in parentheses below. * p < 0.05; ** p < 0.01; *** p < 0.001.]
But there is little evidence that the ordering of the racial resentment battery affects responses to the spending question. More importantly, the coefficient on the interaction between the welfare wording and being asked the racial resentment items post-treatment is small in magnitude and far from statistically significant (p = .59), suggesting that asking the racial resentment items before the question wording experiment does not meaningfully change the experiment’s treatment effect. The point estimate for this coefficient is −.04, which, taken at face value, would imply that the effect of the “welfare” wording is roughly 15% smaller for those asked the racial resentment battery post-treatment, but our study is not adequately powered to detect such a small effect difference if it were real.
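To illustrate the power point, the sketch below fits Table 2's Model 1 specification and then uses a rough simulation to gauge how often a true interaction of −.04 would be detected. The residual standard deviation (0.8) is our assumption, not a quantity reported here, and df is the illustrative data frame from the first sketch.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Model 1 of Table 2: does battery placement change the wording effect?
m1 = smf.ols("spending ~ welfare * rr_post", data=df).fit()
print(m1.pvalues["welfare:rr_post"])

# Rough simulation-based power check: how often would a true -.04 interaction
# be detected at the 5% level with n = 1590? The residual SD is assumed.
rng = np.random.default_rng(1)

def rejects(n=1590, effect=0.27, interaction=-0.04, sd=0.8):
    welfare = rng.integers(0, 2, n)
    rr_post = rng.integers(0, 2, n)
    y = effect * welfare + interaction * welfare * rr_post + rng.normal(0, sd, n)
    sim = pd.DataFrame({"y": y, "welfare": welfare, "rr_post": rr_post})
    fit = smf.ols("y ~ welfare * rr_post", data=sim).fit()
    return fit.pvalues["welfare:rr_post"] < 0.05

power = np.mean([rejects() for _ in range(1000)])
print(f"Approximate power against a -0.04 interaction: {power:.2f}")
```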
Although not our main focus, Model 2 shows that more resentful respondents are more likely to think too much is spent and that racial resentment has a larger impact under the “welfare” wording. Footnote 6 There is suggestive evidence that racial resentment’s association with spending views is smaller when the battery is asked post-treatment, while the triple interaction term is not significant, although it is estimated with considerable uncertainty.
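Model 2's triple interaction can be fit the same way; a minimal sketch, again using the illustrative data frame:

```python
import statsmodels.formula.api as smf

# Model 2 of Table 2: the wording effect, placement effect, resentment, and
# all of their interactions, including the triple interaction.
m2 = smf.ols("spending ~ welfare * rr_post * resentment", data=df).fit()

# The triple interaction asks whether resentment's moderating role itself
# depends on when the battery was asked.
print(m2.params["welfare:rr_post:resentment"],
      m2.pvalues["welfare:rr_post:resentment"])
```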
Is racial resentment affected by the question wording treatment?
Another way to explore the possibility of post-treatment bias from conditioning on moderators measured after an experiment is to ask whether the average measured level of the moderator itself appears to be affected by the treatment. Footnote 7 Table 3 shows the results of a linear regression predicting racial resentment with RRpost and an interaction between RRpost and the “welfare” question wording. Footnote 8 Both coefficient estimates are small in magnitude and far from statistically significant. The estimated difference in racial resentment between respondents who received the racial resentment battery first and those who received it after seeing the “welfare” wording comes closest to significance (p = .12), but this estimated difference of −.03 is small in magnitude. To put this in perspective, the point estimates for these coefficients imply that, compared to respondents who were asked the racial resentment battery at the start of the survey, those asked the racial resentment items post-treatment would have racial resentment levels on average 2% lower in the “assistance to the poor” condition and just over 7% lower in the “welfare” condition.
[Table 3 here. Notes: Linear regression coefficient estimates with standard errors in parentheses below. * p < 0.05; ** p < 0.01; *** p < 0.001.]
Collectively, these results provide little evidence that measured racial resentment differs meaningfully on average based on the ordering and wording of the experiment (note that testing the null hypothesis that both slope coefficients are equal to zero produces a p-value of .30). Though we are primarily interested in whether moderator order affects the treatment, our study design also allows us to test the opposite – does the question wording experiment affect the moderator? We do not find strong evidence that it does.
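A sketch of Table 3's specification and both follow-up tests, using contrast matrices so the tested combinations of coefficients are explicit. The quoted p-values come from the authors' data, not from this illustrative data frame.

```python
import numpy as np
import statsmodels.formula.api as smf

# Table 3's specification: design matrix columns are, in order,
# Intercept, rr_post, and rr_post:welfare.
m3 = smf.ols("resentment ~ rr_post + rr_post:welfare", data=df).fit()

# Difference for post-treatment respondents under the "welfare" wording:
# the sum of the two slopes (p = .12 in the authors' data).
print(m3.t_test(np.array([[0.0, 1.0, 1.0]])))

# Joint test that both slopes are zero (p = .30 in the authors' data).
print(m3.f_test(np.eye(3)[1:]))
```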
The appendix also reports the results of Kolmogorov–Smirnov tests comparing the overall distribution of racial resentment across these three groups. None of these pairwise tests for equality of distributions reaches conventional significance levels.
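The pairwise Kolmogorov–Smirnov comparisons can be run with scipy; the group definitions below follow the three groups described in the text and are otherwise illustrative:

```python
from itertools import combinations
from scipy.stats import ks_2samp

# Three groups: battery asked first, and battery asked post-treatment under
# each question wording.
groups = {
    "pre-treatment": df.loc[df.rr_post == 0, "resentment"],
    "post, welfare": df.loc[(df.rr_post == 1) & (df.welfare == 1), "resentment"],
    "post, assistance": df.loc[(df.rr_post == 1) & (df.welfare == 0), "resentment"],
}

for (name_a, a), (name_b, b) in combinations(groups.items(), 2):
    stat, p = ks_2samp(a, b)
    print(f"{name_a} vs. {name_b}: KS = {stat:.3f}, p = {p:.3f}")
```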
Discussion
Choosing the ordering of questions in a survey experiment presents difficult tradeoffs. Our results speak to competing concerns about asking potential moderators before versus after a survey experiment. If asked before, the moderator may change the treatment effect from what it would have been had the moderator not been asked. Asking these items after the survey experiment is also potentially problematic since conditioning on post-treatment variables can bias effect estimates.
Our survey considers a situation in which both of these problems might be expected to be particularly severe. Asking about race, society, and different groups getting ahead or being held back might seem likely to alter the effect of the “welfare” versus “assistance to the poor” question wording. Conversely, being asked about “welfare” might be expected to affect one’s measured racial resentment.
The results of our study, however, show little evidence that asking this particular moderator pre- versus post-treatment changes the treatment effect. There is also little evidence that the question wording treatment affected respondents’ measured racial resentment. These results should be of interest to researchers weighing concerns about post-treatment bias against concerns that pre-treatment measurement may alter the effect being estimated.
While our results are inconsistent with large differences in treatment effects based on moderator order, we cannot confidently rule out small or even modest differences. And although we believe our particular experiment is a tough test for such differences, it is certainly possible that this issue is more severe in other experiments. These possibilities can be investigated by future studies, perhaps leveraging designs similar to the one we employ here.
Supplementary material
To view supplementary material for this article, please visit https://doi.org/10.1017/XPS.2022.18
Data availability
Support for this research was provided through a University of Texas VPR Special Research Grant. The data, code, and any additional materials required to replicate all analyses in this article are available at the Journal of Experimental Political Science Dataverse within the Harvard Dataverse Network at: https://doi.org/10.7910/DVN/SZCAZF.
Acknowledgements
The authors thank Christopher Federico, Mike Findley, and Jacob Montgomery for helpful feedback. This project was supported by a Special Research Grant from the Vice President for Research at the University of Texas at Austin.
Conflicts of interest
The authors declare no conflicts of interest relevant to this study.