Introduction
We conducted a large-scale experiment to assess the effectiveness of partisan robo calls. We find a positive, statistically significant treatment effect on voter turnout. Our experiment has several key features that stand out in the get-out-the-vote (GOTV) literature. First, it varies the intensity of the treatment dosage by assigning different treatment groups a different number of calls (one, three, or six). We found the largest treatment effects for subjects scheduled to receive three calls. Second, a Republican-leaning media firm administered the experiment and targeted likely Republican voters; in contrast, many similar published experiments are nonpartisan, Democratic, or otherwise left-leaning. Third, the experiment targeted potentially marginal voters, specifically people who had voted in two of the previous four elections. These three features may explain why this experiment shows that robo calls have a positive, significant treatment effect on voter participation while many other robo call experiments do not. In this section, we discuss important results from the robo call GOTV literature and highlight several features of the experiment and how they relate to previous GOTV studies.
Previous research finds that the impact of robo calls on voter turnout is small and not statistically significant. A meta-study by Green, McGrath, and Aronow (2013) reports a statistically insignificant increase in voter turnout of 0.156 percentage points attributable to prerecorded phone messages. Another meta-analysis reports an "average intent-to-treat effect of 0.113 percentage point, with a 95% confidence interval ranging from −0.336 to 0.563" (Green and Gerber 2015, p. 196), also implying a statistically insignificant effect of robo calls on voter participation.
One study that did find a significant positive treatment effect of robo calls is Gerber et al. (2010), which used a placebo design to increase the precision of its treatment effect estimate. Studying the 2008 primary election, Gerber et al. (2010) find an approximately two percentage point increase in voter participation for subjects who were successfully contacted, relative to successfully contacted subjects in a placebo group. The authors attribute the large size of their reported effect to the call content: "social pressure" messaging that reminded subjects they had voted in the past two general elections but not in the past primary, and informed them that their voting records are publicly available. However, when attempting to replicate these results for the November 2008 general election in a follow-up study, the authors find "a weakly positive effect in the November 2008 general election, in keeping with the usual pattern of weaker turnout effects in high-salience elections" (Green and Gerber 2019, p. 89). Ali and Lin (2013) develop a model of voter turnout based on similar social dynamics. DellaVigna et al. (2014) present evidence from a field experiment that "individuals vote because they expect to be asked" about it and find lying costly. This research suggests that understanding the dynamics of social pressure messaging amidst an evolving campaign landscape is a fruitful research agenda.
Our experiment does not use social pressure messaging but does find a statistically significant positive effect for robo calls placed before the November 2014 general election. Subjects in the combined treatment group were 0.318 percentage points more likely to vote than subjects in the control group – or a rate of 1 mobilized voter for every 314 subjects called. The treatment with three calls was the most effective. Subjects in the three-call treatment were 0.695 percentage points more likely to vote than subjects in the control group – or a rate of 1 mobilized voter for every 144 subjects called.
We posit that robo calls are most effective when they reach voters who are on the margin between voting and abstaining. That is, a GOTV treatment will be less effective if it primarily targets individuals who are habitual voters or individuals who never vote: most individuals with a high propensity to vote will vote regardless of the treatment, and individuals with a very low propensity to vote are unlikely to be mobilized by a robo call. However, an individual's propensity to vote is context-dependent and can change with the salience of a given election to that potential voter (Arceneaux and Nickerson 2009). Moreover, context-dependent voting propensity is consistent with an array of rational choice models of voting (Ali and Lin 2013; Coate and Conlin 2004; Degan and Merlo 2011; Dhillon and Peralta 2002; Feddersen and Sandroni 2006). Thus, along several dimensions, such as the choice of states without multiple high-profile statewide races and the subject selection criteria, the experiment was designed to include a higher proportion of subjects likely to be marginal voters.
We built a sample of potentially marginal voters by focusing on registered voters who had voted in exactly two of the four 2010 and 2012 primary and general elections; that is, these voters missed two of the four elections. Other GOTV studies use alternative subject selection criteria. For example, Ramirez (2005, p. 70) selects subjects from a list of "registered Latino voters in low-propensity precincts." Other studies select subjects expected to be receptive to the treatment message, which is similar in spirit to our experimental design. For example, Shaw et al. (2012, p. 236), whose calls include an endorsement by Texas Governor Rick Perry, direct their calls to subjects who are "both likely primary voters and strong Perry supporters." Gerber et al. (2010) select only subjects who voted in the past two general elections but did not vote in the most recent primary.
As noted above, our experiment targets Republican voters, while many other robo call and other GOTV experiments target Democratic or nonpartisan voters. Nickerson and Rogers (2010), Barton, Castillo, and Petrie (2016), and Rogers, Green, Ternovski, and Young (2017) all target Democrats (with messages of varying degrees of partisanship), while Ramirez (2005), Nickerson (2008), Gerber, Green, Kaplan, and Kern (2010), and Panagopoulos (2011) use nonpartisan targeting. Among studies of robo calls, Shaw, Green, Gimpel, and Gerber (2012) is an exception; it targeted voters in a Republican primary.
Differences between partisan groups or differences in messages could cause particular GOTV methods, such as robo calls, to be more effective for one voter group than another, and this effect could be mediated by voters' age or other characteristics. We have no prior expectation that robo calls are more (or less) effective when targeting Republican voters rather than Democratic or nonpartisan voters, especially after controlling for individual characteristics. However, we believe it is important to close the empirical gap in studies of robo call effectiveness with experiments whose main subject pool is Republicans in general elections. This approach will give insight into whether findings from experiments targeting Democratic and nonpartisan voters generalize to Republican voters.
One of the important features of this experiment is the use of treatment groups that vary in the number of GOTV robo calls received. With a few exceptions, published GOTV experiments use a single robo call, while it is common for political candidates and advocacy groups to deploy multiple iterations of multiple types of GOTV communication. Ramirez (2005) included a treatment with two robo calls among a wide assortment of GOTV methods. Recently, Zelizer (2020) studied the effect of repeating robo calls, dividing approximately 40,000 registered voters into eight treatments that received zero to seven calls in the week leading up to a primary election. Zelizer found that a treatment with five calls was the most effective, but the differences between treatments were not statistically significant. We are not aware of any other studies that directly measure the effect of robo call dosage.
Similarly, Green and Zelizer (2017) conducted an experiment using multiple treatments with GOTV mail and found that efficacy declined after five mailers. Other experimental studies have raised the issue of GOTV over-saturation (Gerber and Green 2000) or analyzed GOTV synergies via multiple treatments (Green and Gerber 2019, p. 184), but did not find an effect of such multiple treatments. In a manual for political operatives, Lofy (2005, p. 153) recommends two to nine contacts before an election to increase the likelihood that habitual voters cast a ballot on the day of the election.
Procedures
The field experiment occurred in the days leading up to the November 2014 general election. It was funded and conducted by a political consulting firm seeking to evaluate the effectiveness of its GOTV services. The firm identified registered voters in six states as subjects. The potential subject pool consisted of likely Republican voters who had registered to vote in Georgia, Nebraska, New Mexico, Ohio, Pennsylvania, or Virginia prior to January 1, 2010. The pool was narrowed to people who had voted in exactly two of the 2010 and 2012 primary and general elections. People who voted early or cast an absentee ballot in the 2010 or 2012 general election were also excluded.
From this group, 42,000 households with unique landline phone numbers were randomly selected in each state. Many of these landline numbers are associated with households that have more than one registered voter. Because we cannot identify who answers a robo call, we included every registered voter in each selected household in the analysis, even though these household members are not all expected to match the original selection criteria in terms of voting history or even party affiliation. The 42,000 households from each state yielded subject pools ranging from 86,714 to 95,557 registered voters. In total, the study includes 539,567 subjects.
Within each state, households were randomly assigned to one of four groups: a control group and three treatment groups. Subjects in the three treatment groups, T1, T3, and T6, received one, three, or six treatment calls over 1, 3, or 6 days, respectively. The robo calls to subjects in T6 started 6 days before the election, the calls to subjects in T3 started 3 days before the election, and all subjects in the three treatment groups received a call on the morning of Election Day. On average, each household has 2.16 registered voters. Detailed information, including summary statistics of subject characteristics in each treatment group, is in Appendix A. The summary statistics show some imbalances across treatments on some control variables, particularly 2010 and 2012 voting rates. To address this issue, we include these covariates in our regression models. Further, we cluster the standard errors by household in all household-level specifications.
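To make the household-level randomization concrete, the following is a minimal Python sketch of the assignment scheme, with every registered voter inheriting their household's arm. The data frame, column names, and equal assignment probabilities are our illustrative assumptions; the firm's actual randomization code is not reproduced here.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(2014)

# One row per sampled household (42,000 per state in the actual design).
households = pd.DataFrame({"household_id": np.arange(42_000)})
households["arm"] = rng.choice(["control", "T1", "T3", "T6"],
                               size=len(households))

# Every registered voter in a household inherits the household's arm;
# the actual data average ~2.16 registered voters per household.
n_voters = 90_000
voters = pd.DataFrame({
    "voter_id": np.arange(n_voters),
    "household_id": rng.integers(0, len(households), size=n_voters),
}).merge(households, on="household_id")

print(voters["arm"].value_counts(normalize=True))
```

Because assignment happens at the household level while outcomes are observed at the voter level, inference must account for within-household correlation, which is why standard errors are clustered by household.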
Treatment call messages differed slightly on each call date. Each message included a reminder about the date of the election and a short partisan message encouraging the subject to vote. Most of the messages invoke negative partisanship rather than making specific promises or describing specific plans. Each message lasted between 35 and 45 seconds. Appendix B lists the script for each message. Table 1 shows the schedule of calls and scripts, which was identical for every state.
We define “live answerers” as subjects residing in households where at least one call resulted in a live answer, meaning someone in that household answered the phone for any length of time. We classify all other subjects as “treatment non-answerers,” regardless of whether the call reached an answering machine, an operator, or a fax machine, went unanswered, hit a busy signal, or was otherwise uncompleted. However, calls that reached answering machines or had a live answer were considered successful treatment for the purposes of the local average treatment effect (LATE) analysis described in Appendix E, because subjects may have listened to, and been influenced by, the message left on their answering machine. Our outcome variable – whether a subject voted in the election – is based on verified voting records for the November 2014 election.
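For readers unfamiliar with the LATE logic used in Appendix E, a back-of-the-envelope sketch follows. It assumes one-sided noncompliance (no control household receives a call), in which case the LATE is the intent-to-treat effect scaled by the successful-delivery rate; the delivery-rate figure below is a placeholder, not the paper's estimate.

```python
# Wald/IV estimator under one-sided noncompliance: because no control
# household is called, LATE = ITT effect / share of assigned subjects
# successfully treated (live answer or answering machine delivery).
itt_effect = 0.00318      # pooled ITT estimate reported in the Results
contact_rate = 0.80       # placeholder delivery share, not from the paper
late = itt_effect / contact_rate
print(f"Implied LATE: {late * 100:.3f} percentage points")
```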
Results
Table 2 shows that the shares of subjects with at least one live answer in T1, T3, and T6 were 39%, 60%, and 67%, respectively. The remaining subjects in each group did not answer any of the treatment calls. Table 3 shows the mean number of calls with live answers for subjects in each treatment group. The pattern of more live answers in treatments with more calls is consistent across states.
Note: The table shows the percentage of subjects in each treatment group that answered at least one call live, had no live answer, had at least one answering machine answer, and so on. We refer to these “responses” as “call outcomes.” Treatment call outcome categories, which are not all mutually exclusive, are listed in the rows of this table. Percentages are relative to the total number of subjects in each treatment group. “AM” denotes answering machine. The raw number of subjects in each cell is given in parentheses.
Note: The top panel of this table shows the mean number of live answers for all subjects in each treatment group. The bottom panel shows the mean number of live answers only among subjects with at least one live answer. Standard errors for each mean are listed in parentheses.
Table 4 reports the voting rates for each treatment group. These results show that treatment was associated with modestly higher voting rates, particularly for the three-call treatment: voting rates among all treated subjects and among subjects in T3 were approximately 0.3 and 0.6 percentage points higher, respectively, than in the control group.
Table 5 uses multivariate regressions to report intent-to-treat estimates, comparing voting rates of the three treatment groups to those of the control group while controlling for subject-specific voting history, demographics, and socioeconomic characteristics. Specifically, our control variables include whether a subject voted in the 2010 and/or 2012 general elections, as well as subject age, gender, educational attainment, estimated income, state of residence, and the number of registered voters in the household. The coefficient on each treatment variable shows the difference in voting rates, measured in percentage points, between the corresponding treatment group and the control group. Standard errors are clustered by household and listed below each regression coefficient.
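This type of specification can be sketched with `statsmodels` as below. The synthetic data frame, variable names, and reduced control set are illustrative stand-ins, not the authors' actual code (which is available in the replication archive).

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)

# Synthetic stand-in for the voter-level file: assignment occurs at the
# household level; covariates vary at the voter level. All names and
# values here are illustrative, not the replication data.
hh = pd.DataFrame({"household_id": np.arange(2_500)})
hh["arm"] = rng.choice(["control", "T1", "T3", "T6"], size=len(hh))

n = 5_400
df = pd.DataFrame({
    "household_id": rng.integers(0, len(hh), size=n),
    "voted_2010": rng.integers(0, 2, size=n),
    "voted_2012": rng.integers(0, 2, size=n),
    "age": rng.integers(25, 90, size=n),
    "voted_2014": rng.integers(0, 2, size=n),
}).merge(hh, on="household_id")
for a in ("T1", "T3", "T6"):
    df[a.lower()] = (df["arm"] == a).astype(int)

# Intent-to-treat OLS with standard errors clustered by household.
# (The paper's full specification also controls for gender, education,
# income, household size, and state of residence.)
fit = smf.ols("voted_2014 ~ t1 + t3 + t6 + voted_2010 + voted_2012 + age",
              data=df).fit(cov_type="cluster",
                           cov_kwds={"groups": df["household_id"]})
print(fit.summary())
```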
Note: The table shows OLS estimates of the difference in voting rates between subjects in the treatment groups and subjects in the control group. The subset of subjects used to estimate each specification is shown with the “Population” label. Subject controls include whether a subject voted in the 2010 general election, the 2012 general election, or both, as well as subject age, gender, education, income, number of subjects in the household, and state of residence. Standard errors are clustered by household and displayed in parentheses.
*** p < 0.01,
** p < 0.05,
* p < 0.1.
In Table 5, Column 1, the coefficient on the All Treatments variable is 0.00318, indicating that voter participation for subjects in the pooled treatment group is 0.318 percentage points higher than for subjects in the control group. This point estimate is statistically significant at the 5% level. The magnitude of this estimate implies a yield of roughly three additional voters for every 1,000 subjects called, or 314 subjects called per additional voter mobilized. This effect is larger than the average robo call effect reported in Green and Gerber (2015, p. 196). This result implies a cost of under $9 to induce a subject who would otherwise have abstained to cast a ballot.
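The conversion from the regression coefficient to subjects-per-vote and cost-per-vote works as in the sketch below. The per-call price and the equal-group-size assumption are ours, chosen only to illustrate how a sub-$9 figure can arise; the actual price is not stated in this section.

```python
# Translating the pooled ITT coefficient into subjects per mobilized voter.
itt = 0.00318                      # +0.318 percentage points
subjects_per_vote = 1 / itt        # ~314 subjects called per extra vote

# Hypothetical per-call price for illustration only. Assuming equal group
# sizes, pooled subjects were scheduled for (1 + 3 + 6) / 3 calls each.
price_per_call = 0.008             # dollars; assumed, not from the paper
calls_per_subject = (1 + 3 + 6) / 3
cost_per_vote = subjects_per_vote * calls_per_subject * price_per_call
print(f"{subjects_per_vote:.0f} subjects per vote; ~${cost_per_vote:.2f} per vote")
```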
Table 5, Column 2 reports estimation results by treatment group. The findings indicate that the most effective treatment is T3, in which subjects received an automated call on each of 3 days. T3’s treatment effect is 0.695 percentage points, which is statistically significant at the 1% level. This estimate implies the turnout of one additional voter for every 144 subjects assigned to T3. The treatment effects for groups T1 and T6 are positive but smaller and statistically insignificant. The implied cost to induce an additional vote among subjects in T3 alone is under $4.
Our findings show that calling on 3 days was more effective than calling on 1 day or over 6 days. The Wald statistic testing the equality of the coefficients for T1 and T3 is 9.04, significant at the 1% level. The Wald statistic testing the equality of the coefficients for T3 and T6 is 8.21, also significant at the 1% level. Thus, for both tests, we reject the hypothesis of equal dosage effects. There is no statistically significant difference between the coefficients for T1 and T6.
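Such equality-of-coefficients tests can be run directly on a fitted model. The snippet below continues the illustrative regression sketch above, reusing its `fit` object and hypothetical variable names.

```python
# Pairwise Wald tests of equal dosage effects on the fitted model above.
print(fit.wald_test("t1 = t3"))   # H0: one-call effect = three-call effect
print(fit.wald_test("t3 = t6"))   # H0: three-call effect = six-call effect
print(fit.wald_test("t1 = t6"))   # H0: one-call effect = six-call effect
```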
In households with multiple registered voters, we do not know which subject or subjects answered the treatment call or calls. Because the treatment effect may be diluted in multi-voter households, we also analyze subjects in single-voter households, which account for approximately one tenth of all subjects. Table 5, Columns 3 and 4 provide those estimation results. Our data show a larger point estimate of the treatment effect for subjects in single-voter households. In Column 3, the average treatment effect for single-voter households is 0.388 percentage points, but it is not statistically significant at the 10% level.
Table 5, Column 4 separates the results by treatment group and shows the estimated effects among single-voter households. The treatment effects for T1 and T3 are both positive, although only T3 is statistically significant at the 10% level. The estimate for T6 is negative but not statistically significant.
As noted in the Procedures section above, the treatment messages differed from day to day. Within T3 and T6, subjects differed in which days had live answers, and thus in which messages they heard. However, we were not able to identify any meaningful differences in turnout attributable to hearing specific message scripts. Similarly, although previous researchers find that GOTV calls are less effective the earlier they are placed before an election (Green and Gerber 2019, p. 186; Murray and Matland 2014; Nickerson 2007; Panagopoulos 2011), we did not observe a timing effect within the groups of voters who received three or six calls. This suggests that within the time frame of our study, the effectiveness of the treatment did not measurably decay.
The pattern of treatment effectiveness varied across the states in our sample. Figure 1 charts each treatment group’s effect for the six states in the experiment. While there are differences in the balance of subject characteristics across states, we do not observe a correlation between differences in the balance tests and the estimated treatment effects across the states. The state-by-state subject characteristic summary statistics are included in Appendix A, as are the regression tables for the results graphed in Figure 1.
In Appendix D, we discuss an alternative method of estimating the causal treatment effect using data from a call placed after the election. In Appendix E, we report the conventional method of estimating the treatment-on-treated effect by calculating local average treatment effects. In Appendix F, we test whether our results remain significant after correcting for multiple hypothesis testing.
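The correction method used in Appendix F is not described in this section; as one standard possibility, a Holm step-down adjustment over the three arm-level p-values would look like the sketch below. The p-values are placeholders, not the paper's estimates.

```python
from statsmodels.stats.multitest import multipletests

# Holm step-down adjustment over the three treatment-arm p-values.
# Holm is one standard choice; Appendix F may use a different method.
pvals = [0.20, 0.004, 0.60]   # hypothetical p-values for T1, T3, T6
reject, p_adjusted, _, _ = multipletests(pvals, alpha=0.05, method="holm")
print(p_adjusted, reject)
```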
Conclusion
This experiment employed multiple-call dosages and mimicked the behavior of real political GOTV campaigns. We show that targeted automated calls have a positive effect on voter turnout even without a social pressure message. We find that, across all treatments, the intent-to-treat effect is larger than most previous measures of robo call effectiveness. Our results suggest that it is not irrational for campaigns to deploy robo calls as a cheap additional tool to increase voter turnout. We find that the treatment increases voter participation by three to six additional voters for every 1,000 subjects called, depending on dosage. This corresponds to a cost of $4 to $9 to induce one additional vote.
We also find that dosage matters. The intent-to-treat effect for dosage T3 is two to three times larger than for dosage T1, while dosage T6 typically had the smallest effect. These results suggest that additional calls can increase effectiveness, but that too many calls may be counterproductive. Our data are not rich enough to determine why the T6 treatment had the smallest effect. One explanation might be that receiving so many calls annoyed subjects, who then began ignoring the calls placed closer to the election even when they picked up the phone. While T6 reached a higher percentage of subjects than T3 – 67% versus 60% – the increase from three to six calls nearly doubled the mean number of live answers per live answerer, from 2.1 in T3 to 3.8 in T6. Annoyance at this stark increase in live answers may explain the decreased effectiveness of T6. This topic may warrant further experiments, using placebos, to determine how marginal additional calls affect voting behavior.
Overall, the pattern of dosage results observed in our experiment is consistent with the broader GOTV literature. The 0.12 percentage point effect of T1 is consistent with meta-analyses (Green and Gerber 2015, p. 196; Green, McGrath, and Aronow 2013), while the larger 0.65 percentage point treatment effect of T3, along with more recent work (such as Zelizer 2020), provides evidence that robo call treatments are more effective with some, but not too much, repetition.
Supplementary material
To view supplementary material for this article, please visit https://doi.org/10.1017/XPS.2022.16
Data availability
The data, code, and any additional materials required to replicate all analyses in this article are available at the Journal of Experimental Political Science Dataverse within the Harvard Dataverse Network, at: https://doi.org/10.7910/DVN/DMJ7EA.
Conflicts of interest
The authors are not aware of any conflicts of interest related to this paper.
Ethics statement
The research presented here was approved by the George Mason University IRB under Project Title: [854034-1] Robo call GOTV data analysis.
To the best of our knowledge, this research adheres to APSA’s Principles and Guidance for Human Subjects Research.
Details of the experimental procedure are described in the manuscript above. Further description pertinent to the APSA standards is in online Appendix G.