WHY TURNOUT?
Predicting elections has a long history in PS: Political Science & Politics. Although early predictions focused on forecasting the winner of the popular vote for the US presidency, more recent models have expanded to predicting the winner of the Electoral College or the distribution of congressional seats.
However, turnout prediction is far less studied. The most recent article on the topic is now more than a decade old: Evans and Ivaldi (2012) built a model to predict turnout for European Union elections. They found that lagged turnout was the most important predictor, but that electoral circumstances and structures (such as the absence or presence of a compulsory voting system, the time since the last election, and the number of parties) also mattered. In sharp contrast to forecasts of election outcomes, no economic variables were significant. Furthermore, no studies to date have focused on forecasting turnout in the United States.
Voter turnout is a fundamental pillar of democracy and a key indicator of the health of a country. Participation in elections has profound implications for the proper functioning of government. Turnout rates are linked to important aspects of democratic governance, ranging from the accuracy of representation to the perceived legitimacy of those who are elected.
The importance of turnout can be seen in examining its role across different political contexts. In authoritarian regimes, for instance, reported turnout figures are frequently viewed as manipulated or coerced and are often met with skepticism by international observers. Conversely, in established democracies, low turnout rates can be a source of considerable concern and frustration, especially for elections that receive less attention, such as midterm congressional races or special elections. In these cases, diminished voter participation may raise questions about the mandate of those elected and the overall health of the democratic process.
Many factors encompassing a range of social, political, and institutional variables help shape turnout. Some of the rules and structures governing the electoral process may inadvertently create barriers to voting, and others may encourage or facilitate greater participation. Differences in social or demographic factors can influence turnout and may reflect varying levels of political engagement, access to information, or feelings of efficacy. Additionally, the perceived importance of a particular election, the competitiveness of races, and the broader political climate may affect how individuals approach each election.
Even though we understand the importance of turnout, as well as the forces that can change it, no prior work has forecast turnout in the US context. This article is thus an attempt to lay a foundation for this field.
Forecasting turnout has several real-world implications. On the practical side, forecasts may shape how election administrators perform their work. They could assist with preparation for elections by helping to ensure an adequate number of polling places and poll workers. They may also serve as a check against anomalies: a forecast provides a baseline against which unusually high or low turnout can be flagged. Given an atmosphere in which many worry about the fair counting of ballots, forecasting turnout may help guard against politicized claims of fraud.
There are also strategic reasons to forecast turnout. Campaigns may use forecasting to better understand the structural forces in elections. Modeling turnout may also help with the distribution of financial resources, because campaigns may wish to prioritize mobilization under some circumstances and persuasion in others.
Two models are introduced in this article: the national and the state model. As their names imply, the former model attempts to predict national turnout, whereas the latter model is a forecast at the state level. The national model predicts that turnout will be 65.3% for 2024, whereas the state model finds that turnout will generally increase across the country from 2020.
CREATING THE MODELS
Data come from Bednarczuk (2024). In deciding what to model, several considerations were top of mind. Variables should be logical; that is, there should be prima facie support for their inclusion. Ideally, the variables should be available far in advance of the election so that the forecast can be made early; however, this consideration may have to be weighed against the increased precision that may come from the use of data gathered shortly before the election.
Inspired by the literature on voter turnout and by the earlier model for forecasting turnout, I classified the independent variables as institutional, campaign, economic, or demographic. Institutional variables capture the rules and structures that may shape turnout. These rules include laws such as same-day voter registration (Grumbach and Hill 2022) and automatic voter registration (Kim 2023), both of which have been shown to increase turnout.
Campaign variables focus on features in the political environment that can affect turnout. For example, the presence of other high-profile races, such as senatorial or gubernatorial contests (Jackson 2002; Springer 2012), and of additional candidates (Bol and Ivandic 2022) can increase turnout.
Economic variables include measures of the economic forces that may affect turnout, ranging from the global (Park 2023) to the national (Frank and Martinez i Coma 2023) to the individual; for example, Aytaç et al. (2020) recently found that turnout is lower among the unemployed.
Demographic variables include group identities that may shape turnout. For example, Barber and Holbein (2022) recently documented how turnout is much lower among minority citizens and young people.
As with many models, some ideas worked; others did not. Given the novelty of these models, the following sections describe what worked and what did not work when formalizing them.
THE NATIONAL MODEL
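The national model is designed to forecast turnout for the country as a whole. It includes the years 1868 through 2020 and is represented by the following formula:
$$Y = \beta_0 + \beta_1\,\text{LaggedTurnout} + \varepsilon$$
where Y = voter turnout, β₀ = intercept, β₁ = coefficient for lagged voter turnout, and ε = error term.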
Data for the initial model came from elections held from 1960 to 2020. The values for the dependent variable came from the United States Election Project and reflect turnout for the voting-eligible population.
Campaign variables included the presence of an elected incumbent, the presence of a sizable third-party challenge, and whether a poll showed a minor party earning more than 10% of the potential vote during an election year. Economic variables included the national unemployment rate for the year and the change in GDP from the prior presidential election. These data came from the Bureau of Labor Statistics and the Federal Reserve Bank of St. Louis (FRED), respectively. Demographic variables included the percent of the country over 65 and the percent of the country that was nonwhite, both of which came from US Census data. Lagged turnout was also included.
Across various combinations of independent variables in a time-series cross-section regression model, the only significant variable was prior turnout (table 1). Additionally, the results showed poor model fit. Altogether, this is an unsatisfactory result for forecasting.
* Note: Variables that are significant at the 0.05 level are in bold.
What might help explain these results? Two possible explanations center around the sample size and the variation in the dependent variable. Fewer than 20 observations were included, and turnout varied by fewer than 20 percentage points. Given either complicating factor, creating a useful forecast may have been challenging, but the presence of both may have been too much to overcome.
Therefore, I considered additional models. Based on the results of the earlier model, the only independent variable was lagged turnout. Given data availability, I was able to extend the series to the election of 1792 to increase the sample size to 58. Although doing so greatly improved the model fit, there were sizable residuals, which were largely driven by the low levels of turnout in antebellum elections. Thus, another model was considered that only included post–Civil War turnout. This became the national model.
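To make the structure of a lagged-turnout model concrete, the following is a minimal sketch of an ordinary least squares fit of turnout on its own prior value. It is an illustration under stated assumptions rather than the estimation code behind the reported results: the function name `fit_lagged_ols` is hypothetical, and the series is made-up placeholder data, not the turnout figures used in this article.

```python
from statistics import mean


def fit_lagged_ols(turnout):
    """Fit turnout[t] = b0 + b1 * turnout[t-1] by ordinary least squares.

    `turnout` is a chronologically ordered list of turnout rates (percent).
    Returns the intercept b0 and the slope b1 on lagged turnout.
    """
    lagged = turnout[:-1]   # turnout in the prior election
    current = turnout[1:]   # turnout in the election being explained
    x_bar, y_bar = mean(lagged), mean(current)
    sxx = sum((x - x_bar) ** 2 for x in lagged)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(lagged, current))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1


# Placeholder series for illustration only (NOT the article's data).
series = [55.0, 57.0, 54.0, 58.0, 60.0, 59.0, 62.0]
b0, b1 = fit_lagged_ols(series)

# The forecast for the next election uses only the most recent turnout.
forecast = b0 + b1 * series[-1]
print(f"intercept={b0:.2f}, slope={b1:.2f}, next forecast={forecast:.1f}")
```

Because the only input is the previous election's turnout, a model of this form can produce its next forecast as soon as that figure is known.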
A plot of the predicted results and the actual turnout can be seen in figure 1. Some of the largest misses seem to be driven by times of great change in the composition of the electorate. For example, the predicted turnout of 37% in 1920 was far lower than the actual turnout of 49%; this may be because it was the first presidential election following the ratification of the Nineteenth Amendment.
An additional check analyzed the out-of-sample performance of the model. This was done by dropping an election, reestimating the model, and checking the difference between the prediction for this missing election and the actual turnout rate. Doing this showed that the model fit improved for recent elections; the average miss since 1956 has been less than 3%, with the overall average miss at 4%. For example, the prediction for 2016 was 59%, whereas actual turnout was 60%.
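The drop-one-election check described above can be sketched as a leave-one-out loop. This is a hedged illustration, not the exact procedure used here: the helper names `ols` and `leave_one_out_misses` are hypothetical, the series is placeholder data, and the sketch assumes that dropping an election removes only that election's (lagged turnout, turnout) pair from the estimation sample.

```python
from statistics import mean


def ols(x, y):
    """Simple least squares fit of y = b0 + b1 * x; returns (b0, b1)."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    return y_bar - b1 * x_bar, b1


def leave_one_out_misses(lagged, actual):
    """Absolute prediction error for each election when it is held out."""
    misses = []
    for i in range(len(actual)):
        # Re-estimate the model without election i.
        x_train = lagged[:i] + lagged[i + 1:]
        y_train = actual[:i] + actual[i + 1:]
        b0, b1 = ols(x_train, y_train)
        # Predict the held-out election and record the size of the miss.
        misses.append(abs(b0 + b1 * lagged[i] - actual[i]))
    return misses


# Placeholder series for illustration only (NOT the article's data).
series = [55.0, 57.0, 54.0, 58.0, 60.0, 59.0, 62.0]
lagged, actual = series[:-1], series[1:]
misses = leave_one_out_misses(lagged, actual)
print(f"average out-of-sample miss: {mean(misses):.2f} points")
```

Averaging the misses over a subset of elections (for example, only recent ones) gives the kind of period-specific comparison reported above.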
The national model predicts a national turnout for 2024 of 65.3%, with a confidence interval that ranges from 63.6% to 67.0%.
THE STATE MODEL
The state model attempts to forecast turnout state by state. Again, data come from the United States Election Project, which only has state-level data from 1980 through 2020. Because lagged turnout was included, only the 1984 through 2020 elections could be used. This state-level model therefore had a sample size of 500. Turnout varies across states from a low of 36.5% (Nevada in 1996) to a high of 74.8% (Minnesota in 2020), with a mean turnout across years of 56.1%.
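The loss of the first election when a lag is constructed can be shown with a short sketch. The function `add_lagged_turnout` and the panel layout are hypothetical, the turnout values are constant placeholders, and the count of 50 units is an assumption inferred from the reported sample size of 500 over ten usable elections.

```python
# Presidential election years covered by the state-level source data.
ELECTION_YEARS = list(range(1980, 2021, 4))  # 1980, 1984, ..., 2020


def add_lagged_turnout(panel):
    """Pair each state-year with that state's turnout in the prior election.

    `panel` maps state -> {year: turnout}. Returns a list of rows
    (state, year, turnout, lagged_turnout). The first year for each state
    has no prior value and therefore drops out of the estimation sample.
    """
    rows = []
    for state, by_year in panel.items():
        years = sorted(by_year)
        for prev_year, year in zip(years, years[1:]):
            rows.append((state, year, by_year[year], by_year[prev_year]))
    return rows


# Placeholder panel: 50 hypothetical states with a constant dummy value.
panel = {
    f"state_{i:02d}": {year: 50.0 for year in ELECTION_YEARS}
    for i in range(50)
}
rows = add_lagged_turnout(panel)
print(len(ELECTION_YEARS), len(rows))  # 11 elections in, 500 usable rows out
```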
The state model is represented by the following formula:
$$Y = \beta_0 + \beta_1\,\text{LaggedTurnout} + \beta_2\,\text{SameDayRegistration} + \beta_3\,\text{PercentWhite} + \beta_4\,\text{PercentCollege} + \varepsilon$$
where Y = voter turnout, β₀ = intercept, β₁ = coefficient for lagged voter turnout, β₂ = coefficient for same-day voter registration, β₃ = coefficient for percent of population that is white, β₄ = coefficient for percent of population with a four-year degree, and ε = error term.
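As an illustration of how a specification with these four predictors could be estimated, the sketch below fits a pooled least squares regression by solving the normal equations in plain Python. It is an assumption-laden example rather than the estimation routine used for table 2: the function names are hypothetical, the data are synthetic and noise-free, and the "true" coefficients are arbitrary values chosen only so that the fit can be checked against them.

```python
def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b_i] for row, b_i in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_state_model(rows):
    """rows: (turnout, lagged, same_day, pct_white, pct_college).

    Returns [b0, b1, b2, b3, b4] from the normal equations X'X b = X'y.
    """
    design = [[1.0, lag, sdr, white, college]
              for _, lag, sdr, white, college in rows]
    y = [row[0] for row in rows]
    k = 5
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)]
           for i in range(k)]
    xty = [sum(d[i] * y_i for d, y_i in zip(design, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Synthetic, noise-free data generated from arbitrary coefficients
# (for illustration only; NOT the article's data or estimates).
TRUE_BETA = [5.0, 0.7, 2.0, 0.05, 0.1]
rows = []
for i in range(12):
    lag = 50.0 + (i * 7) % 11
    sdr = float(i % 2)               # 1 if same-day registration, else 0
    white = 60.0 + (i * 5) % 13
    college = 20.0 + (i * 3) % 7
    turnout = (TRUE_BETA[0] + TRUE_BETA[1] * lag + TRUE_BETA[2] * sdr
               + TRUE_BETA[3] * white + TRUE_BETA[4] * college)
    rows.append((turnout, lag, sdr, white, college))

beta = fit_state_model(rows)
print([round(b, 3) for b in beta])
```

A forecast for a given state then follows by plugging that state's prior turnout, same-day registration status, and demographic shares into the fitted equation.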
Several other variables were tested before formalization of this model. I introduced a new category of independent variables, institutional, because laws that may shape turnout vary across states; for instance, whether a state allows same-day registration through Election Day. These data were collected from the National Conference of State Legislatures. Campaign variables included the presence of significant statewide contests (gubernatorial or senatorial) and whether those races featured an incumbent; these data were gathered from a review of election results from each state. Economic variables included the unemployment rate in the state for the first three months of the election year, as well as the median household income per state. I gathered unemployment data from FRED and derived median household income from census data. Demographic variables included the percent of the state that is nonwhite, the percent that has at least a bachelor’s degree, and the percent of the state in poverty, all of which came from census and American Community Survey data. Lagged turnout was also included.
I again used a time-series cross-section regression model. After testing various combinations of the proposed independent variables, the best-fitting model included lagged turnout, same-day registration, race, and education. More college-educated voters and a smaller minority population suggest higher turnout in a state, as does the presence of same-day registration. However, the variable that has the largest effect was lagged turnout (table 2).Footnote 1
* Note: Variables that are significant at the 0.05 level are in bold.
To check model fit, I created a plot of the predicted turnout and the actual turnout, as shown in figure 2. The predicted turnout for each state was weighted by its voting-eligible population, aggregated by year, and then compared to the actual national turnout. The forecast tracks actual turnout closely: the average result misses by less than 3%. As an additional check, the out-of-sample performance of this model was also analyzed. As with the national model, the average miss per state was less than 4%.
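The aggregation step can be sketched as a weighted average in which each state's predicted turnout rate counts in proportion to its voting-eligible population (VEP). The function name `national_turnout`, the state labels, and all numbers below are hypothetical placeholders rather than values from the model.

```python
def national_turnout(state_predictions, vep):
    """VEP-weighted average of state-level predicted turnout rates.

    `state_predictions` maps state -> predicted turnout (percent);
    `vep` maps state -> voting-eligible population.
    """
    total_vep = sum(vep[s] for s in state_predictions)
    weighted = sum(state_predictions[s] * vep[s] for s in state_predictions)
    return weighted / total_vep


# Placeholder values for three hypothetical states in one election year.
predicted = {"A": 70.0, "B": 60.0, "C": 50.0}
vep = {"A": 1_000_000, "B": 3_000_000, "C": 1_000_000}

print(national_turnout(predicted, vep))  # 60.0
```

Weighting matters because an unweighted mean would let small states move the national figure as much as large ones.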
This model was then used to forecast turnout for each state for the 2024 election. Data for unemployment came from FRED, whereas college graduate data and data on the racial makeup of states came from the most recent American Community Survey.
Turning to predictions for the upcoming election, 41 states are forecasted to see an increase in their turnout from 2020; the average shift from 2020 turnout is 1.5 percentage points. States seeing the largest prospective gains in turnout tend to be those with same-day voter registration. Their turnout shifts by an average of 2.3 percentage points; those without it move around 0.8 percentage points. Nine states are forecasted to have a turnout higher than 70%, whereas thirteen are expected to see turnout of less than 60% (table 3).Footnote 2
What do these results suggest for the potential swing states in 2024? First, let us look at the Democratic “blue wall” of the Midwest, which consists of Wisconsin, Michigan, and Pennsylvania. All three are high-turnout states, but they have some important differences. Wisconsin’s turnout could be higher than it was in 2016, driven by its largely homogeneous population and same-day registration. Michigan should be right behind, brought down slightly by its more diverse population. Lagging these states is Pennsylvania, which is broadly similar in demographics to Michigan but lacks same-day registration. Given the high turnout in these states, motivating people to go to the polls may not be as large a problem as in other states, but Democrats would have to focus more on Pennsylvania than on the other two.
There are other states to consider as well. Potential key states in the Sun Belt include Arizona, Nevada, and Georgia. These are lower-turnout states than the Midwestern states; all three cluster around 60% turnout. This suggests there may be more room for increasing the size of the electorate. Arizona and Nevada both have same-day registration, which may make them easier states to mobilize given their late deadlines. Georgia, in contrast, would require an earlier get-out-the-vote effort.
DISCUSSION
Why does the national model include no independent variables aside from lagged turnout? This could stem from its lack of campaign variables. Perhaps the inclusion of measures such as campaign interest and spending or the number of battlegrounds would improve fit; however, those variables would restrict the model to the most recent elections, and such a reduction in sample size may not be worth the trade-off.
A similar critique can be made of the state model. Although related variables, such as down-ballot campaigns, did not have a demonstrable effect on turnout, other electoral variables, such as whether a state is a battleground or swing state, should be considered in additional models. Relatedly, this model does not include any candidate-specific variables. Measures of interest or popularity may be relevant.
Future models may also wish to consider the effect of other demographic variables, especially if the current variables become less predictive over time. Considerations may need to be given to the dependent variable as well; there may be factors that shape the size of the voting-eligible population that should be modeled. Perhaps changes over time in state laws regarding the restoration of felon voting rights should be included.
The lack of economic indicators is also striking. Although these indicators are normally linked to turnout, no combination of economic measures in either model had an effect. Evans and Ivaldi (2012) found the same in their forecast of EU turnout. Perhaps these indicators do more to shape vote choice than the choice to vote.
However, the lack of economic indicators does point to a general strength of both models: they are able to generate a prediction well in advance of an election. The national model can issue its prediction for 2028 once the official turnout records for 2024 are tabulated, whereas the state model will be ready at the beginning of an election year. Given the sometimes competing interests of lead time and accuracy in forecasting, these models decidedly tilt toward the former. This could be useful for both election administrators and political campaigns as they prepare for the fall election cycles.
CONCLUSION
I set out to create a model to forecast turnout for US presidential elections and ended up creating two: the national and the state model. The national model relies strictly on lagged turnout, whereas the state model includes demographic and institutional variables.
This forecast suggests a more positive reading of the health of democracy across the country in 2024 than might be garnered from other sources. Turnout is forecasted to be higher across most of the country for 2024, which is impressive given the high turnout in 2020. Furthermore, although some forces may work against each other with respect to turnout (increasing levels of education contrasting with increasing racial diversity), some reforms, such as same-day voter registration, do appear to have a liberalizing effect on turnout. May this model be one of many to analyze turnout, and may there be a robust debate on this topic in PS in the years to come.
SUPPLEMENTARY MATERIAL
To view supplementary material for this article, please visit http://doi.org/10.1017/S1049096524000829.
ACKNOWLEDGEMENTS
I am grateful to the anonymous reviewers and to the editors of this symposium for their feedback and guidance.
DATA AVAILABILITY STATEMENT
Research documentation and data that support the findings of this study have not yet been verified by PS’s replication team. Data will be openly available at the Harvard Dataverse on publication of the final article.
CONFLICTS OF INTEREST
The author declares no ethical issues or conflicts of interest in this research.