Published online by Cambridge University Press: 04 January 2017
The accuracy of U.S. Social Security Administration (SSA) demographic and financial forecasts is crucial for the solvency of its Trust Funds, other government programs, industry decision-making, and the evidence base of many scholarly articles. Because SSA releases insufficient replication information and uses antiquated statistical forecasting methods, no external group has ever been able to produce fully independent forecasts or evaluations of policy proposals to change the system. Indeed, no systematic evaluation of SSA forecasts had ever been published, by SSA or anyone else, until a companion paper to this one. We show that SSA's forecasting errors were approximately unbiased until about 2000 but have grown quickly since then, with increasingly overconfident uncertainty intervals. Moreover, the errors run largely in the same direction, making the Trust Funds look healthier than they are. We extend and then explain these findings with evidence from a large number of interviews with participants at every level of the forecasting and policy processes. We show that SSA's forecasting procedures meet all of the conditions that the modern social-psychology and statistical literatures demonstrate make bias likely. When those conditions combined with potent new political forces trying to change Social Security, SSA's actuaries hunkered down, trying hard to insulate their forecasts from strong political pressures. Unfortunately, this led the actuaries to overlook the fact that retirees had begun living longer and drawing benefits longer than predicted. We show that fewer than 10% of their scorings of major policy proposals were statistically distinguishable from random noise, as estimated from their policy forecasting error.
We also show that the solution to this problem involves SSA or Congress implementing in government two of the central projects of political science over the last quarter century: (1) transparency in data and methods, and (2) replacing large numbers of ad hoc qualitative decisions, too complex for unaided humans to make optimally, with formal statistical models.
Authors' note: For helpful advice or comments, we are grateful to Bill Alpert, Jim Alt, Steve Ansolabehere, Neal Beck, Nicholas Christakis, Mo Fiorina, Dan Gilbert, Alexander Hertel-Fernandez, Martin Holmer, David Langer, and Theda Skocpol. Thanks also to the many participants in the forecasting and policy process for information and advice. Replication data are available on the Political Analysis Dataverse at http://dx.doi.org/10.7910/DVN/28323. Supplementary materials for this article are available on the Political Analysis Web site.