Pre-analysis plans (PAPs), public documents that specify in advance the hypotheses a researcher will investigate and how the data will be collected and analyzed, have been championed as an important tool for addressing the problem of research credibility in the social sciences (Ioannidis 2005; Franco, Malhotra, and Simonovits 2014; Simonsohn, Nelson, and Simmons 2014; Open Science Collaboration 2015; Christensen, Freese, and Miguel 2019).Footnote 1 As shown in figure 1, which displays the number of PAPs registered on the Evidence in Governance and Politics (EGAP) and American Economic Association (AEA) registries since 2011, their numbers have skyrocketed in recent years.Footnote 2 Graduate students are now taught that registering a PAP is a de rigueur part of undertaking their research projects.
PAPs are advocated for two main reasons. First, they prevent “fishing” (also referred to as “p-hacking” or “data mining”). Fishing is the practice of selectively reporting, from among the many possible results that might be generated from a given set of data, the subset of findings that are statistically significant, novel, or allow the researcher to tell a cleaner or more compelling story.Footnote 3 PAPs solve this problem by specifying in advance exactly which econometric specifications, outcome variables, coding rules, covariates, sub-samples, and inclusion rules will be used to generate the results that will be presented as the definitive test of the research question. Specifying the key details of the analysis in advance reduces the “researcher degrees of freedom” (Simmons, Nelson, and Simonsohn 2011; Wicherts et al. 2016) that provide latitude for consciously or unconsciously selecting particular specifications that make the results more striking.
Second, PAPs prevent hypothesizing after results are known (sometimes abbreviated as “HARKing”). HARKing involves interpreting results ex post based on the results of the analysis rather than ex ante based on expectations derived from theory. PAPs address this problem by specifying in advance which hypotheses a researcher is intending to test, thus preventing the researcher from succumbing to hindsight bias and emphasizing in the presentation of her findings the hypotheses that happened to find support in the data (Nosek et al. 2018). Registering research hypotheses in advance in a PAP need not prevent researchers from using their data to conduct exploratory research. Pre-registration simply clarifies which of the analyses presented in the paper are confirmatory (i.e., testing hypotheses specified before the results were known) and which should be treated as exploratory (i.e., products of learning and new hypothesis generation based on the patterns that emerged in the data). Both confirmatory and exploratory findings can be sources of insight, but the evidentiary status of each is quite different.
These two benefits of PAPs are clear and, for those committed to improving the credibility of social science research, compelling. But whether PAPs are actually achieving these goals in practice is an empirical question—albeit an extremely challenging one to answer definitively.Footnote 4 One cannot compare the degree of fishing and post-hoc hypothesis adjustment in studies implemented with and without PAPs because, absent a PAP, there is no record of the analyses or hypotheses that were pre-specified. And even if such a comparison were possible, the conclusions one could draw would be undermined by the fact that researchers self-select into whether they pre-register their analyses, and the researchers who file PAPs are quite likely different from those who do not. Moreover, even researchers who regularly file PAPs do not register them for all of their studies, so the lack of randomness in who pre-registers a PAP is compounded by within-researcher selection across projects.Footnote 5
We therefore adopt a different approach. Rather than attempt to test whether PAPs cause research to be more credible, we ask whether PAPs are written and employed in a way that makes such an improvement in research credibility possible. To do this, we draw a representative sample of PAPs and analyze their contents to determine whether they are sufficiently clear, precise, and comprehensive as to meaningfully limit the scope for fishing and post-hoc hypothesis adjustment. We also assess whether PAPs do, in fact, tie researchers’ hands by comparing a subset of the PAPs we examine to the publicly available papers that report the findings of the investigations they pre-specified. These are, of course, subjective evaluations. But we have endeavored, in our coding rules and procedures, to be both transparent and objective in the judgments we make.Footnote 6 Our analysis provides an illuminating assessment of whether PAPs, as they are actually written and used, are able to accomplish the main objectives that have motivated their widespread promotion and adoption.Footnote 7 Our findings suggest that, in many cases, they are not.
The importance of such an assessment is rooted in the significant costs associated with writing and following a PAP (Olken 2015; Coffman and Niederle 2015; van’t Veer and Giner-Sorolla 2016; Duflo et al. 2020). The modal researcher in our 2018 potential PAP users’ survey (discussed later) reports spending two to four weeks preparing her pre-registration materials, and more than one-quarter of researchers report spending more than a month. Beyond the time they take to write, the hand-tying that PAPs entail is claimed to limit the scope for breakthroughs that come from unexpected findings, restrict flexibility to adapt to changing circumstances or new opportunities, and generate boring, mechanical papers that are disfavored by reviewers and journal editors. PAPs are also said to force researchers to undertake analyses that they know to be inappropriate or sub-optimal once they have encountered their data. In addition, critics point out that whatever the benefits of pre-registration may be in theory, PAPs are unlikely to enhance research credibility without vigorous policing, something the disciplines provide little guidance for undertaking and generally do not reward (Laitin 2013; Laitin and Reich 2017). Still others argue that publicly posting the details of one’s proposed analyses creates a risk of getting “scooped.” This is especially a concern for junior scholars and other researchers who may lack the resources to quickly implement promising research designs.
While there are good responses to many of these objections (several of which we discuss later), they nonetheless underscore the importance of assessing how much weight should be put on the positive side of the pre-registration ledger. Doing so requires undertaking the stocktaking exercise we present here. We have summarized our findings to be accessible to members of the discipline who have heard about PAPs but are not familiar with the rationale behind them or the debates surrounding their usage. In this respect, the paper serves as both an introduction to this important and relatively new research practice and an empirical evaluation of whether it is achieving its desired ends. Because this stocktaking covers only the first six years of PAP adoption, it provides clearer answers about the ability of the first generation of PAPs to reduce the scope for fishing and HARKing than about the clarity, precision, or completeness of PAPs registered in the last year or two. However, the discussion of the costs and benefits of pre-registration, along with the discussion of the complementary norms and institutions that might encourage and reinforce the positive impacts of registering a PAP, remains highly relevant today.
Are PAPs Achieving Their Objectives?
Empirical Approach
To evaluate whether PAPs are written sufficiently clearly and comprehensively to achieve their intended objectives, we drew a representative sample of PAPs from the universe of studies registered on the EGAP and AEA registries between their initiation and 2016.Footnote 8 Because we were interested not just in the PAPs’ contents but also in how those contents shaped the reporting of the research that was undertaken, we drew our sample so that roughly half of the PAPs would be from studies that had resulted in publicly available journal articles or working papers. Our procedures, which we describe in detail in the Appendix, yielded a sample of 195 PAPs, 93 of which had resulted in publicly available papers.
We coded all 195 PAPs according to a common rubric that recorded details of the pre-specified hypotheses; the dependent and independent variables that would be used in the analysis; the sampling strategy, inclusion and exclusion rules; and the statistical models to be run, among other features. For the sub-sample of ninety-three PAPs that had resulted in publicly available papers, we added further questions that addressed how faithfully the study authors adhered to the pre-specified details of the analysis in the resulting paper. The complete coding rubric for PAPs with papers is provided in online appendix C. All PAPs were coded by at least two different people (a research assistant and one of this paper’s authors), and any discrepancies between them were investigated and recoded.
Although much of the information collected in the coding rubric was straightforward and unambiguous (for example, whether the PAP was registered prior to data collection or whether it included a power analysis, committed to a multiple testing adjustment, or was ever private/gated), a number of the key coding items involved subjective judgments. Chief among these was whether the main research hypotheses and the key causal and outcome variables were specified sufficiently clearly to prevent post-hoc adjustments. For the latter, our coding rules asked the coder to consider, following Olken (2015), whether “if you gave the PAP to two different programmers and asked each to prepare the data for the primary dependent/independent variable(s), they [would] both be able to do so without asking any questions, and they [would] both be able to get the same answer.” As for the clarity of the research hypotheses, we defined a “clear hypothesis” as one that describes a relationship between an independent and dependent variable in which the direction of the effect is specified.
In the discussion that follows, we occasionally draw on examples from the PAPs we analyzed to illustrate our points. When we do so, we change the details to protect the anonymity of the PAP authors. This is in keeping with our goal of identifying broad patterns in how PAPs are written and used, not singling out individual authors for particularly weak (or strong) practices.
We supplemented our coding of PAPs with an anonymous survey of potential PAP users to elicit their experiences with writing and using PAPs in their research. We were especially interested in collecting information about investigators’ decisions surrounding whether or not to pre-register a study, and how the practice of composing and registering a PAP had changed the ways in which they went about their work, as well as how the rise of pre-registration affected their professional behavior more generally. The survey was conducted in 2018, so it captures a set of attitudes and behaviors closer to the present day than the patterns reflected in the PAPs we coded. The survey (reproduced in full in online appendix D) was sent to all affiliated researchers in the EGAP and Innovations for Poverty Action (IPA) research networks (N=664). We received 155 responses, of which 81% reported having registered a PAP for at least one project and 60% reported having registered multiple PAPs.Footnote 9
Before turning to our findings, it will be useful to say something about the sample of PAPs on which our stocktaking is based. The overwhelming majority of the 195 PAPs we coded were from field (63%), survey (27%), or lab (4%) experiments; observational studies comprised just 4% of our sample. Eighty-one percent of PAPs were registered on the EGAP or AEA websites prior to data collection, and another 19% were registered after data collection but before the researchers had access to their data or began their analysis.Footnote 10 Among the PAPs with papers, 66% were working papers and 33% were journal articles. In keeping with their share in the population of PAPs registered on the EGAP and AEA registries during the period we studied, and reflecting the rapid uptake of pre-registration during this time frame, 45% of the PAPs we coded were registered in 2016, the final year of our analysis. This imbalance (somewhat) allays concerns that the findings we present come from the very early period of PAP usage, when researchers were still just learning how to use PAPs as tools in their research. However, it is impossible to rule out that the patterns we find are different from those we would have discovered had we focused on the present day rather than the first six years of pre-registration.Footnote 11
Do PAPs Reduce the Scope for Fishing?
Fishing is made possible by imprecise variable definitions and by lack of clarity about the statistical models that will be run, the covariates that will be included, and the rules that will be applied for excluding cases, among other details of the analysis that will be undertaken. The failure to clearly specify these aspects of the research design in advance provides scope for researchers to run their analyses multiple ways and then present as their “test” of the hypothesis in question the specification that happens to generate the most appealing results.Footnote 12 This can happen either nefariously (by researchers searching for findings that they think are more likely to be published or bring them renown) or inadvertently via post-hoc rationalization (“Of course this was the right specification to run! Silly of me not to have seen this at the outset!”), a skill at which human beings are dangerously accomplished (Nosek et al. 2018). Whatever the source, fishing undermines the credibility of the research findings by ignoring or downplaying null/disconfirming results that, if reported, might provide a more accurate reflection of the true relationships in the area of study.
One of the key features we coded in our sample of PAPs was whether the primary dependent and independent/treatment variables were operationalized sufficiently clearly as to prevent post-hoc adjustments. Examples of lack of clarity include defining outcomes of interest in overly broad and unspecific terms (for example, “political participation,” “democratic consolidation,” or “educational attainment”) without specifying how these concepts are to be measured. Promising to “create an index” or do a “content analysis of programming” without specifying exactly how the index is to be constructed or the content analysis is to be undertaken is another illustration. None of these examples would pass the Olken test described earlier. These violations are relatively rare, however. In our sample of PAPs, 77% of primary dependent variables and 93% of independent/treatment variables were judged to have been clearly specified.Footnote 13
PAP authors were not as good, however, at clearly specifying their control variables. Many PAPs indicated the researchers’ intentions to “include baseline controls to improve precision” or to control for vaguely defined covariates such as “wealth,” “demographic characteristics,” “employment status,” or “cognitive ability.” While these variables may well be relevant to include, describing them in the PAP in such broad and non-specific terms leaves wide scope for fishing at the data analysis stage. Even when attempts are made to clarify how such variables are to be measured, the clarifications themselves are sometimes also problematic. For example, defining “wealth” as an index based on characteristics such as the condition of a respondent’s dwelling, asset ownership, or the number of days household members go without food still leaves broad latitude for subjectivity (which dwelling conditions? which specific assets? what if there is enough food for some family members but not others?) and fails the Olken test.
Lack of clarity in variable definition is not the only issue. In 44% of PAPs, the number of pre-specified control variables was judged to be unclear, making it nearly impossible to compare what was pre-registered with what is ultimately presented in the resulting paper. The flexibility stemming from such imprecision provides wide scope for generating results that might not otherwise have reached traditional levels of statistical significance.Footnote 14
Further scope for fishing comes from imprecision in the empirical models that are pre-specified.Footnote 15 Insofar as researchers can generate different results if they run their analyses using ordinary least squares, weighted least squares, multinomial logit, or other approaches, and with or without particular adjustments for calculating standard errors, it is critical to commit in advance to a particular statistical model. Sixty-eight percent of PAPs were judged to have spelled out the precise statistical model to be tested; 37% specified how they would estimate their standard errors. In 19% of cases, the models presented in the resulting papers deviated from the models specified in the PAP—for example, two-stage least squares was run when ordinary least squares was pre-specified; controls were added or omitted; covariate adjustment was specified in the PAP but not undertaken in the paper. Such deviations are not a problem if they are noted and a rationale is provided for the divergence from what was pre-registered. However, in the fourteen instances in our sample where deviations occurred, the change was noted in only one case.
Additional latitude for specification searching comes from lack of clarity about the rules that researchers will apply to include or exclude units from their analyses and, in experimental work, to deal with unanticipated imbalances across treatment and control groups. Such rules are important because unforeseen implementation challenges—attrition, noncompliance, project delays, problems with randomization—often force researchers to make fixes at the analysis stage that can bias the results, intentionally or unintentionally, toward a particular conclusion. Twenty-five percent of PAPs specified how they would deal with missing values or attrition; 13% specified how they would deal with noncompliance; 8% specified how they would deal with outliers; and 20% specified how they would deal with covariate imbalances. It would appear that study authors are less careful about pre-specifying what they will do if their implementation does not go according to plan than they are about pre-specifying other details of their proposed analysis. While all of the studies for which rules about missingness, non-compliance, and outliers were pre-specified followed them in the resulting papers, the fact that so many PAPs were silent on these issues underscores the incompleteness of most PAPs—and the opportunities that such omissions provide for researchers to tweak their analyses in ways that generate particular results.
The practical difficulties of pre-specifying responses to every possible implementation problem that might arise are severe. As Duflo et al. (2020) underscore, “trying to write a detailed PAP that covers all contingencies, especially the ones that are ex ante unlikely, becomes an extraordinarily costly enterprise.” One response to this problem is the adoption of standard operating procedures (SOPs), a set of default practices adopted by a lab or research group to which study authors can commit in advance to guide decisions that are not addressed specifically in the PAP (Lin and Green 2016). However, notwithstanding the utility (and time savings) that might come from committing to SOPs, just 3% of the PAPs in our sample indicated that they would rely on SOPs to deal with unanticipated deviations from their pre-registered designs.
Do PAPs Reduce the Scope for Post-Hoc Hypothesis Adjustment?
The clearest strategy for eliminating the scope for post-hoc hypothesis adjustment is to specify the research hypotheses in a way that leaves no ambiguity about the propositions that the analysis will test. In this respect, PAP authors in the sample we studied did quite well. Ninety percent of the PAPs we coded were judged to have specified clear hypotheses.
However, even clearly specified hypotheses can leave scope for HARKing if authors pre-specify so many hypotheses that they can pick and choose which ones to report after they have seen their results. In this respect, PAP authors fared less favorably. While 34% of PAPs specified between one and five hypotheses—a number sufficiently small as to limit the leeway for selective presentation of results downstream—18% specified between six and ten hypotheses; 18% specified between 11 and 20 hypotheses; 21% specified between 21 and 50 hypotheses; and 8% specified more than 50 hypotheses (see figure 2, panel A). PAPs that pre-specify so many hypotheses raise questions about the value of pre-registration.Footnote 16
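To see why the number of pre-specified hypotheses matters, consider a rough back-of-the-envelope calculation. The sketch below computes the probability of obtaining at least one false positive when k true null hypotheses are each tested at the 5% level. It assumes, purely for illustration, that the tests are independent; the function name and the values of k are ours and are not drawn from the coded PAPs. Tests run on the same data are typically correlated, so the figures should be read as an upper-end illustration rather than an estimate for any actual study.

```python
def familywise_error_rate(num_tests: int, alpha: float = 0.05) -> float:
    """Chance of at least one false positive across independent tests of true nulls.

    Assumes independence across tests (an illustrative simplification).
    """
    if num_tests < 0:
        raise ValueError("num_tests must be non-negative")
    # Each test avoids a false positive with probability (1 - alpha);
    # all of them do so with probability (1 - alpha) ** num_tests.
    return 1.0 - (1.0 - alpha) ** num_tests


# Illustrative hypothesis counts spanning the ranges discussed in the text.
for k in (1, 5, 10, 20, 50):
    print(f"{k:>3} hypotheses: {familywise_error_rate(k):.3f}")
```

Under these assumptions, the chance of at least one spurious "finding" is about 23% with five hypotheses and exceeds 90% with fifty, which is why a long list of pre-specified hypotheses offers little protection unless all of them are reported.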
One safeguard against this pitfall is to distinguish between primary and secondary hypotheses. Many PAPs adopt this protection: among authors who pre-specified more than five hypotheses, 60% make such a distinction. But they often do so in ways that do little to solve the underlying problem. As shown in panel B of figure 2, 42% of PAPs that distinguished between primary and secondary hypotheses limited the number of primary hypotheses they specified to five or fewer. Twenty-six percent pre-specified six to ten primary hypotheses; 12% pre-specified eleven to twenty; 17% pre-specified twenty-one to fifty; and 3% pre-specified more than fifty. From the standpoint of reducing the scope for selective presentation of research findings, distinguishing between primary and secondary hypotheses is only useful if the number of primary hypotheses is kept small.
Another safeguard is to pre-commit to a multiple testing adjustment. Multiple testing adjustments down-weight the statistical significance of any single result based on the number of hypotheses that are being tested, thus guarding against the cherry-picking of results in instances where there are many possible findings to choose from and the chances of generating a false positive are high. Among the PAPs in our sample that pre-specified more than five hypotheses, 29% pre-committed to a multiple testing adjustment.
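As a concrete illustration of what such an adjustment does, the sketch below applies two standard procedures, the Bonferroni correction and the Holm step-down correction, to a set of p-values. The p-values are hypothetical and chosen only to show the mechanics; we do not claim that these are the procedures the PAPs in our sample committed to, and other adjustments (for example, those that control the false discovery rate) are also in common use.

```python
def bonferroni(p_values, alpha=0.05):
    """Reject a hypothesis only if its p-value is at most alpha / m."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


def holm(p_values, alpha=0.05):
    """Holm step-down procedure: compare the smallest p-value to alpha / m,
    the next to alpha / (m - 1), and so on, stopping at the first failure."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, idx in enumerate(order):
        if p_values[idx] <= alpha / (m - rank):
            reject[idx] = True
        else:
            break  # all larger p-values are retained as well
    return reject


# Hypothetical p-values for six pre-specified hypotheses.
example_p_values = [0.001, 0.009, 0.03, 0.04, 0.2, 0.6]

unadjusted = [p <= 0.05 for p in example_p_values]
print("unadjusted:", sum(unadjusted), "rejections")
print("bonferroni:", sum(bonferroni(example_p_values)), "rejections")
print("holm:      ", sum(holm(example_p_values)), "rejections")
```

In this hypothetical example, four of the six results clear the conventional 5% threshold without adjustment, but only one survives the Bonferroni correction and two survive the Holm correction. Committing to the procedure in the PAP removes the discretion to choose, after the fact, whichever adjustment (or none) yields the most favorable set of results.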
Taken together, these practices leave significant leeway for authors to omit results that are null or that complicate the story they wish to tell. But do authors take advantage of this latitude in practice? To find out, we examined the sub-sample of ninety-three PAPs that had publicly available papers and compared the primary hypotheses pre-specified in the PAP with the hypotheses discussed in the paper or its appendices.Footnote 17 We find that study authors faithfully presented the results of all their pre-registered primary hypotheses in their paper or its appendices in 61% of cases. More than one-third of studies had at least one pre-registered hypothesis that was never reported. Taking primary and secondary hypotheses together, the median paper in our sample neglected to report 25% of the hypotheses that had been pre-specified in the PAP. To be sure, constraints on journal space, the desire to package a study’s results in a more readable form, and sometimes the requests of editors or reviewers, rather than unscrupulous research practice, likely account for many of the omitted hypotheses.Footnote 18 But the frequency of the mismatch between what is pre-registered and what is presented undermines research credibility.
Apart from pre-registering hypotheses that are not reported in the paper, authors may also deviate from the PAP, sometimes in response to requests by reviewers, by reporting the results of hypotheses that were not pre-registered at all. We found that 18% of the papers in our sample presented tests of novel hypotheses that were not pre-registered.Footnote 19 Such deviations need not be a problem for research credibility if authors are transparent about the fact that the hypotheses were generated after the PAP was filed. But authors who presented results based on hypotheses that were not pre-registered failed to mention this in 82% of cases.
Other Issues
Addressing the “file drawer problem.” Beyond reducing the scope for fishing and post-hoc hypothesis adjustment, PAPs can help address the “file drawer problem” (Rosenthal 1979).Footnote 20 The file drawer problem refers to the bias in the published literature on a given topic resulting from the tendency for authors not to submit, reviewers not to support, or journals not to publish results that fail to reach conventional thresholds of statistical significance.Footnote 21 Although the root of the file drawer problem lies in disciplinary norms that disfavor null results, pre-registration and PAPs can aid in addressing the dilemma.
Absent pre-registration, consumers of research only have access to the subset of studies that have been published or made publicly available as working papers. Although studies commonly fail to result in publications or working papers for reasons that are uncorrelated with the outcomes that they generated, much evidence suggests that some fail to enter the public realm because they generate null results (Gerber and Malhotra 2008; Franco, Malhotra, and Simonovits 2014; Andrews and Kasy 2019). With pre-registration, consumers of research gain access to a record of studies that were initiated but never made public, enabling them to make an educated inference about how likely it is that the findings in the public domain are representative of the underlying distribution of results that have been generated. If social science registries contain dozens of pre-registered studies on a given topic but the literature contains only a handful of publications, then researchers would be right to be skeptical of the published findings.
Whether pre-registration aids in addressing this problem, however, depends on whether researchers actually consult registries to learn whether investigations on a given topic have been undertaken. We asked researchers about this in our potential PAP users’ survey, and 38% reported that they had ever consulted a registry for this purpose.Footnote 22 Like a tree falling in a forest with nobody nearby to hear it, PAPs—and pre-registration more generally—will do little to reduce the file drawer problem if researchers do not take advantage of the public record that pre-registration provides about what has been done.
Several journals in political science and economics have responded to the file drawer problem by experimenting with “registered reports” in which authors submit PAPs in lieu of finished research papers. Editors and reviewers then evaluate these submissions based on the importance of the questions that motivate the research and the quality of the proposed designs, with strong submissions accepted in principle on the condition that the data is collected and analyzed as proposed.Footnote 23 Registered reports enhance the probability of publishing null results on questions of theoretical importance and align the incentives of paper authors and reviewers to present the very best articulation of the theory and the most appropriate empirical tests.
One such experiment in political science, a 2016 special issue of Comparative Political Studies, generated mixed reviews. Study authors generally liked the results-free submission and review process (Bush et al. 2018), but the journal editors concluded that the costs outweighed the potential benefits and indicated that they would not be moving toward a registered reports model for the journal writ large (Ansell and Samuels 2016; Findley et al. 2016). Another experiment, at the Journal of Development Economics, appears to have been more positive, although the pilot’s organizers identified a number of challenges, including the difficulty in judging submissions without seeing the final research findings, the up-front costs of composing guidelines for authors and reviewers, and the considerable effort required to guide authors and reviewers through a novel process that was demanding and “out of their comfort zone” (Foster et al. 2019).
Protecting against research partners with rival interests. Another leading rationale for PAPs is that they can help protect researchers against partners with rival interests. Donors and governments often fund the research activities for which PAPs are written. Like pharmaceutical companies that underwrite research in the medical sciences, these actors may have interests in having the research generate particular conclusions. By providing an opportunity to discuss and agree in advance on both the results that will be reported and the specifications that will be employed to generate them, PAPs can help protect against pressure from such partners to favor particular empirical approaches or findings once the data analysis has begun and the results are becoming clear. Although most researchers in our potential PAP users’ survey indicated that they had not yet used a PAP to protect themselves against a research partner with rival interests, several indicated that they had, and others indicated that they imagined that a PAP could be useful for this purpose.
Objections to PAPs
In addition to allowing us to evaluate whether PAPs are delivering on their promise, our data also puts us in a position to address some of the objections to PAPs that have been raised in the literature.Footnote 24
Too Time Consuming
Foremost among the objections to PAPs is that they are too time-consuming to prepare. Eighty-eight percent of researchers in our potential PAP users’ survey reported devoting a week or more to writing the PAP for a typical project, with 32% reporting spending an average of two to four weeks and 26% reporting spending more than a month. It is perhaps not surprising, then, that 34% of researchers said that writing a PAP delayed their project’s implementation. In some situations—for example, when there is a limited window of opportunity to initiate an experiment before an election takes place or a new policy comes into force—such delays can make it impossible to undertake the project at all, and the opportunity can be lost. The time cost of registering and adhering to a PAP may also exclude researchers from less well-resourced institutions who do not have the time, resources, or training to carry out research in the ways that PAPs require.
However, while the potential PAP users we surveyed nearly all agreed that writing a PAP was costly in terms of time, 64% agreed with the statement that “it takes a considerable amount of time, but it is worth it.”Footnote 25 An overwhelming majority (eight in ten) said that drafting a PAP caused them to discover things about their project that led to refinements in their research protocols or data analysis plans. Sixty-five percent said that it put them in a position to receive useful feedback on their project design that they otherwise would not have received. And 52% said that they experienced downstream time savings from having written a PAP, with 64% (so, 33% overall) indicating that these savings were equal to or greater than the time spent to draft the PAP in the first place. PAPs thus appear to shift the timing of work on research projects from the back end, when the analysis is done, the results are written up, and most of the careful thinking about the project has traditionally taken place, to the front end. But, for at least some researchers, it is not clear that, on net, PAPs generate significantly more work. To the extent that they do, this cost must be weighed against the benefits to research credibility that result from a study whose analyses and hypotheses were pre-registered.
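The "33% overall" figure in the preceding paragraph is the product of two survey shares: the share of respondents reporting any downstream time savings and, among those, the share reporting savings at least as large as the time spent drafting the PAP. A minimal check of the arithmetic, using only the percentages reported above (the variable names are ours):

```python
# Shares taken from the survey results reported in the text.
share_reporting_savings = 0.52       # experienced downstream time savings
share_with_full_offset = 0.64        # of those, savings >= time spent drafting

# Share of all respondents whose savings matched or exceeded drafting time.
overall_share = share_reporting_savings * share_with_full_offset
print(f"{overall_share:.0%}")
```

Roughly one in three respondents, in other words, reported that the PAP paid for itself in time.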
Limit Flexibility and Scope for New Discoveries
Another major critique of PAPs is that they constrain flexibility to adapt to unanticipated circumstances and limit the scope for new discoveries that come from unrestricted explorations of one’s data.Footnote 26 One researcher in our potential PAP users’ survey faulted PAPs for forcing her/him to “think about the lowest risk research I can run with the least potential for surprising findings.” Another described PAPs as “stifling creativity” and worried that they “are being used as ammunition against careful researchers with integrity who genuinely want to learn from data.” Others worry more generally that a mode of inquiry focusing exclusively on the investigation of a narrow set of pre-specified relationships will remove opportunities for understanding relationships between variables, the sensitivity of different empirical tools to different types of data, and other investigations that provide seasoned researchers with the intuitions that set them apart from novices. These are important critiques, but they were outlier views in our users’ survey. Eleven percent of researchers said they thought that the existence of a PAP restricted their ability to fully explore and analyze their data “quite a bit,” whereas 43% reported feeling not at all constrained and 46% reported feeling somewhat constrained. Similarly, 15% said they thought that having registered a PAP prevented them “quite a bit” from stumbling on unexpected, surprise results, whereas 37% reported that the existence of a PAP had not at all prevented them from generating unanticipated findings and 48% reported being somewhat prevented.
One response to the hand-tying generated by pre-specification is to pre-commit to an iterative approach in which the results from one part of the study inform the analysis of subsequent parts in carefully pre-specified ways.Footnote 27 Such an approach can be particularly attractive in situations where prior information about the subject of study is limited, making it difficult for researchers to be confident that they are pre-specifying the full set of relevant or interesting hypotheses. While theoretically attractive, such iterative PAPs are tricky to implement in practice. For example, without a neutral gatekeeper, it can be challenging for researchers to document that iterations were truly pre-specified (Bidwell, Casey, and Glennerster 2020).
The more common approach, and the approach we advocate, is to freely undertake exploratory investigations that go beyond the PAP, clearly labeling the results of such investigations in the paper as coming from analyses that were not pre-specified, with an explanation provided for why they were added. Such an approach allows authors to investigate new hypotheses that occur to them after they have immersed themselves in the data, while offering high transparency about the research process that generated the results they report. It also allows researchers to avoid the selective attention trap highlighted by Yanai and Lercher (2020). Pursuing such a strategy faithfully, with findings clearly marked as pre-registered or exploratory and explanations provided for each deviation from the PAP (along with the mandatory reporting, in the body of the paper or the appendices, of every analysis that was pre-specified), might appear to come at the expense of the tight narrative that reviewers and journal editors are thought to favor. However, in an analysis of publication outcomes of experimental NBER working papers that do and do not include PAPs, Ofosu and Posner (2020) find that while papers with PAPs are, in fact, slightly less likely to be published, they are more likely to land in a top-five journal, conditional on being published.
Policing
By providing a record of the hypotheses a researcher intends to investigate and the analyses she commits herself to employ to test them, a PAP makes it possible for deviations from these pre-specified plans to be identified, but only if reviewers, editors, or consumers of the published work invest the considerable time and energy to track down the PAP and compare it (and, sometimes, its several iterations) side-by-side with the working paper or published article.Footnote 28 Laitin (2013) makes the point strongly: “registration without a community of scholars interested and incentivized to challenge findings is worthless.”
Is there any evidence that such policing actually happens? We asked the researchers in our potential PAP users’ survey whether, when they had submitted a paper with pre-registered analyses for review at a journal, reviewers had ever mentioned their PAP. Thirty-nine percent reported that reviewers had. This relatively low share may reflect the fact that only 28% of PAP users said that they had ever included their PAP when they submitted their paper to a journal (however, another 50% said that this was because the paper mentioned the PAP, and they assumed that reviewers could easily find it).Footnote 29 A similar share said that other researchers had invoked their PAP when discussing their paper outside of the formal review process (35%), or that they themselves had consulted the PAP of a paper they were reviewing (34%). While PAPs may make policing possible, the norms and practices among reviewers, journal editors, and seminar participants seem not to have yet evolved to generate the strong policing equilibrium that would be required for PAPs to play the hand-tying role that is often imagined.Footnote 30
Policing involves not just effort on the part of reviewers, seminar participants, and other consumers of research, but also cooperation from the producers of research themselves. The willingness of study authors to respond to queries about their work is an essential companion to pre-registration, especially when replication data, survey instruments, or code have not been made publicly available, or when PAPs remain private or gated.Footnote 31 It is therefore noteworthy that only 68% of the authors whose private/gated PAPs were randomly selected into our sample, and whom we contacted to request that they share their PAPs with us, even replied to our e-mail, and only 58% were willing to share their PAP.Footnote 32 Given the emerging norms in both economics and political science about the importance of adopting open science practices (Christensen et al. 2019), registering a PAP is taken as a signal of “type.” However, such signals become uninformative if researchers who embrace some open science practices (such as pre-registration) are unwilling to do the (admittedly hard) work of following through when other researchers request additional information.
There is a sentiment in some parts of the PAP users’ community that PAPs offer the worst of both worlds, in the sense that they tie researchers’ hands, preventing them from investigating interesting threads that emerge in their analysis, while still leaving them open to demands from reviewers for endless robustness tests. As one PAP user wrote: “I’ve gotten an absurd number of requests for sensitivity analyses for strictly pre-specified empirical work. The existing norm appears to keep me from looking for unexpected results while providing no protection from readers or reviewers who want to dig through the data trying to kill off empirical results they don’t agree with.” Another expressed frustration with the different expectations of different participants in the review process: “Some reviewers didn’t like when we distinguish between hypotheses that were included in the PAP and those that were not. But other reviewers thought we were trying to hide something when we presented all the results (PAP and non-PAP) together.” Although 46% of PAP users report having invoked their PAP to respond to the suggestions of reviewers or workshop participants regarding additional analyses to run, one lamented that pointing to the PAP does little good, since “referees and editors ignore them/refuse to be bound by them.” Again, the absence of common norms about what PAPs obligate both producers and consumers of research to do leaves pre-registration well short of achieving its goals.
Getting Scooped
We also asked researchers in our potential PAP users’ survey whether, in contemplating registering a PAP, they had any concern that others might scoop their ideas. Forty-six percent reported having no concern whatsoever, with another 39% saying they had slight concern. Eleven percent said that they were unconcerned because the PAP was gated or private. If we assume that preventing others from stealing their ideas was the only reason why these researchers gated their PAPs, then the total share of researchers expressing significant concern about getting scooped is below 15%.
The Balance Sheet
Our stocktaking suggests that PAPs registered during the first six years of the “pre-registration revolution” (Nosek et al. 2018) were often not written or used in a way that allowed them to do everything that their proponents had hoped. Many PAP authors were insufficiently clear about the hypotheses they were testing to prevent them from moving the goalposts once they had seen the patterns in their data. The details of the analyses that PAPs pre-specified (how outcome and causal variables were to be operationalized; which controls would be included; what the statistical model would look like; how imbalances, outliers, and attrition would be dealt with) were not always adequate to reduce researcher degrees of freedom in a meaningful way. In addition, papers that resulted from pre-registered analyses did not always follow what was pre-registered. Some papers introduced entirely novel hypotheses; others presented only a subset of the hypotheses that were pre-registered.
But documenting that not all PAPs adequately addressed all of the problems they were designed to solve does not imply that the growing use of PAPs in political science and economics during this period did not generate more credible research. Figure 3 reports the share of PAPs in our sample that meet what we take to be the four key requirements for a complete, well-specified PAP: specifying a clear hypothesis, specifying the primary dependent variable and the independent/treatment variable(s) each sufficiently clearly so as to prevent post-hoc adjustments, and spelling out the precise statistical model to be tested. Just over half of the 195 PAPs we analyzed were judged to meet all four of these criteria, and about another third were judged to satisfy three of the four.Footnote 33 Although this is hardly a perfect record, it seems reasonable to view our stocktaking as suggesting that the glass is half full rather than half empty, especially when one recognizes that the counterfactual condition would be a world with no PAPs at all. Even if the scope for fishing and HARKing was not foreclosed by every PAP, such opportunities were limited to at least some degree in most. Even imperfect PAPs increase the credibility of (at least some aspects of) the research studies for which they are written.
As PAP skeptics point out, however, these improvements to research credibility came at a price. Writing a PAP occupies weeks of valuable research time, and adhering faithfully to what was pre-specified may limit flexibility and creativity, reduce the scope for new discoveries, and result in research papers that more closely resemble lab reports than the sorts of exciting write-ups that reviewers and journal editors are thought to favor—or so critics claim. While the time costs of writing a PAP are real, the alleged constraints on flexibility, creativity, and exploration can be loosened by simply labeling one’s investigations as exploratory or confirmatory or by explaining the exigent circumstances that necessitated the departure from what was pre-specified. The concern that adherence to PAPs results in boring, rote papers can be addressed by a combination of better writing and a re-weighting of priorities toward scientific rigor over compelling narrative. Equally important, the data from our potential PAP users’ survey suggest that PAPs do not restrict researchers’ investigations or gum up the research process nearly as much as their detractors claim. On balance, researchers report that the benefits of writing a PAP outweigh the costs. For every researcher who describes PAPs as “an additional hassle” or “toxic to the process of doing research,” there is another who says that writing a PAP “makes me more thoughtful and deliberate” or “causes me to really think through design and analysis decisions that, honestly, were often done on the back end.” The cost of writing and adhering to a complete and comprehensive PAP may simply be the price researchers need to pay for making their research more credible.
The Importance of Complementary Norms and Institutions
Our stocktaking exercise was motivated by a desire to assess the extent to which PAPs, as they are actually written and used, generate meaningful improvements in research credibility. Our strategy for answering this question was to scrutinize whether PAPs were sufficiently clear, precise, and comprehensive to prevent fishing and HARKing. However, as we have hinted at several points in the discussion, the impact of PAPs on research credibility may depend less on the contents of the PAPs themselves than on the presence of a set of complementary norms and institutions that provide guidance on how PAPs should be used in the research process and that create incentives for researchers to invest the time and energy to produce and police them.
A first, crucial set of norms speaks to what, exactly, a complete PAP should contain and how PAPs should be adapted for observational studies, which comprise the majority of research projects undertaken in political science and economics (Burlig 2018; Jacobs 2020). Although several publications provide recommendations for what authors should (and need not) include in their PAPs (McKenzie 2012; Glennerster and Takavarasha 2013; Piñeiro and Rosenblatt 2016; Kern and Gleditsch 2017; Christensen, Freese, and Miguel 2019; Chen and Grady 2019; Duflo et al. 2020), there are no universally agreed-upon rules in either discipline for what a comprehensive PAP should look like. This lack of common standards may account for some of the deficiencies we identified in our coding exercise. Recent innovations such as DeclareDesign (Blair et al. 2019), which provides software that allows researchers to formally describe (and troubleshoot) the details of their proposed analyses, provide clear templates that may help remedy this problem. But they are new and have yet to become widely adopted.
Alongside clarifying the standards for what PAPs should include, a major issue is the development of norms about how PAPs should be used by the research community. Laitin articulates the problem well when he writes that “all the pre-analysis plans … we produce do not serve science if no one has a career interest in deciphering them or confirming the results that followed from them. We have increased the supply of transparency but have given insufficient attention to generating a demand for it” (Laitin 2018). Scrutinizing PAPs and comparing their contents to what is reported in the resulting publications and working papers is tedious work, but it is necessary for the credibility-enhancing benefits of PAPs to be fully realized. Creating disciplinary incentives for such policing is a critical challenge.
The most logical venue for such scrutiny is the journal review process.Footnote 34 But here, too, the disciplines lack clear norms. Should researchers be required to submit their PAPs along with their papers? Should reviewers be expected to go through the PAP and certify that the analyses presented in the paper match those that were pre-specified? What should reviewers or editors do if, as we found in many of the PAPs we analyzed, the pre-specification of hypotheses or procedures is too unclear or incomplete to remove the scope for fishing or HARKing? Or what if, as in Bidwell, Casey, and Glennerster (2020), the PAP was periodically updated during the course of the project, making the task of identifying deviations maddeningly complex? Is it fair for reviewers to ask authors of papers with PAPs to present multiple robustness tests as a condition for acceptance? These and other questions will need to be debated and answered in order to better harness the formal review process to more fully leverage the transparency that PAPs offer.
While the enhanced research credibility generated through pre-registration accrues to the pre-registered studies themselves, some of the benefits of pre-registration depend on the adoption of the practice by the discipline as a whole. For example, the role that pre-registration plays in addressing the file-drawer problem depends on researchers becoming habituated to consulting study registries for clues about the true distribution of findings in a given area. But such consultations will only be informative if the registries are complete and comprehensive. Bolstering the usefulness of registries as repositories of what has been done will thus require bolstering norms about the necessity of pre-registration.
Convincing researchers who do not currently pre-register their projects to begin doing so (much less convincing them to begin composing and filing formal PAPs) is no easy task, however, especially if standards for the precision and comprehensiveness of PAPs are tightened in the ways we are suggesting they need to be.Footnote 35 The recently completed State of Social Science Survey (Christensen et al. 2019) finds that while the majority of researchers in political science and economics are aware of and support the norm of pre-registration, behavior in adopting the practice is significantly lagging. One key obstacle, revealed both in our data and in the evidence summarized in Christensen et al. (2019), is the hesitancy of authors of observational studies to register PAPs. In part, this reluctance stems from the fact that observational data is often available to researchers prior to initiating their projects, which makes it difficult or impossible for them to demonstrate that they composed their PAPs prior to looking at the data. Institutions for embargoing data or involving independent third-party actors, along the lines suggested in Bidwell, Casey, and Glennerster (2020) and Fafchamps and Labonne (2017), might increase the perceived value of PAPs among researchers using historical or administrative data and lead to their adoption by a broader set of scholars.Footnote 36
Another strategy for increasing the value of PAPs is to invest in institutions and norms that allow the researchers who write them to receive helpful feedback on their study designs. Groups such as EGAP, the Working Group in African Political Economy, and the Northeast Workshop in Empirical Political Science regularly reserve slots at their meetings for the discussion of PAPs, alongside completed working papers. Such discussions provide opportunities for receiving comments and suggestions at a key early stage in a project’s development. The promotion of norms, including within professional associations like APSA and AEA, that make seminar presentations of PAPs as acceptable as presentations of finished papers would lead to the proliferation of such opportunities. This, in turn, would provide tangible benefits to PAP authors that help to offset the cost of composing the PAP, and thus increase willingness to make such investments in the first place.
Although their use has risen steeply in recent years, PAPs are still in their relative infancy. Our analysis, which covers PAPs registered between 2011 and 2016, captures the early years of PAP usage. This was a time when many authors were registering their first PAPs, and when norms about both what authors should include in their PAPs and how they should deal with deviations from what they pre-registered were still emerging. Although nearly half of our sample comes from 2016, the final year in this period, we think it is likely that PAPs registered today may be, on average, more precise and complete than those whose contents we analyzed—and that the contribution of PAPs to research credibility today may be even greater than what is suggested by our stocktaking. The further development of norms and complementary institutions that can both augment the power of PAPs to improve research credibility and create incentives for researchers to invest the time and energy to produce and police them will only reinforce these positive trends.
Supplementary Materials
A. E-mail to Authors of Private/Gated PAPs
B. Summary Statistics
C. Coding Rubric for PAPs with Papers
D. Potential PAP Users’ Survey
To view supplementary material for this article, please visit http://doi.org/10.1017/S1537592721000931.
Acknowledgements
The authors thank Maxim Ananyev and Merabi Chkhenkeli for excellent research assistance, and Graeme Blair, Ashley Blum, Alex Coppock, Steven Glazerman, Donald Green, Macartan Humphreys, Matt Lisiecki, and David McKenzie, as well as seminar participants at EGAP; Oxford; and University of California, Berkeley; and four anonymous reviewers for valuable comments. The authors gratefully acknowledge funding from the Social Science Meta-Analysis and Research Transparency (SSMART) program of the Berkeley Initiative for Transparency in the Social Sciences (BITSS). The survey of researchers who register pre-analysis plans, which provides some of the data reported in the paper, was determined to be exempt from IRB review (UCLA IRB # 19-000063). We registered our study at the Open Science Framework (OSF) registry: https://osf.io/xrtqm/.
Appendix
A total of 1,671 studies were registered on the EGAP (436) and AEA (1,235) registries during the period we studied (see figure A1).Footnote 1 Of these, 591 had PAPs (322 of the EGAP-registered studies and 269 of the AEA-registered studies).Footnote 2 We then identified whether each study had resulted in a publicly available paper. To do this, we conducted web searches of each study author’s web page, as well as key-word searches based on the project’s title and abstract. Of the 591 studies with PAPs, we found 235 that had resulted in a publicly available paper by the time of our search.
We then drew a random sample of one hundred of these studies, alongside a random sample of one hundred studies that had not yet resulted in a publicly available paper. In drawing these samples, we stratified by three criteria: the year the study was registered (2011, 2012, 2013, 2014, 2015, 2016), whether the study was registered on the EGAP or AEA registries, and whether the PAP was initially gated/private.
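The stratified draw described above can be sketched in a few lines of Python. This is a minimal illustration, not the procedure actually used: the appendix does not state how draws were allocated across strata, so proportional allocation with largest-remainder rounding is an assumption, and the field names (`year`, `registry`, `gated`) and the toy population are hypothetical.

```python
import random
from collections import defaultdict


def stratified_sample(studies, n, seed=0):
    """Draw n studies, allocating draws to strata in proportion to stratum size."""
    if n > len(studies):
        raise ValueError("sample size exceeds population size")
    rng = random.Random(seed)

    # Group studies by the three stratification criteria named in the text.
    strata = defaultdict(list)
    for s in studies:
        strata[(s["year"], s["registry"], s["gated"])].append(s)

    # Proportional quota per stratum (assumed allocation rule); round down first.
    total = len(studies)
    quotas = {k: n * len(v) / total for k, v in strata.items()}
    alloc = {k: int(q) for k, q in quotas.items()}

    # Largest-remainder rounding so the allocations sum to exactly n.
    leftover = n - sum(alloc.values())
    by_remainder = sorted(quotas, key=lambda k: quotas[k] - alloc[k], reverse=True)
    for k in by_remainder[:leftover]:
        alloc[k] += 1

    # Simple random sample without replacement within each stratum.
    sample = []
    for k, members in strata.items():
        sample.extend(rng.sample(members, alloc[k]))
    return sample


# Toy population standing in for the 235 studies with papers (attributes are made up).
toy_rng = random.Random(1)
population = [
    {
        "id": i,
        "year": toy_rng.choice(range(2011, 2017)),
        "registry": toy_rng.choice(["EGAP", "AEA"]),
        "gated": toy_rng.choice([True, False]),
    }
    for i in range(235)
]
sample = stratified_sample(population, n=100, seed=42)
print(len(sample))  # 100
```

Fixing the seed makes the draw reproducible, which is useful when a sample later has to be audited against the registry.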
The fact that not all PAPs are made public at the time a study is registered created a challenge for our coding exercise. The AEA registry affords study authors the opportunity to keep their PAPs private and the EGAP registry, while strongly discouraging researchers from doing so, permits study authors to gate their PAP for a period of time.Footnote 3 As shown in figure A2, of the 591 studies with PAPs, 304 were initially private/gated, although 101 of those had become public/ungated by the time we drew our sample in March 2018.
To reach our goal of coding one hundred PAPs from projects that had resulted in publicly available papers and another set of one hundred that had not, and anticipating that some authors of private/gated PAPs might be unresponsive to our request that they share their PAPs with us, we oversampled 30% of private/gated plans in each category. The oversample contained 265 PAPs (132 with papers and 133 without), of which 123 were still private/gated as of March 2018. We contacted the authors of these private/gated PAPs via e-mail to ask them to confidentially share their PAPs with us.Footnote 4 Of the 110 authors who we can confirm received and read our e-mail, we received replies from 75 (68%), of whom 64 (58%) were willing to share their PAP.Footnote 5
Our procedures yielded a sample of 204 PAPs, equally distributed between those with and without publicly available papers. In nine instances, working papers that had been found on authors’ websites at the time we drew our sample were no longer publicly available by the time we began our coding. We therefore coded 93 PAPs with papers, bringing our final sample of coded PAPs to 195. Summary statistics are provided in online appendix B.
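The counts reported in this appendix can be cross-checked with simple arithmetic. The sketch below uses only figures stated in the text; the variable names are ours, and treating the nine withdrawn papers as dropped from the with-papers group follows our reading of the paragraph above.

```python
# Registry counts reported in the appendix.
registered = {"EGAP": 436, "AEA": 1235}
with_pap = {"EGAP": 322, "AEA": 269}

total_registered = sum(registered.values())  # all registered studies
total_with_pap = sum(with_pap.values())      # studies that filed a PAP

# Sample accounting.
initial_sample = 204
per_group = initial_sample // 2              # "equally distributed" across the two groups
papers_withdrawn = 9                         # papers no longer public at coding time
coded_with_papers = per_group - papers_withdrawn
coded_total = coded_with_papers + per_group

print(total_registered, total_with_pap, coded_with_papers, coded_total)
# 1671 591 93 195
```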