
I Saw You in the Crowd: Credibility, Reproducibility, and Meta-Utility

Published online by Cambridge University Press: 07 January 2021

Nate Breznau*
Affiliation: University of Bremen

Type: Opening Political Science
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted re-use, distribution, and reproduction in any medium, provided the original work is properly cited.
Copyright
© The Author(s), 2020. Published by Cambridge University Press on behalf of the American Political Science Association

Crowdsourcing is a new word. Google Books’ first recorded usage of “crowdsourcing” was in 1999.[1] It originally referenced Internet users adding content to websites, the largest example being Wikipedia (Zhao and Zhu 2014). As a scientific practice, its roots go back to open-engineering competitions in the 1700s, when the collective ideas of many overcame otherwise insurmountable problems of the few. Openly tapping into a large population breaks down barriers of time, money, and data. The solution is simple. A centralized investigator or team poses a question or problem to the crowd. Sometimes tasks are co-creative, such as the programming of packages by statistical software users and the development of Wiki content. Other times, they are discrete, such as asking individuals to donate computing power,[2] respond to a survey, or perform specific tasks. Using crowdsourced methods, scientists, citizens, entrepreneurs, and even governments more effectively address societies’ most pressing problems (e.g., cancer and global warming) (Howe 2008).

Although the public comprises the typical crowd, new landmark studies involved researchers crowdsourcing other researchers. The human genome project, for example, involved major byline contributions from almost 3,000 researchers. In the social sciences, crowdsourcing of researchers is brand new. In a study initiated in 2013, Silberzahn et al. (2018) convened researchers from across disciplines and geographic locations to analyze the same dataset to discover whether football referees issued more red cards to darker-skinned players.[3] Klein et al. (2014) sourced laboratories across the world to try to reproduce several high-profile experimental psychological studies. In the Crowdsourced Replication Initiative, Breznau, Rinke, and Wuttke et al. (2019) demonstrated that crowdsourcing also is useful for replication, structured online deliberation, and macro-comparative research. Studies like these constitute a new paradigm within social, cognitive, and behavioral research.

In the knowledge business, crowdsourcing is highly cost effective. The average researcher lacks the means to survey or conduct experiments with samples outside of universities. Crowdsourcing platforms such as Amazon’s Mechanical Turk (mTurk) changed this by creating access to even-more-convenient convenience samples from all over the world (Palmer and Strickland 2016). Others use these platforms to crowdsource globally competitive labor to perform paid tasks of scientific value (Berinsky, Huber, and Lenz 2012). Blockchain technology uses decentralized crowd computing. Many actions that crowds already take (e.g., “Tweeting” and “geotagging” images) offer free possibilities to test hypotheses of social and political relevance (Nagler and Tucker 2015; Salesses, Schechtner, and Hidalgo 2013). Wikipedia had fantastic success, given the strong willingness of publics around the world to add and monitor content. The Wiki model is the basis for all types of crowd creations.[4] There is Wikiversity[5] for crowdsourcing teaching resources and the ReplicationWiki[6] for listing and identifying replications across the social sciences (Höffler 2017). In a similar vein, there are efforts to crowdsource open peer review and to crowdsource ideas about what researchers should study from potential stakeholders outside of science.

Some scholars might question why many researchers are necessary when we have so many preexisting datasets and a single researcher can run every possible statistical model configuration. Crowdsourcing researchers’ time and effort may seem inefficient when machine learning is so advanced. However, if a single scholar or machine throws all possible variables into their models like a “kitchen sink,” it is only efficient at maximizing prediction. There is a subtle but crucial difference between human research, where if X predicts Y, it might be a meaningful association, and the machine approach, where if X predicts Y, it is meaningful (Leamer 1978). When humans or machines start running every possible model, which is the typical machine-learning strategy, they implicitly test every possible hypothesis and potentially every possible theory. Among all possible models, there will be some in which the data-generating process is theoretically impossible, such as predicting biological sex as an outcome of party affiliation, or right-party votes in 1970 from GDP in 2010. If we want something more than predictive power, human supervision and causal inference are necessary to rule out impossible realities that a computer cannot identify on its own (Pearl 2018).
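To see the scale of the problem in concrete terms, consider a back-of-the-envelope calculation. The sketch below is purely illustrative and is not taken from any of the cited studies: with even a modest hypothetical pool of candidate control variables, the number of possible specifications explodes, and a conventional significance threshold then guarantees a large absolute number of chance findings under a global null.

```python
# Illustrative arithmetic only; the figures quoted later in the text come
# from Munoz and Young (2018), not from this sketch.
k = 30                     # hypothetical pool of candidate control variables
n_specifications = 2 ** k  # every subset of controls defines one specification

alpha = 0.05               # conventional significance threshold
# Under a global null (no true effects), roughly alpha * N of the N
# specifications will appear "significant" by chance alone.
expected_false_positives = alpha * n_specifications

print(f"{n_specifications:,} specifications from {k} candidate controls")
print(f"~{expected_false_positives:,.0f} expected false positives at alpha = {alpha}")
```

The point is not the exact numbers but that exhaustive specification search, by itself, cannot separate chance findings and causally impossible models from meaningful ones.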

Crowdsourcing is epistemologically different from machine learning. It is not a “kitchen-sink” method. The landmark Silberzahn et al. (2018) study demonstrated that seemingly innocuous decisions in the data-analysis phase of research can change results dramatically. The effects of idiosyncratic features of researchers (e.g., prior information and beliefs) are something that Bayesian researchers raised warning signals about long ago (Jeffreys 1998 [1939]). This is echoed in more recent discussions about the unreliability and irreproducibility of research (Dewald, Thursby, and Anderson 1986; Gelman and Loken 2014; Wicherts et al. 2016). However, grappling with this uncertainty is problematic. Identification of informative research priors usually involves consulting an expert, such as the researcher doing the work, or using simulations wherein the user selects parameters that may (or may not) reflect the reality of the researcher’s decisions. The primary objection is that the distribution of prior beliefs across researchers is unknown (Gelman 2008). The Silberzahn et al. (2018) and Breznau, Rinke, and Wuttke (2018; 2019) studies provide a real observation of these priors because all researchers develop what should be at least a plausible model for testing the hypothesis, given the data. This observation extends beyond prior studies because the crowdsourced researchers scrutinized the other researchers’ models before seeing the results. This practice is extremely useful because it is not one expert guessing about the relative value of a model but rather potentially 100 or more.


The reduction of model uncertainty is good from a Bayesian perspective but it also demands theory from a causal perspective. Statisticians are well aware that without the correct model specification, estimates of uncertainty are themselves uncertain, if not useless. Thus, rather than having 8.8 million false positives after running 9 billion different regression models (Muñoz and Young 2018), humans can identify correct—or at least “better”—model specifications using causal theory and logic (Clark and Golder 2015; Pearl 2018). The diversity of results in the Silberzahn et al. (2018) and Breznau, Rinke, and Wuttke et al. (2018; 2019) studies helped to identify key variables and modeling strategies that had the “power” to change the conclusions of any given research team. These were not simply variables among millions of models but rather variables among carefully constructed plausible models. This shifts the discussion from the results and even the Bayesian priors to data-generating theories.

Machines cannot apply discriminating logic—or can they? The US military’s Defense Advanced Research Projects Agency (DARPA) deems the human-versus-machine question so important that it currently funds the Systematizing Confidence in Open Research and Evidence (SCORE) project, which pits machines against research teams in reviewing the credibility of research across disciplines. SCORE architects believe that machines might be capable of determining the credibility of social and behavioral research better, faster, and more cost effectively than humans. In this case, crowdsourcing provides DARPA the exact method it needs to answer this question. This project is the largest crowdsourcing of social researchers to date, with more than 500 already participating.[7] It is a major demonstration that crowdsourcing can bring together the interests of academics and policy makers. It remains to be seen how valuable the machines are with minimal human interference.

At the heart of the SCORE project are questions of credibility and reliability. These topics also are at the center of scandals and conflicts currently infecting science and rapidly increasing researchers’ and funding agencies’ interest in replication (Eubank 2016; Ishiyama 2014; Laitin and Reich 2017; Stockemer, Koehler, and Lentz 2018). However, there are limits to what replications can offer, and they take on diverse formats and goals (Clemens 2015; Freese and Peterson 2017). Currently, the decision of whether, what, and how to replicate is entirely a researcher’s prerogative. Few engage in any replications, meaning that any given original study is fortunate to have even one replication.[8] Journals thus far have been hesitant to publish replications, especially of their own publications, even if the replications overturn preposterous-sounding claims (e.g., precognition) (Ritchie, Wiseman, and French 2012) or identify major mistakes in the methods (Breznau 2015; Gelman 2013). Crowdsourcing could change this because it provides the power of meta-replication in one study. Moreover, crowdsourcing might appear to journal editors as cutting-edge research rather than “just another” replication (Silberzahn and Uhlmann 2015).

Perhaps more important, crowdsourced replications provide reliability at a meta-level. Replications, experimental reproductions, and original research alike suffer from what Leamer (1978) referred to as “metastatistics” problems. If researchers exercise their degrees of freedom such that ostensibly identical research projects have different results—for example, more than 5% of the time—then any one study is not reliable by most standards. Breznau, Rinke, and Wuttke (2018) simulated this problem and deduced that four to seven independent replications are necessary to obtain a majority of direct replications—that is, a reproduction of the same results, arriving at similar effect sizes within 0.01 of an original study in a 95% confidence interval. Subsequently, the results from their crowdsourced project showed that even after correcting for major mistakes in code, 11% of the effects were substantively different from the original (i.e., a change in significance or direction) (Breznau, Rinke, and Wuttke 2019). Søndergaard (1994) reviewed replications of the Values Survey Module employed by Hofstede—one of the most cited and replicated studies in social science—and found that 19 of 28 replications (only 68%) came to approximately the same relative results. Lack of reproducibility is not restricted to quantitative research. For example, Forscher et al. (2019) investigated thousands of National Institutes of Health grant reviews and found that the typical approach of using three to five reviewers achieved an inter-rater reliability of only 0.2. Increasing this to an extreme of 12 reviewers per application achieved a reliability of only 0.5. Both scores are far below any acceptable standard for researchers subjectively evaluating the same text data.
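One way to see why simply adding reviewers yields diminishing returns is the Spearman-Brown prophecy formula, which projects the reliability of a panel’s average rating from the reliability of a single rater. The sketch below is my illustration under an assumed single-reviewer reliability; it is not the estimation approach used by Forscher et al. (2019).

```python
def spearman_brown(single_rater_reliability: float, n_raters: int) -> float:
    """Projected reliability of the mean rating of n_raters raters,
    given the reliability of one rater (Spearman-Brown prophecy formula)."""
    r = single_rater_reliability
    return n_raters * r / (1 + (n_raters - 1) * r)

# Hypothetical single-reviewer reliability, chosen so that a small panel
# lands near the 0.2 reported for typical NIH panels in the text; the cited
# study did not necessarily estimate it this way.
r1 = 0.055
for k in (4, 12):
    print(f"{k:2d} reviewers -> projected reliability {spearman_brown(r1, k):.2f}")
```

Under this rough projection, even tripling the size of a review panel leaves the reliability of the average rating well below conventional standards, consistent with the pattern reported above.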

There is much to learn about why researchers arrive at different results. It appears that intentional decisions are only one factor (Yong 2012) and that context, versioning, and institutional constraints also play a role (Breznau 2016; Breznau, Rinke, and Wuttke 2019). Crowdsourcing addresses this meta-uncertainty because principal investigators (PIs) can hold certain research factors constant—including the method, data, and exact framing of the hypothesis—thereby exponentially increasing the power both to discover why researchers’ results differ and to meta-analyze a selected topic (Uhlmann et al. 2018). Combined with the rapidly expanding area of specification-curve analysis, crowdsourcing provides a new way to increase credibility for political and social research—in both sample populations and among the researchers themselves (Rohrer, Egloff, and Schmukle 2017; Simonsohn, Simmons, and Nelson 2015). It is hoped that these developments produce tangible outcomes that improve public, private, and government perceptions of social science.
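As a minimal illustration of what a specification curve is, the sketch below (synthetic data and numpy only; not the procedure of Simonsohn, Simmons, and Nelson 2015 or of the crowdsourced studies) estimates a focal coefficient under every subset of a small set of controls and sorts the estimates to show how much defensible modeling choices can move the result.

```python
import numpy as np
from itertools import combinations

# Illustrative specification curve on synthetic data (not the cited studies' data).
rng = np.random.default_rng(42)
n = 500
controls = {f"z{i}": rng.normal(size=n) for i in range(4)}
x = rng.normal(size=n) + 0.5 * controls["z0"]              # focal predictor
y = 0.3 * x + 0.8 * controls["z0"] + rng.normal(size=n)    # true effect of x is 0.3

estimates = []
names = list(controls)
for k in range(len(names) + 1):
    for subset in combinations(names, k):
        design = np.column_stack([np.ones(n), x] + [controls[c] for c in subset])
        beta, *_ = np.linalg.lstsq(design, y, rcond=None)
        estimates.append((subset, beta[1]))                # beta[1] is the coefficient on x

# Sort the estimates to form the "curve": how much do specification choices
# move the focal coefficient away from the true value of 0.3?
for subset, b in sorted(estimates, key=lambda t: t[1]):
    print(f"controls={subset or ('none',)}: beta_x = {b:.3f}")
```

In a crowdsourced project, each point on such a curve corresponds to a model that at least one research team considered plausible, rather than to a mechanical enumeration of every subset.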

As social scientists, we naturally should be skeptical of the “DARPA hypothesis” that computers could be more reliable than humans for evidence-based policy making (Grimmer 2015). In fact, a crowdsourced collaboration of 107 teams recently demonstrated that humans were essentially as good as machines in predicting life outcomes of children (Salganik et al. 2020). In this study, however, both humans and machines were relatively poor at prediction overall. These problems may stem from a lack of theory. For example, in a different study of brain imaging involving 70 crowdsourced teams, not a single pair agreed about the data-generating model (Botvinik-Nezer et al. 2020). Breznau et al. (2019) suggested that a lack of theory describing the relationship of immigration and social policy preferences means that research in this area also is unreliable and—as these new crowdsourced studies demonstrate—meta-analyzing the results does not solve this dearth of theory. However, crowdsourcing has an untapped resource in addition to meta-analysis of results for resolving these issues: meta-construction of theory (Breznau 2020). In the Breznau et al. (2019) study, careful consideration of the data-generating model, online deliberation, and voting on others’ models revealed where theoretical weaknesses and lack of consensus exist. By coupling the observed variation in data-generating models across teams, and the deliberations and disagreements behind it, with how these choices do or do not shape the results, immigration and social policy scholars gain insight into where they should focus their theoretical efforts to obtain the most significant gains in knowledge.

Additionally, as a structured crowdsourced research endeavor using the technology of the online deliberation platform Kialo, the Breznau et al. (2019) study undermined the current research system that favors the novelty of individual researchers. It instead promoted consensus building and direct responsiveness to theoretical claims. When else do hundreds of political researchers collaborate to focus their theoretical discussions in one area? Crowdsourcing, with the help of technologies such as Kialo, could resolve the perpetual problem of scholars, areas, and disciplines talking “past” one another. It also would provide a technological leap forward: we currently observe “collective” theory construction in a primitive format among conference panels and journal symposia, limited to a few invited participants in a specific event in space and time. This is a narrow collaborative process compared with structured crowdsourced deliberations in which potentially thousands of global participants can engage at their convenience over many months. Kialo also provides extensive data for analyzing deliberative crowd processes and outcomes, thereby streamlining the process.[9]

The benefits of crowdsourcing are not only on the scientific output or theoretical side. The Silberzahn et al. (2018) and Breznau et al. (2018; 2019) studies also included structured interaction among researchers, which fostered community engagement and the exchange of ideas and methodological knowledge. The Breznau et al. (2019) study specifically asked participants about their experiences and found that learning and enjoyment were common among them. For example, 69% reported that participation was enjoyable, 22% were neutral, and only 7% found it not enjoyable. Of 188 participants, the retention rate was 87% from start to finish, which demonstrates motivation across the spectrum of PhD students, postdocs, professors, and nonacademic professionals. Crowdsourcing potentially requires a significant time investment on the part of the participating researchers, but the investment pays off in academic and personal development. Moreover, the practice of crowdsourcing researchers embodies many principles amenable to the Open Science Movement, such as open participation, transparency, reproducibility, and better practices. Crowdsourcing may not be a high-density method but it seems sustainable as an occasional large-scale project and as something good for science.

A key principle of open science and crowdsourcing also relates to incentive structures. The status attainment of scholars is theoretically infinite, limited only by the number of people willing to cite a scholar’s work, which leads to egomaniacal and destructive behaviors (Sørensen 1996). This is the status quo of academia. In a crowdsourced endeavor, the participating researchers are equals. There is no chance for cartelism, veto, or exclusionary practices as long as the PIs are careful moderators. Thus, in its ideal form, a crowdsourced project should give authorship to all participants who complete the tasks assigned to them. The PIs of a crowdsourced endeavor naturally have more to gain than the participants because the research is their “brainchild” and they will be known as the “PIs.” Nonetheless, they also have more to lose if such a massive investment of researcher human capital were to fail; therefore, the participants have good reason to trust the process. The net gains should be positive for everyone involved in a well-executed crowdsourced project. The advantage of this vis-à-vis the Open Science Movement is that the current “publish-or-perish” model becomes something positive, wherein all participants benefit by getting published—and any self-citation thereafter is a community rather than an individualistic good.

Crowdsourcing is not without boundaries. There is a limited supply of labor. Crowdsourcing a replication or original research essentially asks researchers to engage in a project that may require as much empirical work as one of their “own” projects. Moreover, the newness of this method means that researchers may not see the value in investing their time. Before contributing, researchers may want to consider (1) the novelty and importance of the topic, and (2) whether the proposed project fits with open and ethical science ideals. In the case of crowdsourced replications, the “topic” is to end the replication crisis. Bates (2016) suggested a formula for selecting studies to replicate. He argued that priority for ending the replication crisis can be measured as influence (i.e., citations) divided by evidence (i.e., the number of replications, wherein zero might logically be set to 0.01). The larger the number, the higher the replication priority. I suggest extending this formula to include the sociopolitical relevance of the topic, something that crowdsourcing can be used to measure. A crowdsourced ranking of studies in terms of impact can enter the right-hand side of this equation as a weighting coefficient. This would provide the possibility to end the replication crisis and also solve pressing societal problems.
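To make the proposed extension concrete, here is a small sketch (my illustration; Bates 2016 and the text above describe the formula only in prose) that computes a replication-priority score as citations divided by the number of existing replications, with zero replications floored at 0.01 and the result multiplied by a hypothetical crowdsourced relevance weight.

```python
def replication_priority(citations: int,
                         n_replications: int,
                         relevance_weight: float = 1.0) -> float:
    """Bates-style replication priority: influence (citations) divided by
    evidence (replications, floored at 0.01 when none exist), scaled by a
    crowdsourced sociopolitical-relevance weight as proposed in the text."""
    evidence = n_replications if n_replications > 0 else 0.01
    return relevance_weight * citations / evidence

# Hypothetical studies: (citation count, replication count, crowd-ranked relevance).
studies = {
    "Study A": (5000, 0, 0.9),   # highly cited, never replicated, high relevance
    "Study B": (5000, 4, 0.9),   # equally cited but already replicated
    "Study C": (300, 0, 0.3),    # little cited, never replicated, low relevance
}
for name, (cites, reps, weight) in studies.items():
    print(f"{name}: priority = {replication_priority(cites, reps, weight):,.0f}")
```

The ranking behaves as the argument suggests: influential, never-replicated, and socially relevant studies rise to the top of the replication queue.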

Footnotes

1. As of August 6, 2020.

2. For example, see www.worldcommunitygrid.org or cryptocurrency using blockchain technology.

3. Also known as “soccer” in a minority of countries.

4. Just check the growing list on the Wikipedia crowdsourcing projects page! Available at https://en.wikipedia.org/wiki/List_of_crowdsourcing_projects.

8. Hofstede’s Values Survey, the observer effect, and the backfire effect are notable exceptions.

9. This does not mean that Kialo is the only option; however, it seems to be one of the best suited for crowdsourcing.

References

Bates, Timothy. 2016. “What Should We Replicate? Things That Will End the Replication Crisis.” Medium op-ed. Available at https://medium.com/@timothycbates/what-should-we-replicate-things-that-will-end-the-replication-crisis-cb09ce24b25f.
Berinsky, Adam J., Huber, Gregory A., and Lenz, Gabriel S. 2012. “Evaluating Online Labor Markets for Experimental Research: Amazon.com’s Mechanical Turk.” Political Analysis 20 (3): 351–68.
Botvinik-Nezer, Rotem, Holzmeister, Felix, Camerer, Colin F., Dreber, Anna, Huber, Juergen, Johannesson, Magnus, Kirchler, Michael, et al. 2020. “Variability in the Analysis of a Single Neuroimaging Dataset by Many Teams.” Nature 582 (7810): 84–88.
Breznau, Nate. 2015. “The Missing Main Effect of Welfare-State Regimes: A Replication of ‘Social Policy Responsiveness in Developed Democracies’ by Brooks and Manza.” Sociological Science 2:420–41.
Breznau, Nate. 2016. “Secondary Observer Effects: Idiosyncratic Errors in Small-N Secondary Data Analysis.” International Journal of Social Research Methodology 19 (3): 301–18.
Breznau, Nate. 2020. “Meta-Construction of Social Theory.” Crowdid. Available at https://crowdid.hypotheses.org/372.
Breznau, Nate, Rinke, Eike Mark, and Wuttke, Alexander. 2018. “Pre-Registered Analysis Plan for ‘How Many Replicators’ Experiment.” Mannheim Centre for European Social Research. SocArXiv. Available at https://osf.io/hkpdt.
Breznau, Nate, Rinke, Eike Mark, Wuttke, Alexander, Adem, Muna, Adriaans, Jule, Alvarez-Benjumea, Amalia, Andersen, Henrik, et al. 2019. Crowdsourced Replication Initiative: Executive Report. Mannheim Centre for European Social Research. SocArXiv. Available at https://osf.io/preprints/socarxiv/6j9qb.
Clark, William Roberts, and Golder, Matt. 2015. “Big Data, Causal Inference, and Formal Theory: Contradictory Trends in Political Science? Introduction.” PS: Political Science & Politics 48 (1): 65–70.
Clemens, Michael A. 2015. “The Meaning of Failed Replications: A Review and Proposal.” Journal of Economic Surveys 31 (1): 326–42.
Dewald, William G., Thursby, Jerry G., and Anderson, Richard G. 1986. “Replication in Empirical Economics: The Journal of Money, Credit and Banking Project.” American Economic Review 76 (4): 587–603.
Eubank, Nicholas. 2016. “Lessons from a Decade of Replications at the Quarterly Journal of Political Science.” PS: Political Science & Politics 49 (2): 273–76.
Forscher, Patrick, Brauer, Markus, Cox, William, and Devine, Patricia. 2019. “How Many Reviewers Are Required to Obtain Reliable Evaluations of NIH R01 Grant Proposals?” PsyArXiv preprint. Available at https://psyarxiv.com/483zj.
Freese, Jeremy, and Peterson, David. 2017. “Replication in Social Science.” Annual Review of Sociology 43 (1): 147–65.
Gelman, Andrew. 2008. “Objections to Bayesian Statistics.” Bayesian Analysis 3 (3): 445–49.
Gelman, Andrew. 2013. “Ethics and Statistics: It’s Too Hard to Publish Criticisms and Obtain Data for Republication.” Chance 26 (3): 49–52.
Gelman, Andrew, and Loken, Eric. 2014. “The Statistical Crisis in Science.” American Scientist 102 (6): 460.
Grimmer, Justin. 2015. “We Are All Social Scientists Now: How Big Data, Machine Learning, and Causal Inference Work Together.” PS: Political Science & Politics 48 (1): 80–83.
Höffler, Jan. 2017. “Replication Wiki: Improving Transparency in the Social Sciences.” D-Lib Magazine 23 (3/4).
Howe, Jeff. 2008. Crowdsourcing: How the Power of the Crowd Is Driving the Future of Business. New York: Random House Business Books.
Ishiyama, John. 2014. “Replication, Research Transparency, and Journal Publications: Individualism, Community Models, and the Future of Replication Studies.” PS: Political Science & Politics 47 (1): 78–83.
Jeffreys, Harold. 1998 [1939]. Theory of Probability. Third edition. New York and Oxford: Clarendon Press and Oxford University Press.
Klein, Richard A., Ratliff, Kate A., Vianello, Michelangelo, Adams, Reginald B. Jr., Bahník, Štěpán, Bernstein, Michael J., Bocian, Konrad, et al. 2014. “Investigating Variation in Replicability: A ‘Many Labs’ Replication Project.” Social Psychology 45 (3): 142–52.
Laitin, David D., and Reich, Rob. 2017. “Trust, Transparency, and Replication in Political Science.” PS: Political Science & Politics 50 (1): 172–75.
Leamer, Edward E. 1978. Specification Searches: Ad Hoc Inference with Nonexperimental Data. New York: John Wiley & Sons, Inc.
Muñoz, John, and Young, Cristobal. 2018. “We Ran 9 Billion Regressions: Eliminating False Positives through Computational Model Robustness.” Sociological Methodology 48 (1): 1–33.
Nagler, Jonathan, and Tucker, Joshua A. 2015. “Drawing Inferences and Testing Theories with Big Data.” PS: Political Science & Politics 48 (1): 84–88.
Palmer, Joshua C., and Strickland, Justin. 2016. “A Beginner’s Guide to Crowdsourcing: Strengths, Limitations, and Best Practices.” Psychological Science Agenda. Available at www.apa.org/science/about/psa/2016/06/changing-minds.
Pearl, Judea. 2018. “Theoretical Impediments to Machine Learning with Seven Sparks from the Causal Revolution.” ArXiv preprint. Available at https://arxiv.org/abs/1801.04016.
Ritchie, Stuart J., Wiseman, Richard, and French, Christopher C. 2012. “Failing the Future: Three Unsuccessful Attempts to Replicate Bem’s ‘Retroactive Facilitation of Recall’ Effect.” PLOS ONE 7 (3).
Rohrer, Julia M., Egloff, Boris, and Schmukle, Stefan C. 2017. “Probing Birth-Order Effects on Narrow Traits Using Specification-Curve Analysis.” Psychological Science 28 (12): 1821–32.
Salesses, Philip, Schechtner, Katja, and Hidalgo, César A. 2013. “The Collaborative Image of The City: Mapping the Inequality of Urban Perception.” PLOS ONE 8 (7).
Salganik, Matthew J., Lundberg, Ian, Kindel, Alexander T., Ahearn, Caitlin E., Al-Ghoneim, Khaled, Almaatouq, Abdullah, Altschul, Drew M., et al. 2020. “Measuring the Predictability of Life Outcomes with a Scientific Mass Collaboration.” Proceedings of the National Academy of Sciences 117 (15): 8398–403.
Silberzahn, Raphael, Uhlmann, Eric L., Martin, D. P., Anselmi, P., Aust, F., Awtrey, E., Bahník, Š., et al. 2018. “Many Analysts, One Dataset: Making Transparent How Variations in Analytic Choices Affect Results.” Advances in Methods and Practices in Psychological Science 1 (3): 337–56.
Silberzahn, Raphael, and Uhlmann, Eric L. 2015. “Crowdsourced Research: Many Hands Make Light Work.” Nature 526:189–91.
Simonsohn, Uri, Simmons, Joseph P., and Nelson, Leif D. 2015. “Specification Curve: Descriptive and Inferential Statistics on All Reasonable Specifications.” Social Science Research Network Working Paper. Available at https://papers.ssrn.com/sol3/papers.cfm?abstract_id=2694998.
Søndergaard, Mikael. 1994. “Research Note: Hofstede’s Consequences: A Study of Reviews, Citations, and Replications.” Organization Studies 15 (3): 447–56.
Sørensen, Aage B. 1996. “The Structural Basis of Social Inequality.” American Journal of Sociology 101 (5): 1333–65.
Stockemer, Daniel, Koehler, Sebastian, and Lentz, Tobias. 2018. “Data Access, Transparency, and Replication: New Insights from the Political Behavior Literature.” PS: Political Science & Politics 51 (4): 799–803.
Uhlmann, Eric L., Ebersole, C. R., Chartier, C. R., Errington, T. M., Kidwell, M. C., Lai, C. K., McCarthy, Randy, et al. 2018. “Scientific Utopia: III. Crowdsourcing Science.” PsyArXiv. Available at https://psyarxiv.com/vg649.
Wicherts, Jelte M., Veldkamp, C. L. S., Augusteijn, H. E. M., Bakker, M., van Aert, R. C. M., and van Assen, M. A. L. 2016. “Degrees of Freedom in Planning, Running, Analyzing, and Reporting Psychological Studies: A Checklist to Avoid p-Hacking.” Frontiers in Psychology 7:1832.
Yong, Ed. 2012. “Replication Studies: Bad Copy.” Nature News 485 (7398): 298.
Zhao, Yuxiang, and Zhu, Qinghua. 2014. “Evaluation on Crowdsourcing Research: Current Status and Future Direction.” Information Systems Frontiers 16 (3): 417–34.