Fact or fiction: reducing the proportion and impact of false positives

Published online by Cambridge University Press: 27 November 2017

D. Stahl
Affiliation:
Department of Biostatistics and Health Informatics, Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, UK
A. Pickles*
Affiliation:
Department of Biostatistics and Health Informatics, Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, UK
*Author for correspondence: A. Pickles, E-mail: [email protected]

Abstract

False-positive findings in science are inevitable, but are they particularly common in psychology and psychiatry? The evidence that we review suggests that, while not restricted to our field, the problem is acute. We describe the concept of researcher ‘degrees-of-freedom’ to explain how many false-positive findings arise, and how the various strategies of registration, pre-specification, and reporting standards that are being adopted both reduce these degrees of freedom and make them visible. We review possible benefits and harms of proposed statistical solutions, from tougher requirements for significance to Bayesian and machine learning approaches to analysis. Finally, we consider the organisation and methods for replication and systematic review in psychology and psychiatry.

Type
Invited Review
Copyright
Copyright © Cambridge University Press 2017 

Supplementary material: File

Stahl and Pickles supplementary material 1

Appendix
