Fact or fiction: reducing the proportion and impact of false positives

D. Stahl; A. Pickles

doi:10.1017/S003329171700294X

Published online by Cambridge University Press: 27 November 2017

D. Stahl: Department of Biostatistics and Health Informatics, Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, UK
A. Pickles*: Department of Biostatistics and Health Informatics, Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, UK
*Author for correspondence: A. Pickles

Abstract

False-positive findings in science are inevitable, but are they particularly common in psychology and psychiatry? The evidence that we review suggests that, while not restricted to our field, the problem is acute. We describe the concept of researcher ‘degrees-of-freedom’ to explain how many false-positive findings arise, and how the strategies now being adopted (registration, pre-specification, and reporting standards) both reduce such findings and make them visible. We review the possible benefits and harms of proposed statistical solutions, from tougher requirements for significance to Bayesian and machine learning approaches to analysis. Finally, we consider the organisation and methods for replication and systematic review in psychology and psychiatry.
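As a minimal illustration of how a single researcher degree-of-freedom inflates false positives, and of what a tougher significance threshold buys, the sketch below simulates studies in which there is no true effect, but the analyst reports whichever of several outcome measures gives the smallest p-value. It is not taken from the article: the function names, the two-group z-test with a known unit variance, the assumption that outcomes are independent, and the default sample sizes are all illustrative assumptions. With k independent outcomes, the chance of at least one p < alpha under the null is 1 - (1 - alpha)^k, which the simulation should approximate.

```python
import math
import random


def two_sided_p(z: float) -> float:
    """Two-sided p-value for a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def null_study_p_values(n_per_group: int, n_outcomes: int, rng: random.Random) -> list[float]:
    """Simulate one two-group study with NO true effect on any outcome.

    Illustrative assumptions: the n_outcomes are independent, and each is
    compared between groups with a z-test using a known population SD of 1.
    """
    se = math.sqrt(2.0 / n_per_group)  # SD of a difference in means when sigma = 1
    p_values = []
    for _ in range(n_outcomes):
        mean_a = sum(rng.gauss(0.0, 1.0) for _ in range(n_per_group)) / n_per_group
        mean_b = sum(rng.gauss(0.0, 1.0) for _ in range(n_per_group)) / n_per_group
        p_values.append(two_sided_p((mean_a - mean_b) / se))
    return p_values


def simulated_false_positive_rate(alpha: float, n_outcomes: int,
                                  n_studies: int = 5000, n_per_group: int = 20,
                                  seed: int = 2017) -> float:
    """Share of null studies declared 'significant' when the analyst reports
    whichever outcome has the smallest p-value (one researcher degree-of-freedom)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_studies):
        if min(null_study_p_values(n_per_group, n_outcomes, rng)) < alpha:
            hits += 1
    return hits / n_studies


def expected_rate(alpha: float, n_outcomes: int) -> float:
    """Analytic chance of at least one p < alpha among independent null tests."""
    return 1.0 - (1.0 - alpha) ** n_outcomes


if __name__ == "__main__":
    for alpha in (0.05, 0.005):
        for k in (1, 5):
            print(f"alpha={alpha}, outcomes={k}: "
                  f"simulated={simulated_false_positive_rate(alpha, k):.3f}, "
                  f"expected={expected_rate(alpha, k):.3f}")
```

Under these assumptions, `expected_rate(0.05, 5)` is about 0.226, more than four times the nominal 5%, whereas `expected_rate(0.005, 5)` is about 0.025. A stricter threshold shrinks the inflation but does not remove it; in this toy model, pre-specifying a single primary outcome (k = 1) restores the nominal rate.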

Keywords

Bayes' factor; cross-validation; external validation; false positives; meta-analyses; replication crisis; researcher degrees-of-freedom; statistical learning

Type: Invited Review
Information: Psychological Medicine, Volume 48, Issue 7, May 2018, pp. 1084–1091

DOI: https://doi.org/10.1017/S003329171700294X
Copyright: Copyright © Cambridge University Press 2017

Stahl and Pickles supplementary material 1

Appendix
