The generalizability crisis

Tal Yarkoni

doi:10.1017/S0140525X20001685

Published online by Cambridge University Press: 21 December 2020

Affiliation: Department of Psychology, The University of Texas at Austin, Austin, TX 78712-1043

Abstract

Most theories and hypotheses in psychology are verbal in nature, yet their evaluation overwhelmingly relies on inferential statistical procedures. The validity of the move from qualitative to quantitative analysis depends on the verbal and statistical expressions of a hypothesis being closely aligned – that is, that the two must refer to roughly the same set of hypothetical observations. Here, I argue that many applications of statistical inference in psychology fail to meet this basic condition. Focusing on the most widely used class of model in psychology – the linear mixed model – I explore the consequences of failing to statistically operationalize verbal hypotheses in a way that respects researchers' actual generalization intentions. I demonstrate that although the “random effect” formalism is used pervasively in psychology to model intersubject variability, few researchers accord the same treatment to other variables they clearly intend to generalize over (e.g., stimuli, tasks, or research sites). The under-specification of random effects imposes far stronger constraints on the generalizability of results than most researchers appreciate. Ignoring these constraints can dramatically inflate false-positive rates, and often leads researchers to draw sweeping verbal generalizations that lack a meaningful connection to the statistical quantities they are putatively based on. I argue that failure to take the alignment between verbal and statistical expressions seriously lies at the heart of many of psychology's ongoing problems (e.g., the replication crisis), and conclude with a discussion of several potential avenues for improvement.
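To make the false-positive claim concrete, here is a minimal simulation sketch. It is not the author's own analysis. It assumes a hypothetical design in which each of two conditions has its own set of stimuli, every subject sees every stimulus, and the true condition effect is exactly zero. Stimulus effects are drawn once per simulated study, so they are shared by all subjects. The "by-subject" test averages over stimuli and tests across subjects, which implicitly treats the stimuli as fixed. The "by-item" test averages over subjects and tests across stimuli, which treats the stimuli as a sample to generalize over. It is a simple stand-in for a full linear mixed model with crossed random effects for subjects and stimuli. All parameter values and function names are illustrative assumptions.

```python
import math
import random
import statistics


def one_sample_t(d):
    """One-sample t statistic for H0: mean(d) == 0."""
    return statistics.fmean(d) / (statistics.stdev(d) / math.sqrt(len(d)))


def pooled_t(a, b):
    """Two-sample Student t statistic with pooled variance."""
    na, nb = len(a), len(b)
    sp2 = ((na - 1) * statistics.variance(a) + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    return (statistics.fmean(a) - statistics.fmean(b)) / math.sqrt(sp2 * (1 / na + 1 / nb))


def simulate(n_sims=1000, n_subj=30, n_items=20, sd_subj=1.0, sd_item=0.5,
             sd_resid=1.0, crit_subj=2.045, crit_item=2.024, seed=1):
    """Return (by-subject, by-item) false-positive rates when the true effect is zero.

    crit_subj / crit_item are two-tailed alpha = .05 critical t values for
    df = n_subj - 1 (29) and df = 2 * n_items - 2 (38); change them if you
    change n_subj or n_items.
    """
    rng = random.Random(seed)
    hits_subj = hits_item = 0
    for _ in range(n_sims):
        # Stimulus effects: drawn once per study, shared by every subject.
        item_a = [rng.gauss(0, sd_item) for _ in range(n_items)]
        item_b = [rng.gauss(0, sd_item) for _ in range(n_items)]
        sums_a, sums_b, diffs = [0.0] * n_items, [0.0] * n_items, []
        for _ in range(n_subj):
            u = rng.gauss(0, sd_subj)  # subject intercept; no condition effect
            ya = [u + w + rng.gauss(0, sd_resid) for w in item_a]
            yb = [u + w + rng.gauss(0, sd_resid) for w in item_b]
            diffs.append(statistics.fmean(ya) - statistics.fmean(yb))
            sums_a = [s + y for s, y in zip(sums_a, ya)]
            sums_b = [s + y for s, y in zip(sums_b, yb)]
        # By-subject test: stimulus sampling variability is ignored.
        if abs(one_sample_t(diffs)) > crit_subj:
            hits_subj += 1
        # By-item test: per-stimulus means compared across the two stimulus sets.
        means_a = [s / n_subj for s in sums_a]
        means_b = [s / n_subj for s in sums_b]
        if abs(pooled_t(means_a, means_b)) > crit_item:
            hits_item += 1
    return hits_subj / n_sims, hits_item / n_sims


fp_subj, fp_item = simulate()
print(f"by-subject false-positive rate: {fp_subj:.3f}")
print(f"by-item false-positive rate:    {fp_item:.3f}")
```

Under these assumptions the by-item rate should stay close to the nominal .05. The by-subject rate should come out far higher, because the chance difference between the two stimulus sets is identical for every subject and is therefore mistaken for a reliable condition effect. Raising `sd_item` or `n_subj` should make the gap larger.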

Keywords

Generalization; inference; philosophy of science; psychology; random effects; statistics

Type: Target Article
Information: Behavioral and Brain Sciences, Volume 45, 2022, e1

Copyright © The Author(s), 2020. Published by Cambridge University Press
