Null Hypothesis Significance Testing, p-values, Effects Sizes and Confidence Intervals

Published online by Cambridge University Press: 07 December 2017

Michael Perdices*
Affiliation: Department of Neurology, Royal North Shore Hospital, New South Wales, Australia

*Address for correspondence: Department of Neurology, Royal North Shore Hospital, The University of Sydney Medical School, Northern Clinical School, Discipline of Psychiatry, New South Wales, Australia.

Abstract

There has been controversy over Null Hypothesis Significance Testing (NHST) since the first quarter of the 20th century, and misconceptions about it still abound. The first section of this paper briefly discusses some of the problems and limitations of NHST. Overwhelmingly, the ‘holy grail’ of researchers has been to obtain significant p-values. In 1999 the American Psychological Association (APA) recommended that if NHST was used in data analysis, researchers should report effect sizes (ESs) and their confidence intervals (CIs) as well as p-values. The APA recommendations are summarised in the next section of the paper. But for neuropsychological rehabilitation clinicians, the primary interest is (or should be) to determine whether or not the effect of an intervention is clinically important, not just statistically significant. In this context, ESs and their CIs provide clinically relevant information. The next section of the paper reviews common ESs, and worked examples are provided for calculating three commonly used ESs (Cohen's d, Hedges' g and Glass’ delta). Web-based resources for calculating other ESs and their CIs are also reviewed.
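The abstract names three standardized mean-difference effect sizes. The paper's own worked examples are not reproduced here; instead, the Python below is a minimal sketch, assuming two independent groups and common textbook conventions: Cohen's d divides the mean difference by a pooled SD with an (n1 + n2 - 2) denominator, Hedges' g multiplies d by the approximate small-sample correction 1 - 3/(4df - 1) with df = n1 + n2 - 2, and Glass' delta divides by the control group's SD alone. The CI uses a large-sample normal approximation, not the exact noncentral-t method discussed by Cumming and Finch (2001). All function names and sample scores are hypothetical, chosen for illustration.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def pooled_sd(x, y):
    """Pooled SD with (n1 + n2 - 2) in the denominator (one common convention)."""
    n1, n2 = len(x), len(y)
    s1, s2 = stdev(x), stdev(y)  # sample SDs, (n - 1) denominator
    return sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))


def cohens_d(treatment, control):
    """Mean difference divided by the pooled SD."""
    return (mean(treatment) - mean(control)) / pooled_sd(treatment, control)


def hedges_g(treatment, control):
    """Cohen's d times the approximate bias correction J = 1 - 3 / (4 * df - 1)."""
    df = len(treatment) + len(control) - 2
    return cohens_d(treatment, control) * (1 - 3 / (4 * df - 1))


def glass_delta(treatment, control):
    """Mean difference divided by the control group's SD only."""
    return (mean(treatment) - mean(control)) / stdev(control)


def d_confidence_interval(d, n1, n2, level=0.95):
    """Approximate CI for d from a normal approximation to its standard error
    (an assumption; exact intervals use the noncentral t distribution)."""
    se = sqrt((n1 + n2) / (n1 * n2) + d**2 / (2 * (n1 + n2)))
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return d - z * se, d + z * se


if __name__ == "__main__":
    # Hypothetical scores, invented purely for illustration.
    treatment = [14, 16, 15, 18, 17, 16]
    control = [12, 13, 15, 12, 14, 13]
    d = cohens_d(treatment, control)
    lo, hi = d_confidence_interval(d, len(treatment), len(control))
    print(f"Cohen's d     = {d:.2f} (95% CI {lo:.2f} to {hi:.2f})")
    print(f"Hedges' g     = {hedges_g(treatment, control):.2f}")
    print(f"Glass' delta  = {glass_delta(treatment, control):.2f}")
```

Definitions vary between sources (for example, some texts call the uncorrected pooled-SD statistic g), so it is worth checking which convention a paper or software package uses before comparing values. With very small groups the interval for d is wide even when p is small, which is the kind of information the abstract argues clinicians need.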

Type: Article
Copyright © Australasian Society for the Study of Brain Impairment 2017

References

APA Publications and Communications Board Working Group on Journal Article Reporting Standards. (2008). Reporting standards for research in psychology: Why do we need them? What might they be? American Psychologist, 63 (9), 839–851.
Bakeman, R. (2005). Recommended effect size statistics for repeated measures designs. Behavior Research Methods, 37 (3), 379–384.
Berben, L., Sereika, S.M., & Engberg, S. (2012). Effect size estimation: Methods and examples. International Journal of Nursing Studies, 49, 1039–1047.
Berkson, J. (1938). Some difficulties of interpretation encountered in application of Chi squared. Journal of the American Statistical Association, 33 (203), 526–536.
Carver, R.P. (1978). The case against statistical significance. Harvard Educational Review, 48 (3), 378–399.
Castro Sotos, A.E., Vanhoof, S., Van den Noortgate, W., & Onghena, P. (2007). Students' misconceptions of statistical inference: A review of the empirical evidence from research on statistics education. Educational Research Review, 2 (2), 98–113.
Clark, C.A. (1963). Hypothesis testing in relation to statistical methodology. Review of Educational Research, 33, 455–473.
Cohen, J. (1962). The statistical power of abnormal–social psychological research. Journal of Abnormal and Social Psychology, 65, 145–153.
Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Hillsdale, NJ: Erlbaum.
Cohen, J. (1990). Things I have learned (So far). American Psychologist, 45 (12), 1304–1312.
Cohen, J. (1994). The Earth is round (p < .05). American Psychologist, 49 (12), 997–1003.
Cooper, H., Hedges, L.V., & Valentine, J.C. (2009). The handbook of research synthesis and meta-analysis. New York: Russell Sage Foundation.
Cumming, G., & Finch, S. (2001). A primer on the understanding, use, and calculation of confidence intervals that are based on central and noncentral distributions. Educational and Psychological Measurement, 61 (4), 532–574.
Draper, S.W. (2016). Effect Size. Retrieved from http://www.psy.gla.ac.uk/~steve/best/effect.html
Ellis, P.D. (2010). The essential guide to effect sizes: Statistical power, meta-analysis, and the interpretation of research results. New York: Cambridge University Press.
Falk, R., & Greenbaum, C.W. (1995). Significance tests die hard: The amazing persistence of a probabilistic misconception. Theory and Psychology, 5 (1), 76–98.
Ferguson, C.J. (2009). An effect size primer: A guide for clinicians and researchers. Professional Psychology: Research and Practice, 40 (5), 532–538.
Fethney, J. (2010). Statistical and clinical significance, and how to use confidence intervals to help interpret both. Australian Critical Care, 23, 93–97.
Fisher, R.A. (1925). Statistical methods for research workers. Edinburgh: Oliver and Boyd.
Gigerenzer, G. (1993). The superego, the ego, and the id in statistical reasoning. In Keren, G., & Lewis, C. (Eds.), A handbook for data analysis in the behavioral sciences: Methodological issues (pp. 311–339). Hillsdale, NJ: Lawrence Erlbaum Associates.
Glaser, D.N. (1999). The controversy of significance testing: Misconceptions and alternatives. American Journal of Critical Care, 8 (5), 291–296.
Glass, G.V., McGaw, B., & Smith, M.L. (1981). Meta-analysis in social research. Beverly Hills, CA: Sage.
Gliner, J.A., Leech, N.L., & Morgan, G.A. (2002). Problems with null hypothesis significance testing (NHST): What do the textbooks say? The Journal of Experimental Education, 71 (1), 83–92.
Halsey, L.G., Curran-Everett, D., Vowler, S.L., & Drummond, G.B. (2015). The fickle P value generates irreproducible results. Nature Methods, 12 (3), 179–185.
Hedges, L.V. (1981). Distribution theory for Glass's estimator of effect size and related estimators. Journal of Educational Statistics, 6 (2), 106–128.
Howell, D.C. (2010). Confidence intervals on effect size. Retrieved from https://www.uvm.edu/~dhowell/methods7/Supplements/Confidence%20Intervals%20on%20Effect%20Size.pdf
Huberty, C.J. (2002). A history of effect size indices. Educational and Psychological Measurement, 62 (2), 227–240.
Huberty, C.J., & Pike, C.J. (1999). On some history regarding statistical testing. Advances in Social Science Methodology, 5, 1–22.
Keselman, H.J., Huberty, C.J., Lix, L.M., Olejnik, S., Cribbie, R., Donahue, B., . . . Levin, J.R. (1998). Statistical practices of educational researchers: An analysis of their ANOVA, MANOVA, and ANCOVA analyses. Review of Educational Research, 68, 350–386.
Kirk, R.E. (1996). Practical significance: A concept whose time has come. Educational and Psychological Measurement, 56 (5), 746–759.
Kraemer, H.C., Morgan, G.A., Leech, N.L., Gliner, J.A., Vaske, J.J., & Harmon, R.J. (2003). Measures of clinical significance. Journal of the American Academy of Child and Adolescent Psychiatry, 42 (12), 1524–1529.
Krishnan, S., & Idris, N. (2014). Students’ misconceptions about hypothesis test. REDIMAT: Journal of Research in Mathematics Education, 3 (3), 276–293.
Lambdin, C. (2012). Significance tests as sorcery: Science is empirical-significance tests are not. Theory and Psychology, 22 (1), 67–90.
Chen, L.-T., & Peng, C.-Y. J. (2013). Constructing confidence intervals for effect sizes in ANOVA designs. Journal of Modern Applied Statistical Methods, 12 (2), 82–104.
Meehl, P.E. (1967). Theory-testing in psychology and physics: A methodological paradox. Philosophy of Science, 34 (2), 103–115.
Meyer, G.J., McGrath, R.E., & Rosenthal, R. (2003). Basic effect size guide with SPSS® and SAS® syntax. Retrieved from www.tandf.co.uk/journals/authors/hjpa/resources/basiceffectsizeguide.rtf
Neyman, J., & Pearson, E. (1928a). On the use and interpretation of certain test criteria for purposes of statistical inference: Part I. Biometrika, 20A, 175–240.
Neyman, J., & Pearson, E. (1928b). On the use and interpretation of certain test criteria for purposes of statistical inference: Part II. Biometrika, 20A, 263–294.
Nickerson, R.S. (2000). Null hypothesis significance testing: A review of an old and continuing controversy. Psychological Methods, 5 (2), 241–301.
Olejnik, S., & Algina, J. (2000). Measures of effect size for comparative studies: Applications, interpretations, and limitations. Contemporary Educational Psychology, 25, 241–286.
Peng, C.-Y. J., Chen, L.-T., Chiang, H.-M., & Chiang, Y.-C. (2013). The impact of APA and AERA guidelines on effect size reporting. Educational Psychology Review, 25, 157–209.
Prentice, D.A., & Miller, D.T. (1992). When small effects are impressive. Psychological Bulletin, 112 (1), 160–164.
Rea, L.M., & Parker, R.A. (1992). Designing and conducting survey research. San Francisco: Jossey-Bass.
Richardson, J.T.E. (2011). Eta squared and partial eta squared as measures of effect size in educational research. Educational Research Review, 6 (12), 135–147.
Rozeboom, W.W. (1960). The fallacy of the null-hypothesis significance test. Psychological Bulletin, 57 (5), 416–428.
Sainani, K.L. (2012). Clinical versus statistical significance. American Academy of Physical Medicine and Rehabilitation, 4 (6), 442–445.
Schatz, P., Jay, K.A., McComb, J., & McLaughlin, J.R. (2005). Misuse of statistical tests in archives of clinical neuropsychology publications. Archives of Clinical Neuropsychology, 20, 1053–1059.
Smithson, M. (2001). Correct confidence intervals for various regression effect sizes and parameters: The importance of noncentral distributions in computing intervals. Educational and Psychological Measurement, 61 (4), 605–632.
Thompson, B. (2002). “Statistical,” “Practical,” and “Clinical”: How many kinds of significance do counselors need to consider? Journal of Counselling and Development, 80, 64–71.
Torchiano, M. (2017). Efficient effect size computation. Retrieved from https://cran.r-project.org/web/packages/effsize/effsize.pdf
Turner, H.M., & Bernard, R.M. (2006). Calculating and synthesizing effect sizes. Contemporary Issues in Communication Science and Disorders, 33, 42–55.
Vallecillos, A. (2001). Cuestiones metodológicas en la investigación educativa [Methodological issues in educational research]. Quinto Simposio de la Sociedad Española de Investigación en Educación Matemática [Fifth Symposium of the Spanish Society for Research in Mathematics Education], Almería, Spain.
Vallecillos, A., & Batanero, C. (1997b). Conceptos activados en el contraste de hipótesis estadísticas y su comprensión por estudiantes universitarios [Concepts activated in statistical hypothesis testing and their understanding by university students]. Recherches en Didactique des Mathématiques, 17 (1), 29–48.
Vallecillos, A., & Batanero, M.C. (1997a). Aprendizaje y enseñanza del contraste de hipótesis: Concepciones y errores [Learning and teaching hypothesis testing: Conceptions and errors]. Enseñanza de las Ciencias, 15 (2), 189–197.
Wilkinson, L., & the Task Force on Statistical Inference, APA Board of Scientific Affairs. (1999). Statistical methods in psychology journals: Guidelines and explanations. American Psychologist, 54 (8), 594–604.
Wilson, D.B. (2011). Interpretation.ppt. Retrieved from http://mason.gmu.edu/~dwilsonb/ma.html