
P-CURVING AS A SAFEGUARD AGAINST P-HACKING IN SLA RESEARCH

A CASE STUDY

Published online by Cambridge University Press:  06 September 2021

Seth Lindstromberg*
Affiliation:
Hilderstone College
*Correspondence concerning this article should be addressed to Seth Lindstromberg, Hilderstone College, St Peters Road, Broadstairs, Kent, CT10 2JW, United Kingdom. Email: [email protected]; [email protected]

Abstract

It is important to be able to identify research results likely to have been arrived at by means of “p-hacking,” a common term for research and reporting practices (such as the selective reporting of results) that are biased toward finding p < α. This paper discusses and demonstrates “p-curving,” a means of checking a set of primary studies within a specific research stream for signs of p-hacking. A salient feature of p-curving is that it is based entirely on significant p-values. Because of the potential usefulness of p-curving and because it has been little used by SLA researchers, a case study illustrates the construction and analysis of a p-curve as a complement to meta-analysis. The focal p-curve in this study relates to published (quasi)experimental studies that addressed the research hypothesis that, for low- and middle-proficiency learners, L1 glosses facilitate vocabulary learning during reading better than L2 glosses do.
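
As a rough illustration of what constructing a p-curve can involve, the Python sketch below keeps only the significant p-values, counts them in equal-width bins below α, and applies the simple binomial test for right skew described by Simonsohn et al. (2014a): if there were no true effect, significant p-values would be uniformly distributed, so about half should fall below α/2. The function names and the example p-values are hypothetical, and this is neither the analysis reported in the paper nor the procedure of the p-curve app, which relies on more refined continuous tests.

```python
from math import comb


def p_curve_bins(p_values, alpha=0.05, n_bins=5):
    """Count significant p-values in equal-width bins over (0, alpha)."""
    width = alpha / n_bins
    counts = [0] * n_bins
    for p in p_values:
        if 0 < p < alpha:  # p-curving ignores nonsignificant results
            index = min(int(p / width), n_bins - 1)
            counts[index] += 1
    return counts


def right_skew_binomial_p(p_values, alpha=0.05):
    """One-sided binomial test: are more than half of the significant
    p-values below alpha / 2? Returns P(X >= observed) for
    X ~ Binomial(n, 0.5)."""
    significant = [p for p in p_values if 0 < p < alpha]
    n = len(significant)
    low = sum(1 for p in significant if p < alpha / 2)
    return sum(comb(n, k) for k in range(low, n + 1)) / 2 ** n


# Hypothetical p-values, invented purely for illustration.
example = [0.003, 0.004, 0.012, 0.021, 0.034, 0.049, 0.20, 0.06]
print(p_curve_bins(example))
print(right_skew_binomial_p(example))
```

A small result from the binomial test would indicate a right-skewed curve (many very low p-values), which is the pattern expected when a true effect is present; a flat or left-skewed curve is the pattern that raises concern about p-hacking.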

Type
Methods Forum
Copyright
© The Author(s), 2021. Published by Cambridge University Press


Footnotes

I am grateful to the lead authors of Kim et al. (2020) and Yanagisawa et al. (2020) for answering questions and to the lead author of Kim et al. for supplying two papers that I could find no other way of obtaining. At various stages of its development this paper benefited immensely from comments, suggestions, and corrections from Frank Boers, Tessa Woodward, three anonymous reviewers, and an editor.

References

Primary Studies Included in Case Study

Arpaci, D. (2016). The effects of accessing L1 versus L2 definitional glosses on L2 learners’ reading comprehension and vocabulary learning. Eurasian Journal of Applied Linguistics, 2, 15–29. https://www.ejal.info/index.php/ejal/issue/view/10
Ertürk, Z. (2016). The effect of glossing on EFL learners’ incidental vocabulary learning in reading. Procedia: Social and Behavioral Sciences, 232, 373–381. https://doi.org/10.1016/j.sbspro.2016.10.052
Farvardin, M., & Biria, R. (2012). The impact of gloss types on Iranian EFL students’ reading comprehension and lexical retention. International Journal of Instruction, 5, 99–114. http://www.e-iji.net/volumes/317-january-2012-volume-5-number-1
Ko, M.-H. (1995). Glossing in incidental and intentional learning of foreign language vocabulary and reading. University of Hawai’i Working Papers in ESL, 13, 49–94. https://scholarspace.manoa.hawaii.edu/handle/10125/40761
Ko, M.-H. (2017). The relationship between gloss type and L2 proficiency in incidental vocabulary learning. Modern English Education, 18, 47–69. http://www.dbpia.co.kr/Article/NODE07255877
Mitarai, Y., & Aizawa, K. (1999). The effects of different types of glosses in vocabulary learning and reading comprehension. ARELE: Annual Review of English Language Education in Japan, 10, 73–82.
Öztürk, M., & Yorgancı, M. (2017). Effects of L1 and L2 glosses on incidental vocabulary learning of EFL prep students. Turkish Studies: International Periodical for the Languages, Literature and History of Turkish or Turkic, 12, 635–656. http://doi.org/10.7827/TurkishStudies.11432
Pishghadam, R., & Ghahari, S. (2011). The impact of glossing on incidental vocabulary learning: A comparative study. Iranian EFL Journal, 7, 8–29. https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.965.6528&rep=rep1&type=pdf
Shiki, O. (2008). Effects of glosses on incidental vocabulary learning: Which gloss-type works better, L1, L2, single choice, or multiple choices for Japanese university students? Journal of Inquiry and Research, 87, 39–56. http://doi.org/10.18956/00006209
Yoshii, M. (2006). L1 and L2 glosses: Their effects of incidental vocabulary learning. Language Learning & Technology, 10, 85–101. https://www.lltjournal.org/item/2563

References

Anscombe, F. (1973). Graphs in statistical analysis. The American Statistician, 27, 17–21. https://doi.org/10.2307/2682899
Arnholt, A. (2017). R package BSDA (Basic statistics and data analysis), Version 1.20. (Computer freeware). https://www.rdocumentation.org/packages/BSDA/versions/1.2.0
Bakker, M., & Wicherts, J. (2011). The (mis)reporting of statistical results in psychology journals. Behavior Research Methods, 43, 666–678. https://doi.org/10.3758/s13428-011-0089-5
Bakker, M., & Wicherts, J. (2014). Outlier removal, sum scores, and the inflation of the type I error rate in independent samples t tests: The power of alternatives and recommendations. Psychological Methods, 19, 409–427. https://doi.org/10.1037/met0000014
Barcroft, J. (2015). Lexical input processing and vocabulary learning. John Benjamins.
Bishop, D., & Thompson, P. (2016). Problems in using p-curve analysis and text-mining to detect rate of p-hacking and evidential value. PeerJ, 4:e1715. https://doi.org/10.7717/peerj.1715
BITSS (Berkeley Initiative for Transparency in the Social Sciences). (2017). P-curve: A tool for detecting publication bias. https://www.bitss.org/p-curve-a-tool-for-detecting-publication-bias/
Boers, F. (in press). Glossing and vocabulary learning. Language Teaching. https://doi.org/10.1017/S0261444821000252
Borenstein, M., Hedges, L., Higgins, J., & Rothstein, H. (2009). Introduction to meta-analysis. Wiley.
Carter, E., Schönbrodt, F., Gervais, W., & Hilgard, J. (2019). Correcting for bias in psychology: A comparison of meta-analytic methods. Advances in Methods and Practices in Psychological Science, 2, 115–144. https://doi.org/10.1177/2515245919847196
Chambers, C. (2017). The seven deadly sins of psychology: A manifesto for reforming the culture of scientific practice. Princeton University Press.
Coburn, K., & Vevea, J. (2016). weightr: Estimating weight-function models for publication, Version 2.0.2. (Computer freeware). https://CRAN.R-project.org/package=weightr
Dick, A., Garcia, N., Pruden, S., Thompson, W., Hawes, S., Sutherland, M., Riedel, M., Laird, A., & Gonzalez, R. (2019). No evidence for a bilingual executive function advantage in the ABCD study. Nature Human Behavior, 3, 692–701. https://doi.org/10.1038/s41562-019-0609-3
Duval, S., & Tweedie, R. (2000). Trim and fill: A simple funnel-plot–based method of testing and adjusting for publication bias in meta-analysis. Biometrics, 56, 455–463. https://doi.org/10.1111/j.0006-341X.2000.00455.x
Egger, M., Smith, G., Schneider, M., & Minder, C. (1997). Bias in meta-analysis detected by a simple, graphical test. British Medical Journal, 315, 629–634. https://doi.org/10.1136/bmj.315.7109.629
Erdfelder, E., & Heck, D. (2019). P-curve: A word of caution. Zeitschrift für Psychologie, 227, 249–260. https://doi.org/10.1027/a000001
Fidler, F., & Wilcox, J. (2018). Reproducibility of scientific results. Stanford encyclopedia of philosophy. Metaphysics Research Lab, Stanford University. https://plato.stanford.edu/entries/scientific-reproducibility/
Gelman, A. (2018). The p-curve, p-uniform, and Hedges (1984). Methods for meta-analysis under p-hacking: An exchange with Blake McShane, Uri Simosohn, and Marcel van Assen. Stat modeling, causal inference, and social science, 26 February. https://statmodeling.stat.columbia.edu/2018/02/26/p-curve-p-uniform-hedges-1984-methods-meta-analysis-selection-bias-exchange-blake-mcshane-uri-simosohn/
Gelman, A., & Loken, E. (2013). The garden of forking paths: Why multiple comparisons can be a problem, even when there is no “fishing expedition” or “p-hacking” and the research hypothesis was posited ahead of time. Department of Statistics, Columbia University. http://www.stat.columbia.edu/~gelman/research/unpublished/p_hacking.pdf
Hartgerink, C. (2017). Reanalyzing Head et al. (2015): Investigating the robustness of widespread p-hacking. PeerJ Preprints, 5, e3068. https://doi.org/10.7717/peerj.3068
Head, M., Holman, L., Lanfear, R., Kahn, A., & Jennions, M. (2015). The extent and consequences of p-hacking in science. PLOS Biology, 13, e1002106. https://doi.org/10.1371/journal.pbio.1002106
Hedges, L. (1992). Modeling publication selection effects in meta-analysis. Statistical Science, 7, 246–255. https://projecteuclid.org/euclid.ss/1177011364
John, L., Loewenstein, G., & Prelec, D. (2012). Measuring the prevalence of questionable research practices with incentives for truth telling. Psychological Science, 23, 524–532. https://doi.org/10.1177/0956797611430953
Kazerouni, Z., & Rassaei, E. (2016). The effects of L1 and L2 glossing on the retention of L2 vocabulary in intentional and incidental settings. Journal of Studies in Learning and Teaching English, 5, 119–150. http://jslte.iaushiraz.ac.ir/issue_112611_112618.html
Kim, H., Lee, J., & Lee, H. (2020). The relative effects of L1 and L2 glosses on L2 learning: A meta-analysis. Language Teaching Research. Advance view. https://doi.org/10.1177/1362168820981394
Lakens, D. (2014). What p-hacking really looks like: A comment on Masicampo & Lalande (2012). Quarterly Journal of Experimental Psychology A, 68, 829–832. https://doi.org/10.1080/17470218.2014.982664
Lakens, D. (2018). Professors are not elderly: Evaluating the evidential value of two social priming effects through p-curve analyses. Eindhoven University of Technology. https://psyarxiv.com/3m5y9/
Lakens, D. (2021). Sample size justification. PsyArXiv. https://psyarxiv.com/9d3yf/
Lakens, D., Scheel, A., & Isager, P. (2018). Equivalence testing for psychological research. Advances in Methods and Practices in Psychological Science, 1, 259–269. https://doi.org/10.1177/2515245918770963
Light, R., & Pillemer, D. (1984). Summing up: The science of reviewing research. Harvard University Press.
Linck, J., & Cunnings, J. (2015). The utility and application of mixed-effects models in second language research. Language Learning, 65, 185–207. https://doi.org/10.1111/lang.12117
Lindstromberg, S. (2016). Inferential statistics in Language Teaching Research: A review and ways forward. Language Teaching Research, 20, 741–768. https://doi.org/10.1177/1362168816649979
McShane, B., Böckenholt, U., & Hansen, K. (2016). Adjusting for publication bias in meta-analysis: An evaluation of selection methods and some cautionary notes. Perspectives on Psychological Science, 11, 730–749. https://doi.org/10.1177/1745691616662243
Norris, J. (2015). Statistical significance testing in second language research: Basic problems and suggestions for reform. Language Learning, 65, 97–126. https://doi.org/10.1111/lang.12114
Plonsky, L. (2013). Study quality in SLA: An assessment of designs, analyses, and reporting practices in quantitative L2 research. Studies in Second Language Acquisition, 35, 655–687. https://doi.org/10.1017/S0272263113000399
Plonsky, L., & Gass, S. (2011). Quantitative research methods, study quality, and outcomes: The case of interaction research. Language Learning, 61, 325–366. https://doi.org/10.1111/j.1467-9922.2011.00640.x
Plonsky, L., & Oswald, F. (2014). How big is “big”? Interpreting effect sizes in L2 research. Language Learning, 64, 878–912. https://doi.org/10.1111/lang.12079
Plonsky, L., Sudina, E., & Hu, Y. (2021). Applying meta-analysis to research on bilingualism: An introduction. Bilingualism: Language and Cognition. Advance online publication. https://doi.org/10.1017/S1366728920000760
Pollet, T., & van der Meij, L. (2017). To remove or not to remove: The impact of outlier handling on significance testing in testosterone data. Adaptive Human Behavior and Physiology, 3, 43–60. https://doi.org/10.1007/s40750-016-0050-z
Roettger, T. (2019). Researcher degrees of freedom in phonetic research. Laboratory Phonology: Journal of the Association for Laboratory Phonology, 10, 1–27. https://doi.org/10.5334/labphon.147
Rosenthal, R. (1979). The file drawer problem and tolerance for null results. Psychological Bulletin, 86, 638–641. https://doi.org/10.1037/0033-2909.86.3.638
Rothstein, H., Sutton, A., & Borenstein, M., Eds. (2005). Publication bias in meta-analysis: Prevention, assessment and adjustments. Wiley.
Simmons, J., Nelson, L., & Simonsohn, U. (2011). False-positive psychology: Undisclosed flexibility in data collection and analysis allows presenting anything as significant. Psychological Science, 22, 1359–1366. http://doi.org/10.1177/0956797611417632
Simonsohn, U., Nelson, L., & Simmons, J. (2014a). P-curve: A key to the file drawer. Journal of Experimental Psychology: General, 143, 534–547. http://doi.org/10.1037/a0033242
Simonsohn, U., Nelson, L., & Simmons, J. (2014b). P-curve and effect size: Correcting for publication bias using only significant results. Perspectives on Psychological Science, 9, 666–681. https://doi.org/10.1177/1745691614553988
Simonsohn, U., Simmons, J., & Nelson, L. (2015). Better p-curves: Making p-curve analysis more robust to errors, fraud, and ambitious p-hacking, A reply to Ulrich and Miller (2015). Journal of Experimental Psychology: General, 144, 1146–1152. http://doi.org/10.1037/xge0000104
Simonsohn, U., Nelson, L., & Simmons, J. (2017). P-curve app 4.06. (Computer freeware). http://www.p-curve.com/app4/
Simonsohn, U., Nelson, L., & Simmons, J. (2019). P-curve won’t do your laundry, but it will distinguish replicable from non-replicable findings in observational research: Comment on Bruns & Ioannidis (2016). PLoS ONE, 14, e0213454. https://doi.org/10.1371/journal.pone.0213454
van Aert, R. (2021). puniform: Meta-analysis methods correcting for publication bias, Version 0.2.4. (Computer freeware). https://github.com/RobbievanAert/puniform
van Aert, R., & van Assen, M. (2021). Correcting for publication bias in a meta-analysis with the p-uniform* method. Open Science Framework. https://doi.org/10.31222/osf.io/zqjr9
van Assen, M., van Aert, R., & Wicherts, J. (2015). Meta-analysis using effect size distributions of only statistically significant studies. Psychological Methods, 20, 293–309. http://doi.org/10.1037/met0000025
van Aert, R., Wicherts, J., & van Assen, M. (2019). Publication bias examined in meta-analyses from psychology and medicine: A meta-meta-analysis. PLoS ONE, 14, e0215052. https://doi.org/10.1371/journal.pone.0215052
Vitta, J., & Al-Hoorie, A. (2020). The flipped classroom in second language learning: A meta-analysis. Language Teaching Research. Advance view. https://doi.org/10.1177/1362168820981403
Vogel, D., & Homberg, F. (2020). P-hacking, p-curves, and the PSM–performance relationship: Is there evidential value? Public Administration Review, 81, 191–204. http://doi.org/10.1111/puar.13273
Westfall, J. (2016). Five different “Cohen’s d” statistics for within-subject designs. Cookie Scientist: Designing experiments and analyzing data. 25 March. http://jakewestfall.org/blog/index.php/2016/03/25/five-different-cohens-d-statistics-for-within-subject-designs/
Wicherts, J., Veldkamp, C., Augusteijn, H., Bakker, M., van Aert, R., & van Assen, M. (2016). Degrees of freedom in planning, running, analyzing, and reporting psychological studies: A checklist to avoid p-hacking. Frontiers in Psychology, 7, 1832. https://www.frontiersin.org/article/10.3389/fpsyg.2016.01832
Yanagisawa, A., Webb, S., & Uchihara, T. (2020). How do different forms of glossing contribute to L2 vocabulary learning from reading? A meta-regression analysis. Studies in Second Language Acquisition, 42, 411–438. https://doi.org/10.1017/S0272263119000688
Supplementary material: File

Lindstromberg supplementary material
