
The benefits of preregistration for hypothesis-driven bilingualism research

Published online by Cambridge University Press:  29 March 2021

Daniela Mertzen*
Affiliation:
Department of Linguistics, University of Potsdam, Potsdam
Sol Lago
Affiliation:
Institute for Romance Languages and Literatures, Goethe University Frankfurt, Frankfurt
Shravan Vasishth
Affiliation:
Department of Linguistics, University of Potsdam, Potsdam
*Address for correspondence: Daniela Mertzen, Department of Linguistics, University of Potsdam, Campus Golm, Haus 14, Karl-Liebknecht-Straße 24, 14476 Potsdam, Germany. Email: [email protected]

Abstract

Preregistration is an open science practice that requires the specification of research hypotheses and analysis plans before the data are inspected. Here, we discuss the benefits of preregistration for hypothesis-driven, confirmatory bilingualism research. Using examples from psycholinguistics and bilingualism, we illustrate how non-peer reviewed preregistrations can serve to implement a clean distinction between hypothesis testing and data exploration. This distinction helps researchers avoid casting post-hoc hypotheses and analyses as confirmatory ones. We argue that, in keeping with current best practices in the experimental sciences, preregistration, along with sharing data and code, should be an integral part of hypothesis-driven bilingualism research.

Type
Review Article
Creative Commons
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted re-use, distribution, and reproduction in any medium, provided the original work is properly cited.
Copyright
Copyright © The Author(s), 2021. Published by Cambridge University Press

1. Introduction

An important aspect of hypothesis-driven research is preregistration, an open science practice that consists of the specification of research question(s), method(s) and analysis plan(s) before data collection. Preregistration is a relatively simple yet powerful tool for improving transparency in bilingualism research, and we suggest that, in keeping with current best practices in the experimental sciences, bilingualism researchers include preregistration as an essential component of hypothesis-driven research, along with other open science practices such as releasing materials, data and code alongside publications (Chambers, Feredoes, Muthukumaraswamy & Etchells, 2014; Nosek, Ebersole, DeHaven & Mellor, 2018b; Nosek & Lakens, 2014; Nosek, Ebersole, DeHaven & Mellor, 2018a; Open Science Collaboration, 2015).

There are several positions regarding the goals of preregistration. Many researchers view it as a tool specific to confirmatory research because it can help assess the falsifiability of an experimental study's predictions, control for false positive error probability in null hypothesis significance testing (NHST), and mitigate researcher biases (e.g., Lakens, 2019; Chambers, 2019; Nosek, Beck, Campbell, Flake, Hardwicke, Mellor, van 't Veer & Vazire, 2019). Under this view, preregistration helps implement the distinction between confirmatory analyses (used for hypothesis testing) and exploratory analyses (used for hypothesis generation) (e.g., de Groot, 1956/2014; Chambers, 2019; Nosek et al., 2018b; Nosek et al., 2019; Wagenmakers, Wetzels, Borsboom, van der Maas & Kievit, 2012). More recently, preregistration has also been considered for qualitative research with the aim to make documentation of research plans more transparent (Haven & Grootel, 2019). Other research groups acknowledge the contribution of preregistration to scientific transparency, but call into question the validity of the distinction between confirmatory and exploratory research, and the usefulness of preregistration to help implement this distinction (e.g., Devezer, Navarro, Vandekerckhove & Buzbas, 2020; Szollosi et al., 2020; Szollosi & Donkin, 2019, cf. Wagenmakers, 2019). From this point of view, a shift to the development of more explicit theories would make preregistration unnecessary.

In this paper, we take the position that preregistration is crucial to separate confirmatory from exploratory analyses. In our view, the preregistration of confirmatory hypotheses can counter questionable research practices and unconscious biases (Box 1). Consequently, it can enhance research transparency in confirmatory bilingualism (L2) research. Concerns about (non-)transparency and researcher biases are well-known in psychological science (Wicherts, Borsboom, Kats & Molenaar, 2006; Simmons, Nelson & Simonsohn, 2011). L2 research is similarly affected by a lack of clarity about pre-data collection hypotheses and analysis plan choices. This problem is compounded by the fact that L2 studies rarely release their research materials (Derrick, 2016; Marsden, Thompson & Plonsky, 2018c) or their data (Larson-Hall & Plonsky, 2015; Bolibaugh, Vanek & Marsden, 2020).

To address these issues, two journals in the field of bilingualism, Language Learning and Bilingualism: Language and Cognition, have introduced a new type of article, Registered Reports, which allows researchers to submit their hypotheses, methods, and analysis protocols for peer review prior to data collection (Marsden, Morgan-Short, Trofimovich & Ellis, 2018b).

Box 1. Three questionable research practices and biases

• The garden of forking paths

In hypothesis-driven research, there are many possible data analysis paths, and one of several potential paths can be selectively chosen and reported (Gelman & Loken, 2013, 2014). For example, one could choose a particular measure, region of interest or time-window that was not originally selected for analysis, or delete outliers based on an arbitrary criterion. Such multiple analysis paths cumulatively create so many researcher degrees of freedom that one can describe them using a decision tree. This bias is often an unconscious one (Gelman & Loken, 2013, pp. 9-10):

It's not that the researchers performed hundreds of different comparisons and picked ones that were statistically significant. Rather, they start with a somewhat-formed idea in their mind of what comparison to perform, and they refine that idea in light of the data. (...) they are using their scientific common sense to formulate their hypotheses in a reasonable way, given the data they have. The mistake is in thinking that, if the particular path that was chosen yields statistical significance, that this is strong evidence in favor of the hypothesis.

• Multiple testing

For purely statistical reasons, if one conducts enough statistical tests, some test will eventually come out significant. For example, in psycholinguistic eye-tracking reading research, one can easily end up conducting dozens of statistical tests to evaluate a single hypothesis. Simulations in von der Malsburg and Angele (2017) demonstrate that multiple analyses in eye-tracking dramatically inflate Type I error, leading to a large proportion of false positive rejections of the null hypothesis.

• Post-hoc hypothesizing

When data are analyzed without the predictions having been explicitly stated, one may easily convince oneself that an unforeseen result was expected all along, and subsequently report this unexpected finding as a confirmatory one. This bias is commonly referred to as ‘hypothesizing after the results are known’ (HARKing) (Simmons et al., 2011; Kerr, 1998). This can skew the scientific record with less well-grounded theories, cherry-picked after the fact (Chambers, 2019).

Here, we discuss a different approach: non-peer reviewed preregistration using open science platforms such as the Open Science Framework (OSF, https://osf.io/) or AsPredicted (https://aspredicted.org/). On these platforms, researchers have the opportunity to create a public or private, time-stamped, non-modifiable record of a planned study prior to data inspection, either before or during data collection. We argue that non-peer reviewed preregistration can counteract the questionable research practices presented in Box 1. We first illustrate them with an example from our own work on native (L1) sentence processing. Then, we discuss correlates in the L2 literature and explain how non-peer reviewed preregistrations can improve L2 research.

2. Possible pitfalls of hypothesis-driven research: An example from L1 sentence processing

We briefly introduce our study, which attempted to replicate the findings of an eye-tracking reading study that compared the processing of two different syntactic dependencies (Dillon, Mishler, Sloggett & Phillips, 2013; Jäger, Mertzen, Van Dyke & Vasishth, 2020). This example can be easily translated to bilingualism settings where, similar to our example, processing patterns are investigated for different syntactic constructions, but also for different speaker groups, such as native vs. non-native speakers (Felser & Cunnings, 2012; Grüter, Lew-Williams & Fernald, 2012), or successive vs. simultaneous learners (Lemmerth & Hopp, 2019; Sabourin & Vīnerte, 2015).

Our example concerns a phenomenon called agreement attraction. For subject-verb agreement dependencies, previous work has shown that a processing disruption elicited by an ungrammatical plural verb can be weakened if a plural noun (an “attractor”) intervenes between the subject and the verb (as in 1a vs. 1b; Wagers, Lau & Phillips, 2009; Pearlmutter, Garnsey & Bock, 1999; Dillon et al., 2013). Dillon and colleagues used a within-subjects design to examine whether the attraction effect extended to ungrammatical antecedent-reflexive dependencies, where an attractor matched the reflexive in number (1c vs. 1d).

(1)

  a. Subject-verb agreement; attraction
     *The amateur bodybuilder who worked with the personal trainers amazingly were competitive for the gold medal.

  b. Subject-verb agreement; no attraction
     *The amateur bodybuilder who worked with the personal trainer amazingly were competitive for the gold medal.

  c. Reflexive; attraction
     *The amateur bodybuilder who worked with the personal trainers amazingly injured themselves on the lightest weights.

  d. Reflexive; no attraction
     *The amateur bodybuilder who worked with the personal trainer amazingly injured themselves on the lightest weights.

Building on work by Sturt (2003), they argued that, unlike subject-verb agreement configurations, the processing of antecedent-reflexive dependencies should be syntactically constrained (Chomsky, 1981). If so, attraction effects were expected in subject-verb dependencies but not in antecedent-reflexive dependencies, yielding an interaction between dependency type and attraction.

Dillon et al. (2013) analyzed multiple reading measures and observed the predicted interaction only in total reading time. This result was taken as support for the hypothesis that subject-verb agreement and reflexives show different susceptibility to agreement attraction, and thus are differentially constrained by syntactic principles. In our large-sample replication study (Jäger et al., 2020), the goal was to replicate the statistically significant interaction in total reading time from the original study. Our confirmatory analysis of total reading time showed no effect, while the exploratory analyses of first-pass regressions and regression-path durations did (Table 1).

Table 1. Comparison of the findings by Dillon et al. (2013) and Jäger et al. (2020).

The table shows the interaction effect of Dependency type × Attraction, computed using generalized linear mixed models (effects on first-pass regressions were estimated using a logit link function). The interaction effect was expected to have a negative sign. Significant effects at a 0.05 α-level are shown in bold. Note that the published analyses in Jäger et al. (2020) differ from the ones we present here due to different model assumptions made in the present paper for expository purposes.

The study by Dillon and colleagues and our attempted replication serve to illustrate the potential issues of the garden of forking paths, multiple testing and post-hoc hypothesizing. First, even for a confirmatory replication study, where one analyzes the same region and reading measure that showed the interaction in the original study, garden of forking paths scenarios arise if an analysis path is not defined prior to data inspection. For example, different decisions regarding statistical tests and outlier treatment could still be made after data inspection.

Second, for the analyses of the Dillon et al. study and our replication study, six statistical tests were conducted. Testing six eye-tracking measures increases the Type I error probability from 5% to 26.5% (i.e., $1 - 0.95^{6} = 0.265$) (Bonferroni, 1936). It is possible to correct for multiple testing. For example, a Bonferroni correction would require an adjusted Type I error of 0.05/6 for the six statistical tests we conducted, which implies that the absolute critical t-/z-value would be 2.64. If this criterion were used, there would be no significant effects in either the original study or the replication attempt (see observed z/t-values in Table 1). A better solution to the multiple testing problem may be to avoid it altogether by having precise predictions about the dependent measure(s), and focus on (Bayesian) estimation of effects rather than NHST (e.g., Norouzian, 2020; Gelman & Carlin, 2014; Gelman et al., 2014; Kruschke, 2014).
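The arithmetic in this paragraph can be checked with a few lines of Python. This is only a minimal sketch: it assumes that the six tests are independent (eye-tracking measures are typically correlated, so the true inflation may differ) and it uses the standard normal distribution to obtain the two-sided critical value. The function names are our own and are introduced purely for illustration.

```python
from statistics import NormalDist


def familywise_error(alpha: float, n_tests: int) -> float:
    """Probability of at least one false positive across n independent tests."""
    return 1.0 - (1.0 - alpha) ** n_tests


def bonferroni_critical_value(alpha: float, n_tests: int) -> float:
    """Two-sided critical |z| for a Bonferroni-adjusted alpha (normal approximation)."""
    adjusted_alpha = alpha / n_tests
    return NormalDist().inv_cdf(1.0 - adjusted_alpha / 2.0)


alpha, n_tests = 0.05, 6
fwer = familywise_error(alpha, n_tests)
crit = bonferroni_critical_value(alpha, n_tests)

print(f"Familywise Type I error for {n_tests} tests: {fwer:.3f}")
print(f"Bonferroni-adjusted alpha: {alpha / n_tests:.4f}")
print(f"Two-sided critical |z|: {crit:.2f}")
```

Changing `n_tests` shows how quickly the familywise error grows as more measures or regions are tested without correction.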

Third, suppose that the effect that was expected a priori at the critical auxiliary verb or the reflexive had been found further downstream in the sentence or even before the critical region. Without specifying the critical region in advance, one could easily have found a post-hoc theory for the effect showing up in another region and reported this as if it had been predicted all along.

Finally, both the original and the replication study show some evidence of the effect of interest. However, the effect occurs in different measures across the two studies. Because of the exploratory nature of the first-pass regression and regression-path duration results in the replication attempt, we cannot treat these hypothesis tests as confirmatory ones. Exploratory analyses per se are an important part of doing science, but they should be presented as such (e.g., Bishop, 2020; de Groot, 1956/2014; Nosek et al., 2018b).

3. Problematic research practices in L2 research

The issues above can also arise in L2 research. Two common examples of forks in the analysis path are outlier treatment and the selection of interest regions in reading studies. For example, a synthesis of methodological decisions in L2 self-paced reading (SPR) research showed a variety of outlier removal criteria across 64 studies, such as standard deviations around the mean, reading time cutoffs, or both (Marsden et al., 2018c; see Nicklin & Plonsky, 2020, for discussion of outlier treatment). Moreover, L2 reading studies on the same grammatical phenomena can vary substantially in their selection of interest regions. For a subset of the L2 SPR studies on local ambiguity processing synthesized in Marsden et al. (2018c), some studies reported statistical analyses for the ambiguous sentence region, and other studies for some, or all, of the subsequent regions. In addition, the critical regions varied between studies, consisting of a single word or several words combined.
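To make the consequence of such forks concrete, the following simulation sketch compares a single analysis path fixed in advance with the practice of reporting whichever outlier criterion happens to yield significance. Everything in it is an assumption made for illustration: the log-normal reading times, their parameters, the five exclusion rules, and the large-sample z approximation to a two-sample test. It does not reproduce the analysis of any study cited here. Both conditions are drawn from the same distribution, so every significant result is a false positive.

```python
import math
import random
from statistics import NormalDist


def mean_sd(values):
    """Return the sample mean and sample standard deviation."""
    n = len(values)
    m = math.fsum(values) / n
    var = math.fsum((v - m) ** 2 for v in values) / (n - 1)
    return m, math.sqrt(var)


def trim_sd(values, n_sd):
    """Drop observations more than n_sd standard deviations from the mean."""
    m, s = mean_sd(values)
    return [v for v in values if abs(v - m) <= n_sd * s]


def trim_cutoff(values, upper):
    """Drop observations above an absolute reading-time cutoff (ms)."""
    return [v for v in values if v <= upper]


def two_sided_p(a, b):
    """Two-sided p-value from a large-sample z approximation."""
    (ma, sa), (mb, sb) = mean_sd(a), mean_sd(b)
    se = math.sqrt(sa ** 2 / len(a) + sb ** 2 / len(b))
    return 2.0 * (1.0 - NormalDist().cdf(abs((ma - mb) / se)))


# Hypothetical analysis paths; the first one plays the role of the
# preregistered path.
PATHS = [
    lambda v: v,                     # no exclusion
    lambda v: trim_sd(v, 2.0),       # 2 SD around the mean
    lambda v: trim_sd(v, 2.5),       # 2.5 SD around the mean
    lambda v: trim_sd(v, 3.0),       # 3 SD around the mean
    lambda v: trim_cutoff(v, 1000),  # absolute cutoff at 1000 ms
]


def simulate(n_sims=1000, n_per_condition=100, alpha=0.05, seed=1):
    """Return false positive rates for (preregistered path, any path)."""
    rng = random.Random(seed)
    hits_prereg = hits_any = 0
    for _ in range(n_sims):
        # Both conditions share one distribution: the null hypothesis is true.
        a = [rng.lognormvariate(6.0, 0.5) for _ in range(n_per_condition)]
        b = [rng.lognormvariate(6.0, 0.5) for _ in range(n_per_condition)]
        p_values = [two_sided_p(path(a), path(b)) for path in PATHS]
        hits_prereg += p_values[0] < alpha
        hits_any += min(p_values) < alpha
    return hits_prereg / n_sims, hits_any / n_sims


if __name__ == "__main__":
    prereg_rate, any_rate = simulate()
    print(f"False positive rate, preregistered path: {prereg_rate:.3f}")
    print(f"False positive rate, best of {len(PATHS)} paths: {any_rate:.3f}")
```

Because the preregistered path is one of the five candidates, the "best of" rate can never fall below the preregistered rate; how far it rises above it depends on the assumed distribution and exclusion rules.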

A closely related problem to the selective reporting of interest regions is conducting statistical tests for many different regions, and/or eye-tracking measures. Godfroid (2020) reported that an average of 3.4 eye-tracking measures per study are analyzed in the L2 eye-tracking literature, further inflating Type I error probability. The Type I error issue might be particularly prevalent in L2 studies because many of them use frequentist NHST and only report binary decisions about the presence or absence of an effect without also reporting effect estimates (Marsden et al., 2018c). One unfortunate consequence is that other researchers cannot gain knowledge about the magnitude of an effect across studies, or conduct meta-analyses due to the lack of information from previous studies (Plonsky, 2013; Larson-Hall & Plonsky, 2015; Plonsky & Oswald, 2014; Al-Hoorie & Vitta, 2019; for an introduction to meta-analyses in bilingualism research, see Plonsky & Oswald, 2015; Plonsky, Sudina & Hu, 2020).

Finally, as in our example on L1 processing, post-hoc hypothesizing, i.e., changing a hypothesis to match the findings, may reduce the reproducibility of L2 research (Marsden et al., 2018a; Marsden, Morgan-Short, Thompson & Abugaber, 2018b; Chambers, 2019). Possibly partly due to the issues raised above, and low statistical power (Cohen, 1962, 1988; Brysbaert, 2020), inconsistent findings also occur in L2 research. Some examples include the role of crosslinguistic influence in syntactic processing (Dussias, Dietrich & Villegas, 2015; Lago, Mosca & Stutter Garcia, 2020), the existence of a bilingual advantage in attentional systems (Bialystok, 2017; Paap, Anders-Jefferson, Mason, Alvarado & Zimiga, 2018), and the role of morphological decomposition in inflected vs. derived forms during word recognition in native vs. non-native speakers (Clahsen & Veríssimo, 2016; Feldman & Kroll, 2019). Next, we discuss how a non-peer reviewed preregistration can be implemented to improve L2 research.

4. Non-peer reviewed preregistration in psycholinguistic research

For preregistration to counter questionable research practices and biases, it is not sufficient to specify the dependent measure(s) a priori, because many researcher degrees of freedom remain. A complete preregistration requires a full description of the research questions and hypotheses, study design, methods, speaker group selection criteria, data collection procedure, participant sample size or stopping rule, outcome variable(s), as well as an analysis plan including statistical models, information on data exclusion and statistical inference criteria. This not only ensures greater transparency but can also keep one's biases in check, because analysis decisions are made public prior to data analysis, preventing selective reporting of effects. For example, assume that for a planned study we preregister no outlier exclusions, but later find an effect only when removing certain data points. This could be reported as an exploratory finding. Without preregistration, it may be tempting to report the most ‘interesting’ result as confirmatory, preventing other researchers from evaluating the findings in light of the analysis choices. In addition, if our published preregistration committed to a predicted effect for a particular region and measure, based on theory or previous findings, we can no longer convince ourselves that a surprising result was originally predicted and restate the hypotheses post-hoc.
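One way to keep track of these components while drafting is a simple checklist structure. The sketch below is a hypothetical illustration, not a template from OSF or AsPredicted: the class name, field names and example entries are our own, and the fields merely mirror the components listed above.

```python
from dataclasses import dataclass, fields


@dataclass
class PreregistrationDraft:
    """Hypothetical checklist; each field holds free text for one component."""
    research_questions: str = ""
    hypotheses: str = ""
    study_design: str = ""
    methods: str = ""
    speaker_group_criteria: str = ""
    data_collection_procedure: str = ""
    sample_size_or_stopping_rule: str = ""
    outcome_variables: str = ""
    statistical_models: str = ""
    data_exclusion_rules: str = ""
    inference_criteria: str = ""

    def missing_components(self) -> list[str]:
        """Names of the components that are still empty."""
        return [f.name for f in fields(self) if not getattr(self, f.name).strip()]


# Illustrative entries only; a real preregistration would be far more detailed.
draft = PreregistrationDraft(
    hypotheses="Dependency type x Attraction interaction in total reading time",
    outcome_variables="Total reading time at the critical region",
    data_exclusion_rules="No outlier exclusions",
)

print("Still to be specified:")
for name in draft.missing_components():
    print(f"  - {name}")
```

A draft would be ready to freeze only once `missing_components()` returns an empty list, although a filled-in field says nothing about whether its content is precise enough to rule out alternative analysis paths.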

One may argue that if one has strong theoretical predictions, preregistration is redundant because the analysis choices are predetermined by the theory. However, Silberzahn et al. (2018) convincingly illustrated that different analysis choices can be made even under highly constraining conditions. Their study recruited 29 research groups in the psychological sciences to answer the same research question for one particular dataset. Of the 29 groups, 20 observed a significant and nine a non-significant result. Strikingly, the range of effect estimates reported by the different research groups allowed for different conclusions.

Although we take the view that preregistration without peer review can be an effective way to reduce unconscious biases in one's work, the lack of peer review means that the preregistration of a study can be as thorough or as vague as the researcher deems appropriate. Vaguely specified research plans still allow for many possible analysis paths, and selective reporting of effects. Consequently, it is up to the scientific community to make non-peer reviewed preregistration a success or a failure: only a thoroughly implemented preregistration and a precisely followed research plan can reduce unconscious biases and help to separate confirmatory hypothesis tests from exploratory ones.

4.1 Selecting dependent measures for a preregistration

If one wants to preregister a study, but lacks prior knowledge of a particular phenomenon, an experiment could be piloted and exploratory analyses conducted to identify which measure(s) show the predicted effect. One could then generate hypotheses from this and test them in a confirmatory study (e.g., Nicenboim, Vasishth, Engelmann & Suckow, 2018; Nicenboim, Vasishth & Rösler, 2020). If, on the other hand, there are previous findings on a phenomenon, these could serve as the basis for a preregistration. However, when the literature shows equivocal results as discussed above, what steps could be taken to consolidate the support in favor of or against a theory? This is not straightforward. For example, in the Dillon et al. (2013) study and our replication study, the effect of interest was observed in different reading measures. If, based on linguistic theory, we believe that the effect of interest should be found in earlier reading measures (first-pass regression and regression-path duration as in our replication study), the only way to test this is by conducting a replication study. This replication should aim for a sufficiently large participant sample and a sufficiently precise effect estimate, and specify the dependent measure(s) and critical region(s) in advance. Otherwise, in a future study we may find some other dependent measure showing the effect, which may again tempt us to draw a bullseye around the arrow that happened to land where it did.

4.2 How to get started with a non-peer reviewed preregistration

Preregistration templates are available on OSF and AsPredicted for novel studies as well as for replication studies (e.g., https://bit.ly/OSFtemplates; https://bit.ly/AsPredtemplate). If one prefers to create a Registered Report-type preregistration (i.e., in manuscript format), it is possible to upload a preregistration manuscript on OSF. It is not enough to upload this document to the project's public repository, because the preregistration could be removed or replaced at any point. Rather, one needs to create a time-stamped, non-editable version, which can either be made public immediately or be embargoed until, for example, the associated paper is submitted or published. If the preregistration is withdrawn at any stage after creating a “frozen” version of it, some metadata (title, authors, description, reason for withdrawing preregistration) will remain publicly available. A new version of the preregistration can be made available before the data are inspected. We have previously made attempts at such manuscript-style preregistrations, e.g., for Vasishth, Mertzen, Jäger and Gelman (2018) (see https://osf.io/dgewb for the non-editable preregistration).

5. Conclusion

We have used examples from L1 sentence processing and the L2 literature to illustrate some of the problems that can arise during the research process. We then discussed how preregistration allows researchers to better separate confirmatory and exploratory analyses, which can help them counter questionable research practices and unconscious biases. Our view is that, if done thoroughly, non-peer reviewed preregistration would greatly benefit the bilingualism community. We suggest that the hypothesis-driven L2 research process should standardly include preregistration, in addition to the release of materials, data and code upon publication to increase research transparency and reproducibility.

Acknowledgements

We thank João Veríssimo, Laura de Ruiter, Cylcia Bolibaugh, and Luke Plonsky for their valuable feedback on the earlier version of this paper. This research was funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) – project number 317633480 – SFB 1287, Projects B03 and Q (PIs: Shravan Vasishth and Ralf Engbert).

Competing interests

The authors declare none.

Supplementary materials

For data and code accompanying this paper, visit https://osf.io/5ab7d/.

References

Bialystok, E (2017) The bilingual adaptation: How minds accommodate experience. Psychological Bulletin 143(3), 233262. doi: 10.1037/bul0000099CrossRefGoogle ScholarPubMed
Bishop, DV (2020) The psychology of experimental psychologists: Overcoming cognitive constraints to improve research: The 47th Sir Frederic Bartlett Lecture. Quarterly Journal of Experimental Psychology 73(1), 119. doi: 10.1177/1747021819886519CrossRefGoogle ScholarPubMed
Bolibaugh, C, Vanek, N and Marsden, E (2020) Towards a credibility revolution in bilingualism research: Open data and materials as stepping stones. Submitted for publication.CrossRefGoogle Scholar
Bonferroni, CE (1936) Teoria statistica delle classi e calcolo delle probabilità. Pubblicazioni del R Istituto Superiore di Scienze Economiche e Commerciali di Firenze, 8, 362.Google Scholar
Brysbaert, M (2020) Power considerations in bilingualism research: Time to step up our game. Bilingualism: Language and Cognition 1–6. doi: 10.1017/S1366728920000437Google Scholar
Chambers, CD (2019) The seven deadly sins of psychology: A manifesto for reforming the culture of scientific practice. Princeton, NJ: Princeton University Press.Google Scholar
Chambers, CD, Feredoes, E, Muthukumaraswamy, SD and Etchells, PJ (2014) Instead of “playing the game” it is time to change the rules: Registered reports at AIMS Neuroscience and beyond. AIMS Neuroscience 1, 417. doi: 10.3934/Neuroscience2014.1.4CrossRefGoogle Scholar
Chomsky, N (1981) Lectures on government and binding. Dordrecht: Foris.Google Scholar
Clahsen, H and Veríssimo, J (2016) Investigating grammatical processing in bilinguals: The case of morphological priming. Linguistic Approaches to Bilingualism 6(5), 685698. doi:https://doi.org/10.1075/lab.15039.claCrossRefGoogle Scholar
Cohen, J (1962) The statistical power of abnormal-social psychological research: A review. The Journal of Abnormal and Social Psychology 65(3), 145.CrossRefGoogle ScholarPubMed
Cohen, J (1988) Statistical power analysis for the behavioral sciences (2nd ed.). Hillsdale, NJ: Lawrence Erlbaum.Google Scholar
de Groot, A (1956/2014) The meaning of “significance” for different types of research [translated and annotated by Eric-Jan Wagenmakers, Denny Borsboom, Josine Verhagen, Rogier Kievit, Marjan Bakker, Angelique Cramer, Dora Matzke, Don Mellenbergh, & Han L. J. van der Maas]. Acta Psychologica 148, 188194. doi: https://doi.org/10.1016/j.actpsy.2014.02.001CrossRefGoogle Scholar
Derrick, DJ (2016) Instrument reporting practices in second language research. TESOL Quarterly 50(1), 132153. doi: 10.1002/tesq.217CrossRefGoogle Scholar
Devezer, B, Navarro, DJ, Vandekerckhove, J and Buzbas, EO (2020) The case for formal methodology in scientific reform. bioRxiv. doi: 10.1101/2020.04.26.048306Google Scholar
Dillon, B, Mishler, A, Sloggett, S and Phillips, C (2013) Contrasting intrusion profiles for agreement and anaphora: Experimental and modeling evidence. Journal of Memory and Language 69(2), 85103. doi: https://doi.org/10.1016/j.jml.2013.04.003CrossRefGoogle Scholar
Dussias, P, Dietrich, AJ and Villegas, Á (2015) Cross-language interactions during bilingual sentence processing. In Schwieter, JW (Ed.), The Cambridge Handbook of Bilingual Processing (pp. 349366). Cambridge Handbooks in Language and Linguistics. Cambridge University Press.CrossRefGoogle Scholar
Feldman, L and Kroll, J (2019) Learning and Using Morphology and Morphosyntax in a Second Language. Oxford Research Encyclopedia of Linguistics. Retrieved from https://oxfordre.com/linguistics/view/10.1093/acrefore/9780199384655.001.0001/acrefore-9780199384655-e-604. doi: https://doi.org/10.1093/acrefore/9780199384655.013.604CrossRefGoogle Scholar
Felser, C and Cunnings, I (2012) Processing reflexives in a second language: The timing of structural and discourse-level constraints. Applied Psycholinguistics 33(3), 571603. doi: 10.1017/S0142716411000488CrossRefGoogle Scholar
Gelman, A and Carlin, JB (2014) Beyond power calculations: Assessing Type S (sign) and Type M (magnitude) errors. Perspectives on Psychological Science 9(6), 641651.CrossRefGoogle ScholarPubMed
Gelman, A, Carlin, JB, Stern, HS, Dunson, DB, Vehtari, A and Rubin, DB (2014) Bayesian data analysis (Third). Boca Raton, FL: Chapman and Hall/CRC.Google Scholar
Gelman, A and Loken, E (2013) The garden of forking paths: Why multiple comparisons can be a problem, even when there is no ‘fishing expedition’ or ‘p-hacking’ and the research hypothesis was posited ahead of time. Retrieved from http://www.stat.columbia.edu/~gelman/research/unpublished/p_hacking.pdf
Gelman, A and Loken, E (2014) The statistical crisis in science. American Scientist 102, 460. doi: 10.1511/2014.111.460
Godfroid, A (2020) Eye Tracking in Second Language Acquisition and Bilingualism: A Research Synthesis and Methodological Guide. New York, NY: Routledge. doi: 10.4324/9781315775616
Grüter, T, Lew-Williams, C and Fernald, A (2012) Grammatical gender in L2: A production or a real-time processing problem? Second Language Research 28(2), 191–215. doi: 10.1177/0267658312437990
Haven, TL and Grootel, DLV (2019) Preregistering qualitative research. Accountability in Research 26(3), 229–244. doi: 10.1080/08989621.2019.1580147
Al-Hoorie, AH and Vitta, JP (2019) The seven sins of L2 research: A review of 30 journals’ statistical quality and their CiteScore, SJR, SNIP, JCR Impact Factors. Language Teaching Research 23(6), 727–744. doi: 10.1177/1362168818767191
Jäger, LA, Mertzen, D, Van Dyke, JA and Vasishth, S (2020) Interference patterns in subject-verb agreement and reflexives revisited: A large-sample study. Journal of Memory and Language 111. doi: 10.1016/j.jml.2019.104063
Kerr, NL (1998) HARKing: Hypothesizing After the Results are Known. Personality and Social Psychology Review 2(3), 196–217. doi: 10.1207/s15327957pspr0203_4
Kruschke, J (2014) Doing Bayesian data analysis: A tutorial with R, JAGS, and Stan. London: Academic Press.
Lago, S, Mosca, M and Stutter Garcia, A (2020) The role of crosslinguistic influence in multilingual processing: Lexicon versus syntax. Language Learning. doi: 10.1111/lang.12412
Lakens, D (2019) The value of preregistration for psychological science: A conceptual analysis. PsyArXiv. doi: 10.31234/osf.io/jbh4w
Larson-Hall, J and Plonsky, L (2015) Reporting and interpreting quantitative research findings: What gets reported and recommendations for the field. Language Learning 65(S1), 127–159. doi: 10.1111/lang.12115
Lemmerth, N and Hopp, H (2019) Gender processing in simultaneous and successive bilingual children: Cross-linguistic lexical and syntactic influences. Language Acquisition 26(1), 21–45. doi: 10.1080/10489223.2017.1391815
Marsden, E, Morgan-Short, K, Thompson, S and Abugaber, D (2018a) Replication in Second Language Research: Narrative and systematic reviews and recommendations for the field. Language Learning 68(2), 321–391. doi: 10.1111/lang.12286
Marsden, E, Morgan-Short, K, Trofimovich, P and Ellis, NC (2018b) Introducing Registered Reports at Language Learning: Promoting transparency, replication, and a synthetic ethic in the language sciences. Language Learning 68(2), 309–320. doi: 10.1111/lang.12284
Marsden, E, Thompson, S and Plonsky, L (2018c) A methodological synthesis of self-paced reading in second language research. Applied Psycholinguistics 39(5), 861–904. doi: 10.1017/S0142716418000036
Nicenboim, B, Vasishth, S, Engelmann, F and Suckow, K (2018) Exploratory and confirmatory analyses in sentence processing: A case study of number interference in German. Cognitive Science 42. doi: 10.1111/cogs.12589
Nicenboim, B, Vasishth, S and Rösler, F (2020) Are words pre-activated probabilistically during sentence comprehension? Evidence from new data and a Bayesian random-effects meta-analysis using publicly available data. Neuropsychologia 142, 107427. doi: 10.1016/j.neuropsychologia.2020.107427
Nicklin, C and Plonsky, L (2020) Outliers in L2 Research in Applied Linguistics: A Synthesis and Data Re-Analysis. Annual Review of Applied Linguistics 40, 26–55. doi: 10.1017/S0267190520000057
Norouzian, R (2020) Sample size planning in quantitative L2 research: A pragmatic approach. Studies in Second Language Acquisition 1–22. doi: 10.1017/S0272263120000017
Nosek, BA, Beck, ED, Campbell, L, Flake, JK, Hardwicke, TE, Mellor, DT, van 't Veer, AE and Vazire, S (2019) Preregistration is hard, and worthwhile. Trends in Cognitive Sciences 23(10), 815–818. doi: 10.1016/j.tics.2019.07.009
Nosek, BA, Ebersole, CR, DeHaven, AC and Mellor, DT (2018a) Reply to Ledgerwood: Predictions without analysis plans are inert. Proceedings of the National Academy of Sciences 115(45), E10518–E10518. doi: 10.1073/pnas.1816418115
Nosek, BA, Ebersole, CR, DeHaven, AC and Mellor, DT (2018b) The preregistration revolution. Proceedings of the National Academy of Sciences 115(11), 2600–2606. doi: 10.1073/pnas.1708274114
Nosek, BA and Lakens, D (2014) Registered reports: A method to increase the credibility of published results. Social Psychology 45, 137–141. doi: 10.1027/1864-9335/a000192
Open Science Collaboration. (2015) Estimating the reproducibility of psychological science. Science 349(6251), aac4716.
Paap, KR, Anders-Jefferson, R, Mason, L, Alvarado, K and Zimiga, B (2018) Bilingual advantages in inhibition or selective attention: More challenges. Frontiers in Psychology 9, 1409. doi: 10.3389/fpsyg.2018.01409
Pearlmutter, NJ, Garnsey, SM and Bock, K (1999) Agreement processes in sentence comprehension. Journal of Memory and Language 41(3), 427–456. doi: 10.1006/jmla.1999.2653
Plonsky, L (2013) Study quality in SLA: An Assessment of Designs, Analyses, and Reporting Practices in Quantitative L2 Research. Studies in Second Language Acquisition 35(4), 655–687. doi: 10.1017/S0272263113000399
Plonsky, L and Oswald, FL (2014) How Big Is ‘Big’? Interpreting Effect Sizes in L2 Research. Language Learning 64(4), 878–912. doi: 10.1111/lang.12079
Plonsky, L and Oswald, F (2015) Meta-analyzing second language research. In Plonsky, L (Ed.), Advancing quantitative methods in second language research (pp. 106–128). New York, NY: Routledge.
Plonsky, L, Sudina, E and Hu, Y (2020) Applying meta-analysis to research on bilingualism: An introduction. In press.
Sabourin, L and Vīnerte, S (2015) The bilingual advantage in the Stroop task: Simultaneous vs. early bilinguals. Bilingualism: Language and Cognition 18(2), 350–355. doi: 10.1017/S1366728914000704
Silberzahn, R, Uhlmann, EL, Martin, DP, Anselmi, P, Aust, F, Awtrey, E, … Nosek, BA (2018) Many analysts, one data set: Making transparent how variations in analytic choices affect results. Advances in Methods and Practices in Psychological Science 1(3), 337–356. doi: 10.1177/2515245917747646
Simmons, J, Nelson, L and Simonsohn, U (2011) False-positive psychology. Psychological Science 22(11), 1359–1366. doi: 10.1177/0956797611417632
Sturt, P (2003) The time-course of the application of binding constraints in reference resolution. Journal of Memory and Language 48, 542–562.
Szollosi, A and Donkin, C (2019) Arrested theory development: The misguided distinction between exploratory and confirmatory research. doi: 10.31234/osf.io/suzej
Szollosi, A, Kellen, D, Navarro, DJ, Shiffrin, R, van Rooij, I, Van Zandt, T and Donkin, C (2020) Is preregistration worthwhile? Trends in Cognitive Sciences 24(2), 94–95. doi: 10.1016/j.tics.2019.11.009
Vasishth, S, Mertzen, D, Jäger, LA and Gelman, A (2018) The statistical significance filter leads to overoptimistic expectations of replicability. Journal of Memory and Language 103, 151–175. doi: 10.1016/j.jml.2018.07.004
von der Malsburg, T and Angele, B (2017) False positives and other statistical errors in standard analyses of eye movements in reading. Journal of Memory and Language 94, 119–133. doi: 10.1016/j.jml.2016.10.003
Wagenmakers, E-J (2019) A breakdown of “preregistration is redundant, at best”. https://www.bayesianspectacles.org/a-breakdown-of-preregistration-is-redundant-at-best/. Accessed: 2020-08-15.
Wagenmakers, E-J, Wetzels, R, Borsboom, D, van der Maas, HLJ and Kievit, RA (2012) An agenda for purely confirmatory research. Perspectives on Psychological Science 7(6), 632–638. doi: 10.1177/1745691612463078
Wagers, MW, Lau, EF and Phillips, C (2009) Agreement attraction in comprehension: Representations and processes. Journal of Memory and Language 61(2), 206–237. doi: 10.1016/j.jml.2009.04.002
Wicherts, J, Borsboom, D, Kats, J and Molenaar, D (2006) The poor availability of psychological research data for reanalysis. The American Psychologist 61, 726–728. doi: 10.1037/0003-066X.61.7.726

Table 1. Comparison of the findings by Dillon et al. (2013) and Jäger et al. (2020). The table shows the interaction effect of Dependency type × Attraction, computed using generalized linear mixed models (effects on first-pass regressions were estimated using a logit link function). The interaction effect was expected to have a negative sign. Significant effects at a 0.05 α-level are shown in bold. Note that the published analyses in Jäger et al. (2020) differ from the ones we present here due to different model assumptions made in the present paper for expository purposes.