Introduction
Research in the language sciences is generally motivated by a desire to understand the architecture of the language system, language learning, and language use. As language scientists, we tend to strive for knowledge that has universal value, knowledge that is true beyond the particular group of language users we observe in our studies. An important lesson in behavioral research methodology is that such generalization rests on the premise that samples are randomly drawn from the population under study. If the selection of participants is somehow biased, then the reliability of researchers’ statements about the behavior under investigation is compromised (Henrich, Heine, & Norenzayan, 2010). In practice, however, participant selection is almost always biased. For the field of psychology, Arnett (2008) has shown that samples are drawn primarily from WEIRD societies—Western, Educated, Industrialized, Rich, and Democratic groups (Henrich et al., 2010). Arnett estimated that 80% of the samples presented in six major psychology journals consisted of university students drawn from academic psychology programs. He also clearly demonstrated that sampling is not only biased in terms of educational background: 90% of the participants represented in the journals surveyed came from North American or European populations. Current psychological insight, then, is based on a very specific, highly educated group drawn from WEIRD societies.
In this position paper, we explore to what extent our own field of Applied Linguistics suffers from biased sampling and what the consequences of such skewed sampling may be. We do this in full recognition that we are not alone in raising concerns about the demographics of the samples used in our field (e.g., Bigelow & Tarone, 2004; Cox, 2019; Kormos & Sáfár, 2008; Mackey & Sachs, 2012; Ortega, 2005, 2019; Pot, Keijzer, & de Bot, 2018; Tarone & Bigelow, 2005), that we cannot do justice to all these voices within the scope of this paper, and that we ourselves are part of the problem in that we rely on academic samples as much as anyone else. We do think it is important to keep this conversation on the front burner in Applied Linguistics, and we intend to do so in this paper by sketching just how skewed sampling currently is and how problematic this can be.
Sampling Practices in Applied Linguistics
There is no reason to assume that the field of Applied Linguistics is doing any better in terms of population representation than its neighboring disciplines. There are many indications (but little hard evidence so far) that certain groups are underrepresented. For example, in a plea for attention to illiteracy in second-language (L2) learning published in TESOL Quarterly, Bigelow and Tarone (2004) pointed out that there had been no publications on illiterate L2 learners in TESOL Quarterly in the previous ten years, even though millions of people in the US and hundreds of millions of people worldwide are illiterate (currently estimated at 750 million by the UNESCO Institute for Statistics [2019]). Sixteen years later, this situation remains largely unchanged (but see Bigelow, Delmas, Hansen, & Tarone, 2006). Conversely, there is also evidence that some groups are overrepresented: Plonsky (2016) estimated that 67% of research samples in L2 research are university student samples. To date, however, no published research has systematically investigated sampling biases in Applied Linguistics research.
One way to gauge sampling practices in the field is to examine published meta-analyses. The field has witnessed a plethora of meta-analyses, which collectively cover a large number of published studies on a wide array of topics. In some ways, meta-analyses are the hallmark of researchers’ wish to generalize: meta-analysts typically synthesize a collection of studies that, by the nature of their designs, all aim for generalization, and in doing so they aim to present the state of the art in a particular area of inquiry. Meta-analysis also affords an opportunity to evaluate the sampling practices in a given domain, given that many meta-analysts summarize and present the demographic characteristics of the participant samples included in the primary studies. Table 1 lists sample information provided by meta-analyses published in the last five years that coded participants’ backgrounds in sufficient detail to allow insight into the field's sampling practices. As can be seen, this yielded a selection of 17 meta-analyses collectively covering 863 studies and a fair (but obviously far from exhaustive) range of topics that applied linguists are concerned with. The table shows that 65% of all samples were university student samples. Samples other than university students tended to consist of elementary or high school pupils; these samples of young learners accounted for 25% of the research database. There is thus a sizeable amount of research with young learners, but it still constitutes a heavy underrepresentation in light of the fact that L2 learning takes place in schools much more frequently than at universities (Kormos & Sáfár, 2008).
1 Notes: Elem. school = elementary school; Lang. ins. = language institute; Miss. = missing
2 To be included in the table, meta-analyses needed to be published between 2014 and 2019, to code for educational level, and not to exclude studies on the basis of educational level. Where meta-analyses covered the same topic, only the most recent one was included to avoid overlap. Journal selection was based on Al-Hoorie and Vitta (2018); journals were SSCI indexed and international in scope. They were: Applied Linguistics, Applied Psycholinguistics, Bilingualism: Language & Cognition, Computer Assisted Language Learning, ELT Journal, English for Specific Purposes, Foreign Language Annals, International Review of Applied Linguistics in Language Teaching, Language Assessment Quarterly, Language Learning, Language Learning & Technology, Language Teaching Research, Language Testing, The Modern Language Journal, ReCALL, Second Language Research, Studies in Second Language Acquisition, System, and TESOL Quarterly.
3 The number of articles and the total number of samples do not correspond exactly: articles included in a meta-analysis often provide more than one sample, but some articles do not provide information about educational level.
4 This number represents elementary and high school samples together.
If we zoom in on the adult groups, the picture is particularly bleak. A staggering 88% of all adult samples were university student samples. The remaining 12% were coded as “language institute” or “other,” and, while it is not exactly clear what kinds of learners fall into these categories, we may conclude that we know comparatively little about adult language learning that takes place outside the walls of academia.
It is important to acknowledge that an investigation of participant samples on the basis of meta-analyses has limitations. For one thing, descriptive and qualitative work remains under the radar, but such work often foregrounds the contextual and situated nature of language learning and use over its generalizability. Also, particular topics foreground particular populations: reading research is often conducted with younger learners, and study-abroad research with university students. Furthermore, in meta-analyses, a large number of studies are typically dropped because of unsuitable study designs and insufficient detail in reporting. We cannot rule out (but do not assume either) that studies with nonuniversity learners present greater challenges in study design and are, therefore, excluded more often from meta-analysis. If that were the case, research with nonuniversity learners would be underrepresented in the meta-analytic literature, and the actual proportion of research samples with nonuniversity learners might be higher than our estimate suggests.
Zhang’s (2019) recently published bibliometric analysis of second language acquisition (SLA) research provides yet another view of the field's sampling practices: it shows clearly that the field is dominated heavily by Western scholarship. Using the 16 most influential Applied Linguistics journals as determined by Al-Hoorie and Vitta (2018), Zhang analyzed the origins of the publications in these journals. The analysis showed that the field has been heavily dominated by journals published in the U.S., publications produced at prestigious North American research institutes, and scholars working in Anglo-Saxon environments (although Japanese scholarship has been strong, and Chinese scholarship is on the rise). It is possible that some of the research produced at Anglo-Saxon universities was conducted elsewhere, such that there may not always be a one-to-one relationship between the university location and the study location. Overall, however, the picture that emerges is that insight from Applied Linguistics research is strongly shaped by information gathered from university students, who tend to be enrolled at prestigious research institutes in North America. The inescapable conclusion is that our participants are truly WEIRD.
It is interesting to consider briefly why sampling biases exist. Notions of convenience and accessibility are easily invoked as an explanation, but they are probably too simplistic (Ortega, 2005). Of course, many researchers are likely to resort to student samples because they are easily available: many programs, for example, maintain a database of potential research participants and sometimes have a credit system in place for student participation in research. But the reasons underlying sampling bias probably run deeper. One important reason could be a lack of engagement with language-learning contexts outside the walls of the university (Ortega, 2005). Many applied linguists teach languages at their own research institutes, and it is natural that their classroom observations generate hypotheses that inspire their research. Moreover, the tendency to look for participants within the walls of academia may be exacerbated by the “publish or perish” culture that governs much of the production of knowledge in academia (Chambers, 2019). Making it in academia tends to depend on the number of publications one produces and the perceived quality of the journals in which they appear, which makes it attractive to go for the more easily accessible student populations.
What Are the Consequences for Generalizability?
The ways in which existing sampling biases may have affected the state of our knowledge must remain a matter of speculation until we systematically gather knowledge from a wider array of language-learning contexts. As Henrich et al. (2010) have demonstrated for several areas within psychology, there is every reason to be concerned. One consequence of sampling bias could be that behavior attested in particular groups is simply absent in other groups. This is illustrated by Tarone and Bigelow (2005), who argued that the influential noticing hypothesis (Schmidt, 1990, 1994) may need reconsideration in light of evidence showing that illiterate learners typically do not exhibit noticing where literate learners do. Another consequence is that we may underestimate the range of variation in language learners and language-learning outcomes. The Applied Linguistics field is replete with studies that try to establish links between language learning or use and particular individual characteristics. Such endeavors are likely to be compromised by range restriction if observations are sampled from just a narrow band of the population distribution. How conclusive are findings on the role of working memory or aptitude in foreign language learning if they are based exclusively on university students? Correlations are hard to detect in the absence of variation and may well be deflated, or disappear altogether, if one zooms in on a restricted range of variation. For this reason, at least one psychology lab at a northeastern U.S. elite institution no longer recruits study participants among its university students, although such a measure tends to be the exception rather than the rule.
Very few studies have directly tested how the nature of the sample affects the behavior under scrutiny. One exception is Andringa (2014), who investigated how the characteristics of native-speaker comparison groups affect decisions about the incidence of native-like performance by second language learners. Second language learners were compared against a nonrepresentative (academic-only) native-speaker norm group and a representative norm group, which approximately followed the education demographics of the society from which the native speakers were drawn, including participants with an academic background. Andringa showed that L1 proficiency ranges were consistently larger in the representative norm group, suggesting that it is both unrealistic and unrepresentative to hold L2 speakers’ language abilities up against academically trained L1 speakers only.
In reading research, the heavy reliance on participant samples learning to read in English (an extreme outlier in terms of the inconsistency of its sound-spelling links) illustrates how sampling biases can negatively affect the state of our knowledge. Share (2008) offered a fascinating and somewhat disturbing discussion of the consequences, which include, among others, an undue focus on accuracy in learning to read and in defining and diagnosing dyslexia, and the adoption of pedagogies that are misplaced for learning to read in languages with more regular orthographies. In Share’s words, the overwhelming presence of English in reading research has led to “distorted theorizing with regard to many issues—including phonological awareness, early reading instruction, the architecture of stage models of reading development, the definition and remediation of reading disability, and the role of lexical-semantic and supralexical information in word recognition” (p. 584).
As Ortega (2005) has poignantly argued, our sampling choices not only create problems for generalizability but also pose ethical dilemmas. Our science may not provide answers to the questions that matter for the vast majority of language learners, even within WEIRD societies. The real challenges we face as a discipline may lie elsewhere: in how to sustain linguistic diversity, in how to sustain multilingual development, in dealing with the language-learning needs of refugees and migrants, who may have had limited or interrupted formal education, in the relationship between (second) language skills and healthy aging, and in the dropping numbers of students learning languages other than English (e.g., Cox, 2019; King, Schilling, Fogle, Lou, & Soukup, 2008; Ortega, 2019; Pot et al., 2018; Reynolds, 2019). Do our research and the models we generate represent the groups that face these challenges, and do they speak to their experiences? Based on the evidence presented above, we cannot be confident that they do. Ultimately, for Applied Linguistics research to be considered relevant by stakeholders, researchers need to demonstrate and make a case for the impact of their work beyond the walls of the academy, in a society that faces many real linguistic needs and questions.
Making Applied Linguistics More Open and Less WEIRD: Some Suggestions
Changing current sampling practices will not be easy. It will require applied linguists to broaden the scope of their research. For this to happen, they may need to step outside their comfort zones, take note of language-learning contexts outside the university, collaborate with scholars working in different settings, and actively engage in dialogue with stakeholders such as teachers, policymakers, and funding agencies. Current publication practices, and the emphasis on publication metrics in evaluating researcher quality, may also need to change.
Some of these changes are already underway. The OASIS project (Open Accessible Summaries In Language Studies) is one cross-journal initiative that may help foster interaction between researchers and stakeholders (“OASIS,” n.d.), and it has already been embraced by a number of journals in our field (e.g., Marsden, Trofimovich, & Ellis, 2019). Changes are also visible in the fact that many funding agencies now consider the broader or societal impacts of research grant applications a required and integral part of the grant evaluation process (e.g., NSF, n.d.; NWO, n.d.). And in Europe, the Plan S initiative, which mandates that publicly funded research from 12 European countries be published in open-access journals by 2021, aims not only to make science freely available to everyone but also to change the way scholarly achievement is evaluated, by putting in place a system that acknowledges that an academic career is also shaped by a scholar’s ability to engage with society and to teach (Science Europe, 2018). All these initiatives are meant to break down walls between the academy and the wider society, so that researchers may expand their research, relevance, and reach.
And of course, we should not take the current state of our knowledge for granted, but work directly to diversify participant samples. One powerful way to do this is to replicate existing research with different samples and evaluate to what extent the same conclusions still hold. This is the essence of the SLA for All? project (Andringa & Godfroid, 2018), in which we invited scholars to replicate influential studies with different, nonacademic samples. The goals of the project are to (1) raise awareness of the existing sampling biases in Applied Linguistics and the problems associated with them; (2) get a sense of the extent to which the current state of knowledge in Applied Linguistics is indeed distorted by these sampling practices; (3) gain experience doing research in different contexts and with learners who currently often remain invisible in research; and (4) learn to what extent the tools applied linguists have developed are adequate for doing so, as they may well not be (Ortega, 2019). We hope that this initiative will promote further efforts to make Applied Linguistics less WEIRD and, with that, a bit more fair and inclusive.