Introduction
Most neurocognitive tests have been developed in North America and normed on native English speakers (NSEs). Normative systems typically focus on age, education, gender, or race (Abeare et al., 2019; Heaton et al., 2004, 2009) and tend to ignore variability in language proficiency (Gasquoine et al., 2007; Gasquoine, 1999). Limited English proficiency (LEP) refers to a continuum of deficits in phonology (systematic phoneme substitutions characteristic of foreign accents), lexicon (limited vocabulary and speed of retrieval), and syntax (deviation from grammatical rules) attributable to late-life language acquisition (i.e., outside the sensitive period) in the context of normal verbal skills in the individual’s mother tongue. In other words, LEP is a learned deficit reflecting a delay in exposure to English.
Recent research demonstrated that LEP can be a significant confound in test result interpretation even in cognitively high-functioning examinees (Ali, Brantuo, et al., 2022; Erdodi et al., 2017a). Consequently, existing norms may not apply to individuals with LEP (Celik et al., 2020; Funes et al., 2016; Gasquoine & Gonzales, 2012), as they systematically underestimate verbal skills in general – while perhaps providing an accurate measure of English proficiency. As the world grows more diverse due to migration and the percentage of bilinguals increases both in Europe and the USA (Eurostat, 2018; Ryan, 2013), so do the chances of encountering patients with LEP in clinical settings. Therefore, understanding the impact of LEP on cognitive testing is of immediate practical interest.
Recent reviews (Antoniou, 2019; Celik et al., 2020) have outlined bilinguals’ advantages and disadvantages in different cognitive tasks. The tasks’ level of verbal mediation further complicates the interpretation of cognitive profiles associated with LEP. Verbal mediation has recently been referenced in LEP research to classify neuropsychological instruments based on the extent to which intact language skills and/or native-level proficiency in the language of administration is required for the test to provide a valid measure of its target construct (Brantuo et al., 2022). Throughout this paper, we refer to “verbal” tests as having high verbal mediation, indicating that verbal skills are central to optimal performance. In contrast, we refer to “non-verbal” tests (i.e., tasks designed to measure visual–perceptual skills; Gasquoine et al., 2007) as having low verbal mediation.
Studies comparing NSE and LEP groups on tests administered in English have yielded contradictory results (Boone et al., 2007; Gasquoine et al., 2007; Kisser et al., 2012). On the one hand, there are reports of NSE performing better on verbal but not nonverbal tasks, such as tests of visuospatial abilities (Boone et al., 2007; Kisser et al., 2012). On the other hand, significant language administration effects in Spanish-English bilingual groups were documented for some (e.g., letter fluency, Stroop Color and Word trials) but not other verbal tasks (i.e., verbal learning, Digit Span; Gasquoine et al., 2007).
Theoretically, nonverbal tests should be immune to LEP. Indeed, NSE norms for certain visuospatial measures can be applied to Spanish-speaking LEP samples without increasing false-positive rates (Gasquoine & Gonzales, 2012; Gasquoine et al., 2007). Similarly, Walker et al. (2010) found no differences between NSE and LEP participants with different English proficiency levels on several tests (e.g., Digit Symbol, Matrix Reasoning). However, Funes et al. (2016) demonstrated that administering tests in English to Spanish-speaking participants may overestimate deficits even on non-verbal tasks (e.g., Digit-Symbol Coding, Block Design).
Predictably, NSE outperform LEP on multiple verbal tasks, including auditory attention (Digit Span/Word Span; Durand-Lopez, 2020; Mattys et al., 2017; Walker et al., 2010; Yoo & Kaushanskaya, 2012), executive functions (Stroop; Coderre et al., 2013; Singh & Mishra, 2013; Tse & Altarriba, 2012), and object naming or verbal fluency (BNT-15; Ali, Elliott, et al., 2022; Brantuo et al., 2022; Erdodi et al., 2016). However, not all tests that, at face value, appear to have high verbal mediation are equally affected by LEP. Some (animal fluency, BNT, Complex Ideational Material, single word reading) are particularly sensitive to it, whereas others (speeded reading, Digit Span) seem surprisingly robust to LEP (Ali, Brantuo, et al., 2022; Ali, Elliott, et al., 2022; Brantuo et al., 2022; Kousaie et al., 2014; Papageorgiou et al., 2019).
Additionally, the degree of LEP (Coderre et al., 2013; Marian et al., 2013; Roselli et al., 2002; Tse & Altarriba, 2012; Walker et al., 2010), task difficulty (Durand-Lopez, 2020), and even the examinees’ mother tongues (Ardila, 2020; Mattys et al., 2017) can also mediate test performance. Such results reiterate that LEP is a heterogeneous category – treating it simply as the opposite of NSE may overlook important within-group trends that could further inform research on cross-cultural neuropsychology.
Such divergent findings raise several questions about the effect of English proficiency on neuropsychological testing: which cognitive tasks are most affected by LEP? Is the neurocognitive profile associated with LEP more complex than a predictable pattern of deficits based on the level of verbal mediation? Are there meaningful subtypes within LEP? This study was designed to provide tentative answers to these questions. Since most prior research on LEP has been based on US Spanish-English bilinguals, we recruited two geographically and linguistically diverse LEP samples to test the limits of generalizability.
These two bilingual samples (Arabic-dominant students from Canada and Romanian-dominant students from Romania) were recruited to examine the geographic, cultural, and linguistic variability in cognitive profiles associated with LEP. Their main commonality is their non-NSE status. In contrast, the differences between them are significant and multifactorial: native languages (Romanian versus Arabic), writing systems (26-letter Latin alphabet versus the abjad), directions of writing/reading (left-to-right versus right-to-left), educational systems, the broader cultural context (Central Europe versus the Middle East) and cultural identity (Romanian versus Arabic Canadians), and the relative homogeneity within the groups and immigration status (all Romanian participants were born and raised in Romania and recruited from a single university, whereas the Arabic participants immigrated from various countries). Any of these factors could potentially influence performance on neuropsychological testing. Therefore, comparing the Romanian and Arabic samples provided a robust method for examining whether LEP should be considered a unitary or a heterogeneous construct.
We made the following predictions: (1) All participants with LEP would perform worse than NSEs and below the US normative mean on verbal tests; (2) There would be no difference between NSE and LEP on nonverbal tests; (3) Within participants with LEP, performance on verbal tests would differ as a function of the relative level of English proficiency.
Method
Participants
Data were collected from 113 cognitively healthy university students (98 women; M Age = 22.7; SD = 5.6; M Education = 14.2; SD = 2.0). Participants were recruited from two countries (the Western region of Romania and South-Central Canada) and divided into three samples: Romanian-English bilinguals with LEP (n = 59; LEP-RO), Arabic-English bilinguals with LEP (n = 30; LEP-AR) from Canada, and Canadian NSEs (n = 24). The LEP-RO group was established by default: all participants grew up in a non-English-speaking country and learned English later in life. LEP-AR was psychometrically operationalized: a BNT-15 score of ≤11 was required – a level of performance highly specific to LEP status (Ali, Elliott, et al., 2022; Brantuo et al., 2022). The NSE sample included participants born and raised in an English-speaking part of Canada.
To control for noncredible responding as a confound (Abeare et al., 2021), only participants who passed the first trial of the Test of Memory Malingering (i.e., scored >43 on the TOMM-1; Crișan & Erdodi, 2022; Erdodi, 2022; Jones, 2013; Kulas et al., 2014; Rai & Erdodi, 2021) were included in the study. Six participants from LEP-RO and four from LEP-AR were excluded based on their TOMM-1 scores. All NSEs scored above the cutoff and were retained in the study. No participant reported any neurological or neuropsychological condition associated with cognitive impairment. The three samples were similar in age and gender. LEP-RO participants had higher levels of education than NSEs (Table 1).
Note. BNT-15: Boston Naming Test – Short Form (administered in English); LEP-RO: Romanian Limited English Proficiency sample; LEP-AR: Canadian Arabic LEP sample; NSE: Canadian native speakers of English; ηp²: Partial eta squared (effect size for ANOVAs); Sig. post hocs: Significant post hoc contrasts (Games-Howell tests, p < .05); g: Effect size (Hedges’ g); 95% CI: 95% Confidence interval.
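The two screening rules described above (a TOMM-1 score above 43 for every participant, and a BNT-15 score of 11 or below for membership in LEP-AR) can be illustrated with a minimal sketch. The `Participant` record, the function names, and the example scores are hypothetical and introduced only for illustration; they are not the study's data or code.

```python
from dataclasses import dataclass

TOMM1_CUTOFF = 43    # pass = TOMM-1 score strictly above 43
BNT15_LEP_MAX = 11   # psychometric LEP marker = BNT-15 score of 11 or below


@dataclass
class Participant:
    pid: str     # hypothetical identifier
    group: str   # "LEP-RO", "LEP-AR", or "NSE"
    tomm1: int   # TOMM Trial 1 score
    bnt15: int   # BNT-15 score (English administration)


def passes_validity(p: Participant) -> bool:
    """Credible responding: scored above the TOMM-1 cutoff."""
    return p.tomm1 > TOMM1_CUTOFF


def meets_psychometric_lep(p: Participant) -> bool:
    """BNT-15 at or below the proposed LEP marker."""
    return p.bnt15 <= BNT15_LEP_MAX


def screen(participants):
    """Apply the validity screen to everyone and the BNT-15 rule to LEP-AR only.

    LEP-RO membership is 'by default' in the text, so no BNT-15 rule is applied to it here.
    """
    kept = []
    for p in participants:
        if not passes_validity(p):
            continue
        if p.group == "LEP-AR" and not meets_psychometric_lep(p):
            continue
        kept.append(p)
    return kept


# Hypothetical scores, for illustration only
sample = [
    Participant("a1", "LEP-AR", 48, 10),
    Participant("a2", "LEP-AR", 47, 13),  # fails the BNT-15 rule
    Participant("r1", "LEP-RO", 50, 13),  # retained: LEP-RO is defined by default
    Participant("r2", "LEP-RO", 42, 9),   # fails the TOMM-1 screen
    Participant("n1", "NSE", 49, 15),
]
kept_ids = [p.pid for p in screen(sample)]
print(kept_ids)
```

Keeping the two rules as separate predicates makes it straightforward to re-run the contrasts under a different group definition, such as applying the BNT-15 rule to LEP-RO as well.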
Materials
All participants were administered a battery of neuropsychological tests in English, including the first three trials of the Stroop test of the Delis–Kaplan Executive Function System (D-KEFS; Delis et al., 2001), the Hopkins Verbal Learning Test – Revised (HVLT-R; Benedict et al., 1998) with the newly developed Forced Choice Recognition (FCR; Abeare et al., 2020; Cutler et al., 2021), the Digit Span and Digit-Symbol Coding (CD) subtests of the Wechsler Adult Intelligence Scale – Third Edition (WAIS-III; Wechsler, 1997), the Trail Making Test (TMT; Reitan, 1955), and the animal fluency (Gladsjo et al., 1999) and Emotion Word Fluency (EWFT; Abeare et al., 2017) tests.
The EWFT instructs examinees to generate as many emotion words as possible within 1 minute. The initial validation study placed the normative output (raw score) in Canadian university students between 10.6 (SD = 3.3) and 11.4 words (SD = 3.3; Abeare et al., 2017). Subsequent research reported slightly higher but more variable performance in cognitively healthy students (M = 13.3, SD = 3.3) and slightly lower scores in clinical patients (M = 9.9, SD = 4.4; Abeare, An, et al., 2022).
Age-corrected scaled scores (ACSSs) for the D-KEFS, HVLT-R, Digit Span, and CD were derived from norms published in the Technical Manuals. Demographically adjusted T-scores for TMT and animal fluency were determined using norms published by Heaton et al. (2004). Although norms developed on and for NSEs in the USA cannot be assumed to be the appropriate reference group for examinees with LEP in the USA, Canada, or other countries, these are the normative data most likely available to clinicians when assessing LEP examinees. Therefore, an empirical evaluation of the extent to which widely used norms may or may not be appropriate for such individuals is directly relevant to North American neuropsychologists.
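Because results are reported on two standard-score metrics (ACSS with M = 10, SD = 3, and T-scores with M = 50, SD = 10), a short sketch of how the two metrics map onto a common z-score scale may help when comparing effects across tests. The function names are illustrative, and the conversion is a plain linear rescaling; it does not reproduce the normative lookup tables used to derive the scores in the first place.

```python
ACSS_MEAN, ACSS_SD = 10.0, 3.0   # age-corrected scaled score metric
T_MEAN, T_SD = 50.0, 10.0        # T-score metric


def acss_to_z(acss: float) -> float:
    """Express an ACSS as standard deviations from the normative mean."""
    return (acss - ACSS_MEAN) / ACSS_SD


def t_to_z(t_score: float) -> float:
    """Express a T-score as standard deviations from the normative mean."""
    return (t_score - T_MEAN) / T_SD


def z_to_t(z: float) -> float:
    """Place a z-score on the T-score metric."""
    return T_MEAN + z * T_SD


print(acss_to_z(7))            # -1.0, i.e., 1 SD below the normative mean
print(t_to_z(35))              # -1.5, i.e., 1.5 SDs below the normative mean
print(z_to_t(acss_to_z(13)))   # 60.0, the T-score equivalent of ACSS = 13
```

On this scale, a difference of 10 T-score points and a difference of 3 ACSS points both correspond to 1 SD.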
Procedure
Participants were recruited as volunteers in a study on cognitive performance and received extra credit for their time. Tests were administered face-to-face individually in quiet rooms by bilingual research assistants with a Bachelor’s degree in psychology, relevant coursework in psychometrics, and specialized training and ongoing supervision from the first and last authors in administering and scoring the employed battery. Research assistants in Romania and Canada followed the same standardized procedure developed by test publishers during administration and scoring. All tests were administered in English, following standard protocols. In addition, animal fluency and EWFT were administered in both languages only in the LEP samples to directly evaluate the effect of language of administration (native versus English). All data collection, storage, and processing were done with the approval of relevant institutional authorities regulating research involving human participants, in compliance with the 1964 Helsinki Declaration and its subsequent amendments or comparable ethical standards.
Data analysis
Descriptive statistics (percentage, M, SD) for each group were reported as relevant. The main inferential statistics evaluating the significance of between-group differences were one-way ANOVAs, chi-square, and independent (Welch’s) and within-sample t-tests (all contrasts were two-tailed). Post hoc contrasts were performed using the Games–Howell test to control the familywise error rate and protect against alpha inflation. Effect size estimates were expressed in Hedges’ g (with corresponding 95% CIs) and partial eta squared (ηp²).
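For readers who want to see the between-group statistics in concrete form, the following is a minimal sketch of Welch's t-test (statistic and Welch–Satterthwaite degrees of freedom) and Hedges' g using only the Python standard library. The small-sample correction used here, 1 − 3/(4(n1 + n2) − 9), is a common approximation and is an assumption; the text does not state which formula or software was used. The two score lists are hypothetical.

```python
import math
from statistics import mean, variance


def welch_t(x, y):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    n1, n2 = len(x), len(y)
    v1, v2 = variance(x), variance(y)      # sample variances (n - 1 denominator)
    se2 = v1 / n1 + v2 / n2                # squared standard error of the difference
    t = (mean(x) - mean(y)) / math.sqrt(se2)
    df = se2 ** 2 / ((v1 / n1) ** 2 / (n1 - 1) + (v2 / n2) ** 2 / (n2 - 1))
    return t, df


def hedges_g(x, y):
    """Standardized mean difference with a small-sample bias correction (assumed formula)."""
    n1, n2 = len(x), len(y)
    pooled_var = ((n1 - 1) * variance(x) + (n2 - 1) * variance(y)) / (n1 + n2 - 2)
    d = (mean(x) - mean(y)) / math.sqrt(pooled_var)
    correction = 1 - 3 / (4 * (n1 + n2) - 9)
    return d * correction


# Hypothetical scaled scores for two groups, for illustration only
group_a = [12, 14, 11, 13, 15, 12]
group_b = [9, 10, 8, 11, 10]

t_stat, df = welch_t(group_a, group_b)
g = hedges_g(group_a, group_b)
print(f"Welch t({df:.1f}) = {t_stat:.2f}, Hedges' g = {g:.2f}")
```

Welch's version does not assume equal variances, which is why its degrees of freedom are generally fractional and never exceed n1 + n2 − 2.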
Results
A large main effect on Digit Span ACSS and a medium effect on longest Digit Span backward were driven by the below-average score of LEP-RO. No difference was noted on longest Digit Span forward (Table 2). There was a small-medium main effect on CD caused by the above-average performance of NSE participants. An extremely large effect emerged on TMT-A, driven by the unusually low score of the LEP-RO sample. The performance gap between groups narrowed on the TMT-B but remained significant. A large effect emerged on the TMT B/A raw score ratio, driven by low scores of the LEP-RO sample (indicating better cognitive flexibility relative to visuomotor sequencing speed). A very large main effect was observed on the Color Naming subtest of the D-KEFS, reflecting a linear increase in performance from LEP-RO through LEP-AR to NSE. The contrasts on the Word Reading and Stroop subtests of the D-KEFS were not significant. Figure 1 displays the between-group trends on the three trials of the D-KEFS.
Note. All tests were administered in English unless marked with * (those tests were administered in the native language of the LEP sample); TMT: Trail Making Test; D-KEFS: Delis–Kaplan Executive Function System; EWFT: Emotion Word Fluency Test; Animals: Category fluency; LDF: Longest digit span forward; LDB: Longest digit span backward; COL: Color Naming; WOR: Word Reading; STR: Stroop; ACSS: Age-corrected scaled score (M = 10, SD = 3); T: T-score (M = 50, SD = 10); LEP-RO: Romanian limited English proficiency sample; LEP-AR: Canadian Arabic LEP sample; NSE: Canadian native speakers of English; ηp²: Partial eta squared (effect size for ANOVAs); Sig. post hocs: Significant post hoc contrasts (Games-Howell tests, p < .05); t = Welch’s t test; g = Effect size for significant post hoc contrasts (Hedges’ g); 95% CI = 95% Confidence interval.
A very large main effect re-emerged on animal fluency in English, driven by the normative performance of the NSE sample relative to the mean score in the impaired and borderline range, respectively, of the LEP samples. The contrast between the two LEP samples on EWFT approached significance (medium effect). When we compared performances of the two LEP samples on animal fluency and EWFT administered in their native language (Romanian and Arabic), extremely large effects emerged for both measures.
Finally, within-sample t-tests revealed a significantly higher performance in raw scores on animal fluency [t(58) = 12.9, p < .001, d = 1.68, extremely large effect] and EWFT [t(58) = 3.09, p < .01; d = 0.40, medium effect] in Romanian within the LEP-RO sample. At T-score levels, mean performance on animal fluency shifted from the impaired (English) to the low average (Romanian) range [t(58) = 10.9, p < .001, d = 1.41, extremely large effect]. Within the LEP-AR sample, all three contrasts were significant but in the opposite direction: participants performed better when animal fluency [t(29) = −4.97, p < .001] and EWFT [t(29) = −3.75, p < .01] were administered in English (d = 0.68–0.91, large effects). At T-score levels, mean performance on animal fluency shifted from the borderline (English) to the impaired (Arabic) range [t(29) = −4.90, p < .001; d = 0.89, large effect].
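The effect sizes for the within-sample contrasts appear consistent with the common convention d = |t| / √n for paired designs, where n = df + 1. This is an inference from the reported values, not a formula stated in the text. The sketch below recomputes d from the reported t and df for the raw-score contrasts; matching the endpoints of the reported LEP-AR range (0.68–0.91) to the EWFT and animal fluency contrasts, respectively, is likewise an assumption.

```python
import math


def d_from_paired_t(t: float, df: int) -> float:
    """Recover Cohen's d for a paired contrast, assuming d = |t| / sqrt(n) with n = df + 1."""
    n = df + 1
    return abs(t) / math.sqrt(n)


# (label, reported t, reported df, reported d) taken from the paragraph above
reported = [
    ("LEP-RO animal fluency (raw)", 12.9, 58, 1.68),
    ("LEP-RO EWFT (raw)", 3.09, 58, 0.40),
    ("LEP-AR animal fluency (raw)", -4.97, 29, 0.91),
    ("LEP-AR EWFT (raw)", -3.75, 29, 0.68),
]

recovered = {label: d_from_paired_t(t, df) for label, t, df, _ in reported}
for label, t, df, d_reported in reported:
    print(f"{label}: recovered d = {recovered[label]:.2f}, reported d = {d_reported:.2f}")
```

Under this convention the sign of t carries the direction of the effect (native-language versus English administration), while d expresses only its magnitude.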
Significant main effects emerged on all three individual acquisition trials of the HVLT-R, although the magnitude of the difference declined gradually with each subsequent trial (from large to medium effects). However, a large effect re-emerged on the sum of Trials 1–3 (Table 3). There was a very large effect on delayed free recall. Although the ANOVA remained significant on recognition performance, the effect size was notably smaller (medium) on raw scores. Once age correction was applied (T-scores), between-group differences disappeared. All contrasts above were driven by the notably lower performance of the LEP-AR sample. Although the main effect on the FCR trial was significant, this likely reflects the mathematical artifact of very low SDs, as all three samples performed near the ceiling (i.e., a score of 12.0). Figure 2 provides a visual summary of the between-group patterns of auditory verbal learning performance.
Note. All tests were administered in English, following standard instructions; HVLT-R: Hopkins Verbal Learning Test – Revised; 1–3: Acquisition trials (sum of scores across trials 1 through 3); DR: Delayed free recall; RH: Yes/No recognition hits (true positives); RD: Recognition discrimination (true positives minus false positives); FCR: Forced Choice Recognition; LEP-RO: Romanian Limited English Proficiency sample; LEP-AR: Canadian Arabic LEP sample; NSE: Canadian native speakers of English; ηp²: Partial eta squared (effect size for ANOVAs); Sig. post hocs: Significant post hoc contrasts (Games-Howell tests, p < .05); g = Effect size for significant post hoc contrasts (Hedges’ g); 95% CI = 95% Confidence interval.
Given the prominence of North American normative systems, one-sample t-tests were computed for each of the samples against US norms (Table 4). The LEP-RO performed significantly below the normative mean on Digit Span (large effect), TMT A & B (very large and large effects), animals (very large effect), HVLT-R (medium effects), D-KEFS Color Naming (large effect) and Word Reading (small-medium effect), showing no difference on CD and Stroop. The LEP-AR performed significantly below the normative mean on Digit Span (large effect), TMT A & B (large effects), animals (very large effect), HVLT-R (small to very large effects), and the Color Naming (medium effect) subtest, with no difference on Digit Span, CD, and Word Reading or Stroop subtests. The NSE sample performed above the normative mean on CD and Stroop (medium effects) and below the normative mean on the acquisition trials of the HVLT-R (medium effect).
Note. All tests were administered in English, following standard instructions; TMT: Trail Making Test; Animals: Category fluency; HVLT-R: Hopkins Verbal Learning Test – Revised; D-KEFS: Delis–Kaplan Executive Function System; 1–3: Acquisition trials (sum of scores across trials 1 through 3); DR: Delayed free recall; RD: Recognition discrimination (true positives minus false positives); COL: Color Naming; WOR: Word Reading; STR: Stroop; ACSS: Age-corrected scaled score (M = 10, SD = 3); T: T-scores (M = 50; SD = 10); LEP-RO: Romanian Limited English Proficiency sample; LEP-AR: Canadian Arabic LEP sample; NSE: Canadian native speakers of English; g = Effect size (Hedges’ g); 95% CI = 95% Confidence interval.
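The comparison of each sample against US norms rests on a one-sample t-test with the normative mean as the reference value (10 for ACSS, 50 for T-scores). A minimal sketch follows. Standardizing the difference by the sample SD and applying the correction 1 − 3/(4·df − 1) are assumptions made for illustration, since the text does not specify how g was computed for these contrasts, and the T-scores below are hypothetical.

```python
import math
from statistics import mean, stdev

NORM_MEAN_T = 50.0  # normative mean on the T-score metric


def one_sample_t(scores, norm_mean):
    """One-sample t-test against a fixed normative mean, with a bias-corrected effect size."""
    n = len(scores)
    m, s = mean(scores), stdev(scores)
    df = n - 1
    t = (m - norm_mean) / (s / math.sqrt(n))
    d = (m - norm_mean) / s            # standardized by the sample SD (assumption)
    g = d * (1 - 3 / (4 * df - 1))     # small-sample correction (assumed formula)
    return t, df, g


# Hypothetical animal fluency T-scores, for illustration only
hypothetical_t_scores = [38, 42, 35, 45, 40, 33, 41, 37]
t_stat, df, g = one_sample_t(hypothetical_t_scores, NORM_MEAN_T)
print(f"t({df}) = {t_stat:.2f}, g = {g:.2f}")
```

A negative t and g indicate performance below the normative mean; dividing by the normative SD (10 for T-scores) instead of the sample SD would be an equally defensible choice and would yield a different effect size.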
Since a BNT-15 score ≤ 11 has been proposed as a psychometric marker of LEP (Ali, Elliott, et al., 2022; Brantuo et al., 2022; Erdodi et al., 2017a), whereas a score of 12 has been identified as the low end of intact performance among NSEs (Abeare et al., 2022), two subgroups were created first within the LEP-RO sample along this cutoff. Participants with BNT-15 ≤ 11 scored significantly lower than those with BNT-15 ≥ 12 on animal fluency in both languages (despite smaller effects during the Romanian administration) and the English administration of the EWFT (large effect). Similarly, large effects emerged on the time-to-completion of both the Yes/No and the FCR recognition trials of the HVLT-R (Table 5).
Note. TMT: Trail Making Test; EWFT: Emotion Word Fluency Test; D-KEFS: Delis–Kaplan Executive Function System; HVLT-R: Hopkins Verbal Learning Test – Revised; LDF: Longest digits forward; LDB: Longest digits backward; COL: Color Naming; WOR: Word Reading; STR: Stroop; 1–3: Acquisition trials (sum of scores across trials 1 through 3); DR: Delayed free recall; RD: Recognition discrimination (true positives minus false positives); FCR: Forced Choice Recognition; LA: Language of administration; EN: English; RO: Romanian; ACSS: Age-corrected scaled score; T: Demographically adjusted T-score based on norms by Heaton et al. (2004); T2C: Time to completion (seconds); t = Welch’s t test; g = Effect size (Hedges’ g); 95% CI = 95% Confidence interval.
Within-sample t-tests revealed that LEP-RO participants with BNT-15 ≤ 11 performed better at raw score level on both animal fluency [t(35) = 9.89, p < .001; d = 1.65, very large effect] and EWFT [t(35) = 3.75, p < .01; d = 0.63, medium effect] administered in Romanian. Their mean animal fluency T-score shifted from the impaired (English) to the low average (Romanian) range [t(35) = 8.38, p < .001; d = 1.40, extremely large effect]. Similar results were observed in participants with BNT-15 ≥ 12 on animal fluency raw [t(22) = 8.51, p < .001; d = 1.77] and T-scores [t(22) = 7.17, p < .001; d = 1.50], with extremely large effects. The mean animal fluency T-score shifted from the borderline (English) to the average (Romanian) range. However, there was no difference in EWFT performance within this subset of the LEP-RO sample as a function of the administration language [t(22) = .08, p = .945].
To control for the method variance in selecting participants for the LEP-RO (by default) and the LEP-AR (BNT-15 ≤ 11) samples, the main contrasts were re-computed after Romanian participants with BNT-15 scores >11 were excluded. This change in the composition of the LEP-RO sample ensured that the two groups had comparable levels of English proficiency. The overall pattern of positive and negative findings captured in Tables 2 and 3 was preserved after equalizing the groups (Table 6).
Note. TMT: Trail Making Test; EWFT: Emotion Word Fluency Test; D-KEFS: Delis–Kaplan Executive Function System; HVLT-R: Hopkins Verbal Learning Test – Revised; LDF: Longest digits forward; LDB: Longest digits backward; COL: Color Naming; WOR: Word Reading; STR: Stroop; 1–3: Acquisition trials (sum of scores across trials 1 through 3); DR: Delayed free recall; RD: Recognition discrimination (true positives minus false positives); FCR: Forced Choice Recognition; LA: Language of administration; EN: English; NA: Native language (Romanian for the RO and Arabic for the AR participants); ACSS: Age-corrected scaled score; T: Demographically adjusted T-score based on norms by Heaton et al. (2004); T2C: Time to completion (seconds); t = Welch’s t test; g = Effect size (Hedges’ g); 95% CI = 95% Confidence interval.
Finally, to investigate whether there is an incremental loss in performance on cognitive tests as a function of decreasing English proficiency, test scores were compared across five BNT-15 scores: 11, 10, 9, 8, and ≤7 using a series of one-way ANOVAs (Table 7). Only two significant main effects emerged: on CD (ηp² = .205, large) and animal fluency T-scores (ηp² = .149, large). Examining the pattern of CD scores revealed that the finding was driven by the combination of an isolated high average range mean associated with BNT-15 = 11 (12.1) compared to a narrow (average) range performance (M = 9.5–9.8) at the other four levels of BNT-15 and low variability (SD = 1.9–2.5). However, a linear decline in animal fluency T-scores was observed, from M = 35.6 at BNT-15 = 11 to M = 24.8 at BNT-15 ≤ 7.
Note. All tests were administered in English; BNT-15: Boston Naming Test – Short Form; TMT: Trail Making Test; HVLT: Hopkins Verbal Learning Test – Revised; 1–3: Acquisition trials (sum of performance across Trials 1 through 3); DR: Delayed Recall; COL: Color Naming; WOR: Word Reading; STR: Stroop task; D-KEFS: Delis–Kaplan Executive Function System; ACSS: Age-corrected scaled score.
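The analysis across the five BNT-15 levels is a one-way ANOVA, for which partial eta squared reduces to SS_between / (SS_between + SS_within), i.e., ordinary eta squared, because there is only one factor. The sketch below computes F and this effect size from scratch. The grouping keys mirror the five BNT-15 levels, but the T-scores in each group are hypothetical and chosen only to illustrate a declining pattern.

```python
from statistics import mean


def one_way_anova(groups):
    """Return F, df_between, df_within, and eta squared for a one-way design."""
    all_scores = [x for g in groups for x in g]
    grand_mean = mean(all_scores)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_scores) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    eta_sq = ss_between / (ss_between + ss_within)
    return f_stat, df_between, df_within, eta_sq


# Hypothetical animal fluency T-scores by BNT-15 level, for illustration only
scores_by_bnt15 = {
    "11": [36, 34, 38],
    "10": [33, 35, 31],
    "9": [30, 32, 29],
    "8": [28, 27, 30],
    "<=7": [25, 24, 26],
}

f_stat, df_between, df_within, eta_sq = one_way_anova(list(scores_by_bnt15.values()))
print(f"F({df_between}, {df_within}) = {f_stat:.2f}, eta squared = {eta_sq:.3f}")
```

A significant omnibus F does not by itself establish a linear trend; as in the CD result described above, it can also be produced by a single outlying group mean, so the pattern of means still needs to be inspected.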
When developmental history and cognitive profile collide: a case study
Although learning a language outside the sensitive period (age > 15) is commonly considered a developmental marker of LEP (Johnson & Newport, 1989; Lenneberg, 1967; Sakai, 2005), individual variability in language acquisition results in notable exceptions from this principle. To illustrate this, we present psychometric data from a 47-year-old right-handed female patient with 16 years of education referred to the senior author’s private practice for assessment following an uncomplicated mild traumatic brain injury. She grew up speaking Russian, immigrated to Canada at age 18, and obtained a bachelor’s degree. By history, she would be classified as LEP. However, she had no obvious accent when speaking English and obtained the following scores on verbal neuropsychological tests: BNT-15 = 14 (the mean of the NSE sample in the present study was 14.1 and 13.5 in the most recently published norms; Abeare et al., 2022); Complex Ideational Material = 12 (perfect score); letter and animal fluency T = 61; California Verbal Learning Test acquisition trials raw score = 66/80 (T = 69), long-delay free recall raw score = 4/16 (z-score = 1.0); Similarities ACSS = 16, Vocabulary ACSS = 19 (Verbal Comprehension Index = 150). Based on her cognitive profile, her neuropsychological functioning better matches an NSE’s.
Discussion
This study was designed to investigate geographic differences in cognitive profiles associated with LEP and compare them to norms developed on and for NSEs. To this end, two different LEP samples were recruited (Romanian and Arabic Canadian students), and their cognitive profiles were compared to NSE norms and a student sample of Canadian NSEs. We predicted that LEP participants would perform worse than NSEs and below the normative mean on verbal tests; that there would be no difference between NSE and LEP on nonverbal tests; and that performance on verbal tests would differ based on English proficiency levels within the LEP sample. Results generally supported the first hypothesis, with several notable exceptions: the LEP-RO and LEP-AR samples demonstrated a unique pattern of strengths and weaknesses that defies a unifying interpretation. The support for the second hypothesis was mixed due to the divergent performance between the two LEP samples. The third hypothesis was only supported in the verbal fluency tests and the HVLT-R time-to-completion metrics.
Results are broadly consistent with previous research on the deleterious effect of LEP on performance during verbal tasks (Bialystok et al., 2008, 2009; Boone et al., 2007; Coderre et al., 2013; Kisser et al., 2012; Mattys et al., 2017; Rivera Mindt et al., 2008; Walker et al., 2010). Previous reports of the heightened sensitivity of the D-KEFS Color Naming to LEP relative to Word Reading were replicated (Brantuo et al., 2022), with one caveat: LEP-RO continued to improve on the Stroop task, whereas performances of LEP-AR declined. Consistent with existing research (Brantuo et al., 2022; Erdodi et al., 2017a), animal fluency was very sensitive to LEP, as evidenced by a mean performance of 1.5–2 SDs below the normative mean. Consistent with previous reports (Wauters & Marquardt, 2017), the EWFT was less susceptible to the administration language than animal fluency, although both the magnitude and the direction of the effect of native versus English administration were different in LEP-RO from LEP-AR.
The performance of the LEP-RO sample improved during the native language compared to the English administration of the animal fluency test. Applying the demographically adjusted norms by Heaton et al. (2004) to raw scores increased their average scores by almost 1.5 SDs. However, the LEP-AR sample demonstrated the opposite pattern: participants performed better during the English administration, resulting in a 1 SD difference. This pattern complicates the interpretation of the results and precludes clear recommendations to assessors in clinical settings. Findings from the LEP-RO sample indicate that scores during the task’s standard English administration underestimate semantic fluency skills that could be obtained in their native language by 1–1.5 SDs. Therefore, adjusting the T-score obtained in English by 10–15 T-score points may provide a more accurate estimate of the true cognitive ability of LEP examinees who could not be tested in their native language.
However, findings in the LEP-AR sample suggest that such an adjustment is far from universally applicable. Whether the Heaton norms provide a valid normative comparison for individuals with LEP has yet to be established. Known variability in verbal fluency scores as a function of broader cultural and linguistic variables (Ardila, 2020) suggests that the accurate clinical interpretation of test scores may require a deeper understanding of the complex interactions among the various factors influencing performance on cognitive testing.
Similar to the clinical case study, the LEP-RO sample produced an auditory verbal memory profile that was indistinguishable from that of NSEs, whereas the LEP-AR sample consistently underperformed relative to the NSE sample. The fact that LEP-AR participants were immersed in an English-speaking language environment, whereas LEP-RO participants lived in a non-English-speaking country, makes this pattern even more difficult to interpret. The most parsimonious explanation seems to be the inclusion criterion of BNT-15 ≤ 11: although needed to ensure that the English-Arabic bilinguals had LEP, it may have inadvertently resulted in oversampling participants from the lower end of the English proficiency continuum.
However, ANOVAs using five levels of the BNT-15 (11, 10, 9, 8, and ≤7) as the independent variable found only two significant contrasts, indicating that within the LEP range (≤11) BNT-15 scores no longer predict performance on most cognitive tests. Therefore, the unexpectedly high performance of the LEP-RO sample cannot be attributed to 23 of the Romanian participants having scored above this cutoff and, hence, having demonstrated higher English proficiency than LEP-AR.
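For readers less familiar with this type of analysis, the following is a minimal sketch of how an omnibus F statistic for a one-way ANOVA across the five BNT-15 levels could be computed. The scores are made-up toy values, not study data, and the function name is hypothetical. The sketch does not reproduce the contrasts or the software used in the present study.

```python
# Minimal sketch of a one-way ANOVA F statistic (pure standard library).
# All scores below are invented toy values, NOT data from the study.
from statistics import mean


def one_way_anova_f(groups):
    """Return the omnibus F statistic for a list of score lists."""
    all_scores = [score for group in groups for score in group]
    grand_mean = mean(all_scores)
    k = len(groups)          # number of groups (levels of the factor)
    n = len(all_scores)      # total number of observations
    # Between-group sum of squares: group means around the grand mean
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: scores around their own group mean
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Hypothetical T-scores grouped by BNT-15 level (11, 10, 9, 8, and <=7)
toy_scores_by_bnt15 = {
    "11": [50, 52, 48],
    "10": [49, 51, 50],
    "9": [47, 50, 53],
    "8": [48, 49, 50],
    "<=7": [46, 50, 51],
}
f_stat = one_way_anova_f(list(toy_scores_by_bnt15.values()))
```

An F value near 1 would indicate that group means differ no more than expected from within-group variability; evaluating significance would additionally require the F distribution with (k - 1, n - k) degrees of freedom.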
Findings on nonverbal tests are less conclusive: although both LEP samples performed close to the normative mean on CD, consistent with previous research (Walker et al., Reference Walker, Batchelor, Shores and Jones2010), NSEs scored above it, suggesting that a mild LEP disadvantage persists even in the absence of frank deficits. The outcome on the TMT is puzzling and contradicts previous reports (Boone et al., Reference Boone, Victor, Wen, Razani and Ponton2007; Kisser et al., Reference Kisser, Wendell, Spencer and Waldstein2012). The LEP-RO sample performed 2 SDs below the normative mean on TMT-A and 1 SD below it on TMT-B. In the context of intact performance on CD and D-KEFS Stroop, these findings are difficult to interpret and serve as an important reminder of the relevance of population-specific norms (Bezdicek et al., Reference Bezdicek, Motak, Axelrod, Preiss, Nikolai, Vyhnalek, Poreh and Ruzika2012, Reference Bezdicek, Motak, Schretlen, Preiss, Axelrod, Nikolai, Peña, Ojeda and Ruzika2016).
Assuming normative performance in examinees with LEP on nonverbal tests on rational grounds alone increases the risk of significant errors in the clinical interpretation of scores (Celik et al., Reference Celik, Kokje, Meyer, Frolich and Teichmann2020; Funes et al., Reference Funes, Hernandez Rodriguez and Lopez2016; Gasquoine & Gonzales, Reference Gasquoine and Gonzales2012). In fact, our results challenge the notion of an “LEP profile” as a unitary construct. They suggest that other parameters (geographic location, level of English proficiency, native language, cultural differences in the significance of response speed, etc.) may be equally important factors in understanding the clinical implications of test scores obtained by LEP examinees (Ardila, Reference Ardila2020; Coderre et al., Reference Coderre, Van Heuven and Conklin2013; Durand-Lopez, Reference Durand-Lopez2020; Marian et al., Reference Marian, Blumenfeld, Mizrahi, Kania and Cordes2013; Roselli et al., Reference Roselli, Ardila, Santisi, Del Rosario Arecco, Salvatierra, Conde and Lenis2002; Singh & Mishra, Reference Singh and Mishra2013; Tse & Altarriba, Reference Tse and Altarriba2012; Walker et al., Reference Walker, Batchelor, Shores and Jones2010).
Separating the LEP-RO sample into high and low English proficiency levels operationalized using BNT-15 scores (Ali, Elliott, et al., Reference Ali, Elliott, Biss, Abumeeiz, Brantuo, Kuzmenka, Odenigbo and Erdodi2022; Brantuo et al., Reference Brantuo, An, Biss, Ali and Erdodi2022) revealed a performance pattern with potential clinical relevance. Although both groups obtained significantly lower scores during the English relative to the Romanian administration of animal fluency, participants with BNT-15 ≥ 12 performed consistently better on both administrations. These findings support the use of the BNT-15 as an objective index of English proficiency (Erdodi et al., Reference Erdodi, Nussbaum, Sagar, Abeare and Schwartz2017a) and suggest that BNT-15 scores may tap the broader construct of general verbal skills independent of any specific language, which includes the fund of word knowledge and the speed of lexical retrieval. In other words, the BNT-15 preserves its original function of measuring cognitive functioning in addition to LEP status.
Finally, a BNT-15 score ≤ 11 was associated with longer time-to-completion on the HVLT-R recognition trials, indicating increased processing demands in participants with lower levels of English proficiency. This finding has implications for both performance validity assessment and academic accommodations for LEP students at English-speaking institutions. Since time-to-completion often serves as an index of response credibility on word recognition tests generally (Cutler et al., Reference Cutler, Greenacre, Abeare, Sirianni, Roth and Erdodi2022; Erdodi & Lichtenstein, Reference Erdodi, Lichtenstein and Boone2021; Erdodi et al., Reference Erdodi, Tyson, Shahein, Lichtenstein, Abeare, Pelletier, Zuccato, Kucharski and Roth2017b; Kim et al., Reference Kim, Boone, Victor, Marion, Amano, Cottingham, Ziegler and Zeller2010; Lupu et al., Reference Lupu, Elbaum, Wagner and Braw2018) and the HVLT-R specifically (Cutler et al., Reference Cutler, Abeare, Messa, Holcomb and Erdodi2021), assessors should exercise caution before interpreting slow responding on the HVLT-R as evidence of invalid performance in LEP examinees, to protect them against an increased risk of false positive errors. In an academic context, extending the time limit on exams may be construed as a reasonable and necessary accommodation for LEP students (Ali, Brantuo, et al., Reference Ali, Brantuo, Cutler, Kennedy and Erdodi2022).
It is widely accepted that translating commonly used neuropsychological tests into all languages and norming them in each is not feasible (Franzen et al., Reference Franzen2021). Administering tests in the examinee’s native language is often considered the next best solution for neutralizing the effects of LEP (Franzen et al., Reference Franzen2021; Fujii, Reference Fujii2018). However, our results indicate that such an accommodation can have the opposite (i.e., suppressing rather than enhancing) effect. Indeed, while the Romanian administration significantly improved verbal fluency performance in LEP-RO compared to the English administration, the Arabic administration of these tests produced lower scores in LEP-AR compared to the English administration. This finding suggests that administering psychometric tests in the examinee’s native language does not reliably neutralize LEP as a confound and may even inadvertently magnify distortions within the neurocognitive profile, especially in the absence of appropriate norms for many LEP populations.
Results point towards identifying a list of tests that are robust to variability in the level of English proficiency as the best pragmatic safeguard against LEP as a confound. Within the present study, three such tests emerged as possible “LEP-resistant” candidates: CD, the Word Reading subtest of the D-KEFS, and the EWFT. Age-corrected T-scores for the Yes/No Recognition Discrimination trial of the HVLT-R were also immune to LEP. However, the utility of this trial as an overall measure of auditory verbal learning and memory might be limited, considering that the test’s key trials remain vulnerable to LEP.
Results should be interpreted in the context of the study’s limitations. The most obvious is the reliance on relatively small convenience samples. In addition, all participants were recruited from two universities, raising questions about the representativeness of the samples. On the one hand, university students may be cognitively higher functioning than the general population. As such, results may not generalize to clinical populations (Braw, Reference Braw, Horton and Reynolds2021). On the other hand, the significant variability in English proficiency within LEP-RO may have masked general trends relevant to cross-cultural neuropsychology. Additionally, several poorly understood cultural and educational differences between samples might have confounded results, especially on verbal fluency tests (Ardila, Reference Ardila2020). In the absence of appropriate norms for individuals with LEP in general (let alone specific cultural/linguistic communities), the clinical interpretation of cognitive profiles in such populations remains uncertain.
The study also has several strengths. It recruited two LEP samples from different countries (indeed, continents) with linguistically and orthographically dissimilar native languages to empirically investigate the variability in cognitive profiles across different LEP subtypes. Such a design enabled several population- and instrument-specific discoveries with potential clinical and cross-cultural relevance. Participants were screened for noncredible responding, a significant source of error variance in academic research on university students (An et al., Reference An, Kaploun, Erdodi and Abeare2017; Hurtubise et al., Reference Hurtubise, Baher, Messa, Cutler, Shahein, Hastings, Carignan-Querqui and Erdodi2020; Roye et al., Reference Roye, Calamia, Bernstein, De Vito and Hill2019) and even in normative samples (Erdodi & Lichtenstein, Reference Erdodi and Lichtenstein2017). The battery was selected to include a strategic combination of tests with low and high verbal mediation informed by previous research to further flesh out LEP-specific performance patterns.
Conclusions
Results are broadly consistent with previous research on the deleterious effects of LEP on cognitive profiles – especially on verbal tests. At the same time, findings revealed clinically significant heterogeneity among individuals with LEP, both within and across samples. Therefore, results challenge the notion that LEP status is a unitary construct and emphasize the importance of population-specific research, as findings may not generalize to different groups with LEP (Braw, Reference Braw, Horton and Reynolds2021). Although the BNT-15 proved a valid overall psychometric marker of English proficiency, some of the evidence suggests that it may also capture general verbal/cognitive skills that are not English-specific. Even in the context of high accuracy scores, LEP is associated with slowed processing speed with clear implications for performance validity assessment and eligibility for academic accommodations. Finally, there may be no straightforward definition of LEP status, as individual history of language acquisition and performance-based markers of English proficiency can produce contradictory conclusions (as illustrated by the case study). More research is needed to better understand cognitive profiles associated with LEP and the optimal method for operationalizing the construct itself.
Funding statement
This study received no external funding.
Conflicts of interest
The authors have no conflicts of interest to declare.