
Feature typology and L2 proficiency matter: L3 acquisition of Portuguese

Published online by Cambridge University Press:  15 April 2025

Mila Tasseva-Kurktchieva*
Affiliation:
Linguistics Program, University of South Carolina, Columbia, SC, USA
Danielle Fahey
Affiliation:
School of Speech, Language, Hearing & Occupational Sciences, University of Montana, Missoula, MT, USA
*Corresponding author: Mila Tasseva-Kurktchieva; Email: [email protected]

Abstract

To determine the source of transfer in third language acquisition (L3A), we tested the effects of grammatical feature typology and L2 Spanish proficiency on the comprehension and production of grammatical [gender] and [number] in the early stages of L3A of Portuguese. We distinguish between the two features based on their participation in the lexical-conceptual structure of the lemma and its interaction with the morpho-syntactic derivation. L1 English speakers were tested on their knowledge of the features in both their L2 and their L3 through a grammaticality judgment task and an elicited production task. Results show that L3 learners transfer only some features, specifically [gender] rather than [number], suggesting a fine-grained divide in feature compositionality between the structural ([gender]) and semantic ([number]) features. We also found facilitative transfer only after a threshold acquisition in L2, in support of the Threshold Hypothesis. For beneficial transfer of a feature, mere knowledge of the L2 structure was found to be sufficient. However, a higher generalized L2 proficiency threshold was found to predict high L3 accuracy.

Type
Original Article
Creative Commons
This is an Open Access article, distributed under the terms of the Creative Commons Attribution-NoDerivatives licence (https://creativecommons.org/licenses/by-nd/4.0/), which permits re-use, distribution, and reproduction in any medium, provided that no alterations are made and the original article is properly cited.
Copyright
© The Author(s), 2025. Published by Cambridge University Press

Introduction

Worldwide, it is more common than not for individuals to speak multiple languages, but the acquisition of a language beyond the second is far from understood. With acquisition of a third language or beyond becoming more common, it is important to explore the influences of previously acquired languages on the newly acquired one in order to better appreciate the cognitive potential of adult language learners. Increasingly, evidence suggests that third language acquisition (L3A) may be qualitatively different from second language acquisition (L2A) (Amaro et al., 2012). Current L2A theories do not, and in fact cannot, predict L3A patterns clearly, requiring that researchers conduct independent studies of L3A.

An overarching point of inquiry is whether transfer will proceed in a wholesale manner by which either the first or the second language, as a whole, will serve as the baseline of L3 acquisition, or if learners will transfer particular properties from either previously acquired language, also referred to as piecemeal transfer (cf. the Typological Primacy Model: Rothman, 2010, 2015; Rothman & Amaro, 2010; the Scalpel Model: Slabakova, 2012, 2016, 2021; the Linguistic Proximity Model: Westergaard, 2021; Westergaard et al., 2017; the Cumulative Enhancement Model: Flynn et al., 2004). Building on the empirically well-supported claim that there is transfer from at least one, and possibly more, languages, a further question is what exactly transfers: the properties of the lexical entry (Parasitic Model: Ecke and Hall, 2021), the syntactic parameters (well described and argued about in the L2A theories), or something else.

In this study, we explore a possible merger in transfer between the lexical and syntactic level as well as the effects of proficiency in both comprehension and production. We test the predictions of transfer and proficiency in a Grammaticality Judgment Task (GJT) and an Elicited Production Task of the grammatical features [gender] and [number]Footnote 1 in the early stages of acquisition of L3 Portuguese by native speakers of English with L2 Spanish.

Transfer in L3A

Sources of transfer in L3A

The dust has settled over the major issue in generative L2A of whether adult learners of a new language have access to the guiding principles and parameters of Universal Grammar (UG). However, the manner in which UG interacts with sources of transfer is still an open question for L3A models drawing on the generative framework. The underlying prediction is that facilitative transfer will be observed where parameters are shared between previously learned and new languages while inhibition will happen where the parameter settings are different. An ongoing debate focuses on which of the previously acquired languages serves as the source of transfer in third and subsequent language acquisition. Related to this dispute is the discussion about whether the full grammatical structure of either the L1 or the L2, or only relevant grammatical elements are transferred in the beginning stages of language acquisition (for recent reviews of the theories see Alonso et al., 2017; Rothman et al., 2019). Prevailing models build on the idea that the third language does not start in a vacuum but rather is built on the basis of the previously acquired languages, yet they differ widely in their stance on how much and which previously acquired language supports the acquisition of the third language. Some models look at the order of acquisition of the languages to argue for the primacy of L1 or L2. For example, Leung (2005, 2006), Jin (2009), and Hermas (2015) propose that the source of transfer is the grammar of L1 as this is the fully developed system at the time of the typical onset of L3A.
Others contend that only the most recently acquired grammar, i.e., that of the L2, will serve as the basis of the initial-state L3 grammar (Bardel & Falk, 2007, 2021; Falk & Bardel, 2011; Sánchez & Bardel, 2017). However, the potentially privileged role of one or the other of the previously developed languages in L3A is difficult to pinpoint. Many learners have acquired more than one additional language prior to the onset of third language acquisition. Yet in many cases, only one of those languages is sufficiently developed and maintained to be considered an L2.

A number of other models are rooted in the empirically supported premise that transfer of linguistic features is neither random nor order-of-acquisition based. Instead, these models draw upon the language which is most typologically similar. A further divide here has to do with the manner in which typology influences acquisition: (i) does transfer result in an initial-state grammar based on the typologically closer L1 or L2 (cf. the Typological Primacy Model: Rothman, 2010, 2011, 2015; Cabrelli et al., 2023), or (ii) do individual, typologically similar features and parameters transfer from any previously acquired language (cf. the Scalpel Model: Slabakova, 2016, 2021; the Linguistic Proximity Model: Westergaard, 2021; Westergaard et al., 2017; or the Cumulative Enhancement Model: Flynn et al., 2004)?

It is also not yet clear whether the quantity of possible transfer sources influences facilitative transfer. For example, would it be easier to acquire a setting that is evident in two or more of the previously acquired languages than a setting that is evident in only one? Relatedly, we also do not know whether facilitative transfer of parameters is affected in any way by the internal structure of the features to be transferred. This last question is the focus of our study. Here we assume that transfer of linguistic properties can happen from either L1 or L2, either wholesale or as individual grammatical features. We start with the initial assumptions that adult learners of a new language (i) will have at least partial access to UG and (ii) may transfer at least part of their already acquired grammars to their L3. The study is focused on the type of feature rather than the type of transfer. We operationalize transfer of grammatical features as a predictive relationship between acquisition of features in L2 and their acquisition in L3. A discussion of our account of the relevant features follows below.

Grammatical features and their treatment

An additional aspect that plays a role in our study is the understanding of the compositionality of the features of interest to this study: [gender] and [number]. The minimalist framework treats the two features differently based on their (un)interpretability, a distinction between grammatical properties that may or may not be assigned Identification at the conceptual-intentional interface (Chomsky, 1995, 2001) and thus may or may not need to be valued during interpretation. For instance, nominal φ features such as [gender] and [number] will receive Identification if they are properties of a noun, as they are meaningful at the conceptual-intentional interface (White et al., 2004). Those same features realized on the adjective will remain uninterpretable since within this lexical category they are purely relational grammatical features and will thus need to be checked and erased. This view is somewhat in line with Carstens (2000, 2001), who assumes that [gender] and [number] do not project independently of each other and are interpretable on the noun but uninterpretable on its modifiers. One easy prediction for any subsequent language acquisition would be that the interpretable features, i.e., those on the noun, will be acquired before the uninterpretable, i.e., those on the agreeing modifiers such as adjectives and determiners.Footnote 2

A more refined feature distinction from Van de Craats et al. (2000) suggests that the internal, or structural, features are an inseparable part of the lexical-conceptual structure. Of the features of interest to our study, [gender] on the noun is arbitrary, unpredictable, and idiosyncratic in the lexical-conceptual structure. This structural feature does not contribute new information to the concept denoted by the noun but is rather an agreement feature that is overtly marked on both the noun and its modifiers. By contrast, the external (semantic) features, such as [number], are those that can be predicted from other properties of the lexical entry. They (i) bring at least some semantic content to their carrier, and (ii) are ultimately morphologically more uniform and typically present in much more regularized paradigms when realized overtly at Spell-Out. Following the structural-semantic feature distinction, differential acquisition between [gender] and [number] would be expected both on the noun and on the agreeing modifiers. Such a view is compatible with Bernstein (1997) and Carminati (2005), who propose that [gender] and [number] have their own projections, as well as with Van de Craats et al. (2000) in their prediction that the two features might be acquired sequentially rather than simultaneously. If we take this approach, [gender], being a structural feature built into the lexical entry and proposed to be lower in the syntactic structure, should be acquired prior to [number], which is a semantic feature positioned higher in the syntactic tree.

Individual realizations of the [gender] feature are also perceived differently in the research community (Fuchs et al., 2015). For example, Alarcón (2011) takes the stance that both feminine and masculine are specified in the lexicon while Fuchs et al. (2015) suggest that masculine is default and unspecified but feminine needs to be specified. The latter view is compatible with proposals explaining the lack of agreement in production as a surface form deficit rather than an underlying lack of representation of the grammatical structure (McCarthy, 2008; Prévost & White, 2000).

Our study assumes two points related to these features: (i) [gender] and [number] have different structural versus semantic compositionality in the lexicon, and (ii) [gender] and [number] agreement on the nominal modifiers is a result of morpho-syntactic derivation.

The effect of proficiency on transfer in L3A

Proficiency may play a role in L3A irrespective of transfer source as it may interact with order of acquisition or typology in determining the initial-state L3 grammar. Some scholars suggest that learners must surpass a minimum proficiency, or threshold, in a previously learned additional language in order for beneficial transfer to occur (e.g., Jaensch, 2009). Below such a proficiency threshold, learners apply the rules they presumably have from the L1. Following the Threshold Hypothesis (Cummins, 1976), Jaensch (2009, p. 116) suggested that L3 learners who are weak in the L2 will not exhibit the benefits of bilingualism in L3A until they have advanced to the “uppermost” proficiency level in the L2. Rothman (2015) found a similar effect of L2 proficiency on transfer. From a different perspective, Cenoz and Valencia (1994), Lasagabaster (2000), and Sanz (2000) have also suggested that, in cases where neither L1 nor L2 share relevant features with L3, proficiency in the previously acquired languages is the definitive factor in L3A. In our study, we evaluate proficiency both as realization of the specific features and as generalized proficiency in the L2.

L1, L2, and L3 of this study

To achieve this study’s stated aim, we tested native speakers of L1 English with L2 Spanish learning L3 Portuguese. Unlike English (L1), both Spanish (L2) and Portuguese (L3) have overt realizations of both [gender] and [number] on the head N, as well as via concord on its modifiers (adjectives, determiners, quantifiers). Both languages have two [genders] (masculine and feminine) and two [numbers] (singular and plural). In both languages, in the canonical case, masculine (Masc) is marked by –o and feminine (Fem) by –a.Footnote 3 The default [number] singular (Sg) is unmarked and plural (Pl) is marked by the canonical suffix -s attached after the [gender] marker in both languages. The combination of both features in each language, yielding the 4 conditions, is presented in Table 1 below. By contrast, English utilizes the grammatical semantic feature [number] but not the structural feature [gender] on the noun.Footnote 4 In addition, in English [number] only has an interpretable realization on the noun and is not present as an uninterpretable feature on its modifiers.

Table 1. [Gender] and [Number] features in English, Spanish, and Portuguese

Research questions and predictions

Transfer, either wholesale or of only appropriate features, and proficiency have both been claimed to influence L3A individually, as well as through interaction with each other. The goal of the present study is to investigate the effect of L2 proficiency on transfer of individual grammatical and semantic features into L3.

Transfer of structural vs. semantic features

All three languages in our study have the semantically interpretable feature [number], but only the L2 and L3 have the structural feature [gender]. Because, in terms of positive cross-language influence, [number] can be sourced from both L1 and L2 while [gender] can only be sourced from L2, we can tease apart the influence of feature type (structural [gender] vs. semantic [number]) in comprehension and production tasks. The structural feature [gender] is part of the lexical-conceptual structure of the noun, and an integral and constant part of the lexical item, while the semantic feature [number] first needs to be mapped on the noun during the computation and then needs to be transferred to the agreeing elements in the noun phrase as a second step. If transfer is reliant on quantity and availability of features, we predict we will see more accurate performance on the semantic feature [number]. However, if transfer is based on feature typology, we predict that grammatical concord of the structural feature [gender] between the noun and its agreeing modifiers will be easier to acquire than grammatical concord of the semantic feature [number].Footnote 5

The effects of proficiency on L3A

One of the main factors affecting subsequent language acquisition is proficiency in previously acquired languages. Thus, in our study we also aim to measure the effect of varying L2 proficiency on the acquisition of the selected features. If the Threshold Hypothesis and its implications for L3A are on the right track, we expect, in general, that learners with greater proficiency in a typologically similar L2 should outperform L3 learners with lower L2 proficiency. We note that proficiency can be operationalized as general exposure to or as achievement in L2 (Jaensch, 2009). In our study, general exposure is measured by the length of Spanish study and by a Spanish cloze test. We expect that longer length of Spanish study and/or higher Spanish cloze test scores will correlate with higher overall performance on Portuguese tasks. We also operationalized achievement as performance in the Spanish tasks on equivalent grammatical features. We expected to observe a correlation between accuracy on specific features in Spanish and accuracy on corresponding features in Portuguese.

Methods

Participants

Thirty-five university students enrolled in a Portuguese course completed the study. Participants self-reported data about their mother tongue, all previously learned languages and their experience acquiring Portuguese in a background questionnaire. Additional questions related to previously acquired languages included self-reported L2 proficiency level(s), number of semesters they have spent learning those languages (e.g., Spanish study length), and general habits in using those languages in their daily life. A final set of questions asked which Portuguese course they were currently taking, as well as their gender identification, age, and experience living in another country (i.e., extended stays over 3 months, such as study abroad or family relocation to another country).

Of these 35 participants, 2 were excluded because they had a first language other than English, 4 were excluded because they did not complete tasks in Spanish, and 4 were excluded because they did not complete the Spanish cloze test. The data from twenty-five participants (16 female- and 9 male-identifying; 19–22 y.o., M = 21) were included in statistical analyses. Table 2 below provides a summary of the background information used in analyses.

Table 2. Participants’ backgrounds, Spanish cloze test, and Portuguese vocabulary test results

Five participants reported studying another language with the [gender] feature.Footnote 6 Four reported having studied French and one reported Latin; the rest of the participants had not studied a language with this feature. Eleven of the final twenty-five participants were enrolled in a first-semester Accelerated Portuguese for Speakers of Spanish course and were tested towards the end of the semester; another eleven were enrolled in second-semester Portuguese and were tested at the beginning of the semester; the other three were enrolled in intermediate-level Portuguese courses. One participant reported living abroad in a country where Spanish was spoken. Three of the 25 participants reported living abroad in a location where either Spanish or Portuguese is spoken.

Cloze tests

We used a cloze test (modeled after Slabakova, 2000, and Tremblay and Garrison, 2010) to obtain an objective measure of participants’ overall L2 proficiency in Spanish. This type of proficiency measure has been used since the 1950s as both a placement test in foreign language classes and a proficiency test in second language research due to its availability and ease of use (Tremblay, 2011). More recently, a version of the test has also been used as a more reliable measure of the efficacy of machine translations (Fedus et al., 2018; Forcada et al., 2018). As pointed out in both Slabakova (2000) and Tremblay (2011), multiple articles have shown a correlation between cloze test results and standardized ESL tests, thus validating the use of the tool as a proficiency measure.

In our study, we used an original article published in a popular Spanish-language magazine with a total length of 387 words. The first 50 words were left intact; after that every 9th word was omitted and represented by a blank, creating a total of 35 blanks, 20 of which fell on content words and 15 on function words.Footnote 7 The cloze test was presented online through Qualtrics survey software. Each deleted word was presented in its original position as a fillable blank. Participants were instructed to type in the words that they felt were appropriate. Following the procedures in Tremblay and Garrison (2010), responses were considered correct and received 1 point if they matched the omitted word or used an appropriate synonym, and incorrect if they provided a different word that was inappropriate for the context. To reduce possible correlation between cloze test results and experimental results, neither spelling nor [gender]/[number] errors were counted as inaccuracies. Participants scored between 0 and 68% on the cloze test (M = 21%; see Table 2), consistent with prior literature. For reference, two native speakers of Spanish completed the cloze test with an accuracy rate of 87%.Footnote 8
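The deletion and scoring procedure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `make_cloze` and `score_cloze`, the blank string, and the `synonyms` mapping are hypothetical, and the sketch does not reproduce the study's materials or its exact blank count. It also leaves to a human rater the judgments the study describes (accepting synonyms, and ignoring spelling and [gender]/[number] errors).

```python
def make_cloze(text, intact=50, step=9, blank="_____"):
    """Keep the first `intact` words, then blank out every `step`-th word.

    Words are split on whitespace, so punctuation attached to a word is
    removed along with it (a simplification of real test construction).
    """
    words = text.split()
    cloze_words, answers = [], []
    for i, word in enumerate(words):
        position = i - intact + 1  # 1-based position after the intact lead-in
        if position > 0 and position % step == 0:
            answers.append(word)
            cloze_words.append(blank)
        else:
            cloze_words.append(word)
    return " ".join(cloze_words), answers


def score_cloze(responses, answers, synonyms=None):
    """Return the proportion of blanks answered acceptably.

    `synonyms` is a hypothetical mapping from blank index to extra accepted
    words, standing in for the rater's judgment of appropriate synonyms.
    """
    synonyms = synonyms or {}
    correct = 0
    for idx, (response, answer) in enumerate(zip(responses, answers)):
        accepted = {answer.lower()} | {s.lower() for s in synonyms.get(idx, ())}
        if response.strip().lower() in accepted:
            correct += 1
    return correct / len(answers)
```

A score from `score_cloze` is a proportion between 0 and 1, which corresponds to the percentage scale on which the cloze results are reported.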

Vocabulary test

To ensure their familiarity with the target vocabulary in L3 Portuguese, participants were presented with a multiple-choice vocabulary test that included a total of 86 lexical items, 58 of which were the target nouns used in the GJT and EPT tasks. For each word, participants saw 3 possible translations, only one of which was a correct match. If they chose the correct translation they received 1 point, otherwise they received 0 points. The mean correct response rate was 93% (range 78–100%).Footnote 9

Materials and procedures

For this study, participants completed a comprehension and a production task. All participants first completed the tasks in L3 Portuguese. Once they completed the main tasks in the L3, they were presented with the Portuguese vocabulary task, which gauged their knowledge of the target nouns and adjectives. They then switched to the main tasks in L2 Spanish, followed by a cloze test in Spanish to gauge their overall Spanish proficiency, and finally the background questionnaire, which was always presented at the end of the study. Participants were instructed to answer all questions in the tasks as quickly as possible; we did not restrict the time spent on each trial as they were at the very early stages of L3 acquisition, and this would have added unnecessary stress. For this reason, although we collected reaction time data, they were not included in the analyses.Footnote 10 By order of appearance, participants were pseudo-randomly assigned to one of two groups based on whether they received the comprehension tasks or the production tasks first in each language.

A GJT aimed to gauge participants’ comprehension of the two features at hand, and an Elicited Production Task was used to determine their ability to produce those features. Both tasks utilized the same language-specific pools of concrete inanimate nouns and descriptive adjectives with canonical [gender] (-a/-o) and [number] (Ø-s) agreement. Animate nouns were used in some of the filler trials. All nouns, adjectives, and verbs in both tasks were taken from the first-year curriculum specific to each language to ensure that participants were familiar with the lexical items. No nouns were repeated within tasks, although nouns were repeated between tasks.Footnote 11 Of the 44 nouns used in both languages, 40 had a matching [gender], and 4 (chalkboard, desert, pen, and dance) had opposite [genders] between the two languages. Adjectives were repeated both within and between tasks due to the insufficient number of adjectives introduced in the first-year curriculum and the higher variability in agreement patterns.

The experimental tasks were presented using E-Prime 2.10. Participants’ production was digitally recorded using Audacity software.Footnote 12 Responses on all tasks were recorded for both accuracy and reaction times from the beginning of the trial presentation to the moment of keystroke response. Trials within each task were randomized for each participant. Participants were given 4 practice items at the start of every task to ensure that they understood the procedure and were comprehending and producing the intended language. The instructions for all tasks were in English to avoid misunderstanding due to low proficiency level as well as to prevent participants from staying within a particular language or language mode.

The Grammaticality Judgment Task (GJT)

Participants’ comprehension of the features under investigation was gauged in each language using a GJT with 24 target (grammatical and ungrammatical) and 24 filler sentences. The 24 target trials were constructed in a 2 (Masc, Fem) × 2 (Sg, Pl) × 2 (grammatical, ungrammatical) design. Target sentences always contained canonical adjectives and inanimate nouns. Ungrammatical trials contained a non-agreeing adjective. Agreement violations were counterbalanced for mismatching [gender] and [number]. In the Portuguese example (1) below the MascPl noun, ovos (“eggs”), is modified by the MascSg adjective, mexido (“scrambled”). (See Appendix A for further examples.)
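To make the factorial structure concrete, the following Python sketch enumerates the cells of the 2 × 2 × 2 target design. It assumes that the 24 target trials are distributed evenly across the 8 cells (3 per cell), which follows arithmetically from the counts above only if the design was fully balanced; the function name `gjt_target_design` is hypothetical.

```python
from itertools import product

GENDERS = ("Masc", "Fem")
NUMBERS = ("Sg", "Pl")
GRAMMATICALITY = ("grammatical", "ungrammatical")


def gjt_target_design(items_per_cell=3):
    """List (gender, number, grammaticality, item index) for every target trial.

    items_per_cell=3 is an assumption: 24 target trials / 8 design cells.
    """
    cells = product(GENDERS, NUMBERS, GRAMMATICALITY)
    return [
        (gender, number, status, k)
        for gender, number, status in cells
        for k in range(1, items_per_cell + 1)
    ]


if __name__ == "__main__":
    for trial in gjt_target_design():
        print(trial)
```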

The 24 filler trials were also counterbalanced for grammaticality. In Portuguese, 12 filler sentences resembled the target sentences, counterbalanced for [gender], [number], and grammaticality, but contained animate nouns. The other 12 fillers contained (un)grammatical subject-verb agreement. In Spanish, grammatical filler sentences included noncanonical adjectives, while the ungrammatical sentences manipulated grammaticality of the noun (singular nouns had ungrammatical [gender], while plural nouns had ungrammatical [number]).Footnote 13

Participants were required to indicate their response by pressing the J button on the keyboard for a grammatical sentence and the F button for an ungrammatical sentence. Results from participants’ accuracy on the target questions were analyzed; filler questions were not considered in the analyses.

The elicited production task (EPT)

In the general case, the ability to produce a non-native language follows the ability to understand it. However, both abilities give us a measure of the overall learner development and as such both should be studied. To gauge participants’ production of the features under investigation, we used an Elicited Production Task (EPT) that presented a question in the target language which contained the target noun, alongside two images. Images depicted the target noun as an object with opposing modifications, e.g., a black calculator and a blue calculator. The questions asked participants to choose between the two options, e.g., Que calculadora está em cima do caderno? (“Which calculator is on a notebook?”), as in Figure 1 below. Thus, we presented the features of interest on the head noun and expected participants to name the modification and by doing so transfer the same features to the adjective. Participants were instructed that they could answer with as little as an NP (e.g., la calculadora preta “the black calculator” or simply la preta “the black (one)”) or as much as a sentence. Participants were instructed to produce a response for each question. They could say “I don’t know” or “pass” (in any language) to skip items for which they could not produce a response. Such responses were coded as incorrect (5.14% in the Portuguese and 6.98% in the Spanish EPT).

Figure 1. Example of EPT target and filler trials. English translations were not presented on the screen.

Each version of the EPT contained 48 questions. The target trials in the EPT were constructed in a 2 (masculine, feminine) × 2 (singular, plural) design. In both languages, there were 24 filler trials which asked participants to state a number, direction (i.e., left or right), or action, e.g., Quantas calculadoras estão sobre a mesa? (“How many calculators are on the table?”). In Portuguese, participants were presented with 12 additional target trials containing animate nouns (mirroring the 12 inanimate target trials). However, animacy in gendered languages like Spanish and Portuguese can introduce an inadvertent bias. Animate nouns in these languages typically display a grammatical gender that matches the biological gender. In turn, this brings a semantic component to the otherwise structural feature. To avoid such overlap in the features under study, we included in the analyses for the present study only the 12 inanimate target trials in Portuguese and the 24 trials in Spanish.Footnote 14

Trained graduate students, fluent in the target language (Spanish or Portuguese), transcribed and coded the responses on the target NP(s). Each element produced in the target NP (determiner, adjective, noun) was coded for the [gender] and [number] features uttered separately. Their scores were then compared and where there were disagreements the researchers worked with the coders to resolve them. Overall agreement scores were then computed based on match/mismatch between the produced features on these elements (see (2) below). A full report of participants’ coded results is available in the Supplementary Materials. For the purposes of this report, we computed a score showing an “attempt” at correct production.Footnote 15 A number of theoretical approaches are based on the premise that learners’ production is imperfect and, in most cases, is a poor representation of learners’ grammatical knowledge. For example, the Missing Surface Inflection Hypothesis, Prévost and White (Reference Prévost and White2000), and the Bottleneck Hypothesis, Slabakova (Reference Slabakova2009) both lead to the conclusion that although the interlanguage grammar can handle the syntactic structure of the target language, it fails short in mapping the correct morphological form on the intended syntactic structure in production. Thus, we also computed scores which accounted for attempts at producing the target agreement. We took into account the fact that both L2 and L3 allow for grammatical responses which lack either the head noun (2b) or the modifying adjective (2c), though neither L2 nor L3 allow for grammatical noun phrases without a determiner (2d–e). However, we also note that examples such as (2d–e) are syntactically ungrammatical but show an attempt at producing the correct affixes for morphological agreement. 
To ensure that we account for all forms of correct grammatical processing in production, we also considered the non-target-like forms which were still marked for the intended gender and number of the NP (see (2d–e)).
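The contrast between a full-agreement score and an “attempt” score can be illustrated with a minimal sketch. It does not reproduce the coding scheme used in the study (which is given in (2) and the Supplementary Materials). The function name `score_np`, the slot labels, and the rule that an attempt means "every produced element carries the target features" are all assumptions made for illustration.

```python
from typing import Dict, Optional, Tuple

# Hypothetical representation: each NP slot holds (gender, number) or None if dropped.
Features = Tuple[str, str]


def score_np(produced: Dict[str, Optional[Features]], target: Features) -> Dict[str, int]:
    """Score one response NP under two illustrative (assumed) criteria.

    full    : determiner, noun, and adjective are all produced and all match the target.
    attempt : at least one element is produced and every produced element matches.
    """
    present = {slot: feats for slot, feats in produced.items() if feats is not None}
    all_match = bool(present) and all(feats == target for feats in present.values())
    full = int(all_match and set(present) == {"det", "noun", "adj"})
    return {"full": full, "attempt": int(all_match)}
```

Under these assumed rules, a response that drops the determiner but marks the noun and adjective as feminine plural for a feminine plural target would receive `full = 0` and `attempt = 1`.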

Analyses of task accuracy

Our goal was to understand the relationship between L2 proficiency and acquisition of shared features in the L3. We conducted two steps of analyses to determine which factors to interpret: first, analyses of multicollinearity and second, stepwise binary logistic mixed effects regressions. To interpret effects of individual factors that survived these analyses, we used Odds Ratios (OR).

The factors that we considered for analyses included the following within-subject effects: (i) [gender] (Masc & Fem), (ii) [number] (Sg & Pl), and (iii) grammaticality (grammatical & ungrammatical, for GJT items only); along with the following between-subject effects: (iv) whether the participant had lived in a Spanish- or Portuguese-speaking country (i.e., “lived abroad”), (v) whether the participant had acquired another gendered L2 (i.e., “other L2”), (vi) their self-reported length of Spanish study (i.e., “SP study length”), (vii) their Spanish cloze test scores, (viii) their Portuguese course, (ix) their Portuguese vocabulary test scores, and (x) their performance on equivalent items in the Spanish task (i.e., “equivalent SP performance”). Further, the following fixed and random effects were noted: (a) participant, (b) which task was presented first, (c) age, (d) participant gender, and (e) item.

Participants’ performance on equivalent items in the Spanish task, or equivalent Spanish performance, was operationalized as a proportion of items correct on the same category of question between the Spanish and Portuguese tasks. Item categories were by task, grammatical gender, grammatical number, and grammaticality (for GJT items only). Equivalent performance was operationalized as a proportion because there was no direct equivalence in materials item-by-item and because there was variation in total number of items by category between languages. The proportion correct of the Spanish items was repeated for each Portuguese item as a within-subject factor, ranging from 0 to 1.
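The operationalization above can be sketched in a few lines of Python. This is an illustration only, not the authors' script: the trial dictionaries, their field names (`participant`, `task`, `gender`, `number`, `grammaticality`, `correct`), and the function names are assumptions.

```python
from collections import defaultdict


def category_key(trial):
    """Category of a trial: participant, task, gender, number, and (GJT only) grammaticality."""
    return (
        trial["participant"],
        trial["task"],
        trial["gender"],
        trial["number"],
        trial.get("grammaticality"),  # None for EPT trials
    )


def spanish_proportions(spanish_trials):
    """Proportion of Spanish items correct per participant and item category."""
    counts = defaultdict(lambda: [0, 0])  # key -> [number correct, number of items]
    for trial in spanish_trials:
        tally = counts[category_key(trial)]
        tally[0] += trial["correct"]
        tally[1] += 1
    return {key: correct / total for key, (correct, total) in counts.items()}


def add_equivalent_sp(portuguese_trials, proportions):
    """Repeat the matching Spanish proportion (0 to 1) on every Portuguese item."""
    return [
        {**trial, "equivalent_sp": proportions.get(category_key(trial))}
        for trial in portuguese_trials
    ]
```

Because the proportion is keyed by category rather than by item, the number of Spanish and Portuguese items per category does not need to match, which is the motivation given in the text.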

Since our factors of interest were not assumed to be orthogonal, we conducted a multicollinearity analysis in MATLAB using variance inflation factors (VIF) (Tomaschek et al., Reference Tomaschek, Hendrix and Baayen2018). Though there is no standard VIF value to indicate multicollinearity, values above 5 have been cited as too high (Sheather, Reference Sheather2009). Spanish cloze test scores and Portuguese course were both above 5, while Spanish study length was the only other factor with a VIF above 2. The simplest solution to resolve collinearity is to remove factors with strong theoretical relationships. Both Spanish cloze test scores and Portuguese course reflect information portrayed through other factors. Spanish cloze test scores, as a proxy of Spanish proficiency, theoretically relate strongly both to length of Spanish study and to equivalent Spanish performance. Portuguese course has strong theoretical relation to Portuguese vocabulary test scores, as well as to all factors representing proxies of Spanish proficiency, since some participants were enrolled in an accelerated introductory Portuguese course for speakers of Spanish. Therefore, VIF was recalculated with Spanish cloze test scores and Portuguese course removed; the resulting VIF scores were all below 2, indicating that this fix resolved potential multicollinearity confounds.
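For readers unfamiliar with the measure, the VIF of predictor j is 1 / (1 − R²_j), where R²_j comes from regressing predictor j on the remaining predictors. Equivalently, it is the j-th diagonal entry of the inverse of the predictors' correlation matrix. The sketch below computes it that way in plain Python. It is not the MATLAB code used in the study, and the function names are assumptions.

```python
import math


def correlation_matrix(columns):
    """Pearson correlation matrix for a list of equally long predictor columns."""
    n = len(columns[0])
    centered = []
    for col in columns:
        mean = sum(col) / n
        centered.append([x - mean for x in col])
    norms = [math.sqrt(sum(x * x for x in col)) for col in centered]
    k = len(columns)
    return [
        [sum(a * b for a, b in zip(centered[i], centered[j])) / (norms[i] * norms[j])
         for j in range(k)]
        for i in range(k)
    ]


def invert(matrix):
    """Gauss-Jordan inversion with partial pivoting."""
    k = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(k)]
           for i, row in enumerate(matrix)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("predictors are perfectly collinear")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(k):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [row[k:] for row in aug]


def vif(columns):
    """Variance inflation factor for each predictor column."""
    inverse = invert(correlation_matrix(columns))
    return [inverse[i][i] for i in range(len(columns))]
```

Values returned by `vif` could then be compared against the cut-off of 5 cited in the text, dropping theoretically redundant factors and recomputing until all values fall below it.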

To select the best model (i.e., determine which factors should be interpreted), stepwise binary logistic mixed effects regressions were run in MATLAB using the fitglme function (The Mathworks Inc., 2024a, 2024b). AIC and BIC were used for model comparison; smaller AIC/BIC indicate a better fitting model (Vrieze, Reference Vrieze2012).Footnote 16 Each task was modeled separately. Each model analysis was conducted first with the enter method to identify the effect of all factors in combination and then with the backward method to identify the best stepwise combination of factors. Factors were removed one at a time, until only significant ones remained. After removal of multicollinear factors, we examined the following factors: [gender], [number], [grammaticality], lived abroad, other L2, length of Spanish study, Portuguese course, and equivalent Spanish performance.Footnote 17 Odds ratios were then computed for the remaining factors. OR are relative measures of how likely a factor is to affect the outcome in a particular direction (i.e., correct or incorrect responses) (Cummings, Reference Cummings2009). A higher OR is associated with a higher likelihood of more accurate performance for a given factor. OR can be interpreted similarly to effect sizes, with OR = 1.68, 3.47, and 6.71 approximately equivalent to Cohen’s d = 0.2 (small), 0.5 (medium), and 0.8 (large), respectively (Chen et al., Reference Chen, Cohen and Chen2010).
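The quantities named in this paragraph follow standard formulas, sketched below in Python for illustration. The odds ratio is the exponentiated logistic coefficient, with a Wald 95% interval. AIC and BIC are computed from the log-likelihood in their textbook form, and the effect-size labels use the benchmarks quoted above. The function names are assumptions, as is the choice to invert odds ratios below 1 before labeling them. MATLAB's fitglme may count parameters differently when it reports AIC/BIC.

```python
import math


def odds_ratio_ci(beta, se, z=1.96):
    """Odds ratio and Wald confidence interval from a logistic coefficient and its SE."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


def effect_label(odds_ratio):
    """Label an odds ratio using the benchmarks quoted in the text (1.68, 3.47, 6.71).

    Assumption: odds ratios below 1 are inverted so that strength is
    judged independently of direction.
    """
    ratio = odds_ratio if odds_ratio >= 1 else 1 / odds_ratio
    if ratio >= 6.71:
        return "large"
    if ratio >= 3.47:
        return "medium"
    if ratio >= 1.68:
        return "small"
    return "below small"


def aic(log_likelihood, n_params):
    """Akaike information criterion: smaller values indicate a better-fitting model."""
    return 2 * n_params - 2 * log_likelihood


def bic(log_likelihood, n_params, n_obs):
    """Bayesian information criterion: penalizes parameters more heavily as n grows."""
    return n_params * math.log(n_obs) - 2 * log_likelihood
```

In a backward selection of the kind described, one would refit the model after dropping a candidate factor and keep the reduced model when its AIC/BIC is smaller.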

Results

Overall accuracy

Table 3 below presents the descriptive statistics of learners’ accuracy in Portuguese and Spanish in both tasks. Overall accuracy in Spanish was higher than in Portuguese in the comprehension task but similar in the production task (M = 68% in Portuguese, M = 62% in Spanish). In the comprehension task, we found stronger performance on the feminine over masculine trials (M = 81.5% and M = 65.5% respectively) and on plural over singular items (M = 77% and M = 70% respectively) in Portuguese. No difference was observed between the two genders and the two numbers in Spanish.

Table 3. Descriptive statistics for both tasks

In the production task, we found above-chance performance (Vainikka & Young-Scholten, Reference Vainikka and Young-Scholten1996). In Portuguese, participants performed better on feminine than on masculine items (respectively, M = 34% vs. 25.5%) and roughly the same on singular and plural items (respectively, M = 29.5% vs. 30%).

Finally, we look at the elements of the noun phrase that have been dropped and levels of agreement between determiners, nouns, and adjectives (Table 4). Several observations can be made here. First, while learners do not use the noun drop feature of L2 and L3 (see (2b) above), they are using this grammatical option more in L3 Portuguese than in L2 Spanish. Second, dropping of the determiner (as in (2d) above) and the adjective (as in (2c) above) shows a reverse pattern between L2 and L3 with more adjectives dropped in Portuguese but more determiners dropped in Spanish. Finally, when producing agreement, learners are correct mostly when matching the features between D and N in L3 but between N and A in L2.

Table 4. Dropped elements and surface agreement in the EPT task

We use the following abbreviations: Det = determiner, N = noun, Adj = adjective.

Indicators of accuracy in comprehension model

The GJT model considered when participants produced correct responses in the GJT task. Table 5 summarizes the results of the GJT model regression analyses. Based upon model selection, we determined that [gender], grammaticality, and Portuguese vocabulary test scores affected accuracy (see Figure 2a–c for a visualization of OR), while [number], SP study length, and equivalent SP performance were not associated with accuracy in the GJT. Feminine [gender] was associated with a higher accuracy (i.e., likelihood of agreement across NPs in the GJT), whereas Masculine [gender] was associated with neither higher nor lower accuracy (OR = 1.7, 95% CI [1.2, 2.4]). Grammatical questions were associated with higher accuracy, whereas ungrammatical questions were associated with lower accuracy (i.e., a higher likelihood of incorrect responses) (OR = 0.30, 95% CI [0.2, 0.4]). Unsurprisingly, increasing Portuguese vocabulary test scores were also associated with increased accuracy, with a score of ∼84% associated with primarily correct responses (OR = 130.1, 95% CI [6.0, 2820.3]).

Table 5. GJT model results. Results of binary logistic regression analyses with correct response as dependent variable

Figure 2. (a) GJT Results: [Gender]. Circle size is proportional to the responses it represents. (b) GJT Results: Grammaticality. Circle size is proportional to the responses it represents. (c) GJT Results: Portuguese vocabulary test scores. Circle size is proportional to the responses it represents.

Accuracy in production model

For the production task, we considered the production agreement as discussed and operationalized in (2) above. Table 6 summarizes the results of the EPT model regression analyses. Based upon model selection, we determined that [gender], higher scores on equivalent SP performance, and Portuguese vocabulary test scores affected accuracy (see Figure 3a–c for a visualization of OR), while [number] and SP study length were not associated with accuracy in the EPT. Similar to the results in the GJT, Feminine [gender] was associated with higher accuracy, whereas Masculine [gender] was associated with neither higher nor lower accuracy (OR = 1.54, 95% CI [1.1, 2.1]). Higher scores on equivalent SP performance were also associated with higher scores in the EPT, with a score of ∼25% associated with primarily correct responses (OR = 1.55, 95% CI [1.4, 1.7]). Finally, increasing Portuguese vocabulary test scores were again associated with increasing accuracy, with a score of ∼87.5% associated with primarily correct responses (OR = 92.3, 95% CI [5.9, 1446.5]). We also considered a model of full agreement reflected by target-like agreement across the response NP (see Supplementary Materials for the summary of these results, including Table 7 for the EPT model regression analyses and Figure 4a–c for a visualization of OR).

Table 6. EPT model results. Results of backward stepwise binary logistic regression analyses with correct response as dependent variable

Figure 3. (a) ANY EPT results: [Gender]. Circle size is proportional to the responses it represents. Dashed lines represent 95% confidence intervals. (b) ANY EPT results: Equivalent Spanish performance. Circle size is proportional to the responses it represents. Dashed lines represent 95% confidence intervals. (c) ANY EPT results: Portuguese vocabulary test scores. Circle size is proportional to the responses it represents. Dashed lines represent 95% confidence intervals.

Discussion and implications from the study

Our primary goal was to test whether L3 learners of Portuguese would transfer featural knowledge of the structural [gender] and/or semantic [number] features from their L1 and/or L2, English and Spanish respectively. We operationalized transfer from L2 as equivalent performance between the two languages on the grammatical features of interest. We explicitly tested performance on these features in both their L2 Spanish and L3 Portuguese through a comprehension task (the GJT) and a production task (the EPT) and inferred complete acquisition of L1 language features. We claimed that the structural feature [gender] requires a one-step computation to spread from the noun to its modifiers while the semantic feature [number] requires a two-step computation, first as a mapping on the noun and then as a spread to the nominal modifiers. We predicted that because of the ease of computation the structural feature will be acquired first. A review of the results shows limited support for this prediction at the initial stages of L3A. The Odds Ratios we obtained for the predictive factors were in the range between 1.4 and 2, which is roughly equivalent to Cohen’s d of 0.2, indicating a small effect size. We note that, with our study design, achieving even moderate effects would require a much larger participant pool or a much larger number of trials. Our design was based on the premise expressed in many subsequent language acquisition research models that the acquisition of a grammatical property in one language will be correlated with its acquisition in another. Thus, to assert acquisition of the features in L2 as well as in L3, we needed to test them in both languages using comparable tests. To avoid testing fatigue, we made the decision to limit our trials, which limited the power of our results. This calls for a cautious interpretation of the results, mostly as tendencies, until more studies are conducted collecting data from more languages, as well as additional research with a less ambitious design that can detect smaller effects and with a wider variety of L1-L2-L3 combinations.

Results showed that learners performed above chance overall and on grammatical items in the GJT with comparable results between L2 and L3 in terms of tendencies and percent correct answers (Table 3). In the EPT, participants performed above chance when considering any form of feature marking on the agreeing elements (Det-N-Adj, Det-N, Det-Adj, N-Adj, or simply Adj). We interpret this as partial acquisition of the features at hand. These results suggest that participants comprehend and produce to some extent the grammatical features [gender] and [number] in both L2 and L3, but they also point to a discrepancy between the acquisition of the two features. Recall that the regression models for both the GJT and the EPT showed Portuguese vocabulary test scores and [gender], but not [number], as predictive factors of L3 acquisition (i.e., they had an interpretable effect on accuracy). While increasing lexical proficiency is expected to be reflected in better performance in grammatical acquisition, the question remains why its co-factor for correct performance is only one of the grammatical features but not both.

We take our results to suggest that there is partial support for our prediction that the structural features will be acquired before the semantic features although the latter are evident in both L1 and L2. Overall results (Table 3) demonstrated that while participants’ comprehension of the Masc and Fem and Sg and Pl was different in their L3, it was indistinguishable in their L2. This may be due to the novelty of the third language (most of our participants were tested at the end of first semester/beginning of second semester of Portuguese), but we believe it is more likely to be due to unequal transfer from the second language, suggesting that we can have differential transfer not only of features but also of feature realizations (e.g., Masc vs Fem or Sg vs Pl). This interpretation of the results must be further investigated.

One issue necessary to address in this study was the difference we found between participants’ accuracy results in comprehension versus production. One possible theoretical assumption is that comprehension and production of the same grammatical elements are two sides of the same coin and that the two modes are intrinsically related to each other (Smolensky, Reference Smolensky1996). It is logical to assume then that once learners can comprehend a grammatical feature, they would be ready to start producing it. Thus, we should see a similarity in patterns between the two modes. Furthermore, with the features being introduced to learners in the L2, we should expect that they would show a progression along the lines of accurate Spanish comprehension, followed by either accurate Spanish production or Portuguese comprehension before we observe accurate Portuguese production. However, we did not observe such a pattern. While our results confirmed the expectation that in general comprehension accuracy will precede production accuracy, we did not observe uniform behavior when comparing the two languages. In comprehension, the accuracy in Spanish L2 was higher than in the Portuguese L3 while in production the features were delivered more accurately in L3 than in L2 (Figure 2a–b). Furthermore, participants’ accuracy on the individual features was not consistent across tasks with comprehension of Fem and Pl being higher than those of Masc and Sg but production of the latter being higher than production of the former.

One possibility for the observed results may lie in the interaction between the differential demands of the active mode of communication (comprehension versus production) and the relative saliency of the two features in each mode. Beyond the initial observation that production is generally more demanding than comprehension, there is also the observation that among the four realizations of the features at hand (Masc, Fem, Sg, and Pl), the Fem and Pl are the marked, and thus more salient, ones for the language learner from an English-speaking background. [Gender] is a novel feature for such a learner but within that feature, Masc is arguably the default and more frequently used [gender] in both L2 and L3, leaving Fem in a prominent position. Parallel to that, Pl is morphologically marked in all three languages and arguably more prominent than the unmarked Sg. When the demands of the comprehension task interact with the specific realizations of the two features, it is the more salient ones that trigger the most attention and thus yield the most correct responses (Table 3). On the other hand, the demands of the production task push the learners to take the shortest route in an attempt to produce the required structure. This shortest route also entails the most familiar of the features, the unmarked Sg and the default Masc in both L2 and L3 when we consider the EPT agreement results.

Another possibility is that asymmetric performance between comprehension and production should be expected. Some computational models of syntactic processing posit that brain regions associated with syntax support comprehension versus production differentially (Matchin & Hickok, Reference Matchin and Hickok2020). While data from healthy, young adults is often at ceiling in task accuracy (in native languages), data from aphasia patients paints a different picture. Production requires a far greater processing demand, even when studies control for working memory and/or phonological processing demands (Matchin et al., Reference Matchin, Basilakos, Stark, den Ouden, Fridriksson and Hickok2020).

We also need to note the differential performance between languages. Participants perform as expected in their L2, showing better performance in comprehension than production, while in L3 Portuguese they perform slightly better on production than comprehension, contra our predictions above, but retaining the feature saliency as predicted above. The retention of the predicted markedness of the features speaks to the idea that it is not the features but rather the modes that trigger the discrepancy. One possible explanation for our findings is the emphasis on production practice in instructed language acquisition. Knowledge gained from instructed/formal acquisition environments, at least at the early stages of acquisition, is explicit and declarative. Students are typically more concentrated on producing the correct language than on comprehending beyond picking up the gist of what they hear. Some models of second language acquisition suggest that at the initial stages of acquisition comprehension is purely semantic (e.g., VanPatten (Reference VanPatten1993)). Others go as far as suggesting that learners’ processing at any stage of development remains shallow and does not involve native-like syntactic derivations. Clearly, more work is needed to explore the interaction of acquisition environment and the mode in which learners demonstrate the most rapid acquisition.

Finally, we want to address the different theoretical views on transfer. For some researchers cross-language influence can only occur if a feature is fully acquired in the “carrier” language (Hawkins, Reference Hawkins2001; Hawkins & Chan, Reference Hawkins and Chan1997), while for others partial acquisition of a feature may be enough for it to transfer to the next language (Cenoz & Valencia, Reference Cenoz and Valencia1994; Lasagabaster, Reference Lasagabaster2000; Sanz, Reference Sanz2000; Jin, Reference Jin and Leung2009; Jaensch, Reference Jaensch and Leung2009). In addition, even subscribing to the latter view, we need to address the question of what is transfer versus what is conscious learning of grammatical features. While our experiment does not let us distinguish between the two, we envision a scenario in which a learner consciously matches what they learn about a feature in L3 to what they already know/have learned about the same feature in a previously acquired language. This will be a form of (conscious) transfer.

Of course, there could be other explanations of the obtained results.Footnote 18 The results could be due to ease of acquisition of the target features which arguably are simpler and more transparent to the learner from the earliest stages of acquisition compared to complex syntactic features such as long-distance extraction or semantic features such as definiteness/specificity. Nevertheless, a number of studies document difficulties with the acquisition of the [gender] and/or [number] features as either assignment on the noun or agreement within or beyond the noun phrase (e.g., Bruhn de Garavito, Reference Bruhn de Garavito, Liceras, Zobl and Goodluck2002; Dewaele & Veronique, Reference Dewaele and Veronique2001; Franceschina, Reference Franceschina2002; Hopp, Reference Hopp2013; Keating, Reference Keating2006; McCarthy, Reference McCarthy2008; Montrul et al., Reference Montrul, Fuente, Davidson and Foote2013; White et al., Reference White, Valenzuela, Kozlowska-MacGregor and Leung2004). The problems with these features dissipate as the learners’ proficiency increases but there are lingering issues with subcomponents of the features. In our study we asked participants to parse the agreement features on the adjectives and from there discern the features on the noun. This makes the task even more difficult for beginner learners. Although we do not exclude the possibility that the feature [gender] is so prominent that learners are paying special attention to it thus making it “easy” to acquire, this is probably not the case for the [number] feature in our study.

Another possible explanation of the results is that agreement violation is more easily detected in the paradigm. We agree that GJTs tap into the metalinguistic knowledge of the learners which is a more prominent source of interpretation for beginner learners like our participants. A different type of task might tap into learners’ subconscious/semi-automated knowledge better. We will leave this methodological exploration for future research. Another possible methodological issue that makes our results more difficult to interpret in relation to the question of transfer in L3 acquisition is that we are unable to detect small effects. To improve the power of its conclusions a future study will benefit from the use of more trials and/or more participants to identify the effects. Considering the complexity of L3 acquisition, detectable effects may be exclusively small.

A secondary research question considered whether proficiency in the L2 would affect performance in the L3. Jaensch (Reference Jaensch and Leung2009), along with others (Cenoz & Valencia, Reference Cenoz and Valencia1994; Lasagabaster, Reference Lasagabaster2000; Sanz, Reference Sanz2000; Jin, Reference Jin and Leung2009), suggests that beneficial transfer from L2 to L3 can only be observed after sufficient exposure to and achievement in L2. We suggested two possible interpretations of the hypothesis: as achievement, and as general exposure. In our study general L2 proficiency was operationalized through the length of time studying the L2, while knowledge of the features in the L2 was examined through the equivalent tasks in Spanish. Our findings are in line with those from several previous studies. The overall indicators for the features point to above-chance performance in both languages although we observed some discrepancies when tasks are examined across the languages (Table 3). In support of the “achievement” interpretation of the Threshold Hypothesis, participants’ equivalent Spanish performance predicted their L3 production in the EPT task. Specifically, participants who were more than 25% accurate on the Spanish EPT had a greater likelihood of partial agreement (i.e., higher accuracy) in the Portuguese EPT task. Overall, these results suggest that L2 proficiency mediated performance in the L3. In initial acquisition, achievement matters most for beneficial transfer, possibly in line with Rast (Reference Rast2010).

How the Threshold Hypothesis interacts with the models of L3 acquisition is a subject for further investigation. Our study did not aim to test the specific interactions between the Threshold Hypothesis and wholesale versus pick-and-choose models. The “exposure” interpretation of the Threshold Hypothesis that requires general L2 proficiency or cumulative L2 exposure may be more compatible with a wholesale transfer model. In other words, if the overall proficiency threshold is met, then the typologically similar language (the L2 in our study) will be the source of all transfer. If the threshold is not met, then presumably the L1 will be the source of all transfer. The competing “achievement” interpretation of the Threshold Hypothesis that requires minimal L2 acquisition of the target structures may be more compatible with the à-la-carte transfer model. In other words, if the threshold is met, the relevant features will be transferred from the language(s) that can provide them.

Additionally, the timing of transfer (only at the initial stage vs. throughout the acquisition of L3) (Rothman, Reference Rothman2015), the effects of early vs. late bilingualism on transfer at the onset of L3A (Cenoz, Reference Cenoz2003), the quality and quantity of metalinguistic knowledge and their effect on transfer (Puig-Mayenco et al., Reference Puig-Mayenco, Rothman, González Alonso, Rothamn, Alonso and Puig-Mayenco2019), the influence of age of acquisition on the outcome of L3A (Flynn et al., Reference Flynn, Foley and Vinnitskaya2004), and the effect of L1 and L2 proficiency (taken as either overall proficiency or target structure proficiency) (Cenoz, Reference Cenoz2003; Williams & Hammarberg, Reference Williams and Hammarberg1998; Rothman, Reference Rothman2015; Puig-Mayenco & Rothman, Reference Puig-Mayenco and Rothman2020) are open questions. The current study does not address any of the aforementioned debates but rather assumes the unifying point that transfer of linguistic properties exists in any subsequent language acquisition.

Finally, in the present study we tested learners’ sensitivity to the features presented on the adjective and required them to use their knowledge of agreement to retrieve the correct noun. This presumed that they have not only learned and can use the relevant features on the adjective to recover the same on the correct noun but also that they have already acquired the features on the noun. Alarcón (Reference Alarcón2011) has shown that at least for L2 learners gender assignment on the noun leads to correct gender agreement throughout a noun phrase headed by an animate noun more so than one headed by an inanimate noun. Thus, future research should also test whether subsequent language learners have acquired the interpretable features on the noun, as well as whether this acquisition parallels the acquisition of the agreement features on the modifiers.

Conclusion

We set out to test two of the effects of feature type on the acquisition of gender and number agreement in L3. Overall, results weakly support an interpretation of transfer between languages, particularly for the structural [gender] feature which requires one-step computation. We also set out to test two interpretations of the Threshold Hypothesis: general exposure to the L2 and achievement in the L2. Overall results also weakly support an interpretation of threshold effects of L2 proficiency on L3 performance. A lower L2 achievement threshold was needed for transfer of individual features while a higher L2 exposure threshold was needed for generalized L3 performance. It appears that transfer may be mediated by a minimum proficiency threshold, showing support for the Threshold Hypothesis. Further investigation of the questions we raised should examine the differentiation of procedural knowledge and lexical representation of the features in L2 and L3.

Replication package

Anonymized data, coding scripts, model output, appendixes, and data coding and results for the strict production model can be found at https://osf.io/jthp5/files/osfstorage.

Acknowledgements

We are thankful to Jefferson De Carvalho Maya, Carlos Sanchez, and Travis Sago for their assistance in creating materials, running participants, and transcribing results, and Douglas Cahl for help with the statistical analyses. We are thankful to the audiences of the annual meeting of the Linguistic Society of America (2017) and Conference on Executive Function in Mind and Brain (2019) for useful comments. The authors received no financial support for the research, authorship, and/or publication of this article.

Competing interests

We have no conflict of interest to disclose.

Appendix A

Table A1. Grammaticality judgment task—Portuguese

Appendix B

Table B1. Elicited production task—Portuguese

Appendix C

Table C1. Grammaticality judgment task—Spanish

Appendix D

Table D1. Elicited production task—Spanish

Footnotes

1 Throughout this study we use the [gender] and [number] notations to signify the grammatical features. For the identity variable of our participants, we use ‘gender’ without the square brackets.

2 Although this issue is relevant to the present discussion, the current experimental setup cannot fully address it. We directly address the question of acquisition of interpretable vs. uninterpretable features in another project.

3 We are only investigating canonical markers on both the N and the A so we will not discuss the other options here.

4 While English utilizes [gender] in pronouns and also has cases where nouns are interpreted as gendered based on the discourse (see Slabakova, Reference Slabakova2009), this feature is not utilized on inanimate nouns in the language.

5 Note that this proposal aligns better with an à-la-carte type of L3 transfer as in Slabakova (Reference Slabakova2016) and Westergaard et al. (Reference Westergaard, Mitrofanova, Mykhaylyk and Rodina2017).

6 We later included the reported other foreign language experiences as a factor in our analyses. It showed no effect on the acquisition of the features at hand.

7 Deleting every 7th word, as in Slabakova (Reference Slabakova2000) and Tremblay & Garrison (Reference Tremblay, Garrison, Prior, Watanabe and Lee2010) produced a text with gaps falling overwhelmingly on function words (determiners, prepositions, pronouns, etc.).

8 A reviewer mentions the low accuracy of the native speaker controls. This is not uncommon as the cloze tests are demanding even for native speakers of the language. Slabakova (Reference Slabakova2000) reports NS accuracy with the strict coding of 70% on average (65% for the North American NS group and 75% for the British English group). Tremblay & Garrison (Reference Tremblay, Garrison, Prior, Watanabe and Lee2010), with the looser coding, report 89.3% accuracy for their most proficient group (graduate/faculty).

9 Only one person had results below 80% accuracy on the vocabulary test. We kept their data in the analyses given our assumption that the morphosyntactic agreement of [gender] and [number] is not acquired from the noun.

10 As a reviewer points out, it would be interesting to see whether there is a trade-off between accuracy and reaction times. We examined our reaction time data for both tasks: M = 7826 ms and 14872 ms, respectively. In our opinion, this is an indication that reaction times are not a reliable measure of processing with learners at the beginning stages of language acquisition. We felt that to achieve more reliable results this relationship needs to be tested with more advanced learners of both the L2 and L3, and we leave this for future investigations.

11 The guiding principle in selection of items was familiarity by participants (from course content). We controlled for canonicity of words selected in both Spanish and Portuguese, but did not control for matching gender. Of the 74 nouns used, 57 had a matching [gender].

12 Audacity(R) software is copyright (c) 1999–2014 Audacity Team. Web site: http://audacity.sourceforge.net/. It is free software distributed under the terms of the GNU General Public License. The name Audacity(R) is a registered trademark of Dominic Mazzoni.

13 The different set-up for filler sentences is due to further research questions which are not related to the issue of transfer in L3A but rather investigate the processing of animate versus inanimate nouns and [gender] in canonical versus non-canonical nouns.

14 As with the GJT, an additional goal of the experiment, not related to the present study, was to gauge L3 learners’ sensitivity to the [gender] feature in animate versus inanimate nouns. We leave this investigation aside for now.

15 Given the requirements for agreement across the entire NP with marked [gender] and [number] on the determiner, another analysis of overall scores was possible—an absolute correct score. For clarity of results, we present this coding schema as well as corresponding analyses in Supplementary Materials. We note that these results, while distinct, are quite similar to the ‘attempt at correct production’ described herein.

16 In binary logistic mixed effects regression models, AIC, BIC, log-likelihood (LL), and pseudo-R2 values are reported, but only AIC or BIC should be interpreted for these models (Veall & Zimmermann, 1996; The MathWorks Inc., 2024b). In particular, R2 values are almost always low, since observed values are 0 or 1, whereas the fitted values fall between these extremes (Hosmer et al., 2013).
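For reference, the two criteria named in this note have the following standard textbook definitions. The symbols are introduced here for illustration and are not the article's own notation: $k$ is the number of estimated parameters, $n$ the number of observations, and $\hat{L}$ the maximized likelihood, so that $\mathrm{LL} = \ln \hat{L}$.

```latex
\begin{align}
  \mathrm{AIC} &= 2k - 2\ln\hat{L} \\
  \mathrm{BIC} &= k\ln n - 2\ln\hat{L}
\end{align}
```

Under both criteria a smaller value indicates a better trade-off between fit and complexity. BIC penalizes each additional parameter more heavily than AIC whenever $\ln n > 2$, that is, for $n \geq 8$.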

17 We examined whether the ‘item’ covariate improved the model in the same manner as the factors of interest, using smaller AIC/BIC scores as the criterion. It increased the variance in the models (i.e., decreased the model fit), so it was not included in the stepwise binary logistic regressions.

18 We thank an anonymous reviewer for suggesting some of the improvements needed in future research.

References

Alarcón, I. V. (2011). Spanish gender agreement under complete and incomplete acquisition: early and late bilinguals’ linguistic behavior within the noun phrase. Bilingualism: Language and Cognition, 14(03), 332–350.
Alonso, J. G., Rothman, J., Berndt, D., Castro, T., & Westergaard, M. (2017). Broad scope and narrow focus: on the contemporary linguistic and psycholinguistic study of third language acquisition. International Journal of Bilingualism, 21(6), 639–650.
Amaro, J., Flynn, S., & Rothman, J. (2012). Third language (L3) acquisition in adulthood. In Cabrelli Amaro, J., Flynn, S., & Rothman, J. (Eds.), Third Language Acquisition in Adulthood (pp. 1–6). Amsterdam: John Benjamins.
Bardel, C., & Falk, Y. (2007). The role of the second language in third language acquisition: the case of Germanic syntax. Second Language Research, 23(4), 459–484.
Bardel, C., & Falk, Y. (2021). L1, L2 and L3: same or different? Second Language Research, 37(3), 459–464.
Bernstein, J. B. (1997). Demonstratives and reinforcers in Romance and Germanic languages. Lingua, 102(2–3), 87–113.
Bruhn de Garavito, J. (2002). Learners’ competence may be more accurate than we think: Spanish L2 and agreement morphology. In Liceras, J., Zobl, H., & Goodluck, H. (Eds.), Proceedings of the 6th Generative Approaches to Second Language Acquisition Conference (GASLA 2002) (pp. 17–23). Somerville, MA: Cascadilla Proceedings Project.
Cabrelli, J., Pichan, C., Ward, J., Rothman, J., & Serratrice, L. (2023). Factors that moderate global similarity in initial L3 transfer: intervocalic voiced stops in heritage Spanish/English bilinguals’ L3 Italian. Linguistic Approaches to Bilingualism, 13(5), 638–662.
Carminati, M. N. (2005). Processing reflexes of the feature hierarchy (person > number > gender) and implications for linguistic theory. Lingua, 115(3), 259–285.
Carstens, V. (2000). Concord in minimalist theory. Linguistic Inquiry, 31(2), 319–355.
Carstens, V. (2001). Multiple agreement and case deletion: against φ-incompleteness. Syntax, 4(3), 147–163.
Cenoz, J. (2003). The additive effect of bilingualism on third language acquisition: a review. International Journal of Bilingualism, 7(1), 71–87.
Cenoz, J., & Valencia, J. F. (1994). Additive trilingualism: evidence from the Basque country. Applied Psycholinguistics, 15(2), 195–207.
Chen, H., Cohen, P., & Chen, S. (2010). How big is a big odds ratio? Interpreting the magnitudes of odds ratios in epidemiological studies. Communications in Statistics—Simulation and Computation®, 39(4), 860–864.
Chomsky, N. (1995). The Minimalist Program. Massachusetts Institute of Technology.
Chomsky, N. (2001). Derivation by phase. In Kenstowicz, M. (Ed.), Ken Hale: A Life in Language (Vol. 36, pp. 1–52). MIT Press.
Cummings, P. (2009). The relative merits of risk ratios and odds ratios. Archives of Pediatrics & Adolescent Medicine, 163(5), 438–445.
Cummins, J. (1976). The influence of bilingualism on cognitive growth: a synthesis of research findings and explanatory hypotheses. Working Papers on Bilingualism, 9, 1–43.
Dewaele, J.-M., & Veronique, D. (2001). Gender assignment and gender agreement in advanced French interlanguage: a cross-sectional study. Bilingualism: Language and Cognition, 4(3), 275–297.
Ecke, P., & Hall, C. J. (2021). The Parasitic Model: lexical acquisition and its impact on morphosyntactic transfer. Linguistic Approaches to Bilingualism, 11(1), 45–49.
Falk, Y., & Bardel, C. (2011). Object pronouns in German L3 syntax: evidence for the L2 status factor. Second Language Research, 27(1), 59–82.
Fedus, W., Goodfellow, I., & Dai, A. M. (2018). MaskGAN: Better Text Generation via Filling in the______ (No. arXiv:1801.07736). arXiv. https://doi.org/10.48550/arXiv.1801.07736
Flynn, S., Foley, C., & Vinnitskaya, I. (2004). The cumulative-enhancement model for language acquisition: comparing adults’ and children’s patterns of development in first, second and third language acquisition of relative clauses. International Journal of Multilingualism, 1(1), 3–16.
Forcada, M. L., Scarton, C., Specia, L., Haddow, B., & Birch, A. (2018). Exploring Gap Filling as a Cheaper Alternative to Reading Comprehension Questionnaires when Evaluating Machine Translation for Gisting (No. arXiv:1809.00315). arXiv. https://doi.org/10.48550/arXiv.1809.00315
Franceschina, F. (2002). Case and (phi)-feature agreement in advanced L2 Spanish grammars. EUROSLA Yearbook, 2, 71–86.
Fuchs, Z., Polinsky, M., & Scontras, G. (2015). The differential representation of number and gender in Spanish. The Linguistic Review, 32(4), 703–737.
Hawkins, R. (2001). The theoretical significance of Universal Grammar in second language acquisition. Second Language Research, 17(4), 345–367.
Hawkins, R., & Chan, C. Y.-H. (1997). The partial availability of universal grammar in second language acquisition: the failed functional features hypothesis. Second Language Research, 13(3), 187–226.
Hermas, A. (2015). The categorization of the relative complementizer phrase in third-language English: a feature re-assembly account. International Journal of Bilingualism, 19(5), 587–607.
Hopp, H. (2013). The development of L2 morphology. Second Language Research, 29(1), 3–6.
Hosmer, D. W. Jr, Lemeshow, S., & Sturdivant, R. X. (2013). Applied Logistic Regression, 3rd ed. Hoboken, NJ: John Wiley & Sons.
Jaensch, C. (2009). L3 enhanced feature sensitivity as a result of higher proficiency in the L2. In Leung, Y.-K. I. (Ed.), Third Language Acquisition and Universal Grammar (pp. 115–143). Bristol, UK: Multilingual Matters.
Jin, F. (2009). Third language acquisition of Norwegian objects: interlanguage transfer or L1 influence? In Leung, Y.-K. I. (Ed.), Third Language Acquisition and Universal Grammar (pp. 144–161). Bristol, UK: Multilingual Matters.
Keating, G. D. (2006). Processing Gender Agreement across Phrases in Spanish: Eye Movements during Sentence Comprehension. Illinois, Chicago: University of Illinois.
Lasagabaster, D. (2000). The effects of three bilingual education models on linguistic creativity. International Review of Applied Linguistics in Language Teaching, 38, 213–238.
Leung, Y.-K. I. (2005). L2 vs. L3 initial state: a comparative study of the acquisition of French DPs by Vietnamese monolinguals and Cantonese–English bilinguals. Bilingualism: Language and Cognition, 8(1), 39–61.
Leung, Y.-K. I. (2006). Full transfer vs. partial transfer in L2 and L3 acquisition. In Slabakova, R., Montrul, S., & Prévost, P. (Eds.), Inquiries in Linguistics Development in Honor of Lydia White (pp. 157–188). Amsterdam: John Benjamins.
Matchin, W., Basilakos, A., Stark, B. C., den Ouden, D. B., Fridriksson, J., & Hickok, G. (2020). Agrammatism and paragrammatism: a cortical double dissociation revealed by lesion-symptom mapping. Neurobiology of Language, 1(2), 208–225.
Matchin, W., & Hickok, G. (2020). The cortical organization of syntax. Cerebral Cortex, 30(3), 1481–1498.
McCarthy, C. (2008). Morphological variability in the comprehension of agreement: an argument for representation over computation. Second Language Research, 24(4), 459–486.
Montrul, S., Fuente, I. de la, Davidson, J., & Foote, R. (2013). The role of experience in the acquisition and production of diminutives and gender in Spanish: Evidence from L2 learners and heritage speakers. Second Language Research, 29(1), 87–118.
Prévost, P., & White, L. (2000). Missing surface inflection or impairment in second language acquisition? Evidence from tense and agreement. Second Language Research, 16(2), 103–133.
Puig-Mayenco, E., & Rothman, J. (2020). Low proficiency does not mean ab initio: a methodological footnote for linguistic transfer studies. Language Acquisition, 27(2), 217–226.
Puig-Mayenco, E., Rothman, J., & González Alonso, J. (2019). Setting the context. In Rothman, J., González Alonso, J., & Puig-Mayenco, E. (Eds.), Third Language Acquisition and Linguistic Transfer (pp. 1–44). Cambridge: Cambridge University Press.
Rast, R. (2010). The use of prior linguistic knowledge in the early stages of L3 acquisition. International Review of Applied Linguistics in Language Teaching, 48(2–3), 159–183.
Rothman, J. (2010). On the typological economy of syntactic transfer: word order and relative clause attachment preference in L3 Brazilian Portuguese. International Review of Applied Linguistics in Language Teaching, 48(2–3), 245–273.
Rothman, J. (2011). L3 syntactic transfer selectivity and typological determinacy: The typological primacy model. Second Language Research, 27(1), 107–127.
Rothman, J. (2015). Linguistic and cognitive motivations for the typological primacy model (TPM) of third language (L3) transfer: timing of acquisition and proficiency considered. Bilingualism: Language and Cognition, 18(2), 179–190.
Rothman, J., & Cabrelli Amaro, J. (2010). What variables condition syntactic transfer? A look at the L3 initial state. Second Language Research, 26(2), 189–218.
Rothman, J., González Alonso, J., & Puig-Mayenco, E. (2019). Third Language Acquisition and Linguistic Transfer. Cambridge: Cambridge University Press.
Sánchez, L., & Bardel, C. (2017). Transfer from L2 in third language learning: a study on L2 proficiency. In Angelovska, T., & Hahn, A. (Eds.), L3 Syntactic Transfer: Models, New Developments and Implications (pp. 223–250). Amsterdam: John Benjamins Publishing Company.
Sanz, C. (2000). Bilingual education enhances third language acquisition: evidence from Catalonia. Applied Psycholinguistics, 21(1), 23–44.
Sheather, S. (2009). A Modern Approach to Regression with R. Berlin: Springer Science & Business Media.
Slabakova, R. (2000). L1 transfer revisited: the L2 acquisition of telicity marking in English by Spanish and Bulgarian native speakers. Linguistics, 38(4), 739–770.
Slabakova, R. (2009). L2 fundamentals. Studies in Second Language Acquisition, 31, 155–173.
Slabakova, R. (2012). L3/Ln acquisition. In Cabrelli, J., Flynn, S., & Rothman, J. (Eds.), Third Language Acquisition in Adulthood (pp. 115–139). Amsterdam: John Benjamins Publishing Co.
Slabakova, R. (2016). The scalpel model of third language acquisition. International Journal of Bilingualism, 21(6), 651–665.
Slabakova, R. (2021). Does full transfer endure in L3A? Linguistic Approaches to Bilingualism, 11(1), 96–102.
Smolensky, P. (1996). On the comprehension/production dilemma in child language. Linguistic Inquiry, 27(4), 720–731.
The MathWorks Inc. (2024a). MATLAB Version: 9.5 (R2018b). Natick, Massachusetts: The MathWorks Inc.
The MathWorks Inc. (2024b). Fit generalized Linear Mixed-Effects Model Documentation. Natick, Massachusetts: The MathWorks Inc.
Tomaschek, F., Hendrix, P., & Baayen, R. H. (2018). Strategies for addressing collinearity in multivariate linguistic data. Journal of Phonetics, 71, 249–267.
Tremblay, A. (2011). Proficiency assessment standards in second language acquisition research: “clozing” the gap. Studies in Second Language Acquisition, 33(3), 339–372.
Tremblay, A., & Garrison, M. (2010). Cloze tests: a tool for proficiency assessment in research on L2 French. In Prior, M., Watanabe, Y., & Lee, S.-K. (Eds.), Selected Proceedings of the 2008 Second Language Research Forum (pp. 73–88). Somerville, MA: Cascadilla Proceedings Project.
Vainikka, A., & Young-Scholten, M. (1996). Gradual development of L2 phrase structure. Second Language Research, 12(7), 7–39.
Van de Craats, I., Corver, N., & Van Hout, R. (2000). Conservation of grammatical knowledge: on the acquisition of possessive noun phrases by Turkish and Moroccan learners of Dutch. Linguistics, 38(2), 221–314.
VanPatten, B. (1993). Grammar teaching for the acquisition-rich classroom. Foreign Language Annals, 26(4), 435–450.
Veall, M. R., & Zimmermann, K. F. (1996). Pseudo-R2 measures for some common limited dependent variable models. Journal of Economic Surveys, 10(3), 241–259.
Vrieze, S. I. (2012). Model selection and psychological theory: a discussion of the differences between the Akaike Information Criterion (AIC) and the Bayesian Information Criterion (BIC). Psychological Methods, 17(2), 228.
Westergaard, M. (2021). The plausibility of wholesale vs. property-by-property transfer in L3 acquisition. Linguistic Approaches to Bilingualism, 11(1), 103–108.
Westergaard, M., Mitrofanova, N., Mykhaylyk, R., & Rodina, Y. (2017). Crosslinguistic influence in the acquisition of a third language: the linguistic proximity model. International Journal of Bilingualism, 21(6), 666–682.
White, L., Valenzuela, E., Kozlowska-MacGregor, M., & Leung, Y. I. (2004). Gender and number agreement in Nonnative Spanish. Applied Psycholinguistics, 25(1), 105–133.
Williams, S., & Hammarberg, B. (1998). Language switches in L3 production: implications for a polyglot speaking model. Applied Linguistics, 19(3), 295–333.
Table 1. [Gender] and [Number] features in English, Spanish, and Portuguese

Table 2. Participants’ backgrounds, Spanish cloze test, and Portuguese vocabulary test results

Figure 1. Example of EPT target and filler trials. English translations were not presented on the screen.

Table 3. Descriptive statistics for both tasks

Table 4. Dropped elements and surface agreement in the EPT task

Table 5. GJT model results. Results of binary logistic regression analyses with correct response as dependent variable

Figure 2. (a) GJT Results: [Gender]. Circle size is proportional to the responses it represents. (b) GJT Results: Grammaticality. Circle size is proportional to the responses it represents. (c) GJT Results: Portuguese vocabulary test scores. Circle size is proportional to the responses it represents.

Table 6. EPT model results. Results of backward stepwise binary logistic regression analyses with correct response as dependent variable

Figure 3. (a) ANY EPT results: [Gender]. Circle size is proportional to the responses it represents. Dashed lines represent 95% confidence intervals. (b) ANY EPT results: Equivalent Spanish performance. Circle size is proportional to the responses it represents. Dashed lines represent 95% confidence intervals. (c) ANY EPT results: Portuguese vocabulary test scores. Circle size is proportional to the responses it represents. Dashed lines represent 95% confidence intervals.

Table A1. Grammaticality judgment task—Portuguese

Table B1. Elicited production task—Portuguese

Table C1. Grammaticality judgment task—Spanish

Table D1. Elicited production task—Spanish