Individual differences in L1 and L2 anaphora resolution: effects of implicit prosodic cues and working memory

Andromachi Tsoukala; Margreet Vogelzang; Ianthi Maria Tsimpli

doi:10.1017/S0142716424000316

Published online by Cambridge University Press: 08 October 2024

Andromachi Tsoukala*: Department of Theoretical and Applied Linguistics, University of Cambridge, Cambridge, UK
Margreet Vogelzang: Department of Theoretical and Applied Linguistics, University of Cambridge, Cambridge, UK
Ianthi Maria Tsimpli: Department of Theoretical and Applied Linguistics, University of Cambridge, Cambridge, UK
*Corresponding author: Andromachi Tsoukala; Email: [email protected]

Abstract

The present experimental studies shed light on effects of implicit prosodic cues on anaphora resolution as well as on how these differ both within and between L1 and L2 speaker groups. In two self-paced reading studies, L1 and L2 participants read poem-like texts that contained anaphoric ambiguity. These stimuli were designed to include a rhyming scheme and meter that were either regular or disrupted. We expected a rhyme cue on a nonsubject pronoun antecedent (in the regularly metered and rhyming version of the texts) to induce competition effects in L1 speakers and cause them to adapt their interpretative preferences and processing strategies; yet, for L2 speakers we hypothesized that effects would either not be observed or that they would be attenuated. Additionally, we examined whether comprehender-dependent factors would modulate effects in each group. We tested both L1 and L2 participants on memory-related tasks. We also measured L1 speakers’ print exposure and L2ers’ proficiency in English. Results revealed L1–L2 dissimilarities in interpretative preferences and reading behavior, as L2 speakers were not equally sensitive to the prosodic cues introduced. The examination of memory-related measures provided evidence of within-group differences and between-group parallels: higher working memory in both groups modulated anaphora resolution, although for L2 speakers there was no additional influence of context.

Keywords

anaphora; implicit prosody; individual differences; L1 and L2 processing; working memory

Type: Original Article
Information: Applied Psycholinguistics, Volume 45, Issue 5, September 2024, pp. 834–872

Creative Commons: This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.
Copyright: © The Author(s), 2024. Published by Cambridge University Press

Previous research has demonstrated that certain syntactic, lexical, and information structure constraints systematically determine reference interpretation across languages (de la Fuente et al., 2016). By focusing on the domain of anaphora resolution in a nonnull subject language such as English, we can observe some of these constraints in examples (1), (2), and (3).

In each of these examples, the pronominalized form is globally ambiguous; that is, the grammar licenses coindexation of the pronoun “he” or “she” with either of the two preceding referents. Yet, native speakers rely on “soft” constraints and heuristics (de la Fuente et al., 2016) to arrive at an optimal interpretation (de Hoop, 2013), which is indicated by the subscript notation in the examples above. In (1), syntactic role prominence (subjecthood) and/or structural parallelism determines coreference (Arnold, 1998, 2010; Arnold et al., 2018; Chambers & Smyth, 1998; Smyth, 1994). In (2), the implicit causality of interpersonal verbs regulates mappings between imputed cause-referents and pronouns, such that in (2a) the verb “impress” is subject-biasing but in (2b) the verb “hit” is object-biasing¹ (Au, 1986; Garnham et al., 1996; Hartshorne, 2014; Hartshorne & Snedeker, 2013; Koornneef & Van Berkum, 2006; Stewart et al., 2000). In (3), contrastive prosody affects reference interpretation, such that the accented antecedent (when presented aurally; capitalized here for illustrative purposes) becomes more likely to be rementioned (Schafer et al., 2015). As these examples show, what makes an antecedent salient for pronoun coreference is context-dependent (Kaiser & Trueswell, 2008).

Importantly, some types of information are key determinants of anaphora resolution, whereas other cues may be less influential. Studies that have examined how prosodic prominence influences coreference preferences have often reported weak effects of prosodic manipulations (see Schafer et al., 2019, for discussion). The results reported in Schafer et al. (2019) further highlight that prosodic cues cannot override default lexical biases. In their completion study, transfer-of-possession verbs in sentence fragments such as “Sue_Source threw Laura_Goal” led participants to produce more Goal continuations overall, regardless of whether the Source or the Goal carried a prominent pitch accent. Similarly, syntactic function biases may also take precedence over other cues in coreference decisions. In Kaiser (2011), the default subject preference (a “general tendency,” as noted in Arnold et al., 2018) was equally strong regardless of whether the subject or the object was contrastively focused. Hence, in contexts where competing cues are introduced, default structural or lexical biases seem to determine pronoun interpretation more than prosodic manipulations do.

Beyond these comprehender-independent factors that guide and constrain pronoun interpretation to a large extent, there is also evidence of individual variability. In the discussion that follows, the role of comprehender-dependent variables in L1 anaphora resolution is addressed.

Individual differences in L1 speakers

In native speakers, a key variable that has been shown to modulate the strength of interpretative preferences is linguistic experience. Arnold et al. (2018) and Langlois and Arnold (2020) suggested that print exposure, as a measurable proxy for linguistic experience, provides a rich pool of data that comprehenders can draw on to form predictions about referent remention. One of the hypotheses entertained by the authors was that greater reading experience leads to stronger representations of established interpretative patterns (e.g., subject bias), and the findings of these studies supported this hypothesis. For instance, in the second experiment reported in Langlois and Arnold (2020), a subject bias was observed in anaphora resolution, as indexed through responses to offline comprehension questions, but this subject preference was greater in participants who scored higher on a measure of print exposure, namely the Author Recognition Task (ART). As such, there is evidence that native speakers vary in how systematically they follow the subject bias depending on their exposure to print.
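
As an illustration of how a checklist measure such as the ART can yield a single print-exposure score, the sketch below assumes a commonly used scoring rule: the number of genuine authors checked minus the number of foils (non-authors) checked. The text above does not state which scoring rule Langlois and Arnold (2020) applied. The function name and the placeholder checklist entries are hypothetical.

```python
def art_score(checked, real_authors, foils):
    """Hypothetical ART scoring: hits (real authors checked) minus false alarms (foils checked).

    Names that appear on neither list are ignored.
    """
    hits = len(set(checked) & set(real_authors))
    false_alarms = len(set(checked) & set(foils))
    return hits - false_alarms


# Placeholder checklist, not items from any published ART version.
real_authors = {"Author A", "Author B", "Author C"}
foils = {"Foil X", "Foil Y"}
print(art_score({"Author A", "Author C", "Foil X"}, real_authors, foils))  # 1
```

Subtracting false alarms penalizes indiscriminate checking. Under this rule, a participant who ticks every name on the placeholder list scores only 3 − 2 = 1.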

Moreover, working memory (WM) is a correlate of print exposure (e.g., Farmer et al., 2017) that also contributes to individual differences in the resolution of various types of referential ambiguities (Arslan et al., 2020; Cunnings & Felser, 2013; Nieuwland & Van Berkum, 2006; Payne et al., 2014; Swets et al., 2007; Van Rij et al., 2013; Vogelzang et al., 2021). Yet, unlike print exposure, WM does not modulate the strength of default interpretative biases (Arnold et al., 2018). Instead, better WM has been shown to allow for greater sensitivity to formally ambiguous pronouns (for an examination of the role of WM assessed through Reading Span tasks, see, e.g., Nieuwland & Van Berkum, 2006, and Swets et al., 2007; for assessment through Backward Digit Span tasks, see Fotiadou et al., 2020).

Of particular interest for present purposes are the results reported by Nieuwland and Van Berkum (2006), who conducted an event-related potential (ERP) study of brain responses to referentially ambiguous pronouns. They found that, especially when an anaphoric pronoun could refer to either of two antecedents (no strong coreference bias), a sustained negative-going ERP component was observed. This component was taken to reflect the processing consequences of momentarily considering two potential referents for a single pronoun. Importantly, after splitting participants into “high span” and “low span” groups based on their performance on a Reading Span task, they found that this ERP effect was present only in high span readers and absent in the low span group. These results were argued to indicate that higher WM leads to heightened sensitivity to the alternative interpretations conveyed by referentially ambiguous pronouns. Conversely, low span readers may be less sensitive to ambiguity. They may also be more likely to adopt the first referential commitment that comes to mind without activating alternative interpretations. These findings are consistent with the more general idea that greater WM allows comprehenders to construct rich representations of the preceding context and to keep track of multiple discourse entities when resolving various types of ambiguities (e.g., Just & Carpenter, 1992).
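
The grouping step described above can be sketched as a median split on Reading Span scores. This is a generic illustration under stated assumptions: the scores are invented, the function name is hypothetical, and scores tied with the median are arbitrarily assigned to the low span group. The exact cut-off procedure used by Nieuwland and Van Berkum (2006) is not described here.

```python
from statistics import median


def median_split(span_scores):
    """Label each participant 'high' or 'low' span relative to the sample median.

    span_scores: dict mapping participant ID -> Reading Span score.
    Scores equal to the median are assigned to 'low' (an arbitrary choice).
    """
    cutoff = median(span_scores.values())
    return {pid: ("high" if score > cutoff else "low")
            for pid, score in span_scores.items()}


# Invented scores for illustration only.
scores = {"p1": 2.5, "p2": 4.0, "p3": 3.0, "p4": 5.5}
print(median_split(scores))  # {'p1': 'low', 'p2': 'high', 'p3': 'low', 'p4': 'high'}
```

A median split discards variation within each group. Treating span as a continuous predictor is a common alternative.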

L2 anaphora resolution

A common approach to studying L2 anaphora resolution has been to compare L1 and L2 speakers in order to test whether the latter group behaves in a “native-like” way. Following this approach, several studies have tested learners of null subject languages. These studies have often reported increased optionality and unstable biases that diverge from L1 norms, even at advanced levels of L2 proficiency (e.g., for L2 Italian, see Sorace & Filiaci, 2006; for L2 Spanish, see Keating et al., 2011). The evidence is mixed, however, for research examining how speakers of a null subject L1 (e.g., Spanish, Greek, and Turkish) resolve pronoun ambiguities in a nonnull subject L2 (e.g., English, French, and Dutch). Below, we review empirical studies from this second strand of research and then relate their findings to broader theoretical accounts of L2 processing.

In two previous studies, Contemori and colleagues examined how L1 Spanish speakers resolved anaphoric ambiguities in L2 English. One study used offline comprehension measures (Contemori et al., 2019), and the other used online eye fixation data in a visual world paradigm task (Contemori & Dussias, 2020). In the former study, L2 comprehenders interpreted pronominal reference in a native-like way overall. However, they exhibited less robust antecedent preferences than L1 speakers when two referents with similar salience were presented in the preceding context (see Experiment 4 in Contemori et al., 2019). In the latter study, Contemori and Dussias (2020) found no significant delays for L2ers compared with L1 speakers in the time course of pronoun resolution. Adding to the above, Cunnings et al. (2017) combined online and offline measures in a visual world eye-tracking study that examined how L1 Greek speakers processed and interpreted anaphoric ambiguities in L2 English. The results revealed that the L2 speakers followed the processing and interpretative patterns observed in L1 comprehenders, although L2ers exhibited a more pronounced subject preference and delays in pronoun resolution. The authors argued that these differences concerned only the magnitude or timing of effects. On that basis, the findings do not point toward fundamental L1–L2 dissimilarities; instead, they indicate that anaphora resolution can be qualitatively similar overall in native and non-native speakers.

However, other studies have reported contradictory results that suggest L1–L2 divergence. Using offline measures, Schimke and Colonna (2016) found that when L1 Turkish learners of French interpreted overt pronouns in the L2, they overrelied on semantic and discourse-level cues, unlike French native speakers (see their fifth experiment). Moreover, Roberts et al. (2008) tested anaphora resolution in L2 Dutch using both offline tasks and online data from an eye-tracking-during-reading experiment. In the eye-tracking experiment, the two groups of L2 participants in their study (one of which consisted of L1 speakers of Turkish) exhibited processing costs during pronoun resolution in cases where Dutch native speakers experienced facilitation. More specifically, L2 speakers exhibited effortful processing (longer reading times) when syntactic information needed to be integrated with the appropriate discourse representation for pronoun resolution.

On the one hand, the results reported by Contemori and colleagues (2019, 2020) and Cunnings et al. (2017) support the claim that L1 and L2 processing are qualitatively similar. This claim is posited by resource-deficit accounts (Hopp, 2006; McDonald, 2006) and by cue-based models (Cunnings, 2017). Resource-deficit accounts attribute any L1–L2 disparities observed during parsing to limitations in cognitive capacity: since engaging the non-dominant language is assumed to be more cognitively demanding, fewer resources are available when parsing in the L2. Similarly, cue-based models posit that any differences are best explained in terms of a difficulty during memory retrieval operations in L2 processing, namely susceptibility to interference.

On the other hand, the results reported in Roberts et al. (2008) and Schimke and Colonna (2016) are more in line with accounts that argue for qualitative L1–L2 dissimilarities; such accounts focus on the types of information that may or may not be accessible in the L2. The Interface Hypothesis (Sorace & Filiaci, 2006) argues that L2 formal features are acquirable, but that L2 learners have trouble integrating information across domains. A particular difficulty has been identified with syntax–discourse interface phenomena, such as anaphora, which can lead to non-native-like interpretation of pronominal reference. Moreover, according to the Shallow Structure Hypothesis (Clahsen & Felser, 2006), processing in the L2 compared to the L1 is characterized by representational rather than processing dissimilarities. This is because L2 speakers do not build native-like structural representations; instead, they strategically focus on lexical, pragmatic, and other surface-level cues to guide comprehension.

However, it should be stressed that more recent instantiations of the Interface Hypothesis and the Shallow Structure Hypothesis have looked beyond a purely representational account of L1–L2 dissimilarities. For instance, Sorace (2011) has considered that L2ers may use less efficient processing strategies in real time than L1 speakers, leading to the observed differences. Additionally, Clahsen and Felser (2018) have clarified that processing reflexes may simply be delayed in L2 compared to L1 processing.

Individual differences in L2 speakers

In the wider L2 literature, it has commonly been suggested that individual differences in factors such as L2 proficiency and WM can affect the extent to which L2 processing systems behave in a native-like way (e.g., see Hopp, 2022, for a recent review). Similarly, in the domain of anaphora resolution, Bel et al. (2016) reported that, irrespective of L1 background, once participants reach an advanced level of L2 proficiency they are more likely to converge with native speakers in pronoun interpretation. As for WM, its role during the processing of anaphoric ambiguities in the L2 has been implicated (e.g., Bel et al., 2016; Cunnings, 2017) but not formally tested. Some relevant evidence was reported by Nowbakht (2019), who found that higher WM in L2ers was associated with faster processing and better comprehension of sentences containing unambiguous anaphoric pronouns. Beyond this, little more can be said about the contribution of WM to the resolution of anaphoric ambiguities in the L2 (see also Juffs & Harrington, 2011, for discussion of mixed evidence regarding WM in L2 ambiguity research). Furthermore, it remains unknown whether the WM effects on anaphora resolution reported in the L1 literature are replicated in L2 groups (e.g., higher sensitivity to pronoun ambiguity, as in Nieuwland & Van Berkum, 2006).

Before ending this section, we briefly note that although higher L2 proficiency and WM can lead to more native-like behavior, this is not a deterministic outcome. For instance, in a study by Perdomo and Kaan (2021) that examined effects of prosodic manipulations on referential processing, neither WM nor proficiency measures modulated L2ers’ performance (see also Kaan & Grüter, 2021, for discussion of further related evidence).

Interim summary

All the evidence discussed so far can be summarized in the following key points. First, when pitted against each other, default structural biases (e.g., subjecthood) determine anaphora resolution more so than other contextualized information, such as prosodic cues. Second, beyond these fixed biases, responses to pronoun ambiguity and interpretative preferences may vary within native speaker groups due to individual differences in print exposure and WM. Third, contradictory results have been reported with regard to whether L2ers converge with native speakers in anaphora resolution. In particular, the evidence from L2 English speakers with a null subject L1 would lead us to expect L1–L2 convergence, although potential differences may be observed in contexts where multiple antecedents are salient and/or in terms of the magnitude and timing of online resolution effects. Lastly, previous research suggests that L2 proficiency and WM can contribute to L2 processing efficiency more generally, although the exact role they play in anaphora resolution is less clear.

Given the above, we suggest that both between- and within-group differences need to be addressed in order to examine anaphora resolution in L1 and L2 comprehenders more holistically. With this as our research motivation, in the present studies we focus on effects of implicit prosodic cues on anaphora resolution. Specifically, we address two overarching issues that have received little attention in previous research. The first is whether L1 and L2 speakers consider prosodic cues in silent reading when evaluating the salience of pronoun antecedents, and whether they do so to similar extents. The second is whether comprehender-dependent variables modulate the effects of prosodic manipulations on anaphora resolution in L1 and L2 groups.

The current studies

Exploring the effects of rhyme and meter

One aim of the present studies was to explore how L1 and L2 English comprehenders would process and interpret anaphoric ambiguities in the presence of competing cues. To that end, we considered contexts where the subject would normally be the preferred referent; yet, we manipulated the salience of an alternative candidate (object) by having it be a rhymed antecedent in rhyming and metered poem-like texts.

Previous studies have manipulated the salience of antecedents with other prominence-enhancing devices, such as clefted focus (Cowles et al., 2007; Kaiser, 2011) and pitch accents (Schafer et al., 2019, 2015). Compared with such local cues, contextualized, global prosodic features that are activated during silent reading have received little attention (see Clifton, 2015, and Breen, 2014, for reviews).

End rhyme in verse constitutes a key element of stanza structure. It involves regulated patterning based on the repetition of sound segments at the end of verse-final words. Words that rhyme share the same nucleus in the final stressed syllable and, optionally, also the coda, together with any further unstressed syllables that follow (Fabb, 2015). This repetition of accented syllabic nuclei contributes to the phonological salience of rhymed words (e.g., Fabb, 1999; Obermeier et al., 2013). Additionally, according to Rickert (1984, p. 1), this sound-based repetition “impinges on the reader’s awareness,” in the sense that “as the rhyme becomes increasingly conspicuous, it commands added attention.” Given this attention-attracting property, rhyme may even induce a “noticing” effect. Relatedly, rhymed words have been found to be predictable (Read et al., 2014; Read & Regan, 2018) and easy to remember (Goldman et al., 2006; Johnson & Hayes, 1987; Király et al., 2016).
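
The definition of rhyme above can be illustrated with a small checker. The sketch assumes ARPAbet-style transcriptions in which each vowel carries a stress digit (1 = primary stress), as in the CMU Pronouncing Dictionary. It locates the last primary-stressed vowel and compares two words from that nucleus onward. The function names are hypothetical. Treating only primary stress as "stressed" is a simplification.

```python
def rhyme_part(phones):
    """Split a transcription into (nucleus, tail) at the last primary-stressed vowel.

    phones: list of ARPAbet-style symbols; vowels end in a stress digit
    (1 = primary, 2 = secondary, 0 = unstressed).
    """
    for i in range(len(phones) - 1, -1, -1):
        if phones[i].endswith("1"):
            # Drop the stress digit from the nucleus; keep the tail (coda plus any
            # following unstressed syllables) exactly as transcribed.
            return phones[i][:-1], tuple(phones[i + 1:])
    raise ValueError("no primary-stressed vowel found")


def rhymes(word_a, word_b, match_tail=True):
    """True if the final stressed nuclei match and, when match_tail is set,
    everything after the nucleus matches too."""
    nucleus_a, tail_a = rhyme_part(word_a)
    nucleus_b, tail_b = rhyme_part(word_b)
    if nucleus_a != nucleus_b:
        return False
    return tail_a == tail_b if match_tail else True


night = ["N", "AY1", "T"]
light = ["L", "AY1", "T"]
crime = ["K", "R", "AY1", "M"]
print(rhymes(night, light))                    # True
print(rhymes(night, crime))                    # False: codas differ
print(rhymes(night, crime, match_tail=False))  # True: same nucleus only
```

The `match_tail` flag mirrors the "optionally" in the definition. Setting it to `False` applies the weaker nucleus-only criterion, and leaving it as `True` also requires the coda and any following unstressed syllables to match.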

All these effects are maximized when rhyme is paired with meter in verse, as attention is drawn to the lyrical properties of language (Goldman et al., 2006). Whereas rhyme regulates form patterning at a larger timescale across verses, meter structures perceptual input at a smaller timescale within verses (Fabb, 2009; Obermeier et al., 2013). In metered poetry, repetitive sequences of stressed and unstressed syllables are organized into metrical feet that regulate the meter of the verse line (Fabb & Halle, 2008). This systematic alternation of syllabic stress promotes recall of linguistic material (Boucher, 2006; Brower, 1993). It also facilitates language processing by making loci of stress predictable and salient (Kotz & Schmidt-Kassow, 2015; Pitt & Samuel, 1990; Roncaglia-Denissen et al., 2013; Rothermich & Kotz, 2013; Rothermich et al., 2012).
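
A regular meter makes stress positions predictable, so a disruption can be located by comparing a line's syllable stresses against a repeating foot template. The sketch below assumes stress has already been assigned as a string of '0' (unstressed) and '1' (stressed) syllables. It ignores the licensed variations that real scansion allows. The function names and example lines are hypothetical.

```python
def metrical_deviations(stresses, foot="01"):
    """Return indices of syllables whose stress departs from a repeating foot.

    stresses: one character per syllable, '0' = unstressed, '1' = stressed.
    foot: repeating template, e.g. '01' (iamb) or '10' (trochee).
    """
    return [i for i, s in enumerate(stresses) if s != foot[i % len(foot)]]


def is_regular(stresses, foot="01"):
    """A line is regular if it consists of whole feet and has no deviations."""
    return len(stresses) % len(foot) == 0 and not metrical_deviations(stresses, foot)


regular = "01010101"    # iambic tetrameter
disrupted = "01011001"  # third foot inverted ('10' instead of '01')
print(metrical_deviations(regular))    # []
print(metrical_deviations(disrupted))  # [4, 5]
print(is_regular(disrupted))           # False
```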

Taken together, the phonological and rhythmic regularities found in rhyming and metered poetry promote perceptual (prosodic) fluency (Menninghaus et al., 2015). Once such recurring patterns have been identified, they become expected, and any disruption to the established structure leads to processing costs (for costs caused by metrical violations, see, e.g., Breen & Clifton, 2011, 2013; for costs induced by rhyme violations, see Hoorn, 1996; Liu et al., 2011). However, perceptual fluency does not imply conceptual fluency; rather, a trade-off between the two has been suggested (Menninghaus et al., 2015). This is because overattendance to formal surface elements can result in shallow processing and affect semantic comprehension (Blohm et al., 2018; Goldman et al., 2006; Menninghaus et al., 2015; Wallot & Menninghaus, 2018).

Finally, it is important to note that the effects of rhyme and meter discussed above have mostly been examined with native rather than non-native speakers. To the extent that rhyme awareness has been tested in the L2, it has mostly been studied at the word level (e.g., rhyme judgment tasks, as in Bassetti et al., 2020). Its role during sentence processing, let alone disambiguation in poetic contexts, has remained unexplored. Regarding meter, some studies have compared L1 and L2 speakers’ brain responses to metrical structure (Roncaglia-Denissen et al., 2015; Schmidt-Kassow et al., 2011). The findings of Roncaglia-Denissen et al. (2015) revealed that L1 Turkish L2 German speakers did not take advantage of rhythmic regularity as monolingual German speakers did when processing syntactically ambiguous structures. These results are not surprising. The wider literature on prosodic cue sensitivity in the L2 suggests that non-native comprehenders may not take prosodic cues into account in the same way as L1 speakers: effects are typically absent, weaker, or delayed in L2 processing (Foltz, 2021; Nakamura et al., 2020; Perdomo & Kaan, 2021; Schafer et al., 2015; Schmidt et al., 2020).

The cause behind the lack or reduction of L2 prosody effects is difficult to pinpoint, and previous literature has entertained alternative explanations. These include L2 perceptual limitations, difficulty incorporating prosodic cues into the representations constructed online, and difficulty using prosody in conjunction with other types of information (for reviews, see Mennen & De Leeuw, 2014; Pratt, 2017). Complicating matters further, L2 participants have been observed not to use prosodic cues when engaged in certain types of processing, such as memory retrieval (Schmidt et al., 2020) and reference prediction in the L2 (Perdomo & Kaan, 2021). This holds even though they rely on the same or equivalent cues in their L1 to perform these operations. Similar results have also been reported in L2 ambiguity research, which is most relevant for present purposes. Nakamura et al. (2020) examined contexts where two referents were available to resolve a type of global ambiguity, namely prepositional phrase attachment. They found that prosodic prominence cues (contrastive pitch accents) had no effect on L2ers, unlike native speakers.

Before proceeding to our hypotheses, it is worth noting that most of the research described above has manipulated overt prosodic information. All this evidence was taken into account and informed our hypotheses regarding effects of implicit prosodic cues during silent reading, which is the focus of the present studies. Henceforth, when discussing prosodic cues, we specifically refer to silent prosody.

Hypotheses

Pronoun interpretative preferences

The first set of hypotheses concerned the influence that rhyme in particular could have on pronoun interpretative preferences. In forming these hypotheses, we considered two points. First, whether an antecedent is accessible for pronoun coreference depends on the extent to which it is salient. Second, salience is influenced by multiple constraints that may be at play in a particular context (Kaiser & Trueswell, 2008). In the poem-like contexts considered in the present studies, a notable salience-influencing factor was the presence or absence of rhyme. Specifically, we hypothesized that an object antecedent that is a line-final rhyming element in a regularly metered and rhyming context could attract pronoun coreference to a greater extent than an object antecedent that is not a rhyming element (in a rhyme- as well as meter-disrupted context). This could be attributed to a rhyme-induced salience and/or memorability effect, as discussed in the previous section. On the other hand, we also considered that, in the latter case, a non-rhyming object antecedent may instead stand out as the non-conforming line-final word in an otherwise regularly metered and rhyming context. It would then attract coreference to a greater extent than a rhyming object antecedent. Importantly, for L1 speakers, we predicted small effects, in line with previous literature on prosodic prominence manipulations. For L2 speakers, we expected any such effects to be attenuated or absent. To test these hypotheses, we analyzed L1 and L2 participants’ responses to the interpretation questions that followed each poem-like item. These analyses are reported first in the Results section of the main studies.
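
As a descriptive first step, the interpretation-question responses can be summarized as the proportion of object-antecedent choices per prosodic condition. This is the quantity about which the rhyme hypotheses above make predictions. The sketch below uses invented trials and hypothetical condition labels ('regular', 'disrupted'). It is only a descriptive summary, not a reconstruction of the statistical models used in the studies.

```python
from collections import defaultdict


def object_choice_rates(trials):
    """Proportion of 'object' answers per condition.

    trials: iterable of (condition, choice) pairs, where choice is
    'subject' or 'object'.
    """
    counts = defaultdict(lambda: [0, 0])  # condition -> [object answers, all answers]
    for condition, choice in trials:
        counts[condition][1] += 1
        if choice == "object":
            counts[condition][0] += 1
    return {cond: obj / total for cond, (obj, total) in counts.items()}


# Invented trials for illustration only.
trials = [("regular", "subject"), ("regular", "object"),
          ("disrupted", "subject"), ("disrupted", "subject")]
print(object_choice_rates(trials))  # {'regular': 0.5, 'disrupted': 0.0}
```

The two hypotheses predict opposite patterns. If the rhyming object gains salience, the object rate would be higher in the regular condition. If the non-rhyming object stands out instead, the rate would be higher in the disrupted condition.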

The time course of anaphora resolution

The second set of hypotheses concerned the influence that the prosodic manipulations could have on the time course of anaphora resolution. In forming these hypotheses, we considered that a regular rhyme and meter might cause a trade-off between prosodic and conceptual fluency (Menninghaus et al., 2015). On the one hand, these form-based regularities can increase the degree to which linguistic material is predictable and attended to, both of which are important factors for reference interpretation (Arnold, 2010). We therefore hypothesized that a regular meter and rhyming scheme would prove influential for the anaphora resolution process, potentially expediting it. Thus, one hypothesis was that, owing to perceptual fluency, comprehenders would read regularly metered and rhyming ambiguous versions of the poems faster than rhyme- and meter-disrupted ones; this would be reflected in online reading rates. On the other hand, when resolving pronominal reference, comprehenders take into account discourse-level and lexical constraints (e.g., structural biases, verb semantics). Any form-based regularities that divert attention away from these elements could therefore impede the resolution of anaphoric ambiguities. Thus, we also hypothesized that a semantic disfluency cost could manifest as a delay at interpretative stages, for example when comprehenders were prompted to arrive at a pronoun interpretation by answering the question that followed each text. This cost would be reflected in reaction times to the interpretation question.

In fact, such a pattern emerged in an earlier pilot study we conducted, in which regularly metered and rhyming multiline texts involving global ambiguity were read faster overall than their rhyme- and meter-disrupted counterparts. Yet, comprehenders were somewhat slower to respond to an interpretation question (a marginally significant effect), especially when rhyme and meter were both regular as opposed to when only one of these features was disrupted.

Nevertheless, in that study, we did not compute reading times for different regions of text, but only a global measure of whole-text reading duration. Thus, it is possible that, if critical regions were examined separately, disruptions to anaphora resolution would be observed early on during online processing in regularly metered and rhyming contexts. Any such delays could be attributed to heightened competition between cues (subjecthood vs rhyme cue) or to a rhyme-noticing effect. Finally, any such effects were expected to be attenuated or absent in L2 speakers. To test these hypotheses, we analyzed L1 and L2 participants’ reading times for each line of text as well as their reaction times to the interpretation question. These analyses are reported second in the Results section of each of the main studies.

Individual differences

In the present studies, we were interested in examining whether WM and print exposure would modulate L1 participants’ performance, as well as whether WM and L2 proficiency would prove influential in the L2 group. Based on the existing L1 literature, we expected greater print exposure to be associated with a more pronounced subject bias in pronoun interpretative preferences (e.g., Langlois & Arnold, 2020). Additionally, we expected greater WM (reading span in particular) to be associated with increased sensitivity to pronoun ambiguity, as reported in Nieuwland and Van Berkum (2006). However, because that study used a different method (ERPs), we tested this association indirectly through a novel measure we devised, which we explain under the Data analyses section of Study 1. As for L2 speakers, we considered that WM and L2 proficiency might prove influential; yet, owing to mixed evidence or a lack of research in the previous literature, we did not form strong predictions regarding the effects of these comprehender-dependent measures on L2 anaphora resolution, and our analyses are therefore exploratory. The measures computed for these variables, alongside certain additional ones detailed below, were included as covariates in the analyses of offline and online results; these analyses are reported at the end of the Results section of each of the main studies.

Design

Two web-based self-paced reading experiments were designed to test which of the hypotheses formulated above would receive empirical support. In Study 1, native English speakers were recruited; in Study 2, following Cunnings et al. (2017), we tested L2 English speakers with a null subject L1, namely Greek. To account for individual variability, we selected different tasks to compute comprehender-dependent measures for the two groups.

L1 speakers were assessed on memory-related tasks, a print exposure task, and a rhyme judgment task. Because both verbal and non-verbal WM contribute to individual variability in ambiguity resolution (Swets et al., 2007), we included the Reading Span and Spatial Span tasks used in that study. Additionally, we assessed WM in the auditory modality using the Backward Digit Span task, formatted and administered in a similar fashion to the Digit Span tasks in the Cambridge Neuropsychological Test Automated Battery (CANTAB)². The Forward Digit Span task was also administered, as it taps into phonological short-term memory and is thought to be associated with rhyme cue sensitivity (Classon et al., 2013). Following Arnold et al. (2018) and Langlois and Arnold (2020), we used the ART (Acheson et al., 2008; for adaptations see Martin-Chang & Gould, 2008, and Moore & Gordon, 2015) as a proxy for print exposure. A self-report measure, namely the three-part questionnaire (Reading, Writing, and Comparative Reading Habits) developed by Acheson et al. (2008), was also used to derive a composite measure of print exposure for the analyses (henceforth Print Exposure composite). Finally, in order to assess whether rhyme awareness influences pronoun interpretative preferences (given the rhyme cue on the nonsubject antecedent in our stimuli), we used a Rhyme Judgement task (Johnston & McDermott, 1986), modeled on materials used in Frisson et al. (2014) and Zecker et al. (1986), with an administration procedure informed by the findings of Classon et al. (2013).

In Study 2, L2 speakers completed the same tasks as specified above; however, for reasons explained under Study 2, they did not complete the ART or the Spatial Span task. In addition, we assessed the proficiency of L2 speakers using the English Level Placement Test by the British Council³.

The results of Study 1 and Study 2 are first reported separately and then compared in a subsequent section. We analyzed the data in this way because we first wanted to examine how individual differences affect performance within each group, taking into account the unique set of comprehender-dependent measures computed for each group. We then compared L1 and L2 results on the main self-paced reading experiment, which both groups completed.

Study 1: L1 speakers of English

Participants

Thirty-nine L1 English speakers (21 female, M_AGE = 21.3; SD_AGE = 2.08) participated in the study. Twenty-one of them were UK-born university students recruited through the Prolific platform⁴. The remaining 18 were students at the University of Cambridge, who were born in the UK (N = 17) or Ireland (N = 1). All provided informed consent to participate in the study and received £15 as payment. The study received ethical approval from the ethics committee of the Faculty of Modern and Medieval Languages and Linguistics at the University of Cambridge.

Materials

The critical items were 16 poem-texts which contained anaphoric ambiguity. These items were created in two different conditions (R&M-consistent and R&M-disrupted), and participants were presented with 8 items for each one. An example of the two conditions can be found in Table 1; a list of all items can be found in Appendix A.

Table 1. Example of critical item

Out of the five verses of each critical poem-item, the first two established an expectation for a consistent meter and rhyming scheme. These first two lines remained unchanged across conditions, as did the last two. Importantly, the third line contained two proper names, one being the subject (SUB) and the other the object (OBJ; line-final) of a main clause; either one could be the referent of the ambiguous pronoun found within a dependent clause on the 4th line. Crucially, in half of the items, the metrical structure and rhyming scheme were invariable across all five verses (OBJ: rhyming; Condition: R&M-consistent), whereas in the other half, they were disrupted on the 3rd line (OBJ: non-rhyming; Condition: R&M-disrupted).

In terms of internal structure, each constituent verse of the poem-items consisted of seven syllables. In the R&M-consistent condition, the 3rd, 5th, and 7th syllables were stressed on all lines, whereas in the R&M-disrupted condition, they were stressed on all lines except the (disrupted) 3rd line. To make this disruption possible, the two monosyllabic proper names in the R&M-consistent condition were replaced by disyllabic ones in the R&M-disrupted version of the stimuli, while the line-initial adverb of frequency in the former condition was eliminated in the latter.

It should be noted that metrical consistency on the 3^rd line could have been disrupted by changing just one of the candidates between conditions (e.g., by making the non-rhyming OBJ candidate in the R&M-disrupted condition be disyllabic while keeping the same SUB candidate as it was in the R&M-consistent condition). This option was dispreferred as it would have meant that in just one of the conditions the SUB and OBJ would necessarily differ in terms of syllable count, and probably character length too. This could have affected interpretative preferences, as well as the reading time results between conditions.

Furthermore, the rhyming scheme remained invariable across all verses in the R&M-consistent condition (AAAAA), whereas in the R&M-disrupted condition, the line-final word of the 3rd verse did not rhyme with the rest (AABAA). The rhyming words in each poem-text exhibited substantial phonological similarity in the rime (near-rhymes were avoided where possible), yet word endings were not always orthographically identical. Importantly, the line-final words of the 3rd and 4th verses, which contained the OBJ disambiguation candidate and the dependent clause verb respectively, exhibited adequate orthographic overlap (e.g., Rose-pose) or only minimal orthographic difference (e.g., Brooke-cook)⁵. This phonemic and graphemic correspondence between the two words was expected to increase the prominence of the rhyming candidate.
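The rhyme-scheme labels used above (AAAAA vs AABAA) can be derived mechanically once each line-final word has been coded for its rime. The sketch below is illustrative only: it assumes the rimes are hand-coded phonological strings (orthography is not a reliable guide here, as Brooke-cook shows), and the rime labels in the example are hypothetical rather than taken from the stimuli.

```python
def rhyme_scheme(line_final_rimes):
    """Label each line with a letter; lines sharing a rime share a letter.

    Letters are assigned in order of first appearance (A, B, C, ...).
    `line_final_rimes` is assumed to hold hand-coded rimes, not spellings.
    """
    letters = {}
    scheme = []
    for rime in line_final_rimes:
        if rime not in letters:
            # First time this rime is seen: give it the next unused letter.
            letters[rime] = chr(ord("A") + len(letters))
        scheme.append(letters[rime])
    return "".join(scheme)


# Hypothetical rimes for a five-line item built around Rose/pose-type words.
consistent = ["oʊz", "oʊz", "oʊz", "oʊz", "oʊz"]
disrupted = ["oʊz", "oʊz", "ɪl", "oʊz", "oʊz"]  # line 3 ends in a non-rhyming name
print(rhyme_scheme(consistent))  # AAAAA
print(rhyme_scheme(disrupted))   # AABAA
```

Keying on rimes rather than on word endings keeps pairs such as Brooke-cook in the same class even though their spellings differ.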

To ensure that the items would be comprehensible and plausible enough given their poetic particularities, the materials were normed by an independent group of native English speakers (N = 38; 33 females) prior to the launch of the main study. These participants rated 32 items involving anaphoric ambiguity for plausibility and comprehensibility on a scale from 1 (very implausible/incomprehensible) to 7 (very plausible/comprehensible). Only 16 items that received ratings above 4 for both plausibility and comprehensibility were selected. These items were then further amended and checked again for plausibility and comprehensibility by a native speaker of English with a postgraduate degree in linguistics. This rater assigned the maximum value (7) to all items for plausibility, and none of the items were rated below 4 for comprehensibility (M = 5.3, SD = 0.6, range = [4, 6]).

Alongside the 16 critical anaphoric ambiguity poem-items, participants in the main reading experiment also read: a) 16 relative clause ambiguity items, b) 32 items which involved local ambiguity and were not metered or rhyming, c) 32 items that contained alliteration (N = 16) and vowel assonance (N = 16), none of which were ambiguous, and d) 16 unambiguous filler items containing iconic and non-iconic “after” and “before” clauses, respectively. Thus, participants read a total of 112 five-line poem-like texts.

After each text, participants were asked to specify which character performed the action described by the dependent clause verb on the 4th line (see Table 1). Since the ambiguous pronoun was the subject of this verb, they essentially had to indicate which of the two candidates on the 3rd line was its referent. Alternatively, they could select the fallback response option “Other(s).” Care was taken not to repeat the dependent clause verb in the question, as it rhymed with the OBJ candidate in the R&M-consistent condition and could thus have acted as an imbalanced extra cue; instead, alternative formulations of the questions were used where feasible, to allow comparability between the two conditions.

Procedure

A web-based reading experiment was designed using the self-paced (line-by-line) moving-window paradigm (Just et al., 1982). The reading experiment as well as the additional tasks were programmed using the JavaScript library jsPsych (de Leeuw, 2015) and hosted online on a university server. A remote testing method was used (Leong et al., 2022), in which participants completed the two-hour study while on a live (audio-only or audio and video) call with the first author and main experimenter.

Once a pre-study form and the three-part print exposure questionnaire had been completed, participants went through three practice items and then started the main reading experiment. The main reading experiment was split into five blocks: the first four contained 22 texts each, and the last one consisted of 24 texts. After each block, participants had the opportunity to take a short break. Additionally, after the first block, participants did the Reading Span task; after the second block, they were tested on the Spatial Span task; after the third block, they did the Forward and Backward Digit Span tasks; after the fourth block, they did the ART; finally, upon completion of the fifth block, they did a novel Text Completion task (to be discussed in a future report) and the Rhyme Judgement task. We structured the web-based experimental sessions in this way to prevent the fatigue and loss of interest that the long and repetitive self-paced reading experiment could otherwise cause. During these breaks, participants would rest and also interact with the experimenter to complete the additional tasks, thus ensuring continued engagement throughout the session. Additionally, we kept the order of the tasks fixed so as to streamline the administration and manual scoring of the oral responses required by certain tasks; had the order of tasks been randomized, the experimenter could not have prepared adequately to complete these steps efficiently. Importantly, the Rhyme Judgement task came after the final block of the main reading experiment in order to avoid drawing attention to rhyme. Details on what these tasks involved and on how they were adapted, administered, and scored can be found in the supplementary material (see Replication Package).

Within the blocks of the main reading experiment, participants read the texts one line at a time; at the beginning of each trial, only dashes would be visible to mask all the lines. Participants had to press a key to reveal only a single line of text each time, making their way from the first to the fifth one with each key press. Subsequently, readers would be directed to the interpretation question which they could answer by pressing a key corresponding to one of the two disambiguation candidates or the fallback option “Other(s).”

The stimuli were counterbalanced and equally distributed across two lists. List assignment and the order in which stimuli appeared were fully randomized. The order of appearance of the response options SUB and OBJ was pseudo-randomized and fixed per item; as such, in each list, the OBJ appeared in the left-most position of the screen for half the stimuli and in the middle position for the rest. The fallback option “Other(s)” always appeared in the right-most position.
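The text does not spell out how items were distributed across the two lists. One standard way to achieve this kind of counterbalancing is a Latin square, sketched below under that assumption: each list contains all 16 items, each item appears in a different condition on each list, and each list therefore has 8 items per condition, matching the 8 items per condition reported under Materials. List choice and trial order are then randomized per participant. The function names and the seed are illustrative.

```python
import random


def build_lists(n_items=16, conditions=("R&M-consistent", "R&M-disrupted")):
    """Latin-square assignment of items to conditions across lists.

    Each list contains every item exactly once; across lists, each item
    rotates through all conditions, so each list has n_items / len(conditions)
    items per condition.
    """
    n_cond = len(conditions)
    return [
        [(item, conditions[(item + list_idx) % n_cond]) for item in range(1, n_items + 1)]
        for list_idx in range(n_cond)
    ]


def session_trials(lists, rng):
    """Randomly pick a list for one participant and shuffle its trial order."""
    trials = list(rng.choice(lists))
    rng.shuffle(trials)
    return trials


lists = build_lists()
trials = session_trials(lists, random.Random(7))  # seed only for reproducibility
```

In a real session the critical items would be interleaved with the fillers before shuffling; that step is omitted here.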

Data analyses

Prior to analyses, individual datasets were first checked for accuracy on the unambiguous filler items. Following Fernández’s (2002) methodology, native English speakers who responded incorrectly to the question following the unambiguous filler items in more than 20% of cases were excluded from analyses. As a result, data from three individuals were discarded, yielding a sample size of 39.

Second, trials in which participants had selected the fallback option “Other(s)” as a response to the question for the critical items were excluded (2.2% data loss).
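The two exclusion steps just described can be sketched as two filters. This is a hypothetical sketch: the data layout (per-participant error counts on fillers, trial dictionaries with a `response` field) is assumed, and the threshold is passed as a whole-number percentage (20 for the L1 group; Study 2 uses 30) so that a participant exactly at the cutoff is kept, since only an error rate of *more than* the cutoff leads to exclusion.

```python
def retained_participants(filler_errors, max_error_pct):
    """Keep participants whose filler error rate does not exceed max_error_pct.

    filler_errors maps participant ID -> (n_errors, n_fillers). Integer
    arithmetic avoids floating-point edge cases at exactly the cutoff.
    """
    return {
        pid
        for pid, (n_errors, n_fillers) in filler_errors.items()
        if n_errors * 100 <= max_error_pct * n_fillers
    }


def drop_fallback_trials(trials):
    """Remove critical trials answered with the fallback option."""
    return [t for t in trials if t["response"] != "Other(s)"]


errors = {"p01": (3, 16), "p02": (4, 16), "p03": (1, 16)}  # hypothetical counts
print(sorted(retained_participants(errors, 20)))  # ['p01', 'p03']
```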

Third, reading times and reaction times to the interpretation question were checked for normality (reaction times were recorded as the time elapsed in milliseconds from the onset of the question trial until a response was provided via a key press, and will be referred to as RTs hereafter). As a first step, a combination of winsorizing⁶ and log transformation was used (Nicklin & Plonsky, 2020). After these transformations had been applied, outliers were identified using Cook’s distance and discarded (5.7% data loss).
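A simplified, single-level version of this cleaning pipeline is sketched below. Several details are assumptions not stated in the text: the winsorizing percentiles (5th/95th), winsorizing before taking logs, fitting an ordinary least-squares model with a single predictor instead of the mixed-effects models used in the study, and the common 4/n cutoff for Cook's distance.

```python
import math
import statistics


def winsorize(values, lower_pct=5, upper_pct=95):
    """Clamp values to the given percentiles (assumed 5th/95th)."""
    cuts = statistics.quantiles(values, n=100, method="inclusive")
    low, high = cuts[lower_pct - 1], cuts[upper_pct - 1]
    return [min(max(v, low), high) for v in values]


def cooks_distance(x, y):
    """Cook's distance for each point of a simple OLS fit y = a + b * x."""
    n, p = len(x), 2  # p = number of coefficients (intercept and slope)
    x_bar, y_bar = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    intercept = y_bar - slope * x_bar
    resid = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    mse = sum(e * e for e in resid) / (n - p)
    leverage = [1 / n + (xi - x_bar) ** 2 / sxx for xi in x]
    return [e * e / (p * mse) * h / (1 - h) ** 2 for e, h in zip(resid, leverage)]


def clean_reading_times(x, times_ms):
    """Winsorize, log-transform, then drop points with Cook's D > 4/n."""
    log_t = [math.log(t) for t in winsorize(times_ms)]
    d = cooks_distance(x, log_t)
    cutoff = 4 / len(x)
    return [(xi, yi) for xi, yi, di in zip(x, log_t, d) if di <= cutoff]
```

Here `x` would be a numeric predictor such as a deviation-coded Condition; with a mixed model, Cook's distance would instead be computed from that model's influence diagnostics.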

Moreover, as reported in the Results section, statistical models revealed considerable participant-level variability in pronoun interpretative preferences and processing rates. We hypothesized that differences in reading behavior could be associated with individuals’ interpretative preferences. For instance, participants whose interpretative preferences switched from one antecedent to the other depending on the condition they were viewing may have been affected by heightened competition between cues, and may thus have needed more time to process critical information. On the other hand, participants with fixed preferences may not have considered alternative interpretations, and may thus have needed less time. Additionally, it was deemed important to examine whether WM capacity is predictive of switching tendencies. For instance, switching between referential interpretations may be a sign of higher reading spans, while rigidity may be indicative of reduced sensitivity to formally ambiguous pronouns, a characteristic of individuals with lower spans (Nieuwland & Van Berkum, 2006).

To test these hypothesized associations, a new measure called Switch was created.

This measure involves a dichotomous division (50% threshold) that assigns participants to one of two groups, namely Biased and Switchers. Specifically, participants recorded as Biased exhibited an overall preference (> 50%) for the same candidate, either SUB or OBJ, in both conditions⁷. Participants recorded as Switchers had an overall preference (> 50%) for either the SUB or the OBJ candidate in one condition, while in the other condition their overall preference switched to the alternative candidate. Accordingly, 24 participants in the L1 group were classified as Biased and 14 as Switchers. One participant who was purely balanced in their preferences (50%–50% distribution between SUB and OBJ in both conditions) was excluded from the analyses that included the Switch measure.
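The classification rule can be written as a small function. This is a sketch under assumptions: for each condition, it takes the number of OBJ responses and the number of valid (non-“Other(s)”) trials; it treats exactly 50% as no preference; and it flags a tie in only one condition as unclassified, a case the description above does not cover explicitly.

```python
def preference(n_obj, n_total):
    """Return the candidate chosen on more than 50% of trials, or None if tied."""
    if 2 * n_obj > n_total:
        return "OBJ"
    if 2 * n_obj < n_total:
        return "SUB"
    return None


def switch_group(consistent, disrupted):
    """Classify a participant from (n_OBJ, n_valid) counts in each condition."""
    pref_c, pref_d = preference(*consistent), preference(*disrupted)
    if pref_c is None and pref_d is None:
        return "Balanced"      # excluded from analyses using Switch
    if pref_c is None or pref_d is None:
        return "Unclassified"  # tie in one condition only: handling assumed
    return "Biased" if pref_c == pref_d else "Switcher"


print(switch_group((6, 8), (5, 8)))  # Biased: OBJ preferred in both conditions
print(switch_group((2, 8), (6, 8)))  # Switcher: SUB in one condition, OBJ in the other
```

Comparing `2 * n_obj` with `n_total` keeps the 50% threshold exact, even when excluded “Other(s)” trials leave an odd number of valid trials.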

Analyses were performed in R version 4.2.2 using the lme4 package (Bates et al., 2015). Reading times and RTs were entered as dependent variables into linear mixed effects models. The responses to the interpretation questions were entered into binomial generalized linear mixed effects models. For all dependent variables, analyses were conducted in two steps. First, models with only Condition as a predictor were run to assess effects of the R&M manipulation irrespective of covariates. Then, new analyses were performed that included covariates. For interpretative preferences, the new models included scores on the additional tasks. For reading times and RTs, the new models included scores on the additional tasks and the Switch variable, as well as line character count or word count as predictors if they improved model fit. The results reported are based on the model that best fitted the data (the Akaike information criterion, AIC, was used for model comparison). Factors were contrast-coded using deviation coding. Where significant interactions were detected, post hoc tests were run with Bonferroni-corrected p-values using the R lsmeans package. In all of the above models, participants and items were included as random effects (either intercepts only or intercepts and slopes, depending on whether slopes improved model fit)⁸. Model summaries were generated with the R report package. Odds ratios and Cohen’s d are reported as effect size indices. The code used and the model output can be found in the supplementary material (see Replication Package).
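To illustrate what deviation coding means for the two-level Condition factor, the sketch below codes the levels as +0.5 and -0.5 (an assumption; other symmetric values such as ±1 are also used and only rescale the coefficient). It fits an ordinary least-squares line rather than the mixed-effects models used in the study. With this coding, the slope equals the difference between the two condition means and the intercept equals the mean of the two condition means, even when the cells are unbalanced. The response values are made up.

```python
import statistics

# Assumed deviation codes for the two-level Condition factor.
CODES = {"R&M-consistent": 0.5, "R&M-disrupted": -0.5}


def fit_condition(conditions, y):
    """OLS fit of y on a deviation-coded Condition; returns (intercept, slope)."""
    x = [CODES[c] for c in conditions]
    x_bar, y_bar = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


conds = ["R&M-consistent"] * 2 + ["R&M-disrupted"] * 3
log_rt = [7.0, 7.2, 6.9, 7.0, 7.1]  # hypothetical log reading times
intercept, slope = fit_condition(conds, log_rt)
# slope is about 0.1 (7.1 - 7.0); intercept is about 7.05 (mean of 7.1 and 7.0)
```

Because the intercept sits at the midpoint between conditions, the other coefficients in a model with interactions are interpreted as effects averaged over Condition rather than at one reference level.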

Results

Effects of the R&M manipulation on pronoun interpretative preferences

The distribution of SUB–OBJ responses in each condition can be seen in Table 2. When the OBJ pronoun antecedent was a rhyming element in the R&M-consistent condition, it was chosen less often overall (42%) compared to when it was not rhyming in the R&M-disrupted condition (49%). Statistical analyses revealed that the R&M-consistent condition led to a decrease in the odds of selecting the OBJ (β = –0.36, 95% CI [–0.70, –0.01], p = 0.044); yet, the effect size was small (OR = 0.70, 95% CI [0.50, 0.99]).
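Because the binomial models estimate effects on the log-odds scale, the odds ratio and its interval are obtained by exponentiating the coefficient and its confidence bounds. The short check below reproduces the values just reported. It only restates this standard conversion and is not the authors' analysis code.

```python
import math


def to_odds_ratio(beta, ci_low, ci_high):
    """Convert a log-odds coefficient and its 95% CI to the odds-ratio scale."""
    return math.exp(beta), math.exp(ci_low), math.exp(ci_high)


odds, low, high = to_odds_ratio(-0.36, -0.70, -0.01)
print(f"OR = {odds:.2f}, 95% CI [{low:.2f}, {high:.2f}]")
# OR = 0.70, 95% CI [0.50, 0.99]
```

The same conversion applies to the other odds ratios reported in this section, e.g. exp(0.24) ≈ 1.27 for the association between Reading Span scores and being a Switcher.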

Table 2. Whole-group mean percentage of SUB and OBJ responses by condition (L1 group)

Note: SD and CIs derived from the pooled mean (i.e., grand mean of individuals’ mean preferences).

Effects of the R&M manipulation on reading times and RTs

Reading times and RT data are presented in Table 3. Reading times for critical lines 3 and 4, as well as RTs to the question, were analyzed in linear mixed models. Results revealed that the R&M-consistent condition caused participants to slow down their reading rate on line 3 (β = 0.10, 95% CI [0.04, 0.17], t(598) = 3.12, p = 0.002; d = 0.291, 95% CI [0.09, 0.48]). There was no significant effect of Condition for line 4 (p > 0.05). In terms of the RT data, the R&M-consistent condition led to somewhat faster responses to the question, although this result did not reach significance (β = –0.06, p = 0.11). Therefore, these results suggest that R&M consistency led to a significant slow-down in reading rate (on critical line 3 only), with no significant difference between the two conditions in RTs.

Table 3. Mean reading times and RTs to the interpretation question by condition in ms (L1 group)

Note: The geometric mean is reported (i.e., the antilog of the average of the logarithms of the data)⁹.
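As a brief illustration of why the geometric mean suits right-skewed reading times, the sketch below computes it exactly as defined in the note (the antilog of the mean log). It then compares it with the arithmetic mean on a hypothetical set of four RTs that contains one long response. Python's standard library also offers `statistics.geometric_mean`, which gives the same value.

```python
import math


def geometric_mean(times_ms):
    """Antilog of the mean of the natural logs; all values must be positive."""
    return math.exp(sum(math.log(t) for t in times_ms) / len(times_ms))


rts = [400, 500, 600, 3000]  # hypothetical RTs with one long response
print(round(geometric_mean(rts), 1))  # 774.6
print(sum(rts) / len(rts))            # 1125.0
```

Because the models were fitted to log-transformed times, the geometric mean is also the descriptive statistic that corresponds to the modeled scale.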

Moreover, further inspection of the models’ random effects structure revealed that between-participant variability accounted for a greater proportion of the variance than between-item variability; this was the case in all models reported above (30–50% more variance explained by participants).

Finally, although it was not part of our original analysis plan, we still examined line 5 reading times to rule out the possibility of differential wrap-up effects between the two conditions. Condition was not significant (p > 0.05).

Individual differences

We first examined whether the scores of L1 participants on the additional tasks (shown in Table 4) could explain by-participant variability in terms of pronoun interpretative preferences. None of the additional task scores were significant predictors (p > 0.05).

Table 4. Mean scores on additional tasks (L1 group)

We also examined whether reading behavior was modulated by comprehender-dependent measures; to that end, we included in the models participants’ scores on the additional tasks as well as the Switch variable. In the case of line 3 reading times, the model included Condition and the Switch variable, as well as line 3 character count and the ART scores. Results again revealed an effect of Condition (β = 0.11, 95% CI [0.05, 0.18], t(578) = 3.29, p = 0.001; d = 0.32, 95% CI [0.11, 0.52]). Additionally, higher ART scores were associated with faster line 3 reading rates (β = –0.02, 95% CI [–0.03, –0.003], t(578) = –2.46, p = 0.014). Importantly, the interaction between Switch (Switchers) and Condition (R&M-consistent) was significant (β = 0.15, 95% CI [0.01, 0.28], t(578) = 2.14, p = 0.033). To follow up on the interaction, post hoc comparisons were performed. These revealed that the interaction was driven by Switchers’ significant slow-down in the R&M-consistent condition compared to the R&M-disrupted one (β = 0.18, p = 0.0019; d = 0.526, 95% CI [0.2, 0.85]). No other post hoc comparisons were significant, suggesting that Condition had differential effects only for Switchers and not for Biased participants.

Regarding line 4 reading times, the new model included Condition and the Switch variable, as well as word count for line 4 and the ART scores. The effect of Condition remained nonsignificant (p > 0.05). Higher ART scores were estimated to lead to somewhat faster line 4 reading times, although this result did not reach significance (β = –0.02, p = 0.056). A significant interaction was detected between Switch and Condition (β = 0.18, 95% CI [0.04, 0.31], t(580) = 2.55, p = 0.011). Post hoc tests revealed that the interaction was driven by Switchers taking longer to read line 4 in the R&M-consistent condition compared to the R&M-disrupted one (β = 0.10, p = 0.049; d = 0.276, 95% CI [–0.0001, 0.55]); there were no differential effects of Condition for Biased participants (p > 0.05).

In terms of the RT data, the new model included Condition and Backward Digit Span task scores as predictors, none of which proved significant (p’s > 0.05).

Importantly, the Switch variable allowed us to test whether switching tendencies were related to WM capacity. A logistic model indicated that a one-unit increase in Reading Span scores was associated with greater odds of being a Switcher rather than a Biased participant (β = 0.24, 95% CI [0.08, 0.46], p = 0.012; OR = 1.27, 95% CI [1.08, 1.58]); see Figure 1. Hence, reading span can explain why some participants considered alternative referential interpretations depending on context (Nieuwland & Van Berkum, 2006), and it can also indirectly account for the distinct reading behavior observed between Switchers and Biased participants.

Figure 1. Difference in reading span task scores between L1 switchers and biased participants.

Study 2: L2 English speakers

In Study 2, L2 English L1 Greek speakers were recruited to explore how L2ers process and interpret anaphoric ambiguities in the presence of competing cues. The L2 group results are reported first in this section; they are then compared with the L1 group results in a subsequent section to test for L1–L2 differences.

Participants

Forty-six L2 English L1 Greek speakers (37 female, M_AGE = 22.4; SD_AGE = 2.68)¹⁰ participated in the study. To maximize the chances that the L2 speakers would be proficient enough in English to participate, and that their exposure would be frequent and mostly text-based (as opposed to through immersion), participant calls were posted on social media platforms specifically targeting university students of English or Translation studies in Greece. As a result, 92% (N = 43) of participants were students or recent graduates of these degree programs at a Greek university at the time of testing. All provided informed consent to participate in the study and received €7.5 per hour as payment. The study received ethical approval from the ethics committee of the Faculty of Modern and Medieval Languages and Linguistics at the University of Cambridge.

In terms of the L2 acquisition context, all participants indicated that they (i) were born in Greece; (ii) had spent the majority of their childhood in Greece (two participants had temporarily relocated to an English-speaking country between the ages of 3 and 6 due to parents’ profession); and (iii) had learnt English in school and through tutoring once they were of school age (6 years +). Adding to this last point, no participant reported that English was spoken at home by parents/caregivers while they were growing up. However, three participants indicated that, apart from Greek, they were exposed to another language (2L1) in their home environment from birth, namely either French, Romanian, or Russian.

At the time of testing, all participants had spent the majority of their lives in Greece; only five participants had spent a period of 5 to 24 months in an English-speaking country due to university studies. When asked to rank the languages they speak in terms of current overall self-perceived proficiency, all indicated that they were most proficient in Greek (L1); in all cases, English was ranked second (L2). Finally, the majority of participants (80%) indicated that they had learnt other foreign languages at different points in their life, most commonly in adolescence or adulthood. Self-reported data revealed that 91% of participants were less fluent in these languages compared to English (9% were equally fluent), and 89% of participants used these languages less frequently compared to English (11% used them equally frequently). Table 5 contains further information regarding participants’ language ability and daily use of L2 English at the time of testing. For the former, reference is made to the levels of language ability, as specified within the Common European Framework of Reference for Languages (CEFR).

Table 5. Level and use of English in L2 participants (Study 2)

Overall, the majority of the participants (i) had grown up in a monolingual context without sustained exposure to English before school age, (ii) had learnt English through text-based exposure in instructed settings rather than through immersion in an L2 environment, and (iii) had, by the time of testing, achieved a high level of proficiency in the L2 and continued to use the language regularly.

Materials

The same materials as in Study 1 were used.

Procedure

The same procedure as in Study 1 was followed, with slight modifications. In terms of additional tasks, instead of doing the Spatial Span task after the second block of the reading experiment, the L2 participants completed the Digit Span tasks at that point and had no task after the third block. The Spatial Span task was eliminated because it had proved very challenging for L1 speakers and was not expected to add value to Study 2 with L2 speakers. Moreover, instead of the ART, L2 participants did the English Level Placement Test after the fourth block. The exclusion of the ART was based on McCarron and Kuperman (2021), who reported that the task is not effective for use with L2 speakers. All tasks, including the memory-related ones, were completed in English; participants’ performance in their L1 (Greek) was not assessed.

Data analyses

Regarding analysis steps, the only difference between Study 1 and Study 2 concerned accuracy-related exclusion criteria. Following Fernández (2002), we applied a different error-rate cutoff for L2 speakers, namely more than 30% error on fillers. No participant exceeded this cutoff; error rates did not exceed 25% in any case. As in Study 1, all trials in which the response “Other(s)” was provided to the interpretation question for the critical items were excluded (3.5% data loss). Trials with outlying reading times and RTs were excluded following the same procedure as in Study 1, resulting in a 4.3% data loss. Additionally, we considered excluding any participants with low accuracy on the English Level Placement Test. Out of a maximum possible score of 25, no L2 participant scored below 17, i.e., 68% correct (M = 21.2; SD = 1.63). Since all L2 speakers were at an “intermediate” or “upper intermediate or above” level (based on the British Council’s automatic classification for the test), none of them were excluded.

Moreover, we explored whether L2 participants’ reading behavior varied as a function of their overall pronoun interpretative preferences; thus, as in Study 1, the Switch variable was computed which yielded 36 Biased participants and 7 Switchers. Three participants who were purely balanced in their preferences were excluded from the analyses that included the Switch measure.

Results

Effects of the R&M manipulation on pronoun interpretative preferences

The distribution of SUB and OBJ responses in each condition is presented in Table 6. When the OBJ antecedent was a rhyming element in the R&M-consistent condition, it was chosen slightly more often (49%) than when it was not rhyming in the R&M-disrupted condition (46%). This 3-percentage-point difference was not significant (p > 0.05).

Table 6. Whole-group mean percentage of SUB and OBJ responses by condition (L2 group)

Note: SD and CIs derived from the pooled mean (i.e., grand mean of individuals’ mean preferences).
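The “pooled mean” in the note can be made concrete: each participant’s OBJ-choice percentage is computed first, and the descriptive statistics are then taken over those per-participant means rather than over individual trials. The sketch below assumes a normal-approximation 95% CI (mean ± 1.96 × SE); the exact CI method behind Table 6 is not stated here, and the input layout is hypothetical.

```python
import math
from statistics import mean, stdev

def pooled_preference(responses_by_participant, z=1.96):
    """Grand mean of per-participant OBJ percentages, with SD and an approximate CI.

    responses_by_participant maps a participant ID to a list of "SUB"/"OBJ"
    choices in one condition (hypothetical layout). At least two participants
    with responses are needed for the SD.
    """
    props = [100 * sum(r == "OBJ" for r in resp) / len(resp)
             for resp in responses_by_participant.values() if resp]
    m = mean(props)
    sd = stdev(props)                 # SD across participants, not across trials
    se = sd / math.sqrt(len(props))
    return m, sd, (m - z * se, m + z * se)
```

Averaging within participants first gives each participant equal weight, even if some contributed fewer valid trials after exclusions.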

Effects of the R&M manipulation on reading times and RTs

Line reading times and RT data are presented in Table 7. As in Study 1, reading times for critical lines 3 and 4 as well as RTs to the interpretation question were analyzed. The effect of Condition on line 3 and line 4 reading times was not significant (p > 0.05). In terms of RTs, the R&M-consistent condition led to significantly slower responses to the interpretation question (β = 0.08, 95% CI [0.007, 0.14], t(698) = 2.18, p = 0.029, d = 0.168, 95% CI [0.01, 0.32]). Thus, the L2 participants did not adapt their reading rate at any point depending on which version of the texts they were reading. Instead, differential processing between the two conditions emerged only in RTs, as the L2 group responded significantly more slowly in the R&M-consistent condition.

Table 7. Mean reading times and RTs to the interpretation question by condition in ms (L2 group)

Note: The geometric mean is reported (i.e., antilog of the average of the logarithms of the data).
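The geometric mean in the note is obtained by averaging the log-transformed times and exponentiating the result. It is less affected than the arithmetic mean by the long right tail typical of reading-time distributions. A minimal sketch (the example times are made up):

```python
import math
from statistics import geometric_mean

def geo_mean_ms(times_ms):
    """Antilog of the mean of the natural logs of the reading times (all > 0)."""
    return math.exp(sum(math.log(t) for t in times_ms) / len(times_ms))

rts = [100.0, 400.0]          # illustrative values only
print(geo_mean_ms(rts))       # approximately 200 (the arithmetic mean would be 250)
print(geometric_mean(rts))    # standard-library equivalent (Python 3.8+)
```

The choice of logarithm base does not matter, as long as the same base is used for the antilog.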

Finally, as in Study 1, we examined line 5 reading times to rule out the possibility of differential wrap-up effects between the two conditions. There were no significant differences (p > 0.05).

Individual differences

We examined by-participant variability in pronoun interpretative preferences by including L2 participants’ scores on the additional tasks (shown in Table 8) in the analyses. The new model included Reading Span and Backward Digit Span scores but not Condition, as the latter did not improve model fit. Since the two WM measures were not strongly correlated (r(41) = .15, p > 0.05), both were entered into the model. Higher Reading Span scores were associated with greater odds of choosing the OBJ (β = 0.04, 95% CI [0.006, 0.08], p = 0.023; OR = 1.04, 95% CI [1.01, 1.08]), whereas higher Backward Digit Span scores were associated with lower odds of choosing the OBJ (β = –0.15, 95% CI [–0.29, –0.003], p = 0.045; OR = 0.86, 95% CI [0.75, 1.00]).
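The odds ratios above are the exponentiated logistic-regression coefficients, and exponentiating the CI bounds gives the OR intervals. The short sketch below recovers the reported ORs from the rounded β estimates; because the published β values are themselves rounded, a last-digit discrepancy could in principle arise.

```python
import math

def odds_ratio(beta, ci_low, ci_high):
    """Convert a log-odds coefficient and its 95% CI bounds to the odds-ratio scale."""
    return math.exp(beta), (math.exp(ci_low), math.exp(ci_high))

# Reading Span: each additional point multiplies the odds of an OBJ choice by about 1.04
rs = odds_ratio(0.04, 0.006, 0.08)
# Backward Digit Span: each additional point multiplies the odds by about 0.86
bds = odds_ratio(-0.15, -0.29, -0.003)
print([round(x, 2) for x in (rs[0], *rs[1])])    # [1.04, 1.01, 1.08]
print([round(x, 2) for x in (bds[0], *bds[1])])  # [0.86, 0.75, 1.0]
```

An OR above 1 means higher scores raise the odds of an OBJ response; an OR below 1 means they lower them.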

Table 8. Mean scores on additional tasks (L2 group)

Moreover, we examined whether reading behavior was modulated by comprehender-dependent measures by adding participants’ scores on the additional tasks and the Switch variable to the models. The new model for line 3 reading times included Condition, Switch, line 3 character count, and the following measures: English Level Placement Test, Reading Span, Forward Digit Span, and Rhyme Judgment task scores. As in the previous analysis for line 3, the effect of Condition remained nonsignificant (p > 0.05). There were significant effects of Reading Span (β = –0.02, 95% CI [–0.03, –0.003], t(644) = –3.12, p = 0.002) and English Level Placement Test scores (β = –0.08, 95% CI [–0.13, –0.04], t(644) = –3.57, p < 0.001), suggesting that better performance on these measures predicts faster line 3 reading times. Line 3 character count was also significant (β = 0.10, 95% CI [0.06, 0.15], t(644) = 4.60, p < 0.001). The interaction between Switch (Switchers) and Condition (R&M-consistent) was not significant (p > 0.05).

In terms of line 4 reading times, neither Condition nor the Switch variable proved to be a significant predictor (p’s > 0.05). Instead, the updated model included Reading Span, Forward Digit Span, and English Level Placement Test scores. Only the effect of Reading Span was significant (β = –0.01, 95% CI [–0.03, 0.008], t(651) = –2.09, p = 0.037), suggesting that better performance on this WM task was associated with faster reading times in the 4th region, which contained the ambiguous pronoun.

Regarding the RT data, neither the Switch variable nor the additional measures improved model fit compared to the previous analysis with just Condition as a predictor.

Finally, logistic models were run to examine whether the additional measures could account for switching tendencies in L2 participants. However, none of the additional tasks explained why some L2 participants switched preferences and others did not (p’s > 0.05).

Study 1 & 2: Comparison between L1 and L2 group

Effects of the R&M manipulation on pronoun interpretative preferences

The percentage to which the OBJ candidate was selected in each Condition and by each Group is presented in Figure 2.

Figure 2. L1 & L2 percentage of OBJ preference by condition (CI errorbars based on the pooled mean).

To examine whether the two groups differed in their interpretative preferences, we ran a logistic mixed model with Condition and Group as predictors. Neither predictor proved significant (p’s > 0.05). Our conclusions therefore rest on the separate models run for each group. To the extent that Condition had an effect, it applied only to L1 speakers, as shown by the previously reported analyses: significantly lower odds of choosing the OBJ in the R&M-consistent condition were estimated only for the L1 group, whereas for the L2 group a nonsignificant increase in the odds was estimated.

Effects of the R&M manipulation on reading times and RTs

Line-by-line reading times of the L1 and L2 groups are plotted side by side for comparison in Figure 3. In a linear mixed model with Condition and Group as predictors of line 3 reading times, the effect of Group was significant: the L2 participants took longer to read critical line 3 (β = 0.58, 95% CI [0.43, 0.73], t(1299) = 7.80, p < 0.001; d = 1.57, 95% CI [1.16, 1.98]). Additionally, the effect of Condition was significant (β = 0.07, 95% CI [0.02, 0.12], t(1299) = 2.84, p = 0.005; d = 0.196, 95% CI [0.05, 0.33]), suggesting that the R&M-consistent condition led to delays when both L1 and L2 participants were processing line 3. For line 4 reading times, Condition was not a significant predictor (p > 0.05); only the effect of Group was significant, as L2 participants took longer to read line 4 (β = 0.55, 95% CI [0.38, 0.72], t(1301) = 6.33, p < 0.001; d = 1.45, 95% CI [0.98, 1.92]). Importantly, there were no significant interactions in either reading-time model, indicating that Condition had similar effects in the L1 and L2 groups.

Figure 3. Mean reading times by group and condition (SE error bars).

With regard to the RTs to the interpretation question, a significant effect of Group was observed; L2 participants took longer to indicate a preference (β = 0.33, 95% CI [0.19, 0.47], t(1301) = 4.58, p < 0.001; d = 0.75, 95% CI [0.42, 1.08]). Although the effect of Condition was not significant, the interaction between Group (L2) and Condition (R&M-consistent) was statistically significant (β = 0.12, 95% CI [0.02, 0.22], t(1301) = 2.47, p = 0.014). Post hoc comparisons revealed that L2 participants took significantly longer to provide a response in the R&M-consistent condition compared to the R&M-disrupted one (β = 0.06, p = 0.0475; d = 0.153, 95% CI [0.001, 0.303]). There were no differential effects of Condition in the L1 group (p > 0.05).

In sum, the L2ers read more slowly than L1 participants. This result is not surprising: L2 participants required more time to read the whole texts, not just the critical lines. Our main interest was whether Condition had the same effect in both groups or whether the two groups behaved differently when reading the two versions of the texts. In terms of reading rate, there were no interactions between Group and Condition, suggesting similar processing across the two groups. However, the separate models for the L1 and L2 groups showed that the line 3 slow-down was significant only for the L1 group. Thus, although the direction of the effect is the same for both groups (positive beta coefficients), its magnitude differs. As shown in Table 3, the L1 participants experienced a 183 ms slow-down on line 3 in the R&M-consistent condition compared to the R&M-disrupted one. By contrast, as shown in Table 7, the L2 participants experienced an attenuated 137 ms slow-down, which was not significant when examined separately.

Moreover, differential effects of Condition were detected in RTs. The significant interaction between Group and Condition was expected given the estimates of the separate models for L1 and L2 speakers: while L1 comprehenders responded only slightly (and nonsignificantly) faster in the R&M-consistent condition than in the R&M-disrupted one (−151 ms), L2 participants showed the opposite pattern, responding significantly more slowly in the face of R&M consistency (+322 ms). Possible explanations for this between-group difference are proposed in the General Discussion.

Finally, we ran additional analyses on the combined data from both groups, this time including covariates. Our main goal was to test whether the same result patterns would emerge for both groups when the Switch variable was added. These analyses concerned reading times and RTs to the question (descriptive statistics on the interpretative preferences of L1 and L2 Switchers and Biased participants can be found in Appendix B). As for other terms considered for inclusion in the models, we allowed for character count or word count but did not include scores on the additional tasks, as these differed between groups.

Line 3 and 4 reading times as well as RTs by Group, Condition, and Switch are plotted in Figure 4. The new model for line 3 reading times included Group, Condition, and Switch, as well as line 3 character count. As in the previous analyses, significantly longer reading times were estimated in the R&M-consistent condition (p = 0.008) and for L2 participants (p < 0.001). There was also a significant effect of line 3 character count (β = 0.01, 95% CI [0.006, 0.02], t(1233) = 3.44, p < 0.001). Crucially, the interaction between Switch (Switchers) and Condition (R&M-consistent) proved significant (β = 0.13, 95% CI [0.01, 0.25], t(1233) = 2.19, p = 0.029). Post hoc tests revealed that the interaction was driven by Switchers taking longer to read line 3 in the R&M-consistent condition compared to the R&M-disrupted one (β = 0.14, p = 0.008; d = 0.392, 95% CI [0.1, 0.68]). There were no differential effects of Condition for Biased participants (p > 0.05).

Figure 4. Mean reading times for Line 3 (top) and Line 4 (middle) and RTs to the Interpretation Question (bottom) by Group, Condition and Switch (SE error bars).

In terms of line 4 reading times, there was again an effect of Group (L2), as in the previous analysis (p < 0.001). Line 4 word count was also significant (β = 0.07, 95% CI [0.02, 0.12], t(1235) = 2.53, p = 0.011). Importantly, the interaction between Switch and Condition was significant (β = 0.11, 95% CI [0.01, 0.21], t(1235) = 2.20, p = 0.028). Post hoc comparisons revealed that, again, Switchers took longer to read line 4 in the R&M-consistent condition; however, this result did not survive Bonferroni correction (β = 0.079, p = 0.071).
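A Bonferroni correction multiplies each raw p-value by the number of comparisons in the family (capping the result at 1) and compares the adjusted value with α = .05. The family size used for the post hoc tests here is not stated, so the sketch below uses an arbitrary two-comparison family with made-up p-values; it illustrates only the mechanics of the correction, not the authors’ exact computation.

```python
def bonferroni(p_values):
    """Return Bonferroni-adjusted p-values for one family of comparisons."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]

# Hypothetical family of two post hoc contrasts (illustrative p-values only)
raw = [0.01, 0.04]
adjusted = bonferroni(raw)
print(adjusted)                        # [0.02, 0.08]
print([p < 0.05 for p in adjusted])    # [True, False]
```

The second contrast is significant before correction (p = .04) but not after it, which is the situation described for line 4.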

Thus, these results revealed that Switchers took significantly longer to read critical line 3 in the R&M-consistent condition; a similar, although marginal, effect was detected for line 4. Since there were no interactions with Group, L1 and L2 Switchers appear to have processed the critical lines similarly in the R&M-consistent condition. The interaction between Switch and Condition was not observed in the RT data; instead, the interaction between Group and Condition remained significant (p = 0.019), as in the previous model with only Group and Condition as predictors.

General discussion

Pronoun interpretative preferences

Studies 1 and 2 were designed to explore how L1 and L2 English comprehenders would process and interpret anaphoric ambiguities in the presence of competing cues. As far as L1 speakers are concerned, it was hypothesized that the rhyme cue on the nonsubject (OBJ) candidate would influence interpretative preferences; the magnitude of the effect was expected to be small due to the nature of the manipulation and further modulated by comprehender-dependent factors.

We did indeed find an effect of prosodic cues on L1 interpretative preferences. It was observed that the SUB antecedent was interpreted as coreferential with the pronoun to a greater extent in the R&M-consistent condition relative to the R&M-disrupted one. This could be attributed to the manipulation introduced, namely the effect of the rhyme cue: it made the OBJ antecedent blend in with preceding context in the R&M-consistent condition, thus allowing for the otherwise default competitor (SUB) to gain ground. In parallel, the incongruence of the OBJ in the R&M-disrupted condition led to it being marked as a line-final non-rhyming element in the otherwise rhyming context; as a consequence, the OBJ antecedent became prominent and accessible for coreference (Arnold, 2010) to a greater extent compared to when it was a rhymed word in the R&M-consistent condition.

Importantly, the suggestion that the OBJ blended in with preceding context in the R&M-consistent condition does not imply that the rhyming antecedent was perceptually non-salient or inappreciable (see also Carminati et al., 2006 for a similar justification). As will be discussed in more detail later on, the reading time results indicate that having the OBJ be a rhyming antecedent in the R&M-consistent condition caused L1 comprehenders to slow down their reading rate. This is thought to be due to a rhyme noticing effect and/or heightened competition. If that is indeed the case, the finding that the OBJ candidate was dispreferred compared to the SUB, specifically when it was a rhyming element, could be the result of readers inhibiting the OBJ response. It is possible that readers perceived rhyme as an obtrusive or biasing cue which they needed to suppress, although this hypothesis cannot be tested given the scope of the present experimental studies.

It is also noteworthy that we observed no categorical subject bias, regardless of Condition. For instance, in Cunnings et al. (2017), L1 participants selected SUB antecedents as referents of ambiguous pronouns 87% of the time in subject-biasing contexts and 66% of the time in object-biasing contexts. In the present study, L1 speakers selected the SUB at 51% in the R&M-disrupted condition and at 58% in the R&M-consistent condition. Importantly, ART performance was not a significant predictor of L1 comprehenders’ interpretative preferences. Although it has been suggested in the literature that greater print exposure contributes to the strength of the subject bias (Arnold et al., 2018; Langlois & Arnold, 2020), the present findings do not suggest such an association. Since no comprehender-dependent measure was predictive of L1 participants’ interpretative preferences, the source of this modest subject preference in the present study remains unclear. It is possible that the poetic contexts designed were successful at maximizing perceived ambiguity or neutralizing biases. Another possibility is that the low subject preference we observed is attributable to the placement of the alternative candidate (i.e., the OBJ) at the end of a line, which is in and of itself “a salient position” (Fabb, 2022, pp. 167–168). We thank an anonymous reviewer for pointing this out.

With regard to L2 speakers’ interpretative preferences, we expected any effects our manipulation would have for L1 comprehenders to be absent or attenuated in comparison. The results obtained lend support to this hypothesis. While L1 participants seemed to prefer the SUB interpretation over the OBJ in the R&M-consistent condition, L2 speakers had no such inclination and, arguably, exhibited no clear preference in either condition.

Thus, L2 speakers did not show the sensitivity to the prosodic cues that L1 participants did; instead, what seemed to contribute to L2ers’ interpretative preferences was WM. On the one hand, higher Backward Digit Span scores were associated with a lower probability of choosing the OBJ, irrespective of Condition. This finding is in line with the proposal put forth by Bel et al. (2016) that memory-related principles (e.g., recency) contribute to the selection of antecedents for anaphora resolution in the L2; hence, low span L2 participants chose the non-distant OBJ referent more often than high span L2 speakers. Surprisingly, the opposite relationship was detected for Reading Span scores (for dissociations between Reading and Backward Digit Span, see also Farmer et al., 2017): the OBJ candidate was more likely to be selected by participants with better Reading Span performance. This result is more in line with Swets et al. (2007), who also observed that high reading span readers preferred local referents of ambiguous relative pronouns. However, their finding concerned L1 participants, not L2 speakers. The present L1 results do not suggest a similar positive relation between Reading Span and nonsubject preference; instead, higher Reading Span scores were predictive of greater odds of switching between referential interpretations as a function of the textual version (Condition) in the L1 group. Thus, it may be concluded that WM influences pronoun interpretative preferences in both groups, albeit in dissimilar ways; for L2ers, WM interacts with other cues to guide resolution (e.g., recency, subjecthood), but there is no additional influence of the prosodically differentiated context.

The time course of anaphora resolution

The analysis of reading time data from the L1 group revealed that rhythmic and phonological regularity slowed down reading rate in the region containing the two competing antecedents. There was no effect on the subsequent region containing the ambiguous anaphoric pronoun. These findings may be indicative of a rhyme noticing effect or heightened attention to the rhyming candidate at final position on line 3. It is also possible that readers experienced cue competition in the R&M-consistent condition and, because they expected to be asked about the pronoun’s referent, they spent additional time to process the critical line with the two referents. This may then have led to somewhat speedier interpretation responses, if a commitment to a parse had already been made. Placing these results in a broader theoretical context, the fact that rhythmic and phonological regularity did not lead to a speed-up but a slow-down points toward a semantic/conceptual disfluency cost; that is, the presence of regular prosodic cues may have rendered resolution more effortful and/or increased perceived ambiguity (e.g., Menninghaus et al., 2015; Wallot & Menninghaus, 2018).

It is noteworthy that this slow-down on line 3 was modulated by individuals’ interpretative biases. Participants who switched interpretative preferences in response to the prosodic cues took longer to process line 3 when R&M was consistent. No differences between conditions in terms of reading rate were detected for participants with fixed interpretative biases. Additionally, the fact that higher Reading Span scores were associated with being a Switcher supports Nieuwland and Van Berkum’s (2006) assertion that high span readers consider alternative interpretations and integrate multiple types of information during resolution.

Comparisons between L1 and L2 speakers showed that the prosodic cues affected reading rate in the L1 group, but not in the L2 group. Although the combined analyses of L1 and L2 data showed that both groups slowed down in the R&M-consistent condition, this effect was not observed in the L2 group when examined separately. Thus, the result patterns are quite complex, allowing for different conclusions depending on whether the lack of interactions in the combined analyses or the differential effect sizes in the separate analyses are deemed more reliable. Since L2 participants did not slow down to the same extent as L1 readers, we believe this may go some way toward explaining why L2 readers experienced delays later on, when responding to the interpretation question in the R&M-consistent condition. More specifically, it is possible that L1 speakers followed an expediting strategy: they spent more time early on to encode competing information and decide on one of the two referents; once the interpretation had been determined, this allowed for somewhat speedier RTs to the interpretation question. By contrast, L2 speakers might have followed a delaying strategy: they did not slow down to assess which of the two competing candidates could become the referent of the upcoming pronoun; as a result, additional time was needed to arrive at an interpretation when prompted at the question stage.

On the one hand, the lack of significant effects when the L2 data were examined separately may suggest “non-native-like,” shallower, or otherwise qualitatively distinct behavior in the L2 (Clahsen & Felser, 2006; Sorace & Filiaci, 2006); more broadly, it aligns with previous findings of reduced sensitivity to prosodic and other form-based cues in the L2 (Foltz, 2021; Martin et al., 2013; Nakamura et al., 2020; Perdomo & Kaan, 2021; Roncaglia-Denissen et al., 2015; Schafer et al., 2015; Schmidt et al., 2020). Yet this may not be the complete picture: a slow-down in the face of R&M consistency did emerge in the L2 group, although in a different measure than in the L1 group, namely in RTs to the interpretation question. This result could be interpreted in line with accounts that argue not for qualitative but for quantitative differences and “delayed reflexes” in the L2 (Cunnings, 2017; Hopp, 2006; McDonald, 2006; see also Clahsen & Felser, 2018; Sorace, 2011). However, this explanation can be challenged, since the slow-down was so delayed in the L2 group that it manifested only in a postprocessing measure. Thus, this result may not indicate a deterministic, slower-acting effect in the L2, but rather preferential reliance on distinct processing strategies in each group (e.g., an expediting strategy in the L1 and a delaying approach in the L2).
Yet again, this proposal is not incontestable, and it is up for debate whether this dissimilarly expressed effect points toward differential ability or differential utility of processing strategies between groups (see Kaan & Grüter, 2021 for discussion).

Therefore, depending on the result patterns that the reader is inclined to assign more weight to, different conclusions may be reached and findings may be interpreted as providing support for alternative accounts. In our view, it is difficult to identify a single account that can explain all aspects of the data. We believe this is partly because in most prior research emphasis has been placed on L1–L2 comparisons in order to establish to what extent the latter group behaves in a “native-like” way. An implicit assumption within this approach is that there is an L1 “standard” against which the L2 performance is to be evaluated. Yet, as mentioned above, we discovered participant-level variability in L1 speakers, which casts doubt on the existence of this uniform L1 benchmark; this also limits any claims we can make about L2 divergence from L1 norms. For instance, we found that L1 Biased participants (i.e., those who exhibited fixed pronoun interpretative preferences and were characterized by lower WM) were not affected by the prosodic cues during anaphora resolution. This suggests that arguing for an L2-specific difficulty or deficiency may be problematic, since a subgroup of L1 participants also did not exhibit sensitivity to the prosodic cues introduced during anaphora resolution. By pointing this out, we merely wish to highlight that alongside L1–L2 differences, there is value in examining anaphora resolution through an “individual differences” lens, which brings us to our next point.

In terms of individual differences in the L2 group, the critical region containing the two referents was processed faster by L2 comprehenders with higher WM and proficiency level in English. Additionally, as in the L1 group, L2 participants who switched interpretative preferences in response to the prosodic cues also took longer to process critical line 3 in the R&M-consistent condition. However, this result was observed only when both L1 and L2 data were examined together in the same model; when L2 data were examined separately, there was no effect of the Switch measure in the L2 group. This suggests that, even though L2 speakers also switched preferences in response to the prosodic cues just like L1 comprehenders, the slow-down that the L2 Switchers experienced was not as pronounced as in the L1 group. In addition, unlike in the L1 group, neither Reading Span nor any other comprehender-dependent measure predicted switching tendencies in the L2 group. This may be because there was not enough variability in the L2 group in relation to the Switch measure. In fact, L2 participants were predominantly Biased, which suggests that interpretative preferences may be more fixed or likely to persist in the L2 (Cunnings et al., 2017; Jacob & Felser, 2016; Pozzan & Trueswell, 2016).

Given all the information discussed above, it seems unwarranted to claim that the prosodic cues induced effects only in L1 speakers. Instead, it seems better to conclude that the effects of phonological and rhythmic regularity are reserved for individuals who consider contextual cues and activate alternative referential interpretations, rather than for those with fixed biases. There is reason to believe that this is not something specific to the L1 group. We identified participants who switched interpretative preferences in the L2 group too. However, compared to L1 Switchers, L2 Switchers were fewer and their switching tendencies were not explained by WM or any other measure. Future research is needed to explore what drives switching behavior in L2 speakers and whether similar switching rates exist in L2 groups with different L1 backgrounds and proficiency levels.

Limitations

Before concluding, we wish to acknowledge the limitations of this study. Our sample of L1 speakers is smaller than that of previous studies (e.g., Arnold et al., 2018), and this may have contributed to the lack of significant results in our individual differences analyses. The same may also apply in the case of the L2 group. Additionally, there are certain outstanding questions that our study gives rise to. One of them relates to whether similar effects emerge when the material is presented as prose and/or when a nonsubject antecedent is not a line-final element. Another one concerns how rhyme influences the accessibility of an object antecedent, independent of alterations to the subject (recall that in our stimuli the rhyme manipulation affected the OBJ, but the meter disruption led to a change in both the SUB and OBJ, rendering them disyllabic). Our study cannot provide answers to these questions. This is either because they were outside the scope of our research (e.g., we did not manipulate the placement of antecedents within lines) or because we could not identify a way to address them that would not have introduced confounds (e.g., syllable count disparities between conditions affecting the antecedents’ salience and the line-by-line reading time results; see also the Materials section under Study 1). As such, the aforementioned questions constitute interesting avenues for future research and remain to be explored.

Conclusion

The present results provide evidence of differential effects of prosodic cues both between and within L1 and L2 groups. The findings also highlight the distinct contribution that WM makes for L1 and L2 participants in anaphora resolution. Despite the differences between L1 and L2 speakers in terms of the magnitude and the temporal emergence of effects detected, in both cases we observed cue competition during processing and more so for readers who experienced the texts as highly ambiguous rather than for individuals with fixed pronoun interpretative biases.

Replication package

Anonymized data, coding scripts, model output, and supplementary material can be found at https://osf.io/ucx6q/.

Funding statement

This study was funded by the Economic and Social Research Council (Project Reference: 2275541).

Competing interests

The authors declare that they have no conflict of interest.

Appendix A. Critical Stimuli

Below are all 16 critical items used in the study. Slashes mark line breaks. The 3rd line of each item is given twice to show the R&M manipulation (the R&M-consistent version first, followed by the R&M-disrupted version).

1. By the house that’s cold and dark // with the dogs that always bark // often John encounters Mark // Cooper encounters Andrew // when he quickly tries to park // by the house that’s cold and dark
2. At the dark and crowded gym // where the lights were always dim // often Rob would wave at Tim // Tyler would wave at Conor // when he’d get in line to swim // in the pool inside the gym
3. At a really crowded place // by the school where kids would race // often Belle would stretch with Grace // Caitlin would stretch with Jasmine // when she’d get the urge to race // all the kids around that place
4. At the park where boys would play // and the girls would read all day // often Eve would contact May // Penny would contact Helen // when she’d choose to sit and stay // with the boys who liked to play
5. At a pub that wasn’t great // and would always open late // often Paul would message Nate // Dennis would message Barney // when he ordered drinks and ate // at the pub that wasn’t great
6. Even though they said they’d train // everyday despite the rain // lately Seth would sit with Zain // Austin would sit with Callum // when he sadly couldn’t train // due to serious muscle pain
7. By the leafy maple tree // where the girls liked drinking tea // often Joy would wink at Bea // Carla would wink at Jenna // when she’d pause to hear and see // all the birds above the tree
8. Since they never thought to look // for their mother’s cooking book // often Paige consulted Brooke // Catherine consulted Scarlett // when she got prepared to cook // as they didn’t have the book
9. Though they always made a mess // during tests because of stress // often Hope would smile at Tess // Margot would smile at Hailey // when she’d take a seat and guess // all the answers in the test
10. By the small and swallow lake // where the boys had seen the snake // often Luke would lean on Jake // Nathan would lean on Harvey // when he’d quickly try to take // many pics that scared the snake
11. By the house that’s white and black // and has ducks that always quack // often Fred would chat with Jack // Finley would chat with Arthur // when he’d sit to drink and snack // by the house that’s white and black
12. At impressive fashion shows // where they wore their fancy clothes // often Beth would step on Rose // Sophie would step on Hannah // when she’d madly try to pose // in distinct designer clothes
13. At a park where children played // if they found a place with shade // often Ruth would rest with Jade // Maddie would rest with Lauren // when she felt unwell and made // silly comments while they stayed
14. At the local market square // where they held the annual fair // often Faith would meet with Claire // Florence would meet with Courtney // when she’d shop for clothes to wear // at the clubs around the square
15. On their long and tiring trip // on the massive sailing ship // often Tom would elbow Pip // Simon would elbow Danny // when he’d try to run and skip // on the massive sailing ship
16. By the school’s enormous gate // where the late arrivals wait // often Gwen would glance at Kate // Meghan would glance at Carmen // when she’d leave to go and skate // near the school’s enormous gate

Appendix B. Interpretative Preferences by Condition, Group and Switch

Table B1. Whole-group mean percentage of SUB and OBJ responses by Condition, Switch and group (L1 and L2 group)

Note: SDs are computed around the pooled mean (i.e., the grand mean of individual participants’ mean preferences).
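The pooled mean described in this note can be illustrated with a short sketch. It assumes (hypothetically) that each participant’s responses in one condition are coded 1 for an OBJ and 0 for a SUB interpretation, and that the reported SD is the sample SD of the individual percentages around the grand mean; the function name and data layout are illustrative, not taken from the authors’ analysis scripts.

```python
from statistics import mean, stdev


def pooled_preference(responses_by_participant):
    """Grand mean and SD of participants' mean OBJ-preference percentages.

    responses_by_participant maps a participant ID to that participant's
    binary responses in one condition (1 = OBJ, 0 = SUB; hypothetical coding).
    """
    # 1. Each participant's own preference, as a percentage of trials.
    individual = [100 * mean(r) for r in responses_by_participant.values()]
    # 2. Pooled mean = mean of the individual means, so each participant
    #    carries equal weight regardless of how many trials they contributed.
    grand_mean = mean(individual)
    # 3. Sample SD (n - 1) of the individual means around the pooled mean.
    sd = stdev(individual)
    return grand_mean, sd
```

Averaging individual means first, rather than pooling all trials, keeps a participant with more valid trials from dominating the group estimate.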

Footnotes

1 The emergence of implicit causality effects depends on the conjunction used to link the clauses (Ehrlich, 1980). Effects are most reliably observed with causal conjunctions (e.g., because), although temporal conjunctions (e.g., when) can also imply a causal relation.

2 https://www.cambridgecognition.com/cantab/cognitive-tests/memory/digit-span-dgs/

3 https://learnenglish.britishcouncil.org/online-english-level-test

4 https://prolific.co/

5 Regarding lexical choices, the proper names used were checked against the 2019 baby-name registration data for England and Wales published by the Office for National Statistics (2020). Only names listed therein and their common diminutive forms were considered for the stimuli. Where possible, the two names in each item were matched for character length and popularity rank (i.e., rank by the number of registered babies given that name); for example, Beth (short for Elizabeth) and Rose were paired as SUB and OBJ in one of the critical items (item 12 in Appendix A), as they ranked 55th and 56th, respectively. As for main-clause verb semantics, care was taken to avoid verbs involving implicit causality: the lists provided by Hartshorne and Snedeker (2013) were consulted to exclude SUB-biasing (e.g., frighten) and OBJ-biasing (e.g., dislike) verbs, and neutral verbs such as “rest with” and “encounter” were selected instead where possible. For similar reasons, subordinating conjunctions such as “because” were avoided; instead, “when” was used in all items to link the main clause on the 3rd verse with the dependent clause on the 4th.
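The name-matching criteria in this footnote (character length and popularity rank) can be expressed as a small selection routine. This is a hypothetical sketch, not the authors’ procedure: the function name, the requirement of an exact length match, and the tie-breaking rule are our own assumptions, and in the usage example only the ranks of Rose (56th) and Beth/Elizabeth (55th) come from the footnote; the other names and ranks are placeholders.

```python
def best_partner(name, rank, candidates):
    """Pick a partner name of equal length with the closest popularity rank.

    candidates maps each candidate name to its popularity rank (1 = most
    registrations). A diminutive (e.g. Beth) is assumed to take the rank of
    its full form (Elizabeth). Returns None if no candidate matches in length.
    """
    # Keep only candidates with exactly the same number of characters.
    same_length = {c: r for c, r in candidates.items()
                   if c != name and len(c) == len(name)}
    if not same_length:
        return None  # length matching not possible for this name
    # Smallest rank gap wins; alphabetical order breaks ties deterministically.
    return min(same_length, key=lambda c: (abs(same_length[c] - rank), c))
```

For example, `best_partner("Rose", 56, {"Beth": 55, "Ruth": 90, "Grace": 57})` returns `"Beth"`: Grace is excluded on length, and Beth’s rank gap (1) is smaller than Ruth’s placeholder gap (34).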

6 Tukey’s rule was applied to identify IQR-based lower and upper bounds for each participant and condition; values below the lower bound or above the upper bound were replaced by the respective cutoff value.
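Tukey’s rule as described here amounts to clamping (winsorizing) values at the IQR-based fences rather than discarding them. The sketch below assumes Tukey’s conventional multiplier of 1.5 and the “inclusive” quartile method (equivalent to R’s default quantile type 7); neither detail is stated in the footnote, and the function name is hypothetical. It would be applied separately to each participant-by-condition subset of the data.

```python
from statistics import quantiles


def tukey_clamp(values, k=1.5):
    """Replace values outside Tukey's fences with the fence values.

    k = 1.5 is Tukey's conventional multiplier (an assumption here).
    The output preserves the input order.
    """
    # Quartiles via the 'inclusive' method (same as R's default, type 7).
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    # Values below the lower fence become the lower fence, and vice versa.
    return [min(max(v, lower), upper) for v in values]
```

Clamping keeps the number of observations per cell constant, which is one reason to prefer it over trimming in designs with few trials per condition.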

7 In some cases, Biased participants showed an overall preference for either the SUB or the OBJ in only one condition and were balanced in their preferences (50%–50%) in the other.

8 Where random slopes lowered AIC but the model converged with warnings (e.g., a singular fit) or did not converge at all, only random intercepts were specified.
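The decision rule in this footnote can be written as a small function. The sketch assumes that the candidate models have already been fitted elsewhere (e.g., with lme4; Bates et al., 2015) and summarized by their AIC and convergence status; the `Fit` record and the function name are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Fit:
    """Hypothetical summary of one fitted mixed-effects model."""
    label: str
    aic: float
    converged: bool
    warnings: tuple = ()  # e.g. ("singular fit",)


def choose_random_structure(intercepts_only: Fit, with_slopes: Fit) -> Fit:
    """Keep random slopes only if they lower AIC and the fit is clean."""
    slopes_clean = with_slopes.converged and not with_slopes.warnings
    if with_slopes.aic < intercepts_only.aic and slopes_clean:
        return with_slopes
    # Slopes did not improve AIC, converged with warnings, or failed:
    # fall back to random intercepts only.
    return intercepts_only
```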

9 Log-transformed data were used in the analyses because the untransformed data were right-skewed; this was also the case in Study 2. The mean reading times and RTs reported in Table 3 and Table 7 are geometric means, that is, the antilog of the mean of the log-transformed data used in the models (Crawley, 2011), together with their SEs (Norris, 1940).
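The two quantities named in this footnote can be computed as follows. The geometric mean is exactly the antilog of the mean log value; for its SE, the sketch uses the first-order approximation SE(G) ≈ G * s / sqrt(n), where s is the sample SD of the natural-log values, which we take to correspond to the large-sample result in Norris (1940). The function name and that correspondence are assumptions.

```python
import math
from statistics import fmean, stdev


def geometric_mean_se(rts):
    """Geometric mean of positive reading times and its approximate SE."""
    logs = [math.log(x) for x in rts]
    # Antilog of the mean of the logs = geometric mean.
    g = math.exp(fmean(logs))
    # Delta-method SE: scale the SE of the mean log by G itself.
    se = g * stdev(logs) / math.sqrt(len(logs))
    return g, se
```

Because the SE is derived on the log scale, a symmetric interval on the log scale maps to an asymmetric one in milliseconds; G ± SE is only a convenient summary.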

10 Although we tried to match the L1 and L2 groups for age (L1 median age = 21; L2 median age = 22), an unpaired Wilcoxon rank-sum test revealed that the two groups differed significantly (p < 0.001).
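The sketch below implements a two-sided Wilcoxon rank-sum (Mann-Whitney) test with average ranks for ties, a tie-corrected variance, and a continuity correction, using only Python’s standard library. This is the large-sample normal approximation, which R’s wilcox.test also uses when ties are present (as is likely with ages recorded in whole years); the footnote does not say which implementation the authors used, and the function name is hypothetical.

```python
import math
from statistics import NormalDist


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum test (normal approximation).

    Returns (W, p), where W is the Mann-Whitney U statistic for sample x.
    Assumes the pooled values are not all identical (otherwise var == 0).
    """
    values = list(x) + list(y)
    n1, n2 = len(x), len(y)
    n = n1 + n2
    order = sorted(range(n), key=lambda i: values[i])

    # Assign average ranks to tied values and accumulate the tie term.
    ranks = [0.0] * n
    tie_term = 0
    i = 0
    while i < n:
        j = i
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tie run
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1

    # U for x, its null mean, and the tie-corrected null variance.
    w = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mu = n1 * n2 / 2
    var = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))

    # Continuity correction: shift 0.5 toward the null mean.
    diff = w - mu
    correction = math.copysign(0.5, diff) if diff != 0 else 0.0
    z = (diff - correction) / math.sqrt(var)
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return w, p
```

Applied to the two groups’ age lists, the function returns W and a two-sided p-value; the test is unpaired, so the two lists may differ in length.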

References

Acheson, D. J., Wells, J. B., & MacDonald, M. C. (2008). New and updated tests of print exposure and reading abilities in college students. Behavior Research Methods, 40(1), 278–289. https://doi.org/10.3758/brm.40.1.278

Arnold, J. E. (1998). Reference form and discourse patterns. Doctoral dissertation, Stanford University. https://www.proquest.com/openview/43afc359f194e1158c509e6c0cc2f7b8/1?pq-origsite=gscholar&cbl=18750&diss=y

Arnold, J. E. (2010). How speakers refer: The role of accessibility. Language and Linguistics Compass, 4(4), 187–203. https://doi.org/10.1111/j.1749-818X.2010.00193.x

Arnold, J. E., Strangmann, I. M., Hwang, H., Zerkle, S., & Nappa, R. (2018). Linguistic experience affects pronoun interpretation. Journal of Memory and Language, 102, 41–54. https://doi.org/10.1016/j.jml.2018.05.002

Arslan, S., Palasis, K., & Meunier, F. (2020). Electrophysiological differences in older and younger adults’ anaphoric but not cataphoric pronoun processing in the absence of age-related behavioural slowdown. Scientific Reports, 10(1), 19234. https://doi.org/10.1038/s41598-020-75550-3

Au, T. K. (1986). A verb is worth a thousand words: The causes and consequences of interpersonal events implicit in language. Journal of Memory and Language, 25(1), 104–122. https://doi.org/10.1016/0749-596X(86)90024-0

Bassetti, B., Mairano, P., Masterson, J., & Cerni, T. (2020). Effects of orthographic forms on second language speech production and phonological awareness, with consideration of speaker-level predictors. Language Learning, 70(4), 1218–1256. https://doi.org/10.1111/lang.12423

Bates, D., Mächler, M., Bolker, B., & Walker, S. (2015). Fitting linear mixed-effects models using lme4. Journal of Statistical Software, 67(1), 1–48. https://doi.org/10.18637/jss.v067.i01

Bel, A., Sagarra, N., Comínguez, J. P., & García-Alcaraz, E. (2016). Transfer and proficiency effects in L2 processing of subject anaphora. Lingua, 184, 134–159. https://doi.org/10.1016/j.lingua.2016.07.001

Blohm, S., Wagner, V., Schlesewsky, M., & Menninghaus, W. (2018). Sentence judgments and the grammar of poetry: Linking linguistic structure and poetic effect. Poetics, 69, 41–56. https://doi.org/10.1016/j.poetic.2018.04.005

Boucher, V. J. (2006). On the function of stress rhythms in speech: Evidence of a link with grouping effects on serial memory. Language and Speech, 49(4), 495–519. https://doi.org/10.1177/00238309060490040301

Breen, M. (2014). Empirical investigations of the role of implicit prosody in sentence processing. Language and Linguistics Compass, 8(2), 37–50. https://doi.org/10.1111/lnc3.12061

Breen, M., & Clifton, C., Jr. (2011). Stress matters: Effects of anticipated lexical stress on silent reading. Journal of Memory and Language, 64(2), 153–170. https://doi.org/10.1016/j.jml.2010.11.001

Breen, M., & Clifton, C., Jr. (2013). Stress matters revisited: A boundary change experiment. Quarterly Journal of Experimental Psychology, 66(10), 1896–1909. https://doi.org/10.1080/17470218.2013.766899

Brower, C. (1993). Memory and the perception of rhythm. Music Theory Spectrum, 15(1), 19–35. https://doi.org/10.2307/745907

Carminati, M. N., Stabler, J., Roberts, A. M., & Fischer, M. H. (2006). Readers’ responses to sub-genre and rhyme scheme in poetry. Poetics, 34(3), 204–218. https://doi.org/10.1016/j.poetic.2006.05.001

Chambers, C. G., & Smyth, R. (1998). Structural parallelism and discourse coherence: A test of centering theory. Journal of Memory and Language, 39(4), 593–608. https://doi.org/10.1006/jmla.1998.2575

Clahsen, H., & Felser, C. (2006). Continuity and shallow structures in language processing. Applied Psycholinguistics, 27(1), 107–126. https://doi.org/10.1017/S0142716406060206

Clahsen, H., & Felser, C. (2018). Some notes on the shallow structure hypothesis. Studies in Second Language Acquisition, 40(3), 693–706. https://doi.org/10.1017/S0272263117000250

Classon, E., Rudner, M., & Rönnberg, J. (2013). Working memory compensates for hearing related phonological processing deficit. Journal of Communication Disorders, 46(1), 17–29. https://doi.org/10.1016/j.jcomdis.2012.10.001

Clifton, C., Jr. (2015). The roles of phonology in silent reading: A selective review. In Frazier, L. & Gibson, E. (Eds.), Explicit and Implicit Prosody in Sentence Processing: Studies in Honor of Janet Dean Fodor (pp. 161–176). Cham: Springer. https://doi.org/10.1007/978-3-319-12961-7_9

Contemori, C., Asiri, O., & Irigoyen, E. D. P. (2019). Anaphora resolution in L2 English: An analysis of discourse complexity and cross-linguistic interference. Studies in Second Language Acquisition, 41(5), 971–998. https://doi.org/10.1017/S0272263119000111

Contemori, C., & Dussias, P. E. (2020). The processing of subject pronouns in highly proficient L2 speakers of English. Glossa: A Journal of General Linguistics, 5(1), 1–19. https://doi.org/10.5334/gjgl.972

Cowles, H. W., Walenski, M., & Kluender, R. (2007). Linguistic and cognitive prominence in anaphor resolution: Topic, contrastive focus and pronouns. Topoi, 26(1), 3–18. https://doi.org/10.1007/s11245-006-9004-6

Crawley, M. J. (2011). Central tendency. In Statistics: An Introduction using R (pp. 23–31). Wiley Online Library. https://doi.org/10.1002/9781119941750.ch3

Cunnings, I. (2017). Parsing and working memory in bilingual sentence processing. Bilingualism: Language and Cognition, 20(4), 659–678. https://doi.org/10.1017/S1366728916000675

Cunnings, I., & Felser, C. (2013). The role of working memory in the processing of reflexives. Language and Cognitive Processes, 28(1–2), 188–219. https://doi.org/10.1080/01690965.2010.548391

Cunnings, I., Fotiadou, G., & Tsimpli, I. (2017). Anaphora resolution and reanalysis during L2 sentence processing: Evidence from the visual world paradigm. Studies in Second Language Acquisition, 39(4), 621–652. https://doi.org/10.1017/S0272263116000292

de Hoop, H. (2013). Incremental optimization of pronoun interpretation. Theoretical Linguistics, 39(1–2), 87–93. https://doi.org/10.1515/tl-2013-0005

de la Fuente, I., Hemforth, B., Colonna, S., & Schimke, S. (2016). The role of syntax, semantics, and pragmatics in pronoun resolution: A cross-linguistic overview. In Holler, A. & Suckow, K. (Eds.), Empirical Perspectives on Anaphora Resolution (pp. 11–31). Berlin, Boston: De Gruyter. https://doi.org/10.1515/9783110464108-003

de Leeuw, J. R. (2015). jsPsych: A JavaScript library for creating behavioral experiments in a web browser. Behavior Research Methods, 47(1), 1–12. https://doi.org/10.3758/s13428-014-0458-y

Ehrlich, K. (1980). Comprehension of pronouns. The Quarterly Journal of Experimental Psychology, 32(2), 247–255. https://doi.org/10.1080/14640748008401161

Fabb, N. (1999). Verse constituency and the locality of alliteration. Lingua, 108(4), 223–245. https://doi.org/10.1016/S0024-3841(98)00054-0

Fabb, N. (2009). Symmetric and asymmetric relations, and the aesthetics of form in poetic language. European English Messenger, 18(1), 50–59. https://strathprints.strath.ac.uk/id/eprint/16673

Fabb, N. (2015). What is Poetry?: Language and Memory in the Poems of the World. Cambridge: Cambridge University Press. https://doi.org/10.1017/CBO9780511736575

Fabb, N. (2022). Rhyme and alliteration are significantly different as types of sound patterning. In Rhyme and Rhyming in Verbal Art, Language, and Song (Vol. 14, pp. 155–171). Finnish Literature Society. http://www.jstor.org/stable/j.ctv371cp40.11

Fabb, N., & Halle, M. (2008). Meter in Poetry: A New Theory. Cambridge: Cambridge University Press. https://doi.org/10.1017/CBO9780511755040

Farmer, T. A., Fine, A. B., Misyak, J. B., & Christiansen, M. H. (2017). Reading span task performance, linguistic experience, and the processing of unexpected syntactic events. Quarterly Journal of Experimental Psychology, 70(3), 413–433. https://doi.org/10.1080/17470218.2015.1131310

Fernández, E. M. (2002). Relative clause attachment in bilinguals and monolinguals. In Heredia, R. R. & Altarriba, J. (Eds.), Bilingual Sentence Processing (pp. 187–215). North-Holland. https://doi.org/10.1016/S0166-4115(02)80011-5

Foltz, A. (2021). Using prosody to predict upcoming referents in the L1 and the L2: The role of recent exposure. Studies in Second Language Acquisition, 43(4), 753–780. https://doi.org/10.1017/S0272263120000509

Fotiadou, G., Muñoz, A. I. P., & Tsimpli, I. (2020). Anaphora resolution and word-order across adulthood: Ageing effects on online listening comprehension. Glossa: A Journal of General Linguistics, 5(1), 1–29. https://doi.org/10.5334/gjgl.997

Frisson, S., Koole, H., Hughes, L., Olson, A., & Wheeldon, L. (2014). Competition between orthographically and phonologically similar words during sentence reading: Evidence from eye movements. Journal of Memory and Language, 73, 148–173. https://doi.org/10.1016/j.jml.2014.03.004

Garnham, A., Traxler, M., Oakhill, J., & Gernsbacher, M. A. (1996). The locus of implicit causality effects in comprehension. Journal of Memory and Language, 35(4), 517–543. https://doi.org/10.1006/jmla.1996.0028

Goldman, S. R., Meyerson, P. M., & Coté, N. (2006). Poetry as a mnemonic prompt in children’s stories. Reading Psychology, 27(4), 345–376. https://doi.org/10.1080/02702710600846894

Hartshorne, J. K. (2014). What is implicit causality? Language, Cognition and Neuroscience, 29(7), 804–824. https://doi.org/10.1080/01690965.2013.796396

Hartshorne, J. K., & Snedeker, J. (2013). Verb argument structure predicts implicit causality: The advantages of finer-grained semantics. Language and Cognitive Processes, 28(10), 1474–1508. https://doi.org/10.1080/01690965.2012.689305

Hoorn, J. (1996). Psychophysiology and literary processing: ERPs to semantic and phonological deviations in reading small verses. In Kreuz, R. J. & MacNealy, M. S. (Eds.), Empirical Approaches to Literature and Aesthetics (pp. 339–358). New Jersey: Ablex Publishing. https://psycnet.apa.org/record/1996-97973-019

Hopp, H. (2006). Syntactic features and reanalysis in near-native processing. Second Language Research, 22(3), 369–397. https://doi.org/10.1191/0267658306sr272oa

Hopp, H. (2022). Second language sentence processing. Annual Review of Linguistics, 8, 235–256. https://doi.org/10.1146/annurev-linguistics-030821-054113

Jacob, G., & Felser, C. (2016). Reanalysis and semantic persistence in native and non-native garden-path recovery. Quarterly Journal of Experimental Psychology, 69(5), 907–925. https://doi.org/10.1080/17470218.2014.984231

Johnson, J. L., & Hayes, D. S. (1987). Preschool children’s retention of rhyming and nonrhyming text: Paraphrase and rote recitation measures. Journal of Applied Developmental Psychology, 8(3), 317–327. https://doi.org/10.1016/0193-3973(87)90007-4

Johnston, R. S., & McDermott, E. A. (1986). Suppression effects in rhyme judgement tasks. The Quarterly Journal of Experimental Psychology Section A, 38(1), 111–124. https://doi.org/10.1080/14640748608401587

Juffs, A., & Harrington, M. (2011). Aspects of working memory in L2 learning. Language Teaching, 44(2), 137–166. https://doi.org/10.1017/S0261444810000509

Just, M. A., & Carpenter, P. A. (1992). A capacity theory of comprehension: Individual differences in working memory. Psychological Review, 99(1), 122–149. https://doi.org/10.1037/0033-295X.99.1.122

Just, M. A., Carpenter, P. A., & Woolley, J. D. (1982). Paradigms and processes in reading comprehension. Journal of Experimental Psychology: General, 111(2), 228–238. https://doi.org/10.1037/0096-3445.111.2.228

Kaan, E., & Grüter, T. (2021). Prediction in second language processing and learning: Advances and directions. In Kaan, E. & Grüter, T. (Eds.), Prediction in Second Language Processing and Learning (pp. 1–24). Amsterdam: John Benjamins. https://doi.org/10.1075/bpa.12.01kaa

Kaiser, E. (2011). Focusing on pronouns: Consequences of subjecthood, pronominalisation, and contrastive focus. Language and Cognitive Processes, 26(10), 1625–1666. https://doi.org/10.1080/01690965.2010.523082

Kaiser, E., & Trueswell, J. C. (2008). Interpreting pronouns and demonstratives in Finnish: Evidence for a form-specific approach to reference resolution. Language and Cognitive Processes, 23(5), 709–748. https://doi.org/10.1080/01690960701771220

Keating, G. D., VanPatten, B., & Jegerski, J. (2011). Who was walking on the beach?: Anaphora resolution in Spanish heritage speakers and adult second language learners. Studies in Second Language Acquisition, 33(2), 193–221. https://doi.org/10.1017/S0272263110000732

Király, I., Takacs, S., Kaldy, Z., & Blaser, E. (2016). Preschoolers have better long-term memory for rhyming text than adults. Developmental Science, 20(3), e12398. https://doi.org/10.1111/desc.12398

Koornneef, A. W., & Van Berkum, J. J. (2006). On the use of verb-based implicit causality in sentence comprehension: Evidence from self-paced reading and eye tracking. Journal of Memory and Language, 54(4), 445–465. https://doi.org/10.1016/j.jml.2005.12.003

Kotz, S. A., & Schmidt-Kassow, M. (2015). Basal ganglia contribution to rule expectancy and temporal predictability in speech. Cortex, 68, 48–60. https://doi.org/10.1016/j.cortex.2015.02.021

Langlois, V. J., & Arnold, J. E. (2020). Print exposure explains individual differences in using syntactic but not semantic cues for pronoun comprehension. Cognition, 197, 104155. https://doi.org/10.1016/j.cognition.2019.104155

Leong, V., Raheel, K., Sim, J. Y., Kacker, K., Karlaftis, V. M., Vassiliu, C., …, & Kourtzi, Z. (2022). A new remote guided method for supervised web-based cognitive testing to ensure high-quality data: Development and usability study. Journal of Medical Internet Research, 24(1), e28368. https://doi.org/10.2196/28368

Liu, B., Jin, Z., Wang, Z., & Xin, S. (2011). An ERP study on whether the P600 can reflect the presence of unexpected phonology. Experimental Brain Research, 212, 399–408. https://doi.org/10.1007/s00221-011-2739-3

Martin, C. D., Thierry, G., Kuipers, J., Boutonnet, B., Foucart, A., & Costa, A. (2013). Bilinguals reading in their second language do not predict upcoming words as native readers do. Journal of Memory and Language, 69(4), 574–588. https://doi.org/10.1016/j.jml.2013.08.001

Martin-Chang, S. L., & Gould, O. N. (2008). Revisiting print exposure: Exploring differential links to vocabulary, comprehension and reading rate. Journal of Research in Reading, 31(3), 273–284. https://doi.org/10.1111/j.1467-9817.2008.00371.x

McCarron, S. P., & Kuperman, V. (2021). Is the author recognition test a useful metric for native and non-native English speakers? An item response theory analysis. Behavior Research Methods, 53, 2226–2237. https://doi.org/10.3758/s13428-014-0534-3

McDonald, J. L. (2006). Beyond the critical period: Processing-based explanations for poor grammaticality judgment performance by late second language learners. Journal of Memory and Language, 55(3), 381–401. https://doi.org/10.1016/j.jml.2006.06.006

Mennen, I., & De Leeuw, E. (2014). Beyond segments: Prosody in SLA. Studies in Second Language Acquisition, 36(2), 183–194. https://doi.org/10.1017/S0272263114000138

Menninghaus, W., Bohrn, I. C., Knoop, C. A., Kotz, S. A., Schlotz, W., & Jacobs, A. M. (2015). Rhetorical features facilitate prosodic processing while handicapping ease of semantic comprehension. Cognition, 143, 48–60. https://doi.org/10.1016/j.cognition.2015.05.026

Moore, M., & Gordon, P. C. (2015). Reading ability and print exposure: Item response theory analysis of the author recognition test. Behavior Research Methods, 47(4), 1095–1109. https://doi.org/10.3758/s13428-014-0534-3

Nakamura, C., Arai, M., Hirose, Y., & Flynn, S. (2020). An extra cue is beneficial for native speakers but can be disruptive for second language learners: Integration of prosody and visual context in syntactic ambiguity resolution. Frontiers in Psychology, 10, 2835. https://doi.org/10.3389/fpsyg.2019.02835

Nicklin, C., & Plonsky, L. (2020). Outliers in L2 research in applied linguistics: A synthesis and data re-analysis. Annual Review of Applied Linguistics, 40, 26–55. https://doi.org/10.1017/S0267190520000057

Nieuwland, M. S., & Van Berkum, J. J. A. (2006). Individual differences and contextual bias in pronoun resolution: Evidence from ERPs. Brain Research, 1118(1), 155–167. https://doi.org/10.1016/j.brainres.2006.08.022

Norris, N. (1940). The standard errors of the geometric and harmonic means and their application to index numbers. The Annals of Mathematical Statistics, 11(4), 445–448. https://www.jstor.org/stable/2235723

Nowbakht, M. (2019). The role of working memory, language proficiency, and learners’ age in second language English learners’ processing and comprehension of anaphoric sentences. Journal of Psycholinguistic Research, 48, 353–370. https://doi.org/10.1007/s10936-018-9607-2

Obermeier, C., Menninghaus, W., von Koppenfels, M., Raettig, T., Schmidt-Kassow, M., Otterbein, S., & Kotz, S. A. (2013). Aesthetic and emotional effects of meter and rhyme in poetry. Frontiers in Psychology, 4, 10. https://doi.org/10.3389/fpsyg.2013.00010

Office for National Statistics. (2020). Baby names in England and Wales: 2019. https://www.ons.gov.uk/peoplepopulationandcommunity/birthsdeathsandmarriages/livebirths/bulletins/babynamesenglandandwales/2019

Payne, B. R., Grison, S., Gao, X., Christianson, K., Morrow, D. G., & Stine-Morrow, E. A. L. (2014). Aging and individual differences in binding during sentence understanding: Evidence from temporary and global syntactic attachment ambiguities. Cognition, 130(2), 157–173. https://doi.org/10.1016/j.cognition.2013.10.005

Perdomo, M., & Kaan, E. (2021). Prosodic cues in second-language speech processing: A visual world eye-tracking study. Second Language Research, 37(2), 349–375. https://doi.org/10.1177/0267658319879196

Pitt, M. A., & Samuel, A. G. (1990). The use of rhythm in attending to speech. Journal of Experimental Psychology: Human Perception and Performance, 16(3), 564–573. https://doi.org/10.1037/0096-1523.16.3.564

Pozzan, L., & Trueswell, J. C. (2016). Second language processing and revision of garden-path sentences: A visual word study. Bilingualism: Language and Cognition, 19(3), 636–643. https://doi.org/10.1017/S1366728915000838

Pratt, E. (2017). Prosody in sentence processing. In Fernández, E. M. & Smith Cairns, H. (Eds.), The Handbook of Psycholinguistics (pp. 365–391). Hoboken, NJ: John Wiley & Sons Inc. https://doi.org/10.1002/9781118829516.ch16

Read, K., Macauley, M., & Furay, E. (2014). The Seuss boost: Rhyme helps children retain words from shared storybook reading. First Language, 34(4), 354–371. https://doi.org/10.1177/0142723714544410

Read, K., & Regan, M. (2018). The cat has a…: Children’s use of rhyme to guide sentence completion. Cognitive Development, 47, 97–106. https://doi.org/10.1016/j.cogdev.2018.04.004

Rickert, W. E. (1984). Semantic consequences of rhyme. Text and Performance Quarterly, 4(2), 1–9. https://doi.org/10.1080/10462938409391552

Roberts, L., Gullberg, M., & Indefrey, P. (2008). Online pronoun resolution in L2 discourse: L1 influence and general learner effects. Studies in Second Language Acquisition, 30(3), 333–357. https://doi.org/10.1017/S0272263108080480

Roncaglia-Denissen, M. P., Schmidt-Kassow, M., Heine, A., & Kotz, S. A. (2015). On the impact of L2 speech rhythm on syntactic ambiguity resolution. Second Language Research, 31(2), 157–178. https://doi.org/10.1177/0267658314554497

Roncaglia-Denissen, M. P., Schmidt-Kassow, M., & Kotz, S. A. (2013). Speech rhythm facilitates syntactic ambiguity resolution: ERP evidence. PLoS ONE, 8(2), e56000. https://doi.org/10.1371/journal.pone.0056000

Rothermich, K., & Kotz, S. A. (2013). Predictions in speech comprehension: fMRI evidence on the meter-semantic interface. NeuroImage, 70, 89–100. https://doi.org/10.1016/j.neuroimage.2012.12.013

Rothermich, K., Schmidt-Kassow, M., & Kotz, S. A. (2012). Rhythm’s gonna get you: Regular meter facilitates semantic sentence processing. Neuropsychologia, 50(2), 232–244. https://doi.org/10.1016/j.neuropsychologia.2011.10.025

Schafer, A. J., Camp, A., Rohde, H., & Grüter, T. (2019). Contrastive prosody and the subsequent mention of alternatives during discourse processing. In Carlson, K., Clifton, C., Jr., & Fodor, J. (Eds.), Grammatical Approaches to Language Processing: Essays in Honor of Lyn Frazier (pp. 29–44). Cham: Springer. https://doi.org/10.1007/978-3-030-01563-3_3

Schafer, A. J., Takeda, A., Rohde, H., & Grüter, T. (2015). Mapping prosody to reference in L2. Poster presented at the 40th Boston University Conference on Language Development, Boston, MA. https://www.research.ed.ac.uk/en/publications/mapping-prosody-to-reference-in-l2

Schimke, S., & Colonna, S. (2016). Native and nonnative interpretation of pronominal forms: Evidence from French and Turkish. Studies in Second Language Acquisition, 38(1), 131–162. https://doi.org/10.1017/S0272263115000303

Schmidt, E., Pérez, A., Cilibrasi, L., & Tsimpli, I. (2020). Prosody facilitates memory recall in L1 but not in L2 in highly proficient listeners. Studies in Second Language Acquisition, 42(1), 223–238. https://doi.org/10.1017/S0272263119000433

Schmidt-Kassow, M., Rothermich, K., Schwartze, M., & Kotz, S. A. (2011). Did you get the beat? Late proficient French-German learners extract strong–weak patterns in tonal but not in linguistic sequences. NeuroImage, 54(1), 568–576. https://doi.org/10.1016/j.neuroimage.2010.07.062

Smyth, R. (1994). Grammatical determinants of ambiguous pronoun resolution. Journal of Psycholinguistic Research, 23(3), 197–229. https://doi.org/10.1007/BF02139085

Sorace, A. (2011). Pinning down the concept of “interface” in bilingualism. Linguistic Approaches to Bilingualism, 1(1), 1–33. https://doi.org/10.1075/lab.1.1.01sor

Sorace, A., & Filiaci, F. (2006). Anaphora resolution in near-native speakers of Italian. Second Language Research, 22(3), 339–368. https://www.jstor.org/stable/43103710

Stewart, A. J., Pickering, M. J., & Sanford, A. J. (2000). The time course of the influence of implicit causality information: Focusing versus integration accounts. Journal of Memory and Language, 42(3), 423–443. https://doi.org/10.1006/jmla.1999.2691

Swets, B., Desmet, T., Hambrick, D. Z., & Ferreira, F. (2007). The role of working memory in syntactic ambiguity resolution: A psychometric approach. Journal of Experimental Psychology: General, 136(1), 64–81. https://doi.org/10.1037/0096-3445.136.1.64

Valiouli, M. (1987). Parallel function, non-parallel function, implicit causality. Selected Papers on Theoretical and Applied Linguistics, 1, 58–69. https://doi.org/10.26262/istal.v1i0.7214

Van Rij, J., Van Rijn, H., & Hendriks, P. (2013). How WM load influences linguistic processing in adults: A computational model of pronoun interpretation in discourse. Topics in Cognitive Science, 5(3), 564–580. https://doi.org/10.1111/tops.12029

Vogelzang, M., Guasti, M. T., van Rijn, H., & Hendriks, P. (2021). How children process reduced forms: A computational cognitive modeling approach to pronoun processing in discourse. Cognitive Science, 45(4), e12951. https://doi.org/10.1111/cogs.12951

Wallot, S., & Menninghaus, W. (2018). Ambiguity effects of rhyme and meter. Journal of Experimental Psychology: Learning, Memory, and Cognition, 44(12), 1947–1954. https://doi.org/10.1037/xlm0000557

Zecker, S. G., Tanenhaus, M. K., Alderman, L., & Siqueland, L. (1986). Lateralization of lexical codes in auditory word recognition. Brain and Language, 29(2), 372–389. https://doi.org/10.1016/0093-934X(86)90055-6

Table 1. Example of critical item

Table 2. Whole-group mean percentage of SUB and OBJ responses by condition (L1 group)

Table 3. Mean reading times and RTs to the interpretation question by condition in ms (L1 group)

Table 4. Mean scores on additional tasks (L1 group)

Figure 1. Difference in reading span task scores between L1 switchers and biased participants.

Table 5. Level and use of English in L2 participants (Study 2)

Table 6. Whole-group mean percentage of SUB and OBJ responses by condition (L2 group)

Table 7. Mean reading times and RTs to the interpretation question by condition in ms (L2 group)

Table 8. Mean scores on additional tasks (L2 group)

Figure 2. L1 & L2 percentage of OBJ preference by condition (CI error bars based on the pooled mean).

Figure 3. Mean reading times by group and condition (SE error bars).

Figure 4. Mean reading times for Line 3 (top) and Line 4 (middle) and RTs to the Interpretation Question (bottom) by Group, Condition and Switch (SE error bars).

Table B1. Whole-group mean percentage of SUB and OBJ responses by Condition, Switch and group (L1 and L2 group)

