Introduction
There has been much discussion in the second language acquisition literature on why processing in the native language (L1) differs from processing in a second language (L2). One prominent hypothesis is that constructions that are similar in the L1 and L2 are processed similarly, resulting in faster or more accurate processing, also known as (positive) L1 transfer. However, if the constructions differ, L2 learners have to employ different or less commonly used processing strategies, leading to slower or inaccurate processing (see reviews by Clahsen & Felser, 2006; Frenck-Mestre, 2002; seen in, e.g., wh-fronting, Juffs, 2005). Another hypothesis is that L2 learners may experience difficulties at the interface between linguistic domains, where the integration of information is required (Sorace, 2011; Sorace & Filiaci, 2006; White, 2011). This view suggests that L2 learners are able to easily acquire simple syntactic structures, but they may find it more difficult to combine information from multiple constructs to adjust their interpretation of a sentence as it unfolds (Sorace, 2011), as reported for the processing of contrastive focus (Lee et al., 2020), wh-fronting (Marinis et al., 2005), and anaphora resolution (Sorace & Filiaci, 2006). This may be due to less (detailed) knowledge being automatically available or simply to fewer cognitive resources being available for integration in the L2 (for a review, see Sorace, 2011). Although both of these hypotheses assume that L1 and L2 processing differ, this does not automatically mean that L2 learners cannot achieve L1-like proficiency.
For example, highly proficient Dutch learners of English use different processing strategies than native speakers of English (Akker & Cutler, 2003; Chen & Lai, 2012; Ge et al., 2021a), but they can be as quick and accurate as native speakers as measured by reaction time at the end of a target sentence (Ge et al., 2021a). This suggests that, instead of a ‘wrong’ approach, L2 listeners may simply use a different online processing strategy than L1 listeners (Bley-Vroman, 1983; Purdy, 2001). Taken together, L2 constructs that also occur in the L1 are processed with an L1-like strategy, but simpler ones that do not exist in the L1 can be acquired and processed with strategies similar to the ones employed by native speakers of the L2. If L2 constructs are complex and do not occur in the L1, or they require the integration of domains, L2 listeners tend to employ a strategy that differs from native speakers of either language, that is, an interlanguage processing strategy.
But what processing strategy do highly proficient L2 listeners employ when they are confronted by a challenging interface between individual constructs that occur in both their L1 and L2? The present study examined the processing strategies used during the integration of linguistic domains in the L2 in the face of complexity and L1-L2 similarities of the individual constructs, through the lens of prosodic processing in sentences with the focus particle only. We examined advanced listeners whose L1 and L2 are typologically highly similar in the domains involved in the processing of sentences with only, i.e., prosody and semantics, but differ in the interface between those domains. Briefly, Dutch and English are similar in their use of prosody in focus marking (Ladd, 1980; Gussenhoven, 1983; Trommelen & Zonneveld, 1999) and the use of focus particles, such as only (Bouma et al., 2007; Foolen et al., 2009). However, Dutch and English differ when it comes to the interface between only and the placement of accentuation, a difference caused by an unrelated syntactic difference concerning the positioning of the finite verb in a sentence (Bouma et al., 2007; Foolen et al., 2009; Ge et al., 2021b). Moreover, Dutch high school graduates often achieve advanced proficiency due to their English education starting from as early as the age of four years, with exposure sometimes starting earlier via (social) media, for example, undubbed English-language television shows (Michel et al., 2021).
At the age of 16 to 18, Dutch high school graduates have achieved English A2-B2 level, with 90% reported to be able to engage in conversation (Michel et al., 2021).
Considering the high proficiency of Dutch learners of English and the dissimilarity between L1 and L2 only lying in the interface between linguistic domains, we examined whether existing similarities are sufficient for L1-like or native-like processing or whether a similarity in the interface is required as well. Specifically, we examined the anticipation and processing of prosodically realized focus in real time using an event-related potential (ERP) paradigm with electroencephalography (EEG).
The prosodic-semantic interface in sentences with only
Constructs with only exist in both Dutch and English and require complex processing of the interface between prosody and semantics (Bouma et al., 2007; Gussenhoven, 1983; Filik et al., 2009; Foolen et al., 2009; Ladd, 1980; Zimmermann & Onea, 2011). Semantically, only signals an upcoming contrastive focus (hereafter focus) in the utterance. It flags characteristics of the contrasted element as true and activates alternatives that are related to the contrast but have characteristics that are false (Filik et al., 2009; Rooth, 1992; Zimmermann & Onea, 2011). This contrast is signaled by acoustic prominence, achieved via accentuation (i.e., placement of a pitch accent) on the word carrying the contrastive information in both English and Dutch (Gussenhoven, 1983, 2004; Ladd, 1980, 1996, 2008; Zimmermann & Onea, 2011). In example (1a), the accented element is ball, eliciting an object-focus reading. Only highlights that the ball is the one thing that John is throwing and that he is not throwing any other objects. Similarly, in (1b) containing a verb focus, throwing the ball is the one action that John does.
The Dutch equivalent alleen is similar to only semantically and is always coupled with a pitch accent elsewhere in the utterance. However, there is a critical difference in the importance of prosodic features in encoding the locus of focus, which is triggered by an independent difference between Dutch and English syntax. In English, verbs are preferably placed adjacent to their direct objects (Bouma et al., 2007), resulting in only being more frequently placed in the preverbal position (1a and 1b) regardless of the locus of focus. Pitch accents are then used as the primary cue to locate focus. In contrast, Dutch has an underlying OV word order, with the finite verb appearing before the object in main clauses, and in general prefers alleen to be placed adjacent to the focal element (2a and 2b) (Bouma et al., 2007; Foolen et al., 2009). Thus, the position of alleen serves as an important cue to the locus of focus in tandem with accentuation.
The preferred positioning of only during production also affects where listeners expect focus to be (Ge et al., 2021b; Mulders & Szendroi, 2016). English listeners listening to English tend to wait for accentuation to determine the contrast (Ge et al., 2021b). In contrast, Dutch listeners listening to Dutch have a strong expectation for focus adjacent to alleen (Mulders & Szendroi, 2016), although they can also use prosodic cues to quickly reinterpret sentences with nonadjacent focus (Dimitrova, 2012; Mulders & Szendroi, 2016).
ERP research on focus processing
In the present study, we adopted an event-related potential (ERP) paradigm using electroencephalography (EEG) to examine prosodic focus processing in English sentences with only in L1 and L2 listeners. By aligning the onset of a neural response with the acoustic onset of a pitch accent, a representation (ERP component) of how the brain processes and integrates accentuation with other information in real time can be formed (Luck, 2014). Focus particles and context affect the accessibility of certain elements (Chafe, 1994), resulting in expectations for where the focus will be and eliciting specific ERP components (Dimitrova et al., 2012). For example, in question-answer pairs, a missing or incongruent pitch accent in the answer based on the focus position implied by the preceding question often elicits a negativity peaking at 400 ms (N400) (e.g., Dimitrova et al., 2012; Hruska et al., 2000; Johnson et al., 2003; Schumacher & Baumann, 2010; Toepel et al., 2009). The N400 reflects the registration of violations in morphosyntax, phrase structures, verb tenses, grammatical gender, and so on (e.g., Baumann & Schumacher, 2011; Osterhout & Holcomb, 1992, 1993; Osterhout et al., 1997; Schumacher & Baumann, 2010; Toepel et al., 2009; Wicha et al., 2004).
This ERP component is often followed by a centroparietal positivity at approximately 600 ms (P600), associated with reevaluation of sentence meaning or difficulty in integrating information (Burkhardt, 2007; Friederici, 2011; Hagoort, 2003; Kaan et al., 2000).
Prosodic focus processing elicits two additional ERP components. First, focus expectancy elicits a frontal negativity between 100 and 200 ms, i.e., the expectancy negativity (EN) (e.g., Heim & Alter, 2006; Hruska & Alter, 2004; Toepel et al., 2009). This is followed by ERP effects specific to processing the pitch accents, which have been found for both implicit prosody during reading (Cowles et al., 2007; Reichle & Birdsong, 2014; Stolterfoht et al., 2007) and explicit prosody during auditory stimulation. Auditory pitch accents elicit a positive deflection around 200–500 ms measured broadly over the scalp (Baumann & Schumacher, 2011; Dimitrova, 2012; Dimitrova et al., 2012; Heim & Alter, 2006; Hruska et al., 2001; Lee et al., 2020), interpreted as either a frontocentral P300, depicting surprise (Baumann & Schumacher, 2011; Magne et al., 2005), or a broadly distributed P200(-like) ‘accent positivity’ reflecting the processing of the acoustic features of the pitch accent (Dimitrova, 2012; Dimitrova et al., 2012; Heim & Alter, 2006; Lee et al., 2020). Previously, it has been shown that this ‘accent positivity’ may be absent or weakened in L2 listeners when listening to contrastive focus (Lee et al., 2020).
To the best of our knowledge, there has been one published study on only and prosodic focus processing in L1 Dutch (Dimitrova, 2012). Dimitrova (2012) found that alleen elicited an anticipation for adjacent focus, that is, rapid processing of the word, seen as a positivity at 100–200 ms, followed by the ‘accent positivity’. Nonadjacent focus was processed differently. Instead of an early positivity, accentuation elicited the EN, followed by the ‘accent positivity’. The author argued that the early processes both reflect expectancies but that the expectancies are built differently, thus affecting their polarity. That is, expectancy for the adjacent element is induced by alleen, whereas expectancy for the nonadjacent element depends on the lack of preceding prosodic focus marking (Dimitrova, 2012). Finally, nonadjacent focus also evoked a left anterior P600, suggesting the need to reevaluate the sentence. Together, these findings show that L1 Dutch listeners expect focus adjacent to alleen, although they are flexible and quick to adjust their interpretation when processing nonadjacent focus (Dimitrova, 2012).
The current study
The present study examined anticipation and processing of focus in the interface between linguistic domains by L1 English listeners and highly proficient L2 English-L1 Dutch listeners, in order to shed light on how language similarities within domains affect the processing between domains in L2 listeners in comparison to L1 listeners. We presented sentences with only with prosodic focus marking either adjacent (on the verb) or nonadjacent (on the object noun) to the focus particle (e.g., in ‘The dinosaur is only throwing the bucket’). The locus of focus could only be determined by processing and integrating the pitch accent correctly. Additionally, these sentences were presented either in isolation, that is, as single sentences, or preceded by context. The purpose of the context was not to impose a contrast. Rather, it provided possible alternatives to the adjacent and nonadjacent words that could be activated by only (Filik et al., 2009; Rooth, 1992; Zimmermann & Onea, 2011), contextualizing the use of only. This may facilitate processing of focus by freeing up cognitive resources (Dimitrova et al., 2012). To explore L2 strategies and examine them against L1 strategies, we formulate three research questions, accompanied by hypotheses and predictions.
Research question 1
Do L1 and L2 English listeners differ in their processing of prosodic focus on the element adjacent to only, and to what extent do they differ from L1 Dutch listeners?
Hypothesis 1
As only can imply prosodic focus on its adjacent element in both languages (Bouma et al., 2007; Foolen et al., 2009), we hypothesize that L1 English listeners will show processing patterns similar to those of L1 Dutch listeners (Dimitrova, 2012) (Hypothesis 1a), and that L2 English listeners will do so too regardless of the presence or absence of the preceding context, as a result of positive L1 transfer (Clahsen & Felser, 2006; Frenck-Mestre, 2002) (Hypothesis 1b). Our prediction is that both L1 and L2 listeners will show an early broadly distributed positive peak (100–200 ms), reflecting expectancy in the form of quick processing of the adjacent element (Dimitrova, 2012). This early positive peak will be followed by a broadly distributed ‘accent positivity’ (200–390 ms), reflecting the processing of the acoustic features of the pitch accent (Dimitrova et al., 2012; Heim & Alter, 2006; Lee et al., 2020).
Research question 2
Do L1 and L2 English listeners differ in their processing of prosodic focus when the focused element is not adjacent to only, and do they differ from L1 Dutch listeners?
Hypothesis 2
Because only is more likely to be accompanied by nonadjacent focus in English than in Dutch (Bouma et al., 2007; Foolen et al., 2009), we hypothesize that L1 English listeners will show slightly different processing patterns than those found in L1 Dutch listeners (Dimitrova, 2012) (Hypothesis 2a). As advanced Dutch learners of English have been shown to exhibit slow or absent integration of prosodic and focus information in English (Akker & Cutler, 2003; Ge et al., 2021b), we hypothesize that L2 listeners will experience difficulty in integrating different strands of linguistic information in the L2 (Sorace, 2011; Sorace & Filiaci, 2006) and employ their own strategy to process nonadjacent focus, different from both L1 English listeners’ strategy and L1 Dutch listeners’ strategy (Dimitrova, 2012) (Hypothesis 2b). Specifically, our prediction is that L1 English listeners will expect focus on the nonadjacent position and show an early EN (100–200 ms), as this focus position is cued by the lack of prosodic marking in the preceding verb (Dimitrova, 2012; Heim & Alter, 2006; Hruska & Alter, 2004; Toepel et al., 2009), which will then be followed by the ‘accent positivity’ (200–390 ms). Different from L1 Dutch processing (Dimitrova, 2012), we do not expect to see a broadly distributed P600 (500–900 ms) in L1 English processing.
As both focus positions are possible in English, reevaluation of the sentence meaning should not be needed (Friederici, 2011; Ge et al., 2021b; Hagoort, 2003; Kaan et al., 2000). Our prediction for L2 listeners is that they will not expect the nonadjacent focus and therefore not show an EN, but they will be able to react to the pitch accent itself (‘accent positivity’ at 200–390 ms), after which sentence reevaluation will occur as shown by a broadly distributed P600 (500–900 ms).
Research question 3
Will the presence of context adjust L2 listeners’ processing of prosodic focus?
Hypothesis 3
We hypothesize that the presence of a preceding context will alleviate the cognitive load in L2 listeners (Chafe, 1994; Filik et al., 2009; Zimmermann & Onea, 2011). Our prediction is that L2 listeners will show more L1 English-like processing patterns of contrastive focus in sentences preceded by a context than in isolated sentences.
Method
Participants
Twenty native speakers of English (eight male, 18–34 years old, M = 24 years, SD = 4.5 years, hereafter the L1 group) and thirty-seven native speakers of Dutch (fourteen male, 19–26 years old, M = 22 years, SD = 1.8 years, hereafter the L2 group) participated in the ERP experiment.Footnote 1 Two female L1 participants were excluded due to inattentiveness as measured with a semantic relatedness task (see Procedure). Four L2 participants (two male) were excluded due to technical failure (n = 2), being left-handed (n = 1), or inattentiveness (n = 1). The remaining participants were right-handed, had normal or corrected to normal vision, and reported not having any neurological, psychiatric, hearing, or language impairments.
All L2 participants were university students and frequently exposed to English in their daily lives at the time of testing, for example, through (social) media (Michel et al., 2021) (n = 5) or doing a degree program that was taught in English (n = 28). The LexTALE test (Lemhöfer & Broersma, 2012) was used to assess the L2 participants’ English proficiency. LexTALE is a short validated vocabulary test in which test takers see a written word and decide whether it is an English word or a nonword. Although it only tests vocabulary knowledge, LexTALE scores correlate with general English proficiency as tested in the Test of English for International Communication (TOEIC) and the Quick Placement Test (QPT). On average, a LexTALE score below 59% corresponds to B1 level or lower, 60–80% to B2, and 80–100% to C1/C2 (Lemhöfer & Broersma, 2012). In our study, the participants achieved a mean LexTALE score of 87.88 (n = 30; three participants had missing data; score range: 65 to 100; SD = 9.97; 95% CI = [86.70; 89.05]), indicating that the L2 group was highly proficient. Considering the high exposure to English and high LexTALE scores, we decided not to include proficiency as a factor in the analyses.
Stimuli
Sixty unique pairs of experimental stimuli (Table 1) and 118 fillers (Table 2) were derived from stimuli used in Ge et al. (2022).Footnote 2 The stimuli were recorded by a male native speaker of British English (age 25–27 years, from York, the United Kingdom) at a 44.1 kHz sampling frequency with 16-bit resolution in a shielded recording booth. The speaker was instructed to naturally produce context and target sentences with half of the 120 stimuli containing accentuation on the verb (Table 1.A), and the other half on the object (Table 1.B). The context sentences presented two possible objects and actions, and hinted at a correction. The correction occurred in the target sentences, which could only be correctly interpreted by accurate processing of the accentuation. To test the effect of context, we created an additional 120 stimuli by removing the context from the above-mentioned stimuli (Table 1.C and D). In total, we used 240 experimental items.
* Target sentences C and D are not preceded by context sentences.
Two sets of fillers lacking the focus particle only were used. The first set consisted of target sentences containing an explicit correction, either preceded by context sentences with three objects and an action (n = 26, Table 2.A) or presented without these context sentences (n = 52, Table 2.B). The target sentences were semantically and prosodically congruent with the context sentences. The second set contained context sentences with an additional sentence, followed by a target sentence with either a congruent pitch accent (n = 20, Table 2.C) or an incongruent pitch accent (n = 20, Table 2.D). The target sentences were semantically incongruent with the context sentences. Despite the limited variety of filler types, we believed they were sufficiently different from the experimental stimuli to distract the participants from our research purpose, which was confirmed by the answers that the participants provided in an informal questionnaire after the EEG recording.
* Accented words are in capitals.
** This type of filler did not have context sentences.
Twelve lists were created with all experimental and filler items (n = 358); participants were randomly assigned to one of the lists. Six of these lists were uniquely pseudo-randomized, meaning that (1) items with the same target sentence were never presented consecutively, (2) no more than three stimuli with the same accentuation pattern were presented consecutively, and (3) no more than three stimuli with the same context were presented consecutively. These six lists were reversed to create the remaining six lists, such that the first trial of the original list became the last trial. This procedure counterbalanced the trial order across participants, minimizing the effect of fatigue in the later trials. Each list was preceded by the same practice round consisting of four items, two of which were similar to the experimental items and two to the filler items. See Supplementary Materials A for the full list of stimuli.Footnote 3
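As an illustration only, the three ordering constraints described above can be expressed as a short check. The sketch below is written in Python, which is not a tool reported for this study; the field names target, accent, and context are hypothetical labels for the properties each trial would need to carry.

```python
from itertools import groupby


def longest_run(values):
    """Return the length of the longest run of identical consecutive values."""
    return max((len(list(group)) for _, group in groupby(values)), default=0)


def satisfies_constraints(trials):
    """Check the three ordering constraints on a list of trial dicts.

    Each trial is assumed (hypothetically) to have the keys
    'target', 'accent', and 'context'.
    """
    targets = [t["target"] for t in trials]
    accents = [t["accent"] for t in trials]
    contexts = [t["context"] for t in trials]
    return (
        longest_run(targets) <= 1       # (1) same target never back to back
        and longest_run(accents) <= 3   # (2) at most three equal accent patterns in a row
        and longest_run(contexts) <= 3  # (3) at most three equal contexts in a row
    )


# A toy order; a real list would contain all 358 items.
example = [
    {"target": "t1", "accent": "verb", "context": "with"},
    {"target": "t2", "accent": "verb", "context": "without"},
    {"target": "t1", "accent": "verb", "context": "without"},
    {"target": "t3", "accent": "object", "context": "without"},
]
print(satisfies_constraints(example))
print(satisfies_constraints(list(reversed(example))))
```

Because all three constraints concern runs of consecutive items, an order that satisfies them still satisfies them when reversed, which is why reversed lists need no separate randomization.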
Acoustic analysis
To establish that the stimuli in the two accentuation conditions were acoustically different and were therefore appropriate for the ERP experiment, we conducted an acoustic analysis on the critical words (i.e., verbs and object nouns). Mean (f0), minimum (f0min), and maximum pitch (f0max), pitch range (f0range), and duration were extracted from the critical words using Praat (Boersma & Weenink, 2022), and were analyzed with Wilcoxon signed-rank tests given that the acoustic data were skewed and contained missing values due to differences in voice quality.
We expected that accentuation would result in longer duration, higher f0max, lower f0min, and a larger f0range. Indeed, the results (Table 3) showed that on average the accented verbs were 36 ms longer, had a 74 Hz higher mean pitch, a 1 Hz lower minimum pitch, a 203 Hz higher maximum pitch, and a 112 Hz larger pitch range (i.e., the difference between the maximum and minimum pitch) than their unaccented counterparts. Similarly, in comparison to the unaccented object nouns, the accented object nouns were on average 43 ms longer, had a 119 Hz higher mean pitch, a 140 Hz higher maximum pitch, and a 71 Hz larger pitch range. Thus, the accentuation resulted in the expected acoustic changes and the stimuli could be used without further manipulation.
* Significance at p < 0.05.
Procedure
This study was conducted in accordance with the Ethics Assessment Committee Linguistics (ETCL) of the Faculty of Humanities at Utrecht University. All participants provided informed consent and their visit (approximately 2.2 hours) was compensated with 23 euros.
The participants were seated in front of a computer screen in a quiet, unshielded room in the EEG lab of the Institute for Language Sciences. After electrode application and instructions, auditory stimuli were presented with Presentation software (v.20.0, Neurobehavioral Systems, 2019) via loudspeakers, positioned on either side of the screen. The experiment was preceded by four practice trials, after which the participant had the opportunity to ask the experimenter questions. The remaining 358 trials were divided into six blocks of fifty-one trials and one block of fifty-two trials. Each block was approximately 11 minutes long and was followed by a 2-minute obligatory break, allowing the participants to blink and stretch. In addition, optional breaks were provided approximately every 4 minutes and were as long as the participants needed. To minimize eye movement during auditory stimulation, the participants were instructed to fixate on a white cross against a black background on the computer screen, which appeared 100 ms prior to stimulus presentation and remained until 1000 ms after stimulus presentation. Each auditory stimulus was followed by a blink trial, in which the cross was replaced by three dashes for 2000 ms to indicate that participants could blink between trials.
To ensure that the participants were attentive to the stimuli, a semantic relatedness task, similar to the one used in Dimitrova et al.’s (2012) Dutch prosodic processing study, was implemented in 25% of the trials. By only implementing the task in a quarter of the trials, we minimized task-related ERPs (Dimitrova et al., 2012). In these ninety-four trials, a word was presented visually at the end of a trial. The participants were instructed to decide whether the presented word was related to the fragment they last heard by pressing the left shift key on a keyboard for YES or the right shift key for NO. The keys were visually marked by color to represent the different answers. The words were either unrelated or related to the subjects, verbs, or objects of the preceding audio stimulus. All words were always presented after the same stimulus and were balanced over the four word categories (for exact word content and related stimuli see Supplementary Materials A). After the participants’ response, the blink trial was presented before the next trial was initiated.
EEG was recorded in sixty-four channels at 2048 Hz using a sixty-four-channel BioSemi cap with Ag/AgCl electrodes in the 10–20 configuration. Additional electrodes on both mastoids, next to both eyes, and above and below the left eye were used for offline referencing, correction of horizontal eye movements (hEOG), and correction of vertical eye movements (vEOG) or blinks respectively. Impedances were kept under ±20 μV and in rare cases ±30 μV. EEG markers were time-locked to the acoustic onset of the verbs and objects.
EEG preprocessing
The raw EEG signal was preprocessed using BrainVision Analyzer (v2.1.2.327, Brain Products, 2019). We performed the steps in the following order. The data were first rereferenced to the average of both mastoids, after which they were bandpass-filtered at 0.1–35 Hz (24 dB slope), and downsampled to 500 Hz for further processing. Using the EEG markers, epochs were created from 100 ms before to 1000 ms after the onset of the verbs and objects. Next, the mean amplitude in the 100 ms before the onset (–100 to 0 ms) was subtracted from every epoch as a means of baseline correction. Finally, artefact rejection was done in two rounds. In the first round, trials that were contaminated by blinks or other eye movements were rejected from all channels. To this end, two new channels were created using the two hEOG and two vEOG electrodes. If, within an epoch, the signal in either of these two channels exceeded an amplitude of ±75 μV, displayed a voltage step of 50 μV or more between two adjacent sampling points, or showed a difference in signal activity lower than 0.5 μV in an interval of 100 ms, the whole epoch was rejected in all other channels. The second round served to reject trials contaminated by other artefacts, for example, muscle movement, on a channel-specific level. The same rejection criteria as in the first round were applied to the signals of all sixty-four scalp electrodes. If the criteria were violated, an epoch was marked for rejection on an individual electrode and trial basis. In the end, 87%, 88%, 91%, and 91% of the trials in the L1 group, and 85%, 85%, 88%, and 89% in the L2 group in conditions A, B, C, and D, respectively, were included in the analyses.
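To make the baseline correction and the three rejection criteria concrete, the following is a minimal sketch in plain Python. It is not BrainVision Analyzer's implementation: the function names are hypothetical, an epoch is assumed to be a list of amplitudes in μV for one channel sampled at 500 Hz, and the 100 ms low-activity interval is approximated as a sliding window of 50 samples.

```python
import math


def baseline_correct(epoch, n_baseline):
    """Subtract the mean of the first n_baseline samples from every sample."""
    baseline = sum(epoch[:n_baseline]) / n_baseline
    return [value - baseline for value in epoch]


def should_reject(epoch, max_abs=75.0, max_step=50.0, min_activity=0.5, window=50):
    """Return True if a single-channel epoch violates any rejection criterion."""
    # Criterion 1: absolute amplitude beyond +/- 75 microvolts.
    if any(abs(value) > max_abs for value in epoch):
        return True
    # Criterion 2: voltage step of 50 microvolts or more between adjacent samples.
    if any(abs(b - a) >= max_step for a, b in zip(epoch, epoch[1:])):
        return True
    # Criterion 3: max-min difference below 0.5 microvolts within any window
    # (assumption: 50 samples stand in for 100 ms at 500 Hz).
    for start in range(len(epoch) - window + 1):
        segment = epoch[start:start + window]
        if max(segment) - min(segment) < min_activity:
            return True
    return False


# An epoch from -100 to 1000 ms at 500 Hz has 550 samples; the first 50 are baseline.
raw = [5.0 * math.sin(i / 3.0) + 2.0 for i in range(550)]
corrected = baseline_correct(raw, n_baseline=50)
print(should_reject(corrected))
```

In the two-round scheme described above, the same check would be run first on the two EOG channels, where a violation discards the whole trial, and then on each scalp channel separately.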
To create the datasets for statistical analysis, the following steps were done in Matlab (R2019a, 2019). The trials marked for rejection were excluded at the trial and electrode level. ERPs, as area-under-the-curve data, per word in each condition were then calculated. To do so, epochs were averaged per channel into three time windows: an early time window (100–200 ms) to test for the frontal EN or the broadly distributed positivity, a middle time window (200–390 ms) for the global ‘accent positivity’, and a late time window (500–900 ms) for the anterior P600. These time windows were predefined based on previous research (Dimitrova, 2012; Dimitrova et al., 2010a, 2010b, 2015; Luck & Gaspelin, 2017; Regel et al., 2014) and, in the case of the middle time window, also the average duration of the verbs in our study. Finally, the epochs per scalp electrode were assigned to one of nine scalp regions, which were based on Dimitrova (2012) in order to compare our results with L1 Dutch: left anterior (Fp1, AF3, AF7, F3, F5, and F7), left central (FC3, FC5, C3, C5, CP3, and CP5), left posterior (P3, P5, P7, PO3, PO7, and O1), right anterior (Fp2, AF4, AF8, F4, F6, and F8), right central (FC4, FC6, C4, C6, CP4, and CP6), right posterior (P4, P6, P8, PO4, PO8, and O2), center anterior (FPz, AFz, and Fz), center central (FCz, Cz, and CPz), and center posterior (Pz, POz, and Oz).
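The mapping from time windows to samples can be sketched as follows. This is an illustrative Python version, not the Matlab code used in the study: it assumes epochs that start at –100 ms and are sampled at 500 Hz, treats each window as half-open (start included, end excluded), and summarizes a window by its mean amplitude. The function and variable names are hypothetical.

```python
# Time windows in ms, as defined in the text.
WINDOWS_MS = {"early": (100, 200), "middle": (200, 390), "late": (500, 900)}


def window_mean(epoch, start_ms, end_ms, epoch_start_ms=-100, sfreq=500):
    """Mean amplitude of one single-channel epoch within [start_ms, end_ms)."""
    def to_index(t_ms):
        # Convert a latency in ms to a sample index within the epoch.
        return round((t_ms - epoch_start_ms) * sfreq / 1000)

    segment = epoch[to_index(start_ms):to_index(end_ms)]
    return sum(segment) / len(segment)


# Toy epoch whose amplitude equals its own latency in ms (-100, -98, ..., 998).
toy_epoch = [-100 + 2 * i for i in range(550)]
for name, (start, end) in WINDOWS_MS.items():
    print(name, window_mean(toy_epoch, start, end))
```

Averaging the resulting values over the electrodes of a region would then give one value per trial, window, and scalp region.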
In the end, we obtained two ERP datasets for the verbs and three datasets for the objects (see Footnote 4), each in the separate time windows. Each dataset contained trial data (see Footnote 5) per participant in four conditions per scalp region.
Statistical analysis
The EEG datasets were analyzed using mixed-effect linear regression modeling by means of R (R Core Team, 2015), RStudio (RStudio Team, 2019), and the lme4 package (v.1.1.21, Bates et al., Reference Bates, Maechler, Bolker and Walker2015). Separate models were created for the verbs for the early and middle time windows (as the late window went beyond the duration of the verb) and for the objects for each of the three time windows, resulting in five different models.
The models were constructed using the outcome variable ERP and random factors Participant and Item. Fixed factors included Context (with or without), Accent (on the verb or object), Region of Interest (henceforth ROI: left anterior, center anterior, right anterior, left central, center central, right central, left posterior, center posterior, right posterior), and Group (L1 or L2). We also added a by-participant random slope for Accent to allow some individual variability in responses to the pitch accent. We added the factors iteratively and used the -2LL chi-square test to examine whether models improved (p < .05).
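The -2LL chi-square comparison used for model building can be sketched as follows. This Python example is only an illustration of the test statistic, not the R code used in the study: the function names and the log-likelihood values are hypothetical, and the degrees of freedom are assumed to equal the number of parameters added by the larger model.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square distribution with integer df.

    Uses the recurrence Q(a + 1, z) = Q(a, z) + z**a * exp(-z) / gamma(a + 1),
    starting from Q(1/2, z) = erfc(sqrt(z)) or Q(1, z) = exp(-z), with z = x / 2.
    """
    z = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-z)
    else:
        a, q = 0.5, math.erfc(math.sqrt(z))
    while a < df / 2.0:
        q += z ** a * math.exp(-z) / math.gamma(a + 1.0)
        a += 1.0
    return q


def likelihood_ratio_test(loglik_reduced, loglik_full, df_difference):
    """Return the -2LL difference and its chi-square p-value."""
    statistic = max(-2.0 * (loglik_reduced - loglik_full), 0.0)
    return statistic, chi2_sf(statistic, df_difference)


# Hypothetical log-likelihoods for a model without and with one added term.
statistic, p_value = likelihood_ratio_test(-1050.0, -1045.0, df_difference=1)
keep_added_term = p_value < 0.05
```

Under this rule a term is retained only when adding it lowers -2LL by more than the chi-square critical value for the added parameters.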
Because we are interested in how listeners process accentuation (i.e., the effect of Accent) and how the presence of context affects accent processing (i.e., the interaction between Context and Accent), we only analyzed main effects of and interactions with Accent in detail. We also only examined the highest level of an interaction that included Accent. By limiting analyses to main effects and highest level of interactions with Accent, we could eliminate unnecessary analyses and thus lower our experiment-wise error rate (Luck & Gaspelin, Reference Luck and Gaspelin2017).
Significant interactions with Accent were further analyzed with pairwise comparisons using R (R Core Team, 2015) and the package emmeans (v.1.3.5, Lenth, 2019). As the emmeans package typically conducts all possible pairwise comparisons within an interaction, we chose to analyze only the simple comparisons of Accent and Context that differed in one factor, and we manually adjusted their p-values using the Benjamini–Hochberg correction. This correction method controls the expected proportion of false positives, while keeping a high power (Hemmelmann et al., Reference Hemmelmann, Horn, Süsse, Vollandt and Weiss2005).
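For readers unfamiliar with the Benjamini–Hochberg procedure, a minimal Python sketch of the adjustment is given below. It is an illustration only: the function name and the example p-values are hypothetical and are not taken from the study's output.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, scaling each by m / rank and
    # enforcing that adjusted values never increase as rank decreases.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * m / rank)
        adjusted[index] = running_min
    return adjusted


# Hypothetical p-values for four selected simple comparisons.
raw = [0.01, 0.04, 0.03, 0.005]
adjusted = benjamini_hochberg(raw)
significant = [p < 0.05 for p in adjusted]
```

Because only the selected comparisons enter the family, `m` is the number of comparisons actually examined rather than the number of all possible pairs.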
Results
Performance on semantic relatedness task
To determine attentiveness, the participants needed to achieve a minimum accuracy of 80% in the semantic relatedness task. Three participants did not meet this requirement and were excluded from the analysis. The accuracy of the final sample ranged from 81% to 96% (n = 18, M = 87%, SD = 4.20%, 95% C.I. [89, 92]) in the L1 group, and from 81% to 96% (n = 33, M = 90%, SD = 3.83%, 95% C.I. [89, 91]) in the L2 group.
Results on the ERP datasets
Complete models, model summaries, and the emmeans output of relevant significant interactions can be found in Supplementary Materials B for the verbs and Supplementary Materials C for the objects. Only the significant findings are reported here and were taken from the best-fit models. All best-fit models contained the random effects of Item and Participant, with the random slope for Accent over Participants, and the fixed effects of Context, Accent, ROI, and Group. In addition, each model had several interactions that improved the final model (see Table 4).
Verbs: 100–200 ms time window
There were significant main effects of Intercept, Accent, ROI, and Group and significant interactions of Accent*ROI, Context*Group, Accent*Group, Context*Accent*Group, and Context*Accent*ROI*Group, although all effect sizes were rather small. The interaction Context*Accent*ROI*Group was tested further. Table 5 lists the emmeans output for reported findings (see Supplementary Materials B for the complete list).
* LA = left anterior; CA = center anterior; RA = right anterior; LC = left central; CC = center central; RC = right central; LP = left posterior; CP = center posterior; RP = right posterior.
As shown in Figure 1 and the left half of Figure 2, the L1 group responded to the accented verbs with more positivity than to the unaccented verbs in the absence of context in the 100–200 ms time window. This main effect of Accent was significant only for the left and center anterior regions, and right central region.
The main effect of Accent was also found for the L2 group, as shown in the right half of Figure 2 and Figure 3. However, it was only significant for the center and right anterior regions in the presence of context.
The accented verbs therefore elicited an early anterior positive response in both groups but in different context conditions: without context for the L1 group and with context for the L2 group. This is in line with our predictions that both groups would show the early positivity and that the context would have an effect in the L2 listeners. The fact that they only showed this positivity in the presence of context suggests that they needed more contextual information during the processing of focus on the verb in the early time window.
Verbs: 200–390 ms time window
We found significant main effects of Intercept, Context, Accent, ROI, and Group and significant interactions of Context*Group, ROI*Group, Context*ROI*Group, and Accent*ROI*Group. The interaction of Accent*ROI*Group was further examined (see Table 5 for statistics).
Both the L1 and L2 groups showed a broadly distributed positivity that was larger for the accented than for the unaccented verbs (Figures 1, 2, and 3). The groups differed in the regions involved: For the L1 group, the positivity was present in the frontal regions and in the center and right central regions, whereas all regions were involved in the case of the L2 group (Figure 1 and left half of Figure 2). Again, the results of the statistical analyses were in line with our prediction that the processing of the accented verbs would be reflected by the globally distributed positivity as a result of the processing of the pitch accent’s acoustic features.
Objects: 100–200 ms time window
For the objects in the first time window, we found a significant main effect of ROI and significant interactions of ROI*Group and Context*Accent*ROI*Group. The interaction Context*Accent*ROI*Group was further examined. However, after adjustment, there were no significant effects of accentuation in either group in any of the scalp regions (see Supplementary B for all findings, and Figures 4, 5, and 6 for the ERP graphs). This suggests that neither the L1 nor the L2 group showed early responses to the pitch accent, which goes against our prediction that nonadjacent accentuation would elicit an early frontal negativity in the L1 group.
Objects: 200–390 ms time window
We did not find a significant main effect of Accent or significant interactions that included the effect of Accent in the best-fit model (see Supplementary B), as can also be observed between 200 and 390 ms in Figures 4 and 6. Instead, we found significant effects of ROI and ROI*Group, which were not examined any further. The lack of significant Accent effects goes against our predictions that both groups would show a positivity as a response to accentuation, suggesting that the groups did not process the pitch accent on the object noun.
Objects: 500–900 ms time window
Unlike in the previous two time windows, we found significant main effects of Accent and ROI and significant interactions of Context*Group, ROI*Group, and Context*Accent*ROI*Group. The interaction of Context*Accent*ROI*Group was analyzed further. However, after adjustment, there were no significant effects of Accent in either group in any of the nine scalp regions (see Supplementary B). This finding is not in line with our predictions that the L1 group would show more positivity to the accented than unaccented objects, reflecting sentence reinterpretation, and the L2 group would not show such an effect.
The main effect of Accent in the best-fit model did suggest that the accented objects elicited more negative responses than the unaccented objects (b = –1.208, t = 2.262, p = 0.024), although this effect may be skewed due to the presence of interactions.
Discussion
The current ERP study examined whether native speakers of English (L1 group) and advanced Dutch learners of English (L2 group) processed English focus adjacent and nonadjacent to only differently and, if so, whether the L2 group employed an L1 Dutch-strategy (L1 transfer; Clahsen & Felser, Reference Clahsen and Felser2006; Frenck-Mestre, Reference Frenck-Mestre, Altarriba and Herridia2002; Juffs, Reference Juffs2005) or an interlanguage strategy for difficult constructs (Bley-Vroman, Reference Bley-Vroman1983; Lee et al., Reference Lee, Perdomo and Kaan2020; Marinis et al., Reference Marinis, Roberts, Felser and Clahsen2005; Purdy, Reference Purdy2001; Sorace, Reference Sorace2011; Sorace & Filiaci, Reference Sorace and Filiaci2006; White, Reference White2011). For adjacent focus, we hypothesized that the L1 English listeners would be similar in processing to L1 Dutch listeners (Dimitrova, Reference Dimitrova2012) (Hypothesis 1a), and so would the L2 English listeners, regardless of the presence or absence of context, as a result of positive L1 transfer (Hypothesis 1b). For nonadjacent focus, we hypothesized that the L1 English listeners would show slightly different processing patterns than L1 Dutch listeners (Dimitrova, Reference Dimitrova2012), resulting in an L1-English specific strategy (Hypothesis 2a). The L2 listeners were hypothesized to experience difficulty integrating L2 information and to employ their own interlanguage strategy as a result (Hypothesis 2b), but they might show more L1 English-like processing patterns when the nonadjacent focus was preceded by the context (Hypothesis 3).
Processing of adjacent focus
Our results on the processing of adjacent focus partly support Hypothesis 1. The L1 and L2 groups displayed mostly similar responses to the accented verbs. Specifically, the early anterior positivity at 100–200 ms was elicited in both groups, except that it occurred in the absence of context in the L1 listeners and in the presence of context in the L2 listeners. These responses were followed at 200–390 ms by a frontal-central positivity in the L1 listeners and a global positivity in the L2 listeners.
The early positivity at 100–200 ms
In accordance with our predictions, in both groups, the early positivity at 100–200 ms occurred shortly after word onset, indicating that the processing of the pitch accent was facilitated by prior expectations for accentuation, which were elicited by only (Bouma et al., Reference Bouma, Hendriks and Hoeksema2007; Clahsen & Felser, Reference Clahsen and Felser2006; Dimitrova, Reference Dimitrova2012; Foolen et al., Reference Foolen, Gerrevink, Hogeweg and Prawiro-Atmodjo2009; Frenck-Mestre, Reference Frenck-Mestre, Altarriba and Herridia2002). However, we did not predict an effect of context. We will attempt to provide an explanation below.
First, regarding the effect of context on the L1 group, Dimitrova’s (Reference Dimitrova2012) study on focus expectancy in L1 Dutch has shown that native listeners of Dutch process adjacent focus earlier when presented with isolated sentences than sentences preceded by a context. Although it is unknown from previous research what effect context has on L1 English processing, we will attempt to provide an explanation for the disappearance of early processing in the L1 English group when the target sentence is embedded in context. In English, adjacent and nonadjacent focus are equally preferred (Bouma et al., Reference Bouma, Hendriks and Hoeksema2007). When provided with alternatives to verbs and objects in the context, the L1 group may have concluded that focus can be either adjacent or nonadjacent to only, resulting in no expectancy effects. However, this additional information was missing in isolated sentences, in which only was the sole cue for upcoming contrast (Filik et al., Reference Filik, Paterson and Liversedge2009; Zimmermann & Onea, Reference Zimmermann and Onea2011). It is possible that the L1 group defaulted to an expectation for adjacent focus to speed up processing.
In contrast, the L2 group’s early processing of the pitch accent in the presence of context may be explained as follows. They expected the pitch accent on the verb when the context provided alternatives and hinted at a contrast (Filik et al., Reference Filik, Paterson and Liversedge2009; Zimmermann & Onea, Reference Zimmermann and Onea2011) but not when only was the sole cue for contrast (before the verb unfolds) in isolated sentences. As the processing of only increases cognitive load (Chafe, Reference Chafe1994; Filik et al., Reference Filik, Paterson and Liversedge2009; Zimmermann & Onea, Reference Zimmermann and Onea2011), the L2 group may not have had the cognitive resources to expect focus, suggesting a difficulty in integrating semantic information to update the sentence meaning online (Sorace, Reference Sorace2011). When accented verbs were preceded by context, the possibility of contrast and the alternatives were provided well before the contrast was prosodically realized. It is then plausible that the L2 group used the information from the context and were then simply waiting for only for the exact placement of the contrast. Based on their L1 experience, that is, via positive L1 transfer, this would be the word adjacent to only.
Thus, partly in line with our Hypothesis 1, the L1 English listeners showed similar processing patterns to L1 Dutch listeners (Dimitrova, Reference Dimitrova2012). However, the L2 English listeners showed a processing strategy that is different from that of the L1 English listeners and L1 Dutch listeners, suggesting an interlanguage strategy that requires more information to facilitate expectations for prosodic focus in the semantic-syntactic-prosodic interface on the locus adjacent to only.
The positivity at 200–390 ms
As predicted, the positivity at 200–390 ms occurred over a large portion of the scalp in both groups, representing the ‘accent positivity’. Both the L1 and L2 listeners, therefore, processed the prosodic information in the verbs (Dimitrova, Reference Dimitrova2012, Dimitrova et al., Reference Dimitrova, Stowe, Redeker and Hoeks2012; Heim and Alter, Reference Heim and Alter2006; Lee et al., Reference Lee, Perdomo and Kaan2020). As the ‘accent positivity’ was comparable between the two groups, we interpreted the L2 prosodic processing to be native-like, supporting Hypothesis 1b.
The native-like processing found in our study is in contrast with what Lee et al. (Reference Lee, Perdomo and Kaan2020) found, that is, a diminished ‘accent positivity’ in the L2 group. This discrepancy may be due to differences in typological similarity between the L1-L2 pairings of the two studies, which has been found to be of importance to L2 processing (e.g., Dussias et al., Reference Dussias, Valdés Kroff, Guzzardo Tamargo and Gerfen2013; Ortega-Llebaria & Colantoni, Reference Ortega-Llebaria and Colantoni2014; Steinhauer, Reference Steinhauer2014). In our study, the L1 and L2 (Dutch versus English) are intonation languages with a lexical stress system and use the same prosodic parameters for focus marking. In Lee et al.’s (Reference Lee, Perdomo and Kaan2020) study, the L1 and L2 (English versus Mandarin Chinese) are similar in the prosodic parameters involved but different in how they are deployed due to differences in the prosodic grammar, such as English being an intonation language and Mandarin Chinese being a tone language. The L1-L2 similarity is thus greater in our study than in Lee et al.’s (Reference Lee, Perdomo and Kaan2020) study.
Underlying mechanisms for adjacent focus processing
Our data show that, at least for adjacent focus, the L1 and L2 English listeners were very similar in prosodic focus processing. Both groups were capable of expecting prosodic cues to focus, but they differed significantly in the amount of available information they needed from the preceding context. The L1 group in our study showed similar early processing patterns to L1 Dutch listeners when presented with isolated sentences (Dimitrova, Reference Dimitrova2012), indicating that expected focus in the native language was processed similarly by the L1 English and L1 Dutch listeners when adjacent to only. In contrast, the L2 group were unable to expect adjacent focus upon hearing only in isolated sentences and required aid in the form of alternatives to verbs and objects provided in the context. This suggests that the L2 listeners may have experienced difficulty with integrating information from only quickly enough to form focus expectations (Sorace, Reference Sorace2011).
We also found evidence for native-like processing of pitch accents, that is, similar processing in L1 Dutch, L1 English, and L2 English. This native-like processing can be either a result of successful learning or positive L1 transfer (Clahsen & Felser, Reference Clahsen and Felser2006; Frenck-Mestre, Reference Frenck-Mestre, Altarriba and Herridia2002), which is difficult to disentangle at this location, as pitch accents adjacent to only are similar in acoustics and function in both languages (Filik et al., Reference Filik, Paterson and Liversedge2009; Gussenhoven, Reference Gussenhoven1983; Ladd, Reference Ladd1980; Zimmermann & Onea, Reference Zimmermann and Onea2011).
Taken together, our findings on focus adjacent to only suggest that theories of L1 transfer and interlanguage processing strategies are not mutually exclusive. Rather, it appears that L2 listeners show almost simultaneous occurrence of difficulty in information integration in some aspects and native-like processing in other aspects.
Processing of the nonadjacent focus
Whether the L2 listeners were indeed successful in learning English constructs with only or benefited from positive L1 transfer was expected to be elucidated by data on nonadjacent accentuation, as English and Dutch differ in the preferential positioning of only at this location (Bouma et al., Reference Bouma, Hendriks and Hoeksema2007; Foolen et al., Reference Foolen, Gerrevink, Hogeweg and Prawiro-Atmodjo2009). However, rather unexpectedly, we did not find any effects related to the accentuation on the objects. This suggests that the L1 and L2 listeners did not expect accentuation, respond to its acoustic features, or feel the need to reinterpret the sentence. Furthermore, the presence of context did not affect prosodic processing.
A possible explanation for the lack of expectation and prosodic processing could be that the listeners were simply not sensitive to the pitch accent on the object. However, this is not likely as both English and Dutch listeners have been shown to react to and interpret object-focus correctly in offline and online methods in their L1 (Dimitrova, Reference Dimitrova2012; Ge et al., Reference Ge, Mulders, Kang, Chen and Yip2021b; Mulders & Szendroi, Reference Mulders and Szendroi2016). An alternative explanation could be that the listeners used information prior to the object to interpret the sentence and did not need additional prosodic information provided at the object. That is, an accented verb that preceded the object implied that the object was unaccented, and vice versa. Information on accentuation in the object may thus have become superfluous, and the listeners may have ignored the pitch accent and focused on other information given in the object position. That listeners can actively ignore superfluous pitch accents has previously been shown (Llanos et al., Reference Llanos, German, Nike Gnanateja and Chandrasekaran2021). Figures 4 and 6 support this speculation: Objects appear not to be processed based on accentuation status, but rather on presence of context. This is especially evident in Figure 6 as a frontocentral negative deflection peaking around 400 ms in the L2 group for the objects in sentences without context relative to with context. We did not statistically analyze this negativity as we had predetermined our time windows, but the polarity and peak latency suggest a possible N400 effect.
This component has previously been associated with detection of anomalies (Baumann & Schumacher, Reference Baumann and Schumacher2011; Osterhout & Holcomb, Reference Osterhout and Holcomb1992; Reference Osterhout and Holcomb1993; Osterhout et al., Reference Osterhout, Bersick and McLaughlin1997; Schumacher & Baumann, Reference Schumacher and Baumann2010; Toepel et al., Reference Toepel, Pannekamp, van der Meer, Alter, Horne, Lindgren, Roll and Torkildsen2009; Wicha et al., Reference Wicha, Morena and Kutas2004) and with difficulty integrating semantic information into sentence meaning based on its context (van Berkum, Reference van Berkum, Sauerland and Yatsushiro2009; Brown & Hagoort, Reference Brown and Hagoort1993; Kutas & Hillyard, Reference Kutas and Hillyard1980). As the potential N400 only occurred in the absence of context, this finding suggests that the L2 listeners may have had difficulties accepting the objects in isolated sentences. We therefore propose that the listeners diverted to lexical processing instead of prosodic processing (similar to Lee et al., Reference Lee, Perdomo and Kaan2020). However, we are unable to test this proposal using the current data.
Taken together, our results do not support our hypotheses on nonadjacent prosodic focus processing and we are unable to test our proposal on possible underlying mechanisms at this point. More research is needed to explore why prosodic processing does not occur for accentuation information in the object nouns. One way to do this is to study the processing of accentuation in non-sentence-final object nouns in more complex constructions where a constituent following the object can also be the focus (e.g., The dinosaur is only throwing the BUCKET at the tiger versus The dinosaur is only throwing the bucket at the TIGER) to determine whether the positioning within the sentence is of influence.
Conclusions
In conclusion, the present ERP study has examined the differences and similarities in L1 and L2 processing of the prosodic-semantic interface in English sentences with only by native and nonnative listeners. Our results have provided evidence for native-like processing with regard to the processing of prosodically-realized focus when adjacent to only. However, in line with findings from previous studies (Akker & Cutler, Reference Akker and Cutler2003; Chen & Lai, Reference Chen and Lai2012; Ge et al., Reference Ge, Chen and Yip2021a; Reference Ge, Mulders, Kang, Chen and Yip2021b; Lee et al., Reference Lee, Perdomo and Kaan2020), L2 listeners perform nonnative-like in forming expectations for focus placement when presented with the same amount of information as L1 listeners. That is, only as the singular cue results in expectations for adjacent contrastive focus in native listeners, whereas L2 listeners need additional information to do so. This extends the view that L2 listeners experience difficulty when integrating information from different domains in difficult and non-L1-like constructs to build expectations for the rest of the sentence (Sorace, Reference Sorace2011) to interfaces that involve L1-like and relatively simple constructs. Our study has thus provided first ERP evidence for interface-based processing in which similarities in domains on their own are not sufficient for native-like prosodic processing in L2.
Supplementary material
The supplementary material for this article can be found at http://doi.org/10.1017/S0272263124000019.
Acknowledgments
We thank all the participants for their participation, and Zenghui Liu, Aelish Hart, and Nelleke Jansen for their help with testing the participants. We also thank Piet van Tuijl for his input on the statistical analyses and Chris van Run and his colleagues at the Institute for Language Sciences Labs (previously UiL-OTS Linguistics Labs) for their technical support. We also thank Karsten Steinhauer for his feedback on the data preprocessing and the anonymous reviewers for their useful comments. This research was funded by a Westerdijk Talent grant awarded to Aoju Chen by the Dutch Research Council. Portions of the (preliminary) results were reported at the International Workshop on the L1 and L2 acquisition of Information Structure (2019), the Ninth Annual Society for the Neurobiology of Language Meeting (2017), the Second Hanyang International Symposium on Phonetics and Cognitive Sciences of Language (2019), the Third Phonetics and Phonology in Europe Conference (2019), and the 35th Annual Conference on Human Sentence Processing (2021).
Competing interest
The authors declare that there were no competing interests during this research.