INTRODUCTION
The acquisition of a foreign language (L2) occurs within various linguistic dimensions simultaneously. While many studies focus on L2 attainment of segmental, lexical, syntactic, and semantic properties, research on the acquisition of L2 prosody is substantially underrepresented. Furthermore, within the field of L2 prosody acquisition, the attainment of L2 rhythm has received relatively little attention compared to other suprasegmental properties, like lexical stress, phrasal prominence, and speech rate (Gut, 2009). However, native listeners as young as five days old are capable of discriminating languages that have traditionally been classified as prototypically “stress timed” and “syllable timed,” like Dutch and Spanish respectively, based only on their rhythm (Nazzi, Bertoncini, & Mehler, 1998; Ramus & Mehler, 1999; Ramus, Dupoux, & Mehler, 2003). Previous studies suggest that perceived rhythm is the result of an interaction between language-specific factors, such as timing properties, prominence, and boundary marking by means of syllable duration, and syllable structure (e.g., Abercrombie, 1967; Li & Post, 2014; Post & Payne, 2018; Prieto, Vanrell, Astruc, Payne, & Post, 2012; White & Mattys, 2007a). Thus, rhythm is related to phonemic, phonotactic, and intonational features of language, and producing speech with adequate rhythm therefore requires control in multiple areas.
Hence, it is perhaps unsurprising that many L2 learners have difficulties acquiring the speech rhythm characteristic of their L2. It has been shown that L2 speakers, especially in the early phases of learning, tend to transfer rhythmic properties of their native language (L1) to the L2 (e.g., White & Mattys, 2007b), suggesting that targetlike rhythm production is easier when the L1 and L2 are rhythmically similar (e.g., Ordin & Polyanskaya, 2015). Indeed, the similarity between the L1 and L2 as a factor of successful L2 acquisition has been studied extensively within the fields of L2 phonology and phonetics, and several theoretical models are based on it, like the Speech Learning Model (SLM, Flege, 1995), the Perceptual Assimilation Model (PAM, Best, 1995), and the Second Language Linguistic Perception model (L2LP, Escudero & Boersma, 2004). However, the direction in which languages are learned has been studied less frequently (Gut, 2009). Intuitively, learning direction is an important factor to investigate: Arguably, acquisition is more challenging from less complex languages toward more complex ones than vice versa. To our knowledge, the only study on learning direction as a factor in L2 prosody acquisition is Rasier’s (2006) study on the acquisition of pitch accents to mark focus by L1 Dutch learners of French and L1 French learners of Dutch, showing that learning direction indeed affected the degree of success with which L2 learners produced targetlike pitch accent distributions. And while an analysis in which the same two languages are compared cross-directionally sheds more light on the processes underlying the role of learning direction in L2 acquisition, no study has performed such a comparison for speech rhythm acquisition.
Therefore, we explore whether the direction in which L2 acquisition occurs affects the successful attainment of speech rhythm by L2 learners of two languages that are rhythmically different, namely Dutch learners of Spanish (DLS) and Spanish learners of Dutch (SLD). To test which learner group advances more toward its target, we compare DLS and SLD with varying proficiency levels for two measures that correlate with speech rhythm, that is, accentual and final lengthening, in different phonotactic conditions. Before turning to our predictions, several concepts relevant to the current study are described in more detail.
SPEECH RHYTHM
The construct of rhythm has been operationalized in various ways. A diachronic overview of rhythm analyses generally starts with the notion of rhythm as a categorical concept dependent on isochrony (Abercrombie, 1967; Pike, 1945). Within this view, a distinction has been made between syllable-timed and stress-timed languages, where the former refers to languages in which the intervals between the beginning of all syllables were taken to be equal (e.g., Spanish), while the latter applies to languages in which only the intervals between stressed syllables were assumed to be similar (e.g., Dutch). This categorical distinction was questioned by studies showing that the idea of equal intervals, between all syllables or between stressed syllables only, was not supported by acoustic measurements (e.g., Bolinger, 1965). This initiated a shift toward the notion of rhythm as a gradient property, with the underlying assumption that no language is either completely syllable-timed or stress-timed (Dauer, 1983, 1987; see Footnote 1). The results of acoustic and phonetic experiments provided an overview of properties that are relevant to speech rhythm and enable comparisons between languages based on these properties.
More recently, it was shown that not only phonetic and phonotactic but also prosodic features of language influence speech rhythm. Studies showed that languages differ in the extent to which they lengthen stressed and/or accented syllables vis-à-vis unstressed and unaccented syllables (e.g., Beckman & Edwards, 1994), as well as in their lengthening of syllables preceding a prosodic boundary within or at the end of an utterance (e.g., Byrd, 2000). Prieto et al. (2012) showed that the degree to which languages apply these prosodic lengthening measures contributes to cross-linguistic rhythmic differences. As a result of these developments, more recent studies of speech rhythm rely on rhythm metrics to measure the timing patterns of utterances, occasionally in combination with lengthening analyses (Grabe & Low, 2002; Gut, 2009; Li & Post, 2014; Ordin & Polyanskaya, 2015; Prieto et al., 2012; Ramus, Nespor, & Mehler, 1999). These analyses followed from the dismissal of rhythm as a dichotomous notion, leading to a need for quantitative data to corroborate the idea of a rhythm continuum and to position a given language on it (see Footnote 2). In this study, we base our analyses of speech rhythm on measures of accentual and final lengthening, in agreement with Dauer’s (1983, 1987) list of parametric criteria to rhythmically differentiate between languages, one of which is the presence or absence of durational variation between stressed and unstressed syllables and the use of pitch to mark prominence (also see Allen & Hawkins, 1978).
The following section explains why the two languages studied in the current investigation differ significantly in their speech rhythm.
TYPOLOGICAL DIFFERENCES BETWEEN DUTCH AND SPANISH
Several typological differences between Spanish and Dutch have been hypothesized to underlie the perceptual distinction between these languages concerning rhythm. One difference concerns syllable complexity constraints: The majority of Spanish syllables have an open structure (syllables consisting of a consonant [C] followed by a vowel [V] are most frequent in two corpus studies: 58.0% of all syllables had CV structure in Navarro Tomás, 1966, and 53.9% in Hartsuiker, 2002), while the majority of Dutch syllables are closed (CVC syllables represented 62.4% in the corpus study by Hartsuiker, 2002). Moreover, Spanish allows for relatively few syllable structures that are more complex than the CV configuration. Navarro Tomás (1966) stated that the most complex syllable type in Spanish is CCVCC, as found in the first syllable of trans-for-mar, “to transform.” Conversely, Dutch is documented as more varied in its syllable structure, with complex structures being the norm. Syllable complexity can increase to up to seven segments in one syllable, for instance in the word strengst (“strictest”), which has a CCCVCCC structure (Booij, 1995; Van Zon, 1997). Because Dutch and Spanish differ typologically in this respect, the current study controls for syllable structure in two out of the three conditions (using predominantly CV and CVC syllables, respectively), while in the last (Mixed) condition syllable structures are used that are typical of both languages.
In addition to these phonotactic differences, the two languages also differ in prosodic properties: Spanish is known to employ little accentual and final lengthening while both are employed extensively in Dutch (Cambier-Langeveld & Turk, 1999; Cambier-Langeveld, Nespor, & Van Heuven, 1997; Delattre, 1966; Prieto et al., 2012; see Footnote 3). In the following section, the effects of these differences on L2 rhythm acquisition are discussed.
L2 RHYTHM ACQUISITION
Prior work on L2 rhythm attainment generally concentrated on the influence of the L1 on the L2, and typically reported that although L2 learners from different L1 backgrounds increasingly approached their target, considerable transfer effects from the L1 to the L2 usually also occurred (e.g., Carter, 2005; White & Mattys, 2007a). Recently, Li and Post (2014) investigated the rhythm produced by Chinese and German learners of English with intermediate or advanced proficiency level. Their analyses showed that while learners from both L1 backgrounds produced rhythm metric values and syllable durations that increasingly approached the L2 target, their development also showed signs of L1 transfer: Where intermediate learners produced values that were closer to those typical of their L1, the advanced learners produced values that were more similar to those of the L2 target. Interestingly, both learner groups performed equally well, which is surprising, because intuitively German rhythm is more similar to English rhythm than Mandarin rhythm is. One might therefore assume that the German learners of English would be more successful at producing the target speech rhythm than the Chinese learners of English.
This is precisely the idea developed further in Ordin and Polyanskaya (2015), who compared French and German L2 learners of English at beginner, intermediate, or advanced proficiency level. Their results corroborated those of Li and Post (2014) in that rhythm metric values of both learner groups revealed that durational variability increased as L2 acquisition progressed, which would be an indication of universal L2 acquisition development. However, their results further showed that while the most proficient German learners of English achieved target values (and for some metrics the intermediate learners did too), the French learners of English did not. Ordin and Polyanskaya considered this an indication that L1 speakers of a syllable-timed language (here French) encountered more difficulty acquiring the speech rhythm of a stress-timed language (here English) than L1 speakers of another stress-timed language (here German). However, because Ordin and Polyanskaya compared two different L1-L2 combinations, the design of their study makes it impossible to rule out the possibility that the differences between these two learner groups were due to other segmental, phonotactic, or prosodic properties in which French and German differ from each other.
THE CURRENT STUDY
In view of our limited understanding of rhythm transfer in general, and the importance of learning direction in this context specifically, this study compares DLS and SLD in their rhythm production, to determine which L2 group is more successful at producing targetlike speech rhythm. Consequently, a language model is required that allows for predictions based on more than just the similarity of the L1 and L2 because both learning directions consist of the same language combination. Unfortunately, this excludes popular models of L2 acquisition, such as the SLM (Flege, 1995), PAM (Best, 1995), and L2LP (Escudero & Boersma, 2004). Moreover, these models concern the acquisition of segmental features and are therefore difficult to apply to suprasegmental L2 properties. Other models that do allow for predictions regarding prosodic features tend to focus on prosodic cues at a lexical level only, and generally take a Universal Grammar perspective, assuming that specific parameters are organized into a hierarchical tree structure in which some are embedded within others (e.g., Archibald, 1994; Özçelik, 2016).
We therefore base our predictions on Eckman’s (1977, 2008) Markedness Differential Hypothesis (MDH), which is applicable to most areas of L2 acquisition and does not presuppose a specific language acquisition theory:
(1) MDH: “The areas of difficulty that a language learner will have can be predicted such that
(a) Those areas of the target language which differ from the native language and are more marked than the native language will be difficult;
(b) The relative degree of difficulty of the areas of difference of target language which are more marked than the native language will correspond to the relative degree of markedness;
(c) Those areas of the target language which are different from the native language, but are not more marked than the native language will not be difficult.” (Eckman, 1977, p. 321)
Eckman defined markedness as follows: “A phenomenon is typologically more marked if its presence in a language implies the presence of another phenomenon; but the presence of the latter does not imply the presence of the former” (1977, pp. 320–321).
As argued, Dutch and Spanish not only differ concerning the overall perception of their rhythm, but also with respect to various phonotactic and prosodic properties that underlie this perceptual distinction. The MDH can therefore be applied on at least three levels: First, young children initially produce speech with a rhythm that has been classified as more syllable-timed, and only later acquire the rhythmic properties specific to their L1 (Allen & Hawkins, 1978; Bunta & Ingram, 2007; Grabe, Watson, & Post, 1999; Schmidt & Post, 2015). Most recently, Polyanskaya and Ordin (2015) investigated the attainment of rhythmic patterns by monolingual English children from 4–5 to 10–11 years old and adults. Their results corroborated earlier work and showed that the speech rhythm of the children developed from more syllable-timed to more stress-timed as language acquisition continued. As we know of no cases in which infants first produced a stress-timed rhythm (to later develop a syllable-timed speech rhythm if this is typical of their L1), we assume that a stress-timed rhythm implies a syllable-timed rhythm in an earlier developmental stage, but not vice versa, indicating that stress-timed rhythm is typologically more marked than syllable-timed rhythm.
Second, similar reasoning is applicable to correlates of rhythm, such as syllable complexity (Prieto et al., 2012): The use of complex syllable structures such as CCCVCCC implies that simple syllable structures, such as CV, are also possible within a language (an example from Dutch being da-me, “lady”). However, the possibility of a CV syllable in Spanish does not imply that a syllable with CCCVCCC structure is also acceptable. From this it follows that the syllable structure of Dutch is more marked than the syllable structure of Spanish (also see Levelt & Van de Vijver, 2004; Ordin & Polyanskaya, 2015). Third, Dutch is also more marked than Spanish concerning lengthening effects, which are known to correlate with rhythm perception, as accentual and final lengthening are employed more extensively in Dutch than in Spanish. Lengthening implies a baseline that is not lengthened, but not vice versa: Not only does lengthening require more physiological effort than not lengthening (Ten Bosch, 1991), but the majority of all syllables in speech are not lengthened, whereas only a subset of the syllables is lengthened. This implies that the absence of lengthening is indeed the “norm” (less marked), while lengthening is the “exception” (more marked). In sum, in all areas discussed, Dutch is arguably more marked than Spanish, which, according to the MDH, should make acquisition of these properties more difficult in Dutch than in Spanish. We therefore predict the following:
(2) Dutch learners of Spanish (DLS) are more successful at approaching their target rhythm than Spanish learners of Dutch (SLD).
Recently, the MDH has been used in two studies on L2 prosody acquisition: Rasier (2006), who applied it to the production of (de)accentuation patterns to signal focus in L2 French by L1 speakers of Dutch and L2 Dutch by L1 speakers of French, and Ordin and Polyanskaya (2015), who employed it in their analysis of L2 rhythm acquisition by German and French learners of English. Both reported that learners with an L1 background that is less marked than the target language (the L1 speakers of French who were learning Dutch, and the French learners of English, respectively) were less successful at attaining the L2 target than learners with an L1 background that is as marked as or more marked than the target L2 (the Dutch learners of French and the German learners of English, respectively). In the next section, the collection and analysis of speech data by DLS, SLD, and L1 speakers of Spanish and Dutch is described, followed by a comparison of the two learner groups, by means of accentual and final lengthening measures.
METHOD
PARTICIPANTS
Seventy adults participated in our experiment: five L1 speakers of Dutch and five L1 speakers of Spanish, whose data serve as a baseline (see Footnote 4) to which the data of 30 DLS and 30 SLD are compared. All participants were raised in a monolingual environment and participated voluntarily (Table S1 in the Online Supplementary Materials contains those details about the speaker sample that are relevant to the experiment). The DLS were students of the Spanish program at the University of Groningen or Fuentes Academia de Español. The most proficient DLS were teachers at the Spanish Department of the Radboud University in Nijmegen or the University of Groningen. The SLD were students at the Escuela Oficial de Idiomas in Madrid or Barcelona, and the most proficient SLD were generally teachers at the Escuela Oficial de Idiomas. The L2 learners were subdivided into different proficiency groups, based on the proficiency levels of the Common European Framework of Reference for Languages, which distinguishes between six different proficiency levels ranging from A1 and A2 for beginners, to B1 and B2 for intermediate learners, and C1 and C2 for advanced speakers of an L2 (Council of Europe, 2001). Five speakers were recorded per proficiency level. The institutions already used these proficiency levels, which facilitated the process of determining the proficiency of our participants. Their level in this study corresponded to the level of the last course they had successfully completed. Participants were asked to self-evaluate their reading, writing, speaking, and listening proficiency, and these self-evaluations were corroborated by the first author with their teachers. In general, these were congruent with students’ overall level, with the productive skills being slightly more challenging than the receptive skills.
Because French is an obligatory subject in Dutch high schools, as is English in Spanish high schools, all participants had some knowledge of an additional West Germanic or Romance language. However, none of them spoke that language at a proficiency level higher than their target L2 proficiency level. We therefore assume that L2 learners were not influenced in their target rhythm production by other foreign languages from the same language family.
MATERIALS
Following Prieto et al. (2012), the stimuli consisted of 30 sentences per language: 5 sentences with predominantly open syllables (CV), 5 with mostly closed syllables (CVC), and 20 that reflected typical syllable structures in either Dutch or Spanish (Mixed). Consequently, syllable structure was controlled in one third of the stimuli. The Spanish CV and CVC sentences were taken from Prieto et al. (2012). The Dutch CV and CVC sentences were created by the authors to match the Spanish ones. The Mixed sentences were taken or adapted from Nazzi et al. (1998) and Prieto et al. (2012). The percentage of open syllables was 81.6% in the Dutch CV sentences and 91.9% in the Spanish CV sentences. In the Dutch CVC sentences 78.3% of the syllables were closed, while in the Spanish CVC sentences 59.0% were closed. In the Mixed sentences, 47.7% of the Dutch syllables had an open structure, in contrast to the Spanish Mixed sentences with 69.1% of all syllables open. Thus, the manipulation of syllable structure was realized as intended (see Footnote 5). All sentences were matched for number of syllables (range: 12–19 syllables), although this may vary somewhat across individuals as a result of participant-specific pronunciation preferences. Sentences were also matched as closely as possible for orthographic words (Spanish M = 9.03, Dutch M = 9.63 per sentence) and prosodic words (Spanish M = 4.87, Dutch M = 5.26 per sentence). Infrequent words and complex sentence constructions were avoided where possible, to facilitate the task for L2 learners. Example (3) shows a stimulus sentence for each of the categories in Dutch and Spanish. The complete stimulus set can be found in the Online Supplementary Materials.
(3a) CV syllable structure (16 syllables):
D: De mama van Susana is een gezellige lerares
S: La madre de Susana es una buena profesora
(3b) CVC syllable structure (15 syllables):
D: De wedstrijd van de voetbalclub was niet in het sportcomplex
S: El mitin del club de tenis no fue en el parking del club
(3c) Mixed syllable structure (16 syllables):
D: De dader werd helaas bij gebrek aan bewijs vrijgesproken
S: Reportan inundaciones graves en la primavera
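The open-syllable percentages reported for the stimuli can be thought of as a simple proportion over syllable skeletons. The following is a minimal sketch in Python, assuming that each syllable has already been reduced to a CV skeleton string and that a syllable counts as open when its skeleton ends in a vowel. The function names and the example skeletons are hypothetical and are not taken from the actual stimulus set.

```python
def is_open(skeleton: str) -> bool:
    """A syllable is treated as open when its CV skeleton ends in a vowel (V)."""
    return skeleton.endswith("V")


def percent_open(skeletons: list[str]) -> float:
    """Return the percentage of open syllables in a list of CV skeletons."""
    if not skeletons:
        raise ValueError("at least one syllable is required")
    n_open = sum(is_open(s) for s in skeletons)
    return 100.0 * n_open / len(skeletons)


# Hypothetical skeletons for illustration only (not the actual stimuli).
example = ["CV", "CV", "CVC", "CCV", "V", "CVC", "CV", "CVCC"]
print(percent_open(example))
```

The percentage of closed syllables follows as 100 minus this value.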
PROCEDURE
Experimental sessions were performed individually and lasted approximately 10 minutes for the L1 speakers, and 20 minutes for the DLS and SLD, who performed the task in both their L1 and L2. The order in which the L2 learners performed these tasks was randomized across participants. The recordings, made with Praat (Boersma & Weenink, 2015) and the internal microphone of an Apple MacBook Pro, took place in a quiet room. Participants were instructed to read the sentences at a normal, comfortable pace from the laptop screen, and to repeat the sentence if there were hesitations or other irregularities in their speech, continuing to the next sentence at their own convenience. While a higher L2 proficiency level generally entailed fewer repetitions, this method ensured very few disfluencies in the speech by L2 learners of all proficiency levels. The few pauses and/or disfluencies that were unavoidable in the recording of the data were excluded from measurement on a syllable basis and are not included in data analysis. L2 learners could ask for translations of words and sentences, but the experiment leader did not provide phonetic coaching and refrained from pronouncing the target words herself. Participants filled in a questionnaire to verify that they met the requirements of each language group concerning L1/L2, proficiency, experience in countries where the target L2 is spoken, age, and gender, and to ensure that none of the participants had dyslexia or visual problems, which might influence their reading performance.
PROSODIC ANALYSIS
The audio recordings were analyzed prosodically in Praat: Each utterance was segmented into words, syllables, and phonemes. Segmental annotation for all utterances was first performed automatically using Praatalign, version 1.9b (Lubbers & Torreira, 2015). Subsequent segmentation and coding were performed manually by the first author, a trained phonetician who is an L1 speaker of Dutch and proficient in Spanish. Manual correction of the preprocessed speech was done by visual inspection of the speech waveforms and wideband spectrograms following standard criteria (see Peterson & Lehiste, 1960; Prieto et al., 2012; White & Mattys, 2007a).
In two additional tiers, segments were coded as consonants or vowels, and syllable boundaries were placed. For Spanish, these boundaries were positioned following Prieto et al. (2012): Prevocalic glides were coded as part of the preceding consonantal interval, and postvocalic glides as part of the preceding vocalic interval (e.g., the first syllable of buena was treated as CCV; the first syllable of Ceilán as CVV). Furthermore, CV structures were maintained whenever possible and a CV resyllabification process occurred across word boundaries. Following Schiller, Meyer, Baayen, and Levelt (1996), resyllabification also took place in the Dutch utterances, after taking into account phonological rules such as final devoicing (“aard” /ard/ becomes [art]), degemination (“komen naar” /kɔmən nar/ becomes [kɔmənar]), as well as final –n deletion after a schwa, and progressive voice assimilation (“uitvallen” /œytvɑlən/ becomes [œytfɑlə]).
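The glide-coding convention for Spanish can be illustrated with a short sketch. It assumes, purely for illustration, that each syllable is given as a string of segment-type tags in which G marks a glide, and that every syllable contains one nuclear vowel tagged V. The tag inventory and the function name are hypothetical and not part of the annotation procedure itself.

```python
def cv_skeleton(segment_types: str) -> str:
    """Map segment-type tags (C, V, G for glide) to a CV skeleton.

    Glides before the nuclear vowel join the consonantal interval (C);
    glides after it join the vocalic interval (V).
    """
    nucleus = segment_types.find("V")
    if nucleus == -1:
        raise ValueError("syllable has no nuclear vowel")
    skeleton = []
    for index, tag in enumerate(segment_types):
        if tag == "G":
            skeleton.append("C" if index < nucleus else "V")
        else:
            skeleton.append(tag)
    return "".join(skeleton)


print(cv_skeleton("CGV"))  # first syllable of "buena"  -> CCV
print(cv_skeleton("CVG"))  # first syllable of "Ceilán" -> CVV
```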
To analyze final lengthening effects, each syllable was marked for its phrasal position as either non-final, intermediate phrase (ip) final, or intonational phrase (IP) final following the procedure described in Prieto et al. (2012). The criterion for an IP break was a pause of at least 200 milliseconds, while a break of less than 200 milliseconds and a continuation rise characterized an ip boundary. The non-final syllables were then taken as a baseline condition to which the length of ip-final and IP-final syllables was compared. Prosodic prominence was also annotated, distinguishing between unstressed and unaccented, stressed and accented, and stressed and nuclear accented syllables. In this case, unstressed and unaccented syllables correspond to the baseline to which stressed and accented, and stressed and nuclear accented syllables were compared (see Footnote 6). Figure 1 illustrates the orthographic, segmental, and prosodic transcription of the Spanish utterance La madre de Susana es una buena profesora (“Susana’s mother is a good teacher”) produced by an L1 speaker of Spanish. The first tier contains the orthographic transcription, the second one the phonetic segmentation, and the third the consonant/vowel coding. In the fourth tier, syllabic segmentation and syllable structure are depicted, and in the two final tiers prominence and phrasal position are coded (see Footnote 7). In total, 2,100 utterances were collected (5 speakers × 30 utterances × 14 language groups), resulting in 35,808 analyzed syllables and 48,068 analyzed segments.
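The boundary criteria and the baseline comparison described above can be sketched as follows. This is an illustration under stated assumptions rather than the annotation procedure actually used: it assumes that the pause duration after a syllable and the presence of a continuation rise are already known, and it treats a continuation rise with any pause shorter than 200 ms (including no pause) as an ip boundary. The ratio to the non-final baseline is a descriptive shortcut for illustration only; the analyses in this study compare durations statistically instead. All names and durations are hypothetical.

```python
from statistics import mean

IP_PAUSE_THRESHOLD_S = 0.200  # a pause of at least 200 ms marks an IP break


def phrasal_position(pause_s: float, continuation_rise: bool) -> str:
    """Classify a syllable as non-final, ip-final, or IP-final."""
    if pause_s >= IP_PAUSE_THRESHOLD_S:
        return "IP-final"
    if continuation_rise:  # assumption: a shorter break plus a rise marks ip
        return "ip-final"
    return "non-final"


def lengthening_ratios(syllables: list[tuple[float, str]]) -> dict[str, float]:
    """Mean duration per final position relative to the non-final baseline."""
    by_position: dict[str, list[float]] = {}
    for duration_s, position in syllables:
        by_position.setdefault(position, []).append(duration_s)
    baseline = mean(by_position["non-final"])
    return {
        position: mean(durations) / baseline
        for position, durations in by_position.items()
        if position != "non-final"
    }


# Hypothetical durations in seconds, for illustration only.
data = [
    (0.15, phrasal_position(0.00, False)),
    (0.17, phrasal_position(0.05, False)),
    (0.24, phrasal_position(0.10, True)),
    (0.20, phrasal_position(0.25, False)),
    (0.28, phrasal_position(0.40, True)),
]
print(lengthening_ratios(data))
```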
Intertranscriber reliability of the prosodic coding was tested with 10% (105 Dutch and 105 Spanish utterances) of our data. These utterances were randomly selected by the first author, who ensured that they equally represented all language groups, speakers, and phonotactic conditions. After discussing several examples with the first author, two transcribers (one L1 speaker of Dutch and one L1 speaker of Spanish) independently labeled the utterances for phrasal position and phrasal prominence using the guidelines provided in this section. A comparison of the prosodic transcription across the two transcribers per language revealed a high interrater reliability both in phrasal prominence and phrasal position labeling. Agreement on the choice of phrasing level was high: 99.1% consistency for Dutch (κ = .974) and 93.4% for Spanish (κ = .785). Similarly, agreement on the choice of phrasal prominence levels was 97.8% for Dutch (κ = .956) and 88.5% for Spanish (κ = .754). This is comparable to interrater reliability scores in similar studies using prosodic labeling (Prieto et al., 2012), indicating that both prosodic features were labeled reliably (Landis & Koch, 1977).
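To make the relation between percentage agreement and κ concrete, the sketch below computes Cohen's kappa for two label sequences, κ = (p_o − p_e) / (1 − p_e), where p_o is the observed agreement and p_e the agreement expected by chance from each transcriber's label frequencies. It is assumed here that the reported κ values are Cohen's kappa over per-syllable labels; the label sequences in the example are hypothetical and do not reproduce the study's data.

```python
from collections import Counter


def cohens_kappa(labels_a: list[str], labels_b: list[str]) -> float:
    """Cohen's kappa for two equally long sequences of categorical labels."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label sequences must be non-empty and equally long")
    n = len(labels_a)
    # Observed agreement: proportion of items given the same label.
    p_observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of the two transcribers' label proportions.
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    categories = counts_a.keys() | counts_b.keys()
    p_chance = sum(counts_a[c] * counts_b[c] for c in categories) / (n * n)
    if p_chance == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_observed - p_chance) / (1.0 - p_chance)


# Hypothetical phrasal-position labels from two transcribers.
rater_1 = ["N", "N", "N", "N", "N", "N", "ip", "ip", "IP", "IP"]
rater_2 = ["N", "N", "N", "N", "N", "ip", "ip", "ip", "IP", "N"]
print(round(cohens_kappa(rater_1, rater_2), 3))
```

In this toy example the raw agreement is 80%, but κ is lower because most syllables are non-final, so a good deal of agreement is expected by chance. The same mechanism explains why a high percentage agreement can correspond to a more moderate κ.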
RESULTS
In what follows, we first compare syllable duration data by L1 Dutch and L1 Spanish as a function of prosodic prominence and phrasal position to form a baseline against which we subsequently compare the DLS and SLD. All analyses are performed using a Generalized Linear Mixed Model (GLMM). Specific response variables and fixed factors are described in the relevant sections, but for all analyses subjects and items were included as random factors, including random intercepts and random slopes for fixed effects and their interaction (Barr, Levy, Scheepers, & Tily, 2013). Pairwise comparisons that explain main effects and interactions were Bonferroni adjusted.
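As a reminder of what the Bonferroni adjustment does, the sketch below multiplies each raw p value by the number of comparisons in the family and caps the result at 1. How the comparisons were grouped into families in the present analyses is not specified here, so the family of three comparisons and the p values in the example are hypothetical.

```python
def bonferroni_adjust(p_values: list[float]) -> list[float]:
    """Bonferroni-adjusted p values: p * m, capped at 1.0, for m comparisons."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Hypothetical raw p values for three pairwise comparisons.
raw = [0.001, 0.02, 0.4]
for p_raw, p_adj in zip(raw, bonferroni_adjust(raw)):
    print(f"raw p = {p_raw:.3f}, adjusted p = {p_adj:.3f}")
```

Equivalently, each raw p value can be compared against α divided by the number of comparisons.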
L1 SPANISH VERSUS L1 DUTCH
A GLMM analysis was performed with syllable duration in seconds as the response variable, and Language Group (two levels: L1 Dutch, L1 Spanish), Syllable Structure (three levels: CV, CVC, Mixed), Phrasal Prominence (three levels: unstressed and unaccented, stressed and accented, stressed and nuclear accented), and Phrasal Position (three levels: non-final, ip-final, IP-final) as fixed factors. The analysis reveals significant main effects for all fixed factors and significant interactions for all relevant combinations, except the interaction between Language Group and Phrasal Position (see Appendix Table A1 for all potential main effects and interactions).
Pairwise comparisons between the three Phrasal Prominence conditions within each L1 group reveal that in both L1 Spanish and L1 Dutch increasing prominence of the syllable entails longer syllable durations. As shown in Figure 2, in L1 Spanish, all Phrasal Prominence levels differ significantly from one another (p < .001), whereas in L1 Dutch, both stressed and accented syllables, and stressed and nuclear accented syllables are significantly longer than unstressed and unaccented syllables (p < .001), but the syllable durations of stressed and accented syllables do not differ significantly from those of nuclear accented syllables (p = .099). Pairwise comparisons between Language Groups within Phrasal Prominence levels reveal that L1 Dutch and L1 Spanish have similar default syllable lengths for unstressed and unaccented syllables (p = .652), but they differ significantly from each other for the other two Phrasal Prominence levels (p < .001) as syllables are lengthened more extensively in L1 Dutch than in L1 Spanish. This confirms prior research on the degree of accentual lengthening used in both languages (Cambier-Langeveld & Turk, 1999; Cambier-Langeveld et al., 1997; Delattre, 1966; Prieto et al., 2012).
Controlling for Syllable Structure by examining the CV sentences only does not substantially change this pattern (see Appendix Table A2 for mean syllable durations per Phrasal Prominence condition and Language Group for both CV and all sentences). The only difference is that the values are lower for both Language Groups in the CV condition than in the complete dataset, which can be explained by the fact that syllables in CVC and Mixed sentences are usually longer because of their more complex syllable structure. As shown in Figure 3, pairwise comparisons between Language Groups within each Phrasal Prominence level for CV sentences only reveal a pattern similar to the one found for all sentences: L1 Dutch and L1 Spanish have comparable syllable lengths for unstressed and unaccented syllables (p = .205), but they differ significantly from each other for the other two Phrasal Prominence levels (stressed and accented syllables: p = .003; stressed and nuclear accented syllables: p = .009).
Regarding final lengthening, pairwise comparisons between the three Phrasal Position conditions within L1 groups show that for both Language Groups syllable durations increase significantly relative to the non-final baseline when a syllable precedes an ip or IP boundary (see Figure 4; p < .001 for all comparisons). Furthermore, in both Language Groups the ip-final and IP-final syllables do not differ significantly from each other (Dutch: p = .863, Spanish: p = .374). Pairwise comparisons between Language Groups within each Phrasal Position condition show that L1 Dutch and L1 Spanish differ significantly from each other for all Phrasal Position conditions (non-final and IP-final syllables: p < .001, ip-final syllables: p = .003). This could again be because syllables are longer in L1 Dutch in general, even in non-final position, due to its more complex syllable structure.
Controlling for Syllable Structure by examining pairwise comparisons between Phrasal Position conditions within both Language Groups for the CV sentences only reveals a comparable pattern: In both L1s, non-final syllables are significantly shorter than ip-final and IP-final syllables (p < .001 for both), while there is no significant difference between ip-final and IP-final syllables (Dutch: p = .908, Spanish: p = .434). Pairwise comparisons show that speakers of L1 Dutch and L1 Spanish still differ significantly from each other in the non-final and IP-final conditions (non-final syllables: p = .007, IP-final syllables: p = .021), but the difference between the two L1s is not significant in the ip-final condition (p = .313) (see Figure 5). Appendix Table A3 contains the mean syllable durations per Phrasal Position and Language Group for all sentences and CV sentences only.
The significant interaction effect between Phrasal Position and Phrasal Prominence on syllable durations was further explored by examining the mean syllable durations per Language Group for all Phrasal Position and Phrasal Prominence combinations. Table 1 shows that both factors interact systematically: Within each Phrasal Position condition accentual lengthening effects increase as syllables are more prominent in the sentences, while increasing syllable durations are also observed between Phrasal Position conditions.
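The cell-mean inspection described above can be sketched in a few lines. This is a minimal illustration, assuming a flat list of measurements; the records, the shortened condition labels, and the function name `cell_means` are hypothetical and are not the study's data or code.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical toy records: (phrasal_position, phrasal_prominence, duration_in_seconds).
# These numbers are invented for illustration and are NOT the study's data.
records = [
    ("non-final", "unstressed", 0.12),
    ("non-final", "unstressed", 0.14),
    ("non-final", "nuclear", 0.20),
    ("IP-final", "unstressed", 0.18),
    ("IP-final", "nuclear", 0.26),
    ("IP-final", "nuclear", 0.30),
]


def cell_means(rows):
    """Average duration for every (position, prominence) combination present."""
    cells = defaultdict(list)
    for position, prominence, duration in rows:
        cells[(position, prominence)].append(duration)
    return {cell: mean(values) for cell, values in cells.items()}


means = cell_means(records)
for (position, prominence), value in sorted(means.items()):
    print(f"{position:10s} {prominence:11s} {value:.3f}")
```

Reading such a table row by row (within a position) shows accentual lengthening, while reading it column by column (within a prominence level) shows final lengthening.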
L1 SPANISH VERSUS DLS
To compare the DLS to their target L1 group, a GLMM analysis was performed with syllable duration as the response variable, and Language Group (seven levels: L1 Spanish, DLS_A1, DLS_A2, DLS_B1, DLS_B2, DLS_C1, and DLS_C2), Syllable Structure (see analysis L1 speakers), Phrasal Prominence (see analysis L1 speakers), and Phrasal Position (see analysis L1 speakers) as fixed factors. The analysis reveals significant main effects for all fixed factors and significant interactions for all relevant combinations, except for the interaction between Language Group, Syllable Structure, and Phrasal Position (see Appendix Table A4 for all potential main effects and interactions).
Pairwise comparisons between all Language Groups overall reveal that the DLS gradually approach target syllable durations as their proficiency increases. The most proficient group, DLS_C2, no longer differs significantly from the target L1 Spanish (p = .735), while all other DLS groups still do (DLS_C1 and DLS_B2: p = .001, DLS_B1, DLS_A2, and DLS_A1: p < .001). This implies that while the DLS_C2 learners have attained a nativelike level in their L2, learners of all other levels still differ significantly from their target. Controlling for syllable structure by comparing the DLS to the L1 Spanish within the CV condition reveals that the effect of Language Group is partially dependent on syllable structure: Within the CV condition the L1 Spanish values are not only comparable to those of the DLS_C2 (p = 1.000), but also to those of the DLS_C1 (p = .116) and DLS_B2 (p = .064). To examine whether the DLS approach L1 values similarly for accentual and final lengthening, pairwise comparisons between Language Groups within prominence and finality conditions were performed.
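The decision pattern in the comparisons above can be made explicit with a short sketch. It uses only the p-values reported in this paragraph; the threshold of .05, the storage of "p < .001" as the bound 0.001, and the function name `comparable_groups` are assumptions made for illustration.

```python
# Reported p-values for each DLS group compared with the target L1 Spanish.
# Values reported as "p < .001" are stored as the bound 0.001 (an assumption made
# only so that they can be compared numerically; exact values are not reported).
ALPHA = 0.05  # assumed conventional threshold; not stated explicitly in the text

overall_p = {
    "DLS_A1": 0.001,  # reported as p < .001
    "DLS_A2": 0.001,  # reported as p < .001
    "DLS_B1": 0.001,  # reported as p < .001
    "DLS_B2": 0.001,  # reported as p = .001
    "DLS_C1": 0.001,  # reported as p = .001
    "DLS_C2": 0.735,
}

# Only these three CV-condition values are reported in the text.
cv_only_p = {
    "DLS_B2": 0.064,
    "DLS_C1": 0.116,
    "DLS_C2": 1.000,
}


def comparable_groups(p_values, alpha=ALPHA):
    """Return the groups whose difference from the target is not significant."""
    return sorted(group for group, p in p_values.items() if p >= alpha)


print(comparable_groups(overall_p))  # ['DLS_C2']
print(comparable_groups(cv_only_p))  # ['DLS_B2', 'DLS_C1', 'DLS_C2']
```

The contrast between the two printed lists mirrors the finding that more proficiency levels are comparable to the target once syllable structure is restricted to CV.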
Regarding Phrasal Prominence, the results show that within all Phrasal Prominence conditions the DLS_C2 are not significantly different from the L1 Spanish. Unlike in the L1 data, this effect appears sensitive to the syllable structure of the sentence in L2 speech, as the same comparisons within the CV condition reveal that the three highest proficiency levels are comparable to the L1 target (see Table 2 and Figures 6 and 7). Examination of the syllable durations of the different Phrasal Prominence conditions within all Language Groups reveals that all DLS groups show a pattern similar to that of the L1 Spanish, in which syllable durations are longer as syllables are more prominent within an utterance (see Appendix Table A5).
Concerning final lengthening, pairwise comparisons show that for non-final syllables the DLS_C2 and DLS_C1 are not significantly different from the L1 Spanish, and for ip-final and IP-final syllables the DLS_C2 are not significantly different from the target L1 Spanish (see Table 3). This effect is once again influenced by syllable structure, as the same comparisons within the CV condition reveal that the three highest proficiency levels are comparable to the L1 target.
Examination of syllable durations for the different Phrasal Position conditions within each Language Group reveals that the three most proficient DLS groups show a pattern similar to that of the L1 Spanish, in which syllable durations are longer when syllables precede a prosodic boundary. Conversely, the values of the three lowest proficiency groups coincide more with the L1 Dutch, corroborating the presence of transfer effects in L2 rhythm acquisition (see Figures 8 and 9, and Appendix Table A6).
Finally, the joint effect of Phrasal Position and Phrasal Prominence on syllable durations is examined by inspecting the mean syllable durations for all Phrasal Prominence conditions within the separate Phrasal Position conditions. This reveals that both factors interact systematically within each Language Group: Within each Phrasal Position condition accentual lengthening increases as syllables are more prominent in the utterance, while increasing syllable durations are also observed between Phrasal Position conditions (see Appendix Table A7).
L1 DUTCH VERSUS SLD
To compare the SLD to the L1 speakers of Dutch, a GLMM analysis was performed with syllable duration as the response variable, and Language Group (seven levels: L1 Dutch, SLD_A1, SLD_A2, SLD_B1, SLD_B2, SLD_C1, and SLD_C2), Syllable Structure (see analysis L1 speakers), Phrasal Prominence (see analysis L1 speakers), and Phrasal Position (see analysis L1 speakers) as fixed factors. The analysis reveals significant main effects for all fixed factors and significant interactions for all relevant combinations, except for the interaction between Language Group and Phrasal Prominence (see Appendix Table A8 for all main effects and interactions). Pairwise comparisons between Language Groups overall reveal that although the SLD progressively approach target syllable durations as their proficiency increases, all the SLD groups still differ significantly from the L1 Dutch (p-values from p < .001 to p = .028). Crucially, this appears to be entirely attributable to the syllable structure of the utterances: When the SLD are compared to the L1 Dutch within the CV condition, the L1 Dutch values do not differ significantly from the SLD values at any proficiency level (p-values from p = .089 to p = 1.000).
Turning to Phrasal Prominence first, the results show that all the SLD groups differ significantly from the target L1 Dutch for both unstressed and unaccented syllables and stressed and nuclear accented syllables (see Table 4). In the stressed and accented condition, only the SLD_A1 group differs significantly from the L1 Dutch. However, this effect is highly susceptible to the syllable structure of the utterance, as making the same comparisons within the CV condition reveals that all the SLD groups for all Phrasal Prominence conditions are comparable to the L1 target.
Examination of the syllable durations of the different Phrasal Prominence conditions within all Language Groups reveals that both SLD and L1 Dutch show a similar pattern in which syllable durations are longer as syllables are more prominent within an utterance (see Figures 10 and 11, and Appendix Table A9).
Regarding final lengthening, pairwise comparisons between the different Language Groups within the three Phrasal Position conditions show that for non-final syllables all SLD groups differ significantly from the L1 Dutch (see Table 5). However, for ip-final syllables the SLD_B1, SLD_B2, and SLD_C2 groups are comparable to the L1 Dutch, and for the IP-final syllables this is the case for the SLD_B2 and SLD_C1 groups. This effect is again largely due to syllable structure, as identical comparisons in the CV condition reveal that almost all SLD groups are no longer significantly different from the L1 Dutch.
Examination of the syllable durations of the different Phrasal Position conditions within each Language Group reveals that all SLD groups show a pattern similar to that of the L1 Dutch, in which syllable durations are longer when syllables precede a boundary, either within an utterance or at its end (see Figures 12 and 13, and Appendix Table A10).
Examining the joint effect of Phrasal Position and Phrasal Prominence on syllable durations by inspecting the mean syllable durations for all Phrasal Prominence conditions within the separate Phrasal Position conditions reveals that both factors interact systematically within each Language Group: Within each Phrasal Position condition accentual lengthening increases as syllables are more prominent in the utterance, while increasing syllable durations are also shown between Phrasal Position conditions (see Appendix Table A11). Accentual lengthening and final lengthening appear to contribute equally to the differences found between the L1 Dutch and the different SLD groups, especially when controlling for syllable structure. When only the CV sentences are analyzed, all SLD groups appear to be fully on target in their syllable duration production; however, when syllable structure is diversified (and thus made more typical of L1 Dutch), it becomes considerably more difficult to discern a logical pattern in the SLD productions.
DISCUSSION AND CONCLUSION
The current study investigated whether L2 learning direction affects the successful attainment of speech rhythm by DLS and SLD. Based on the MDH, we hypothesized that DLS would be more successful at approaching their target than SLD because rhythm as a whole, and its correlates syllable structure and lengthening effects, is more marked in Dutch than in Spanish. Overall, our results indeed show that learning direction influences L2 rhythm acquisition: Our analyses reveal a different development for DLS than for SLD. Comparing the two groups, we can conclude that DLS show a more systematic development toward their target, and more successful attainment of an overall rhythm pattern that coincides with the one produced by L1 Spanish speakers. Thus, our results support our hypothesis and corroborate prior work (Ordin & Polyanskaya, 2015; Rasier, 2006).
However, our results do not allow for a complete disentanglement between speech rhythm and syllable structure complexity: Our lengthening analyses revealed different acquisition processes for DLS and SLD. The DLS systematically approach L1 values in all lengthening conditions until attaining targetlike values, generally at the highest proficiency level for all sentences, and at an intermediate to advanced level for the CV sentences only. Conversely, SLD of all proficiency levels are completely on target in the CV sentences only but show no systematic attainment in the analyses including all sentences. Therefore, it seems unlikely that the nonsignificant difference between the L1 Dutch and the least proficient SLD in the CV sentences is completely due to a perfectly produced speech rhythm by the SLD. Not only do these results show that learning direction influences L2 development, they also suggest that rhythm acquisition by SLD is substantially affected by their difficulties in producing utterances with more complex and/or closed syllable structures: When syllables are more complex and predominantly closed, the SLD are unable to reach target syllable durations, yet when syllables are predominantly open and have a simple CV structure, target patterns appear attainable. In this sense, L2 rhythm acquisition resembles L1 rhythm development in which physical output constraints related to consonant (cluster) production also affect targetlike rhythm production (Ordin & Polyanskaya, 2014; Payne, Post, Prieto, Vanrell, & Astruc, 2012).
Similar to Li and Post (2014), our study shows that L2 rhythm acquisition (like L1 rhythm acquisition, see Post & Payne, 2018) is a multisystemic process that requires the simultaneous attainment of several language-specific features, both phonotactic and prosodic. Crucially, depending on the learning direction, some of these features may be more challenging than others. Gradient properties, such as accentual and final lengthening, seem challenging for both DLS and SLD. Yet other, more categorical, characteristics (e.g., syllable structure constraints) appear substantially more difficult to acquire for SLD than for DLS. In addition, between-speaker variability may also influence the acquisition process. While we matched our participants to the best of our ability based on their language experience and proficiency and included subject as a random factor in our statistical analyses, individual differences in the L2 acquisition process tend to be substantial (Ellis, 1994), and some factors, such as motivation and language aptitude, could not be considered. Studies on the multisystemic nature of L2 prosody acquisition in particular would benefit from careful participant selection, as variation across individual speakers might occur in all "systems" and thus be magnified even further. Our study thus reinforces the need for L2 acquisition theories and models that allow for predictions based on the multisystemic nature of L2 prosody acquisition and that accommodate the inclusion of learning direction, as well as other speaker-based characteristics, as a relevant factor.
Moreover, our results are relevant pedagogically, as they demonstrate that adequate segment production is a prerequisite for successful rhythm attainment. The acquisition of suprasegmentals is often overlooked in educational programs because they are difficult to manipulate consciously and highly context dependent. Conversely, the correct pronunciation of segments usually receives considerable attention. On its own, this might not be a bad practice, as the current research suggests that training in this area may also lead to more successful rhythm production. Interestingly, recent work by Polyanskaya, Ordin, and Busà (2017) suggests that the relative contribution of segmental characteristics and timing patterns to the assessment of accentedness differs as a function of the proficiency level of the L2 learner. In other words, while the incorrect pronunciation of segmental properties might contribute more to accentedness in speech produced by less proficient L2 learners, deviance in speech rate and rhythmic patterns could become more salient as L2 learners become more proficient. Future research might therefore be dedicated to production studies investigating this further, as well as to perception studies that may confirm the effect of deviance in different phonetic areas, and in speech by learners of different proficiency levels, on judgments by L1 speakers.
Future research could also address the effect of segmental pronunciation training on rhythm acquisition in different developmental stages, in addition to the effect of learning direction for other prosodic features, like lexical stress. Furthermore, because rhythm is related to several language-specific features, the current study could be extended by similar analyses for different L1-L2 combinations. Aside from follow-up studies on L2 production, the effect of (in)correct L2 rhythm production, perhaps in combination with other prosodic features, on L1 perception might be investigated.
SUPPLEMENTARY MATERIAL
To view supplementary material for this article, please visit https://doi.org/10.1017/S0272263118000062