1. Introduction
Heritage speakers are simultaneous or early sequential bilinguals who acquire a family language that differs from the societal language (Benmamoun, Montrul & Polinsky, 2013). Heritage speakers receive heritage language input mainly in colloquial registers and are rarely exposed to formal varieties. During the school years, heritage speakers come into frequent contact with the societal language and potentially shift their dominance from the heritage language to the societal language (Polinsky & Scontras, 2020; Stevens, 1992). Given these circumstances, heritage speakers are not a homogeneous group. Instead, they display rich heterogeneity in terms of their linguistic proficiency, use, and attitudes toward the heritage language (Montrul, 2008; Valdés, 2014), and demonstrate divergence from monolingual speakers to varying degrees. Potential causes of heritage speakers’ divergent grammars are an insufficient amount of heritage language input (Putnam, 2019; Putnam & Sánchez, 2013), exposure to input lacking target linguistic properties (Pires & Rothman, 2009), exposure to linguistic varieties other than the parents’ varieties, or lack of access to monolingual forms (Lowther Pereira, 2015).
A great many heritage language studies have been conducted on morphosyntactic properties in which divergence from monolingual norms has been found (Montrul & Bowles, 2009; Montrul & Sánchez-Walker, 2013; Polinsky, 2008, among others). By comparison, less research has been done in phonological domains, most likely because positive effects of early exposure have been attested on heritage language pronunciation (Au, Knightly, Jun & Oh, 2002; Knightly, Jun, Oh & Au, 2003). Nevertheless, recent studies on heritage language phonology discuss the existence of a “heritage accent” (Au, Oh, Knightly, Jun & Romo, 2008; Lloyd-Smith, Einfeldt & Kupisch, 2020; Stangen, Kupisch, Proietti Ergün & Zielke, 2015) and have found deviations from monolingual norms at both segmental (Amengual, 2012, 2016; Elias, McKinnon & Milla-Muñoz, 2017; Godson, 2004; Kissling, 2018; Ronquest, 2013; Willis, 2005) and suprasegmental levels (Chang, Yao, Haynes & Rhodes, 2011; Colantoni, Cuza & Mazzaro, 2016; Henriksen, 2016; Kim, 2019; Kim, 2020). For instance, Au et al. (2008) found that Spanish heritage speakers, regardless of whether they regularly used Spanish during childhood, sounded more native-like than late L2 learners, but when compared with non-heritage native speakers, their speech was perceived to have a stronger foreign accent.
That is, although early exposure to the heritage language has positive effects on heritage language pronunciation, it does not guarantee accent-free speech (Lloyd-Smith et al., 2020).
In order to move the field of heritage language acquisition forward, recent scholarship (Montrul, 2018; Polinsky, 2018) has urged researchers to examine the stages of heritage language development over the lifespan. This type of research would provide a better understanding of heritage speakers’ divergent grammars. In this study, we adopt a developmental approach to examine heritage language phonology.
2. Understanding Heritage Phonological Grammars
Heritage speakers demonstrate stability in some aspects of heritage language phonology, such as phonemic contrasts (Chang, Haynes, Yao & Rhodes, 2009; Chang et al., 2011; Einfeldt, van de Weijer & Kupisch, 2019; Lein, Kupisch & van de Weijer, 2016). For instance, Einfeldt et al. (2019) found that heritage speakers of Italian maintain the contrast between singleton and geminate consonants. However, in other areas they show variability that differs from monolingual norms (Alvord & Rogers, 2014; Amengual, 2016; Colantoni et al., 2016; Godson, 2004; Henriksen, 2015; Robles-Puente, 2014; Ronquest, 2012, 2013). Studies have found that the heritage language vowel space shows either assimilation to or dissimilation from that of the majority language (Cummings Ruiz, 2019; Ronquest, 2012, 2013) and that heritage language stop productions are affected by factors such as cognate status (Amengual, 2012), code-switching (Łyskawa, Maddeaux, Melara & Nagy, 2016), and speaker generation (Mayr & Siddika, 2018; Nodari, Celata & Nagy, 2019). Some phonological properties may also show both variability and stability.
For example, Chang and Yao (2016) found that, while Chinese heritage speakers produced some aspects of the Chinese tones (T0-T4) (e.g., turning point of T3, T3 reduction in non-phrase-final contexts) similarly to Chinese long-term residents in the US, in other aspects (e.g., T3 reduction in phrase-final multisyllabic contexts, pitch contour variability in isolation forms) they patterned distinctly from the baseline.
Divergence from monolingual grammars is often interpreted as incomplete acquisition or acquisition without mastery (Montrul, 2002; Montrul & Bowles, 2009). While incomplete acquisition is a possible outcome in heritage grammars, the term has raised considerable controversy in the literature (see Kupisch & Rothman, 2018 and Domínguez, Hicks & Slabakova, 2019 and commentaries). In an attempt to redefine the construct, Pires and Rothman (2009) proposed a distinction between “true incomplete acquisition” and “missing-input competence divergence”. The former arises when the heritage language input presents the target linguistic properties, and the latter appears when the input lacks them. For instance, Mayr and Siddika (2018) compared the production of Sylheti stops across three generations of Bangladeshi immigrants in the United Kingdom and found that second-generation children produced the Sylheti voiced coronal /ɖ/ and velar stops /gʱ/ with longer voice onset time (VOT) than their first-generation mothers, but in a more target-like manner than age-matched third-generation children. Aside from the amount of input in Sylheti, these two groups of children differ in that the second-generation children received target-like input from their first-generation mothers (i.e., true incomplete acquisition), whereas the third-generation children were exposed to non-target-like stop productions by their second-generation mothers (i.e., missing-input competence divergence).
To understand heritage speakers’ divergent grammars, it is important to establish an appropriate baseline for comparison (Polinsky, 2018). If the goal is to determine whether heritage speakers successfully acquired the language to which they were exposed, it would not be informative to compare heritage speakers only with homeland speakers, since the input that they receive may not be the same as the input of their monolingual peers. Heritage speakers’ input most likely comes from their caregivers, first-generation immigrants whose grammars sometimes show signs of L1 attrition after long-term residence away from the homeland. Additionally, heritage speakers are exposed to homeland varieties through interactions with relatives in the homeland or with recent immigrants from the homeland, as well as to other varieties in the speech community. This raises the question of how researchers can best characterize the sources of input that heritage speakers receive and, equally importantly, how to decide on the baseline for comparison.
In order to account for the development of heritage speakers’ divergent grammars, Polinsky and Scontras (2020) established three scenarios by comparing child heritage speakers (CHS), adult heritage speakers (AHS), and baseline first-generation immigrants (BASE). The first scenario occurs when a given linguistic property is present in the baseline, but it is used differently in both the adult and child heritage speakers (CHS = AHS ≠ BASE) (i.e., incomplete acquisition or divergent attainment). In the second scenario, child heritage speakers pattern like the baseline in their use of the property and adult heritage speakers differ from both groups (BASE = CHS ≠ AHS) (i.e., attrition during childhood). Lastly, adult heritage speakers and the baseline are alike, but child heritage speakers pattern differently from the two groups (CHS ≠ AHS = BASE) (i.e., reanalysis during adulthood).
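The three scenarios can be read as a small decision table over pairwise group comparisons. The sketch below is only an illustration of that reading: the function name, the boolean inputs, and the idea that "patterning alike" has already been decided by some statistical test are assumptions, not part of Polinsky and Scontras's proposal.

```python
def classify_scenario(chs_like_ahs: bool, chs_like_base: bool, ahs_like_base: bool) -> str:
    """Map pairwise similarity judgments onto the three scenarios in the text.

    Each argument is a hypothetical flag stating whether two groups pattern
    alike on a given property (how that is decided is left open here).
    """
    if chs_like_ahs and not chs_like_base and not ahs_like_base:
        # CHS = AHS != BASE
        return "incomplete acquisition or divergent attainment"
    if chs_like_base and not chs_like_ahs and not ahs_like_base:
        # BASE = CHS != AHS
        return "attrition during childhood"
    if ahs_like_base and not chs_like_ahs and not chs_like_base:
        # CHS != AHS = BASE
        return "reanalysis during adulthood"
    # Any other combination falls outside the three scenarios.
    return "no scenario matches"


print(classify_scenario(chs_like_ahs=True, chs_like_base=False, ahs_like_base=False))
```

Combinations such as all three groups patterning alike are deliberately returned as unmatched, since the text describes only the three divergent outcomes.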
A substantial body of research has been conducted on bilingual grammars in early childhood, specifically on bilingual children residing in a country where the home language does not coincide with the societal language (i.e., child heritage speakers) (Fabiano-Smith & Goldstein, 2010; Kehoe & Havy, 2018; Lleó & Cortés, 2013; Lleó, 2018a, 2018b). Based on Paradis and Genesee's (1996) framework of cross-linguistic interaction, many of these studies explained bilingual children's divergence from age-matched monolinguals through acceleration, deceleration, and transfer. Acceleration refers to a faster rate of acquisition in bilinguals than in age-matched monolinguals. Lleó, Kuchenbrandt, Kehoe and Trujillo (2003) found that Spanish–German bilinguals produced syllabic codas at an earlier stage than Spanish monolinguals, possibly due to exposure to an input with more codas. Deceleration refers to a slower rate of acquisition of a given linguistic property compared to age-matched monolinguals. For instance, Fabiano-Smith and Goldstein (2010) found that Spanish-English bilingual children between 3;0 and 4;0 years produced Spanish trills, fricatives, and glides with lower accuracy than Spanish monolinguals. Lastly, transfer is defined as the incorporation of a linguistic property of one language into another. Kehoe, Lleó and Rakow (2004) found that a Spanish–German bilingual child (2;3–2;6) produced Spanish voiceless stops with longer VOTs than those described in monolingual grammars. Kehoe's (2015) review of child bilingual development documents two additional types of cross-linguistic influence: merging and deflecting.
Merging arises when two phonological systems coalesce (see Kehoe & Lleó (2017) for assimilation in stressed-to-unstressed vowel duration ratios). Deflecting occurs when two phonological systems maximize their contrasts (see Yang & Fox (2017) for separation of L1-L2 acoustic vowel space).
Most studies on heritage language phonology examine bilingual language development during early childhood or adult heritage grammars. However, there is a lack of research on what happens when heritage speakers become more exposed to the societal language and experience a shift to that language: that is, during the school-age period. Exploring this gap in the literature, identified as the “missing link” (Montrul, 2018), would shed light on heritage language phonological development.
3. Spanish alveolar trill /r/
The Spanish alveolar trill /r/ is canonically produced with two to three brief contacts between the tongue tip and the alveolar ridge (i.e., phonetic trill [r]) (Hualde, 2014) and surfaces word-initially (e.g., rana ‘frog’), word-internally between vowels (e.g., perro ‘(male) dog’), or after alveolar consonants /n, l, s/ (e.g., alrededor ‘around’). In word-medial intervocalic position, the trill contrasts phonemically with the tap /ɾ/, the other rhotic consonant in Spanish (e.g., perro ‘(male) dog’ vs. pero ‘but’). According to Solé (2002), syllable-initial intervocalic positions provide optimal articulatory conditions for successful trill production, such as constrained positioning, tongue configuration, and aerodynamic requirements for tongue-tip vibration. This may explain the contrastive nature of this position. In other contexts, the two rhotics are mostly found in complementary distribution.
3.1 Trill production by non-heritage native speakers
The production of the phonetic trill [r] requires a complex coordination of articulators and a sufficient amount of oropharyngeal pressure (Lewis, 2004; Solé, 2002). Due to its articulatory complexity, [r] is categorized as one of the latest developing sounds (Acevedo, 1993; Bosch, 1983; Fabiano-Smith & Goldstein, 2010). Typically developing monolingual children often do not have full command of [r] (i.e., 90% accuracy) until the age of 7 (Bosch, 1983) and alternatively substitute [r] with other phones, such as laterals, taps, and /d/, or omit it (Acevedo, 1993; Bosch, 1983; Fabiano-Smith & Goldstein, 2010). Carballo and Mendoza (2000) examined the production of the Spanish trill /r/ by children (3–6.6 years) of different intelligibility levels and compared them to a control group of older children (7.0–9.6 years) who successfully produced the phonetic trill [r]. Overall, the high intelligibility group produced /r/ with longer duration, more apertures and occlusions, and shorter first aperture duration than the low intelligibility group, and showed similar values to those of the control group. Given that the high and low intelligibility groups are of similar ages (3–6.6 years), Carballo and Mendoza (2000) argued that the differences found between these groups may be associated with greater or lesser motor control that some children demonstrate when producing /r/ as they progress through maturation.
Besides being classified as one of the latest acquired sounds, the Spanish trill is phonetically realized in various ways within and across dialects, including as an approximant (Díaz-Campos, 2008), a fricative (Bradley & Willis, 2012; Colantoni, 2006; Lewis, 2004; Willis, 2006), a pre-breathy tap, or a tap followed by frication (Bradley & Willis, 2012; Willis, 2006), and it can be either voiced or voiceless (Lewis, 2004). However, in some dialects /r/ is still realized most frequently with two apico-alveolar constrictions (e.g., see Lastra & Martín Butragueño, 2006 for Mexico City Spanish or Henriksen, 2014 for Peninsular Spanish). With regard to positional differences, Henriksen (2014) found that /r/ in phonemically contrastive position (i.e., word-medial intervocalic) presents more occlusions than /r/ in non-phonemically contrastive position (e.g., word-initial). Along the same lines, Lastra and Martín Butragueño (2006) showed that the production of /r/ with two or more occlusions is more likely to appear in word-medial intervocalic position than in word-initial position.
Regarding long-term Latino immigrants in the US, studies have shown that these speakers produce the Spanish /r/ with fewer occlusions than the canonical trill. Kissling (2018) found that long-term immigrants in Virginia produced /r/ with an average of 1.34 occlusions. Similarly, Henriksen (2015) found that long-term immigrants in Chicago presented a mean value of 1.20 occlusions. Due to the phonetic variation of the Spanish /r/ in non-heritage native grammars, it is important that the baseline group(s) of heritage language studies reflect such variation.
3.2 Trill production by heritage speakers
Various studies on Spanish heritage speakers have found a prevalence of non-target-like realizations and high variability in the trill productions (e.g., fricative, approximant tap, approximant trill) by both children (Fabiano-Smith & Goldstein, 2010; Kehoe & Havy, 2018; Menke, 2018) and adults (Amengual, 2016; Henriksen, 2015; Kissling, 2018). For instance, Kehoe (2018) examined the speech of Spanish–German bilingual children in Germany longitudinally from 1;9 to 3;6 years and compared their rhotic development to that of Spanish and German monolingual children. The Spanish monolinguals showed target-like realizations of the Spanish trill (i.e., phonetic trill [r]) at 3;0 years (60% accuracy), whereas in the bilinguals they occurred at a later stage (3;6 years; 50–60% accuracy) and in only half of the speakers. Similarly, Fabiano-Smith and Goldstein (2010) found that Spanish-English bilingual children in the US (3;6 years) produced the Spanish trill in a target-like manner less frequently (4.1%) than age-matched monolinguals of Mexican Spanish (37.5%). Note that while the monolinguals produced more target-like trills, their accuracy rate was still low, which indicates that Spanish speakers do not completely acquire this sound early on. Further examining heritage speakers’ trill development in childhood, Menke (2018) investigated the Spanish trill produced by school-aged child heritage speakers in the US between Grade 1 (6;8–7;6) and Grade 7 (12;8–13;5). Allophonic variants attested in monolingual data (i.e., phonetic trills, taps with frication, and assibilated trills) were considered target-like.
Results showed that target-like trill rates gradually increased from Grade 1 (27.2%) to Grade 7 (76%), while the number of alveolar approximants followed the reverse path and decreased from Grade 1 (18.2%) to Grade 7 (0%). Segment duration also increased from Grade 1 (75.3 ms) to Grade 7 (84.65 ms). The number of occlusions of the phonetic trill variants, however, was consistent across the age groups (from 2.7 in Grade 1 to 2.59 in Grade 7). According to Menke (2018), this delay in development might be caused by the increase in exposure to English through interactions with school peers and teachers when child heritage speakers are still acquiring the trill.
Studies on adult heritage speakers lay out a more complex scenario where target-like rates vary depending on language dominance (Amengual, 2016), cultural identity (Kissling, 2018), and the type of baseline used for comparison (Kissling, 2018; Henriksen, 2015). For example, Henriksen (2015) examined heritage speakers in Chicago and included long-term immigrants as the reference level. Even though the heritage speakers produced trills with fewer lingual constrictions (1.10 occlusions) and shorter duration (70.49 ms) than the long-term immigrants (1.20 occlusions, 74.17 ms), no significant difference was found between the two groups. Nevertheless, Henriksen (2015) observed a difference in the manner of articulation, where the long-term immigrants favored fricatives, whereas the heritage speakers favored alveolar approximants. Kissling (2018) incorporated both long-term immigrants and homeland speakers as baseline groups to examine heritage speakers’ trill production. Results showed that the heritage speakers presented trills with significantly shorter durations (80.78 ms), fewer occlusions (1.39) and more frication (16.45 ms) than the homeland speakers (duration: 89.26 ms; occlusion: 1.83; and frication portion: 9.04 ms), but no significant difference was found between the heritage speakers and the long-term immigrants. Finally, Amengual (2016) explored the variation among heritage speakers based on their language dominance and found that English-dominant speakers realized the trill with 0 or 1 occlusion, while Spanish-dominant speakers produced the majority of their trills with two or more occlusions.
In spite of the existing body of research on child heritage speakers (Fabiano-Smith & Goldstein, 2010; Kehoe, 2018; Menke, 2018) and adult heritage speakers (Kissling, 2018; Amengual, 2016; Henriksen, 2015), a direct comparison between the two cannot be drawn due to different research methods. For instance, in Kehoe's (2018) study, the child heritage speakers were bilingual speakers of Spanish and German, unlike other studies in which the heritage speakers were Spanish-English bilinguals. Moreover, while the studies on children used single-word picture naming tasks (Fabiano-Smith & Goldstein, 2010; Menke, 2018), the studies on adults used semi-spontaneous speech (Kissling, 2018; Henriksen, 2015) or a sentence reading task (Amengual, 2016). That is, some tasks were more controlled (e.g., reading, picture naming task) than others (e.g., semi-spontaneous speech), which may affect the articulation of the trill to varying degrees. Thus, it is important to use the same research method when comparing children and adults.
4. Research Questions
In this study, we compare the production of the Spanish trill by school-aged child heritage speakers (i.e., past the age at which typically developing monolingual children acquire the trill) with that of adult heritage speakers. We address the following research questions:
(1) Is there an effect of age on heritage speakers' production of the Spanish trill? That is, do child heritage speakers and adult heritage speakers differ in their production of the trill?
An attrition-based model would predict that child heritage speakers will produce the trills in a more target-like manner than adult heritage speakers (CHS > AHS). A delayed-development-based model would predict that child heritage speakers will produce the trill in a less target-like manner than adult heritage speakers (CHS < AHS). A developmental path in which the acquisition of the trill is arrested would predict that child and adult heritage speakers will not significantly differ in their production of target-like trills (CHS = AHS).
(2) Is there an effect of position on heritage speakers' production of the Spanish trill?
Based on the findings from monolingual grammars (Henriksen, 2014; Lastra & Martín Butragueño, 2006), we predicted that heritage speakers’ trills would be more target-like in contrastive positions (i.e., word-medial intervocalic) than in non-contrastive positions (i.e., word-initial).
5. Methodology
5.1 Participants
Sixteen adult heritage speakers (14 F, 2 M, mean age = 20.5 years, SD = 1.5) and 11 child heritage speakers (5 F, 6 M, mean age = 9.6 years, SD = 0.54) participated in the study. All the participants were Mexican Americans who were born and raised in Los Angeles County and whose parents had both immigrated to the US from Mexico as adults, except for 3 child heritage speakers (CHS2, CHS9, and CHS10) for whom only one parent was from Mexico. [Footnote 1] The heritage speakers were first exposed to Spanish at home and learned English before age 5. The adult heritage speakers were undergraduate students at the University of California, Los Angeles, and had previously taken courses in Spanish. The child heritage speakers were recruited from Spanish-English dual language immersion programs at two elementary schools in Los Angeles, in which the instructors mainly use Mexican Spanish.
5.2 Procedures
The heritage speakers narrated the story of a wordless picture book, Frog, where are you? (Mayer, 1969) (henceforth the frog story). The frog story is an appropriate tool to elicit the Spanish trill in a naturalistic manner, since it prompts many words containing this sound (e.g., perro ‘dog’, rana ‘frog’). The recordings were conducted in a quiet room using an AKG C520 head-mounted microphone connected to a Zoom H4n handy portable digital recorder with a sampling rate of 44.1 kHz and a sample size of 16 bits.
Following Polinsky and Kagan (2007), we used lexical proficiency to assess participants’ Spanish proficiency. [Footnote 2] Based on the speakers’ oral narratives, we calculated two measures of lexical diversity (i.e., VOCD [Footnote 3] (McCarthy & Jarvis, 2007) and number of different words in the first 100 words (NDW)) and a measure of lexical sophistication (i.e., content word frequency). The first two measures were calculated using the Child Language Analysis (CLAN) program (MacWhinney, 1992), and the absolute content word frequency (per million) (Crossley & McNamara, 2012) was calculated using the CLEARPOND software interface (Marian, Bartolotti, Chabal & Shook, 2012), which adopts the SUBTLEX-ESP corpus (Cuetos, Glez-Nosti, Barbón & Brysbaert, 2011). Table 1 summarizes the results. We performed independent-samples t-tests to compare the proficiency measures between the two groups. Results showed that the adult heritage speakers performed significantly better than the child heritage speakers in VOCD (t(25) = 2.44, p < 0.05) and NDW (t(25) = 2.11, p < 0.05), whereas the content word frequency did not differ between the two groups (t(20) = −0.11, p = 0.91). This suggests that, as heritage speakers grow up, their vocabulary size may increase, but they may not acquire more advanced vocabulary due to limited domains of heritage language use.
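As a rough illustration of the NDW measure, the sketch below counts distinct word forms in the first 100 words of a transcript. It is not the CLAN procedure: the function name, the whitespace tokenization, the lowercasing, and the absence of lemmatization are all simplifying assumptions.

```python
def number_of_different_words(transcript: str, window: int = 100) -> int:
    """Count distinct word forms in the first `window` words of a transcript.

    Assumes the transcript is already cleaned of punctuation and that word
    forms (not lemmas) are counted; both are simplifications.
    """
    words = transcript.lower().split()
    return len(set(words[:window]))


# Toy narrative fragment (hypothetical, not from the study's data).
sample = "la rana y el perro y la rana"
print(number_of_different_words(sample))  # 5 distinct forms: la, rana, y, el, perro
```

Fixing the window at the first 100 words keeps the count comparable across narratives of different lengths, which is presumably why the measure is defined that way.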
5.3 Coding and Analysis
Forced alignment of heritage speakers’ speech was carried out at the segmental level using EasyAlign (Goldman, 2011), which is a plug-in for Praat (Boersma & Weenink, 2020). All instances of the Spanish phonological trill in word-initial (e.g., rana ‘frog’) and word-medial intervocalic positions (e.g., perro ‘dog’) were extracted. The classification of the variants was adapted from Rose (2010): phonetic trill, approximant trill, tap, approximant tap, perceptual tap, fricative, and tap+fricative. The phonetic trill was identified as a token with two or more occlusions, represented as clear breaks in the spectrogram. If a token showed two or more visible constrictions in the spectrogram, but with a continuation of the formant structure (i.e., trill with weaker lingual constriction), it was coded as an approximant trill. The true tap was coded when there was one occlusion that was clearly marked in the spectrogram. The approximant tap was identified as a token with a vertical band with continuation of the formant structure (i.e., tap with a weaker constriction). The perceptual tap was coded when a tap gesture was auditorily perceived, but no constriction was identified in the spectrogram. The fricative was coded when a turbulent noise was visible in the acoustic signal without any occlusions. The tap+fricative was coded when the tap was followed by an aperiodic waveform. Table S1 (Supplementary Material) provides an example of each category. The variants that did not fit into any of the categories above were coded as “other” (e.g., lateral or retroflex realizations). To assess inter-rater reliability, the kappa statistic was computed on a subset of the data (i.e., 12 participants, 337 tokens, 33.3% of the data), which was annotated by the authors, who are trained phoneticians.
The results showed that there was only a moderate agreement between the two annotators (K = 0.437, p < 0.001), according to Landis and Koch's (1977) interpretation. The low kappa coefficient was mainly due to discrepancies in the annotation of the data of one speaker. Thus, we reviewed the discrepancies, re-annotated the remaining data, and re-ran the kappa statistic, at which point the kappa coefficient reached the level of almost perfect agreement (K = 0.868, p < 0.001).
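For readers unfamiliar with the statistic, the following sketch computes a kappa coefficient for two annotators and maps it onto the Landis and Koch (1977) bands. It assumes the unweighted Cohen's kappa, which the text does not state explicitly; the function names and the toy labels are illustrative only.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two equal-length label sequences."""
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("both raters must label the same non-empty set of tokens")
    n = len(rater_a)
    # Proportion of tokens on which the two raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance from each rater's label frequencies.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1 - expected)


def landis_koch_label(kappa):
    """Verbal interpretation bands from Landis and Koch (1977)."""
    if kappa < 0:
        return "poor"
    if kappa <= 0.20:
        return "slight"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "substantial"
    return "almost perfect"


# Toy annotations (hypothetical, not the study's data).
annotator_1 = ["trill", "tap", "tap", "fricative"]
annotator_2 = ["trill", "tap", "fricative", "fricative"]
print(round(cohen_kappa(annotator_1, annotator_2), 3))
print(landis_koch_label(0.437), "/", landis_koch_label(0.868))
```

Under these bands, a coefficient of 0.437 falls in the 0.41 to 0.60 range and a coefficient of 0.868 in the 0.81 to 1.00 range.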
The present study analyzed heritage speakers’ production of phonetic trills and target-like trills. The trills were considered target-like if they were produced with two or more brief lingual constrictions: that is, as phonetic trills or approximant trills. Regarding the acoustic properties of the trill, segment duration (ms) and the number of lingual constrictions were extracted from all the tokens. For the tokens produced with two or more constrictions, the duration of the first aperture (ms) was also measured.
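The coding criteria and the target-like definition above can be summarized as a decision procedure. The sketch below is a deliberately simplified reading: the input features (a constriction count plus three yes/no judgments), their names, and the order in which they are checked are assumptions, and actual coding relied on inspecting spectrograms and listening.

```python
# Variants counted as target-like in the text: two or more constrictions.
TARGET_LIKE = {"phonetic trill", "approximant trill"}


def code_variant(constrictions, formants_continue=False, frication=False, tap_heard=False):
    """Assign a variant label from hypothetical, pre-extracted token features.

    constrictions     -- number of constrictions visible in the spectrogram
    formants_continue -- formant structure continues through the constriction(s)
    frication         -- turbulent (aperiodic) noise is present
    tap_heard         -- a tap gesture is auditorily perceived
    """
    if constrictions >= 2:
        return "approximant trill" if formants_continue else "phonetic trill"
    if constrictions == 1:
        if frication:
            return "tap+fricative"
        return "approximant tap" if formants_continue else "tap"
    # No visible constriction from here on.
    if frication:
        return "fricative"
    if tap_heard:
        return "perceptual tap"
    return "other"


def is_target_like(variant):
    """True for phonetic trills and approximant trills."""
    return variant in TARGET_LIKE


print(code_variant(3))                          # phonetic trill
print(code_variant(2, formants_continue=True))  # approximant trill
print(is_target_like(code_variant(1)))          # False
```

Keeping the target-like set separate from the labeling step mirrors the two analyses reported: one on phonetic trills alone and one on the broader target-like category.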
Statistical analyses were performed using R statistical software (R Core Team, 2020). Generalized linear mixed-effects models were fitted for the analyses of phonetic trills and target-like trills (i.e., binary data) using the glmer function in the lme4 package (Bates, Mächler, Bolker & Walker, 2015) with group (adult vs. child), position (initial vs. medial), and their interaction as fixed effects and participant and word as random effects. For the acoustic properties (i.e., continuous data), linear mixed-effects models were fitted using the lmer function in the same package. Post-hoc power analyses (1-β) were simulated (100 simulations) using the simr package (Green & MacLeod, 2016). We report the odds ratio (OR) as a proxy for the effect size in the generalized linear mixed-effects models. For the linear mixed-effects models, we report the R² statistic (Snijders & Bosker, 1994; Bryk & Raudenbush, 1992) as a measure of explained variance.
6. Results
6.1 Realization of the Spanish phonological trill
In total 836 cases of Spanish phonological trill were obtained. Among them 25 tokens were excluded from the analyses due to creaky voice (N = 21), devoicing (N = 1), or incorrect production (N = 3) (e.g., rona instead of rana ‘frog’). The remaining 811 tokens consisted of 345 word-initial and 466 word-medial intervocalic phonological trills. As demonstrated in Table 2, the trills were realized in various forms and their distribution slightly differed depending on the age group and the position within the word. The adult heritage speakers produced the (phonological) trills most frequently as (phonetic) trills regardless of the position (word-initial: 27.85%, word-medial: 33.45%), although in word-initial position fricatives (25.32%) were also frequently found. With regard to the child heritage speakers, the fricative was the most commonly observed variant in word-initial position (40.74%), whereas in word-medial position no clear preference for a particular form was found.
6.2 Phonetic trill rate and target-like trill rate
Figure 1 shows the percentage of phonetic trills (left) and target-like trills (i.e., phonetic trills and approximant trills) (right). For phonetic trills, results showed a main effect of group (β = −2.913, SE = 0.784, z = −3.713, p < 0.001, 1-β = 1, OR = 0.054, 95% CI [0.010, 0.301]), which indicates that the adult heritage speakers (i.e., reference level) produced significantly higher rates of phonetic trills (M = 30.87%, SD = 46.24) than the child heritage speakers (M = 4.05%, SD = 19.76). No main effect of position or significant interaction between group and position was found.
As for the percentage of target-like trills, results showed that there was a main effect of group (β = −2.727, SE = 0.817, z = −3.336, p < 0.001, 1-β = 0.89, OR = 0.065, 95% CI [0.010, 0.301]), which indicates that the adult heritage speakers produced the trills in a target-like manner with significantly higher rates (M = 43.69%, SD = 49.65) than the child heritage speakers (M = 12.84%, SD = 33.51). No main effect of position or significant interaction between group and position was found.
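To make the link between the reported coefficients and odds ratios explicit, the sketch below exponentiates the two group coefficients given above. The Wald interval is an assumption added for illustration: the intervals reported in the text may have been obtained by another method (for example, profile likelihood), so the Wald bounds need not reproduce them.

```python
import math


def odds_ratio(beta):
    """Odds ratio implied by a logistic-regression coefficient (log odds)."""
    return math.exp(beta)


def wald_ci(beta, se, z=1.96):
    """Approximate 95% Wald interval on the odds-ratio scale (assumed method)."""
    return math.exp(beta - z * se), math.exp(beta + z * se)


# Group coefficients and standard errors as reported in this section.
reported = [
    ("phonetic trill", -2.913, 0.784),
    ("target-like trill", -2.727, 0.817),
]
for label, beta, se in reported:
    low, high = wald_ci(beta, se)
    print(f"{label}: OR = {odds_ratio(beta):.3f}, "
          f"Wald 95% CI = [{low:.3f}, {high:.3f}]")
```

The point estimates round to 0.054 and 0.065, which agrees with the OR values reported for the two models.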
6.3 Number of lingual occlusions and duration
Figure 2 presents a histogram of the number of lingual constrictions. Results showed that there was a main effect of group (β = −0.574, SE = 0.193, t = −2.9693, p < 0.01, 1-β = 0.83, R2 = 8.13%, 95% CI [0.386, 0.821]) and a main effect of position (β = 0.27, SE = 0.083, t = 3.265, p < 0.05, 1-β = 0.76, R2 = 2.54%, 95% CI [1.110, 1.559]), which indicates that the adult heritage speakers produced the trills with significantly more lingual constrictions (M = 1.39, SD = 0.98) than the child heritage speakers (M = 0.8, SD = 0.7) and the number of lingual constrictions was significantly higher in word-medial position (M = 1.23, SD = 0.86) than in word-initial position (i.e., reference level) (M = 1.1, SD = 1.02). The explained variance (i.e., R2 as per Bryk & Raudenbush, Reference Bryk and Raudenbush1992; Snijders & Bosker, Reference Snijders and Bosker1994) was higher in the main effect of group than in the main effect of trill position. No significant interaction was found between the two fixed factors.
Figure 3 shows the duration of the trills. Main effects of group (β = −0.017, SE = 0.007, t = −2.479, p < 0.05, 1-β = 0.98, R2 = 7.61%, 95% CI [0.970, 0.996]) and position (β = −0.009, SE = 0.003, t = −2.74, p < 0.05, 1-β = 0.98, R2 = 0.94%, 95% CI [0.984, 0.997]) were found, suggesting that the adult heritage speakers produced longer trills (M = 67.19 ms, SD = 29.66) than the child heritage speakers (M = 48.6 ms, SD = 25) and that the trills were significantly longer in word-initial position (M = 67.34 ms, SD = 31.75) than in word-medial position (M = 55.27 ms, SD = 26.47). The explained variance was higher in the main effect of group than in the main effect of trill position. No significant interaction between group and position was found.
Regarding the duration of the first aperture, we only report the results of target-like trills. This property demonstrated similar values across groups (adult: M = 22.46 ms, SD = 0.63; child: M = 20.41 ms, SD = 0.46) and positions (word-initial: M = 21.49 ms, SD = 0.69; word-medial intervocalic: M = 22.63 ms, SD = 0.54). No significant interaction between the two factors was found.
7. Discussion
Various studies in heritage language acquisition (Montrul, Reference Montrul2018; Polinsky, Reference Polinsky2018) have encouraged researchers to bridge the gap between the scholarship on early bilingualism and adult heritage speakers. The present study followed this line of research in order to better understand Spanish heritage speakers’ trill development and account for their divergent grammars when compared to non-heritage native speakers.
The first objective of this study was to compare trill production between child and adult heritage speakers. Our results showed that, compared to the 9- to 10-year-old child heritage speakers, the adult heritage speakers produced significantly higher rates of phonetic trills (i.e., with two or more clear occlusions) and target-like trills (i.e., phonetic trills and variants with two or more soft constrictions that resemble phonetic trill production), and produced the trills with significantly more lingual constrictions and longer duration. However, no consensus has yet been reached on the baseline of comparison for heritage speakers (Otheguy, Reference Otheguy, Tortora, den Dikken, Montoya and O'Neil2016). In this section, we discuss the appropriate baseline groups for heritage speakers and compare our findings to those of the baseline reported in other studies that used the same or similar data elicitation methods (i.e., (semi-)spontaneous speech).
7.1 Identifying the baseline
Assuming that the heritage speakers in this study have been exposed to varieties of monolingual Mexican Spanish (e.g., family in parents’ hometown in Mexico, recent immigrants from Mexico, Mexican media), as well as bilingual Spanish (e.g., long-term immigrants from Mexico), we compared our results to those reported in monolingual and bilingual varieties of Mexican Spanish. Table 3 summarizes the findings across groups.
H: Henriksen (Reference Henriksen2015), K: Kissling (Reference Kissling2018), B&W: Bradley and Willis (Reference Bradley and Willis2012), L&M: Lastra and Martín Butragueño (Reference Lastra, Martín Butragueño, Cestero Mancera, Molina Martos and Paredes García2006)
We first contrasted our findings with those of monolingual Spanish varieties, specifically Veracruz Mexican Spanish (Bradley & Willis, Reference Bradley and Willis2012), Central Mexican Spanish (and recent immigrants) (Kissling, Reference Kissling2018), and Mexico City Spanish (Lastra & Martín Butragueño, Reference Lastra, Martín Butragueño, Cestero Mancera, Molina Martos and Paredes García2006). The first two studies used the frog story for data elicitation, as did our study, and the third study analyzed natural conversations. With regard to trill production rates, Bradley and Willis (Reference Bradley and Willis2012) defined normative trill as the variant consisting of two or more visible lingual contacts represented as a clear reduction in intensity in the waveform and spectrogram, which coincides with our criteria for target-like trill. Lastra and Martín Butragueño (Reference Lastra, Martín Butragueño, Cestero Mancera, Molina Martos and Paredes García2006) defined vibrante (rr) as the variant demonstrating two or more brief interruptions of energy corresponding to “spaces in white” in the spectrogram, which coincides with our criteria for phonetic trill. Thus, we make comparisons with heritage speakers’ trill rates for both target-like trills and phonetic trills. The monolinguals in Bradley and Willis (Reference Bradley and Willis2012) demonstrated a slightly higher target-like trill rate (49.26%) than the adult heritage speakers in this study (43.69%) and a much higher rate than the child heritage speakers (12.84%). As for the monolinguals in Lastra and Martín Butragueño (Reference Lastra, Martín Butragueño, Cestero Mancera, Molina Martos and Paredes García2006), the phonetic trill rate was even higher (65%). With regard to the number of lingual constrictions, both the adult heritage speakers (1.39) and the child heritage speakers (0.8) demonstrated lower values than the speakers in Kissling (Reference Kissling2018) (1.87).
Regarding segment duration, the adult heritage speakers (word-initial: 72.33 ms, intervocalic: 62.81 ms) and the child heritage speakers (word-initial: 56.4 ms, intervocalic: 44.13 ms) presented shorter trills, compared to those of the speakers in Bradley and Willis (Reference Bradley and Willis2012) (word-initial: 77 ms, intervocalic: 70 ms) and Kissling (Reference Kissling2018) (position combined: 89.26 ms).
To compare our results to those of bilingual varieties, we relied on the findings of long-term immigrants from Mexico (Henriksen, Reference Henriksen2015; Kissling, Reference Kissling2018). Both studies used the frog story for data elicitation. While these studies did not report phonetic or target-like trill rates, Henriksen (Reference Henriksen2015) presented the distribution of tokens with varying numbers of occlusions (0–4), which he determined using criteria similar to those of Bradley and Willis (Reference Bradley and Willis2012). Thus, we calculated the percentage of cases in which the trill was produced with two or more occlusions and, based on this information, the long-term immigrants’ trill rate in Henriksen (Reference Henriksen2015) was 39.63%. This is in between the target-like trill rate (43.69%) and the phonetic trill rate (30.87%) of the adult heritage speakers in our study and much higher than both the child heritage speakers’ target-like trill rate (12.84%) and phonetic trill rate (4.05%). In relation to the number of occlusions, the adult heritage speakers demonstrated similar values (1.39) to those in Kissling (Reference Kissling2018) (1.34) and Henriksen (Reference Henriksen2015) (1.20), while these values were lower for the child heritage speakers (0.8). With regard to segment duration, both the trills produced by the adult heritage speakers (position combined: 67.19 ms) and the child heritage speakers (position combined: 48.6 ms) were shorter than the values reported in Kissling (Reference Kissling2018) (position combined: 79.14 ms) and Henriksen (Reference Henriksen2015) (position combined: 74.17 ms).
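The rate calculation described above, turning a distribution of tokens by number of occlusions into a percentage of tokens with two or more occlusions, can be sketched as follows. The token counts are hypothetical placeholders and are not the figures published in Henriksen (2015).

```python
def rate_at_least(counts_by_occlusions, threshold=2):
    """Percentage of tokens produced with `threshold` or more occlusions."""
    total = sum(counts_by_occlusions.values())
    if total == 0:
        raise ValueError("no tokens to summarise")
    hits = sum(n for k, n in counts_by_occlusions.items() if k >= threshold)
    return 100.0 * hits / total


# Hypothetical counts keyed by number of occlusions (0-4); illustration only.
hypothetical_counts = {0: 30, 1: 30, 2: 25, 3: 10, 4: 5}
print(rate_at_least(hypothetical_counts))  # 40.0
```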
To summarize, when monolingual speakers of Mexican Spanish (Bradley & Willis, Reference Bradley and Willis2012; Kissling, Reference Kissling2018; Lastra & Martín Butragueño, Reference Lastra, Martín Butragueño, Cestero Mancera, Molina Martos and Paredes García2006) were set as the baseline, the heritage speakers in our study, both adults and children, seemed to diverge from the baseline in all three acoustic properties (i.e., phonetic/target-like trill rates, number of lingual constrictions, and segment duration) and the child heritage speakers demonstrated stronger divergence than the adult heritage speakers (CHS < AHS < BASE). On the other hand, when long-term immigrants from Mexico (Henriksen, Reference Henriksen2015; Kissling, Reference Kissling2018) were set as the baseline, we found mixed results. As for the phonetic/target-like trill rates and the number of occlusions, only the adult heritage speakers produced the trills in a similar manner as the baseline, while the child heritage speakers produced them with lower rates and with fewer occlusions (CHS < AHS = BASE). With regard to segment duration, both heritage speaker groups produced the trills with shorter duration than the baseline, and the deviance from the baseline was larger for the child heritage speakers than the adult heritage speakers (CHS < AHS < BASE).
This by no means suggests that all heritage speakers followed the same patterns. As shown in the high standard deviation values in Table 3, our data demonstrated large inter-speaker variation. Six of the 16 adult heritage speakers (AHS2: 55.3%, AHS3: 73.5%, AHS5: 40.6%, AHS8: 42.9%, AHS11: 40%, AHS15: 72.7%) produced the trill as a phonetic trill at rates within or above the range of the two baseline groups (39.6%–65%). None of the child heritage speakers produced the phonetic trill at this frequency. With regard to the target-like trill rate, half of the adult heritage speakers (AHS2: 72.3%, AHS3: 83.7%, AHS5: 53.1%, AHS8: 65.7%, AHS10: 76.2%, AHS11: 55%, AHS13: 40.6%, AHS15: 72.7%) and two child heritage speakers (CHS3: 50%, CHS8: 46%) produced the trill at rates within or above the range of the baseline groups. Our data also showed an overlap in the number of lingual constrictions between most of these speakers (AHS2: 2, AHS3: 1.9, AHS5: 1.5, AHS8: 1.9, AHS9: 1.4, AHS10: 2, AHS11: 1.6, AHS13: 1.3, AHS15: 2.6, CHS3: 1.3) and the baseline groups (1.2–1.9), which is expected given the close relationship between target-like trill rate and the number of lingual constrictions. Lastly, regarding segment duration, four adult heritage speakers (AHS2: 81.2 ms, AHS8: 74.4 ms, AHS10: 106 ms, AHS15: 111.3 ms) and one child heritage speaker (CHS3: 83.7 ms) demonstrated durations similar to (or longer than) those of the baseline groups (74.2–89.3 ms). The analysis of individual data suggests that, while some heritage speakers (AHS2, AHS8, AHS10, AHS15, CHS3) aligned with both baseline groups in all phonetic properties investigated, there were fewer heritage speakers that converged with the baseline in segment duration, compared to other phonetic properties. Thus, it appears that, when heritage speakers produce heritage language speech sounds, they tend to diverge from the baseline in certain phonetic properties (in this case, segment duration) more than others (Chang & Yao, Reference Chang and Yao2016).
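The individual-level comparison can be retraced with a short sketch that takes the per-speaker values listed above for target-like trill rate, number of lingual constrictions, and segment duration, keeps the speakers at or above the lower bound of the baseline range for each property, and intersects the three sets. Only speakers whose values are given in the text are included, and the 39.6% floor for the target-like rate is assumed to be the same lower bound stated for the phonetic trill rate.

```python
# Lower bounds of the baseline ranges given in the text. The 39.6% floor for
# the target-like rate is an assumption carried over from the phonetic rate.
BASELINE_MIN = {
    "target_like_rate": 39.6,
    "constrictions": 1.2,
    "duration_ms": 74.2,
}

# Per-speaker values as listed in the paragraph above.
reported = {
    "target_like_rate": {
        "AHS2": 72.3, "AHS3": 83.7, "AHS5": 53.1, "AHS8": 65.7, "AHS10": 76.2,
        "AHS11": 55.0, "AHS13": 40.6, "AHS15": 72.7, "CHS3": 50.0, "CHS8": 46.0,
    },
    "constrictions": {
        "AHS2": 2.0, "AHS3": 1.9, "AHS5": 1.5, "AHS8": 1.9, "AHS9": 1.4,
        "AHS10": 2.0, "AHS11": 1.6, "AHS13": 1.3, "AHS15": 2.6, "CHS3": 1.3,
    },
    "duration_ms": {
        "AHS2": 81.2, "AHS8": 74.4, "AHS10": 106.0, "AHS15": 111.3, "CHS3": 83.7,
    },
}


def meets_baseline(property_name):
    """Speakers whose listed value is at or above the baseline lower bound."""
    floor = BASELINE_MIN[property_name]
    return {s for s, v in reported[property_name].items() if v >= floor}


# Speakers who reach the baseline range on all three properties.
converging = set.intersection(*(meets_baseline(p) for p in reported))
print(sorted(converging))
```

Under these assumptions the intersection contains AHS2, AHS8, AHS10, AHS15, and CHS3, the same five speakers named in the text. Segment duration is the property with the shortest list.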
In the case of the child heritage speakers, apart from the two baseline groups above, a comparison with Spanish monolingual children (Carballo & Mendoza, Reference Carballo and Mendoza2000) was carried out to explore whether these speakers show similar developmental patterns to those of age-matched monolinguals (Paradis & Genesee, Reference Paradis and Genesee1996). Since Carballo and Mendoza (Reference Carballo and Mendoza2000) used a different elicitation method (i.e., picture-naming task) and the participants were Peninsular Spanish speakers (Granada), we acknowledge that it is not ideal to directly compare the findings of the two studies. However, to our knowledge, Carballo and Mendoza (Reference Carballo and Mendoza2000) is the only study that investigated the trill production of school-aged Spanish monolingual children. Thus, we make this comparison with caution.
Compared to the monolingual peers in Carballo and Mendoza (Reference Carballo and Mendoza2000), the child heritage speakers in our study produced fewer lingual constrictions (0.8 vs. 2.3) and shorter trills (48.6 ms vs. 115.7 ms). While the large durational difference between the two groups may be due to difference in the number of lingual constrictions, it may also be associated with different task types. That is, the speakers in Carballo and Mendoza (Reference Carballo and Mendoza2000) might have produced longer trills, because the task elicited more controlled speech than in our study. It is worth pointing out that the child heritage speakers in Menke (Reference Menke2018), who completed a similar task as in Carballo and Mendoza (Reference Carballo and Mendoza2000) (i.e., picture-sorting task), also demonstrated noticeably shorter durations (70.4 ms) than the monolingual children. Thus, the shorter segment duration found in our study compared to Carballo and Mendoza (Reference Carballo and Mendoza2000) is likely to be a result of both fewer lingual constrictions and task type. With regard to the duration of the first aperture in target-like trills, which is a property that clearly distinguishes more proficient from less proficient trillers (Carballo & Mendoza, Reference Carballo and Mendoza2000), the child heritage speakers in our study (20.41 ms), as well as the adult heritage speakers (22.46 ms), presented similar values to those of the monolingual children (21.9 ms) in Carballo and Mendoza (Reference Carballo and Mendoza2000).
To summarize, the child heritage speakers produced the trill with fewer occlusions and shorter segment duration than the age-matched monolingual baseline, but when they are able to produce the trill with two or more lingual constrictions (i.e., in a target-like manner), they do so with the same degree of articulatory precision as monolingual children.
7.2 Effects of position on heritage speakers’ trill production
The second objective of this study was to examine whether phonetic trill production was affected by the position of the trill in the word (word-medial and word-initial). In this study, we found that, while heritage speakers’ trills in word-medial intervocalic position (i.e., phonemically contrastive) were not produced more frequently as phonetic/target-like trills than those in word-initial position (i.e., non-phonemically contrastive), they were produced with more lingual constrictions. This is similar to the findings of monolingual Spanish varieties (Henriksen, Reference Henriksen2014; Lastra & Martín Butragueño, Reference Lastra, Martín Butragueño, Cestero Mancera, Molina Martos and Paredes García2006). Thus, our data align with previous studies in heritage language phonology in that heritage speakers maintain the distinction in language-internal phonemic contrasts (Chang et al., Reference Chang, Haynes, Yao and Rhodes2009, Reference Chang, Yao, Haynes and Rhodes2011; Einfeldt et al., Reference Einfeldt, van de Weijer and Kupisch2019; Lein et al., Reference Lein, Kupisch and van de Weijer2016). As suggested in Kupisch (Reference Kupisch2020), heritage speakers may maintain or even over-mark phonemic contrasts as a way to ease the overtaxing costs of one-to-more mappings in a situation in which more than one language competes for limited cognitive resources (i.e., avoidance of ambiguity in Polinsky & Scontras, Reference Polinsky and Scontras2020).
With regard to segment duration, we found that the heritage speakers produced the trills with longer duration in word-initial (adult: 72.33 ms, child: 56.4 ms) than in word-medial intervocalic position (adult: 62.81 ms, child: 44.13 ms). While this may appear counterintuitive, the longer duration in word-initial position is likely to be an effect of domain-initial strengthening, by which consonants in higher prosodic domains (e.g., word-initial) are produced with stronger articulation (e.g., longer duration) than those in lower prosodic domains (e.g., word-medial) (Fougeron & Keating, Reference Fougeron and Keating1997).
7.3 Connecting the dots between child and adult heritage speakers' trill production
With the goal to account for adult Spanish heritage speakers’ divergent trill production from the monolingual norms (Amengual, Reference Amengual2016; Kissling, Reference Kissling2018; Henriksen, Reference Henriksen2015), the present study adopted a developmental approach by directly comparing child and adult heritage speakers. We then compared our findings to those of non-heritage native speakers reported in other studies (Bradley & Willis, Reference Bradley and Willis2012; Carballo & Mendoza, Reference Carballo and Mendoza2000; Henriksen, Reference Henriksen2015; Lastra & Martín Butragueño, Reference Lastra, Martín Butragueño, Cestero Mancera, Molina Martos and Paredes García2006; Kissling, Reference Kissling2018). While both the adult heritage speakers and the child heritage speakers showed divergence from the non-heritage native baselines in one or more phonetic properties of the trill, the adult heritage speakers produced the trill in a more target-like manner than the child heritage speakers (CHS < AHS). Thus, our findings support Menke (Reference Menke2018) in that heritage speakers continue developing the Spanish phonological trill during childhood. Moreover, our data showed that, apart from the adult baseline groups, the child heritage speakers diverged from age-matched monolingual children, suggesting that heritage trill development occurs at a slower rate compared to their monolingual peers. Deceleration in bilingual development has also been proposed in Goldstein and Washington (Reference Goldstein and Washington2001) in which 4-year-old child heritage speakers produced the Spanish trill with lower accuracy than the English approximant. Goldstein and Washington (Reference Goldstein and Washington2001) argued that, as a way to distinguish their two phonological systems, it is likely that child heritage speakers focus on mastering the English approximant prior to later-developing sounds in Spanish, such as taps and trills.
When comparing heritage speakers to the baseline groups, it is important to take into account that non-heritage native speakers demonstrate variation. Lastra and Martín Butragueño (Reference Lastra, Martín Butragueño, Cestero Mancera, Molina Martos and Paredes García2006) found that Mexico City Spanish speakers mainly used three trill variants: normative trill (65%), non-sibilant fricative (19%), and sibilant fricative (14%). Similarly, Bradley and Willis (Reference Bradley and Willis2012) showed that the representative allophones found in Veracruz Mexican Spanish were normative trill (49.3%), tap followed by vocalic r-coloring or frication, and non-vibrant forms such as fricatives (see footnote 4). With regard to long-term Mexican immigrants, Henriksen (Reference Henriksen2015) found that more than half of the trills had zero or one occlusion. Although allophonic distribution of non-normative trills was not the main focus of the study, Henriksen (Reference Henriksen2015) reported that the variants with zero occlusion were primarily fricatives. Moreover, based on the findings in Henriksen (Reference Henriksen2015), in which the speakers who mainly produced the trills with one occlusion demonstrated significantly longer duration than the phonological taps, it is likely that the trill variants with one occlusion were taps followed by vocalic r-coloring or frication, similar to those in Bradley and Willis (Reference Bradley and Willis2012). Thus, it appears that, apart from the normative trill, non-heritage native speakers often use variants containing frication (i.e., sibilant/non-sibilant fricative, tap followed by frication).
We further explored the distribution of heritage speakers’ non-target-like realizations of the Spanish trill (i.e., variants other than the phonetic trill and the approximant trill). We found that those containing frication (i.e., fricative, tap+fricative) were frequently used by both the child heritage speakers (46.96%) and the adult heritage speakers (41.55%) (see Table 1), which suggests that, like the non-heritage native baselines, heritage speakers associate frication with the Spanish trill. We also found that, while the phonetic tap and its continuant variants (i.e., approximant tap, perceptual tap) comprised a large part of the child heritage speaker data (36.15%), the adult heritage speakers produced these variants only 12.62% of the time. That is, the variants related to the phonetic tap and those with frication were the two most frequent types of realization in the child heritage speakers’ speech, while for the adult heritage speakers the two most frequent types of realization were the variants with frication and those related to the phonetic trill (i.e., phonetic trill, approximant trill).
We classified the allophonic variants presented in Table 1 into three broad categories: trill (i.e., phonetic trill, approximant trill), frication (i.e., fricative, tap+fricative), and tap (i.e., true tap, approximant tap, perceptual tap). Table 4 presents heritage speakers’ trill inventories based on the types that comprised more than 10% of their productions. The last row (i.e., trill, frication) represents the inventory found in non-heritage native baselines.
Tap: True tap, approximant tap, perceptual tap; Frication: Fricative, tap+fricative, Trill: Phonetic trill, approximant trill
All the heritage speakers used frication as one of the strategies to produce the trill, except for one child heritage speaker (CHS2) who consistently produced the tap variants (true tap: 50%, approximant tap: 38.9%, perceptual tap: 11.1%). More than half of the adult heritage speakers (AHS2, AHS3, AHS5, AHS6, AHS8, AHS10, AHS11, AHS13, AHS15) used both trill and frication (i.e., target-like inventory), while fewer than a third of the child heritage speakers (CHS3, CHS4, CHS8) demonstrated this pattern. Note that these speakers largely coincide with the ones whose target-like trill rates and number of lingual constrictions were within the baseline range (see Section 7.1). While most of them produced the trill variants more frequently than the variants with frication, one adult heritage speaker and one child heritage speaker primarily used frication to produce the trill (AHS6: 82.1%, CHS4: 65.2%). Frication was also the predominant strategy used by heritage speakers who demonstrated non-target-like inventories. It is important to note that the most frequent non-target-like inventory found in the child heritage speaker data was frication and tap, whereas for the adult heritage speakers it was the full inventory (i.e., trill, frication, tap). This finding suggests that the child heritage speakers have not yet acquired the trill variants and the adult heritage speakers, even after acquiring the trill variants, have not abandoned the tap variants in their inventories.
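The inventory classification can be sketched as a mapping from allophonic variants to the three broad categories, followed by a 10% cut-off. This is a minimal illustration that assumes the threshold is applied to category totals rather than to individual variants; the function name is hypothetical. The example uses the percentages reported above for CHS2.

```python
# Broad categories as defined in the text.
CATEGORY = {
    "phonetic trill": "trill",
    "approximant trill": "trill",
    "fricative": "frication",
    "tap+fricative": "frication",
    "true tap": "tap",
    "approximant tap": "tap",
    "perceptual tap": "tap",
}


def inventory(variant_percentages, threshold=10.0):
    """Categories whose summed share of productions exceeds the threshold.

    Assumption: the 10% cut-off applies to category totals.
    """
    totals = {}
    for variant, pct in variant_percentages.items():
        category = CATEGORY[variant]
        totals[category] = totals.get(category, 0.0) + pct
    return {c for c, pct in totals.items() if pct > threshold}


# Percentages reported in the text for CHS2.
chs2 = {"true tap": 50.0, "approximant tap": 38.9, "perceptual tap": 11.1}
print(inventory(chs2))  # {'tap'}
```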
The association between frication and the Spanish trill has also been attested in L2 phonological development. Morales Reyes, Arechabaleta-Regulez and Montrul (Reference Morales Reyes, Arechabaleta-Regulez and Montrul2017) found that 4- to 7-year-old American English L2 learners of Spanish produced the Spanish tap as a phonetic tap or its variants (i.e., approximant tap, perceptual tap) most of the time (89.7%), similar to Spanish monolingual children (88.9%), whereas they produced the Spanish trill with frication (i.e., fricative, tap+fricative) at much higher rates (66.7%) than their monolingual peers (37.8%). The variant that was observed most frequently in the monolingual data was the phonetic trill (46.7%). Morales Reyes et al. (Reference Morales Reyes, Arechabaleta-Regulez and Montrul2017) also examined the relationship between the amount of exposure to Spanish and learners’ realizations of the Spanish trill (see footnote 5) and found that the percentage of the phonetic trill was higher for those with more exposure to Spanish. Rose (Reference Rose2010) found similar patterns in the speech of adult American English L2 learners of Spanish and argued that L2 learners go through several stages when acquiring the Spanish tap-trill contrast. That is, L2 learners initially do not distinguish the two phonemes and associate both of them with the English alveolar approximant. Later, they gradually introduce other continuants in their repertoire, such as the approximant tap and the perceptual tap, and then the phonetic tap. At a later stage of the development, L2 learners begin to associate the variants that involve frication (i.e., fricative, tap+fricative) and the phonetic trill with the phonological trill, and associate the tap variants with the phonological tap.
However, these studies also showed that the tap variants persisted in highly proficient L2 learners’ trill inventories (Morales Reyes et al., Reference Morales Reyes, Arechabaleta-Regulez and Montrul2017; Rose, Reference Rose2010).
While heritage speakers do not share the same language history with L2 learners, the similarities found between them indicate that phonetic trills are introduced later in the phonological development of Spanish-English bilinguals and that bilingual trill development occurs in the following order with overlaps between stages: single lingual constriction → frication → multiple lingual constrictions. Note that this is very similar to the early trill development of Spanish monolingual children who acquire the trill later than the tap and often use the tap as a substitute for the trill (Acevedo, Reference Acevedo1993; Bosch, Reference Bosch1983). However, unlike Spanish-English bilinguals, Spanish monolingual children abandon the tap variants by the time they reach school age (2.2%) (Morales Reyes et al., Reference Morales Reyes, Arechabaleta-Regulez and Montrul2017). Some monolingual children at this stage may experience difficulties when producing the Spanish trill. Carballo and Mendoza (Reference Carballo and Mendoza2000) argued that this is due to their tongue body shape during trill production that may not be appropriate to meet the aerodynamic requirements to successfully trill. While Carballo and Mendoza (Reference Carballo and Mendoza2000) did not describe the variant as a fricative, we speculate that the non-target-like constriction observed in their study resembles the fricatives in our study.
Based on our findings, it seems that Spanish heritage speakers go through a similar developmental process as non-heritage native speakers, but they do so at a slower rate (i.e., deceleration). Moreover, even as adults some heritage speakers may not fully develop the speech motor control necessary to produce the Spanish trill and exhibit increased variability. The increase of variability found in our study aligns with Kupisch's (Reference Kupisch2020) remark that heritage speakers exploit language-inherent variation to avoid markedness. For instance, Kupisch (Reference Kupisch2020) pointed out that, when producing the Italian alveolar trill, Italian heritage speakers avoid the use of phonetic trills and instead produce phonetic taps and other variants. Similarly, Putnam (Reference Putnam2019) argued that, as a result of constant competition between the two languages, heritage speakers acquire linguistic representations that are more gradient and less stable than those in non-heritage grammars, which may contribute to their increased variability. Thus, deceleration followed by acquisition without mastery of the heritage language (Montrul, Reference Montrul2016, p. 126) or unstable/unconsolidated heritage grammars (Putnam, Reference Putnam2020) seems to best explain divergent trill productions found in some adult heritage speakers. This calls for the addition of a fourth scenario (i.e., CHS ≠ AHS ≠ BASE) in Polinsky and Scontras's (Reference Polinsky and Scontras2020) model.
It is important to note that the present study included two baseline groups (i.e., Spanish monolingual speakers and long-term immigrants from Mexico) whose patterns do not completely align. Although overall the adult heritage speakers in this study diverged from both baseline groups in one or more phonetic properties, they performed more similarly to the long-term immigrants than the monolingual speakers. Specifically, the adult heritage speakers patterned like the long-term immigrants (Henriksen, Reference Henriksen2015; Kissling, Reference Kissling2018) in that they produced fewer target-like trills and produced the trill with fewer lingual constrictions than the monolingual speakers (Bradley & Willis, Reference Bradley and Willis2012; Kissling, Reference Kissling2018; Lastra & Martín Butragueño, Reference Lastra, Martín Butragueño, Cestero Mancera, Molina Martos and Paredes García2006). These findings suggest that heritage speakers develop a phonological system approaching that of long-term immigrants who, after living in the US for a long period of time, may show attrition in their native variety in favor of the local variety (e.g., Los Angeles Spanish) or the majority language (e.g., English). In order to confirm this, future research should carefully examine heritage speakers’ source(s) of Spanish input, especially the input from long-term immigrants of their speech community, which may include varieties other than those of their parents’ homeland. Given that the long-term immigrant data were collected in Chicago, Illinois (Henriksen, Reference Henriksen2015) and Richmond, Virginia (Kissling, Reference Kissling2018), which differ from Los Angeles in the distribution of the Latino populations and the regional English dialects, it is possible that their varieties have undergone changes differently from the varieties of long-term immigrants in Los Angeles.
While this study compared child heritage speakers at an age when full mastery of the Spanish trill is reported in monolinguals, we cannot entirely rule out other possibilities, such as attrition during late childhood or adolescence. That is, if heritage speakers during late childhood or adolescence demonstrate target-like production of the trill and adult heritage speakers do not, this would be a case of delayed (complete) acquisition followed by attrition. Although this scenario seems less likely based on Menke's (2018) findings, which demonstrated continued trill development in child heritage speakers between ages 6;8 and 13;5, we emphasize that meta-analyses should be interpreted with caution when they combine studies that used different research methods (see Section 3.2). Thus, future research should consider the complete age spectrum, including early childhood, late childhood, adolescence, early adulthood, and late adulthood, and make comparisons using the same data elicitation and analysis methods to fully understand heritage Spanish trill development.
8. Conclusion
In order to address the “missing link” (Montrul, 2018) between early bilingualism and adult heritage grammars in the literature on heritage language phonology, we compared the production of the Spanish trill by school-aged (9–10 years) and adult heritage speakers. We found that the adult heritage speakers outperformed the child heritage speakers. However, almost half of the adult heritage speakers demonstrated divergence from non-heritage native baselines. Our findings indicate that child heritage speakers during this period are still in the process of developing heritage phonological grammars, but their grammars may not reach stability in adulthood.
While our study is the first to directly compare child heritage speakers’ and adult heritage speakers’ production of the Spanish trill, future research should include more age groups, including those of late adolescence and late adulthood, in order to track the complete developmental process of the heritage Spanish trill. Heritage speakers’ divergence from monolingual norms is often claimed to result from reduced heritage language input and/or use. Although the amount of heritage language input and use accounts for major differences between heritage speakers and their monolingual peers, the type of heritage language input should also be taken into account. In our study, as a point of reference, target-like trill productions were defined as variants with two or more lingual constrictions (i.e., normative trill). However, this by no means indicates that heritage speakers should produce these variants categorically, given the variability of the Spanish trill found within and across dialects. Thus, future research should carefully examine the varieties to which heritage speakers are exposed and whether heritage speakers use the variants found in their input in a consistent manner.
Supplementary Material
For supplementary material accompanying this paper, visit https://doi.org/10.1017/S1366728920000668
Acknowledgements
We would like to thank the West Los Angeles Unified School District for approving the project and the Lab School at the University of California, Los Angeles for unconditionally supporting our research.