1. Introduction
Infants show remarkable abilities in acquiring complex language systems in a short amount of time. It has been suggested that early language input, which comes (partly) in the form of “infant-directed speech” (IDS), plays a critical role in early language acquisition (Kuhl et al., 1997). Many studies have demonstrated that, relative to adult-directed speech (ADS), phonological categories such as vowels and tones (in tonal languages) in IDS occupy expanded acoustic space, which has typically been interpreted as a sign of enhanced acoustic contrasts, potentially providing infants with acoustically more distinct phonological categories (Kuhl et al., 1997; Liu, Kuhl, & Tsao, 2003; Uther, Knoll, & Burnham, 2007). However, accumulating evidence in recent years has shown that the (within-category) acoustic variability of vowels and tones also increases in IDS as compared to ADS (Cox et al., 2023; Cristia & Seidl, 2014; Miyazawa et al., 2017; Rosslund et al., 2022, 2024; Wang et al., 2021).
Critically, the increased vowel and tonal variability leads to non-enhanced or even reduced vowel and tonal contrasts when contrasts are quantified using both statistical centroids (i.e., means) and variances (Cox et al., 2023; Cristia & Seidl, 2014; Miyazawa et al., 2017; Rosslund et al., 2022, 2024; Wang et al., 2021). These findings begin to challenge the traditional view that IDS exhibits enhanced acoustic contrasts that support phonetic learning. Moreover, they raise the question of whether increased variability, and its potential consequences for acoustic contrasts, are observed for both vowels and tones in the same language. The present study therefore focused on Mandarin Chinese IDS, where both expanded vowel and tonal spaces have been observed (Tang et al., 2017), to gain a more comprehensive view of acoustic contrasts in IDS at both segmental and suprasegmental levels.
Traditionally, the vowel space has been conceptualised as an area defined by the centroid means of three peripheral or point vowels (typically /a/, /i/, and /u/) in the first (F1) and second (F2) formants (e.g., Kuhl et al., 1997). On average, the vowel space is expanded in IDS relative to ADS (Lovcevic et al., 2024), as observed in many languages, including English, Japanese, Mandarin, Norwegian, Russian, and Swedish (Andruski, Kuhl, & Hayashi, 1999; Burnham, Kitamura, & Vollmer-Conna, 2002; Cox et al., 2023; Kuhl et al., 1997; Liu, Kuhl, & Tsao, 2003; Miyazawa et al., 2017; Rosslund et al., 2022, 2024; Tang et al., 2017). However, such expansion is not consistently found across studies and languages (e.g., Norwegian: Englund & Behne, 2005; Jamaican English: Wassink, Wright, & Franklin, 2007; English: Green et al., 2010; Dutch: Benders, 2013; Cantonese: Xu Rattanasone et al., 2013; Danish: Cox et al., 2023).
In tonal languages such as Mandarin Chinese and Cantonese, where lexical tones with different pitch contours are used to differentiate word meanings, an expanded tonal space has also been observed in IDS, as reflected by acoustically more distinct pitch contour features (Tang et al., 2017; Xu Rattanasone et al., 2013). These expanded acoustic spaces have been widely interpreted as a sign of enhanced phonemic contrasts, which might provide infants with acoustically more distinct categories and thereby facilitate phonetic learning (e.g., Kuhl et al., 1997; Uther, Knoll, & Burnham, 2007; Xu Rattanasone et al., 2013). This account is further supported by evidence that mothers’ vowel space area is associated with their infants’ speech discrimination ability as well as vocabulary size (Kalashnikova & Burnham, 2018; Liu, Kuhl, & Tsao, 2003).
Vowel and tonal spaces are defined only in terms of category centroids, without considering the degree of variability within each category. However, to define acoustic contrastiveness, it is necessary to consider both category means and variances (Miyazawa et al., 2017). IDS has been reported to have acoustically more variable formants or pitch contours than ADS, resulting in increased within-category variability of vowels and/or tones (Cox et al., 2023; Cristia & Seidl, 2014; Miyazawa et al., 2017; Rosslund et al., 2022, 2024; Wang et al., 2021; but see McClay et al., 2021 for counter-evidence in ni-Vanuatu IDS). Such increased variability has been shown to result in increased proximity or even overlap between vowel categories, as evident in English, Japanese, and Norwegian, outweighing the benefit of vowel space expansion and resulting in non-enhanced acoustic vowel contrasts (Cristia & Seidl, 2014; Miyazawa et al., 2017; Rosslund et al., 2022, 2024). For instance, Miyazawa et al. (2017) examined three acoustic characteristics of IDS in the three Japanese vowels /a/, /i/, and /u/: vowel space area, acoustic variability for each vowel (in F1 and F2 values), and acoustic contrasts between vowels.
That study measured acoustic contrasts using Mahalanobis distance, capturing both between-vowel distinctiveness and within-vowel variability. Although Japanese IDS is characterised by an expanded vowel space area, the acoustic variability of all vowel categories is also increased. As a result, the acoustic contrasts (Mahalanobis distance) do not differ significantly between IDS and ADS. Similarly, Wang et al. (2021) found that, although the tonal space area is expanded in Cantonese IDS relative to ADS, the increased within-category variability of lexical tones leads to non-enhanced tonal contrasts (calculated using a method similar to Mahalanobis distance).
These results thus begin to challenge the assumed link between, on the one hand, enhanced acoustic contrasts and expanded vowel/tone space in IDS and, on the other hand, the role vowel/tone space plays in phonetic learning. For instance, increased vowel variability in caregivers’ IDS has been found to correlate negatively with several of their infants’ language outcomes at 18 and 24 months (Rosslund et al., 2022). Computational modelling has also shown that IDS does not yield more discriminable or more robust vowel categories than ADS, primarily due to its increased acoustic variability (Kirchhoff & Schimmel, 2005; Ludusan, Mazuka, & Dupoux, 2021). However, evidence regarding increased vowel variability in IDS and its potential consequences for vowel contrasts comes from only a few languages (Danish: Cox et al., 2023; English: Cristia & Seidl, 2014; Japanese: Miyazawa et al., 2017; Norwegian: Rosslund et al., 2022, 2024), and tonal variability in IDS has been examined in only one language, namely Cantonese (Wang et al., 2021). No study has thus far examined both vowel and tonal variability in the same language to gain a more holistic view of acoustic contrasts in IDS at both segmental and suprasegmental levels.
The present study therefore focused on vowel and tonal variability in Mandarin IDS, which is characterised by expanded vowel and tonal spaces (Tang et al., 2017). We asked (1) whether IDS is characterised by increased within-category acoustic variability of vowels and tones as compared to ADS, and (2) whether IDS differs from ADS in terms of acoustic contrasts between vowels and between tones. Based on previous findings (e.g., Cox et al., 2023; Cristia & Seidl, 2014; Miyazawa et al., 2017; Rosslund et al., 2022, 2024; Wang et al., 2021), we predicted that both vowels and tones would show greater variability in IDS, with non-enhanced or reduced acoustic contrasts compared to ADS.
2. Methods
2.1. Corpus
This study re-analysed the data presented in Tang et al. (2017), which included target stimuli consisting of three peripheral vowels (/a/, /i/, and /u/) and three simple-contour tones (T1, T2, and T4), produced in IDS and ADS by 15 Mandarin-speaking mothers of 11–13-month-old infants (7 boys and 8 girls). These mothers were born and raised in Mandarin-speaking families in Northern China (Beijing, Hebei Province, or northeastern China), and they spoke only Mandarin Chinese to their infants at home. Target stimuli were disyllabic nouns that could be elicited with toys, i.e., “pa2 chong2 (爬虫)” worm, “pi2 qiu2 (皮球)” ball, and “pu2 ti2 (菩提)” Bodhi for target vowels, and “zhen1 zhu1 (珍珠)” pearl, “shan1 zhu2 (山竹)” mangosteen, and “guang1 zhu4 (光柱)” light stick for target tones. Target productions in IDS were elicited from mothers in a play session with their infants using these toys. Target productions in ADS were elicited in a conversation between the mothers and a native Mandarin speaker about the play sessions. Recordings were made in a sound-proof room while mothers wore a head-mounted condenser microphone (AKG C520) connected to a solid-state recorder (Marantz PMD661MKII) in a shoulder bag. All noise-free realisations of vowel targets were included in the vowel analysis. As target tones occurred in the second syllable of the disyllabic target words, following a T1 syllable, tone sandhi effects on target tones should be minimal. Furthermore, to minimise anticipatory tonal coarticulation effects from following tones (Xu, 1994), the tonal analysis was restricted to target words in utterance-final position, defined as being followed by a pause longer than 2 seconds. The included data consisted of 997 vowel tokens (IDS: 572 tokens; ADS: 425 tokens) and 455 tonal tokens (IDS: 255 tokens; ADS: 200 tokens)Footnote 1.
2.2. Acoustic parameter extraction
As in Tang et al. (2017), vowels were characterised by F1 and F2 values (in Bark), extracted from the middle portion (40%–60%) of each target vowel production. Tones were originally measured in semitones (reference: 50 Hz) and characterised by two pitch-contour parameters: slope and curvature. These parameters capture the global pitch movement of Mandarin tones, which is the most important correlate in tone perception, especially for infants (Gauthier, Shi, & Xu, 2007). To obtain these two parameters, a second-order polynomial was fitted to the pitch contour of each target tone, and the linear and quadratic trends were extracted as the pitch contour slope and curvature values, respectively. A positive or negative slope indicates a rising or falling contour; a positive or negative curvature value reflects a concave- or convex-shaped contour (see Tang et al., 2019 for a more detailed description).
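The slope and curvature extraction described above can be sketched in R, the language used for the statistical analyses below. This is a minimal illustration under stated assumptions, not the authors' script: the helper name `tone_shape()` is hypothetical, the pitch track is assumed to be a vector of equally spaced F0 values in Hz with no missing frames, and time is assumed to be normalised to [-1, 1]. The original analysis may have used orthogonal polynomials or a different time scale, so absolute coefficient values need not match those of Tang et al.

```r
# Hypothetical helper (not the authors' script): summarise one tone token by
# the linear (slope) and quadratic (curvature) coefficients of a second-order
# polynomial fitted to its pitch contour in semitones.
tone_shape <- function(f0_hz, ref_hz = 50) {
  st <- 12 * log2(f0_hz / ref_hz)           # Hz -> semitones re 50 Hz
  x <- seq(-1, 1, length.out = length(st))  # assumption: equally spaced frames,
                                            # time normalised to [-1, 1]
  fit <- lm(st ~ x + I(x^2))                # st = b0 + b1 * x + b2 * x^2
  b <- unname(coef(fit))
  c(slope = b[2], curvature = b[3])         # sign conventions as in the text
}

# Made-up contour generated from a known polynomial (in semitones)
time_norm <- seq(-1, 1, length.out = 30)
f0 <- 50 * 2^((20 + 3 * time_norm + 1.5 * time_norm^2) / 12)
tone_shape(f0)  # slope = 3, curvature = 1.5, up to floating-point error
```

Centring time on a symmetric grid has a practical advantage: the linear term is then orthogonal to both the intercept and the quadratic term, so the slope estimate does not change when the curvature term is added or removed.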
2.3. Calculation of dependent variables
This study used ellipse area and Mahalanobis distance as the two measures in the following analyses: the former reflects the degree of within-category variability, and the latter reflects the acoustic contrastiveness between categories.
Vowel or tonal variability was characterised as the ellipse area of each category, based on F1 and F2 values (Hartman, Ratner, & Newman, 2017; Formula 1) or on pitch slope and curvature values (Formula 2), where a larger ellipse area indicates acoustically more variable tokens within a category. Note that only speakers contributing at least two tokens per vowel or tone were included, as calculating SD values requires at least two data points:


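Because Formulas 1 and 2 are not reproduced here, the following R sketch shows one common way to compute a two-dimensional ellipse area from token SDs. It is an assumption, not necessarily the exact formula of Hartman, Ratner, and Newman (2017): it takes the area of an ellipse whose semi-axes are the SDs along the two dimensions (F1 and F2 in Bark, or pitch slope and curvature), i.e., π × SD1 × SD2, and returns NA for fewer than two tokens, mirroring the inclusion criterion above. The function name and the toy values are hypothetical.

```r
# Hypothetical sketch: within-category variability as an ellipse area.
# Assumption: area = pi * SD(dim 1) * SD(dim 2); the published formula may
# scale the axes differently (e.g. 2 SD) or use the covariance matrix.
ellipse_area <- function(d1, d2) {
  stopifnot(length(d1) == length(d2))
  if (length(d1) < 2) return(NA_real_)  # sd() needs at least two tokens
  pi * sd(d1) * sd(d2)                  # sd() uses the n - 1 denominator
}

# Made-up F1/F2 values (Bark) for one speaker's /i/ in each register
ads <- data.frame(F1 = c(3.0, 3.2, 2.9, 3.1), F2 = c(14.0, 14.3, 13.9, 14.2))
ids <- data.frame(F1 = c(2.6, 3.6, 2.8, 3.8), F2 = c(13.6, 14.8, 13.7, 14.6))
ellipse_area(ads$F1, ads$F2)
ellipse_area(ids$F1, ids$F2)  # larger: the IDS tokens are more spread out
```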
Following Miyazawa et al. (2017), the acoustic contrast between two phonological categories was indexed by the averaged Mahalanobis distance (Mahalanobis, 1936), which captures the difference between members of a pair relative to their summed variance. The Mahalanobis distance between two vowels (e.g., /a/ and /i/) was calculated as follows:

Similarly, the Mahalanobis distance between two tones (e.g., T1 and T2) was calculated as follows:

Because variance appears in the denominator, an increase in within-category variance (with the distance between category means held constant) makes the Mahalanobis distance smaller, indicating a reduced acoustic contrast between categories. Conversely, a decrease in variance makes the Mahalanobis distance larger, indicating an increased acoustic contrast. The Mahalanobis distances of all vowel or tonal pairs were averaged for each speaker, and these averages were compared between IDS and ADS across speakers.
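A minimal R sketch of this contrast measure follows. Since the vowel and tone formulas are not reproduced here, it assumes the form suggested by the description in the text: the distance between two category means scaled by the sum of the two categories' covariance matrices, computed with base R's `mahalanobis()` (which returns the squared distance, hence the square root). Other variants, such as an averaged or pooled covariance, would give different absolute values. Helper names and the simulated data are hypothetical. The example also illustrates the point made above: spreading every category's tokens twice as far from its mean, while keeping the means fixed, halves the averaged distance.

```r
# Hypothetical sketch: Mahalanobis-type distance between two categories,
# each given as a matrix of tokens (rows) by acoustic dimensions (columns).
# Assumption: the mean difference is scaled by the SUM of the covariances.
category_distance <- function(a, b) {
  a <- as.matrix(a)
  b <- as.matrix(b)
  s <- cov(a) + cov(b)                             # summed covariance
  d2 <- mahalanobis(colMeans(a), colMeans(b), s)   # squared distance
  sqrt(unname(d2))
}

# Average the distance over all pairs of categories for one speaker/register
averaged_distance <- function(categories) {
  pairs <- combn(names(categories), 2, simplify = FALSE)
  mean(vapply(pairs, function(p) {
    category_distance(categories[[p[1]]], categories[[p[2]]])
  }, numeric(1)))
}

# Simulated (made-up) F1/F2 tokens in Bark for /a/, /i/, /u/
set.seed(1)
make_cat <- function(mu) cbind(F1 = rnorm(10, mu[1], 0.3), F2 = rnorm(10, mu[2], 0.3))
speaker <- list(a = make_cat(c(8, 11)), i = make_cat(c(3, 15)), u = make_cat(c(3.5, 8)))

# Scale each category's spread around its own mean by k (means unchanged)
spread <- function(m, k) sweep(sweep(m, 2, colMeans(m)) * k, 2, colMeans(m), "+")

d_base   <- averaged_distance(speaker)
d_spread <- averaged_distance(lapply(speaker, spread, k = 2))
d_spread / d_base  # 0.5: doubling within-category spread halves the distance
```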
2.4. Statistical analysis
Group comparisons were conducted using linear mixed-effects models, as implemented in the R package “lme4” (Bates, Mächler, Bolker, & Walker, 2015; R Core Team, 2022). Final models with the simplest justified random-effects structure were obtained through backward elimination, starting with the maximal model justified by the design (Barr, Levy, Scheepers, & Tily, 2013). Specifically, we used the anova function to identify non-significant random slopes and excluded them to keep the models parsimonious (Bates, Mächler, Bolker, & Walker, 2015; the R code of each final model is provided with its description). The residual plot and quantile–quantile (Q–Q) plot for each model were generated using the resid_panel function from the “ggResidpanel” package and visually inspected to confirm that the model residuals were approximately normally distributed (Goode & Rey, 2019). All p-values were obtained using the anova function of the package “lmerTest,” which provides omnibus F-tests for the main effects and interactions of all factors (Kuznetsova, Brockhoff, & Christensen, 2017). When a main effect or interaction was significant (p < 0.05), Tukey-HSD post-hoc comparisons were conducted using the “lsmeans” package (Lenth, 2018). Effect sizes (partial eta squared for F-statistics and Cohen’s d for t-statistics) were calculated using the eta_sq function of the package “sjstats” (Lüdecke, 2018) and the cohensD function of the package “lsr” (Navarro, 2015), respectively. The confidence interval (CI) for partial eta squared was calculated using the ci.pvaf function of the package “MBESS” (Kelley, 2023).
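The model-selection and testing steps above can be sketched in R as follows. This is an illustration on simulated data, not the authors' script: the data frame `d` and its values are made up, only a single by-speaker random slope is tested, and the post-hoc step uses the “emmeans” package, the successor to “lsmeans”, swapped in here. The packages “lmerTest” (which loads “lme4”) and “emmeans” must be installed.

```r
library(lmerTest)  # loads lme4; adds F-tests to anova() for single models
library(emmeans)   # successor to lsmeans, used here for post-hoc contrasts

# Simulated, made-up data: 15 speakers x 2 registers x 3 vowels
set.seed(42)
d <- expand.grid(Speaker  = factor(sprintf("S%02d", 1:15)),
                 Register = factor(c("ADS", "IDS")),   # reference level: ADS
                 Vowel    = factor(c("a", "i", "u")))  # reference level: a
d$ellipse <- 2 + 0.5 * (d$Register == "IDS") +
  rnorm(15)[d$Speaker] +                # speaker-specific intercepts
  rnorm(nrow(d), sd = 0.8)              # residual noise

# 1. Maximal model justified by the design (by-speaker slope for Register)
m_max <- lmer(ellipse ~ Register * Vowel + (1 + Register | Speaker), data = d)
# 2. Simpler random-effects structure (random intercept only)
m_int <- lmer(ellipse ~ Register * Vowel + (1 | Speaker), data = d)
# 3. Likelihood-ratio test of the random slope; refit = FALSE compares the
#    REML fits, since only the random effects differ
lrt <- anova(m_max, m_int, refit = FALSE)
final <- if (lrt[["Pr(>Chisq)"]][2] < 0.05) m_max else m_int

# 4. Omnibus F-tests (Satterthwaite df) and IDS-vs-ADS contrasts per vowel
anova(final)
emmeans(final, pairwise ~ Register | Vowel)
```

With simulated data like these, the maximal model may report a singular fit; that message is expected when the random-slope variance is near zero and is itself a sign that the slope can be dropped.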
3. Results
3.1. Vowel and tonal variability
Figure 1 illustrates the acoustic distribution of all vowel and tonal tokens in IDS and ADS, and Figure 2 shows the acoustic variability (i.e., ellipse area) for each vowel or tonal category (see also Appendix 1 for a detailed description). Two separate linear mixed-effects models were fitted to the vowel and tonal ellipse areas to investigate the register difference in variability. Both models included two fixed factors, “Register” (ADS and IDS; dummy coded with reference level: ADS) and either “Vowel” (/a/, /i/, and /u/; dummy coded with reference level: /a/) or “Tone” (T1, T2, and T4; dummy coded with reference level: T1), and a random intercept by “Speaker” (R code of the two final models: Vowel ellipse ~ Register * Vowel + (1|Speaker)Footnote 2; Tonal ellipse ~ Register * Tone + (1|Speaker)Footnote 3).

Figure 1. Acoustic distribution of all vowel and tonal tokens in IDS and ADS.

Figure 2. Ellipse area as an index of (a) vowel variability across vowels and registers or (b) tonal variability across tones and registers, with error bars indicating ±1 standard error.
The results of the vowel variability model showed a significant interaction between “Register” and “Vowel” (Appendix 2). Tukey-HSD post-hoc tests showed that, relative to ADS, variability increased significantly in IDS for /i/ and /u/ (/i/: β = 1.080, SE = 0.47, df = 39, t = 2.308, p = 0.026, Cohen’s d = 0.82; /u/: β = 2.180, SE = 0.47, df = 39, t = 4.660, p < 0.001, Cohen’s d = 1.22), but there was no evidence for increased variability of /a/ (β = -0.063, SE = 0.47, df = 39, t = -0.134, p = 0.894, Cohen’s d = 0.03).
The results of the tonal variability model showed a significant main effect of “Register” (Appendix 3), indicating increased tonal variability in IDS as compared to ADS. Therefore, consistent with our hypotheses, variability increased in IDS for the high vowels (see the Discussion for the unexpected result for /a/) and across the investigated tones, providing evidence for increased within-category variability in (most) Mandarin Chinese vowels and tones.
3.2. Acoustic contrasts between vowels or tones
Figure 3 illustrates how acoustic contrasts (averaged Mahalanobis distance) differ between IDS and ADS for (a) vowels and (b) tones (see also Appendix 4 for a detailed description). Two separate linear mixed-effects models were constructed with the averaged Mahalanobis distance for either vowels or tones as the dependent measure. Both models included one fixed factor, “Register” (ADS and IDS; dummy coded with reference level: ADS), and a random intercept by “Speaker” (R code of the two final models: Averaged Mahalanobis distance for vowels ~ Register + (1 | Speaker)Footnote 4; Averaged Mahalanobis distance for tones ~ Register + (1 | Speaker)Footnote 5). The main effect of “Register” on the averaged Mahalanobis distance was not significant for either vowels (F(1, 14) = 4.238, p = 0.059, ηp² = 0.23 [95% CI: 0.00, 0.52]) or tones (F(1, 27) = 0.002, p = 0.968, ηp² = 0.00006 [95% CI: 0.00, 0.02]).
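As a check on the reported effect sizes, partial eta squared can be recovered from an F-statistic and its degrees of freedom as ηp² = (F × df1)/(F × df1 + df2), and a CI can be obtained by inverting the noncentral F distribution. The R sketch below uses only base R (`pf()` with its `ncp` argument, and `uniroot()`). The function names are hypothetical, and converting the noncentrality limits with λ/(λ + N), where N = df1 + df2 + 1, is an assumption about what `ci.pvaf` does, so the output is not guaranteed to match MBESS exactly. Because the reported F-values are rounded, recomputed values may differ in the last digits (e.g., about 0.00007 rather than 0.00006 for tones).

```r
# Partial eta squared from an F-test: F * df1 / (F * df1 + df2)
partial_eta_sq <- function(f_value, df1, df2) (f_value * df1) / (f_value * df1 + df2)

# Hypothetical CI for the proportion of variance accounted for: find the
# noncentrality parameters whose F cdf at the observed F equals
# 1 - alpha/2 (lower limit) and alpha/2 (upper limit).
pvaf_ci <- function(f_value, df1, df2, conf = 0.95, N = df1 + df2 + 1) {
  alpha <- (1 - conf) / 2
  cdf <- function(ncp) pf(f_value, df1, df2, ncp = ncp)  # decreasing in ncp
  search <- c(0, max(200, 10 * f_value * df1))           # uniroot() bracket
  lam_lo <- if (cdf(0) < 1 - alpha) 0 else
    uniroot(function(l) cdf(l) - (1 - alpha), search)$root
  lam_hi <- if (cdf(0) < alpha) 0 else
    uniroot(function(l) cdf(l) - alpha, search)$root
  c(lower = lam_lo / (lam_lo + N), upper = lam_hi / (lam_hi + N))
}

partial_eta_sq(4.238, 1, 14)  # vowels: about 0.23
partial_eta_sq(0.002, 1, 27)  # tones: about 0.00007 (F is rounded)
pvaf_ci(4.238, 1, 14)         # lower limit 0, since p > 0.05
pvaf_ci(0.002, 1, 27)
```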

Figure 3. Averaged acoustic contrasts (Mahalanobis distance) for (a) vowels or (b) tones in IDS vs. ADS.
Null results for the “Register” effect on both vowels and tones provide no evidence for or against enhanced acoustic contrasts in IDS. However, if anything, the vowel results are compatible with larger contrasts in ADS, with the 95% CI ranging from ηp² = 0.00, suggesting no difference between the registers, to ηp² = 0.52, suggesting (much) larger contrasts in ADS. We therefore confidently interpret the vowel results as evidence that vowel contrasts are not enhanced in Mandarin IDS as compared to ADS. While the tonal data do not rule out enhancement in IDS, the point estimate of the associated effect size (ηp² = 0.00006) falls below even the conventional benchmark for a “small” effect (ηp² = 0.01), and the upper bound of the CI (ηp² = 0.02) falls well below a “medium” effect size of ηp² = 0.06 (see Richardson, 2011 for a review). Therefore, we tentatively interpret these frequentist null results as evidence that tonal contrasts are not meaningfully enhanced in Mandarin IDS as compared to ADS.
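The benchmark comparison for the tonal effect size can be made explicit with a small R helper. It assumes the conventional cut-offs for (partial) eta squared reviewed by Richardson (2011), i.e., 0.01 (small), 0.06 (medium), and 0.14 (large); the function name and labels are hypothetical.

```r
# Hypothetical helper: label (partial) eta squared values using the
# conventional cut-offs 0.01 / 0.06 / 0.14 (small / medium / large).
eta_label <- function(eta) {
  as.character(cut(eta,
                   breaks = c(-Inf, 0.01, 0.06, 0.14, Inf),
                   labels = c("below small", "small", "medium", "large"),
                   right = FALSE))  # intervals closed on the left
}

# Tonal contrasts: point estimate and upper CI bound reported in the Results
eta_label(c(0.00006, 0.02))  # "below small" "small"
```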
4. Discussion
The aim of the current study was to better understand the acoustic characteristics of IDS by investigating variability within vowels and tones in Mandarin IDS and its consequences for acoustic contrasts. Reanalysing data from Tang et al. (2017), which showed expansion of both vowel and tonal spaces in IDS, the present study found increased acoustic variability for both vowels (except for /a/, discussed below) and tones. This, in turn, resulted in no evidence for enhanced contrasts between vowels or between tones in IDS compared to ADS.
Our findings of increased vowel and tonal variability in Mandarin IDS are consistent with the increased vowel variability reported for IDS in non-tonal languages such as Danish, English, Japanese, and Norwegian (Cox et al., 2023; Cristia & Seidl, 2014; Miyazawa et al., 2017; Rosslund et al., 2022, 2024) and with the increased tonal variability reported for Cantonese IDS (Wang et al., 2021). This suggests that increased variability might be a common feature of vowels and tones in IDS across languages (but see McClay et al., 2021, for non-increased variability in ni-Vanuatu IDS).
Furthermore, the non-enhanced vowel and tonal contrasts in Mandarin IDS confirm and extend previous findings on IDS in the aforementioned non-tonal languages (e.g., Cox et al., 2023; Cristia & Seidl, 2014; Miyazawa et al., 2017; Rosslund et al., 2022, 2024), challenging the traditional view that IDS might provide infants with acoustically more distinct phonetic targets that benefit language development. The current study thus underscores the importance of characterising acoustic contrasts beyond statistical centroids (e.g., means) by including variance. This is critical to our understanding of the nature of IDS (Cox et al., 2023; Cristia & Seidl, 2014; Hartman, Ratner, & Newman, 2017; McMurray et al., 2013; Miyazawa et al., 2017; Rosslund et al., 2022, 2024; Wang et al., 2021), as well as of its potential influences on phonetic learning (discussed below; Eaves et al., 2016; Kirchhoff & Schimmel, 2005; Ludusan et al., 2021; Rosslund et al., 2022).
Compared to the commonly used acoustic space, which only considers the centroid means of categories, the Mahalanobis distance offers a measure of acoustic contrast that accounts for both between- and within-category variation. Although the current IDS data have previously been shown to exhibit expanded vowel and tonal spaces (Tang et al., 2017), the Mahalanobis distance analyses here suggest that expanded spaces do not necessarily imply enhanced contrasts, in line with similar findings from English, Japanese, and Norwegian (Cristia & Seidl, 2014; Miyazawa et al., 2017; Rosslund et al., 2022, 2024). A next step in this line of enquiry would be to examine whether Mahalanobis distance is related to perception and learning. Considering that infants’ language outcomes are positively associated with vowel space enhancement, defined by centroids alone (Kalashnikova & Burnham, 2018; Liu, Kuhl, & Tsao, 2003), and negatively associated with vowel variability (Rosslund et al., 2022), such work would further clarify the relationship between different dimensions of acoustic contrast and infants’ actual language acquisition.
It should be noted that there is no evidence for (or against) a difference in the variability of /a/ between IDS and ADS. This might be related to the fact that this vowel was relatively variable even in ADS (Figure 1). Previous studies have demonstrated that low vowels (such as /a/) usually display larger inter- and intra-speaker variability than non-low (high) vowels (such as /i/ and /u/), owing to the lack of lingual bracing against the palate during the articulation of low vowels (Gick et al., 2017; Whalen et al., 2018). This vowel-specific pattern of variability calls for future studies of IDS to include more vowel types in order to obtain a full picture of the variability that infants are exposed to across the vowel system.
Our results raise questions about the possible sources of increased vowel and tonal variability in IDS. One possibility is that increased variability could be a by-product of vowel and/or tonal space expansion, i.e., the expanded space allows mothers to produce more variable instances within each category (Kuhl et al., 1997). However, a series of exploratory analyses of the current data did not provide strong evidence for associations between vowel or tonal space area and acoustic variability (as indexed by ellipse area) in IDS (Appendix 5). A second possibility is that the increased vowel and tonal variability might follow from more variable prosodic patterns in IDS, which have been suggested to serve an attention-attracting function (Fernald et al., 1989; Fernald & Kuhl, 1987). This possibility is supported by classic phonetic studies showing that prosody can affect phonetic realisation (Gay, 1978; Lindblom & Sundberg, 1971; Sapir, 1989). More work is thus needed to examine potential prosodic effects on vowel and tonal variability in IDS.
Our study also raises the question of whether infants can benefit from IDS during language learning, given that vowels and tones do not become (much) more distinct once the increased variability is taken into account. On the one hand, increased within-category variability can help infants learn words and develop robust phonological representations (Singh, 2008; Thiessen, 2011). For example, Singh (2008) found that 7.5-month-old infants trained with high-variability words (produced by the same speaker in different emotional states) were better at recognising trained words and rejecting similar-sounding words than infants trained with low-variability words (produced by the same speaker in a single emotional state). Thiessen (2011) also showed that 15–16-month-old infants exposed to the /d/–/t/ contrast with high variability, i.e., produced in multiple word contexts, were able to discriminate this contrast in novel word contexts, whereas those exposed to the contrast in a single word context failed to do so. On the other hand, recent evidence has shown that vowel variability in IDS is negatively correlated with expressive vocabulary size at 18 months (in Norwegian; Rosslund et al., 2022) and with various language abilities at 24 months (in American English; Hartman, Ratner, & Newman, 2017). Therefore, it remains an open question whether the benefits of increased variability observed in (experimental) word-learning contexts extend to multiple aspects of naturalistic language acquisition, and under what conditions variability may instead hinder language development.
There are several limitations to this study. First, while this study partially controlled for the phonological environment of target vowels and tones by restricting them to the first (vowels) or second (tones) syllable of disyllabic nouns, other factors could still influence the acoustic realisation of vowels and tones in IDS, such as the pragmatic context (Katz, Cohn, & Moore, 1996; Neer et al., 2024), infants’ responses during interaction (Outters et al., 2020; Smith & Trainor, 2008), and parents’ emotional state during speech production (Benders, 2013; Kaplan et al., 2002; Lam-Cassettari & Kohlhoff, 2020). The goal of the present study was to make a broad comparison between IDS and ADS, including all the inherent differences between the registers, thereby directly extending previous IDS findings from other languages (e.g., Cantonese, English, Japanese, and Norwegian) to Mandarin. Future studies could examine whether these and other factors contribute to the increased acoustic variability in IDS, using more controlled designs and larger datasets to pinpoint sources of within-speaker variance and their potential implications for phonetic learning.
Second, because only tones in the second syllable of a disyllabic word in utterance-final position were analysed, the current tonal results cannot be generalised to other positions. This choice was motivated by the need to control for carryover tonal coarticulation from the first syllable (T1) of the disyllabic word while minimising anticipatory tonal coarticulation from following words. Future research could examine tonal variability in a wider range of contexts to provide a more comprehensive understanding of how tones are realised in IDS.
5. Conclusions
The current study demonstrated that Mandarin vowels (with the exception of /a/) and tones in IDS exhibit increased (within-category) variability, resulting in non-enhanced vowel and tonal contrasts. Thus, IDS might not provide infants with acoustically well-specified phonetic targets that facilitate phonological acquisition as directly as is often assumed. Rather, IDS might provide infants with considerable within-speaker phonetic variability. Future research should examine the linguistic and communicative factors that drive the increased acoustic variability in IDS, as well as its potential positive or negative impacts on language acquisition.
Supplementary material
The supplementary material for this article can be found at http://doi.org/10.1017/S0305000925000133.
Acknowledgements
This research was funded by the National Social Science Fund of China (20CYY012).