Pitch properties of infant-directed speech specific to word-learning contexts: a cross-linguistic investigation of Mandarin Chinese and Dutch

Mengru HAN; Nivja H. DE JONG; René KAGER

doi:10.1017/S0305000919000813

Published online by Cambridge University Press: 03 December 2019

Mengru HAN: Affiliation:
Department of Chinese Language and Literature, East China Normal University, Shanghai, China; Utrecht Institute of Linguistics (OTS), Utrecht University, Utrecht, the Netherlands; Language, Cognition, and Evolution Lab, East China Normal University, Shanghai, China
Nivja H. DE JONG: Affiliation:
Leiden University Center for Linguistics (LUCL), Leiden University, Leiden, the Netherlands; Leiden University Graduate School of Teaching (ICLON), Leiden University, Leiden, the Netherlands
René KAGER*: Affiliation:
Utrecht Institute of Linguistics (OTS), Utrecht University, Utrecht, the Netherlands
*Corresponding author. E-mail: [email protected]

Abstract

This study investigates the pitch properties of infant-directed speech (IDS) specific to word-learning contexts in which mothers introduce unfamiliar words to children. Using a semi-spontaneous story-book telling task, we examined (1) whether mothers made distinctions between unfamiliar and familiar words with pitch in IDS compared to adult-directed speech (ADS); (2) whether pitch properties change when mothers address children from 18 to 24 months; and (3) how Mandarin Chinese and Dutch IDS differ in their pitch properties in word-learning contexts. Results show that the mean pitch of Mandarin Chinese IDS was already ADS-like when children were 24 months, but Dutch IDS remained exaggerated in pitch at the same age. Crucially, Mandarin Chinese mothers used a higher pitch and a larger pitch range in IDS when introducing unfamiliar words, while Dutch mothers used a higher pitch specifically for familiar words. These findings contribute to the language-specificity of prosodic input in early lexical development.

Keywords

infant-directed speech; word learning; cross-linguistic

Type: Articles
Information: Journal of Child Language, Volume 47, Special Issue 1: The Influence of Input Quality and Communicative Interaction on Language Development Part 1, January 2020, pp. 85–111

DOI: https://doi.org/10.1017/S0305000919000813
Creative Commons: This is an Open Access article, distributed under the terms of the Creative Commons Attribution-NonCommercial-ShareAlike licence (http://creativecommons.org/licenses/by-nc-sa/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the same Creative Commons licence is included and the original work is properly cited. The written permission of Cambridge University Press must be obtained for commercial re-use.
Copyright: Copyright © The Author(s) 2019

Introduction

Infant-directed speech (IDS) is an important type of input in early language acquisition (Ramírez-Esparza, García-Sierra, & Kuhl, 2014). Prototypical IDS has exaggerated prosody compared to adult-directed speech (ADS), and is often considered universal across languages and cultures (see reviews in Cristia, 2013; Soderstrom, 2007). IDS prosody has been shown to facilitate word learning; toddlers learn words better from IDS compared to ADS (Graf Estes & Hurley, 2013; Ma, Golinkoff, Houston, & Hirsh-Pasek, 2011), and some aspects of IDS have been associated with children's vocabulary size (e.g., Kalashnikova & Burnham, 2018; Porritt, Zinser, Bachorowski, & Kaplan, 2014). However, the language-specificity of IDS is often neglected in the literature (see Wang, Seidl, & Cristia, 2016, for a review). Also, no study to date has specifically investigated the prosody of IDS in word-learning contexts in which mothers introduce unfamiliar words to children. Specifically, it is not clear whether mothers use prosody to highlight novel (unfamiliar) words compared to known (familiar) words when addressing children. Furthermore, as most studies focus on IDS addressed to preverbal children, less is known about the age-related changes of IDS prosody in the second year of life when vocabulary learning is accelerated (Goldfield & Reznick, 1990). To better understand the language-specificity of IDS prosody in word-learning contexts and to demonstrate the age-related changes of IDS prosody in the second year of life, we conducted a cross-linguistic investigation of IDS using similar speech elicitation methods in two different languages.
Since we focus on pitch cues relevant to word learning in linguistic input, we chose two languages that differ in their use of pitch at the lexical level, namely Mandarin Chinese, a tonal language, and Dutch, a non-tonal language.

IDS facilitates lexical development

Typically developing children acquire their vocabulary at a fast speed in the first two years of life. They recognize some common words at 6–9 months (Bergelson & Swingley, 2012), start to produce words by the end of their first year (Bloom, 2001), and become proficient word learners at around 18 months old. From about 18 months to 24 months, children's word learning ability gradually improves and their vocabulary size rapidly increases (Bion, Borovsky, & Fernald, 2013; Goldfield & Reznick, 1990). In order to learn words, children need to be familiar with the sounds in their native language(s) and must be able to segment words from continuous speech, recognize familiar words in speech, and associate a novel word label to an object or an action.

Children learn words from language input; however, little is known about the quality of prosodic input in word-learning contexts, in which mothers introduce unfamiliar words to their child. IDS is an important type of input in early language acquisition, and has a distinctive prosody compared to ADS. Prototypical IDS is mainly characterized by a higher pitch, a larger pitch range, exaggerated F0 excursions, and a slower speaking rate (Benders, 2013; Cristia, 2013). These prosodic modifications in IDS have been shown to attract infants’ attention, convey positive affect, and facilitate language acquisition (Kitamura, Thanavishuth, Burnham, & Luksaneeyanawin, 2002; Soderstrom, Blossom, Foygel, & Morgan, 2008). In order to investigate the role of IDS in lexical development, studies often compare the prosodic characteristics of IDS with those of ADS (see Thorson, 2018, for a review).

Two lines of studies have shown that the prosody and, in particular, the pitch properties of IDS may play a significant role in children's lexical development. The first line of research shows evidence that the pitch properties of IDS correlate with children's vocabulary size, and the second line directly compares children's word learning performance under either ADS or IDS conditions. Regarding the first line of research, only a few studies have investigated the correlations between IDS pitch and language outcomes, and the results are inconsistent. For instance, Porritt et al. (2014) found that English-speaking mothers who had a higher F0 range in their speech had children with larger expressive vocabulary. However, in a recent study, Kalashnikova and Burnham (2018) did not find any correlation between the exaggeration of pitch in Australian English IDS and children's vocabulary size. As the authors suggested, the pitch modifications in IDS may not be related to the facilitative effects of IDS on language acquisition. Taken together, whether pitch properties in IDS are related to vocabulary size is still unclear.

The second line of research suggests that children generally perform better in tasks related to lexical acquisition when they hear prototypical IDS compared to ADS. For example, English- and German-learning children could only segment words in continuous speech from IDS input but not when hearing ADS input (Thiessen, Hill, & Saffran, 2005; Mani & Pätzold, 2016). English-learning infants were able to recognize words that they were familiarized with in IDS even after 24 hours, but not when the words were introduced in ADS (Singh, Nestor, Parikh, & Yull, 2009). When it comes to word-to-object mapping, Ma et al. (2011) showed that English-learning 21-month-old children succeeded at word-to-object mapping when listening to the auditory forms of words presented in IDS but not in ADS. Only after 27 months of age could children learn novel words presented in ADS. Similarly, Graf Estes and Hurley (2013) found that 17.5-month-old English-learning children only learned word–object pairings when the words were produced in IDS, but they failed to learn words in the ADS condition. The facilitative effects of IDS on word learning are not restricted to behavioral evidence. Zangl and Mills (2007) found that IDS increased infants’ neural activity compared with ADS. Specifically, 6-month-old English-learning children only showed increased neural activity in response to familiar words in IDS compared to ADS, and when children reached 13 months of age they showed increased neural activity for both familiar and unfamiliar words in IDS. It should be noted that the prototypical IDS stimuli in these studies had both exaggerated pitch and a slower speaking rate compared to the ADS stimuli.
Thus, while the studies illustrated above invariably suggest that prototypical IDS facilitates children's online word processing, it is not clear whether such facilitative effects can be (solely) attributed to exaggerated pitch. Song, Demuth, and Morgan (2010) further investigated which acoustic cues in IDS might support word recognition. Their findings suggest that slow speaking rate and vowel hyperarticulation, but not wide pitch range, significantly improved children's online word recognition.

Taken together, prototypical IDS facilitates children's online word learning, but the role of exaggerated pitch in these facilitative effects is not clear. Also, studies on the relationship between IDS pitch properties and lexical development are limited, and the results are inconsistent. In order to understand the role of IDS in word learning, it is first necessary to examine how mothers highlight unfamiliar words compared to familiar words in natural IDS. However, so far, little is known about the prosodic input in such word-learning contexts.

Pitch properties of IDS specific to word learning contexts

Hyper and Hypo-speech (H&H) theory suggests that speakers are aware of the information required by a listener and adapt their speech accordingly (Lindblom, 1990). The theory was initially proposed to explain phonetic variation in speech and has often been used to explain the ‘vowel hyperarticulation’ phenomenon in IDS, which may facilitate children's categorical learning (e.g., Kuhl et al., 1997). Based on H&H theory, Fernald (2000, p. 242) suggests that, when interacting with children, adults tend to “[…] modify their speech in ways that serve to maximize predictability for the immature listener […]”, which may consequently facilitate children's word recognition. For example, both American English and Japanese IDS have more words produced in isolation and have more repetitions (Fernald & Morikawa, 1993). Also, contextually new information is highlighted by prosodic means in IDS (e.g., Fernald & Mazzie, 1991, to be reviewed later). If IDS is adapted in a way that may facilitate word recognition, it is certainly possible that mothers distinguish unfamiliar words and familiar words with prosody in IDS in support of word learning. To our knowledge, however, no research has directly investigated the prosody of IDS in word-learning contexts by comparing how mothers distinguish unfamiliar words and familiar words with prosody when talking to children. It is important to distinguish between words that are new in a discourse context and words that are unfamiliar to the infants. Young children, as early language learners, encounter unfamiliar words on a regular basis, whereas words directed at adults are usually only ‘contextually new’ within a specific conversational context and rarely novel or unfamiliar.
Thus, we use the term ‘unfamiliar words’ to refer to words that children have not acquired, and ‘contextually new information’ to refer to parts of an utterance that are being introduced to the conversation for the first time but are not necessarily unfamiliar to the addressee. Correspondingly, the term ‘contextually given information’ refers to information that has already been established in the discourse context, while ‘familiar words’ refers to words that the child has already acquired into his or her vocabulary. In ADS, speakers usually highlight contextually new information in discourse by increasing pitch and/or enlarging pitch range, while downplaying given information by reducing these prosodic parameters or by using pronouns in place of lexical forms when mentioning a word for the second time (Chafe, 1976; Gundel, 1999; Halliday, 1967). In IDS, however, mothers usually repeat the same word several times when talking to children instead of replacing the word with a pronoun (Fernald & Simon, 1984).

Several studies have shown that such prosodic marking of contextually new words is also present in IDS, and its manifestation is different from ADS. Fernald and Mazzie (1991) were the first to examine how F0 is used to highlight contextually new words in English IDS compared to ADS. The target words used in the study were common clothing words (e.g., ‘shorts’ and ‘socks’). To elicit target words, mothers of 14-month-old children were instructed to describe a picture-book containing six target items, introduced successively, to their child and to an adult. They found that mothers typically placed the F0 peak of the utterance on the target words when they introduced contextually new words in IDS; however, the same pattern did not hold true for ADS. Moreover, the second-mention target items (i.e., contextually given words) also showed a greater tendency to occur on F0 peaks in IDS versus ADS. In addition, mothers tended to increase the maximum F0 on the second mentions of a target word compared to the first mentions in IDS. The authors interpreted these results as evidence that prosodic emphasis is placed on both contextually new and contextually given words in IDS – a phenomenon that is not typical in ADS. Even though the authors noted that the familiarity of the target words might vary among the infants, it was not taken into account in their analysis.

Fisher and Tokura (1995) also compared the production of contextually new (first-mention) and contextually given (second-mention) words in English IDS and ADS, but they differed from Fernald and Mazzie (1991) in their elicitation methodology. In this study, mothers of 14-month-old children watched a puppet show which consisted of ten events acted out with ten puppets. The names of the puppets were target words (e.g., ‘tiger’, ‘lion’, and ‘giraffe’). Mothers were asked to describe the events to their child and to an adult. In each event, two puppets were engaged in an action. One puppet (a giraffe) was always in the scene, and the other animal puppet differed across events. The target puppet was always the patient of an action. For example:

“Your favorite. That's a giraffe. Look he's is petting, and lovin’ on the giraffe. Look he's petting’ him, pettin’ him.” (Fisher & Tokura, 1995, p. 293)

The prosodic correlates (e.g., F0, position relative to pitch peak, duration, and amplitude) of vowels in the first and second mentions of the target words were analyzed. The results showed that, in IDS, vowels in second mentions had a lower pitch, a smaller pitch range, and a shorter duration, indicating that contextually new words were prosodically prominent compared to contextually given words. These results suggest that a given–new contrast does exist prosodically in IDS; however, contrary to Fernald and Mazzie (1991), the authors conclude that the given–new contrast in IDS is similar to the pattern in ADS. As most of the target words were reported as “unknown to the infants” by the mothers (Fisher & Tokura, 1995, p. 292), the familiarity of words was not taken into account as a factor in their analysis.

Since mothers tend to repeat a word several times in IDS, Bortfeld and Morgan (2010) extended Fisher and Tokura (1995) to multiple mentions of target words and examined how mothers of preverbal children (9- and 10-month-olds) mark contextually new and given information across multiple mentions. They used the same methods as in Fisher and Tokura but conducted their prosodic analyses on entire words instead of vowels. The results showed that, when the target words were mentioned for the first time, they received prosodic prominence, while second mentions did not. Specifically, the first mentions showed higher mean F0, higher maximum F0, and longer duration in comparison to second mentions. When more mentions were measured, a significant quartic trend emerged in four acoustic measures: mean F0, maximum F0, F0 range, and duration. These results suggest that mothers alternate between stressed and unstressed realizations across multiple mentions in English IDS. This study, however, did not test ADS in the same task; thus it is not clear whether the same speech pattern would emerge if mothers were involved in the same task in an ADS condition. As in Fisher and Tokura, the familiarity of words was not controlled for in their data analysis.

Even though the results from the studies outlined above are all interpretable as evidence for the highlighting of ‘new’ words in IDS, and they all indicate the facilitating effects of IDS on word learning, none of these studies has specifically addressed the prosody of unfamiliar words in comparison with familiar words in IDS. Despite a lack of understanding about the nature of prosody specific to word-learning contexts in IDS, there is some evidence to show that children's word learning may benefit from prosodic accentuation. Männel and Friederici (2013) found that prosodic accentuation may facilitate 6- and 9-month-old children's recognition of familiar words. With regard to novel word learning, Grassmann and Tomasello (2007) demonstrated that 24-month-old children learned a novel noun only when it was prosodically accented. In their study, 24-month-old German-speaking children were taught a novel (unfamiliar) noun and a novel (unfamiliar) verb (both of which were phonotactically legal German pseudo-words) in a sentence, for example “Der Feks miekt”, in which either the noun or the verb was accented and marked by a higher pitch, a larger pitch range, and a longer duration. They found that children were able to learn the novel noun when it was both accented and novel but not when it was only accented (but not novel) or only novel (but not accented).

To summarize, previous studies have only examined the prosodic marking of contextually new information. The familiarity of words to children has not been taken into account. Consequently, the prosody of IDS in word-learning contexts is not clear from these studies. The current study thus set out to investigate the prosody and specifically the pitch cues of IDS in word-learning contexts. If mothers specifically manipulate pitch in IDS in order to facilitate word learning, they would have an exaggerated pitch (i.e., higher pitch and larger pitch range) when they introduce unfamiliar words as compared to familiar words. Furthermore, it should be noted that previous studies on the prosodic marking of new information were all conducted on English-speaking dyads. It remains unknown whether these results can be generalized to other languages with different prosodic characteristics.

Language-universal and language-specific pitch modifications in IDS

The exaggerated prosody of IDS is found in almost all languages and cultures, with only a few exceptions such as Quiché Mayan (Bernstein Ratner & Pye, 1984; Ingram, 1995). IDS is thus often considered to exist universally across languages and cultures. In most studies on IDS, the speech samples from IDS conditions are natural mother–child interactions or semi-structured play sessions in laboratory settings, while the speech samples from ADS conditions are conversations or interviews with an experimenter. Eliciting speech in such a way ensures the naturalness of speech data, but the content and contexts of speech data in natural mother–child interactions differ to a large extent, making it difficult to directly compare the results between studies on different languages.

Also, cross-linguistic comparisons of IDS are scarce. The few existing cross-linguistic investigations have only examined its generally exaggerated prosody, showing that the differences among IDS in different languages are mainly related to the degree of prosodic exaggeration. For example, even though IDS prosody in all these languages is exaggerated compared to ADS in the same language, the difference in mean pitch between American English ADS and IDS is larger than in British English, French, Italian, German, or Japanese (Fernald et al., 1989). To our knowledge, Grieser and Kuhl (1988) were the first to compare IDS in non-tonal languages (American English and German) and a tonal language (Mandarin Chinese). They found that Mandarin Chinese IDS, as in American English and German, exhibits a higher pitch and larger pitch range compared to Mandarin Chinese ADS. Later, Kitamura et al. (2002) compared the pitch properties (mean pitch, pitch range, and utterance slope-F0) of spontaneous Australian English (a non-tonal language) and Thai (a tonal language) IDS in the first year of life. They found that both Australian English and Thai IDS were more exaggerated than ADS; however, Australian English IDS was generally more exaggerated with respect to pitch properties (mean pitch and pitch range) than Thai IDS. To summarize, cross-linguistic comparisons of IDS in different languages show a universal exaggeration of pitch-related properties compared to ADS, and language-specific aspects seem to be only with respect to the degree of prosodic exaggeration.

However, the prosodic differences between languages may affect IDS in a more complicated way. As mentioned above, previous cross-linguistic comparisons were conducted at the general prosodic level without taking a specific context into consideration. In word-learning contexts, different languages may employ different strategies to exaggerate general pitch properties and to highlight unfamiliar words while retaining contrastive pitch at the word level. Specifically, IDS in tonal languages and non-tonal languages may show differences in IDS pitch modifications. In non-tonal languages (e.g., English and Dutch), pitch is mainly used for intonational purposes, whereas in tonal languages (e.g., Mandarin Chinese and Thai), pitch is used to distinguish lexical meanings in addition to conveying intonational information (Yip, 2002). Lexical pitch interacts with the generally exaggerated prosody, which may affect the word and sentence prosody in IDS. This interaction may further impact the pitch in word-learning contexts when unfamiliar words need to be highlighted with pitch on top of the general intonational modifications.

To consider the cross-linguistic differences and the effect of speech contexts on IDS prosody with respect to the different uses of pitch, we set up a word-learning context in which mothers introduced unfamiliar words and familiar words to their child, using similar speech elicitation methods in the two languages: Mandarin Chinese (a tonal language) and Dutch (a non-tonal language).

So far, only a few empirical studies have investigated Dutch and/or Mandarin Chinese IDS. These studies have focused on prosodic exaggeration and vowel hyperarticulation. At the intonational level, both Dutch and Mandarin Chinese IDS, as in many other languages, have a higher pitch and a larger pitch range compared to ADS when addressing preverbal children (Grieser & Kuhl, 1988; Liu, Tsao, & Kuhl, 2009; Van de Weijer, 1999). Benders (2013) investigated Dutch IDS addressed to 11- and 15-month-old children. The results showed that the median F0 was higher and F0 excursions were larger in IDS compared to ADS at both ages. Vowels in Mandarin Chinese IDS are hyperarticulated (Liu et al., 2009; Tang, Xu Rattanasone, Yuen, & Demuth, 2017), but vowels in Dutch IDS show hypoarticulation instead (Benders, 2013). In addition, lexical tones in Mandarin Chinese IDS are hyperarticulated (Han, de Jong, & Kager, 2018a; Tang et al., 2017). Related to IDS in word-learning contexts, Johnson, Lahey, Ernestus, and Cutler (2013) showed that, in a word-teaching task, caregivers produced adjectives less frequently compared to common nouns, proper nouns, or verbs. However, pitch properties were not included in their analyses. A recent study found that Dutch mothers slowed down their utterances when introducing unfamiliar words compared to utterances containing familiar words in IDS when addressing 18-month-old children (Han, de Jong, & Kager, 2018b). Importantly, none of these studies has addressed the pitch properties of IDS specific to word-learning contexts.

Age effect

Another factor that affects prosodic modifications in IDS is a child's age. Many studies have investigated the age-related changes of IDS prosody in the first year of life. For example, Stern, Spieker, Barnett, and MacKain (1983) found that the pitch properties in IDS were most exaggerated when children were about 4 months old. Kitamura et al. (2002) investigated the age-related changes in pitch in IDS addressing Australian English learners in their first year of life. They found that the mean F0 increased at 6 months, decreased at 9 months, and increased again at 12 months. However, F0 range did not differ between ADS and IDS in any of the age groups under investigation. In a cross-linguistic comparison of IDS in Korean, Tagalog, and Sri Lankan Tamil, Narayan and McDermott (2016) found that there were no age-related changes from 4 to 16 months. For all the languages, and at all ages under investigation, IDS had a higher pitch and a larger pitch range than ADS. In a longitudinal study on Dutch IDS, Benders (2013) found that pitch level and pitch excursions are more exaggerated to 15-month-old children than to 11-month-old children. A longitudinal study compared Taiwanese-Mandarin-speaking mothers’ speech to preverbal children and to five-year-olds. The degree of pitch exaggeration (measured on vowels) was larger with preverbal children compared to with five-year-old children (Liu et al., 2009).

Most of these studies suggest that pitch-related properties of IDS tend to become less exaggerated as children grow older, though conflicting results exist. Also, most studies focused on the first year of life; thus less is known about how IDS changes beyond the first year. From about 16–18 months to 24 months, both children's receptive and expressive vocabularies start to increase rapidly. This period is known as the ‘vocabulary spurt’ period (Goldfield & Reznick, 1990). Also, during this same age period children's ‘fast mapping’ ability – the ability to map a novel label and a novel object based on minimal exposure – gradually improves. In particular, 18-month-old children do not reliably map a novel label to a novel object, but 24-month-old children can reliably associate a novel label to a novel object (Bion et al., 2013). The current study, therefore, specifically targeted this age-range and asked whether Mandarin Chinese and Dutch IDS change from 18 to 24 months.

The current study

Taken together, most studies on IDS to date focus on its general prosody. In particular, no research has addressed whether mothers use pitch to highlight unfamiliar words compared to familiar words in IDS. In addition, there are relatively few cross-linguistic comparisons between IDS in languages with and without lexical tones, and age-related changes of IDS in the second year of life are less understood. Given the potential cross-linguistic differences and age-related changes in the use of pitch in IDS in word-learning contexts, the current study set out to investigate the following research questions: (1) Do mothers make distinctions between unfamiliar and familiar words with pitch in IDS compared to ADS? Specifically, do mothers use an exaggerated pitch (higher pitch and/or larger pitch range) when they introduce words that are unfamiliar to children compared to familiar words in IDS? Since exaggerated pitch attracts infants’ attention (e.g., Fernald & Simon, 1984; Masataka, 1992) and children associate novel words with novel objects only when the novel word is marked by a higher pitch, larger pitch range, and longer duration (Grassmann & Tomasello, 2007), we expect that mothers would have a comparatively higher mean pitch and/or a larger pitch range when they introduce unfamiliar words than for familiar words in IDS in order to facilitate word learning. (2) Do pitch properties of IDS and IDS specific to word-learning contexts change when mothers address children from 18 to 24 months? As the prosodic exaggeration usually decreases as children get older, we predict that the global IDS prosody addressing 18-month-old children is more exaggerated than IDS addressing 24-month-old children. Regarding the pitch properties of IDS specific to word-learning contexts, we have two alternative predictions.
First, they may become less exaggerated compared to ADS from 18 to 24 months of age, consistent with global pitch modifications. Alternatively, they may remain the same between 18 and 24 months while the global pitch properties become less exaggerated. (3) How do Dutch and Mandarin Chinese IDS show different patterns in their use of pitch cues in word-learning contexts? To answer this research question, we will compare the pitch properties of IDS specific to word-learning contexts in Dutch and Mandarin Chinese.

To address the three research questions, we conducted two experiments using similar materials and procedures: Experiment 1 (Mandarin Chinese) and Experiment 2 (Dutch). This study is part of a larger study on cross-linguistic comparisons of IDS prosody specific to word-learning contexts (see also Han et al., 2018b). We adopted a cross-sectional design in the Mandarin Chinese experiment and a longitudinal design in the Dutch experiment.¹ In both experiments, we used a semi-spontaneous story-book telling task to elicit both ADS and IDS. The book contained words both familiar and unfamiliar to children. We measured pitch (mean F0 and F0 range) at the word and utterance level in the speech data.

Experiment 1: Mandarin Chinese

Method

Participants

Twenty-one Mandarin-Chinese-speaking mothers of 18-month-old children (mean age = 18;15 [months;days], age range = 17;21–18;27; girls N = 9; mean age of mothers = 30 years, age range = 25–39 years) and nineteen mothers of 24-month-old children (mean age = 24;13, age range = 23;27–24;30; girls N = 10; mean age of mothers = 31 years, age range = 32–36 years) participated in the study. All mothers had higher education (undergraduate degree and above). The Mandarin Chinese dyads were recruited from kindergartens in Yichang, China. All the participant mothers spoke Mandarin Chinese (the official language in China) proficiently.² All children were typically developing.

Materials

A picture-book was designed to elicit a set of seven target words, with five unfamiliar words and two familiar words (Table 1). For each page, one target word was shown on the left side, and an illustration including a depiction of the target word was shown on the right side. Aside from the target words, no other script was provided. An additional six pages of pictures were used as fillers throughout the book to make the story coherent. The target words were all disyllabic nouns. As we wanted to use similar materials for both the Mandarin Chinese and Dutch experiments, we selected familiar words that were listed in both the Mandarin Chinese version (M-CDI; Tardif, Fletcher, Liang, & Kaciroti, 2009) and the Dutch version (N-CDI; Zink & Lejaegere, 2002) of the MacArthur-Bates Communicative Development Inventories (CDI; Fenson, Marchman, Thal, Dale, & Reznick, 2007). In contrast, the unfamiliar words were not listed in either the M-CDI or the N-CDI. Also, the familiar words were more frequent than the unfamiliar words in each language.³ Target words were selected in this way to ensure that the default familiarity of the words applied to most of the participants. However, due to individual differences in vocabulary knowledge, the actual familiarity of the target words might vary among children. Thus, after reading the picture-book in both ADS and IDS conditions, mothers filled out a word checklist to indicate whether their child had already understood the target words before the experiment. This information was coded as Familiarity (Familiar/Unfamiliar) and used in data analyses.

Table 1. Target words in Experiment 1 and Experiment 2

Procedure

All participants were tested in a quiet room. Before the experiment, mothers were given a few minutes to familiarize themselves with the book. Each experiment consisted of two conditions: an IDS condition and an ADS condition. In the IDS condition, the child sat on his or her mother's lap, and the mother was instructed to tell the story to her child the way she usually would at home. The mothers were specifically told that they could use any sentences; the only requirement was to include the words given on each page. In the ADS condition, the mothers were instructed to tell the story to the experimenter (female, a native speaker of Mandarin Chinese), and to take into account the fact that she was a college student. The order of the two conditions was counterbalanced across participants. A ZOOM H1 recorder (with 16-bit resolution and a sampling rate of 44.1 kHz) was used to make audio-recordings. Each experimental session took about 15–20 minutes. All participants received a book as a gift after the experiment.

Experiment 2: Dutch

Method

Participants

Thirty Dutch-speaking mother–child dyads participated when children were 18 months old (mean age of children = 18;14, age range = 18;00–18;29; girls N = 17; mean age of mothers = 35 years, age range = 29–44 years). The same participants visited the lab again when the children were 24 months old (mean age of children = 24;18, age range = 24;00–26;30). The Dutch mother–child dyads were recruited from the Utrecht Baby Lab database and were all Dutch native speakers living in the Utrecht area in the Netherlands. As in Experiment 1, all mothers had higher education (HBO [hogescholen, ‘universities of applied sciences’] or WO [universiteiten, ‘research universities’] and above) and all children were typically developing.

Materials

For the Dutch 18-month-old and 24-month-old children, two picture-books were designed to elicit two sets of seven target words, with five unfamiliar words and two familiar words in each set (Table 1). The book and the target words for Dutch 18-month-old children were the same as in the Mandarin Chinese version. To ensure that the unfamiliar words had not already been learned by 24 months, the five unfamiliar words in the 24-month-old version were replaced with new unfamiliar words, while the book structure was kept similar for both age groups.

Procedure

All participants were tested in a quiet room in the Utrecht Baby Lab. Each mother–child dyad came to the lab twice, once when the child was 18 months and once when the child was 24 months. The procedure was similar to Experiment 1; however, the experimenter was a native Dutch speaker (female).

Data analysis

A trained Mandarin Chinese native speaker (the author) and a Dutch native speaker annotated and extracted the target words and target utterances (utterances containing the target words) from the recordings using Praat (Boersma & Weenink, 2017). An utterance boundary was defined in accordance with Martin, Igarashi, Jincho, and Mazuka (2016, p. 54) as “any pause longer than 200ms which is preceded by an intonational phrase boundary (pauses not accompanied by an IP boundary were considered utterance internal)”. We followed Bortfeld and Morgan (2010) and extracted the minimum F0, maximum F0, and mean F0 (in Hz) of the target words. We also extracted these values from the utterances containing target words (i.e., target utterances). The F0 range was calculated as maximum F0 – minimum F0. Following Kitamura et al. (2002), the F0 range was transformed to semitones (st) using the formula: Semitones = 12 × log₂(maximum F0/minimum F0). The values were extracted automatically using a Praat script and checked manually for doubling and halving errors.
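The arithmetic behind these pitch measures can be illustrated with a minimal Python sketch. It assumes that the voiced F0 samples of one target word or utterance are already available as a list of values in Hz (for example, exported from Praat); the function name `pitch_measures` and the dictionary keys are hypothetical and are not taken from the authors' Praat script.

```python
import math
from statistics import mean


def pitch_measures(f0_hz):
    """Summarise the F0 samples (Hz) of one word or utterance.

    Assumes the samples are voiced frames that have already been
    checked for doubling and halving errors.
    """
    if not f0_hz:
        raise ValueError("need at least one voiced F0 sample")
    f0_min, f0_max = min(f0_hz), max(f0_hz)
    return {
        "min_hz": f0_min,
        "max_hz": f0_max,
        "mean_hz": mean(f0_hz),
        # F0 range in Hz: maximum F0 minus minimum F0
        "range_hz": f0_max - f0_min,
        # F0 range in semitones: 12 * log2(maximum F0 / minimum F0)
        "range_st": 12 * math.log2(f0_max / f0_min),
    }


# Hypothetical samples spanning one octave (200 Hz to 400 Hz).
example = pitch_measures([200.0, 250.0, 400.0])
```

Because the semitone scale is logarithmic, any span from a minimum to twice that minimum corresponds to 12 st, whatever the speaker's absolute pitch level.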

In total, 1375 Chinese utterances and 1434 Dutch utterances were elicited, among which were 857 familiar utterances in Chinese (ADS: 335) and 541 familiar utterances in Dutch (ADS: 226).⁴

To examine whether mothers heightened pitch and/or enlarged pitch range specifically for unfamiliar words in IDS, we used linear mixed-effects models for all analyses. In the models, we included fixed factors of Age (18 months/24 months), Condition (ADS/IDS), and Familiarity (Familiar/Unfamiliar) with Participant as a random factor. The analyses were performed for each language at both the word and utterance levels. For Mandarin Chinese, due to the cross-sectional design, we included Condition and Familiarity but not Age as random slopes. For Dutch (longitudinal design), we allowed for random slopes for Age, Condition, and Familiarity (Barr, Levy, Scheepers, & Tily, 2013). The dependent variables were word mean F0 (Hz), word F0 range (semitones, st), utterance mean F0 (Hz), and utterance F0 range (st). We used the lme4 package (Bates, Mächler, Bolker, & Walker, 2015) in the R environment (R Core Team, 2018). For each dependent variable, we took the backward elimination approach, starting with a model that included all fixed effects plus the random factor, and all interactions between them (the most complex model)⁵ (Bates, Kliegl, Vasishth, & Baayen, 2015). Then, we used the ‘step’ function in the lmerTest package (Kuznetsova, Brockhoff, & Christensen, 2017) to reduce the models by eliminating non-significant fixed and random factors or interactions using the default selection criteria of the ‘step’ function. When the models with multiple random effects failed to converge, we excluded Age from the random slopes. The means and standard deviations of each dependent variable are presented in Table 2.
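To make the structure of the most complex model concrete, the following Python sketch enumerates the fixed-effect terms of the full Age × Condition × Familiarity design and assembles an lme4-style formula string for each language. This is one possible reading of the description above, not the authors' actual model call: the column name `word_mean_f0`, the helper names, and the exact form of the random-effects term are assumptions.

```python
from itertools import combinations


def fixed_effect_terms(factors):
    """List all main effects and interactions of a full factorial design."""
    terms = []
    for order in range(1, len(factors) + 1):
        for combo in combinations(factors, order):
            terms.append(":".join(combo))
    return terms


def lmer_formula(dv, factors, random_slopes, group="Participant"):
    """Build an lme4-style formula string (hypothetical column names)."""
    fixed = " + ".join(fixed_effect_terms(factors))
    # Random intercept plus by-participant random slopes (assumed form).
    random = " + ".join(["1"] + list(random_slopes))
    return f"{dv} ~ {fixed} + ({random} | {group})"


FACTORS = ["Age", "Condition", "Familiarity"]

# Mandarin Chinese (cross-sectional): no random slope for Age.
mandarin = lmer_formula("word_mean_f0", FACTORS, ["Condition", "Familiarity"])

# Dutch (longitudinal): random slopes for all three factors.
dutch = lmer_formula("word_mean_f0", FACTORS, FACTORS)
```

With three two-level factors, the full model contains seven fixed-effect terms (three main effects, three two-way interactions, and one three-way interaction), which backward elimination then prunes.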

Table 2. Mean word and utterance mean F0 (Hz) and F0 range (st) in Mandarin Chinese and Dutch (standard deviations in parentheses)

Results

Experiment 1: Mandarin Chinese

We checked whether there was an effect of testing order (ADS-IDS/IDS-ADS) for each dependent measure and no significant differences were found between the two testing orders for any of the dependent measures. Regarding the research questions, we first examined whether unfamiliar words specifically had a higher mean F0 and a larger F0 range than familiar words in IDS as compared to ADS. Figure 1 and Figure 2 show the box-plots of mean F0 and F0 range at word and utterance level for Mandarin Chinese.

Figure 1. Box-plots of word mean F0 (left panel) and word F0 range (right panel) for ADS and IDS in Mandarin Chinese.

Figure 2. Box-plots of utterance mean F0 (left panel) and utterance F0 range (right panel) for ADS and IDS in Mandarin Chinese.

The results showed that there was a main effect of Condition (ADS/IDS) and a main effect of Familiarity (Familiar/Unfamiliar) on word mean F0 (Table 3a), but there was no significant interaction between Condition and Familiarity. These results suggest that the target words have a higher mean F0 in IDS than in ADS regardless of Familiarity, and that the unfamiliar words have a higher mean F0 compared to familiar words regardless of Condition.

Table 3. Final models for Mandarin Chinese target word mean F0 and F0 range

Notes. Intercept in Table 3a represents ADS and Familiar; intercept in Table 3b represents ADS; intercept in Table 3c represents ADS and Familiar; *p < .05; **p < .01; ***p < .001.

As for the dependent variable word F0 range, there was a significant three-way interaction of Condition, Age, and Familiarity in the final model (β = 2.908, SE = 1.188, t = 2.447, p = .015). Thus, we split the data by Age (18 months/24 months). The results for 18 months (Table 3b) showed that there was a significant main effect of Condition (p = .008), but neither Familiarity nor the interaction between Condition and Familiarity was in the final model, suggesting that mothers expanded pitch range for both familiar and unfamiliar words in IDS when children were 18 months. As for the 24-month-old group (Table 3c), there was a significant interaction of Condition and Familiarity (p = .017), but there were no significant main effects of either Condition or Familiarity, indicating that Mandarin Chinese mothers specifically expanded word F0 range for unfamiliar words in IDS when addressing 24-month-old children.

Results at the utterance level showed a significant interaction of Age and Condition on utterance mean F0 (β = –18.766, SE = 7.862, t = –2.387, p = .022) and utterance F0 range (β = –1.641, SE = 0.605, t = –2.712, p = .007). Thus, we split data by Age for each measurement.

For utterance mean F0 at 18 months (Table 4a), the results showed that there was a significant effect of Condition and a significant interaction of Condition and Familiarity (β = 15.670, SE = 6.312, t = 2.482, p = .013). These results suggest that utterances in IDS had a higher mean F0 compared to ADS, and that this difference was even more pronounced for utterances containing unfamiliar words. The results for 24-month-old children showed that utterance mean F0 did not differ significantly between ADS and IDS, as Condition was not in the final model (Table 4b).

Table 4. Final models for Mandarin Chinese utterance mean F0 for 18-month-old and 24-month-old children

Notes. Intercept in 4a represents ADS and Familiar; intercept in 4b represents Familiar; *p < .05; **p < .01; ***p < .001

Now we turn to the results for utterance F0 range. When the data were split by Age, the final model for 18-month-old children revealed only a significant main effect of Condition (Table 5a), suggesting that F0 range was larger in IDS than in ADS regardless of Familiarity. For 24-month-old children, the final model revealed a significant main effect of Condition, a significant main effect of Familiarity, and a significant interaction of Condition and Familiarity (Table 5b). The direction and size of the effects indicate that, surprisingly, IDS had a smaller pitch range than ADS, and utterances with unfamiliar words had a smaller pitch range than utterances with familiar words. However, the interaction indicates that the effect of Familiarity differed between the two conditions. To follow up on this interaction, we split the data further into ADS and IDS to test for effects of Familiarity in each. In ADS, the effect showed only a trend (p = .058): utterances with unfamiliar words tended to have smaller F0 ranges than utterances with familiar words. We found no such difference between familiar and unfamiliar words in IDS (p = .942).

Table 5. Final models for Mandarin Chinese utterance F0 range

Notes. Intercept in 5a represents ADS; intercept in 5b represents ADS and Familiar; *p < .05; **p < .01; ***p < .001.

Taken together, the results for Mandarin Chinese show age-related changes in IDS prosody. Mandarin Chinese IDS addressing 18-month-old children had a higher mean pitch compared to ADS, but IDS addressing 24-month-old children was already similar to ADS in pitch height. The results also show that Mandarin Chinese mothers tend to use pitch to highlight unfamiliar words. Specifically, at 18 months, when Mandarin Chinese IDS generally had a higher pitch than ADS, utterances with unfamiliar words were specifically higher than utterances with familiar words in IDS. At 24 months, utterance mean pitch of IDS was already similar to ADS, but Mandarin Chinese mothers specifically had a larger word pitch range for unfamiliar words in IDS. These findings suggest that Mandarin Chinese mothers of 18- and 24-month-old children distinguish unfamiliar words from familiar words mainly by exaggerating pitch when introducing unfamiliar words.

Experiment 2: Dutch

We performed similar analyses for the Dutch data. As in Mandarin Chinese, no significant differences were found between the two testing orders (ADS-IDS/IDS-ADS) for any of the dependent measures. Figure 3 and Figure 4 show the box-plots of mean F0 and F0 range for Dutch. We first examined whether unfamiliar words specifically had a higher mean F0 and/or a larger F0 range than familiar words in IDS as compared to ADS. The final model for Dutch word mean F0 (Table 6a) showed that there were significant main effects of Age and Condition. There were also significant interactions of Age and Condition as well as Condition and Familiarity. For reasons that are unclear, the mothers spoke with a lower word mean F0 in ADS when they returned to the lab with their 24-month-old children. In IDS, however, their word mean F0 at 24 months was higher than in ADS. Also, unexpectedly, word mean F0 was specifically lower for unfamiliar words in IDS as compared to ADS.

Figure 3. Box-plots of word mean F0 (left panel) and word F0 range (right panel) for ADS and IDS in Dutch.

Figure 4. Box-plots of utterance mean F0 (left panel) and utterance F0 range (right panel) for ADS and IDS in Dutch.

Table 6. Final models for Dutch target word mean F0 and F0 range

Notes. Intercept in 6a represents ADS, 18 months, and Familiar; intercept in 6b represents ADS; *p < .05; **p < .01; ***p < .001.

Regarding word F0 range, there was only a significant main effect of Condition. As there was no significant interaction of Condition and Familiarity nor a significant interaction of Condition and Age for word F0 range, these results suggest that target words (regardless of Familiarity or Age) had a significantly larger F0 range in IDS than in ADS (Table 6b).

At the utterance level, the final model showed that there was a significant main effect of Condition, along with significant interactions between Condition and Age and between Condition and Familiarity (Table 7a). These results showed age-related changes in IDS: utterance mean F0 was significantly lower when Dutch mothers addressed 24-month-old children compared to 18-month-old children, though utterance mean F0 was higher in IDS compared to ADS at both ages.⁶ Also, surprisingly, utterances containing unfamiliar words specifically had a lower pitch than those containing familiar words in IDS across the two ages.

Table 7. Final models for Dutch utterance mean F0 (Hz) and F0 range (st)

Notes. Intercept in Table 7a represents ADS, 18 months, and Familiar; intercept in Table 7b represents Familiar; *p < .05; **p < .01; ***p < .001.

The final model for Dutch utterance F0 range (Table 7b) showed that there was only a main effect of Familiarity, suggesting that utterances containing unfamiliar words had a smaller F0 range compared to utterances with familiar words, regardless of Age or Condition. There were no other significant main effects or interactions.

In sum, our Dutch results show that, contrary to our expectations, Dutch mothers had a lower mean F0 specifically for unfamiliar words and utterances containing unfamiliar words in IDS compared to ADS for both age groups. Dutch IDS also had an enlarged F0 range at the word level compared to ADS, but there were no significant differences in F0 range between ADS and IDS at the utterance level for either age group. The results also showed age-related changes of mean F0 in IDS: both words and utterances in IDS addressing 24-month-old children had a lower mean F0 compared to IDS addressing 18-month-old children, yet IDS still had a higher mean F0 than ADS at both ages.⁷

Discussion and conclusions

Despite the robust evidence supporting the universality of IDS, our results suggest that the prosodic input, and in particular the pitch of IDS in word-learning contexts, differs between Mandarin Chinese (a tonal language) and Dutch (a non-tonal language). We conducted two experiments on Mandarin Chinese and Dutch dyads using similar speech elicitation methods. In this design, the content and context were matched between languages as well as between conditions, and we kept the speech data as natural as possible by eliciting semi-spontaneous speech instead of scripted read speech. As the two languages differ in their use of lexical pitch, we focused on the pitch cues in word-learning contexts.

First, we asked whether Mandarin Chinese and Dutch mothers use pitch (e.g., a higher pitch or a larger pitch range) to highlight words that are unfamiliar to children compared to familiar words in IDS. The Mandarin Chinese results confirmed our expectations: when mothers addressed 18-month-old children, utterance mean pitch increased specifically for unfamiliar words in IDS but not for familiar words. At 24 months, word pitch range in IDS was exaggerated when mothers introduced unfamiliar words compared to familiar words. However, the Dutch results showed the opposite: Dutch mothers’ word and utterance mean pitch was raised specifically for familiar words instead of unfamiliar words for both age groups under investigation. One reviewer pointed out that our results at the word and utterance levels may not be independent. As words are embedded in utterances, the word-level pitch may affect the utterance-level pitch. To address this issue, we performed additional analyses that included only relatively long utterances, i.e., utterances longer than 2 seconds. In such long utterances, the target word would play only a very minor role in the utterance-level measures. We found that for the subset of long utterances (N = 429), results on Dutch utterance mean F0 are in line with our original results.
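The selection of long utterances for this follow-up analysis can be sketched in a few lines of Python. The record layout (`start_s`, `end_s`, `mean_f0_hz`) and the sample values are hypothetical; only the 2-second threshold comes from the text.

```python
def long_utterances(utterances, min_duration_s=2.0):
    """Keep utterances strictly longer than min_duration_s seconds."""
    return [
        u for u in utterances
        if u["end_s"] - u["start_s"] > min_duration_s
    ]


# Hypothetical annotated utterances (times in seconds).
sample = [
    {"start_s": 0.0, "end_s": 1.5, "mean_f0_hz": 260.0},  # too short
    {"start_s": 3.0, "end_s": 5.0, "mean_f0_hz": 255.0},  # exactly 2 s, excluded
    {"start_s": 6.0, "end_s": 8.5, "mean_f0_hz": 248.0},  # 2.5 s, kept
]
subset = long_utterances(sample)
```

The comparison is strict, matching the wording "longer than 2 seconds"; whether boundary cases were treated this way in the original analysis is not stated.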

Second, we asked whether IDS prosody changes from 18 to 24 months of age in Mandarin Chinese and Dutch. Our results indicate age-related changes in both languages. Specifically, Mandarin Chinese IDS addressing 18-month-old children had a higher pitch and a larger pitch range than ADS, but IDS addressing 24-month-olds was similar to ADS with respect to mean pitch. Dutch IDS addressing 18- and 24-month-old children both had a higher pitch than ADS, while the utterance-level pitch range did not differ between IDS and ADS. The results on Dutch pitch range are in accordance with previous findings for Australian English, Japanese, and Thai, which showed that pitch range did not differ between ADS and IDS (Fernald et al., 1989; Kitamura et al., 2002). The degree of pitch modification, as indicated by a relatively lower pitch level, was smaller in Dutch IDS addressing 24-month-old children than in Dutch IDS addressing 18-month-old children. The general trend is that IDS becomes less exaggerated and more ADS-like from 18 months to 24 months in both languages. Previous studies on Taiwanese Mandarin and Dutch IDS focused on the first year of life and their findings showed that IDS had a higher pitch and larger pitch range in both languages (Benders, 2013; Liu et al., 2009; Van de Weijer, 1999). Specifically, when taking utterance length into consideration, Benders (2013) measured F0 excursions (F0 range divided by utterance duration) and found that Dutch IDS addressed to 11- and 15-month-old children had larger F0 excursions than ADS. Following Benders, we performed additional analyses on F0 excursions to examine whether F0 excursion was larger in IDS than in ADS. The results showed that there was a significant main effect of Condition on Mandarin Chinese utterance F0 excursion (β = 1.78, SE = 0.27, t = 6.54, p < .001), suggesting that F0 excursion in Mandarin Chinese IDS was significantly larger than in ADS at both ages. Similarly, there was a significant main effect of Condition on Dutch utterance F0 excursion (β = 0.38, SE = 0.05, t = 7.94, p < .001), suggesting that, similar to the findings in Benders, Dutch IDS has larger F0 excursions than ADS at both 18 and 24 months. To summarize, our results extend the age range studied to 24 months, showing that pitch in both Mandarin Chinese and Dutch IDS remains exaggerated compared to ADS until at least 18 months, well beyond the first year of life. Previous studies that have examined age-related changes in IDS have generated mixed results for different languages and different age groups (e.g., Kitamura et al., 2002, on Australian English and Thai; Narayan & McDermott, 2016, on Korean, Tagalog, and Sri Lankan Tamil). Our results contribute to the literature by showing age-related changes in Mandarin Chinese and Dutch IDS from 18 to 24 months.
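The F0 excursion measure (F0 range divided by utterance duration) can be written out as a short Python sketch. The text does not state the units used, so this sketch assumes an F0 range in semitones and a duration in seconds; the function name `f0_excursion` is hypothetical.

```python
import math


def f0_excursion(f0_min_hz, f0_max_hz, duration_s):
    """F0 range divided by utterance duration.

    Assumption: the range is expressed in semitones and the duration
    in seconds, giving semitones per second.
    """
    if duration_s <= 0:
        raise ValueError("utterance duration must be positive")
    range_st = 12 * math.log2(f0_max_hz / f0_min_hz)
    return range_st / duration_s


# Hypothetical utterance: one octave (150 Hz to 300 Hz) over 2 seconds.
example = f0_excursion(150.0, 300.0, 2.0)
```

Normalising by duration means that a short utterance with the same pitch range as a long one receives a larger excursion value, which is why the measure can reveal an IDS effect that the raw utterance-level range does not show.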

Third, we asked how Dutch and Mandarin Chinese IDS differ in their use of pitch in a word-learning context. Previous studies have shown that IDS has an exaggerated prosody compared to ADS across languages; only the degree of prosodic exaggeration in IDS differs among languages. For example, American English IDS was more exaggerated than British English, French, Italian, German, and Japanese IDS (Fernald et al., 1989); Thai IDS was less exaggerated than Australian English IDS (Kitamura et al., 2002). However, as illustrated above, our findings indicate that Mandarin Chinese mothers exaggerate pitch when they introduce unfamiliar words, whereas Dutch mothers exaggerate pitch when they introduce familiar words. These findings suggest that the cross-linguistic differences in IDS are not restricted to the degree of prosodic modifications. In fact, Mandarin Chinese and Dutch mothers exhibit different prosody when introducing unfamiliar words and familiar words to children. Previous studies suggest that mothers are aware of children's vocabulary knowledge at an item level (Fenson et al., 2007; Styles & Plunkett, 2009). Our findings further imply that both Dutch and Mandarin Chinese mothers keep track of children's vocabulary knowledge in mother–child interactions and adapt their use of pitch accordingly, as shown by significant interactions of Condition and Familiarity. However, the effect of Familiarity on IDS prosody differs in the two languages. As such, pitch functions differently in Mandarin Chinese and Dutch, and the two languages employ different means of highlighting unfamiliar words in IDS, which may in turn influence children's strategies for word learning in meaningful ways.

The first question that arises, given these results, is why Mandarin Chinese and Dutch mothers exhibit completely different prosodic modifications regarding the familiarity of words. First, exaggerated pitch draws children's attention (Fernald & Simon, 1984). Also, children are sensitive to the mapping of prosodically highlighted words and novel objects (Grassmann & Tomasello, 2007). Thus, we interpret the Mandarin Chinese results as evidence for the potential facilitating effects of IDS on word learning. Pitch cues such as higher pitch and larger pitch range in IDS not only have linguistic functions but also serve to signify positive affect (Singh, Morgan, & Best, 2002; Trainor, Austin, & Desjardins, 2000). As such, the Dutch results, which showed higher pitch for familiar words, may be attributed to positive affect. In a longitudinal investigation of Dutch IDS addressed to 11- to 15-month-old children, Benders (2013) found that the acoustic properties of vowels in Dutch IDS convey positive affect but do not enhance vowel contrasts, an enhancement that could otherwise facilitate infants’ phonetic categorization. The target words in her study included words such as fiets ‘bike’, boek ‘book’, and schaap ‘sheep’. Even though the familiarity of these words for each child was unknown, most of these words are listed in the N-CDI (Zink & Lejaegere, 2002), and thus they are likely to be familiar to children. It is possible that Dutch mothers show more positive affect when they mention words that are familiar to their child compared to unfamiliar words, e.g., because placing positive affect on unfamiliar words might not be meaningful. In contrast, they might lower pitch for unfamiliar words to show a relatively neutral emotion. However, little is known about whether showing positive affect on familiar words helps or inhibits language learning. Future research may further investigate affect in word-learning contexts and its possible effects on word learning.

We have shown that Dutch mothers did not seem to exaggerate pitch to highlight unfamiliar words; however, this does not necessarily mean that Dutch mothers do not highlight unfamiliar words at all. Han et al. (2018b) found that Dutch mothers slowed down their utterances when introducing unfamiliar words compared to utterances containing familiar words in IDS. Taken together, these results suggest that Mandarin Chinese and Dutch IDS employ different prosodic cues to highlight unfamiliar words. Mandarin Chinese IDS mainly uses exaggerated pitch, while Dutch IDS prefers temporal cues (i.e., articulation rate). However, these results only demonstrate how mothers use prosody to make distinctions between unfamiliar and familiar words during mother–child interactions. Future studies should examine whether such speech patterns in Mandarin Chinese and Dutch IDS indeed facilitate word learning in Mandarin Chinese and Dutch children.

The differences in the use of pitch cues in Mandarin Chinese and Dutch may also be attributed to typological differences between these two languages. Mandarin Chinese, as a tonal language, uses pitch to distinguish lexical meanings. As a result, the pitch range of words is crucial to word meanings, so mothers specifically enlarged pitch range when introducing unfamiliar words. They specifically did so at 24 months old, when children are learning words efficiently. Dutch, a stress language, may resort to temporal cues to highlight unfamiliar words.

One possible explanation for the difference between our Mandarin Chinese and Dutch results is the asymmetry in experimental design. However, as the Chinese study took a cross-sectional design and the Dutch study a longitudinal design, there would be, in theory, lower statistical power for Chinese compared to Dutch. As such, the cross-linguistic differences are likely to be an underestimation (rather than an overestimation) if there were indeed any effects of the asymmetric design. That is, if the difference in design had affected the results, we would have expected a stronger main effect of Age in Dutch than in Mandarin Chinese. However, our results showed that the age-related changes in Chinese were even stronger than in Dutch. Future cross-linguistic studies on IDS should avoid this asymmetry as much as possible, but the cross-linguistic differences found in the current study are not likely to be affected by the design.

The current study focused on pitch properties of IDS in word-learning contexts and the measurements included mean F0 and F0 range of the target words as well as the utterances containing the target words. In addition to these two common measures of pitch, other prosodic cues such as articulation rate, pausing, pitch peak, accentuation, and F0 slope may also be used to highlight unfamiliar words. Non-prosodic cues can also be useful in highlighting unfamiliar words, for example, repetition, position of target words in an utterance, sentence type, sentence complexity, and multimodal cues. Further analyses of the current Dutch and Mandarin Chinese IDS corpora may reveal whether mothers employ these prosodic and non-prosodic means in word-learning contexts and whether there are differences between the two languages.

To conclude, despite robust evidence supporting the universality of IDS, our results suggest that the pitch properties in IDS specific to word-learning contexts show different patterns between Mandarin Chinese, a tonal language, and Dutch, a non-tonal language. Specifically, speakers of Mandarin Chinese IDS enlarge pitch range when they introduce unfamiliar words, but Dutch IDS speakers heighten pitch specifically when introducing familiar words. It is possible that the pitch cues in Mandarin Chinese IDS have more pedagogical functions, while the pitch cues in Dutch IDS convey positive affect and are more entertaining. Furthermore, the developmental changes from 18 months to 24 months differ in these two languages. Both Mandarin Chinese and Dutch IDS are exaggerated in pitch compared to ADS in these languages when addressing 18-month-old children. When children reach 24 months, Mandarin Chinese IDS is already similar to ADS, whereas Dutch IDS is still more exaggerated than ADS.

Our study contributes to the understanding of the quality of prosodic input in two distinct languages and cultures. Our findings indicate that the prosodic input in word-learning contexts differs between languages, and consequently the specific prosodic cues that account for the potential facilitative effects of IDS require further examination in a diversity of languages and cultures.

Supplementary materials

For Supplementary materials for this paper, please visit <https://doi.org/10.1017/S0305000919000813>.

Acknowledgments

We would like to thank Aihua Zou and Yuhong Li from Taohualing Kindergarten, Huan He from Gezhouba Dongshan Kindergarten, and Hong Xie at Gezhouba Early Education Center in Yichang, China, for their kind support and coordination in recruiting participants. We thank Lisanne Geurts, Karlijn Kouwer, and Run Chai for their help with data collection and annotation. We are grateful to all the families who participated in this study. We also acknowledge the members of Utrecht Babylab, Jeroen Breteler, and the anonymous reviewers for their valuable comments and suggestions.

Footnotes

¹ The difference in design was mainly due to the practical situation in which we recruited our participants in China. The participants were mostly recruited from early education programs in kindergartens where they did not enroll for longer than a semester (6 months).

² All the participant mothers spoke Mandarin Chinese and a dialect (Southwest Mandarin). The participant children heard this dialect in their language community, but were exposed to Mandarin Chinese at home, at kindergarten, and in the national media. This type of bilingual language background is common for most people in China (Li & Lee, 2006). We set these criteria when recruiting participants: (1) the mothers should speak Mandarin Chinese with good proficiency (with minimal accent); (2) the mothers should mostly speak Mandarin Chinese to their children at home; and (3) the children should be learning Mandarin Chinese as one of their first languages.

³ The ranking (lower rank indicating a higher frequency) of Mandarin Chinese word frequency based on Cai and Brysbaert (2010) is: yé ye ‘grandpa’ (1662), píng guǒ ‘apple’ (2939), mí lù ‘moose’ (17914), hé lí ‘beaver’ (55578), hé tao ‘walnut’ (12883), chéng bǎo ‘castle’ (3149), and nán guā ‘pumpkin’ (5744). The ranking of Dutch word frequency according to Keuleers, Brysbaert, and New (2010) is: opa ‘grandpa’ (1211), appel ‘apple’ (4666), eland ‘moose’ (12385), bever ‘beaver’ (11515), walnoot ‘walnut’ (28953), kasteel ‘castle’ (2185), pompoen ‘pumpkin’ (12830), bamboe ‘bamboo’ (30072), wezel ‘weasel’ (14576), emoe ‘emu’ (76161), kapel ‘chapel’ (8604), and jasmijn ‘jasmine’ (26190). Word frequency is provided only to show that unfamiliar words usually have a lower frequency; rankings are not comparable between languages. We used the mothers’ reports as the indication of Familiarity in the analyses.

⁴ The reported familiarity of items diverged from the default familiarity in 13% of cases for Dutch and in 42% of cases for Mandarin Chinese.

⁵ An example of the R code is: lmer(meanF0 ~ Age * Condition * Familiarity + (1 + Age + Condition + Familiarity | Participant))
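
As a reading aid, the formula in footnote 5 can be written out as a model equation. This is only a sketch under stated assumptions: it assumes that Age, Condition, and Familiarity each have two levels and enter as a single numeric contrast (written here as A, C, and F), and the indices i (observation) and j (participant) as well as the symbols β, u, Σ, and ε are notation introduced here rather than taken from the paper. The contrast coding is not specified in the footnote.

$$
\begin{aligned}
\text{meanF0}_{ij} ={}& \beta_0 + \beta_1 A_{ij} + \beta_2 C_{ij} + \beta_3 F_{ij} \\
&+ \beta_4 A_{ij} C_{ij} + \beta_5 A_{ij} F_{ij} + \beta_6 C_{ij} F_{ij} + \beta_7 A_{ij} C_{ij} F_{ij} \\
&+ u_{0j} + u_{1j} A_{ij} + u_{2j} C_{ij} + u_{3j} F_{ij} + \varepsilon_{ij},
\end{aligned}
\qquad
(u_{0j}, u_{1j}, u_{2j}, u_{3j})^{\top} \sim \mathcal{N}(\mathbf{0}, \Sigma), \quad
\varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2)
$$

The first two lines correspond to the fixed part Age * Condition * Familiarity (three main effects, three two-way interactions, and one three-way interaction). The third line corresponds to the by-participant random intercept and random slopes in (1 + Age + Condition + Familiarity | Participant), which are allowed to correlate through Σ.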

⁶ When splitting the data by Age, the results showed that, for 18 months, there was a significant main effect of Condition (β = 49.10, SE = 6.61, t = 7.45, p < .001) and a significant interaction of Condition and Familiarity (β = –20.75, SE = 7.14, t = –2.90, p = .004), but the main effect of Familiarity was not significant (β = –0.269, SE = 5.39, t = –0.05, p = .96). Similar results were obtained for the 24-month-old group: there was a significant main effect of Condition (β = 34.49, SE = 6.93, t = 4.97, p < .001) and a significant interaction of Condition and Familiarity (β = –17.89, SE = 7.21, t = –2.48, p = .013), but the main effect of Familiarity was not significant (β = –0.302, SE = 5.53, t = –0.06, p = .96).

⁷ One anonymous reviewer pointed out that the phonological properties of the target words may be confounded with the effect of Familiarity. Specifically, the familiar words (by default) included only trochaic words (opa and appel), while the unfamiliar words (by default) included both trochaic words (18 months: eland, bever, walnoot; 24 months: bamboe, wezel, and emoe) and iambic words (18 months: kasteel and pompoen; 24 months: kapel and jasmijn). Our finding that mothers specifically increased pitch for familiar words could therefore be due to the fact that all familiar words were trochees while only a subset of the unfamiliar words were trochees. To eliminate this possibility, we conducted ad hoc analyses on utterance and word mean F0 after excluding iambic words from the data. The results held even after excluding iambic words; they can be found in the Supplementary materials (available at <https://doi.org/10.1017/S0305000919000813>). The reviewer also pointed out potential effects of Mandarin Chinese Tone 3 (a contour tone) on the results. In Mandarin Chinese, Tone 3 is usually not fully realized in continuous speech except in utterance-final position. In our data, only 57 out of 1375 cases (about 4%) had Tone 3 in utterance-final position, where it is likely to be fully realized as a contour tone. We would not expect such a small proportion of cases to have contributed substantially to the results.

References

Barr, D. J., Levy, R., Scheepers, C., & Tily, H. J. (2013). Random effects structure for confirmatory hypothesis testing: keep it maximal. Journal of Memory and Language, 68, 255–78.

Bates, D., Kliegl, R., Vasishth, S., & Baayen, H. (2015). Parsimonious mixed models. Available at <http://arxiv.org/abs/1506.04967>.

Bates, D., Mächler, M., Bolker, B., & Walker, S. (2015). Fitting linear mixed-effects models using lme4. Journal of Statistical Software, 67(1), 1–48.

Benders, T. (2013). Mommy is only happy! Dutch mothers’ realisation of speech sounds in infant-directed speech expresses emotion, not didactic intent. Infant Behavior and Development, 36(4), 847–62.

Bergelson, E., & Swingley, D. (2012). At 6–9 months, human infants know the meanings of many common nouns. Proceedings of the National Academy of Sciences, 109(9), 3253–8.

Bernstein Ratner, N., & Pye, C. (1984). Higher pitch in BT is not universal: acoustic evidence from Quiché Mayan. Journal of Child Language, 11(3), 515–22.

Bion, R. A. H., Borovsky, A., & Fernald, A. (2013). Fast mapping, slow learning: disambiguation of novel word–object mappings in relation to vocabulary learning at 18, 24, and 30 months. Cognition, 126(1), 39–53.

Bloom, P. (2001). Précis of How children learn the meanings of words. Behavioral and Brain Sciences, 24(6), 1095–103.

Boersma, P., & Weenink, D. J. M. (2017). Praat: doing phonetics by computer [Computer program]. Available at <http://www.praat.org/>.

Bortfeld, H., & Morgan, J. L. (2010). Is early word-form processing stress-full? How natural variability supports recognition. Cognitive Psychology, 60(4), 241–66.

Cai, Q., & Brysbaert, M. (2010). SUBTLEX-CH: Chinese word and character frequencies based on film subtitles. PLoS ONE, 5(6), e10729.

Chafe, W. (1976). Givenness, contrastiveness, definiteness, subjects, topics, and point of view. In Li, C. (Ed.), Subject and topic (pp. 25–55). New York: Academic Press.

Cristia, A. (2013). Input to language: the phonetics and perception of infant-directed speech. Language and Linguistics Compass, 7(3), 157–70.

Fenson, L., Marchman, V. A., Thal, D. J., Dale, P. S., & Reznick, J. S. (2007). MacArthur-Bates Communicative Development Inventories: user's guide and technical manual. Baltimore, MD: Brookes.

Fernald, A. (2000). Speech to infants as hyperspeech: knowledge-driven processes in early word recognition. Phonetica, 57, 242–54.

Fernald, A., & Mazzie, C. (1991). Prosody and focus in speech to infants and adults. Developmental Psychology, 27(2), 209–21.

Fernald, A., & Morikawa, H. (1993). Common themes and cultural variations in Japanese and American mothers’ speech to infants. Child Development, 64(3), 637–56.

Fernald, A., & Simon, T. (1984). Expanded intonation contours in mothers’ speech to newborns. Developmental Psychology, 20(1), 104–13.

Fernald, A., Taeschner, T., Dunn, J., Papousek, M., de Boysson-Bardies, B., & Fukui, I. (1989). A cross-language study of prosodic modifications in mothers’ and fathers’ speech to preverbal infants. Journal of Child Language, 16(3), 477–501.

Fisher, C., & Tokura, H. (1995). The given–new contract in speech to infants. Journal of Memory and Language, 34(3), 287–310.

Goldfield, B. A., & Reznick, J. S. (1990). Early lexical acquisition: rate, content, and the vocabulary spurt. Journal of Child Language, 17(1), 171–83.

Graf Estes, K., & Hurley, K. (2013). Infant-directed prosody helps infants map sounds to meanings. Infancy, 18(5), 797–824.

Grassmann, S., & Tomasello, M. (2007). Two-year-olds use primary sentence accent to learn new words. Journal of Child Language, 34(3), 677–87.

Grieser, D. L., & Kuhl, P. K. (1988). Maternal speech to infants in a tonal language: support for universal prosodic features in motherese. Developmental Psychology, 24(1), 14–20.

Gundel, J. K. (1999). Topic, focus, and the grammar–pragmatics interface. In Alexander, N. J. & Minnick, M. (Eds.), Proceedings of the 23rd Annual Penn Linguistics Colloquium, vol. 6.1. Penn Working Papers in Linguistics (pp. 185–200). Available at <https://repository.upenn.edu/pwpl/vol6/iss1/14>.

Halliday, M. A. K. (1967). Notes on transitivity and theme in English: part 2. Journal of Linguistics, 3(2), 199–244.

Han, M., de Jong, N. H., & Kager, R. (2018a). Lexical tones in Mandarin Chinese infant-directed speech: age-related changes in the second year of life. Frontiers in Psychology, 9, 434. Available at <https://www.frontiersin.org/article/10.3389/fpsyg.2018.00434>.

Han, M., de Jong, N. H., & Kager, R. (2018b). Infant-directed speech is not always slower: cross-linguistic evidence from Dutch and Mandarin Chinese. In Bertolini, A. & Kaplan, M. (Eds.), Proceedings of the 42nd Annual Boston University Conference on Language Development (pp. 331–44). Somerville, MA: Cascadilla Press.

Ingram, D. (1995). The cultural basis of prosodic modifications to infants and children: a response to Fernald's universalist theory. Journal of Child Language, 22(1), 223–33.

Johnson, E. K., Lahey, M., Ernestus, M., & Cutler, A. (2013). A multimodal corpus of speech to infant and adult listeners. Journal of the Acoustical Society of America, 134(6), EL534–EL540.

Kalashnikova, M., & Burnham, D. (2018). Infant-directed speech from seven to nineteen months has similar acoustic properties but different functions. Journal of Child Language, 45(5), 1035–53.

Keuleers, E., Brysbaert, M., & New, B. (2010). SUBTLEX-NL: a new measure for Dutch word frequency based on film subtitles. Behavior Research Methods, 42(3), 643–50.

Kitamura, C., Thanavishuth, C., Burnham, D., & Luksaneeyanawin, S. (2002). Universality and specificity in infant-directed speech: pitch modifications as a function of infant age and sex in a tonal and non-tonal language. Infant Behavior and Development, 24(4), 372–92.

Kuhl, P. K., Andruski, J. E., Chistovich, I. A., Chistovich, L. A., Kozhevnikova, E. V., Ryskina, V. L., Stolyarova, E. I., Sundberg, U., & Lacerda, F. (1997). Cross-language analysis of phonetic units in language addressed to infants. Science, 277, 684–6.

Kuznetsova, A., Brockhoff, P. B., & Christensen, R. H. B. (2017). lmerTest package: tests in linear mixed effects models. Journal of Statistical Software, 82(13), 1–26.

Li, D. C., & Lee, S. (2006). Bilingualism in East Asia. In Bhatia, T. K. & Ritchie, W. C. (Eds.), The handbook of bilingualism (pp. 742–79). Malden, MA: Blackwell.

Lindblom, B. (1990). Explaining phonetic variation: a sketch of the H&H theory. In Hardcastle, W. J. & Marchal, A. (Eds.), Speech production and speech modelling (pp. 403–39). Dordrecht: Kluwer.

Liu, H.-M., Tsao, F.-M., & Kuhl, P. K. (2009). Age-related changes in acoustic modifications of Mandarin maternal speech to preverbal infants and five-year-old children: a longitudinal study. Journal of Child Language, 36(4), 909–22.

Ma, W., Golinkoff, R. M., Houston, D., & Hirsh-Pasek, K. (2011). Word learning in infant- and adult-directed speech. Language Learning and Development, 7(3), 185–201.

Mani, N., & Pätzold, W. (2016). Sixteen-month-old infants segment words from infant- and adult-directed speech. Language Learning and Development, 12(4), 499–508.

Männel, C., & Friederici, A. D. (2013). Accentuate or repeat? Brain signatures of developmental periods in infant word recognition. Cortex, 49(10), 2788–98.

Martin, A., Igarashi, Y., Jincho, N., & Mazuka, R. (2016). Utterances in infant-directed speech are shorter, not slower. Cognition, 156, 52–9.

Masataka, N. (1992). Pitch characteristics of Japanese maternal speech to infants. Journal of Child Language, 19(2), 213–23.

Narayan, C. R., & McDermott, L. C. (2016). Speech rate and pitch characteristics of infant-directed speech: longitudinal and cross-linguistic observations. Journal of the Acoustical Society of America, 139(3), 1272–81.

Porritt, L. L., Zinser, M. C., Bachorowski, J.-A., & Kaplan, P. S. (2014). Depression diagnoses and fundamental frequency-based acoustic cues in maternal infant-directed speech. Language Learning and Development, 10(1), 51–67.

R Core Team (2018). R: a language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. Available at <https://www.R-project.org/>.

Ramírez-Esparza, N., García-Sierra, A., & Kuhl, P. K. (2014). Look who's talking: speech style and social context in language input to infants are linked to concurrent and future speech development. Developmental Science, 17(6), 880–91.

Singh, L., Morgan, J. L., & Best, C. T. (2002). Infants’ listening preferences: Baby talk or happy talk? Infancy, 3(3), 365–94.

Singh, L., Nestor, S., Parikh, C., & Yull, A. (2009). Influences of infant-directed speech on early word recognition. Infancy, 14(6), 654–66.

Soderstrom, M. (2007). Beyond babytalk: re-evaluating the nature and content of speech input to preverbal infants. Developmental Review, 27(4), 501–32.

Soderstrom, M., Blossom, M., Foygel, R., & Morgan, J. L. (2008). Acoustical cues and grammatical units in speech to two preverbal infants. Journal of Child Language, 35(4), 869–902.

Song, J. Y., Demuth, K., & Morgan, J. (2010). Effects of the acoustic properties of infant-directed speech on infant word recognition. Journal of the Acoustical Society of America, 128(1), 389–400.

Stern, D. N., Spieker, S., Barnett, R. K., & MacKain, K. (1983). The prosody of maternal speech: infant age and context related changes. Journal of Child Language, 10(1), 1–15.

Styles, S., & Plunkett, K. (2009). What is ‘word understanding’ for the parent of a one-year-old? Matching the difficulty of a lexical comprehension task to parental CDI report. Journal of Child Language, 36(4), 895–908.

Tang, P., Xu Rattanasone, N., Yuen, I., & Demuth, K. (2017). Phonetic enhancement of Mandarin vowels and tones: infant-directed speech and Lombard speech. Journal of the Acoustical Society of America, 142(2), 493–503.

Tardif, T., Fletcher, P., Liang, W., & Kaciroti, N. (2009). Early vocabulary development in Mandarin (Putonghua) and Cantonese. Journal of Child Language, 36(5), 1115–44.

Thiessen, E. D., Hill, E. A., & Saffran, J. R. (2005). Infant-directed speech facilitates word segmentation. Infancy, 7(1), 53–71.

Thorson, J. C. (2018). The role of prosody in early word learning: behavioral evidence. In Prieto, P., & Esteve-Gibert, N. (Eds.), The development of prosody in first language acquisition (Vol. 23, 1st ed., pp. 60–77). Amsterdam: John Benjamins.

Trainor, L. J., Austin, C. M., & Desjardins, R. N. (2000). Is infant-directed speech prosody a result of the vocal expression of emotion? Psychological Science, 11(3), 188–95.

Van de Weijer, J. (1999). Language input for word discovery (Unpublished Doctoral dissertation), Radboud University Nijmegen, Nijmegen. doi:10.17617/2.2057670.

Wang, Y., Seidl, A., & Cristia, A. (2016). Acoustic characteristics of infant-directed speech as a function of prosodic typology. In Heinz, J., Goedemans, R., & van de Hulst, H. (Eds.), Dimensions of phonological stress (pp. 311–24). Cambridge University Press.

Yip, M. (2002). Tone. Cambridge University Press.

Zangl, R., & Mills, D. L. (2007). Increased brain activity to infant-directed speech in 6- and 13-month-old infants. Infancy, 11(1), 31–62.

Zink, I., & Lejaegere, M. (2002). N-CDI's lijsten voor communicatieve ontwikkeling [“Dutch MacArthur CDI's for communicative development”]. Leuven: Acco.