Introduction
Infant directed speech
In most cultures, parents use a special register for interacting with infants: infant directed speech (IDS) (Ferguson, 1964; Fernald et al., 1989). Compared with Adult Directed Speech (ADS), this register is characterized by, among other things, higher and more variable pitch (Fernald et al., 1989; Fischer & Tokura, 1996), slower tempo (Swanson, Leonard & Gandour, 1992) and lexical and syntactic simplifications (Lieven, 1994). These modifications are important for engaging and maintaining infant attention and communicating affect (Ferguson, 1977; Fernald, 1989). Indeed, infants prefer to listen to IDS over ADS (Cooper & Aslin, 1990; Fernald, 1992; Kitamura & Burnham, 2003). There is also evidence that IDS has a positive effect on the infant’s language development. Word learning appears to be facilitated at the start of lexical acquisition, as suggested by, for instance, the finding that 21-month-old children learned novel words when presented in IDS but not when presented in ADS (Ma, Golinkoff, Houston & Hirsh-Pasek, 2011). Although there is widespread agreement that parents change their speech when talking to their children, the specifics of the register continue to be debated. One topic of debate concerns the exaggerated pronunciation of speech sounds. Bernstein Ratner (1984) and Kuhl et al. (1997), among others, showed that parents produce their vowels more clearly in IDS than in ADS, by expanding the vowel space. However, more recent research found no indications of such an expansion (e.g., Benders, 2013). In the following sections, these conflicting findings will be elaborated on. The aim of the present study was to investigate the development of the parental vowel space longitudinally relative to children’s linguistic development. Changes in parents’ vowel space were investigated in the words that the children actually started using themselves.
Expanding or reducing the vowel space?
A frequently reported characteristic of IDS is exaggerated articulation or hyperarticulation: adults appear to exaggerate the pronunciation of speech sounds, presumably to “speak clearly” and to produce speech sounds as distinctly as possible (Kitamura, 2014). Vowels as well as consonants are characterized by this type of particularly clear articulation. The hyperarticulation of vowels is usually assessed by determining the space circumscribed by the three corner vowels /a/, /i/ and /u/, the so-called vowel triangle (Ladefoged, 2006). The corner vowels are the most extreme vowels in terms of tongue position in the vertical dimension (tongue height: high versus low) and the horizontal dimension (tongue position: front versus back). These extremes are reflected in the acoustic domain, more specifically in the values of the first (F1) and second (F2) vowel formants. Previous research found that vowels exhibit more extreme values in IDS than in ADS, as measured by the space created by the first and the second formant distributions. More specifically, the point vowels /i/, /u/, and /a/ are acoustically more peripheral in IDS than in ADS, and the distance between the point vowels in the F1/F2 plane is larger in IDS. Consequently, the surface of the triangle defined by those vowels is significantly larger (Andruski, Kuhl & Hayashi, 1999; Bernstein Ratner, 1984; Burnham, Kitamura & Vollmer-Conna, 2002; Cristia & Seidl, 2013; Kalashnikova & Burnham, 2018; Kondaurova, Bergeson & Dilley, 2012; Kuhl et al., 1997; Liu, Kuhl & Tsao, 2003; Uther, Knoll & Burnham, 2007). Although much research on IDS concerned parents with an English-speaking background, expansion of the vowel space was also found in languages other than English. It has been reported for several regiolects or dialects of, e.g., English, Dutch, Russian, Swedish, Hungarian, Japanese, Norwegian, French and Mandarin (see Marklund & Gustavsson, 2020 for an overview).
Segmental and suprasegmental aspects of IDS are hypothesized to affect infants’ pre-linguistic and linguistic development (see Spinelli, Fasolo & Mesman, 2017 for a review). For instance, the larger vowel space is thought to facilitate phonological category learning and thus to promote language acquisition: “The exaggerated form [of the vowel space] serves two functions: It more effectively separates sounds into contrasting categories, and it highlights the parameters on which speech categories are distinguished […]” (Kuhl et al., 1997, p. 686). Indeed, Liu et al. (2003) found that the speech discrimination skills of infants are positively correlated with their mother’s expansion of the vowel space. Moreover, vowel modification in IDS appears to be linked to the development of speech perception abilities, as the degree of hyperarticulation of vowels in IDS to 18-month-old infants was a significant predictor of the receptive and expressive vocabulary size of the infants 6 months later (Hartman, Bernstein Ratner & Newman, 2017). Similarly, a recent study by Kalashnikova and Burnham (2018) found that the degree of vowel hyperarticulation in IDS to infants of around 9 months was a significant predictor of the infants’ expressive vocabulary at 15 months and 19 months.
To summarize, there is compelling evidence showing that parents hyperarticulate their vowels in IDS. It has also been suggested that this characteristic of the speech addressed to infants is particularly helpful in their speech and language development. However, the evidence presented in the literature is not unequivocal: in comparison to ADS, a reduction of the vowel space, or hypospeech, has been reported, and in some instances a lack of difference between IDS and ADS.
The idea that parents may actively engage in stretching out their vowel space in IDS, thereby highlighting phonetic structure, has proven to be controversial. Detrimental to the notion that IDS is hyperarticulated speech are findings that vowel categories and other phonetic categories were not enhanced in IDS, or were even less distinct than in ADS. Several studies found smaller vowel areas in IDS (Benders, 2013; Englund, 2018; Englund & Behne, 2006). Benders (2013), who found a smaller vowel space in Dutch IDS as compared to ADS, suggests that mothers mainly use a happy speaking style to reflect positive affect. Acoustic changes could be a side effect of that. Furthermore, in a comprehensive examination of consonantal and vocalic distinctions of Japanese mothers interacting with their infants, Martin et al. (2015) reported a significant tendency for smaller distances between phonetic categories in IDS in comparison to ADS. In research on oral stop voicing contrasts in Nepali, a similar result was found. Benders, Pokharel, and Demuth (2019) found hypoarticulation (Lindblom, 1990), i.e., an articulation characterized by more reduction of the stop voicing contrast in IDS than in ADS. In Cantonese, there was no enhancement in IDS of phonetically similar and more confusing tones (Wong & Wing Sum Ng, 2018). Thus, research findings seem to contradict one another: in some studies, an enlarged vowel space in IDS was reported, in others the opposite pattern was found, while in still other studies no difference was found in the vowel space of IDS and ADS.
How to solve the controversy about the vowel space?
A useful theoretical framework for investigating the sources of the contradictory findings regarding the vowel space in IDS is the H&H model originally formulated by Lindblom (1990). The Hyper-Hypo model of speech production views a speaker’s articulations on a continuum between hyperspeech and hypospeech. On one side of the continuum, hypospeech requires the least articulatory effort: it is less articulated, and less articulated vowels result in a smaller vowel space. On the other side, hyperspeech is much more articulated, and thus requires more articulatory effort and results in a more extended vowel space. Speakers control their speech in such a way that they aim to maximize communicative efficiency with the least articulatory effort. In order to strike the balance between minimizing articulatory effort and effectively attaining communication goals, the speaker has to take into account several factors, such as contextual factors (e.g., the presence or absence of background noise, Hazan & Baker, 2011), as well as the communicative sophistication of the listener. For instance, if the listener is an infant, the speaker may infer immature communicative or linguistic knowledge, and hence may invest more articulatory effort in their speech, which results in hyperspeech. The speaker can infer what the child understands or is interested in by following the infant’s eye gaze, gestures, and the like to arrive at joint attention (Çetinçelik, Rowland & Snijders, 2021; Tomasello & Todd, 1983) and use hyperspeech to emphasize those topics.
To examine whether these adaptations are related to the communicative and linguistic abilities of the addressee, IDS has been compared to pet-directed speech (PDS) and foreigner-directed speech (FDS). PDS and IDS are characterised by high pitch and positive affect, as measured by ratings of low-pass-filtered speech in which only the intonation and rhythm are retained and segmental information is blurred (Burnham et al., 2002; Gergely, Faragó, Galambos & Topál, 2017). However, an interesting opposition was revealed in speech directed to dogs versus parrots. In the former case there were no signs of hyperarticulated speech. But in speech directed to parrots, which can produce human-like speech, a larger vowel space was found, which is also characteristic of IDS (Xu, Burnham, Kitamura & Vollmer-Conna, 2013). FDS, on the other hand, was characterized by hyperarticulated speech, but positive affect and pitch were lower in FDS than in IDS (Uther et al., 2007). These findings appear to indicate that speakers modulate the hyperarticulation of vowels as a function of the expected or inferred level of linguistic ability of the listener (Xu et al., 2013). The implications for IDS are quite straightforward: it can be expected that parents expand their vowel space in IDS, because the linguistic abilities of an infant, in both production and perception, are low. If the (inferred) linguistic abilities of the child change, then changes in the articulatory effort may be expected, which may result in an expansion or a reduction of the vowel space. In other words, changes of the vowel space relative to a child’s chronological age may be expected.
Extension of the vowel space as a function of chronological age?
In a recent review of the studies on the vowel space in IDS, Marklund and Gustavsson (2020, Figure 1) ordered a large number of studies on a timeline representing children’s chronological ages. For each study, the question was answered: does IDS addressed to children at this particular age result in a larger, smaller or equally large vowel space in comparison to ADS? The children’s ages in the different studies ranged from approximately 3 months to 63 months. The main conclusion was that studies explicitly investigating vowel hyper- or hypoarticulation in IDS found no differences between ages. Thus, compared with ADS, IDS addressed to infants during the first three years of life exhibits hyperarticulation according to some studies, hypoarticulation according to other studies, and no difference according to still others. Hence the conclusion that chronological age is not a significant predictor of the vowel space area.
It should be noted that research on the vowel space often involved children at one particular age (e.g., Kuhl et al., 1997) or was cross-sectional with children across a very wide age range (e.g., in Benders et al., 2019, the infants’ ages ranged from 10 to 18 months; in Martin et al., 2015, from 18 to 22 months). Other studies compared two ages that were relatively far apart (e.g., Liu, Tsao & Kuhl, 2009, compared IDS addressed to one-year-old infants and five-year-olds). Moreover, a longitudinal study of IDS addressed to the same children over a longer period of time appears to be lacking in the literature. Obviously, it is impossible to document gradual changes in IDS under those circumstances. This implies that repeated samples over a sufficiently long period of time are needed in order to capture possible changes in the characteristics of IDS. In this study a longitudinal perspective on changes in the vowel space was taken.
In addition, using chronological age as the basis for comparing speech addressed to children, as was done in previous research, may not be the most appropriate point of departure. If IDS is responsive to the child’s linguistic progress, then linguistic measures are needed instead of chronological age. Chronological age is only a proxy for linguistic progress. For instance, a ten-month-old may only be starting to babble, while another one may already be producing words. If IDS is sensitive to these changes in the verbal behaviors of a child, then chronological age is not the appropriate predicting variable. Therefore, in the current study relevant linguistic milestones and measures of language development were derived from a longitudinal corpus of mother-child interactions.
Extension of the vowel space as a function of linguistic development?
There are indications that IDS is not a static but a dynamic phenomenon which is responsive to the child. Adults seem to adapt their speech in response to the linguistic abilities of children, as is hypothesized by the fine tuning hypothesis (Snow & Ferguson, 1977). Evidence was found in multiple aspects of IDS. For example, the complexity and mean length of utterance (MLU) of parents’ utterances tended to decrease from when the infant was 6 months old until the end of the first year. It increased again afterwards, probably when the child entered the verbal stage (Genovese et al., 2020; Murray, Johnson & Peters, 1990). Another non-linear trend was described for speech rate in IDS to infants: the increase of parents’ speech rate slowed down as the child reached the multiword stage and went up again afterwards (Ko, 2012). This also applies to the phonetic-segmental level: the contrast between /s/ and /ʃ/ was enhanced in speech addressed to 12-14-month-olds, but not to 4-6-month-olds (Cristia, 2010). Similarly, Benders (2013) reported that fricative contrasts were enhanced in speech addressed to 11-month-old infants, but not to 15-month-olds. This type of fine tuning where parents adapt their speech to the general linguistic ability of their child is called coarse tuning (Roy, Frank & Roy, 2009). These adaptations suggest that the clarity of articulation in IDS is subject to changes during the first years of an infant’s life. If the features of IDS are more prominent in particular time intervals relative to the child’s speech and language development, then a longitudinal approach is called for which focuses precisely on relevant linguistic aspects of the child’s speech and language.
A case in point is the development of MLU in IDS. It has been shown that the MLU in parents’ speech evolves as the child progresses in their language development (Genovese et al., 2020). This is an example of coarse tuning, but there is also tuning of IDS at the level of individual lexical items: fine lexical tuning. This type of adaptation can also be seen in the development of the MLU: parents take into account the children’s familiarity with particular lexical items and adapt the length of the utterances containing these words (Odijk & Gillis, 2021; Roy et al., 2009). It was found that MLU in IDS appeared to evolve relative to ‘word births’, i.e., the first time a word emerges in a child’s speech (Roy et al., 2009). The studies of Roy et al. (2009) and Odijk and Gillis (2021) analyzed MLU in IDS of utterances containing individual words that were eventually present in the children’s vocabularies. They studied changes in the length of those utterances over time. The series of MLUs was plotted against time with word birth as the reference point, as not all words were acquired at the same time. It appeared that the MLU of IDS evolved in a similar way for all children and all words, in the form of a U-shaped curve: MLU decreased as word birth approached and increased again afterwards. This was different from the development of the global MLU (i.e., all utterances in IDS and not only utterances with familiar words), which showed an upward trend. In the current study, it was examined whether a similar developmental change also applied to the size of the vowel space. If there is a change analogous to the one established for MLU, it can be expected that parents pronounce a particular word much more clearly when their child is learning that word. Hence, more exaggeration of phonetic features is expected when the child learns that word or at least shows signs of understanding the word.
Measuring the vowel space
In the literature, the extension of the vowel space is almost without exception expressed in terms of the surface area of the vowel triangle. Consequently, extension of the vowel space means that the area of the triangle becomes larger. As already indicated, the vowel triangle encompasses the area defined by the point vowels /i/, /u/ and /a/. More specifically, given the values of the first formant (F1) and the second formant (F2) of the three point vowels, the surface area of the triangle is calculated in the F1/F2 plane (the specific formulae are provided in the methods section). Thus, one operationalization of the concept vowel space is the area of the vowel triangle.
As an illustration, the vowels of Standard Dutch reported in Adank, Van Hout, and Smits (2004) are represented in Figure 1. More specifically, the average F1 and F2 values of the 12 monophthongs of Dutch, as produced by male speakers from Flanders, the Dutch-speaking part of Belgium (Table 1 in Adank, Van Hout, et al., 2004), are depicted in Figure 1. The solid lines connecting the vowels /i/, /u/ and /a/ represent the vowel triangle, of which the surface area can be computed. It can readily be inferred that as the coordinates of the vowels become more extreme, e.g., as the vowel /i/ moves further towards the upper left corner in the F1/F2 plane, the surface of the triangle is extended.
In calculating the area of the vowel triangle, the coordinates of only three vowels are used. As can readily be inferred from the vowels charted in Figure 1, not all the vowels of Dutch are enclosed in the vowel triangle. For instance, the back vowels [o], [ɔ] and [ɑ] are outside the vowel triangle. The vowel triangle has been shown to largely underestimate the actual vowel space area (Jacewicz, Fox & Salmons, 2007), especially in vowel systems with a large number of vowels such as English and Dutch. An alternative estimate of the surface of the entire vowel space starts from all vowels and computes the surface of the polygon enclosing them. Thus, instead of calculating the surface area of the triangle constituted by the three point vowels, a polygon can be drawn that contains all vowels, as represented by the dotted line in Figure 1. In other words, the surface of the convex hull comprising all vowels can be computed as an estimate of the total vowel space area. Thus, in addition to the vowel triangle, a second operationalization of the concept vowel space is represented by the surface area of the convex hull encompassing all the vowels of a language: the vowel polygon.
In the current study, these two operationalizations of the vowel surface area were used: (1) the surface area of the vowel triangle, which is the common measure in the literature, but which underestimates the extension of the actual vowel space, and (2) the surface area of the convex hull containing all the vowels, the vowel polygon, which is a more precise estimate of the extension of the entire vowel space.
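The two operationalizations can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names are hypothetical, the hull is built with Andrew's monotone chain algorithm (one common choice among several), and the vowel coordinates are invented values on a normalized (F2, F1) scale that serve only to show the mechanics.

```python
import math

def heron_area(a, b, c):
    """Triangle area from three (x, y) vertices via Heron's formula."""
    ab, bc, ca = math.dist(a, b), math.dist(b, c), math.dist(c, a)
    s = (ab + bc + ca) / 2.0  # semi-perimeter
    # max(..., 0.0) guards against tiny negative values from rounding
    return math.sqrt(max(s * (s - ab) * (s - bc) * (s - ca), 0.0))

def convex_hull(points):
    """Convex hull vertices in counter-clockwise order (monotone chain)."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]

def polygon_area(vertices):
    """Area of a simple polygon via the shoelace formula."""
    total = 0.0
    for i in range(len(vertices)):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % len(vertices)]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0

# Invented, illustrative coordinates (normalized F2, normalized F1).
vowels = {
    "i": (1.5, -1.2), "u": (-1.3, -1.0), "a": (0.0, 1.6),
    "o": (-1.5, 0.2), "e": (1.0, -0.2),
}
triangle = heron_area(vowels["i"], vowels["u"], vowels["a"])
polygon = polygon_area(convex_hull(list(vowels.values())))
print(f"triangle area: {triangle:.3f}, polygon area: {polygon:.3f}")
```

Because the hull encloses the three point vowels along with all other vowels, the polygon area can never be smaller than the triangle area, which is why the triangle tends to underestimate the full vowel space.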
Objectives
The findings on the adaptive character of IDS are equivocal. On the one hand, parents were found to expand their vowel space when talking to their children. On the other hand, the opposite tendency was also reported in several studies (Benders et al., 2019; Englund, 2018). As there are indications that changes at the phonetic-segmental level in IDS depend on children’s linguistic development (Benders, 2013; Cristia, 2010), the aim of the current study was to investigate the development of the vowel space in IDS relative to children’s lexical development. As previous research mostly focused on children at a particular chronological age or compared IDS addressed to children at two ages that were relatively far apart, the current study was designed to track changes of the vowel space longitudinally. For this purpose, a longitudinal corpus of speech directed to typically developing children was analyzed.
In order to capture changes in IDS relative to children’s vocabulary knowledge, only words that were present in the children’s own speech were selected from IDS. The vowel space of the parents was computed using the stressed vowels of these words. Changes in the vowel space in IDS were then aligned to “word births”, defined as the first appearance of a particular word in a child’s spontaneous speech. By aligning the vowel space relative to these events in children’s linguistic development, adaptations in IDS could be attributed to the evolving linguistic abilities of the children. Indeed, the findings of Roy et al. (2009) and Odijk and Gillis (2021) suggested that parents tuned their utterances to the emergence of words in the infant. Here it was hypothesized that parents’ tuning in to children’s linguistic abilities also applied in the phonetic domain. The expansion of the vowel space in IDS was expected to follow an inverted U-shaped trend. As the child’s first use of words approached, their (stressed) vowels were expected to take more extreme values, and hence, the vowel space was expected to gradually expand. After that a gradual fade out was expected as those words became more firmly settled in the child’s vocabulary.
Thus, the current study aimed to answer the following research questions. What is the development of the vowel space in IDS in the period immediately before and after the first appearance of words in the child’s speech? And more specifically, considering the stressed vowels in the words that the children start using, how do their phonetic characteristics change in the parents’ speech? Does the vowel triangle expand as the children’s first use comes nearer, indicating that the point vowels show characteristics of hyperarticulation? Does the vowel polygon expand as the children’s first usage approaches, indicating that the entire vowel space is expanded?
Method
Participants
The data of the current study were taken from the CLiPS Child Language Corpus (CCLC), which consists of longitudinal monthly recordings and transcriptions of typically developing Dutch-acquiring infants between 0;6 and 2;0 (years;months) and their primary caretakers (Molemans, 2011; van den Berg, 2012; Van Severen, 2012). For the purpose of this study, nine children were randomly selected from this corpus by means of systematic sampling: four boys and five girls, in order to approximate an equal proportion of boys and girls. The sampling was done by arranging all children alphabetically and then choosing 9 children starting at the beginning of the alphabetical list. A sample size of 9 was chosen to keep data processing within reasonable time limits. Both parents participated in the study, except for one child where only the mother was present. The children were raised monolingually in Dutch as spoken in Flanders, the Dutch-speaking part of Belgium. The children were normally hearing, with no health or developmental problems reported during data collection in parental report and in the regular observations provided by the Flemish agency Child and Family (Kind & Gezin). The parents were native speakers of Dutch and from a mid-to-high SES background (belonging to the two upper strata of the Hollingshead Index; Hollingshead, 1975). During data collection, the children’s language development was monitored by administering the N-CDI (Zink & Lejaegere, 2002) at ages 1;0, 1;6 and 2;0. Results of the testing revealed normal language development for all children.
Existing corpus
The corpus consisted of monthly recordings of spontaneous interactions between children and their caretakers. For each recording session, parents interacted with their children as they would normally do in free play and daily routines in their home environment. No further guidelines were given by the researcher. A recording lasted on average 64 minutes (median = 63 minutes, range = 33 minutes to 114 minutes). In order to keep the transcription time within reasonable time limits while still retaining a reasonable amount of speech material, the researcher who was present at the recording selected different fragments where the child was the most vocally active. Long pauses and parts with noise were avoided. This resulted in a final selection of 20 minutes of each recording for transcription.
Transcriptions of the recordings were made using CHILDES’ CLAN program according to the CHAT conventions (MacWhinney, 2000). The words uttered by the children and the adults were transcribed orthographically and phonemically. Children’s words were identified based on the procedure proposed by Vihman and McCune (1994). To qualify as a word, the vocalization had to meet at least two out of three criteria. The first criterion was based on the context in which a vocalization occurred, for example maternal identification of the vocalization or multiple use of the vocalization in particular contexts. The second criterion was based on the shape of the vocalization: the vocalization was the exact or prosodic match of a target form. The last criterion was based on the relation to other vocalizations: the vocalization was imitated, or the vocalization was appropriately used, i.e., the vocalization was only used in plausible contexts.
Further information on data collection and transcription is provided in Molemans (2011), Schauwers (2006), Van Severen (2012) and van den Berg (2012). The transcriptions were converted to PRAAT TextGrids (Boersma & Weenink, 2015) with the CHAT2PRAAT function in CLAN (MacWhinney, 2000). The TextGrids were time-aligned to the audio files at the utterance level.
Current study: Data selection
The words that the children eventually acquired were of interest in the current study. Thus, as the first step in the analyses the cumulative vocabulary of each individual child was collected from the existing transcripts. All the words in their expressive vocabularies were identified in the transcripts and listed together with the age of their first usage. The result was a list of “word births”: the first time each individual word appeared in a child’s speech. This implies that the birth of a particular word can differ across children in terms of their chronological ages. For instance, if child A first uses the word book at age 1;02 then the birth of book equals 1;02 for child A, but for child B this may be 1;08 if that child uses book only at 1;08 according to our transcripts. The first occurrence of a word in the transcripts was considered to be a reasonable, though not a perfect, proxy of the age at which the child acquired that word. In the current study, only the content words (nouns, verbs, adjectives and adverbs) were analyzed.
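As a rough illustration of how such a list of word births could be derived, the following Python sketch takes the earliest recorded age per word. The function names and the input layout (pairs of a years;months age string and a word) are assumptions made for the example and do not describe the actual corpus tooling; the ages reuse the book example above.

```python
def to_months(age):
    """Convert a 'years;months' age string such as '1;02' to months."""
    years, months = age.split(";")
    return 12 * int(years) + int(months)

def word_births(child_tokens):
    """Map each word to the earliest age (in months) at which it was recorded.

    child_tokens: iterable of (age_string, word) pairs for one child
    (a hypothetical layout, chosen only for this example).
    """
    births = {}
    for age, word in child_tokens:
        months = to_months(age)
        if word not in births or months < births[word]:
            births[word] = months
    return births

# Child A first uses "book" at 1;02, child B only at 1;08.
child_a = word_births([("1;04", "book"), ("1;02", "book"), ("1;03", "ball")])
child_b = word_births([("1;08", "book"), ("1;10", "book")])
print(child_a["book"], child_b["book"])
```

Later occurrences of a word do not change its birth, so the order in which transcripts are processed does not matter.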
After collecting the cumulative vocabulary of each child, each word was identified in the transcripts of the speech of the child’s parents and the corresponding stretches of speech were selected from the recordings and saved as separate sound (.wav) files. Words were selected every three months, starting from nine months before word birth. Subsequently, the selected sound files were filtered for further phonetic processing in PRAAT. Only those sound files were retained in which the target words occurred with no overlapping speech of other speakers, no singing, no whispering, and no background noise. Then the vowels were identified and delineated, since the acoustic measurements in this study were made on the stressed vowels. To determine the word boundaries and to segment the vowels, the waveform, the spectrogram, the pitch and the intensity curves were used. The segmentation criteria of DePaolis, Vihman, and Kunnari (2008) were used to identify the boundaries of the vowels. A total of 3,337 vowels were segmented.
After segmentation, additional information was added for each vowel: (1) the age at which the child first produced the word in which a vowel appeared, i.e., the child’s chronological age at word birth; and (2) for the adult’s production of the words, the word’s “age” relative to its word birth. For example, suppose a child first produced the word ball at 15 months, then the word birth of ball was 15 months. If the child’s parent uttered ball when the child was 12 months old, the /a/ in ball was given the time annotation -3 months in terms of the number of months from word birth. Or suppose another child uses ball for the first time at 18 months and the parent used ball when that child was 9 months old, then that particular /a/ received the time stamp -9 months from word birth. In this way, each utterance of a particular vowel received a time stamp relative to the birth of the word in which the vowel occurred. A separate audio file and Praat TextGrid file were created for each word.
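The time annotation amounts to a simple subtraction, sketched below in Python. The function name and the tuple layout are hypothetical; the two cases reproduce the ball examples given above.

```python
def annotate_relative_age(parent_tokens, births):
    """Attach the number of months from word birth to each parental token.

    parent_tokens: list of (word, child_age_in_months) pairs, where the age
        is the child's age when the parent produced the word.
    births: dict mapping each word to the child's age in months at word birth.
    Negative values mean the parent's token preceded the child's first use.
    """
    annotated = []
    for word, child_age in parent_tokens:
        if word in births:  # only words the child eventually produced
            annotated.append((word, child_age, child_age - births[word]))
    return annotated

# Word birth of "ball" at 15 months, parental token at 12 months: -3.
print(annotate_relative_age([("ball", 12)], {"ball": 15}))
# Word birth of "ball" at 18 months, parental token at 9 months: -9.
print(annotate_relative_age([("ball", 9)], {"ball": 18}))
```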
Acoustic analysis
F1 and F2 values were measured in the vowels that were identified following the previously described procedure by means of a PRAAT script (Boersma & Weenink, 2015). F1 and F2 were determined with PRAAT’s Burg LPC formant tracking algorithm. The formant maximum was set to 5500 Hz and the number of formants to 5, the default settings of PRAAT. The formant values were measured at the midpoint of the vowels, since the influence of the surrounding speech sounds is smallest in this position (Verhoeven & Van Bael, 2002). The measurements were normalized per speaker by means of a Lobanov transformation (Lobanov, 1971) in order to minimize the effect of speaker-related differences, such as vocal tract shape or gender (Adank, Smits & van Hout, 2004). With this procedure, the formant measures were transformed into z-scores. This transformation was used because it preserves phonemic and sociolinguistic variation, but reduces anatomical and physiological variation (Adank et al., 2004). After transformation, extreme outliers were identified with the interquartile range rule (Barnett & Lewis, 1994) and excluded from the analysis, because they could distort the vowel space area.
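The normalization and outlier steps can be sketched as follows. This is an illustration under stated assumptions rather than the script used in the study: the text does not specify whether the sample or population standard deviation was used (the sample version is assumed here), nor the multiplier of the interquartile range (the parameter `k`, set to the conventional 1.5, is an assumption). The formant values are invented.

```python
from statistics import mean, quantiles, stdev


def lobanov(values):
    """Z-score one speaker's measurements of one formant (Lobanov, 1971)."""
    m, s = mean(values), stdev(values)  # assumption: sample standard deviation
    return [(v - m) / s for v in values]


def iqr_bounds(values, k=1.5):
    """Lower and upper fences of the interquartile range rule.

    k is an assumed multiplier; the source does not state the value used.
    """
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def drop_outliers(values, k=1.5):
    """Keep only the values that fall inside the IQR fences."""
    low, high = iqr_bounds(values, k)
    return [v for v in values if low <= v <= high]


# Hypothetical F1 measurements (Hz) for a single speaker; the last is a tracking error.
f1_hz = [310, 350, 420, 480, 530, 610, 700, 760, 2400]
f1_z = lobanov(f1_hz)          # normalize first, as in the text
f1_kept = drop_outliers(f1_z)  # then remove outliers from the z-scores
print(len(f1_z), len(f1_kept))
```

In practice the same two functions would be applied separately to F1 and F2 for each speaker, since the transformation is defined per speaker and per formant.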
Vowel space area
The formant values measured in the previous step were used to calculate the surface area of the vowel space. In this study, two methods were used. First, the surface of the vowel triangle was calculated from the normalized formant values of the three point vowels /i/, /u/ and /a/ using Heron’s formula (Jacewicz et al., 2007). In brief, given the coordinates of the point vowels in the F1/F2 plane, the distance between two vowels can be calculated according to equation (1):
$$d_{i\text{-}j} = \sqrt{(x_i - x_j)^2 + (y_i - y_j)^2} \tag{1}$$
where $(x_i, y_i)$ and $(x_j, y_j)$ are the coordinates of two of the three point vowels (/i/, /u/ and /a/) in the F1/F2 plane, yielding the distances $d_{i\text{-}u}$, $d_{u\text{-}a}$ and $d_{i\text{-}a}$. The perimeter ($p$) of the vowel triangle is the sum of the three distances calculated according to equation (1), and the semi-perimeter ($p_s$) is the perimeter divided by 2. Finally, the surface area ($S$) of the vowel triangle is computed with Heron’s formula represented in (2):
$$S = \sqrt{p_s \, (p_s - d_{i\text{-}u})(p_s - d_{u\text{-}a})(p_s - d_{i\text{-}a})} \tag{2}$$
A Python script was written to compute the vowel triangle for each combination of the (normalized) F1 and F2 values that were measured of the three point vowels.
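A computation of this kind might look like the sketch below. It is not the authors' script: the function name `triangle_area`, the token lists, and the reading of "each combination" as the Cartesian product of the tokens of the three point vowels are assumptions made for illustration, and the coordinates are invented.

```python
from itertools import product
from math import dist, sqrt


def triangle_area(i, u, a):
    """Area of the vowel triangle from three (F1, F2) points via Heron's formula."""
    d_iu, d_ua, d_ia = dist(i, u), dist(u, a), dist(i, a)  # pairwise distances
    ps = (d_iu + d_ua + d_ia) / 2                          # semi-perimeter
    # max(..., 0.0) guards against tiny negative values from rounding
    # when the three points are (nearly) collinear.
    return sqrt(max(ps * (ps - d_iu) * (ps - d_ua) * (ps - d_ia), 0.0))


# Hypothetical normalized (F1, F2) tokens for each point vowel.
i_tokens = [(-1.2, 1.5), (-1.0, 1.7)]
u_tokens = [(-1.1, -1.3), (-0.9, -1.1)]
a_tokens = [(1.6, 0.1), (1.8, -0.2)]

# One triangle per combination of an /i/, an /u/ and an /a/ token.
areas = [triangle_area(i, u, a) for i, u, a in product(i_tokens, u_tokens, a_tokens)]
print(len(areas))
```

Because the inputs are z-scores on both axes, the resulting areas are expressed in squared z-score units rather than in Hz².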
The surface of the vowel triangle has traditionally been used to estimate the surface of the vowel space. But this method does not consider the vowels that are situated outside the space circumscribed by the point vowels. Consequently, the vowel triangle has been shown to largely underestimate the actual vowel space area (Jacewicz et al., 2007). In order to get an estimate of the entire vowel space area, an alternative approach was taken in addition to calculating the vowel triangle. The surface of the area covered by all Dutch steady-state vowels was computed by determining the surface of the convex hull circumscribing all vowels. The convex hull of the set of vowels is the smallest convex polygon that contains all of them, which can be found by applying the Graham scan algorithm (Graham, 1972). The surface of the convex hull was computed using the Python package Shapely. This procedure was repeated 5,000 times, each time with a different random sample of the normalized F1 and F2 values of the vowels. To calculate the polygon, the current study used 9 of the 12 Dutch steady-state vowels. Three vowels (/y/, /Y/ and /ø/) were infrequent in the dataset, causing missing datapoints, so it was decided to leave them out altogether. These vowels are infrequent in Dutch (Luyckx, Kloots, Coussé & Gillis, 2007), so it is not surprising that they do not appear often in the current corpus.
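The convex hull computation can be illustrated without external packages. The sketch below deliberately swaps in Andrew's monotone chain algorithm (a close relative of the Graham scan) and the shoelace formula in place of Shapely, so it is not the pipeline used in the study. The resampling function assumes that each iteration draws one token per vowel category, which the text does not spell out; all names and data are hypothetical.

```python
import random


def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        # z-component of (a - o) x (b - o); positive for a left turn
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def polygon_area(vertices):
    """Shoelace formula for a simple polygon given as ordered vertices."""
    total = 0.0
    for k in range(len(vertices)):
        x1, y1 = vertices[k]
        x2, y2 = vertices[(k + 1) % len(vertices)]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2


def hull_area(points):
    """Area of the convex hull of a set of (F1, F2) points."""
    return polygon_area(convex_hull(points))


def resampled_hull_areas(tokens_by_vowel, n_iter=5000, seed=0):
    """Assumed scheme: draw one token per vowel per iteration, then take the hull area."""
    rng = random.Random(seed)
    return [
        hull_area([rng.choice(tokens) for tokens in tokens_by_vowel.values()])
        for _ in range(n_iter)
    ]


# A square with one interior point: the interior point does not affect the hull.
print(hull_area([(0, 0), (2, 0), (2, 2), (0, 2), (1, 1)]))
```

Averaging the list returned by `resampled_hull_areas` would give one area estimate per time point, with its spread reflecting token-to-token variability.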
For the calculation of the vowel triangles and polygons, the vowels of the parents were taken together. For the purpose of this study, only the words that were present in the vocabulary of the child were extracted from the parents’ speech. These words were not always used at the time of the recording sessions. Furthermore, there is an uneven frequency distribution of the different vowels: some vowels are very frequent, others very infrequent (Luyckx et al., 2007). As a result, there were not enough data points to calculate a reliable vowel space for each measurement moment for each dyad, so the vowels were taken together.
Statistical analysis
For the statistical analysis, the software R (R Core Team, 2018) was used. To model the development of the vowel surface as a function of the number of months from word birth, a generalized linear model was used. The predicting variables were the linear, quadratic and cubic effects of time, measured as the number of months from the child’s first production of a particular word. Since the data were continuous, non-negative, and with a positive skew a generalized linear model with the gamma family and identity link function was used.
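Written out, the model described above has the following general form. The symbols are notation introduced here for illustration and do not appear in the original: $A_t$ is the vowel space area at $t$ months from word birth, $\mu_t$ its expected value, $\phi$ the dispersion parameter, and $\beta_0, \dots, \beta_3$ the coefficients to be estimated.

```latex
\[
  A_t \sim \mathrm{Gamma}(\mu_t, \phi), \qquad
  \mu_t = \beta_0 + \beta_1 t + \beta_2 t^2 + \beta_3 t^3
\]
```

Because the link function is the identity, the polynomial describes the expected area directly rather than a transformation of it. A negative quadratic coefficient $\beta_2$ corresponds to an inverted U-shaped trajectory, and a non-zero cubic coefficient $\beta_3$ makes that trajectory asymmetric around its peak.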
Reliability
To determine inter-rater reliability for the acoustic analysis, 10% of the items (n = 215) were reanalyzed by a second researcher. Pearson correlation coefficients were calculated between the two researchers’ values for F1 and for F2. Both measures were significantly correlated: r(F1) = 0.98 and r(F2) = 0.94, both p < .0001. This indicates good inter-rater reliability between the two annotators.
Results
In this study, a total number of 3,337 vowels was analyzed. The analysis focused on the surface area of the vowel triangle and the surface area of the vowel polygon. First the results of the changes of the vowel triangle will be presented and then the results of the whole vowel space.
Surface area of vowel triangle
The first analysis concerns the area circumscribed by the three point vowels (/i/, /u/, /a/) relative to a timeline centered around the birth of words. The vowel triangle was calculated per three months from word birth. The descriptive statistics of the vowel triangle are shown in Table 1. The vowels were measured in Hertz and normalized into z-scores. The surface areas of the vowel triangles are therefore expressed in squared z-score units (z²). The vowel triangle starts relatively small and gradually increases as the child’s first word use approaches. It decreases again afterwards. These results are visualized in Figure 2.
A generalized linear model was used to estimate the development of the vowel triangle as a function of the months from word birth. A graphic representation of the estimated development is displayed in Figure 3. The GLM revealed a significant linear effect of months from word birth (E = 0.07, t = 20.46, p < 0.001), a significant quadratic effect of months from word birth (E = –0.020, t = –41.62, p < 0.001) and a significant cubic effect of months from word birth (E = –0.002, t = –22.33, p < 0.001), explaining the inverted U-shaped curve in Figure 3. The vowel triangle increased as word birth approached, reached its peak around word birth, and decreased again after word birth.
Surface area of the vowel polygon
The second analysis of the vowel space area consisted of assessing the surface of the convex hull circumscribing the Dutch monophthongs. The vowel polygon was computed by estimating the convex hull of the vowel space determined by the normalized formants of the 9 monophthongs that were represented in the data each month. Descriptive statistics for the vowel polygon are shown in Table 2. The vowel polygon increases as word birth approaches and decreases again afterwards to a similar level as it was 9 months before word birth. The vowel polygon per three months from word birth is illustrated in Figure 4.
A generalized linear model was used to estimate the development of the vowel polygon relative to word births. A graphic representation of the results is displayed in Figure 5. The GLM revealed a significant linear effect of months from word birth (E = –0.03, t = –7.84, p < 0.001) and a significant quadratic effect of months from word birth (E = –0.01, t = –24.77, p < 0.001), explaining the inverted U-shaped curve seen in Figure 5. The cubic effect of months from word birth was not significant (E = 0.00004, t = 0.50, p = 0.62). The vowel space area increased as word birth approached and decreased again after word birth.
Discussion
The aim of this study was to investigate changes in the vowel space in IDS directed to children in the first stages of lexical development. Instead of analyzing IDS as a global, undifferentiated phenomenon encompassing all the words and utterances addressed to the children, the present study was restricted to those words that children eventually used themselves in the period studied. Those words were extracted from their parents’ speech. For this study, data were taken from a longitudinal corpus of spontaneous speech. The acoustic characteristics of the stressed vowels were measured, and the area of the vowel space was determined. The vowel space area was computed in two different ways: the area of the vowel triangle was calculated as well as the area of the polygon enclosing all the vowels. The main finding reported in this study, corroborated by the two methods to determine the vowel space area, was that the vowel space area of the vowels in the words the children eventually produced themselves followed an inverted U-shaped curve. The curve described a trajectory of the area of the vowel space relative to the so-called word birth, i.e., the point in time when a word actually appeared in the children’s speech production. It showed that as the word birth came nearer, the vowel space area enlarged. This indicates that parents speak more clearly: they pronounce the vowels of the words that will soon enter children’s speech more distinctly. After the word has appeared in children’s productive speech, the clear speech fades out again, as witnessed by a gradual reduction of the vowel space.
Adults’ adaptation of their language and speech when talking to children has been well documented in the literature. For instance, the mean length of utterance (MLU) is shortened as children get closer to their first words and after that MLU increases again (Genovese et al., 2020; Murray et al., 1990). This is an adaptation of IDS known as coarse tuning, i.e., tuning to the language level of the child. Evidence shows that this tuning not only occurs on a global level, but that it is also geared towards the appearance of specific lexical items. This phenomenon is called fine lexical tuning (Odijk & Gillis, 2021; Roy et al., 2009). The results of the current study show that the vowel space area in IDS evolves similarly to the complexity of utterances relative to the birth of words. The vowel space area, similar to MLU in IDS, shows an (inverted) U-shaped curve with time of word birth as turning point. This suggests that a similar mechanism is at work. Parents appear to have tacit knowledge of the words that children are on the verge of producing and they adjust their speech accordingly. This means that parents not only pay attention to the general language level of their child, but also to individual words. Remarkably, these adjustments do not start from the moment a word actually appears in a child’s production, but the “clearer” or more extreme pronunciation of vowels already sets in prior to word birth. This means that the vowel space starts stretching out well before word birth.
The fact that parents hyperarticulate the vowels in words well before the child actually uses them seems to suggest that parents act on their inferences about the child’s knowledge of those words. Indeed, according to the H&H theory of Lindblom (1990), hyperarticulation is a consequence of the speaker’s inferences about the interlocutor’s linguistic knowledge. In this case, the parents seem to act on the assumption that the child does not know a particular word and highlight that word by hyperarticulating it. Different indications may lead them to that inference. For instance, the child may show interest in an object by looking or pointing at it, possibly accompanied by an attention-getting vocalisation. The lack of a specific label in the child’s communicative repertoire in such circumstances may lead parents to provide one and to highlight it by hyperarticulating it.
There is now converging evidence that word births seem to function as a kind of magnet on different aspects of the language and speech that parents provide. It was already shown that parents shorten the length of their utterances containing the words that the child is on the verge of acquiring (Roy et al., 2009; Odijk & Gillis, 2021) so that those target words occur more and more in shorter utterances and eventually predominantly in single-word utterances. The child’s actual production of the words is followed by an increase in the length of the utterances in which they occur in the adults’ speech. In the present study a similar (inverted) U-shaped developmental curve was discovered: (stressed) vowels in the target words appear to be more hyperarticulated as the child’s first usage of the words approaches, leading to an extension of the vowel space. After the child has actually produced the words, the vowel space becomes more reduced again. In this way parents appear to facilitate word acquisition by optimizing the learning conditions: the utterances with the target words become shorter and the words themselves are articulated more distinctly.
Expanding or reducing the vowel space in IDS?
Contrasting findings in studies of the vowel space in IDS constituted the point of departure of the current study. Several studies reported a smaller vowel space in IDS (Benders, 2013; Englund, 2018; Englund & Behne, 2006) and some a larger vowel space in IDS (Burnham et al., 2002; Cheng, 2014; Fernald, 2000; Uther et al., 2007) compared to ADS. In the current study the vowel space of IDS was not contrasted with the vowel space in ADS, but the vowel space in IDS was studied over time. It was shown that it changes over time in a non-linear way. This implies that comparing the space in IDS to that in ADS at one specific point in time provides at best an incomplete picture. Moreover, since the development is non-linear, the result of measuring the vowel space depends on the exact point on the developmental curve that is chosen for the comparison. It follows that such a comparison may yield quite different outcomes depending on the choice that is made.
In addition, it was shown that the development is relative to word births. There may be a difference in the vowel space area depending on whether it concerns words that the child already knows or not. For the current study, the vowel space was mapped relative to word births. This choice was made based on the idea of fine lexical tuning. In a previous study of fine lexical tuning of MLU in IDS it was shown that there was a considerable difference between the global MLU and the MLU based on word births. The global MLU showed a monotonic upward trend, while the MLU based on word births showed a U-shaped curve. This suggests that it is important to take the child’s linguistic knowledge into account, and consequently the vowel space was computed per month from word birth. The results indicate that parents adjust their articulation depending on the linguistic abilities of the children: the vowel space changes relative to the birth of words. In this respect the outcome of this study adds to a growing body of research showing that parents use fine lexical tuning when talking to their children. This can be an indication that parents try to scaffold their children’s word learning, by simplifying and emphasizing their speech around the time children start to produce particular words.
Although this study has shed light on the changes in vowel space area in IDS during the early lexical stages of a child, a few elements have not yet been covered. The current study looked at the adjustments in the vowel space relative to the word births, independent of the point in the child’s vocabulary development. This means that the cumulative vocabulary can still be very limited at the time of word birth of a certain word. For other words, the cumulative vocabulary may already be more advanced. All these points have been taken together and only the time of birth was looked at. The question that remains unanswered is whether the cumulative vocabulary at the time of word birth affects the size and extent of the vowel space. That could possibly be the case. Vowel space expansion may be a technique that is used mainly when a child is at the beginning of vocabulary development and therefore still needs a lot of scaffolding. However, as the parents infer that the child has become more adept at acquiring new words, they may feel less need to modify their speech very explicitly by expanding their vowel space.
Lastly, it should be noted that, as the children were only recorded once a month, none of the word births were estimated with complete accuracy. The first time a child used a word in the transcription was noted, but this word may already have entered the child’s vocabulary in between recording sessions. Moreover, as the vowels of all the parents were pooled together in this research, individual differences were glossed over. As a result, individual variation between parents and children remains out of the picture. Future work should take the individual differences into account.
Conclusion
In this study the development of the vowel space area of Dutch-speaking parents of typically developing children was investigated. The results suggest that the child’s familiarity with different lexical items has an impact on parents’ IDS. It was found that parents expanded their vowels as the child’s first use of a word with that vowel approached. The vowel space area decreased again afterwards. Thus, parents adjust their IDS in response to the linguistic abilities of their child. Taken together, these results provide support to the idea that IDS facilitates language learning, as might be indicated by the adaptations parents made to their speech in accordance with the evolving linguistic abilities of their child.
Acknowledgement
We would like to thank the families and infants that participated in the study, and K. Schauwers, I. Molemans, R. van den Berg and L. Van Severen for collecting the CLiPS Child Language Corpus. The research reported in this article was supported by grant G.0235.18 of the Research Foundation Flanders (FWO) to S. Gillis.
Competing interests
The authors declare none.