Introduction
Distinguishing between statements and questions is an important part of syntactic acquisition. In English, statements and questions differ both prosodically and syntactically. For example, the canonical ordering in simple transitive declarative sentences is subject-verb-object (Slobin & Bever, 1982), as in Anna likes chocolate (examples adapted and expanded from Geffen & Mintz, 2017). In comparison, (yes/no) questions are often characterized by auxiliary inversion and do-support (e.g., Anna does like chocolate becomes Does Anna like chocolate?). Wh- questions additionally include an initial wh-word such as who, what, or how (e.g., What does Anna like?). Previous work has found that questions account for nearly half of the utterances in young children’s input, with yes/no questions comprising 23% of total utterances and wh- questions 21% (Newport, 1977). Given the differences in word order between statements and questions, the ability to distinguish between them could give learners a basis from which to develop an analysis of differing grammatical structures (Slobin & Bever, 1982). However, it is unlikely that infants will make initial sentence-type distinctions based on word order.
While infants need to acquire knowledge of the grammatical structures of statements and questions more generally, wh- questions provide an interesting challenge for young language learners. They are syntactically similar to yes/no questions (e.g., AUX inversion), yet they are prosodically similar to statements, typically ending with final flat or falling intonation (Bartels, 1999; Hedberg, Sosa & Fadden, 2004). This raises the question of how infants learn to distinguish between statements and wh- questions. Another question is whether infants use the same strategy to distinguish statements from all types of questions, or whether infants must rely on different strategies for distinguishing between different types of questions (e.g., yes/no versus wh- questions). While both questions are of interest to the field, this paper will focus on the first question.
Prosody as a Cue to Sentence Type: Pitch
One possible cue to the distinction between statements and questions is prosody. Prosody is realized as changes in pitch, duration, and intensity. Pitch is arguably the main prosodic cue for distinguishing sentence types. English relies on the final pitch contour as an important perceptual cue for sentence-type discrimination (e.g., American English – Săfárŏvá & Swerts, 2004; British English [e.g., London, Cambridge] – Grabe, 2004), as do many other languages (e.g., Castilian Spanish – Face, 2007; French – Vion & Colas, 2006). In adult-directed speech, yes/no questions typically end with a final rise in pitch, while statements end with final flat or falling intonation (Bartels, 1999; Hedberg et al., 2004; Ladd, 2008) with a few exceptions (e.g., in Belfast English, both statements and questions have rising intonation – Grabe, 2004; Jarman & Cruttenden, 1976). For example, American English speakers were more likely to identify utterances as declarative questions if the utterances ended with a final rise (Săfárŏvá & Swerts, 2004). The same is true of declarative questions, which differ from declarative statements only in prosody (e.g., Anna likes chocolate?). Wh- questions usually have a falling or level pitch, and thus have final contours similar to statements (Hedberg et al., 2004; Ladd, 2008; Ladefoged & Johnson, 2010). The contours described above (apart from the distinction between declarative statements and questions) have also been found in infant-directed speech (IDS) (Geffen & Mintz, 2017).
As stated above, the final pitch contour is a critical region for discriminating between questions and statements. But what about when the final pitch contour is not distinct across utterance types, as with statements versus wh-questions (e.g., Geffen & Mintz, 2017)? If prosody is an important cue for understanding this distinction, then it could come into play elsewhere in the sentence. Although most studies that have examined the acoustic features of questions have focused on the ends of utterances, work by O’Shaughnessy (1979) examined the first, medial and last accented syllable of yes/no questions and statements. The findings demonstrated that the question intonation affects the entire fundamental frequency (F0) contour and is not limited to the final rise or fall at the end of an utterance. The O’Shaughnessy (1979) study suggests that prosodic information is available in multiple places across the utterance although the end of the utterance is the most common location (e.g., Bartels, 1999; Hedberg et al., 2004; Ladd, 2008; Ladefoged & Johnson, 2010). More recently, a production experiment showed utterance-initial differences between Canadian English statements and both yes/no and declarative questions (although no differences between the two question types), with statements demonstrating a higher initial pitch accent, earlier pitch peak alignment and smaller F0 change (Patience, Marasco, Colanton, Klassen, Radu & Tararova, 2018). These studies show that prosodic cues have been observed in multiple locations throughout the sentence in the production of sentence types, and in some instances can be used to distinguish between statements and questions.
Thus, it is important to evaluate as many sentence types as possible, including different types of statements and questions to determine whether prosodic information can help to disambiguate between them.
Although most of the studies we have discussed so far have focused on statements versus yes/no questions, there have been several studies that have specifically looked at wh- questions. Hedberg and Sosa (2002) found that wh- words were often marked with a rising pitch peak accent in wh- questions, as was the fronted negative auxiliary in negative yes/no questions (e.g., Isn’t that kind of underhanded?), suggesting there may be a common interrogative marker at the beginning of utterances, although Hedberg, Sosa, Görgülü, and Mameni (2010) cautioned that this effect may have arisen from the speech pattern of an individual speaker. Maxwell and Fletcher (2013) found similar results in Bengali English, in which adult speakers frequently produced a rising pitch peak accent on wh- words in wh- questions. Hedberg et al. (2010) found that American English wh- questions frequently had nuclear accents on either the wh- word or the immediately following auxiliary.
Perceptual studies have also demonstrated that adults are sensitive to these prosodic cues for making sentence-type distinctions. Most languages rely on some form of pitch cue to distinguish between statements and questions (e.g., Gussenhoven & Chen, 2000). For example, in languages like English and French that use lexical markings (e.g., auxiliary verbs) to make sentence-type distinctions, adults primarily rely on fundamental frequency for perceiving distinctions between sentence types (English – Cruttenden, 1986; Lieberman, 1967; French – Vion & Colas, 2006), although how F0 changes depends on the language. For example, gating studies with adults have demonstrated greatest accuracy in sentence type discrimination when the final rise is present or absent (Dutch – van Heuven & Haan, 2000; French – Vion & Colas, 2006), though accuracy begins to increase more rapidly roughly halfway through the sentence (e.g., 60-65% of Dutch listeners could accurately identify sentence types when a phrase was truncated right before the second accent and increased to 80% accuracy when the phrase included the second accent; van Heuven & Haan, 2000). Recent perception studies demonstrate that the prenuclear region (often the first word) provides sufficient cues (e.g., pitch accent height, pitch peak alignment timing and F0 change; Patience et al., 2018) for adult listeners to distinguish between statements and questions (Canadian English – Saindon, Trehub, Schellenberg & van Lieshout, 2017; German – Petrone & Niebuhr, 2014).
For example, a gating study found that 18% of Canadian English adult listeners identified questions at above-chance levels after hearing only a single word and increased to 49% of listeners after hearing three words (of a five-word utterance; Saindon et al., 2017). A study with German adult listeners found that participants were better at matching utterances with question contexts when prenuclear accents had shallower slopes (20-28.2% for steeper slopes, 81.2-83.5% for shallower slopes; Petrone & Niebuhr, 2014). These results may be partly attributable to cue weighting and lexical knowledge (which infants do not have). However, the specific cues and timing vary between languages. In Dutch, the wh- word in wh-questions is often characterized by a pitch accent (van Heuven & Haan, 2000). In European Portuguese, participants can correctly identify statements as early as the first stressed vowel but cannot correctly identify declarative questions until the penultimate or final stressed vowel (although this was not the case in sentences that began with wh- words; Falé & Faria, 2006).
The question remains whether other prosodic cues could be used to distinguish between statements and questions, as well as whether prosodic patterns earlier in the utterance can be used to distinguish between yes/no and wh-questions. The Geffen and Mintz (2017) study suggests that prosodic cues at the end of the utterance are not sufficient for making this distinction in infant-directed speech. However, a recent study by Chiang, Geffen, and Mintz (2018) using the same IDS corpus found that prosodic cues (primarily pitch) in the first two syllables of utterances did differ between statements and questions but did not differ between yes/no and wh- questions. This suggests that sentence-initial prosodic cues are available for distinguishing between statements and wh- questions in infant-directed speech, although it remains to be seen whether infants are sensitive to these prosodic cues.
If infant-directed speech is organized similarly to adult-directed speech, prosody could provide a robust tool for infants to initially distinguish sentence types. Unlike adults, who can use lexical knowledge to distinguish between sentence types, infants, who are only beginning to recognize words and do not yet have the lexical or syntactic knowledge necessary to distinguish between statements and questions, may be able to use prosodic cues to begin to distinguish between them. Indeed, Geffen and Mintz (2017) suggested that infants may initially distinguish between statements and yes/no questions prosodically, allowing them to recognize the distributional similarity between yes/no and wh- questions. Soderstrom, Ko, and Nevzorova (2011) found that English-learning children could distinguish declarative statements from declarative questions (questions that have statement word order with rising pitch, e.g., Anna likes chocolate?) by two years of age. However, the wide age range of the subjects (between 4.5 months and 2;0 years) in the Soderstrom et al. (2011) study makes it difficult to determine when this ability emerges.
While it remains unclear when English-learning infants begin to use prosodic information to distinguish between sentence types, there is a wealth of evidence demonstrating infants’ general sensitivity to prosodic information from an early age. For example, infants as young as two days old can use prosody to make distinctions between broad rhythmic classes of languages (Mehler, Jusczyk, Lambertz, Halsted, Bertoncini & Amiel-Tison, 1988; Moon, Cooper & Fifer, 1993; Nazzi, Bertoncini & Mehler, 1998; see Nespor, Shukla & Mehler, 2011, for an overview of rhythmic classification). Thus, there is ample evidence that infants are sensitive to prosody, suggesting that they could, in principle, use prosodic information to make initial sentence-type distinctions, even if they do not have a specific understanding of what a “statement” or “question” is.
Previous studies (e.g., Chiang et al., 2018; Geffen & Mintz, 2017) have examined whether there are prosodic differences between sentence types without taking word category into account. However, there are prosodic differences between open-class (e.g., content) and closed-class (e.g., function) words (e.g., Monaghan, Christiansen & Chater, 2007). Given that statements and questions often begin with words that differ in category and frequency, it is important to consider not just whether prosodic information is available to distinguish between sentence types, but also whether and how it is affected by the syntactic categories (e.g., open- versus closed-class word) that characterize the words at the beginning of the utterance.
Duration
While F0 is a driving factor for the discrimination of sentence types, other prosodic cues also play a role. Duration is a secondary prosodic correlate of sentence type in other Germanic languages (e.g., Dutch, Orkney English – van Heuven & van Zanten, 2005) in which pitch is the primary prosodic cue. Perceptual studies have demonstrated that infants can use prosodic cues including duration to identify syntactic unit boundaries such as clauses (e.g., Hirsh-Pasek, Kemler Nelson, Jusczyk, Cassidy, Druss & Kennedy, 1987; Kemler Nelson, Hirsh-Pasek, Jusczyk & Cassidy, 1989) and phrases (Gerken, Jusczyk & Mandel, 1994; Jusczyk, Hirsh-Pasek, Kemler Nelson, Kennedy, Woodward & Piwoz, 1992). Nine-month-olds can use syllable duration differences (e.g., “longer duration of the syllable immediately preceding a major phrasal boundary”; Jusczyk et al., 1992, p. 289) to detect syntactic boundaries, but only in conjunction with other converging cues such as pitch changes (e.g., Hirsh-Pasek et al., 1987; Jusczyk et al., 1992). Thus, duration, in conjunction with pitch, could help to discriminate statements and questions, but it remains to be seen whether it is useful as an independent cue for sentence-type discrimination. Geffen and Mintz (2017) found that final syllable duration did differ between infant-directed statements and yes/no questions, but not between statements and wh- questions.
This difference could be attributed to the last word being one syllable in most statements but two syllables in at least half the yes/no questions, suggesting the difference exists at the word (category) level rather than the sentence level. Patel and Grigos (2006) found similar final vowel duration differences when 4-, 7- and 11-year-olds produced declarative questions and statements, although 4-year-olds relied most heavily on final syllable lengthening, while 7- and 11-year-olds used a combination of final syllable lengthening and F0 to indicate sentence-type contrasts. The Patel and Grigos (2006) study avoided the potential confound of different syllable lengths in the final word found in Geffen and Mintz (2017) by having children produce the same sentence (consisting of four monosyllabic words) with statement and question intonation. It is possible the same pattern of final vowel duration differences will be found in the initial words of statements and wh-questions, although it seems unlikely given that many statements begin with determiners or pronouns and wh- questions, by definition, begin with one-syllable wh- words. Alternatively, research shows there are prosodic differences between open-class and closed-class words (Monaghan et al., 2007), including overall duration, so duration may still be a useful cue given that all wh- questions will start with a (closed-class) wh- word, but statements in an infant’s daily input could begin with a greater variety of words including closed-class (e.g., determiners, pronouns) and open-class (e.g., nouns) words.
Intensity
Intensity correlates with the other dimensions of prosody, especially pitch, and infants appear to be sensitive to this correlation (Fernald, 1984). For example, in English, stressed syllables are marked by higher pitch and intensity compared to non-stressed syllables (e.g., Fry, 1955; Hay & Saffran, 2012). There are significant differences in intensity between statements and wh- questions in infant-directed speech, with statements demonstrating significantly higher intensity in both the penultimate and final syllables (Geffen & Mintz, 2017). Thus, as with duration, to the extent that intensity correlates with other prosodic cues that differentiate statements from questions, intensity could contribute to sentence-type discrimination.
Acquisition of sentence types and word categories
In English, and in many other languages, words can be separated into two broad classes: open-class words and closed-class words. These two classes differ in their acoustic and distributional properties. Open-class words include nouns, verbs, and adjectives. Closed-class words provide information about the grammatical relationships between words (e.g., articles, prepositions; Morgan, Shi & Allopenna, 1996). Compared to open-class words, closed-class words are typically characterized by syllable reduction, reduced vowels, and simplified syllable structure with minimal, if any, onsets and codas. These word classes can be distinguished by surface acoustic and phonological information cross-linguistically (English, Dutch, and French – Monaghan et al., 2007; Mandarin and Turkish – Shi, Werker & Morgan, 1999). Open-class and closed-class words also differ in their frequency in daily speech (Morgan et al., 1996; Shi, Cutler, Werker & Cruickshank, 2006a; Shi & Lepage, 2008).
Infants’ sensitivity to the distinction between open-class and closed-class words begins early in development. Newborns distinguish between open-class and closed-class words based on surface acoustic and phonological cues (Shi, Morgan & Allopenna, 1999) and by 6 months, infants prefer listening to open-class words over closed-class words (Shi & Werker, 2001). By 10.5 months, infants show sensitivity to the phonological properties that are characteristic of closed-class words, demonstrating a preference for actual closed-class words rather than nonsense stressed syllables (Shady, 1996). Twelve- and 17-month-olds will map a novel pseudo open-class but not a novel pseudo closed-class word to an object (Hochmann, Endress & Mehler, 2010; MacKenzie, Curtin & Graham, 2012). Sixteen-month-olds are sensitive to the position of closed-class words in a sentence and their relations to nouns and verbs (Shady, 1996), and 17-month-olds can use this distributional information to identify closed-class words (Hochmann et al., 2010).
By the first year, infants learning a variety of languages can segment closed-class words from continuous speech (e.g., English – Shi et al., 2006a; German – Höhle & Weissenborn, 2003; French – Shi, Marquis & Gauthier, 2006b). Although closed-class words have higher frequency in speech than open-class words, there is variability in the frequency of different closed-class words. For example, Shi and colleagues evaluated a sample of 290,094 words from the Brent Corpus (Brent & Siskind, 2001) and found that “the” appeared 8,513 times while “her” appeared 307 times (Shi et al., 2006a). They then tested whether 8- and 11-month-old infants would show longer looking time to pseudo-nouns that had been preceded by the real closed-class words “the” or “her”, compared to pseudo-nouns that had been preceded by the pseudo-closed-class words “kuh” or “ler”. In the real closed-class word condition, infants segmented those pseudo-nouns preceded by “the”. They also found a developmental progression, likely shaped by language experience, where 8- but not 11-month-olds also looked longer to pseudo-nouns preceded by “kuh”, but not “ler” (Shi et al., 2006a). This work suggests that the high frequency of closed-class words can help young language learners to segment novel forms, and that this ability is shaped by experience with the language.
Evaluating cues at the word level entails examining prosodic and phonetic differences between word categories (i.e., open- and closed-class words). It is also important to consider when and how infants begin to acquire these broad categories (open- and closed-class) as well as more specific categories (e.g., auxiliary verbs and wh- words within the closed-class word category). Rowland, Pine, Lieven, and Theakston (2003) provided evidence that order of acquisition of wh- words is correlated with input frequency, i.e., high-frequency wh- words (e.g., what) are acquired before lower frequency words (e.g., how). In addition, wh- identity questions (e.g., What is that?) are often shorter and simpler, which may make them easier to acquire. Seidl, Hollich, and Jusczyk (2003) found that by 15 months, infants demonstrate understanding of subject wh-questions, but they do not demonstrate understanding of object wh- questions until 20 months. This suggests that word category and input frequency can influence understanding and acquisition of categories at the word- and sentence-level. While wh- words are a relevant category when evaluating the beginning of wh- questions, it is also important to consider the broad categories that frequently make up the initial words in statements (i.e., open- and closed-class words).
These previous studies suggest that infants are sensitive to the difference between open- and closed-class words from an early age. Given the different types of words that characterize the beginning of statements and wh- questions, this knowledge could be useful to infants for making initial sentence-type distinctions.
Outline of the Study
While previous studies (e.g., Chiang et al., 2018; Geffen & Mintz, 2017) have provided preliminary evidence that prosodic information is available at the beginning and end of infant-directed sentences to distinguish between statements and questions, these studies examined prosodic information regardless of syntactic information (e.g., word category). Given that wh- questions frequently start with a high frequency, closed-class word (though they may not show the typical acoustic patterns characteristic of other closed-class words, e.g., unstressed) while statements can begin with a closed-class or an open-class word, it is important to consider how prosodic differences across word types interact to aid discrimination. In line with previous research (e.g., Morgan et al., 1996), we classified wh- words as closed-class words. Therefore, this paper will focus on the distinction between high-frequency, closed-class (e.g., wh- words, pronouns, determiners) and low-frequency, open-class words (e.g., nouns, verbs). We will evaluate specific cues at the word level to determine what cue or combinations of cues are driving infants’ initial sentence-type discrimination (prosody, phonetic properties, or a combination of the two).
The current study evaluates whether there are utterance-initial prosodic cues (e.g., higher pitch on the first word relative to the second word) that distinguish between wh- questions and different types of statements (e.g., those that begin with closed- versus open-class words) in American English infant-directed speech. The current study aims to replicate the findings of Chiang et al. (2018) by examining typical prosodic contours in a larger corpus, including the speech of 13 mothers to their preverbal infants. In addition, we will extend the findings by taking syntactic categories into account, looking at sentences that begin with open- versus closed-class words, and analyzing the prosodic characteristics at both the phrasal and word level.
We examined prosodic measures in wh- questions (closed-class – closed-class) and different types of statements (closed-class – closed-class, closed-class – open-class, open-class – closed-class, open-class – open-class). We analyzed the distribution of pitch (F0), duration, and intensity over the first two words of statements and wh- questions in infant-directed speech. We hypothesized that there would be utterance-initial prosodic differences between wh- questions and different types of statements in American English infant-directed speech depending on the first words (closed-class) that appear in wh-questions compared to the variety of first words found in statements (open- and closed-class). We further predicted that these prosodic and word category cues would serve to correctly classify utterances as either statements or wh- questions in the models discussed in the Results section, though we are not claiming that infants understand what statements or questions are. Together, these findings have implications for infants’ sentence-type discrimination ability, which we will address in the Discussion.
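The phrase-type scheme described above (cc, co, oc, oo, based on the word classes of the first two words) can be illustrated with a small labeling sketch. This is not the procedure used in the study: the closed-class word list below is a hypothetical, deliberately tiny stand-in, the function names are ours, and treating come as open-class and on as closed-class is an assumption made for illustration only.

```python
# Hypothetical, incomplete closed-class list for illustration only;
# the study's actual classification lists are not reproduced here.
CLOSED_CLASS = {
    "the", "a", "her", "you", "i", "we", "it", "that", "on", "in",
    "is", "are", "do", "does", "can",
    "who", "what", "where", "how",  # wh- words are treated as closed-class
}


def word_class(word: str) -> str:
    """Return 'c' if the word is in the closed-class list, otherwise 'o'."""
    return "c" if word.lower() in CLOSED_CLASS else "o"


def phrase_type(utterance: str) -> str:
    """Label an utterance by the classes of its first two words (cc, co, oc, oo)."""
    words = utterance.split()
    if len(words) < 2:
        raise ValueError("need at least two words")
    return word_class(words[0]) + word_class(words[1])


if __name__ == "__main__":
    for utt in ["where are you", "the dog barks", "come on", "dogs like bones"]:
        print(utt, "->", phrase_type(utt))
```

Under this toy list, a wh- question such as where are you receives the same cc label as a statement that begins with two closed-class words, which is why the comparison between wh- questions and cc statements is the most tightly matched one.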
Methods
Input corpora
We evaluated input to 13 American English-learning children from the Brent corpus (Brent & Siskind, 2001; all dyads except d1, m2 and w1; see Footnotes 1 and 2) from the CHILDES database (MacWhinney, 2000). We selected the Brent corpus because it contains speech to preverbal infants, and, at the time we conducted the study, provided the largest number of audio recordings of Standard American English-speaking mother-child dyads of any corpus in the database. We analyzed utterances of the mother directed towards her preverbal infant, taken from 1-4 sessions (see Footnote 3) spanning an approximately 2-4-week period depending on the subject. Infants’ ages ranged from 8.27 to 10.10 months. Recordings were made in the subjects’ homes. Mothers were fitted with a small waist pack and a lapel microphone located just below their mouths and instructed to go about their daily routines while home alone with the child (Brent & Siskind, 2001).
Selection of utterances
We analyzed the first two words in 3,315 sentences that began with two monosyllabic words. This allowed us to replicate Chiang et al.’s (2018) analysis of the first two syllables in IDS statements and wh- questions and address the potential confound of different numbers of syllables in initial words between the two sentence types in Geffen and Mintz (2017). All the wh- questions (427) began with two closed-class words (see Footnote 4). We divided statements into four categories: closed-class – closed-class (cc) (923), closed-class – open-class (co) (773), open-class – closed-class (oc) (931) and open-class – open-class (oo) (261). See Table 1 for the number of utterances per sentence type (statement and wh-question) and per phrase type within statements (cc, co, oc, oo) for each speaker, as well as each speaker’s overall total, which ranged from 29 to 664 utterances. Sentences had to meet several criteria for inclusion (see Appendix A for a full list of selection criteria). We based our initial selection criteria on structural properties. Wh- questions were characterized by the typical subject and auxiliary inversion structure of yes/no questions with a wh-word such as who, what, where or how in the utterance-initial position, possibly in combination with do-support. We excluded utterances that began with discourse markers (e.g., yeah, oh) before the wh- word to ensure consistency in the analyses. We also excluded utterances that began with wh- words but were statements (e.g., what a good girl) because they did not demonstrate the auxiliary inversion characteristic of wh- questions. Finally, we excluded questions that began with a closed-class word followed by an open-class word since there were not enough data points (16 sentences) to be added to the analysis. Statements followed the canonical transitive word order and did not have question intonation (i.e., we excluded declarative questions).
We did not include statements that only consisted of proper names (e.g., naming phrases [Big Bird, Chips Ahoy]), although we did include an utterance if the proper name was used in a sentence (e.g., Hi Pooh.).
Note. cc = sentences that start with two closed-class words, co = sentences that start with a closed-class word followed by an open-class word, oc = sentences that start with an open-class word followed by a closed-class word, oo = sentences that start with two open-class words.
More generally, we did not include utterances that only consisted of a single word (e.g., What? Yeah) because we wanted to be able to examine prosodic change over at least two (monosyllabic) words. We also did not include any utterances that contained partial and repeated initial words (e.g., stuttering) because this might not be representative of typical prosodic patterns.
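As a quick consistency check on the counts reported in this section, the sketch below tallies the five utterance categories and expresses each as a share of the total. The counts are taken directly from the text; the variable and function names are ours and purely illustrative.

```python
# Utterance counts as reported in the text (sentences beginning with
# two monosyllabic words).
counts = {
    "wh-question (cc)": 427,
    "statement cc": 923,
    "statement co": 773,
    "statement oc": 931,
    "statement oo": 261,
}
REPORTED_TOTAL = 3315  # total number of sentences stated in the text


def shares(table: dict) -> dict:
    """Return each category's proportion of the overall total."""
    total = sum(table.values())
    return {label: n / total for label, n in table.items()}


if __name__ == "__main__":
    total = sum(counts.values())
    print("total:", total, "| matches reported total:", total == REPORTED_TOTAL)
    for label, share in shares(counts).items():
        print(f"{label}: {share:.1%}")
```

The categories sum to the reported 3,315 sentences, with wh- questions making up roughly 13% of the sample and statements the remainder.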
We narrowed our selection further by listening to the accompanying waveform. Like Geffen and Mintz (Reference Geffen and Mintz2017), utterances were excluded if they included vocalization or background noise, making analysis difficult or impossible (e.g., laughing, crying, blowing raspberries). We excluded unintelligible sentences because we used the mother’s pronunciation to help define word boundaries. We also excluded utterances that reflected read or rehearsed speech (e.g., songs, reading, reciting the alphabet, etc.) since we were primarily interested in spontaneous speech.
Our analysis focused on word-level comparisons, including duplicate two-word phrases. For example, in the utterances “let’s go in here” and “let’s go see”,Footnote 5 the phrase “let’s go” is the target in both. We included duplicate phrases for both statements and wh- questions for several reasons. First, Geffen and Mintz (Reference Geffen and Mintz2017) found that including duplicate utterances did not change the results, so we included duplicate two-word phrases to build a larger corpus with more tokens on which the model can base its analysis (as described in the Results section). (Out of the 3,315 two-word phrases we included in the corpus analysis, 2,249 were duplicates [e.g., “let’s go”].) Second, previous studies (e.g., Fernald & Morikawa, Reference Fernald and Morikawa1993; Stern, Spieker, Barnett & MacKain, Reference Stern, Spieker, Barnett and MacKain1983) find that utterance repetition peaks between 4 and 6 months (up to 20% of maternal utterances are exact repetitions; McRoberts, McDonough & Lakusta, Reference McRoberts, McDonough and Lakusta2009), tapering off to adult-like (i.e., almost nonexistent) levels by 2;0 years. This demonstrates that repetition is a common characteristic of infant-directed speech (e.g., Stern et al., Reference Stern, Spieker, Barnett and MacKain1983), making it an important part of any evaluation of naturalistic infant-directed speech. Third, the sessions we evaluated from the Brent corpus consist of naturalistic interactions between mothers and their 8- to 10-month-old infants, so we cannot control for the number of exactly or partially repeated phrases or utterances. This is particularly relevant for wh- questions since there are only a limited number of wh- words, guaranteeing repetitions, especially in the first two words. For example, our corpus included 67 repetitions of “where are” and 207 repetitions of “come on”. McRoberts et al. (Reference McRoberts, McDonough and Lakusta2009) suggest that repeated utterances can serve as an important framework for infants to notice salient perceptual cues in the speech signal. Thus, including repetitions provides a more accurate representation of daily input to young infants and may prove useful for syntactic development.
Location of analyses within sentences
Separate analyses of each acoustic measure are detailed below. We analyzed sentences to determine whether there were prosodic differences between wh- questions and different types of statements, such as acoustic prominence (e.g., higher pitch) at the beginning of the utterance. Given that previous research (Hedberg & Sosa, Reference Hedberg and Sosa2002; Maxwell & Fletcher, Reference Maxwell and Fletcher2013) found rising pitch contour accent on wh- words, we reasoned that the first two words (which often correspond to the first two syllables in the utterance, especially in wh- questions) were likely to contain prosodic cues that would allow infants to distinguish between wh- questions and statements with different phrase structures (e.g., those that begin with closed-class versus open-class words).
While Chiang et al. (Reference Chiang, Geffen, Mintz, Bertolini and Kaplan2018) found that there were prosodic differences between statements and wh- questions, they only analyzed a small number of sentences (roughly 100 utterances of each type). Their study also grouped all types of statements together, irrespective of word category. Here we examine a larger corpus (approximately 3300 phrases) to provide more tokens on which the model can base its analysis and evaluate whether there is differential acoustic prosody on the initial words of wh- questions and different types of statements (e.g., those that begin with closed-class versus open-class words). For each utterance, a single coder marked the boundaries for the first two words for analysis by hand, although multiple coders might have worked on different utterances within the same session. Coders initially marked word boundaries by examining the spectrogram and waveform and listening to the corresponding audio. In addition, there can be a lot of variability between speakers in the combination of syntactic and prosodic cues, especially in infant-directed speech. Therefore, we allowed our coders to use their best judgment as native English speakers (following similar transcription techniques to those used by Bergelson, Casillas, Soderstrom, Seidl, Warlaumont & Amatuni, Reference Bergelson, Casillas, Soderstrom, Seidl, Warlaumont and Amatuni2019 and Soderstrom, Blossom, Foygel & Morgan, Reference Soderstrom, Blossom, Foygel and Morgan2008). The second author performed spot checks on the coding to check for consistency across coders and performed additional checks if multiple problems were found within a session. Coders labeled each section with a rough English transcription of the word. 
We used a modified version of Lennes’ (Reference Lennes2003) Praat script (Boersma & Weenink, Reference Boersma and Weenink2011) to carry out batch extraction of mean F0, maximum F0, minimum F0, F0 range, duration, and mean intensity from each labeled interval (i.e., each of the first two words of each utterance) within all 29 sessions (using the default settings, F0 range: 100-500 Hz, F0 sample rate: 100 Hz).
Pitch variables
To assess the availability of pitch as a cue to sentence type in infant-directed speech, we measured average pitch (Mean F0) as well as the lowest and highest pitch (Min F0 and Max F0, respectively).
For Mean F0 and Max F0, we were interested primarily in the change of these values across the first two words (see Methods). Mean F0 provides a simple, coarse measure that can be used to identify the general degree of sentence-initial pitch rise or fall. Max F0, while clearly related to Mean F0, provides specific information about pitch peak. Languages like English, which do use lexical markings (e.g., auxiliary verbs) to distinguish sentence types, demonstrate differences in pitch peak location between sentence types. For example, “English wh- questions show an earlier pitch peak and final F0 decline” (Best, Levitt & McRoberts, Reference Best, Levitt and McRoberts1991, p. 162). Thus, the location and degree of Max F0 could be a useful cue for distinguishing between sentence types. For the purposes of the current study, we measured Max F0 in each of the two words, took the higher of the two values as Max F0 for the pair, and noted whether that maximum occurred in the first or second word. Since infant-directed speech is characterized by exaggerated prosodic contours, including higher mean pitch and expanded pitch range (Fernald & Kuhl, Reference Fernald and Kuhl1987), it is an open question which properties of pitch contours that distinguish between sentence types in adult-directed speech are maintained in speech to infants. Geffen and Mintz (Reference Geffen and Mintz2017) found that prosodic cues, primarily pitch, at the end of utterances distinguish statements from yes/no questions but not from wh- questions. Chiang et al. (Reference Chiang, Geffen, Mintz, Bertolini and Kaplan2018) found that similar prosodic cues at the beginning of utterances distinguish statements from yes/no and wh- questions, but do not distinguish these question types from each other. 
These results suggest that prosodic information is available at various points in an utterance to distinguish statements from different types of questions, but there does not appear to be one consistent set of prosodic cues by itself (e.g., pitch, intensity), that distinguishes all three sentence types.
For analyzing pitch range at the word level, we subtracted the F0 min from the F0 max regardless of their location in the word. For example, if the F0 min was 150.1 Hz and the F0 max was 390.9 Hz, the pitch range would be 240.8 Hz (390.9 - 150.1).
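The word-level pitch measures described above can be illustrated with a minimal Python sketch. This is not the Praat script used in the study; the dictionary layout and the function name `pair_pitch_summary` are hypothetical, the word 1 values are taken from the example above, and the word 2 values are invented for illustration.

```python
def pair_pitch_summary(w1, w2):
    """Summarize pitch over the first two words of an utterance.

    w1 and w2 are dicts holding 'f0_min' and 'f0_max' in Hz
    (a hypothetical layout, not the study's data format).
    """
    # F0 range per word: maximum minus minimum, wherever they occur in the word.
    range_w1 = w1["f0_max"] - w1["f0_min"]
    range_w2 = w2["f0_max"] - w2["f0_min"]
    # Max F0 for the pair is the higher of the two word-level maxima.
    # Ties are assigned to word 1 here, which is an arbitrary choice.
    if w1["f0_max"] >= w2["f0_max"]:
        pair_max, location = w1["f0_max"], 1
    else:
        pair_max, location = w2["f0_max"], 2
    return {
        "range_w1": range_w1,
        "range_w2": range_w2,
        "pair_max_f0": pair_max,
        "max_f0_location": location,
    }


word1 = {"f0_min": 150.1, "f0_max": 390.9}  # values from the example above
word2 = {"f0_min": 180.0, "f0_max": 310.5}  # invented values
summary = pair_pitch_summary(word1, word2)
print(round(summary["range_w1"], 1), summary["max_f0_location"])
```

With these inputs the pitch range of word 1 is 240.8 Hz and the pair's Max F0 falls on the first word.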
Duration and intensity
Duration and average intensity were extracted from the marked words in Praat (Boersma & Weenink, Reference Boersma and Weenink2011) using Lennes’ Praat script (Reference Lennes2003). (See the section, Location of analyses within sentences, above).
Results
All acoustic values were converted to Z-scores using the means and SDs of that measure for each speaker. All analyses were conducted using SPSS 24. We acknowledge that there is an imbalance between statements and wh- questions in the current corpus which mirrors the imbalance between the sentence types in naturalistic infant-directed speech (e.g., Newport, Reference Newport, Castellan, Pisoni and Potts1977; T. Wang, personal communication, July 20, 2020). To address this imbalance, we report both the overall correct classification rate for statements and wh- questions given by the binary logistic regression and how this compares to chance rates of classification for each analysis below.
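As a rough illustration of the by-speaker standardization, the following Python sketch converts one acoustic measure to Z-scores using each speaker's own mean and SD. The analyses reported here were run in SPSS; the row layout, the function name `zscore_by_speaker`, and the data values are hypothetical, and the use of the sample SD (n - 1 denominator) is an assumption rather than something stated in the text.

```python
from collections import defaultdict
from statistics import mean, stdev


def zscore_by_speaker(rows, measure):
    """Return Z-scores for `measure`, standardized within each speaker.

    Each row is a dict with a 'speaker' key and a numeric value under
    `measure`. Every speaker needs at least two non-identical values,
    otherwise the SD is undefined or zero.
    """
    values = defaultdict(list)
    for row in rows:
        values[row["speaker"]].append(row[measure])
    # Sample SD (n - 1) is assumed here.
    stats = {spk: (mean(v), stdev(v)) for spk, v in values.items()}
    return [
        (row[measure] - stats[row["speaker"]][0]) / stats[row["speaker"]][1]
        for row in rows
    ]


# Invented Mean F0 values (Hz) for two hypothetical speakers.
rows = [
    {"speaker": "A", "mean_f0": 200.0},
    {"speaker": "A", "mean_f0": 250.0},
    {"speaker": "A", "mean_f0": 300.0},
    {"speaker": "B", "mean_f0": 300.0},
    {"speaker": "B", "mean_f0": 400.0},
    {"speaker": "B", "mean_f0": 500.0},
]
zs = zscore_by_speaker(rows, "mean_f0")
print(zs)
```

Standardizing within speaker puts a 300 Hz token from speaker A (high for that speaker) and a 300 Hz token from speaker B (low for that speaker) on opposite sides of zero, which is the point of the normalization.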
Analysis 1: Wh-Questions vs Statements when both start with two closed-class words
We ran a binary logistic regression with Utterance (Wh_cc = wh- question starting with two closed-class words, and S_cc = statement starting with two closed-class words) as the classification variable and the six acoustic measurements (Duration, Mean F0, Max F0, Min F0, F0 range, and Intensity) in each word (w1 = word 1 and w2 = word 2) as the predictor variables. That means that the test is classifying the data into wh- questions and statements using the 12 predictors to assess whether the two utterances differ and which of the acoustic properties differentiate them. First, to find which of the 12 predictors contributed significantly to the classification of the data into statements and wh- questions, each predictor was entered in the model in a separate block (forward stepwise procedure). If the addition of each new predictor did not result in a statistically significant change in the model, then that predictor was removed from the final analysis (see Appendix B). Next, we ran a binary logistic regression with all the predictors found to be significant in the previous test, in one block. We found that the two utterance types differed in Duration, Mean F0, and Intensity of word 1, as well as in Duration and Intensity of word 2 (see Table 2). Wh- questions have shorter duration and higher Mean F0 and Intensity than statements (see Figure 1).
Note. This table compares wh- questions that start with two closed-class words (Wh_cc) and statements that start with two closed-class words (S_cc) for the first (w1) and second words (w2). The table lists all the predictors included in the model. Acoustic properties not included in the table were removed from the model based on the initial binary logistic regression.
* p < .05; R2 = 0.104 (Cox-Snell), 0.146 (Nagelkerke); -2 Log likelihood = 1536.607, Model χ2 (7) = 148.318, p < .05, Hosmer-Lemeshow χ2 (8) = 13.633, p > .05, overall correct classification = 70.7% (chance = 68.4%).
Overall, we found that there were differences in the acoustic properties at the beginning of the two utterance types. That means that there are potential cues for infants to use when learning their language. In addition to the significance of the model and of the individual predictors, we can also assess the distinguishability of the two utterance categories with the classification results that are part of the binary logistic regression output. The binary logistic regression test initially classifies the data into the two categories tested (Wh_cc vs S_cc in this case) based only on their frequency. From this baseline model, we get the chance level of the classification (68.4% in this case). Next, the test classifies the data into the two categories using all the predictors (for our case, the acoustic properties in Table 2). From this main model, we can see what proportion of the data was correctly classified in each category using our predictors.
In our test, even though there were statistically significant differences between the two utterance types for some acoustic properties, those differences could only correctly classify 70.7% of the utterances into wh- questions and statements.Footnote 6 This means that these acoustic cues could help infants identify the correct utterance type only slightly above chance. If we further break down the success rate, we see that while statements were correctly classified at 92.6%, wh- questions were only correctly classified at 23.4% (see Figure 2). This means that the acoustic differences between the two utterance types were not enough to overcome the bias towards statements (i.e., the high frequency category), even though they did increase the successful classification of the utterances by 2.3 percentage points.
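The relation between the chance level, the per-category success rates, and the overall classification rate can be checked with a few lines of Python. This is a sketch under the assumption that all 923 S_cc statements and all 427 wh- questions entered the model; the variable names are illustrative.

```python
n_statements = 923  # S_cc utterances reported in the corpus description
n_questions = 427   # Wh_cc utterances

total = n_statements + n_questions

# Baseline model: every utterance is assigned to the more frequent category.
chance = max(n_statements, n_questions) / total

# Overall accuracy is the count-weighted average of the per-category rates.
acc_statements = 0.926
acc_questions = 0.234
overall = (acc_statements * n_statements + acc_questions * n_questions) / total

print(f"chance = {chance:.1%}, overall = {overall:.1%}")
# prints "chance = 68.4%, overall = 70.7%"
```

Because statements make up roughly two thirds of this comparison, a high success rate on statements can carry the overall rate above chance even when most wh- questions are misclassified.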
Since statements may start with different combinations of open-class and closed-class words, while the wh- questions we analyzed in the current corpus typically start with two closed-class words (with the exception of the 16 wh- questions mentioned previously that began with a closed-class word followed by an open-class word), there may be prosodic differences between the two sentence categories that are related to the presence and number of open-class words in utterance-initial positions. In our corpus, there were many statements that started with one or two open-class words, suggesting that the input to infants is much richer than a simple contrast between wh- questions and statements that start with two closed-class words. Even though these two utterance types do not differ much in the acoustic properties at the beginning of the utterance, it is possible that other types of statements differ more from wh- questions, and thus might be more useful for separating utterance types. To further explore this possibility, we ran three additional tests: we compared (a) wh- questions with statements starting with two open-class words, (b) wh- questions with statements starting with an open-class word followed by a closed-class word, and finally (c) wh- questions with statements starting with a closed-class word followed by an open-class word.
Analysis 2: Wh-Questions vs. Statements starting with two open-class words
Following the same method as in Analysis 1, to compare wh-questions (Wh_cc) with statements that start with two open-class words (S_oo), we first ran a binary logistic regression with each acoustic property in a separate block (see Appendix C) and then we ran another binary logistic regression with all the predictors that were significant in the first binary regression in one block. Duration of word 2, Mean F0 of word 1, Min F0 of word 1 and Intensity of words 1 and 2 were significant cues that differentiated wh- questions from statements (see Table 3). Figure 3 shows that wh-questions have shorter duration and higher F0 and intensity than statements.
Note. This table compares wh- questions that start with two closed-class words (Wh_cc) and statements that start with two open-class words (S_oo) for the first (w1) and second words (w2). The table includes all the predictors included in the model. Acoustic properties not included in the table were removed from the model based on the initial binary logistic regression.
* p < .05; R2 = 0.425 (Cox-Snell), 0.579 (Nagelkerke); -2 Log likelihood = 532.146, Model χ2 (7) = 381.175, p < .05, Hosmer-Lemeshow χ2 (8) = 64.534, p < .05, overall correct classification = 84% (chance = 62.1%).
Overall, we found that there were acoustic differences between the two utterance types. The classification results also indicate that there are sufficient acoustic cues to differentiate wh- questions from statements. The data were correctly classified 84% of the time (21.9 percentage points higher than chance),Footnote 7 with wh- questions having a success rate of 91.8% and statements of 71.3%. As Figure 4 also shows, the acoustic cues classify the utterances into the two types well.
Analysis 3: Wh-Questions vs. Statements starting with an open-class and a closed-class word
Next, we compared the wh- questions (Wh_cc) to the statements that began with an open-class word followed by a closed-class word (S_oc). In the binary logistic regression with all the significant predictors in a single block (see Appendix D for the initial analysis), we found that Duration and Intensity of words 1 and 2 were significant cues (Table 4). Figure 5 shows that wh- questions have shorter duration and higher intensity than statements.
Note. This table compares wh- questions that start with two closed-class words (Wh_cc) and statements that start with an open-class word followed by a closed-class word (S_oc) for the first (w1) and second words (w2). The table includes all the predictors included in the model. Acoustic properties not included in the table were removed from the model based on the initial binary logistic regression.
* p < .05; R2 = 0.159 (Cox-Snell), 0.223 (Nagelkerke); -2 Log likelihood = 1456.115, Model χ2 (6) = 234.871, p < .05, Hosmer-Lemeshow χ2 (8) = 16.86, p < .05, overall correct classification = 72.1% (chance = 68.6%).
Overall, our results indicate that there are acoustic cues that distinguish wh- questions from statements, but looking at the classification, we see that the success rate is low. Only 72.1% of the data were correctly classified into wh- questions and statements (3.5 percentage points higher than chance).Footnote 8 Statements were correctly classified 89.2% of the time, but wh- questions only 34.9% of the time (see also Figure 6).
Analysis 4: Wh-Questions vs. Statements starting with a closed-class and an open-class word
Finally, we compared wh- questions (Wh_cc) to statements that start with a closed-class word followed by an open-class word (S_co). In the binary logistic regression with all the significant predictors in a single block (see Appendix E for the initial analysis) we found that Duration and Intensity of words 1 and 2, and Mean F0 of word 1 were significant predictors in the classification of the data into wh- questions and statements (Table 5). In Figure 7, we see that wh- questions have shorter duration and higher F0 and Intensity.
Note. This table compares wh- questions that start with two closed-class words (Wh_cc) and statements that start with a closed-class word followed by an open-class word (S_co) for the first (w1) and second words (w2). The table includes all the predictors included in the model. Acoustic properties not included in the table were removed from the model based on the initial binary logistic regression.
* p < .05; R2 = 0.358 (Cox-Snell), 0.492 (Nagelkerke); -2 Log likelihood = 1030.84, Model χ2 (7) = 531.52, p < .05, Hosmer-Lemeshow χ2 (8) = 91.645, p < .05, overall correct classification = 81.3% (chance = 64.4%).
Overall, we found that there were acoustic cues at the beginning of an utterance to distinguish wh- questions from statements that began with a closed-class word followed by an open-class word. The classification results also supported this. We found that 81.3% of the data were correctly classified into the two utterance types (16.9 percentage points higher than chance).Footnote 9 The wh- questions were correctly classified 70.3% of the time and the statements 87.3% (see also Figure 8).
Summary
We compared the results from the comparisons of wh- questions to the four types of statements: (1) beginning with two closed-class words (S_cc), (2) beginning with two open-class words (S_oo), (3) beginning with an open-class word followed by a closed-class word (S_oc), and (4) beginning with a closed-class word followed by an open-class word (S_co). The results are summarized in Table 6.
Note. This table summarizes comparisons between wh- questions that start with two closed-class words (Wh_cc) and statements that start with two closed-class words (S_cc), statements that start with two open-class words (S_oo), statements that start with an open-class word followed by a closed-class word (S_oc), and statement that start with a closed-class word followed by an open-class word (S_co).
The best model (i.e., the model that fits the data best) was that of S_oo, which had the lowest deviance (-2LL) and the highest classification rate. That means that, of the four comparisons, the acoustic differences best distinguished wh- questions from statements that begin with two open-class words. S_co also showed a good fit, indicating that wh- questions were also well distinguished from statements that began with a closed-class word followed by an open-class word. In contrast, S_cc and S_oc had the highest deviance and lower classification rates, with the classifications showing a bias towards statements. That means that wh- questions were not well distinguished from statements when the statements began with two closed-class words or with an open-class word followed by a closed-class word. Thus, the statements that were more distinguishable from wh- questions were those that had an open-class word as their second word, while statements with a closed-class second word were very similar to wh- questions.
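The fit statistics reported under Tables 2-5 are related by standard formulas, which the following Python sketch makes explicit. It takes the -2 Log likelihood and Model χ2 values from the table notes and recovers the Cox-Snell and Nagelkerke R2 values. The sample size for each model is assumed to be 427 wh- questions plus the relevant statement count, and the function name `pseudo_r2` is illustrative.

```python
import math


def pseudo_r2(model_deviance, model_chi2, n):
    """Return (Cox-Snell R2, Nagelkerke R2) for a logistic regression.

    model_deviance is the -2 Log likelihood of the fitted model and
    model_chi2 is the drop in deviance relative to the intercept-only model.
    """
    null_deviance = model_deviance + model_chi2
    cox_snell = 1 - math.exp(-model_chi2 / n)
    # Nagelkerke rescales Cox-Snell by its maximum attainable value.
    max_cox_snell = 1 - math.exp(-null_deviance / n)
    return cox_snell, cox_snell / max_cox_snell


N_WH = 427  # wh- questions, shared by all four comparisons
models = {
    # name: (-2LL, Model chi-square, assumed n)
    "S_cc": (1536.607, 148.318, 923 + N_WH),
    "S_oo": (532.146, 381.175, 261 + N_WH),
    "S_oc": (1456.115, 234.871, 931 + N_WH),
    "S_co": (1030.84, 531.52, 773 + N_WH),
}

r2 = {name: pseudo_r2(*values) for name, values in models.items()}
for name, (cs, nk) in r2.items():
    print(f"{name}: Cox-Snell = {cs:.3f}, Nagelkerke = {nk:.3f}")
```

Note that raw deviance depends on sample size, and S_oo is also the smallest comparison (688 utterances under the assumption above), so the pseudo-R2 values, which scale by n, are a useful complement when ranking the four models.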
The differences between the utterances were mainly in duration and intensity.
Figure 9 summarizes all the acoustic properties for each utterance type. The wh- questions began with shorter words than statements (not surprising, since the first word of a statement is often an open-class word, and might entail less vowel reduction or have a more complex onset/coda than wh- words), and especially for the statements with a second open-class word, the difference became even larger in the second word position (w2). On the other hand, wh- questions had a higher intensity than statements, especially in first word position (w1). With respect to the F0 properties (mean, max, min, range), in each comparison, a different property was significant with the exception of F0 range, which was never a significant factor. Overall, wh- questions started with higher F0 than statements, but the statements with an open-class plus a closed-class word were the most similar to the wh- questions.
Discussion
The fact that the specific combination of prosodic cues needed for sentence-type classification varies at the sentence level as well as the word level (whether sentences start with an open- or closed-class word) highlights the challenges in developing a comprehensive account of how infants could use prosody to make useful sentence-type discriminations. We hypothesized that there would be broad utterance-initial prosodic differences between wh- questions and statements in American English infant-directed speech depending on the first words (closed-class) that appear in wh- questions compared to the variety of first words found in statements (open- and closed-class), and our results supported this hypothesis. However, although the results show that a variety of prosodic cues provide information that infants could use to broadly distinguish between sentence types, few studies have evaluated how useful these combinations of cues are for actually categorizing individual utterances. Previous studies (Chiang et al., Reference Chiang, Geffen, Mintz, Bertolini and Kaplan2018; Geffen & Mintz, Reference Geffen and Mintz2017) found that models of English IDS rarely exceeded 50-60% accuracy in categorizing statements and wh- questions, whether evaluating information at the beginning (Chiang et al., Reference Chiang, Geffen, Mintz, Bertolini and Kaplan2018) or the end of utterances (Geffen & Mintz, Reference Geffen and Mintz2017). However, both previous studies utilized a small corpus (approximately 300 utterances) divided roughly equally between three sentence types: statements, yes/no questions and wh- questions. A larger sample of ~800 Dutch utterances was automatically classified with high levels of accuracy into broad categories of statements and questions (82% and 90% accuracy respectively), though question sub-types (yes/no and wh- questions) had lower accuracy (53-75%; van Heuven et al., Reference van Heuven, Haan and Pacilly1997). 
We wanted to determine whether providing more experience (in this case a greater number of phrases to learn from) would improve the model for English sentence-type discrimination. We chose to focus on the distinction between statements and wh- questions given previous findings about the similarities of these two sentence types at the end of utterances. We wanted to see if prosodic information was available at the beginning of the utterances that would provide cues for infants to make initial sentence-type distinctions. We also added additional sets of comparisons, comparing wh- questions to four different types of statement phrases to see if prosodic information would provide sufficient information to correctly categorize and distinguish wh- questions from the different types of statements. However, the decision to look at statements with different phrase structures (e.g., statements that begin with two closed-class words versus two open-class words) does raise questions about the generalizability of the results to all statements. It may be that, as with many other categories, there is more variability within one type of phrasal structure (e.g., statements that begin with two closed-class words) than there are between sentence types (e.g., wh- questions that begin with two closed-class words versus statements that begin with two closed-class words).
We predicted that prosodic cues (e.g., Mean F0, Intensity) and word category (open vs. closed-class) would serve to correctly classify utterances as either statements or wh- questions in the models discussed in the Results section. Our results partly supported this hypothesis. Collectively, the classification results demonstrated several patterns. First, prediction accuracy for sentence type increases with sample size, a pattern we see through all four analyses (with the exception of Analysis 2, the statement category generally had a higher sample size). This is not surprising given that there is bias in the test itself towards the category with the highest frequency. Within our corpus, the statements had a much higher frequency than wh- questions. Even when the sample contained more questions than statements (as in Analysis 2), classification accuracy did not fall below 70% for statements, while there was greater variability for wh- questions (23.4%-91.8%). This suggests that acoustic cues are useful for classifying both sentence types, although these models are better at classifying statements than questions. Future research should examine how the model would perform if the corpus had a different ratio of statements and questions. For example, in the corpus in Newport (Reference Newport, Castellan, Pisoni and Potts1977), wh- questions accounted for approximately 21% of the input directed toward children, while they only accounted for approximately 15% of the current corpus. The current study suggests that an imbalance of statements and (wh-) questions may be the norm in infant-directed speech, with fewer questions in the daily input (in line with previous studies, e.g., Newport, Reference Newport, Castellan, Pisoni and Potts1977; T. Wang, personal communication, July 20, 2020). This raises the question of whether an increase in wh- questions in the input would lead to greater classification accuracy for the model overall or only for other instances of wh- questions. 
More importantly, how much input (particularly for different question types) do infants require to increase their classification accuracy? Further research is needed to determine whether we can change the bias in the test itself when we add predictors, although our initial results suggest not. This also raises questions about what this kind of input means for the learning process.
Additionally, the word categories of the first two words in the sentence types, and the second word in particular, seem to impact categorization accuracy. When the second word in the phrase was a closed-class word (Analyses 1 and 3), there was a large difference in classification accuracy between statements and questions (69.2% and 54.3% respectively), whereas when the second word was an open-class word (Analyses 2 and 4), the difference was smaller (20.5% and 17% respectively). There was a similar difference in overall accuracy with phrases that had an open-class word as the second word; these demonstrated a greater difference from chance (21.9% and 16.9% greater than chance respectively) compared to phrases that had a closed-class word as the second word (2.3% and 3.5% greater than chance respectively). The average overall classification accuracy across all four models was 11.2% different from chance. This is in line with results from Geffen and Mintz (Reference Geffen and Mintz2017), which found that statements and questions were correctly identified 51% of the time and Chiang et al. (Reference Chiang, Geffen, Mintz, Bertolini and Kaplan2018) who found a classification rate of 59% (chance was approximately 50% for both). While prosodic information is available to distinguish between wh- questions and different types of statements, the current models varied in overall classification accuracy. The question remains how much of this is due to the different phrasal structures of the statements in the current corpus analysis (e.g., statements that begin with two closed-class words versus two open-class words). Thus, prosody could be a useful cue, but not likely the driving force behind infants’ initial sentence-type discrimination, something that should be evaluated in future infant perception studies. 
It is unclear whether margins of 2.3 and 3.5 percentage points above chance would be enough for infants (or human categorization in general) to create those categories, though it seems unlikely.
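The differences discussed above follow directly from the classification rates reported in the Results. The short Python sketch below recomputes, for each comparison, the gain over chance and the gap between statement and wh- question accuracy; the dictionary layout and variable names are illustrative.

```python
# Rates in percent, as reported for Analyses 1-4.
analyses = {
    "S_cc": {"overall": 70.7, "chance": 68.4, "statements": 92.6, "wh": 23.4},
    "S_oo": {"overall": 84.0, "chance": 62.1, "statements": 71.3, "wh": 91.8},
    "S_oc": {"overall": 72.1, "chance": 68.6, "statements": 89.2, "wh": 34.9},
    "S_co": {"overall": 81.3, "chance": 64.4, "statements": 87.3, "wh": 70.3},
}

gains = {}  # overall accuracy minus chance, in percentage points
gaps = {}   # absolute difference between the two per-category rates
for name, rates in analyses.items():
    gains[name] = round(rates["overall"] - rates["chance"], 1)
    gaps[name] = round(abs(rates["statements"] - rates["wh"]), 1)
    print(f"{name}: {gains[name]} points above chance, gap {gaps[name]} points")

mean_gain = sum(gains.values()) / len(gains)
print(f"mean gain over chance: {mean_gain:.2f} points")
```

The two comparisons with a closed-class second word (S_cc, S_oc) show the smallest gains and the largest gaps, and the mean gain across the four models is about 11.15 points, which corresponds to the average of roughly 11.2 reported above.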
Previous studies demonstrated that prosodic information (especially pitch) is available at the beginning (Chiang et al., Reference Chiang, Geffen, Mintz, Bertolini and Kaplan2018) but not the end of utterances (Geffen & Mintz, Reference Geffen and Mintz2017) to distinguish between statements and wh- questions. The current study provides support for Chiang et al. (Reference Chiang, Geffen, Mintz, Bertolini and Kaplan2018), finding prosodic differences between sentence types in Mean F0, Duration and Intensity. These results are also consistent with the Edge hypothesis (Seidl & Johnson, Reference Seidl and Johnson2006), which suggests that infants pay special attention to initial words. For example, children can learn auxiliary verbs when they are utterance initial (yes/no questions) but not utterance medial (wh- questions; Newport, Gleitman & Gleitman, Reference Newport, Gleitman, Gleitman, Snow and Ferguson1977). Our results provided partial support for the Edge hypothesis, suggesting that initial words are relevant for making sentence-type distinctions. There are acoustic properties at the beginning of utterances that could distinguish wh- questions from statements, but the degree of distinguishability and the usefulness of the cues to the learning infant (i.e., the classifier) depends on the types of words used (open-class vs. closed-class) as well as the position (the category of the second word was more important for classification accuracy than the category of the first word). Further research is needed to tease apart the impact of word category and position.
Results also demonstrated consistent differences in duration across the four analyses, such that wh- questions were always shorter than statements. As we discussed in the introduction, duration is often considered to be a secondary prosodic characteristic, varying along with pitch, but not necessarily the characteristic driving discrimination. For example, Geffen and Mintz (Reference Geffen and Mintz2017) found no durational differences between the ends of statements and wh- questions, although questions generally have shorter (overall) duration than statements (a pattern we saw in the utterance-initial words in the current corpus analysis). Nonetheless, in the current study duration was a significant predictor of classification accuracy in all four of the models, consistently showing significant differences for word 2 and, apart from Analysis 2, differences in word 1 as well.
The current results raise the question of whether infants are making distinctions at the sentence level (statements vs. questions) or the word level (closed-class vs. open-class). For example, nearly all the wh- questions in the current corpus began with two closed-class words (only 16 wh- questions out of 443 did not follow this pattern). In contrast, the current corpus analysis examined four different types of statements that differed in the category of their first and second word (cc, co, oo, oc). While looking at different types of statements provided an opportunity to look at word-level distinctions, it likely also had the side effect of decreasing effect sizes in differences between statements and wh- questions. In future analyses, it might be more informative to contrast different types of phrasal structures (e.g., combining statements and wh- questions that begin with two closed-class words and comparing them to sentences that begin with two open-class words). Regardless of the type of distinction, these results suggest that there is a relationship between prosodic and syntactic information in utterance-initial position that infants may be able to leverage to make sentence-type distinctions, though likely not at the 9- to 10-month age of the infants included in the current corpus analysis.
One limitation of the current study is that we did not take context into account when examining statements and questions. Given that there is not much work in this field, it was important to establish whether prosodic information is available at the beginning (Chiang et al., Reference Chiang, Geffen, Mintz, Bertolini and Kaplan2018) as well as the end of statements and questions (Geffen & Mintz, Reference Geffen and Mintz2017). However, sentences do not occur in a vacuum. Prosody changes within individual sentences as well as over the course of a conversation. Future research should examine whether there are discourse-level changes in prosodic information that highlight the beginning of utterances as an important source of information for classifying and distinguishing between statements and questions.
Another potential limitation concerns our selection criteria. Coders were instructed not to include any wh- questions where they did not feel that the mother’s intention was to ask a question (one of the exclusionary criteria in Appendix A). Our selection criteria were based on a combination of prosodic and syntactic features (e.g., an utterance-initial wh- word followed in many cases by an auxiliary verb or do-support), but this could have introduced an element of selection bias. Future studies should evaluate this potential bias by looking at all utterances that begin with wh- words, whether they have question or statement word order and/or prosodic cues (e.g., final flat or falling intonation).
The current study found (more pronounced) prosodic differences between wh- questions and different types of statements based on whether the first two words were open-class or closed-class words. While this result was unexpected, it makes sense given the prosodic and syntactic differences between open-class and closed-class words. This raises a question that was also posed by Chiang et al. (Reference Chiang, Geffen, Mintz, Bertolini and Kaplan2018): Are there similar types of distinctions between different types of wh- questions? We know that infants’ comprehension and production of different types of wh- questions proceed in a fairly consistent order (Rowland et al., Reference Rowland, Pine, Lieven and Theakston2003). However, the question remains of whether this is due entirely to cognitive understanding of what the wh- question is asking or whether there is an additional prosodic element that makes it easier to understand certain questions sooner than others. Prosody can help disambiguate between different sentence forms, but syntactic complexity takes longer to acquire, as shown by wh- question studies which consistently demonstrate that object wh- questions, which have greater syntactic complexity, are acquired later. It would be interesting to see whether, once infants or toddlers have acquired these different types of wh- questions, they weight prosodic cues differently for each type. It is also not surprising that infants acquire subject wh- questions first considering that words like “what” are more common in their daily input than “how” (Rowland et al., Reference Rowland, Pine, Lieven and Theakston2003). Therefore, it makes sense that prosody may become more variable as syntactic complexity increases. If this is true in the early stages, it makes sense that we would see the consistent order of acquisition for wh- words and wh- questions that has been demonstrated by previous studies (Bloom, Merkin & Wooten, Reference Bloom, Merkin and Wooten1982; Rowland et al., Reference Rowland, Pine, Lieven and Theakston2003).
Conclusion
In summary, the current study provides preliminary evidence that prosodic information could be useful for distinguishing statements from wh- questions in infant-directed speech, although it remains to be seen whether infants can use these cues to make sentence-type distinctions by the time they are 1;0 (one year old). Making this distinction could perhaps provide a foundation for distinguishing wh- questions from statements on utterance-initial distributional grounds, as we have found that those sentence types are prosodically similar at the end of utterances in infant-directed and adult-directed speech. Our results suggest that differing combinations of prosodic and segmental cues are available for infants to make broad distinctions between statements and questions. These differences could provide an important foundation for acquiring syntactic knowledge. Future research is needed to further evaluate infants’ sensitivity to prosodic and segmental differences between statements and questions, as well as between categories of words, specifically auxiliary verbs (e.g., can, do) and wh- words.
Acknowledgements
This research was supported by funding from the Natural Sciences and Engineering Research Council of Canada [NSERC grant number 03797-2018, awarded to S. Curtin]. Funding agencies had no role in study design, data collection or analysis, or in the writing/submission of this paper. We would like to thank the members of the University of Calgary Speech Development Lab for their help with this research, especially our volunteer coders.
Competing interests
The authors declare none.
Appendix A. Selection criteria
Appendix B. Model results – Wh_cc vs S_cc
Note. This table presents the results of a model that compares wh- questions that start with two closed-class words (Wh_cc) versus statements that start with two closed-class words (S_cc) for the first (w1) and second words (w2).
Appendix C. Model results – Wh_cc vs S_oo
Note. This table presents the results of a model that compares wh- questions that start with two closed-class words (Wh_cc) versus statements that start with two open-class words (S_oo) for the first (w1) and second words (w2).
Appendix D. Model results – Wh_cc vs S_oc
Note. This table presents the results of a model that compares wh- questions that start with two closed-class words (Wh_cc) versus statements that start with an open-class word followed by a closed-class word (S_oc) for the first (w1) and second words (w2).
Appendix E. Model results – Wh_cc vs S_co
Note. This table presents the results of a model that compares wh- questions that start with two closed-class words (Wh_cc) versus statements that start with a closed-class word followed by an open-class word (S_co) for the first (w1) and second words (w2).