Introduction
Consider the following exchange between two friends:
Question: “Are you going to the party?”
Answer: “I have an exam tomorrow.”
The response is an example of an indirect answer because it does not provide an explicit “yes” or “no” to the question. Nonetheless, a refusal to attend the party can be inferred from the response. Along with metaphors and sarcasm, indirect answers are a form of non-literal language due to the discrepancy between the speaker’s intended message and the explicit linguistic expression. Indirect answers are natural and common in everyday communication, accounting for 13%-38% of responses to yes/no questions (de Marneffe et al., 2009; Hockey et al., 1997; Stenstrom, 1984).
The intended meanings of indirect answers are known as conversational implicatures in the field of linguistic pragmatics. In “Logic and Conversation,” Grice (1975) described the phenomenon of meaning one thing while saying another and explained how speakers manage to understand each other. Grice postulated a general principle that speakers are cooperative and intend to achieve effective communication. Assuming the responder intended to answer her friend’s question about attending the party in the exchange above, her response that she has an exam the next day must be relevant to her attendance at the party. Because it is commonly known that people prioritize responsibilities and obligations over leisure activities, her utterance intentionally communicated (or implicated, in Grice’s terminology) a negative answer to the question.
Existing research suggests that children with typical development begin to comprehend indirect answers with some consistency around the age of 6 years, and this ability continues to improve steadily throughout the early school years (Bernicot et al., 2007; Bucciarelli et al., 2003; de Villiers et al., 2009; Loukusa et al., 2007). Four recent studies depict this development from 2 to 10 years of age. First, Bucciarelli et al. (2003) used videotaped stories to test 2- to 7-year-old children on their comprehension of indirect answers (named “complex indirects” in the study). After viewing a story, the children chose a possible ending from four pictures. For example, in one scenario, two siblings stop in front of a doll shop. The brother asks, “Would you get me that game?” and the sister answers, “We don’t have any money.” In this example, selecting the picture of the siblings walking away from the store empty-handed would be scored correct. Results indicated that accuracy increased with age: 38% for 2;6- to 3-year-olds, 42% for 3;6- to 4-year-olds, 43% for 4;6- to 5;6-year-olds, and 68% for 6- to 7-year-olds. It is noteworthy that only the oldest group, the 6- to 7-year-olds, performed reliably above 50% accuracy when deriving indirect inferences.
Second, Loukusa et al. (2007) tested children with typical development between the ages of 3 and 9 years on their ability to comprehend indirect comments. The researchers verbally presented a scenario such as: “A man is mowing, and a woman says to him, ‘There are flowers growing in the middle of the lawn so remember to be careful.’” followed by a question prompt, “Why does the woman say this?” Responses were judged as correct/appropriate (e.g., “So that the flowers wouldn’t be cut.”) or incorrect/inappropriate (e.g., “She doesn’t want to do it.”) based on whether the implicated meaning was derived. The researchers found that the mean score of correct/appropriate answers increased with age. Moreover, there was a significant difference in mean scores between 3- and 4-year-olds as well as between 5- and 6-year-olds. Examination of percent correct by age group revealed that 3-year-olds drew indirect inferences from 21% of the questions; this percentage increased to 77% by the age of 6 years. Eight- and 9-year-old children performed near the ceiling.
Third, Bernicot et al. (2007) used a computer-based story completion task to test children between the ages of 6 and 10 years on their comprehension of indirect answers, among other non-literal language forms. In one story, Donald and Daisy are in the yard. Donald asks Daisy, “Should I mow the lawn?” and Daisy replies, “The nephews are taking a nap.” The children had to pick a picture from two possible endings: one indicating the inference was understood (i.e., Donald waters the flowers) and the other indicating the inference was not understood (i.e., Donald mows the lawn). The researchers found that 75% of the 6-year-old children were able to correctly select the implicated ending in three or four of the four tested items. Performance for older children was near the ceiling, with 95% for the 8-year-olds and 100% for the 10-year-olds.
Finally, de Villiers et al. (2009) investigated comprehension of indirect answers by children aged 3 to 10 years. The researchers presented pictures with short question-answer pairs (e.g., Adult: “What happened to the ham?” Child: “The dog looks happy.”) to children and asked them to explain what the speaker meant (e.g., “What did the boy mean?” or “Why did he say that?”). Responses were coded as adequate or inadequate based on whether the implicated message was derived (e.g., “Because the dog ate the ham” vs. “Because the dog looked happy”). Results indicated that performance increased with age: 4-year-olds provided adequate answers about 25% of the time, and 9-year-olds did so 90% of the time. Six-year-olds were able to provide adequate answers about half of the time.
In summary, the reviewed studies report a range of success rates for comprehension of indirect answers by 2- to 10-year-old children with typical development and indicate that this skill grows steadily with age. In particular, children at the age of 6 years appear to be capable of drawing indirect inferences more consistently, with accuracy ranging from 50% (de Villiers et al., 2009) to 75% (Bernicot et al., 2007). It is important to note that one major methodological difference among the studies arises from how “comprehension” was measured. That is, Loukusa et al. (2007) and de Villiers et al. (2009) used open-ended why-questions to probe children’s ability to explain indirect answers, whereas Bucciarelli et al. (2003) and Bernicot et al. (2007) adopted a forced-choice format that simply assessed participants’ judgement of indirect answers.
Current Study
The purpose of the current study was to further examine the developmental trajectory of comprehension of indirect answers among 5- to 10-year-old children with typical development. There were three primary aims. The first was to examine comprehension of indirect answers with both forced-choice and open-ended questions. Observed differences in performance will clarify the discrepancies in previous findings and will inform future studies regarding the potential impact of methodology on task performance. Moreover, previous literature has only investigated indirect answers that were contextually clear (e.g., Q: “Are you going to the party?” A: “I have an exam tomorrow.”) but not indirect answers that were contextually ambiguous (e.g., Q: “Are you going to the party?” A: “Bob will be there.”). The current study included this novel category to provide insight into children’s abilities to interpret speaker intentions that are presumably unclear and more complex.
The second aim was to provide more empirical data on the development of comprehension and explanation of indirect answers in children, especially over the preschool and early elementary school years. The findings not only will further our understanding of the development of non-literal language but also will provide critical baseline data that can be compared to children with different cultural and linguistic backgrounds as well as those with varying cognitive and language profiles and communication difficulties, such as autism, Down syndrome, and developmental language disorder.
The third aim was to gather preliminary data on children’s explanations of indirect answers when they fail to interpret speaker intentions appropriately. Previous studies, such as Chin (2017), de Villiers et al. (2009), and Loukusa et al. (2007), mentioned erroneous responses but did not analyze the characteristics of those errors. The findings of the current study will shed light on the challenges children may encounter in the reasoning process and will inform future investigations of pedagogies for teaching the interpretation of indirect answers. To address these aims, we posed the following three research questions:
1. Is there a significant difference between comprehending and explaining indirect answers by 5- to 10-year-old children with typical development?
2a. Is there a significant difference in comprehending indirect answers by 5- to 10-year-old children with typical development?
2b. Is there a significant difference in explaining indirect answers by 5- to 10-year-old children with typical development?
3. When failing to derive speaker intentions from indirect answers, what are the most common error patterns produced by 5- to 10-year-old children with typical development?
Method
Participants
The study included 48 children, 23 boys and 25 girls, between the ages of 5 years, 0 months and 10 years, 11 months (M = 8;2, SD = 19.77 months). Of the 48 children, seven were 5-year-olds, eight were 6-year-olds, nine were 7-year-olds, seven were 8-year-olds, nine were 9-year-olds, and eight were 10-year-olds. Data collection occurred in summer 2019 during Minnesota’s annual State Fair. Participants were recruited through the University of Minnesota’s research facility, where interested fairgoers could volunteer for a variety of research studies. The study was approved by the University of Minnesota’s Institutional Review Board for human subjects. Parents or guardians signed consent forms prior to participating in any study sessions.
Child participants met the following inclusionary criteria: (a) be a monolingual English speaker, (b) use at least 3-word utterances to communicate, (c) have normal or corrected-to-normal vision and hearing per parent report, and (d) receive a T-score lower than 60 on the Social Responsiveness Scale, Second Edition (SRS-2; Constantino & Gruber, 2012). The SRS-2 identifies social impairments associated with autism; scores lower than 60 are considered within normal limits and not associated with clinical presentations of autism. This criterion was in place because research has found that individuals on the autism spectrum often do not appropriately interpret and use non-literal language, such as metaphors and irony (Colich et al., 2012; Deliens et al., 2018b; Happé, 1993, 1995; Kalandadze et al., 2018; Norbury, 2005; Rundblad & Annaz, 2010).
Additionally, participants could not have a history of language impairment or developmental delay per parent report. Parents reported that five participants were receiving speech-language services: four for speech sound errors and one for stuttering. All five participants were included in the study. To determine eligibility for the study, participants completed the Matrices subtest of the Kaufman Brief Intelligence Test, Second Edition (KBIT-2; Kaufman & Kaufman, 2004) and the Recalling Sentences subtest of the Clinical Evaluation of Language Fundamentals, Fourth Edition (CELF-4; Semel et al., 2003) as indices of non-verbal cognitive ability and expressive language ability, respectively. Table 1 provides a detailed summary of demographic and linguistic characteristics of the participants.
Note.
a KBIT-2 = Matrices subtest of the Kaufman Brief Intelligence Test, Second Edition (Kaufman & Kaufman, 2004), mean standard score = 100, SD = 15
b CELF-4 = Recalling Sentences subtest of the Clinical Evaluation of Language Fundamentals, Fourth Edition (Semel et al., 2003), mean scaled score = 10, SD = 3
c SRS-2 = Social Responsiveness Scale, Second Edition (Constantino & Gruber, 2012), T-scores < 60 are considered within normal limits and scores ≥ 60 are associated with clinical presentations of autism spectrum disorder.
Procedure
Parents or legal guardians of participants completed a Family Background Questionnaire (FBQ; adapted from Bangert et al., 2019) and the Social Responsiveness Scale, Second Edition (SRS-2; Constantino & Gruber, 2012). The FBQ (Bangert et al., 2019) included a series of questions about family demographic variables, including race, ethnicity, maternal education, employment, and household income. These variables were used to characterize participants’ demographic backgrounds. The FBQ also asked about medical history, diagnosis of neurodevelopmental disorders, and ongoing special services (e.g., speech, occupational, physical therapy). Participants were excluded from the study if language impairments or developmental delays were reported.
The SRS-2 (Constantino & Gruber, 2012) is a measure of social behaviors of children and adults across three different age groups: preschool-age, school-age, and adult. The current study used the school-age version, appropriate for individuals between 4 and 18 years of age. Parents or legal guardians rated their children’s reciprocal social behaviors (e.g., “Plays appropriately with children his/her own age,” “Has an unusually narrow range of interests”) on a 4-point Likert scale (i.e., “not true,” “sometimes true,” “often true,” and “almost always true”). Scores lower than 60 are considered within normal limits and not associated with clinical presentations of autism. Thus, participants with an SRS-2 score of 60 or higher were excluded from the study.
Participants completed the Matrices subtest of the KBIT-2 (Kaufman & Kaufman, 2004) and the Recalling Sentences subtest of the CELF-4 (Semel et al., 2003) to measure non-verbal IQ and language ability. The KBIT-2 (Kaufman & Kaufman, 2004) is a measure of verbal and non-verbal intelligence for individuals between the ages of 4 years, 0 months and 90 years, 11 months. Participants completed the Matrices subtest of the assessment, during which they viewed a pair of pictures that were related (e.g., a rabbit and a carrot) and a third picture (e.g., a dog) paired with a question mark. Then, they were asked to select, from five possibilities, the picture that would best match the third picture in a way similar to the first pair (i.e., a bone). As participants progressed, the test items became more difficult, transitioning from relationships between people and objects to abstract symbols and designs with more pictures to analyze. Participants established a basal by obtaining three consecutive correct answers and reached a ceiling after four consecutive scores of 0. Standard scores (M = 100, SD = 15) were calculated and used to characterize participants’ non-verbal IQ.
The CELF-4 (Semel et al., 2003) is a measure of expressive and receptive language abilities for individuals aged 5 years, 0 months to 21 years, 11 months. Participants completed the Recalling Sentences subtest of the CELF-4, in which they repeated sentences of varying length and syntactic complexity (e.g., “My mom is the nurse who works in the community clinic.”). Participants did not need to establish a basal, and they reached a ceiling after five consecutive scores of 0. Raw scores were converted to standardized scaled scores (M = 10, SD = 3), and these scores were used to characterize participants’ language ability.
Experimental Task
Participants also completed an experimental task designed to measure comprehension and explanation of indirect answers. The experimental stimuli consisted of 30 novel question-answer pairs. Similar to those found in Bernicot et al. (2007), each item included two images with audio stimuli of two people having a conversation. One person asked a question, and the conversational partner responded with an indirect positive answer (e.g., Q: “Are you feeling cold?” A: “I should have worn a sweater.”), an indirect negative answer (e.g., Q: “Are you feeling hungry?” A: “I just came from a pizza party.”), an ambiguous answer (e.g., Q: “Are you feeling hot?” A: “It feels like yesterday.”), or a direct answer (e.g., Q: “Are you feeling tired?” A: “I am feeling tired.”). After the participant viewed the conversation on an iPad, the researcher pointed to the responder and asked the participant whether the speaker meant yes or no (Comprehension Task). Then, the researcher asked the participant “Why?” or “How did you know that?” to elicit an explanation of his or her answer (Explanation Task). If the participant simply repeated the second person’s utterance (e.g., “He said he just came from a pizza party.”), the researcher prompted the child by asking “Tell me more.” or “Why did you think he meant yes/no?” The researcher recorded child responses verbatim and scored them online. All sessions were audio recorded using a digital audio recorder for further coding and reliability purposes. The task required approximately 10-15 minutes to complete.
The 30 question-answer pairs were split across four conditions based on how the conversation partner responded to the question posed: Indirect Yes (10 items), Indirect No (10 items), Ambiguous Response (5 items), and Direct Response (5 items). Indirect Yes answers provided a positive response to the yes-no question without stating “yes.” Indirect No answers provided a negative response to the yes-no question without stating “no.” Ambiguous Responses were designed to provide an unclear answer that could be interpreted as either yes or no to the question. Direct Responses provided a clear “yes” or “no” to the question. The Ambiguous and Direct Response conditions had fewer items because the former served as an exploratory condition and the latter served as a control condition for comparison. Appendix A provides a complete list of the stimuli.
Prior to testing, we created a total of 48 question-answer pairs that were evenly split across the four conditions. We invited 20 adults, native English speakers between 19 and 45 years of age, to read these pairs in written form and judge whether the answer meant yes or no. For the Indirect Yes and Indirect No conditions, we selected the items with the highest agreement; mean agreement was 97% for the Indirect Yes items and 99% for the Indirect No items, with agreement ranging from 90% to 100% across items. For the Ambiguous Response condition, we selected the items with agreement closest to 50%; the mean agreement for this condition was between 40% and 45%. Finally, the mean agreement for the Direct Response items was 100%.
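A minimal sketch of this kind of norming-based item selection is shown below, assuming the adult judgments are stored as a mapping from item to the 20 raters' yes/no responses. The variable names, thresholds, and data are illustrative placeholders, not the exact procedure used in the study.

```python
# Illustrative sketch: select stimuli from adult norming judgments.
# `judgments` maps each candidate item to a list of "yes"/"no" responses;
# the item IDs and responses are hypothetical.

def agreement(responses):
    """Percent of raters giving the modal (majority) interpretation."""
    yes = sum(1 for r in responses if r == "yes")
    no = len(responses) - yes
    return 100 * max(yes, no) / len(responses)

def select_items(judgments, condition):
    """Rank candidate items by the selection rule for a given condition."""
    scored = {item: agreement(resp) for item, resp in judgments.items()}
    if condition in ("indirect_yes", "indirect_no"):
        # Keep the items raters agreed on most.
        return sorted(scored, key=scored.get, reverse=True)
    if condition == "ambiguous":
        # Keep the items whose agreement is closest to chance (50%).
        return sorted(scored, key=lambda item: abs(scored[item] - 50))
    return list(scored)
```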
We controlled for the syntactic complexity and semantic difficulty of the stimuli. Specifically, the utterance length of all question-answer prompts ranged from 5 to 7 morphemes (M = 5.8, SD = 0.79). Mean length of utterance (MLU; Brown, 1973) indexes children’s syntactic development relative to their age. The MLU of the stimuli was in line with the participants’ language development, as Rice et al. (2010) found that 5- and 8-year-olds with typical development had MLUs of 4.92 and 5.59, respectively, in a large-scale study of more than 300 participants. All words used in the stimuli were acquired by age four, according to the age-of-acquisition norms created by Kuperman et al. (2012). We selected age-appropriate vocabulary to evaluate comprehension and reasoning because we did not expect young children to explain items with advanced terms (e.g., Q: “Do you have any siblings?” A: “My father had a vasectomy after me.”). There were no suggestive words (e.g., “good”, “bad”, “favorite”, “hate”, “like”) that might reveal preferences and, thus, bias judgement. We avoided contractions (e.g., “she’s”, “isn’t”, “can’t”) to maximize the clarity of the utterances and avoided other forms of non-literal language (e.g., idioms, sarcasm) to ensure the validity of the experimental stimuli.
Two native English speakers, one male and one female, with Midwestern US dialects recorded the auditory stimuli. They were naïve to the study aims and were instructed to read through a list of statements followed by a list of questions (i.e., the experimental stimuli). Two individual recording sessions took place in a quiet therapy room using a microphone and digital recorder. The speakers were instructed to keep the same volume and speech rate and to use a neutral tone when reading the sentences. We edited the sound files so that the male and female speakers alternated asking and answering questions. The number of items in each category with the male vs. the female asking the question was counterbalanced. We created two randomized sequences of the 30 question-answer pairs. In each sequence, no more than two Indirect Yes or Indirect No items appeared consecutively to reduce response bias, in which participants respond yes or no to all questions (Winkler et al., 1982).
Scoring and Coding
Prior to scoring and coding, the first author and a research assistant listened to each participant’s audio recordings and transcribed the child explanations in a spreadsheet. For the Comprehension Task, the researcher scored Indirect Yes, Indirect No, and Direct Response items as correct (1) or incorrect (0). The Ambiguous Response items were not scored because the answers could be interpreted either way (e.g., Q: “Did you have fun playing baseball?” A: “I tossed the ball.”). Thus, the maximum score for the Comprehension Task was 25 (i.e., 10 Indirect Yes, 10 Indirect No, and 5 Direct Response items). For the Explanation Task, two trained research assistants, undergraduate majors in Speech-Language-Hearing Sciences who were not involved in the transcription and were naïve to the purpose of the study and to participant characteristics (e.g., age, sex), independently judged whether the explanation for an indirect answer was adequate (1) or inadequate (0). A detailed description of the training and coding procedures can be found in Appendix B.
After scoring was completed, the research assistants further assigned an error code to each inadequate response: I Don’t Know/No Response (1), Repetition of Response (2), Irrelevant to Context (3), Made-up Interpretation (4), an interpretation appropriate to the context but different from the speaker’s intention, or Insufficient Explanation (5), an explanation that fails to capture the speaker’s intention. We created the error codes based on an initial evaluation of 150 child explanations and the coding scheme by Nippold and Martin (1989) for idiom interpretation. We developed a worksheet that outlined a binary decision process for the research assistants to categorize participant explanations (Figure 1). The research assistants coded all inadequate explanations independently. Interrater reliability across Indirect Yes, Indirect No, and Ambiguous Response items ranged from 88% to 98%, with an overall average of 95%. Interrater reliability for Direct Response items was not calculated, as all explanations for this category were judged adequate and, thus, did not receive an error code. Appendix B provides descriptions and example responses assigned to each error code.
Statistical Analyses
To address Research Question 1, the research assistants scored participants’ answers as correct (1) or incorrect (0) for the Comprehension Task. We calculated percent correct for the Indirect Yes, Indirect No, and Direct Response conditions. Percent correct for the Ambiguous Response condition was not calculated because the answers could be interpreted either way (e.g., Q: “Are you going to the party?” A: “Bob will be at the party.”). Given that all participants achieved 100% accuracy for the Direct Response items, we created an Overall percent correct variable by averaging the accuracy of the two experimental conditions only (i.e., Indirect Yes and Indirect No). For the Explanation Task, the research assistants scored participants’ responses as adequate (1) or inadequate (0). We calculated percent “adequate” for all four conditions and created an Overall percent “adequate” variable by averaging the accuracy of the three experimental conditions (i.e., Indirect Yes, Indirect No, and Ambiguous Response). Direct Response items were not included because all participants scored with 100% adequacy. Finally, we conducted a series of Wilcoxon signed-rank tests to examine mean differences in the Indirect Yes, Indirect No, and Overall categories between the Comprehension Task and the Explanation Task. The Wilcoxon signed-rank test is a non-parametric test for paired data based on independent units of analysis (Woolson, 2007). We also evaluated effect sizes using Cohen’s d, with 0.2, 0.5, and 0.8 representing small, medium, and large effect sizes, respectively (Howell, 2016).
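As a rough illustration of this paired analysis, the sketch below assumes each participant's Overall percent correct (Comprehension) and Overall percent "adequate" (Explanation) are stored in two parallel arrays (the scores shown are hypothetical). It uses scipy's Wilcoxon signed-rank test and a pooled-standard-deviation form of Cohen's d; the exact effect-size formula used in the study is not specified here.

```python
import numpy as np
from scipy import stats

def cohens_d(x, y):
    """Cohen's d using the pooled standard deviation of the two score sets."""
    nx, ny = len(x), len(y)
    pooled_sd = np.sqrt(((nx - 1) * np.var(x, ddof=1) + (ny - 1) * np.var(y, ddof=1))
                        / (nx + ny - 2))
    return (np.mean(x) - np.mean(y)) / pooled_sd

# Hypothetical per-participant Overall scores (percent) for the two tasks.
comprehension = np.array([88, 92, 100, 84, 96, 80, 90])
explanation = np.array([52, 60, 72, 40, 56, 36, 48])

# Paired, non-parametric comparison of the two tasks.
stat, p = stats.wilcoxon(comprehension, explanation)
print(f"W = {stat:.1f}, p = {p:.4f}, d = {cohens_d(comprehension, explanation):.2f}")
```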
To increase the statistical power for Research Question 2, we combined the individual age groups (n = 7-9) into three larger groups: 5- and 6-year-olds, 7- and 8-year-olds, and 9- and 10-year-olds (n = 15-17). We conducted a series of Wilcoxon Mann-Whitney U-tests to examine mean differences in the Overall percent correct of the Comprehension Task and the Overall percent “adequate” of the Explanation Task between the three groups. The Wilcoxon Mann-Whitney U-test provides a conservative non-parametric approach to test whether two independent groups have been sampled from the same population (Siegel & Castellan, 1988). Additionally, we adjusted the p-value cutoff to 0.017 using the Bonferroni correction and evaluated effect sizes using Cohen’s d, where 0.2, 0.5, and 0.8 represent small, medium, and large effect sizes, respectively (Howell, 2016).
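A minimal sketch of these between-group comparisons, assuming per-participant Overall scores are grouped by the three combined age bands (the group contents are hypothetical); scipy's mannwhitneyu supplies the U statistic, and the Bonferroni-adjusted alpha of 0.05/3 ≈ 0.017 is applied to each pairwise test:

```python
from itertools import combinations
import numpy as np
from scipy import stats

# Hypothetical Overall percent "adequate" scores by combined age group.
groups = {
    "5 & 6": np.array([25, 30, 45, 20, 40, 35, 28, 33]),
    "7 & 8": np.array([50, 60, 55, 48, 65, 52, 58, 62]),
    "9 & 10": np.array([60, 70, 68, 75, 62, 66, 72, 64]),
}

alpha = 0.05 / 3  # Bonferroni correction for three pairwise comparisons

for (name_a, a), (name_b, b) in combinations(groups.items(), 2):
    u, p = stats.mannwhitneyu(a, b, alternative="two-sided")
    flag = "significant" if p < alpha else "not significant"
    print(f"{name_a} vs. {name_b}: U = {u:.1f}, p = {p:.4f} ({flag} at alpha = {alpha:.3f})")
```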
To address Research Question 3, the research assistants independently coded inadequate explanations with one of the five error codes: I Don’t Know/No Response, Repetition of Response, Irrelevant to Context, Made-up Interpretation, or Insufficient Explanation. Next, we tallied the number of instances of each error code and calculated the percentage of each error code for the three larger age groups (i.e., "5 & 6", "7 & 8", and "9 & 10"). Finally, we conducted Wilcoxon Mann-Whitney U-tests with the Bonferroni correction to examine differences in the distribution of the error codes between groups.
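The tallying step can be summarized with a short sketch like the one below, which assumes the coded data are a list of (age group, error code) records, one per inadequate response; the records shown are placeholders, and the sketch simply computes the percentage of each error type within each combined age group.

```python
from collections import Counter, defaultdict

ERROR_LABELS = {1: "I Don't Know/No Response", 2: "Repetition of Response",
                3: "Irrelevant to Context", 4: "Made-up Interpretation",
                5: "Insufficient Explanation"}

# Hypothetical coded records: (combined age group, error code for one inadequate response).
records = [("5 & 6", 2), ("5 & 6", 5), ("5 & 6", 2), ("7 & 8", 5),
           ("7 & 8", 4), ("9 & 10", 5), ("9 & 10", 5), ("9 & 10", 2)]

# Count error codes within each age group.
counts = defaultdict(Counter)
for group, code in records:
    counts[group][code] += 1

# Report the within-group percentage of each error type.
for group, tally in counts.items():
    total = sum(tally.values())
    for code in sorted(ERROR_LABELS):
        pct = 100 * tally[code] / total
        print(f"{group}: {ERROR_LABELS[code]} = {pct:.1f}%")
```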
Results
Research Question 1
The first research question compared performance between the Comprehension Task and the Explanation Task. The children performed significantly better on the Comprehension Task than on the Explanation Task across the Indirect Yes (M (SD) = 87 (13) vs. 52 (23), p < 0.001, d = 1.86), Indirect No (M (SD) = 91 (11) vs. 56 (24), p < 0.001, d = 1.89), and Overall conditions (M (SD) = 89 (9) vs. 52 (21), p < 0.001, d = 2.26). Table 2 summarizes the mean percent correct and the mean percent “adequate” for the two tasks.
Note.
a Overall percent correct calculated by averaging Indirect Yes and Indirect No for the Comprehension Task (Ambiguous items were not scored because the answers could be interpreted either way); Overall percent adequate calculated by averaging Indirect Yes, Indirect No, and Ambiguous Response for the Explanation Task.
b Mean comparisons using Wilcoxon signed-rank test to evaluate significance at 0.05 and Cohen’s d to evaluate effect sizes.
Research Question 2
The second research question examined performance on the two tasks by age. When comparing the Overall percent correct on the Comprehension Task, no significant difference was found across the three larger age groups (i.e., 5 & 6 vs. 7 & 8 vs. 9 & 10; M (SD) = 85 (11) vs. 87 (13) vs. 95 (9)). However, a large effect size emerged when comparing the 5- and 6-year-olds to the 9- and 10-year-olds (d = 0.98), and a medium effect size emerged when comparing the 7- and 8-year-olds to the 9- and 10-year-olds (d = 0.71). Within-group analyses indicated no significant differences between Indirect Yes and Indirect No items and small effect sizes for all comparisons (d-values < 0.47). Ambiguous Response items were not scored because they could be interpreted either way. Table 3 summarizes the mean percent correct on the Comprehension Task by the three larger age groups, and Figure 2 contains a scatter plot that visualizes the relationship between age and comprehension of indirect answers.
Note.
a Overall percent correct calculated by averaging Indirect Yes and Indirect No items for the Comprehension Task. Ambiguous Response items were not scored because they could be interpreted either way. No statistically significant difference was found between groups.
When comparing the Overall percent “adequate” of the Explanation Task, the 5- and 6-year-olds (M (SD) = 32 (19)) performed significantly lower than the 7- and 8-year-olds (M (SD) = 55 (15); U(30) = 51053, p < 0.0001, d = 1.37). Additionally, the 7- and 8-year-olds performed significantly lower than the 9- and 10-year-olds (M (SD) = 66 (14); U(32) = 66553, p < 0.0001, d = 0.73). Within-group analyses of the Indirect Yes, Indirect No, and Ambiguous items revealed no significant differences. However, a medium effect size was found between Indirect No and Ambiguous in the 9- and 10-year-olds (d = 0.68). The remaining effect sizes were small, ranging from 0.12 to 0.47. Table 4 summarizes the mean percent “adequate” of the Explanation Task by the three larger age groups, and Figure 3 contains a scatter plot that visualizes the relationship between age and explanation of indirect answers.
Note.
a Overall percent “adequate” calculated by averaging Indirect Yes, Indirect No, and Ambiguous Response items for the Explanation Task. Significant statistical differences were found between groups (all p-values < 0.0001), indicating “5 & 6” < “7 & 8” < “9 & 10”.
Research Question 3
The third research question examined inadequate responses by categorizing them into five error types: I Don’t Know/No Response, Repetition of Response, Irrelevant to Context, Made-up Interpretation, and Insufficient Explanation. When comparing the mean percentages of I Don’t Know/No Response, Repetition of Response, and Irrelevant to Context, no significant difference was found across the three larger age groups. However, medium effect sizes emerged for Repetition of Response when comparing the 5- and 6-year-olds to the 7- and 8-year-olds (d = 0.71) and to the 9- and 10-year-olds (d = 0.74). When comparing the distribution of Made-up Interpretations, the 9- and 10-year-olds had a significantly lower mean percentage (M (SD) = 2.27 (6.32)) than the 5- and 6-year-olds (M (SD) = 5.56 (5.64); U(32) = 56.5, p = 0.007, d = 0.54) and the 7- and 8-year-olds (M (SD) = 9.09 (8.46); U(33) = 61.5, p = 0.007, d = 0.91). Finally, for Insufficient Explanations, 9- and 10-year-olds had a significantly higher mean percentage (M (SD) = 41.66 (26.29)) than 5- and 6-year-olds (M (SD) = 22.62 (18.99); U(32) = 66.5, p = 0.012, d = 0.83), with a medium effect size when comparing 5- and 6-year-olds to 7- and 8-year-olds (d = 0.78). Table 5 shows the distribution of the error types for the three larger age groups, and Table 6 further details the mean percentage of error types across the experimental conditions (i.e., Indirect Yes, Indirect No, and Ambiguous Response).
Note. For I Don’t Know/No Response, Repetition of Response, and Irrelevant to Context, no significant differences were found across the three larger age groups. For Made-up Interpretation, 9- and 10-year-olds had a significantly lower mean percentage than 5- and 6-year-olds and 7- and 8-year-olds (both p-values = 0.007). For Insufficient Explanation, 9- and 10-year-olds had a significantly higher mean percentage than 5- and 6-year-olds (p = 0.012).
Note.
a The percentages indicate the distribution of the inadequate responses to Indirect Yes items across the error codes.
Discussion
The purpose of the current study was to examine the developmental trajectory of comprehension of indirect answers among preschool- and early elementary school-aged children with typical development. To address the first research question, we investigated the impact of methodological measures (i.e., forced-choice vs. open-ended questions) on task performance, given that previous studies have used both formats and revealed varying levels of proficiency. The current study incorporated both types of measures (i.e., the Comprehension Task and the Explanation Task), and the results indicated a significant difference, with 89% correct on the Comprehension Task and 52% “adequate” on the Explanation Task overall. These findings are consistent with Bernicot et al. (2007) and de Villiers et al. (2009), who reported 75% and 50% accuracy for forced-choice and open-ended questions, respectively. This performance gap is expected because explanation of non-literal language is more difficult than comprehension and, thus, is developed and mastered over time. Laval (2003) examined both comprehension and metapragmatic knowledge of idioms (i.e., participants’ abilities to justify their chosen answers) and found that 6-year-old children with typical development understand idioms, but the corresponding metapragmatic knowledge does not mature until after age 9.
For the second research question, we examined children’s performance on comprehension and explanation of indirect answers by age. On the Comprehension Task, all age groups achieved above 84% accuracy, with the 9- and 10-year-olds reaching 95%. Across the three larger age groups, there was no significant difference between performance on Indirect Yes and Indirect No items, suggesting that children comprehend indirect yes and no answers at similar levels.
The developmental trajectory for the Explanation Task depicts noticeable gains over the preschool and early elementary years. Specifically, the cross-sectional trajectory shows three distinct stages. In the first stage, 5- and 6-year-olds can adequately explain speaker intentions behind indirect answers 32% of the time. In the second stage, that percentage increases significantly to 55% among 7- and 8-year-olds. In the third stage, 9- and 10-year-olds make further significant gains, adequately interpreting and explaining indirect answers 66% of the time. The results also demonstrated that children of the same age performed similarly across the experimental conditions (i.e., Indirect Yes, Indirect No, and Ambiguous Response), suggesting no differential performance between the presumably more difficult Ambiguous Responses and the more common Indirect Yes and No answers. In particular, older children can adequately explain Ambiguous Responses just as well as Indirect Yes and No answers by linking the speaker’s utterance to their intention (e.g., interpreting the ambiguous response “I tossed the ball” as having fun at a baseball game by reasoning that “he [the speaker] practiced and became better at it”; more examples can be found in Appendix B).
The development of indirect answers is similar to that of metaphors, another common form of non-literal language (Colston & Kuiper, 2002; Kerbel & Grunwell, 1997): comprehension begins at the age of 5 to 6 years and improves steadily throughout childhood and adolescence (Nippold, 1985; Rundblad & Annaz, 2010; Vosniadou & Ortony, 1983). Winner et al. (1976) examined comprehension of metaphors by older children between the ages of 6 and 14 years by asking the participants to explain metaphoric sentences, such as “After many years of working at the jail, the prison guard had become a hard rock that could not be moved.” The researchers found that comprehension increased gradually with age, and a higher level of metaphoric understanding emerged in early adolescence. Responses by the youngest participants showed little or no sign of metaphoric understanding; for example, they explained that the prison had hard rock walls or that the guard used to sit on a rock. Eight-year-old children demonstrated initial understanding of metaphors by commenting that the guard had muscles as hard as rocks, linking physical similarities between the guard and the rock. By 10 years of age, children began to provide genuine metaphoric responses, interpreting the guard as hard as a rock because he did not care about anybody. Such progressive comprehension continues into early adolescence. For another experimental sentence, in which participants had to explain “The taste was a sharp knife,” a 10-year-old interpreted it to mean “It was spicy,” while a 14-year-old explained with a more enriched description that “The taste was a shocking flavor, hitting all of my senses at once.” In the current study, we also observed more elaborate explanations of indirect answers from older participants. For example, to explain the speaker’s intention in Q: “Are you going to the circus?” A: “I have my binoculars ready.”, a 5-year-old participant simply answered, “She already packed her stuff.” A 7-year-old further explained, “So she can see better at the circus.” Several 9- and 10-year-olds provided even more detailed explanations, noting that the speaker “brought her binoculars to see closer,” and “If there’s something small at the circus, she can use the binoculars to see it.” The oldest children in the current study were able to adequately explain indirect answers 66% of the time, and their performance may continue to improve well into adolescence, similar to other forms of non-literal language (Nippold & Rudzinski, 1993; Vieiro & García-Madruga, 1997; Winner et al., 1976).
The acquisition and development of other common forms of non-literal language, such as irony (Dews et al., 1996; Hancock et al., 2000; Pexman & Glenwright, 2007), humor and sarcasm (Keenan & Quigley, 1999; Semrud-Clikeman & Glass, 2008, 2010), and scalar implicature (Guasti et al., 2005; Noveck, 2001; Papafragou & Musolino, 2003), show trends similar to indirect answers, further suggesting that the preschool and early elementary years are prime years for acquiring non-literal language. However, more studies that compare and cross-examine multiple forms of non-literal language are needed before a comprehensive view of non-literal language development can be obtained.
Finally, for the third research question, we compared the distribution of five types of inadequate explanations (i.e., I Don’t Know/No Response, Repetition of Response, Irrelevant to Context, Made-up Interpretation, and Insufficient Explanation) across the three larger age groups (i.e., 5- and 6-year-olds, 7- and 8-year-olds, and 9- and 10-year-olds). Overall, Repetition of Response and Insufficient Explanation were the most common, together accounting for approximately 80% of inadequate explanations. Irrelevant to Context, I Don’t Know/No Response, and Made-up Interpretation were less common, each accounting for 2%-10% of total inadequate responses. These data highlight that when children fail to explain indirect answers, their responses often repeat the speaker’s utterance or lack a convincing explanation of the speaker’s intention.
The distributions of I Don’t Know/No Response (Error Code 1), Repetition of Response (2), and Irrelevant to Context (3) did not differ significantly across the three larger age groups. For Made-up Interpretation (4), 9- and 10-year-olds had a significantly lower distribution, at 2%, than 5- and 6-year-olds at 6% and 7- and 8-year-olds at 9%. For Insufficient Explanation (5), 9- and 10-year-olds had a significantly higher distribution, at 42%, than 5- and 6-year-olds at 23%. The coding scheme for inadequate explanations was developed to reflect a hierarchy of reasoning, where lower codes represented no or poor explanations and higher codes represented more satisfactory (but still inadequate) explanations. While there was no significant difference in the distribution of the lower codes 1-3, a shift was seen in the higher codes 4 and 5. That is, the oldest group had significantly fewer Made-up Interpretations (4) than the two younger groups and significantly more Insufficient Explanations (5) than the youngest group. Taken together with the finding that children’s ability to adequately explain indirect answers improves with age, this shift suggests that the quality of their inadequate explanations also improves with age.
Limitations and Future Directions
Although the current study provides a detailed examination of the comprehension and explanation abilities of preschool- and early elementary school-aged children, one limitation of these findings arises from the small sample sizes in the individual age groups (n = 7-9). To increase statistical power, we combined these groups into three larger groups, 5- and 6-year-olds, 7- and 8-year-olds, and 9- and 10-year-olds (n = 15-17), to compare performance. While we found significant differences in explaining indirect answers between groups (i.e., “5 & 6” < “7 & 8” < “9 & 10”), we were not able to pinpoint a specific age or ages at which children make significant gains.
Another limitation stems from the racially homogeneous sample, with 79% of participants (38 of 48) identified as White. Therefore, results from the current study are most appropriately generalized to children from similar demographic backgrounds. Future studies should more closely examine and compare comprehension of indirect answers across different racial and linguistic backgrounds and include larger samples to increase the external validity of the results found in this study.
A third limitation arises from the exploratory nature of the error analyses. While we created the coding scheme based on Nippold and Martin’s (1989) classification for idiom interpretation, the five error types are not an exhaustive list. Additionally, we attempted to organize the error codes hierarchically from least to most satisfactory, although all represent inadequate explanations (i.e., I Don’t Know/No Response (1), Repetition of Response (2), Irrelevant to Context (3), Made-up Interpretation (4), and Insufficient Explanation (5)). This framework is not evidence-based and, thus, requires further investigation to validate its methodological soundness.
The current study did not examine suprasegmental or paralinguistic features that may facilitate comprehension of indirect answers. For example, intonation contours are known for their pragmatic function of suggesting or imposing an alternative meaning different from the literal utterance (de Marneffe & Tonhauser, 2019; Dennison & Schafer, 2017; Kurumada, Brown et al., 2014; Pierrehumbert & Hirschberg, 1990). Facial expressions can provide visual cues to determine whether there is a discrepancy between the literal message and speaker intention (Attardo et al., 2003; Caucci & Kreuz, 2012; Deliens et al., 2018a). In everyday communication, indirect answers are naturally accompanied by these multimodal cues. Thus, future research may investigate their influences on comprehension of indirect answers.
Future studies may also consider comparing comprehension of indirect answers across different neurodevelopmental conditions, such as autism and developmental language disorder. Understanding non-literal aspects of language, such as metaphors and irony, is often cited as a communication difficulty for individuals on the autism spectrum (e.g., Colich et al., 2012; Deliens et al., 2018b; Dennis et al., 2001; Emerich et al., 2003; Happé, 1993, 1995; Kalandadze et al., 2018; Martin & McDonald, 2004; Mitchell, 1997; Norbury, 2005; Rundblad & Annaz, 2010). However, this population’s ability to comprehend and explain the speaker’s intention behind indirect answers as another form of non-literal language remains unknown.
Conclusions
The main contribution of the current study is that it provides empirical evidence on the developmental trajectories of comprehension and explanation of indirect answers in preschool- and early elementary school-aged children with typical development. For comprehension, 5- to 8-year-olds performed at around 84%-86% accuracy, and 9- and 10-year-olds were near ceiling at 95%. For explanation, the cross-sectional trajectory indicated three stages, with 5- and 6-year-olds explaining indirect answers adequately 32% of the time, 7- and 8-year-olds performing significantly higher at 55%, and 9- and 10-year-olds performing significantly higher than the two younger groups at 66%. By examining the two tasks separately, the findings offer two sets of baseline data for future studies that investigate the acquisition of indirect answers by children with different cultural and linguistic backgrounds or those with varying cognitive and language profiles.
The error analysis offers novel insight into what happens when children fail to interpret the speaker’s intentions appropriately. Overall, Repetition of Response and Insufficient Explanation are the most common errors and account for approximately 80% of inadequate explanations. Irrelevant to Context, I Don’t Know/No Response, and Made-up Interpretation each account for 2%-10% of total inadequate responses. Additionally, the quality of inadequate explanations improves with age, as evidenced by older children providing more Insufficient Explanations and fewer Made-up Interpretations than younger children.
Competing interest statement
The authors have no financial or nonfinancial relationships to disclose.
Funding statement
Timothy Huang received funding from West Chester University’s Creative Activity and Research Experience Award and from the University of Minnesota’s Bryng Bryngelson Research Award to support this study.
Appendix A
Appendix B
This appendix details the training procedure for the research assistants. The table at the end of the appendix provides descriptions of the error codes and examples of child responses. All research assistants completed the online Basic Course for Social/Behavioral or Humanities Research training through the Collaborative Institutional Training Initiative Program. The researcher trained the two coding assistants through direct instruction on the definition of an adequate explanation and the characteristics of each error code for inadequate explanations, using participant responses as examples. An adequate explanation was defined as a response that links the speaker’s utterance to his/her intention or provides a context-appropriate alternative reason. For example, an adequate explanation for the conversation, Q: “Are you hungry?” A: “I just came from a pizza party.”, included “He was not hungry because he already ate at the party.” Inadequate explanations, however, failed to capture or interpret the speaker’s intention appropriately. For example, inadequate explanations for the same item included “He said he went to a pizza party.” and “Parties are fun.” Scoring of the Explanation Task was independent of the Comprehension Task. That is, a participant could score 1 on the Comprehension Task but 0 on the Explanation Task and vice versa. For example, a participant could score 0 by answering “Yes (he is hungry)” on the Comprehension Task because it contradicted the adult judgement but still score 1 on the Explanation Task by reasoning that “Because he didn’t eat anything at the party, so probably hungry.”
Scoring for the Ambiguous and Direct Response items differed from the other two conditions. Because the Ambiguous answer prompts were designed to be unclear, explanations were judged based on whether an appropriate or adequate speaker intention was provided. For example, in the conversation, Q: “Did you have fun playing baseball?” A: “I tossed the ball.”, adequate explanations included “(Yes) because he practiced and became better at it” and “(No) all he did was tossing the ball.” Inadequate explanations included “(Yes) he tossed the baseball” and “(No) he didn’t have fun.” For Direct Response items, repetition of the answer prompt was considered adequate. For example, in the conversation: Q: “Did you go to the garden?” A: “I did not go to the garden.”, “He said he did not go to the garden.” and “He said so.” would be judged as adequate. More examples are provided at the end of the appendix.
The research assistants completed approximately 1 hour of training with the researcher. Next, the research assistants independently scored and coded 90 child responses (i.e., the responses of three participants). The researcher coded the same responses and calculated interrater reliability between the researcher and each assistant for each test item. Interrater reliability was computed by dividing the number of instances of agreement by the total number of opportunities and multiplying by 100. Reliability on the Comprehension Task across all test items ranged from 80%-100% between the researcher and one assistant and 60%-100% between the researcher and the other assistant. Reliability on the Explanation Task across all test items ranged from 80%-100% between the researcher and the two assistants. The researcher and the research assistants met a second time to discuss instances of disagreement item by item and resolved all issues using the error code worksheet (Figure 1). After training was completed, the researcher and both assistants reached 100% reliability on 30 new child responses.
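For reference, the point-by-point agreement computation described above reduces to the short sketch below, assuming two coders' item-level judgments are stored in equal-length lists; the sample data are illustrative only.

```python
def percent_agreement(coder_a, coder_b):
    """Point-by-point interrater reliability: agreements / opportunities * 100."""
    agreements = sum(1 for a, b in zip(coder_a, coder_b) if a == b)
    return 100 * agreements / len(coder_a)

# Hypothetical adequacy judgments (1 = adequate, 0 = inadequate) for ten responses.
coder_a = [1, 0, 1, 1, 0, 1, 1, 0, 1, 1]
coder_b = [1, 0, 1, 0, 0, 1, 1, 0, 1, 1]
print(f"Interrater reliability: {percent_agreement(coder_a, coder_b):.0f}%")  # 90%
```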
Both research assistants scored all responses independently. Reliability across Indirect Yes, Indirect No, and Ambiguous Response items ranged from 82%-94% with an overall average of 87%. Instances of disagreement arose from the coders’ subjective judgement of whether a response “adequately” explained speaker intentions. Take, for example, the item Q: “Have you finished your homework?” A: “I just got home from school.” The explanation “He doesn’t want to do it.” was coded as adequate by one coder but inadequate by the other. The former argued that the response showed that the child interpreted the indirect answer as an excuse, and the latter argued that the response failed to provide a convincing explanation, such as “He did not have time to do it.” All disagreements were subsequently resolved by the first author as a third coder. Interrater reliability was 100% for Direct Response items, as all explanations were judged adequate. The table below provides descriptions of the error codes and corresponding examples across the experimental conditions.
Note. a Child did not respond after further prompting (e.g., “Tell me more.”); b All participants scored “adequate” for Direct Response items. * indicates adequate explanations based on alternative assumptions.