Introduction
Children with word finding (WF) difficulties frequently have trouble retrieving target words while speaking (German et al., 2012). WF difficulties can occur as part of a developmental language disorder (DLD; Bishop et al., 2016) or in its absence (German et al., 2012). Identifying WF difficulties is important because they can lead to academic, self-esteem and social problems (Best et al., 2018; German et al., 2012). Despite their impact on child development, research on WF difficulties is sparse, especially in languages other than English, which leaves WF difficulties and their underlying deficits poorly understood. A few studies have proposed questionnaires, single-word naming and discourse as key tasks for identifying WF difficulties, but most do not provide empirical data to support this claim (Bourassa Bédard & Trudeau, 2021; Bragard et al., 2010; German, 2009; Paul et al., 2018). Moreover, to hypothesize at which stage of speech production WF difficulties arise, additional tasks are needed in the assessment process.
Models of lexical access
Many psycholinguistic models of lexical access have been developed over time. Although these models differ in minor ways, they all propose that speech production unfolds in four general processing stages and that semantic and phonological stages are clearly distinct (German et al., 2012; Levelt, 1999). At stage one, conceptual planning, the speaker determines the information they want to convey, without yet using words (German et al., 2012; Levelt, 1999). At stage two, lexical selection, the speaker accesses the lemma, which includes the word’s syntactic representation and a link to its semantic representations (German et al., 2012; Levelt, 1999). At stage three, the speaker retrieves the word’s morphological representation (its morpheme structure) and its phonological representation (Levelt, 1999). Stage four is a motor stage that leads to the word’s production.
Although these models were initially developed for adults, authors argue that Levelt’s model also applies to school-aged children (German et al., 2012; Levelt, 1999). Most importantly, evidence suggests that children, when retrieving words, go through the same semantic and phonological stages, two key components of models of lexical access (German et al., 2012; Levelt, 1999). Informed by these models, a key challenge in WF research is to determine the stage of breakdown at which WF difficulties arise. These models have traditionally led to conflicting views, such as the semantic vs. phonological deficit debate, in which WF difficulties are attributed either to the retrieval of semantic representations (stage two) or to the retrieval of phonological representations (stage three; Messer & Dockrell, 2006). More recently, a consensus seems to have emerged: rather than forming a single profile, WF difficulties form several profiles, each with a different stage of breakdown (Best et al., 2021; German, 2015). German (2015) also states that these profiles may be linked to other diagnoses, such as DLD or dyslexia. Although the number and description of these profiles vary, they seem to align along two axes (German, 2015). The first axis concerns representations (i.e., semantic vs. phonological representation deficits) and the second concerns mechanisms (i.e., storage vs. retrieval deficits).
The first axis concerns whether WF difficulties occur at the level of semantic representations (stage two of lexical access models) or phonological representations (stage three; Bragard et al., 2012; Messer & Dockrell, 2006). A word’s semantic representation refers to its meaning and includes characteristics such as its category and its function (Best et al., 2018; Bragard et al., 2012). Its phonological representation includes characteristics such as the number of syllables and the sequence of phonemes used to produce it (Best et al., 2018). A few studies support the existence of semantic and phonological profiles. In a case study, Constable et al. (1997) argued that a 7-year-old boy’s WF difficulties arose at stage three of lexical access, that is, when retrieving phonological representations. The child performed well on most semantic tasks, which included a semantic fluency task and a semantic judgment task with pictures. He struggled on a semantic judgment task with words read aloud by the examiner, but the authors argue that this could be due to difficulties in phonological memory or processing. Indeed, the child struggled with phonological tasks, which included a phonological judgment task (“auditory discrimination”) and a phonological fluency task (rhyming). Additional phonological tasks, such as nonword repetition, were administered, and the child’s performance suggested that his phonological representations were imprecise. Another study found evidence not only for a phonological profile of WF difficulties but also for a semantic profile.
In an intervention study of four children with WF difficulties, Bragard et al. (2012) identified these two profiles within the framework of the storage deficit hypothesis (described below). They suggested that two of the children had semantically based WF difficulties, since these children failed what appears to be a word comprehension task (Footnote 1). The other two children were classified as having phonologically based WF difficulties since they failed a phonological judgment task (deciding whether a word was correctly produced). In Bragard et al.’s (2012) study, the children also responded differently to semantic and phonological interventions depending on their profile, which provides further support for the existence of semantic and phonological WF profiles.
The second axis concerns whether WF difficulties arise during the development of lexical representations or during their retrieval. The storage deficit hypothesis states that children with WF difficulties have difficulty building precise representations (mainly semantic or phonological) when learning new words (German, 2015; Leonard, 2014). The accurate and rapid retrieval of these words, at either stage two or stage three, is error-prone because their representations are less precise and thus more fragile (German, 2015; Leonard, 2014). German (2015) suggests that a storage deficit may explain why some children with DLD also have WF difficulties, given that these children have problems building precise semantic and phonological representations. The retrieval deficit hypothesis states that the word retrieval mechanism itself is impaired (German, 2015; Leonard, 2014). German (2015) suggests that a retrieval deficit may explain why some children with learning disabilities, dyslexia or a traumatic brain injury also have WF difficulties. The distinction between storage and retrieval profiles is supported by a study by Best et al. (2021), who assigned participants to three WF profiles, two storage profiles and one retrieval profile, based on their results on a semantic picture judgment task and a nonword repetition task. As in Bragard et al. (2012), these included a semantic and a phonological profile, both consistent with the storage deficit hypothesis.
Children with semantically based WF difficulties failed a semantic picture judgment task (different from the one used by Bragard et al., 2012). Children with phonologically based WF difficulties failed a nonword repetition task. Best et al. (2021) also added a third profile, called “classic WF difficulties”, that fit the retrieval deficit hypothesis. Children with classic WF difficulties had WF difficulties but no specific semantic or phonological difficulties as measured by the semantic picture judgment and nonword repetition tasks. The authors concluded that, as in Bragard et al. (2012), children responded differently to semantic and phonological interventions depending on their WF profile, while children with classic WF difficulties benefited from both approaches.
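The profile assignment summarized above can be read as a two-input decision rule. The Python sketch below is a hypothetical illustration, not Best et al.’s (2021) actual procedure: it assumes each task has already been scored as pass/fail against some cutoff (not specified here), and the handling of children who fail both tasks is an assumption, since the text does not describe that case.

```python
from enum import Enum


class WFProfile(Enum):
    SEMANTIC_STORAGE = "semantic (storage deficit)"
    PHONOLOGICAL_STORAGE = "phonological (storage deficit)"
    CLASSIC = "classic WF difficulties (retrieval deficit)"
    UNRESOLVED = "both tasks failed (not covered by the three-way rule)"


def assign_profile(failed_semantic_judgment: bool,
                   failed_nonword_repetition: bool) -> WFProfile:
    """Assign a profile to a child already identified as having WF difficulties.

    Failing semantic picture judgment -> semantic storage profile;
    failing nonword repetition -> phonological storage profile;
    failing neither -> classic (retrieval) profile.
    Failing both is flagged rather than guessed (assumption).
    """
    if failed_semantic_judgment and failed_nonword_repetition:
        return WFProfile.UNRESOLVED
    if failed_semantic_judgment:
        return WFProfile.SEMANTIC_STORAGE
    if failed_nonword_repetition:
        return WFProfile.PHONOLOGICAL_STORAGE
    return WFProfile.CLASSIC


print(assign_profile(False, True).value)  # → phonological (storage deficit)
```

Flagging the "both failed" case, instead of forcing it into one profile, leaves room for the possibility, raised below, that semantic and phonological profiles are not mutually exclusive.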
As reviewed above, researchers have traditionally organized profiles of WF difficulties along the representation axis (semantic vs. phonological deficit) and the mechanism axis (storage vs. retrieval deficit). Note that in these studies, the identification of WF profiles, or group membership, was researcher- or theory-driven, based on these two axes (Best et al., 2021). Messer and Dockrell (2013) provide a different, data-driven account of how WF difficulties can be categorized into profiles. Their categorization is rather novel, as it draws on both the representation axis and Bishop and Snowling’s (2004) model of the relationship between dyslexia and DLD. Their study involved measures of written language rather than traditional semantic and phonological measures. In their longitudinal study of 38 children with WF difficulties followed from ages 7 to 9, Messer and Dockrell (2013) concluded that, overall, their participants’ WF difficulties appeared to stem from semantic difficulties. They also used a cluster analysis to identify profiles of WF difficulties. They hypothesized that children with WF difficulties would fall into three profiles: 1) poor semantic abilities (as in poor comprehenders), 2) poor phonological abilities, manifesting as poor decoding (as in dyslexia), and 3) poor semantic and phonological abilities, manifesting as poor decoding, reading comprehension and language abilities (as in DLD). However, almost all children with WF difficulties fell into either the first profile (poor comprehenders; lower scores on reading comprehension than on single-word reading) or the last (DLD; low scores on reading comprehension, single-word reading and phonological awareness).
Two main conclusions about profiles of WF difficulties can be drawn from this study. First, Messer and Dockrell’s clusters raise some doubt about the traditional categorization of children into semantic, phonological or classic WF profiles and stress the need for new data to confirm or disprove this categorization. Recall that children in Best et al.’s (2021) classic WF profile had WF difficulties in the context of strong semantic and phonological abilities, whereas almost all of Messer and Dockrell’s participants fell into profiles involving semantic difficulties. Second, Messer and Dockrell’s DLD profile supports the hypothesis that semantic and phonological profiles of WF difficulties may not be mutually exclusive.
Based on the studies reviewed above, there appears to be no clear consensus on which language processing deficits result in WF difficulties, on whether profiles of WF difficulties exist, or on which tasks should be used to identify WF difficulties or their profiles. Regarding tasks, an emerging consensus is that a WF assessment should include tasks 1) to identify WF difficulties, 2) to identify potential semantic deficits and 3) to identify potential phonological deficits (see, for example, Best et al., 2021). A non-exhaustive list of the tasks used across studies, organized according to these three purposes, is presented in Table 1. In summary, to determine whether WF difficulties are present, assessments usually combine questionnaires, single-word naming and word comprehension tasks using the same items, and a discourse task (Bragard et al., 2010; German, 2009; Messer & Dockrell, 2006; Paul et al., 2018). Once WF difficulties have been identified, semantic and phonological skills can be assessed to identify profiles of WF difficulties. Semantic tasks assess the quality of a child’s semantic representations. For example, a word definition task does so by encouraging the child to provide categorical, physical, locative or evaluative information about the word’s meaning (McGregor et al., 2002). Phonological tasks assess the quality of a child’s phonological representations.
For example, although nonword repetition tasks have been used to examine a variety of abilities, such as perception, short-term memory, language processing and sub-lexical processing of phonological representations (Stoel-Gammon, 2011), they can also serve as a proxy for the quality of a child’s phonological representations. Indeed, difficulty repeating nonwords accurately has been linked to problems building precise phonological representations during word learning (Edwards & Lahey, 1998; Gathercole, 2006). Identifying semantic or phonological strengths and difficulties may be relevant for intervention, since some studies have suggested that WF interventions should be tailored to these profiles (Best et al., 2021; Bragard et al., 2012; German, 2015).
Note. Some studies that have used these tasks with children with WF difficulties:
1 Dockrell et al. (2001)
2 German (1991)
3 Messer & Dockrell (2013)
4 Bragard et al. (2012)
6 Constable et al. (1997)
Another aspect that may matter for the conceptualization and assessment of WF difficulties is the language spoken. Apart from Bragard et al. (2012), all the studies reviewed involved English-speaking participants. Research on WF difficulties in other languages is rare, especially at the discourse level. We believe that research in other languages will contribute to a much richer understanding of WF difficulties and areas of impairment. In a pilot study of 11 typically developing French-speaking school-aged children, we collected preliminary data on French narratives and compared them to English data (Bourassa Bédard & Trudeau, 2021). Based on the frequency of signs of WF difficulties in discourse, called WF behaviours, our analyses suggested that the phenotype of WF difficulties may differ substantially between French and English, reinforcing the need for studies in languages other than English. These differences still needed to be confirmed with a larger sample and with children with WF difficulties.
Overall, there is no clear consensus on which language processing deficits result in WF difficulties. WF difficulties may arise at stage two (a semantic stage) or stage three (a phonological stage) of lexical access. More specifically, WF difficulties may be organized along two axes: representation (semantic vs. phonological deficit) and mechanism (storage vs. retrieval deficit). An emerging consensus holds that WF difficulties do not stem from a single deficit; rather, several deficits along the representation and mechanism axes may lead to them. As a result, profiles of WF difficulties may better describe these children’s challenges. Indeed, previous studies suggest that at least two profiles of WF difficulties may exist and that children with WF difficulties struggle on a range of semantic and phonological measures. To find these profiles along the representation axis, and thus better understand them, a first step would be to identify which of these semantic and phonological measures are particularly challenging for these children. Profiles of WF difficulties may then emerge from this finding.
The goal of the present study was to contribute to the understanding of which language processing deficits result in WF difficulties. In addition, the study aimed to improve the identification of WF difficulties in French-speaking children. This second goal is particularly relevant since most previous studies have focused on English-speaking children, with the exception of Bourassa Bédard and Trudeau (2021) and Bourassa Bédard et al. (2022). Hence, the following two research questions were investigated:
1) Among a variety of tasks that have been proposed to assess WF difficulties, which are more difficult for children with WF difficulties as a group than for typically developing peers as a group? Do these differences hold up when looking at individual performances?
2) Can we identify profiles of WF difficulties or critical attributes of these profiles?
Method
Participants
This study was approved by the Research Ethics Board of the Centre for Interdisciplinary Research in Rehabilitation of Greater Montreal (CRIR; project CRIR-1360-0918/Multi). Forty-six monolingual French-speaking children aged 7 to 12 years participated in this study. All children lived in the province of Québec, Canada, most of them in the Greater Montréal area. The children formed two groups: 22 children with typical language development and no suspected WF difficulties, and 24 children with WF difficulties. According to their caregiver, children with typical language development had no language, learning or hearing difficulties. To further ensure the absence of language difficulties in this group, children who scored more than one standard deviation below the previously reported mean (Bourassa Bédard et al., 2022) on the Échelle de vocabulaire en images Peabody (EVIP; Dunn et al., 1993) were excluded. For the other group, WF difficulties had been documented in previous assessments by speech-language pathologists (SLPs). All children in this group had a diagnosis of WF difficulties (in French: trouble d’accès lexical or difficultés d’accès lexical). Given the lack of standardized assessment tools for this population in (Québec) French, clinicians relied on a variety of measures to document their clinical impression of WF difficulties. Based on conversations with SLP colleagues, WF difficulties in this community are usually identified by triangulating three data sources: 1) the reason for consultation, 2) a quantitative and qualitative analysis of the child’s responses on a single-word naming task and 3) a qualitative analysis of signs of WF difficulties in the child’s discourse.
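The exclusion criterion for the typically developing group is a simple threshold rule. A minimal Python sketch follows; because the previously reported mean and standard deviation (Bourassa Bédard et al., 2022) are not given in this section, they are left as parameters, and the values in the usage line (the EVIP’s normative mean of 100 and standard deviation of 15) are placeholders only.

```python
def exclude_from_td_group(evip_score: float, reference_mean: float,
                          reference_sd: float) -> bool:
    """Return True when a typically developing candidate should be excluded.

    Rule from the text: exclude children scoring more than one standard
    deviation below a previously reported mean. Keeping a score that falls
    exactly on the cutoff is an assumption about the boundary case.
    """
    if reference_sd <= 0:
        raise ValueError("reference_sd must be positive")
    cutoff = reference_mean - reference_sd
    return evip_score < cutoff


# Placeholder reference values (EVIP norms), not the study's reference sample.
print(exclude_from_td_group(84, reference_mean=100, reference_sd=15))  # → True
```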
Following previous research on WF difficulties (see, for example, Best et al., 2021), children with co-occurring conditions such as DLD, ADHD or dyslexia were included. However, children were excluded if they stuttered or had a biomedical condition such as autism or an intellectual disability. Children who stuttered were excluded given the similar manifestations of stuttering and WF difficulties (Bourassa Bédard et al., 2022; German, 1991). Children with a biomedical condition were excluded following Bishop et al.’s (2016) consensus that the link between language difficulties and autism or intellectual disability is still unclear, and that intervention may differ depending on these diagnoses. These criteria also facilitated recruitment and better reflected the heterogeneity of clinicians’ caseloads. Indeed, according to an international consensus, WF difficulties are conceptualized as a symptom of DLD (Bishop et al., 2016), although some researchers argue that WF difficulties can occur without DLD (see, for example, German et al., 2012). Furthermore, a diagnosis of WF difficulties alone is often not sufficient to receive speech and language services in Québec’s public system. Of the 24 participants with WF difficulties, 15 also had a DLD diagnosis.
Typically developing participants were recruited through online flyers, from their school or from previous studies led by the authors. Participants with WF difficulties were recruited through online flyers or were referred to the research team through speech-language pathologists, rehabilitation centres and a university clinic.
The children’s main caregiver answered a sociodemographic questionnaire and a French adaptation of the Word Finding Referral Checklist (German & German, 1992; adapted by Bourassa Bédard & Trudeau, 2021). An extensive language exposure history was collected through the sociodemographic questionnaire. Based on their language exposure history at home and at school, participants were functionally French monolinguals: according to their caregiver, they were not proficient enough in another language to hold a conversation, nor did they receive sustained and regular exposure to other languages, apart from core English classes.
As reported in Table 2, the two groups did not differ significantly in age (t(44) = 0.725, p = 0.472) or gender (χ²(1, N = 46) = 1.315, p = 0.251). Participants with WF difficulties came from a range of socio-economic backgrounds, as indexed by maternal education, whereas the group without WF difficulties was skewed towards higher socio-economic status. However, some previous studies of lexical development in French speakers in Québec have found that socio-economic status effects emerged when comparing high school with higher levels of maternal education (Boudreault et al., 2006). The distribution of high school vs. other levels of maternal education did not differ between the two groups (p = 0.670, two-tailed Fisher’s exact test; Footnote 2).
Procedure
Sixteen participants (9 with WF difficulties and 7 without) were assessed in person: all but one at our lab, and one participant from the typically developing group at their school. Due to the COVID-19 pandemic, testing moved online for the remaining participants. The distribution of modality (in person or online) was similar across the two groups (χ²(1, N = 46) = 0.163, p = 0.763). The first author met with each participant for 1.5 to 2 hours, over one or two sessions no more than a few days apart.
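The modality statistic can be recomputed from the counts given in this paragraph, as a worked example of a Pearson chi-square on a 2 × 2 table. This Python sketch assumes the statistic was computed without a continuity correction, which matches the reported value of 0.163; it computes only the statistic, not the p-value.

```python
def pearson_chi_square(table: list[list[int]]) -> float:
    """Pearson chi-square statistic (no continuity correction) for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n  # expected count under independence
            chi2 += (observed - expected) ** 2 / expected
    return chi2


# Rows: WF difficulties (n = 24), typically developing (n = 22).
# Columns: assessed in person, assessed online (counts from the text).
modality = [[9, 24 - 9], [7, 22 - 7]]
n_total = sum(map(sum, modality))
print(f"chi2(1, N = {n_total}) = {pearson_chi_square(modality):.3f}")
# → chi2(1, N = 46) = 0.163
```

For a 2 × 2 table this reduces to N(ad − bc)² / (r₁r₂c₁c₂) = 46 × 30² / (24 × 22 × 16 × 30) ≈ 0.163.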
The task order was the same for most participants; each task is described in detail below: the Échelle de vocabulaire en images Peabody (EVIP; Dunn et al., 1993), a story stem (see Burchell et al., 2022), two stories from the Test of Narrative Language (TNL), a nonword repetition task, the verbal fluency task (semantic and letter) from the French adaptation of a standardized neuropsychological assessment tool for children (NEPSY-II; Korkman et al., 2012), a word definition task from the French Canadian version of the Wechsler Intelligence Scale for Children – Fourth Edition (WISC-IV; Wechsler, 2005), the Test of Word Finding in Discourse (German, 1991), and Bragard et al.’s (2010) single-word naming and picture word comprehension tasks. Two participants from the WF difficulties group told their TNL story at the end of testing because they were shy or did not know what to say. Testing procedures were similar for in-person and online testing, except for the EVIP and Bragard et al.’s (2010) tasks, which had to be adapted for online administration.
A subset of the tasks was used to assess the overall mechanisms of WF (research question 1): a parent questionnaire (Word Finding Referral Checklist), the EVIP, the two discourse tasks (the narratives and the Test of Word Finding in Discourse), and a single-word naming task combined with a word comprehension task. As mentioned above, the EVIP also allowed us to confirm that participants in the group without WF difficulties were typically developing. Following the representation axis, the other tasks were used to identify profiles of WF difficulties (research question 2), that is, whether breakdowns occurred at the semantic level, indicating a semantic profile, or at the phonological level, indicating a phonological profile. Word definitions and semantic fluency were used to identify potential semantic deficits (as in McGregor et al., 2002; Messer & Dockrell, 2013), while nonword repetition and letter-based word fluency were used to identify potential phonological deficits (as in Best et al., 2021; Messer & Dockrell, 2013).
Overall mechanism of WF
Word Finding Referral Checklist (German & German, 1992)
This checklist includes 15 questions about the presence of different signs of WF difficulties in everyday settings. Caregivers answer ‘YES’ if their child usually exhibits a given behaviour (one point) or ‘NO’ if not (zero points), for a maximum total of 15 points. Although the checklist is not standardized, its authors recommend that children scoring more than six points be referred to an SLP for assessment. The Word Finding Referral Checklist was adapted to French as part of our pilot study following the World Health Organization (WHO)’s translation guidelines, which include back translation (Bourassa Bédard & Trudeau, 2021). As described in Bourassa Bédard and Trudeau (2021), the response choices were also adapted because previous research in Québec has shown that caregivers are sometimes uncertain when answering questionnaires about their child (Paul, 2016). Caregivers found additional choices such as “I think so” and “I don’t think so” helpful (Paul, 2016). For scoring, however, these choices were counted as ‘YES’ and ‘NO’, respectively.
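The scoring and referral rule described above can be sketched in a few lines. This is a hypothetical Python illustration: the English response labels stand in for the French wording of the adapted checklist, which is not reproduced here, and the validation of unrecognized answers is an assumption of the sketch.

```python
# Hypothetical English labels for the response options; the French wording
# of the adapted checklist is not reproduced here.
POSITIVE = {"yes", "i think so"}
NEGATIVE = {"no", "i don't think so"}

N_ITEMS = 15
REFERRAL_CUTOFF = 6  # referral is suggested for totals strictly above this


def score_checklist(responses: list[str]) -> int:
    """Return the total score: 1 point per 'YES' or 'I think so' answer."""
    if len(responses) != N_ITEMS:
        raise ValueError(f"expected {N_ITEMS} responses, got {len(responses)}")
    total = 0
    for answer in responses:
        key = answer.strip().lower().replace("\u2019", "'")  # normalize curly apostrophes
        if key in POSITIVE:
            total += 1
        elif key not in NEGATIVE:
            raise ValueError(f"unrecognized response: {answer!r}")
    return total


def suggests_referral(total: int) -> bool:
    """Apply the checklist authors' referral guideline (more than six points)."""
    return total > REFERRAL_CUTOFF


answers = ["Yes"] * 4 + ["I think so"] * 3 + ["No"] * 6 + ["I don’t think so"] * 2
total = score_checklist(answers)
print(total, suggests_referral(total))  # → 7 True
```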
Échelle de vocabulaire en images Peabody (EVIP)
The EVIP (Dunn et al., 1993) is a Canadian French adaptation of the Peabody Picture Vocabulary Test. The child is asked to point to the picture, out of four, that corresponds to the word said by the examiner. Scores are expressed as standard scores with a mean of 100 and a standard deviation of 15. The manual’s instructions were followed for administration and scoring.
Narratives
The examiners followed the procedure outlined in Bourassa Bédard et al. (2022). The participants completed two narrative tasks: a story stem (Cleave, 2015–2021) and a story from the Test of Narrative Language (TNL; Gillam & Pearson, 2004; or an adaptation by Gillam et al., in development). Two tasks were used because our pilot study (Bourassa Bédard & Trudeau, 2021) suggested that two tasks were necessary to elicit a representative sample. Children were encouraged to tell long and complete stories, and the examiner used neutral prompts such as “uh-huh”, “yes” and “OK”. For the story stem, the examiner read the beginning of a story, such as “There was an old man who lived by the water”, and then asked the child to finish it. For the TNL story, the examiner first told a model story with a picture. The child was then asked comprehension questions, which were not analyzed in the current study. The examiner then showed the child another picture and asked them to tell the story that went with it. The two versions of the story stem and of the TNL were translated into French and shown to be equivalent in a pilot study (Cleave et al., 2013). The analyses of the narratives and of the Test of Word Finding in Discourse are described below.
Test of Word Finding in Discourse (TWFD)
Participants were shown three pictures and asked to describe them and tell a story about them. Some objects in the pictures are highlighted in color to make them salient. Participants were then asked questions to elicit a longer language sample (e.g., “[T]ell me how it would be different if it were winter or if it were snowing in this picture”). The TWFD was adapted from English to French for this study following the WHO’s guidelines.
Word comprehension
In the word comprehension task by Bragard et al. (2010), participants had to identify the one picture out of five that matched the word said by the examiner. Each of the five pictures was assigned a color, and participants pressed the button of the corresponding color. The percentage of correctly identified items was computed.
Single-word naming
In the computer-based single-word naming task by Bragard et al. (2010), participants were shown pictures of 80 different objects to name. Children were told to name the pictures as quickly as possible. The order of the items was randomly generated. The single-word naming task used the same 80 items as the word comprehension task. Two scores were computed: the percentage of correctly named items and an adjusted score that counted only the words known by the child – that is, the words correctly identified in the word comprehension task. This adjusted procedure was also used by Dockrell et al. (2001) to allow “a direct assessment of WF difficulties” – that is, to account for naming errors related to unknown words.
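The two naming scores can be sketched as follows. This is a minimal illustration, not the authors' scoring procedure. It assumes item-level correct/incorrect codes for the same 80 items in both tasks and also assumes the adjusted score is expressed as a percentage of the known items, since the text states only that it counted known words. The function name and example data are hypothetical.

```python
def naming_scores(named_correctly, comprehended):
    """Return (raw %, adjusted %) from parallel per-item True/False lists.

    Hypothetical helper. Assumption: the adjusted score is the percentage of
    correctly named items among the items identified in word comprehension.
    """
    if len(named_correctly) != len(comprehended):
        raise ValueError("both lists must cover the same items")
    # Raw score: correctly named items out of all items.
    raw = 100 * sum(named_correctly) / len(named_correctly)
    # Adjusted score: keep only the items the child "knows" (comprehended).
    known_items = [named for named, is_known in zip(named_correctly, comprehended) if is_known]
    adjusted = 100 * sum(known_items) / len(known_items) if known_items else float("nan")
    return raw, adjusted


# Hypothetical child on 80 items: 60 named correctly, 70 identified in comprehension.
named = [True] * 60 + [False] * 20
known = [True] * 70 + [False] * 10
raw, adjusted = naming_scores(named, known)
print(f"raw = {raw:.1f}%, adjusted = {adjusted:.1f}%")  # raw = 75.0%, adjusted = 85.7%
```

Items that the child names correctly but does not identify in comprehension drop out of the adjusted score. This is how the adjustment removes errors that can be attributed to unknown words rather than to word finding.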
Quality of representations
Nonword repetition
Participants listened to audio recordings of 16 nonwords and were asked to repeat each one. The nonwords ranged in length from two to five syllables, with four nonwords at each length. Ten different sequences of nonwords were created from Chiat’s (2015) Quasi-Universal Nonword List as part of the French/English Discourse Study – Canada (Cleave, 2015-2021). A response was scored as correct only when the whole nonword was repeated correctly; no partial credit was given, for a maximum of 16 points.
Verbal fluency
Using the fluency task of the NEPSY-II (Korkman et al., 2012), participants first had one minute to name as many words as possible from a semantic category (animals), followed by a second category (food and drinks). In the letter section, participants had one minute to name as many words as possible starting with the letter s, followed by the letter f, and were told not to name people or places. Each correct answer earned 1 point, yielding a semantic total and a letter total. The totals were then converted to standard scores ranging from 1 to 19 (lowest to highest possible score) according to the test’s manual.
Word definitions
In the vocabulary subtest of the WISC-IV (Wechsler, 2005), participants listened to words and then told the examiner what each word meant. Each word was also shown in written form in front of the participants as the examiner said it. Responses were scored from 0 to 2 according to the quality of the definition. Scores were summed, and the total was converted to a standard score ranging from 1 to 19 (lowest to highest possible score) according to the test’s manual.
Adaptations for online administration
Because all tasks were designed for in-person testing, three of them had to be modified for online administration: the EVIP, the single-word naming task and the word comprehension task. In the online version of the EVIP, the child said the number corresponding to the chosen picture instead of pointing to it. Similarly, in the word comprehension task, the child named the corresponding color rather than pressing a button of that color. For both the single-word naming and word comprehension tasks, slideshow presentations of the 80 images were created in five different item orders. Rather than being randomly generated, these five orders matched the item orders presented to the first five participants in the group of children with WF difficulties.
Analyses
Narratives and the TWFD
The two narratives were combined to generate a longer language sample. Two types of measures were computed because they have previously been proposed to identify potential WF difficulties: lexical diversity (number of different words; Degani et al., 2019) and WF behaviours based on the TWFD (yielding eight percentages as scores; German, 1991). For the TWFD analysis, examiners first transcribed the language sample and then divided it into T-Units – that is, main clauses with their dependent clauses. A main clause with its subordinate clauses was counted as one T-Unit, whereas clauses coordinated with “and”, “or” or “but” were separated into two T-Units. Examiners then inspected the language sample for German’s (1991) seven word-finding behaviours: substitutions, insertions (comments such as “I can’t remember the word”), repetitions, word reformulations, empty words, time fillers (three or more “uh”s or “um”s of any type in a T-Unit) and delays (pauses of six seconds or more). A full description of the seven WF behaviours can be found in Bourassa Bédard et al. (2022) or German (1991). The percentage of T-Units containing each behaviour can be computed (seven percentages), along with the percentage of T-Units with at least one WF behaviour, which is the most used measure.
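The eight TWFD percentages can be illustrated with a short sketch. It assumes each T-Unit has already been coded with the set of WF behaviours it contains. The behaviour labels and the function name are hypothetical shorthand and do not reproduce German's (1991) scoring forms.

```python
# Hypothetical shorthand labels for German's (1991) seven WF behaviours.
BEHAVIOURS = ("substitution", "insertion", "repetition", "reformulation",
              "empty_word", "time_filler", "delay")


def wf_percentages(t_units):
    """Return the eight TWFD-style percentages for a list of coded T-Units.

    Each element of t_units is the set of behaviour labels found in one
    T-Unit; an empty set means the T-Unit contains no WF behaviour.
    """
    if not t_units:
        raise ValueError("the language sample contains no T-Units")
    n = len(t_units)
    # Seven percentages: T-Units containing each behaviour at least once.
    scores = {b: 100 * sum(b in unit for unit in t_units) / n for b in BEHAVIOURS}
    # Eighth (most used) percentage: T-Units with at least one behaviour.
    scores["at_least_one"] = 100 * sum(bool(unit) for unit in t_units) / n
    return scores


sample = [set(), {"repetition"}, {"repetition", "time_filler"}, set()]
print(wf_percentages(sample)["at_least_one"])  # 50.0
```

A T-Unit containing two behaviours counts once toward the "at least one" percentage. For this reason, the seven behaviour percentages need not sum to the eighth.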
Interrater agreement for the narratives and the TWFD
To ensure the reliability of the TWFD-based analyses, we calculated interrater agreement for the narratives and for the TWFD. The second rater, an experienced transcriber who had previously been trained by the first author and who was blind to group membership, transcribed the narratives and TWFD of 5 children (10% of the sample) and coded the language samples for the presence of WF behaviours. Agreement was calculated for the division of the language samples into T-Units and for the eight percentages of T-Units. According to Landis and Koch’s (1977) interpretation of kappa values, most measures for the narratives and the TWFD showed substantial agreement or better (kappa of 0.61 or higher), while a few measures on the TWFD showed moderate (0.41-0.60) to substantial (0.61-0.80) agreement; in both cases, kappas seemed consistent with percentages of agreement. Notably, Cohen’s kappa for delays was 0 (agreement no better than chance) and was inconsistent with the percentage of agreement (99.8%), which is likely due to an imbalance in the contingency table (only one rater identified a delay).
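The mismatch between a kappa of 0 and 99.8% agreement for delays can be reproduced with a small sketch of Cohen's kappa for two raters. The sample size of 500 T-Units below is hypothetical. It was chosen only because one disagreement in 500 units gives 99.8% agreement, and it is not the study's actual count.

```python
def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters' codes on the same units.

    Returns (observed agreement, kappa). Kappa is undefined (nan) when
    chance agreement equals 1, i.e. both raters used one identical code.
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("raters must code the same, non-empty set of units")
    n = len(rater_a)
    categories = set(rater_a) | set(rater_b)
    # Observed agreement: proportion of units coded identically.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: sum over codes of the product of marginal proportions.
    p_e = sum((rater_a.count(c) / n) * (rater_b.count(c) / n) for c in categories)
    if p_e == 1.0:
        return p_o, float("nan")
    return p_o, (p_o - p_e) / (1 - p_e)


# Hypothetical: 500 T-Units; rater A codes one delay, rater B codes none.
rater_a = [1] + [0] * 499
rater_b = [0] * 500
p_o, kappa = cohen_kappa(rater_a, rater_b)
print(f"agreement = {100 * p_o:.1f}%, kappa = {kappa:.2f}")  # agreement = 99.8%, kappa = 0.00
```

Rater B never codes a delay. As a result, chance agreement equals observed agreement, and kappa is 0 however high the raw agreement is. This is the imbalance described above.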
Statistical analysis
To identify potential areas of relative strength and weakness (research question 1), we performed group comparisons between children with and without WF difficulties. Given the possibility of WF profiles, we were also interested in individual results. For each measure, we therefore calculated the percentage of children with WF difficulties who scored above a cut-off of 1 standard deviation below the mean of children without WF difficulties. As in Messer and Dockrell (2013), a task on which more than 50% of children with WF difficulties scored above this cut-off was considered a relative strength, and a task on which fewer than 50% did was considered a relative weakness. To further investigate possible WF profiles (research question 2) – namely, semantic vs. phonological profiles of difficulties – cluster analyses were performed on the measures included for this purpose. Semantic measures included semantic fluency and word definitions; phonological measures included letter fluency and nonword repetition.
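The individual-results criterion can be sketched as follows, under stated assumptions. The text does not say whether the sample or the population standard deviation was used, and the sketch uses the sample SD. For measures where higher scores indicate more difficulty, such as the percentage of T-Units with WF behaviours, the sketch assumes the cut-off is mirrored to 1 SD above the mean. The function name and data are hypothetical.

```python
import statistics


def classify_measure(td_scores, wf_scores, higher_is_better=True):
    """Classify one measure as a relative strength or weakness.

    Returns the percentage of children with WF difficulties who score on the
    typical side of a 1-SD cut-off from the typically developing (TD) mean,
    and the resulting label (> 50% strength, < 50% weakness).
    """
    mean = statistics.mean(td_scores)
    sd = statistics.stdev(td_scores)  # assumption: sample SD of the TD group
    if higher_is_better:
        passed = [score > mean - sd for score in wf_scores]
    else:  # assumption: mirrored cut-off when higher scores mean more difficulty
        passed = [score < mean + sd for score in wf_scores]
    pct = 100 * sum(passed) / len(wf_scores)
    if pct > 50:
        label = "relative strength"
    elif pct < 50:
        label = "relative weakness"
    else:
        label = "neither"
    return pct, label


td = [10, 12, 14, 16, 18]  # hypothetical TD scores: mean 14, SD about 3.16
print(classify_measure(td, [11, 12, 13, 8]))  # (75.0, 'relative strength')
print(classify_measure(td, [9, 11, 12, 8]))   # (50.0, 'neither')
```

A task at exactly 50% is labelled neither. This follows the strict "more than" and "less than" wording of the criterion.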
Results
Group comparisons between children with and without WFD
Children with and without WF difficulties were compared on all tasks using a MANCOVA to determine areas of relative strength and weakness for children with WF difficulties. Modality of testing (online or in person) and three sociodemographic factors (age, gender and SES) were entered as covariates. No gender or SES effects were found. There was an effect of testing modality on letter fluency (p = 0.035) and an effect of age on the percentage of T-Units containing at least one WF behaviour in the narratives (p = 0.047), the word definition task (p = 0.024), the number of different words in the TWFD (p = 0.004), the single-word naming task (p < 0.001 for both adjusted and non-adjusted scores) and the word comprehension task (p = 0.002). As reported in Table 3, the MANCOVA revealed differences between children with and without WF difficulties on the Word Finding Referral Checklist (p < 0.001), the EVIP (p = 0.003), the narration measures (number of different words, p = 0.009; percentage of T-Units containing at least one WF behaviour, p = 0.025), semantic fluency (p = 0.007), letter fluency (p = 0.020), word definitions (p < 0.001) and single-word naming accuracy (non-adjusted score, p < 0.001; adjusted score, p < 0.001). The strongest effect size was found for the Word Finding Referral Checklist (η² = 0.669), followed by word definitions (η² = 0.463) and single-word naming accuracy (non-adjusted score, η² = 0.378; adjusted score, η² = 0.384). No differences were found for nonword repetition (p = 0.144), the TWFD measures (number of different words, p = 0.055; percentage of T-Units containing at least one WF behaviour, p = 0.961) or the word comprehension task (p = 0.093).
Note.
a Percentage of children with WF difficulties who scored more than one standard deviation below the mean of typically developing children. Expected value is 15%.
b n = 22
c n = 21
d Accuracy score counting only known words according to the word comprehension task.
WF = word finding; EVIP = Échelle de vocabulaire en images Peabody; NDW = number of different words; %WFB = percentage of T-Units containing at least one word-finding behaviour; TWFD = Test of Word Finding in Discourse.
Individual results for children with WF difficulties
Individually, children with WF difficulties showed relative strengths in narration (both NDW and %WFB), nonword repetition, semantic fluency, the TWFD (both NDW and %WFB) and the word comprehension task. Only some children with WF difficulties struggled on these tasks: 20.83% to 41.67% of the children failed them – that is, scored below -1 standard deviation of typically developing children’s mean. Children with WF difficulties showed relative weaknesses on the WF Referral Checklist, letter fluency, word definitions and single-word naming accuracy (both non-adjusted and adjusted scores). Specifically, all children with WF difficulties fell below -1 standard deviation of the mean on the WF Referral Checklist. Among the other measures, they struggled most with word definitions, on which almost 80% of children with WF difficulties scored below -1 standard deviation of the mean. Letter fluency and single-word naming accuracy were other areas of weakness. The EVIP was neither an area of relative strength nor one of relative weakness.
Cluster analysis
To identify potential subgroups of children with WF difficulties, we ran a series of K-means cluster analyses. The cluster analyses were performed with all participants – that is, with and without WF difficulties – and included only the four tasks used to detect specific semantic or phonological deficits. As mentioned earlier, subgroup difficulties on semantic fluency and word definitions would imply semantic difficulties, while subgroup difficulties on letter fluency and nonword repetition would suggest phonological difficulties. Figure 1 illustrates the expected clusters based on theoretical semantic vs. phonological profiles of WF difficulties. For Figures 1 to 3, we transformed each cluster’s mean into a percentage (the mean divided by the highest observed score, multiplied by 100) because the four tasks were not on the same scale. Because authors have proposed different numbers of WF profiles, the number of clusters was set to range from two to five. However, when the number of clusters was set to five, one participant formed a cluster alone. Therefore, only the results of the analyses for two to four clusters are reported below. Summarized data on cluster membership can be found in Table 4. Differences between the clusters on each of the four measures were tested with ANOVAs, which confirmed significant differences on all four measures in each of the three cluster analyses.
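The clustering step can be sketched in plain Python. This sketch does not reproduce the authors' analysis, because the software and settings are not reported here. It treats each participant as a tuple of the four task scores. It also z-scores each task so that no single task dominates the distances; this step is an assumption, since the text describes the percentage transformation only for the figures. Clustering uses Lloyd's K-means algorithm with a deterministic farthest-point initialization. The data and helper names are hypothetical.

```python
import math
import statistics


def zscore_columns(points):
    """Standardize each task (column) to mean 0 and SD 1 (assumption)."""
    stats = [(statistics.mean(col), statistics.stdev(col)) for col in zip(*points)]
    return [tuple((x - m) / s for x, (m, s) in zip(p, stats)) for p in points]


def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm; returns one cluster label per participant."""
    # Farthest-point initialization: start from the first participant, then
    # repeatedly add the point farthest from its nearest existing centroid.
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(math.dist(p, c) for c in centroids)))
    labels = None
    for _ in range(max_iter):
        # Assignment step: each participant joins the nearest centroid.
        new_labels = [min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points]
        if new_labels == labels:
            break  # assignments unchanged: converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels


def profile_percentages(points, labels, k):
    """Cluster means per task as % of the highest observed score (as in the figures)."""
    max_obs = [max(col) for col in zip(*points)]
    profiles = []
    for j in range(k):
        members = [p for p, lab in zip(points, labels) if lab == j]
        if not members:
            profiles.append(None)  # empty cluster: no profile
            continue
        means = [sum(col) / len(members) for col in zip(*members)]
        profiles.append([100 * m / mx for m, mx in zip(means, max_obs)])
    return profiles


# Hypothetical scores: (nonword repetition, semantic fluency, letter fluency, word definitions)
raw = [(16, 12, 12, 17), (15, 11, 13, 16), (14, 12, 12, 15),
       (6, 5, 5, 6), (5, 6, 4, 7), (7, 5, 5, 5)]
labels = kmeans(zscore_columns(raw), k=2)
print(labels)  # [0, 0, 0, 1, 1, 1]
print(profile_percentages(raw, labels, k=2))
```

Running `kmeans` for k = 2 to 5 and checking cluster sizes would flag a single-participant cluster like the one in the five-cluster solution. K-means solutions can depend on initialization, so in practice one would compare several starting configurations.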
Note.
* The numbers in parentheses correspond to the number of children with WF difficulties who had also been diagnosed with a DLD.
When the number of clusters was set to two, participants were divided into two clusters that partly aligned with the predetermined groups. The first cluster, comprising participants who performed relatively well on all four tasks, contained 30 participants. The second cluster, comprising participants who performed poorly on all four tasks, contained 16 participants. All typically developing children except one were in the first cluster, while most children with WF difficulties (15/24) were in the second cluster. Of the nine children with WF difficulties in the “higher performance” cluster, five had a DLD.
When the number of clusters was set to three, the first cluster was very similar (n = 30; two participants changed cluster), as can be seen in Figure 2. This cluster included all typically developing children except two, as well as 10 children with WF difficulties, six of whom also had a DLD. The second cluster (n = 6) performed poorly on most tasks, especially nonword repetition and word definitions. This cluster contained only children with both WF difficulties and a DLD. The third cluster (n = 10) appeared to be intermediate between the other two. Participants in the third cluster performed similarly to the first (mostly typically developing) cluster on nonword repetition, but similarly to the second cluster on letter fluency. They scored between the two other clusters on word definitions but performed worst on semantic fluency. This cluster included 8 children with WF difficulties, 3 of whom had a DLD.
When the number of clusters was set to four, the first cluster comprised participants who performed well on all tasks, as can be seen in Figure 3. This cluster (n = 19) was, however, smaller than the first cluster in the other two analyses, and it included two participants with WF difficulties. The second cluster (n = 4) comprised participants who performed poorly on all tasks, especially nonword repetition; these children had both WF difficulties and a DLD. Participants in the third cluster (n = 7) performed poorly on word definitions and on both fluency tasks, but not on nonword repetition. This cluster contained mostly children with WF difficulties, some of whom also had a DLD. Participants in the fourth cluster (n = 16) appeared to be intermediate: they performed well on nonword repetition and semantic fluency, but poorly on word definitions and letter fluency. This cluster contained many typically developing participants. Interestingly, clusters 3 and 4 both included children with both WF difficulties and a DLD.
Discussion
The current study aimed to better understand which tasks and measures were difficult for French-speaking children aged 7 to 12 years with WF difficulties compared with their typically developing peers. We argued that these results would help us formulate hypotheses about which language processing deficits result in WF difficulties and help us describe profiles of difficulties. These results would also help us better identify WF difficulties in French-speaking children, since previous research suggests that WF may differ somewhat between French and English (Bourassa Bédard & Trudeau, 2021; Bourassa Bédard et al., 2022).
Our analysis suggested that modality of testing and two sociodemographic variables, gender and SES, did not have an impact on WF measures. Regarding modality of testing, this result must be interpreted with caution, especially for the nonword repetition task. Although previous research in speech-language pathology has suggested that online assessments are valid assessments of a child’s communicative abilities, measures at the phoneme level, like those used in a nonword repetition task, are susceptible to reduced reliability (Taylor et al., 2014). Regarding gender, this result is unexpected for measures of WF in narration: our previous study of typically developing children (Bourassa Bédard et al., 2022) found that boys of the same age range produced more word-finding behaviours than girls.
Most importantly, group differences between children with and without WF difficulties were observed on a range of measures: the WF Referral Checklist, semantic and letter fluency, the narration measures (number of different words and percentage of T-Units containing at least one WF behaviour), one of the two word comprehension tasks (the EVIP only), word definitions and single-word naming accuracy. This result is consistent with previous literature suggesting that parent questionnaires, single-word naming, fluency tasks and narration measures are useful in the assessment of WF difficulties (Paul et al., 2018; Martins et al., 2007; Messer & Dockrell, 2013). To our knowledge, this is the first empirical study to suggest that parent questionnaires and number of different words could be useful in the assessment of WF difficulties. We found statistical differences between children with and without WF difficulties for the narration measures (both number of different words and percentage of T-Units containing at least one WF behaviour), but not for the TWFD. This finding suggests that not all discourse tasks are equal; narration may be a more useful clinical tool to identify WF difficulties. The absence of statistical differences for the TWFD is unexpected, but it supports the hypothesis that there may be key differences in WF abilities between French and English at the discourse level (Bourassa Bédard & Trudeau, 2021; Bourassa Bédard et al., 2022). The finding that children with WF difficulties struggled on one word comprehension task is, however, not consistent with previous literature: children with WF difficulties usually perform at typical levels on this type of task (Messer & Dockrell, 2006).
This finding could be explained by the fact that many participants with WF difficulties also had a DLD. Indeed, the EVIP has been identified as a useful clinical tool for identifying a DLD in French-speaking children (Thordardottir et al., 2011). The difference between our two groups may thus be better explained by the presence or absence of a DLD than by WF difficulties alone. This supports the importance of using word comprehension tasks with the same items as the single-word naming task (German, 2015), especially since the adjusted single-word naming score, based on known words only, remained statistically different between children with and without WF difficulties. Future studies may also want to include a control group matched on lexical age, as measured by the EVIP, to control for these differences. Regarding word definitions, to our knowledge this task had never been used with children presenting WF difficulties. Our results suggest that it is a promising way to differentiate children with WF difficulties from typically developing children. Moreover, low performance on two tasks that tap into the semantic system, word definitions and semantic fluency, supports the semantic deficit hypothesis: children with WF difficulties may have difficulties building or retrieving precise semantic representations. Apart from the TWFD, no differences were found for nonword repetition or the word comprehension task (Bragard et al., 2010). These results must also be interpreted with caution, considering that the absence of a statistical difference for some of these tasks may be due to the relatively small sample and the high number of comparisons, despite the adjustment for covariates.
Individual performances led to similar conclusions. All children with WF difficulties scored more than one standard deviation below the mean of typically developing children on the WF Referral Checklist, suggesting that this questionnaire may be a useful screening tool for speech-language pathologists. We must, however, remain cautious with this interpretation, since evaluating screening tools was not the goal of this study. Among the remaining tasks, word definitions were the hardest for children with WF difficulties: 79.17% of them scored more than one standard deviation below typically developing children’s mean. Letter fluency and single-word naming accuracy were other areas of relative weakness. In contrast, the number of different words and the percentage of T-Units containing at least one WF behaviour (for both the narratives and the TWFD), nonword repetition, semantic fluency and word comprehension (the Bragard et al., 2010, task only) were areas of relative strength, because fewer than half of the children with WF difficulties failed these tasks.
Taken together, these results hinted at the possibility of WF profiles. Cluster analysis revealed that our participants could be separated into two to four groups. Regardless of the target number of clusters, one cluster performed well on all tasks, while another tended to struggle on all four tasks, especially on nonword repetition. Children in this second cluster had both WF difficulties and a DLD when the number of target clusters was set to three or four. The other clusters appeared to have intermediate scores on some tasks, but not all. These results are consistent with previous literature arguing that children with WF difficulties can be classified into profiles (e.g., Best et al., 2021). However, in the cluster analyses, participants with WF difficulties were not divided into groups as expected based on models of lexical access. As mentioned in the introduction, a key feature of these models is the distinction between a semantic and a phonological stage. This distinction has led researchers to expect that children with WF difficulties can be divided into at least two groups according to the type of representation affected (see, for example, Best et al., 2021, or Bragard et al., 2012): one with phonological strengths, with high performance on phonological fluency and nonword repetition, and the other with semantic strengths, with high performance on semantic tasks (see Figure 1 for the expected clusters). Our clusters did not align with these profiles. One could argue that a bigger sample would have allowed better characterization of multiple profiles. However, although our sample of children with WF difficulties was small, it was similar in size to those of previous studies. Two other explanations may better account for the lack of alignment with theoretical profiles.
First, we cannot rule out the possibility that WF profiles exist but that semantic and phonological profiles do not classify these children’s difficulties well. As noted in the introduction, Messer and Dockrell’s (2013) study of children with WF difficulties also sought to classify these children into profiles of difficulty. Their results may not be entirely comparable to ours, since their study included measures of written language. Nevertheless, as in the current study, their participants were not clearly categorized into a group of children with phonological strengths and a group of children with semantic strengths. They found a cluster of children with semantic deficits, as in poor comprehenders, and another cluster of children with both semantic and phonological deficits, as in DLD. Our study also found a group of children with both semantic and phonological deficits, but our other clusters were intermediate groups. Taken together, the results of Messer and Dockrell (2013) and of our study raise the possibility that WF difficulties may not present in clear semantic or phonological profiles and that semantic and phonological difficulties can co-occur as part of WF difficulties. Our data thus do not support the classification used in intervention studies in which researchers purposely categorized children with WF difficulties into semantic and phonological profiles. Indeed, our cluster of children with both semantic and phonological deficits may benefit more from combined semantic and phonological interventions, as in German et al.’s (2012) study, in which participants benefited more from a semantic and phonological intervention than from a semantic intervention alone. Second, while our tasks were used to assess semantic and phonological skills, other abilities may have contributed to the children’s results.
For example, some studies with children have used fluency tasks as a measure of executive functions (e.g., Brocki & Bohlin, 2004). If both fluency tasks draw on executive functions, this may explain why each cluster seemed to perform similarly on them. Furthermore, the letter fluency task may not reflect phonological performance very well: although letter and phonological fluency are similar, letter fluency may tap into orthographic rather than phonological knowledge. However, at least some children did approach this task phonologically, naming words that started with the corresponding phoneme, /s/ or /f/, rather than the letter. Similarly, word definitions and nonword repetition may involve other language or cognitive processes. For word definitions, although children are asked to elaborate on their semantic knowledge about words, the task requires a verbal response and may include words that the child does not know. Tasks involving verbal responses open the possibility of failure due to poor retrieval (German, 2015). Children with WF difficulties at the phonological level could fail this task because of WF alone, rather than because of poor semantic representations. Constable et al. (1997) presented a similar argument in their case study of a child with poor phonological skills who failed a semantic task. To perform well on a word definition task, children also need a large vocabulary. Unlike the single-word naming and word comprehension tasks, the word definition task that we used did not control for unknown words. In this case, children with WF difficulties may fail a word definition task because of limited vocabulary rather than imprecise semantic representations.
Thus, poor performance on the word definition task by children with WF difficulties compared with typically developing peers does not guarantee that all children with WF difficulties have imprecise semantic representations. Future studies of WF profiles should consider including tasks that control for verbal responses and for words unknown to the child, such as a semantic association task or a word comprehension task combined with the word definition task. For nonword repetition, recall that research on WF suggests that imprecise phonological representations could be linked to WF difficulties (Best et al., 2021; German, 2015). The nonword repetition task was included to identify potential phonological deficits – that is, to assess the ability of children with WF difficulties to learn precise phonological representations (Gathercole, 2006). However, nonword repetition involves several phonological abilities, including perception and memory (Constable et al., 1997). A deficit in nonword repetition may therefore reflect general phonological processing difficulties, including but not limited to specific difficulties in learning precise phonological representations.
To summarize, the tasks in the current study were not “purely” semantic or phonological, which may have made it harder to identify semantic or phonological clusters. Recall, however, that other authors (Best et al., 2021; Messer & Dockrell, 2013) have used fluency and nonword repetition tasks to classify WF difficulties into profiles. More studies are needed to see whether these results hold with a larger sample and with a variety of semantic and phonological measures that control for verbal responses and for previous knowledge of target words. Finally, an important limitation of this study is that, given the lack of standardized assessments in Québec French, it was impossible to confirm the participants’ diagnoses of WF difficulties. It is thus possible that some participants were assigned to the wrong group, which could have influenced our results. We believe that this study is a first step towards developing standardized assessments of WF difficulties in Québec French.
In conclusion, we found that, compared with typically developing children, children with WF difficulties struggled most on a caregiver questionnaire of WF and on a word definition task. These results highlight that the use of multiple tasks may be important in the clinical assessment of WF difficulties. Our results also stress the importance of including caregivers’ perspectives in decision-making. As a group, children with WF difficulties appeared to have problems with the storage or retrieval of semantic representations of the words they know. Although this provides further evidence that WF difficulties may be linked to semantic deficits, we cannot rule out, at this stage, that phonological deficits could contribute to WF difficulties. Cluster analyses with semantic and phonological tasks did not yield one cluster of children with semantic difficulties and another of children with phonological difficulties. Instead, children were grouped into a high-performance group, a low-performance group and one or more intermediate groups, depending on the cluster model. This result challenges the traditional semantic versus phonological profiles of WF difficulties and suggests that semantic and phonological deficits are not mutually exclusive in children with WF difficulties. While researchers and speech-language pathologists (SLPs) have traditionally offered semantic or phonological intervention to children based on these profiles, they may want to be more cautious before assuming that children with WF difficulties can be divided into clear semantic and phonological profiles when deciding which intervention a child may benefit most from. More studies are needed to better understand profiles of WF difficulties and to reconcile our results with intervention studies that categorize children into semantic and phonological profiles.
Acknowledgements
We would like to thank Alexia Rondeau, MSc student, for her work on this study, and all the families who participated in this study. We have no conflicts of interest to disclose. Our work was supported by a doctoral research scholarship from the Fonds de Recherche du Québec – Société et Culture awarded to Vincent Bourassa Bédard.
Supplementary material
The supplementary material for this article can be found at http://doi.org/10.1017/S0305000923000363.