INTRODUCTION
Many researchers express their surprise at the rapid pace and apparent ease with which most infants learn their native language (e.g. Kuhl, 2004). Indeed, it has often been taken as evidence that infants are designed for the task – a top-down process whereby ‘evolution’ endows the human neonate with structures dedicated to the acquisition of language (Chomsky, 1959, 1975, 1991; Dehaene-Lambertz & Spelke, 2015). According to this predetermined epigenesis perspective, some aspects of language development are independent of other cognitive abilities, and when children struggle to develop language, it is because the ‘language part’ of the brain is impaired (Pinker, 1994). Over the past couple of decades, however, a new account of language development has been championed which treats all linguistic knowledge as an emerging property of complex, experience-driven, self-organizing processes (Hockema & Smith, 2009; Mareschal, Johnson, Sirois, Spratling, Thomas & Westermann, 2007; Smith, 2005). According to this probabilistic ‘neuroconstructivist’ perspective, language learning is yoked with cognitive development, language emerges from progressive, context-dependent processes that are also associated with non-linguistic learning, and children who struggle to develop language are likely to have domain-general impairments that affect their ability to construct non-linguistic as well as linguistic knowledge. We will discuss evidence in support of these claims; i.e. evidence of early (domain-general) processes through which language abilities emerge in typically and atypically developing infants and toddlers.
Our paper is divided into three sections: the first outlines the neuroconstructivist approach we have adopted; the second discusses evidence of early processes (e.g. functional specialization) and mechanisms (e.g. neural plasticity, statistical learning) through which later-developing, higher-level language abilities emerge; and the third considers the theoretical and practical implications of the empirical findings we describe.
THE NEUROCONSTRUCTIVIST APPROACH TO LANGUAGE DEVELOPMENT
Crucial to neuroconstructivism (and other developmental systems approaches [Footnote 1] such as dynamic systems theory) is the concept of control without a controller. On this view, no specific entity – genetic or otherwise – plans or controls development. Instead, developmental processes (including those related to language development) gradually emerge from complex cascades of biological and physical interactions. One way of conceptualizing this is to view development as experience-driven processes that occur within complex biological and ecological systems – and thus as constrained by internal and external factors at multiple levels (e.g. genetic, neural, behavioural, societal) and timescales (Mareschal et al., 2007).
Also crucial to neuroconstructivism (and other developmental systems approaches) is the concept of ‘interdependence’ between different, diverse factors. For example, language development varies as a function of parental input (Hoff, Welsh, Place & Ribot, 2014), but parental input varies as a function of the child's language performance (Zampini, Fasolo & D'Odorico, 2012). The interplay between diverse but interdependent units often generates complex behaviours in living systems. On this view, developmental processes should not be studied in isolation.
Finally, neuroconstructivists (and other developmental systems theorists) argue that development is an experience-driven process, and that developmental systems (like the human infant) are adaptive systems (they change in response to their environments, as they act on and learn about them). [Footnote 2] On this view, language skills are developed through learning experiences. Furthermore, the timing of developmental events is likely to be an important constraining factor, because the formation of one function may shape (constrain) how later-developing functions emerge (see D. D'Souza & Karmiloff-Smith, 2016, for discussion).
Why would language skills be developed through learning experiences? Would it not be more advantageous to be born with a language module? Some argue that it is more adaptive to have neural circuits gradually develop over time to ensure that circuitry is appropriately shaped by the specifics of the relevant input (e.g. Bates & Elman, 1993; D. D'Souza & Karmiloff-Smith, 2011; Elman, 1993; Elman, Bates, Johnson, Karmiloff-Smith, Parisi & Plunkett, 1996; Greenough, Black & Wallace, 1987; Johnson, 2011; Karmiloff-Smith, 1992; Stiles, 2008). In other words, not only may the protracted period of brain development observed in human infants allow late-generated brain structures to emerge (Clancy, Darlington & Finlay, 2000), it may also provide time for the infant brain to calibrate or adjust its internal operations (connectivity and computations) to the spatial and temporal metrics of the external world (Buzsaki, 2006). A system that adaptively develops in interaction with its environment may also be more robust to perturbation than a system with fixed, predetermined structures and functions (Miller & Page, 2007). This may explain why an individual's genes can be expressed differently depending on his or her developmental environment (Lickliter, 2016).
The upshot is that neuroconstructivists expect higher-level abilities (such as language) to emerge through a gradual process of adaptation that incorporates mechanisms of change (e.g. neural plasticity) and developmental processes (e.g. progressive functional specialization) across different (but interconnected) levels and domains. ‘Plasticity’ refers to the process by which neural connectivity and circuitry change as a function of experience. It comprises different mechanisms, occurs on different levels and timescales (from the nearly immediate local effects of Hebbian plasticity to gradual large-scale reorganization of the brain; Power & Schlaggar, 2016), and is associated with functional changes that include learning and memory. ‘Functional specialization’ is the concept that activity-dependent interactions between different interdependent factors (e.g. neural plasticity, sensory input) hone the functions and response properties of neural networks such that their activity becomes restricted to a narrower set of circumstances (Johnson, 2011). For example, neural networks that activate in response to, and can discriminate between, human infant, human adult, and monkey faces may – after extensive experience with only human adult faces – become increasingly tuned to processing human adult face stimuli and lose the ability to discriminate well between human infant or monkey face stimuli (Macchi Cassia, Bulf, Quadrelli & Proietti, 2014; Pascalis, de Haan & Nelson, 2002). Thus, at the behavioural level, developmental systems approaches expect highly constrained, highly organized, structured activity to emerge gradually from widespread, uncoordinated, spontaneous activity; at the brain level, they expect increased neural tuning (specialization) in response to a given stimulus or set of task demands (Johnson, 2011).
Indeed, this developmental process – i.e. functional specialization – has been identified in several domains, including face perception (Pascalis et al., 2005), motor ability (H. D'Souza, Cowie, Karmiloff-Smith & Bremner, 2016), and – as we discuss below – language ability.
SPECIALIZATION AS AN ACTIVE PROCESS
However, progressive specialization is not a passive process; it requires the infant to calibrate its internal operations to the external world by actively exploring (selecting, acting on) and sampling it (Buzsaki, 2006). If the infant lacks the cognitive tools to intelligently explore or sample the environment, or is exposed to an atypical or restricted environment that hampers exploration, then this will constrain the process of specialization and the infant is more likely to develop atypically (Johnson, Jones & Gliga, 2015). In other words, infants adapt to their environment, including the language environment, by exploring it. Before they can acquire language, they must be able to sample and extract structure from the external world, as well as learn what aspects of the environment are relevant, i.e. worth directing their attention towards. For example, infants when lying on their back must redirect their attention from, say, the surface of a featureless ceiling to sample a novel talking face, and learn to extract patterns (units of speech) from the continuous streams of sound that they hear. They must also learn what aspects of their environment to focus on. For instance, if a preverbal infant were to focus their attention only on changes in pitch, then they would miss important distributional (and other acoustic) cues. Even if the preverbal infant were to appropriately select and act on the environment, such as orienting away from a ceiling towards a speaking face to sample important audiovisual speech information, he or she would also need to process that information appropriately (i.e. learn from it) or risk specializing atypically.
These overlapping basic-level neurocognitive processes and mechanisms – sampling (gathering information), learning (extracting structure and integrating information), and attention (selecting information) – are important precursors to language acquisition, and the topics of the following three subsections.
Sampling
Because progressive specialization reflects changes in neural connectivity and the response properties of different neurons over developmental time (Johnson, 2011), and because it is an experience-dependent process (i.e. adaptive systems require input), the process of sampling the environment affects functional plasticity and the timing of developmental processes (Benasich, Choudhury, Realpe-Bonilla & Roesler, 2014; Frankenhuis & Panchanathan, 2011a, 2011b). For example, an absence of visual input can prolong the critical period during which visual information may shape certain structures (e.g. ocular dominance columns) in primary visual cortex (Mower, Caplan, Christen & Duffy, 1985). In human infants, inter-individual variation in sampling can occur as a result of (i) variation in the external environment, (ii) differences between infants’ ‘internal’ environments (interactions between genes and their molecular and cellular environments), and (iii) interactions between the infant and external environment. We discuss all three constraints in detail below, using examples from the literature on bilingualism (variation in the external world), neurodevelopmental disorders (internal differences), and how having a neurodevelopmental disorder interacts with parenting (internal–external interaction).
Differences in sampling due to the external environment
Because sampling is a stochastic process (events can be statistically analyzed but not precisely predicted), Frankenhuis and Panchanathan (2011b) hypothesize that functional specialization is likely to vary as a function of sampling behaviour. Specifically, infants who develop in highly variable environments are likely to specialize later in development than infants who develop in less variable ones. This is because sampling a less variable environment would allow the infant to quickly generate confident estimates, build accurate models of the external world, and successfully anticipate future events. By contrast, infants who develop in more variable environments may require more samples to generate confident estimates and thus more time to specialize to their environments.
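The intuition behind this hypothesis can be illustrated with a small numerical sketch. It is not Frankenhuis and Panchanathan's model; it assumes, purely for illustration, that the learner is estimating the mean of a single environmental cue from independent samples, and that an estimate counts as ‘confident’ once the half-width of a normal-approximation confidence interval falls within a chosen margin. The function name `samples_needed` and the parameters `sd`, `margin`, and `z` are hypothetical.

```python
import math


def samples_needed(sd, margin, z=1.96):
    """Smallest number of independent samples n such that the half-width of a
    normal-approximation confidence interval, z * sd / sqrt(n), is <= margin.

    sd     -- standard deviation of the cue (illustrative stand-in for
              environmental variability)
    margin -- how precise the estimate must be to count as 'confident'
    z      -- critical value (1.96 corresponds to a 95% interval)
    """
    return math.ceil((z * sd / margin) ** 2)


# Two hypothetical environments that differ only in variability.
less_variable = samples_needed(sd=1.0, margin=0.5)
more_variable = samples_needed(sd=2.0, margin=0.5)

print(less_variable, more_variable)
```

Under these assumptions, doubling the variability of the cue roughly quadruples the number of samples required, which is one way of seeing why a more variable environment might delay specialization.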
In the language domain, functional specialization starts early in life. For example, human neonates are sensitive to a wide range of phonological contrasts, both native and non-native (e.g. Bertoncini, Bijeljac-Babic, Blumstein & Mehler, 1987; Eimas, Siqueland, Jusczyk & Vigorito, 1971; Streeter, 1976). However, between 6 and 12 months of age, their sensitivity to non-native contrasts declines, while their ability to discriminate between contrasts in their native language improves (Kuhl, Stevens, Hayashi, Deguchi, Kiritani & Iverson, 2006; Werker & Tees, 1984; Werker, Yeung & Yoshida, 2012). In other words, the infant brain becomes progressively selective to its native language. This process of specialization, which involves neural commitment and thus co-occurs with a corresponding reduction in plasticity, increases the fit between infants and their specific language environment (Kuhl, Conboy, Coffey-Corina, Padden, Rivera-Gaxiola & Nelson, 2008). But what would happen if a child were exposed to a multilingual environment? It is likely that, within the same time frame, the child would be provided with fewer samples from each language than a child raised in a monolingual environment would receive from its one language (Byers-Heinlein & Fennell, 2014). Moreover, the presence of two or more languages would likely make the multilingual environment more variable (e.g. increased phonemic variability) than the monolingual environment. Although the effects of exposure to a more variable language environment can be offset by use of predictive cues (e.g. interlocutor identity; Martin, Molnar & Carreiras, 2016), for many children, language input from two different languages may originate in the same person, the same environment, and even the same sentence (Byers-Heinlein, 2013). Furthermore, bilingual input may be less accurate (and thus less consistent) than monolingual input. Bosch and Ramon-Casas (2011) analyzed speech recordings from a group of Catalan–Spanish bilingual mothers who reported speaking predominantly Catalan at home. They found that the mothers who had been raised in Spanish–Catalan homes or Spanish-speaking homes with early exposure to Catalan made significantly more errors when producing words that contain the Catalan /e/–/ε/ contrast (which is not present in Spanish) than the mothers who had been raised in Catalan-speaking homes. These additional constraints (e.g. more variability, less consistency) could make sampling the language environment more challenging for bilingual infants, which may prolong the process of functional specialization in the language domain. Indeed, emerging evidence is consistent with this proposal. Pi Casaus (2015) has found that bilingual infants maintain the capacity to discriminate non-native consonants for six months longer than monolingual controls. Moreover, Sebastián-Gallés, Albareda-Castellot, Weikum, and Werker (2012) found that 8-month-old Spanish–Catalan bilinguals, but not 8-month-old Catalan or Spanish monolinguals, can visually discriminate English from French when watching silent video clips of French–English bilingual speakers reciting sentences in French or English. Although children younger than 8 months were not tested, this suggests that bilinguals maintain their sensitivity to non-native languages for longer.
However, neither monolinguals nor bilinguals were able to discriminate between unfamiliar languages in adulthood (Werker, 1986). This indicates that specialization in bilinguals is initially protracted, but leads to a similar adult state. Yet even this state may be different in these two populations. Tremblay and Sabourin (2012) showed that although bilingual adults do not differ from monolingual controls in their ability to discriminate non-native contrasts, they show an enhanced ability to learn them. This suggests that the adult bilingual brain retains some of its early plasticity (i.e. it is extra-sensitive to changes in the environment), perhaps as a consequence of a slower rate of specialization. [Footnote 3] Thus, we argue that language context may affect brain and cognitive development via differences in the frequency, variability, and consistency of linguistic input. It is important to note, however, that we are not claiming that differences in neural plasticity or functional specialization are necessarily negative; differences in plasticity/specialization between bilinguals and monolinguals are likely to be adaptive, increasing the fit between the infant and its environment (D. D'Souza & H. D'Souza, 2016). They may even be the source of a purported cognitive advantage in bilinguals (D. D'Souza & H. D'Souza, 2016). [Footnote 4]
Differences in sampling due to the internal environment
As described above, the ability to discriminate contrasts in the native language improves over developmental time (Kuhl et al., 2006), while sensitivity to non-native contrasts declines (Werker & Tees, 1984; Werker et al., 2012). This specialization for native language increases the fit between infants and the environment. However, infants demonstrate considerable variability in their ability to adapt to the environment. Moreover, these early individual differences are associated with later developmental outcomes. Specifically, infants who show more specialization for their native language (e.g. better native, and/or worse non-native, phonetic perception skills) have on average better language outcomes as toddlers (Kuhl, Conboy, Padden, Nelson & Pruitt, 2005; Kuhl et al., 2008; Tsao, Liu & Kuhl, 2004). Assuming that there were no major differences in the amount and quality of the linguistic input that the participants received, we could draw the conclusion that later emerging language abilities are at least partly constrained by individual differences in the ability to select or process environmental input.
If individual differences in sampling constrain development even in typically developing children, then might alterations in sampling in atypically developing populations affect developmental processes such as specialization to an even greater extent? For example, there is some evidence that neurophysiological responses to speech input are initially diffuse (widespread and bilateral) in typically developing (TD) children and only gradually become more focal (e.g. left hemisphere dominant) and differentiated (with regions fractionating and becoming more selectively activated) over developmental time (Brauer & Friederici, 2007; Minagawa-Kawai, Mori, Naoi & Kojima, 2007; but see Brown, Petersen & Schlaggar, 2006, for critical discussion). By contrast, twenty-four toddlers with autism spectrum disorder (ASD) showed more diffuse patterns of activation in response to speech input than twenty TD controls matched on chronological age (Coffey-Corina, Padden & Kuhl, 2008). Moreover, when the toddlers with ASD were divided into ‘high’ and ‘low’ functioning subgroups (by median split of Autism Diagnostic Observation Schedule social scores; Lord et al., 1989), the twelve low functioning toddlers with ASD showed more diffuse patterns of activation than the twelve high functioning toddlers with ASD (Coffey-Corina et al., 2008). Johnson et al. (2015) propose that synaptic dysfunction in ASD (e.g. perturbed synaptogenesis; Bourgeron, 2009; Gilman, Iossifov, Levy, Ronemus, Wigler & Vitkup, 2011; Zoghbi, 2003) results in increased neural noise and poor sampling of the environment (reduced fidelity), which leads to prolonged specialization plus excessive plasticity and hence developmental delay. Indeed, poor evoked (neural) response reliability (i.e. less consistency across trials), yielding weaker signal-to-noise ratios in visual, auditory, and somatosensory systems, has been identified in fourteen adults with ASD relative to fourteen TD controls matched on chronological age and IQ (Dinstein, Heeger, Lorenzi, Minshew, Malach & Behrmann, 2012). A similar theory of autism suggests that excessive neuronal information processing in local circuits (leading to hyper-perception, hyper-attention, hyper-memory, etc.) renders the world uncomfortably intense for individuals with the disorder, leading to social and environmental withdrawal (Markram, Rinaldi & Markram, 2007; see also Rubenstein & Merzenich, 2003, and Simmons, Robertson, McKay, Toal, McAleer & Pollick, 2009, for similar theories). This would also result in poor sampling of the environment. Although this has not yet been directly tested in toddlers with ASD, atypical patterns of brain activity (e.g. reduced gamma phase-locking) in children with ASD look similar to patterns of brain activity in rats prenatally exposed to valproic acid (Gandal, Edgar, Ehrlichman, Mehta, Roberts & Siegel, 2010), a chemical compound that causes hyper-reactivity, hyper-connectivity, and hyper-plasticity of glutamatergic synapses in local cortical microcircuitry (Markram et al., 2007), hinting at an association between developmental delay and excessive plasticity. Furthermore, infants who develop ASD attend less to people and their activities from 6 months of age, and while their social-communication skills are indistinguishable from those of TD infants at 6 months, clear differences emerge by 12 months (see Jones, Gliga, Bedford, Charman & Johnson, 2014, for a review).
ASD is not the only disorder to which synaptic dysfunction has been attributed, however. In fact, a number of other neurodevelopmental disorders also show widespread abnormalities in synaptic function and plasticity (Blanpied & Ehlers, 2004; Zoghbi, 2003). For example, dendritic spine abnormalities (which result in synaptic dysfunction; Nimchinsky, Sabatini & Svoboda, 2002) have been identified in Down, fragile X, Patau, Rett, and Williams syndromes (Chailangkarn et al., 2016; Irwin, Galvez & Greenough, 2000; Kaufmann & Moser, 2000). At the same time, functional specialization in one or more domains may be atypical in these disorders. For example, whereas music (vs. noise or rest) elicits reliable, robust, and focal activations in superior and middle parts of the temporal lobe in TD adults, it elicits highly variable and diffuse activation in adults with Williams syndrome that involves regions in the amygdala, cerebellum, and brainstem (Levitin et al., 2003).
Although more evidence is needed to establish a link between synaptic abnormalities and protracted or atypical specialization, it is highly probable that synaptic disturbances affect development. This is because synapses enable neurons to communicate with one another. They are not fixed; rather, they are modifiable, constantly being created, pruned, and altered throughout the lifespan. Although the development of neuronal circuitry is constrained by epigenetic activity, the size, shape, number, and pattern of synaptic connections are governed by experience (Fiala, Spacek & Harris, 2002). Synaptic dysfunction could lead to sparser and less reliable sampling of the environment, and hence a worse signal-to-noise ratio, which in turn may result in protracted plasticity and atypical specialization of function (Rubenstein & Merzenich, 2003). Therefore, because individuals with neurodevelopmental disorders present with synaptic abnormalities, they may have difficulty sampling their environment efficiently, and hence present with prolonged or atypical specialization early in development.
Differences in sampling due to interactions between the external and internal environments
Abnormal specialization in individuals with a neurodevelopmental disorder may not just be the result of individual differences at the genetic level, however. Atypically developing children also develop within atypical environments. This can result from straightforward interactions; for example, a child with Williams syndrome who has difficulty reading may spend less time reading and thus receive less linguistic input (fewer samples) than a TD child. But it can also result from subtler interactions. For example, we have argued that the “moment that a parent is informed that their child has a genetic disorder, the parent's behaviour subtly changes … [and] as a result, the baby's responses within the dyadic interaction will also be subtly modified” (Karmiloff-Smith, D'Souza, et al., 2012, p. 17263). In other words, a change at the genetic level may induce changes in the environment, which may have compounding effects on brain and cognitive development, which will affect gene expression and other gene–environment interactions. We have, for instance, observed that “parents of infants/toddlers with genetic syndromes often find it difficult (compared with parents of TD infants) to allow their atypically developing offspring to freely mouth objects to explore their properties with the sensitive nerve endings in the mouth or crawl/walk uninhibited around the laboratory to fully discover their environment” (Karmiloff-Smith, D'Souza, et al., 2012, pp. 17263–17264). Although the reticence of parents with vulnerable children is understandable, it will likely result in a less richly explored, less sampled environment.
Another example of how interaction between external and internal environments constrains sampling comes from two studies on Down syndrome (DS). Whereas parents of TD children often use basic-level category terms to label objects in naming (e.g. ‘bird’ for magpie), Cardoso-Martins and Mervis (1985) found that the parents of five children with DS were more likely to use the precise names of objects (e.g. ‘magpie’). This difference between parental linguistic inputs may stem from the fear that a child with a neurodevelopmental disorder may never learn the correct names for objects. But different input may have deleterious effects. For instance, Karmiloff-Smith, D'Souza, et al. (2012) suggest that initial overgeneralization encourages category formation (e.g. by calling different animals ‘bird’, the child starts to create an implicit animal category) and children with neurodevelopmental disorders have difficulty with category formation. It is important to note, however, that Cardoso-Martins and Mervis (1985) studied only five child–mother dyads with DS, and also they did not establish a causal relationship between maternal labelling behaviours and children's language development (see Rondal & Docquier, 2006, for a critical review). Nevertheless, unconscious assumptions about what an atypical child can and cannot learn may lead parents to provide less variation in linguistic input (fewer samples) and thus a more impoverished linguistic environment. Indeed, a more recent study provides evidence that maternal input addressed to fourteen 24-month-olds with DS is simpler (contains fewer function words and more onomatopoeic words/routines) than that addressed to twenty-eight TD controls matched on lexical skills (Zampini et al., 2012).
Although it remains to be investigated whether children with DS may in fact benefit more from simpler input, the study demonstrates how interaction between external and internal environments alters the frequency and quality of linguistic input, which may constrain development by changing the learning environment. Furthermore, because the cognitive and linguistic profiles of children with neurodevelopmental disorders are often uneven (e.g. Karmiloff-Smith, Broadbent, et al., 2012), genetic vulnerabilities may make it more difficult for caregivers to accurately assess their child's developmental level and adjust their expectations and input accordingly. This may especially be the case because neurodevelopmental disorders often affect expressive language more than receptive language in infants/toddlers (e.g. D. D'Souza, 2014), and parents often tell us that expressive language is easier to assess than receptive language, which hints at the possibility that they are underestimating their child's receptive language ability.
Learning
Accumulating samples is essential for adaptive behaviour, because an infant can only adapt to the external world by gathering information about it. However, for the information to be useful, the infant must also be able to understand or extract structure from the samples. For example, they must learn to extract patterns (units of speech) from the continuous streams of sound they hear. In the following sections, we discuss two important learning mechanisms in infancy: (i) habituation and (ii) statistical learning.
Habituation
Arguably, the most basic learning mechanism available to the human infant for proactive exploration is habituation. Habituation is an adaptive process by which the infant familiarizes itself with, and gradually builds up an internal representation of, an external stimulus (Rankin et al., 2009; Sokolov, 1963). Over repeated successive presentations of the stimulus, the internal representation comes to represent the stimulation increasingly well, the stimulus becomes less relevant to the infant, and novelty-seeking behaviour is triggered (see Sirois & Mareschal, 2004, for a biologically plausible computational model of habituation to the familiar and novelty preference). How might this relate to language? Habituation may be a core mechanism by which the perinatal brain discriminates languages and starts specializing to its language environment (Werker & Byers-Heinlein, 2008). Habituation measures are also believed to index quality or speed of cognitive processing in infants (Colombo & Mitchell, 1990; Colombo, Shaddy, Richman, Maikranz & Blaga, 2004), and cognitive processes are critical for learning – including word learning. Indeed, a large body of evidence suggests that habituation patterns in infants predict developmental outcomes in language comprehension and production (Ruddy & Bornstein, 1982; Tamis-LeMonda & Bornstein, 1989), as well as general intelligence (Bornstein & Sigman, 1986; Kavšek, 2004; McCall & Carriger, 1993; Rose, Slater & Perry, 1986).
For example, in the visual domain, Tamis-LeMonda and Bornstein (Reference Tamis-LeMonda and Bornstein1989) presented thirty-seven typically developing 5-month-olds with a series of face stimuli. The mean duration of the infant's first two looks constituted a baseline, and stimulus presentations continued until the infant had reached a habituation criterion of two consecutive looks each less than 50% of that baseline. The researchers found that shorter baselines (the start level of habituation), steeper slopes (the decline in attention), and lower decrement scores (the asymptote of habituation) predicted better language comprehension (the number of words flexibly understood across contexts) at 13 months. In the auditory domain, Choudhury and Benasich (Reference Choudhury and Benasich2011) presented twenty-eight TD 6-month-olds with consecutive pairs of tones, of which 85% were ‘standard’ 100–100 Hz pairs and 15% were ‘deviant’ 100–300 Hz pairs. In this auditory oddball paradigm, the auditory system starts to habituate to the consecutively presented standards, a process that is punctuated at random intervals by presentation of the infrequent deviant and resultant ‘mismatch’ brain response. Auditory evoked potentials (AEPs) predicted language abilities at 36 and 48 months of age. Specifically, infants with larger and sharper (more transient) ‘mismatch’ AEPs in response to the deviant stimuli had better receptive and expressive language scores (measured using the Preschool Language Scale [Zimmerman, Steiner, & Pond, Reference Zimmerman, Steiner and Pond1992] and Clinical Evaluation of Language Fundamentals [Wiig, Secord, & Semel, Reference Wiig, Secord and Semel1992]).
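To make the infant-controlled habituation criterion described above concrete, the following is a minimal sketch in Python. It assumes that the baseline is the mean of the first two looks and that the criterion is met on the second of two consecutive looks that each fall below 50% of that baseline. The function name, parameter names, and the decision to evaluate only looks that follow the baseline window are our illustrative assumptions, not details reported by Tamis-LeMonda and Bornstein.

```python
from statistics import mean


def trials_to_habituation(look_durations, baseline_n=2, criterion=0.5, run_length=2):
    """Return the 1-based trial on which the habituation criterion is met.

    Hypothetical helper for illustration only. Returns None if the
    criterion is never reached within the supplied looks.
    """
    if len(look_durations) < baseline_n:
        return None
    # Baseline: mean duration of the first `baseline_n` looks.
    baseline = mean(look_durations[:baseline_n])
    threshold = criterion * baseline
    consecutive = 0
    # Assumption: only looks after the baseline window count towards the criterion.
    for trial, duration in enumerate(look_durations[baseline_n:], start=baseline_n + 1):
        if duration < threshold:
            consecutive += 1
            if consecutive == run_length:
                return trial
        else:
            # A long look resets the run of short looks.
            consecutive = 0
    return None


# Made-up looking times in seconds (not data from any study).
looks = [10.0, 8.0, 7.0, 4.0, 5.0, 4.0, 3.0]
habituated_on = trials_to_habituation(looks)
```

With these made-up looking times the baseline is 9 s, so the threshold is 4.5 s; the run of short looks is interrupted on trial 5 and the criterion is met on trial 7.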
Individual differences in habituation mechanisms may therefore constrain language development. This may also be the case for children with neurodevelopmental disorders. Although no study has hitherto reported an association between variation in habituation and language development in atypically developing children, Guiraud et al. (Reference Guiraud, Kushnerenko, Tomalski, Davies, Ribeiro and Johnson2011) found that auditory evoked potentials to repeated (non-speech) sounds decrease over time in twenty-one TD infants but not in thirty-five infants at high risk of developing autism, which indicates impaired habituation. Furthermore, using an auditory oddball paradigm similar to the one described above, D. D'Souza (Reference D'Souza2014) found that AEPs to repeated (speech) sounds are also atypical in Down syndrome (DS), fragile X syndrome (FXS), and Williams syndrome (WS): mismatch AEPs in response to deviant speech sounds were identified in TD infants (N = 21) but attenuated in infants at high risk of developing autism (N = 38) or with DS (N = 41), FXS (N = 10), or WS (N = 33). Although we have yet to relate these brain data to the language data we collected from these children, the neurocognitive processes indexed by mismatch AEPs trigger novelty-seeking attentional processes, such as the ‘P3’ attentional orienting brain response, which we know (i) is atypical in these particular children, and (ii) is significantly correlated with vocabulary size (especially in fragile X syndrome; see section below on ‘Reallocating attention’ for details on the link between the P3 and language ability). Finding the optimal balance between building useful representations of the external world and exploring it is critical for adaptive behaviour.
If children with neurodevelopmental disorders require more time to consolidate their knowledge (due to impaired habituation mechanisms), then they will have less time to proactively explore and sample new information from their environment, which may also affect their language development.Footnote 5
Statistical learning
Another learning mechanism is statistical learning, the process by which infants identify, and track over time, patterns in sensory information. These patterns range from frequency count (‘there are more sweets in the blue box than in the red one’), through frequency of co-occurrence (‘mum often appears when I cry’), to conditional probability (‘bed-time is more likely to follow bath-time than dinner-time’) (Romberg & Saffran, Reference Romberg and Saffran2010). They also differ in complexity (e.g. simple geometric shapes vs. complex faces) and concreteness (e.g. changes in pitch vs. syntactic categories) (Romberg & Saffran, Reference Romberg and Saffran2010). Importantly for the field of language development, Saffran, Aslin, and Newport (Reference Saffran, Aslin and Newport1996) demonstrated that 8-month-olds could discern recurrent syllable sequences from syllable strings spanning word boundaries by attending to statistical regularities called transitional probabilities (the conditional probability of Y given X in the sequence XY). Since publication of that seminal paper, researchers have discovered that even sleeping neonates are sensitive to the statistical structure of fluent speech (Teinonen, Fellman, Näätänen, Alku & Huotilainen, Reference Teinonen, Fellman, Näätänen, Alku and Huotilainen2009) and that many different types of sequences can be learned (e.g. AcB, where c is an element that can vary; Hsu & Bishop, Reference Hsu and Bishop2010).
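The transitional probability defined above, the conditional probability of Y given X in the sequence XY, can be estimated from a syllable stream as the count of the pair XY divided by the count of X in non-final position. The sketch below is illustrative only: the function name is ours, and the three nonsense words and their ordering are made up for the example rather than taken from the stimuli of any study.

```python
from collections import Counter


def transitional_probabilities(syllables):
    """Estimate P(Y | X) for every adjacent syllable pair XY in a stream."""
    # Count each adjacent pair (X, Y).
    pair_counts = Counter(zip(syllables, syllables[1:]))
    # Count how often each syllable occurs as the first member of a pair.
    first_counts = Counter(syllables[:-1])
    return {(x, y): n / first_counts[x] for (x, y), n in pair_counts.items()}


# Hypothetical trisyllabic nonsense words, concatenated without pauses.
words = {"A": ["bi", "da", "ku"], "B": ["pa", "do", "ti"], "C": ["go", "la", "bu"]}
order = ["A", "B", "C", "A", "C", "B", "A"]
stream = [syllable for w in order for syllable in words[w]]

tp = transitional_probabilities(stream)
within_word = tp[("bi", "da")]   # 'da' always follows 'bi'
across_words = tp[("ku", "pa")]  # 'ku' is followed by 'pa' or 'go'
```

In this toy stream every within-word transition has a probability of 1.0, whereas transitions that span a word boundary have a probability of 0.5, which is the kind of contrast that could in principle mark word boundaries.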
Although Saffran et al. (Reference Saffran, Aslin and Newport1996) argued that statistical learning is the process through which infants segment words from fluent speech (a critical step in language acquisition), they had in fact only demonstrated that 8-month-olds could distinguish between sound sequences of varying internal coherence (high vs. low transitional probability). Questions remained unanswered, which prompted further research. For example, how do infants go from discriminating sound sequences of varying internal coherence to identifying words? Does their sensitivity to different levels of speech input (phonemes, syllables, onset, rime, lexical stress, prosody, grammatical structure, semantics, etc.) change over developmental time? What process determines which statistics are used (frequency count, frequency of co-occurrence, mutual information, etc.)? It turns out that infants actually use a range of mechanisms to combine and extract structure from multiple sources of information – which is why attentional processes (see ‘The role of attention in sampling’ section below) are so important (Kirkham, Slemmer & Johnson, Reference Kirkham, Slemmer and Johnson2002). For example, by 7–8 months, infants use both prosodic information and statistical information (e.g. transitional probabilities) to segment the speech stream, but give more weight to the former (Echols, Crowhurst & Childers, Reference Echols, Crowhurst and Childers1997; Jusczyk, Houston & Newsome, Reference Jusczyk, Houston and Newsome1999; Morgan & Saffran, Reference Morgan and Saffran1995). In other words, they segment speech at the boundary before strong syllables, and group together syllables that frequently co-occur as long as they do not cross a prosodically defined boundary. By 10·5 months, the weighting changes, with more emphasis on distributional, allophonic, and phonotactic cues than on prosody. 
Furthermore, Graf Estes, Evans, Alibali, and Saffran (Reference Graf Estes, Evans, Alibali and Saffran2007) found that 17-month-olds use distributional information to create internal representations, which serve as ‘candidate words’ that they can later map onto objects. In other words, children do not merely track patterns in the speech stream; they use the statistical information to learn words and word meanings (Lany & Saffran, Reference Lany and Saffran2010; Saffran & Wilson, Reference Saffran and Wilson2003; Sahni, Seidenberg & Saffran, Reference Sahni, Seidenberg and Saffran2010; Thiessen & Saffran, Reference Thiessen and Saffran2007).
What about statistical learning in atypical populations? This question has rarely been examined to date. One study found that a group of nineteen toddlers with Williams syndrome (mean chronological age was 33 months; mean mental age was 19 months) had difficulty with speech segmentation; they could detect bisyllabic words with a strong–weak stress pattern (e.g. doctor) but not a weak–strong one (e.g. guitar) – even though a typically developing infant can do this by 10·5 months (Nazzi, Paterson & Karmiloff-Smith, Reference Nazzi, Paterson and Karmiloff-Smith2003). A more recent paper found that a group of eighteen 8- to 20-month-old children with WS successfully demonstrated statistical learning of transitional probabilities in continuous speech (Cashon, Ha, Estes, Saffran & Mervis, Reference Cashon, Ha, Estes, Saffran and Mervis2016). However, the mean chronological age of the children was 15·5 months and the study did not include a control group, so it is impossible to know whether statistical learning was typical or atypical (i.e. whether onset of statistical learning was developmentally late and/or whether rate of development was slow). More research is clearly needed in this domain.
The role of attention in sampling
Another problem that individuals with neurodevelopmental disorders may have – apart from extracting structure – is inadequate sampling as a result of not focusing on relevant aspects of the environment. Previously, we highlighted the need for infants to redirect their attention away from, say, featureless ceilings to speaking faces, or for infants to focus on changes in speech sounds rather than pitch (see above). If infants fail to allocate attention appropriately, then they may miss important social cues and linguistic input, which would make it harder for them to develop language. In the following two subsections, we review evidence of two important attentional precursors to language acquisition: the ability to (i) reallocate attention, and (ii) attend to multiple sources of information and integrate them.
Reallocating attention
Typically developing infants adapt to their environment in part by learning how to shift the focus of their processing resources to the relevant aspects of a scene (Kanwisher & Wojciulik, Reference Kanwisher and Wojciulik2000). This is important because it allows detailed analysis of relevant stimuli, while filtering out irrelevant (distracting) stimuli. A child who can flexibly select and attend to things in their environment, and use this ability to guide action, will interact with (and thus experience) the world differently than a child with an attentional deficit. As a consequence, irregularities in attention may have deleterious effects on development.
For example, some evidence suggests that an important attentional precursor of language development is triadic interaction, which is when a parent uses eye-contact and eye-gaze to direct their child's focus of attention to an object or event of interest (Brooks & Meltzoff, Reference Brooks and Meltzoff2005; Morales, Mundy & Rojas, Reference Morales, Mundy and Rojas1998; Mundy & Gomes, Reference Mundy and Gomes1998; Trevarthen & Hubley, Reference Trevarthen, Hubley and Lock1978; but see Yu & Smith, Reference Yu and Smith2013, for evidence that toddlers are more likely to follow their parent's hands than eyes, and that the interaction is between equals rather than the parent taking the lead). Triadic interaction requires the infant to orient to speech sounds and flexibly shift their attentional focus between visual stimuli (the caregiver and an object of interest). TD infants learn to do this from as early as 3 months of age, developing robust triadic interaction skills by 12 months (Striano & Stahl, Reference Striano and Stahl2005). However, the literature hints that triadic interaction is impaired in at least some neurodevelopmental disorders. For example, Laing et al. (Reference Laing, Butterworth, Ansari, Gsödl, Longhi, Panagiotaki, Paterson and Karmiloff-Smith2002) found that a group of eleven toddlers with Williams syndrome (and a mean age of 29 months) followed pointing behaviour less frequently than a group of eleven TD infants matched on mental age (13 months). Also, whereas pointing precedes speech production and is predictive of early vocabulary development in TD (Bates, Benigni, Bretherton, Camaioni & Volterra, Reference Bates, Benigni, Bretherton, Camaioni and Volterra1979; Butterworth, Reference Butterworth, Francesca and Butterworth1998), it appears after speech production in WS (Mervis & Bertrand, Reference Mervis, Bertrand, Adamson and Romski MA1997). 
In a longitudinal study of ten children with WS, Mervis and Bertrand found that the children did not produce referential pointing gestures until well after the onset of language. Furthermore, they did not respond appropriately to their mothers’ pointing gestures, even though they were beginning to speak referentially. Directing and following attention via pointing may therefore be under-utilized in WS. It is also possible that poor saccadic control in children with WS (Brown, Johnson, Paterson, Gilmore, Longhi & Karmiloff-Smith, Reference Brown, Johnson, Paterson, Gilmore, Longhi and Karmiloff-Smith2003) affects gaze following (and hence may negatively impact language development). Although preliminary eye-tracking data from our lab tentatively suggest that a group of nine 2- to 3-year-olds with WS (mean age of 34 months; mean verbal age of 19 months) may be able to follow gaze (unpublished observations), gaze following and its relationship to word learning have yet to be investigated in younger children with WS.
To further investigate how attention and language are related, we compared attentional processes, and their relation to language ability, in TD 15-month-olds (N = 27) with chronological- and mental-age-matched infants/toddlers who either had a neurodevelopmental disorder (Down syndrome [N = 45], fragile X syndrome [N = 15], Williams syndrome [N = 37]) or were at high risk of developing autism (N = 51) (D. D'Souza, Reference D'Souza2014). The study included an eye-tracking version of the gap-overlap task (adapted from Atkinson, Hood, Braddick & Wattam-Bell, Reference Atkinson, Hood, Braddick and Wattam-Bell1988), which measures how quickly the participant disengages from, shifts to, and engages with visual stimuli. These children also participated in a passive auditory oddball experiment, where continuous electroencephalogram (EEG) was recorded from the infant while speech sounds (70% standards, 15% speech deviants, 15% pitch deviants) were presented in the background (one every 700 ms). We found that the chronological- and mental-age-matched children with DS were slower than the TD controls at disengaging attention from a visual stimulus, while the chronological- (but not older mental-) age-matched infants with WS were slower at engaging (or fully processing) a visual stimulus. Also, both chronological- and mental-age-matched children with WS were slower at shifting visual attention. Moreover, the electrophysiological response to speech sounds was abnormal in all the atypically developing groups (and in the ‘at high risk of developing autism’ group) in one or more stages of processing: encoding auditory stimuli, discriminating between speech sounds, and orienting to changes in speech sounds (vs. pitch).Footnote 6 For example, the electrophysiological response that typically indexes ‘sound encoding’ did not significantly differ between stimulus types (standards, speech deviants, pitch deviants) in any of the groups – except the TD control group.
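For readers unfamiliar with the passive oddball design, the stimulus proportions and presentation rate reported above (70% standards, 15% speech deviants, 15% pitch deviants, one sound every 700 ms) can be sketched as a simple trial schedule. This is a hedged illustration, not the procedure used in the study: the function name, the number of trials, the random seed, and the unconstrained shuffling are all our assumptions, and real oddball sequences often impose further constraints (e.g. on consecutive deviants) that are not specified here.

```python
import random
from collections import Counter


def oddball_schedule(n_trials, proportions, soa_ms=700, seed=0):
    """Return a list of (onset_ms, stimulus_label) tuples.

    Hypothetical helper: builds the exact number of trials per stimulus
    type, shuffles them, and spaces onsets `soa_ms` milliseconds apart.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    labels = []
    for label, proportion in proportions.items():
        labels.extend([label] * round(n_trials * proportion))
    rng.shuffle(labels)
    return [(i * soa_ms, label) for i, label in enumerate(labels)]


# Proportions as reported in the text; 200 trials is an arbitrary choice.
proportions = {"standard": 0.70, "speech_deviant": 0.15, "pitch_deviant": 0.15}
schedule = oddball_schedule(200, proportions)
counts = Counter(label for _, label in schedule)
```

Building the trial list from exact counts and then shuffling, rather than sampling each trial independently, guarantees that the realized proportions match the intended ones.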
How did the attentional measures we collected relate to language ability? Across all groups, infants/toddlers who were better at disengaging visual attention and/or showed a larger attentional orienting brain response to changes in speech sounds had relatively large vocabularies (though it is important to note that the [statistically significant, medium-sized] effect of disengagement was largely driven by the TD control group, unlike the [medium-sized] effect of orienting which was highly significant across all groups). Between-group differences were also identified. For example, the toddlers with FXS (mean chronological age = 33 months) who had a relatively small orienting response (measured using the event-related potential technique) had a significantly smaller expressive vocabulary size (equivalent only to TD 5-month-olds) than those who had a relatively large orienting response (expressive language equivalent to TD 13-month-olds). This relationship did not reach significance in the DS group. However, for children with DS (mean chronological age = 28 months), those who were relatively poor at disengaging attention had a significantly smaller expressive vocabulary size (equivalent only to TD 11-month-olds) than those who were relatively good at disengaging attention (expressive language equivalent to TD 17-month-olds). This relationship did not reach significance in the FXS group. Together, these data suggest at least two mechanisms for acquiring good language skills, one more relevant to FXS (social orienting), and one more relevant to DS (visual disengagement). We speculate that disengaging and shifting attention facilitates exploration, which affects language development (see sections on ‘Social interaction’ and ‘Embodiment’ below), and that shifting attention towards changes in speech sounds (measured using ERP) also contributes to language development.
Attending to and integrating multiple sources of information
However, for optimal learning, not only must an infant orient to a speaking face, they must also look at the relevant parts of the face (e.g. the eyes [for gaze-following] or mouth [for disambiguating speech sounds]) and integrate visual and auditory information. For example, watching lip movements influences auditory (speech) perception (Alsius, Navarra, Campbell & Soto-Faraco, Reference Alsius, Navarra, Campbell and Soto-Faraco2005; Burnham & Dodd, Reference Burnham and Dodd2004; Kushnerenko, Teinonen, Volein & Csibra, Reference Kushnerenko, Teinonen, Volein and Csibra2008; McGurk & MacDonald, Reference McGurk and MacDonald1976; Rosenblum, Schmuckler & Johnson, Reference Rosenblum, Schmuckler and Johnson1997) and facilitates word learning (Teinonen, Aslin, Alku & Csibra, Reference Teinonen, Aslin, Alku and Csibra2008). Visual cues may be particularly useful for speech perception under noisy conditions (Sumby & Pollack, Reference Sumby and Pollack1954).
Could language delay in children with neurodevelopmental disorders partly result from a deficit in integrating auditory and visual information? To find out, we used an eye-tracker to measure auditory-visual (AV) speech integration in ninety-five infants/toddlers with Down syndrome, fragile X syndrome, or Williams syndrome, whom we matched on either chronological or mental age to twenty-five TD 15-month-old controls (D. D'Souza, D'Souza, Johnson & Karmiloff-Smith, Reference D'Souza, D'Souza, Johnson and Karmiloff-Smith2016). In this within-subjects design, the participants were presented with two faces. In one condition (Mismatch), the participants heard /ga/ while one face silently mouthed ‘ga’ (congruent) and the other face silently mouthed ‘ba’ (incongruent). The other condition (Fusion) was identical in all respects except the participants were presented with the sound /ba/ rather than /ga/. We predicted that TD infants would discriminate between the congruent and incongruent faces in the Mismatch condition with their looking behaviour, because when participants hear /ga/ and see ‘ba’, they perceive a mismatch – a combinatorial percept such as /bga/. We also predicted that TD infants would not visually discriminate between the congruent and incongruent faces in the Fusion condition, because the sound /ba/ and sight ‘ga’ fuse into the syllabic percept /da/ (the McGurk effect; McGurk & MacDonald, Reference McGurk and MacDonald1976). Participants with strong AV speech integration skills would therefore demonstrate a McGurk effect and not discriminate between the faces, because there is nothing to distinguish one syllabic percept (/da/) from the other (/ba/). The TD controls showed the expected outcome. However, we found no evidence of AV speech integration in either the chronological- or mental-age-matched atypically developing groups (D. D'Souza, D'Souza, Johnson, & Karmiloff-Smith, Reference D'Souza, D'Souza, Horvath and Karmiloff-Smith2016).
Moreover, whereas AV speech integration predicted receptive language ability in the TD controls, it did not in any of the atypically developing groups (D. D'Souza, D'Souza, Johnson, & Karmiloff-Smith, Reference D'Souza, D'Souza, Horvath and Karmiloff-Smith2016).
Why would AV speech integration not predict language ability in the neurodevelopmental disorders? We explored the possibility that the atypically developing children were fixating on the wrong parts of the face. We found that, when presented with a speaking face that mouthed a different syllable from the one they could hear, the TD infants (N = 25) with a relatively large vocabulary made more fixations to the mouth than eyes, while the toddlers with FXS (N = 14) or WS (N = 25) and a relatively large vocabulary focused more on the eyes than mouth (D. D'Souza, D'Souza, Johnson & Karmiloff-Smith, Reference D'Souza, D'Souza, Johnson and Karmiloff-Smith2015; see D. D'Souza, Cole, et al., Reference D'Souza, Cole, Farran, Brown, Humphreys, Howard, Rodic, Dekker, D'Souza and Karmiloff-Smith2015, for more evidence that toddlers with WS process faces atypically). In DS (N = 21), by contrast, fixations to the speaker's overall face (rather than to the mouth or eyes) predicted vocabulary size (D. D'Souza, D'Souza et al., Reference D'Souza, Cole, Farran, Brown, Humphreys, Howard, Rodic, Dekker, D'Souza and Karmiloff-Smith2015). This surprising result warrants further investigation. However, it suggests that children with these neurodevelopmental disorders are not using visual speech cues to bootstrap their acquisition of language in the same way as TD infants.
BEYOND COGNITION: OTHER DEVELOPMENTAL CONSTRAINTS ON LANGUAGE ACQUISITION
As shown above, under tightly controlled experimental conditions infants demonstrate an ability to assimilate information in ways that facilitate word learning. However, it is possible that they learn words in very different ways outside the laboratory. This is because different affordances may result in different adaptive behaviours, with some behaviours more optimal in the unfamiliar, tightly constrained, simplified environment of the laboratory, and others more optimal in the familiar but complex and dynamic environment of the home. What constrains language development outside the laboratory? In the following sections, we review evidence that language acquisition is constrained (among other things) by (i) social interaction and (ii) the infant's physical body.
Social interaction
Understanding language within complex natural environments may require different skills than the ones that are identified in intramural studies. There is some evidence to suggest that extracting linguistic structure from the environment requires social interaction: Nine-month-old American infants experienced twelve sessions of an unfamiliar language (Mandarin) through either live interaction with a Mandarin speaker or identical linguistic information from the same speaker but delivered via a television or audiotape (Kuhl, Tsao & Liu, Reference Kuhl, Tsao and Liu2003). The American infants exposed to Mandarin in the live interaction condition could discriminate Mandarin phonemes similarly to age-matched Chinese Mandarin monolinguals. This shows that 9-month-olds can learn and/or retain their early ability to better discriminate non-native contrasts. However, the American infants exposed to Mandarin in the television and audiotape conditions failed to better discriminate between Mandarin phonemes than a group of age-matched American controls who had been exposed only to English.
Kuhl et al.’s (2003) data suggest that social interaction is important for language learning (even if it is unnecessary for simpler statistical learning tasks presented in experimental situations; e.g. Saffran et al., Reference Saffran, Aslin and Newport1996). There are parallels of this in the non-human animal literature. For example, the young male zebra finch needs to see or physically interact with a ‘tutor’ bird to learn song (Adret, Reference Adret2004; Eales, Reference Eales1989). Moreover, the song crystalizes during a sensitive period of early life, but a recent study found that this sensitive period could be extended through social interaction (Derégnaucourt, Poirier, Van der Kant, Van der Linden & Gahr, Reference Derégnaucourt, Poirier, Van der Kant, Van der Linden and Gahr2013). In the study, after learning a song, young male zebra finches were moved to a common aviary where they could freely interact with other males. Zebra finches that had poorly imitated their song model during the sensitive period were more likely to change their song than zebra finches that had imitated their song model well. This highlights the importance of live social interaction on song learning in zebra finches. But what mechanisms underpin ‘social learning’ in the human infant? The zebra finch offers a clue. Preliminary data suggest that it imitates song better when its tutor is facing it (Ljubičić, Bruno & Tchernichovski, Reference Ljubičić, Bruno and Tchernichovski2016). Perhaps an attentive tutor increases attention and arousal in the learner, enhancing its ability to learn and memorize information. Indeed, Yu and Smith (Reference Yu and Smith2016) found that human infants extend their sustained attention to an object when a parent attends to the same object. Kuhl (Reference Kuhl2007) suggests that social learning may facilitate language acquisition by increasing attention/arousal and/or providing additional sources of information (e.g. 
eye-contact, confirmatory glances from the instructor).Footnote 7
Another example of how social interaction constrains language development comes from Karmiloff-Smith, Aschersleben, de Schonen, Elsabbagh, Hohenberger, and Serres (Reference Karmiloff-Smith, Aschersleben, de Schonen, Elsabbagh, Hohenberger and Serres2010). In this study, 6-month-old infants were tested on their ability to process speech, faces, and actions/events. They were also observed and videotaped for 3 minutes as they played with their mothers, and the quality of each mother–infant dyadic interaction was rated. The infants were tested again at 10 months. At the group level, the infants displayed all of the expected effects found in previous research (e.g. the infants could discriminate both native and non-native contrasts at 6 months, but only native contrasts at 10 months). However, when Karmiloff-Smith and colleagues divided their data according to the mother–child interaction ratings, they found that the 6-month-old infants of highly sensitive (vs. controlling or unresponsive) mothers seemed to show earlier specialization for the sounds of their native language (i.e. they failed to significantly discriminate between the non-native contrasts). One might expect that interactions high on sensitivity would foster child development across cognitive domains generally. However, this was not the case. The infants from dyads highly rated for their sensitivity were in advance of their peers in speech processing at 6 months and in processing physical events at 10 months, but were actually worse at processing human goal-directed actions than infants who were judged to be compliant and who also had moderately controlling mothers. Karmiloff-Smith et al. 
(Reference Karmiloff-Smith, Aschersleben, de Schonen, Elsabbagh, Hohenberger and Serres2010) suggested that by imposing their own choice of toy on their infants and changing the toys without showing sensitivity to the infants’ current focus of attention, the controlling mothers forced their compliant infants to frequently process their mothers’ goals rather than their own goals, leading to an advantage in understanding others’ goal-directed actions. Karmiloff-Smith et al. also suggested that, while sensitive mothers provide their infants with linguistic input at an appropriate level, controlling mothers speak less and are less likely to take into account the progressively changing nature of their child's vocalizations, leading to the differences in language development between the two groups of infants. And whereas sensitive mothers leave their infants sufficient time to fully explore (and learn about) the properties of objects, controlling mothers are more likely to interrupt their infants’ exploration of objects and offer them a succession of new toys, leading to differences in their infants’ ability to process physical events (Karmiloff-Smith et al., Reference Karmiloff-Smith, Aschersleben, de Schonen, Elsabbagh, Hohenberger and Serres2010). Of course, these hypotheses need to be empirically tested, and we do not actually know whether the children were compliant (passive) because their mothers were moderately controlling, or whether the mothers displayed controlling behaviours because their children were compliant, or whether there is a different explanation. Nevertheless, the data indicate that differences in dyadic interaction are associated with differences in language development.
Motivated by the observations of Karmiloff-Smith et al. (Reference Karmiloff-Smith, Aschersleben, de Schonen, Elsabbagh, Hohenberger and Serres2010), we measured various aspects of dyadic interaction in parent–infant dyads with Down syndrome or Williams syndrome. The first phase of our investigation found that, relative to eight chronological-age-matched TD dyads, sixteen dyads with DS or WS infants are lower in mutuality and engagement, parent responses are more controlling (directive) and less sensitive, and infants are less attentive to their parents and show less joint activity (Soukup-Ascençao, D'Souza, D'Souza & Karmiloff-Smith, Reference Soukup-Ascençao, D'Souza, D'Souza and Karmiloff-Smith2016). However, because parents are not necessarily intrinsically directive or sensitive, but adapt to their child's cognitive and linguistic characteristics (Hodapp, Reference Hodapp2004), the second phase of our study will compare the DS/WS data with data from mental age (MA) matched TD infants and their parents. Because both syndromes present with uneven cognitive and social profiles (Karmiloff-Smith, Broadbent, et al., Reference Karmiloff-Smith, Broadbent, Farran, Longhi, D'Souza, Metcalfe and Sansbury2012; Moore, Oates, Hobson & Goodwin, Reference Moore, Oates, Hobson and Goodwin2002), it is reasonable to expect that these atypicalities will yield different parental responses within the dyad in comparison to the parents of TD MA-matched infants (Hodapp, Reference Hodapp2004). However, atypical parenting styles are not necessarily detrimental to infants with neurodevelopmental disorders; they could reflect adaptive processes and in fact facilitate language development. For example, it is possible that atypically developing infants may be more passive and thus require (and benefit from) more direction from their caregivers than typically developing infants. 
Therefore, in our future research, we are planning to link our parent–infant interaction measures with measures such as spontaneous exploration, visual attention, AV speech integration, and vocabulary size, to better understand how these complex social interactions constrain language development.Footnote 8
Embodiment
Most studies on early language development (including many of those reviewed above) assume that the infant has the near-impossible task of having to extract structure by flexibly selecting and attending to things in relatively unconstrained, dynamically complex, cluttered, multimodal contexts. Attentively orienting to, disengaging from, shifting between, and engaging with various spinning toys, unpredictable sounds, eyes, mouths, and pointing behaviours, etc., can be challenging for infants (Kannass, Oakes & Shaddy, Reference Kannass, Oakes and Shaddy2006; Lansink, Mintz & Richards, Reference Lansink, Mintz and Richards2000; Ruff, Capozzoli & Weissberg, Reference Ruff, Capozzoli and Weissberg1998; Ruff & Lawson, Reference Ruff and Lawson1990), but the learning space may not be as unconstrained as many researchers assume. Learning is partly constrained by the environment, such as when the fetus hears its mother's voice low-pass filtered through amniotic fluid, which accentuates the rhythmic structure of speech (Smith, Gerhardt, Griffiths, Huang & Abrams, Reference Smith, Gerhardt, Griffiths, Huang and Abrams2003). The learner's body also constrains access to moment-to-moment sensory information, and in this way guides learning. For example, the neonate's limited visual range may restrict visual exploration to only the most salient features of an object; the infant's short arms may restrict exploration to only one object at a time; the toddler's short legs may restrict exploration to only part of a scene.
Furthermore, whereas researchers have focused on finding cognitive solutions to word learning problems such as referential ambiguity in early object naming (e.g. Xu & Tenenbaum, 2007), Pereira, Smith, and Yu (2014) found that the occurrence of such ambiguous events might be limited by the sensorimotor constraints that underpin real-time free-flowing interactions. Specifically, they used a head-mounted camera to record gaze data from toddlers as they played with novel objects, and as the parent spontaneously named them. The toddlers were later tested to ascertain whether they had learned the names of the objects. This enabled the researchers to identify the properties of visually optimal moments for toddlers to learn the labels of objects: the toddlers learned best when the named object was visually larger and more centred than other objects, and when that visual advantage was sustained for several seconds before and after the naming event. This suggests that learning is constrained by the sensory properties of naming moments. That is, from an adult-centric perspective, the number of possible meanings that a novel word in a novel scene may have is so great that it presents the infant with an almost insurmountable problem of referential ambiguity, but research reflecting the child's perspective demonstrates that, during word learning, large numbers of objects are not typically in view, and that factors like the infant's physical abilities or size of visual field constrain the problem space (Samuelson & McMurray, 2016). In other words, cluttered visual scenes may pose the problem of referential ambiguity, but solutions that posit language-specific biases (e.g. Lidz, Waxman & Freedman, 2003), strategies that deductively or statistically narrow down the set of possible word meanings (e.g. Golinkoff & Hirsh-Pasek, 2006; Golinkoff, Mervis & Hirsh-Pasek, 1994; Xu & Tenenbaum, 2007), and/or social skills such as inferring others’ referential intent (e.g. Woodward & Markman, 1998), may be viable but are unnecessary; the active child creates moments of minimal visual ambiguity (by bringing objects close to their face with their short arms; Smith, Yu & Pereira, 2011; Yu, Smith, Shen, Pereira & Smith, 2009), rendering the cognitive problem obsolete. [Footnote 9]
The flipside of Pereira et al.’s (2014) findings is that they also point to visual limits on object name learning, which highlights the need for infants to develop abilities that will enable them to learn in more challenging environments. Changes in the ability to use one's body (e.g. changes in posture, locomotion, object manipulation) help infants with this. That is, they enable the infant to explore and experience the world in new ways (Iverson, 2010). For example, children who can more easily pick up toys (because they are wearing ‘sticky mittens’) are more likely to explore different toys (Needham, Barrett & Peterman, 2002; cf. Williams, Corbetta & Guan, 2015), and the provision of toys for exploration is a significant predictor of language development (Elardo, Bradley & Caldwell, 1977). And by learning to walk, an infant can bring a distant object of interest to an adult who can subsequently label it (Karasik, Tamis-LeMonda & Adolph, 2011). Indeed, in a longitudinal study of forty-four TD infants, Walle and Campos (2014) found that acquisition of walking predicts vocabulary size (both receptive and expressive), and that vocabulary size (receptive, expressive) significantly increases as a function of walking experience, irrespective of chronological age. But not only do motor changes help the child to sample more of its environment (and differently), they also require and develop cognitive problem solving skills (Adolph, 2005). Moreover, they ‘prepare’ the child for talking and listening. For example, by sitting up without assistance, an infant can breathe more efficiently, maintain subglottal pressure, and generate longer utterances in one breath (Iverson, 2010).
By bringing objects to an adult, the infant may increase the number, and extend the duration, of social interactions, which provides opportunities to learn important social behaviours such as turn-taking (Karasik et al., 2011; Walle & Campos, 2014). In a second observational study of forty-four parent–infant dyads during a 10-minute free-play session (during which, parents later reported, they had forgotten that they were being filmed), Walle and Campos (2014) found that the more active (walking or crawling) infants had larger receptive vocabulary sizes. They also found that, irrespective of chronological age, various factors in the social environment predicted vocabulary size – in walking, but not crawling, infants. Specifically, walking infants who received more language input from the parent had larger receptive and expressive vocabularies. Walking infants who spent more time at a ‘medium’ distance from the parent (nearby but out of the parent's physical reach) had larger expressive vocabularies, perhaps because a medium distance encourages distal communication. And walking infants whose parents moved around less frequently had larger receptive and expressive vocabularies, perhaps reflecting better distal dyadic communication and less need for physical intervention from the parent.
If individual differences in motor development constrain language acquisition in TD children, they have arguably an even greater effect on language development in atypically developing infants with genetic vulnerabilities. Motor difficulties are present early in development across various neurodevelopmental disorders. For example, infants with DS experience hypotonia, joint laxity, and hypermobility (Ulrich & Ulrich, 1993). And various motor skills, including walking, are delayed in DS. While the average age of walking in TD children is 13 months and the age range is from 9 to 17 months, most children with DS only learn to walk between 2 and 3 years of age – with some children unable to walk at 4 years of age (Palisano et al., 2001). Even when children with DS learn a new motor skill, they tend to practise it less. For example, de Campos, da Costa, Savelsbergh, and Rocha (2013) found that, even after onset of reaching, infants with DS (N = 9) produce significantly fewer spontaneous reaches than TD controls (N = 16). This may constrain their exploration of objects (and require more input from the caregiver), which could further contribute to language delay. However, interventions can hasten the late onset of motor skills. For example, infants with DS who underwent a home-based stepping training intervention achieved independent walking 101 days earlier (on average) than infants with DS who had not taken part in the intervention (Ulrich, Ulrich, Angulo-Kinzler & Yun, 2001). We hypothesize that, because the motor domain does not exist in isolation (see above), early intervention in the motor domain may also have cascading effects on connected, interdependent domains, such as language.
IMPLICATIONS, CHALLENGES, AND NEW DIRECTIONS: EMBRACING COMPLEXITY
Several decades of infant research demonstrate that word learning is constrained by multiple factors (some of which were discussed above [Footnote 10]) that range from the amount and quality of linguistic input infants receive to the length of their arms. It is important to note, however, that arguably all of the developmental processes, domains, and levels of description discussed in this paper are interconnected. For example, habituation processes are likely to constrain language development through learning and memory, but they are also yoked to attentional processes that trigger novelty-seeking behaviour, and constrain exploratory behaviour (more time spent consolidating knowledge is likely to result in less time exploring). Many of the developmental processes, domains, and levels of description are also interdependent. For example, Down syndrome affects how an infant develops partly through the complex interactions the infant has with its parent; parenting constrains the infant's development, but the infant's development affects parenting. And because interactions between diverse, interdependent units give rise to non-linear dynamics, in order to understand how the system develops we cannot study the components of the system in isolation. [Footnote 11] Understanding how infants acquire language therefore necessitates multidisciplinary research across multiple interdependent domains (e.g. sampling, learning, attention), modalities (e.g. haptic, auditory, visual), and levels of description (from genes to social context) over developmental time. [Footnote 12]
Furthermore, to study how complex systems adapt to their environments, it is important to study them in their environment. Moreover, because language development is a probabilistic process, there are likely to be multiple ways to an adaptive fit. Arguably, researchers should therefore abandon attempts to find one developmental pathway to language acquisition and instead identify co-varying factors that significantly constrain development. This is crucial if we want to understand how language development goes awry in infants with neurodevelopmental disorders.
How can we hope to understand language development if it requires an understanding of the multiple ways in which different but interdependent domains, modalities, and levels of description interact with each other over developmental time? Technological and methodological innovations will aid the process. For example, new or improving technologies such as head-mounted eye-trackers for infants will enable more naturalistic studies. And the increasing size and complexity of datasets collected during studies that involve, for example, moment-by-moment gaze data from eye-trackers mounted on the heads of infants as they explore their home environment, are being paralleled by increases in computational power and the development of tools that can store, catalogue, and analyze massive datasets (see Arzi et al., 2014, for discussion). These tools may also help researchers from around the world to share and collate data such as the speech corpus contained within the Child Language Data Exchange System (CHILDES), a central repository for language acquisition data, or the digital data library (Databrary) which facilitates the sharing and repurposing of video data. Creating a similar platform for data gathered from infants with a neurodevelopmental disorder would enable us to repurpose data and gain more insight into how learning happens moment-by-moment in these often difficult-to-study populations.
Theoretical and conceptual advancements are also important because they guide what scientists decide to study and how they study it (H. D'Souza & Karmiloff-Smith, 2016). Furthermore, they lead to better explanatory and predictive models of language development, and can shape social practice, education, and intervention. Take intervention, for example. To tackle language delay in children with neurodevelopmental disorders, paediatricians often target the purported malfunctioning ‘language module’ with specific outcomes in mind, such as teaching the child to produce a set of words. However, the developmental studies discussed in this paper suggest that, because language emerges gradually from the interaction of many diverse interdependent units over time, any delay may be the result of cascading effects of early impairments in non-linguistic domains (e.g. in selective attention). In this case, successful intervention would require targeting mechanisms and domains that are not necessarily directly related to the language domain and which could be on different levels of description (e.g. emotional dysregulation). Moreover, because the child is a complex adaptive system, timely intervention is likely to be critical (D. D'Souza & Karmiloff-Smith, 2016). [Footnote 13] That is, teaching a child words may be particularly challenging if the child has not, for example, developed adequate oro-motor abilities. Furthermore, because the child is a complex adaptive system, rather than attempt to restructure one particular domain such as language (e.g. by providing the child with more language lessons), it would be better to focus on creating an environment for optimal learning. This may involve various strategies such as keeping the child on the edge of the boundary between consolidating knowledge (e.g. allowing the child time to process a stimulus) and searching for new knowledge (e.g. providing the child with novel stimuli). In other words, new theoretical perspectives view the child as a complex adaptive system and suggest that it would be best to gain an understanding of, and build interventions around, the interdependency of various factors.
CONCLUSION
We have reviewed evidence for the existence of several diverse, interacting, interdependent (linguistic and non-linguistic) constraints on language development. This evidence suggests that many aspects of language are constructed piecemeal via the complex and dynamic interactions of parts within the infant as well as between the infant and its social and physical environments. Language delay is at least in part due to variations in these constraints. Future research on language development should therefore not only focus on single cues (e.g. prosodic patterns, stress patterns, the social context of the interaction) or on single domains (e.g. attention, learning, social interaction); it should integrate data from different domains and levels of description. These cues/domains are neither individually sufficient nor independent of each other, but rather they are used in combination to constrain the learning space and allow probabilistic knowledge to emerge. Future studies should therefore investigate how infants integrate multiple sources of information in naturalistic contexts and across developmental time. This may lead to a better understanding of language development and, critically, lead to more effective interventions for cases when language develops atypically.