Spelke's picture of early infant cognition in her monumental (2022) book, What Babies Know, involves two basic cognitive mechanisms: core cognitive systems (none of which is unique to humans) and the human language faculty (which uniquely enables human infants to speak their native tongue). The agents core system generates infants' expectations about the efficiency of an agent's instrumental action directed toward a physical target. The core social cognition system enables infants to represent people as social beings who interactively engage and share phenomenal experiences with others. Initially, the core systems are independent, compete for attention, and cannot interact with each other. Therefore, young infants cannot simultaneously attend to people as agents who efficiently pursue instrumental goals and as social beings who engage and share phenomenal experiences with others through coordinated social actions.
On Spelke's account, the human language faculty does not merely enable preverbal infants to learn the words and grammar of their native language; it also enables them to overcome the initial representational and attentional limitations of their core cognitive systems. On Spelke's view, infants' pragmatic expectations about communication are derived from their social interactions with speakers whose utterances are used to “express their thoughts to others, in accord with pragmatic principles of economy, informativeness, and relevance” (p. 435). Spelke suggests that these pragmatic expectations may arise automatically from the agents core system. As a result of language acquisition that allows the integration of the agents core system with the core social cognition system, infants' expectations about the efficiency of an agent's instrumental action are transferred from the agents core system to the core social cognition system. Nonverbal communicative interactions between social agents, however, are not supposed to induce pragmatic expectations in young infants.
Therefore, without the benefits of language acquisition, infants should be unable to construe an agent's gaze shift toward one of a pair of physical targets as an object-directed action whose social goal is to share attention to the referent object with a social partner. Nor can they understand such a nonverbal ostensive referential gesture in terms of the social agent's communicative and informative intentions to convey to his recipient relevant information about the indicated referent. In contrast to this account, we shall argue for an alternative view according to which, significantly earlier than 12 months, young infants exhibit special sensitivity to nonverbal behavioral cues of ostensive communication that generate their pragmatic expectations about the referentiality, informativity, and relevance of the communicative actions of social agents. As evidence for this view, we shall focus on four relevant studies that demonstrate early sensitivity to nonverbal ostensive cues of communicative intention and indicate referential expectations as well as pragmatic expectations about the relevance of the ostensively communicated information.
In a recent study by Tauzin and Gergely (2019), 10-month-olds watched videos depicting the interactions between two unfamiliar agents whose sole activity consisted in exchanging unfamiliar nonlinguistic sound triplets in a turn-taking manner. In the identical signals condition, the sequence of sounds produced by the second agent was fully predictable from the first agent's sound signals because the second agent strictly replicated the sound triplet emitted by the first agent. In the variable signals condition, the second agent's sound triplets were not fully predictable because they replicated only the initial sound of the first agent's sound triplets while the second and third sounds of the second agent's sound triplets were freely varied. In the subsequent test phase, only one of the two agents was present with two laterally positioned objects on its two sides, and the agent turned toward one of the two objects. Tauzin and Gergely found that the agent's target-oriented movement induced a gaze-following response in the 10-month-olds only in the variable signals condition, not in the identical signals condition. This finding does not seem compatible with the assumptions of the core social cognition system which, according to Spelke, would be expected to generate imitative attunement and gaze-following of the entity's attentional orienting response in both conditions. Arguably, however, new information can only be conveyed by (partially) unpredictable but not fully predictable signal sequences (cf. Shannon, 1948). If so, then the fact that gaze-following responses to the social agent's target-oriented movement occurred only in the variable signals condition indicates that infants are sensitive to the pragmatic principle of informativity and rely on it to identify communicative information transfer between agents.
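The appeal to Shannon can be made concrete with a toy calculation. What follows is an illustrative sketch, not part of Tauzin and Gergely's analysis: it assumes a hypothetical inventory of four sounds and assumes that the freely varied second and third sounds are drawn uniformly from that inventory. Under these assumptions, the conditional entropy of the second agent's triplet, given the first agent's triplet, is zero bits in the identical signals condition and four bits in the variable signals condition.

```python
import math
from collections import Counter
from itertools import product

# Hypothetical four-sound inventory (an assumption made for illustration only).
SOUNDS = "ABCD"


def conditional_entropy(pairs):
    """Return H(response | signal) in bits for equally weighted (signal, response) pairs."""
    joint = Counter(pairs)
    signal_counts = Counter(signal for signal, _ in pairs)
    total = len(pairs)
    entropy = 0.0
    for (signal, _response), count in joint.items():
        p_joint = count / total
        p_response_given_signal = count / signal_counts[signal]
        entropy -= p_joint * math.log2(p_response_given_signal)
    return entropy


# Every possible triplet the first agent could emit.
signals = list(product(SOUNDS, repeat=3))

# Identical signals condition: the second agent repeats the first agent's triplet.
identical = [(s, s) for s in signals]

# Variable signals condition: only the first sound is copied; the other two
# are assumed to vary uniformly over the inventory.
variable = [(s, (s[0], b, c)) for s in signals for b in SOUNDS for c in SOUNDS]

h_identical = conditional_entropy(identical)
h_variable = conditional_entropy(variable)
print(f"identical: {h_identical:.1f} bits, variable: {h_variable:.1f} bits")
```

A response that is fully determined by the preceding signal carries no new information, whereas a partially unpredictable response carries some. This is the sense of informativity at issue in the argument above.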
Recently, Okumura, Kanakogi, Kobayashi, and Itakura (2020) reported a gaze-following study with 9-month-olds to investigate the predictions of the theory of natural pedagogy (Csibra & Gergely, 2011). They tested whether ostensive cues (e.g., eye contact or infant-directed speech) induce pragmatic expectations of referentiality, informativity, and relevance in infants in contrast to noncommunicative attention-grabbing behaviors (e.g., shivering or uttering beeping sounds). The results showed that 9-month-olds followed the model's gaze shift and spent an equal amount of time looking at the target object in both conditions. However, in a further test, infants were presented with pairs of pictures, one of which depicted the previously fixated target and the other a novel object. Okumura and colleagues hypothesized that ostensive cues were likely to boost the infants' processing, encoding, and retention of the properties of the fixated object. As a result, they expected a novelty effect and predicted that infants would look longer at the picture of the previously nonfixated object. In accordance with their hypothesis, the infants looked reliably longer at the picture of the novel object than at that of the fixated object in the ostensive condition, but not in the nonostensive attention-grabbing condition. Moreover, in a further object choice test, the infants were given the opportunity to choose between a three-dimensional (3D) replica of the previously fixated object and a 3D replica of an unfamiliar novel object. Okumura et al. (2020) argued that the selective preference toward the target object indicated by the model's gaze shift can be regarded as “evidence that the actor's gaze impacts the affective appraisal of objects.” In line with this assumption, infants showed selective preference for choosing the previously fixated object in the ostensive condition, whereas no selective object choice was found in the noncommunicative attention-enhancing behaviors condition. These findings suggest that the model's ostensive communicative signals induced not only referential expectations in the 9-month-olds but also pragmatic expectations of informativity and relevance about the referent object that was communicatively manifested for them by the model's referential gesture.
In an object individuation study with 10-month-olds, Futó, Téglás, Csibra, and Gergely (2010) demonstrated that in an ostensive cuing context, manual demonstrations of the functions of two novel artifacts can induce kind-based object individuation even in the absence of naming the objects with words. In the ostensive condition, the infants were first addressed by infant-directed speech before a hand brought out the novel artifacts on either side of an occluder and performed different means actions on them that generated either a sound or a visual effect. After the occluder was removed, the 10-month-olds looked reliably longer when one rather than two objects were present. In a nonostensive attention induction condition (using a mechanical sound), however, the very same goal-directed action manipulations did not induce longer looking at one versus two objects. This study indicates that, similarly to the use of word labels in an ostensive context, nonverbal action demonstrations are interpreted by young infants as communicative manifestations of kind-relevant informational properties of novel objects, such as their functions.
A groundbreaking study by Vouloumanos, Martin, and Onishi (2014) provides further evidence that in many relevant respects even 6-month-old infants process verbal conversations between speakers of their linguistic community the way monolingual adults process conversations between speakers of a foreign language. They can recognize the presence of a speaker's communicative intention before they can speak their native tongue. Infants were first familiarized with a single agent (the speaker) who repeatedly showed her preference to play with one of a pair of toys in front of her. In the next scene, the speaker appeared behind a tiny window and could not reach the toys anymore. Opposite to her, however, appeared someone else (the recipient) who could both see and act on the toys located between them. The speaker turned toward the recipient and either uttered the (novel) word “koba” or produced a coughing sound. When the speaker uttered “koba,” but not when she coughed, 6-month-olds looked reliably longer if the recipient picked up the toy that was not the speaker's preferred toy than if she picked up the speaker's favorite toy. Although coughing unquestionably drew infants' attention toward the speaker, only the speaker's utterance of “koba,” not coughing, triggered their referential and communicative expectations. Arguably, infants ascribed a communicative intention to the speaker and used the context to fill in the content of her informative intention. They knew about the speaker's preference for one of the toys and could see that she was patently unable to satisfy her preference by her own bodily action. In this context, they expected the speaker to make a request, rather than an assertion, that could be fulfilled by the recipient only if the latter took the speaker's tokening of “koba” to refer to her favorite toy.
In an important recent study, Neff and Martin (2023) replicated these findings and provided further evidence showing that 6-month-olds do not assume that a verbal utterance is a sufficient condition to ensure the success of a speaker's communicative action. Neff and Martin found that only in an ostensive context, in which the speaker and the recipient are in face-to-face contact during speech (not if either is looking elsewhere), do infants expect the recipient to pick up the speaker's favorite toy and thereby fulfill her informative intention. These results show that even preverbal infants possess a sensitivity to nonverbal cues of ostensive communication which induces their pragmatic expectations about the relevance of communicated information.
The evidence reviewed above suggests that in response to ostensive stimuli, infants form pragmatic expectations about an agent's verbal or nonverbal communicative action. These include expectations about an agent's referential action as well as the expectation that the communicative agent is seeking to convey information relevant to her recipient.
There is, however, no room for pragmatic expectations in either the agents core system or the core social cognition system. On Spelke's account, pragmatic expectations arise from the human language faculty. There are, however, two possible interpretations of the role of the human language faculty. One possibility is that infants' expectations about the efficiency of an agent's instrumental action generated by the agents core system are converted by the combinatorial power of the human language faculty into pragmatic expectations about communicative actions. It is unclear, however, how the combinatorial power of the language faculty could fill the gap between expectations about the efficiency of an agent's instrumental action and pragmatic expectations of relevance and informativity in communicative interactions. A second possibility is that speakers of a natural language express their thoughts in accordance “with pragmatic principles of economy, informativeness, and relevance” because the pragmatic principles of communication are built into the human language faculty itself. If so, then communication would likely be a major function of the human language faculty – a view adamantly rejected by Chomsky (Bolhuis, Tattersall, Chomsky, & Berwick, 2014). In this case, however, pragmatic expectations about nonverbal communicative actions in human infancy would be puzzling.
The alternative not explored by Spelke's (2022) monumental book is that preverbal human infants are innately prepared to form pragmatic expectations about an agent's verbal or nonverbal communicative acts (see Gergely & Jacob, 2012). So far as we know, unlike nonhuman great apes, humans are uniquely disposed to provide information relevant to others and conversely to extract information relevant to themselves from others' ostensive-communicative displays (cf. Tomasello, 2014). This mutual adjustment suggests a biological adaptation rather than an ontogenetic explanation in terms of learning processes and poses difficulties for Spelke's developmental account.
Financial support
This research received no specific grant from any funding agency, commercial, or not-for-profit sectors.
Competing interest
None.