Our target article presented human expression as a diverse and rather muddy phenomenon, replete with graded differences between specific cases. This muddiness is, we believe, characteristic of great ape expressive behavior in general, and human expressive behavior especially: Humans have a large and wide range of social goals, and a large and wide range of means of expression with which to satisfy these goals. Language use is but one important special case: Not apart from other means of human expression, but continuous with them.
In aiming to identify the common foundations of this diversity, three goals were especially paramount in our minds:
(1) Specify the cognitive capacities that underpin human communication.
(2) Demonstrate that these capacities are evolvable under plausible assumptions.
(3) Describe how these capacities can generate massive behavioral diversity.
Box R1 summarizes our main contributions to meeting these goals. The concept of ostension plays an especially important role. Indeed, one of our main hopes was to contribute to “pragmatics-first” accounts of language origins, which emphasize how ostensive communication must be prior – in ontogeny and phylogeny – to the emergence of language and languages. More broadly, the diversity of expression enabled by ostension raises an important set of empirical questions for cognitive science. In particular, when, how, and why do people choose the particular means of expression they do? (See Box R2.)
Here we summarize our contributions to meeting the three goals stated in the Introduction: (1) to specify the cognitive capacities that underpin ostensive communication; (2) to demonstrate that these capacities are evolvable under plausible assumptions; and (3) to describe how these capacities can generate massive behavioral diversity.
Regarding the specification of the key cognitive capacities, we drew heavily on Deirdre Wilson and Dan Sperber's relevance theory and its subsequent enrichments, while also paying more attention to the production side than is typically the case in that literature. We highlighted in particular: (a) On the communicator's side, the capacity to satisfy their informative intentions by means of making them – the intentions themselves – manifest. An act of ostensive communication consists in intentionally attracting the audience's attention to evidence about one's own informative intentions. By doing this, communicators trigger audiences' interpretative processes. (b) On the audience's side, the capacity to identify the communicators' informative intentions, i.e., what is meant. This is achieved by spontaneous presumptions of relevance. In other words, audiences recognize and interpret ostensive stimuli with cognitive capacities that, in effect, embody an assumption that ostensive stimuli are the most effective means available to the communicator for revealing her particular informative intentions. (c) Also on the audience's side, the capacity to modulate trust in what is meant. Thus, audiences update their beliefs in view not only of their interpretation of others' ostensive behavior, but also in view of the broader, validating context, which includes the trustworthiness of the communicator and the plausibility of what is interpreted. We described how these cognitive capacities collectively constitute a communication system that can be both evolutionarily stable and truly open-ended at the same time, and we discussed similarities and differences with the communication of other species.
Regarding evolutionary plausibility, we specified one way by which the above capacities can co-evolve with one another. The essential idea is that in a partner choice ecology, it is often adaptive to draw others' attention to whatever will allow them to “acquire” information; and that once audiences effectively presume that this expressive behavior is done cooperatively, then drawing attention to evidence of one's own informative intentions will itself be sufficient to generate the intended inferences in the audience. In fact, this unleashes expression. We think this specific evolutionary path is very plausible, but it is presented in the first instance as an existence proof of how and why the necessary cognitive capacities could have evolved in a gradual manner.
Regarding massive behavioral diversity, we highlighted five specific examples in the target article: language use, coordination, teaching, punishment, and art. We described some of the ways in which these cases differ markedly from one another, as well as their underlying cognitive unity. Some of the commentaries provide further good examples, such as the non-verbal cues used by poker players to influence one another's thought processes (Stehberger), movements used by car drivers to reveal their intentions to other drivers (Tolstaya et al.), and screaming (Gouzoules et al.). Recognizing that this diversity has common evolutionary and cognitive foundations generates new questions for future empirical research (Box R2).
Generally, the many and different means of human expression are described and studied in largely independent literatures (target article, sect. 8). The diversity is daunting and empirical research tends to focus on specific cases in isolation. Yet viewing diversity as different solutions to the same problem – satisfying informative intentions – generates new and important questions for empirical research.
One especially important question is, when and why do people decide that the best means to satisfy their informative intention is to make it overt? That is, when do people provide evidence of their own informative intention, rather than evidence of whatever it is they want to inform others about? Part of the answer must be that in some cases it is hard or impossible to provide direct evidence of whatever it is that the communicator wants to inform the audience of, e.g., absent entities or past events (see also target article, endnote 6). Still, there are cases when ostension – in the narrow sense (target article, endnote 7) – is used even when direct evidence could be provided: for an example see target article, section 8.2.
One approach to the question might be to treat ostension as something “special” and perhaps cognitively “costly” or “complex,” and hence as something to be used only when other, more “simple” or “basic” cognitive processes do not suffice. In contrast, we argued that the ordinarily developing human cognitive phenotype includes competence with ostensive communication, i.e., with each of the various cognitive capacities entailed by the expression and recognition of informative intentions. Thus, we suggest that humans tend to make their informative intentions manifest to the extent that is optimal; and that others interpret this behavior assuming optimality, given the communicator's goals, and the affordances and constraints acting on them. The empirical literatures we surveyed in section 8 of the target article provide many findings consistent with this view, but the issue has not yet been investigated in a wholly unified way.
The commentaries enriched and challenged many aspects of this thesis, in original ways. Our reply is organized around six core issues:
(1) Languages and their cultural evolution. What role, if any, does ostensive communication play in the cultural evolution of languages?
(2) The pervasiveness of expression in human behavior. What new insights and understanding are gained by an expressive perspective on human behavior?
(3) Artificial intelligence and ostensive communication. What prospects are there for the development of intention-based models of communication?
(4) Communication in other animals. What forms or precursors of ostensive communication might be found in other species?
(5) The ecology and evolution of ostensive communication. What ecological factors, distinctive of humans, are necessary to trigger the gradual evolution of ostensive communication?
(6) Biolinguistics and pragmatics. What is the root source of open-endedness in natural languages?
R1. Languages and their cultural evolution
Language use is the most salient specific mode of ostensive communication. In previous work, we have argued that from a pragmatics-first perspective there are thus two basic questions for language evolution (Scott-Phillips, 2017). (1) How and why did humans evolve ostensive communication? (2) How do collections of communicative conventions develop, and how and why do they evolve, culturally, to take the forms that they do? Our target article was aimed mainly at question (1), with only passing remarks on question (2). Four commentaries (Chater & Christiansen; de Vos; Rorot, Skowrońska, Nagórska, Zieliński, Zubek, & Rączaszek-Leonardi [Rorot et al.]; Veit & Browning) justly picked up where our article left off, elaborating and debating answers to question (2).
Chater & Christiansen phrase the question this way: “what is the route from non-linguistic communication, driven by a powerful ‘pragmatic engine,’ to the creation of the astonishing complexity of full-blown combinatorial language?” Sketching their answer, they describe how communicative conventions are continually recreated and reshaped in the course of ordinary interaction (see also Christiansen & Chater, 2022). We very much agree; but we also think this picture can and should be enriched further, in two particular ways.
First, what cognitive capacities drive these processes? As Chater & Christiansen point out, there is by now a large literature on how grammar and symbols emerge from behavior in interaction – but where do those behaviors themselves come from? We believe that the cognitive capacities described by relevance theory (target article, sects. 3.3 and 4.3) provide a good answer to this question: “the very same cognitive capacities that make ostensive communication possible in the first place, also play a pivotal role here” (target article, sect. 8.5). Second, what is the most useful framework with which to describe and analyze the evolving system of communicative conventions? This is an important question, because without any answer it is hard, if not impossible, to describe how individual behavior (language use) generates population-level phenomena (languages): this is sometimes called the “problem of linkage” (Kirby, 1999, p. 19). Somewhat in contrast to other frameworks for cultural evolution, we mentioned in the target article that we believe an epidemiological framework is likely to be a fruitful approach (see Claidière, Scott-Phillips, & Sperber, 2014; Scott-Phillips, Blancke, & Heintz, 2018). We would be interested to know how Chater & Christiansen view these important issues.
In section 6 of the target article we asserted that cognitive capacities for ostensive communication emerge early and reliably in ontogeny, and we cited some books and articles that, we believe, collectively summarize a diverse and compelling range of data in support of this conclusion. de Vos enriches this point substantially. She points out that a natural testbed for this claim is homesign: Visual–gestural forms of communication that emerge in the absence of any language. de Vos then describes how the universal aspects of communicative competence we identified in our article can and do facilitate the creation and conventionalization of commonly known mappings between form and function, and hence in due course the emergence of new languages. This process can occur at many different levels of analysis, from individual households and small communities to whole nation states. de Vos describes the specific example of Balinese homesign, expanding the range of natural case studies beyond more commonly studied cases, in particular Nicaraguan Sign Language and Al-Sayyid Bedouin Sign Language.
We think, furthermore, that findings derived from the study of homesign and new sign languages are of high importance not only for language evolution, but also for language learning. If infants are competent ostensive communicators “from the beginning,” so to speak – which is to say, very soon after they actively engage in the social world, at around 9 months of age – then what changes over time in the process of language learning is not the development of any core competence with ostensive communication as such, but rather (a) greater knowledge of the means by which those around them express and recognize informative intentions, especially conventional means; and (b) greater sensitivity to the possible motives, goals, and objectives of communication partners. Accordingly, any apparent weaknesses in infant pragmatics are explained not by any weaknesses in communication qua communication, but rather by limitations in (a) and (b), which diminish as infants grow. This picture aligns squarely with de Vos’s arguments that studies that supposedly show pragmatic and/or socio-cognitive “deficits” in homesigners are likely better explained as task effects. Further empirical findings consistent with this idea can be found in several other literatures, for instance discourse pragmatics (e.g., Ateş & Küntay, 2018; Hughes & Allen, 2013; Salazar Orvig et al., 2010; Skarabela, 2007; Skarabela, Allen, & Scott-Phillips, 2013).
Both Veit & Browning and Rorot et al. make similar empirical points to those above, but they argue for different theoretical and terminological frames, which they believe offer a better understanding of the empirical issues. Specifically, Veit & Browning emphasize “scaffolding,” in which traits facilitate the evolution or development of other traits and are later lost or repurposed, while Rorot et al. emphasize the conceptual tools afforded by an “extended” evolutionary synthesis. Naturally, we see merit in the framing we adopted, which is rather classical in its evolutionary and cognitive perspectives, but we certainly don't object if other scholars find benefit in translating our claims into other terms.
For instance, Veit & Browning restate one of our main claims as a claim that core cognitive capacities of ostensive communication “scaffold” many important means of human interaction, such as coordination smoothing and punishment. We agree this is an insightful presentation. Indeed, one of us has previously used the language of scaffolding to describe how processes of cultural evolution build upon panhuman cognitive capacities (Heintz, 2014). Rorot et al.'s preferred frame is the extended evolutionary synthesis. The relative merits of this theoretical frame have been debated within evolutionary biology at some length, and this reply is not the place to rehearse those arguments (see, e.g., Laland et al., 2014; Lewens, 2019); but we think we are on safe ground not adopting it explicitly. After all, even the most enthusiastic advocates of the extended evolutionary synthesis agree that more classical approaches are always able, in principle, to provide explanatory accounts of biological phenomena (Scott-Phillips, Laland, Shuker, Dickins, & West, 2014).
Empirically, Rorot et al. assert that our approach “cannot account for the emergence of the structure of unleashed communication visible in language” (italics added). We agree that our paper does not do this, because it does not aim to; but that does not mean that our framework cannot account for the emergence of language structure. On the contrary, other commentaries – in particular by de Vos and by Chater & Christiansen – show how this issue can be approached from the perspective we presented. Rorot et al. describe how development plays a key role in this process, helping to scaffold the emergence of languages: and in doing so they support and reinforce, rather than challenge, the picture developed in the target article.
R2. Pervasiveness of expression in human behavior
The modulation and flow of attention is ubiquitous in human interaction. Four commentaries enrich this point. Three elaborate some further, diverse cases of human communication and expression (Tolstaya, Gupta, & Hughes [Tolstaya et al.]; Stehberger; Gouzoules, Engelberg, & Schwartz [Gouzoules et al.]), and one adds new arguments for the deep, pervasive, and often unidentified role of pragmatics in ordinary interaction (Osiurak & Federico).
Tolstaya et al. introduce the example of driving. It is plausible to us that drivers use movements of their vehicles not just to achieve travel, but also to actively reveal their intentions to other road users. For instance, drivers wishing to change lanes in stationary traffic sometimes turn their wheels, not simply in advance of movement but also with the goal of indicating to other drivers their wish, or intention, to enter the other lane. Accordingly, it is also plausible that road users are sensitive to these behaviors as expressive behaviors. If so, then self-driving cars, absent artificial intelligence able to duplicate human intention-reading and attention manipulation, might deviate from the behavior of human-driven cars in ways that are subtle but of high importance for predictability and hence safety. Whether this speculation is correct is a matter for future empirical research, with important implications for technological development.
Stehberger highlights the world of poker as a domain in which many of the aspects of human expression we described in our target article play important roles. As we described, ostensive communication is an important special case of expression, in which one individual (the “communicator”) directs the attention of another (the “audience”) to their (the communicator's) own informative intentions in a specifically overt way. However, in human interaction it can often be beneficial not to communicate ostensively as such, but rather to direct others' attention while hiding, or at least not making overt, this goal. In the target article we called this “hidden authorship.” We also discussed how both generosity and punishment sometimes entail keeping informative intentions at least somewhat hidden (sects. 3.2 and 8.3). Some of the behaviors employed in poker are excellent further examples. Poker players' interests are obviously misaligned, and many betting decisions are made on the basis of what individuals believe others know, or what they believe that others believe about what others know. In consequence, it is sometimes useful for poker players to attempt to inform other players, or to direct their attention, without overtly drawing attention to this goal. Doing this well is a difficult and advanced skill. So too is noticing others' attempts to direct attention, and making betting decisions that take into account what you believe others have revealed in their actions at the table, or are attempting not to reveal. Stehberger also describes how poker players exploit the fact that interpretation of ostensive stimuli is spontaneous and cannot be prevented, even when interpretation is against the audience's own interests. This is akin to the case of film spoilers, which we mentioned in section 4.3: Our desire not to recover the meaning of what is said does not and cannot suspend the interpretive process.
All in all, Stehberger's commentary highlights how a deeper understanding of a common mode of human interaction – in this case, a competitive game – can be gained from the broad and pragmatic perspective we developed in the target article.
Gouzoules et al. discuss the enlightening example of screams. We are especially glad for this because emotional expression is an obviously major means of human and nonhuman expression, which we did not discuss in the target article. Gouzoules et al. describe how capacities for scream production tend to evolve first with the expressive function of startling potential predators, and how in some highly social species the forms and functions of screams have diversified over the course of evolution, creating selection for “complementary” capacities of comprehension, hence forming a communication system. Gouzoules et al. further describe how screams can sometimes be expressive but not necessarily communicative (as when used to startle predators), and at other times properly communicative (as when used to recruit aid). In humans this diversity is extended even further, because in addition to using screams as emotional communication, humans can use screams in an unleashed way to ostensively communicate that, for instance, something is inducing emotion. This is different from the spontaneous, non-ostensive communication of emotion itself. Gouzoules et al. wonder whether we would agree with extending the term “unleashed” to include human nonverbal expression; far from disagreeing, we very much welcome this, as it helps to further demonstrate the real pervasiveness of expression and communication in human life.
Osiurak & Federico provide an interesting example of how human interaction is so much governed by the expression and recognition of informative intentions that it can easily lead scientific investigation astray, if we are not alert to its effects. They point out that standard tasks used to assess dementia overlook the ostensive nature of the instructions given to participants: because these tasks entail consideration of the experimenter's expectations, they can be challenging for patients with a loss of semantic knowledge. Patients with semantic dementia usually do not lack the mental representations needed for using tools; rather, they lack knowledge about the means used in ostensive communication to communicate about tools. Thus, behaviors that have been assumed to derive from a general tool-use disorder might in fact result from disorders that principally affect pragmatic communicative capacities. We do not have expert knowledge in this area, but this analysis seems very plausible to us. Indeed, we think that many experimental protocols in psychology have pragmatic aspects whose consequences are not always taken into account in the interpretation of data. Further examples include the Wason selection task, used to assess reasoning skill (e.g., Sperber, Cara, & Girotto, 1995); verbal false-belief tasks commonly used to assess children's mindreading (e.g., Helming, Strickland, & Jacob, 2014; Siegal & Beattie, 1991); and cross-cultural experimental games used to assess prosocial preferences (e.g., Baumard & Sperber, 2010; Heintz, 2013). In all these cases, and apparently also in the case described by Osiurak & Federico, the experiments entail ostensive communication between investigator and participant.
This communicative interaction is often not the simple and innocent process it sometimes appears to be. It is a social interaction with its own dynamics, and if scientists are not alert to these dynamics and their consequences, then the data they acquire may not be as revealing of the target phenomenon as intended.
R3. Artificial intelligence and ostensive communication
Tolstaya et al. survey the literature on generality in artificial intelligence, pointing out that specialization remains the norm: There is still very little artificial intelligence that displays a breadth of competence across otherwise diverse tasks. We argued in the target article that ostensive communication is both a very general skill and a very specialized one. It is very general in the sense that the effective domain of the relevant cognitive capacities is unlimited. At the same time, ostensive communication is a very specialized skill, in the sense that the relevant cognitive capacities all have their own specific and narrow domains. Accordingly, we described how the metarepresentational structure of ostensive communication generates virtual domain generality from narrow specialization (target article, sect. 5). Tolstaya et al. intuit that this approach provides a new way to address the problem of generality in artificially intelligent communication. It also, we believe, reframes one of the basic challenges for artificial intelligence, namely how to replicate human language use. There are countless artificial intelligence language models that replicate human language use with varying degrees of success, but none (to our knowledge) is based on a pragmatics-first foundation, with communicative conventions employed as an enrichment of the expression and recognition of informative intentions. We believe this challenge is deeper and far harder than presently appreciated; but were it to be addressed, it would fundamentally change the prospects for artificially intelligent communication.
The key engineering challenge is that ostension is not a specific behavior; it is any behavior motivated by a particular cognitive phenomenon, namely informative intentions (target article, sect. 3.3). Thus, in order to artificially replicate ostensive communication, what will be necessary are pairs of social agents who have (1) goals with respect to each other's internal (“mental”) states, and (2) models of each other's goals, and the means by which those goals might be satisfied. Some limited progress has been made in this direction using Bayesian approaches (e.g., Ho, Cushman, Littman, & Austerweil, 2021), in which a communicator's behavior is modeled as efficient planning with respect to an audience's beliefs, and comprehension as inverse planning of the same, that is, as asking: for what goals could this behavior be the most efficient means? In this way, communicative behavior is modeled as a type of action whose costs and benefits (for the communicator) turn on its impact on the belief states of other agents; and comprehension is modeled as a reactive process whose costs and benefits (for the audience) turn on how informative this process is about the communicator's goals. The modeling of language use, as one especially important form of ostensive communication, will, in turn, be based on the use of words and other linguistic items in the service of these deeper goals. As Tolstaya et al. suggest, such approaches would be radically different from the present cutting edge; but if successful they could lead to major advances in the development of open-ended communication in artificial agents. Or to put the point in negative terms: We do not believe that artificial intelligence will achieve human-like competence in language, and human communication more broadly, unless and until it meets the difficult engineering challenges presented by ostensive communication.
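To make the planning/inverse-planning idea concrete, here is a minimal Python sketch under explicitly hypothetical assumptions. It is not the model of Ho et al. (2021): the goals, behaviors, effort costs, and the `rationality` parameter are invented for illustration, and the three-level structure (a literal audience, a planning communicator, an inverse-planning audience) only loosely follows Rational Speech Act-style models. The aim is simply to show how an audience that models the communicator's choices can draw an inference that the evidence alone does not license.

```python
import math

# Hypothetical toy world (not from the source): the communicator wants the
# audience to attend to exactly one of these objects.
GOALS = ["cup", "plate", "key"]
PRIOR = {g: 1 / len(GOALS) for g in GOALS}

# Hypothetical behaviors: the set of goals each one is literally compatible
# with, and an assumed effort cost for producing it.
BEHAVIORS = {
    "point_at_cup": ({"cup"}, 1.0),
    "gesture_at_table": ({"cup", "plate"}, 0.5),  # both objects are on the table
    "hold_up_key": ({"key"}, 1.0),
}


def literal_belief(behavior):
    """Audience belief if the behavior is treated as bare evidence,
    with no model of the communicator's goals."""
    compatible, _ = BEHAVIORS[behavior]
    z = sum(PRIOR[g] for g in compatible)
    return {g: PRIOR[g] / z if g in compatible else 0.0 for g in GOALS}


def communicator_policy(goal, rationality=3.0):
    """Planning: weight each behavior by exp(rationality * utility), where
    utility = log(audience's literal belief in the goal) - effort cost."""
    weights = {}
    for behavior, (_, cost) in BEHAVIORS.items():
        p = literal_belief(behavior)[goal]
        if p > 0:  # behaviors incompatible with the goal are never chosen
            weights[behavior] = math.exp(rationality * (math.log(p) - cost))
    z = sum(weights.values())
    return {b: weights.get(b, 0.0) / z for b in BEHAVIORS}


def interpret(behavior, rationality=3.0):
    """Inverse planning: P(goal | behavior) is proportional to
    P(behavior | goal) * P(goal), i.e. for which goal would this
    behavior have been the most efficient means?"""
    joint = {g: communicator_policy(g, rationality)[behavior] * PRIOR[g] for g in GOALS}
    z = sum(joint.values())
    return {g: p / z for g, p in joint.items()}


print(interpret("gesture_at_table"))
```

In this toy setup, `gesture_at_table` is literally compatible with both the cup and the plate (0.5 each), yet the inverse-planning audience assigns roughly 0.74 to the plate and 0.26 to the cup, because a communicator who meant the cup had a more efficient alternative (pointing at it directly). Raising `rationality` sharpens this inference further.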
R4. Communication in other animals
Plainly, there are cognitive differences of some sort between human and nonhuman communication. Veit & Browning, Amici & Liebal, and Warren, Call, & Gergely [Warren et al.] all emphasize that exactly what these differences are and where they come from are important empirical questions. We agree, and we argued that the most informative comparisons from an evolutionary perspective will be those focused on social cognition, and more precisely on means of attention manipulation (see also Scott-Phillips & Heintz, 2023). We further suggested that (1) ostensive communication, in its full richness, is part of the ordinarily developing cognitive phenotype of humans and not part of the ordinarily developing cognitive phenotype of nonhuman great apes, and (2) differences between the social cognition that underpins ostensive communication, and the social cognition of other great apes, are graded and relatively few. Three commentaries raise questions of clarification or express skepticism about these claims (Warren et al.; Berio, Newen, & Moore [Berio et al.]; Amici & Liebal). One further commentary broadens the range of species considered to raise some important issues about the mappings between species' social ecologies and the nature of their communication (Ross).
Warren et al. attribute to us some views that we do not hold. In particular, they assert that we believe the gestures of nonhuman primates are not communicative: “the authors dismiss nonhuman primates’ intentional use of gestures as not communication”; “The authors argue that nonhuman primates’ intentional gestures are expressive but not communicative….” Yet we do not claim this, and in several places we say the opposite. Here are two things we did write: “Living things communicate in a great variety of ways, from the quorum sensing of bacteria, to songbirds, to the gestural and vocal communication of primates…”; “The gestural communication of nonhuman great apes is more diverse and flexible than most other cases….” One reason Warren et al. seem to attribute to us the contrary view is the following sentence, which they quote and take issue with: “a gorilla thumping his chest might associate this with the behavioral effect of conspecifics backing away, but not with the effect of them being frightened.” Yet, when the passage is quoted in full, it is clear that this is a hypothetical example used to motivate a conceptual distinction, with no empirical claim either way (target article, endnote 4). Warren et al. attribute to us the firm view that gorilla chest thumping is not communicative: But we did not express any such view, we do not have any such view, and we do not believe the relevant passage suggests such a view. Whether chest thumping or any other behavior achieves a communicative function, and whether it does so in a specifically ostensive way, are empirical questions to be resolved for each given case.
More substantively, Warren et al. question the distinction we drew between communication and expression. In some respects this distinction is unusual. After all, if an informative intention is satisfied, then there is successful “information transfer,” and so perhaps “expression” could be equated with “communication.” However, it is useful to maintain a terminological distinction between (1) behavior or traits the function of which is to inform, and (2) behavior or traits the function of which is to inform by the specific means of triggering inferences whose function is, complementarily, to identify and process the behavior or trait. Without this distinction, we have no way to distinguish communication proper from, say, mimicry and other behaviors or traits that could be called “psychological coercion.” Put simply, communication is the product of complementary traits: one on the production side and one on the interpretation side (Maynard Smith & Harper, 2003; Scott-Phillips, Blythe, Gardner, & West, 2012). In the target article we specified it this way: “By ‘complementary,’ we mean that each mechanism can perform its function only in conditions when the other mechanism is in place…. This characterisation is solely functional in nature, and not mechanistic.” The important general point here is that a functional approach allows empirical questions to be asked and addressed in an open way, without prior assumptions or commitments. Warren et al. seem to question this, stating that “the authors do not present a precise explanation of the factors which make a mechanism ‘complementary’ that can be mapped onto behavior occurring at various levels of cognitive engagement” – but the distinction being drawn is not a mechanistic one but a functional one.
We strongly support the empirical research agenda of investigating cognitive similarity and cognitive difference between humans and other species, especially other great apes; and we believe that this research agenda will both enrich and be enriched by functional clarity.
Berio et al. ask some specific questions of clarification, which we are happy to answer. First, they press us on “the relationship between ostension and contextually variant interpretation.” They point out, rightly, that contextual variation is widely documented in animal communication. However, our suggestion was not simply contextual variation: It was contextual variation (1) in response to specifically ostensive stimuli (where ostensive is used in the narrow sense employed in the target article; see in particular endnote 7) and (2) dependent on what is in the common ground. The contextually variable response should “make sense of” the ostensive stimulus in light of the common ground, and in particular in light of the audience's prior knowledge about the communicator's knowledge. To be even more precise, one suitable test would be experiments in which the independent variable is the audience's knowledge of the communicator's knowledge, and the dependent variable is the audience's reaction to ostensive stimuli produced by the communicator. We mentioned in the target article that human infants have been shown to pass a version of such a task in which the infant observes interactions between two other agents (Tauzin & Gergely, 2018). Further experimental protocols, suitable for comparisons across species, would be hugely informative, and would complement existing studies on the production side showing that chimpanzee pointing behavior can be dependent on what is or is not in the common ground (e.g., Bohn, Call, & Tomasello, 2015; Tauzin, Bohn, Gergely, & Call, 2020).
Second, Berio et al. summarize one of our claims as follows, and take issue with it: “If we understand H&S-P's argument correctly, enculturated great apes acquire expectations of mutual benefit, and so trust the information provided by pointers.” This is not exactly our argument. What we claim is that enculturated great apes learn to assume that behavior that overtly demands attention is likely to indeed be worth paying attention to (and to some extent, this claim is reinforced by the points that Berio et al. make about attention in the object-choice task). It is important to distinguish “assuming that something is worth paying attention to” from “trust”: the former is about expecting relevance, while the latter is about accepting what is meant. Relatedly, Berio et al. ask if we claim that “unenculturated chimpanzees distinguish between ‘informative’ and ‘communicative’ intentions, but remain poor at pointing comprehension only because they lack the trust to interpret humans’ messages pro-socially.” Again, not exactly. What we claim is that unenculturated chimpanzees tend not to recognize communicative intentions because doing so requires the specific cognitive disposition to presume that behavior that demands attention is indeed likely to be worth paying attention to. In humans this disposition is built into (or “embodied” in) the way our attentional systems work (sect. 4.3). We raised the hypothesis that, while this interpretative mechanism is not part of the ordinarily developing cognitive phenotype of other species, a corresponding disposition could be acquired ontogenetically, in the right ecology.
Amici & Liebal make four main points. We agree with some and not with others. First, they emphasize the methodological challenges of investigating cognitive capacities of ostensive communication in nonhumans, and hence caution against hasty conclusions. We fully agree. Second, they assert that epistemic vigilance is not necessary for the evolution of open-ended communication. This is not true. Arguing otherwise, Amici & Liebal point out that many communication systems are evolutionarily stable without cognitive mechanisms of epistemic vigilance. Yes indeed, but this is not a counter-argument to the facts that (1) epistemic vigilance is necessary for any distinction between comprehension and acceptance, and (2) this distinction between comprehension and acceptance is critical to the stability of truly open-ended communication (target article, sect. 5). Third, Amici & Liebal ask us to “better clarify whether humans… differ from other species in terms of cognitive skills or motivational aspects of communication.” It is both (target article, sect. 7). It is cognitive skill, because the cognitive capacities that underpin ostensive communication are not part of the ordinarily developing cognitive phenotype in nonhuman great apes. It is also motivational, because it is not the case in nonhuman great ape social ecologies that attending to others when they attempt to attract attention will necessarily prove beneficial. Here dogs provide the most revealing contrast: They spontaneously presume that when humans attempt to gain their attention, it is indeed worthwhile to actually pay attention and expect relevance, even if that relevance is not initially clear.
Fourth and perhaps most fundamentally, Amici & Liebal maintain that “the ability to combine meaningful elements into new combinations with novel meanings still better explains how open-ended communication emerges.” We highlighted two major challenges for this focus on structural and combinatorial features of different species' communication systems (target article, sect. 1), and Amici & Liebal's reassertion of this perspective does not directly address either challenge. First, this focus says very little about quasi- and non-linguistic means of communication and expression. Second, it does not address the fundamental problem of how a communication system can possibly be both stable and open-ended. Amici & Liebal summarize some recent findings of compositionality in primates, and we fully agree that these findings are valuable and important (Scott-Phillips & Heintz, 2023), but it does not follow that a focus on combinatorics provides a “still better” explanation of the evolution of truly open-ended communication.
Ross's observations and speculations about the ecologies and communication systems of other species are interesting and relevant: A true diversity of examples and case studies is very welcome. In particular, Ross raises the intriguing hypothesis that elephant communication may be “leashed” in part because there is insufficient divergence of interests within elephant communities, and hence there has not been selection on cognitive capacities necessary to deal with the challenges and complexities of a social ecology where there are not only very high potential gains to cooperation, but also high risks of exploitation. This contrast highlights how the evolution of ostensive communication requires not only cooperation and relatively sophisticated social cognition, but also the potential for divergent interests and conflict (target article, sect. 5).
Ross also comments on the possible contrast between “mindreading” and “mindshaping” (see also Zawidzki, 2013). Approaches to the evolution of human communication that emphasize metapsychology, including ours, are sometimes criticized on the grounds that they are too cognitively “rich” or “intellectualized.” Two endnotes in the target article address this worry (9 and 11). In particular, we use the notion of mindreading in a broad, minimal, and deflationary way, to refer just to the spontaneous recognition of mental states, which we believe may be present in many species. If others use “mindreading” in richer ways, such as to describe the conscious analysis of others' mental states, then we don't object to a different term for the more deflationary notion, and “mindshaping” may indeed be suitable. In fact it may have the advantage of highlighting “action” on mental states. Whatever the terminology, we certainly agree that cognitive capacities to recognize and shape others' mental states must ultimately serve behavior and action.
R5. Ecology and evolution of ostensive communication
Five commentaries raise questions about the ecology and evolution of human cognition, and communication in particular (Badets; Burkart, Sehner, Brügger, Adriaense, & van Schaik [Burkart et al.]; Gärdenfors; Mussavifard & Csibra; and Wacewicz & Żywiczyński). In the target article we described how the cognitive capacities that underpin ostensive communication can evolve in a gradual way, and become stable cognitive adaptations in a partner choice ecology. What we did not do is describe in detail why humans in particular occupy the relevant social ecology; and hence the deep evolutionary reasons why it is humans, and not any other species, that have traversed the evolutionary path toward language.
So, as Wacewicz & Żywiczyński put it, our article pushes the issue “one level deeper.” Why, they ask, does the expression and recognition of informative intentions afford fitness benefits “to humans (including prehistoric hominins), if it does not for anyone else (including our ape cousins)?” We do not believe anybody yet knows the answer to this question in detail, but our target article did contain a sketch. Prompted by Wacewicz & Żywiczyński, here we elaborate a little more.
A partner choice social ecology has two key prerequisites. (1) An environment in which there are opportunities for win-win and win-lose ventures, such that it is adaptive to cooperate with others, but not always and not necessarily so. These opportunities may be present in the social ecologies of several species, but they are present to a greater degree, and with more diversity, in the human case (target article, sect. 6). (2) Social cognitive capacities that allow individuals to assess whether entering a cooperative venture with someone will be beneficial at all; and whether it might be more beneficial to enter it with someone else if possible. We called this “social vigilance.” Social vigilance is commonly achieved by reading and representing others' mental states, which we assume is common to great apes and perhaps several other species and taxa (target article, sect. 4).
These two elements together generate selection pressure for the social cognitive capacity to provide credible evidence to others that a cooperative venture is indeed a win-win opportunity, and also for capacities of reputation management. One of the main contributions of our target article is a description of how these selection pressures are on their own sufficient to trigger a gradual co-evolution of the cognitive capacities necessary for ostensive communication. None of this is to deny that, as Burkart et al. point out, forms of partner choice take place in many other species, including several primate species. The difference is one of degree and span. In humans, partner choice (characterized as above) is more ubiquitous and involved in many more diverse tasks than in other species. There is more mutual dependency (or “interdependence,” as Wacewicz & Żywiczyński put it), and we have evolved more specific cognitive dispositions to handle both its opportunities and its dangers. Ross notes how this emphasis on the breadth of human partner choice aligns with a common view about the origins of human ecological dominance, namely that it lies in the expansion of community scales, market exchange, and specialization traced to the Upper Pleistocene.
So the relevant domain of partner choice in humans is broad, covering a wide range of possible interactions. It probably includes hunting, parenting, technological development, and many others. Three commentaries elaborate supposed alternatives, but these are all, we think, better understood as special cases of partner choice. Burkart et al. stress cooperative breeding, Gärdenfors suggests that teaching may be an especially important domain, and Mussavifard & Csibra argue that extensive reliance on technology, where the causal relations between object and goal are hard to perceive, creates a need for pedagogical demonstration. These activities all provide opportunities for cooperative ventures, which can be beneficial for the self, or not, depending on the context and available partners. (Shall I leave my offspring with this person? Is it to my advantage that this person acquires skills I can demonstrate?) In the target article we also mentioned animal hunting and building shelters. We have no strong views about which types of cooperative venture played the most significant role during evolution. Our claim is that an open social ecology, such that there is a high degree and wide span of partner choice, occupied by a socially vigilant species, makes possible the gradual evolution of the cognitive capacities that underpin ostensive communication (see Fig. R1).
The ecological breadth of partner choice maps onto the functional breadth of human communication. Debate over the evolutionary origins of human communication, and language in particular, is too often focused on which of many different types of human communication (gossip, sexual advances, teaching, and so on) had the greatest relative importance during evolution. Some of the commentaries, such as Gärdenfors's, seem to reinvigorate this debate. However, a focus on relative importance misses the point that the great boon of ostensive communication is its functional diversity. It is as if the evolution of bipedal locomotion were discussed just in terms of the relative importance of running, jogging, and walking, when in fact what has been selected for is functional diversity itself, and the large range of behavioral possibilities thus enabled (Origgi & Sperber, 2000). So we agree with Gärdenfors that the evolution of human communication must proceed through different modes of attention manipulation, enabling some behaviors (e.g., a hammering action) to “stand for” others (e.g., hammering itself), but we disagree that this is specific to teaching.
Finally, three commentaries make observations either enriching or challenging the specific evolutionary path we sketched in the target article. Badets proposes that the evolution of human communication may have proceeded alongside tool cognition. This is an intriguing suggestion: that human-specific communicative capacities and tool use share a single origin. Moreover, there is a sense in which expression does indeed involve using tools (words and other linguistic “constructions”) to act on the world: It is just that, with communication, the part of the world acted upon is psychological states and the tools are, correspondingly, epistemic. But other than this general observation we do not have any strong or specific views on how communication and tool cognition relate to one another. Mussavifard & Csibra suggest that rather than partner choice and cooperation causing the evolution of ostensive communication, ostensive communication may facilitate cooperation. However, there is an important asymmetry between ostensive communication and cooperation. Ostensive communication is necessarily a type of cooperation, but cooperation is not necessarily a type of communication. We therefore agree with Mussavifard & Csibra that ostensive communication can promote cooperation; indeed, we wrote as much in endnote 12. The point we insisted on is just that there is no ostensive communication at all without a specific type of prior cooperation. Burkart et al. attribute to us the view that partner choice requires Gricean cognitive pragmatics for reputation management, hence causing a tendency toward showing and expecting prosociality in communication; but this is not exactly our proposal, and we do not really recognize our account in their figure. What we argued is that an ecology of partner choice will select for specific forms of cooperative behavior in expression, which in turn enables the gradual evolution of ostensive communication (see Fig. R1).
R6. Biolinguistics and pragmatics
We share with Carston the view that cognitive pragmatics is foundational to language use, and that relevance theory provides the most cognitively plausible description of the relevant capacities (e.g., Carston, 2002a, 2002b; Wilson & Carston, 2006). Looking beyond this point of agreement, her commentary highlights a point of difference that echoes major divisions in linguistics itself (see, e.g., Harris, 2021; Scholz, Pelletier, Pullum, & Nefdt, 2022). Channeling what is sometimes called the biolinguistic perspective, according to which “language” is most properly conceived of as a cognitively internal device of recursive symbol manipulation, Carston argues that language must logically precede ostensive communication (see also, e.g., Berwick & Chomsky, 2016; Carston, 2000; Murphy, 2020).
In making this argument Carston raises two particular issues for our “pragmatics-first” approach. One is that the metarepresentational structure of ostensive communication itself entails a cognitive capacity for recursive embedding (in this case, recursive embedding of mental states), and where is that to come from if not language? Our answer is that recursive embedding is not distinctive of language. It is present in other cognitive domains also, including some that, unlike ostensive communication, are shared with other species. Visual processing is the clearest example (see Fig. R2). So we can agree with Carston that a species is not “ostension-communication-ready” before it has some cognitive capacity of recursion that might be co-opted from one domain to another; but we need not and do not agree that this capacity must be specifically linguistic.
The other, related issue raised by Carston is the open-endedness of human communication: From where does it come? In answering this question, it is important to distinguish two things: (i) what can be mentally represented; and (ii) the expression of what is mentally represented. Our target article was focused on how (ii) is achieved. Carston suggests, in effect, that without “language” then (i) is a very small set indeed, and hence that (ii) is redundant unless and until there is language. We demur. As we see it, many species have mental representations they do not express: All mammals, for instance, must represent food and sex in some way, yet not all make expressions about these things. So we see no reason to make a priori assumptions about the limitations of (i). Furthermore, we should not necessarily expect communication even when (i) is large, for evolutionary and game-theoretic reasons we elaborated in section 2 of the target article. Thus in our view, the problem is not so much what can be mentally represented in principle; the problem is the stability of any expression of what is mentally represented. We described our solution to this problem in section 5 of the target article.
Let us conclude by making more vivid this contrast between the biolinguistic perspective and the pragmatics-first approach we have advocated. Consider this recent passage by Noam Chomsky, Ángel Gallego, and Dennis Ott, from the introduction of an overview of the biolinguistic program:
“Only humans appear to possess a mental grammar… that permits the composition of infinitely many meaningful expressions… Universal Grammar (UG) is a label for this striking difference in cognitive capacity between ‘us and them’… What is it, and how did it evolve in our species? While we may never find a satisfying answer to the latter question, any theory of UG must meet a criterion of evolvability: the mechanisms and primitives ascribed to UG… must be sufficiently sparse to plausibly have emerged as a result of what appears to have been a unique, recent, and relatively sudden event on the evolutionary timescale” (Chomsky, Gallego, & Ott, 2019, p. 230).
Our target article would seem to have addressed exactly these issues. We described in some detail how truly open-ended expression is made possible by a relatively sparse set of human cognitive capacities for ostensive communication; we described how these capacities meet the important criterion of evolvability; we addressed the equally important criteria of gradualism and stability; and we highlighted some of the most important similarities and differences between “us and them.” We are thus tempted to say: If Universal Grammar is but a label for the set of cognitive capacities that allow open-ended expression in humans, then our target article contained many arguments that, in fact, Universal Grammar is ostensive communication.
We are well aware, of course, that key words such as “grammar” and “expression” are used in different ways depending on prior assumptions about the nature of “language,” and these prior assumptions strongly affect how empirical issues are framed (Scholz et al., 2022). Nevertheless, the contrast seems to us revealing and, furthermore, suggestive of an original perspective on linguistic generativity. Specifically, the generativity observed in syntax and semantics (the focus of many existing research agendas) may in fact be derivative of the generativity of unleashed expression, with social cognition as the root of grammatical open-endedness. Developing this idea in detail is a major future challenge for the pragmatics-first approach: one with, we believe, the potential to re-frame many fundamental issues in original and innovative ways.
Financial support
C. H. and T. S.-P. were financially supported by the European Research Council, under the European Union's Seventh Framework Programme (FP7/2007-2013)/ERC grant agreement no. 609819 (Somics project).
Conflict of interest
None.
Target article
Expression unleashed: The evolutionary and cognitive foundations of human communication
Author response
Being ostensive (reply to commentaries on “Expression unleashed”)