1. Introduction
Why is human expression so rich and multifaceted? Living things communicate in a great variety of ways, from the quorum sensing of bacteria, to birdsong, to the gestural and vocal communication of primates, but humans are expressive in ways and to an extent that are clearly distinctive. There is language use, of course, but also points, nods, winks, and other behaviours that, although not linguistic, are still conventionalised; and also many ad hoc, improvised behaviours, such as a small hand gesture used to visually park a topic of ongoing conversation, subtle body movements that connect dance partners, and the open-ended expressiveness of music, painting, and other art forms (Green, 2007). Humans are likely not naturally sensitive to many forms of interaction in nonhumans, so there is still much to discover from a comparative perspective; but at the same time, nonhuman expression is, on all available evidence, limited to finite domains. Bees, for instance, communicate about the location of flowers and the quantity of their nectar but, apparently, little or nothing else. The gestural communication of nonhuman great apes is more diverse and flexible than most other cases (Cartmill & Byrne, 2010; Fröhlich & Hobaiter, 2018; Graham, Hobaiter, Ounsley, Furuichi, & Byrne, 2018), but its scope is still clearly limited relative to that of humans.
In explaining the open-endedness of human communication, many researchers emphasise the combinatorial and generative quality of natural language: The fact that individual constituent parts can be recombined in many ways, making infinite use of finite means (e.g., Chomsky, 1965; Jackendoff, 1997). An emphasis on combinations and codes is, moreover, sometimes linked with particular assumptions about evolution: That as complex systems of words, rules, and combinations, natural languages are enrichments of the communication systems of other species (e.g., Leroux & Townsend, 2020; Nowak, Plotkin, & Jansen, 2000; Planer & Sterelny, 2021; Progovac, 2015). This picture is attractive because it describes human and nonhuman communication systems in terms that appear continuous. However, as a broad explanation of human expression, this focus on the evolution of combinatorics faces a number of fundamental problems, of which we highlight two here. First, it says very little about quasi- and non-linguistic means of communication and expression. Second, the evolutionary story of complex combinations evolving from simpler ones does not address the “central problem” (Maynard Smith & Harper, 2003, p. v) for the evolution of communication, namely stability in the face of incentives to deceive. This is not an incidental critique. On the contrary, the problem of evolutionary stability is, we shall shortly argue, fundamental, because it holds expression on a leash, keeping it constrained to narrow domains of statistical mutual benefit (sect. 2). Consequently, explaining evolutionary stability and explaining expressive versatility are deeply interlinked problems: One cannot be resolved without the other.
We present and develop an alternative explanation of the zoological distinctiveness and open-ended richness of human expression, based not on combinatorics but on cognitive pragmatics. That is, we describe the evolution of distinctly human means of communication – sometimes called an “interactional engine” (Levinson, 2006, p. 39) – as the evolution of mechanisms of social cognition targeted at navigating distinctive features of the human social ecology; and we specify how these cognitive mechanisms in turn unleash expression. More specifically, we argue that expression can be unleashed in partner choice social ecologies where it is simultaneously adaptive (1) for interpretative inferences to be predicated on spontaneous prior assumptions that communicators are cooperative, and (2) for expressive behaviour to exploit this assumption. Natural languages, in all their combinatorial richness, are a means by which we exploit unleashed expression, rather than its source. If we are right about this, then our account provides an overtly adaptationist and cognitive answer to the “Why humans?” question about language origins, one that is clearly different from prominent biolinguistic approaches (e.g., Berwick & Chomsky, 2016, 2017; Hauser et al., 2014).
These contributions substantially enrich previous insights that human communication is evolutionarily grounded in cooperative social ecologies, and that key cognitive processes involved in communication derive, evolutionarily, from non-communicative aspects of social cognition shared with other primates (e.g., Arbib, 2012; Dor, Knight, & Lewis, 2014; Fitch, Huber, & Bugnyar, 2010; Frith & Frith, 2010; Hrdy & Burkart, 2020; Hurford, 2007; Johansson, 2021; Levinson & Holler, 2014; Moore, 2017; Scott-Phillips, 2015; Seyfarth & Cheney, 2018; Sperber, 2000; Sterelny, 2012; Tomasello, 2008; Wheeler & Fischer, 2012; Zlatev, Żywiczyński, & Wacewicz, 2020). Going beyond this point of agreement, we provide an especially focused description of the relevant cognitive capacities, grounded in a precise theory of cognitive pragmatics. In other words, we “scratch beneath the surface” (Graham, Wilke, Lahiff, & Slocombe, 2020) to describe the computational tasks the interactional engine must perform. We also describe how these cognitive capacities are used in a wide range of domains, extending far beyond communication as ordinarily construed. In doing this we do not have recourse to notions such as “we-intentionality,” which is, in our view, not a cognitive process but a behavioural phenomenon that is itself in need of explanation.
The structure of the paper is illustrated in Figure 1. As it shows, we elaborate on the problem of leashed expression in the next section, and we explain how it is resolved (sect. 5) after providing a taxonomy of the ways in which individuals can affect the minds of others (sect. 3) and of the ways in which they can make adaptive inferences based on others' behaviour (sect. 4).
2. Expression leashed
We use “expression” to describe any trait or behaviour whose function is to inform others. This characterisation is solely functional, not mechanistic.Footnote 1 As such, it is broad enough to include whatever means by which this function might be achieved, including, for instance, emotional expression; but also narrow enough not to include every case of information flow regardless of function. In this section we describe how expression in this functional sense is leashed rather than freely open-ended. In later sections we focus on one specific manifestation of expression, namely behaviours that result from informative intentions.Footnote 2
Crucially, an organism's expressive range is limited by the fact that only a specific and finite range of stimuli will actually generate a psychological reaction in other organisms. After all, organisms attend to only a limited subset of other organisms' behaviour, and take from it only a limited range of information. This makes expression about, for instance, future events, or the location of distant food sources, effectively impossible without mechanisms of interpretation that complement mechanisms of production. Consider bee dance: It would not, and could not, express the location of pollen if other bees had no mechanisms dedicated specifically to interpreting dances as indicators of the location of pollen.
Communication is thus, in our terms, a subset of expression. Whereas expression is the production of stimuli whose function is to generate a psychological reaction, communication is the production of stimuli whose function is to generate a reaction by the particular means of stimulating complementary mechanisms of interpretation (Scott-Phillips, Blythe, Gardner, & West, 2012). By “complementary,” we mean that each mechanism can perform its function only under conditions in which the other mechanism is in place. Bee dance, for instance, is communication because it generates a reaction by stimulating complementary mechanisms of interpretation; and those mechanisms of interpretation can perform their function (to learn about pollen) only if bee dance actually takes place. In contrast, frightening behaviour can be expressive but not communicative: It can generate a reaction, but not necessarily by stimulating complementary mechanisms of interpretation. The great ape behaviours commonly known as “attention-getters” are another possible example: They appear to trigger a mixed set of mechanisms that may not be complementary in the relevant way (see, e.g., Tomasello & Call, 2019).
Correspondingly, we characterise communicative stimuli as those that generate a reaction by triggering complementary mechanisms of interpretation, and non-communicative stimuli as those that generate a reaction by triggering other mechanisms, with different functions. Expression thus involves producing either communicative or non-communicative stimuli to change others' psychological states, whereas communication involves producing specifically communicative stimuli for the same function. As such, the evolutionary emergence of mechanisms of interpretation in audiences enriches expression, because these mechanisms complement mechanisms of production and so give rise to communication systems. (This is not the same as unleashing expression: see below.) The distinction between expression and communication is useful because it frames the key questions in the right way. First, it raises the question of whether or not an expressive function is met communicatively, that is, by triggering the audience's dedicated interpretative capacities. Second, it raises the questions of how and why expressive and interpretative capacities might co-evolve.
Crucially, however, the emergence of communication systems does not usually unleash expression, because communication systems are tied to domains of statistical mutual benefit (e.g., Maynard Smith & Harper, 2003; Searcy & Nowicki, 2005; inter alia). This is because the interdependence of mechanisms of production and mechanisms of interpretation means that, for communication to be stable, it must be beneficial, on average, to both communicator and audience. This does not imply that communication is always of mutual benefit, or that deception never occurs. It does imply, however, that communication must be sufficiently beneficial, sufficiently often, for both parties; otherwise it would collapse. Explaining why communication systems do not collapse in this way is the central theoretical issue in animal signalling theory (Maynard Smith & Harper, 2003; Searcy & Nowicki, 2005). Answers to this question are usually developed in the context of evolution by natural selection, but pairs of mutually stable mechanisms of production and comprehension can also emerge within the lifespan, as in the case of ontogenetic ritualisation (see, e.g., Halina, Rossano, & Tomasello, 2013, for a description).
In some cases this mutual benefit derives from genetic relatedness, as with ant pheromones or bee dance. In other cases it derives from direct fitness effects on communicator and audience. For instance, the pattern on the wings of many poisonous butterfly species communicates their unpalatability to potential predators, because this is beneficial to both the butterfly and the potential predator. The function of the pattern is to inform the potential predator, and the function of the predator's reaction is to avoid feeding on such butterflies. Clearly, deception can occur: Other species of butterfly can be, and have been, selected to mimic the focal species, even if they (the other species) are not themselves poisonous. Missing out on the possibility of preying on the mimic species is an opportunity cost for the predator, but this is outweighed by the benefits of avoiding poisonous butterflies. If this were not so – if, in other words, the same pattern were used by so many actually palatable organisms that the predator's opportunity costs outweighed the risks of eating unpalatable prey – then the communication system would collapse. Under these circumstances, predators would not evolve any mechanism for attending to the signal in the first place; or, if they already had such a mechanism, it would be selected against (Scott-Phillips et al., 2012). These and other dynamics, such as those associated with the differential costs of signalling (Lachmann, Szamado, & Bergstrom, 2001; Scott-Phillips, 2008), leash communication to relatively narrow domains of statistical mutual benefit.
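The cost-benefit logic of the butterfly example can be made explicit with a minimal payoff sketch. The model below is a hypothetical illustration, not the authors' formalism: It assumes that a predator either attacks or avoids every butterfly bearing the pattern, that a patterned butterfly is unpalatable with probability p, that eating an unpalatable butterfly costs c, and that eating a palatable mimic yields b (arbitrary fitness units). Attending to the signal then pays only when p·c > (1 − p)·b, that is, when p exceeds b / (b + c).

```python
def predator_benefits_from_signal(p_toxic: float, cost_toxic: float, benefit_palatable: float) -> bool:
    """Return True if avoiding every patterned butterfly beats attacking every one.

    Hypothetical payoff model (an illustrative assumption, not from the source):
      p_toxic           -- probability that a patterned butterfly is unpalatable
      cost_toxic        -- fitness cost of eating an unpalatable butterfly (> 0)
      benefit_palatable -- fitness gain from eating a palatable mimic (> 0)
    """
    # Expected payoff of attacking a patterned butterfly; avoiding it yields 0.
    # The first term is the opportunity cost of avoidance, the second the cost it saves.
    expected_attack_payoff = (1 - p_toxic) * benefit_palatable - p_toxic * cost_toxic
    # Attending to the signal (avoiding) pays only if attacking loses on average.
    return expected_attack_payoff < 0


def critical_toxic_fraction(cost_toxic: float, benefit_palatable: float) -> float:
    """Share of unpalatable prey at which attacking and avoiding pay equally.

    Solves p * cost_toxic = (1 - p) * benefit_palatable for p.
    """
    return benefit_palatable / (benefit_palatable + cost_toxic)


if __name__ == "__main__":
    cost, benefit = 10.0, 1.0  # arbitrary illustrative units, not empirical values
    threshold = critical_toxic_fraction(cost, benefit)
    print(f"signal pays while p_toxic > {threshold:.3f}")
    for p in (0.9, 0.5, 0.2, 0.05):
        verdict = "stable" if predator_benefits_from_signal(p, cost, benefit) else "collapses"
        print(f"p_toxic = {p:.2f}: {verdict}")
```

With these made-up values the threshold is about 0.091, so the signal remains worth attending to in the first three cases and collapses only in the last, where palatable mimics dominate. The all-or-nothing predator strategy keeps the sketch minimal; real predators may sample, learn, or discriminate between models and mimics, which this toy model ignores.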
Human communication appears to be in flagrant violation of this limitation. Its range is certainly not restricted to any particular topic: Humans can communicate about potentially anything. Moreover, we commonly and ordinarily take people at their word, even for statements that can have immediate and serious fitness consequences, such as a doctor's diagnosis and prescriptions. Humans also frequently communicate about phenomena for which no directly observable evidence could ever be provided, such as past or future events. Given this vast expressive range, audiences should be massively vulnerable to misinformation and deception; and what should follow, on ordinary evolutionary logic, is the collapse of the communication system itself (see above). Yet this does not happen. Explaining why is, we believe, not merely necessary for, but fundamental to, explaining the evolution and expressive richness of human communication.
Let us summarise. Expression can be enriched when supplemented with complementary mechanisms for interpretation, generating a communication system. This does not, however, unleash expression: It does not make expression open-ended. On the contrary, standard evolutionary considerations tell us that communication systems are still expressively limited, because they are evolutionarily stable only when there is little gain (in the aggregate) from deception. Yet as a truly open-ended means of expression, human communication does not seem to be restricted in the same way. How is this paradox resolved? Why would it be adaptive for humans to have the sort of interpretative mechanisms they do, given the central evolutionary problem of stability? The claim that human expressivity is enabled by combinatoriality offers no answer to this problem. We will provide an answer (sects. 3–7) by relating the evolution of human communication to the evolution of cognitive mechanisms whose specific function is to allow humans to make the most informative use of social interaction. These mechanisms are, we shall argue, both a consequence and a cause of a partner choice social ecology.
The study of cognitive mechanisms for human expression is traditionally the domain of cognitive pragmatics: The study of the capacity of mind that facilitates human communicative competence. The relevant literature has its most important origins in the work of the philosopher Paul Grice (1957, 1975, 1989). Grice was particularly concerned with meaning, and his key innovation was to approach it as a primarily psychological phenomenon, and only derivatively a linguistic one. In particular, he developed the idea that intentions might play a key role in determining meaning itself. This work provides the foundations on which a cognitive theory of communication and expression can be built, and since Grice an extensive literature has developed this approach in various ways, including from evolutionary and developmental perspectives (e.g., Bach & Harnish, 1979; Clark, 1996; Csibra & Gergely, 2009; Levinson, 2000; Moore, 2017; Scott-Phillips, 2015; Sperber & Wilson, 1986/1995; Tomasello, 2008; Wilson & Sperber, 2012; inter alia). Specific approaches differ in some of the detail, but all agree that the expression, recognition, and epistemic evaluation of intentions together play a foundational role. In what follows we adopt and enrich the post-Gricean approach commonly known as relevance theory, which specifies key notions for communication (“communicative intention,” “informative intention,” and “ostension”) in computational terms (Carston, 2002; Clark, 2013; Padilla Cruz, 2016; Sperber & Wilson, 1986/1995; Wilson & Sperber, 2012; see also the Relevance Theory Online Bibliographic Service).
Our analysis could potentially be adapted to fit with more classically Gricean or neo-Gricean approaches (for focused comparisons see, e.g., Carston, 2004; Sperber & Wilson, 2007).
3. Graded forms of manipulative intention
In a pair of seminal papers, Krebs and Dawkins characterised animal communication as an arms race between means of affecting the minds and behaviours of other organisms – labelled “manipulation” – and means of reacting adaptively to the behaviour of others (Dawkins & Krebs, 1978; Krebs & Dawkins, 1984; see also Guilford & Dawkins, 1991). Manipulation is a broad term that includes, for instance, the handling of objects in a goal-directed way. Krebs and Dawkins's insight was to think of informing others as a means of “manipulating” their behaviour.Footnote 3 In this section we relate this teleological characterisation to the specific cognitive mechanisms by which it is achieved in humans. In the next section we do the same for the audience side.
The specific mechanisms by which communication is achieved in the natural world are many and varied. They might be physiological, as with butterfly wing patterns, or chemical, as with quorum sensing (Diggle, Gardner, West, & Griffin, 2007). Our focus is on cognitive means, and more specifically on the expression and recognition of intentions. In particular, we focus on informative and communicative intentions, which are proximate cognitive processes for the functional tasks of expressing and communicating, respectively. We distinguish three embedded categories of manipulative intention (Fig. 2). Each of the following subsections begins with a concise statement of one category, followed by examples and elaboration.
3.1. Intentional action on others
The broadest set comprises behaviours that are intentional and manipulative. For instance, experimental studies show that orangutan mothers will, if necessary, use their offspring as physical tools (Völter, Rossano, & Call, 2015). Because of their small size, infants can reach food in locations that their mothers cannot, so mothers can (and do) use them to retrieve the food, which the mother then consumes herself.
3.2. Action based on informative intention
In the second set are behaviours that intentionally change mental states, and which can do so without overtly drawing attention to the intention itself. We call the underlying intentions “informative intentions.”Footnote 4 For instance, individuals might dress in a smart and conservative way to suggest competence and professionalism to others, yet without drawing excessive attention to themselves. Conspicuous consumption is intended to provide evidence of wealth and other markers of status, but without necessarily advertising this intent in a formally overt way. In the presence of others we might adopt a bodily posture that suggests, say, social ease and competence; while this can be done in an overt or otherwise exaggerated way, it need not be. More generally, impression management – individuals presenting themselves in ways intended, subconsciously or otherwise, to generate and maintain a positive image in the eyes of others, but without overtly drawing attention to this informative intent – is a common feature of human social life.Footnote 5
Such behaviour can generate a degree of shared knowledge about the actor's informative intent. In other words, it may be salient that the actor has, and is acting on, an informative intention. That said, this need not be the case. Indeed, in some cases the actor might have informative intentions but also have strategic motives to keep those intentions hidden, or at least deniable (called “hidden authorship”; see, e.g., Grosse, Scott-Phillips, & Tomasello, 2013). A criminal who plants misleading cues at a crime scene has an informative intention and is acting on it, while simultaneously hiding that intention. A dinner guest who would like more wine, but recognises that it would be impolite to ask, might wait until her hosts' attention is elsewhere and then move her empty glass to a conspicuous location where it will, in due course, be noticed. Many public acts of generosity also fall within this category: Generous individuals want to be seen as generous, so that they gain a positive social reputation, but often they do not want their acts of generosity to be seen simply as attempts to gain a positive social reputation, because that would immediately undermine their purpose (Berman, Levine, Barasch, & Small, 2015; Frank, 1988; Hoffman, Yoeli, & Nowak, 2015; Karabegovic & Heintz, under review; see also sect. 8.3 on punishment).
In all these cases, agents satisfy their informative intentions simply by providing non-communicative, or “direct,” evidence for what they want to convey.Footnote 6 If, for instance, Amy puts three apples on the table with the intention of informing Barry that there are three apples, she is providing non-communicative evidence for the presence of the three apples. She can, moreover, do this without any indication that she actually has an informative intention: She can just place the apples on the table, without drawing any particular attention to the fact that she is doing so. From a comparative perspective, we consider it highly plausible that nonhuman primates, and possibly some other species, have informative intentions and satisfy them by providing non-communicative evidence (see, e.g., Genty & Zuberbühler, 2014, for a plausible example; Zuberbühler, 2018, for a review; and Warren & Call, in revision, for discussion). The key comparative question is, in our view, whether any nonhuman species act in the ways described in the next subsection, which considers cases in which the communicator provides evidence for the informative intention itself (see Moore, 2016, 2017, 2018, for a different approach to the same issues). Such cases are sometimes called overtly intentional, because they involve making intentions overt; or, more simply, ostensive.Footnote 7
3.3. Action based on communicative intention
In this third set are behaviours performed not only with an intention to inform an audience, as above, but, more than this, with an intention to make the actor's informative intention mutually known. This is achieved, as noted above, by communicators providing non-communicative evidence for their informative intention itself.
To see the difference between this set and the one above, consider two possible ways in which Mary might satisfy her intention that Peter be informed that some berries are edible (see also Scott-Phillips, 2015; Sperber, 2000; Wharton, 2006). One is for Mary simply to eat the berries in Peter's company, without drawing any particular attention to the fact that she is doing so. In this case Mary has an informative intention, which she acts on by providing some evidence that the berries are edible, but without giving any overt evidence that she is acting on an informative intention. She thus relies on Peter simply attending to her behaviour and drawing the inference that the berries are edible. In this case, Mary's behaviour belongs to the second embedded subset (sect. 3.2). There is, however, an alternative. Mary might not eat the berries at all, but instead mime eating them, perhaps with exaggerated movements while tapping her tummy. Here she has the same informative intention, but provides evidence only about the intention itself. She does not, after all, eat the berries. She provides only communicative evidence of their edibility.
The most salient and important special case of overtly intentional behaviour is, of course, the use of conventional symbols, especially, but not only, in the context of language use (sect. 8.5). Grice's work on meaning focused on these cases, and his crucial insight was that the sort of action we are describing in this section – providing evidence about informative intentions – is what generates meaning. As he put it, “‘A meant something by x’ is (roughly) equivalent to ‘A intended the utterance of x to produce some effect in an audience by means of the recognition of this intention’” (Grice, 1957, p. 385, italics added; the intention referred to here is best understood, in our analysis, as an informative intention).
That said, overtly intentional behaviour is also viable in cases where no conventions are used to communicate. After all, almost any behaviour that humans can perform, they can perform in an overtly intentional way. Sometimes we eat food, and sometimes we eat food in an overt, exaggerated, or otherwise ostensive way, to express to others that the food is tasty, revolting, generous, or fancy. Sometimes we blink, and sometimes we blink with minute exaggeration, such as a slight delay in re-opening the eyes, to express, say, ironic surprise. Such deviations from otherwise non-communicative behaviour have been experimentally isolated in a number of studies, of both production and comprehension (e.g., McEllin, Sebanz, & Knoblich, 2018a; Newman-Norlund et al., 2009; Royka, Aboody, & Jara-Ettinger, 2018; Scott-Phillips, Kirby, & Ritchie, 2009; Vesper, Morisseau, Knoblich, & Sperber, 2021). Like language use, such behaviour is Gricean: It provides evidence about informative intentions.
We have so far presented the three subsets in this section as categorically distinct, but they are in fact graded and continuous (Sperber, 2019). That is, different means of manipulation can, we suggest, vary in the extent to which the actor makes her informative intention manifest. Grice's characterisation of meaning, quoted above, describes one end of this continuum; the cases described in section 3.1 represent the other end; and in between are many cases where communicators make informative intentions partially manifest. We shall return to this graded quality in section 8, where we suggest that it helps to generate the massive diversity of human expression. (For related but different continua see Duranti, 2015, p. 290; Sperber & Wilson, 2015; Wharton, 2009.)
In any case, informing others via the expression of informative intentions – commonly called “communicating” – has a distinctive property that is absent from the other subsets described above. Crucially, the communicator is now freed from the constraint of providing non-communicative evidence (see also sect. 5). Returning to the example above, when Mary makes eye contact with Peter and mimes eating berries in an exaggerated way, with the informative intention that Peter believe the berries are edible, she provides only communicative, and hence indirect, evidence that the berries are edible. Communication thus provides Mary with a second inferential route to the same conclusion. Mary intends Peter to believe that the berries are edible, and she can achieve this either directly, just by eating the berries, or indirectly, by making her intention manifest.
However, for this potential freedom of expression to be realised, two related issues must be addressed. First, freedom from the constraint of having to provide non-communicative (“direct”) evidence depends on audiences' complementary ability to infer informative intentions, more or less accurately, solely on the basis of whatever other, communicative evidence communicators are able to provide (see sect. 2 on the importance of complementary mechanisms). Second, expressive freedom also gives communicators the opportunity to deceive, leaving the system prone to evolutionary instability (sect. 2). To discuss how these issues are resolved, we turn now to the audience side.
4. Graded forms of social vigilance
All animal species have evolved adaptive reactions to the presence and behaviour of others. As with manipulation (sect. 3), we define these adaptive reactions functionally, recognising that the specific mechanisms can be many and varied.Footnote 8 For example, a chameleon's adaptive reaction to the presence of potential predators is (we assume) largely physiological. We focus here on cognitive means of adaptive reaction targeted at the behaviour of conspecifics, which we call “social vigilance” (Heintz, Karabegovic, & Molnar, 2016). Again, we distinguish three graded and embedded subsets (Fig. 3), and we describe their relationship to the categories of intentional manipulation described in the previous section.
4.1. Inferences about others' intentions
The first embedded subset includes inferences based on the capacity to anticipate and respond adaptively to the intentional action of others. Humans do this routinely, of course. Others' intentions are relevant to us, for instance when we decide whether to avoid or engage with them, whether as friends, as rivals, or in any other social relationship. The capacity to behave in ways that take account of other individuals' intentions has also been experimentally documented in many studies with nonhuman great apes (Andrews, 2017; Bettle & Rosati, 2021; Call & Tomasello, 2008; Emery & Clayton, 2009). In one such experiment, chimpanzees are given the opportunity to take a piece of food from a bucket, the location of which is either known or unknown to a dominant conspecific. The key finding is that the subordinate chimpanzee is more likely to take the food if its location is unknown to the dominant (Hare, Call, Agnetta, & Tomasello, 2000). Exactly which intentions and other mental states great apes attribute to others remains a topic of active study; but in any case many experiments show that chimpanzees are able, in some contexts at least, to adaptively modulate their behaviour in view of what a conspecific is most likely to do, and what effects this might have on the focal individual. Similar modulations have been documented in many non-primate species. Grey squirrels, for instance, have been shown to modulate their caching behaviour as a function of the presence of onlookers, for example by moving a cache when the onlooker leaves (Leaver, Hopewell, Caldwell, & Mallarky, 2007); and ravens have been shown to guard their caches against discovery, taking into account other ravens' possible knowledge of the cache (Bugnyar, Reber, & Buckner, 2016).Footnote 9
4.2. Inferences about others' informative intentions
In the second embedded subset are inferences about others' informative intentions. The possibility of such inferences, and their nature, depends to a significant extent on whether the informative intentions in question have been made overt (sect. 3.3) or not (sect. 3.2) by the actor, or communicator.
If an informative intention is not overt and audiences do not recognise it, then what is perceived is simply instrumental action, which, like any behaviour, might or might not be relevant to the observer. If, alternatively, an informative intention is not overt but is recognised nevertheless, audiences might take account of this absence of overtness, and the possible reasons for it, in their interpretation. Suppose, for instance, Claire leaves Dwight's keys on the table, with the informative intention that, by seeing them, Dwight will not forget them when he goes to work. Dwight might simply see the keys and thus remember to take them, without ever recognising Claire's informative intention, which was, after all, not overt. Alternatively, Dwight might recognise that Claire had an informative intention even though she did not make it overt, and he might then infer that she thinks he is absent-minded but does not want to embarrass him by saying so explicitly. In any case, humans commonly act with informative intentions that they do not make overt, but which are sometimes recognised nevertheless. Section 8.1 on coordination smoothers and section 8.3 on expressive punishment discuss some specific examples.
Actors attempting to make an informative intention overt have a communicative intention (by definition: see sect. 3.3). If the informative intention is nevertheless not recognised, as when a raised hand, intended as a request to ask a question, is interpreted as a mere stretch of the arm, then that is a simple failure of communication. If, in contrast, an overt informative intention is recognised as being overt, as when a raised hand intended as a request to ask a question is indeed recognised as such – if, in other words, the audience recognises that the actor has not only an informative intention, but also a communicative intention – then the audience is warranted in making an inferentially powerful presumption about the behaviour. Specifically, the audience is warranted in presuming that the behaviour is the most effective one the communicator could produce, given the communicator's goals, abilities, and the constraints acting on them. This insight is central to relevance theory (Sperber & Wilson, 1986/1995, 2002, 2007), where it is called the communicative principle of relevance. In the next section we describe in more detail the nature of the inferential process that is triggered, in audiences, by this recognition of communicative intent.
4.3. Inferences about others' communicative intentions
In section 3.3 we described how communicators sometimes express their informative intentions overtly, with the goal of making their informative intention mutually known. Here we discuss how such stimuli are interpreted.
Language use is a paradigmatic example, and one of Grice's pioneering insights was that the interpretation of utterances is guided by prior expectations about the cooperative intent of communicators (mirroring his characterisation of linguistic meaning: see sect. 3.3).Footnote 10 Further developments in cognitive pragmatics have specified and debated the nature of these expectations in more detail. Relevance theory, for instance, describes these expectations in terms of a single assumption: that ostensively presented stimuli are optimally relevant for the intended audience, given the speaker's goals, abilities, and the constraints acting on them. In other words, audiences have a strong positive prior expectation that overtly intentional behaviour is cooperative; and this prior expectation of cooperativeness in turn licenses a presumption that informative intentions are worth paying attention to, that is, are optimally relevant. Here and elsewhere, “optimally relevant” means, more precisely, that communicators strive to optimise the trade-off between cognitive effects and processing effort, subject to their goals, abilities, and constraints.
Here is an example. Amy and Barry are drinking in a bar. Amy's glass, which is visible to both her and Barry, is empty. This fact is on its own unremarkable. Suppose now that Amy picks up the glass and gently waves it in front of Barry. Why would she do this? What could it possibly “mean,” and how could Barry know? The relevance theory answer is that Amy's behaviour triggers in Barry a spontaneous process of interpretation, governed by a cooperative assumption that Amy's behaviour is the most relevant behaviour she could perform given her goals and the circumstances. The key point here, which features in some form in all Gricean approaches, is that only with this cooperative assumption can Barry converge on the conclusion that Amy's behaviour is a suggestion that they stay for another drink. Without this assumption, Amy's behaviour is simply mysterious. Many experimental studies have shown how prospective audiences interpret communicative behaviours under presumptions of optimal relevance (e.g., Gibbs & Bryant, 2008; van der Henst, Carles, & Sperber, 2002; inter alia).
This is very similar to how other specialised cognitive mechanisms “embody” knowledge about the nature of objects, magnitudes, species, and other basic, fundamental features of the human evolutionary ecology (Carey, 2009; Spelke & Kinzler, 2007). In all these cases, items perceived as being of a particular type (an object, a magnitude, etc.) trigger specific assumptions about the nature of that item. For example, items perceived as physical objects trigger assumptions that the item is physically cohesive, bounded, rigid, and cannot be acted on at a distance (Spelke, 1990). In the present case, behaviour perceived as ostensive triggers an assumption that the behaviour is optimally relevant for the audience, given the communicator's goals, abilities, affordances, and constraints. This allows a specialised, “satisficing” process of interpretation to then derive the communicator's intended meaning (Sperber & Wilson, 2002; see also, e.g., Ferreira & Patson, 2007, on “good enough” approaches to linguistic comprehension). This process is, moreover, spontaneous and largely unconscious, meaning that we cannot “choose” not to perform it even if we wish to. As a revealing example, consider film spoilers: Our desire not to recover the meaning of what is said does not and cannot suspend the interpretive process. Again, this is akin to the recognition of objects, magnitudes, and so on, all of which we recognise and process in spontaneous and unconscious ways. We cannot “un-see” objects, and we cannot “un-understand” what others say. All in all, spontaneous interpretation of ostensive stimuli is a functionally specialised form of social vigilance, targeted at the specific phenomenon of others' ostensive behaviour. It is, moreover, a foundational aspect of human interaction: Without it, we simply would not be able to understand each other in communication.
As with the other side of the equation (sect. 3), we have so far presented the distinction between the subsets in this section as categorical, but they are in fact graded and continuous. Specifically, there is on the audience side variation in the extent to which recognition of the actor's informative intention contributes to correct interpretation, that is, to satisfying the informative intention. Again, we shall return to these graded aspects in section 8.
5. Unleashing expression, together
We can now summarise how a system of communication predicated on the expression and recognition of informative intentions can unleash expression.
Crucially, the metarepresentational structure of ostensive communication generates a “virtual” domain generality (see also Mercier & Sperber, 2009, on virtual domain generality in cognition more broadly).Footnote 11 Communicators provide evidence of their informative intentions, which can in turn be about anything at all. Consider again the example of Mary, who makes eye contact with Peter and mimes eating berries in an exaggerated way. By doing so, she provides evidence of her informative intention that Peter understands that the berries are edible; and this intention is informative about the actual edibility of the berries only in turn. This metarepresentational structure makes the expressive domain of ostensive communication effectively open-ended (unleashed), even though the actual domain of the relevant cognitive capacities is narrow and specific: It is just the communicator's informative intentions.
This in turn has two important consequences, which together generate a further important corollary.
First, a metarepresentational structure is how ostensive communication can be expressively open-ended while still conforming to the central evolutionary constraint that communication systems are tied to narrow domains of statistical mutual benefit. In section 2 we summarised why, from an evolutionary perspective, all evolved communication systems should be tied to narrow domains of statistical mutual benefit, and we observed that human communication appears to be in flagrant violation of this constraint. Now we can state how the paradox can be resolved. The actual domain of the cognitive capacities that underpin ostensive communication is indeed still restricted to a narrow domain of statistical mutual benefit, namely the communicator's own informative intentions. At the same time, the metarepresentational structure generates an expressive domain that is truly open-ended.
Second, the metarepresentational structure of ostensive communication generates a distinction between comprehension and acceptance. Comprehension is targeted at the informative intention itself: To comprehend is to recognise the informative intention that an individual has towards another (“She wants me to believe that the berries are edible”). Acceptance, in contrast, is targeted at what the informative intention is about, that is, what is “virtual.” To accept is to actually update one's own beliefs in light of what has been communicated (“The berries are indeed edible”).
Together, these two consequences imply that audiences cannot actually gain from communication unless they extend a degree of trust towards the communicator. The distinction between comprehension and acceptance, and the massive open-endedness of human communication, together mean that audiences who do not extend a degree of trust towards ostensive communicators would comprehend what others want to do to their minds, but would never then update their beliefs in light of that knowledge. They would never allow themselves to gain information in communication! Peter would understand that Mary wants him to believe that the berries are edible, but unless he extends some trust towards her, he will never believe that the berries are actually edible. Of course, this trust must be tentative and provisional, lest audiences be misinformed, but it must be extended in some way, just for audiences to gain from communication in the first place.
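The distinction between comprehension and acceptance, and the role that trust plays in bridging them, can be made concrete with a minimal Python sketch. It is an illustration under stated assumptions, not a model taken from the sources cited here: the class names, the numeric trust values, and the acceptance threshold are all hypothetical. Note that the `content` slot of an informative intention is an arbitrary string (the “virtual,” open-ended domain), while the audience's processing only ever operates on the intention itself.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class InformativeIntention:
    """Hypothetical representation: `communicator` intends the audience to believe `content`."""
    communicator: str
    content: str  # may be about anything at all: the "virtual", open-ended domain


@dataclass
class Audience:
    name: str
    # Hypothetical trust levels in [0, 1], one per known communicator.
    trust: dict[str, float] = field(default_factory=dict)
    comprehended: list[InformativeIntention] = field(default_factory=list)
    beliefs: set[str] = field(default_factory=set)

    def comprehend(self, intention: InformativeIntention) -> str:
        """Comprehension: represent the informative intention itself (a metarepresentation)."""
        self.comprehended.append(intention)
        return f"{intention.communicator} wants me to believe that {intention.content}"

    def accept(self, intention: InformativeIntention, threshold: float = 0.5) -> bool:
        """Acceptance: add the content to one's own beliefs, but only if enough trust is extended."""
        if self.trust.get(intention.communicator, 0.0) >= threshold:
            self.beliefs.add(intention.content)
            return True
        return False


mary_mimes = InformativeIntention("Mary", "the berries are edible")

trusting_peter = Audience("Peter", trust={"Mary": 0.8})
print(trusting_peter.comprehend(mary_mimes))  # Mary wants me to believe that the berries are edible
trusting_peter.accept(mary_mimes)             # beliefs now include "the berries are edible"

distrustful_peter = Audience("Peter")            # extends no trust to anyone
print(distrustful_peter.comprehend(mary_mimes))  # identical comprehension...
distrustful_peter.accept(mary_mimes)             # ...but no belief update (returns False)
```

In this sketch the distrustful Peter comprehends Mary exactly as the trusting Peter does, yet his beliefs about the berries never change. Trust is deliberately reduced to a single fixed number here; the tentative, revisable character of trust is precisely what the capacities discussed next are for.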
In consequence, cognitive capacities for expressing and recognising informative intentions must be complemented by further capacities that allow audiences to trust what is communicated but in a vigilant way, possibly questioning the competence or the benevolence of the communicator, and evaluating the plausibility of what is communicated. Commonly known as epistemic vigilance, these cognitive capacities are a specialised form of social vigilance, targeted at the assessment of informative intentions (Mercier, 2017; Sperber et al., 2010). They also allow audiences to identify misleading communicators, and hence adjust the attention and trust they are willing to grant in the future.
In sum, then, the open-ended richness of human communication is achieved virtually, by a combination of cognitive capacities for the expression of informative intentions, cognitive capacities for the recognition of informative intentions, and cognitive capacities of epistemic vigilance, all of which are functionally tied to one another. Thus, to properly explain the expressive openness of human communication, what must be described is: (1) how these cognitive capacities could all gradually co-evolve and be mutually supportive of one another, such that they form a communication system; and (2) the ecological reasons why they have actually done so in humans. The next section addresses these questions.
6. Co-evolutionary ecology of human communication
Many authors have observed that human communication must have co-evolved in a highly social ecology, one way or another (sect. 1). Here we identify which specific and distinctive aspects of human social ecologies can generate the co-evolution of the cognitive capacities described in sections 3 and 4, such that fully enriched ostensive communication can become uniform and stable in the population. We also provide a precise description of how this co-evolution could occur in a gradual manner.
Arguably the most distinctive feature of the human cognitive niche is that it is highly social. Humans tend to live in social groups that are loosely defined but long-lasting, and composed of both kin and non-kin. To a degree that surpasses that of other great apes, this social ecology generates many opportunities for win-win cooperation, as well as risks of exploitation. More broadly, human social ecologies involve an especially delicate balance of cooperation and competition, with substantial evolutionary pressure for behaviours that make the most of this mix (Ferriere, Bronstein, Rinaldi, Law, & Gauduchon, 2002; Noë & Hammerstein, 1995; Tomasello, Carpenter, & Liszkowski, 2007). Individuals acting in their own adaptive self-interest seek out others (“friends,” “colleagues”) with whom to engage in mutually beneficial enterprises, and they behave in ways that increase their chances of being chosen as a partner for joint enterprise (e.g., Barclay, 2013; Krems, Williams, Aktipis, & Kenrick, 2021). A partner choice ecology will include many limitations on who might or might not be available for a joint enterprise; it need not be a perfect market of potential partners. Still, its main feature is that individuals might gain or lose win-win opportunities depending on what others think of them.
Humans have hence evolved a number of cognitive capacities adapted for this ecology, including moral dispositions, mechanisms of social vigilance that identify potential partners and opportunities for mutually beneficial interaction, an awareness of potential opportunities to exploit others, a strong sensitivity to changes in one's reputation, and so on (see, e.g., Barrett, Cosmides, & Tooby, 2010; Baumard, André, & Sperber, 2013; Curry, Mullins, & Whitehouse, 2019; Delton & Robertson, 2012; Engelmann & Tomasello, 2019; Heintz et al., 2016; McCullough, 2020; Origgi, 2004, 2005; Sperber & Baumard, 2012).
These factors collectively constitute a “partner choice” social ecology; or, possibly, an ecology of “self-domestication.” This means, minimally, that it is advantageous to be selected as a partner for some joint enterprise (other than mating), and that the selection of partners for joint enterprise is based on information on past actions. Reputations are thus especially critical.Footnote 12
In this partner choice ecology, the cognitive means of manipulation described in section 3.3, and the cognitive means of social vigilance described in section 4.3, are each adaptive. Particularly important is the role of social commitments. By providing evidence of their informative intention, informers make themselves accountable to their audience, putting their reputation at stake; and audiences can hence effectively assume the relevance of overtly expressed informative intentions. This evolutionary dynamic is, incidentally, similar to that described by partner choice approaches to fairness (André & Baumard, 2011; Barclay, 2013, 2016; Debove, André, & Baumard, 2015). In both cases, the adaptive value of maintaining one's reputation in a partner choice ecology constitutes a crucial selection pressure for psychological traits, which in turn generates prosocial behaviour. In the case of ostensive communicators, prosocial behaviour means being relevant.
The following five paragraphs elaborate this argument in more detail. They hence provide an existence proof of how and why the cognitive capacities described in previous sections could have evolved in a gradual manner (for similar but different approaches see, e.g., Cornell & Wharton, 2021; Wharton, 2006). Indeed, “lineage explanations,” in which changes in the phenotype result from incremental changes, are an important constraint on theories of cognitive evolution, especially co-evolutionary theories (Calcott, 2009).
Consider a social ecology with many, varied opportunities for win-win cooperation. Here, informing others can be adaptive, because it can facilitate win-win cooperation or even create new win-win opportunities. In particular, informing others can generate common ground and hence facilitate many joint enterprises (such as, say, animal hunting, building shelter, maintaining a fire, alloparenting). Informing others can also have fitness advantages by allowing the social transmission of opaque skills to cooperative partners and kin.
These potential adaptive benefits in turn mean that it is adaptive to recognise and attend to others' attempts to gain attention (sect. 4.2). That said, attention is limited and thus should be modulated depending on whether others' attempts to inform are indeed likely to be worth attending to, that is, whether they are revealing of relevant information. Individuals should be socially vigilant towards others' informative intentions, evaluating whether, or to what extent, the intentions are indeed cooperative. It will hence be in informers' own interests to actually be relevant for their audience, because those who intentionally attract attention but fail to do so in ways that are useful (relevant) will, in time, incur costs to their reputation and lose their capacity to manipulate their conspecifics' attention. In other words, there will be selection for behaviours that intentionally attract others' attention only when it is likely to be worthwhile for the audience to indeed pay attention (see also Dessalles, 1998).
At this point, expression has not yet “gone Gricean” (we adopt this useful expression from Bar-On, 2013), and as such humans are not yet “language ready.” This is because informative intentions are not intentionally made overt, and so expression is not yet predicated on the systematic exploitation of the audience's recognition of informative intentions (Grice's “by means of” clause: see sect. 3.3). Expression is based just on behaviours that informers expect will be relevant to others; and “comprehension” is based just on tentative assumptions that others' expressive behaviour is likely to be relevant for the same reason. These tendencies and dispositions do, however, constitute a new social ecology, and it is here that “going Gricean” is adaptive.
Crucially, in this new social ecology – in which audiences might expect others' informative behaviour to be relevant, and in which a reputation for being a good cooperator can be gained and lost – it is adaptive to make informative intentions themselves manifest, that is, to make the intentions you have towards your audience's mind overt and common knowledge. This is adaptive because, by making informative intentions manifest, informers effectively offer a credible commitment that the overtly presented behaviour will indeed be relevant for the audience, which in turn increases the probability that the informative intention will indeed be satisfied. In other words, the overt expression of an informative intention makes that informative intention common knowledge; and in a partner choice social ecology there is the risk of developing a reputation for irrelevance, and hence of losing the possibility of influencing others' minds. Communicators are therefore effectively committed to their behaviour being useful (relevant) for the audience. This in turn makes it adaptive for audiences to simply presume – even if just tentatively at first – that the behaviour is indeed relevant, and to interpret the behaviour in light of this presumption of cooperativeness.
The social ecology is now one in which the overt expression of informative intentions effectively commits informers to being relevant to their audience, and in which informers indeed abide by that commitment (see also Scott-Phillips, 2010; Sperber, 2013). In this new ecology, it is adaptive for audiences to evolve two kinds of specialised cognitive disposition: (1) high prior expectations that others' communicative behaviour will be relevant, which will eventually become the spontaneous presumptions of relevance described in section 4.3; and (2) forms of social vigilance that assess how, and to what extent, beliefs should be updated in view of others' informative intentions, which will eventually become the epistemic vigilance described in section 5. Now that audiences have these two kinds of specialised cognitive disposition, individuals can inform simply “by means of” (as Grice put it) making their informative intentions manifest. This is Gricean communication proper.
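The reputational pressure at the heart of this argument, namely that informers who attract attention without being relevant eventually lose their capacity to attract it, can be illustrated with a toy simulation. This is a sketch under loudly stated assumptions, none of which come from the text: the audience's willingness to attend is modelled, hypothetically, with Laplace's rule of succession over the informer's past bids for attention, and the audience is assumed to learn afterwards whether each bid was relevant, even when it did not attend at the time. The names `attention_probability` and `simulate` are illustrative only.

```python
import random


def attention_probability(relevant: int, bids: int) -> float:
    """Hypothetical reputation rule (Laplace's rule of succession).

    Probability that the audience attends to an informer's next bid for
    attention, given how many of the informer's past `bids` turned out to
    be `relevant`. With no history at all, this is 0.5."""
    return (relevant + 1) / (bids + 2)


def simulate(relevance_rate: float, n_bids: int = 200, seed: int = 0) -> tuple[int, float]:
    """Simulate one informer repeatedly trying to attract one audience's attention.

    relevance_rate: assumed probability that any single bid is actually relevant.
    Returns (number of bids the audience attended to, final attention probability)."""
    rng = random.Random(seed)
    relevant = bids = attended = 0
    for _ in range(n_bids):
        # The audience attends with a probability set by the informer's track record.
        if rng.random() < attention_probability(relevant, bids):
            attended += 1
        # Simplifying assumption: the audience later learns whether each bid was relevant.
        if rng.random() < relevance_rate:
            relevant += 1
        bids += 1
    return attended, attention_probability(relevant, bids)


helpful = simulate(relevance_rate=0.9)    # bids are usually relevant
unhelpful = simulate(relevance_rate=0.1)  # bids are usually irrelevant
print("helpful informer:   attended %d of 200, final attention p = %.2f" % helpful)
print("unhelpful informer: attended %d of 200, final attention p = %.2f" % unhelpful)
```

With these settings, the informer whose bids are mostly relevant ends the run with a high probability of being attended to, while the informer whose bids are mostly irrelevant ends with a low one. The simple counting rule was chosen only for transparency; any update rule in which irrelevance lowers future attention would make the same qualitative point.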
Let us summarise. What we are arguing is that outside a partner choice social ecology, communication and expression are highly prone to irrelevance, deception, and instability; but within a partner choice social ecology there is selective advantage for behaviour that is cooperative (statistically speaking at least), which in the context of communication means relevant. Within this social ecology a gradual, cognitive co-evolution of specialised capacities for ostensive communication is hence possible. As with other aspects of core cognition, these capacities, which provide the foundations of human communication (sect. 5), should become part of the ordinarily developing cognitive phenotype, emerging at reliable and predictable stages of ontogeny.
There is, correspondingly, abundant empirical evidence of this reliable and predictable cognitive development in humans, both in language use and communication more broadly (e.g., Bates, 1979; Bloom, 2002; Bohn & Frank, 2019; Clark, 2003; Goldin-Meadow, 2005; Tomasello et al., 2007; inter alia). In the next section we consider whether, or in what ways and to what extent, the same cognitive capacities might be present in nonhuman primates.
7. Cross-species comparisons
Do any other species communicate ostensively, or in proto- or quasi-ostensive ways? How would we know? These questions are worth asking because nonhuman species, great apes and dogs in particular, sometimes appear to understand some human ostensive behaviour, at least in some specific contexts. Here we outline how this question can be addressed experimentally, we present an ecology-based explanation of key differences between human and nonhuman great ape communication, and we reinterpret some key findings in the comparative cognition of communication.
The key methodological challenge in studying ostensive communication from a comparative perspective is that ostension is ultimately a psychological construct: Ostension is not any particular behaviour, but rather any behaviour motivated by a particular cognitive phenomenon, namely informative intentions (sect. 3.3). This makes it impossible to identify ostension by behavioural characteristics alone. As such, it will always be theoretically possible to reinterpret any behavioural differences between experimental conditions in a non-mentalistic (‘killjoy’) way, such that the relevant cognitive capacities are not ascribed to the individual animal participants, or species. One response to this methodological challenge has been, effectively, to abandon use of the Gricean framework in the study of animal communication (e.g., Townsend et al., 2017). In contrast, we suggest that experiments revealing the relevant intentions and interpretative processes are still possible.
In particular, the hypothesis that nonhuman primates can be sensitive to ostension qua ostension (i.e., sensitive to the expression of an informative intention) can be tested by contrasting two scenarios in which the exact same ostensive behaviour prompts spontaneous identification of different informative intentions, and hence different behavioural responses, depending only on what is in the common ground. Real-world human communication is replete with examples. An ordinary utterance such as, say, “It's raining” can, even if produced in exactly the same way in each case, be interpreted in wildly different ways depending only on the current common ground (“Take an umbrella,” “Even the weather can't lighten my mood,” “Your parents won't be visiting after all”). The same can be true of non-verbal means of communication, such as points and nods, not to mention spontaneous and ad hoc gestures, and indeed all human ostensive stimuli.Footnote 13
We predict, tentatively, that if any nonhuman primates do reliably pass tasks of the type described above, it will be individuals with extensive experience of altruistic human caregivers, and not those living in the natural social ecologies of nonhuman primates. Nonhuman primate social ecologies involve fewer and less frequent opportunities for interactions of mutual benefit. That is not to say such opportunities are absent, but they are much less prevalent than in the human case, and in consequence the relevant selection pressures are not (as) present. This is, we believe, why nonhuman great apes have not evolved all of the same communicative dispositions as humans. At the same time, nonhuman primates living under captive conditions, with human caregivers who are more-or-less uniformly cooperative, could – perhaps – develop the relevant dispositions and expectations ontogenetically (see, e.g., Call, 2011, for discussion of how rearing conditions can affect chimpanzee communication). In other words, a “proto” presumption of relevance could result from non-standard life history in nonhuman apes. If so, then we should expect some recognition and interpretation of human ostensive behaviours qua ostensive in at least some individuals, albeit in imperfect and happenstance ways. This prediction is of course ultimately a matter for future empirical research, but it aligns with existing findings that humans differ from other great apes in dispositions of trust and cooperation (see, e.g., Jaeggi, Burkart, & van Schaik, 2010; Moll & Tomasello, 2007; Tomasello & Call, 2019).
Here is an analogy that helps to articulate this difference between specialised competences that are part of the ordinarily developing phenotype (as ostensive communication is in humans), and latent competencies that might be refined in the right ecology (as might be the case for ostensive communication in chimpanzees and some other nonhuman primates). Consider humans swinging from trees. Human bodies are not especially well-suited to this task. We lack the specialised biological apparatus of other primates and we do not develop the relevant dispositions as an ordinary part of ontogeny. At the same time, there is no absolute barrier. Some humans can swing from trees in some limited ways and to some extent, and this basic ability can be refined and enhanced with training: in other words, in the right ecology. What we are suggesting, tentatively, is that ostensive communication in other primates, living in the right sort of social ecologies, might be similar: Not impossible and not wholly absent, but still unspecialised, disfluent, not a regular part of the environment, and not part of the ordinarily developing phenotype.
This perspective on nonhuman primate cognition can help make sense of otherwise puzzling findings in the comparative psychology of communication. We highlight two examples in particular: performance in the “object-choice task,” and the phenomenon of overimitation. (This is obviously not an exhaustive list of relevant comparisons. As one of many further phenomena to explain, great ape interaction tends to have a dyadic rather than triadic character; see Pika, Liebal, Call, & Tomasello, 2005.)
First, in the object-choice task a desirable object is shown to the participant, and then placed in one of two boxes, or bins. The participant does not know which of the two boxes contains the desirable object. The two boxes are placed on either side of the experimenter, who then points to the box with the desirable object. The participant is then free to open the boxes. Many nonhuman primates “fail” this task: In many studies nonhuman primates do not choose the indicated box at levels greater than chance (see Clark, Elsherif, & Leavens, 2019, for a recent review). We suggest that this occurs simply because the relevant cognitive processes employed by the audience are, in nonhuman primates, not ordinarily predicated on a presumption of cooperation, which in the context of pointing means communicative relevance. Dogs, in contrast, perform far better at the object-choice task (see below), as do human infants.
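What “at levels greater than chance” amounts to in a two-box task can be illustrated with a one-sided exact binomial test, sketched below in Python. This is a generic statistical illustration, not the analysis used in any particular study (studies differ in design and statistics), and the trial counts in the example are hypothetical. With two boxes, a participant who ignored the point entirely would choose the indicated box with probability 0.5 on each trial.

```python
from math import comb


def p_at_least(correct: int, trials: int, chance: float = 0.5) -> float:
    """One-sided exact binomial test.

    Returns the probability of choosing the indicated box at least `correct`
    times out of `trials` if every choice were made at chance level
    (success probability `chance`, i.e. 0.5 with two boxes)."""
    return sum(
        comb(trials, k) * chance**k * (1 - chance) ** (trials - k)
        for k in range(correct, trials + 1)
    )


# Hypothetical participant tested on 18 trials.
p_12 = p_at_least(12, 18)  # 12 of 18 correct (67%)
p_14 = p_at_least(14, 18)  # 14 of 18 correct (78%)
print(f"12/18: p = {p_12:.3f}")  # 12/18: p = 0.119
print(f"14/18: p = {p_14:.3f}")  # 14/18: p = 0.015
```

Under a conventional 0.05 criterion, the first hypothetical participant would not count as choosing the indicated box above chance, whereas the second would. With few trials per individual, even a clear majority of correct choices can remain statistically indistinguishable from guessing.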
Second, we consider “overimitation,” in which individuals copy actions demonstrated to them, including in particular those that are perceivably causally irrelevant (e.g., tapping a box before opening it, even when the tapping makes no difference to whether or how the box is opened). Intriguingly, overimitation is only consistently observed in humans, including children, and not in chimpanzees or bonobos (Clay & Tennie, 2018; Hoehl et al., 2019; Horner & Whiten, 2005; Johnston, Holden, & Santos, 2017; Lyons, Young, & Keil, 2007). This finding has prompted speculation that overimitation derives from a cognitive adaptation for acquiring generic, cultural knowledge (e.g., Chudek & Henrich, 2011; Gergely, 2013; Legare & Nielsen, 2015; Nielsen & Tomaselli, 2010). We suggest, in contrast, that overimitation is best explained as a by-product of audience presumptions of relevance. Overimitation reliably occurs only when the copied behaviour has been performed in an overtly intentional (i.e., ostensive) way (see, e.g., Király, Csibra, & Gergely, 2013, for experimental demonstration). This triggers in the audience a spontaneous process of interpretation, which is predicated on a presumption of relevance (sect. 4.3), hence delivering the (incorrect) conclusion that the demonstrated actions are useful, even if that utility is currently opaque to the audience. We are suggesting, in short, that the reason only humans reliably demonstrate overimitation is that only humans reliably interpret ostensive behaviour in terms of optimal relevance (see also Morin, 2016, pp. 244–245). We note, consistent with this interpretation, that overimitation emerges in development very soon after the emergence of ostensive communication.
Finally, dogs are an informative point of comparison. Dogs have been subject to a long period of domestication in which humans are often prosocial towards them. In this ecology it can be adaptive for dogs simply to presume that when humans attempt to gain their attention, it is indeed worthwhile to pay attention. Correspondingly, dogs seem to be sensitive to some of the most salient human ostensive behaviours (see, e.g., Topál, Kis, & Oláh, 2014; Wynne, 2016, for reviews). This sensitivity, moreover, emerges early in development and is highly heritable (Bray et al., 2021). Presumably, dogs do not make the same interpretative inferences as humans, but they do show how, in the right evolutionary ecologies, it can be adaptive to spontaneously presume that others (in this case, human owners) are being cooperative when they attempt to gain attention.
8. Diversity in human expression
In this section, we suggest how the cognitive mechanisms described in previous sections underpin many otherwise diverse means of human expression. Our main goal is demonstrative: We aim to highlight how otherwise distinct means of expression and communication appear in a new light when considered from the perspective of cognitive unity. We focus in particular on the examples of coordination smoothers (sect. 8.1), teaching (sect. 8.2), punishment (sect. 8.3), art (sect. 8.4), and languages (sect. 8.5). In each case, we summarise how that means of expression employs the common, unified set of cognitive capacities described in previous sections, but in its own distinctive way. If we are right about this, then the evolutionary emergence of specialised, cognitive means of manipulation (sect. 3) and social vigilance (sect. 4), which together unleash expression (sect. 5), is a root cause of many of the most distinctive aspects of human behaviour and societies. Other domains that have been studied from a perspective broadly similar to ours include divination (Boyer, 2020), humour (Yus, 2016), emotional expression (Dezecache, Mercier, & Scott-Phillips, 2013; Wharton, Bonard, Dukes, Sander, & Oswald, 2021), literary interpretation (Cave & Wilson, 2018; Chapman & Clark, 2019), mathematical diagrams (McCallum, 2019), onomatopoeia (Sasamoto, 2019), many borderline or quasi-linguistic phenomena (Ifantidou, de Saussure, & Wharton, 2021), and others.
We will highlight in particular the importance of graded aspects of human expression. In section 3.3 we noted that there is a graded quality on the production side: Different means of expression can vary in the extent to which the actor makes her informative intention manifest. In section 4.3 we noted that there is, in turn, variation on the audience side: Recognition of the actor's informative intention can contribute to satisfying that intention to different degrees in different cases. In the subsections below, we describe how different forms of human expression make use of these graded aspects in a range of different ways; and we suggest that, in general, people aim to make their informative intentions manifest just to the extent needed for those intentions to be satisfied, but no more.
8.1. Coordination smoothers
To coordinate with one another in joint actions, such as dancing with a partner, maintaining a tempo, moving large objects together, and many others, individuals must be informed about each other's behaviour and likely future behaviour, often on a moment-by-moment basis (Sebanz, Bekkering, & Knoblich, 2006). This can occur passively, but individuals also behave in ways that actively facilitate the flow of information for joint action. For instance, two people may aim to lift and move a large table. In lifting their end of the table, each person might move in slightly exaggerated ways, in order to be predictable. Such behaviours are called “coordination smoothers”: they make each partner's actions more predictable and so ease coordination (Vesper, Butterfill, Knoblich, & Sebanz, 2010; Vesper et al., 2017). Some cases of coordination smoothing are clearly communicative, such as road signs and forms of language use targeted at easing the flow of conversation (“discourse markers,” “procedural meaning”: see, e.g., Blakemore, 2002; Gibbs & Bryant, 2008). However, cases in which informative intentions are less overt have only recently been explicitly analysed in terms of communication and expression (e.g., Pezzulo et al., 2019).
Consider two people, Jane and Paul, walking towards one another on a relatively narrow street. Jane makes a clear movement towards the right in order to make her action predictable to Paul. This informs Paul that Jane intends to proceed on the right, and it can do so even if Paul does not recognise that Jane has this informative intention. Alternatively, Jane might exaggerate her movement to the right. This makes her informative intention more manifest: Paul can hence infer that Jane intends him to believe that she will proceed on the right, and he can rely on this. This raises several questions: When, why, and to what extent do individuals make their informative intentions manifest? When should Jane not just clearly move, but exaggerate her movement to one side? And to what extent? Correspondingly, on the audience side: To what extent does recognition of the actor's informative intention contribute to successful coordination smoothing? These are all empirical questions whose answers depend on how individuals take into account the constraints and the affordances of the situation, and how they navigate the graded dimensions of human expression.
Many results in the experimental study of joint action suggest that people indeed exaggerate or otherwise adjust their actions to the extent that it is useful to do so for the purposes of informing others; or, in other words, they make the least costly difference that is large enough to make a difference (e.g., Curioni, Vesper, Knoblich, & Sebanz, 2019; McEllin et al., 2018a, 2018b; Schmitz, Vesper, Sebanz, & Knoblich, 2018). Observers, in turn, attribute to others commitments to behaving in the predicted way just to the extent that those others are perceived as acting on a communicative intention (see, e.g., Bonalumi, Tacha, Scott-Phillips, & Heintz, 2020; Bonalumi, Michael, & Heintz, 2021; Gibbs & Bryant, 2008). In sum, we are suggesting that coordination smoothing is a form of human expression, in which people competently navigate the graded aspects of expression.
8.2. Teaching
Human teaching is richly diverse. Ethnographies of teaching reveal practices that span the full range of human expression, from tolerated observation, in which a skilled individual just allows others to observe her in practice, to, at the other extreme, direct verbal statements from the teacher, which the learner is expected to internalise (e.g., Lave & Wenger, 1991; Marchand, 2010; Sugiyama, 2021). Most actual instances of teaching lie between these extremes and draw on, for instance, repeated demonstration, performance, exaggeration, role-play, and countless other forms of expression (see, e.g., Kline, 2015, for a recent review). We suggest that, as with coordination smoothers, one way to organise the diversity of teaching is in terms of graded distinctions in human expression. Different means of teaching vary in the extent to which the teacher makes her informative intention manifest, and also in the extent to which recognition of the teacher's informative intention contributes to successful interpretation, and hence learning.
Consider, for instance, a dance teacher (see, e.g., Downey, 2008, for cognitively informed ethnography of dance teaching). At one extreme, she might simply repeat a dance step multiple times, possibly from a range of different angles, allowing learners to observe, without any further guidance about which aspects of the movement to attend to. Here the teacher has an informative intention, and this intention is not hidden at all, but the teacher does not make the intention manifest, and the learners learn by tracking means-end relations. Alternatively, at the other extreme, the teacher might openly exaggerate some of her movements and hence highlight especially relevant aspects, which would otherwise remain unnoticed. By doing this the teacher makes her informative intention manifest. This triggers in learners spontaneous presumptions of relevance (sect. 4.3), which allow them to differentiate exaggeration from the actual target behaviour, and hence identify what is especially relevant about the teacher's movements. In between the extremes are cases where the teacher makes her informative intention somewhat manifest, such as when she slows her movement only slightly. In these ways and others, teaching is a phenomenon that, in its diversity, spans the two graded dimensions of human expression we highlighted above.
When should different modes of teaching be employed? Sometimes it is suitable to simply perform the target behaviour as usual and just allow observation; sometimes it is suitable to make an informative intention somewhat manifest; and sometimes it is necessary that the teacher's informative intention be made wholly manifest. In particular, by triggering learners' presumption of relevance, teachers can teach things even though what makes those things relevant is opaque to the learners (Gergely & Csibra, 2006). A real-world example is teaching the counting routine to children, who learn by presuming that the routine is relevant even though they do not yet understand its real utility. This mode of teaching is arguably crucial for conceptual change (Heintz, 2011).
This perspective on teaching, as a subclass of human expressive behaviours, contrasts with approaches that treat teaching as a distinct behaviour in need of its own gradual evolution (e.g., Csibra & Gergely, 2011; Gärdenfors & Högberg, 2017). It can also help to explain why teaching is prevalent in human societies but relatively rare in other species (see, e.g., Hoppitt et al., 2008; Thornton & Raihani, 2008, for comparative perspectives). Human teaching involves the dynamic use of unleashed expression. Analogous behaviours are observable in other species, but not with the same dynamic, open-ended, and flexible range of behaviours that are afforded by truly unleashed expression, and readily exploited by human teachers and learners.
8.3. Punishment
Punishment is not always, or intuitively, thought of as a means of expression. Particularly within cognitive science and cognate fields such as behavioural economics, punishment has traditionally been modelled simply as retribution or incentive, such that it discourages or deters specific behaviours. The actual delivery of punishment is then necessary only to maintain the integrity of the incentive structure. Yet the rewards and punishments that humans tend to produce are actually inefficient for this goal. That is, contrary to the intuitive model, they do not incentivise the target behaviour well (Cushman, Sarin, & Ho, 2022; Ho, Cushman, Littman, & Austerweil, 2019). The way people punish also includes an important expressive dimension (ibid.). More precisely, punishment is, we suggest, used to inform others that future exploitative behaviour will result in future costs (see also, e.g., Sripada, 2005).
But why, then, is this expressive function not patently apparent? In other domains (linguistic communication, teaching, art) expression and communication are utterly plain to see. We suggest that punishment is usually most effective when its communicative aspects (sect. 3.3) are somewhat hidden, even though its informative aspects (sect. 3.2) are present.
The crucial point is that in ordinary social relations, punishment is credible only if the incentives behind it are perceived to be stable; but in fact, in ordinary social relations the incentives to inform are unstable. Specifically, they are dependent on the possibilities of future collaboration: If we are unlikely to interact in the future, I have no substantive incentive to inform you that your behaviour was unacceptable. So if the real incentives that motivate punishment were actually made manifest, they would be revealed as unstable and would hence undermine the credibility of punishment as a means of informing. In this respect punishment is akin to generosity. We mentioned in section 3.2 that there is a slight paradox to generosity, in the sense that while it can be motivated by an intention to advertise oneself as prosocial, this intention should not itself be too manifest, lest the act of generosity be seen as insincere. We are suggesting that a similar dynamic plays out on the punishment side: Punishment does not credibly demonstrate a willingness to retaliate against anti-social behaviour if it is perceived as being motivated only by an intention to demonstrate a willingness to retaliate against anti-social behaviour. So for both generosity and punishment, the communicative aspect should not be salient, because this undermines the very purpose of the behaviour, namely to inform others of likely future costs and benefits.
Institutional forms of punishment, in particular the legal punishments of nation states, provide a revealing contrast. In jurisprudence the communicative aspects of legal punishment have long been recognised: Punishment is described as, for instance, “a conventional device for the expression of attitudes of resentment and indignation, and of judgments of disapproval and reprobation” (Feinberg, 1965, p. 400; more recently see, e.g., Duff, 2001; Hampton, 1992; Primoratz, 1989). The expressive dimension of punishment is straightforwardly recognised in the legal domain because, we suggest, there is in this domain no real doubt that the punisher (the nation state) has a long-term, stable incentive to inform the audience (citizens) about what is unacceptable. Nation states hence have no real need to hide the communicative dimensions of their punishments.
The expressive dimensions of punishment can be investigated in much more detail. In particular, we know of no studies that have focused on whether, or to what extent, people interpret or understand punishment as communicative, nor on the effectiveness of punishment as a means of expression. Such studies would form an important bridge between cognitive pragmatics and other fields, such as the newly emerging area of experimental jurisprudence (Sommers, 2021). We also know of no existing research examining the expressive dimensions of legal punishment from a historical perspective.
8.4. Art
Art is clearly expressive in some way, and audiences interpret artistic outputs in open-ended ways. Modern audiences in particular are granted a great deal of autonomy in how they engage with art, and are encouraged to develop and seek out their own, often highly personal interpretations. We suggest that, like the other examples above, the open-endedness of artistic expression derives from the natural character of human expression more broadly; and hence that audiences' interpretations of artworks, however personal, are nevertheless often prompted by, and derived from, the same set of cognitive processes that govern more ordinary forms of communication.
Crucially, what differentiates artistic expression from ordinary behaviour is not any fundamental aesthetic quality as such, but rather the overt presentation of an object as a source of aesthetic experience worthy of attention. This is the conclusion reached in many lines of research and argument from at least four different fields: traditional art theory (e.g., Danto, 1964; Dickie, 1987), cognitive pragmatics (e.g., McCallum, Mitchell, & Scott-Phillips, 2020; Pignocchi, 2019), philosophy of mind (e.g., Fodor, 2012), and social anthropology (e.g., Dissanayake, 1988, 2003). In these four literatures, the overt presentation of objects as worthy of attention is labelled, respectively, “Artworld,” “ostensive,” “Gricean,” and “making special.” This consensus in turn suggests that while proclivities towards aesthetic experience might be observable in other species, the evolutionary emergence of cognitive capacities for unleashed expression, as a core part of the human cognitive phenotype (sect. 6), allows those proclivities to be expressed in overtly intentional ways, thus differentiating art from other forms of aesthetic experience.
The emergence of art institutions can further reinforce these effects. Contemporary art galleries in particular use white walls, open spaces, and other features of curation to present artworks as items to be considered and appreciated, hence triggering, we claim, the audience presumptions of relevance that are an indelible part of ordinary communication. These and other institutional effects, which can generate highly personal, even idiosyncratic interpretations, are well established in art theory. What cognitive pragmatic perspectives help to provide is a description of how these effects exploit and otherwise build on the way audiences spontaneously interpret ostensive behaviour in more ordinary forms of social interaction (McCallum et al., 2020; Pignocchi, 2019).
8.5. Languages
Language use is quintessentially ostensive. Unlike some of the other means of expression described above, language use involves making informative intentions wholly manifest. We are arguing, in other words, that cognitive capacities for ostensive communication are foundational to language use: There could not be any languages or linguistic communication without the prior existence of cognitive capacities for ostensive communication (see also, e.g., Levinson, 2006; Scott-Phillips, 2015; Tomasello, 2008). As we put it in section 1, natural languages exploit unleashed expression, not the other way around.
Languages have, of course, their own particular features that collectively distinguish them from other means of communication. In particular, linguistic communication makes ubiquitous and multilayered use of communicative conventions (phonemes, morphemes, words, grammars, and so on), which function to associate particular behaviours (speech, gesture) with particular inferences that the communicator intends to trigger in the audience. This “pragmatics-first” perspective on the nature of languages aligns with usage- and construction-based approaches to grammar, which emphasise how linguistic constructions are used as a means to provide evidence of speaker meaning (e.g., Bybee & Beckner, 2010; Goldberg, 2003; Goldberg & Suttle, 2010; Hartmann & Pleyer, 2021; Schmid, 2020; Tomasello, 2003).
Where do communicative conventions come from? In recent decades the emergence and stabilisation of communicative conventions have been documented in several real-world case studies (e.g., Brentari & Goldin-Meadow, 2017; Kegl, Senghas, & Coppola, 1999; Meir et al., 2010), and studied experimentally in the laboratory (e.g., Fay, Arbib, & Garrod, 2013; Granito, Tehrani, Kendal, & Scott-Phillips, 2019; Motamedi, Schouwstra, Smith, Culbertson, & Kirby, 2019; Raviv, Meyer, & Lev-Ari, 2019; Schouwstra & de Swart, 2014). What is commonly observed in these literatures is that behaviour sufficiently similar to behaviour that was previously successful at informing tends to be interpreted by audiences as a further token of the same type, even after just one interaction; and that further repetition causes the focal behaviours to become increasingly conventional. How and why does this happen?
In answering this question, what is not often recognised is that the very same cognitive capacities that make ostensive communication possible in the first place also play a pivotal role here. In particular, without audience presumptions of optimal relevance (sect. 4.3), behaviour that resembles past communicative behaviour is mysterious and strange. Why repeat a past behaviour, and draw attention to it? Such behaviour is worth doing only if the attention-grabbing repetition of a past behaviour triggers in audiences the interpretation that the behaviour is being used for the same or similar purposes as previously. So past events have a role in communication not simply because of statistical learning of associations, but because alluding to past events is typically the most effective way to trigger audience presumptions of relevance, and hence interpretative inferences that are the same as or similar to those triggered previously.
Furthermore, this allusion to past events will often mean that a given behaviour can afford to be slightly less complex or less elaborate than the previous version, so long as the allusion is still made. Communicative success becomes increasingly governed by the simple fact that echoing how behaviours have been successfully used in the past is the most efficient means of prompting the intended inferences in the audience. Repeated many times over, this allows gradual simplification of the stimuli, and in time helps to shape many of the features that are characteristic of natural languages, such as displaced reference, compositionality, predicate-argument structure, low levels of resemblance between form and use (“symbolism”), statistical relationships between word length and frequency of use (e.g., “Zipf's law of abbreviation”), and so on. These processes can, we believe, be fruitfully analysed within an epidemiological framework (e.g., Enfield, 2003, 2014; see also Scott-Phillips, Blancke, & Heintz, 2018).
Finally, we note that linguistic stimuli are processed by some dedicated cognitive capacities (see, e.g., Hagoort, 2017, for a recent summary). Crucially, these capacities appear to work in parallel with cognitive processes of ostensive communication more broadly. On the comprehension side in particular, inference of what is said and inference of what is meant are not serial, with one following the other, but instead seem to impact on each other in a dynamic process of parallel “mutual adjustment” (see, e.g., Carston, 2002; Sperber & Wilson, 2007; Wilson, 2004, for post-Gricean descriptions of this process; and, e.g., Paunov, Blank, & Fedorenko, 2019; Spotorno, Koun, Prado, van der Henst, & Noveck, 2012; Vanlangendonck, Willems, & Hagoort, 2018, for neuroscientific evidence). Further and deeper integration of findings in psycholinguistics and cognitive pragmatics is an important future research goal (Gibbs & Colston, 2020; Noveck, 2018); but in any case, the evolutionary emergence of these dedicated capacities must have followed, rather than preceded, the evolutionary emergence of ostensive communication described in section 6.
9. Conclusion: Rethinking pragmatics
Historically speaking, pragmatics has been situated on the periphery of the language sciences, usually appearing in textbooks only as a fringe topic. We have argued, in contrast, that cognitive capacities for ostensive communication are foundational, because they unleash expression on a grand scale. This in turn redefines the domain of pragmatics itself. Rather than being narrowly conceived as the study of how context influences the interpretation of utterances (which is how it is conceived in many approaches), pragmatics should be characterised as the study of how, and by what many means, informative intentions are satisfied. The core questions for pragmatics are how informative intentions are made manifest, and to what effect (e.g., Allott & Wilson, 2021). Language use and other conventionalised means of expression are the most salient specific instances, and are clearly central to human sociality, but there are many others too.
Here we have grounded this rethinking of pragmatics in a broad evolutionary and cognitive context. Human communication has long posed a challenge to evolutionary biology and signalling theory, because of its versatility, diversity, and clear proneness to deception (sect. 2). We have described how this problem is resolved in humans, allowing truly unleashed expression (sect. 5); described key graded differences between the specialised capacity of mind that drives human communication and other means of social cognition (sects. 3 and 4), some of which are shared with other great apes (sect. 7); and provided a crucial proof of evolvability, by relating these graded differences to specific and distinctive aspects of the human social ecology (sect. 6). We also described how several different means of human expression each employ the relevant cognitive capacities in interestingly different ways (sect. 8). If we are right about this, then the evolutionary emergence of specialised, cognitive means of influencing other minds (sect. 3.3), and of recognising what others want to do to your mind (sect. 4.3), is a root cause of many of the most distinctive aspects of human behaviour and societies.
An essential future goal is to establish a formal foundation for this reconceived pragmatics. An important first step in the development of scientific understanding is informed description of the phenomenon of interest, recognising its full complexity while simultaneously providing some provisional order and generality. We hope to have contributed here to that descriptive step, which is too often neglected in contemporary cognitive psychology and cognate disciplines (e.g., Doliński, 2018; Oude Maatman, 2021; Rai & Fiske, 2010; Rozin, 2001; inter alia). Formal models build on these descriptive foundations. In this respect Bayesian models of interaction, in which production and comprehension are modelled as interconnected planning problems over others' mental states, are a particularly promising direction (e.g., Goodman & Frank, 2016; Ho et al., 2019; Shafto, Goodman, & Griffiths, 2014). Specific models differ in their details, and we do not subscribe to all of the assumptions they make. Moreover, we do not believe that any existing model is sufficiently general to cover the full range of prototypical cases of communication (e.g., pointing, gesture, language use), let alone the broader diversity we highlighted in section 8. Nonetheless, we do believe that modelling human interaction in these terms is an important and promising research direction. We hope that our description of the evolutionary and cognitive foundations of human expression, in all its diversity, will aid this agenda.
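As a rough illustration of the kind of Bayesian model of interaction referred to above, the following Python sketch implements a minimal Rational Speech Act (RSA) model in the spirit of Goodman and Frank (2016). It is a toy under our own assumptions, not a model proposed in this article: the two-referent world, the lexicon, the uniform prior, the absence of utterance costs, and the rationality parameter `alpha` are all hypothetical choices. A literal listener interprets an utterance by its truth conditions; a speaker chooses utterances by reasoning about that listener; and a pragmatic listener interprets utterances by reasoning about the speaker.

```python
"""Toy Rational Speech Act (RSA) model: a hypothetical illustration only."""

# Hypothetical toy world: two faces and two possible utterances.
# "glasses" is literally true of both faces; "hat" is true of only one.
REFERENTS = ["face_glasses", "face_glasses_and_hat"]
UTTERANCES = ["glasses", "hat"]
LEXICON = {
    ("glasses", "face_glasses"): 1.0,
    ("glasses", "face_glasses_and_hat"): 1.0,
    ("hat", "face_glasses"): 0.0,
    ("hat", "face_glasses_and_hat"): 1.0,
}
# Assumed uniform prior over referents.
PRIOR = {r: 1.0 / len(REFERENTS) for r in REFERENTS}


def normalize(scores):
    """Rescale non-negative scores to sum to 1 (an all-zero input is returned unchanged)."""
    total = sum(scores.values())
    if total == 0:
        return dict(scores)
    return {k: v / total for k, v in scores.items()}


def literal_listener(utterance):
    """L0: the prior, restricted to referents the utterance is literally true of."""
    return normalize({r: LEXICON[(utterance, r)] * PRIOR[r] for r in REFERENTS})


def speaker(referent, alpha=1.0):
    """S1: prefer utterances that lead L0 to the intended referent (assumes alpha > 0, no costs)."""
    return normalize({u: literal_listener(u)[referent] ** alpha for u in UTTERANCES})


def pragmatic_listener(utterance, alpha=1.0):
    """L1: Bayesian inversion of the speaker model, P(r | u) proportional to S1(u | r) * P(r)."""
    return normalize({r: speaker(r, alpha)[utterance] * PRIOR[r] for r in REFERENTS})


if __name__ == "__main__":
    print(pragmatic_listener("glasses"))
```

On hearing "glasses", the pragmatic listener in this toy setup assigns probability 0.75 to the face without a hat, because a speaker who meant the other face would more likely have said "hat". This captures, in a very limited way, the idea that comprehension involves reasoning about why the communicator produced this behaviour rather than another; as noted above, no such model yet covers the full range of human expression.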
More broadly, we agree with recent arguments that the human evolutionary sciences are presently too “cognition blind” (e.g., Heyes, 2019; Morin, 2016; Singh et al., 2021), and that greater theoretical unity will be achieved by deeper consideration of the cognitive processes that underpin otherwise diverse human behaviours. The case of communication and expression is, we believe, a clear case in point. Quite commonly, particular types of expressive behaviour (e.g., language use, teaching, punishment, art) are each considered in isolation, without much consideration of the possibility that they might in fact derive from the same underlying biological trait: As if the evolution of running were considered in isolation from the evolution of walking, when in fact both are derived subfunctions of a unified capacity for bipedal locomotion. We have argued that the same applies here: Different types of human “expressive act” are each derived subfunctions of a unified capacity for ostensive communication.
Acknowledgments
For comments on previous drafts, we thank Josep Call, György Gergely, Günther Knoblich, Hugo Mercier, Dan Sperber, Cordula Vesper, Elizabeth Warren, Eva Wittenberg, and members of the Smash, Oscar and Aces research groups at Central European University. For extensive discussions on the cognitive and evolutionary basis of human expression, we thank the “Relevance Researchers Network,” and all members of the Somics project, “Constructing social minds: Coordination, communication, & cultural transmission.”
Financial support
CH and TS-P were financially supported by the European Research Council, under the European Union's Seventh Framework Programme (FP7/2007-2013)/ERC grant agreement no. 609819 (Somics project).
Conflict of interest
None.
Target article
Expression unleashed: The evolutionary and cognitive foundations of human communication