
Puzzling out graphic codes

Published online by Cambridge University Press:  02 October 2023

Olivier Morin*
Affiliation:
Max Planck Institute for Geoanthropology, Minds & Traditions Research Group, Jena, Germany [email protected]; https://www.shh.mpg.de/94549/themintgroup Institut Jean Nicod, CNRS, ENS, PSL University 29, Paris, France

Abstract

This response takes advantage of the diverse and wide-ranging series of commentaries to clarify some aspects of the target article, and flesh out other aspects. My central point is a plea to take graphic codes seriously as codes, rather than as a kind of visual art or as a byproduct of spoken language; only in this way can the puzzle of ideography be identified and solved. In this perspective, I argue that graphic codes do not derive their expressive power from iconicity alone (unlike visual arts), and I clarify the peculiar relationship that ties writing to spoken language. I then discuss three possible solutions to the puzzle of ideography. I argue that a learning account still cannot explain why ideographies fail to evolve, even if we emancipate the learning account from the version that Liberman put forward; I develop my preferred solution, the “standardization account,” and contrast it with a third solution suggested by some commentaries, which says that ideographies do not evolve because they would make communication too costly. I consider, by way of conclusion, the consequences of these views for the future evolution of ideography.

Type
Author's Response
Copyright
Copyright © The Author(s), 2023. Published by Cambridge University Press

R1. Introduction

One of my goals in presenting the target article to commentators was to show that the study of graphic codes can and should be a field of study in its own right; that the conventions linking inscribed symbols to specific meanings deserve to be studied on their own terms. Disciplinary divides have tended to relegate the study of writing, emblems, pictographs, specialized notations, and so on to two approaches that do not quite do them justice. One approach lumps them together with all possible means of expression – from visual arts to gesture and language – to be studied by a general theory of signs. Another approach treats them as borderline cases of linguistic communication – interesting insofar as they can validate theories developed within linguistics, but treated overall as linguistics' poor relation. Solving the puzzle of ideography requires (or so I argued) an approach to graphic codes that takes them and their unique properties seriously. Most of the contributors to this rich and diverse collection of commentaries appear to share this ambition. I am grateful to all of them for showing that the study of graphic codes may interest a broad range of disciplines, from semiotics to anthropology, from generative linguistics to typology, and from archaeology to neuroscience.

This response is organized into six sections. Section R.2 addresses the commentaries that see writing and other graphic codes as continuous with visual arts, comic books, and other means of expression that rely on iconicity; it defends the view that graphic codes do not rely on iconicity alone, and possess special properties by virtue of being codes. Section R.3 clarifies the claim that writing is a specialized notation of spoken language. Section R.4 revisits the "learning account," a solution to the puzzle of ideography based on the view that a generalist ideography would be unlearnable. I address new arguments that the commentaries put forward in favor of the learning account, but also against it. Section R.5 clarifies my preferred solution to the puzzle of ideography, the "standardization account"; it holds that generalist ideographies fail to evolve not because they are unlearnable, but because it is difficult for their users to align on a shared code. Section R.6 addresses a different solution to the puzzle of ideography – a simpler one, which says that ideographies do not develop because graphic symbols are too costly to produce. Section R.7 concludes with an attempt at synthesizing the many commentaries that responded to my speculative views on the future of ideography in the digital world.

R.2. Graphic codes are more than mere images

R.2.1. On distinguishing codes from noncodes

The target article was focused on the evolution of graphic codes, rather than graphic communication in general. The key property that marks out graphic codes from other forms of visual communication is the importance of conventional (or standardized) mappings between symbols and meanings. Writing systems, heraldic emblems, musical notations, and so on are highly codified in this sense. Visual art forms like paintings, graffiti, comic books, and so on, are not. This does not mean visual art does not carry information: As Sueur & Pelé's commentary notes, even simple abstract doodles or abstract paintings produced by apes possess informational content, in the sense that they form shapes that are both visually complex and predictable to a degree. But graphic codes carry information in a way that is quite different, and much more powerful. Graphic codes possess two important properties: They allow their users to compress a great deal of information into a few simple signs (Garrod, Fay, Lee, Oberlander, & MacLeod, 2007; Tamariz & Kirby, 2015; Winters & Morin, 2019), but they require users to learn the code. This capacity to compress information is both a defining property and a key advantage of graphic codes, as Moldoveanu notes when he aptly defines graphic codes as a form of "source coding." Being compressed, codified signals tend to be simpler, cheaper to produce, and less cumbersome to store. In contrast, noncodified representations convey a much smaller quantity of information. They tend to be more complex, thus more expensive to produce; but they can be interpreted immediately without the need to master a code. How do we recognize that a mode of expression relies on codification? Two simple cues are the amount of time or effort needed to learn the code, and the need for translation. Using this yardstick, we can easily see, for instance, that comic books are less codified than natural language, and that Chinese characters are as codified as the vocabulary of a spoken language.
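The compression–learnability tradeoff described above can be sketched with a toy example (the codebook and the iconic "descriptions" below are invented purely for illustration):

```python
# Toy contrast between codified and noncodified graphic signals.
# A codified signal is short, but opaque without the shared code;
# an iconic depiction is interpretable by anyone, but far more costly.

codebook = {"mountain": "山", "sun": "日", "moon": "月"}  # shared convention

def encode(meaning: str) -> str:
    """Compressed signal: one conventional symbol per meaning."""
    return codebook[meaning]

def depict(meaning: str) -> str:
    """Iconic signal: a costly, code-free description of the referent."""
    descriptions = {
        "mountain": "three peaks rising from a flat base",
        "sun": "a bright disc with rays radiating outward",
        "moon": "a pale crescent against a dark sky",
    }
    return descriptions[meaning]

for meaning in codebook:
    # The codified signal is always cheaper to produce and store...
    assert len(encode(meaning)) < len(depict(meaning))
    # ...but only readable by someone who has learned the codebook.
```

The asymmetry is the point: the receiver of `encode`'s output must have paid the up-front cost of learning `codebook`, whereas `depict`'s output presupposes no prior convention.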

These points did not persuade all commentators. Zhang, Hu, Li, & Chen (Zhang et al.) argue that Chinese writing is iconic, not symbolic: The characters' meaning is immediately accessible without knowing the underlying code. Cohn & Schilperoord argue that comic books are a visual language on a par with spoken languages. Other commentaries (e.g., Straffon, Papa, Øhrn, & Bender [Straffon et al.]; Wisher & Tylén) broadly agree with the target article but underscore the importance of visual arts and iconic resources in graphic communication.

R.2.2. On iconicity

Signs are iconic, according to the target article's definition, if their meaning can be transparently derived from their form alone, without knowledge of a specific conventional code. Following a broad scholarly consensus, I claimed that writing, like language, is not iconic in this way. Not all commentators agree. Zhang et al. (and perhaps Yan & Kliegl, more cautiously) argue that Chinese writing is iconic to a certain extent, whereas Wisher & Tylén cast doubt on the general idea that language is an arbitrary code. Zhang et al. argue that most Chinese characters directly describe the shape of things, citing characters such as 日 (rì, sun), 月 (yuè, moon), and 山 (shān, mountain). Yan & Kliegl echo this view in a more watered-down version, arguing that "…some [Chinese] characters are…largely recognizable even by untrained eyes (e.g., 山 for mountain and 田 for farmland)." Is this true?

The studies that investigated the iconicity of Chinese characters (Koriat & Levy, 1979; Luk & Bialystok, 2005; Xiao & Treiman, 2012) generally use a two-alternative forced choice (2AFC) paradigm. Participants are shown a character (e.g., 山) and asked to choose between two associated meanings, the correct one (mountain) and an incorrect distractor (e.g., lake). A 2AFC experiment is considered conclusive when a statistically significant proportion of guesses rises above the level of chance (50%). This sets a rather low bar for a symbol to count as iconic. True iconicity would be obtained if participants spontaneously guessed a symbol's accurate meaning on seeing the symbol, without being primed with two meanings that include the right one. (This happens less than 2% of the time in Luk & Bialystok, 2005.) Even in the 2AFC task, performance on cherry-picked sets of iconic characters is mediocre. In the most recent study on the topic (Xiao & Treiman, 2012), the characters cited by the commentaries, 山, 田, 月, and 日, are recognized at rates of 65, 85, 55, and 35%, respectively. And are such characters the majority? No, as Yan & Kliegl recognize. Out of the 213 (highly common) characters studied by Xiao and Treiman (2012), only 15 are guessed above chance in the 2AFC.
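To make the "low bar" concrete: whether a 2AFC result beats the 50% chance level comes down to a simple exact binomial test. The sketch below uses only the standard library; the sample size and accuracy figures are hypothetical, chosen for illustration rather than taken from the studies cited above:

```python
from math import comb

def binom_tail(k: int, n: int, p: float = 0.5) -> float:
    """Exact one-sided probability of observing k or more successes
    out of n trials under a null success rate p (chance in a 2AFC task)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

# 60 correct guesses out of 100 already counts as "significantly above
# chance", even though 60% accuracy is nowhere near the spontaneous,
# unprimed recognition that true iconicity would predict.
print(binom_tail(60, 100) < 0.05)  # True
```

This is why a statistically significant 2AFC result is weak evidence of iconicity: with enough trials, even a symbol that conveys only a sliver of information about its referent will clear the 50% threshold.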

The example above underlines the perils of relying on 2AFC tasks to show that symbols or wordforms are iconic as opposed to arbitrary. Such studies show that symbols provide some information about their referents by iconicity alone, and they usually show that this amount is small. They do not demonstrate that the symbols derive most of their informative power from iconicity alone. More generally, studies showing nonrandom associations between (some) wordforms and their meaning are intriguing (Dingemanse, Blasi, Lupyan, Christiansen, & Monaghan, 2015; Monaghan, Shillcock, Christiansen, & Kirby, 2014); but they do not claim to challenge the view that language mostly rests on conventional mappings between symbols and referents (Lewis, 1969; Skyrms, 2010). They actually show the opposite: For instance, Monaghan et al. (2014) estimate that iconicity predicts only 0.02% of the variance in one-syllable English wordform properties. Such differences in degree are massive enough to justify drawing a sharp boundary between spoken or signed languages, on the one hand, and on the other, means of expression that rely overwhelmingly on iconicity, such as visual arts.

R.2.3. Comics are not a visual language

In this spirit, I presented two simple arguments against the view that comic books rely on a visual language (Cohn, 2013). First, when exported to a different country, comic book drawings require no translation, whereas their written dialogues do. Second, the conventions of a particular genre (manga, for instance) can be assimilated in a few hours, compared to the years required to attain fluency in a spoken language. More fine-grained arguments could be given. Wisher & Tylén, for instance, note that "the comics principle … has not evolved into conventionalized ideographic codes," in particular when it comes to encoding anaphoric relations. In their commentary, Cohn & Schilperoord address none of these objections.

Most of the disagreements between me and Cohn & Schilperoord spring from the way we understand conventionality, or standardization (two words that I use interchangeably). The topic of the target article is the evolution of codes, understood as standardized (or conventional) mappings between symbols and their meanings (de Saussure, 2011; Lewis, 1969; Millikan, 1998; Scott-Phillips, 2014). I took the view that the kind of standardization that matters is the standardization of the code that pairs meanings and symbols – in other words, codification. Linguistic codes, I argue, are far more standardized than many visual codes (including comics). Thanks to this, they can carry and store much more information, compared to merely iconic means of expression. Cohn & Schilperoord's reply stems from a different view of conventionality, or standardization. In their definition, contrary to mine, standardization has nothing to do with the way images map to meanings. Thus, purely iconic signs can be highly standardized. They provide a number of examples of iconic signs that are conventional, in the sense that artists within a given tradition depict eyes or fists in the same way, quite different from the way that is taught in other traditions.

So far, we simply have different ways of using the same words; but the disagreement becomes substantial when Cohn & Schilperoord go on to claim that standardization has nothing to do with the distinction between iconic and noniconic signs. This, in my view, is tantamount to dismissing the contribution that standardized codes make to communication. Indeed, they go on to argue that comic books, being standardized, are a full-blown language on a par with spoken languages. The fact that comic books overwhelmingly rely on iconicity is irrelevant in their view, because spoken languages (they claim) also rely on iconicity to some extent. (A point I addressed in the previous section.)

Suppose we agree, and the formal standardization of styles is all that matters for comic book styles to count as full-blown languages. That would compel us to put any form of graphic expression showing variations in style on the same footing as spoken language. This includes cave art (Guthrie, 2006) and culturally transmitted pottery decorations (Crema, Kandler, & Shennan, 2016); beyond this, cultural patterns have been found, or claimed, for nest building in some bird species (Madden, 2008), for primate tool making, and so on. All these things may be called "visual languages" – and why not? They are, after all, visual forms of expression, forming cultural patterns. Yet this simplification comes at a cost. It renders us incapable of putting a name on something that makes spoken and signed languages uniquely powerful and informative. That something is a shared code. A shared code allows language users to convey information cheaply and efficiently, saving them the effort of depicting or explaining meanings that are already encoded. Having a shared code requires standardizing mappings between symbols and meanings – mere standardization of forms does not suffice.

R.2.4. Visual arts cannot fulfill all the functions of graphic codes

I welcome Straffon et al.'s eloquent plea for taking seriously the variety of visual communication systems, and I fully agree with them that writing is very far from being the only efficient way of transmitting information with images. I am less persuaded by the claim that visual arts alone can enable rich forms of communication comparable to those that graphic codes make possible. I worry that we may be falling into an old trap: The temptation to underestimate the degree to which visual communication is codified. This temptation has been a recurrent problem in the study of American graphic codes – the region on which Straffon et al. focus their commentary. Cuna shamans' pictographs were dismissed as mere drawings before anthropologists like Severi (2019) showed how they worked; the very idea of a writing system native to America was dismissed until quite late (e.g., Gelb, 1963); and it took a long time before the sophisticated encoding system of khipus – not a writing system but a complex graphic–haptic code – was taken seriously (Urton, 2017). Precedents like this make me wary of embracing the highly controversial view that Teotihuacan had no writing (Helmke & Nielsen, 2021, make a good case for a native glottographic writing system at Teotihuacan). Likewise, I doubt the Inka could have held together a vast empire using visual arts alone: They could hardly have done it without their khipus. These minor disagreements aside, I agree with Straffon et al.'s most important point: The evolution of writing was highly contingent and unpredictable, due in part to the availability of graphic codes that could fulfill some of its functions.

To summarize, the puzzle of ideography is a puzzle about the evolution of graphic codes. Graphic codes are mostly conventional: They cannot be read or produced fluently by someone who does not know the underlying standard for pairing symbols with meanings. As a corollary, many ideographic codes are not iconic at all (algebraic signs, logical symbols, monetary symbols, etc.). Some are only residually iconic, like musical notations. The relative height of notes on the staff is iconically linked to their pitch, but this does not get us very far if we want to know a note's exact pitch, because the clef symbol is not encoded iconically, and neither are the accidentals, and many aspects of the code have no iconic meaning – for example, the fact that white notes are longer than black ones. This is why codes like this one interest me: Because they can only evolve if their users are taught a series of conventions. Forms of expression that are not strongly codified, like visual arts, were not my primary concern, because they lack the crucial power of graphic codes: The power to compress information, thus allowing us to store and transport it.
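The residual iconicity of staff notation can be made precise with a small sketch. Staff height is iconic (higher position, higher pitch), but which note a given position names depends on the clef, a purely conventional anchor. The encoding below is deliberately simplified (no accidentals or durations); the clef anchors used are the standard ones (treble clef: bottom line E4; bass clef: bottom line G2):

```python
# Which note sits on the bottom line of the staff is fixed by convention.
CLEF_BOTTOM_LINE = {"treble": ("E", 4), "bass": ("G", 2)}
LETTERS = ["C", "D", "E", "F", "G", "A", "B"]  # octave numbers turn over at C

def note_at(steps_up: int, clef: str) -> str:
    """Name the note `steps_up` diatonic steps above the staff's bottom line."""
    letter, octave = CLEF_BOTTOM_LINE[clef]
    idx = LETTERS.index(letter) + steps_up
    return f"{LETTERS[idx % 7]}{octave + idx // 7}"

# Iconic part: within one clef, higher on the staff means higher pitch.
# Conventional part: the same staff position names different notes
# under different clefs.
print(note_at(0, "treble"), note_at(0, "bass"))  # E4 G2
```

Reading `note_at(0, …)` for the two clefs shows the point in miniature: the iconic dimension (the `steps_up` argument) only yields a pitch once the conventional anchor (the clef) is known.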

R.3. The language–writing nexus

Relatively few commentaries took issue with the specialization hypothesis. This hypothesis claims that graphic codes necessarily encode a small range of meanings, which confines them to restricted topics, such as music, mathematics, or (to use Sterelny's excellent example) chess games. What makes the specialization hypothesis important is its application to writing. Writing, being a graphic code, is a specialized notation, not a general-purpose one. What is it specialized for? I defend the (rather banal) view that writing mostly encodes elements of language – be they phonemes, syllables, or morphemes. This does not mean that I consider writing a record of speech, or a phonography: Linguistic elements like phonemes are abstract, contrastive categories, not sounds. But my view does entail that the basic components of writing systems encode linguistic units and cannot be processed without an understanding of the encoded language.

R.3.1. Why literacy requires linguistic competence

Three commentaries that do not object to the specialization hypothesis in general nonetheless wish to nuance the claim that writing encodes language. Two of them (Yan & Kliegl; Zhang et al.) are specific to Chinese writing. Sterelny's remarks are much more general. He makes three objections to the view that writing encodes language.

First, holding that view forces me to deny the possibility that someone achieves literacy in a language they cannot speak. This happens (in Sterelny's view) to academics who learnt a language through books alone. This case is very close to situations of literate diglossia (which the target article does mention). Diglossia typically occurs when a language spawns a literate variant that is overwhelmingly used by clerks. Classical Chinese, literate Arabic, or Renaissance Latin are cases in point. Such literate languages can develop on their own and become quite distinct from their spoken counterpart. The people who can read and write these languages can also read them aloud, and are trained to do so in specific settings (e.g., classical poetry recitations). This, to me, counts as a kind of linguistic competence, albeit one that lacks fluency. Today's classicists are able to speak Latin – their own literate kind of Latin. Sterelny seems to disagree, but holding this stance would force him to say that linguistic competence in Sumerian, ancient Egyptian, Hittite, and so on has entirely disappeared, turning most paleographers into strange impostors.

Sterelny turns to semantics for his last objection. If it were true that writing represents spoken language, he notes, then written sentences would be representations of spoken sentences, not of states of the world. Thus, the written sentence "Berlin is the capital of Germany" would have the corresponding spoken sentence as its truth condition: It would not be a statement of fact. This would make standard semantics inapplicable to written sentences. Sterelny reads a lot into my use of the verb "represent"; I more often wrote that writing systems encode spoken languages – probably a more adequate verb. Inscriptions that encode sentences should not be confused with the truth-bearing propositions that the encoded sentences may express, as Frege (for instance) made clear:

A sentence which an author writes down is primarily a direction for forming a spoken sentence in a language whose sequences of sounds serve as signs for expressing a sense. So at first there is only a mediated connection set up between written signs and a sense that is expressed. But once this connection is established, we may also regard the written or printed sentence as an immediate expression of a thought, and so as a sentence in the strict sense of the word. (Frege, 1920/1981, p. 260)

Written sentences, in this Fregean view, can be analyzed on two levels. On the first level, the printed inscription “Berlin is the capital of Germany” is an encoding of a possible spoken sentence, that is to say, a set of instructions or a recipe for forming a sentence (a set of possible spoken sentences, to be precise, because accentuation, prosody, etc. are not usually encoded). As an encoding, the inscribed sentence may be more or less accurate (for instance, it may contain typos); but it lacks full-blown truth-conditional meaning. On a second level, the sentence that the inscription encodes may express a proposition with truth conditions, if some additional conditions are fulfilled: The proposition must be expressed with the right assertoric force, and at the right moment (the proposition was neither true nor false in 1960 or 1750, but it is true today). Sterelny's challenge dissolves when we distinguish propositions, which are truth-bearers, from inscriptions, which are not.

R.3.2. Chinese writing

There was, on the whole, relatively little push-back against the glottographic view of Chinese writing advocated in the target article. Zhang et al. dispute it, but only concerning its earliest manifestation, the writing on Shang oracle bones. Even though the use of phono-semantic compounds seems attested on oracle bones (Boltz, 1993), and even though Old Chinese phonology has been successfully reconstructed, I agree that the writing system was highly ambiguous (Demattè, 2022). In this it is similar to other pristine inventions of writing, where the mapping between syllables and symbols took a long time to become systematic and standardized (see, e.g., Hermalin & Regier, 2019). Yan & Kliegl broadly agree that "Chinese encodes a natural spoken language just as alphabetic scripts do," but also point out the rather unique aspects of the relation between Chinese writing and language. Their insightful commentary contains claims that I doubt – how easy it is for Japanese readers to read Chinese, or how iconic some Chinese characters may be. There are also claims that I agree with: It is true that "phonetic radical[s] merely provides an unreliable clue to [their] pronunciation." However, the same claim can be made (to varying degrees) about most other writing systems except the most regular ones (e.g., Finnish or Hungarian writing), because many writing systems are highly irregular (including English). Overall, I take Yan & Kliegl's welcome qualifications as matters of nuance.

R.3.3. Strengthening the specialization hypothesis: The centrality of language

Thus, only a minority of commentaries challenge the specialization hypothesis, or the glottographic view of language defended in the target article, and they do not provide strong arguments to reject it. Other commentaries seem generally to endorse it, and three commentaries go much further than the target article. Overmann sees a more watertight boundary between writing and other graphic codes than I do; Harbour argues that graphic codes rely on the language faculty even when they do not encode the language that their users speak; Winters argues for a strong role of language not just in the emergence of writing, but in the evolution of all graphic codes.

Overmann's commentary emphasizes the uniqueness of writing (which she calls “visible language”) vis-à-vis other graphic codes. She argues that writing is entirely distinct from other graphic codes (e.g., musical or numerical notations), functionally, psychologically, and historically. I agree with her that writing was a rather singular invention, which needs to be distinguished starkly from other graphic codes. That being said, I am not convinced that writing has “no direct material precursors” (because all pristine inventions of writing had clear precursors, I assume the truth of Overmann's claim hinges on what we mean by “direct”). Nor do I think that the use of symbol recombination is unique to writing (see sect. R.7.3). The specialization hypothesis does not need to overplay the uniqueness of writing.

Harbour's commentary highlights the many surprising ways in which graphic codes, including writing, may rely on the language faculty, or at least on cognitive mechanisms that are shared with the language faculty broadly construed (such as recursion). Harbour deftly shows that some of the linguistic structure in writing systems is not actually derived from the language that they encode. Writing systems have a degree of autonomy from their target language, and may form structures that have no counterpart in it, like the determinatives found in Egyptian hieroglyphs. These structures can be analyzed using the tools of linguistics, because they behave according to the same rules as similar structures found in other languages – but not in the encoded language. Thus, Egyptian hieroglyphs reinvented determinatives, even though the language they encode lacks them. With examples like this one, Harbour makes a convincing case for the research program that studies writing systems as linguistic objects without reducing them to mere reflections of the structure of their target language.

Harbour goes on to suggest that, because linguistic structure pervades graphic codes, ideography was always dead on arrival (so to speak). If ideography is defined as a language-independent graphic code, and if language-like structures always find ways to sneak into graphic codes, even the codes not made to encode them, how could a language-free ideography ever evolve? This argument rests on an ambiguity in how we define "language." The way the target article defines them, ideographies are graphic codes that do not encode elements of a specific spoken (or signed) language. This leaves open the possibility that ideographies may themselves possess language-like features such as recursion, duality of patterning, or compositionality (see sect. R.7.3). Put differently, what differentiates a writing system from an ideography is the fact that writing systems encode a specific natural language that preexists them – not the fact that they possess general language-like features like recursion, syntax, and so on. Taking Harbour's argument to its extreme limit would lead us to a familiar position in the debates over ideography: Ideography cannot logically exist, because it is conceptually impossible to assign meanings to symbols without in some way using language (Boltz, 1993; du Ponceau, 1838). If this position were true, there would be no such thing as an ideograph, and we would be completely unable to describe the difference between symbols like 2, 3, , and the written words "one," "two," "love" (Edgerton, 1941). Ideography is simply the fact that some symbols encode ideas directly, bypassing words. These symbols may still possess rich language-like properties.

Winters's commentary further stresses the centrality of language to all graphic codes, not simply writing. He makes many important points, all of which I endorse. Most important perhaps is his remark that self-sufficient specialized graphic codes, like musical or mathematical notations, seem to evolve much more readily in literate societies. This does not mean societies without writing do not produce rich and sophisticated graphic codes – they do – but it does mean these codes are unlikely to be self-sufficient (i.e., they should rely on oral glosses). I agree with Winters that this is a natural consequence of the standardization account, because writing, itself a product of successful coordination, in turn becomes a powerful coordination device.

Winters raises a question also broached by other commentaries (Adiego & Valério; Gainotti): To what extent can a graphic code ever be independent of spoken language? Adiego & Valério argue that the creation and first acquisition of codes (if not their use) must involve linguistic communication. Gainotti shows, on the basis of clinical evidence with aphasic patients, that learning to communicate ideographically is not a domain-general process, but rather relies on language-specific cognition. The difficulties that aphasics experience in learning Bliss or other pictorial codes do suggest this, because their language faculty is damaged but their capacities for perception or memorization are preserved. For Winters, creating and learning a graphic code from scratch without using language is possible in theory, but unlikely in practice. One intriguing argument that he gives is the apparent lack of codified permanent visual marks in nonhuman animals, in contrast to the importance of acoustic codes in several species of birds and primates. Here again, I am tempted to agree; bower-bird nest decorations are the closest thing to a counterexample that comes to my mind, but they do not have codified meanings in the way that, for instance, vervet monkey calls do.

R.4. For and against the learning account

The learning account is a way to solve the puzzle of ideography by showing that acquiring a generalist ideography raises serious cognitive difficulties, contrary to spoken language. Alvin Liberman's theory of writing is a good example of a learning account. Besides, the puzzle of ideography is a relatively neglected problem, and Liberman is one of the few scholars who explicitly articulated it and proposed a cogent solution. Being keen not to attack straw men, I concentrated my critique on his specific argument. No commentator explicitly attempted to defend Liberman's own views, but several commentaries proposed alternative ways of showing that generalist ideographies raise a learning problem (Arsiwalla; Harris, Perfetti, & Hirshorn [Harris et al.]). Other commentaries sought, on the contrary, to strengthen the case against the learning account (Howard; Nephew, Polcari, & Korkin [Nephew et al.]).

R.4.1. Alternatives to Liberman's learning account

Harris et al. object to my critique of the learning account, because it focuses on Liberman and his motor theory (which they quickly dismiss). In their view, I should have criticized the learning account in a much more general way, going beyond Liberman's specific theory to include other possible versions of the learning account. This discussion gives me an occasion to do exactly that.

Harris et al. present an intriguing argument to defend the general idea of a learning account. Dyslexic as well as congenitally deaf individuals cannot easily link letters to sounds, but this would not be an obstacle to literacy if they could simply treat writing like an ideography. In other words, if humans had no problem learning arbitrary pairings between signs and meanings, dyslexics and deaf people could simply treat written words as ideographs – that is, take them as pointing directly to ideas, bypassing language. This, in their view, does not happen. Only a tiny minority of profoundly deaf people learn to read in this way (and then again, not very successfully). As for dyslexia, attempts to cure it by training learners to process words as ideographic symbols aggravate the condition instead of improving it. Harris et al. take this as a strong indication that ideography raises an insurmountable learning problem.

The learning difficulties that Harris et al. highlight are real and significant. The fact that they are also found for the Chinese script is an important reason to resist nonglottographic accounts of Chinese characters. But do they show that ideographies in general are unlearnable? No. The fact that deaf people or dyslexics usually fail to recognize written words by their visual shape alone does not make this point. Writing is not an ideography, so attempts to learn writing as if it were ideographic are bound to fail. The people Harris et al. refer to all have some mastery of the spoken language that writing encodes; they cannot simply ignore their linguistic knowledge – as Harris et al. acknowledge. Thus, I fail to see how Harris et al.'s argument could make a case against the learnability of ideographies. Writing is the very opposite of an ideography; dyslexics and deaf people struggle with writing precisely because writing encodes language.

A counterpoint to Harris et al.'s position is found in Sterelny's commentary. Sterelny considers that a glottographic view of writing cannot account for the success of profoundly deaf people in learning to read. Harris et al. blame the target article for taking deaf literacy too seriously; Sterelny criticizes it for failing to take deaf literacy seriously enough. I attempted to explain literacy in profoundly deaf people by noting that phonemes, syllables, or morphemes are not sounds but contrastive categories, making it theoretically possible to become literate in a language when one's main contact with that language is visual, not aural. Sterelny says this account is not “natural,” suggesting that he subscribes to the view that profoundly deaf people may learn a spoken language from print only, by mapping printed words directly onto sign language words (Hoffmeister & Caldwell-Harris, Reference Hoffmeister and Caldwell-Harris2014). This view can be disputed, however, because most deaf people who learn to read know the encoded spoken language from partial hearing, lip-reading, or signed versions of the spoken language. Whether this phonemic awareness acts as a help, a hindrance, or a mere byproduct, is not entirely clear, a point that my article should have paid more attention to. In any case, Harris et al.'s commentary makes evident the drastic limitations that profoundly deaf people face in learning to read (Hirshorn & Harris, Reference Hirshorn and Harris2022).

Arsiwalla's thoughtful version of the learning account puts forward two mechanisms that enhance the learnability of spoken languages as opposed to visual codes. Arsiwalla's first argument is that ideographic messages are difficult to decompose into discrete chunks, making ideographic codes harder to memorize. To make this point, Arsiwalla relies on the assumption that ideographic codes lack compositionality – an assumption that I refute in section R.7.3. His second argument is based on the view that multimodal learning, combining visual and aural information, is more efficient than learning through one modality alone. It rests on a literature showing, in classroom contexts, that students are better at retaining things taught using both visual material and an oral gloss, compared to things taught with exclusively visual material (usually a diagram and a written gloss). I do not quite understand how these intriguing studies would make a case against the learnability of ideography. To make such a case, Arsiwalla would have needed to show that spoken and signed languages are learnt multimodally, and that domain-specific ideographies are not. The acquisition of codes generally relies on several modalities, and this includes ideographic codes like musical notations. Arsiwalla does not explain what would make speech acquisition special in this respect. More importantly, he does not mention sign languages, which are perfectly learnable for deaf people who have at best limited access to the aural modality.

Overall, the two commentaries written in defense of the learning account do not succeed in producing a cognitive mechanism that makes it difficult to learn ideographic codes, without raising the same problem for spoken languages. If one wants to claim that a learnability problem is what prevents us from acquiring a generalist ideography, one has to explain where exactly the problem resides. And the answer cannot be modality. The reason generalist ideographies do not take off is not that they are visual – sign language linguistics has taught us that. One could perhaps imagine other versions of the learning account that do not face this problem, and that would be an exciting research program; but the commentators have not done this yet.

R.4.2. Strengthening the case against the learning account

Nephew et al. provide an intriguing argument against some versions of the learning account by highlighting human proficiency with face recognition. The fact that we routinely process and correctly recognize thousands of faces shows how far human visual memory can go. Although this argument would not affect Liberman's version of the learning account (because in Liberman's view we learn codes through a motor route, not a visual one), it would clearly be a problem for anyone who claims that ideographies are unlearnable because of visual memory constraints (as Harris et al. seem to do). I take Nephew et al.'s point, with a handful of qualifications. The typical inventory of faces a person can identify ranges in the thousands – 5,000 on average and up to 10,000 for “super-recognizers” (Jenkins, Dowsett, & Burton, Reference Jenkins, Dowsett and Burton2018). This is as high as the size of the most complex documented graphic codes.Footnote 3 It is not, however, extremely high if compared to the vocabulary size of English (to take a well-studied example), which was estimated at 42,000 lemmas on average, with other estimates ranging from 10,000 to hundreds of thousands (Brysbaert, Stevens, Mandera, & Keuleers, Reference Brysbaert, Stevens, Mandera and Keuleers2016). Another important caveat, pointed out by Nephew et al. themselves, is the domain-specific nature of face recognition, which is cognitively distinct from neighboring skills such as voice recognition (Young, Frühholz, & Schweinberger, Reference Young, Frühholz and Schweinberger2020). With these limitations in mind, I agree with Nephew et al.: Our face recognition abilities suggest that visual memory per se does not stand in the way of ideography.

Another commentary that redoubles my critique of the learning account is Howard's. He concentrates his commentary on Bliss symbols that, he claims, are easily taught and learnt. He blames me for not citing the abundant literature that (he thinks) makes this point. I disagree. The authors of these studies did not actually try to teach Bliss symbolics to their participants. Their goal was different: To compare Bliss with other ideographic systems like Carrier symbols, or pictographic systems like Rebus, Picsyms, or the Picture Communication System, based on transparent pictures. To perform these comparisons, they familiarized their participants with a small number of symbols, in the short term.Footnote 4 Only one of the studies cited comes close to a genuine attempt at teaching Bliss: Funnell and Allport (Reference Funnell and Allport1989). Finding the results disappointing, they gave up on using Bliss as an alternative communication tool with their two aphasic patients. Gainotti points to several other studies that reached the same conclusion. There is, thus, no clear-cut evidence that Bliss symbolics is readily learnable. By itself, this lack of evidence proves little – only that seriously trying to teach an ideographic language to normal adults or children (let alone patients) would be a costly and complex enterprise that researchers are yet to tackle.

R.5. Debating and clarifying the standardization account

Several commentaries contained insightful arguments against my preferred solution to the puzzle of ideography – the standardization account. The standardization account claims that graphic symbols are difficult to standardize because they lack the properties that allow users of spoken or gestured signs to align on shared meanings and repair misunderstandings in the course of conversation. If used synchronously, graphic symbols do not facilitate repair, alignment, or rephrasings, because they are too effortful to produce and too cumbersome (they do not fade rapidly), compared to words or gestures. If used asynchronously, graphic symbols cannot be interpreted in the light of a rich common ground, and messages cannot be repaired easily. Because of this, getting users of a graphic code to align on the exact same code is a challenge that can only be overcome for small numbers of symbol–meaning pairings, characteristic of specialized codes.

The general view that face-to-face communication offers unique opportunities for repair and alignment (Clark, Reference Clark1996; Enfield, Reference Enfield2017) was not challenged by the commentaries. Moldoveanu found an excellent way to phrase this point when he wrote that spoken or signed languages rely much more on “channel encoding,” which consists in adding enough redundancy to a signal to offset noise. Repetition, repair, but also pointing and other means of obtaining alignment between interlocutors are all forms of channel coding in his view. Graphic codes, on the contrary, cannot rely as much on channel coding, so they must depend on precise source coding, that is to say, unambiguous mappings between symbols and their meanings.
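Moldoveanu's distinction comes from information theory, and a toy illustration (my own sketch, not drawn from his commentary or the target article) may make it concrete: a simple 3× repetition code adds redundancy to a message so that a majority vote can repair most transmission errors – much as repetition and repair do in conversation.

```python
import random

def channel_encode(bits, r=3):
    """Channel coding by repetition: send each bit r times (added redundancy)."""
    return [b for b in bits for _ in range(r)]

def noisy_channel(bits, flip_p, rng):
    """Flip each bit independently with probability flip_p (channel noise)."""
    return [b ^ (rng.random() < flip_p) for b in bits]

def channel_decode(bits, r=3):
    """Repair errors with a majority vote over each group of r repeats."""
    return [int(sum(bits[i:i + r]) > r // 2) for i in range(0, len(bits), r)]

message = [random.Random(0).randint(0, 1) for _ in range(1000)]

# Without channel coding: the error rate equals the channel's flip rate (~10%).
raw = noisy_channel(message, 0.1, random.Random(1))
raw_errors = sum(a != b for a, b in zip(message, raw)) / len(message)

# With a 3x repetition code: most flips are repaired (~3% residual errors).
decoded = channel_decode(noisy_channel(channel_encode(message), 0.1, random.Random(1)))
coded_errors = sum(a != b for a, b in zip(message, decoded)) / len(message)
```

The repaired message has far fewer errors than the raw one, at the cost of sending three times as many symbols – a cost that cheap, fast signals (speech, gesture) can afford, but that effortful graphic signals largely cannot.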

The kind of standardization that spoken or signed conversations allow is a fine-grained and decentralized alignment, the outcome of a process that many users of the code contribute to. Getting a committee to agree on a published dictionary does not achieve standardization in this sense. Standardization, in other words, is a social fact: As Riggsby aptly notes, this means standardization is both a matter of degree and a matter of scale. A code may be highly standardized for some people, not for others; it may be loose overall but tight in some places; different standards often compete. A code is standardized for a community of users, at a given point in time. This point is the source of a misunderstanding about Bliss symbols, between myself and Howard. There is, Howard remarks, an official standard for Bliss, accompanied by textbooks, dictionaries, and so on. The problem is that this standard exists on paper; few people master it well enough to communicate fluently with others. The fact that the use of Bliss is restricted to a clinical context, to help people with severe language impairments, makes it difficult to give a fair estimate of the system's potential for future success: As we saw, the literature cited by Howard does not prove much one way or the other.

Berio, Can, Helming, Palazzolo, & Moore (Berio et al.) object to the standardization account, arguing that graphic codes could simply be learned and used in face-to-face interactions, where repair, alignment, and common ground are readily available. But a graphic code used exclusively in face-to-face settings would lose what makes it useful compared to other codes: Its capacity to support asynchronous communication. A graphic code that is only ever used face-to-face would have no advantage over spoken or signed language, with the added cost of cumbersome signs that are effortful to produce. The only feature that gives graphic signals a clear comparative advantage is their permanence; that feature is lost when graphic symbols are only used for face-to-face communication.

Riggsby provides an intriguing rejoinder to the standardization account when he notes how poorly standardized writing systems could be, in their early stages. In his view, this need not be an obstacle to using a code, as long as the code is tight enough for a particular group of users, in the same way that dialect continua enable communication inside overlapping pockets of linguistic unity, even though linguistic standardization is fairly low on the whole. Riggsby's argument is persuasive and backed by compelling examples. It points out a lacuna in my target article: Standardization is described as a population-level challenge, yet the mechanisms that produce it occur at the level of the dyad (alignment, repair, etc.). More work is clearly needed to bridge these two scales.

R.6. Can we solve the puzzle of ideography with production costs alone?

Three commentaries (Adiego & Valério; Berio et al.; Wisher & Tylén) suggest a solution to the puzzle of ideography that seems much simpler than the one I advocate. All three start from the fact that graphic messages are costly to produce, compared to speech and gesture. The costs of producing graphic messages, they argue, are what really stand in the way of a generalist ideography. If true, this cost-based account could replace my standardization account. The standardization account acknowledges the production costs of graphic messages, but does not see these costs as sufficient, in themselves, to explain why generalist ideographies did not arise, while specialist ideographies did.

Berio et al. offer the most challenging version of this argument. They argue that the format of graphic messages requires them to match the complexity of their content. When that content reaches a certain level of complexity, the message's inscribers must rely on ever more intricate drawing skills, until the message simply becomes too unwieldy to produce or process. The only way to escape this trade-off is for messages to be ambiguous, which means that they will require a verbal gloss for their meaning to be fully communicated. Thus, the inherent complexity and cost of graphic messages suffice to explain the puzzle of ideography. This argument can be decomposed into three premises and a conclusion; I agree with the premises but reject the conclusion.

Premise 1: Graphic messages are costly to produce (compared to spoken ones). Other commentaries insist on this point too. Adiego & Valério's commentary underlines the importance of production costs in blocking the evolution of ideography. They highlight, in particular, the limitation raised by the fact that most graphic symbols require some external support to inscribe them on,Footnote 5 and a tool to inscribe them with. Wisher & Tylén agree. I agree too: Graphic messages are hard to produce. In the target article, cheap production is one important reason why spoken or signed languages are easy to standardize, while graphic codes are not. That, however, is only one reason among several others; it does not, by itself, solve the puzzle, as I'll explain below.

Premise 2: Informative graphic messages tend to be complex, and thus costly. The expressive power of graphic codes is limited, Berio et al. argue, by the degree of complexity that graphic symbols can achieve. Wisher & Tylén make a similar point: Efficient graphic communication, in their view, requires material resources that, prior to the industrial revolution, were either highly expensive (like tapestries), or just nonexistent (like cartoon motion pictures). I agree that there is a correlation between the complexity (and consequent cost) of graphic messages, and the amount of information they can encode. For instance, in writing systems, characters encoding high-level linguistic units (think Chinese characters as opposed to Latin alphabet letters) are more graphically complex (Chang, Chen, & Perfetti, Reference Chang, Chen and Perfetti2018; Miton & Morin, Reference Miton and Morin2021). On a more fine-grained level, frequent letters carry less information than rare ones (by definition, because a rare letter is more unexpected than a frequent one). Our work (Koshevoy, Miton, & Morin, Reference Koshevoy, Miton and Morin2023) shows that frequent letters are graphically simpler than infrequent ones; considering a diverse sample of 27 scripts, we found that this relation held within each of them. In short, I emphatically agree with premise 2, but I would add two caveats. First, the correlation between informativeness and cost/complexity is real and robust, but not necessarily strong. Frequent letters can be complex; and it is entirely possible to encode a great deal of information using just a few symbols (consider “E = mc²”). Second, and most importantly, the mechanism that Berio et al. highlight also applies to spoken language. There is evidence that the complexity of utterances is related to the amount of information they contain. At the level of words, the length of words reflects their conceptual complexity (Lewis & Frank, Reference Lewis and Frank2016).
Words that are long or phonotactically complex are infrequent and less likely to be ambiguous (Piantadosi, Tily, & Gibson, Reference Piantadosi, Tily and Gibson2012). At the level of utterances, it seems fairly straightforward that, ceteris paribus, encoding a lot of information is easier to do with a ten-word English sentence compared to a three-word one. Thus, the constraint that Berio et al. highlight is very general; it applies far beyond graphic codes; and it need not be very strong.
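The claim that rare letters carry more information than frequent ones "by definition" follows from the standard measure of self-information, I(x) = −log₂ p(x): the less probable a symbol, the more bits it conveys. A minimal sketch, using illustrative (only approximate) English letter frequencies of my own choosing:

```python
import math

def surprisal_bits(p):
    """Self-information of an event with probability p, in bits: -log2(p)."""
    return -math.log2(p)

# Approximate English letter frequencies (illustrative values, not from the article).
freq = {"e": 0.127, "z": 0.0007}

common = surprisal_bits(freq["e"])  # frequent letter: about 3 bits
rare = surprisal_bits(freq["z"])    # rare letter: about 10.5 bits
```

A rare letter like “z” is thus several times more informative than a frequent one like “e,” which is why efficient codes (per Koshevoy et al.'s finding) can afford to give frequent letters simpler shapes.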

Premise 3: There is a trade-off between code and context. Graphic codes allow us to encode some information graphically but, as with any other mode of communication, not everything can be encoded (Winters, Kirby, & Smith, Reference Winters, Kirby and Smith2018; Winters & Morin, Reference Winters and Morin2019). What the code leaves out must be inferred pragmatically, or supplied using another code – in the case of graphic messages, an oral gloss. Whenever we use a code, we face a trade-off between making our messages too ambiguous – running the risk of being misunderstood – or overly explicit – increasing complexity, burdening ourselves and our audience with unwieldy messages. On this point Berio et al. fully agree with the target article.

Conclusion: Self-sufficient ideographies do not evolve because informative graphic messages are too costly to produce. This is where Berio et al. and I disagree. Production costs play a role in my account of the puzzle of ideography – but they do not explain everything. Costs play a role in the nonevolution of generalist ideographies, because they stand in the way of the conversational interactions that allow communication to self-standardize in spoken or signed languages. But costs do not explain the puzzle of ideography on their own. Costs, after all, can be paid. They had to be, or visual arts would not exist, and neither would specialized graphic codes. The chief benefit of using graphic signals, in spite of their cost, is durability; and durability is worth paying for. Many societies incurred huge costs in maintaining the skills and materials required to produce enduring messages. Writing is indeed cumbersome, costly, intricate – but it is worth the trouble. The reason why elaborate visual arts or specialized notations evolve, but generalist graphic codes do not, is not costs alone. It is because communication with visual art or specialized codes does not require much standardization. Production costs matter, but they matter only insofar as they prevent the kind of quick and easy exchanges required for alignment and repair, two key ingredients of standardization. (This point is well captured by Sterelny's commentary.)

Thus, I share many assumptions with Berio et al.'s and Adiego & Valério's commentaries, but I do not think that the costs of producing graphic symbols solve the puzzle of ideography on their own. Wisher & Tylén make a slightly different point. Their commentary only considers graphic messages that are fully iconic: Pictures that resemble their referents and need no graphic code to convey their content. They argue that the technical means needed for conveying complex messages in an exclusively iconic fashion only became available quite recently – and I agree. This, however, does not solve the puzzle of ideography, which is a puzzle about the origins of graphic codes. As I argued above (sect. R.2), graphic codes are usually not highly iconic, and thanks to this, they can be much simpler than their iconic equivalents – and thus, less cumbersome and costly.

R.7. The future of ideography

The one section in the target article that elicited the most commentaries is the one that speculates about the possible evolution of a generalist ideography, made possible by the resources of the digital age (sect. 6.4). A consensus emerges that graphic symbols are indeed becoming more informative and better standardized online. Critical commentaries say I underestimate this trend, or predict it for the wrong reason; but no one appears to disagree on its possibility.

R.7.1. Standardization of communication in the digital world

The target article argued that codes based on cheap, fast, and transient signals are easier to standardize, because of two independent mechanisms. The first is cheap and fast production leading to repair and alignment: Cheap and fast signals can be modified or repaired multiple times, allowing interlocutors to converge on shared meanings. The second is face-to-face communication with transient signals. Transient signals constrain interlocutors to face-to-face interactions, where the advantages of common ground are maximized. The target article argued that digital communication shared some (not all) of the characteristics of the cheap, fast, and transient signals of spoken or signed languages. This should ease the standardization problem somewhat. Most of the commentaries that broach this issue agree with the target article on these points: Digital communication is closer to spoken or signed communication than to legacy graphic communication, and this should ease the standardization problem to a degree (Clark; Gandolfi & Pickering; Veit & Browning).

The target article, however, was unclear on the exact nature of the mechanism that could further standardization in digital communication. I mentioned two mechanisms: Cheap and fast repair and alignment and face-to-face communication with transient signals. The two are independent; they can be dissociated, and in the case of digital communication they clearly are. The commentaries by Feldman and Gandolfi & Pickering allow me to clarify my views on this. What, in my view, makes digital communication different is cheap and fast production leading to repair and alignment. It is not face-to-face communication with transient signals. Gandolfi & Pickering's commentary makes this point much better than I did. Digital communication can be synchronic, and symbols can be exchanged at a fast rate, close to the pace of spoken conversation, because they require little effort to produce. Thanks to this, ideographic symbols can play the same role as the signals used for repair and feedback in face-to-face conversation.

R.7.2. How standardized are the meanings of emojis?

Commentators (Clark; Feldman; Gandolfi & Pickering; Veit & Browning) all appear to agree that digital communication is a favorable environment for solving the standardization problem. The question then becomes: Has this standardization happened yet, and what kind of symbols would it apply to? The target article took a cautious stance on the matter – too cautious for several commentators. I argued that emojis, gifs, and other digital pictographs may acquire increasingly precise and standardized meanings; but this stage may not have been reached yet, possibly because digital communication is not yet fast enough compared to the pace of speech, or because we are only seeing the beginning of its evolution. Applying this view to emojis, I made two points. First, emojis are not yet standardized enough to function as a self-sufficient ideography; second, some emojis may reach a sufficient level of standardization in the future, starting with emojis that encode paraverbal cues.

Regarding the first point, two commentaries (Feldman; Veit & Browning) argue that I underestimate the level of standardization that emojis have already achieved (a point also suggested by Gandolfi & Pickering's commentary). Veit & Browning cite the successful standardization of available emojis across platforms in support of their view; I would reply that the standardization of emoji keyboards should not be confounded with the degree to which their meanings are standardized between users. Feldman criticizes my focus on face-expression emojis, which she sees as much less standardized than other emojis. I chose to focus on face-expression emojis because, as Feldman acknowledges, they are by far the most commonly used emojis (see, e.g., Daniel, Reference Daniel2021). Their poor standardization tells us something important about emoji users' capacities to converge on a shared meaning when that meaning is not immediately given through iconicity. In the studies cited by Feldman (Barach, Feldman, & Sheridan, Reference Barach, Feldman and Sheridan2021; Częstochowska et al., Reference Częstochowska, Gligorić, Peyrard, Mentha, Bień, Grütter and West2022), the nonface emojis that are highly standardized stand for objects, activities, food, in a straightforwardly iconic fashion, so that associating a beer emoji with “beer” or a key emoji with “key” requires little in the way of a shared code. There are, of course, nonface emojis that have acquired a standardized meaning, often quite remote from their figurative referent: Feldman mentions a few such emojis. But how standardized are these figurative meanings? Quantitative data would be needed to estimate this – an intriguing topic for future studies.

Gandolfi & Pickering broadly agree that emojis are poorly standardized, but they make an exception for the ones used in backchannel interactions such as repair or other kinds of feedback, especially in SMS interactions, which allow for quick exchanges. The point is well taken. I see their commentary as a more specific and more accurate version of the point I attempted to make in section 6.4 in the target article – namely, that digital technologies should foster the standardization of emojis if (1) they are used as paraverbal cues and (2) the interactions are rapid and synchronous, leaving a lot of room for repair and alignment. Gandolfi & Pickering's views also provide an interesting contrast with Feldman, who explicitly ignores single emojis that stand on their own, and focuses her commentary on emojis as embedded in sentences or messages. Feldman makes a point that the target article acknowledges, and which does not contradict my theory: Emojis do not require much standardization as long as they are surrounded by linguistic information that helps readers narrow down an emoji's meaning. Gandolfi & Pickering make the same point: Emojis are often used as complements to writing rather than replacements for it, playing the same part that paraverbal cues (cospeech gestures, nods, and facial expressions) play for speech; this limits their potential to evolve into a complete ideography.

In summary, there is no unanimity on what level of standardization emojis may have reached, but whatever we take it to be, it is not yet sufficient to allow them to function as a self-sufficient ideography. Still, the evolution of emojis highlights the new opportunities for standardization that digital communication opens up.

R.7.3. What would a generalist ideography look like?

If the target article is right and ideographies are both learnable, and capable of becoming standardized in the future, thanks to digital communication, what would such a future ideography look like? The commentaries lay down prerequisites and highlight important challenges that this ideography would need to face. Historically, inventors of ideography (like inventors of universal languages in general) tend to fall into one of two camps. There are those who think their system should improve upon natural language, by being closer to the true structure of concepts. Frege's ideography is a successful example of ideography in this sense (even though it is highly specialized) (Frege, Reference Frege1883). Then there are those who think they would be lucky enough if they could produce a tool for communication that simply works. Cheng and Moldoveanu keep the first tradition alive: A good ideography should be a language of ideas that carves concepts at their joints; it should avoid ambiguity; it should express only intrinsic properties of things, not contingent ones. Most other commentaries have more modest ambitions: When they consider the prospects for a functioning graphic code, they simply ask whether it will be good enough for communication. Mazur & Plontke go further, denouncing the quest for a language of ideas devoid of ambiguity as a positivistic myth. Chrisomalis agrees: It is doubtful whether ideography in that sense was ever possible or even needed. These points are well taken.Footnote 6 Another thing the target article did not consider was the use of ideography as a universal language.Footnote 7 As Adiego & Valério rightly note, we cannot expect a generalist ideography to escape the laws of language evolution: Languages change, diverge, fragment, to form mutually incomprehensible dialects. 
I wrote that a generalist ideography would cut across language barriers, allowing its users to communicate even when they have no spoken language in common; but I never said that it could stand as a universal language or, as Charles Bliss put it, “overcome Babel.”

Even with these relatively modest ambitions, a generalist ideography would still have many challenges to overcome. Compositionality and the capacity to express abstract ideas are two things that natural languages excel in, but graphic codes may struggle with – according to the commentaries. Cheng sees abstract and intangible concepts as the greatest hurdle, whereas Chrisomalis points at the difficulty of using graphic codes componentially – fusing several symbols to communicate one idea. In line with this, Overmann sees the capacity to combine and recombine elements (be they phonetic units or morphemes) as something that spoken or written language is uniquely good at – a point echoed by Cohn & Schilperoord and Arsiwalla. The specialization hypothesis implies that ideographies can and do express abstract concepts as well as spoken languages do, but only for a restricted range of meanings. It makes the same prediction about combinatoriality: Ideographies are capable of it, as long as the number of basic symbols to be recombined is not too high and their meanings are narrow enough. Both predictions, I think, are confirmed by existing ideographies. Abstraction is no issue for formal logic or mathematical notations, whereas combinatoriality at multiple levels is an obvious feature of many graphic codes, from musical notations to cattle branding (Youngblood, Miton, & Morin, Reference Youngblood, Miton and Morin2023). There are rules for the combination of meaningless elements (duality of patterning) in heraldry (Morin & Miton, Reference Morin and Miton2018), and rules to combine meaningful elements into meaningful compounds in musical or mathematical notations, road signs, and so on. The degree of compositionality at play in many graphic codes is arguably equal or superior to that of natural language: The meaning of compound expressions depends unambiguously on the meaning of the parts and can be derived from systematic rules.
This is always true, by construction, for logical notations. But we may well ask, following Chrisomalis, whether this property can scale up for a generalist ideography. If the standardization hypothesis is on the right track, there should be no special difficulty in designing a generalist graphic code capable of high levels of abstraction and obeying compositional combination rules. The problem would lie in getting most users to align on a shared standard, one that cannot be artificially decreed but has to emerge from the back-and-forth of communicative interactions.
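To make the "true by construction" point concrete, here is a minimal, purely illustrative sketch (the encoding and function name are my own, not drawn from the target article) of how a logical notation achieves full compositionality: the meaning of any compound is computed from the meanings of its parts by one fixed rule per connective, with no appeal to context.

```python
# Illustrative sketch: compositional interpretation of a propositional notation.
# A formula is either an atomic symbol (a string) or a tuple
# (connective, subformula, ...). The meaning of a compound depends only on
# the meanings of its parts and a fixed rule for each connective.

def meaning(expr, valuation):
    """Interpret a formula given a valuation mapping atoms to truth values."""
    if isinstance(expr, str):          # atomic symbol: look up its meaning
        return valuation[expr]
    op, *parts = expr                  # compound: connective plus subformulas
    vals = [meaning(p, valuation) for p in parts]
    if op == "not":
        return not vals[0]
    if op == "and":
        return all(vals)
    if op == "or":
        return any(vals)
    raise ValueError(f"unknown connective: {op}")

# ("and", "p", ("not", "q")) encodes the formula p AND (NOT q)
formula = ("and", "p", ("not", "q"))
print(meaning(formula, {"p": True, "q": False}))   # True
print(meaning(formula, {"p": True, "q": True}))    # False
```

The recursion mirrors the notation's syntax exactly: because every compound's meaning is a function of its parts' meanings, ambiguity cannot arise, which is the property at issue in the question whether it can scale up to a generalist ideography.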

Footnotes

1. Here I overlook Sterelny's second objection, concerning deaf people learning a language through print – it is tackled in section R.4.1.

2. I thank Chrisomalis for pointing me to this paper through his comment.

3. Some dictionaries of Chinese contain tens of thousands of characters, but most of them are archaic or rare forms, and it is highly unlikely that a single person has memorized such a dictionary in its entirety.

4. Blissymbolics consists of around 100 basic signs, which can be combined into a number of compound signs varying between 886 TBC (the Unicode Consortium's conservative estimate) and 2,384 (ISO-IR norm). With one exception, the studies that Howard cites taught only between 11 (Poupart, Trudeau, & Sutton, 2013) and 45 (Mizuko, 1987) compound symbols, with most studies teaching 15 (Burroughs, Albritton, Eaton, & Montague, 1990; Clark, 1981; Ecklund & Reichle, 1987; Hurlbut, Iwata, & Green, 1982). The maximum number of symbols actually retained did not exceed 20 (in Mizuko, 1987). Three studies found that more iconic symbols were easier to learn than Blissymbols (Ecklund & Reichle, 1987, and Mizuko, 1987, both working with preschoolers; Hurlbut et al., 1982, working with severely handicapped adolescents). This pattern is consistent with other studies (e.g., Alant, Life, & Harty, 2005; Kozleski, 1991; Mizuko & Reichle, 1989; Sevcik, Barton-Hulsey, Romski, & Hyatt Fonseca, 2018). Lastly, two studies cited by Howard did not find a clear advantage for either type of system (Burroughs et al., 1990; Poupart et al., 2013, both working with preschoolers). Interestingly, Clark (1981), working with preschoolers, found that all the ideographic or pictographic symbols used (Bliss, Carrier, and Rebus) were easier to learn than written English.

5. I agree, but exceptions should be made for tattoos, scarifications, body paintings, etc.

6. Chrisomalis is right to argue that the word "semasiography" (also used by Overmann) would have been a more fitting label for what I call "ideography": It would be in line with the literature (e.g., Gelb, 1963), and it would put the focus where it ought to be – on the signs and their meanings, rather than the ideas they designate. My only excuse (if it is one) is the need to catch the eyes of a broad interdisciplinary audience to whom the word "semasiography" means little.

7. Adiego & Valério remark that were a generalist, self-sufficient ideography to be found, it would then fulfill all the conditions I set to be called a language. I agree with this, but the point is rather vacuous if, as the target article claims, generalist ideographies are rare or nonexistent. Adiego & Valério also note my tendency to use the phrase “visual language” rather too loosely and inconsistently in some places; they are right about this. I invite readers to mentally correct the target article, replacing “visual language” with “visual code.”

References

Alant, E., Life, H., & Harty, M. (2005). Comparison of the learnability and retention between Blissymbols and CyberGlyphs. International Journal of Language & Communication Disorders, 40(2), 151–169. https://doi.org/10.1080/13682820400009980
Barach, E., Feldman, L. B., & Sheridan, H. (2021). Are emojis processed like words?: Eye movements reveal the time course of semantic processing for emojified text. Psychonomic Bulletin & Review, 28(3), 978–991. https://doi.org/10.3758/s13423-020-01864-y
Boltz, W. G. (1993). The origin and early development of the Chinese writing system. American Oriental Society.
Brysbaert, M., Stevens, M., Mandera, P., & Keuleers, E. (2016). How many words do we know? Practical estimates of vocabulary size dependent on word definition, the degree of language input and the participant's age. Frontiers in Psychology, 7, 1116. https://doi.org/10.3389/fpsyg.2016.01116
Burroughs, J. A., Albritton, E., Eaton, B., & Montague, J. (1990). A comparative study of language delayed preschool children's ability to recall symbols from two symbol systems. Augmentative and Alternative Communication, 6(3), 202–206. https://doi.org/10.1080/07434619012331275464
Chang, L.-Y., Chen, Y.-C., & Perfetti, C. A. (2018). GraphCom: A multidimensional measure of graphic complexity applied to 131 written languages. Behavior Research Methods, 50(1), 427–449. https://doi.org/10.3758/s13428-017-0881-y
Clark, C. R. (1981). Learning words using traditional orthography and the symbols of Rebus, Bliss, and Carrier. Journal of Speech & Hearing Disorders, 46, 191–196. https://doi.org/10.1044/jshd.4602.191
Clark, H. H. (1996). Using language. Cambridge University Press.
Cohn, N. (2013). The visual language of comics: Introduction to the structure and cognition of sequential images. A&C Black.
Crema, E. R., Kandler, A., & Shennan, S. (2016). Revealing patterns of cultural transmission from frequency data: Equilibrium and non-equilibrium assumptions. Scientific Reports, 6, 39122. https://doi.org/10.1038/srep39122
Częstochowska, J., Gligorić, K., Peyrard, M., Mentha, Y., Bień, M., Grütter, A., … West, R. (2022). On the context-free ambiguity of emoji. Proceedings of the International AAAI Conference on Web and Social Media, 16, 1388–1392.
Daniel, J. (2021). Emoji frequency. Unicode. https://home.unicode.org/emoji/emoji-frequency/
Demattè, P. (2022). The origins of Chinese writing. Oxford University Press.
de Saussure, F. (2011). Course in general linguistics (P. Meisel & H. Saussy, Eds.; W. Baskin, Trans.). Columbia University Press.
Dingemanse, M., Blasi, D. E., Lupyan, G., Christiansen, M. H., & Monaghan, P. (2015). Arbitrariness, iconicity, and systematicity in language. Trends in Cognitive Sciences, 19(10), 603–615. https://doi.org/10.1016/j.tics.2015.07.013
du Ponceau, P. S. (1838). A dissertation on the nature and character of the Chinese system of writing. To which are subjoined a vocabulary of the Cochin Chinese language by J. Morrone [&c.]. Published for the American Philosophical Society.
Ecklund, S., & Reichle, J. (1987). A comparison of normal children's ability to recall symbols from two logographic systems. Language, Speech, and Hearing Services in Schools, 18, 34–40. https://doi.org/10.1044/0161-1461.1801.34
Edgerton, W. F. (1941). Ideograms in English writing. Language, 17(2), 148–150. https://doi.org/10.2307/409622
Enfield, N. J. (2017). How we talk: The inner workings of conversation. Basic Books.
Frege, G. (1883). On the scientific justification of a concept-script (J. M. Bartlett, Trans.). Mind: A Quarterly Review of Psychology and Philosophy, 73(290), 155–160.
Frege, G. (1920/1981). Posthumous writings (Illustrated ed.). John Wiley.
Funnell, E., & Allport, A. (1989). Symbolically speaking: Communicating with Blissymbols in aphasia. Aphasiology, 3(3), 279–300. https://doi.org/10.1080/02687038908248995
Garrod, S., Fay, N., Lee, J., Oberlander, J., & MacLeod, T. (2007). Foundations of representation: Where might graphical symbol systems come from? Cognitive Science, 31(6), 961–987. https://doi.org/10.1080/03640210701703659
Gelb, I. J. (1963). A study of writing. University of Chicago Press.
Guthrie, R. D. (2006). The nature of Paleolithic art. University of Chicago Press.
Helmke, C., & Nielsen, J. (2021). Teotihuacan writing: Where are we now? Visible Language, 55(2), Article 2. https://doi.org/10.34314/vl.v55i2.4607
Hermalin, N., & Regier, T. (2019). Efficient use of ambiguity in an early writing system: Evidence from Sumerian cuneiform. Annual Meeting of the Cognitive Science Society. https://www.semanticscholar.org/paper/Efficient-use-of-ambiguity-in-an-early-writing-from-Hermalin-Regier/9a6898192ea9a9ad8f9d67fe95521a8532fdd3be
Hirshorn, E. A., & Harris, L. N. (2022). Culture is not destiny, for reading: Highlighting variable routes to literacy within writing systems. Annals of the New York Academy of Sciences, 1513(1), 31–47. https://doi.org/10.1111/nyas.14768
Hoffmeister, R. J., & Caldwell-Harris, C. L. (2014). Acquiring English as a second language via print: The task for deaf children. Cognition, 132(2), 229–242. https://doi.org/10.1016/j.cognition.2014.03.014
Hurlbut, B. I., Iwata, B. A., & Green, J. D. (1982). Nonvocal language acquisition in adolescents with severe physical disabilities: Bliss symbol versus iconic stimulus formats. Journal of Applied Behavior Analysis, 15(2), 241–258. https://doi.org/10.1901/jaba.1982.15-241
Jenkins, R., Dowsett, A. J., & Burton, A. M. (2018). How many faces do people know? Proceedings of the Royal Society B: Biological Sciences, 285(1888), 20181319. https://doi.org/10.1098/rspb.2018.1319
Koriat, A., & Levy, I. (1979). Figural symbolism in Chinese ideographs. Journal of Psycholinguistic Research, 8(4), 353–365. https://doi.org/10.1007/BF01067139
Koshevoy, A., Miton, H., & Morin, O. (2023). Zipf's law of abbreviation holds for individual characters across a broad range of writing systems. Cognition, accepted.
Kozleski, E. B. (1991). Visual symbol acquisition by students with autism. Exceptionality, 2(4), 173–194. https://doi.org/10.1080/09362839109524782
Lewis, D. (1969). Convention: A philosophical study. Wiley Blackwell.
Lewis, M. L., & Frank, M. C. (2016). The length of words reflects their conceptual complexity. Cognition, 153, 182–195. https://doi.org/10.1016/j.cognition.2016.04.003
Luk, G., & Bialystok, E. (2005). How iconic are Chinese characters? Bilingualism: Language and Cognition, 8(1), 79–83. https://doi.org/10.1017/S1366728904002081
Madden, J. R. (2008). Do bowerbirds exhibit cultures? Animal Cognition, 11(1), 1–12. https://doi.org/10.1007/s10071-007-0092-5
Millikan, R. G. (1998). Language conventions made simple. Journal of Philosophy, 95(4), 161–180.
Miton, H., & Morin, O. (2021). Graphic complexity in writing systems. Cognition, 214, 104771. https://doi.org/10.1016/j.cognition.2021.104771
Mizuko, M. (1987). Transparency and ease of learning of symbols represented by Blissymbols, PCS, and Picsyms. AAC: Augmentative and Alternative Communication, 3, 129–136. https://doi.org/10.1080/07434618712331274409
Mizuko, M., & Reichle, J. (1989). Transparency and recall of symbols among intellectually handicapped adults. Journal of Speech and Hearing Disorders, 54(4), 627–633. https://doi.org/10.1044/jshd.5404.627
Monaghan, P., Shillcock, R. C., Christiansen, M. H., & Kirby, S. (2014). How arbitrary is language? Philosophical Transactions of the Royal Society B: Biological Sciences, 369(1651), 20130299. https://doi.org/10.1098/rstb.2013.0299
Morin, O., & Miton, H. (2018). Detecting wholesale copying in cultural evolution. Evolution & Human Behavior, 39(4), 392–401. https://doi.org/10.1016/j.evolhumbehav.2018.03.004
Piantadosi, S. T., Tily, H., & Gibson, E. (2012). The communicative function of ambiguity in language. Cognition, 122(3), 280–291. https://doi.org/10.1016/j.cognition.2011.10.004
Poupart, A., Trudeau, N., & Sutton, A. (2013). Construction of graphic symbol sequences by preschool-aged children: Learning, training, and maintenance. Applied Psycholinguistics, 34(1), 91–109. https://doi.org/10.1017/S0142716411000622
Scott-Phillips, T. (2014). Speaking our minds: Why human communication is different, and how language evolved to make it special. Palgrave Macmillan.
Sevcik, R. A., Barton-Hulsey, A., Romski, M., & Hyatt Fonseca, A. (2018). Visual-graphic symbol acquisition in school age children with developmental and language delays. Augmentative and Alternative Communication, 34(4), 265–275. https://doi.org/10.1080/07434618.2018.1522547
Severi, C. (2019). Their way of memorizing: Mesoamerican writings and native American picture-writings. Res: Anthropology and Aesthetics, 71–72, 312–324. https://doi.org/10.1086/706117
Skyrms, B. (2010). Signals: Evolution, learning, and information. Oxford University Press.
Tamariz, M., & Kirby, S. (2015). Culture: Copying, compression, and conventionality. Cognitive Science, 39(1), 171–183. https://doi.org/10.1111/cogs.12144
Urton, G. (2017). Inka history in knots: Reading khipus as primary sources. University of Texas Press.
Winters, J., Kirby, S., & Smith, K. (2018). Contextual predictability shapes signal autonomy. Cognition, 176, 15–30. https://doi.org/10.1016/j.cognition.2018.03.002
Winters, J., & Morin, O. (2019). From context to code: Information transfer constrains the emergence of graphic codes. Cognitive Science, 43(3), e12722. https://doi.org/10.1111/cogs.12722
Xiao, W., & Treiman, R. (2012). Iconicity of simple Chinese characters. Behavior Research Methods, 44(4), 954–960. https://doi.org/10.3758/s13428-012-0191-3
Young, A. W., Frühholz, S., & Schweinberger, S. R. (2020). Face and voice perception: Understanding commonalities and differences. Trends in Cognitive Sciences, 24, 398–410. https://doi.org/10.1016/j.tics.2020.02.001
Youngblood, M., Miton, H., & Morin, O. (2023). Signals of random copying are robust to time- and space-averaging. Evolutionary Human Sciences, 5, E10.