Morin's Puzzle of ideography advances a persuasive argument for why general-purpose, entrenched ideographies are not widespread, despite the advantage that they can be deployed in communication among speakers competent in different languages. But it does not allow us to predict whether a new ideography is more or less likely to fail than its predecessors were. Can we conceptualize the difference between ideographies and natural languages in a way that allows us to do that, and perhaps to design an ideography that is both generalized and entrenched? I outline such an approach, adapting Shannon's (1948; Cover & Thomas, 2006) model of a communication system as the concatenation of a source and channel encoder/decoder (Fig. 1) to represent the evolution and deployment of both a natural language and an ideography:
(1) The source encoder maps a plaintext message/picture into a set of codewords designed to allow the decoder to reconstruct the message from the code. The source decoder inverts the encoder's operation to recover the original message from the output of the channel decoder.
(2) The channel encoder maps the output of the source encoder onto a set of codewords that protect against distortions introduced by the channel by intelligently reintroducing redundancies into the signal. The channel decoder inverts these operations – in ways informed by knowledge of the channel conditions – to recover the input to the source decoder.
Figure 1. Source- and channel-coding decomposition of the functions of ideographies and natural languages.
The separation of source- and channel-coding functions arises from their different purposes:
(1) The source encoder squeezes redundancy from the original message by exploiting hidden regularities to produce the shortest code from which the message can be reconstructed by the decoder.
(2) The channel encoder adds redundancy to the output of the source encoder to generate a transmitted signal maximally intelligible to a receiver over a channel.
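To make the decomposition concrete, here is a minimal sketch in Python – my own illustration, not part of Morin's argument or of Figure 1 – in which zlib stands in for an arbitrary source coder that squeezes out redundancy, and a repetition code with majority-vote decoding stands in for a channel coder that re-adds redundancy so the signal survives a noisy (binary symmetric) channel:

import random, zlib

def bits_from_bytes(data):
    # Unpack each byte into 8 bits, least-significant bit first.
    return [(byte >> i) & 1 for byte in data for i in range(8)]

def bytes_from_bits(bits):
    # Repack bits (least-significant bit first) into bytes.
    return bytes(sum(bit << i for i, bit in enumerate(bits[j:j + 8]))
                 for j in range(0, len(bits), 8))

def channel_encode(bits, r=3):
    # Channel coding: deliberately re-add redundancy by repeating each bit r times.
    return [b for b in bits for _ in range(r)]

def channel_decode(bits, r=3):
    # Majority vote over each group of r repeated bits corrects isolated flips.
    return [int(sum(bits[i:i + r]) > r // 2) for i in range(0, len(bits), r)]

def binary_symmetric_channel(bits, p_flip):
    # Noisy channel: each transmitted bit is flipped independently with probability p_flip.
    return [b ^ 1 if random.random() < p_flip else b for b in bits]

random.seed(0)
message = ("the quick brown fox jumps over the lazy dog. " * 8).encode()
source_code = zlib.compress(message)                    # source coding: redundancy squeezed out
sent = channel_encode(bits_from_bytes(source_code))     # channel coding: redundancy re-added
received = channel_decode(binary_symmetric_channel(sent, p_flip=0.01))
bit_errors = sum(a != b for a, b in zip(bits_from_bytes(source_code), received))
print(len(message), len(source_code), bit_errors)       # the code is far shorter than the message; errors are typically 0
if bit_errors == 0:
    print(zlib.decompress(bytes_from_bits(received)) == message)  # True: message fully recovered

The repetition code is only a stand-in: any error-correcting code would play the same role of trading extra symbols for robustness to channel noise, just as any entropy coder could replace zlib on the source side.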
For a natural language: Meaning-bearers (written words, propositions) source-encode underlying messages (referents: “states of affairs,” “events,” “objects,” “relationships”). Redundancy – repetition, rephrasing, explication – and interpersonal feedback signals such as queries and gestures are added to the message in ways adaptive to different channels, which may be synchronous (in-person speech acts) or asynchronous (emails), and corrupted by noise that may be “white” (a bad phone connection) or may selectively affect different parts of the message (accented speech, ungrammatical text).
For an ideography: Meaning-bearers (“ideograms”) encode underlying referents using stylized pictures that do not require knowledge of a natural language to decode. Channel encoding is more restrictive: as Morin points out, it may comprise acts of pointing, magnification, or repetition. One can argue that some written languages – like English – already incorporate some redundancy into the design of the source encoder: English vowels are largely redundant, added to increase the intelligibility of text that is often comprehensible from consonants alone (“th qck brn fx …”). They are also useful as prompts for correct pronunciation, helping listeners distinguish among different words when they are spoken. Such languages lump together some source- and channel-coding operations, but this does not alter the parsing of communicative tropes into source- and channel-coding operations.
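As a toy check of the vowel-redundancy claim – again my own illustration rather than anything in the target article – stripping the vowels from a familiar sentence removes roughly a quarter of its characters while leaving a consonant skeleton that most readers can still decode:

text = "the quick brown fox jumps over the lazy dog"
skeleton = "".join(ch for ch in text if ch not in "aeiou")
print(skeleton)                              # "th qck brwn fx jmps vr th lzy dg"
print(round(len(skeleton) / len(text), 2))   # 0.74: vowels account for roughly a quarter of the characters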
This approach to representing communication tools illuminates several aspects of the differences between languages and ideographies, such as:
(1) Morin's explanation of the difficulty of establishing an entrenched, generalist ideography: Even though natural languages entail ambiguities in mapping sign to referent that can make it difficult to uniquely recover a pointer to a specific referent (e.g., an object) from the source-coder output (the word that spells the name of the object), they afford ample opportunities for error correction over both synchronous and asynchronous channels that ideographic communication lacks.
(2) The selective appeal of ideographies in specialized domains such as mathematical physics, which Morin relevantly references: Ideograms (mathematical symbols and formulas) encode their referents (operations, and entities such as sets and functions defined on sets) in ways that make the source message more efficiently and less ambiguously reconstructible from received messages than alternative natural-language representations, suggesting that source-coding advantages can compensate for more restricted channel-coding opportunities.
If, as Morin argues, there is no “hard-wired” advantage to the development of natural languages, in the form of neurological constraints or adaptive or exaptive developments, then one explanation for the absence of widely entrenched general-purpose ideographies is that attempts to build them have not been guided by an intelligent, information-theoretic conception of a communicative device. This has led to the design of ideographies that mimic natural-language sentences instead of exploiting the idiomatic source-coding advantages of ideograms and optimizing their communicability over plausible channels.
What could an ideography that did so look like? Shannon's analysis of source and channel coding offers some guidance. His source-coding theorem (Cover & Thomas, 2006) links the average size of source codewords to the entropy of the source signal, which measures the number of possible objects or states of affairs a codeword can refer to, given the frequency with which they ecologically occur. His channel-coding theorem links the amount of redundancy we need to add to a signal sent over a channel of a given capacity (i.e., the maximal mutual information of sent and received signals) to the probability of recovering the signal with at most a certain error (both theorems are restated formally after the list below). Thus:
(1) An “intelligently designed” ideographic source encoder can minimize the number of possible states of affairs that need to be encoded by featuring a set of objects (ideograms) that have maximal translational and scale invariance in space–time, and can maximize contextual invariance by encoding objects in terms of their causal powers as opposed to nonessential properties (“being within Y feet of X”).
(2) To maximize its channel-coding gain, it can enable multiple levels of resolution and spatial rotation/translation (zooming and panning), decomposability (whole → parts), and compositionality (parts → whole) to make purely graphical explanations possible; and it can supply a set of ideographically idiomatic communicative acts (zoom, decompose, pan) that differ from the queries of natural language and are designed to make a purely graphical communicative act subject to pointed requests for clarification.
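For reference, the two theorems invoked above can be stated compactly in their standard textbook form (following Cover & Thomas, 2006; the notation is mine, not Morin's). The source-coding theorem bounds the minimal expected codeword length L* of a uniquely decodable code by the source entropy:

H(X) = -\sum_{x} p(x)\,\log_2 p(x), \qquad H(X) \le L^{*} < H(X) + 1 .

The channel-coding theorem identifies the capacity of a noisy channel with the maximal mutual information between input and output, and guarantees that any rate below capacity is achievable with vanishing error probability:

C = \max_{p(x)} I(X;Y), \qquad R < C \;\Rightarrow\; P_e^{(n)} \to 0 \ \text{as the block length } n \to \infty .

In these terms, an ideography designed along the lines of (1) and (2) would lower the entropy H(X) that its source encoder must absorb and raise the effective capacity C of the graphical channel over which its ideograms are exchanged.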
Financial support
This work was supported by the Desautels Centre for Integrative Thinking at the Rotman School of Management, University of Toronto.
Competing interest
None.