Morin's standardization hypothesis is important for understanding the puzzle of ideography, but we suggest that examining the puzzle from a psychological perspective also has considerable value. Because combining spoken language with graphic codes can confer advantages for learning, memory, and survival, people may prefer to combine the two, and the target article would benefit from strengthening this part of the discussion. In this commentary, we discuss the effects of spoken language on graphic codes from three perspectives.
Perspective of error-driven learning
First, in terms of psychological mechanisms, combining spoken language with graphic codes contributes to standardization. Spoken language is a cheap and transient signal: when people combine it with graphic codes, they can verbally correct misunderstandings about what the graphic codes mean, a process related to error-driven learning (Li et al., 2021). To describe the psychological mechanisms by which spoken language corrects such misunderstandings, we have built an error-driven standardization model (Fig. 1). As the figure shows, when an ideographic symbol appears, each interlocutor forms an initial expectation about its meaning (expectation 1) and then interacts with the other interlocutors for the first time to exchange meanings (response 1). During this exchange, each interlocutor receives information from the others (feedback 1), which may support, oppose, or supplement the interlocutor's existing view and thereby give rise to a prediction error (prediction error 1). When a prediction error occurs (i.e., the expected view does not match the received feedback), the interlocutor receives an error signal, first performs error monitoring to detect the mismatched information, then makes targeted post-error attentional adjustments to integrate the information (Li, Wang, Li, & Chen, 2022), and eventually aligns with the existing standard. By correcting misinformation and updating common ground, the interlocutor generates a new expectation (expectation 2). A second interaction then follows, advancing individual standardization. Across interactions, the interlocutors support, debate, or correct one another, cycling through the above processes to revise and update the information repeatedly.
By the end of the nth interaction, each interlocutor's prediction error falls to zero (i.e., the expected view matches the received feedback). At this point the group has reached consensus through interaction, forming a uniform, shareable standard and completing group standardization. In short, interlocutors standardize ideography through a cheap and transient interaction process that combines individual standardization with group standardization, which is compatible with the natural tendency to combine spoken language with graphic codes.
Figure 1. Error-driven standardization model of ideography supported by spoken language.
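The interaction cycle in Fig. 1 can be sketched as a small simulation. This is a minimal illustration under assumptions that are ours rather than part of the model as stated: each interlocutor's interpretation of a symbol is reduced to a single number, the feedback an interlocutor receives is the average of the other interlocutors' responses, and expectations are updated by a simple delta rule with a hypothetical `learning_rate`. The names `Interlocutor` and `standardize` are likewise hypothetical. Because numerical prediction errors shrink gradually rather than reaching exactly zero, a small `tolerance` stands in for "prediction error becomes zero."

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Interlocutor:
    # Hypothetical simplification: the meaning a person assigns to one
    # ideographic symbol is reduced to a single number.
    expectation: float


def standardize(group, learning_rate=0.5, tolerance=1e-6, max_rounds=1000):
    """Repeat interactions until every prediction error is (near) zero.

    Returns n, the number of interactions needed to reach consensus.
    """
    if len(group) < 2:
        raise ValueError("standardization needs at least two interlocutors")
    for n in range(1, max_rounds + 1):
        # Response n: each interlocutor voices their current expectation.
        responses = [person.expectation for person in group]
        errors = []
        for i, person in enumerate(group):
            # Feedback n: what the other interlocutors said, averaged.
            feedback = mean(responses[:i] + responses[i + 1:])
            # Prediction error n: mismatch between feedback and expectation.
            error = feedback - person.expectation
            errors.append(error)
            # Assumed delta-rule update, producing expectation n + 1.
            person.expectation += learning_rate * error
        # Group standardization: every prediction error this round is effectively zero.
        if all(abs(e) < tolerance for e in errors):
            return n
    raise RuntimeError("no consensus within max_rounds")


# Usage: three interlocutors start with different readings of one symbol.
group = [Interlocutor(0.0), Interlocutor(3.0), Interlocutor(6.0)]
rounds = standardize(group)
print(rounds, [round(person.expectation, 3) for person in group])
```

In this sketch all three expectations converge on 3.0, the mean of the starting values, because the symmetric update rule preserves the group average. Other update rules (for example, weighting some interlocutors more heavily) would yield different shared standards; the error-driven standardization model described above does not commit to a particular rule.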
Perspective of memory
Second, combining spoken language with graphic codes benefits human memory. For example, both working memory and long-term memory are better when pronunciation is added than when it is not (Hopkins & Edwards, 1972; MacLeod, Gopie, Hourihan, Neary, & Ozubko, 2010; Tan, Li, & Bai, 2022). When content-rich pictures are used as stimuli, memory for pictures is also better with pronunciation than without it (Zormpa, Brehm, Hoedemaker, & Meyer, 2019). From a neuroscience perspective, memorizing vocabulary with pronunciation also shows an activation advantage compared with memorizing it without pronunciation (Bailey et al., 2021). The learning hypothesis in the target article claims that the human mind cannot memorize large numbers of pairings between meanings and visual symbols. We argue, however, that humans are capable of remembering large numbers of such pairings; rather, memory efficiency and performance can be greatly improved by combining pronunciation with visual symbols, so humans discarded the less efficient way of memorizing.
Perspective of psychological evolution
As discussed above, adding spoken language enables humans to achieve better learning and memory performance. Human learning and memory are closely tied to long-term psychological evolution: humans may have evolved a tendency to learn and remember in the service of survival (Nairne, Thompson, & Pandeirada, 2007). To meet the challenges of survival and reproduction, our ancestors needed an information system that could be passed on from generation to generation, so that these challenges could be better met and the corresponding experience preserved over long periods. For survival, our ancestors would have preferred more efficient ways of learning and remembering. With practice, combining spoken language with graphic codes no longer consumes cognitive resources, and through psychological evolution it has become an automatic operation (Hu et al., 2017; Yin, Sui, Chiu, Chen, & Egner, 2019; Zhang, Ding, Li, Zhang, & Chen, 2013). This automatic combination of spoken language with graphic codes may be an important reason for the historical absence of ideography.
Chinese characters
To sum up, from a psychological perspective, people will combine spoken language with graphic codes. Morin supposes that Chinese is a code for spoken language, but this view is overly absolute and remains controversial (Li, 1996; Zhang, 2011; Zhang et al., 2012; Zhu, 1995). If all Chinese characters were records or codes of spoken language, this view could not explain the existence of characters that depict the shapes of things but carry no phonetic clues. For example, the Chinese oracle-bone inscriptions use pictographic forms to represent sun, moon, and mountain; in modern Chinese these are written “日,” “月,” and “山,” respectively. Only 26% of the characters in the earliest oracle-bone inscriptions are associated with sound clues. Therefore, Chinese is a graphic code combined with spoken language and cannot be interpreted strictly as a record or encoding of spoken language (Zhang, 2011; Zhang et al., 2012; Zhu, 1995).
Conclusion
In summary, from a psychological perspective, we suggest that because of its advantages for learning and memory, people will be inclined to choose a graphic code that incorporates spoken language over one that does not, and that this combination may have become automatic over time. This automatic tendency to combine spoken language with graphic codes may explain why ideography has been absent since ancient times, and Chinese is a script that combines spoken language with graphic codes.
Financial support
This work was supported by the National Natural Science Foundation of China (Grant No. 32171040).
Competing interest
None.