The article is an excellent illustration of the progress we can make in unraveling the “social artifact puzzle” once human–robot interaction (HRI) integrates humanities expertise pertaining to the analysis of the symbolic space of human social interaction. The depiction model presents a vast advance over coarse-grained taxonomies for human experiences in HRI (see, e.g., Barak, Alves-Oliveira, & Ribeiro, 2020; Onnasch & Roesler, 2021), and unlike “relational” or “postphenomenological” pointers to “social construction” it provides concrete guiding concepts for analysis and design of HRI.
The authors' core assumption, however, that social robots are always and only experienced as depictions of social agents, rather than as social agents proper, seems problematic. Cognitive science and neuroscience research on “implicit” (pre-conscious) phases of social cognition provides ample counterevidence: Robot motions trigger many of the same perceptual “implicit mechanisms of social cognition” as human motions. There is thus no reason to assume that, at the level of implicit social perception, human motions are processed as socially coordinated movements but robotic motions only as depictions thereof, unless one wishes to keep the traditional assumption that sociality requires human consciousness.
The OASIS framework, another descriptive framework for human experiences in HRI developed by HRI researchers with a humanities background (Seibt, 2018; Seibt, Vestergaard, & Damholdt, 2020), relinquishes this traditional assumption about sociality and allows for precise descriptions of various forms of asymmetric sociality, which has been found useful for the description of participatory sense-making in social robotics and AI (Zebrowski & McGraw, 2022). I want to sketch briefly how one could combine the important insights of the depiction model with the analytical concepts of the OASIS framework, because the two approaches seem to complement each other.
In the OASIS framework, human experiences of social robots are taken to involve complex cognitive-practical processes of “sociomorphing.” Sociomorphing is currently a theoretical construct – a dynamics with various phases and feedback, which typically first engages preconscious “mechanisms” of social cognition and subsequently more or less routinized, tacit, or actively searching perceptual interpretation. While the latter phases may include the establishment of what the authors call the “scene depicted,” the initial perceptual mechanisms effect that a robot's motion is understood as a socially coordinated bodily movement (e.g., keeping critical distance in spatial navigation). Thus, unlike in the depiction model, here it is assumed that already the “base scene” can involve bona fide social agents, because robots can realize certain basic capacities of social coordination.
OASIS recognizes 10 levels of social coordination based on capacities ranging from socio-biological automatisms to empathic coordination to various forms of collective intentionality. While robots realize some low-level social coordination capacities, they currently can only simulate more involved coordination capacities, such as the capacity to coordinate based on affective empathy or the understanding of social norms. OASIS distinguishes five degrees of simulation (defined in terms of similarity relations among [human vs. robotic] processes). If a robot simulates a high-level capacity poorly, that is, at a low degree, human responses to robots often include active interpretatory sense-making processes of the kind that the authors describe as the transition from a “base scene” to a “depiction”: Kismet's (poor) simulation of smiling-at-X requires much interpretatory effort and thus is consciously understood as a mere depiction of smiling-at-X. However, this seems less plausible in the case of sophisticated simulations of coordinative capacities – vide the smiles of the robots Ameca or Sophia, which we may experience as insincere smiles rather than as depictions. (Unless exhibited in a museum, a three-dimensional pipe made of wood imitation is a pipe with restricted functionality.) In general, one might wonder whether the authors' thesis that all robotic gestures are experienced as depictions rests on the fact that the authors' illustrations involve robots (Aibo, the Smooth robot, Asimo) with low-degree simulations of high-level coordinative capacities.
While the authors focus on the robotic object as artifact, prop, and character, in OASIS it is robotic actions (and parts of actions) that are the primary target of human sense-making. This allows us to differentiate between robotic actions that are low-degree simulations and thus experienced as depictions, such as Asimo's ceremonial bow, and those that, because they are high-fidelity simulations, we understand as genuine social actions without symbolic reference, such as Asimo's pointing to the right.
Furthermore, on the OASIS approach, any social interaction requires at least seven perspectives: first-, second-, and (internal) third-person perspectives for each of the (here: two) agents, plus the external third-person perspective of an observer (e.g., society at large). The cognitive activity of sociomorphing begins with implicit phases of social cognition but largely takes place in more or less tacit sense-making processes that arise when a human agent takes the second-person perspective onto her or his own action: “how will the other understand what I do?” The authors' fine-grained description of the parameters of interpretatory processes (e.g., in target article, sect. 8) offers valuable tools especially for these later phases of sociomorphing, where human agents try to anticipate the coordinative capacities of their interaction partner. The dimension of depiction may or may not loom large in such anticipations, depending on the degree of simulation and on whether human agents include the external third-person perspective of (in Clark and Fischer's terminology) the robot's “principal” or creator. Besides the principal, however, there are many other external third-person perspectives that might influence how we anticipate, in more or less tacit sense-making, the coordinative capacities of a robot. The taking and changing of perspectives figures centrally both in the depiction model and in OASIS, and by combining the respective perspectival differentiations we obtain a more differentiated description of how people understand robotic actions.
While the OASIS account of sociomorphing could complement the idea that humans may understand robotic actions as depictions of social actions, the authors' assimilation of social robots to fictional characters strikes me as unhelpful: Social interactions cannot straddle the actual-fictional divide – a rescue robot can issue commands as rep-agent, but not as Hamlet.
Financial support
This work was supported by a grant from the Danish Research Council for the Project “Robot-Mediated Learning and Socratic Robotics: New Forms of Experienced Sociality for Tutoring, Self-Edification, and Coaching (ROLES),” Grant no. 9131-00136B.
Competing interest
None.