1. Introduction
Sound, especially musical sound, has traditionally been viewed as a product of human activity with intrinsic sociocultural value. However, the relationship between the concepts of sound and sociality is a complex matter, raising questions regarding both the nature of musical sound and, perhaps more critically, the nature and definition of sociality. The point of departure for our discussion is a conceptualisation of social behaviour (proposed by some of the authors of the present article with the musical context in mind; Kim, Reifgerst and Rizzonelli 2019) that includes forms of basal social behaviour which, although not yet covered in social ontology, are nevertheless relevant to inter-individual behaviours, including those mediated by musical sound.
The categories of ‘inter-individual behaviour leading to social interaction, without any collective or we-intentions’, ‘collective inter-individual behaviour involving we-intentions’ and ‘collective inter-individual behaviour based on shared or collective intentionality’ (see section 2.1) indicate that sociality is the result of mechanisms that unfold on both basic and complex levels. That various salient features of music-making and listening present commonalities with the concept of sociality (Cross, Laurence and Rabinowitch 2012) suggests that musical sound may be a valid tool to facilitate social interaction and, more generally, pro-social behaviours; this has been confirmed by a number of empirical studies. Sociality and social interaction, in turn, are a fundamental and indispensable aspect of human health.
Based on the proposed relationship between sound, sociality and health, we developed a research project in which a body–machine interface and participatory performance are used to exploit the power of musical sound. This project is called ‘Sentire’, an Italian word meaning both ‘to hear’ and ‘to feel’. Born as an artistic endeavour and further developed as a scientific project, Sentire offers the possibility of accessing sociality in an innovative and intuitive fashion. The goals of the project are to facilitate social interaction through technologically mediated sound feedback and to foster general well-being, including use in various therapeutic approaches.
To investigate these possibilities, we apply the empirical method of structured observation – well-established in the social sciences – to analyse human behaviour during social interaction with Sentire. In sections 2 and 3, we discuss the theoretical background of the project: the conditions of social behaviour and mechanisms of social interaction, with a particular focus on coordination (section 2) and the relationship between musical sound and sociality (section 3). Section 4 discusses Sentire as both a sound-based body–machine interface and a participatory performance. Finally, in section 5, we present the first results of our structured observations, and, building on the fundamental role of coordination as discussed in the theoretical background, illustrate how we have investigated Sentire’s capacity to facilitate social interaction.
2. Theoretical background
2.1. Sociality: from basic to higher-ordered social behaviours
Human social interaction can occur in a variety of contexts and on different levels of complexity. More often than not, social interaction is constrained by implicit rules determined by specific social contexts, norms, or hierarchies. Social ontology – an attempt to ‘explain the fundamental nature and mode of existence … of human social institutional reality’ (Searle 2010: ix), which therefore addresses these constraints – has, however, disregarded those (largely non-verbal) forms of social interaction that do not necessarily require social cognition and intention. Kim et al. (2019: section 3) develop a more comprehensive model of social ontology, proposing distinct categories of social interaction: basic and higher-order social behaviours. Since basic social behaviours that do not involve shared or collective intentionality also play an important role in musical interaction and other forms of non-verbal interaction that do not aim to communicate representational meanings, Kim and colleagues’ categories of social behaviours serve as a point of departure for our discussion of sociality and (musical) sound.
For simplicity, in outlining those categories we consider only dyadic interaction, although the conditions given are also valid for groups of more than two individuals.
First, social interaction requires two individuals who are capable of perceiving both their own and the other’s behaviour, and the difference between the self and the other. Second, these individuals must be situated in a shared social context that affords or constrains both the selection of an individual’s behaviour and their perception of the other’s behaviour. Third, both individuals have to act in a causal relation to the social context constraining them, thus creating an action–perception cycle which may or may not involve social cognition; in addition, both individuals have to act in reciprocation with their counterpart.
A category of behaviours that meets all the preceding conditions is ‘inter-individual behaviour leading to social interaction, without any collective or we-intentions’; this category does not require social cognition (Kim et al. 2019: section 3.2). A classic example is infant–caregiver non-verbal vocalisation. Such an interaction is neither planned nor based on intention, but is the result of both caregiver and infant fulfilling all aforementioned conditions.
The interaction may be more complex and require social cognition when individuals share what Searle terms ‘we-intentions’ (Searle 1990). An example of ‘collective inter-individual behaviour involving we-intentions’ (Kim et al. 2019: section 3.3) is continuous applause by an audience following a performance with the intent of prompting an encore. Here, the communal intent, performed individually, is oriented towards a common goal.
A higher-ordered form of social interaction is categorised as ‘collective inter-individual behaviour based on shared or collective intentionality’ (Kim et al. 2019: section 3.4). Language-based forms of social interaction aiming to communicate representational meanings belong to this category. Some forms of non-verbal interaction, such as orchestral performance, fall into this category as well; in such cases, each individual’s act is an integral part of an overall act, and each individual adjusts or complements their own behaviour to that of the others to pursue the overall goal (e.g., playing a symphony).
Tying in with Kim and colleagues’ claim that social interaction is based on the perception and selection of one’s own and the other’s behaviour, distinguishing the self from the other, we propose to ground social interaction on kinaesthetic perception and coordination. In particular, we want to set our primary focus on how coordination builds the foundations of basal non-verbal social behaviours. Therefore, in the next section, after briefly introducing kinaesthetic perception, we will discuss the concept of coordination in detail. Even though we recognise the importance – both for our project and for social interaction in general – of higher-ordered complex behaviours including shared and collective intentionality and possibly involving verbalisations, social norms and power dynamics, we do not address these topics in the present article.
2.2. Coordination as a basic mechanism for social interaction
One of the necessary conditions for coordination is kinaesthetic perception, that is, the perception of one’s own movements (Rosenbaum 2010: 51). According to some conceptualisations, kinaesthetic perception can also be oriented towards a partner’s movements (see ‘kinaesthetic intersubjectivity’, Samaritter and Payne 2013, and ‘enkinaesthesia’, Stuart 2012) and thus build a bridge towards social interaction.
Coordination is a broad concept used to encompass a variety of behaviours. We propose to distinguish two main senses: one referring to the temporal axis, where ‘coordination’ can be replaced by the more specific terms ‘synchronisation’ and ‘entrainment’ (generally used interchangeably); and one extending beyond the temporal axis and referring to the matching of specific behavioural aspects, which we will call ‘attunement’.
Coordination as synchronisation/entrainment is defined as ‘an adjustment of rhythms of oscillating objects due to their weak interaction’ (Pikovsky, Rosenblum and Kurths 2001: 8).
The concept of synchronisation/entrainment is established as a physical phenomenon, but it has been taken up and further developed in the social sciences (McGrath and Kelly 1986) and other disciplines, including musicology (Clayton, Sager and Will 2005; Clayton, Jakubowski and Eerola 2019; Kim et al. 2019).
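The dynamics captured by this definition can be illustrated with a toy model (our own sketch, not part of the cited literature): two phase oscillators with slightly different natural frequencies lock into a constant phase relationship once their weak coupling exceeds the frequency detuning. All parameter values below are invented for illustration.

```python
import math

def simulate(w1=1.00, w2=1.05, coupling=0.2, dt=0.01, steps=20000):
    """Two weakly coupled phase oscillators (Kuramoto model), integrated
    with the Euler method; returns the settled phase difference."""
    p1, p2 = 0.0, math.pi / 2                        # arbitrary initial phases
    for _ in range(steps):
        d1 = w1 + (coupling / 2) * math.sin(p2 - p1)
        d2 = w2 + (coupling / 2) * math.sin(p1 - p2)
        p1, p2 = p1 + d1 * dt, p2 + d2 * dt
    return (p2 - p1) % (2 * math.pi)

print(simulate())  # ~0.25 rad: the rhythms have adjusted, i.e., entrained
```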
Many empirical studies show that synchronisation/entrainment facilitates or strengthens pro-social behaviour in the form of empathy and affiliation. Coordination within long-term musical group interaction was shown to increase scores on an empathy index and the ability to recognise emotional facial expressions (Rabinowitch, Cross and Burnard 2012). Stupacher, Wood and Witte (2017) found that synchronisation/entrainment as performed in a tapping task had a positive effect on an implicit measure of affiliation and helpfulness.
A theoretical model of the mutual connection between coordination and social interaction has been provided by De Jaegher and Di Paolo (2007). Based on the premises that 1) the ability to create and appreciate meaning, which is called sense-making, is the definitional property of cognisers (ibid.: 488), and 2) movements (including utterances) are the tools of our cognition, the authors propose participatory sense-making as the core of social cognition. This conceptualisation emphasises the temporal aspects of interaction, particularly coordination, as borrowed from dynamical systems theory. In line with our views, De Jaegher and Di Paolo understand coordination as synchronisation/entrainment and define social interaction as the coupling, regulated by coordination, of at least two autonomous agents. Coordination allows the agents to maintain their coupling while remaining autonomous and enables (temporal) organisation to emerge from the social relation of the agents. Depending on the agents’ participation in the creation of meaning, participatory sense-making may be weaker (interaction is coordinated but coordination largely remains an individual activity) or stronger (interaction is the result of a fully joint and shared effort).
The model can be summarised as follows: movements (and utterances) are the tools of sense-making, while regulation of social coupling takes place through coordination of movements (and utterances); therefore, social agents can coordinate their sense-making in social interaction. Coordination influences (sustains, modifies) interaction and interaction in turn promotes certain patterns of coordination. If coordination is the crucial feature of participatory sense-making and participatory sense-making defines social cognition, then coordination represents the basis on which other, more complex forms of social interaction and cognition are built.
When participatory sense-making is combined with mutual incorporation – or, as we suggest, mutual embodiment – that is, the process by which two people extend the perception of their bodies and form a common intercorporality, enactive intersubjectivity emerges: the ability ‘to grasp (to a certain extent) the experience of a non-verbal partner by interacting with him’ (Fuchs and De Jaegher 2009: 482).
We now turn to the conceptualisation of coordination as attunement. Coordination can be understood in a much broader sense than simply the adjustment of two behaviours to each other on the temporal axis. In fact, rather than the matching of another person’s behaviour per se, it can indicate the matching of a specific aspect of that behaviour. Coordination therefore need not be an exact reproduction of the original, but can be a response that resembles the original only in certain respects. This form of coordination has been observed in infant–caregiver interaction from approximately nine months of age: Stern (1985) reports the case of a boy sitting in front of his mother and shaking a rattle up and down with a display of interest; the mother’s reaction is to nod her head up and down to the child’s beat. Mother and son are not moving the same body part, yet the scene gives the impression that a form of matching has occurred. Stern calls this form of matching ‘affect attunement’: one party matches a behavioural aspect that reflects the feeling state of their partner. Indeed, in mother–infant dyads, affect attunement is a far more common maternal response than pure imitation (48 per cent versus 19 per cent, respectively).
Furthermore, in 87 per cent of cases affect attunement occurs cross-modally, meaning that the two interactors use different modalities and transfer so-called amodal properties (intensity, time, shape) from one modality to another (Stern, Hofer, Haft and Dore 1985). The ability to identify cross-modal equivalences arises in the first months of life and grows into a distinctive property of human interaction. Interaction between adults is also rich in cross-modal attunement experiences, such as smiling at an acquaintance who approaches us, for example in a work context.
Although the concept of affect attunement seems very promising for the analysis of interaction, it has not been consistently integrated into sociological research since its introduction in the mid-1980s, nor is there another established term to describe this specific form of coordination. As such, we have chosen to use ‘attunement’, without any further specifications, to indicate coordination in social contexts that refers to the matching of a specific behavioural aspect and differs from synchronisation/entrainment.
3. Musical sound and sociality
3.1. Sociality of musical behaviours
The role of coordination in social interaction has come to the fore in recent research on proto-musical and proto-linguistic behaviours. In developmental psychology, the coordination that emerges during infant–caregiver interaction is considered to be based on ‘communicative musicality’, an ability linked to ‘our innate skill for moving, remembering and planning in sympathy with others that makes our appreciation and production of an endless variety of dramatic temporal narratives possible’ (Malloch and Trevarthen 2009: 4). The concept of communicative musicality allows verbal or musical interaction to be founded on pre-linguistic interactions using sound and gesture. Hence, to discuss the social dimensions of sound, we are interested in sonorous phenomena and behaviours that become coordinated during interaction.
Sonorous phenomena and behaviours that become coordinated with each other, or with gestural phenomena and behaviours, may be characterised as forms of movement that have duration and contours of intensity, pitch and so on. Coordination may be related to the structure of duration (i.e., rhythmic structure) as synchronisation/entrainment; and/or to the structures of contours as attunement. Even sonorous phenomena and behaviours used in verbal communication often become coordinated with each other when interacting speakers make sense of and empathise with each other through tacit engagement (Gill 2015).
Musical sounds relate to one another to build dynamic forms of movement, which can be referred to as ‘sonorous forms of vitality’. Stern coined the term ‘forms of vitality’ to describe physical actions and mental processes that are shaped in relation to others and allow for the experience of vitality (Stern 2010). Forms of vitality in music manifest themselves in structural features (e.g., contours of pitch) and codified forms (e.g., sonata form; Stern 2010; Kim 2013). This argument is in line with several music-theoretical approaches characterised as ‘energetic’ (Rothfarb 2002), in which music’s dynamic qualities are identified and described in reference to contours of force (Mersmann 1925; Kurth 1931; Zuckerkandl 1956; Halm 1978). Since forms of vitality emerge relationally, that is, through interaction with others, forms of vitality in music that resemble human forms of vitality could act as a basis for investigating the social dimensions of musical sounds.
Our thesis holds that music’s structural features and codified musical forms constitute sonorous forms of vitality that are related both to musical elements and to the world and others. We thereby reject the view of musical formalism in aesthetics, which sees ‘tonally moving forms’ as constituted merely through intra-musical relationships and not through relationships with the world and others (Hanslick 1854).
Furthermore, music-making often comprises acts of collaboration, which involve either we-intentions or collective intentionality (see section 2.1). At the same time, listening to music involves mental processes of co-shaping forms of vitality (Kim 2013), which could correspond to those forms of vitality that manifest themselves in musicians’ mental processes, as well as music’s structural features and codified musical forms. Hence, music affords a sense of mutual affiliation involving coordination processes among musicians, or between musicians and listeners (Cross 2014). These processes include synchronisation/entrainment related to movements, tempo and metrical structure (Clayton et al. 2019), as well as processes relevant to joint action (Keller, Novembre and Hove 2014) that can be subsumed under the concept of attunement.
Cross argues that music plays a significant social role at times of potential social stress (Cross 2009; Cross and Woodruff 2009). He finds that music acts as a medium for managing events that can be categorised as situations of social uncertainty (Cross 2014) by facilitating reciprocity, social bonding and cohesion (Cross 2007; Woolhouse, Tidhar and Cross 2016). Cross proposes the term ‘empathic creativity’ to indicate the mutual affective alignment that arises during the creative process of making music together (Cross et al. 2012: 6). Empathic creativity indicates both the innate mechanisms that occur on an automatic level below awareness and the acquired processes that occur on a volitional, aware level. Musical interaction exhibits various salient features that may give rise to empathic creativity in cooperative musical contexts: imitation, synchronisation/entrainment, disinterested pleasure, flexibility, ambiguity and shared intentionality (for a detailed description, see Cross et al. 2012).
Empirical studies (Cross et al. 2012) have demonstrated that the aforementioned natural features of musical interaction can promote and train empathic creativity. Digital musical instruments can be specifically designed for such purposes, allowing for a collective experience and guaranteeing intuitiveness of the system and easy access for all users (Blaine and Fels [2003] 2017). A central feature of these instruments is limitation of the musical parameters that can be controlled (Cook [2001] 2017; Tanaka and Knapp 2002), so that interaction is prioritised (Robson 2002) and made possible regardless of users’ musical skills.
3.2. Musical sound, therapy and well-being
Sound, particularly in the form of concurrent (or real-time) auditory feedback, can be used to support kinaesthetic perception (briefly introduced in section 2.2). The potential of concurrent sensory feedback for enhancing motor learning and performance has been extensively studied (see Sigrist, Rauter, Riener and Wolf 2013, and Effenberg, Fehse, Schmitz, Krueger and Mechling 2016, respectively). For instance, the relationship between kinaesthetic perception, sound and intersubjectivity has been explored in dance improvisation. Here, the individual’s action–perception loop is combined with the loop between the self and the other in a form of non-representational, embodied and experiential interaction that Samaritter defines as ‘kinaesthetic intersubjectivity’ (Samaritter and Payne 2013). The therapeutic approach of dance movement psychotherapy is based on this principle and makes use of shared movement as a specific intervention.
In the therapeutic context, concurrent auditory feedback has been applied (e.g., for patients with Parkinson’s disease) to improve specific gait parameters such as stride time (Hove, Suzuki, Uchitomi, Orimo and Miyake 2012) and stride length (Rizzonelli, Kim, Gladow and Mainka 2017). In both study designs, interactive feedback is compared with non-interactive auditory stimulation and no stimulation. In both studies, the interactive condition produced significant improvement in the targeted gait parameter compared with the two control conditions. The assumptions that underlie these therapeutic protocols are: 1) that musical feedback can close the disrupted action–perception loop that characterises the impaired proprioception typical of Parkinson’s disease; and 2) that mutual entrainment between motor execution and auditory perception is the mechanism that enables the closure of the loop.
It may be argued that the preceding studies show the effectiveness of real-time auditory feedback on motor learning or task performance, but do not support free improvisational movement as a form of non-verbal social interaction. However, we claim that, because concurrent auditory feedback has proven effective in motor tasks that rely on the basic mechanism of synchronisation/entrainment, it is likely to be effective also in the context of social interaction, where both basic and more complex cognitive processes are at play. Moreover, in line with several studies suggesting that, with respect to motor performance, music is a more effective stimulus than isolated rhythmic beats (Thaut, McIntosh, Rice, Miller, Rathbun and Brault 1996; Thaut, Rathbun and Miller 1997; Dyer, Stapleton and Rodger 2017; Rose, Delevoye-Turrell, Ott, Annett and Lovatt 2019), we propose the use of musically informed real-time feedback.
Recent musicological research has considered which musical parameters influence motor production (e.g., Buhmann, Desmet, Moens, Van Dyck and Leman 2016) and encouraged further research on ‘the precise aspects of music, besides musical tempo, that might influence the spatialisation of body movements’ (Styns, Van Noorden, Moelants and Leman 2007). This leads to a reflection on how sound can be shaped musically (i.e., with regard not only to timing but also to pitch, dynamics and so on) to facilitate therapy and, more generally, foster well-being. That many studies applying sound for therapeutic purposes do not describe the acoustic stimulation in detail reveals that the choice of stimulation and its intrinsic qualities are often considered of secondary importance with respect to the goal of the studies. By contrast, we believe that the musicality available in sound should be central to the development of any sound-based intervention – be it therapeutic or aimed at increasing general well-being.
Sound is described in detail in one project aimed at promoting reflective engagement with the act of walking through acoustic feedback. Feltham and colleagues (Feltham, Loke, van den Hoven, Hannam and Bongers 2014) developed an interactive surface called ‘Slow Floor’ that generates sound according to the pressure exerted by the person walking on it. Pressure changes are mapped to sound qualities such as pitch and volume that seek to resemble the qualities of the movements, providing an intuitive experience. Each sound environment used in the study includes both pitched and unpitched acoustic material. Feltham’s project combines the facilitation of kinaesthetic perception with sound design and human–computer interaction. Initial empirical findings showed that users experienced a strong sense of creative agency over the sound, as they were stimulated to create new, original foot movements while generating and responding to the sound at the same time.
In a similar project by Françoise and colleagues (Françoise, Candau, Alaoui and Schiphorst 2017), the interactive system ‘still, moving’ was developed to enhance kinaesthetic perception through real-time sonification of a person’s micro-movements. The system consists of two bracelets worn on the user’s legs, which track movement information and muscle tension through electromyographic sensors, with the system’s sensitivity increasing as the extent of movement decreases. The only sound parameter mapped to movement force is loudness, but the sound corpus is built in such a way that loudness variations result in consistent and rich timbral variations. The authors chose environmental sounds because of their evocative power, which resonates with people’s auditory experiences in the real world and thus facilitates an embodied experience.
A final remark concerns personal musical taste. Although we acknowledge that preferences and cultural background influence a person’s reaction to musical stimuli – and we are accordingly planning to consider personal musical taste in further studies – given the complexity and variety of the topic, our first musical feedback design, described in this paper, focuses on functional, rather than subjectively perceived, acoustic qualities.
4. Sentire
The reflections of Feltham et al. (2014) and Françoise et al. (2017) are particularly relevant to our project. Sentire was initiated with the claim that an artistically sensitive and responsive use of sound in the form of auditory feedback can facilitate therapy and promote well-being by fostering the experience of the self as dependent on others. Building on knowledge gained from years of performance practice and on the theoretical background discussed in section 2, our project presents a developing scientific approach to adapt an artistic endeavour for healthcare purposes.
4.1. Sentire as a body–machine interface
A body–machine interface (BMI) is a technological interface capable of extending or replacing human capabilities. Utilising a BMI, the user may gain complete or shared control over the machine through signals derived from their body (Casadio, Ranganathan and Mussa-Ivaldi 2012). Sentire is a sound-based BMI that sonifies motor behaviour in real time by detecting proximity and touch between two users. While established systems capable of distance detection typically use infrared or ultrasonic sensors, Sentire uses the body itself as an electronic component of the sensor system. This overcomes the disadvantages of measuring at discrete points and/or requiring an unobstructed line of sight.
Sentire enables whole-body proximity detection independent of sensor positioning and orientation of the body in space. This is accomplished with our custom software and capacitive sensing system. The technical implementation of proximity detection takes advantage of the human body’s electric conductivity. Through a cable connected to a conducting bracelet, an electrical signal of very low voltage is fed into a person (for convenience called the transmitter). Attaching a second person (the receiver) to the same electrical circuit with another bracelet and cable causes capacitive coupling between the two bodies. Through a low-noise signal amplifier, the amplitude of the transmitted signal is measured at the receiver. Because the signal strength depends on the distance between the receiver and the transmitter, this ‘proximity effect’ can be utilised to obtain a control signal, which changes linearly with the distance between the two bodies. The signal is then mapped to selected parameters of an algorithmic sound synthesis environment (parameter mapping). Finally, the output of the sound synthesis process is made audible, enabling closed-loop auditory interaction between the two persons. Figure 1 illustrates the signal path and processing of Sentire’s various components.
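As an illustration of this chain (a minimal sketch with invented names and calibration values; the actual system runs in Sentire’s custom software), the core of the processing can be reduced to normalising the amplitude measured at the receiver into a control signal:

```python
def proximity_control(amplitude: float, amp_far: float, amp_near: float) -> float:
    """Turn the signal amplitude measured at the receiver into a control
    value in [0, 1] (0 = maximum distance, 1 = touch range). Assumes the
    amplitude grows monotonically as the bodies approach, and that amp_far
    and amp_near were obtained in a prior calibration step."""
    value = (amplitude - amp_far) / (amp_near - amp_far)
    return max(0.0, min(1.0, value))  # clamp to the valid control range
```

The resulting value can then be fed to the parameter mapping described below.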
The use of parameter mapping is a central element in the design process of the interactive system, as it manifests the linkage between non-verbal social interaction and auditory feedback. Parameter mapping consists of the specific correspondence between control parameters (derived from performers’ actions) and sound synthesis parameters (Hunt, Wanderley and Paradis 2002). As with digital musical instruments, the mapping layer is a key factor in determining constraints (Magnusson 2010), dimensionality (Gurevich and von Muehlen 2001; Zappi and McPherson 2015), and expressiveness (Arfib, Couturier and Kessous 2005). Following Rovan’s classic definition of parameter mappings (Rovan, Wanderley, Dubnov and Depalle 1997), Sentire uses a ‘divergent’ mapping, where a one-dimensional gestural parameter (proximity) is simultaneously linked to multiple musical parameters.
Specifically, proximity is sonified in the intimate (0–0.5 m), personal (0.5–1.2 m) and social (1.2–3 m) areas, as they are defined in terms of proxemics (see Hall 1966). Depending on context or conditions, the touch feature and the corresponding intimate area can be silenced. The physical experience of proximity (and touch) within dyadic interaction is thus digitally designed as a sonic-aesthetic experience (for an overview of the concept of digital proxemics, see McArthur 2016). The design of such an experience depends strongly on multisensory integration, the automatic process through which the human nervous system integrates different sensory modalities. In comparison with unimodal events, multisensory events can be detected faster and more easily, and improve detection sensitivity and event comprehension, as more sensory information is available (Hobeika 2017). The usage of Sentire therefore combines the perceptive modalities of hearing, touch and kinaesthetic perception.
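The zone boundaries above can be resolved from the estimated distance with a trivial classifier (a sketch; the function name and the out-of-range case are our assumptions):

```python
def proxemic_zone(distance_m: float) -> str:
    """Classify inter-body distance into the sonified proxemic areas (Hall 1966)."""
    if distance_m <= 0.5:
        return "intimate"   # touch range; may be silenced depending on condition
    if distance_m <= 1.2:
        return "personal"
    if distance_m <= 3.0:
        return "social"
    return "out of range"   # beyond the sonified areas
```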
Sentire provides two sound environments (SEs; see Livingstone and Miranda 2004) called ‘Sinus’ and ‘Pulse’. We understand an SE as a set of parameters that defines the tonal quality and results from predefined artistic decisions combined with a specific parameter mapping. The composition of a given SE is based on sound design (i.e., how the sounds are generated, structured, transformed and mixed) and algorithmic processes (which are optionally controlled through the parameter mapping).
The two SEs primarily differ in their tonal quality (an ambient pad sound for ‘Sinus’ and a percussive sound for ‘Pulse’) but both feature a similar mapping of the control signal to the musical parameters, which is aimed at sonically intensifying the acts of approach and touch. The proximity signal is mapped to amplitude and pitch for each sound; when the users approach each other, the sound becomes louder and higher in pitch. (In ‘Pulse’, the speed of the discrete pulses is also mapped to the proximity signal, that is, the closer the participants are, the faster the pulse sounds are generated.) When the users touch each other, the root frequency of the proximity sound is changed based on a given probability distribution. In addition, another synthesiser is triggered; this synthesiser has the same sound generation as that mapped to the proximity parameter, but uses an envelope with short attack and release times, which makes the touch sound percussive and thus emphasises the touch event.
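The following sketch condenses this divergent mapping into code; the numeric ranges, the candidate root frequencies and their probability distribution are invented for illustration and do not reproduce the actual SEs:

```python
import random

ROOT_FREQS = [110.0, 146.83, 164.81, 220.0]  # hypothetical candidate roots (Hz)
ROOT_WEIGHTS = [0.4, 0.3, 0.2, 0.1]          # hypothetical probability distribution

def map_proximity(p: float, root: float) -> dict:
    """Divergent mapping: one proximity value p in [0, 1] (1 = touching)
    drives several synthesis parameters at once."""
    return {
        "amplitude": 0.1 + 0.9 * p,         # closer -> louder
        "frequency_hz": root * (1.0 + p),   # closer -> higher, up to one octave
        "pulse_rate_hz": 1.0 + 7.0 * p,     # 'Pulse' only: closer -> faster pulses
    }

def on_touch() -> float:
    """Touch event: draw a new root frequency from the probability distribution
    (at this point the percussive touch synthesiser would also be triggered)."""
    return random.choices(ROOT_FREQS, weights=ROOT_WEIGHTS, k=1)[0]
```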
4.2. Sentire as a participatory performance
Sentire is not only a body–machine interface but also a participatory performance, in which a performer invites one person at a time to interact on stage. The performer, who knows Sentire and its sound environments, applies strategies to shape the interaction. In this sense, the performer serves as the leader of the interaction.
Through numerous performances in festivals and events worldwide, we developed different strategies to intensify the interactive experience. Many of these rely on the performer’s improvisational skills and are influenced by dance practices such as contact improvisation. For example, moving in a circle is a strategy to overcome the tension that may arise when expert and participant first face each other but have not yet started interaction: the expert moves along an imaginary circle on the floor and approaches the participant from the side, rather than from the front.
Through a three-year project funded by the German Federal Ministry of Education and Research (BMBF) and conducted at the Humboldt University of Berlin, we are investigating the potential of Sentire for promoting social interaction in both real-world contexts and specifically designed experiments, both in the general population and among people with specific therapeutic needs. The results of these studies will inform further technical development as well as therapeutic approaches.
5. An empirical approach through structured observation
5.1. Structured observation for behavioural analysis
Observational measurements are particularly effective for the analysis of complex non-verbal behaviour, which unfolds in time and is context-sensitive, because they allow for an observation of spontaneous – rather than artificially elicited – behaviour, which is central in real-world research (Robson and McCartan 2016).
Structured observation (SO) consists of human, qualitative, non-participatory observation and codification of behaviour, resulting in quantitative data such as the number, frequency and duration of the codes applied. SO includes two interdependent processes: 1) segmentation of behaviour into chunks and 2) annotation, that is, the attribution of a code to each chunk. While segmentation of verbal communication is intuitive due to the presence of clearly defined units of speech (phonemes, words, sentences), chunking the seamless flow of non-verbal behaviour may be extremely challenging and is more prone to subjective judgement. The possibility of segmenting non-verbal behaviour depends entirely on the development of clear, exhaustive and mutually exclusive codes (Bakeman and Quera 2011), that is, on the elaboration of a solid coding scheme and manual. Codes can be either empirical or functional. Empirical codes are as concrete as possible and based on pure observation of the morphology (e.g., ‘moves arm upwards’) or the purpose (e.g., ‘grasps food’) of the observed action. Functional codes are more abstract and require a certain degree of inference; for example, for the observation of an infant crying, the functional code could be ‘seeks attention’ or ‘expresses discomfort’ (Bakeman and Quera 2011: 19).
To investigate original research questions, it is often necessary to develop a coding scheme ex novo during a complex and time-consuming initial phase of informal observation. Once scheme and manual have been refined and tested, SO in the strict sense can be applied. Cohen’s kappa (Cohen 1960) is used to calculate inter-observer agreement (IOA) between two independent observers for at least 20 per cent of the data (see, e.g., Pellecchia, Beidas, Mandell, Cannuscio, Dunst and Stahmer 2020). IOA values that can be considered strong (see McHugh 2012) demonstrate the validity of a coding scheme and the reliability of the method in toto.
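For reference, Cohen’s kappa relates the observed proportion of agreement to the agreement expected by chance from each observer’s code frequencies; a direct transcription (our own sketch, not code used in the studies cited) reads:

```python
from collections import Counter

def cohens_kappa(codes_a: list[str], codes_b: list[str]) -> float:
    """kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement
    and p_e the chance agreement derived from each observer's marginals.
    Undefined if p_e == 1 (both observers use one identical code)."""
    assert len(codes_a) == len(codes_b) and codes_a
    n = len(codes_a)
    p_o = sum(a == b for a, b in zip(codes_a, codes_b)) / n
    freq_a, freq_b = Counter(codes_a), Counter(codes_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a) / n ** 2
    return (p_o - p_e) / (1 - p_e)
```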
5.2. Structured observation for the analysis of Sentire
The SO approach is ideal for the study of interaction (Fuchs and De Jaegher 2009), specifically human–computer interaction and human–robot interaction (Seifert and Kim 2008; Kim et al. 2010). Our main hypothesis – that Sentire can foster non-verbal social interaction – is addressed by SO as follows: 1) what are the relevant social behaviours that emerge from interaction with Sentire? 2) does an increasing trend of pro-social or interaction-relevant behaviours emerge when Sentire is used over a span of time?
To apply SO, videos are recorded by three cameras in a triangular setup (to guarantee an accurate capture of the full interaction space) and analysed with the open-source software ELAN from the Max Planck Institute for Psycholinguistics. In the following, we present a detailed report of our informal observation, which has led to the development of an effective coding scheme (with strong IOA values) and to preliminary results on the interaction itself.
We started our informal observation with the annotation of all visible behaviours, then used an affinity diagram to group them into 14 categories (i.e., 14 tiers or levels of analysis in ELAN), each with a predefined set of applicable codes (controlled vocabulary). Some major problems arose during this phase: first, the definitions on which codes were based (Koch 2014, drawing on the Kestenberg Movement Profile; see Kestenberg and Sossin 1973) were often difficult to apply in a consistent, non-interpretative way; second, some behaviours needed to be coded on multiple levels, which made the coding process more error-prone.
We hypothesised that reducing the scheme to its minimal terms, that is, to a single category/tier of mutually exclusive and exhaustive codes, would allow us to overcome the aforementioned difficulties and to perform a faster and more easily comparable analysis. Under the umbrella category of behaviour, we subsumed the following five codes. The codes ‘simultaneous copy’ and ‘lagged copy’ indicate coordinated behaviour understood as synchronisation/entrainment. This is based on the fact that interpersonal coordination in the form of synchronisation/entrainment, like synchronisation/entrainment in the physical domain, refers to temporal coordination both in a one-to-one ratio and in any other constant phase relationship, that is, including a time lag. The code ‘compensation’ accounts for behaviour that does not appear as a copy but can nonetheless be conceived as coordinated in the sense of attunement (see section 2.2) – in our case: step forward–step backwards. Finally, the codes ‘different, directed behaviour’ and ‘different, non-directed behaviour’ indicate non-coordinated or divergent behaviour (Burgoon, Dillman and Stern 1993), where head and gaze may or may not be directed towards the partner. All codes are annotated only if the behaviour lasts at least three seconds, the minimum duration of most cognitive activities, characterised as the psychological present (see, e.g., Jaffe and Feldstein 1970; Wittmann and Pöppel 1999/2000; and Stern 2004 for a comprehensive psychological approach).
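Schematically, and purely to illustrate the constraints just listed (the tuple layout is our assumption), this single-tier scheme amounts to one controlled vocabulary plus a minimum-duration filter:

```python
BEHAVIOUR_CODES = {
    "simultaneous copy",                  # synchronisation/entrainment
    "lagged copy",                        # same, with a constant time lag
    "compensation",                       # attunement, e.g. step forward-step backwards
    "different, directed behaviour",      # divergent, head/gaze towards the partner
    "different, non-directed behaviour",  # divergent, no directedness
}

MIN_DURATION_S = 3.0  # the 'psychological present'; shorter chunks are not annotated

def annotatable(start_s: float, end_s: float, code: str) -> bool:
    """A chunk receives a code only if it belongs to the controlled
    vocabulary and lasts at least three seconds."""
    return code in BEHAVIOUR_CODES and (end_s - start_s) >= MIN_DURATION_S
```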
The preceding coding scheme was systematically applied by three independent observers to a pilot study in which two participants with a diagnosis of psychosomatic disorders interacted with a trained music therapist over three sessions. The analysis revealed two problems. First, it was necessary to separate segmentation from annotation, so that an expert observer pre-segmented the video material before two external observers could annotate. Second, despite consistent segmentation and strict formulation of the codes, IOA was insufficient: Cohen’s kappa (0 ≤ κ ≤ 1; Cohen 1960) reached a mean value of 0.43, which, according to McHugh’s interpretation (McHugh 2012), can only be considered weak. The reason for the ineffectiveness of the single-tier strategy probably lies in the fact that, in order for the codes to be as few and as exhaustive as possible, they were not strictly empirical, but included a certain degree of interpretation.
Therefore, it was necessary to return to the original, more time-consuming strategy with multiple tiers (effectively applied in similar studies, e.g., Evola, Skubisz and Fernandes 2015). This time, however, we kept the codes strictly empirical, did not set a minimum duration for behaviour to be coded, and limited the options of the controlled vocabulary. As a result, observers were able to perform segmentation and annotation at once, that is, without pre-segmentation, and to reach a mean Cohen’s kappa of 0.85 (Cohen 1960), corresponding to strong IOA (McHugh 2012). Table 1 shows the tiers applied to each interactor individually (during the COVID-19 pandemic, studies were carried out without touch to comply with regulations, so the ‘touch’ tier was not used).
This coding scheme is strictly and intentionally limited to empirical codes, which are not prone to subjective interpretation and can therefore be annotated consistently by independent observers. Higher-order social categories are derived strictly from these empirical observations. Single empirical codes referring to whole-body movements do not by themselves disclose information about the amount or quality of the interaction, but combining the participant’s and therapist’s corresponding tiers allows us to develop functional categories out of the originally annotated empirical tiers. In other words, coordination as synchronisation/entrainment results from the overlapping of the two participants’ whole-body tiers. Empirical codes referring to partner-oriented arm movements, all touch-related codes (currently not in use), and the annotation of gaze and smile provide information about directedness towards the partner and can be interpreted as attunement in Stern’s sense (see section 2.2). Moreover, mutual gaze and mutual smile result from the overlapping of the two participants’ ‘gaze’ and ‘smile’ tiers, respectively.
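That derivation can be sketched as the pairwise intersection of the two interactors’ tiers (our own illustration of the principle, not the project’s analysis code):

```python
def overlaps(tier_a, tier_b):
    """Intersect two annotation tiers, each a list of (start_s, end_s)
    intervals – e.g., the participant's and the therapist's 'gaze' tiers;
    the non-empty intersections are read as mutual behaviour."""
    result = []
    for a_start, a_end in tier_a:
        for b_start, b_end in tier_b:
            start, end = max(a_start, b_start), min(a_end, b_end)
            if start < end:  # the two annotations co-occur
                result.append((start, end))
    return result

# e.g., total mutual-gaze time in seconds:
# sum(end - start for start, end in overlaps(gaze_participant, gaze_therapist))
```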
Figure 2 shows the percentage of interaction-relevant behaviours performed by the two participants during the whole pilot study. Figure 3 shows the increasing percentage of mutual gaze and smile (between each participant and the therapist) over three sessions.
The preliminary IOA results of the pilot study show the reliability of the coding scheme, which can now be applied to further studies, checking IOA only for a portion of the data (at least 20 per cent, as indicated in the previous section). As has become clear, the development of the current coding scheme is the result of an iterative process, in which literature-based hypotheses and codes had to be applied empirically and could be accepted only if they led to sufficient IOA. As for the results of the pilot study with respect to interaction-relevant behaviours, our preliminary findings suggest that pro-social behaviours such as coordination and directedness are solidly present during the use of Sentire. In particular, mutual gaze and smile between each participant and the music therapist increased over the three sessions. Clearly, these results do not support claims of statistical significance, but they represent a valid basis for the design of further studies and for a consistent application of SO based on the latest version of the multiple-tier coding scheme.
6. Conclusion
The ongoing research project ‘Social interaction through sound feedback – Sentire’ aims to further develop Sentire, an innovative sound-based body–machine interface and participatory performance, in order to promote non-verbal social interaction (both artistically guided and otherwise), specifically for therapeutic purposes. It seeks to shape the experience of social relations between interactors, and the emergence of the self as dependent on others, through sound feedback on interactive behaviour. This artistically informed approach is combined with human–computer interaction research and real-world research.
Tying in with considerations by Kim et al. (2019) concerning relevant conditions and systematic categories of social behaviour, we presented our theoretical claim that coordination represents a crucial basic mechanism of social interaction (sections 2.1 and 2.2). We argued that musical sound can promote sociality in the sense of communicative musicality, experience of vitality, and musical empathic creativity (section 3.1). On these grounds, we proposed that sociality facilitated through real-time auditory feedback can play a relevant part in therapeutic and well-being-related applications, emphasising that auditory feedback has the potential to increase embodied experiences and coordination (section 3.2).
The Sentire project, combining human–computer interaction with artistic and empirical approaches, was presented (section 4). The development of the interactive technology and the choice of its sound design were largely derived from the artistic background of the project’s creators. Artists are often pioneers in exploring and extending the use of emerging technologies; Sentire demonstrates how an art project can inform academic research, providing a technology that is both intuitive to use and fulfils all requirements for promoting sociality through musical interaction and mutual embodiment. Scientific research demands a solid methodology to test empirically what has been shown informally in the artistic context; to this end, the potential of Sentire to foster non-verbal social interaction is investigated by applying structured observation, a quasi-quantitative method of behavioural analysis. Although this approach is suitable for the study of interaction, chunking the seamless flow of non-verbal behaviour is not a trivial matter and is prone to subjective judgement; the iterative process of developing a coding scheme and the results of the current scheme, which shows strong IOA and therefore reliability, were discussed (section 5).
Artistic and technological ideas for Sentire will be developed further, informed by the results of empirical studies; and vice versa, the design of empirical studies will be informed by artistic realisations and technological developments. We aim to establish this unique approach, Artistic Human–Computer Interaction Design, for use in diverse fields of artistic and scientific research and to explore its possible therapeutic applications.
Acknowledgements
We thank Rebekka Gold for her proofreading and valuable comments, and Florian-Hendrik Gehrmann for his contribution to references. Moreover, we thank the German Federal Ministry of Education and Research (BMBF) for funding the project ‘Social interaction through sound feedback – Sentire’.