1. INTRODUCTION
Since technology has a distinct role in defining the constraints and potentials of improvisation in electronic music, technical preparation often becomes an important aspect of an improviser’s performance practice. A central consideration in this respect is the question of what constraints to set and on what level to pitch the creative register in a performance. If we want to start with nothing, an empty code interpreter or an unpatched modular system theoretically provide some of the widest horizons for sonic adventure, but we may then be facing distracting technical challenges and a degree of ‘performative latency’ due to the time it takes to code or patch. Reducing input modalities can make a system more intuitive and imply limiting the sonic potentials to a more instrumental level, which can provide creative constraints and engender nuanced performance. If we embed cognition and agency in the technology, we may be able to maintain direct performance agency, and simultaneously allow for a virtually unlimited complexity of structure and morphology in the music, without losing real-time dynamics. But that comes at a cost: a human performer in such a context can have a great degree of intuitive, bodily agency and influence on sound, but will have to sacrifice some control of some of the music as well, because much of it is generated by a computer algorithm.
In this article I discuss my recent practice-based research, which explores such an approach with an interest in how algorithmically generated synthetic sound behaves in human–computer improvisation, and the acousmatic qualities of such sound in a multichannel spatial setting. The thesis is that engaged listening, spontaneity and bodily agency are virtues of improvisation that can be conditioned by a technical system that is also engaged in listening, acting and spontaneous organisation. Developing technical systems in this manner is a process more akin to composition than instrument design, if we think of composition as the defining of a distinct aesthetic sound environment and a topological network of sounds and structures to explore through improvisation. The blurring of distinctions between design, technology and composition is, as Thor Magnusson has written, almost innate to electronic and digital musics:
The instruments become epistemic, composed, often directly fusing the instrument with the composition, as exemplified in the work of David Tudor, Gordon Mumma or Erkki Kurenniemi; where the instrument constitutes the piece, for example in the work of Éliane Radigue or Morton Subotnick; or where a specific technique becomes the theory and aesthetics of a new piece, as with Stockhausen or Xenakis. (Magnusson 2019: 57)
However, this also has consequences for how we view improvisation in relation to composition. In this article, improvisation is regarded as an expression of agency and contingency through exploration of sound within the topological constraints of a system. Agency here refers to the ability of a performer to intervene or influence in a timely manner (Armstrong 2012), but it also applies to the ability of computational processes to act. Contingency implies that the music has a dependence on agency, but also that any action is contingent and, philosophically speaking, not strictly necessary: it is precisely the fact that it could be otherwise that reveals that a performance constitutes a unique creative process. Thus, definitions overlap: designing through composing is done with the aim of composing through improvisation, one bringing the other to fruition.
The term ‘post-acousmatic’, coined by Monty Adkins, Richard Scott and Pierre Alexandre Tremblay (2016), is relevant here. The neologism applies to ‘specific areas of practice that engage with acousmatic thinking whether they be 1) influenced by it, 2) an augmentation of its practice or 3) a critique of it’ (ibid.: 108). All three of these criteria apply here: much of the aesthetic thinking has an ancestry in acousmatic theory, while developing new ways of addressing the acousmatic in composition and performance, but also rethinking some of the fundamental assumptions about music and sound that acousmatic music tends to be based upon. Adkins et al. make several observations concerning aspects of acousmatic music which are challenged here too, including how ‘most acousmatic works follow [a] traditional notion of musical linearity’ (ibid.: 109) associated with the era ‘prior to the emergence of the Darmstadt avant-garde’, and how this linearity is also manifest in a gestural and physical cause-and-effect phrasing, based on the ‘notion of an “event” which has consequences’ (ibid.). Especially important is what Adkins et al. write about acousmatic composition and performance practice:
Acousmatic music, by developing almost entirely as a studio-based compositional practice with only the diffusion of fixed works remaining of its live performative aspect, has accepted and indeed further exaggerated the separation of compositional time and performance time it inherited from European classical music. The aspects of live musical practice that the acousmatic paradigm has profoundly abandoned – extemporisation, variation, variability of performance parameters, and sharing the moment of invention with the audience – are exactly those which free improvisation has vigorously reasserted. (ibid.: 111)
I want to celebrate these aspects of live musical practice and present the acousmatic as an osmosis of human and technological agency. I draw from post-humanist discourse to explain these views and their consequences for relevant music discourse. I will discuss two works – Texton Mirrors and Intra-action – to demonstrate theory in practice.
2. POST-HUMAN COGNITION AND INTRA-ACTION
N. Katherine Hayles (1999) has discussed the concept of the ‘post-human’ as a reassessment of the very notion of the human, in favour of a subject for whom mind is not primary, and body is ‘the original prosthesis we all learn how to manipulate, so that extending or replacing the body with other prostheses becomes a continuation of a process that began before we were born’ (ibid.: 3). Cybernetics, the interdisciplinary science named by Norbert Wiener and colleagues, which encompassed ‘the entire field of control and communication theory, whether in the machine or in the animal’ (Wiener 2013: 11), implied ‘that the boundaries of the human subject are constructed rather than given. Conceptualizing control, communication, and information as an integrated system cybernetics radically changed how boundaries were conceived’ (Hayles 1999: 84). For Hayles, the most important implication of the post-human was not bio-technological hybrid organisms or prostheses, but rather the opportunity to envision human embodiment and consciousness as distributed processes. Hayles explains her conception of the post-human as follows:
Whereas the ‘human’ has since the Enlightenment been associated with rationality, free will, autonomy and a celebration of consciousness as the seat of identity, the posthuman in its more nefarious forms is construed as an informational pattern that happens to be instantiated in a biological substrate. There are, however, more benign forms of the posthuman that can serve as effective counterbalances to the liberal humanist subject, transforming untrammelled free will into a recognition that agency is always relational and distributed, and correcting an over-emphasis on consciousness to a more accurate view of cognition as embodied throughout human flesh and extended into the social and technological environment. (Hayles 2006: 161)
Hayles has since developed her view on cognition further, based on neuroscientific discoveries showing that human consciousness is powered by a more pervasive ‘nonconscious cognition’,Footnote 1 and that such nonconscious cognition also exists in all biological life forms, and in computational technical systems:
Consciousness occupies a central position in our thinking not because it is the whole of cognition but because it creates the (sometimes fictitious) narratives that make sense of our lives and support basic assumptions about worldly coherence. Cognition, by contrast, is a much broader capacity that extends far beyond consciousness into other neurological brain processes; it is also pervasive in other life forms and complex technical systems. Although the cognitive capacity that exists beyond consciousness goes by various names, I call it nonconscious cognition. (Hayles 2017: 9)
For Hayles, nonconscious cognition ‘provides a bridge between human, animal, and technical cognitions, locating them on a continuum rather than understanding them as qualitatively different capacities’ (ibid.: 67). Footnote 2 Central to the argument is that ‘nonconscious cognitions in biological organisms and technical systems share certain structural and functional similarities, specifically in building up layers of interactions from low-level choices, and consequently very simple cognitions, to higher cognitions and interpretations’ (ibid.: 13, author’s italics). Hayles does not see a close parallel between technical systems and self-aware consciousness, but draws attention to relations on the nonconscious cognitive level:
Like human nonconscious cognition, technical cognition processes information faster than consciousness, discerns patterns and draws inferences and, for state-aware systems, processes inputs from subsystems that give information on the system’s condition and functioning. Moreover, technical cognitions are designed specifically to keep human consciousness from being overwhelmed by massive informational streams so large, complex, and multifaceted that they could never be processed by human brains. (ibid.: 11)
Importantly, Hayles avoids equating cognition with intelligence. She sets cognition at a relatively low threshold by defining it as ‘a process that interprets information within contexts that connect it with meaning’ (ibid.: 22, author’s italics). She outlines the idea of a ‘cognitive assemblage’ as a collective of different cognitive systems that can span across biological life and technological media. While the idea of extended mind or extended cognition is not at all new (cf. Clark and Chalmers 1998; Clark 2008; Varela, Thompson and Rosch 2016), Hayles takes specific interest in technological systems within cognitive assemblages, and ‘the implication that arrangements can scale up, progressing from very low-level choices into higher levels of cognition and consequently decisions affecting larger areas of concern’ (Hayles 2017: 118). Further, she explains that, ‘because humans and technical systems in a cognitive assemblage are interconnected, the cognitive decisions of each affect the others, with interactions occurring across the full range of human cognition, including consciousness/unconscious, the cognitive nonconscious, and the sensory/perceptual systems that send signals to the central nervous system’ (ibid.). The cases studied in her book include infrastructural traffic control systems, personal digital assistants, social signalling and somatic surveillance.
Body and mind are contingent not only upon nonconscious cognitive processes but also ‘technogenesis’, ‘the idea that humans and technics have coevolved together’ (Hayles 2012: 10). The consequence of this is that the human is deeply enmeshed with technology in a manner that is not clearly hierarchical. Karen Barad (2007) uses the term ‘intra-action’ to describe a phenomenon or process that has several agencies within itself, which bring one another into existence. In this view, causality is not a one-way process, but rather traced to ‘agential separability’ and ‘exteriority within phenomena’ (ibid.: 177). Barad’s post-humanism emphasises performativity and how ‘phenomena are specific material performances of the world’ (ibid.: 335). Moreover, Barad explains, ‘agency is about possibilities for worldly reconfigurings. So agency is not something possessed by humans, or non-humans for that matter. It is an enactment. And it enlists, if you will, “non-humans” as well as “humans”’ (Barad in Dolphijn and van der Tuin 2012: 54).
Intra-action is a concept that shows how a phenomenon such as sound or music is composed of agential encounters but simultaneously cannot be reduced to these. It can also show how technological and human agencies are entangled in creative processes. However, as Hayles argues, though agency is present everywhere – even in material objects – computational media and biological organisms are unique because they are capable of making decisions. This is also a key reason why the human relation to technology is different where computational media are concerned. Magnusson has also emphasised that this cognitive dimension, along with the abstract symbolic design process of software, makes the digital musical instrument ‘an epistemic tool: a designed tool with such a high degree of symbolic pertinence that it becomes a system of knowledge and thinking in its own terms’ (Magnusson 2009: 168). Thus, the divide between materiality and information discussed by Hayles is evident in computer music too, since ‘code as material is not musical; it does not vibrate; it is merely a set of instructions turned into binary information converted to an analogue electronic current in the computer’s soundcard’ (ibid.: 172). As Robert Seaback (2020) has also recently shown, following Hayles, this process does constitute a materialisation of information in sound. Though cognitive tools are proliferating widely as artificial intelligence is applied within creative practices, in most cases this does not imply autonomous computational creativity, but rather notions closer to what Artemis-Maria Gioti (2020) terms ‘co-creativity’ or ‘extended intelligence’, featuring a human–computer liaison rather than an exclusive or competitive model.
3. REVEALING DETOURS: IMPROVISATION AS POIĒSIS
In his criticism of the ‘instrumental and anthropological definition of technology’ (Heidegger 1977: 5), according to which technology is a means to human ends, Martin Heidegger emphasised how technology is a poiēsis, a ‘bringing-forth’, or ‘revealing’: ‘through bringing-forth, the growing things of nature as well as whatever is completed through the crafts and the arts come at any given time to their appearance’ (ibid.: 11). As Bernard Stiegler puts it, this process ‘brings into being what is not’ and ‘the final cause is not the efficient operator but being as growth and unfolding’ (Stiegler 1998: 9). Here, the ‘efficient operator’ is the human master, and ‘being’ is phusis, or nature. Also referring to Heidegger, Francesca Ferrando (2019) cites musical improvisation or poetry-writing as examples of poiēsis: processes that, although they may have premeditated elements, are also characterised by surprise and discovery, as something new is revealed. This might be paralleled with Bruno Latour’s metaphor of the detour, describing ‘the labyrinth that [one] will have to confront before pursuing [one’s] initial objectives’ (Latour and Venn 2002: 251), a journey that transforms both means and ends. Derek Bailey’s view on improvisation seems a good match with these ideas:
Although some improvisors employ a high level of technical skill in playing their instrument, to speak of ‘mastering’ the instrument in improvisation is misleading. The instrument is not just a tool but an ally. It is not only a means to an end, it is a source of material, and technique for the improvisor is often an exploitation of the natural resources of the instrument. (Bailey 1993: 99)
Both composition and improvisation are technological detours, involving discovery and revelation. But how do we experience this poiēsis?
A common, though perhaps traditional, perspective on electronic live performance is that it is difficult for the spectator to engage with the performance because they cannot see what the performer is doing, or do not understand how the instrument works. Does this suggest that the technology is occluding rather than revealing the creative process? Implicitly, computer technology can reveal itself by producing sound not traced to human or acoustic causes, perhaps by doing what humans and acoustic causes cannot do. Simultaneously, we may not know exactly what it is doing because the causal mechanism of sound is hidden in code and circuitry. This unknown dimension is also present, though in a different sense, in acoustic performance. Part of the excitement about, for example, virtuosity, or extended techniques, might be that what we hear defies what we thought was possible. A challenge with computer technology, then, is that sonically, virtually anything is possible, since acoustic mechanics and bodily human skill are not necessarily constraining the potentials of the system. Echoing this instrumental perspective, Denis Smalley has raised concerns about the potential incoherence caused by performance interfaces and processing in live performance:
Thus we can arrive at a situation where sounding spectro-morphologies do not correspond with perceived physical gesture: the listener is not adequately armed with a knowledge of the practicalities of new ‘instrumental’ capabilities and limitations, and articulatory subtlety is not recognized and may even be reduced compared with the traditional instrument (creating what I call a minus-instrument). The puzzled listener can be forgiven for not knowing whether to ascribe perceived musical deficiencies to a minus-instrument, the performer, or the composer. (Smalley 1996: 104)
Indeed, electronic and computer music systems, and their associated hardware, such as modular synths, laptops and controllers, may not offer as direct a causal link between human agency and sound model as do acoustic instruments. However, although Smalley’s statement makes sense to an audience primed on instrumental music – as was more likely the case when the preceding words were written – today’s music enthusiasts are not necessarily conditioned by instrumental music practice, and may not consider acoustic instruments and sounds as archetypal for music. Moreover, the instrumental analogy is tied to gesture and the notion that sounds carry human expression and body language, thus anchoring the music in human cause and intention. Gesture also draws attention away from many of the characteristics that are distinct about computational or electroacoustic sound, which carry material qualities and a spontaneous autonomy not linked to human and acoustic causes. My post-human perspective takes interest precisely in the reframing of human agency by the encounters with more speculative sound ecologies, where humanity has to define itself in relation to the unfamiliar. This is a context where all agency – ‘human’ or ‘computational’ – is technologically mediated, and where sounds that are activated by human gesture may not have morphological profiles corresponding with acoustic cause and effect and with human body language. As human agency is nested within a wider agential and nonconscious cognitive matrix, Hayles’s ‘posthuman subject’ emerges as ‘an amalgam, a collection of heterogeneous components, a material-informational entity whose boundaries undergo continuous construction and reconstruction’ (Hayles 1999: 3). Gesture loses its rhetorical power, as ‘the presumption that there is agency, desire, or will belonging to the self and clearly distinguished from the “wills of others” is undercut in the posthuman’ (ibid.: 3–4).
The liaison of agency and contingency can therefore subsume gesture within a distributed ecology where any processes or sounds can act and choose, hinging on an awareness of time, change and potentials, and conditioned through composition strategies where time is contingent on performance. A parallel might be drawn to Earle Brown’s compositions, of which Morton Feldman said that ‘when the performer is made more intensely aware of time, he also becomes more intensely aware of the action or sound he is about to play. The result is a heightened spontaneity which only performance itself can convey’ (Feldman quoted in Bailey 1993: 60). Attributes such as ‘spontaneity’, ‘risk of failure’ and ‘indeterminacy’, which Kerry Hagan (2016) associates with live performance, or Simon Emmerson’s notion of ‘living presence’ (2007), become palpable. In this situation, sound virtually frames the performance-technology domain (Nyström 2018b), and any uncertainties in the linkage between cause and effect are ‘sound unseen’ (Kane 2014), an acousmatic added value (Chion 1990), located explicitly in the computational domain, where human agency is absorbed in currents of information, which are transformed into sound. The poiēsis, then, is propelled precisely by the post-human agency and contingency of a system (human and computer) engaged in improvisation. Agency and contingency are situated in a recursive, intra-active relation, one bringing the other into existence. Since improvisation is inherently processual, and technology is not a means to an end any more than performance is, we arrive at a recursive definition of ‘artefact’ or ‘work’, where what is being created is the process of creation, or as Latour has put it, ‘the end of the means’ (Latour and Venn 2002: 247).
3.1. Acousmatic Black Boxes
Magnusson has pointed out that music technologies, like other technologies, become ‘black boxes’: though we know their inputs and outputs, their inner workings and rationales become obscure through repeated and transforming usage (Magnusson 2009). As Latour explains, ‘[t]he more technological systems proliferate, the more they become opaque, so much so that the growth of the rationality of the means and ends (according to the conventional model) is manifested precisely by the successive accumulation of layers, each of which makes the preceding ones more obscure’ (Latour and Venn 2002: 251). Magnusson reminds us that black boxes appear differently to the designer, who ‘creates the instrument from a conceptual understanding of the domain encapsulated by it’, and to the user, who ‘gains operational knowledge that emerges through use (or habituation) and not from abstract understanding of the internal functionality’ (Magnusson 2009: 171). He notes the added complexity created by the fact that the designer and performer frequently are the same person, who has to alternate between two roles. From my point of view, black-boxing occurs repeatedly during the making of a work, as programmed objects are created and linked to others. In performance, when objects process data and make decisions, they begin to form a matrix of nonconscious cognisers, whose individual existences I may forget about, though depend upon. The input is reduced from code to a handful of controllers, which reveal complex sounds via highly constrained physical input modalities. These works, then, may be thought of as acousmatic black boxes, which host both code and physical technology, conceived as compositions that are realised through improvisation. What happens between input and output is a virtual mechanics, embodied in sound; a speculative world revealed in the poiēsis of improvisation.
The use of the term ‘acousmatic’ in this context is concurrent with Kane’s view that ‘acousmaticity, the determination or degree of spacing between source, cause, and effect’ (Kane 2014: 225) is a continuum that depends on a listener’s knowledge – or lack thereof – of a sound’s nature within a specific context. Concerns shared with acousmatic fixed-media music – in particular, morphology, texture, gesture and spatiality – remain relevant, not least since they carry a causal relation with agency and contingency in performance. Improvisation with generative processes reveals an acousmatic morphogenesis (Nyström 2017) made palpable in the ongoing activity of performance, technology and space, and synthesised by the listener.
4. IN-FORMALISED COMPOSITION
A synergy of design, composition and performance in live electronic music was demonstrated early on by pioneers such as Gordon Mumma and David Tudor (Mumma 1975). For computers, Joel Chadabe’s ‘interactive composing’ is another example: ‘a two-stage process that consists of (1) creating an interactive composing system and (2) simultaneously composing and performing by interacting with that system as it functions’ (Chadabe 1984: 23). Robert Rowe writes that interactive systems for improvisation are a domain of composition where computers have ‘changed the nature of the compositional act itself’, precisely because we are building cognising algorithms, thus moving to a meta-level of distributed decisions and ceding ‘a large measure of control over musical decision-making to the human improviser’ (Rowe 1999: 85). George Lewis also demonstrated an entanglement of composition and improvisation in his Voyager, ‘a computer program [which] analyzes aspects of a human improvisor’s performance in real time, using that analysis to guide an automatic composition (or, if you will, improvisation) program that generates both complex responses to the musician’s playing and independent behavior that arises from its own internal processes’ (Lewis 2000: 33). While much interactive computer music features acoustic instruments and note-based material, Agostino Di Scipio demonstrated an entirely timbre/texture-based approach and reformulated interactive composition into ‘composing interactions’ with his audible ecosystems, using sound itself and the listening environment as interfaces of interaction (Di Scipio 2003). Di Scipio’s ecosystems demonstrate an intra-action: acts of listening and sound-making occur within a system that, as a whole, cannot be divided into clear-cut agents.
The nuances in terms of what composition entails within this kind of practice are noteworthy. A work might be designed as a circuitry or algorithm that follows a relatively simple scheme even if sonic outcomes can be complex. Tudor’s circuit diagram scores are good examples of this; Agostino Di Scipio’s audible ecosystems or Dario Sanfilippo’s adaptive systems are other fine examples (Sanfilippo 2018). The latter also set a clear boundary for human intervention. However, the desire to create a complex system can easily lead to a more interventional composition strategy, where the system becomes increasingly heterogeneous, due to exceptions to rules that prove too crude under certain conditions and discoveries of relations that are too compelling not to take further. This does not necessarily mean that the scope for improvisation reduces, however; the opposite can be equally true. The definition and development of the relational structures of the system become a process which might be termed in-formalised algorithmic composition, in that the formalisation inherent to programming turns on itself when rules add up in a ‘successive accumulation of layers, each of which makes the preceding ones more obscure’ (reiterating Latour’s words). In-formalised composition continues in improvisation, where sonic spontaneity results from the contingent chain reactions of both human and algorithmic agency.
5. STRANGE POST-HUMAN ATTRACTORS
Though employing different techniques between themselves, the two works discussed here have in common that they are based on the principle of having an algorithm capture data from performance and using that to generate an accumulating and evolving synthesised texture. Typically, the process is such that when the performer lets go of the controllers, the texture continues playing autonomously. If the performer touches the controllers again, they will take over the control mechanism. The algorithms do not replicate the performed material, but rather create derivative textures that carry traces of the original input. I started working with this type of method for several reasons. First, it provides a very clear method of revealing the poiēsis of improvisation: the material is made up in the moment, and the manner in which the system evolves with it is audible. Having both human and computer operate on the same material makes the distribution of agency palpable, as it becomes clear that sound is contingent upon both human and computer. Second, the reconstruction and deformation of human agency creates sound behaviours that might be termed ‘strange post-human attractors’: phenomenologically chaotic systems, embodying both computational processes and human agency. Third, the accumulation of performance-derived material establishes a cognitive assemblage of human–computer activity, based on nonconscious memory. This allows for a continuity that lets us perceive how a history of past actions influences the present moment. Finally, the state of the system as a whole of course affects the nature of agency and contingency within it.
6. MACHINE LEARNING IN TEXTON MIRRORS
Both Texton Mirrors and Intra-action are realised entirely in the SuperCollider environment for synthesis and algorithmic composition. Texton Mirrors was created and performed in several iterations during 2018/19, and most recently presented as part of the AI x Music programme for Ars Electronica Festival, Linz, 2019. It is based on the idea of organising a spatially distributed texture as a montage of micro-temporal sounds grouped in space, which I refer to as ‘textons’. The concept of textons is derived from Bela Julesz’s neuroscientific research in texture perception (Julesz and Schumer 1981), which postulates that visual spatial perception is based on the processing of microscopic particles of different shapes and orientations. The sonic counterpart is a spatially distributed texture that has a rich array of organised subfields as spatially localised streams of sound with different characteristics (Nyström 2011). Texton Mirrors was informed by Horacio Vaggione’s ‘micro-montage’ composition technique, where sound is assembled in a ‘pointillist’ manner (Roads 2005). Working in fixed media, Vaggione emphasises an ‘action-perception reciprocity’ in employing both algorithmic processes and manual editing (Criton 2005): this is here translated into co-dependence between real-time generated and ‘manually’ performed sound.
The work makes significant use of machine learning – primarily unsupervised – for structuring aggregates of textons based on improvised performance. In addition, several time-sensitive processes are used to harness temporal data from the progress of a performance to construct emergent structures in both micro- and macro-time. The computer system carries out nonconscious cognitive organisation of the performed material, in a process that separates the input into groups, and generates variations of sounds within these groups.
The centre of the work is an array of several instances of a synthesis process mapped to pads on a MIDI controller. Each pad is controlling its sound via velocity alone, mapped to multiple parameters, affecting both spectral and temporal properties of the sound in different ways. As the pads are played, a clustering algorithm stores and classifies the parameter data into 12 different sound groups.Footnote 3 The process works by defining centroids in a parameter space, based on the inputs, each centroid being the centre value of a group. Each new texton is classed as belonging to the group of its nearest centroid, but also updates that centroid, so that the centroid will move as the new data is entered. The groups are thus not static, but transform over time, as new data enters. The sequence of classifications is used to train a Markov set,Footnote 4 which is the basis for generating a texture, based on the probability for occurrence of each texton type in succession to another. When the performer stops playing, the system continues, generating texture on the basis of the analysis of the performance. Each new sound generated is a variation within the confines of its group.Footnote 5 The texture generator separates different groups of textons into spatially localised streams, so that textons from the same group will be spatially positioned in relation to one another. The system remembers the spatial location of the last occurring texton in each group, so that it can distribute each new sound to its appropriate place. Each new texton is incrementally displaced in relation to the previous in its group, so that, over time, the different streams keep moving. This incremental spatial movement is a function of time, with the effect that short time intervals within the same texton group result in less spatial motion than long time intervals. This is to prevent the texture from moving so much that it loses its perceptual grouping. 
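The classification and sequencing steps described above can be illustrated with a minimal sketch. It is written in Python rather than SuperCollider and is not the code of Texton Mirrors: the class names OnlineClusterer and MarkovChain, the squared Euclidean distance, the running-mean centroid update and the seeding of centroids from the first inputs are all assumptions made for illustration.

```python
import random
from collections import defaultdict


class OnlineClusterer:
    """Hypothetical sequential clustering: each input joins its nearest
    centroid and moves that centroid towards itself."""

    def __init__(self, n_groups=12):
        self.n_groups = n_groups
        self.centroids = []  # one parameter vector per group
        self.counts = []     # number of textons assigned to each group

    def classify(self, params):
        params = list(params)
        # Assumption: the first n_groups inputs seed the centroids.
        if len(self.centroids) < self.n_groups:
            self.centroids.append(params)
            self.counts.append(1)
            return len(self.centroids) - 1
        # Nearest centroid by squared Euclidean distance (assumed metric).
        dists = [sum((p - c) ** 2 for p, c in zip(params, cen))
                 for cen in self.centroids]
        group = dists.index(min(dists))
        # Running-mean update, so the centroid drifts as data arrives.
        self.counts[group] += 1
        rate = 1.0 / self.counts[group]
        self.centroids[group] = [c + rate * (p - c)
                                 for p, c in zip(params, self.centroids[group])]
        return group


class MarkovChain:
    """First-order transition counts between group labels."""

    def __init__(self):
        self.transitions = defaultdict(lambda: defaultdict(int))
        self.previous = None

    def train(self, group):
        if self.previous is not None:
            self.transitions[self.previous][group] += 1
        self.previous = group

    def next_group(self, current, rng=random):
        options = self.transitions.get(current)
        if not options:
            return current  # no history yet: repeat the current group
        groups = list(options.keys())
        weights = list(options.values())
        return rng.choices(groups, weights=weights)[0]
```

In use, each pad strike would be passed to classify, the returned label to train, and next_group would then be called repeatedly once the performer stops playing. Feeding the generated textons back through classify corresponds to the retraining on the system's own output.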
If a group coincides spatially with another, one group will shift to make space for the other. This has the effect that the streams will self-organise to maintain the coherence of grouping while also being dynamic. While this process is generating sound, it is also retraining itself, by classifying its own output in the same manner that it analyses human performance, which means that the centroids of the system keep moving continually.
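A rough sketch of the time-dependent spatial displacement may help to make the behaviour concrete. It assumes, purely for illustration, that each stream has a single azimuth angle on a circle, that displacement grows linearly with elapsed time up to a cap, and that a stream found too close to a newly placed texton is pushed aside in a single pass. None of these details, nor the names SpatialStreams, drift_per_second, max_step and min_separation, come from the work itself.

```python
def angular_distance(a, b):
    """Shortest distance between two azimuths, in degrees."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


class SpatialStreams:
    """Keeps one remembered azimuth per texton group (hypothetical model)."""

    def __init__(self, n_groups=12, drift_per_second=5.0,
                 max_step=20.0, min_separation=10.0):
        # Assumed starting layout: groups spread evenly around a circle.
        self.azimuth = {g: g * 360.0 / n_groups for g in range(n_groups)}
        self.last_time = {}
        self.drift_per_second = drift_per_second
        self.max_step = max_step
        self.min_separation = min_separation

    def place(self, group, now):
        """Return the azimuth for a new texton of `group` at time `now`."""
        # Short intervals give small steps, long intervals larger ones.
        elapsed = now - self.last_time.get(group, now)
        step = min(self.drift_per_second * elapsed, self.max_step)
        position = (self.azimuth[group] + step) % 360.0

        # If another stream coincides with the new position, shift it away.
        for other, other_pos in self.azimuth.items():
            if other == group:
                continue
            if angular_distance(position, other_pos) < self.min_separation:
                self.azimuth[other] = (position + self.min_separation) % 360.0

        self.azimuth[group] = position
        self.last_time[group] = now
        return position
```

The single-pass shift is a simplification: a displaced stream could itself land near a third one, which a fuller model would need to resolve.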
From the central texture process, the system branches out by generating other sound material in response to performance. Another array of pads with an identical analysis process, but different sonorities, is also available, the sounds of which are matched in relation to the first. There are also knobs available that increase the probability for additional streams of sound that are selected from self-organising maps or searched for nearest match in k-dimensional trees (see footnote 6), in relation to the main stream, but mapped to different synthesis processes. The additional texture processes serve to introduce sound on both lower and higher timescales by using sequences of time intervals captured from performance and creating new figures and patterns from these. Supervised learning is also used in the form of artificial neural networks (ANNs; see footnote 7), which have been trained to generate sounds in relation to the main texture process under certain circumstances.
While the system controls the texture, the performer has other sound-generating processes to work on, accessible on other pads and knobs. One such process is a set of pads that updates its synthesis mappings depending on the time and control data derived from performance. This means that the mapping transforms under the hands of the performer, in an emergent, time-dependent, but not indeterministic manner. This mapping also has a dynamic offset to set the frequency of the output sound in relation to spectral properties of other processes in the system. ANNs have also been used for knobs that feed their position into a calculation that also takes values from other sounds happening simultaneously. This means that a performer can acquire a rough intuition about what sounds could appear when turning the knob, although it is impossible to predict the exact output as it also depends on other elements in the texture. These knobs also have interlinked action so that different mappings are activated when certain knobs are played simultaneously. Because all the classifications throughout the performance have a history dating back to the original centroids of the first 12 sounds, there is an irreversible imprint on the performance from the start.
An added dimension to this work is the sounds that become available under the pads as the system’s state evolves. The system has been programmed so that after certain performance criteria have been met and the overall density is above a certain threshold, additional sounds become accessible on the pads. Some of these are mapped using an ANN trained on time intervals between pad strokes, so that certain performance patterns will bring additional sounds to the texture. The pads also allow a complex response to time-dependent gestures that will introduce streams of background texture, building up behind the main process.
The element that gives the system a larger-scale behaviour is a machine listening routine that monitors the density of the system as well as spectral centroids of events, regulating its output accordingly. This is coupled to an ‘activity rate’ monitor that checks successive time intervals and spectra and increases its rate output when the current time interval is shorter than the previous and the current frequency is higher than the previous, thus multiplying if the playing or system output increases in energy. The durations between the crests of activity that emerge are used to project longer glissandi as the work progresses. The spatial distributions of some textures also involve a simple form of cellular automata, where spatially distributed synthesis processes are controlled in a manner that allows them to self-organise, using topographic synthesis techniques, which I have presented elsewhere (Nyström 2018a).
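The rule governing the activity rate can be sketched as follows. The condition for increase follows the description above; the multiplicative growth factor, the decay applied when the condition is not met, and the lower bound are assumptions added only to make the example complete. The name ActivityRate and its parameters are hypothetical.

```python
class ActivityRate:
    """Rises when events get both closer together and higher in frequency."""

    def __init__(self, growth=1.2, decay=0.9, floor=1.0):
        self.growth = growth  # assumed multiplier for rising energy
        self.decay = decay    # assumed relaxation; not stated in the article
        self.floor = floor    # assumed minimum rate
        self.rate = floor
        self.prev_interval = None
        self.prev_frequency = None

    def update(self, interval, frequency):
        """interval: seconds since the last event; frequency: e.g. spectral centroid in Hz."""
        rising = (
            self.prev_interval is not None
            and interval < self.prev_interval
            and frequency > self.prev_frequency
        )
        if rising:
            self.rate *= self.growth
        else:
            self.rate = max(self.floor, self.rate * self.decay)
        self.prev_interval = interval
        self.prev_frequency = frequency
        return self.rate
```

Because the rate multiplies on each qualifying event, a sustained acceleration in playing compounds quickly, which is consistent with the crests of activity mentioned above.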
The mode of spatialisation is different to standard processes where sounds are distributed according to a function that does not know what it is distributing. This texture-generating process is aware of sound types, temporal pattern and spatial locations. It is also aware of, and responding to, its own output by updating its organisation.
Performance of Texton Mirrors requires careful listening and attention to controller responses. As per basic MIDI protocol, each of the controllers only has one dimension of input: velocity for pads and position for knobs. Yet these data are used in a multitude of ways to bring forth a large variety of sounds under varying circumstances. Because the system is entirely contingent on input, but also generates sound on its own, human improvisation is made necessary by context. Over time, this system increasingly constrains improvisation because it accumulates in an irreversible manner, on the basis of its history. The range of possible sounds that can be made in performance increases over time, but the complexity of the texture also increases, with the result that there is more context to consider and relate actions to. This is intentional, as the idea was for the system to take a direction which human performance can influence but not control.
As a cognitive assemblage, this system includes a human’s listening and physical agency, and a listening and learning system that interprets very simple control input information in a complex, evolving context, that gives the input meaning and a variety of consequences. A cognitive ‘mirror’ is formed as human listening and perception are reflected in texture-organising algorithms, where textons are sonic-informational packets of data that human–computer processes act upon. Agency is awarded the performer in the form of performance precision, rather than quantity of controllers. Although MIDI pads are a blunt instrument, articulating sound within precise velocity ranges becomes a bodily skill in itself. Contingency is present in how the interface makes certain sounds available under certain conditions, and how much of the texture is based on chains of probabilistic decisions, which present an uncertainty of consequence that can generate interesting surprises.
7. LISTENING AGENTS IN INTRA-ACTION
The work Intra-action was commissioned by, and premiered at, NEXT Festival, Bratislava, in 2019. It is constructed as an ensemble of synthesis agents that are responding to each other’s behaviours. Following Barad’s ideas, the morphological behaviour of each member of the ensemble is defined in relation to those of other sounds, since they are always acting upon one another. While Texton Mirrors is structured around one main process with numerous tentacles, Intra-action features several synthesis processes that are instantiated through human performance, in any order or combination, and which continue performing on their own when human control stops, modifying their output by listening to other processes. The system is organised around a hub that stores information about the most recent output of each member (whether controlled by itself or by human performance), which agents are neighbouring one another and which ones are playing or silent. This means that when the agents listen to one another, they take in not only the present moment but also a longer period’s worth of information, so that they are able to make a textural rather than momentary judgement. The algorithm plays using data it has stored from performance, but filters these data and adjusts their temporal density depending on what else is playing. Each agent listens only to one neighbour, but which one can change depending on performance (see Figure 1 for an illustration of the agent relationships). If a member agent is alone, it will listen to and respond to itself. If there are many members, each agent’s listening target will be determined by their order of appearance in the texture: the most recent member to join will listen to the last one to appear before it, and the ‘oldest’ member listens to the most recent. This means that they link up in a circular feedback loop so that all members have an indirect consequence on the whole texture. 
The agents do not always listen, but will drop in more or less frequently, at continuously varying probabilities. This is to ensure that actions and consequences remain contingent on each member in the collective. In addition to this information feedback, there is also an audio loop where each agent’s output modulates its neighbour agent. This links the synthesis processes both timbrally and behaviourally, creating sonorities that are unique to the specific constellation of agents and their behaviours. In addition to gathering spectral data from their neighbours, all agents are also listening to the density of the whole texture, which can have the consequence that the collective as a whole changes its pace in awareness of its performance.
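The circular listening arrangement can be expressed compactly. The sketch below assumes only that agents are kept in a list ordered by appearance; the function names listening_targets and should_listen are hypothetical, and the listening probability is passed in rather than modelled.

```python
import random


def listening_targets(members):
    """Map each agent to the agent it listens to.

    `members` is ordered by appearance, oldest first. Each agent listens to
    the one that appeared just before it; the oldest listens to the newest.
    A lone agent therefore listens to itself.
    """
    return {agent: members[i - 1] for i, agent in enumerate(members)}


def should_listen(probability, rng=random):
    """Decide whether an agent drops in to listen on this occasion."""
    return rng.random() < probability
```

For three agents that appeared in the order a, b, c, listening_targets returns a mapping in which b listens to a, c listens to b, and a listens to c, closing the feedback loop.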
This system also features an activity rate monitor analysing both audio and control input; this is used primarily for dynamically controlling background textures that can be activated optionally by the performer. These textures have spectral and temporal motion whose intensity is mapped to the activity rate, but whose audio output of course also affects the activity rate. The result is slowly accumulating and dissipating waves of sound that add a longer-duration temporal profile to the performance, typically reaching peaks every 30 to 60 seconds. An interesting feature from the point of view of improvisation is that the system also clocks controller inactivity and will introduce a new member to the ensemble by itself if inactivity goes beyond certain temporal thresholds. Knowing that the system can introduce new activity may prompt the performer to keep a certain pace in order to pre-empt the computer; on the other hand, it can also encourage listening without acting, in anticipation of a spontaneous algorithmic intervention.
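The inactivity clock can be pictured as a simple timer. This sketch assumes a single threshold in seconds, whereas the text refers to several thresholds whose values are not given; the name InactivityWatch and the default value are hypothetical.

```python
class InactivityWatch:
    """Signals when no controller input has arrived for `threshold` seconds."""

    def __init__(self, threshold=20.0):
        self.threshold = threshold  # assumed value, for illustration only
        self.last_input = 0.0

    def touch(self, now):
        """Call whenever the performer moves a controller."""
        self.last_input = now

    def should_intervene(self, now):
        """True if the system should introduce a new ensemble member."""
        if now - self.last_input >= self.threshold:
            self.last_input = now  # restart the clock after intervening
            return True
        return False
```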
The central physical performance interface is a box of knob controllers, the movement of which will both trigger and control parameters of sound. The mappings of Intra-action feature a measure of time intervals between controller data, so that the continuous knob controllers become sensitive to the velocity at which they are turned, meaning that position and velocity are in constant conflict with one another (Nyström 2020). Thus, there is no way of determining mapping output without taking the context of time into account. There are several different approaches used for velocity of knob movements in the piece, but they are all very sensitive to hand movement, and can allow for a multidimensional range of output values from a single-dimensional controller, determined by awareness of temporal context. Further, several of the knobs are linked so that two knobs need to be turned simultaneously in order for sound to be activated and affected. If both knobs are mapped to the same sound in different ways, a highly discontinuous mapping will result, where the performer has no direct parametric control of the sound. Thus, the knob mappings are a key aspect of the morphological potentials of the synthesis processes and the physicality of performance. The synthesis techniques used include sine tones and saw waves distorted in various ways, chaotic non-standard synthesis oscillators, and processes where control data are directly mapped to wave-form segments via arrays that are used as oscillating wave-form envelopes. The spatial distribution principle is similar to that of Texton Mirrors, in that the agents self-organise spatially, but they also have a timbral link to a spatial position: the closer they get to the member to which they are linked, the stronger the intermodulation becomes.
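One way of deriving velocity from the time intervals between controller messages is sketched below. It assumes 7-bit MIDI control values (0 to 127) and timestamps in seconds, and it returns only a normalised position and a turning speed; how the piece maps these two dimensions to synthesis parameters is not reproduced here. The name KnobTracker is hypothetical.

```python
class KnobTracker:
    """Derives a second dimension (speed) from a one-dimensional knob."""

    def __init__(self):
        self.last_value = None
        self.last_time = None

    def update(self, value, now):
        """value: MIDI CC value 0-127; now: timestamp in seconds.

        Returns (position, velocity), where position is in 0..1 and
        velocity is the change in position per second.
        """
        position = value / 127.0
        velocity = 0.0
        if self.last_time is not None and now > self.last_time:
            delta = abs(value - self.last_value) / 127.0
            velocity = delta / (now - self.last_time)
        self.last_value = value
        self.last_time = now
        return position, velocity
```

The same knob position thus yields different output depending on how quickly it was reached, which is the sense in which position and velocity compete within a single control stream.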
A distinct feature of Intra-action, compared with Texton Mirrors, is that the system has no encoded hierarchies between sounds and processes. Though there is a central information hub, there is no central sound process, but simply a quantity of synthesis agents that will begin to behave and listen for information once they are set in motion. Any module can be started and stopped at the performer’s discretion, but all actions have consequences that propagate through the entire texture, as a member of the ensemble enters or leaves. Because the controller mappings are time-sensitive and non-linear, and the agents behave in an autonomous manner, it is impossible to predict exactly how the system will behave. Even if, theoretically, one could plan a performance in advance by deciding which modules to play, in which order and roughly how, this would be counterproductive, as the arrivals at unique intra-actions are where a real poiēsis is taking place, and something novel is created. This requires a free mode of improvisation and listening, but also practice and exploration of the sonic potentials of the system. The system as a whole forms a cognitive assemblage of acting processes on many levels: the synthesis processes organise as a whole, although they are only aware of their neighbours; they are shaped by one another and by time-aware controllers, moved by the hands of a human body. The fact that control inputs not only set parameters but also generate and capture an intra-action between physical input and algorithmic processes means that the system cannot have a neutral relation to performance, and that human agency is decomposed to the more tentacular level of hands and fingers, as opposed to top-down mind-centred control.
8. CONCLUSION
Texton Mirrors and Intra-action are compositions that demand improvisation because agency is required of both algorithm and human, and its consequences are always contingent on unpredicted response. The works are intra-active not only in how they are constituted technologically but also in that composition and improvisation cannot be isolated from one another, but bring each other into existence. Even if the computer system were to be regarded as composition alone, it could not be conceived as music without recourse to the improvised performances that make it sound. Such improvisation is an exploration of the encoded sound-structural topologies, and the only way to account for the possible sound manifestations of the works.
This practice endeavours to present an acousmatic sound experience that is anchored in a process of making, where technology is not a transparent reproduction of supposed sound sources hidden behind a Pythagorean veil. The cognitive assemblage of human mind–body parts and computer algorithms is productive rather than reproductive, and the sound therefore has no origin other than the present performance-technology domain of embodiment, materiality and information. As composition practice, this demonstrates one way of encoding real-time poiēsis into acousmatic music, through algorithmic studio composition, completed through improvisation. As improvisation practice, it is a model for live performance in the acousmatic and spatial computer music arena. The techniques show how a rich multilayered performative sound palette and texture can be encoded into a performance system, while maintaining a thoroughly dynamic relationship between computer-generated sound and performance input.
Acknowledgements
The initial versions of Texton Mirrors were created as part of a Leverhulme Early Career Fellowship at the BEAST studios at University of Birmingham. Thanks to NEXT Festival for commissioning Intra-action.