Introduction
Technological advances in the civilian and military sectors comprising ‘AI’ (artificial intelligence) are progressing at pace. In political and public discourse, the military transformation associated with automatisation, technological autonomy, and algorithms is framed, for example, by arguments for an inevitable ‘AI’ arms raceFootnote 1 and by claims that ‘AI’ is superior and will finally overcome human limitations. As the March 2021 final report of the US National Security Commission on Artificial Intelligence (NSCAI) put it, ‘the ability of a machine to perceive, evaluate, and act more quickly and accurately than a human represents a competitive advantage in any field – civilian or military’.Footnote 2 This call to ‘AI’ arms, promoted by segments of the political and think tank community as well as by industry, is not only remarkable because it leaves aside a substantial academic-political debate on the limitations and risks of using military AI such as autonomous weapon systems (AWS).Footnote 3 It also portrays the technological capabilities, referring chiefly to machine learning (ML) in combination with different platforms, as unequivocally advanced, reliable, and preferable. What is this supposed to mean in practice?
Since the introduction of the influential OODA loop (Observe, Orient, Decide, Act) by US Air Force Colonel John Boyd in 1986, the US military has been sounding out ways to improve its decision-making framework. Perception, the ability to become aware of information based on the senses, is central to the initial observation stage. Over the past 20 years, the technological promise to prevail in the perennial struggle for complete ‘situational awareness’ in military terms has been boosted in particular by the proliferation of platforms such as drones and software that are meant to close the gaps between the OODA stages ever further. Ultimately, this development comes with the promise of ‘lifting the fog of war’Footnote 4 – historically pointed out by Carl von Clausewitz – in the literal rather than the metaphorical sense.Footnote 5, Footnote 6 As two of the main protagonists of the US narrative about the promises of military ‘AI’ shaped at the interstice of government and industry – Eric Schmidt and Robert O. Work – put it, ‘one key change is that militaries will have great difficulty hiding from or surprising one another. Sensors will be ubiquitous … Machines can also serve as the “eyes and ears” of their human teammates.’Footnote 7
The augmentation of the limited human senses of perception by using technology to ‘observe’ is a century-old undertaking. For that reason, observation in the sense of ‘seeing’ or ‘vision’ in the context of security, military, and warfare is an important research field for theoretically motivated studies in International Relations (IR) and security studies.
More specifically, there is a substantial body of literature on vision in the context of drone warfare.Footnote 8 This literature has highlighted the implications of a scopic regime that according to Maurer ‘refers in this context to the drone’s visual framing, i.e. its ocular operations of capture, its optical perspective on the target, the visual sensing of the drone and its controller, the target’s range of vision, as well as the representation of drones in social and aesthetic discourses’.Footnote 9 A scopic regime is hence about established forms of seeing, perceiving, and deciding in the context of technological augmentation, but also about establishing ‘truth claims’, in the words of Allen Feldman.Footnote 10 Here, the ocular-centrism of ‘the eye turned into a weapon’Footnote 11 is represented by the repeated evocation of the all-seeing ‘eye of God’ analogy and amplified by the US military practice of mystifying systems by naming them Gorgon Stare or ARGUS-IS.Footnote 12 The argument of the omnipresent ‘martial’Footnote 13 gaze, of the ‘militarized regime of hypervisibility’Footnote 14 as a ‘fetishized drone vision’Footnote 15 is central to the narrative of an omnivoyant, impenetrable, and infallible military instrument that only becomes more powerful the more sophisticated AI is. As Paul Virilio argued in a literary prelude to the algorithmic warfare of the present, ‘it is a war of images and sounds, rather than objects and things, in which winning is simply a matter of not losing sight of the opposition. The will to see all, to know all, at every moment, everywhere, the will to universalised illumination: a scientific permutation on the eye of God which would forever rule out the surprise, the accident, the irruption of the unforeseen.’Footnote 16
Virilio thereby highlighted a crucial link between seeing and knowing that is also reflected in studies on the military scopic regime in the broad sense. Vision, as the most central element of perception, is the basis of what Bousquet calls the ‘martial gaze that threatens anything that falls under it with obliteration’, which presents itself as ‘a convergence of perception and destruction’ in the ‘struggles over visibility across planetary battlespaces’.Footnote 17
At the same time, battlefield ‘vision’ as the basis of observing and knowing is transforming and losing the character it had for thousands of years. The human–machine teaming that is referred to in the above quote by Schmidt and Work is increasingly about supplementing and partly replacing the human input into all OODA stages – to a different extent – with AI applications. We are therefore also encountering a complex transformation of fundamental elements of military agency. This transformation requires a comprehensive consideration of AI implications for the interrelated stages of the loop.
Here, the paper’s basic questions are: how does AI change military ‘observation’, and what implications does this have for the conceptualisation and role of ‘vision’ in the context of an action loop?
Analytically-theoretically, the paper addresses how the scopic regime as captured conceptually by international security scholarship is contested by what I call a process of ‘de-visualisation’. This process is part of a new, powerful regime in which seeing is no longer the ultimate basis of knowing (as well as of deciding and acting). De-visualisation thereby denotes, first, the decreasing role of human vision, both as direct and as electronically mediated observation. Second, it underlines the selective process of algorithmic non-seeing, and, third, it captures counter-acts of de-visualising or disturbing non-human vision, which change practices of camouflaging and hiding.Footnote 18 Moreover, the current transformation of use-of-force practices from alleged hypervisibility to the de-visualisation associated with an ‘algorithmic fog of war’Footnote 19 results in a diminished capacity for human control, where seeing is an equally important but underconceptualised basis for knowing.
De-visualisation not only transcends the seeing–knowing–action nexus but also decision-making as a process. Military AI could arguably ‘be used to help reduce risks to civilians in military operations, such as by … automating target identification, tracking, selection, and engagement to improve speed, precision, and accuracy’.Footnote 20 But the materialisation of what Virilio calls the ‘sightless vision’Footnote 21 of a ‘vision machine’ constitutes, in fact, a direct challenge to the omniscient ‘gaze’ narrative. In contrast to this narrative, the putative superior ‘martial gaze’ of systems, defined as ‘the entire range of sensorial capabilities relevant to the conduct of war’,Footnote 22 can translate into human unawareness in the use of force. This is not only relevant for weapon systems that can potentially apply force without prior human assessment, but also in the context of human–machine teaming that is already an everyday operational reality. Human–AI teaming promises to realise the vision of omniscience by implying ‘a massive increase in situational awareness, it allows things to go faster, it helps mitigate the chances of human mistakes’,Footnote 23 as the Pentagon’s then-director of the US Joint Artificial Intelligence Center (JAIC), Lt General Jack Shanahan, put it.
The paper’s empirical background is provided by current developments regarding the US Joint All-Domain Command and Control (JADC2) strategy, which exemplify the move towards a novel, integrated sensory-action loop. JADC2 is supposed to be an AI-integrative ‘coherent approach for shaping future Joint Force C2 [command and control] capabilities and is intended to produce the warfighting capability to sense, make sense, and act at all levels and phases of war, across all domains, and with partners, to deliver information advantage at the speed of relevance’.Footnote 24 In that, it is also meant as a substantial reformation of the existing OODA cycle, compressing the four stages of observe, orient, decide, and act into three accelerated and interrelated dimensions of sense, make sense, and act, which are based on integrating autonomous AI elements. JADC2 shows that the process of de-visualisation is complex and comprehensive, going beyond the former observation stage.
The paper unfolds as follows: in the first section, the discussion delves into the realm of vision within the context of military AI, elucidating the research problem at hand. Additionally, it provides an overview of the pertinent existing literature. The subsequent section illuminates the JADC2 initiative, serving as a concrete illustration of the AI-induced transformation of the well-established observation, orientation, decision, and action loop within the US military. Moving to the third section, the paper introduces its theoretical contribution by articulating the concept of de-visualisation. The fourth section articulates how the development and utilisation of AI-driven technologies inherently revolutionise the concept of vision with consequences for human control and agency. The fifth section extrapolates the implications of an emerging algorithmic fog of war for the lofty promises of attaining ultimate omniscience. The paper’s discussion is rounded off with a conclusion.
Seeing, knowing, and doing in war
Since the early 2000s, research on observation and action in the context of military technology has been dominated by studies on drone warfare.Footnote 25 The expansion of drone warfare that started with the launch of the US-led operation in Afghanistan in 2001 marks the rising importance of remotely conducted warfare as a central pillar of deploying military force in the 21st century. The large-scale usage of drones in Iraq, Ukraine, Syria, and Yemen, among other places, and most recently in the Russia–Ukraine war, has changed the way force is projected, perceived, and thought about. Optimising the identification, selection, and attack of targets based on novel modes of visualisation is at the core of these military efforts. Schwarz aptly summarises the dominant, positive outlook in the military associated with these developments: ‘drones offer a visual technology that enables the collection of data, facilitates diagnostic analysis and is able to administer a course of action in specific situations of conflict with minimal risk to the operators overseeing the use of the technology’.Footnote 26 ‘Vision’ should be understood here in the broadest sense of the term, as drones can work as and interact with multi-sensory systems that take in visual but also electronic or audio data.
Drones as remotely controlled uninhabited aerial vehicles (UAVs) are often understood as an extension or an augmentation of humans in terms of vision and action. Control becomes hybrid – humans are no longer necessarily present in the physical space where the use of force takes place, while their vision and ultimately their agency are embodied and hyper-present in an electronically mediated form. The images delivered in real time by UAVs might be regarded and favoured as ‘an enhanced, improved, extended, sober and ostensibly neutral version of human vision’.Footnote 27 However, drone vision is both an extension and a contraction of vision as well as of space and time. Detailed view, zoom, surveillance, and landscape modes, and various angles promise to deliver what the human eye cannot gather; the distance between the human operator and the target increases tremendously and is often transcontinental; at the same time, close surveillance over prolonged periods produces an unseen intimacy between operator and target, while drone footage is still often of remarkably low definition.Footnote 28
Drone vision should hence be understood as simultaneously an enhancement and an exacerbation of human sight and perception. The emergence of (armed) drones has therefore contributed to the prospect of total surveillance – panoptic for some, promising for others. Scholarship on novel forms of (drone) vision in warfare has noted that ‘there is the potential to see more than can possibly be seen at any given time by human observers’.Footnote 29
The data amassed by drones is not only vast in scope and quantity but also characterised by a distinct paradigm of perception. The multi-sensory capabilities of drones offer a unique way of seeing, encompassing not just what is observed but also what may be intentionally omitted by sensors. This approach extends beyond a pursuit of absolute knowledge or control, emphasising a nuanced perspective that involves seeing differently, not being seen, and, notably, not seeing.
Current developments aimed at incorporating AI into weapons technologies can be understood as a step towards rectifying the human limitations that are still present with drone vision regarding the quantity of data that slows down decision-making and acting. But the techno-optimism reflected in parts of the military and industry discourse does not sufficiently consider the limitations of vision in the interaction of humans and technologies. These limitations stand in contrast to the military focus on technologically ‘lifting’ the Clausewitzian ‘fog of war’Footnote 30 by finding a tech solution for gaining ultimate situational awareness.Footnote 31 Bringing ‘light’ to the ‘darkness’ of war thereby taps into a long-established narrative about the advantages of technological progress for seeing and knowing in the military. For example, Canadian troops used helicopters equipped with ‘Nightsun’ spotlights in Kosovo in the early 2000s. The following quote by Sergeant Robert Wheatley exemplifies this narrative of technology providing divine superiority: ‘We did overwatch at night … They could hear us at night, but they couldn’t see us. We’d fly around blacked out. Other times we used Nightsun and it was all overt: it’s like a big candle in the sky. The message was, we were like God, who’s watching everything.’Footnote 32
The algorithmic turn in warfare is meant to accelerate and complete this development towards a state of ‘omnivoyance’,Footnote 33 or rather omniscience,Footnote 34 that novel systems integrating AI technologies are supposed to provide. As a case in point, NSCAI Commissioner Ken Ford reportedly argued that ‘AI gives commanders eyeglasses for the mind’.Footnote 35
We can therefore identify a specific scopic regime of military technovision that promotes the putative options offered by ‘AI’ as part of a further augmentation or replacement of human perception and, importantly, decision agency. Research has deconstructed the ‘scopic regime of modernity’Footnote 36 in the context of drone vision. But the transition from the all-seeing system to a ‘sightless vision’Footnote 37 and to forms of algorithmically informed warfare that feature a new perception–action apparatus remains understudied. The current developments point to a reverse trend of giving away sight and control in warfare in the form of what could be called a post-scopic regime. This does not mean that human vision ceases to play a role. But human–machine interaction is increasingly complex, and human agency increasingly diminished. This concerns particularly developments in computer visionFootnote 38 and machine learning (ML), especially in deep neural network (DNN) models that deal with unlabelled or unstructured data and are used for anomaly detection.Footnote 39, Footnote 40
Research on military AI and the question of vision
In recent years, a substantial and growing body of research has addressed the promises and pitfalls of military AI from ethical, legal, and normative perspectives interconnected with critical security studies. Most works consider the (emerging) normative framework that surrounds the implications of integrating AI, in the form of autonomous weapon systems (AWS), into the practice of warfare. The political background of this debate is provided by discussions in the United Nations’ framework of the Convention on Certain Conventional Weapons (CCW) since 2014, which are critically observed by academia and NGOs. The current formation of the ‘Group of Governmental Experts (GGE) on emerging technologies in the area of lethal autonomous weapons systems (LAWS)’ is sounding out greatly divergent viewpoints on the characteristics as well as the regulation or prohibition possibilities of such systems. A more detailed review of research on AWS is beyond the scope of this paper. More importantly, it should be noted that questions of visualisation and de-visualisation are rarely in the research focus. This is noteworthy because there is significant reflection on the question of human control over AI, or AWS in this case, in the academic and political debate. States parties and NGOs have contributed here by introducing concepts such as ‘meaningful human control’Footnote 41 or ‘appropriate levels of human judgment over the use of force’Footnote 42 into the debate, the latter being the standard human control definition of the US government for over a decade. Apart from controversy about important key terms, such as a definition of what autonomy, appropriate, meaningful, control, or judgement could mean, and the resulting lack of universally shared understandings, the question of human control seems to be intricately linked also to vision as a foundation of human agency. Wilke, for example, argues that military agency in terms of targeting and other aspects that are an outcome of observation is traditionally based on ‘professional vision’. This is taken up by Suchman, Follis, and Weber, who posit that ‘panoptic aspirations to situational awareness are instantiated instead as highly formatted and constrained modes of professional vision’.Footnote 43 The transformation of this professional military vision by military AI is not yet well accounted for in the research literature.
In the interaction of humans and AI, agency becomes ‘distributed’Footnote 44 and turns into a complex system of human and non-human agentic elements that sense, make sense, and act, to use the terminology of JADC2. We are facing here a new regime of sensing that is very much influenced by a de-visualisation of what has previously been conceptualised as professional vision in the military context. At the same time, de-visualisation goes beyond the ‘visual crises’Footnote 45 that are an outcome of bringing together human mindsets and vision technologies, as human vision and agency can be partly or even entirely replaced.
In the following section, I will take a closer look at JADC2 as an example of the new concept of a distributed agency. Thereafter, I will present the three different stages of visualisation/de-visualisation as outlined in the introduction.
A new perception–action apparatus (JADC2)
The launch of the JADC2 initiative is part of the US military focus on the role and importance of data. As noted by the 2020 Department of Defense (DoD) Data Strategy, ‘The DoD now recognizes that data is a strategic asset that must be operationalized in order to provide a lethal and effective Joint Force that, combined with our network of allies and partners, sustains American influence and advances shared security and prosperity’.Footnote 46 The future vision of the strategy is that the ‘DoD is a data-centric organization that uses data at speed and scale for operational advantage and increased efficiency’.Footnote 47 JADC2 is an attempt to translate this vision to the operational level by creating a complex network of sensor input, unified data storage and data access, hardware platforms, and actors that informs real-time decision-making. It was made public in the DoD ‘Summary of the Joint All-Domain Command & Control (JADC2) strategy’ in March 2022.
Crucially, ‘JADC2 provides a coherent approach for shaping future Joint Force C2 capabilities and is intended to produce the warfighting capability to sense, make sense, and act at all levels and phases of war, across all domains, and with partners, to deliver information advantage at the speed of relevance’.Footnote 48 The JADC2 vision is depicted in Figure 1 below. It shows the complexity of the system that aims at integrating input from all domains, different platforms, and actors into a new decision cycle that is importantly governed by AI applications such as ML.
The major transformation that JADC2 means for the decision process is the use of AI to achieve ‘human on the loop’ (supervising algorithmic decision-making) or even ‘human out of the loop’ (algorithmic decision-making without human supervision) applications in certain parts, in contrast to the traditional human-in-the-loop concept of the OODA loop, where humans are active decision-makers at different stages.Footnote 49
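The distinction between these control configurations can be illustrated schematically. The sketch below is a conceptual illustration only: the function and parameter names are invented for this paper’s argument and do not describe any fielded system.

```python
# Conceptual sketch of in/on/out-of-the-loop control gates; names are invented
# placeholders for illustration and describe no fielded military system.
from enum import Enum

class ControlMode(Enum):
    IN_THE_LOOP = 1      # a human must actively approve every engagement
    ON_THE_LOOP = 2      # the system acts unless a human vetoes in time
    OUT_OF_THE_LOOP = 3  # the system acts on its own classification

def authorise(mode: ControlMode, human_approves: bool = False,
              human_vetoes: bool = False) -> bool:
    """Return whether a machine-generated engagement recommendation proceeds."""
    if mode is ControlMode.IN_THE_LOOP:
        return human_approves        # no action without explicit approval
    if mode is ControlMode.ON_THE_LOOP:
        return not human_vetoes      # action proceeds unless interrupted
    return True                      # fully automated execution
```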
The information available on JADC2 so far is limited. The 2022 strategy paper outlines in general that ‘“Sense and integrate” is the ability to discover, collect, correlate, aggregate, process, and exploit data from all domains and sources (friendly, adversary, and neutral), and share the information as the basis for understanding and decision-making’, while ‘“Make Sense” refers to analyzing information to better understand and predict the operational environment and the actions and intentions of an adversary, as well as the actions of our own and friendly forces’.Footnote 50 Important to note in the ‘make sense’ category is also the central role AI/ML is supposed to play in the reformation of OODA: ‘JADC2 developed capabilities will leverage Artificial Intelligence and Machine Learning to help accelerate the commander’s decision cycle. Automatic machine-to-machine transactions will extract, consolidate and process massive amounts of data and information directly from the sensing infrastructure.’Footnote 51 There is therefore a clear intention to automate the central elements of sensing and making sense. I understand this development as a move that results in a broad de-visualisation. While human vision as ‘seeing’ coupled with ‘knowing’ based on sensory input plays and will continue to play a role in the future, ‘vision’ is meant to be largely replaced by AI-driven ‘sense’ and ‘sense making’ due to data load and the importance of speed. It is at the point of data analytics conducted by ML that ocular assessments and human processing are being replaced and lost.
Roberts summarised the JADC2 vision succinctly as follows: ‘in simple terms, that JADC2 aims to network everything military (and some non-military stuff too), run it through some AI and ML, and deliver the Joint Force commander a set of recommendations at a speed faster than an adversary can act or react’.Footnote 52 In this theoretical example, the commander will still access ‘recommendations’ based on the visual sense, but this concept of vision is very different from the observation (as well as orientation and decision) of the OODA loop that also summarised military practices of the past centuries.
The JADC2 perspective forms the background of the military transformation currently taking place, in which AI is included in imaginations of future warfare. The following section will shed more light on the process and three dimensions of de-visualisation that are arguably an important part of this transformation.
Military vision between visualisation and de-visualisation
The transformation of military vision is part of the US defence AI initiative. The USA, as the leading developer of military AI, has invested significantly in relevant technology. In recent years, the US military – in acceptance of the seemingly inevitable ‘race for AI supremacy’Footnote 53 vis-à-vis China and Russia – has spent billions on the research and development of algorithmic warfare. The DoD fiscal year 2023 budget proposal submitted to Congress in March 2022 requested more than 130.1 billion USD for research and development and earmarked 1.1 billion USD for ‘AI’, in addition to 11.2 billion USD in funding for ‘cybersecurity’.Footnote 54 Bloomberg Government ‘found the Pentagon is seeking a combined $5.2 billion in FY-21 for 319 research and development programs with “some AI/ML component”, up from $4 billion in DOD’s FY-20 budget request’,Footnote 55 while the Pentagon spent ‘an additional $1.7 billion to $3.5 billion for unmanned and autonomous systems’ in 2020.Footnote 56 The NSCAI’s final report to Congress advised to ‘increase federal funding for non-defense AI R&D at compounding levels, doubling annually to reach $32 billion per year by Fiscal Year 2026’.Footnote 57
The role of ML in terms of deliberately decreasing human vision can be exemplified by US Department of Defense (DoD) projects such as ‘the now-famous’Footnote 58 image classification project Maven (Algorithmic Warfare Cross-Functional Team). Launched in 2017, Project Maven, repeatedly covered in recent IR research,Footnote 59 is precisely the attempt to develop deep learning models that can perform the ‘vision’ task on vast quantities of image data. It is a direct response to the limits human cognitive abilities pose to the efficiency of decision-making in military scenarios predicated on comprehensive situational awareness. In the words of then JAIC director and Maven project leader Jack Shanahan, Maven is a ‘perception project’ that is meant to ‘automatically detect, classify, track and maybe provide a little bit extra information so that a human doesn’t have to stare at a video screen for 11 hours at a time’.Footnote 60 But what Maven is supposed to deliver goes beyond the role of a refined telescope. It provides an extensive level of human–machine interaction in the form of a perception–action apparatus, where algorithms filter, detect, and highlight data under time constraints: as Shanahan explained, ‘this is about, let the machines go through the data as fast as possible, make recommendations or – or options to an analyst, to a commander, to an operator. And it just gets through decision-making processes better and gives humans time back.’Footnote 61 The close resemblance to elements of JADC2 is obvious. The argument is that computer vision coupled with suitable ML algorithms can make more accurate and faster detections, but also de facto decisions about the relevance of data.
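The kind of automated pre-screening Shanahan describes can be sketched minimally with off-the-shelf components. The sketch below is illustrative only: the detector, confidence threshold, and file handling are assumptions chosen for the example and bear no relation to Maven’s actual architecture.

```python
# Minimal sketch of automated frame screening, assuming an off-the-shelf detector;
# this is an illustration of the general technique, not the Maven system itself.
import torch
import torchvision
from torchvision.transforms.functional import to_tensor
from PIL import Image

model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
model.eval()

def screen_frame(path: str, score_threshold: float = 0.8):
    """Return (label, score) pairs above a confidence threshold for one frame."""
    frame = to_tensor(Image.open(path).convert("RGB"))
    with torch.no_grad():
        output = model([frame])[0]
    keep = output["scores"] > score_threshold
    # Only high-confidence detections are surfaced to the human analyst;
    # everything below the threshold is silently filtered out.
    return list(zip(output["labels"][keep].tolist(),
                    output["scores"][keep].tolist()))

# Hypothetical usage: flag only the frames that contain any high-confidence hit.
# flagged = [f for f in frame_paths if screen_frame(f)]
```

The analytically relevant point is contained in the threshold: whatever falls below it never reaches the human operator at all.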
The Maven technology processes ‘traditional’ images or video footage gathered by drone sensors, but deploying algorithmic screening marks a de-visualisation that is based on a completely different way of processing image data than human vision. ML is concerned with statistical pattern recognition in large data sets – the detection of anomalies.Footnote 62 Even though human seeing and knowing based on vision still play a role when it comes to acting on pre-screened data, the question is to what extent the algorithmic representation produces the image instead of the image producing the algorithmic representation.
Further US projects like Skynet – a National Security Agency (NSA) surveillance programme using ML to analyse communications data in anti-terror operations that made headlines in 2015 – are precisely concerned with developing technical capabilities to use deep neural networks that can detect patterns or anomalies in data autonomously (unsupervised learning) to provide a response to the increasing complexity of data environments on the ‘battlefield’.Footnote 63 These environments consist of a range of mixed and complex data, signals, and electronic emissions. The aim is to gain an advantage in processing such environments.
Details about Skynet were leaked by Edward Snowden and published on the website The Intercept. For Skynet, the NSA tested the detection of Al Qaeda couriers based on ML analysis of mobile phone metadata and resulting patterns of usage as well as travel. The visual output of this analysis is shown in Figure 2 below.
The metadata is here translated into a visual display of ‘patterns of life’ that also provides an interpretation of ‘normal’ and anomalous, suspicious behaviour. Based on these files, it was reported that the individual with the most suspicious profile was presented as Ahmad Muaffaq Zaidan (see Figure 3), who holds Syrian nationality and has served as the Islamabad bureau chief for Al Jazeera for an extended period. In the files, he was listed as a ‘Member of Al-Qa’ida’ and the ‘Muslim Brotherhood’. But throughout his professional journey, Zaidan has dedicated his reporting to the Taliban and Al Qaeda, conducting numerous notable interviews with senior Al Qaeda figures, including Osama bin Laden.Footnote 64
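How a ‘pattern of life’ can be reduced to a ranked anomaly score is illustrated by the minimal sketch below. It uses a generic unsupervised outlier detector on invented per-subscriber features and synthetic data; it is emphatically not a reconstruction of the NSA’s pipeline, only an illustration of the class of technique.

```python
# Illustrative sketch of unsupervised anomaly scoring on metadata; features and
# data are invented, and this does not reproduce any actual surveillance system.
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
# Hypothetical per-subscriber features: calls per day, distinct cell towers
# visited, share of night-time activity, SIM swaps per month.
population = rng.normal(loc=[20, 5, 0.2, 0.1],
                        scale=[5, 2, 0.05, 0.05],
                        size=(10_000, 4))

detector = IsolationForest(contamination=0.001, random_state=0).fit(population)

# Lower scores are treated as more 'anomalous patterns of life'; the analyst
# only ever sees this ranked list, not the behaviour it abstracts from.
scores = detector.score_samples(population)
most_suspicious = np.argsort(scores)[:10]
```

The point of the sketch is the final line: what reaches the human is a ranking produced by a statistical model of ‘normality’, not an observation in any ocular sense.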
The use of ‘observation’ or ‘sensing’ data that is in this case not based on video footage or photographic images but on data from the electromagnetic spectrum, invisible to the human eye, shows the first step towards a de-visualisation of a military perception–decision–action sequence. The computer output in the leaked examples is optimised to meet human expectations in seeing. It shows a mapped terrain and coloured lines, dots, and arrows that highlight directions of travel and agglomerated stays. But the data is in this sense already highly filtered and structured. It provides not only a representation but also an interpretation of reality in a visualisation of the non-visual. The messiness of material and social interactions is sanitised into data points, clear surfaces, unquestionable lines and geometries. It is a representation signalling objectivity and neutrality. The visualisation is here based on making visible something – electromagnetic signals – that is created by humans and has never been directly observable in the way that reflections of sunlight are.
In the example above, the OODA loop has not necessarily collapsed, as there is no direct execution of ‘sense making’ into action based on AI. However, the confident presentation of Zaidan as a courier of a terrorist group already questions the extent to which ‘meaningful human control’ or agency is applied in the human–machine interaction.
The implication of visual representation of non-visual data such as mobile phone signals is related to the proliferation of interfaces in the security and military context but also beyond. In the words of Fedorova, ‘in a conventional sense, or in relation to computational technologies, the interface is a place of connection between a human and a digital system that allows them to communicate with one another in order to generate and exchange information’Footnote 65 and is based on visual presentations. In a broader perspective of initiatives such as JADC2, it can be argued that ‘interfaces are situated devices designed in relation to political visions and imaginaries of control and power while being interactive, malleable, and adaptable’,Footnote 66 as Maia put it.
Interfaces are a decades-old technology of translating machine data into output information that is understandable and usable by human actors. In the military context, platforms, systems, and weapons all have a type of interface that can be sophisticated or very simple. Regarding the studies on drone warfare referred to above, drone control stations are typical interfaces based on screen output and the real-time processing of different sensory inputs, most importantly video footage. In that, the interface is the technological artefact of the promise of transparency, full situational awareness, control, and agency.Footnote 67 The visual representation is also a truth claim about what is happening in a given situation and moment. At the same time, it stands for a further step in the de-visualisation process, where not only is the role of human vision in terms of ‘observing’ changing and of decreasing importance, but ML is also de-visualising due to the selective ‘seeing’ that the algorithm offers to the human operator. The sanitised interface does not fulfil dreams of omniscience – rather, it offers a limited representation of social reality by reducing data to what is processable by humans.
The most recent aim of the industry is to link visual interfaces with the emerging generative AI models (large language models) known from ChatGPT and similar applications in the civilian domain. The company Palantir is at the forefront of this development, presenting in April 2023 an Artificial Intelligence Platform (AIP) that runs large language models coupled with an interface. The image below (Figure 4) is a screenshot from a Palantir demo video of AIP. The operator can interact with AIP by asking questions and providing prompts in the way a chat between humans would generally take place. The fundamentally transformative aspect is that AIP also gives recommendations for actions that can be selected by the operator. Again, we see here a neat representation of the ‘battlefield’ that gives no reason, or rather basis, to ‘doubt’Footnote 68 the algorithmic situational assessment that is cleared of all the unnecessary noise of the old ‘fog of war’. At the same time, important situational nuances and distinctions, such as between combatants and non-combatants, that could be made much more safely in a slower and deliberative OODA loop seem to be disappearing in the new electronic fog of war.
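The interaction pattern described here – a chat prompt returning a menu of machine-generated courses of action from which the operator selects – can be sketched generically as below. Everything in the sketch is hypothetical: the helper function, field names, and canned response are placeholders for the sake of illustration and are not Palantir’s API or product.

```python
# Hypothetical sketch of a chat-style decision interface; query_llm and the
# response fields are invented placeholders, not any vendor's actual API.
def query_llm(prompt: str) -> dict:
    """Stand-in for a call to an external large language model service.
    Returns a canned response so the sketch runs without any real service."""
    return {"courses_of_action": ["Option A (placeholder)",
                                  "Option B (placeholder)",
                                  "Option C (placeholder)"]}

def recommend_actions(operator_prompt: str, sensor_summary: str) -> list:
    response = query_llm(
        f"Situation: {sensor_summary}\n"
        f"Operator request: {operator_prompt}\n"
        "List three recommended courses of action."
    )
    # The operator only ever sees the model's pre-filtered options,
    # not the underlying data or the reasoning behind the ranking.
    return response.get("courses_of_action", [])

print(recommend_actions("identify response options", "one contact detected"))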
These examples also underline the trend towards a realisation of the JADC2 vision, in which de-visualisation will be completed by removing human operators from immediate and high-speed decision-making. In this regard, it seems that military operators dealing ‘with second-order visualisations of these sensor inputs’Footnote 69 based on interfaces are increasingly considered the weak link in the military ambition to move through JADC2 at machine speed, which is becoming an accepted viewpoint in the military discourse promoting AI ‘solutionism’.Footnote 70 As the NSCAI’s report puts it, ‘the best human operator cannot defend against multiple machines making thousands of maneuvers per second potentially moving at hypersonic speeds and orchestrated by AI across domains. Humans cannot be everywhere at once, but software can.’Footnote 71 In the words of former DoD electronic warfare senior executive William Conley, ‘a future battlespace will contain threat signals not previously observed, [so] it will be essential for many platforms to be executing real time decision algorithms’.Footnote 72
The potential use of ‘real time decision algorithms’ can be seen as at the core of the debate about AWS, the limits of human agency, and whether decision-making takes place within the confines of ‘meaningful human control’ (MHC).Footnote 73 While the argument that ‘the act of seeing is an act that precedes action’Footnote 74 is certainly a fundamental epistemological basis of past centuries, algorithmic warfare challenges this basis because it promises to unify perception, decision, and action as outlined by JADC2: ‘it involves new modes of weapons based on the annihilation of time’.Footnote 75
This vision is putatively also fulfilled by Anduril Industries, which took over Project Maven after the withdrawal of Google due to internal protest in 2018. Anduril offers the ‘Lattice Platform’ as a command and control interface device. Here, ‘Lattice accelerates complex kill chains by orchestrating machine-to-machine tasks at scales and speeds beyond human capacity’Footnote 76 – without elaborating on the question of the extent to which ‘beyond human capacity’ also means beyond human control. Anduril further explains that ‘Lattice streamlines the complexity of the decision-making process by presenting decision points – not noise – and using deep learning models to present recommended decision support to operators’.Footnote 77 In that, ‘Lattice cuts through the noise and creates a shared real-time understanding of the battlespace. It autonomously parses data from thousands of sensors & data sources into an intelligent common operating picture in a single pane of glass.’Footnote 78
Professing trust in AI ‘solutions’ to long-standing problems of warfare that originate in human limitations deliberately contributes to a de-visualisation in war. The superiority of AI systems in terms of speed and accuracy is valued more than the fundamental role that human vision has played in warfare as a mechanism of knowing followed by decision and acting over centuries. At the same time, the discourse developed by military and industry raises the expectation that human knowledge will be more powerful and more accurate, empowered by superior technologies that enable ultimate ‘situational awareness’.Footnote 79 As Luckey puts it, ‘I think soldiers are going to be superheroes who have the power of perfect omniscience over their area of operations, where they know where every enemy is, every friend is, every asset is’.Footnote 80 It is part of the older discourse on omnipresence and omnivoyance mainly boosted by the drones’ view from above. But it is vision no longer predicated on humans seeing things.
Acts and counter-acts of de-visualisation
Based on the above, it can be argued that the governmental-military as well as the industry discourse has in recent years established a strong narrative about the unlimited possibilities of AI for the question of perceiving, knowing, deciding, and acting. This discourse is partly reproduced by the media, which has contributed to the mystification of AI for civilian purposes as well. The limitations of these AI applications are much less in focus. This is also the case regarding the broad sensing complex, where the same logic of seeing and hiding plays out that has been important for warfare since the beginning of the 20th century in terms of concealing and camouflage.Footnote 81 At the same time, the changing mechanisms and implications of AI remain an understudied research issue. The aspect I highlight in the following is how de-visualisation appears here in the form of deliberate acts that attack and distort the visual sensing technology used in the military.
Bousquet outlines in a detailed study how hiding became part of the military strategy particularly during the First and Second World Wars and how military engineering went to great lengths to improve camouflage to conceal from human vision.Footnote 82 The changes in military sensing after the Second World War, which moved away from the ocular-centric approach to including electromagnetic signals (radar in particular), also required a different approach to hiding. While camouflage remained of importance for items such as uniforms and the painting of military assets, technology such as stealth offered a new response to the challenge of indirect visual detection – indirect in the sense of radar screen and other sensor interfaces. As Bousquet puts it in this context, ‘camouflage has become increasingly understood as an exercise in signature management, whereby a given target’s signature corresponds to its characteristic aggregate of distinctive signal features across the array of relevant sensorial fields’.Footnote 83
In that, the central logic of hiding is to make objects less easily visible and detectable – whether by the ‘naked’ human eye or by technologically augmented human vision ranging from the telescope to the drone vision of the past two decades, or by other sensors collecting sound and electromagnetic signal reflections. This central logic still plays a role with the automatisation of vision and applications such as image recognition, where the correct classification of images can be physically perturbed. At the same time, digital attacks add a new layer to the visual/de-visual dimension. Here, it is no longer the object that is being camouflaged, but the process of image recognition that is being disturbed before ML processes a specific image. In other words, it is not the sensing that is directly disturbed but the ‘make sense’ component.
Vision under attack: Adversarial examples
In the last decade, research on ‘adversarial attacks’ on the perceptual architecture of deep neural networks for computer vision has intensified.Footnote 84 What are adversarial attacks or examples (AE) in the context of adversarial machine learning (AML)? In the visual domain, AE can be either digital or physical.Footnote 85 Digitally, examples are imperceptible perturbations to images that consist in adding ‘noise’ to the pixels of an image, thereby provoking, for example, a misclassification or a misdetection of objects in the image. Noise is digital information that is not perceived by the human eye. In a research example, a layer of digital ‘noise’ was added to an initial set of images. These images had beforehand been correctly classified as ‘dog’ by the ML model. After the noise was added, the deep convolutional neural network trained on the ImageNet data set used by Szegedy et al. classified all images as ‘ostrich’ with high confidence.
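The basic mechanism can be sketched with the fast gradient sign method, a later and simpler technique than the optimisation procedure used by Szegedy et al. but one that produces the same kind of imperceptible perturbation. The model, input image, and epsilon value in the sketch are placeholders.

```python
# Minimal sketch of the fast gradient sign method (FGSM); an illustration of the
# general technique, not the exact procedure of Szegedy et al. Model and inputs
# are placeholders.
import torch
import torch.nn.functional as F

def fgsm_perturb(model, image, true_label, epsilon=0.01):
    """Return an adversarially perturbed copy of `image` (a 1xCxHxW tensor);
    `true_label` is a tensor of shape (1,) holding the correct class index."""
    image = image.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(image), true_label)
    loss.backward()
    # Step every pixel slightly in the direction that increases the loss.
    adversarial = image + epsilon * image.grad.sign()
    return adversarial.clamp(0, 1).detach()

# A perturbation of around 1 per cent of the pixel range is typically invisible
# to a human viewer yet can flip the model's top prediction with high confidence.
```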
The main challenge for launching such digital, non-physical attacks is to get access to the inner structure and function of a DNN. While access during the development and training phase can potentially enable infiltration by AE, AML requires a higher level of sophistication. However, there are various other AEs that exploit the vulnerability of deep learning systems and lead to similar outcomes. It is noteworthy that one of the initial findings of Szegedy et al. on emerging adversarial attacks was that ‘the specific nature of these perturbations is not a random artifact of learning: the same perturbation can cause a different network, that was trained on a different subset of the dataset, to misclassify the same input’.Footnote 86 In other words, they are robust. These attacks are also considered ‘black box adversarial examples’ that are crafted against ‘a target model without access to the model’s architecture or parameters’,Footnote 87 which makes the attack especially powerful. In contrast, in white-box settings, full access to the model’s parameters is obtained. This refers mainly to full knowledge of an ML algorithm, architecture, and model. Research has repeatedly confirmed the transferability of black-box attacks.Footnote 88 It should also be mentioned that research discusses a ‘grey-box attack where the adversary may have partial information. This could be access to open-source data used to train the target network, or the ability to probe the target network by analysing the outputs resulting from a given input.’Footnote 89
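Transferability can be illustrated by crafting the perturbation on a surrogate model the attacker controls and then testing it against a separate target model that was never queried. The sketch below reuses the `fgsm_perturb` function from the previous sketch; both pretrained classifiers are arbitrary stand-ins chosen for illustration, and input preprocessing is omitted for brevity.

```python
# Illustrative sketch of black-box transferability: the attack is built against a
# surrogate model and tested on an untouched target model. Both models are
# generic pretrained classifiers used only for illustration.
import torchvision

surrogate = torchvision.models.resnet18(weights="DEFAULT").eval()
target = torchvision.models.densenet121(weights="DEFAULT").eval()

def attack_transfers(image, label, epsilon=0.03):
    # Craft the perturbation with white-box access to the surrogate only
    # (resizing and normalisation of the input are omitted for brevity).
    adv = fgsm_perturb(surrogate, image, label, epsilon)
    # The attack 'transfers' if the target model, which the attacker never
    # accessed, is also fooled on the same perturbed input.
    return target(adv).argmax(dim=1) != label
```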
The difference from the practice of camouflage here is that the deliberate de-visualisation of such attacks leads to the putative sensing of objects that are non-existent. It is not about hiding a material object from surveillance view, but about creating the illusion of the existence of a materially non-existent object in the virtual world. Deliberate attacks can target the integrity of machine learning models in a subtle and hardly detectable way before a model is used in practice. Such attacks could, for example, be aimed at ‘data poisoning’ in the training phase, and the US military is aware of these risks and the necessity to act upon them. As former Deputy Secretary of Defense Work argued, ‘we’re moving into an era of AI competition, and poisoning data is a way to gain an advantage. We have to be able to guard against that.’Footnote 90 However, the more central question for this paper is how physical adversarial examples challenge the promise of computer vision or rather of a decision-making machine.
AEs in the physical dimension work according to the same logic used in the digital domain but alter the physical space within the vision field that forms the sensor input of a computer vision system. In other words, perturbations are physically added to the objects a computer vision system aims to classify. For example, Brown et al. created an attack based on generating an image-independent patch.Footnote 91 This means that the authors ‘construct an attack that does not attempt to subtly transform an existing item into another … This patch can then be placed anywhere within the field of view of the classifier, and causes the classifier to output a targeted class. Because this patch is scene-independent, it allows attackers to create a physical-world attack without prior knowledge of the lighting conditions, camera angle, type of classifier being attacked, or even the other items within the scene.’Footnote 92 In this example, the classifier rated a banana with very high confidence as a ‘banana’. After adding the patch, the classifier instead rated the same object with very high confidence as a ‘toaster’.
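A compressed sketch of this patch idea is given below: a single patch, optimised once, is pasted anywhere in the scene and pushes the classifier towards a chosen target class. The patch size, placement strategy, optimisation settings, and training images are placeholders, and the sketch simplifies the full procedure reported by Brown et al.

```python
# Simplified sketch of adversarial-patch optimisation; hyperparameters and data
# are placeholders, and this condenses rather than reproduces Brown et al.
import torch
import torch.nn.functional as F

def train_patch(model, images, target_class, size=50, steps=200, lr=0.05):
    """Optimise one patch that pushes `model` towards `target_class`.
    `images` is a batch of training images (B x 3 x H x W) with H, W > size."""
    patch = torch.rand(1, 3, size, size, requires_grad=True)
    optimiser = torch.optim.Adam([patch], lr=lr)
    for _ in range(steps):
        x = images.clone()
        # Paste the patch at a random location so it works scene-independently.
        top = torch.randint(0, x.shape[2] - size, (1,)).item()
        left = torch.randint(0, x.shape[3] - size, (1,)).item()
        x[:, :, top:top + size, left:left + size] = patch.clamp(0, 1)
        target = torch.full((x.shape[0],), target_class, dtype=torch.long)
        # Maximise the classifier's confidence in the attacker's chosen class.
        loss = F.cross_entropy(model(x), target)
        optimiser.zero_grad()
        loss.backward()
        optimiser.step()
    return patch.clamp(0, 1).detach()
```

The design point worth noting is that the optimisation never touches the object being misclassified: the banana stays a banana, and only an added sticker rewires what the system ‘sees’.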
This AE is a variation on a range of experiments undertaken in the context of autonomous driving and machine learning. Crashes involving the autonomous driving systems of Tesla and Uber have gained considerable public attention in recent years and showed the limitations of current computational perception. General attack algorithms, also known as Robust Physical Perturbations (RP2),Footnote 93 have proven to be robust in changing, unstable environments and with varying distances and camera angles. Here, an often-considered AE is the alteration of road signs by adding small objects such as patches. Figure 5 shows a ‘Stop’ sign that is altered by random graffiti (left), which is a common occurrence. The Stop sign on the right shows patterns that are AEs. Both alterations are detectable by human vision and do not distort a human’s understanding of the sign’s meaning. While a human would therefore most likely consider both examples as random acts of vandalism, the deliberate adversarial examples can lead to misclassifications and to incorrect driving decisions by computer vision systems. As in other cases, physical AEs are often easily identifiable by human vision and would not lead to altered action. The decisive point is, however, that the deliberate de-visualisation of what situational awareness or perception means leads to a new set of post-scopic challenges at the interstice of physical imagery and electronic data processing.
The ability to create stable attacks for a noisy environment suggests that this could also influence the reliability of autonomous systems in the military and security domains. One of the key areas of research for different militaries is autonomous air, land, and sea vehicles. While the development of land vehicles is particularly challenging due to the complexity of the environment, physical attacks on their vision systems based on RP2 are a possibility. Attacks on autonomous civilian driving systems are even easier to imagine, given the importance of road signs in such environments.
It is noted that physical adversarial attacks on imaging systems are constrained by real-world physical conditions and that the robustness of AEs depends on extensive research and training.Footnote 94 At the same time, Chen concludes that ‘ultimately, we found that when AI technology is really widely used in the military field, adversarial examples will have a subversive impact on several activities in several steps in the kill chain, which will directly lead to the interruption of the entire kill chain’.Footnote 95
While algorithmic ‘seeing’ opens new ways of sensing, it is not a straight path towards omniscience. Counter-measures against the all-seeing algorithmic eye also take productive forms, moving from concealing to producing classifications. 3D-printed adversarial objects proved to be robust in fooling neural network classifiers in the physical world over varying viewpoints and natural noise.Footnote 96 The authors of an experiment from MIT’s Computer Science and Artificial Intelligence Laboratory fooling Google image recognition also showed that they were able to choose what the image recognition algorithm was perceiving. In the words of Anish Athalye, ‘It’s actually not just that they’re [adversarial examples] avoiding correct categorization – they’re classified as a chosen adversarial class, so we could have turned them into anything else if we had wanted to … The algorithm takes in any textured 3D model, such as a turtle, and finds a way to subtly change the texture such that it confuses a given neural network into thinking the turtle is any chosen target class.’Footnote 97
Considered in the military context, these insights question the overly optimistic view of AI becoming the ultimate solution for awareness and precision issues. For example, it was suggested that AI could make war ‘more ethical’Footnote 98 if ‘drones could be taught not to shoot at “protected symbols” such as the red cross sign, or not to shoot at children, by being trained not to target people below a certain height’.Footnote 99 Quite apart from the technological feasibility of using AI reliably in combat, such understandings do not accommodate the established research on AI vulnerability. Based on the insights from this section, ‘protected symbols’ or ‘physical features’ could easily be perturbed but also deliberately exploited to cause misclassifications in a way that might be imperceptible to the human eye even if a human operator were in or on the loop.
The algorithmic fog of war
The military-industrial, but also some of the academic, exploration of military vision technology mainly operates in the context of the ‘prosthetic’ augmentation concept, where vision, knowledge, and decision as well as the human and material are distributed elements of a system. Here, technologically enhanced vision is a tool – a tool that might not always deliver what it promises, but whose failures are usually seen as the outcome of flawed human–machine interaction that could, in principle, be fixed. The idea of JADC2, however, moves the expectations increasingly outside of this apparatus of distributed agency. ML promises the emergence of a machine that not only corrects or overcomes human limitations but also overcomes the very necessity of distributing tasks between machines and humans. This new unified ‘machine’ agent is capable of detecting, classifying, deciding, and acting in seconds – in its end-point version, it runs through the whole former OODA loop independently. However, this central promise of the imagined algorithmic turn for the military, which is strongly connected to the condition of speed, makes a consideration of the vulnerability of AI-empowered systems crucial.
While the aforementioned concept of meaningful human control introduced to the debate about LAWS at the United Nations’ CCW lacks a systematic and comprehensive operationalisation,Footnote 100 the baseline is that control can only be ‘meaningful’ when ‘sense making’ and ‘deciding’ are acts involving human deliberation. However, the comprehensive de-visualisation taking place by automating and de-linking conventional human vision from knowing results in a reliance on data output that gives human operators only an abstract option to control actions.
In that, we are moving towards a twofold contestation of human control and decision-making capacity. First, vision is affected by the translation of live images and by the increasing use of image recognition technologies that will transcend the initial usage of pattern recognition informing human monitoring. Second, moving beyond the simple mechanics of first- and second-generation armed drones (or of other surveillance technology) to the integration of autonomous technologies in vastly different security and weapons apparatuses sets a clear trajectory for decision machines powered by automated sensory input in terms of computer and electromagnetic signals.
Technical research underlines the vulnerability of machine learning to adversarial input perturbations,Footnote 101 but these findings, along with a growing awareness of and response to this problem, have virtually no platform in governmental-military discourse. That discourse remains almost completely dominated by optimistic narratives praising the opportunities of ‘AI’ while failing to address the technology’s complexity or to acknowledge associated challenges and risks.
Perspectives based on ‘the scopic regimes of modernity, which have been influential in shaping viewing practices in Western contexts for over 500 years’Footnote 102 are yet to take account of this emerging post-scopic condition. Grayson and MawdsleyFootnote 103 as well as BousquetFootnote 104 showed that the view from the drone is deeply embedded in Cartesian perspectivalism and Baconian empiricism. Both concepts influence our understanding of vision; they provide the basis for legitimising truth claims predicated on the drone providing the human observer with a privileged status and revealing the ‘true’ essence of the observed field. The narrative of augmentation within the existing scopic regime lies at the core of the promise to overcome the limits of vision and knowledge in warfare.
But seeing in the prosthetic sense of augmenting or replacing the human eye’s direct visual contact with an object (or target) increasingly moves to the background. The emergence of a regime of non-human perception, data processing, and decision as well as the novel truth claims about algorithmic objectivity, precision, and neutrality therefore implies a reversal of fundamental logics of drone warfare understood as ‘a mode of “seeing without being seen” that reproduces the scopic regimes of modernity’.Footnote 105 This is the ultimate future vision of JADC2.
The new regime also features the dissolution of the omnipotent ‘gaze’.Footnote 106 What we find now is a change of subject positions, in which the human ‘operator gaze’Footnote 107 is no longer the default viewing and perceiving subject. Here, the established truth claims based on visual evidence are replaced by a different regime that gains its legitimacy from technological superiority in line with the dominant narrative of ‘AI’ progress. In other words, the truth or accuracy claims of technology presenting its output to a human, or deciding and acting without human input, are legitimised via the meta-narrative of infallible technology that is beyond human abilities (and understanding).
While there is little public debate about the limits of military ‘AI’, the US military appears to be aware of challenges emerging in de-visualisation. In 2019, the US DoD released a funding call for the creation of the ‘Guaranteeing AI Robustness against Deception (GARD)’ programme, running for 48 months.Footnote 108 It was stated that GARD ‘will initially concentrate on state-of-the-art image-based ML, then progress to video, audio and more complex systems – including multi-sensor and multi-modality variations. It will also seek to address ML capable of predictions, decisions and adapting during its lifetime.’Footnote 109 As Hava Siegelmann, then programme manager for GARD, noted when talking about adversarial examples in military situations that are impossible for humans to identify, ‘it’s like we’re blind’.Footnote 110 Hence, the emerging post-scopic perception and action apparatus that promises superior outcomes leads to a comprehensive ‘blindness’ of human operators who only perform ‘meaning-less control’,Footnote 111 if any. The expectations of ‘lifting the fog of war’ that have underpinned imaginings of technological innovation since the late 1990s were premature. A new, dense, and incapacitating digital fog emerges and arguably, ‘even as people worry about intelligent killer robots, perhaps a bigger near-term risk is an algorithmic fog of war – one that even the smartest machines cannot peer through’.Footnote 112
Conclusion
Algorithmic warfare is transforming war and security policies. While the discipline of International Relations slowly accommodates the powerful narrative of an algorithmic turn in empirical and theoretical regards, the consequences of this development for re-conceptualising ‘vision’ in the military context remain under-researched. A significant body of research has focused on drone warfare, particularly in the last decade, and has also addressed the important implications of remotely controlled violent force along the visual dimension. At the same time, the development, testing, and deployment of systems integrating AI technologies in targeting continue. But vision in the human ocular sense is not a feature of autonomy. We may be entering an era of ocular regression in which a central human sense – arguably the most central sense in combat – is further and further debilitated. The great visual extension and transformation of the drone age seems to be of limited future relevance in the new vision of JADC2 and similar initiatives that ultimately aim to replace human agency in the novel ‘sense’, ‘make sense’, and ‘decide’ loop. In its most extreme version, this loop will be compressed into a single action.
What is known as ‘vision’, and now labelled ‘sensing’, turns into a multisensory data input operation, and conditions of speed decrease options for meaningful human control. The dominant narrative about the potential of ‘AI’ in terms of providing superior technological omnivoyance and omniscience contributes to a process of de-visualisation that culminates in a deliberate human ‘blindness’, or rather incapacitation, in the digital fog of war.
In this context, human–machine interactions in terms of interfaces and adversarial ML attacks that have so far been able to fool, disturb, and significantly disable image classification are rarely examined. Relying on electronically mediated and translated imagery in various forms creates specific problems for humans interacting with machine output in the new, complex decision-action apparatus. Like the existing challenges to perception – to what we see, how we see it, and what we know – that are studied in the context of drone warfare, novel military innovations such as the use of generative AI in interfaces presenting clean visualisations of a messy reality introduce technology that becomes an increasing part of human decision-making. Under conditions of time pressure, speed, and information overload that are characteristic of modern warfare, trust in the objectivity and rationality of what is displayed and filtered becomes a central requirement. Truth claims are, however, increasingly less based on the outlined scopic regime of modernity and the established knowledge about vision. Algorithms that no longer require human intervention or supervision appeal to a different truth that is at the heart of a socio-technical narrative of machine superiority.
As argued, technological developments and political statements in recent years point clearly in the direction of a wide-ranging integration of autonomous or ‘AI’ technologies into military decision-making and targeting. The perceived advantages of systems that process, filter, and assess information on the spot are tempting for actors in military and security settings. The discourse of powerful ‘AI’ is, however, contested when we explore the limitations of algorithmic processing. In other words, the systems that are currently being developed to fulfil the JADC2 vision are much less reliable and more vulnerable than the dominant narrative suggests. However, this does not make the question of autonomy less important or this development less problematic. The socio-technical imagination of a revolution in warfare has paved the way to accepting AI in the broad sense as a solution to long-standing problems such as speed, distance, situational awareness, or precision. This acceptance is linked to an expectation that such systems are now emerging and being developed by perceived adversaries and that there is an immediate necessity to win the AI arms race.Footnote 113
While logics of seeing, perceiving, and knowing have remained stable for centuries, we might now be entering the era of a post-scopic regime in which the visual field becomes ever more fragmented in the interplay of electronic and non-electronic data. As Bousquet argued in the context of drone studies, ‘it is less the weapon that has come to serve as a prosthetic extension of the eye than perception itself which has been caught up in an unrelenting process of becoming weapon’.Footnote 114 However, in the process of reversing the dream of human hypervisibility as a form of hypervisualisation in favour of algorithmic de-visualisation, the weapon is now in an unrelenting process of becoming perception – it starts to replace human senses and, importantly, also collapses the way in which the act of seeing precedes action. Rather than distributing tasks in the use of force, this is the development of a unified technological agency that perceives and decides.
Acknowledgements
The author wishes to thank the editors of EJIS and two different sets of anonymous reviewers in two review processes, as well as Ingvild Bode, Anna Nadibaidze, Guangyu Qiao-Franco, and Tom Watts for their helpful feedback on this article.
Hendrik Huelss is Assistant Professor of International Relations at the Center for War Studies, University of Southern Denmark. He is an affiliated Senior Researcher in the AutoNorms project at SDU, funded by the European Research Council (2020–5). Hendrik’s work is located at the intersection of international political sociology and studies of AI and technologies. His primary research interest and publication activities aim at producing critical thinking on the role of AI in the context of security and military practices. This is often combined with Hendrik’s second major research interest, the new conceptualisation of the role of norms in International Relations.