Eye-tracking analysis to assess the mental load of unmanned aerial system operators: systematic review and future directions

A.C. Russo; M.M. Cardoso Junior; E. Villani

doi:10.1017/aer.2024.122

Published online by Cambridge University Press: 25 November 2024

A.C. Russo*: Departamento de Engenharia de Minas e de Petróleo da Escola Politécnica da Universidade de São Paulo, São Paulo, Brazil
M.M. Cardoso Junior: Divisão de Engenharia Mecânica-Aeronáutica, Instituto Tecnológico de Aeronáutica, São José dos Campos, Brazil
E. Villani: Divisão de Engenharia Mecânica-Aeronáutica, Instituto Tecnológico de Aeronáutica, São José dos Campos, Brazil
*: Corresponding author: A.C. Russo; Email: [email protected]

Abstract

This article presents a systematic review on the use of eye-tracking technology to assess the mental workload of unmanned aircraft system (UAS) operators. With the increasing use of unmanned aircraft in military and civilian operations, understanding the mental workload of these operators has become essential for ensuring mission effectiveness and safety. The review covered 26 studies that explored the application of eye-tracking to capture nuances of visual attention and assess cognitive load in real-time. Traditional methods such as self-assessment questionnaires, although useful, showed limitations in terms of accuracy and objectivity, highlighting the need for advanced approaches like eye-tracking. By analysing gaze patterns in simulated environments that reproduce real challenges, it was possible to identify moments of higher mental workload, areas of concentration and sources of distraction. The review also discussed strategies for managing mental workload, including adaptive design of human-machine interfaces. The analysis of the studies revealed a growing relevance and acceptance of eye-tracking as a diagnostic and analytical tool, offering guidelines for the development of interfaces and training that dynamically respond to the cognitive needs of operators. It was concluded that eye-tracking technology can significantly contribute to the optimisation of UAS operations, enhancing both the safety and efficiency of military and civilian missions.

Keywords

eye-tracking; mental workload; unmanned aircraft system (UAS); UAV operator

Type: Research Article
Information: The Aeronautical Journal, First View, pp. 1-30

DOI: https://doi.org/10.1017/aer.2024.122
Copyright: © The Author(s), 2024. Published by Cambridge University Press on behalf of Royal Aeronautical Society

Nomenclature

UAS: Unmanned Aircraft System
UAV: Unmanned Aerial Vehicle
NASA-TLX: NASA Task Load Index
PRISMA: Preferred Reporting Items for Systematic Reviews and Meta-analyses
ECG: Electrocardiogram
EEG: Electroencephalogram
fNIRS: Functional Near-Infrared Spectroscopy
PERCLOS: Percentage of Eyelid Closure
ApEn: Approximate Entropy
HRV: Heart Rate Variability
SCOUT: Supervisory Control Operations User Testbed
CHMI: Cognitive Human-Machine Interfaces
CHMI2: Cognitive Human-Machine Interfaces and Interactions
SA&CA: Separation and Collision Avoidance
MUM-T: Manned/Unmanned Teaming
AAA: Attention Allocation Aid
DelCon: Delegation of Control
StArt: State of the Art through Systematic Review
PICo: Population/Problem, Interest and Context
AMC: Air Mission Commander
GCS: Ground Control Stations
SCORCH: Supervisory Control of Remote Crewed and Uncrewed Assets

1.0 Introduction

Like many other sectors, aviation is progressively adopting unmanned aircraft systems (UAS), and their increasing incorporation into military operations is proving to be a game-changer in defence tactics and strategies. These systems can perform long-duration missions in remote and hostile areas, eliminating the need for expensive and bulky life support systems, which results in a higher payload capacity per flight [Reference Fricke and Holzapfel1]. These autonomous systems offer numerous advantages, from advanced tactical reconnaissance to surgical action in high-risk environments. However, the effectiveness of these operations intrinsically depends on the mental load of the operators involved in the cognitive human-machine interfaces.

In military contexts, UAS operators face highly complex and dynamic situations. Target identification, real-time data analysis, crucial decision-making and strategic coordination require a level of attention and mental load management that directly influences mission success. In addition, such operations often occur in harsh environments, where the ability to maintain constant vigilance is of utmost importance for safety and effectiveness.

In this scenario, the assessment of the mental load of UAS operators in cognitive human-machine interfaces emerges as a critical consideration. Mental load, representing the cognitive effort required to perform tasks, plays a vital role in the execution of operations. Maintaining a balanced mental load is essential to allow operators to focus on crucial tasks, ensuring continuous vigilance and accurate decision-making in the face of ever-evolving situations.

Accurate assessment of mental load, however, is a complex challenge, particularly in military settings. Traditional approaches, such as self-assessment questionnaires like the NASA-TLX [Reference Alaimo, Esposito, Orlando and Simoncini2–Reference Zheng, Yin, Dong, Fu, Shuguang, Shuiting, Yanlai and Junmin7], may be limited in terms of accuracy and objectivity. Therefore, it is imperative to employ more advanced methods that enable real-time and continuous understanding of the mental load.

In this article, we will turn to eye tracking analytics, a powerful tool for capturing the nuances of operators’ visual attention during UAS operations.

By thoroughly analysing the gaze patterns of operators in simulated environments that reproduce the real challenges of military operations, the aim is to identify the moments of greatest mental load, areas of concentration and possible sources of distraction [Reference Lim, Gardi, Ramasamy, Vince, Pongracic, Kistan and Sabatini8]. This in-depth analysis will allow us to understand how the mental load varies throughout operations and how this variation influences critical decision-making.

In addition, it is intended to identify effective strategies to manage the mental load efficiently. This will include the adaptive design of the cognitive human-machine interface, where the distribution of information and alerts can be dynamically adjusted, considering the perceived load of the operators [Reference Sibley, Foroughi, Brown, Drollinger, Phillips and Coyne9]. By optimising data presentation and effective information management, it is hoped to keep operators in a state of mental load suitable for performing tasks, reducing cognitive fatigue and improving performance [Reference Lim, Ramasamy, Gardi, Kistan and Sabatini10].

Given the complexity and importance of assessing the mental load in UAS operators in cognitive human-machine interfaces, this article employs a systematic review focused on the potential of eye-tracking as a diagnostic and analytical tool. By compiling and analysing relevant studies, we seek not only to understand the effectiveness of this technology in capturing mental load indicators, but also to identify guidelines for the development and improvement of interfaces and training that dynamically respond to the cognitive needs of operators. Such an approach aims to contribute significantly to the optimisation of UAS operations, elevating both the safety and efficiency of military and civilian missions.

2.0 Materials and methods

This research is a theoretical study that applies the technical procedure of systematic review of the literature (SRL). This technique was used to identify, evaluate and interpret relevant research on the subject, using a defined methodological sequence that allows the aggregation and construction of knowledge [Reference Greenhalgh11, Reference Kitchenham and Charters12].

The design of this SRL was prepared in accordance with the guidelines of the Preferred Reporting Items for Systematic Reviews and Meta-analyses (PRISMA) statement [Reference Moher, Shamseer, Clarke, Ghersi, Liberati, Petticrew, Shekelle and Stewart13]. Since this was a literature review, it was not necessary to submit it to the Ethics Committee.

The SRL comprises a sequence of three stages: planning, conducting and presenting the review, each with its own actions (Fig. 1).

Figure 1. Stages of the systematic review of the literature. Source: modified from Kitchenham and Charters [Reference Kitchenham and Charters12].

2.1 Search strategy

The PICo (Population/Problem, Interest and Context) strategy for non-clinical research was used to construct the research question (Table 1): How does the use of eye tracking contribute to the assessment of the mental load of operators of UASs and unmanned aerial vehicles (UAVs)?

The choice of a population composed exclusively of UAS and UAV operators for this systematic review is strategic and justified. UAS and UAV operators play a crucial role, acting as remote pilots who control and monitor aircraft from distant locations, without being physically present in the cockpit. This role involves managing navigation, making critical decisions in real-time, and analysing complex sensory data, requiring a high cognitive load to maintain safety and operational efficiency.

Table 1. PICo strategy for the elaboration of the research question

The complexity and unique cognitive demands faced by these professionals provide a deep spectrum of insights into mental load in cognitively demanding operations, directly relevant to UAS and UAV operation. In addition, the specific literature focused on UAS and UAV operators, especially about mental load assessment via eye tracking, is notoriously sparse. The specific inclusion of these operators allows for a direct understanding of the cognitive demands they face, contributing to filling the gaps identified in existing research and expanding the body of knowledge applicable to the human-machine cognitive interface in UAS contexts.

The searches were carried out in the Web of Science and Scopus databases, chosen for their interdisciplinary nature and for being considered two of the largest reference databases in the world. For this purpose, word combinations were used (Fig. 2).

Figure 2. Combination of keywords used in the search. Source: Authors.

From the analysis based on the keyword combinations in Fig. 2, it was possible to perform a temporal analysis of the volume of publications and citations related to the use of eye tracking in the assessment of mental load in UAS and UAV operators (Fig. 3).

Figure 3. Years of publication (Scopus). Source: Authors.

Figure 3 shows a significant increase in the number of published documents over the years, with a particularly sharp rise starting in 2019 and a peak in 2024. This trend indicates a growing relevance of the topic, demonstrating increased interest and engagement from the scientific community in recent years.

Figure 4 depicts a pie chart that demonstrates the distribution of scientific publications by areas of knowledge, related to the use of eye tracking to assess mental load. The areas with the highest number of publications include Computer Science and Engineering (23.4% and 20.6%, respectively). This image was generated by the Scopus platform.

The graph (Fig. 4) demonstrates the interdisciplinarity of the study of mental load and the application of eye tracking in a variety of fields, highlighting the cross-cutting relevance of this technology in understanding complex cognitive phenomena.

Figure 4. Main areas of knowledge (Scopus). Source: Authors.

2.2 Eligibility criteria

Potentially relevant studies were selected by two independent reviewers. Only full articles in Portuguese or English were included. The following were excluded: review studies; studies with objectives other than those of the present review; studies with different audiences; and abstracts, technical reports, oral communications and letters to the editor.

The initial selection of the articles occurred independently, through the reading of their titles and abstracts. Subsequently, both reviewers read the full texts of the articles that met the inclusion criteria. Any disagreement about the eligibility of the articles was resolved through consultation with a third researcher. The number of studies included and excluded in the different phases of the systematic review is reported later in the PRISMA flowchart (Fig. 5).

2.3 Extraction of data from articles

Data extraction included the following variables: authors, year, objective and main results of the study. The StArt (State of the Art through Systematic Review) software, developed by researchers from the Federal University of São Carlos, was used to manage the selection of articles [Reference Fabbri, Silva, Hernandes, Octaviano, Di Thommazo and Belgamo14].

2.4 Quality assessment

In the evaluation methodology adopted for the systematic review, two different scales were considered to assess the quality of the selected articles: the scoring scale for the population studied and the scoring scale for the study design.

In the first scale, the articles were evaluated based on the relevance and representativeness of the studied population in relation to the aeronautical sector. The score ranged from 0 for unspecified or irrelevant populations to 5 for those that were exceptionally specified and representative.

In the second scale, the focus was on the methodological rigor of the studies, with the score ranging from 1 to 5, assigned to different study designs, from narrative reviews and expert opinions to experimental studies, valuing studies that allowed a strong causal inference and strict control of variables.

In addition to the two parameters previously mentioned for the evaluation of the selected articles, the software used for data extraction assigned an additional score based on the presence of the keywords defined for this study. Each article could receive up to 5 points if the keywords were present in the title, 3 points if they were in the abstract, and 2 points if they were listed among the keywords. This complementary score served as an indication of the relevance of the article in relation to the focus of the study, allowing for a more refined weighting and a selection of articles highly pertinent to the topic of interest.
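The keyword-based score described above can be sketched in a few lines. This is only an illustration of the stated weights (5 for the title, 3 for the abstract, 2 for the keyword list); the function name, the example article and the rule that a field scores once if any search term appears in it are assumptions, and the scoring actually performed by the software may differ.

```python
# Weights taken from the text: title 5, abstract 3, keyword list 2.
FIELD_POINTS = {"title": 5, "abstract": 3, "keywords": 2}


def keyword_score(article, search_terms):
    """Sum the points of every field that mentions at least one search term.

    Assumption: a field scores once if any term occurs in it
    (case-insensitive substring match). The real tool may count differently.
    """
    terms = [term.lower() for term in search_terms]
    score = 0
    for field, points in FIELD_POINTS.items():
        text = article.get(field, "").lower()
        if any(term in text for term in terms):
            score += points
    return score


# Hypothetical article record, for illustration only.
example = {
    "title": "Eye tracking metrics for UAV operator workload",
    "abstract": "We study pupil diameter during supervisory control.",
    "keywords": "eye tracking; mental workload",
}
terms = ["eye tracking", "mental workload"]
print(keyword_score(example, terms))  # title (5) + keywords (2) = 7
```

Under this reading, an article whose title, abstract and keywords use none of the search terms scores zero regardless of its content, so the score works as a relevance hint rather than a quality measure.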

This methodological approach supported a comprehensive and fair evaluation of the articles, allowing for a reliable qualitative synthesis of existing data on the mental load of unmanned aircraft operators. The scarcity of articles specifically focused on this audience justified the inclusion of professionals from different areas of the aeronautical sector, ensuring a comprehensive view of the application of eye tracking in various cognitive contexts.

3.0 Results and discussions

The search strategy identified 137 articles. A total of 16 duplicate articles were eliminated, leaving 121 articles for title and abstract screening, of which 57 were excluded because they did not meet the inclusion criteria.

Of the remaining 64 articles evaluated in full, 38 were excluded because they also did not meet the inclusion criteria. Therefore, 26 articles were included in the present systematic review (Fig. 5).

Figure 5. Flow of information with the different phases of the systematic review. Source: Authors.
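The selection flow can be checked with simple arithmetic. The sketch below uses only the four counts reported in the text (137 identified, 16 duplicates, 57 excluded at screening, 38 excluded at full text) and assumes that each stage subtracts its exclusions from the previous stage; the function and key names are illustrative.

```python
def prisma_flow(identified, duplicates, excluded_screening, excluded_full_text):
    """Derive the number of records remaining at each stage of the review."""
    screened = identified - duplicates            # records screened by title/abstract
    full_text = screened - excluded_screening     # records read in full
    included = full_text - excluded_full_text     # records kept in the review
    return {"screened": screened, "full_text": full_text, "included": included}


flow = prisma_flow(identified=137, duplicates=16,
                   excluded_screening=57, excluded_full_text=38)
print(flow)  # {'screened': 121, 'full_text': 64, 'included': 26}
```

The derived totals match the 64 articles evaluated in full and the 26 studies included in the review.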

3.1 Analysis of the studies found

The main results of the studies analysed in this article are highlighted in Chart 1.

The analysis of the findings of the 26 reviewed studies on the use of eye-tracking in the assessment of the mental load of UAS operators reveals both important convergences and divergences among the authors. This discussion seeks to deepen these points by exploring the contributions of each study and how they interrelate.

3.1.1 Convergences between the studies

Eye tracking is widely recognised as a crucial tool for assessing mental load. McKinley et al. [Reference McKinley, McIntire, Schmidt, Repperger and Caldwell15] demonstrated that the approximate entropy (ApEn) of pupil position is a more sensitive and consistent indicator of fatigue than PERCLOS (Percentage of Eyelid Closure), which measures the percentage of time an individual’s eyelids are 80% or more closed. Their findings suggest that fatigue reduces the complexity of eye movements, likely due to longer fixations and slower saccades.
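To make the two fatigue indicators concrete, the sketch below computes PERCLOS as the fraction of samples with eyelid closure at or above 80%, and approximate entropy in its standard ApEn(m, r) form with self-matches included. It is not the implementation used by McKinley et al.; the function names, the evenly sampled inputs and the use of an absolute tolerance `r` are assumptions made for illustration.

```python
import math


def perclos(closure, threshold=0.8):
    """Fraction of samples in which eyelid closure is at or above `threshold`.

    `closure` holds evenly sampled values in [0, 1], where 1 is fully closed.
    """
    if not closure:
        raise ValueError("closure must contain at least one sample")
    return sum(value >= threshold for value in closure) / len(closure)


def approximate_entropy(series, m=2, r=0.2):
    """ApEn(m, r) of a 1-D series; lower values indicate more regular signals.

    `r` is an absolute tolerance here (a fraction of the series' standard
    deviation is another common choice).
    """
    n = len(series)
    if n < m + 2:
        raise ValueError("series is too short for the chosen m")

    def phi(dim):
        templates = [series[i:i + dim] for i in range(n - dim + 1)]
        total = 0.0
        for a in templates:
            # Count templates within tolerance r (Chebyshev distance).
            matches = sum(
                1 for b in templates
                if max(abs(x - y) for x, y in zip(a, b)) <= r
            )
            total += math.log(matches / len(templates))
        return total / len(templates)

    return phi(m) - phi(m + 1)


# A perfectly regular signal yields an ApEn close to zero.
regular = [0.0, 1.0] * 20
print(approximate_entropy(regular))
print(perclos([0.1, 0.9, 0.8, 0.2]))  # 0.5
```

In this framing, the reduced complexity of eye movements under fatigue would appear as a lower ApEn of the pupil-position series, while PERCLOS would rise as the eyelids stay closed for a larger share of the time.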

Similarly, Monfort et al. [Reference Monfort, Sibley and Coyne16] identified pupil dilation, visual dispersion, and reaction time as key metrics for real-time workload prediction. These methods have proven effective, especially in complex and realistic simulation environments.

Chart 1. Remaining studies evaluated in full

Roy et al. [Reference Roy, Bovo, Gateau, Dehais and Carvalho Chanel17] broaden this perspective by investigating markers of engagement from oculomotor, cardiac, and brain data, finding that blink rate and decrease in the number of fixations are indicative of lower mental engagement during prolonged UAV operations. Sibley et al. [Reference Sibley, Coyne, Avvari, Mishra and Pattipati18] and Coyne et al. [Reference Coyne, Sibley, Sherwood, Foroughi, Olson and Vorm19] explore heart rate variability (HRV) and other eye metrics, such as pupil size, to monitor mental load, concluding that these measurements can be used to predict whether an operator will be able to successfully complete the mission.

The application of eye tracking in dynamic environments is highlighted by several studies. Sibley et al. [Reference Sibley, Coyne and Thomas20] present SCOUT, a testbed designed to investigate human performance and automation challenges, demonstrating its effectiveness in detecting when an operator has scanned specific sensor feeds, and providing insight into cognitive workload based on pupil size. Turpin et al. [Reference Turpin, Surana, Alicia and Taylor21] demonstrate that a single crew member can manage multiple UASes in complex tactical missions, with the help of automated systems that improve operational efficiency and safety.

There was also a consensus on the correlation between eye tracking and other physiological measures. Lim et al. [Reference Lim, Ramasamy, Gardi, Kistan and Sabatini10] combine eye-tracking with EEG and ECG to assess cognitive states and adapt command-and-control functionalities, showing that specific eye-tracking variables, such as visual entropy, can discriminate between different control modes and task difficulty levels. Singh et al. [Reference Singh, Chanel and Roy34] highlight that pupil dilation and the average duration of fixations decrease with increasing workload, suggesting that these metrics are effective in estimating mental load.
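The text does not define visual entropy, so the following is only one common formulation: the Shannon entropy of how fixations are distributed over areas of interest (AOIs). The function name and AOI labels are hypothetical, and the measure used by Lim et al. may instead be based on gaze transitions.

```python
import math
from collections import Counter


def aoi_entropy(fixation_aois):
    """Shannon entropy (in bits) of the distribution of fixations over AOIs."""
    if not fixation_aois:
        raise ValueError("at least one fixation is required")
    counts = Counter(fixation_aois)
    total = len(fixation_aois)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Hypothetical AOI labels: gaze locked on one display versus spread over four.
focused = ["map", "map", "map", "map"]
spread = ["map", "video", "chat", "status"]
print(aoi_entropy(focused), aoi_entropy(spread))
```

A value of zero means every fixation fell on the same AOI; the value grows as gaze is spread more evenly over the displays.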

3.1.2 Divergences between the studies

The variations in measurement methods employed across the reviewed studies reflect different approaches to assessing mental workload in UAS operators. McKinley et al. [Reference McKinley, McIntire, Schmidt, Repperger and Caldwell15] used approximate entropy as a metric to detect signs of fatigue, focusing on the complexity of eye movements as a response to cognitive demand. In contrast, Coyne et al. [Reference Coyne, Sibley, Sherwood, Foroughi, Olson and Vorm19] focused on pupil diameter and the Nearest Neighbor Index (NNI) to measure cognitive effort and gaze dispersion, respectively. While approximate entropy can capture subtle variations in the regularity of eye movements, pupil diameter is associated with changes in mental effort, and NNI provides information on the spatial distribution of eye fixations. These differing methodological choices suggest that there is no consensus on the most appropriate metrics for assessing mental workload, reflecting the diversity of approaches available in the literature.
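As an illustration of how the NNI summarises gaze dispersion, the sketch below uses the usual nearest-neighbour formulation: the observed mean distance from each fixation to its closest neighbour, divided by the mean distance expected for a random pattern of the same density, 0.5·sqrt(A/N). The function name, the fixation coordinates and the display area are assumptions, and Coyne et al. may have applied corrections not shown here.

```python
import math


def nearest_neighbor_index(points, area):
    """Ratio of observed to expected mean nearest-neighbour distance.

    Values below 1 suggest clustered fixations, values near 1 a random
    pattern, and values above 1 a dispersed (regular) pattern.
    """
    n = len(points)
    if n < 2 or area <= 0:
        raise ValueError("need at least two points and a positive area")
    nearest = []
    for i, (xi, yi) in enumerate(points):
        nearest.append(min(
            math.hypot(xi - xj, yi - yj)
            for j, (xj, yj) in enumerate(points) if j != i
        ))
    observed = sum(nearest) / n
    expected = 0.5 * math.sqrt(area / n)
    return observed / expected


# Hypothetical fixations on a 2 x 2 display area.
dispersed = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
clustered = [(0.0, 0.0), (0.01, 0.0), (0.0, 0.01), (0.01, 0.01)]
print(nearest_neighbor_index(dispersed, area=4.0))
print(nearest_neighbor_index(clustered, area=4.0))
```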

In addition to measurement methods, the contexts in which the studies are conducted also vary widely. Studies such as those by Devlin et al. [Reference Devlin, Byham and Riggs31] and Sibley et al. [Reference Sibley, Coyne and Thomas20] focused on military scenarios where operations are characterised by high complexity, requiring maximum attention and cognitive performance from operators. In these studies, mental workload is often associated with situations that demand rapid decision-making and the simultaneous execution of multiple tasks, which can affect both the workload measurements and the results obtained.

In contrast, studies exploring applications in simulation and training environments, such as those by Devlin and Riggs [Reference Devlin and Riggs22] and Niu et al. [Reference Niu, Wang, Niu and Wang29], operate under controlled conditions that allow for the manipulation and control of specific variables. These simulation environments are designed to replicate critical aspects of real-world operations but differ in terms of stressors present in real operational scenarios, such as time pressure and the unpredictability of situations. The difference between these contexts can impact the validity of the results obtained in simulation studies when compared to real-world operational scenarios.

The diversity in methodological choices and application contexts reflects the inherent complexity of research on mental workload in UAS operators. Each methodological approach and operational context brings with it a specific set of advantages and limitations that influence both data collection and the interpretation of results. For example, while pupil diameter measurement may be sensitive to rapid changes in cognitive load, approximate entropy might capture the evolution of fatigue over time. Similarly, application in military versus simulation scenarios can lead to results that vary not only in precision but also in practical relevance.

These methodological and contextual variations raise questions about the comparability of studies. The absence of standardisation in mental workload measurement metrics and the experimental conditions used may hinder the construction of a cohesive knowledge base and the extrapolation of results to different operational scenarios. Standardising metrics and harmonising application contexts could facilitate comparison between studies and synthesis of results, contributing to a more integrated understanding of mental workload in UAS operators.

These divergences highlight the importance of considering both the nature of the measurement methods and the operational context when interpreting study results. The choice of metrics and the environment in which the study is conducted can have significant implications for the findings and the conclusions that can be drawn about operators’ mental workload. Therefore, when evaluating the literature on mental workload in UAS, it is essential to account for this diversity to better understand how different approaches may complement or contrast with one another.

3.1.3 Results on operational efficiency

Some studies, such as the one by Monfort et al. [Reference Monfort, Sibley and Coyne16], show high accuracy in predicting the “live” workload with the use of eye tracking, while others, such as that of Devlin et al. [Reference Devlin, Flynn and Riggs27], highlight the complexity of predicting performance trends during workload transitions. This suggests that the effectiveness of eye tracking may vary depending on the experimental conditions and study design.

Wanyan et al. [Reference Wanyan, Zhuang and Zhang38] introduced a multidimensional perspective in the assessment of mental workload by combining eye tracking with behavioural and physiological measures for a more complete understanding of the mental state of pilots. The authors broadened the scope of application of eye tracking by focusing on mental workload prediction, which highlights the importance of adaptive flight interfaces and procedures to avoid cognitive overload. Gomolka et al. [Reference Gomolka, Kordos and Zeslawska39], Rudi et al. [Reference Rudi, Kiefer, Giannopoulos and Raubal40] and Schriver et al. [Reference Schriver, Morrow, Wickens and Talleur41] explored the applicability of eye tracking to better understand pilots’ attention, pointing to improvements in training and interface design.

The discussion deepened with Pongsakornsathien et al. [Reference Pongsakornsathien, Lim, Gardi, Hilton, Planke, Sabatini, Kistan and Ezer42] and Singh et al. [Reference Singh, Chanel and Roy34], who investigated the use of eye tracking in operations with UAVs and human-machine systems, respectively, suggesting new possibilities for optimising cooperation and operational efficiency. Lefrançois et al. [Reference Lefrançois, Matton and Causse43], Li et al. [Reference Li, Oksama and Hyönä44, Reference Li, Zhang, Le Minh, Cao and Wang45], Scannella et al. [Reference Scannella, Peysakhovich, Ehrig, Lepron and Dehais46] and Yu et al. [Reference Yu, Wang, Li, Braithwaite and Greaves47] underscored the value of eye tracking in pilot training and incident investigation, while Diaz-Piedra et al. [Reference Diaz-Piedra, Rieiro, Suárez, Rios-Tejada, Catena and Di Stasi48] and Lounis et al. [Reference Lounis, Peysakhovich and Causse49] discussed fatigue detection and the impact of experience on operational efficiency.

Lim et al. [Reference Lim, Ramasamy, Gardi, Kistan and Sabatini10] round out the reviewed literature by introducing cognitive human-machine interfaces for UAS, marking a significant advance in adapting air operations to the cognitive needs of pilots.

The reviewed studies offer a comprehensive overview of the potential and limitations of eye tracking in assessing the mental load of UAS operators. The convergence in findings highlights the usefulness of this technology as a valuable tool for improving safety and operational efficiency. However, methodological and contextual divergences underline the need for standardisation and a more integrated approach that considers multiple sources of physiological data for a more accurate and holistic assessment. Future research should focus on harmonising the metrics and exploring the applicability of eye tracking in diverse operational contexts to maximise its effectiveness. These investigations promise to enhance the expertise and effectiveness of UAS operators and pave the way for future innovations in aviation.

3.2 Evaluation of the quality of the studies found

Chart 2 presents the classification of the articles analysed in the systematic review, evaluated based on three main criteria: score (S), study population (P) and study design (D).

• Score (S) represents the congruence of the publications with the research terms. Articles were rated based on how well their titles, abstracts and keywords aligned with the predefined research terms.
• Study Population (P) assesses the adequacy of the research groups concerning the scope of the study. The scoring for this criterion is divided as follows:
  ∘ Unspecified or irrelevant population: 0 points
  ∘ Minimum specification: 1 point
  ∘ Specified but not very representative: 2 points
  ∘ Adequately specified: 3 points
  ∘ Very representative: 4 points
  ∘ Exceptionally specified and representative: 5 points
• Study Design (D) refers to the methodology used in the articles, with the following scoring system:
  ∘ Experimental studies: 5 points
  ∘ Quasi-experimental studies: 3.5 points
  ∘ Cross-sectional studies: 3 points
  ∘ Case-control studies: 2.5 points
  ∘ Case studies: 2 points
  ∘ Narrative reviews and expert opinions: 1 point
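The two scales above amount to lookup tables, sketched below. The dictionary keys and the function name are illustrative labels, and because the text does not say whether S, P and D are combined into a single figure, the sketch keeps them as three separate values.

```python
# Points for the study population (P), as listed above.
POPULATION_POINTS = {
    "unspecified or irrelevant": 0,
    "minimum specification": 1,
    "specified but not very representative": 2,
    "adequately specified": 3,
    "very representative": 4,
    "exceptionally specified and representative": 5,
}

# Points for the study design (D), as listed above.
DESIGN_POINTS = {
    "experimental": 5.0,
    "quasi-experimental": 3.5,
    "cross-sectional": 3.0,
    "case-control": 2.5,
    "case study": 2.0,
    "narrative review or expert opinion": 1.0,
}


def rate_study(keyword_score, population, design):
    """Return the three criteria (S, P, D) for one article, kept separate."""
    return {
        "S": keyword_score,
        "P": POPULATION_POINTS[population],
        "D": DESIGN_POINTS[design],
    }


# Hypothetical article, for illustration only.
print(rate_study(10, "very representative", "quasi-experimental"))
```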

Chart 2. Assessment of the quality of the studies analysed in full

Chart 3. PRISMA checklist

These criteria were applied to ensure a comprehensive and objective analysis of the studies, which allowed for a consistent evaluation of their quality based on methodological rigor, relevance, and the alignment of their research focus with the terms used in this systematic review.

The StArt software (State of the Art through Systematic Review), used to manage the selection of articles, automatically generates the Score (S), which represents the congruence of the articles with the predefined research terms. This score is based on the match between the titles, abstracts and keywords of the articles with the terms of the research in question.

However, good articles may receive a score of zero. This can happen when a relevant article for the field of study does not present titles, abstracts or keywords that directly match the predefined research terms. This situation reflects the limitations of a purely textual search, as high-quality articles may be excluded due to the lack of strict alignment with the terms used in the search process. Therefore, it is important to recognise that, although the score provided by the software is useful for initial filtering, it should not be the sole criterion for exclusion or inclusion, as it may fail to capture the more complex nuances of certain studies’ relevance to the research.

The studies of McKinley et al. [Reference McKinley, McIntire, Schmidt, Repperger and Caldwell15] and Coyne et al. [Reference Coyne, Sibley, Sherwood, Foroughi, Olson and Vorm19] stood out for their high scores, reflecting a strong congruence with research terms, well-defined populations and robust methodologies. McKinley et al. [Reference McKinley, McIntire, Schmidt, Repperger and Caldwell15] presented an in-depth analysis of approximate entropy (ApEn) as an indicator of fatigue, while Coyne et al. [Reference Coyne, Sibley, Sherwood, Foroughi, Olson and Vorm19] focused on pupil diameter and the NNI to measure mental load.

The studies of Sibley et al. [Reference Sibley, Coyne and Thomas20] and Devlin et al. [Reference Devlin, Byham and Riggs31] also received high scores, highlighting the effectiveness of SCOUT, a testbed designed to investigate human performance and automation challenges in UAS operations. These studies have demonstrated the ability of eye tracking to provide valuable data on operators’ cognitive load and attention allocation.

On the other hand, some studies, such as those by Jian et al. [Reference Jian, Yin, Shen and Niu24] and Lim et al. [Reference Lim, Choi, Oh, Kim, Lee, Kim, Kim and Yang23], received lower scores in terms of the population studied, indicating a need for greater specification and representativeness of the research groups. However, these studies still contributed significantly to the understanding of mental load in UAS operations.

The variability in scores reflected methodological and contextual differences between studies. Studies such as those by Monfort et al. [Reference Monfort, Sibley and Coyne16] and Roy et al. [Reference Roy, Bovo, Gateau, Dehais and Carvalho Chanel17] have used specific ocular metrics, such as pupil dilation and blink rate, to predict the real-time workload and mental engagement of operators, highlighting the usefulness of these measurements in complex simulation settings.

Devlin and Riggs [Reference Devlin and Riggs22] used a Markovian framework to analyse eye-scan patterns, providing insights into individual differences in operator performance. Studies such as those by Niu et al. [Reference Niu, Wang, Niu and Wang29] have proposed the use of machine learning techniques to classify eye movement patterns and detect states of fatigue and cognitive overload, showing the applicability of eye tracking in various operational contexts.
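A first-order Markov view of eye-scan patterns can be sketched by estimating how often gaze moves from one area of interest (AOI) to another. This is a generic illustration and not the framework of Devlin and Riggs; the function name and AOI labels are assumptions.

```python
from collections import defaultdict


def transition_probabilities(aoi_sequence):
    """Estimate first-order transition probabilities between consecutive AOIs."""
    counts = defaultdict(lambda: defaultdict(int))
    for src, dst in zip(aoi_sequence, aoi_sequence[1:]):
        counts[src][dst] += 1
    probabilities = {}
    for src, row in counts.items():
        total = sum(row.values())
        probabilities[src] = {dst: c / total for dst, c in row.items()}
    return probabilities


# Hypothetical scan path over three displays of a ground control station.
scan = ["map", "video", "map", "chat", "map", "video"]
matrix = transition_probabilities(scan)
for src, row in matrix.items():
    print(src, row)
```

Comparing such matrices between operators, or between workload conditions, is one way individual differences in scanning strategy could be examined.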

The studies also varied in terms of application contexts. Sibley et al. [Reference Sibley, Coyne and Thomas20] and Devlin et al. [Reference Devlin, Byham and Riggs31] focused on military scenarios and highly complex operations, while others, such as Devlin et al. [Reference Devlin, Flynn and Riggs27] and Foroughi et al. [Reference Foroughi, Brown, Sibley and Coyne32], explored human-automation interaction in supervisory control environments. This diversity of contexts reinforces the versatility of eye tracking, although it highlights the need for standardisation in the metrics used to assess mental load.

The analysis of the quality of the reviewed studies evidenced the methodological robustness and relevance of the findings for the assessment of the mental load of UAS operators. The highest quality studies provided detailed insights into mental load indicators and highlighted the importance of rigorous methodologies and well-defined populations. However, the variability in scores and application contexts indicated the need for standardisation of metrics and a more integrated approach that considers multiple sources of physiological data for a more accurate and holistic assessment. Future research should focus on harmonising the metrics and exploring the applicability of eye tracking in diverse operational contexts to maximise its effectiveness.

4.0 Final thoughts

The findings of this systematic review highlighted the relevance and methodological robustness of studies investigating the use of eye tracking as a tool to assess the mental workload of UAS operators. The diversity and complexity of the studied contexts demonstrated the versatility of eye tracking in capturing critical nuances of cognitive load, especially in demanding environments such as military operations and air traffic control.

High-quality studies, such as that by Lefrançois et al. [Reference Lefrançois, Matton and Causse43], provided detailed insights into mental workload indicators and emphasised the importance of rigorous methodologies and well-defined populations. These works showed a strong correlation between specific ocular metrics and cognitive load, validating the use of eye tracking as a reliable indicator. Additionally, research by Devlin et al. [Reference Devlin, Byham and Riggs31] and Sibley et al. [Reference Sibley, Coyne and Thomas20] highlighted the effectiveness of systems like SCOUT and CHMI2, which combine physiological sensors with artificial intelligence techniques to improve workload management in complex operations.

In contrast, studies such as those by Behrend and Dehais (2020) and Scannella et al. [Reference Scannella, Peysakhovich, Ehrig, Lepron and Dehais46], which received lower scores, indicated the need for greater specificity and representativeness in the studied populations. However, even these studies contributed significantly to understanding mental workload, suggesting methodological improvements and standardisation of the metrics used.

The variability in study scores reflected the methodological and contextual differences. Studies like those by Monfort et al. [Reference Monfort, Sibley and Coyne16] and Roy et al. [Reference Roy, Bovo, Gateau, Dehais and Carvalho Chanel17] used ocular metrics such as pupil dilation and blink rate to predict real-time workload, highlighting the utility of these measures in complex simulation environments. Devlin and Riggs [Reference Devlin and Riggs22] applied a Markovian framework to analyse eye scan patterns, providing valuable insights into individual differences in operator performance.

One potential limitation of this review process was the reliance on studies available in specific databases and the exclusion of non-English publications, which might have resulted in a selection bias. Additionally, variations in the methodologies and metrics used across different studies could have influenced the comparability and generalisability of the findings. The review protocol used in this study is available upon request from the authors.

Future research should focus on harmonising metrics and exploring the applicability of eye tracking in various operational contexts. Integrating multiple sources of physiological data would provide a more precise and holistic assessment of mental workload, contributing to the development of more intuitive interfaces and training programmes that mitigate cognitive overload, thus enhancing the safety and effectiveness of operations.

In conclusion, eye tracking is a valuable and promising tool for assessing the mental workload of UAS operators. The research underscored the importance of rigorous methodologies and well-defined populations in understanding the nuances of mental workload in this specific context. Furthermore, the consistency of the results supported the use of eye tracking as a reliable indicator of mental workload, allowing for the improvement of cognitive human-machine interfaces and suggesting a fertile field for future investigations.

References

Fricke, T. and Holzapfel, F. An approach to flight control with large time delays derived from a pulsive human control strategy, AIAA Atmospheric Flight Mechanics Conference, 2016. https://doi.org/10.2514/6.2016-1033

Alaimo, A., Esposito, A., Orlando, C. and Simoncini, A. Aircraft pilots workload analysis: Heart rate variability objective measures and NASA-task load index subjective evaluation, Aerospace, 2020, 7, (9). https://doi.org/10.3390/aerospace7090137

Hart, S.G. and Staveland, L.E. Development of NASA-TLX (Task Load Index): Results of empirical and theoretical research, Human Mental Workload, 1988, pp 139–179. https://doi.org/10.1007/s10749-010-0111-6

NASA, FAA Improve Air Traffic Safety, Efficiency. Aviation Space Environ. Med., 2000, 71, (4), p 463. https://www.scopus.com/inward/record.uri?eid=2-s2.0-0034024167&partnerID=40&md5=9c6afb4e4543e62212561f9bab26af06

National Aeronautics and Space Administration. NASA TLX: Task Load Index, 2020.

Yiyuan, Z., Tangwen, Y., Dayong, D. and Shan, F. Using NASA-TLX to evaluate the flight deck design in design phase of aircraft, Procedia Eng., 2011, 17, pp 77–83. https://doi.org/10.1016/j.proeng.2011.10.010

Zheng, Y.Y., Yin, T.W., Dong, D.Y. and Fu, S. Using NASA-TLX to evaluate the flight deck design in design phase of aircraft, in Shuguang, Z., Shuiting, D., Yanlai, Z. and Junmin, D. (Eds), 2nd International Symposium on Aircraft Airworthiness (ISAA), vol. 17, 2011. https://doi.org/10.1016/j.proeng.2011.10.010

Lim, Y., Gardi, A., Ramasamy, S., Vince, J., Pongracic, H., Kistan, T. and Sabatini, R. A novel simulation environment for cognitive human factors engineering research, AIAA/IEEE Digital Avionics Systems Conference - Proceedings, 2017-September, 2017a. https://doi.org/10.1109/DASC.2017.8102126

Sibley, C., Foroughi, C., Brown, N., Drollinger, S., Phillips, H. and Coyne, J. Augmenting traditional performance analyses with eye tracking metrics, in A. H. and A. U. (Eds), Advances in Intelligent Systems and Computing: Vol. 1201 AISC, Springer, 2021a, pp 118–125. https://doi.org/10.1007/978-3-030-51041-1_17

Lim, Y., Ramasamy, S., Gardi, A., Kistan, T. and Sabatini, R. Cognitive human-machine interfaces and interactions for unmanned aircraft, J. Intell. Robot. Syst. Theory Appl., 2018a, 91, (3–4), pp 755–774. https://doi.org/10.1007/s10846-017-0648-9

Greenhalgh, T. How to read a paper: Papers that summarise other papers (systematic reviews and meta-analyses), BMJ, 1997, 315, (7109), pp 672–675. https://doi.org/10.1136/bmj.315.7109.672

Kitchenham, B. and Charters, S. Guidelines for performing Systematic Literature Reviews in Software Engineering, 2007. https://www.elsevier.com/__data/promis_misc/525444systematicreviewsguide.pdf

Moher, D., Shamseer, L., Clarke, M., Ghersi, D., Liberati, A., Petticrew, M., Shekelle, P. and Stewart, L.A.S. Preferred reporting items for systematic review and meta-analysis protocols (prisma-p) 2015 statement, Jpn Pharmacol. Ther., 2019, 47, (8), pp 1177–1185.

Fabbri, S., Silva, C., Hernandes, E., Octaviano, F., Di Thommazo, A. and Belgamo, A. Improvements in the StArt tool to better support the systematic review process, ACM International Conference Proceeding Series, 01-03-June-2016, 2016. https://doi.org/10.1145/2915970.2916013

McKinley, R.A., McIntire, L.K., Schmidt, R., Repperger, D.W. and Caldwell, J.A. Evaluation of eye metrics as a detector of fatigue, Hum. Factors, 2011, 53, (4), pp 403–414. https://doi.org/10.1177/0018720811411297

Monfort, S.S., Sibley, C.M. and Coyne, J.T. Using machine learning and real-time workload assessment in a high-fidelity UAV simulation environment, Next-Gener. Anal. IV, 2016, 9851, p 98510B. https://doi.org/10.1117/12.2219703

Roy, R.N., Bovo, A., Gateau, T., Dehais, F. and Carvalho Chanel, C.P. Operator engagement during prolonged simulated UAV operation, IFAC-PapersOnLine, 2016, 49, (32), pp 171–176. https://doi.org/10.1016/j.ifacol.2016.12.209

Sibley, C., Coyne, J., Avvari, G.V., Mishra, M. and Pattipati, K.R. Supporting multi-objective decision making within a supervisory control environment, Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 9744, 2016, pp 210–221. https://doi.org/10.1007/978-3-319-39952-2_21

Coyne, J.T., Sibley, C., Sherwood, S., Foroughi, C.K., Olson, T. and Vorm, E. Assessing workload with low cost eye tracking during a supervisory control task, Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 10284 11th, 2017, pp 139–147. https://doi.org/10.1007/978-3-319-58628-1_12

Sibley, C., Coyne, J. and Thomas, J. Demonstrating the supervisory control operations user testbed (SCOUT), Proc. Hum. Factors Ergon. Soc., 2016, pp 1323–1327. https://doi.org/10.1177/1541931213601306

Turpin, T., Surana, A., Alicia, T. and Taylor, G.S. Removing the bottleneck: Utilizing autonomy to manage multiple UAS sensors from inside a cockpit, 2018, 22. https://doi.org/10.1117/12.2303915

Devlin, S.P. and Riggs, S.L. Analyzing eye tracking data using a Markovian framework to assess differences in scan patterns, Proceedings of the Human Factors and Ergonomics Society, 2017-October, 2017, pp 1814–1818. https://doi.org/10.1177/1541931213601935

Lim, H.-J., Choi, S.-H., Oh, J., Kim, S., Lee, J.-W., Kim, B.S., Kim, S. and Yang, J.H. Comparison study of potential workload index in a simulated multiple-UAV operation environment, 17th International Conference on Control, Automation and Systems (ICCAS), 2017.

Jian, L., Yin, D., Shen, L. and Niu, Y. Human machine collaborative support scheduling system of intelligence information from multiple unmanned aerial vehicles based on eye tracker, J. Shanghai Jiaotong Univ. (Sci.), 2017, 22, (3), pp 322–328. https://doi.org/10.1007/s12204-017-1838-0

Bektaş, K., Çöltekin, A., Krüger, J., Duchowski, A.T. and Fabrikant, S.I. GeoGCD: Improved visual search via gaze-contingent display, Eye Tracking Research and Applications Symposium (ETRA), 2019. https://doi.org/10.1145/3317959.3321488

Foroughi, C.K., Sibley, C., Brown, N.L., Rovira, E., Pak, R. and Coyne, J.T. Detecting automation failures in a simulated supervisory control environment, Ergonomics, 2019, 62, (9), pp 1150–1161. https://doi.org/10.1080/00140139.2019.1629639

Devlin, S.P., Flynn, J.R. and Riggs, S.L. How shared visual attention patterns of pairs unfold over time when workload changes, Eye Tracking Research and Applications Symposium (ETRA), 2020. https://doi.org/10.1145/3379156.3391339

Moacdieh, N.M., Devlin, S.P., Jundi, H. and Riggs, S.L. Effects of workload and workload transitions on attention allocation in a dual-task environment: Evidence from eye tracking metrics. J. Cogn. Eng. Decision Making, 2020, 2020, (2).

Niu, J., Wang, C., Niu, Y. and Wang, Z. Monitoring the performance of a multi-UAV operator through eye tracking, Proceedings - 2020 Chinese Automation Congress, CAC 2020, 2020, pp 6560–6565. https://doi.org/10.1109/CAC51589.2020.9326955

Planke, L.J., Lim, Y., Gardi, A., Sabatini, R., Kistan, T. and Ezer, N. A cyber-physical-human system for one-to-many UAS operations: Cognitive load analysis, Sensors (Switzerland), 2020, 20, (19), pp 1–21. https://doi.org/10.3390/s20195467

Devlin, S.P., Byham, J.K. and Riggs, S.L. Does what we see shape history? Examining workload history as a function of performance and ambient/focal visual attention, ACM Trans. Appl. Percept., 2021, 18, (2). https://doi.org/10.1145/3449066

Foroughi, C.K., Brown, N.L., Sibley, C. and Coyne, J.T. Near-perfect automation: Investigating performance, trust, and visual attention allocation, Human Factors, 2021, 65, (4), pp 546–561. https://doi.org/10.1177/00187208211032889

Sibley, C., Foroughi, C., Brown, N., Drollinger, S., Phillips, H. and Coyne, J. Augmenting traditional performance analyses with eye tracking metrics, Advances in Intelligent Systems and Computing, 1201 AISC, 2021b, pp 118–125. https://doi.org/10.1007/978-3-030-51041-1_17

Singh, G., Chanel, C.P.C. and Roy, R.N. Mental workload estimation based on physiological features for pilot-UAV teaming applications, Front. Hum. Neurosci., 2021, 15. https://doi.org/10.3389/fnhum.2021.692878

Devlin, S.P., Brown, N.L., Drollinger, S., Sibley, C., Alami, J. and Riggs, S.L. Scan-based eye tracking measures are predictive of workload transition performance, Appl. Ergon., 2022, 105. https://doi.org/10.1016/j.apergo.2022.103829

Dalilian, F. and Nembhard, D. Biometrically measured affect for screen-based drone pilot skill acquisition, Int. J. Hum.-Comput. Interact., 2023. https://doi.org/10.1080/10447318.2023.2208991

El Iskandarani, M., Atweh, J.A., McGarry, S.P.D., Riggs, S.L. and Moacdieh, N.M. Does it multiMatch? What scanpath comparison tells us about task performance in teams, J. Cogn. Eng. Decision Making, 2023, 17, (3), pp 294–309. https://doi.org/10.1177/15553434231171484

Wanyan, X., Zhuang, D. and Zhang, H. Improving pilot mental workload evaluation with combined measures, Bio-Med. Mater. Eng., 2014, 24, (6), pp 2283–2290. https://doi.org/10.3233/BME-141041

Gomolka, Z., Kordos, D. and Zeslawska, E. The application of flexible areas of interest to pilot mobile eye tracking, Sensors (Switzerland), 2020, 20, (4). https://doi.org/10.3390/s20040986

Rudi, D., Kiefer, P., Giannopoulos, I. and Raubal, M. Gaze-based interactions in the cockpit of the future: A survey, J. Multimodal User Interfaces, 2020, 14, (1), pp 25–48. https://doi.org/10.1007/s12193-019-00309-8

Schriver, A.T., Morrow, D.G., Wickens, C.D. and Talleur, D.A. Expertise differences in attentional strategies related to pilot decision making, Human Factors, 2008, 50, (6), pp 864–878. https://doi.org/10.1518/001872008X374974

Pongsakornsathien, N., Lim, Y., Gardi, A., Hilton, S., Planke, L., Sabatini, R., Kistan, T. and Ezer, N. Sensor networks for aerospace human-machine systems, Sensors (Switzerland), 2019, 19, (16). https://doi.org/10.3390/s19163465

Lefrançois, O., Matton, N. and Causse, M. Improving airline pilots’ visual scanning and manual flight performance through training on skilled eye gaze strategies. Safety, 2021, 7, (4). https://doi.org/10.3390/safety7040070

Li, J., Oksama, L. and Hyönä, J. Close coupling between eye movements and serial attentional refreshing during multiple-identity tracking, J. Cogn. Psychol., 2018, 30, (5–6), pp 609–626. https://doi.org/10.1080/20445911.2018.1476517

Li, W.C., Zhang, J., Le Minh, T., Cao, J. and Wang, L. Visual scan patterns reflect to human-computer interactions on processing different types of messages in the flight deck, Int. J. Indus. Ergon., 2019, 72, pp 54–60. https://doi.org/10.1016/j.ergon.2019.04.003

Scannella, S., Peysakhovich, V., Ehrig, F., Lepron, E. and Dehais, F. Assessment of ocular and physiological metrics to discriminate flight phases in real light aircraft, Hum. Factors, 2018, 60, (7), pp 922–935. https://doi.org/10.1177/0018720818787135

Yu, C.S., Wang, E.M.Y., Li, W.C., Braithwaite, G. and Greaves, M. Pilots’ visual scan patterns and attention distribution during the pursuit of a dynamic target, Aerospace Med. Hum. Perform., 2016, 87, (1), pp 40–47. https://doi.org/10.3357/AMHP.4209.2016

Diaz-Piedra, C., Rieiro, H., Suárez, J., Rios-Tejada, F., Catena, A. and Di Stasi, L.L. Fatigue in the military: Towards a fatigue detection test based on the saccadic velocity, Physiol. Meas., 2016, 37, (9), pp N62–N75. https://doi.org/10.1088/0967-3334/37/9/N62

Lounis, C., Peysakhovich, V. and Causse, M. Visual scanning strategies in the cockpit are modulated by pilots’ expertise: A flight simulator study, PLoS ONE, 2021, 16, (2). https://doi.org/10.1371/journal.pone.0247061

Schwerd, S. and Schulte, A. Operator state estimation to enable adaptive assistance in manned-unmanned-teaming, Cogn. Syst. Res., 2021, 67, pp 73–83. https://doi.org/10.1016/j.cogsys.2021.01.002