Introduction
Augmented and virtual reality (AR and VR, respectively) broadly refer to advanced visualization systems that enable users to immerse themselves in and meaningfully interact with computer-generated environments. According to Milgram, AR and VR exist on a spectrum wherein reality and virtuality share an inverse relationship. Reference Milgram, Takemura, Utsumi and Kishino1 VR is a three-dimensional, fully digitized, simulated, and immersive environment emulating either a real or an imaginary world. Comparatively, AR overlays virtual elements onto the physical environment to enhance and supplement certain elements of the real world. Mixed reality is proposed to be an extension of AR, wherein users can interact with and extensively manipulate both the physical and virtual environments.
AR and VR technologies were developed in the 1960s with Sutherland creating the first head-mounted display using ultrasound tracking to create a 3D virtual world. Reference Sutherland2 Subsequent significant developments in AR and VR technology include the development of electromagnetic trackers to manipulate the distance between objects in a virtual world, increasingly sophisticated graphics, and haptic feedback. There is a growing interest in using these technologies as a way to optimize surgical exposure without the typical cost and resource constraints of traditional cadaver labs, with early studies demonstrating general improvements in speed and accuracy with VR/AR-guided training compared to traditional didactic methods. Reference Mao, Lan and Kay3,Reference Seymour, Gallagher and Roman4
Despite the growing momentum surrounding the potential for AR and VR to advance surgical training, there is limited evidence available regarding their effectiveness, validity, and outcome measurement. There is a significant advantage to adopting these technologies for spine procedures. Fundamental procedures such as pedicle screw placement and endoscopic spine approaches have a very steep learning curve and are often left to the purview of senior operators. Spine surgery training also consists of residency training in either a neurological or orthopedic surgery program, often followed by a spine fellowship. These two separate training models, with their heterogeneity in case breadth and variety, pose a unique challenge in spine surgery training. Reference Daniels, Ames and Garfin5 Neurosurgical and spine procedures are well suited to incorporating AR and VR given the limited inherent mobility of the brain and spine. This relative immobility facilitates tracking and displaying of images, giving rise to the ubiquity of neuronavigation for multiple purposes; in the spine, it is commonly used to guide instrumentation placement and, more recently, decompression, tumor resection, deformity correction, and interbody grafting. Reference Wacker, Vogt and Khamene6–Reference Wu, Wang, Liu, Hu and Lee8
This must be weighed against the inherent limitations of AR and VR, including cost, physical discomfort (i.e., cybersickness including eye strain, vertigo, and nausea), latency in visual displays or tactile feedback, and limited realism. Reference Hughes, Fidopiastis, Stanney, Bailey and Ruiz9–Reference Grosch, Schröder, Schröder, Onken and Picht11 Moreover, AR/VR is not a homogeneous entity and needs to be tailored for a specific purpose. For example, laparoscopic simulators are fairly well-established, while similar training models in orthopedics or ophthalmology remain poorly developed. Reference Cao and Cerfolio12
The aim of this systematic review is to evaluate the current landscape of advanced visualization techniques in the context of spine surgical education and identify gaps that may be addressed with existing and emerging technologies.
Methods
Search Strategy
A systematic search of published articles was conducted on May 13, 2022, across four electronic databases: PubMed, Web of Science, Medline, and Embase. Search strategies were refined in collaboration with an institutional librarian, with the following terms for OVID Medline: (“augmented reality” OR “virtual reality” OR “haptic technology” OR “computer-assisted instruction” OR “simulation training”) AND (“spine surgery” OR “spinal diseases”) AND (“surgery training” OR “surgical education” OR “internship and residency”); these terms were translated by our librarian to meet the requirements of the other databases. The search covered each database from inception through May 13, 2022. To optimize the yield of our search, no restrictions on publication status, year, or language were placed.
Eligibility Criteria
We included all primary studies evaluating the role of virtual, augmented, or mixed reality in training surgical residents across spine procedures. Studies that did not involve surgical trainees or junior surgeons were excluded. There were no exclusion criteria based on the type of study, training modality, surgical procedure, nor outcomes collected. Our primary objective was to assess the current landscape of AR/VR in spine surgical education and its feasibility as a training tool for the range of spinal procedures.
All eligible articles were screened in two stages. Initial screening of abstracts was performed by VM and YJ to exclude overtly irrelevant articles. Discrepancies at this phase were automatically forwarded to the next stage to ensure all relevant articles were included. This was followed by full-text screening of all articles by PG and VM; discrepancies at this stage were resolved by consensus, with oversight by a third reviewer (YJ) as required.
Data Extraction
Data extraction was performed by two independent reviewers. Given that the aim of this paper was to assess the role of AR/VR technology in surgical education, the Medical Education Research Study Quality Instrument (MERSQI) score was calculated for each study. The MERSQI score is a 10-item scale designed in 2007 to assess the quality of reporting of literature in medical education. Reference Reed, Cook, Beckman, Levine, Kern and Wright13 Scores for each item are totaled into a final MERSQI score, ranging from 5 to 18. This instrument was specifically designed to provide an objective means of evaluating the quality of study design, data analysis, and outcome measurements.
Within each paper, the following data were also extracted for analysis: type of study, number of participants, type of AR/VR device used, surgical procedure analyzed, outcomes, and available feedback. For the purposes of bibliometric evaluation, the number of citations, citations per year, and funding sources were also recorded. Where applicable, a meta-analysis of standardized mean differences was conducted using a random-effects model to evaluate the impact of AR/VR-guided training on operative performance. Calculations and forest plots were generated using MedCalc statistical software (MedCalc Software Ltd).
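As an illustration of the effect-size step, a standardized mean difference between an AR/VR-trained group and a control group can be computed with a small-sample correction (Hedges' g). The sketch below is a minimal example under stated assumptions: the function name, its inputs, and the numbers in the usage line are hypothetical, and it is not intended to reproduce the MedCalc workflow used in this review.

```python
import math


def hedges_g(mean_t, sd_t, n_t, mean_c, sd_c, n_c):
    """Return (g, variance) for a trained group vs a control group.

    All inputs are hypothetical summary statistics (mean, SD, sample size).
    """
    # Pooled standard deviation across the two groups
    pooled_sd = math.sqrt(
        ((n_t - 1) * sd_t ** 2 + (n_c - 1) * sd_c ** 2) / (n_t + n_c - 2)
    )
    d = (mean_t - mean_c) / pooled_sd  # Cohen's d
    # Small-sample correction factor J
    j = 1 - 3 / (4 * (n_t + n_c) - 9)
    var_d = (n_t + n_c) / (n_t * n_c) + d ** 2 / (2 * (n_t + n_c))
    return j * d, (j ** 2) * var_d


# Hypothetical example: accuracy scores of 10 vs 8 (SD 2) with 10 trainees per arm
g, var_g = hedges_g(10.0, 2.0, 10, 8.0, 2.0, 10)
```

The per-study values of g and their variances would then be the inputs to a random-effects pooling step.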
Results
Overview of Results
The literature search initially yielded 7259 records, of which 510 were duplicates. A total of 6752 publications were screened; 12 were selected, and four additional studies were added manually from a search of references (Figure 1). The studies collectively included 348 participants and three study designs (single group cross-sectional, n = 6; randomized control trial, n = 6; validation study, n = 4). Virtual reality platforms were most common (n = 11), Reference Hou, Shi, Lin, Chen and Yuan14–Reference Knafo, Penet, Gaillard and Parker20 followed by augmented reality (n = 4), Reference Yanni, Ozgur and Louis21–Reference Luciano, Banerjee and Sorenson24 with only one study focusing on mixed reality (n = 1). Reference Yu, Zhou, Lei, Liu, Fan and He25 The most common procedure was pedicle screw insertion (n = 7), Reference Hou, Shi, Lin, Chen and Yuan14–Reference Xin, Chen and Wang17,Reference Yanni, Ozgur and Louis21–Reference Luciano, Banerjee and Bellotte23 followed by discectomy (n = 4), Reference Yu, Zhou, Lei, Liu, Fan and He25–Reference Alkadri, Ledwos and Mirchi28 laminectomy (n = 3), Reference Chen, Zhang and Ding19,Reference Knafo, Penet, Gaillard and Parker20,Reference Bissonnette, Mirchi, Ledwos, Alsidieri, Winkler-Schwartz and Del Maestro29 percutaneous spinal needle placement (n = 1), Reference Luciano, Banerjee and Sorenson24 and lateral lumbar access (n = 1). Reference Luca, Giorgino and Gesualdo18 A total of 11 unique AR/VR systems were evaluated; individual study results are outlined below and highlighted in Table 1.
IVRSS = immersive virtual reality surgical simulator; PGY = post-graduate year; PTED = percutaneous transforaminal endoscopic discectomy; VSTS = virtual surgical training system.
MERSQI Scores
The methodological rigor of the studies varied, with MERSQI scores ranging from 7.5 to 14.5 and a mean score of 12.1 ± 1.8 (Table 2). Notably, most studies scored poorly within the sampling section, as most studies were single-center and did not specify a response rate. All selected publications scored a 1.5 on the outcome category, as they all focused on operative skills without direct patient involvement. Most studies (n = 14) had objective outcomes, with only two studies focusing on subjective self-assessments. Reference Chen, Zhang and Ding19,Reference Knafo, Penet, Gaillard and Parker20
Bibliometric Data
The bibliometric data collected are available in Table 3. Across these 16 studies, the most common journals of publication were Neurosurgery (n = 3) and World Neurosurgery (n = 3), followed by Operative Neurosurgery (n = 2), Neurosurgical Focus (n = 2), Archives of Orthopedic and Trauma Surgery (n = 1), Computers in Biology and Medicine (n = 1), Journal of Orthopedic Surgery and Medicine (n = 1), North American Spine Society Journal (n = 1), and The Journal of Bone and Joint Surgery (n = 1). The number of citations (at time of writing) ranged from 2 to 68, with an average of 39.5 ± 40.3. The average yearly citation rate was 4.0 ± 1.8. Overall, the total number of citations appears to loosely correlate with the age of the paper.
AANS = American Association of Neurological Surgeons; NIH = National Institute of Health; NIBIB = National Institute of Biomedical Imaging & Bioengineering
Training Modalities
The virtual surgical training system (VSTS) is a custom VR platform created by Hou et al. (2018), Reference Hou, Shi, Lin, Chen and Yuan14,Reference Hou, Lin, Shi, Chen and Yuan15 to optimize training of thoracic pedicle screw placement and fixation in junior trainees. The software reconstructs patient CT scans into manipulable 3D models of the spine displayed on a screen without the use of a headset. The system provides haptic feedback to the user through a connected external handle. This technology was also used by Hou et al. (2018) Reference Hou, Shi, Lin, Chen and Yuan14 in a similar study of the cervical spine and by Shi et al. (2018) Reference Shi, Hou, Lin, Chen and Yuan16 for lumbar pedicle screw placements. All three studies demonstrated improved accuracy of pedicle screw placements on cadaver models for trainees who underwent a practice trial with the VSTS compared to a standard teaching session. Across all three studies, pedicle screw placement was evaluated on CT scans by three independent observers. Hou et al. (2018) and Shi et al. (2018) used a four-point grading scale to quantify the extent of pedicle screw breach (grade I: no breach, grade II: 0–2 mm breach, grade III: 2–4 mm breach, grade IV: > 4 mm breach) for thoracic and lumbar screw placement. For cervical screw placement, Hou et al. (2018) used a three-point grading scale (grade I: no breach, grade II: < 50% screw diameter breach, grade III: > 50% screw diameter breach). At the same institution as these three studies, Xin et al. (2018) Reference Xin, Chen and Wang17 developed a novel Immersive Virtual Reality Surgical Simulator (IVRSS). Unlike the VSTS, the IVRSS uses VR glasses, instead of a display screen, to recreate a practice model. 
This study also measured screw placement on CTs by three independent observers, based on a similar four-point grading scale (grade I: no breach, grade II: < 25% screw diameter breach without violation of the anterior vertebra, grade III: 25%–50% screw diameter breach without violation of the anterior vertebra, grade IV: > 50% screw diameter breach and/or violation of the anterior vertebra). Compared to the control group, surgical trainees randomized to practice sessions with the IVRSS had significantly improved pedicle screw accuracy and lower failure rates on cadaver specimens.
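To make the breach grading concrete, the millimetre-based four-point scale described for thoracic and lumbar screws can be sketched as a simple classifier, together with the "no breach vs any breach" dichotomy. This is a minimal illustration only: the function names are hypothetical, and the handling of values falling exactly on 2 mm or 4 mm is an assumption, since the published ranges share those boundaries.

```python
def grade_breach_mm(breach_mm: float) -> str:
    """Map a measured pedicle breach (in mm) to grades I-IV.

    Assumption: boundary values (exactly 2 mm or 4 mm) are assigned
    to the lower grade; the source scale does not specify this.
    """
    if breach_mm < 0:
        raise ValueError("breach distance cannot be negative")
    if breach_mm == 0:
        return "I"    # no breach
    if breach_mm <= 2:
        return "II"   # 0-2 mm breach
    if breach_mm <= 4:
        return "III"  # 2-4 mm breach
    return "IV"       # > 4 mm breach


def is_any_breach(grade: str) -> bool:
    """Dichotomize a grade into no breach (False) vs any breach (True)."""
    return grade != "I"


# Hypothetical measurements from post-instrumentation CT scans
grades = [grade_breach_mm(x) for x in (0.0, 1.5, 3.0, 5.2)]
```

Dichotomizing in this way is one means of placing studies that use different grading scales on a common footing.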
NeuroTouch/NeuroVR (CAE Inc., Montreal, QC, Canada) is a commercially available neurosurgical VR simulator. Bissonnette et al. (2019) Reference Bissonnette, Mirchi, Ledwos, Alsidieri, Winkler-Schwartz and Del Maestro29 performed a validation study of the NeuroVR system by creating an artificial intelligence algorithm that was able to distinguish senior from junior trainee performance across several metrics (i.e., procedure duration, force applied on dura, control over the drill/burr). The algorithm correctly identified trainees with 97.6% accuracy. Knafo et al. (2021) Reference Knafo, Penet, Gaillard and Parker20 demonstrated that performance across several procedures (lumbar hemilaminectomy, meningioma resection, and endoscopic third ventriculostomy) was unrelated to both seniority and a trainee’s self-assessment of their surgical ability and knowledge. Performance metrics included volume of L3 vertebra removed, and points were deducted for removal of vertebrae other than L3, injury/removal of neighboring structures (spinal cord, dura, other tissues), and excessive blood loss. These outcomes were pre-programmed within the NeuroTouch software. A major limitation of this study was the lack of a control group.
Several studies used commercially available VR or AR headsets. Dennler et al. (2020) Reference Dennler, Jaberg and Spirig22 combined the Microsoft HoloLens (Microsoft, Redmond, WA, USA), an AR headset, with Unity (Unity Technologies, San Francisco, CA, USA), a commercially available software package, enabling participants to rotate and translate a 3D sawbone model of the lumbar vertebrae to facilitate pedicle screw placements. The results were graded based on CT scans obtained of the models after pedicle screw insertion; the authors used Phönix-PACS software (GmbH, Freiburg, Germany) to calculate the accuracy of screw placements. For the junior surgeons, there was a lower rate of pedicle screw perforations with AR-guided placement compared to free-hand attempts. Luca et al. (2020) Reference Luca, Giorgino and Gesualdo18 used the Oculus Rift VR headset (Oculus VR, Irvine, CA, USA) and its built-in software development kit system, combined with a robotic arm with haptic feedback, to mimic lateral lumbar access. Trainees were evaluated on competency in setting up the OR, bleeding control, manipulation of soft tissues, and target accuracy through this VR system. Using the Oculus Rift’s software development kit, three senior spine surgeons created a pre-defined pathway of correct options, including appropriate OR set-up, identifying the correct anatomic level with fluoroscopy, and identification of the tissue layers. The system automatically created a score sheet based on a trainee’s performance and choices compared to those of their senior colleagues. There was no difference between residents and senior surgeons with respect to the mean number of errors. Chen et al. (2021) Reference Chen, Zhang and Ding19 opted to use the Samsung Odyssey (Samsung, Seoul, South Korea) over the Oculus headsets as it relies on inside-out tracking instead of conventional position tracking. 
This facilitates faster set-up as there are no external cameras and the device is essentially a self-contained unit. The headset was combined with 3D Slicer (open-source), an image analysis software capable of developing 3D models based on CT and MRI scans. Junior residents had significant improvements in anatomical test scores after performing a virtual laminectomy and lateral recess decompression, whereas there were no changes for senior residents, suggesting VR enhances understanding of surgical anatomy in the early stages of training. Yu et al. (2019) Reference Yu, Zhou, Lei, Liu, Fan and He25 also used the 3D Slicer platform to create 3D representations of the lumbar spine based on CT scans. Compared to trainees who were taught with plain films, residents who learnt percutaneous transforaminal endoscopic discectomy with these 3D models had shorter procedure times.
A team at the University of Chicago used an in-house AR system, ImmersiveTouch (Industrial Virtual Reality Institute, Chicago, IL, USA) to evaluate pedicle screw placement Reference Luciano, Banerjee and Bellotte23 and percutaneous spinal fixation. Reference Luciano, Banerjee and Sorenson24 ImmersiveTouch is a projection-based AR system with hardware capable of providing haptic feedback. Accuracy (i.e., distance to target or correct level) improved with repeated attempts for both procedures.
Yanni et al. (2021) Reference Yanni, Ozgur and Louis21 adapted SpineAR (Surgical Theater, Inc., Beachwood, OH, USA), an intraoperative navigation tool combined with a headset to allow for overlay of AR-guided pedicle screw trajectory with the operative field, optimizing surgical efficiency by eliminating the need for the operator to continuously shift their gaze from an external monitor to the operative site. Both junior learners and attendings had higher rates of Gertzbein-Robbins grade 0 (i.e., no cortical breach) or 1 (i.e., 0–2 mm cortical breach) screw placements compared to literature rates of free-hand placements.
A series of validation studies from McGill evaluated a novel VR tool, the Sim-Ortho (OSSimTech, Montreal, Canada, and AO Foundation, Davos, Switzerland). Ledwos et al. (2020) Reference Ledwos, Mirchi and Bissonnette26 demonstrated reasonable validity of the device for a one-level anterior cervical discectomy and fusion (ACDF). Adequate face and content validity were demonstrated across resident ratings of simulator realism. Construct validity was assessed by analyzing the performance differences between junior and senior trainees; the latter interacted with the disc for a greater proportion of time throughout the operation and removed more target tissue, demonstrating Sim-Ortho’s potential as a surgical education tool. Noted limitations of the Sim-Ortho were its omission of certain parts of a standard ACDF (e.g., soft tissue exposure) and lack of realism across certain structures (e.g., the posterior longitudinal ligament). Subsequent studies from the same institution aimed to use the Sim-Ortho for specific portions of an ACDF, such as the discectomy Reference Mirchi, Bissonnette and Ledwos27 and annulus incision. Reference Alkadri, Ledwos and Mirchi28 These studies demonstrated that an artificial intelligence algorithm could be trained to identify senior and junior trainees with 80% and 83.3% accuracy, respectively, using pre-defined metrics such as safety, efficiency, and appropriate usage of surgical instruments. This demonstrates a role for surgical simulation not only in training residents, but also in providing objective feedback on their performance against a standardized cohort.
Simulator Feedback
The 16 studies varied significantly in the type and extent of feedback provided to the trainee, as outlined in Table 1. Of the 11 different simulators used across the studies, five provided feedback on pedicle screw placement (i.e., ImmersiveTouch, VSTS, IVRSS-PSP, Microsoft HoloLens, and SpineAR). The ImmersiveTouch program provided automatic calculations for pedicle screw placements, whereas for the other modalities, the results were manually obtained based on post-instrumentation CTs. The validation studies for Sim-Ortho focused on more nuanced details such as volume of tissue removed, contact with structures, instrument tip path length (as a proxy for efficient intraoperative movements), and force used with drilling, all automatically provided by the built-in software. The 3D Slicer primarily compared trainees based on overall fluoroscopy exposure time. In Chen’s study, a plugin software was used to mimic posterior lumbar decompression, and an emphasis was placed on the exposure and handling of soft tissue. All simulators provided haptic and visual feedback, with the Sim-Ortho uniquely incorporating auditory feedback to further optimize the realism of the scenarios.
The five studies analyzing pedicle screw placement offered enough data to allow for pooled analysis; all of these studies were non-randomized comparisons between AR/VR-trained groups and controls. All studies demonstrated increased accuracy of pedicle screw placements (i.e., reduced frequency of pedicle screw breach/perforations) with AR/VR-trained residents (OR 5.05, 95% CI 2.93–8.68) (Figure 2). Due to the disparity across studies in the grading scales used, the data are represented as odds of no breach vs any breach. Heterogeneity was low (I² = 0%).
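For readers unfamiliar with how a pooled odds ratio and I² are obtained from dichotomized counts, the following is a minimal sketch using inverse-variance weighting with the DerSimonian-Laird random-effects estimator. The choice of estimator, the function name, and the 2×2 counts are all assumptions for illustration; the counts are not the data from the included studies, and the sketch is not a reproduction of the MedCalc output.

```python
import math


def pooled_odds_ratio(tables):
    """Pool 2x2 tables with a DerSimonian-Laird random-effects model.

    Each table is (a, b, c, d):
      a = trained, no breach    b = trained, any breach
      c = control, no breach    d = control, any breach
    """
    log_ors, variances = [], []
    for a, b, c, d in tables:
        if 0 in (a, b, c, d):
            # Continuity correction for zero cells (an assumption)
            a, b, c, d = (x + 0.5 for x in (a, b, c, d))
        log_ors.append(math.log((a * d) / (b * c)))
        variances.append(1 / a + 1 / b + 1 / c + 1 / d)

    # Fixed-effect estimate, needed for Cochran's Q
    w = [1 / v for v in variances]
    fixed = sum(wi * yi for wi, yi in zip(w, log_ors)) / sum(w)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, log_ors))
    df = len(tables) - 1

    # Between-study variance (tau^2), truncated at zero
    c_term = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c_term) if c_term > 0 else 0.0

    # Random-effects weights and pooled estimate
    w_re = [1 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_re, log_ors)) / sum(w_re)
    se = math.sqrt(1 / sum(w_re))
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0

    return {
        "or": math.exp(pooled),
        "ci_low": math.exp(pooled - 1.96 * se),
        "ci_high": math.exp(pooled + 1.96 * se),
        "i2": i2,
    }


# Hypothetical counts for three studies (not data from this review)
result = pooled_odds_ratio([(40, 10, 20, 30), (40, 10, 20, 30), (40, 10, 20, 30)])
```

When the between-study variance is estimated as zero, as with I² = 0%, the random-effects weights reduce to the fixed-effect weights.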
Discussion
This review identified 16 studies evaluating the role of AR/VR technologies in spine education. While a collective analysis of the data was limited by the heterogeneity of outcome measurements, procedure types, and AR/VR platforms across the studies, there are several findings that suggest AR/VR may serve as a useful tool in surgical education.
Of note, AR/VR technologies are a heterogeneous entity, with no robust comparisons between the different available modalities with respect to surgical training. Of the sixteen studies, four evaluated AR systems, eleven focused on VR, and one used a mixed reality system. Generally within neurosurgery, VR technologies have been used more frequently for training, and AR has traditionally been used for intraoperative image enhancement, due to the ability to overlay images onto the operative field. Reference Liu, Tai and Zhao30 However, as demonstrated by several studies in this review, there is a role for AR-guided training in spine surgery. This becomes particularly relevant, as immersive VR systems are often time-intensive to set up and suffer from a lack of realism; this is countered in an AR environment, which allows for the overlay of visual data onto existing surroundings.
A general advantage of AR/VR is the ability to refine surgical skills in a low-stakes environment while generating customizable surgical scenarios and automated objective feedback. This provides trainees with concrete techniques to improve upon and allows for comparisons over time. In addition, AR/VR technologies allow trainees to focus on specific skills; for example, Luca and Knafo’s models specifically incorporated bleeding control and handling of soft tissue into their metrics, while several other studies simply looked at final accuracy (e.g., pedicle screw placement). Platforms such as the Oculus Rift also offer software development kits that enable users to create their own scenarios, theoretically allowing incorporation of real-world complications such as CSF leak, dural tears, and patient wakening. Furthermore, based on the validation studies performed for the Sim-Ortho model, there may also be a role for AR/VR technologies in providing objective feedback for residents, to help them identify if they are performing at a standard appropriate for their training year. Reference Ledwos, Mirchi and Bissonnette26–Reference Bissonnette, Mirchi, Ledwos, Alsidieri, Winkler-Schwartz and Del Maestro29 In another study by Winkler-Schwartz et al. (2019), Reference Winkler-Schwartz, Bissonnette and Mirchi31 machine learning algorithms were used to identify junior trainees, senior trainees, and staff surgeons with remarkable accuracy, even across complex surgical tasks.
An interesting concept raised by these studies is the growing role of artificial intelligence in surgical education. Winkler-Schwartz et al. (2019b) outline a 20-point scale to help determine the quality of literature in this area; they note in their review that many studies do not explain the educational relevance of a chosen metric. Reference Winkler-Schwartz, Yilmaz and Mirchi32 While AR/VR technologies offer the ability to objectively quantify technical skills that were previously difficult to assess manually, it is important that relevant measures are chosen. For example, eye movements, a common outcome in AR/VR studies, may not be particularly relevant, as they are not a teachable or trainable facet of surgical performance. Rather, metrics directly impacting surgical outcomes, such as duration, bimanual dexterity, and volume of target tissue removed, may be more useful.
Across the 16 studies, the most commonly analyzed procedure was pedicle screw insertion, likely reflective of the significant morbidity associated with pedicle screw breach (i.e., fracture, nerve injury). While the studies used different grading systems to evaluate outcomes, when dichotomized into no breach vs breach, the pooled results of the 5 studies offering controls demonstrate a benefit of AR/VR-guided training. While this suggests AR/VR technologies can help trainees achieve minimal competency, these studies varied significantly with respect to case complexity and realism (e.g., soft tissue handling, hemostasis), making it difficult to determine if these findings will translate to enhanced operative performance.
Three studies demonstrated a decreased number of errors after a few practice runs in the same sitting, for lateral lumbar access, Reference Luca, Giorgino and Gesualdo18 thoracic pedicle screw placement, Reference Luciano, Banerjee and Bellotte23 and percutaneous spinal fixation, Reference Luciano, Banerjee and Sorenson24 indicating that lack of experience with AR/VR is not a significant barrier to its use. Across the six studies that randomized participants into two groups (i.e., AR/VR teaching vs standard didactic lecture) prior to a surgical task, all demonstrated an improvement with respect to accuracy Reference Hou, Shi, Lin, Chen and Yuan14–Reference Xin, Chen and Wang17,Reference Dennler, Jaberg and Spirig22 or time Reference Yu, Zhou, Lei, Liu, Fan and He25 with VR/AR. This improvement may be facilitated by an enhanced understanding of surgical anatomy with AR/VR technology.
Conclusion
In this systematic review, limited evidence suggests AR/VR platforms are a useful tool for enhancing proficiency in various spine procedures, particularly pedicle screw insertion. However, the cost-benefit profile and translation to clinical practice remain unclear. As AR and VR technologies rapidly advance, further research will be necessary to reassess their role in surgical education.
Specific areas of future interest include the role of increasingly sophisticated automated feedback, the role of artificial intelligence, and algorithms to help analyze performance and determine the procedures and specific steps most amenable to being taught via AR/VR tools.
Conflicts of Interest
The authors of this manuscript have no conflicts of interest to disclose.
Author Contributions
YJ: conceptualization, methodology, writing-original draft, writing-reviewing & editing.
VM: conceptualization, methodology, writing-reviewing & editing.
PG: conceptualization, methodology, writing-reviewing & editing.
DG: conceptualization, methodology, writing-original draft, writing-reviewing & editing, visualization, supervision.