INTRODUCTION
Every year, thousands of service members (SMs) in the U.S. military are diagnosed with a mild traumatic brain injury (mTBI), also known as concussion (Defense and Veterans Brain Injury Center [DVBIC], 2016). These injuries can take place in a variety of settings due to several causes, including those similar to sports-related concussion in the civilian sector. Regardless of where or how concussion occurs, there is a need for timely and effective evaluation of an individual’s cognitive functioning (e.g., Kelly, Coldren, Parish, Dretsch, & Russell, 2012). Assessment of cognitive abilities via neuropsychological (NP) tests is considered the cornerstone of concussion management (McCrory et al., 2013). However, these tests are time-consuming and require particular expertise for administration and interpretation of results. In more recent years, computerized neurocognitive assessment tools (NCATs) have been increasingly used as a quicker and more easily administered alternative to NP tests (e.g., Friedl et al., 2007; McCrory et al., 2013).
The Automated Neuropsychological Assessment Metrics 4 TBI-MIL (ANAM4) is an NCAT developed by the U.S. Army (Friedl et al., 2007) and widely used in the military (Defense Health Board, 2016). ANAM4 is regularly administered before a deployment to generate a neurocognitive baseline for post-deployment and post-injury comparison (DoDI 6490.13). Despite this goal, existing evidence is inconclusive regarding the ability of NCATs, including ANAM4, to identify cognitive issues following concussion (see Arrieux, Cole, & Ahrens, 2017; Resch, McCrea, & Cullum, 2013).
Typically, findings from ANAM4 are based on analyses comparing post-injury scores either to individual baseline measurements or normative databases (see Haran et al., 2016; McCrea et al., 2008). The most commonly interpreted standardized ANAM4 score is “throughput,” calculated for each subtest and based on the number of correct responses per response time. Building on evidence from a growing literature, this article applies an alternative, more fine-grained scoring method that may be better suited for identifying cognitive dysfunction. The analyses focus on within-person inconsistent performance, or intra-individual neurocognitive variability.
Although intra-individual variability is often viewed as noise or test error, it may in fact reflect fluctuation in cognitive processing and reveal cognitive deficits that a mean or standard score is attempting, but failing, to capture. For example, research in aging populations has shown intra-individual variability on various behavioral and neurophysiological measures to be associated with decline in cognitive performance (e.g., Fjell, Rosquist, & Walhovd, 2009; Lovden, Li, Shing, & Lindenberger, 2007). Although the literature base is relatively small, intra-individual variability in acute and post-acute concussion populations has been studied for more than two decades using both traditional NP and reaction time (RT) tests (e.g., Rabinowitz & Arnett, 2013; Sosnoff et al., 2007; Stuss et al., 1989).
Using NP tests, Hill, Rohling, Boettcher, and Meyers (2013) analyzed intra-individual variability using means from the Meyers Neuropsychological Battery in individuals reporting a history of mTBI and found that overall performance is negatively correlated with variability. Similarly, in a study using RT-based stimulus discrimination and flanker tests, history of concussion was shown to be associated with increased intra-individual variability (Parks et al., 2015). Beyond behavioral measures, Segalowitz, Dywan, and Unsal (1997) demonstrated that, for a TBI group but not for a control group, RT variability was related to electrophysiological measures of attentional allocation and sustainment (the P300 amplitude and the preresponse component of the contingent negative variation E-Wave), supporting the idea that RT variability reflects this attentional processing.
Studies have also examined intra-individual variability in TBI using NCATs. Bleiberg, Garmoe, Halpern, Reeves, and Nadler (1997) demonstrated that participants with mild to moderate TBI performed more inconsistently in same-day and across multiple-day sessions than a healthy control group. Makdissi et al. (2001) investigated a simple RT test in a different NCAT, CogState, in athletes and found greater standard deviation in reaction time in acutely concussed versus never concussed athletes at follow-up, though not at baseline. However, longer RT in concussed participants as compared to controls could account for greater standard deviation in RT. Sosnoff et al. (2007) adjusted for mean RT in a group of individuals tested within 72 hours of concussion and found that after this adjustment, concussed individuals did not have greater RT SD than healthy age- and gender-matched individuals.
The above studies, most of which demonstrate an ability to differentiate TBI and control group performance using intra-individual variability measures, all compare an individual’s performance on a test or whole battery across test sessions. In contrast, the present investigation explores potential differences in intra-individual variability by comparing performance on one subtest repeated within a battery in patients with acute concussion and healthy controls. Our approach allows examination of the use of intra-individual variability analyses within an abbreviated window and without a need for repeat testing of an entire battery.
The ANAM4 is an ideal test to examine intra-individual variability in this way because, unlike most NCATs, it includes an identical simple RT (SRT) task at the beginning and the end of the battery. Although the ANAM4 standard output generates the RT standard deviation on each subtest, our approach differs because it examines the standard deviation of the difference between the trial-by-trial RT data. This approach allows for a more fine-grained measure of intra-individual variability and of an individual’s change in RT (i.e., dSRT) over a brief period of time. In addition to looking at trial-by-trial raw RT data and dSRT, the current study compared healthy controls with acutely concussed individuals, as previous research suggests ANAM4 has limited clinical utility more than eight days following concussion (e.g., Nelson et al., 2016). We hypothesize that this alternative trial-by-trial approach to interpreting RT on ANAM4 will reveal differences in variability and dSRT across the two groups.
METHODS
Sample
A total sample of 350 individuals was selected from a larger study’s sample of SMs from Fort Bragg with and without mTBI where ANAM4 was administered (Cole, Arrieux, Dennison, & Ivins, 2017). Informed consent was obtained from all subjects and data were collected in compliance with the Womack Army Medical Center Institutional Review Board’s regulations and requirements. The sample included 242 healthy controls (CTRL) and 108 participants within 7 days of mTBI (mTBI). The following criteria were used to exclude data from the analyses: (1) potentially invalid data according to the ANAM4 embedded effort index (EI; CTRL: n=10; mTBI: n=5); and (2) RT less than 150 ms or greater than 900 ms (CTRL: n=1; mTBI: n=3), also deemed to be indicative of potentially invalid data.
After exclusions, 231 records were assigned to the CTRL group and 100 were assigned to the mTBI group.
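The two exclusion rules above can be sketched as a simple filter. This is an illustration only, not the authors' procedure: the `Record` fields, the function names, and the choice to apply the 150 to 900 ms window to a single summary RT per participant are all assumptions, since the text does not say whether the RT criterion was applied per trial or per participant.

```python
from dataclasses import dataclass

# RT window stated in the text; values outside it are treated as potentially invalid.
RT_MIN_MS = 150.0
RT_MAX_MS = 900.0


@dataclass(frozen=True)
class Record:
    """Hypothetical per-participant record (field names are assumptions)."""
    group: str            # "CTRL" or "mTBI"
    effort_flagged: bool  # True if the embedded effort index flags the data
    mean_rt_ms: float     # assumed summary RT used for the RT criterion


def is_valid(record: Record) -> bool:
    """Apply criterion (1), the effort index, then criterion (2), the RT window."""
    if record.effort_flagged:
        return False
    return RT_MIN_MS <= record.mean_rt_ms <= RT_MAX_MS


def retained_counts(records: list[Record]) -> dict[str, int]:
    """Count the records that survive both exclusion criteria, per group."""
    counts: dict[str, int] = {}
    for record in records:
        if is_valid(record):
            counts[record.group] = counts.get(record.group, 0) + 1
    return counts
```

With the counts reported above, the arithmetic is 242 − 10 − 1 = 231 retained controls and 108 − 5 − 3 = 100 retained mTBI participants.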
Instrumentation
The ANAM4 (CSRC, 2014) is an automated, computerized neurocognitive test battery that includes a sleepiness scale, mood scale, a self-report TBI questionnaire, and seven core subtests: Code Substitution Delayed (CDD), Code Substitution (CDS), Matching-to-Sample (M2S), Mathematical Processing (MTH), Procedural Reaction Time (PRO), Simple Reaction Time (SRT1), and Simple Reaction Time Repeated (SRT2). Due to the larger study’s procedures, an additional battery of questionnaires was administered before testing, including demographics, military history, head injury history, the PTSD Checklist – Civilian Version (PCL-C), and the Neurobehavioral Symptom Inventory (NSI). Following the questionnaires, the seven core ANAM subtests were administered per usual procedures. Validity of the data was evaluated by an embedded EI, which flags atypical scores based on accuracy and discrepancy of responses (Roebuck-Spencer, Vincent, Gilliland, Johnson, & Cooper, 2013). For the purposes of this manuscript, only the EI and the raw data from the SRT1 and SRT2 were used in the analyses.
Data Analyses
The following metrics were calculated from the SRT1 and SRT2 raw RT data (N=40 trials): (1) the SRT difference score (dSRT; formula 1), (2) the standard deviation (SD) of the dSRT (dSRT-SD), (3) the mean of dSRT, and (4) the standardized response mean (SRM) of dSRT (dSRT SRM; formula 2).
Both dSRT SRM and dSRT-SD were used as metrics of intra-individual variability.
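The formulas referenced above are not reproduced in this text, so the following is a minimal sketch under stated assumptions rather than the authors' exact computation. It assumes that dSRT is the trial-by-trial difference SRT2 minus SRT1 (consistent with the definition dSRT=SR2−SRT given in the Table 2 notes), that trials are paired by position, that the SD is the sample SD (n−1 denominator), and that the SRM follows its usual definition of mean change divided by the SD of change. The function name and dictionary keys are hypothetical.

```python
from statistics import mean, stdev


def dsrt_metrics(srt1_ms: list[float], srt2_ms: list[float]) -> dict:
    """Compute dSRT, its mean, its SD, and its standardized response mean.

    Assumes trial i of SRT1 is paired with trial i of SRT2 (an assumption;
    the pairing rule is not spelled out in the text).
    """
    if len(srt1_ms) != len(srt2_ms):
        raise ValueError("SRT1 and SRT2 must contain the same number of trials")
    if len(srt1_ms) < 2:
        raise ValueError("at least two paired trials are needed for an SD")

    # Assumed formula 1: trial-by-trial difference score.
    dsrt = [second - first for first, second in zip(srt1_ms, srt2_ms)]

    dsrt_mean = mean(dsrt)
    dsrt_sd = stdev(dsrt)  # sample SD (n - 1 denominator), an assumption

    # Assumed formula 2: SRM = mean of the change / SD of the change.
    dsrt_srm = dsrt_mean / dsrt_sd if dsrt_sd > 0 else float("nan")

    return {
        "dSRT": dsrt,
        "dSRT_mean": dsrt_mean,
        "dSRT_SD": dsrt_sd,
        "dSRT_SRM": dsrt_srm,
    }
```

For example, `dsrt_metrics([250, 260, 240, 255], [270, 250, 280, 265])` yields difference scores of 20, −10, 40, and 10 ms, with a mean of 15 ms.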
Statistical Analyses
Group differences for demographic data were examined using Mann-Whitney U tests and chi-square tests. There were minor violations of the Lilliefors test of normality for the simple reaction time subtest data; however, these were considered tolerable given sample sizes large enough (i.e., n>30) for the central limit theorem to apply and the robustness of the parametric tests used. The potential for familywise type I error due to multiple comparisons was accounted for with Bonferroni-Holm sequential corrections.
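For readers unfamiliar with the Bonferroni-Holm sequential correction mentioned above, the following is a generic sketch of the step-down procedure. It is not the SPSS or MATLAB routine used in the study; the function name and the default alpha of .05 are assumptions.

```python
def holm_bonferroni(p_values: list[float], alpha: float = 0.05) -> list[bool]:
    """Return, for each p-value, whether its null hypothesis is rejected.

    P-values are tested from smallest to largest against alpha / (m - rank),
    where rank starts at 0; testing stops at the first non-rejection.
    """
    m = len(p_values)
    # Indices of the p-values in ascending order.
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, idx in enumerate(order):
        if p_values[idx] <= alpha / (m - rank):
            reject[idx] = True
        else:
            break  # all larger p-values are retained as well
    return reject
```

With four tests and p-values of .01, .04, .03, and .005, the thresholds are .0125, .0167, .025, and .05 in sequence, so only the two smallest p-values (.005 and .01) lead to rejection.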
Group differences were analyzed using a general linear model (1×2) multivariate analysis of variance (MANOVA), with group membership (2 levels) as the between-subjects variable. Univariate tests and pairwise comparisons were conducted to follow up significant main effects. Effect size (ES) for group differences was calculated using Hedges’ g and Cohen’s U3 statistic, and the results were interpreted using the following criteria: recommended minimum practical effect size (RMPE; ES>0.41), moderate effect (ES>1.15), and strong effect (ES>2.70) (Ferguson, 2009).
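The two effect size statistics named above can be sketched as follows. This is an illustrative implementation, not the study's code: it assumes the common small-sample approximation for the Hedges' g correction factor, and it assumes U3 is obtained from the standard normal cumulative distribution evaluated at the effect size (i.e., normal, equal-variance groups). How the sign of the effect size maps onto U3 is likewise an assumption.

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def hedges_g(group_a: list[float], group_b: list[float]) -> float:
    """Standardized mean difference (a minus b) with small-sample correction."""
    n_a, n_b = len(group_a), len(group_b)
    # Pooled variance from the two sample variances (n - 1 denominators).
    pooled_var = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    cohens_d = (mean(group_a) - mean(group_b)) / sqrt(pooled_var)
    # Approximate correction factor: 1 - 3 / (4 * (n_a + n_b) - 9).
    correction = 1 - 3 / (4 * (n_a + n_b) - 9)
    return cohens_d * correction


def cohens_u3(effect_size: float) -> float:
    """Proportion of one group falling below the other group's mean,
    assuming normal distributions with equal variances."""
    return NormalDist().cdf(effect_size)
```

Passing the control group as `group_a` and the mTBI group as `group_b` gives a negative g when controls have lower (faster or less variable) scores, matching the sign convention Δ=CTRL value − mTBI value used in the table notes.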
All analyses were performed with MATLAB 2015b (MathWorks, Natick, MA) and SPSS Version 22 (IBM, Armonk, NY).
RESULTS
There were significant differences for sex and rank on the demographic variables (Table 1). Differences in sex are believed to be due to the higher number of officers in the control group, as there was a higher proportion of female officers than female enlisted soldiers. It is believed that officers were over-represented in the control group due to their greater ability to control and dictate their daily schedules, allowing them to take time off to volunteer in a research study. There were no other statistically significant differences on other measured demographic variables (Table 1).
CTRL=control group; mTBI=mild traumatic brain injury group.
a Two-tailed Mann-Whitney U test.
b Chi-square test.
c 0=less than 12 years; 1=General Educational Development (GED) certificate; 2=high school graduate; 3=some college; 4= associate degree; 5=bachelor’s degree or higher.
The results of the MANOVA revealed that there was a significant multivariate main effect for group membership on ANAM4 performance (F(4,326)=18.56; p<.001; ηp²=.19). The univariate tests associated with the main effect for group were significant for the SRT1 (F(1,329)=23.93; p=.001; ηp²=.07), SRT2 (F(1,329)=60.58; p<.001; ηp²=.16), dSRT (F(1,329)=14.77; p<.001; ηp²=.04), dSRT-SD (F(1,329)=55.22; p<.001; ηp²=.14), and dSRT SRM (F(1,329)=18.60; p<.001; ηp²=.05). Pairwise comparisons revealed that the mean for the control group was significantly lower (i.e., faster RT, less variability, less change in RT over time) than the mean for the mTBI group on each metric (Table 2). It should be noted that the effect for group differences (ES>.41) exceeded the RMPE for each variable.
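As a worked check on the statistics reported above, partial eta squared can be recovered from an F value and its degrees of freedom using the standard identity ηp² = (F × df_effect) / (F × df_effect + df_error). The sketch below is illustrative only; the function name is hypothetical, and applying the identity to the multivariate test assumes the two-group case, in which the common multivariate statistics lead to the same exact F.

```python
def partial_eta_squared(f_value: float, df_effect: int, df_error: int) -> float:
    """Partial eta squared implied by an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# F values and degrees of freedom as reported for the full-sample MANOVA.
reported = {
    "multivariate": (18.56, 4, 326),
    "SRT1": (23.93, 1, 329),
    "SRT2": (60.58, 1, 329),
    "dSRT": (14.77, 1, 329),
    "dSRT-SD": (55.22, 1, 329),
    "dSRT SRM": (18.60, 1, 329),
}

for label, (f_value, df_effect, df_error) in reported.items():
    print(label, round(partial_eta_squared(f_value, df_effect, df_error), 2))
```

For example, the multivariate test gives (18.56 × 4) / (18.56 × 4 + 326) ≈ .19, and the SRT2 univariate test gives 60.58 / (60.58 + 329) ≈ .16, in agreement with the reported values.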
CTRL=control group; mTBI=mild traumatic brain injury group; SRT=simple reaction time; SR2=simple reaction time repeated; dSRT=SR2−SRT; dSRT sd=dSRT standard deviation; dSRT SRM=dSRT standardized response mean; Δ=CTRL value − mTBI value; tstat=t-statistic; ES=effect size calculated as Hedges’ g; U3=Cohen’s U3 statistic.
As a result of the group differences between enlisted and officer data, a second MANOVA was performed on just the data from enlisted personnel, which resulted in group equivalency across demographics. The results of the MANOVA revealed that there was a significant multivariate main effect for group membership on ANAM4 performance (F(4,209)=10.12; p<.001; ηp²=.97). The univariate tests associated with the main effect for group were significant for the SRT1 (F(1,212)=15.71; p=.001; ηp²=.07), SRT2 (F(1,212)=36.25; p<.001; ηp²=.15), dSRT (F(1,212)=8.98; p=.003; ηp²=.04), dSRT-SD (F(1,212)=28.57; p<.001; ηp²=.12), and dSRT SRM (F(1,212)=10.52; p<.001; ηp²=.05). Pairwise comparisons revealed that the mean for the control group was significantly lower than the mean for the mTBI group for each metric, with effects exceeding the RMPE for all variables except dSRT (ES=−.37) (Table 2).
DISCUSSION
The current study investigated differences in mean RT and RT variability between healthy controls and those with acute concussion using raw trial-by-trial RT data from the ANAM4. This approach is relatively novel, as most previous studies have focused on the use of standardized scores and cognitive efficiency metrics (e.g., throughput scores) to investigate group differences. Moreover, prior studies examining differences in RT variability have almost exclusively done so across test sessions rather than using a repeated subtest within a battery and test session. Our hypotheses were largely supported, as those with acute concussion had slower RTs and greater RT variability than healthy controls. The most important finding was that significant group differences were seen across all variables, with raw SRT2 and dSRT-SD appearing to be the most sensitive variables, with ESs (−.68 and −.70, respectively) that were similar to values previously reported for raw SRT2 (ES=−.60; Adam et al., 2015) and nearly double the values reported for throughput scores (ES=−.35; Haran et al., 2016).
It is not surprising that there were differences in variability between the healthy control and mTBI groups. RT and RT variability have been shown to provide information about the allocation of attentional resources in those with neurological insult such as mTBI. Specifically, it is thought that attention allocation can be measured by RT latency in healthy controls, whereas in those with mTBI attention allocation is more related to RT variability than RT latency (Bleiberg et al., 1997; Segalowitz et al., 1997). As such, the current finding of greater within-subject variability on ANAM4 SRT performance in an acute mTBI group adds to this body of literature.
In general, these results reveal greater trial-to-trial fluctuations in performance for the mTBI group as compared to the control group. Based on the central tendency theory, these fluctuations are often viewed as noise, instability, or error. However, they may be indicative of subtle cognitive decline after concussion that may otherwise be missed by more traditional metrics. That is, analyses of raw RT (particularly the raw SRT2 data), trial-by-trial RT change, and trial-by-trial RT variability appear to offer alternative metrics for NCATs. Moreover, these alternative metrics may offer greater clinical utility than metrics commonly used in cognitive testing. Given the computerized nature of NCATs, metrics such as raw RT, trial-by-trial RT change, and trial-by-trial RT variability can be quickly and feasibly calculated. Furthermore, ANAM4 presents a conceivable advantage over other NCATs by including a repeated simple reaction time test, allowing comparison of RT and RT variables across time though still within one testing session, potentially tapping into “cognitive fatigue.”
Limitations
The current study was derived from data from a larger study; therefore, the data collection of interest was surrounded by procedures not relevant to the current analyses. These procedures sometimes included other testing before taking the ANAM4, which could have increased fatigue. However, any potential fatigue would be relatively equitable across groups and relatively controlled for by comparing SRT2 to SRT1, which occurred within the same testing session. Additionally, recent studies demonstrated that when administering multiple NCATs in one session, performance was not affected by the order of administration (Cole, Arrieux, Dennison, & Ivins, 2017; Nelson et al., 2016).
Another potential limitation is the difference in sex and rank between the control and mTBI groups, with more females among officers and more officers in the control group. Even so, when officers were omitted from the analyses, thus rendering the groups equitable across all measured demographics, the results still held.
Finally, as with any study of NCATs, many factors were not controlled for, including the computer platform used (e.g., hardware and software configurations), the participants’ familiarity with the ANAM4, the nature of injury, time since injury (e.g., <3 days vs. 3–7 days), ongoing symptomatology, and potential medication with cognitive side effects (e.g., stimulants or sedatives). However, all efforts were taken to administer the tests on a platform as close as possible to the ANAM4 manual specifications. Additionally, testing was done in a quiet room with a trained test proctor, in an environment similar to how baseline or post-injury testing would likely occur, likely rendering the results ecologically valid despite the potential for other sources of error.
CONCLUSIONS AND FUTURE DIRECTIONS
The results from this study support a small but growing body of literature suggesting that raw RT, RT change, and RT variability scores may be more sensitive than standardized scores to the subtle cognitive effects often seen after concussion. It appears that mTBI participants can temporarily perform similarly to normal controls on RT latency, but repeated RT assessments at multiple time points throughout a battery reveal increasingly inconsistent performance. Interpreting these metrics rather than the traditionally reported standardized scores (e.g., throughput) appears to hold promise for the use of ANAM4 in acute concussion populations.
In addition, this study highlights the strength of using raw scores instead of standardized scores, where subtle cognitive effects may be washed out. However, additional work is needed to fully clarify the clinical utility (e.g., diagnostic and prognostic capabilities) of these metrics, and to determine if they do indeed offer advantages over traditional metrics obtained from traditional NP tests and NCATs. There is some existing evidence that shorter ANAM4 SRT is predictive of recovery in those acutely concussed (Norris, Carr, Herzig, Labrie, & Sams, 2013). Thus, it may be that faster raw RT, less change in RT across SRT1 and SRT2, and less RT variability could be predictive of faster and/or better recovery after concussion and, therefore, incorporated into return to duty or return to play decisions. Given the Army’s baseline/predeployment testing program, it will also be important to determine if baseline assessments are valuable with regard to such metrics for diagnostic and prognostic purposes.
ACKNOWLEDGMENTS
The authors thank Karen Schwab, Brian Ivins, Felicia Qashu, Mary Alice Dale, James Wes McGee, Katie Toll, and Alex Fender for their contributions to this research. This material is published by permission of the Defense and Veterans Brain Injury Center, operated by General Dynamics Information Technology for the U.S. Defense Health Agency under Contract No. W91YTZ-13-C-0015. The authors have no financial interests to disclose. The views expressed in this manuscript are those of the author(s) and do not necessarily reflect the official policy or position of the Department of the Navy, Department of the Army, Department of Defense, or the U.S. Government.