Introduction
A key purpose of neuropsychological assessment is to provide objective data regarding cognitive functioning to reliably and validly identify cognitive change over time. In the context of Alzheimer’s disease and related dementias (ADRD), a crucial task is distinguishing between normal age-related cognitive decline and atypical cognitive decline associated with neurodegenerative disease (Chelune & Duff, 2019). Just as medical laboratory tests are repeated to evaluate changes in disease progression or response to intervention, serial neuropsychological assessment can provide valuable information about the current course and the future trajectory of cognitive changes in an individual patient.
A variety of statistical methods have been developed to identify changes in individual test performances, including (a) simple discrepancy scores, (b) standard deviation methods, (c) reliable change methods, and (d) regression-based methods (Duff, 2012). The simple discrepancy score is a straightforward method in which the difference between raw scores at two time points is compared to a normative database to determine how frequently a specific discrepancy is observed in a particular population (Patton et al., 2005). In standard deviation methods, a difference in test performance between two evaluations that exceeds a particular standard deviation cut-off is characterized as a significant change in cognitive ability (Frerichs & Tuokko, 2005). In reliable change methods, an observed difference in test performance that exceeds the amount of change expected from measurement error or practice effects is characterized as a significant cognitive change (Chelune et al., 1993; Jacobson & Truax, 1991; Stein et al., 2010). In standard regression-based methods, an individual’s baseline and follow-up scores are entered into a regression equation to determine whether the magnitude of the observed change in test performance exceeds the predicted variability in test performance based on a control sample (Hammers et al., 2022; McSweeny et al., 1993). Reliable change and regression-based methods are particularly useful because they provide estimates of the degree of measurement error influencing test-retest difference scores, allowing the examiner to infer the extent to which the examinee has experienced a statistically reliable change in performance (Brooks et al., 2016).
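To illustrate the reliable change approach described above, the following Python sketch computes a practice-adjusted reliable change index from a baseline standard deviation, a test–retest reliability coefficient, and a mean practice effect. All numeric values are hypothetical and would need to come from an appropriate normative sample; this is a minimal illustration of the general formula, not the procedure used in the present study.

```python
import math

def reliable_change_index(baseline, followup, sd_baseline, test_retest_r,
                          practice_effect=0.0):
    """Practice-adjusted reliable change index (illustrative values only).

    sd_baseline and test_retest_r would come from a normative sample;
    practice_effect is the mean retest gain expected in that sample.
    """
    sem = sd_baseline * math.sqrt(1.0 - test_retest_r)  # standard error of measurement
    se_diff = math.sqrt(2.0) * sem                       # standard error of the difference
    return ((followup - baseline) - practice_effect) / se_diff

# Hypothetical example: delayed recall drops from 10 to 6 words between visits.
rci = reliable_change_index(baseline=10, followup=6, sd_baseline=3.0,
                            test_retest_r=0.75, practice_effect=0.5)
print(f"RCI = {rci:.2f}")  # values beyond roughly +/-1.645 are often flagged (90% CI)
```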
There is considerable debate about which approaches to identifying cognitive change best predict real-world function, and each approach presents some important limitations (Heilbronner et al., 2010). First, sophisticated statistical procedures like reliable change methods and regression-based methods are rarely used in clinical settings; clinicians often simply examine differences in raw scores between two evaluations and rely on subjective judgments of clinical significance. A single reliable change or regression equation can help identify change on an individual measure, but a clinician would need to enter data into several different equations to examine all the measures in a battery (Cysique et al., 2011; Woods et al., 2006), and manually entering data into multiple equations is impractical under typical clinical time constraints. Second, when standard deviation methods are employed, there is wide variability in the thresholds used to denote significant change (e.g., ±1.0, ±1.5, or ±2.0 standard deviations). Third, the demographic characteristics on which the normative distributions and their standard deviations are based (e.g., age, gender, education, race, and ethnicity) differ widely between neuropsychological tests and can create interpretation issues when comparing multiple tests in a given battery (Merkley et al., 2022). Therefore, there is a need for statistical approaches to identifying atypical cognitive change that can be quickly applied in clinical settings.
The primary aim of the current study was to develop quick-reference criteria for identifying atypical cognitive change in older adults using a novel and easily accessible approach: examining the number of change scores in a test battery that correspond to a statistically rare magnitude of change across multiple measures in cognitively normal older adults. A secondary aim was to examine the extent to which multivariate changes in cognitive performance predict diagnostic status after accounting for variables that are commonly used in demographic normative adjustments for neuropsychological tests, including age, gender, education, and race/ethnicity.
Method
Participants
Data used in the preparation of this article were obtained from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) database (adni.loni.usc.edu). The primary goal of ADNI has been to examine the extent to which imaging, clinical, and neuropsychological measures predict the progression of Alzheimer’s disease (AD). ADNI data have been collected across four phases to date: ADNI1 (begun in 2004), ADNIGO (2009), ADNI2 (2011), and ADNI3 (2016). Please see the ADNI website (https://adni.loni.usc.edu) for a thorough review of the participating institutions and study phases. ADNI was approved by the institutional review boards (IRBs) of all participating institutions. Written informed consent was obtained from study participants or their proxy, following the ethical standards set forth by the Declaration of Helsinki.
Eligible participants at the initial ADNI screening visit were fluent English or Spanish speakers ages 55 to 90, with at least six years of formal education and no prior history of acquired brain injury or psychiatric disease. Participants were diagnosed as normal (NL), mild cognitive impairment (MCI), or dementia according to ADNI’s classification of diagnostic categories. NL participants had no abnormal memory complaints, clinical dementia rating (CDR) scores of 0, Mini-Mental State Exam (MMSE) scores between 24 and 30, and performed within the following ranges on the Wechsler Memory Scale-Revised (WMS-R) Logical Memory II delayed paragraph recall subtest: raw scores greater than or equal to 9 for individuals with 16 or more years of education, greater than or equal to 5 for those with 8–15 years of education, and greater than or equal to 3 for those with 0–7 years of education. MCI participants had abnormal memory complaints (verified by a study partner) with intact functioning in activities of daily living, CDR scores of 0.5, MMSE scores between 24 and 30, and WMS-R Logical Memory II raw scores less than or equal to 8 for individuals with 16 or more years of education, less than or equal to 4 for those with 8–15 years of education, and less than or equal to 2 for those with 0–7 years of education. Participants were diagnosed with dementia if they had abnormal memory complaints and the same Logical Memory II score ranges as the MCI participants but also had CDR scores of 0.5 or 1.0, MMSE scores between 20 and 26, and met the NINCDS/ADRDA criteria for probable AD (McKhann et al., 1984).
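The education-adjusted screening rules above can be summarized as a simple decision procedure. The Python sketch below is for illustration only: the function and parameter names are invented for this example, and several clinical elements (e.g., study-partner verification of complaints, clinical judgment about functional decline) are reduced to boolean flags; it is not ADNI’s actual screening algorithm.

```python
def lm2_cutoffs(years_education):
    """Education-adjusted WMS-R Logical Memory II cutoffs described above:
    (minimum raw score for NL, maximum raw score for MCI/dementia)."""
    if years_education >= 16:
        return 9, 8
    elif years_education >= 8:
        return 5, 4
    else:
        return 3, 2

def screening_group(memory_complaint, cdr, mmse, lm2, years_education,
                    adls_intact=True, meets_nincds_adrda=False):
    """Simplified sketch of the ADNI screening classification; clinical
    judgments are reduced to boolean flags for illustration only."""
    nl_min, impaired_max = lm2_cutoffs(years_education)
    if not memory_complaint and cdr == 0 and 24 <= mmse <= 30 and lm2 >= nl_min:
        return "NL"
    if (memory_complaint and adls_intact and cdr == 0.5
            and 24 <= mmse <= 30 and lm2 <= impaired_max):
        return "MCI"
    if (memory_complaint and cdr in (0.5, 1.0) and 20 <= mmse <= 26
            and lm2 <= impaired_max and meets_nincds_adrda):
        return "dementia"
    return "unclassified"
```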
For the present study, inclusion criteria consisted of participants who (a) were diagnosed as cognitively normal or MCI at baseline, (b) were diagnosed as cognitively normal, MCI, or dementia at the two-year follow-up visit, and (c) had complete data on the following tests: American National Adult Reading Test (ANART; Grober et al., 1991); Boston Naming Test (BNT-2; Kaplan et al., 1983); Clock Drawing Test (CDT; Goodglass & Kaplan, 1983); Trail Making Test (TMT; Reitan, 1958); Category Fluency Test (Morris et al., 1989); and the Rey Auditory Verbal Learning Test (RAVLT; Rey, 1964). Performance on the Digit Span Test and on the second trial of the Category Fluency Test (vegetables) was also considered for this analysis; however, these two tasks were not administered in ADNI2 and were therefore not examined, in order to maximize the diagnostic group sample sizes. Logical Memory II was used only in defining the diagnostic groups and was not examined as a change score measure.
As of July 17, 2022, a total of 2,414 participants had cognitive data across the four ADNI protocols: ADNI1 (2004; n = 819), ADNIGO (2009; n = 130), ADNI2 (2011; n = 789), and ADNI3 (2016; n = 675). A total of 358 ADNI1 participants were excluded due to a consensus diagnostic classification of dementia at baseline, a missing diagnosis at the two-year study visit, or missing data on the cognitive measures of interest. A total of 315 ADNI2 participants were excluded due to missing diagnoses at baseline or follow-up, a diagnosis of dementia at baseline, reversions in diagnosis (MCI to NL, or dementia to MCI), or missing test data. All participants from ADNIGO were excluded because they did not have cognitive data at the two-year mark, and all participants from ADNI3 were excluded due to unavailable data for the BNT-2, one of the measures of interest.
The final sample consisted of 935 participants from ADNI1 (n = 461) and ADNI2 (n = 474) who had complete and valid neuropsychological data at baseline and at the two-year study visit. Three groups were examined in this study based on diagnosis at the year 2 study visit: a cognitively normal group (NL; n = 401), an MCI group (MCI; n = 381), and a dementia group (DEM; n = 153). The MCI group included 24 participants who had been diagnosed as cognitively normal at their baseline visit and 357 who had been diagnosed as MCI at baseline. All participants in the dementia group had a baseline diagnosis of MCI. See Figure 1 for a visualization of the participant selection process for the current study.
Statistical analyses
First, change scores were calculated for each participant by subtracting the raw score at baseline from the raw score at the two-year follow-up visit. Change scores were calculated for the following seven measures: BNT-2 total correct (the sum of spontaneously correct responses and correct responses following stimulus cue); CDT total score (the sum of raw scores on command and copy); TMT Part A (TMT-A) and Part B (TMT-B); Animal fluency; the sum of the five immediate recall trials from the RAVLT (RAVLT Immediate); and the delayed recall trial (RAVLT Delayed). Second, the distribution of change scores in the NL cohort was examined for each of the seven measures. For each measure, the change score that corresponded to the 5th percentile of the distribution was identified as the threshold denoting a significant decline in performance. Participants were classified as having exhibited a large change score on a given measure if their decline in performance met or exceeded the 5th percentile threshold (i.e., a worse decline in performance than 95% of the NL participants). Third, the number of change scores below the 5th percentile threshold was counted for each participant (henceforth called the 5th percentile change count), yielding a value ranging from 0 to 7 for each participant. Next, a stepwise multinomial logistic regression analysis was conducted to predict diagnostic group membership. Age, gender, race/ethnicity, years of education, and ANART error scores (as a measure of premorbid function) were entered in the initial model. ANART errors rather than standard scores were used to minimize multicollinearity with education. Based on ADNI demographics, race/ethnicity was recoded into three categories: White (non-Hispanic), Black (non-Hispanic), and Other. The 5th percentile change count was entered in the following step as the primary variable of interest.
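As an illustration of the change score and change count computations described above, the following Python sketch assumes a hypothetical data frame with one row per participant and placeholder column names (e.g., ravlt_del_bl, ravlt_del_y2); it also sign-flips the two Trail Making Test scores so that negative change always represents decline, whereas the study reports those thresholds in the tests’ native direction. This is a minimal sketch, not the study’s analysis code.

```python
import pandas as pd

# Hypothetical layout: one row per participant, with "<test>_bl" and "<test>_y2"
# columns holding raw scores at baseline and the two-year visit.
TESTS = ["bnt", "cdt", "tmt_a", "tmt_b", "animals", "ravlt_imm", "ravlt_del"]
HIGHER_IS_WORSE = {"tmt_a", "tmt_b"}  # timed tests: larger raw scores mean worse performance

def change_scores(df: pd.DataFrame) -> pd.DataFrame:
    """Follow-up minus baseline, sign-flipped for timed tests so that
    negative values always represent a decline in performance."""
    out = pd.DataFrame(index=df.index)
    for t in TESTS:
        delta = df[f"{t}_y2"] - df[f"{t}_bl"]
        out[t] = -delta if t in HIGHER_IS_WORSE else delta
    return out

def fifth_percentile_thresholds(nl_changes: pd.DataFrame) -> pd.Series:
    """Per-test change score at the 5th percentile of the cognitively normal group."""
    return nl_changes.quantile(0.05)

def change_count(changes: pd.DataFrame, thresholds: pd.Series) -> pd.Series:
    """Number of measures (0-7) on which a participant's decline meets or
    exceeds the 5th percentile threshold."""
    return changes.le(thresholds).sum(axis=1)
```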
Finally, classification accuracy analyses were conducted to examine the sensitivity and specificity of a dichotomized 5th percentile change count for differentiating between the NL participants and those with atypical cognitive decline (i.e., the MCI and dementia participants).
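The classification accuracy analysis can be expressed compactly as a two-by-two comparison of a dichotomized change count against diagnostic status. The sketch below, with illustrative names, computes sensitivity, specificity, and positive and negative predictive values under the assumption that a change count of two or more constitutes a positive test result.

```python
def classification_metrics(change_count, impaired, cutoff=2):
    """Sensitivity, specificity, PPV, and NPV for a dichotomized change count.

    change_count : per-participant 5th percentile change count (0-7)
    impaired     : boolean array, True for MCI/dementia, False for NL
    cutoff       : counts >= cutoff are treated as a "positive" test result
    """
    positive = change_count >= cutoff
    tp = (positive & impaired).sum()
    fp = (positive & ~impaired).sum()
    fn = (~positive & impaired).sum()
    tn = (~positive & ~impaired).sum()
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }
```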
In an exploratory analysis, the 10th percentile was also examined as a change score threshold for each measure to provide a less conservative estimate of cognitive decline. The results of the regression analyses using the 10th percentile change counts were nearly identical to those obtained with the 5th percentile change counts and are therefore not reported here.
Results
Sample characteristics
Table 1 displays the descriptive characteristics of each of the three diagnostic groups. Kruskal–Wallis H tests examined differences in age, years of education, and estimated premorbid intellect (ANART education-corrected standard scores) between the diagnostic groups. Distributions of age, years of education, and estimated premorbid intellect were similar for all groups, as assessed by visual inspection of boxplots. The diagnostic groups did not differ in years of education (H(2) = 3.628, p = .163). However, median age differed significantly between groups (H(2) = 6.524, p = .038); the MCI group was younger than the NL group but not significantly younger than the dementia group. In addition, the MCI and dementia groups showed lower premorbid intellect (H(2) = 36.280, p < .001) than the NL group.
Note: Only statistically significant differences between diagnostic groups, obtained via Kruskal–Wallis H tests, are shown.
Cognitive performance at baseline and follow-up
Table 2 displays cognitive performance at baseline and two-year follow-up for each diagnostic group. There appeared to be a slight practice effect for the NL group; these participants tended to perform slightly better in the follow-up visit across measures. However, there was less evidence of a practice effect for the other two groups; the MCI and dementia participants tended to perform at the same level or slightly worse in the follow-up visit.
Note: Kruskal–Wallis H Tests with Bonferroni corrections to account for multiple comparisons.
The distribution of change scores for the cognitively normal participants
Figure 2 displays the distribution of change scores for the Rey Auditory Verbal Learning Test (RAVLT) Delayed Recall trial. The median and modal change score on this task was zero; 16.7% of NL participants obtained a change score of 0 (i.e., recalled the exact same number of words on the delayed recall trial at baseline and follow-up). An improvement in RAVLT Delayed Recall performance at two-year follow-up (ranging from +1 to +15 words) was exhibited by 48.4% of NL participants, and 34.9% exhibited a decline in delayed recall (ranging from −1 to −14 words). Of those who exhibited a decline, 20 participants (4.98% of the sample) obtained change scores less than or equal to −7 (i.e., they recalled at least seven fewer words at follow-up than at baseline). Therefore, −7 served as the 5th percentile change score threshold for the RAVLT Delayed Recall subtest; individuals who obtained change scores of −7 or lower were classified as having exhibited a significant decline in performance. The change score distributions for the other six measures were examined similarly; as a second example, Figure 3 displays the change score distribution for the Category Fluency test.
Table 3 displays descriptive statistics for the change scores, as well as the change scores that corresponded to the 5th percentile threshold for each of the seven measures. Table 4 displays the associations between the 5th percentile change count and relevant demographic characteristics for the overall sample. The 5th percentile change count was not associated with age, gender, education, race/ethnicity, or premorbid intellect.
Note: Higher scores on the Trail Making Test indicate worse performance (i.e., a longer time to complete a timed task). Therefore, higher scores at 2-year follow-up indicate a decline in performance.
Note: Point-biserial correlations were used for the association between 5th Percentile Change Count and gender. * p = 0.01.
Comparing the number of substantial change scores across diagnostic groups
Figure 4 shows the 5th percentile change count across diagnostic groups for all seven cognitive measures. Among the NL participants, the great majority (75.3%) did not have any significant change scores below the 5th percentile thresholds across the seven measures, and one-fifth (20.4%) had only one significant change score. It was increasingly rare for the NL participants to have significant change scores across multiple variables; only 3.7% had a 5th percentile change count of exactly two, and only 0.5% had a change count of three or more. By comparison, the MCI participants demonstrated a slightly higher proportion of large declines in performance, with 22.3% having a 5th percentile change count of exactly one and 6.8% having exactly two. The dementia group had the highest proportion of participants with significant change scores at every level, with 27.5% having a 5th percentile change count of exactly one and 20.9% having exactly two.
Figure 5 displays the cumulative percentage of participants in each diagnostic group with significant change scores. A 5th percentile change count of one or more was relatively common for all three groups. A 5th percentile change count of two or more was rare for the NL group (4.2%), relatively rare for the MCI group (10.6%), but relatively common for the dementia group (38.6%). A 5th percentile change count of three or more was relatively rare for all three groups.
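The cumulative percentages shown in Figure 5 can be reproduced from per-participant change counts and group labels with a short reverse cumulative sum. The sketch below assumes pandas Series inputs with invented names and is provided only as an illustration of the computation.

```python
import pandas as pd

def cumulative_pct(change_count: pd.Series, group: pd.Series) -> pd.DataFrame:
    """Percentage of each diagnostic group with a change count >= k, for k = 0..7."""
    counts = pd.crosstab(group, change_count)                        # groups x observed counts
    counts = counts.reindex(columns=range(8), fill_value=0)          # ensure columns 0..7
    at_or_above = counts.iloc[:, ::-1].cumsum(axis=1).iloc[:, ::-1]  # reverse cumulative sum
    return 100 * at_or_above.div(counts.sum(axis=1), axis=0)
```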
The predictive value of the number of change scores for diagnostic status
Table 5 displays the parameter estimates for the logit predicting NL versus MCI group membership and the logit predicting NL versus dementia group membership. The NL group served as the reference category for pairwise comparisons of the odds ratios predicting MCI or dementia group membership. The fit of the stepwise multinomial logistic regression model containing age, gender, race/ethnicity, education, and premorbid intellect improved significantly with the addition of the 5th percentile change count (χ2(14) = 206.27, p < .001, Nagelkerke R2 = .23).
Note: For ANART errors, a higher number of errors indicates worse performance. Therefore, an odds ratio greater than 1.00 indicates a greater likelihood of lower premorbid intellect.
For the logit predicting NL versus MCI group membership, younger age (Exp(β) = .96, p < .001), male gender (Exp(β) = 1.58, p = .003), lower premorbid intellect (Exp(β) = 1.05, p < .001), and a higher 5th percentile change count (Exp(β) = 1.57, p < .001) were all associated with greater odds of MCI group membership. Education and race/ethnicity did not significantly predict NL versus MCI group membership. For the logit predicting NL versus dementia group membership, male gender (Exp(β) = 1.63, p = .029), lower premorbid intellect (Exp(β) = 1.07, p < .001), and a higher 5th percentile change count (Exp(β) = 3.48, p < .001) were associated with greater odds of dementia. In contrast, Black participants (Exp(β) = .08, p = .018) had lower odds of dementia relative to NL group membership. Age and education did not significantly predict NL versus dementia group membership.
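For readers who wish to fit a comparable model, the sketch below shows one way to estimate a multinomial logistic regression with NL as the reference outcome using statsmodels. The data frame layout and column names are hypothetical, and the stepwise entry procedure used in the study is not reproduced here.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

def fit_diagnosis_model(df: pd.DataFrame):
    """Multinomial logistic regression with NL as the reference outcome.

    Assumes a data frame with hypothetical columns: age, gender, race_ethnicity,
    education, anart_errors, change_count, and diagnosis ("NL", "MCI", "DEM").
    """
    y = pd.Categorical(df["diagnosis"], categories=["NL", "MCI", "DEM"])  # NL coded 0 (reference)
    X = pd.get_dummies(
        df[["age", "gender", "race_ethnicity", "education", "anart_errors", "change_count"]],
        columns=["gender", "race_ethnicity"], drop_first=True, dtype=float)
    X = sm.add_constant(X)
    model = sm.MNLogit(y.codes, X).fit(disp=False)
    odds_ratios = np.exp(model.params)  # one column per non-reference outcome (MCI, DEM)
    return model, odds_ratios
```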
Table 6 displays the classification matrix differentiating between the diagnostic groups using the 5th percentile change count. For this analysis, participants with MCI and dementia were combined into a single group representing all individuals with atypical cognitive decline (i.e., a “positive” diagnosis) in comparison to the NL participants (i.e., a “negative” diagnosis). In addition, the 5th percentile change count was dichotomized such that participants with two or more change scores below the 5th percentile served as the “positive” test result, and participants with zero or only one change score below the 5th percentile served as the “negative” test result. The dichotomized 5th percentile change count showed high specificity (96%) and high positive predictive value (85%) for differentiating between the diagnostic groups but low negative predictive value (47%) and low sensitivity (19%). The analysis thereby confirmed the findings illustrated in Figure 5: a 5th percentile change count of two or more might distinguish individuals with MCI or dementia from cognitively normal individuals, whereas a 5th percentile change count of zero or one does not distinguish well between the diagnostic groups.
Discussion
This study aimed to provide proof of concept for a novel, quickly accessible method for identifying meaningful changes in neuropsychological test performances over time. By examining the performances of a large sample of older adults diagnosed as cognitively intact, calculating their change scores across a two-year interval, and establishing the magnitude of change needed to be considered normatively rare for each measure, this method could allow clinicians to estimate whether an examinee’s performances are atypical in comparison to other older adults who present as clinically normal. Participants diagnosed as cognitively normal at baseline and at two-year follow-up served as the normative reference group for establishing the criteria by which abnormally large declines in performance were identified; these criteria were then validated in participants diagnosed with MCI at both time points, as well as in those who transitioned from cognitively normal to MCI or from MCI to dementia. Establishing base rates of multivariate cognitive change may help improve the value of neuropsychological evaluations in the diagnosis of neurological disease (Donders, 2020; Jak et al., 2009).
It was relatively common for participants to show a substantial decline on at least one of the seven cognitive measures in this study, regardless of diagnostic group. When assessed at baseline and two years later, roughly one-quarter of the participants diagnosed as cognitively normal at both time points showed one or more declines in performance that fell below the cut-off score corresponding to the 5th percentile in the distribution of change scores for each measure. Similarly, it was also common for the MCI participants (one-third of the sample) to have at least one change score below the 5th percentile threshold. Among the participants who converted from MCI to dementia, exhibiting a substantial decline in cognitive performance was the rule rather than the exception; it was more common for these individuals to exhibit at least one large decline in performance (two-thirds of the group) across the seven measures than it was to obtain change scores entirely above the 5th percentile thresholds. These findings lend further support to an established body of research demonstrating that it is common for cognitively intact and cognitively impaired individuals to exhibit at least one large change in performance over time when examining multiple neuropsychological tests. This phenomenon has been repeatedly shown in research employing reliable change methods (Binder et al., 2009; Brooks et al., 2016), highlighting the dangers of overinterpreting a decline in performance on a single measure.
Between-group differences in the 5th percentile change count became more apparent as the criterion moved from having one or more large declines in performance to having two or more large declines. It remained relatively uncommon for cognitively normal participants to have a 5th percentile change count of two or more (less than 5% of the group). The overwhelming majority of the cognitively normal participants (over 95%) either had no declines or only one decline that exceeded the 5th percentile thresholds across the seven measures. It was slightly more common for the MCI participants (about one-tenth of the group) to have a 5th percentile change count of two or more. Importantly, a large majority of the MCI participants (the remaining nine-tenths) still had no declines or only one decline in performance that fell below the 5th percentile thresholds, similar to the cognitively normal participants. The most notable between-group differences emerged when examining participants who converted from MCI to dementia. In this group, it was relatively common (over one-third) to exhibit at least two large declines in performance, a much larger proportion than among the participants who remained cognitively normal or MCI. These results indicate that having two or more large declines may be a useful criterion for distinguishing between the typical variability seen in clinically normal individuals and atypical cognitive decline.
One aspect of the study that warrants further investigation is the classification accuracy of the 5th percentile change count, particularly its low sensitivity for identifying participants in the two cognitively impaired groups. Classifying participants based on a 5th percentile change count of two or more yielded a high false negative rate: most of the participants in the combined MCI and dementia group obtained only one or no change scores exceeding the 5th percentile thresholds. One possible contributor to the low sensitivity of the change count metric may be floor effects. For example, on the RAVLT Delayed Recall test, the average baseline recall was seven words in the cognitively normal group and just over four words in the MCI group; in the group who converted to dementia, the average baseline recall was less than two words. The 5th percentile threshold for the RAVLT Delayed Recall test corresponded to recalling seven fewer words at follow-up than at baseline. Thus, a substantial portion of the participants in the two cognitively impaired groups could not exhibit declines large enough to exceed this threshold. A criterion of two or more change scores below the 5th percentile may therefore have missed many participants who actually performed substantially worse at follow-up but did not exceed the thresholds because their baseline scores were too near the floor, thereby contributing to the high false negative rate. Floor effects may limit the use of the 5th percentile change count in clinical settings. If an examinee’s score reaches the floor in a follow-up evaluation, a clinician could interpret their poor performance as a substantial decline by clinical judgment, without relying on base rates for that measure. However, if the examinee’s score is well above the floor, the 5th percentile threshold could be useful for distinguishing between typical and atypical cognitive change across an entire battery.
Although the 5th percentile change count demonstrated low sensitivity, it is important to note that it yielded a high positive predictive value (PPV), which is the more clinically relevant metric (Smith et al., 2008). Because it was rare for cognitively normal individuals to show two or more large declines in performance across seven neuropsychological measures, obtaining such a result carries high positive predictive value for atypical decline. Furthermore, the data collection procedures used in ADNI intentionally oversampled cognitively normal individuals relative to a clinical setting. Given the characteristics of individuals referred for clinical neuropsychological evaluations, the base rate of atypical cognitive decline is likely to be higher in clinical settings than in research settings. Because positive (and negative) predictive values are strongly influenced by base rates, a 5th percentile change count of two or greater would likely have an even larger PPV in clinical settings. Of course, a typical neuropsychological battery involves more than seven measures for which change scores could be calculated, and future research should explore how the 5th percentile change count changes with battery size. Nevertheless, these findings provide proof of concept that examining raw change scores in a large neuropsychological database can generate quickly accessible information that can be used to guide expectations about typical versus atypical cognitive change in individual patients.
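The dependence of predictive values on base rates can be made concrete with a short calculation. Using the sensitivity (.19) and specificity (.96) reported above, the sketch below computes the PPV of a change count of two or more at several hypothetical base rates of atypical decline; the base rates themselves are illustrative.

```python
def ppv(sensitivity, specificity, base_rate):
    """Positive predictive value given a test's sensitivity and specificity
    and the base rate (prevalence) of atypical decline in the tested population."""
    true_pos = sensitivity * base_rate
    false_pos = (1 - specificity) * (1 - base_rate)
    return true_pos / (true_pos + false_pos)

# Sensitivity (.19) and specificity (.96) are taken from the results above;
# the base rates are illustrative.
for rate in (0.25, 0.50, 0.75):
    print(f"base rate {rate:.0%}: PPV = {ppv(0.19, 0.96, rate):.2f}")
```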
Strengths, limitations, and directions for future research
A strength of this novel method is the ability to examine multivariate cognitive change. In contrast to reliable change and regression-based methods, in which each individual change score can only be examined in isolation, this whole-battery approach allows examiners to make inferences about changes across multiple measures at once. Although some prior research has examined multivariate approaches for assessing meaningful cognitive change (Cysique et al., 2011; Woods et al., 2006), these studies rely on modified reliable change methods and/or standard regression-based approaches that remain underused by clinicians. A unique aspect of this study is the use of multivariate cognitive change to distinguish between diagnostic groups; prior research on this topic has typically been restricted to cognitively normal samples and has lacked external clinical samples that would help validate findings (Woods et al., 2006). Another strength of the current study is that the change count was not confounded with the diagnostic criteria. The finding that individuals with stable MCI obtained fewer significant change scores than those who transitioned to dementia did not simply occur because cognitive performance was part of the diagnostic criteria used to define the groups in the first place (i.e., performance on WMS-R Logical Memory II), as this measure was not included among the seven change scores. The findings indicate that multivariate changes in cognitive performance across multiple cognitive domains may add value toward predicting diagnostic status in older adults.
The present study has several important limitations that influence the generalizability of the findings. First, the utility of this novel approach to identifying significant multivariate change is restricted by the limited neuropsychological test battery examined in this study. Presumably, larger batteries would produce a greater number of change scores that normatively fall below a 5th percentile threshold. Additionally, a two-year interval between baseline and follow-up evaluations was selected to maximize the number of available data points for participants converting to MCI or dementia; the change score thresholds demonstrated in this study are therefore less generalizable to neuropsychological evaluations performed over shorter or longer time periods. A second important limitation involves the population sampled. The present study was specifically designed to assist with the detection of abnormal cognitive aging, using a large sample of older adult research participants in ADNI as the normative reference. Therefore, the change score thresholds established in this study cannot be validly used to examine multivariate cognitive change in other populations where detecting cognitive change is of great interest, including healthy pediatric and adult populations as well as neurological populations in which a gradual decline is not necessarily the expected cognitive trajectory (e.g., post-surgical epilepsy, brain injury populations). Likewise, the predominantly non-Hispanic White sample across all of the ADNI protocols currently available hinders the generalizability of the findings to other racial/ethnic groups (Mindt et al., 2022). This is particularly relevant for detecting meaningful cognitive change in historically marginalized racial/ethnic groups who are at heightened risk for cognitive decline.
Each of the aforementioned limitations poses an opportunity for future research. The study should be replicated in a large, ethnically diverse sample to examine base rates of substantial declines in performance using a larger battery that more closely resembles the number of measures obtained in a typical neuropsychological evaluation. A larger test battery would likely require a larger 5th percentile change count to identify atypical cognitive change. This study focused on cognitive decline rather than increases in cognitive performance; future research should explore the extent to which this approach can be applied to neuromedically stable populations (Cysique et al., 2011) and other populations in which gradual improvement is a likely cognitive trajectory (e.g., mild traumatic brain injury). Future studies should also examine whether specific cognitive domains within a test battery can help distinguish between diagnostic groups. Previous work on reliable change in cognitively normal older adults has focused on memory (Binder et al., 2009; Brooks et al., 2007); the extent to which declines in other cognitive domains offer unique insights into typical and atypical cognitive change should be explored.
Conclusion
This study demonstrates how examining the multivariate distribution of change scores among cognitively normal older adults may provide normative information for identifying atypical cognitive change. Among older adults assessed over a two-year interval, it was statistically rare to have two or more change scores out of seven measures that fell below the 5th percentile of the distribution of change scores. Older adults who exhibit multivariate changes in performance that exceed these thresholds are likely experiencing atypical cognitive decline. More research is needed to validate this simple method of examining multivariate cognitive change.
Acknowledgments
This work was supported in part by the Florida Department of Health, Public Health Research, Biomedical Research Program.
Data collection and sharing for this project was funded by the Alzheimer’s Disease Neuroimaging Initiative (ADNI) (National Institutes of Health Grant U01 AG024904) and DOD ADNI (Department of Defense award number W81XWH-12-2-0012). ADNI is funded by the National Institute on Aging, the National Institute of Biomedical Imaging and Bioengineering, and through generous contributions from the following: AbbVie; Alzheimer’s Association; Alzheimer’s Drug Discovery Foundation; Araclon Biotech; BioClinica, Inc.; Biogen; Bristol-Myers Squibb Company; CereSpir, Inc.; Cogstate; Eisai Inc.; Elan Pharmaceuticals, Inc.; Eli Lilly and Company; EuroImmun; F. Hoffmann-La Roche Ltd and its affiliated company Genentech, Inc.; Fujirebio; GE Healthcare; IXICO Ltd.; Janssen Alzheimer Immunotherapy Research & Development, LLC.; Johnson & Johnson Pharmaceutical Research & Development LLC.; Lumosity; Lundbeck; Merck & Co., Inc.; Meso Scale Diagnostics, LLC.; NeuroRx Research; Neurotrack Technologies; Novartis Pharmaceuticals Corporation; Pfizer Inc.; Piramal Imaging; Servier; Takeda Pharmaceutical Company; and Transition Therapeutics. The Canadian Institutes of Health Research is providing funds to support ADNI clinical sites in Canada. Private sector contributions are facilitated by the Foundation for the National Institutes of Health (www.fnih.org). The grantee organization is the Northern California Institute for Research and Education, and the study is coordinated by the Alzheimer’s Therapeutic Research Institute at the University of Southern California. ADNI data are disseminated by the Laboratory for Neuro Imaging at the University of Southern California.
Competing interests
The authors have no conflicts of interest to declare.