Introduction
Alzheimer’s disease (AD) is a rising global concern, with approximately 50 million people living with dementia (Nichols et al., 2022). In the United States, AD presents additional social problems, as older Black adults are at twice the risk of being diagnosed compared to older White adults (Weuve et al., 2018). Given its degenerative course, significant focus has been placed on the early detection of dementia and AD. Evidence of cognitive decline compared to prior functioning is one of the essential criteria for diagnosing dementia (Hugo & Ganguli, 2014). In the absence of baseline cognitive testing, which is most often the case, estimates of premorbid cognitive functioning (PMIQ) help to determine the presence and magnitude of decline.
Previous work has indicated that reading ability is a better estimate of cognitive reserve than total years of education within racially diverse populations, as reading ability may reflect the quality of education received (Manly et al., 2002). Reading tests are also relatively resistant to the effects of brain injury and cognitive dysfunction, and as a result, they are often used to estimate PMIQ. A recent review of PMIQ assessments in those with mild and moderate dementia highlighted the need for more research demonstrating the psychometric properties of such assessments (Overman et al., 2021). This brief report addresses these gaps in the literature by providing psychometric evidence for a relatively new measure of PMIQ, Oral Reading Recognition (ORR) scores from the NIH Toolbox Cognition Battery (NIHTB-CB), in a diverse sample.
The NIHTB-CB is a recently developed computerized measure, now adapted to be administered via iPad, that assesses fluid and crystallized cognition (Weintraub et al., 2014). While iPad administration facilitates the economic accessibility of the NIHTB-CB for use with minority and underserved populations, the validity of test scores in this modality must still be established. Some evidence suggests the battery has clinical utility in assessing older adults (Hackett et al., 2018; Scott et al., 2019). Other research in healthy young and older adults has broadly reported adequate construct validity for the crystallized composite scores but poor to acceptable validity for fluid cognition scores (Buckley et al., 2017; Loring et al., 2019; Ott et al., 2022). While measures of crystallized cognition are expected to remain relatively stable over time in those with mild cognitive impairment (MCI) and measures of fluid cognition are expected to decline, Kairys and colleagues (2022) found a different pattern using the NIHTB-CB. In a sample of older Black adults, crystallized scores were more predictive of MCI than fluid scores (Kairys et al., 2022). Taken together, the extant literature generally supports the NIHTB-CB as an adequate measure of crystallized cognition with relatively poorer validity as a measure of fluid cognition. Nevertheless, the literature suggests a further need to investigate the validity and clinical utility of NIHTB-CB scores in racial minorities.
Performance on the ORR test from the NIHTB-CB may estimate PMIQ. However, no extant research demonstrates the validity of the NIHTB-CB ORR scores when administered on the iPad as measures of PMIQ in a large sample of Black and White participants with and without MCI. Validating performance on this instrument as a measure of PMIQ is essential because it may help to characterize baseline cognitive functioning. Additionally, valid tests of PMIQ that are easy to administer and economically accessible may help to alleviate racial disparities in AD as they may become part of more routine and affordable care.
Therefore, in a sample of older Black and White adults with and without MCI, we investigated whether scores on the NIHTB-CB ORR test were related to performance on the Wechsler Test of Adult Reading (WTAR), an established word recognition test that estimates PMIQ (Green et al., 2008). Convergent validity is the extent to which scores from a measure positively correlate with scores from a similar measure, whereas discriminant validity is the extent to which scores from a given measure do not correlate with scores from a dissimilar measure (Campbell & Fiske, 1959). As evidence for convergent validity, we hypothesized positive correlations between NIHTB-CB ORR scores and WTAR performance. We expected a null relationship between NIHTB-CB ORR and Flanker performance as confirmation of discriminant validity.
Methods
A total of 243 community-dwelling older adults were recruited across two sites: the Healthier Black Elders Center (Mitchell et al., 2020) at the Wayne State University Institute of Gerontology and the Michigan Alzheimer’s Disease Research Center (MADRC). Participants were co-enrolled in allied studies, and all recruitment and baseline measurements occurred between March 2017 and March 2022. The recruitment period was the same for both sites. Inclusion and exclusion criteria followed the MCI criteria established by the 2011 National Institute on Aging-Alzheimer’s Association workgroups (Albert et al., 2011). Participants were excluded if they had significant psychiatric, medical, or neurological conditions other than MCI that may have impaired cognitive ability. Participants were also excluded if they had physical limitations that precluded neuropsychological assessment. A consensus conference determined MCI diagnoses following National Alzheimer’s Coordinating Center guidelines outlined previously (Rahman-Filipiak et al., 2018). The MCI diagnostic group included all-cause MCI. All participants provided informed consent, and the University of Michigan Medical School and Wayne State University IRBs approved data collection. All procedures complied with the Declaration of Helsinki.
Neuropsychological assessments were completed at the MADRC. All participants received the National Alzheimer’s Coordinating Center (NACC) Uniform Data Set, Version 3 (UDS-3) assessments and locally added measures (e.g., the WTAR and the NIHTB-CB). Only baseline measurements were included in the current study. All testing occurred on the same day, and the WTAR preceded the NIHTB-CB.
The WTAR is a standardized reading test and may serve as a measure of PMIQ, as demonstrated by Green et al. (2008). A page of words with irregular pronunciations is presented, and the participant is asked to read each word aloud. The word difficulty increases throughout the list, and the examiner determines the number of correctly read items (Green et al., 2008).
Similarly, the NIHTB-CB ORR test is a standardized reading assessment designed to assess crystallized cognition, administered via iPad. Single, irregularly spelled words appear in the center of the iPad screen and are read aloud by the participant. Items are presented sequentially using computer adaptive testing (i.e., item difficulty is automatically adjusted according to participant performance). The NIHTB-CB Flanker test is a standardized assessment of inhibitory control and attention. It consists of trials in which a central arrow pointing left or right is presented on the iPad, flanked by arrows in a congruent or incongruent direction. The participant is asked to indicate the direction of the central arrow for each trial, and scores are determined by accuracy and response time (Weintraub et al., 2014).
Bayesian ANOVAs and chi-square tests of independence were used for demographic analyses. To evaluate the influence of demographic predictors on both NIHTB-CB and WTAR equally in our sample, we used uncorrected standard scores for all neuropsychological measures. Bayesian bivariate correlations were used to demonstrate convergent and discriminant validity. Intraclass correlation coefficients (ICC) were included as supplemental analyses. Bayesian analyses yield Bayes Factors (BF10) that estimate the relative support for the alternative and null hypotheses. As BF10 values exceed 3, they provide increasing evidence for rejecting the null hypothesis. When BF10 values fall below 1/3 and approach 0, they suggest evidence for accepting the null hypothesis. BF10 values between 1/3 and 3 indicate indeterminate support for either hypothesis (Kruschke, 2021). Analyses were conducted in JASP version 0.16.3 (JASP Team, 2022). Default priors were used: ANOVAs used the uniform distribution as the prior; the chi-square prior concentration was set to 1; correlations used a stretched beta prior width of 1.
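For illustration only, the minimal Python sketch below mirrors the interpretive logic described above: it computes a Pearson correlation with an accompanying Bayes factor and labels the BF10 against the 3 and 1/3 thresholds. The analyses reported here were run in JASP, not Python; the use of the pingouin library, its pg.corr output columns, and the simulated data are assumptions made for this example.

```python
# Illustrative sketch only (the reported analyses were run in JASP).
# Assumes the pingouin library is installed; the data below are simulated.
import numpy as np
import pingouin as pg


def interpret_bf10(bf10: float) -> str:
    """Label a Bayes factor using the thresholds described in the text."""
    if bf10 > 3:
        return "evidence for rejecting the null hypothesis"
    if bf10 < 1 / 3:
        return "evidence for accepting the null hypothesis"
    return "indeterminate support for either hypothesis"


rng = np.random.default_rng(0)
orr = rng.normal(100, 15, size=243)             # simulated ORR standard scores
wtar = 0.9 * orr + rng.normal(0, 6, size=243)   # simulated, strongly related WTAR scores

result = pg.corr(orr, wtar)                     # Pearson r, 95% CI, and BF10
bf10 = float(result["BF10"].iloc[0])            # pingouin reports BF10 as a string
print(result[["n", "r", "CI95%", "BF10"]])
print(interpret_bf10(bf10))
```

The helper function simply encodes the 3 and 1/3 cutoffs stated above and would apply unchanged to the BF10 values JASP reports.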
Results
Demographics
The sample’s sex and racial compositions were approximately 75% female and 63% Black. Age ranged from 55 to 88 years, with an average age of 71.4 (SD = 7.3). Years of education ranged from 8 to 20 years, with a mean of 15.7 years (SD = 2.5). Participants comprised two diagnostic groups: 130 were cognitively normal (CN), and 113 were diagnosed with MCI. This composition formed four race-by-diagnosis subgroups: 74 CN Black participants, 79 Black participants with MCI, 56 CN White participants, and 34 White participants with MCI.
Separate 2 × 2 Bayesian ANOVAs were conducted to determine whether age and years of education differed depending on cognitive status (CN vs. MCI) and race (Black vs. White). Results indicated that age did not differ depending on an interaction effect between race and cognitive status (BFM = 0.11), nor did age differ by race alone (BFM = 0.05). Unsurprisingly, there was a main effect indicating that age differed across cognitive status (BFM = 12.20). A post hoc Bayesian t-test showed that the MCI group was 2.7 years older than the CN group, corresponding to a small effect size (BF10 = 9.03, Median δ = 0.37, 95% CI [0.12, 0.62]).
Similarly, years of education did not differ depending on an interaction effect of race and cognitive status (BFM = 0.58). Evidence for differences in years of education between racial groups (BFM = 1.05) and cognitive status groups (BFM = 1.42) was inconclusive. Lastly, Bayesian chi-square tests of independence revealed no conclusive evidence that the proportion of MCI status was related to race (BF10 = 2.02).
Validity coefficients
Bayesian bivariate correlations in the entire sample of participants indicated that ORR scores were strongly positively associated with WTAR scores (r = .90, BF10 = 5.94 × 10⁸⁵, 95% CI [.87, .92]). ORR scores were not conclusively associated with Flanker test scores (r = .11, BF10 = 0.39, 95% CI [−.01, .24]). WTAR scores and Flanker test scores were also not conclusively correlated (r = .14, BF10 = 0.76, 95% CI [.01, .26]). ICCs were included as supplemental analyses and showed this same pattern of results (see Supplemental Material). Further correlational analyses showed that years of education were moderately positively correlated with performance on the WTAR (r = .47, BF10 = 5.38 × 10¹¹, 95% CI [.36, .56]) and the ORR test (r = .44, BF10 = 6.47 × 10⁹, 95% CI [.33, .53]). However, age was not associated with WTAR scores (r = .10, BF10 = 0.25, 95% CI [−.03, .22]) or with ORR test scores (r = .07, BF10 = 0.15, 95% CI [−.05, .20]). The null relations between age and WTAR scores and between age and ORR scores held for all subgroup analyses except when restricting the sample to White participants only, where weak relations emerged (r = .27, BF10 = 3.69, 95% CI [.07, .45] and r = .28, BF10 = 4.87, 95% CI [.08, .46], respectively).
Restricting analyses to cognitive status, racial, and race-by-diagnosis groups
The same Bayesian bivariate correlations were used when restricting the sample to each cognitive status group (CN and MCI), the racial groups (Black or White participants only), and the four race-by-diagnosis subgroups. Results are summarized in Table 1. These analyses were consistent with the validity coefficients found in the whole sample. The correlations between the neuropsychological assessment scores and demographic covariates are found in Table 2.
Note to Table 1: *BF10 > 10, **BF10 > 30, ***BF10 > 100.
Note to Table 2: †BF10 > 3, *BF10 > 10, **BF10 > 30, ***BF10 > 100.
Group differences
Bayesian correlation analyses estimate the correlation parameter and the interval containing the 95% most credible values for that parameter. Comparisons across subgroups of the data are therefore possible, as nonoverlapping 95% CIs indicate different correlation parameters.
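As a concrete illustration of this decision rule, the short sketch below checks whether two 95% credible intervals overlap. The whole-sample interval is taken from the results above; the subgroup interval is a hypothetical placeholder rather than a value from Table 1.

```python
# Sketch of the nonoverlap rule used to compare subgroup correlation parameters.
def credible_intervals_overlap(ci_a, ci_b):
    """Return True if two credible intervals (low, high) share any values."""
    return max(ci_a[0], ci_b[0]) <= min(ci_a[1], ci_b[1])


whole_sample_ci = (0.87, 0.92)   # WTAR-ORR 95% CI reported for the whole sample
subgroup_ci = (0.80, 0.91)       # hypothetical subgroup CI, for illustration only

if credible_intervals_overlap(whole_sample_ci, subgroup_ci):
    print("Overlapping CIs: no evidence that the correlation parameters differ.")
else:
    print("Nonoverlapping CIs: the correlation parameters likely differ.")
```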
Comparing the correlation parameter estimates for the relationship between WTAR scores and ORR scores revealed that the 95% CIs for the whole sample overlapped with those of the two diagnostic groups and the four race-by-diagnosis subgroups. Thus, the relationship between WTAR performance and ORR scores is consistent across cognitive status, race, and race-by-diagnosis subgroups, suggesting convergent psychometric validity (see Table 1). Further, this pattern of overlapping 95% CIs held across the correlation estimates between ORR scores and Flanker scores when restricting the sample to diagnostic and race-by-diagnosis groups. These results provide evidence of construct validity for the NIHTB-CB ORR as a measure of PMIQ in older Black and White adults with and without MCI.
Discussion
In support of our hypotheses, the present study indicated that in a sample of older Black and White adults who were either CN or diagnosed with MCI, scores on the NIHTB-CB ORR test were strongly positively correlated with an established neuropsychological measure of PMIQ and unrelated to Flanker test scores. These results provide evidence of convergent and discriminant psychometric validity, respectively, for the ORR scores as a measure of PMIQ in this population. The validity coefficients remained consistent even when restricting the whole sample to individual cognitive status groups, racial groups, and race-by-diagnosis subgroups.
Age was uncorrelated with scores on the WTAR or the ORR test in this group of participants aged 55 years and older, except when restricting the sample to White participants only, where weak correlations were found. Expectedly, years of education was moderately correlated with both WTAR scores and NIHTB-CB ORR test performance. The current findings are consistent with previous research on reading ability as a better predictor of neuropsychological functioning than years of education (Manly et al., 2005). The present findings also extend the previous research that suggested adequate construct validity for the NIHTB-CB crystallized measures by using an established language functioning measure and expanding the sample to a clinical context (Buckley et al., 2017; Ott et al., 2022).
To our knowledge, this study is the first to have validated NIHTB-CB ORR test scores as a measure of PMIQ in a racially diverse sample of CN older adults and older adults with MCI. The current findings show that NIHTB-CB ORR test scores are valid for estimating the WTAR scores of healthy aging adults and those with MCI. Importantly, these findings demonstrate that NIHTB-CB ORR test performance functions equally well across racial and race-by-diagnosis subgroups, likely tapping into the same cognitive domain. Given Black Americans’ disproportionate risk of developing AD (Weuve et al., 2018), validating accessible and practical neuropsychological assessments is an essential step forward for early AD detection. The present findings address these concerns and demonstrate that performance on the iPad NIHTB-CB may be used to estimate baseline cognitive functioning in the context of age-related cognitive decline in older, community-dwelling adults.
This study had several limitations. Sampling bias may play a role in interpreting the results, given that our sample was more female than the general population. Including another paper-and-pencil measure would have bolstered the evidence of discriminant validity. MCI participants were all-cause type, which could obscure differences in the relationship between ORR and WTAR scores across MCI subtypes. Nevertheless, the present study establishes robust validity coefficients (in the .80–.90 range), demonstrating the ability of NIHTB-CB ORR scores to estimate PMIQ.
Supplementary material
The supplementary material for this article can be found at https://doi.org/10.1017/S1355617723000425
Acknowledgments
The authors have no conflicts of interest to disclose. This research was partially supported by funding from NIH/NIA grant P30 AG053760 to the MADRC (PI, H. Paulson) and from NIA/NIH, R01 AG 054484 (PI, V. Kavcic).