As of 2020, over 55 million people worldwide were living with dementia. This number is expected to almost double every 20 years, reaching approximately 139 million by 2050 (World Health Organization, 2022). Projections suggest that over 700,000 older Australians will meet criteria for dementia by 2050, one-third of whom will come from culturally and linguistically diverse backgrounds (Australian Institute of Health & Welfare, 2018). Research has consistently reported that many English language neuropsychological measures and normative data are prone to misclassification when applied to culturally and linguistically diverse groups (Daugherty et al., 2017; Heaton et al., 2003). Poor test specificity has been attributed to cultural and linguistic heterogeneity, level and quality of education, use of unrepresentative normative data, culturally biased test content, and test-taking attitudes (Rivera Mindt et al., 2010; Shuttleworth-Edwards, 2016). These issues have been extensively examined within minority groups in the United States, including African Americans and Hispanics, where healthy individuals were over three times more likely to be misclassified as impaired (Heaton et al., 2003). In response, demographically focused normative data have been developed across a range of culturally diverse populations (Nielsen et al., 2018; Pienaar et al., 2016).
While it is anticipated that normative data specifically developed for ethnic minority groups will improve the validity of neuropsychological assessment, surprisingly little empirical research has examined this issue. The Neuropsychological Norms for the U.S.-Mexico Border Region in Spanish Project (NP-NUMBRS) reported that their adapted tests and normative data more accurately classified healthy Spanish-speaking individuals and yielded expected rates of impairment for a majority of measures compared to published normative data for English-speaking non-Hispanic Whites and Blacks in the United States (Díaz-Santos et al., 2021; Marquine et al., 2020; Marquine et al., 2021; Scott et al., 2021; Suarez et al., 2021). However, further research across multiple international contexts is required to ascertain whether the use of adapted tests and minority group reference normative data indeed reduces misclassification rates in healthy culturally diverse individuals and improves diagnostic accuracy across a range of neurocognitive disorders.
In the absence of valid and reliable neuropsychological measures, accurate assessment of cognitive disorders within ethnic minority and/or non-English-speaking groups remains a challenge for the global neuropsychological community, including in Australia (Blakemore et al., 2018; Franzen et al., 2021; Rivera Mindt et al., 2010; Wallace et al., 2018). Similar to other developed nations, illiteracy rates in Australia are very low and the average level of education is high (Australian Institute of Health and Welfare, 2021). In contrast, a large proportion of aging Australian ethnic minorities attained lower levels and a poorer quality of education prior to immigrating, relative to their Australian-born elderly counterparts (Fratti et al., 2011; Jupp, 2001; Noutsos, 2003; Plitas et al., 2009). A case in point is elderly Greek Australian immigrants, who represent one of the largest aging ethnic minority groups in Australia, totaling approximately 63,000 individuals aged 65 years and older, with over 40,000 having attained a primary level of education or less (Australian Bureau of Statistics, 2016; Fanany & Avgoulas, 2019; Staios, 2022). This matters because performance on most neuropsychological tests is heavily influenced by years (and quality) of education (Shuttleworth-Edwards, 2016).
However, older low-educated immigrant groups are underrepresented in a majority of normative studies, making it extremely difficult to interpret what constitutes a normal performance on several neuropsychological measures (Kosmidis, 2017; Nielsen & Waldemar, 2021). Given the noted limitations of using English language tests and normative data, we adapted test content and developed normative data for a broad range of neuropsychological measures, including general intelligence (i.e., Wechsler Adult Intelligence Scale-Fourth Edition [WAIS-IV] Greek Adaptation; Wechsler, 2014), verbal and visual memory tests, language and naming, and executive functioning measures for use with Greek Australian older adults (Staios et al., 2023a, b). The first aim of this study was to compare our newly established Greek Australian normative data with existing published English language normative data in terms of the impairment rates they yield within a healthy Greek Australian older adult sample. We hypothesized that using Greek Australian normative data would produce lower impairment rates, while English language normative data would produce significantly higher rates of impairment. The second aim of the study was to examine whether Greek Australian normative data could yield sensitive and specific cut scores to distinguish healthy Greek Australians from Greek Australians with a diagnosis of Alzheimer’s disease (AD).
Method
Participants
Healthy participants were recruited using convenience sampling from several Greek social clubs throughout the Melbourne metropolitan area. To be eligible, participants had to be aged 70–85 years, literate, and immigrants from Greece, with Greek as their dominant language. The exclusion criteria were: (i) a score below 22/30 on the Mini-Mental State Examination (MMSE; Fountoulakis et al., 2000; Plitas et al., 2009); (ii) a score of ≥ 0.5 on the Clinical Dementia Rating Scale (CDR; Hughes et al., 1982); (iii) a history of serious neurological or psychiatric conditions known to impact cognition; (iv) a score of ≥ 6/15 on the Geriatric Depression Scale-15 (GDS-15; Fountoulakis et al., 1999) or a score of ≥ 8/20 on the Geriatric Anxiety Inventory (GAI; Pachana et al., 2007); (v) evidence of long-term alcohol or substance abuse; and (vi) uncorrected sensory and/or motor deficits.
Participants with AD were recruited via specialist outpatient clinics throughout the Melbourne metropolitan area. Inclusion and exclusion criteria were the same as for healthy participants, with the additional requirement of a probable diagnosis of AD (the MMSE and CDR exclusion criteria therefore did not apply). The clinical diagnosis of AD was established using the following methods: an interview with the patient and (when possible) an informant; a neurological, physical, and psychiatric examination; cognitive screening with the Greek version of the MMSE or the Neuropsychiatry Unit Cognitive Assessment Tool (NUCOG; Walterfang et al., 2003); a score of ≥ 1 on the CDR; laboratory screening with blood tests; and structural brain imaging with computerized tomography or magnetic resonance imaging, and/or functional brain imaging with single-photon emission computed tomography. Diagnoses were based on evidence from all clinical results, in conjunction with the established diagnostic criteria for neurocognitive disorders described in the Diagnostic and Statistical Manual of Mental Disorders, Fifth Edition (DSM-5; American Psychiatric Association, 2013) and the diagnostic research criteria for AD (Dubois et al., 2007).
Measures
The measures presented in the following section were previously administered in the Greek language and normed for use with Greek Australian older adults, as described in detail elsewhere (Staios et al., 2023a, b). To compare performances between normative data sets, we selected comparable Australian tests and normative data routinely used in clinical practice for the assessment of older adults. In the absence of local Australian normative studies, we employed tests and normative data from the United States and Canada, which are also routinely used for the assessment of older adults in Australia. Throughout this paper, the term English language normative data refers to existing published normative data developed for the majority English-speaking population, including normative data published in test manuals and research papers.
Cognitive screener
Mini-Mental State Exam – Greek Adaptation (MMSE; Fountoulakis et al., 2000). Previous research has established that using the recommended Greek national MMSE cutoff of < 24/30 has resulted in large proportions of healthy Greek Australian older adults falling within the impaired range (i.e., 35% scored below the MMSE cutoff score of < 24/30; Plitas et al., 2009). Therefore, we selected an MMSE cutoff of < 22/30, which has been recommended when screening low-educated ethnic minority populations (Kochhann et al., 2010; Pedraza et al., 2011).
General intelligence
The Wechsler Adult Intelligence Scale-Fourth Edition, Greek Adaptation (WAIS-IV GR; Wechsler, 2014). The Greek version of the WAIS-IV was adapted from the original version of the WAIS-IV (Wechsler, 2008) and includes the same subtests, items, and score structure with only minor cultural adaptations in item content/scoring. It includes 10 core and five supplemental subtests and yields a full-scale IQ (FSIQ) and four index scores (verbal comprehension index, VCI; perceptual reasoning index, PRI; working memory index, WMI; and processing speed index, PSI). English language normative data used to calculate standard performances were derived using US data (Wechsler, 2008).
New learning & memory
The Hopkins Verbal Learning Test – Revised Form 5 (HVLT-R; Brandt & Benedict, 2001). The HVLT-R is a verbal memory task consisting of 12 words from three semantic categories (professions, food, and sports) presented over three trials, with correct words summed to give a total score (range = 0–36). Following a 30-min interference interval, participants are required to recall as many words from the list as possible (range = 0–12), followed by a yes/no recognition trial. The recognition component has a total of 24 words: 12 target words, 6 semantically related words, and 6 semantically unrelated words. English language normative data used to calculate standard performances were derived using Australian data (Hester et al., 2004).
Greek Story Memory Test (Kosmidis et al., 2012). The Greek Story Memory Test assesses narrative memory under a free recall condition. A short story about a couple taking a holiday on a Greek island is presented over two trials, with immediate recall after each trial (range = 0–32). After a 30-min interference interval, participants are required to recall as much of the story as possible (range = 0–16), followed by a 10-question recognition trial (range = 0–10).
Wechsler Memory Scale - Fourth Edition (WMS-IV; Wechsler, 2009) Visual Reproduction. WMS-IV Visual Reproduction assesses memory for visual stimuli. Five pages of designs are presented in turn, for 10 s each. After each page is presented, the examinee is asked to draw the designs (range = 0–43). After a 30-min interference interval, participants are asked to draw as many designs as they can remember, in any order (range = 0–43). Finally, a recognition trial is administered in which the examinee is required to select the target design from among six options (range = 0–7). English language normative data used to calculate standard performances were derived using US data (Wechsler, 2009).
Language & naming
Greek Naming Test (Kosmidis et al., 2012). The Greek Naming Test requires participants to name a series of 40 black and white pictures of common objects, giving a range of scores from 0 to 40 points.
Executive functions
Color Trails Test (CTT; D’Elia et al., 1996). The CTT is a nonalphabetical version of the Trail Making Test (Reitan, 1955) that was developed for cross-cultural populations. In CTT 1, participants are required to connect numbered circles in ascending order. In CTT 2, participants are required to switch between pink and yellow colors while connecting numbers in ascending order (i.e., pink 1, yellow 2, pink 3, yellow 4, and so on). We recorded the completion time for both CTT 1 and CTT 2 in seconds. We also recorded number errors, color errors, self-corrected errors, prompts, and the interference index ([CTT 2 time − CTT 1 time]/CTT 1 time). English language normative data used to calculate standard performances were derived using US data (D’Elia et al., 1996).
Semantic Verbal Fluency (Kosmidis et al., 2004). Participants were required to generate as many different animal names and ‘things you can buy in a supermarket’ as possible within 1 min. English language normative data used to calculate standard performances were derived using Australian data (Wardill et al., 2009).
Victoria Stroop Test (Regard, 1981). The Victoria Stroop Test consists of three A4 cards, each containing six rows of four items. In Part D (Dots), participants must name as quickly as possible the color of 24 dots printed in blue, green, red, or yellow. In Part W (Words), the dots are replaced by common words (when, hard, and over), and participants are required to name the colors in which the words are printed. In Part C (Colors), the stimuli are the color names (blue, green, red, and yellow) printed in a color that does not correspond to the name of the color (e.g., “red” is written in blue ink). For each condition, the scores are the time of completion and the number of uncorrected errors. An Interference Index was also calculated (Colors time/Dots time). English language normative data used to calculate standard performances were derived using Canadian data (Troyer et al., 2006).
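The two interference indices described above for the CTT and the Victoria Stroop Test are simple ratios of completion times. The following is a minimal sketch, assuming the CTT index is read as (CTT 2 time − CTT 1 time)/CTT 1 time and the Stroop index as Colors time divided by Dots time. The function names and the example times are hypothetical and are not taken from the study data.

```python
def ctt_interference_index(ctt1_seconds: float, ctt2_seconds: float) -> float:
    """Relative slowing on CTT 2 compared with CTT 1 (assumed reading of the index)."""
    if ctt1_seconds <= 0:
        raise ValueError("CTT 1 completion time must be positive")
    return (ctt2_seconds - ctt1_seconds) / ctt1_seconds


def stroop_interference_index(dots_seconds: float, colors_seconds: float) -> float:
    """Ratio of Part C (Colors) time to Part D (Dots) time."""
    if dots_seconds <= 0:
        raise ValueError("Dots completion time must be positive")
    return colors_seconds / dots_seconds


# Hypothetical completion times in seconds, for illustration only.
print(ctt_interference_index(60.0, 150.0))    # 1.5
print(stroop_interference_index(20.0, 50.0))  # 2.5
```

Under this reading, a CTT index of 1.5 means CTT 2 took 150% longer than CTT 1, and a Stroop index of 2.5 means the Colors card took two and a half times as long as the Dots card.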
Procedure
The study was approved by the Monash University Human Research Ethics Committee and was conducted in accordance with the Declaration of Helsinki. All participants volunteered and did not receive financial compensation. Participants were assessed individually in a quiet, distraction-free room in their homes. Home visits were offered as a means of maximizing recruitment potential and participant comfort, and of reducing the cost and burden of travel. The first author contacted nine Greek social clubs throughout the Melbourne metropolitan area and was subsequently invited to present information to attendees relating to the normative studies (Staios et al., 2023a, b). Following a general introduction, individuals who showed interest in participation were provided with a plain language statement and allowed to ask additional questions. Those interested in participating provided contact details and a time was arranged to complete the assessment battery. AD participants and their next of kin were informed about the present study by their treating physician(s) during routine clinic attendance and were provided with a plain language statement. Individuals interested in taking part in the study consented to being contacted by the research team to complete the assessment battery. All participants provided informed consent. In instances where individuals with AD were unable to provide informed consent, their next of kin provided consent on their behalf.
Prior to completing cognitive assessment measures, all participants were screened using the Greek language adaptations of the MMSE, GDS-15, and GAI. They also underwent a semi-structured clinical interview, providing information regarding demographics (e.g., date of birth, education level, years in Australia), medical history (e.g., history of head injury with loss of consciousness, psychiatric or neurological diagnoses), and cultural factors (e.g., age of immigration, English/Greek and/or other language skills). Healthy participants also provided information regarding self-reported changes in activities of daily living and memory.
The data presented were collected between May 2018 and February 2020. All tests were administered in Greek by two registered bilingual Greek-English-speaking clinical neuropsychologists. For healthy participants, testing occurred over two sessions, lasting approximately 2 hr in total. Healthy participants completed the test battery reported in the Measures section. AD participants completed a shortened test battery, consisting of the following tests: WAIS-IV GR selected subtests (Block Design, Matrix Reasoning, Similarities, Digit Span); HVLT-R; WMS-IV selected subtest (Visual Reproduction); Greek Story Memory Test; Greek Naming Test; Semantic Verbal Fluency (Animals and Supermarket); and the CTT. For AD participants, testing occurred over one session and lasted approximately 1 hr.
Data analysis
Statistical analyses were performed using the Statistical Package for the Social Sciences (SPSS), Version 22.0. Independent samples t tests were used to examine group differences between healthy and AD participants on age, years of education, time in Australia, MMSE, CDR, GDS-15, and GAI scores. The chi-square test of independence was used to examine group differences regarding the distribution of sex. Cohen’s d was used to calculate effect sizes, with d values of 0.2, 0.5, and 0.8 interpreted as small, medium, and large, respectively (Cohen, 1988).
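As an illustration of the effect size used for the group comparisons, the following is a minimal sketch of Cohen’s d for two independent groups. It assumes the pooled standard deviation variant, which the text does not specify, and the function name and scores are hypothetical rather than study data.

```python
import math
import statistics


def cohens_d(group_a, group_b):
    """Cohen's d for two independent groups, using the pooled standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = (
        (n_a - 1) * statistics.variance(group_a)
        + (n_b - 1) * statistics.variance(group_b)
    ) / (n_a + n_b - 2)
    return (statistics.mean(group_a) - statistics.mean(group_b)) / math.sqrt(pooled_var)


# Hypothetical screening scores for two small groups, for illustration only.
healthy_scores = [27, 28, 26, 29, 25]
ad_scores = [18, 20, 17, 21, 19]
print(round(cohens_d(healthy_scores, ad_scores), 2))  # 5.06
```

Against Cohen’s conventional benchmarks (0.2, 0.5, 0.8), a value this size would be a very large effect. The toy data were chosen only to make the arithmetic easy to follow.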
Comparison of standardized performances and impairment rates using Greek Australian versus English language normative data
Standardized performances and impairment rates yielded by the newly developed Greek Australian normative data and by English language normative data were compared only within healthy participants (N = 90). All test raw scores were converted to age-adjusted (and, where available, education-adjusted) scaled scores. A series of paired samples t tests were used to compare differences between normative data sets on neuropsychological measures within healthy participants. An alpha level of less than .05 two-tailed was considered statistically significant. Cohen’s d was again used to calculate effect sizes.
The number of participants classified as impaired versus intact was calculated separately using Greek Australian normative data and English language normative data. For both normative data sets, a performance of 1.5 standard deviations below the mean was used to indicate impairment. The proportion of the Greek Australian sample scoring 1.5 standard deviations below the mean was compared to the theoretically expected proportion of 7% of the sample that should fall within the impaired range. Impairment rates obtained using Greek Australian versus English language normative data were compared using McNemar’s tests (McNemar, 1947).
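The impairment criterion and the paired comparison described above can be sketched as follows. This is an illustrative example rather than the authors’ SPSS procedure. It assumes scaled scores with a mean of 10 and a standard deviation of 3, treats scores at or below the cutoff as impaired, and uses the exact (binomial) form of McNemar’s test on the discordant pairs. All names and scores are hypothetical.

```python
import math
from statistics import NormalDist

# Proportion of a normal distribution falling 1.5 SD or more below the mean.
expected_impaired = NormalDist().cdf(-1.5)
print(round(expected_impaired, 4))  # 0.0668, i.e. roughly 7%


def is_impaired(scaled_score, mean=10.0, sd=3.0, criterion_sd=1.5):
    """True if the score is at or below mean - criterion_sd * sd (assumed metric)."""
    return scaled_score <= mean - criterion_sd * sd


def mcnemar_exact(b, c):
    """Two-sided exact McNemar p value from the two discordant-pair counts."""
    n = b + c
    if n == 0:
        return 1.0
    tail = sum(math.comb(n, i) for i in range(min(b, c) + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Hypothetical (English-norm, Greek Australian-norm) scaled scores per participant.
scores = [(4, 9), (5, 10), (8, 11), (3, 8), (6, 10), (5, 5), (9, 12), (4, 10)]
b = sum(is_impaired(e) and not is_impaired(g) for e, g in scores)  # English norms only
c = sum(is_impaired(g) and not is_impaired(e) for e, g in scores)  # Greek norms only
print(b, c, mcnemar_exact(b, c))  # 4 0 0.125
```

Only participants classified differently by the two normative data sets contribute to the test. The participant scoring 5 under both sets is impaired either way and therefore drops out.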
Sensitivity and specificity of AD-related cognitive impairment detection within older Greek Australians
A series of independent samples t tests were first conducted to compare raw score performances between healthy and AD participants on all neuropsychological measures. An alpha level of less than .05 two-tailed was considered statistically significant. Cohen’s d was again used to calculate effect sizes for all neuropsychological measures. A receiver operating characteristic (ROC) curve analysis was used to identify optimal cutoff scores and examine the sensitivity and specificity of the neuropsychological tests, using clinical AD diagnosis as the reference standard. Overall discriminative accuracy was summarized by the area under the curve (AUC), with a 95% confidence interval. AUC values were calculated for the overall sample and interpreted as poor (< 0.7), acceptable (0.7–0.8), excellent (0.8–0.9), or outstanding (> 0.9) (Hosmer et al., 2013). Cut scores for each measure were selected to correspond to the optimal balance between sensitivity and specificity. Sensitivity and specificity of 5th and 10th percentile cut scores, according to age-adjusted normative data for each measure, were also calculated.
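The ROC-based steps above can be illustrated with a short sketch. It assumes that lower scores indicate impairment, computes the AUC as the probability that a randomly chosen AD participant scores below a randomly chosen healthy participant (ties counted as one half), and takes the "optimal balance" to be the cutoff maximizing Youden’s J (sensitivity + specificity − 1). The text does not name the optimization criterion, so Youden’s J is an assumption, and the scores are hypothetical.

```python
def sens_spec(healthy, ad, cutoff):
    """Sensitivity and specificity when scores <= cutoff are classified as impaired."""
    sensitivity = sum(s <= cutoff for s in ad) / len(ad)
    specificity = sum(s > cutoff for s in healthy) / len(healthy)
    return sensitivity, specificity


def auc(healthy, ad):
    """Rank-based AUC: P(AD score < healthy score), with ties counted as 0.5."""
    wins = sum((a < h) + 0.5 * (a == h) for a in ad for h in healthy)
    return wins / (len(ad) * len(healthy))


def best_cutoff(healthy, ad):
    """Observed score maximizing Youden's J (lowest such score on ties)."""
    candidates = sorted(set(healthy) | set(ad))
    return max(candidates, key=lambda c: sum(sens_spec(healthy, ad, c)) - 1)


# Hypothetical raw scores on a single measure, for illustration only.
healthy = [22, 25, 19, 27, 24, 21, 26, 23]
ad = [12, 15, 20, 10, 18, 14]

cutoff = best_cutoff(healthy, ad)
print(round(auc(healthy, ad), 3))              # 0.979
print(cutoff, sens_spec(healthy, ad, cutoff))  # 20 (1.0, 0.875)
```

A percentile-based cut score can be evaluated with the same `sens_spec` function by passing the raw score that corresponds to the 5th or 10th percentile in the relevant normative table.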
Results
Participants
One hundred and one healthy participants were initially recruited. Five participants were excluded from the study; three scored below cutoff for cognitive impairment on the Greek version of the MMSE; two scored above the cutoff for depression on the GDS-15. A further six participants withdrew due to time constraints as a result of caregiving duties or upcoming medical procedures. The final healthy sample consisted of 90 community-dwelling individuals. A total of 24 participants with a diagnosis of AD were invited to take part in the study. Four participants withdrew and did not complete testing due to a decline in physical and/or mental health. The final sample consisted of 20 participants with a confirmed diagnosis of AD.
Demographic data for the healthy and AD participants are presented in Table 1. In summary, AD and healthy participants did not differ with respect to age, years of education, sex, or years in Australia. In contrast, there was a significant difference between healthy and AD participants on the MMSE, with the latter group achieving significantly lower scores. All AD participants were classified as moderately impaired on the CDR, while healthy participants fell within normal limits. AD participants endorsed more symptoms of depression than healthy participants; however, no participant in either group reported clinically significant levels of depression or anxiety.
AD = Alzheimer’s disease, CDR = Clinical Dementia Rating Scale, GDS = Geriatric Depression Scale, GAI = Geriatric Anxiety Inventory, MMSE = Mini-Mental State Exam, n = number, SD = standard deviation.
Comparison of standardized performances and impairment rates calculated using Greek Australian versus English language normative data
Standardized performances of our healthy sample, calculated separately using Greek Australian and English language normative data, are presented in Table 2. In summary, mean performances yielded using English language normative data were significantly lower than those obtained using Greek Australian normative data for all tests, with medium to large effect sizes noted. When using English language normative data, mean performances generally fell one to two standard deviations below the mean.
d = effect size, IQR = interquartile range, M = mean, SD = standard deviation.
Rates of impairment across neuropsychological tests, calculated separately using Greek Australian and English language normative data and corresponding to a 1.5 standard deviation criterion, are presented in Figure 1 (WAIS-IV subtests) and Figure 2 (all other neuropsychological measures). Impairment rates derived from the Greek Australian normative data generally fell within the expected 7% range. In contrast, impairment rates derived using English language normative data were significantly higher for all tests (except for the HVLT-R Discrimination Index) and ranged from 11% to 66% (all ps < .05).
Sensitivity and specificity of AD-related cognitive impairment detection within older Greek Australians
Raw neuropsychological test data for all tests are presented separately for healthy and AD participant groups in Table 3. In summary, mean scores for the AD group were significantly lower than those for the healthy group (all ps < .05), with all tests displaying medium to large effect sizes. Results from the ROC curve analyses are presented in Table 4, with AUC ranging from .721 to .999 across tests. More specifically, outstanding discrimination (AUC > .90) was noted for Block Design and Digit Span (all conditions), Hopkins Verbal Learning Test-Revised (Total Score, Delayed Recall, Percentage Retained, Discrimination Index), Greek Story Memory Test (all conditions), Visual Reproduction II, and Semantic Fluency (all conditions). Excellent discrimination (AUC > .80) was noted for Matrix Reasoning, Hopkins Verbal Learning Test-Revised (Learning), Visual Reproduction I, and Visual Reproduction Recognition. Acceptable discrimination (AUC > .70) was noted for the Similarities subtest and CTT (all conditions). At the optimal cut scores, sensitivity ranged from .450 to 1.00, with a mean sensitivity of .956 across all measures, and specificity ranged from .511 to .978, with a mean specificity of .844 across all measures. Sensitivity and specificity of 5th percentile and 10th percentile cut scores are also presented in Table 4. Across measures, sensitivity for these percentile-based cut scores ranged from .200 to 1.00, with a mean sensitivity of .692, and specificity ranged from .856 to 1.00, with a mean specificity of .926.
AUC = area under the curve, ROC = receiver operating characteristic, SN = sensitivity, SP = specificity.
Discussion
This study aimed to compare standard performances and impairment rates of a healthy Greek Australian older adult sample calculated using either Greek Australian or English language normative data. We also sought to examine whether cut scores could be identified that sensitively and specifically distinguish healthy Greek Australians from those with a diagnosis of AD. Consistent with previous research in other ethnic minority groups (Heaton et al., 2003), the application of English language normative data within our Greek Australian sample resulted in significantly lower performances across all tests and higher impairment rates, with medium to large effect sizes noted across most measures. In contrast, when Greek Australian normative data were used, performances yielded typical impairment rates and distributions reflecting expected interindividual variability within a healthy population (Schretlen et al., 2003), with approximately 7% of the sample falling below the 1.5 standard deviation cutoff. While statistical comparisons of education between standardization samples were not possible, the level of education observed within the English language normative samples was substantially higher than that of the Greek Australian sample, a factor that most likely contributed to the magnitude of the current findings. Similar findings have been noted in previous research, where the application of English language normative data to culturally diverse individuals with ≤ 6 years of education resulted in high misclassification rates (Cherner et al., 2007).
Consistent with previous research, the present findings demonstrate that the use of representative normative data improved the classification performance of the cognitive measures, accurately classifying this group of educationally disadvantaged older adults and reducing rates of misclassification (Díaz-Santos et al., 2021; Marquine et al., 2020; Marquine et al., 2021; Scott et al., 2021; Suarez et al., 2021).
We found that the application of English language normative data resulted in significantly lower mean scores, and over 57% of healthy participants were classified as impaired on all WAIS-IV verbal subtests. In light of previous research findings, it was not surprising that the use of nonrepresentative normative data resulted in high impairment rates on verbal subtests given the level of education observed within the present sample (Razani et al., 2007; Shuttleworth-Edwards et al., 2004). In contrast, while mean scores on WAIS-IV nonverbal subtests were marginally higher (except for Matrix Reasoning) than on verbal subtests, the use of English language normative data still underestimated performances, and impairment rates remained unacceptably high. While past research has established that nonverbal measures are also prone to the effects of culture and education, these effects may be less pronounced as such measures are less dependent on language (Ardila et al., 2000). Furthermore, high impairment rates when using English language normative data were also noted on tasks with a timed component, supporting the premise that attitudes toward speeded tasks can be impacted by cultural orientations toward time (Agranovich et al., 2011; Messinis et al., 2011). Interestingly, the application of English language normative data resulted in misclassifying 33% of healthy participants on WMS-IV Visual Reproduction I.
This finding is also consistent with previous research examining drawing tests in older low-educated immigrant groups, which has shown that completing such tests is particularly difficult in these populations due to limited experience with using pencils, remembering graphic symbols, and organizing and analyzing visuospatial information (Hong et al., 2011; Nielsen & Jørgensen, 2013; Staios et al., 2022). Finally, no differences were observed on the HVLT-R Discrimination Index between normative data sets. This finding suggests that while cultural and educational factors appeared to significantly impact learning and free recall, these factors may have a relatively lower impact on the retention of information and the ability to discriminate between true-positive and false-positive items. However, further research is needed to confirm these outcomes.
Overall, we were able to establish high sensitivity and specificity in our sample, while also providing cut scores for a broad range of neuropsychological measures. Results showed that Greek Australian participants with AD performed more poorly than their healthy counterparts on measures of verbal and visual memory, language, visuospatial skills, and executive functions, with most neuropsychological measures providing robust sensitivity and specificity. Among the neuropsychological tests used in this study, Block Design, Digit Span (total and backward), HVLT-R, Greek Story Memory Test, Greek Naming Test, and Semantic Verbal Fluency were found to have the highest sensitivity and specificity (AUC > .90), followed by Visual Reproduction (AUC > .80). Consistent with previous research (Sala et al., 2017; Salimi et al., 2018; Weissberger et al., 2017), higher sensitivity and specificity were found on tasks assessing verbal and visual memory in discriminating AD, given the profound deficits that those diagnosed with AD exhibit in these cognitive domains (Clarens et al., 2022; Ramirez-Gomez et al., 2017). Furthermore, measures of executive functioning, namely working memory and verbal fluency, showed equivalent sensitivity and specificity.
Previous research examining dementia profiles in ethnic minority groups has noted that verbal memory deficits outweigh those observed in visuospatial skills, language, and executive functioning in the initial stage of disease progression (Barnes-Marrero et al., 2022; Nielsen et al., 2018). In this context, the degree of executive deficits observed in our AD group is likely attributable to the fact that they were moderately impaired, a stage at which such deficits are commonly observed (Weintraub et al., 2012).
Moreover, of all the neuropsychological measures administered, the Similarities subtest and the CTT showed low specificity (AUC < .80). Poor specificity values noted on the Similarities subtest were likely due to factors such as level and quality of education obtained in Greece pre-immigration and acculturation (Staios, 2022). Regarding the CTT, results revealed that the total time score for both conditions yielded a high proportion of false positive results, as indicated by low specificity values, leading to a misidentification of approximately half of the healthy participants. Similar findings have previously been noted in a sample of educationally disadvantaged Brazilian older adults, where the time score for both conditions of the CTT was found to lack sensitivity and specificity when compared to AD participants (Araujo et al., 2020). These findings are likely due to several factors, including no previous experience with psychometric testing, level and quality of education, and cultural orientation toward time and speeded tasks (Agranovich et al., 2011; Al-Jawahiri & Nielsen, 2021; Messinis et al., 2011). In light of these findings, in instances where clinicians are required to carry out evaluations of Greek Australian elders with moderate dementia, we recommend using the following measures: WAIS-IV Block Design and Digit Span, Hopkins Verbal Learning Test-Revised, Greek Story Memory Test, Greek Naming Test, and Verbal Fluency.
The aforementioned measures displayed excellent sensitivity and specificity in discriminating healthy from AD participants and are also appropriate for testing several cognitive domains known to be impacted by AD (Lezak et al., 2012).
Finally, we aimed to identify cut scores for a broad range of neuropsychological measures, based on the optimal balance between sensitivity and specificity, and to compare these with the traditionally used benchmarks of the 5th and 10th percentiles. Overall, optimal cut scores yielded excellent results for the assessment of moderately impaired individuals with AD and are now available for research and clinical use. Inspection of the data concerning the 5th percentile revealed that this cutoff compromised sensitivity across most tasks. In contrast, the cut scores across both the suggested optimal and 10th percentile ranges remained approximately the same. However, a notable difference was observed for the Similarities and Matrix Reasoning subtests, Visual Reproduction (recognition), and CTT, where the use of 5th and 10th percentiles corresponded to a higher discrepancy between sensitivity and specificity in favor of the latter in comparison with the optimal cutoff scores. This finding may provide useful information in clinical practice, as it allows the selection of the cutoff score with the most suitable combination of sensitivity and specificity depending on the clinical question at hand. In other words, a high sensitivity score may be preferable when older adults at higher risk for dementia need to be identified, while a high specificity score could be more useful when confirmation of AD is necessary. In summary, the selection of the cutoff score for these specific tests could provide different outcomes according to the goal of the clinical assessment.
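The trade-off between sensitivity and specificity at different cut scores can be illustrated with a minimal sketch. It assumes that scores at or below the cut are classified as impaired and that the "optimal balance" is operationalized as the maximum of Youden's J (sensitivity + specificity - 1); the exact criterion used to derive the cut scores in this study is not restated here, so this is an assumption for illustration only. The function names `sens_spec` and `best_cut` are hypothetical, and the scores are invented toy values, not study data.

```python
from typing import Sequence, Tuple


def sens_spec(healthy: Sequence[float], clinical: Sequence[float],
              cut: float) -> Tuple[float, float]:
    """Sensitivity and specificity when scores at or below `cut` are flagged as impaired."""
    true_pos = sum(score <= cut for score in clinical)  # clinical cases correctly flagged
    true_neg = sum(score > cut for score in healthy)    # healthy cases correctly cleared
    return true_pos / len(clinical), true_neg / len(healthy)


def best_cut(healthy: Sequence[float], clinical: Sequence[float]) -> float:
    """Cut score maximising Youden's J = sensitivity + specificity - 1 (assumed criterion)."""
    candidates = sorted(set(healthy) | set(clinical))
    return max(candidates, key=lambda c: sum(sens_spec(healthy, clinical, c)) - 1)


# Hypothetical toy scores, invented for illustration only (not study data).
healthy_scores = [12, 14, 15, 15, 16, 17, 18, 19, 20, 21]
clinical_scores = [5, 6, 7, 8, 9, 10, 11, 12, 13, 15]

optimal = best_cut(healthy_scores, clinical_scores)
optimal_sens, optimal_spec = sens_spec(healthy_scores, clinical_scores, optimal)

# A stricter (lower) cut, standing in for a low-percentile benchmark.
strict_sens, strict_spec = sens_spec(healthy_scores, clinical_scores, 9)

print(f"optimal cut={optimal}: sensitivity={optimal_sens:.2f}, specificity={optimal_spec:.2f}")
print(f"strict cut=9: sensitivity={strict_sens:.2f}, specificity={strict_spec:.2f}")
```

In this toy example the stricter cut raises specificity at the cost of sensitivity, which mirrors the pattern described for the 5th percentile. A clinician prioritizing case finding would lean toward the higher cut, and one prioritizing diagnostic confirmation toward the lower one.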
This study had several limitations. First, participants were selected through convenience sampling and consisted of primary school educated individuals. Therefore, results are not generalizable or appropriate for clinical use when assessing Greek Australians with higher levels of education. Second, the fact that our clinical group comprised moderately impaired AD participants, as can be inferred from the discrepancy in MMSE and CDR scores between the two groups, may partially explain the excellent sensitivity and specificity observed in several neuropsychological tests. Therefore, the cut scores derived in this study should only be applied to Greek Australians presenting with moderate to severe forms of dementia. In cases where milder forms of dementia are suspected, we advise referring to our normative studies and using these data to calculate probable levels of impairment (Staios et al., 2023a, b). Despite our best efforts to recruit participants with mild cognitive impairment, we were unable to do so. Greek Australians and other culturally diverse groups continue to be underrepresented in clinical research, relative to their Anglo Australian counterparts (Low et al., 2019). Factors such as stigma and limited access to culturally appropriate resources and treatment services continue to be issues facing culturally diverse Australians, leading to underutilization of health care services (LoGiudice et al., 2001; Low et al., 2010; Phillipson et al., 2015). As a result, a combination of these factors impacted our ability to recruit participants with early-stage AD.
Future research would benefit from exploring the sensitivity and specificity of the tests used in the present study within a broader and milder range of clinical presentations, as well as their ability to differentiate between a range of neurocognitive disorders within the Greek Australian community.
In conclusion, we believe our findings represent an important contribution to the field of clinical neuropsychology. We have demonstrated how using English language normative data within healthy educationally disadvantaged ethnic minority groups can result in erroneous diagnostic outcomes. We have also demonstrated how the use of demographically focused normative data and alternate cut points can help ameliorate this issue. We anticipate that the methodology employed in the present study may be used as a template for other ethnic minority groups to improve cross-cultural neuropsychological practice and test development internationally. Finally, while culturally specific tests and normative data may facilitate accuracy of testing outcomes, they should not be viewed as the only solution to improving the validity of cognitive assessment within underrepresented groups (Ardila, 2005). Increased cultural competence, the development of culturally considered clinical guidelines, ongoing professional development, and wider representation within the field of neuropsychology are also necessary (Rivera Mindt et al., 2010).
Funding statement
This research was supported by Fronditha Care, the Australasian Hellenic Educational Progressive Association, Marathon Foods, Anna Timou, the Australian Hellenic Golf Federation, the Hellenic Women’s Cultural Association, Kirsty Chiaplias, and the Thessaloniki Association.
Competing interests
None.