Introduction
With increasing demographic aging, migration, and globalization, there is a pressing need for standardized neuropsychological tests suited for diverse older populations (Nielsen, Reference Nielsen2022). A certain degree of diversity has always been present in European countries, but cultural, language, and educational diversity has increased greatly over the last seven decades due to increasing mobility within the European Union as well as immigration from countries outside Europe (Van Mol & De Valk, Reference Van Mol and De Valk2016). Although immigration patterns differ between countries, the largest immigrant groups across Europe originate from other European countries, the Middle East, North Africa, and South Asia, followed by groups of Sub-Saharan African and Latin American origin (Nielsen et al., Reference Nielsen, Franzen, Goudsmit and Uysal-Bozkir2022). Despite recent advances in cross-cultural neuropsychological test development, suitable tests for cross-linguistic assessment of language functions are not widely available (Franzen et al., Reference Franzen, Papma, van den Berg and Nielsen2020). Naming impairment is frequent across several neurocognitive disorders, including stroke (RELEASE Collaborators, 2021), traumatic brain injury (Strain et al., Reference Strain, Didehbani, Spence, Conover, Bartz, Mansinghani, Jeroudi, Rao, Fields and Kraut2017), and a variety of neurodegenerative disorders (Grossman et al., Reference Grossman, McMillan, Moore, Ding, Glosser, Work and Gee2004). Most dementia syndromes are associated with naming impairment due to varying degrees of semantic memory impairment, impaired lexical retrieval, or impaired visual perception, depending on the subtype (Taler & Phillips, Reference Taler and Phillips2008). 
For instance, anomia is one of the core features of semantic dementia (Gorno-Tempini et al., Reference Gorno-Tempini, Hillis, Weintraub, Kertesz, Mendez, Cappa, Ogar, Rohrer, Black and Boeve2011) and although memory impairment is generally the core feature of early AD, anomia is another common feature, especially as the disease progresses (McKhann et al., Reference McKhann, Knopman, Chertkow, Hyman, Jack, Kawas, Klunk, Koroshetz, Manly, Mayeux, Mohs, Morris, Rossor, Scheltens, Carrillo, Thies, Weintraub and Phelps2011). Thus, assessment of naming impairment is standard in most neuropsychological assessments and is typically measured with confrontation naming tasks (Strauss et al., Reference Strauss, Sherman and Spreen2006) – for example, the Boston Naming Test (BNT; Kaplan et al., Reference Kaplan, Goodglass and Weintraub2001).
Performance on confrontation naming tests is influenced by several linguistic and cultural variables. The difficulty level of the individual items depends on factors such as word frequency, familiarity, age of acquisition, length, visual complexity, and name and image agreement (Ivanova & Hallowell, Reference Ivanova and Hallowell2013), and these vary between cultures and languages (Ardila, Reference Ardila2007; Bertola & Malloy-Diniz, Reference Bertola and Malloy-Diniz2018; George & Mathuranath, Reference George and Mathuranath2007). For instance, items such as a pretzel, beaver, and asparagus included in the BNT may be familiar to people living in North America and parts of Europe but less familiar or virtually unknown in other cultural contexts (Franzen et al., Reference Franzen, van den Berg, Ayhan, Satoer, Türkoğlu, Akpulat, Visch-Brink, Scheffers, Kranenburg and Jiskoot2023). In contrast, abacus is a difficult item to name in North America but relatively easier in China as abacuses are more common there (Gollan et al., Reference Gollan, Weissberger, Runnqvist, Montoya and Cera2012).
Although the BNT has been adapted to several languages and has had widespread clinical and research applications (Maruta et al., Reference Maruta, Guerreiro, de, Hort and Scheltens2011; Rabin et al., Reference Rabin, Paolillo and Barr2016), cross-cultural research has shown that the items included in the BNT are suboptimal for assessing confrontation naming abilities in culturally, linguistically, and educationally diverse populations. More specifically, several studies have shown large differences in BNT performance between ethnoracial groups in the United States (US) (Baird et al., Reference Baird, Ford and Podell2007; Boone et al., Reference Boone, Victor, Wen, Razani and Ponton2007), and lower performances in bilinguals (Gollan et al., Reference Gollan, Fennema-Notestine, Montoya and Jernigan2007; Kohnert et al., Reference Kohnert, Hernandez and Bates1998; Roberts et al., Reference Roberts, Garcia, Desrochers and Hernandez2002) and second-language speakers (Stålhammar et al., Reference Stålhammar, Hellström, Eckerström and Wallin2022), even after controlling for differences in level of education and other demographic variables. Furthermore, BNT performance has been shown to be influenced by level of education (Strauss et al., Reference Strauss, Sherman and Spreen2006). This may reflect increasing vocabulary and exposure to a wider range of concepts with increasing levels of education. In addition, it has also been suggested that people with limited education and literacy may find it difficult to process items presented as black-and-white line drawings but perform better when the same items are presented in colored photographs or drawings (Reis et al., Reference Reis, Faisca, Ingvar and Petersson2006; Reis et al., Reference Reis, Petersson, Castro-Caldas and Ingvar2001). While some of these challenges may be overcome by using language and culture-adapted versions of the BNT and applying relevant normative adjustments, this clearly does not solve all issues.
Use of language and culture-specific confrontation naming tests and norms derived for native speakers has limited feasibility in most European memory clinics, in which patients may differ widely in their cultural, linguistic, and educational characteristics (Franzen et al., Reference Franzen, van den Berg, Ayhan, Satoer, Türkoğlu, Akpulat, Visch-Brink, Scheffers, Kranenburg and Jiskoot2023; Nielsen, Reference Nielsen2022). Therefore, to provide a reliable and valid solution for clinical practice, it is important to develop confrontation naming tests with potential applicability across diverse cultural and language groups (Franzen et al., Reference Franzen, Neuropsychology, Watermeyer, Pomati, Papma, Nielsen, Narme, Mukadam, Lozano-Ruiz and Ibanez-Casas2022).
During the last two decades, there have been several efforts to develop cross-linguistic naming tests. However, many of these efforts have resulted in tests with an inadequate balance between cross-linguistic properties and sensitivity to naming impairment. Thus, tests such as Body Part Naming from the Cross-Cultural Neuropsychological Test Battery (Dick et al., Reference Dick, L.Teng, Kempler, S.Davis, Taussig and Ferraro2002), the Cross-Linguistic Naming Tests (Ardila, Reference Ardila2007), and Picture Naming from the European Cross-Cultural Neuropsychological Test Battery (Nielsen et al., Reference Nielsen, Segers, Vanderaspoilden, Bekkhus-Wetterberg, Minthon, Pissiota, Bjørkløf, Beinhoff, Tsolaki and Gkioka2018) have good cross-linguistic properties but poor sensitivity to milder language impairment in patients with Alzheimer’s disease (AD) and other dementia disorders due to ceiling effects (Abou-Mrad et al., Reference Abou-Mrad, Chelune, Zamrini, Tarabey, Hayek and Fadel2017; Araujo et al., Reference Araujo, Nielsen, Barca, Engedal, Marinho, Deslandes, Coutinho and Laks2020; Ardila, Reference Ardila2007; Dick et al., Reference Dick, L.Teng, Kempler, S.Davis, Taussig and Ferraro2002; Gálvez-Lara et al., Reference Gálvez-Lara, Moriana, Vilar-López, Fasfous, Hidalgo-Ruzzante and Pérez-García2015; Nielsen et al., Reference Nielsen, Segers, Vanderaspoilden, Beinhoff, Minthon, Pissiota, Bekkhus-Wetterberg, Bjorklof, Tsolaki, Gkioka and Waldemar2019b). 
In contrast, the abbreviated version of the Multilingual Naming Tests (MINT; Ivanova et al., Reference Ivanova, Salmon and Gollan2013) has high sensitivity to milder language impairment in AD but is biased toward more highly educated white English speakers in the US (Franzen et al., Reference Franzen, van den Berg, Ayhan, Satoer, Türkoğlu, Akpulat, Visch-Brink, Scheffers, Kranenburg and Jiskoot2023; Li et al., Reference Li, Zeng, Neugroschl, Aloysi, Zhu, Xu, Teresi, Ocepek-Welikson, Ramirez and Joseph2022; Paplikar et al., Reference Paplikar, Varghese, Alladi, Vandana, Darshini, Iyer, Kandukuri, Divyaraj, Sharma and Dhaliwal2022; Stasenko et al., Reference Stasenko, Jacobs, Salmon and Gollan2019). More recent efforts include the Indian Council of Medical Research Picture Naming Tests (ICMR-PNT; Paplikar et al., Reference Paplikar, Varghese, Alladi, Vandana, Darshini, Iyer, Kandukuri, Divyaraj, Sharma and Dhaliwal2022) and the Naming Assessment in Multicultural Europe (NAME; Franzen et al., Reference Franzen, van den Berg, Ayhan, Satoer, Türkoğlu, Akpulat, Visch-Brink, Scheffers, Kranenburg and Jiskoot2023). Both instruments have shown promising clinical utility for cross-linguistic assessment. However, the ICMR-PNT may be less useful outside the Indian subcontinent as some items, such as tabala (a musical instrument), are culture-specific, and the 60-item NAME is rather long, taking up to 20 minutes to administer, which impedes its clinical utility in a busy clinical setting.
Building on these efforts, our aims were to develop and validate a brief cross-linguistic naming test for assessment of culturally, linguistically, and educationally diverse older adult populations in Europe. We compared the diagnostic accuracy of the novel confrontation naming test and traditional language tests in a diverse memory clinic population as well as the psychometric properties of the tests. Items included objects as well as pictured actions since action naming impairment is also diagnostic of dementia (Parris & Weekes, Reference Parris and Weekes2001). The rationale for the study design, comparing patients with dementia, mild cognitive impairment (MCI), affective disorder, and subjective cognitive decline (SCD), is that these conditions are common differential diagnoses in patients referred to neuropsychological evaluation in memory clinic settings.
Method
Participants
Patients were recruited from the Copenhagen University Hospital Memory Clinic at Rigshospitalet, which is a multidisciplinary outpatient clinic based in the Department of Neurology. For this study, patients with immigrant background referred for neuropsychological evaluation as part of their diagnostic assessment were selectively included between June 2021 and June 2022. Patients with a majority ethnic Danish background were consecutively recruited in the same period. As described below, all patients in the clinic are assessed with cognitive screening tests as part of the basic diagnostic assessment. One of the criteria for referral to more comprehensive neuropsychological evaluation (approximately 2 hours) in the clinic is a Mini-Mental State Examination (MMSE; Folstein et al., Reference Folstein, Folstein and McHugh1975) or Rowland Universal Dementia Assessment Scale (RUDAS; Storey et al., Reference Storey, Rowland, Basic, Conforti and Dickson2004) score ≥ 22 at the initial visit in the clinic, but patients with lower MMSE or RUDAS scores may also be referred if necessary (e.g., in the case of patients with aphasia). In total, 169 patients who completed both the Copenhagen Cross-Linguistic Naming Test (C-CLNT) and BNT were included in the study. Exclusion criteria were severe psychiatric symptoms and a diagnosis other than dementia, MCI, affective disorder, or SCD. In total, seven patients were excluded (two diagnosed with sequelae from traumatic brain injury, two with sequelae from stroke, two with epilepsy and psychiatric disorder, and one with atypical Parkinsonian disorder), resulting in a final sample of 162 patients.
All patients had an extensive diagnostic assessment including an interview with the patient and (when possible) an informant; a neurological, physical, and psychiatric examination including cognitive assessment with the MMSE and Addenbrooke’s Cognitive Examination (Mathuranath et al., Reference Mathuranath, Nestor, Berrios, Rakowicz and Hodges2000) or the RUDAS and Multicultural Cognitive Examination (Nielsen et al., Reference Nielsen, Segers, Vanderaspoilden, Beinhoff, Minthon, Pissiota, Bekkhus-Wetterberg, Bjorklof, Tsolaki, Gkioka and Waldemar2019a) in case of cultural, linguistic, and/or educational barriers; laboratory screening with blood tests and electrocardiography; and structural brain imaging with magnetic resonance imaging and/or computerized tomography. Further investigations, including functional imaging with [18F]FDG-PET, amyloid imaging with [11C]PIB-PET, and/or dopamine transporter imaging with [18F]FE-PE2I PET, cerebrospinal fluid biomarker analysis, and comprehensive psychiatric or neuropsychological evaluation were performed on clinical indication.
Diagnoses were based on evidence from all clinical and investigational results, except the C-CLNT, applying the 5th edition of the Diagnostic and Statistical Manual of Mental Disorders (American Psychiatric Association, 2013) criteria for dementia, and diagnostic research criteria for specific dementia subtypes (Gorno-Tempini et al., Reference Gorno-Tempini, Hillis, Weintraub, Kertesz, Mendez, Cappa, Ogar, Rohrer, Black and Boeve2011; McKeith et al., Reference McKeith, Boeve, Dickson, Halliday, Taylor, Weintraub, Aarsland, Galvin, Attems and Ballard2017; McKhann et al., Reference McKhann, Knopman, Chertkow, Hyman, Jack, Kawas, Klunk, Koroshetz, Manly, Mayeux, Mohs, Morris, Rossor, Scheltens, Carrillo, Thies, Weintraub and Phelps2011; Rascovsky et al., Reference Rascovsky, Hodges, Knopman, Mendez, Kramer, Neuhaus, Van Swieten, Seelaar, Dopper and Onyike2011; Sachdev et al., Reference Sachdev, Kalaria, O'Brien, Skoog, Alladi, Black, Blacker, Blazer, Chen, Chui, Ganguli, Jellinger, Jeste, Pasquier, Paulsen, Prins, Rockwood, Roman, Scheltens and Cognitive2014), MCI (Winblad et al., Reference Winblad, Palmer, Kivipelto, Jelic, Fratiglioni, Wahlund, Nordberg, Bäckman, Albert and Almkvist2004), and SCD (Jessen et al., Reference Jessen, Amariglio, Van Boxtel, Breteler, Ceccaldi, Chételat, Dubois, Dufouil, Ellis and Van Der Flier2014). Affective disorder (e.g., depression, anxiety, post-traumatic stress disorder) was diagnosed by applying the 10th edition of the International Classification of Diseases criteria (World Health Organization, 1993). Professional interpreters provided by interpretation services were freely available to patients during diagnostic assessments, including neuropsychological evaluation, when considered necessary.
Also, 24 cognitively intact participants aged 60 years or older were recruited from local general practice clinics and through the social networks of multicultural and multilingual researchers. Participants were assessed in their private homes, in the Copenhagen University Hospital Memory Clinic, or in another suitable location, depending on their preference. Participation was voluntary and without any economic incentive. All cognitively intact participants were living independently, reported no significant memory problems, psychiatric or neurological disorders, or substance abuse, and scored ≥ 24/30 points on the MMSE or ≥ 23/30 points on the RUDAS, and ≤ 6/15 points on the 5/15-item Geriatric Depression Scale (Weeks et al., Reference Weeks, McGann, Michaels and Penninx2003).
Procedure
All participants underwent an approximately two-hour clinical assessment, in which medical and demographic data were collected and neuropsychological tests, including the C-CLNT, were administered. All assessments were made by specialists in neuropsychology. The comprehensive neuropsychological evaluation in the Copenhagen University Hospital Memory Clinic is based on a flexible assessment approach, meaning that a standardized, fixed set of neuropsychological tests covering the main cognitive domains is given to most patients with some flexibility to add or subtract tests given the specific referral question (Nielsen et al., Reference Nielsen, Franzen, Goudsmit and Uysal-Bozkir2022). The applied tests generally come from the international literature, but locally developed tests are also used. In case of cultural, linguistic, and/or educational barriers, patients are mainly assessed with tests from the European Cross-Cultural Neuropsychological Test Battery (Nielsen et al., Reference Nielsen, Segers, Vanderaspoilden, Beinhoff, Minthon, Pissiota, Bekkhus-Wetterberg, Bjorklof, Tsolaki, Gkioka and Waldemar2019b; Nielsen et al., Reference Nielsen, Segers, Vanderaspoilden, Bekkhus-Wetterberg, Minthon, Pissiota, Bjørkløf, Beinhoff, Tsolaki and Gkioka2018). Participants with immigrant background were assessed in their primary language, either by a multilingual neuropsychologist (in Danish, English, Kurdish, or Turkish; n = 30) or through interpreter-mediated assessment (n = 30).
Demographic data collected at the clinical assessment included age, sex, years of education, country of origin, and mother tongue. For participants with an immigrant background, years of residence in Denmark were calculated by subtracting the year of immigration from the year of the assessment. In addition, mother tongue was classified as a European or non-European language, and cultural distance between the original culture and Danish culture was calculated using the Kogut and Singh Index (KSI; Kogut & Singh, Reference Kogut and Singh1988).
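The two derived variables can be illustrated with a short Python sketch. It is not the authors' code. It assumes the commonly cited form of the Kogut and Singh Index, that is, the mean across cultural dimensions of the squared difference between two countries' scores divided by the variance of that dimension across countries. The dimension names and all numbers below are made up for illustration, and the scaling behind the KSI values reported in this study is not stated in the text, so the output is not comparable with them.

```python
def years_of_residence(assessment_year, immigration_year):
    """Years lived in the host country at the time of assessment."""
    return assessment_year - immigration_year


def kogut_singh_index(origin, reference, variances):
    """Mean of variance-scaled squared differences across cultural dimensions.

    origin, reference: dict mapping dimension name -> country score
    variances: dict mapping dimension name -> variance of that dimension
               across all countries in the underlying data set
    """
    dims = list(variances)
    scaled = [(origin[d] - reference[d]) ** 2 / variances[d] for d in dims]
    return sum(scaled) / len(dims)


# Hypothetical scores on two hypothetical dimensions (illustration only).
origin_scores = {"dimension_a": 70, "dimension_b": 30}
denmark_scores = {"dimension_a": 20, "dimension_b": 70}
dimension_variances = {"dimension_a": 500, "dimension_b": 400}

print(years_of_residence(2022, 1990))
print(kogut_singh_index(origin_scores, denmark_scores, dimension_variances))
```

Dividing each squared difference by the dimension's variance keeps dimensions with a wide spread of scores from dominating the index.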
All participants were asked about any vision or hearing impairment and were assessed using their hearing aids or prescribed glasses when this was confirmed.
Results for the C-CLNT were compared with three traditional language tests: the BNT (full and 15-item versions) and Category Fluency. In all tests, correct responses in any language were accepted.
The BNT contains 60 black-and-white line drawings ranked according to difficulty. In this study, the Danish adaptation of the BNT was used, in which the original items were ranked according to difficulty in a sample of older Danish typical participants (Jørgensen et al., Reference Jørgensen, Johannsen and Vogel2017), using a discontinuation rule of six consecutive failures. The score is the number of correct responses, including responses after semantic cues. The score range is 0–60 points.
Scores for the abbreviated 15-item version of the BNT (BNT-15) introduced by the Consortium to Establish a Registry for Alzheimer’s Disease (Morris et al., Reference Morris, Mohs, Rogers, Fillenbaum and Heyman1988) were extracted from the full BNT. The range of scores is 0–15 points.
In Category Fluency (Strauss et al., Reference Strauss, Sherman and Spreen2006), participants are given one minute to produce as many different animal names as possible. As it may be challenging to perform fast-paced simultaneous translation of animal names in interpreter-mediated assessments, in this study interpreters were instead instructed to say “yes” for every new animal name and the neuropsychologist put a checkmark for each “yes” on the record form. Immediately following the test, the neuropsychologist checked with the interpreter for any repetitions of animal names. The score is the number of different animal names produced in one minute.
Also, scores from the MMSE and RUDAS were treated as a single measure of general cognitive function (MMSE/RUDAS) in all comparisons. The rationale behind this was that the two instruments have the same range of scores (0–30 points), are highly correlated (Naqvi et al., Reference Naqvi, Haider, Tomlinson and Alibhai2015), have similar diagnostic performance for dementia (Nielsen & Jorgensen, Reference Nielsen and Jorgensen2020), and were used interchangeably with patients and cognitively intact participants, depending on participant characteristics. In total, 150 participants were assessed with the MMSE and 36 with the RUDAS.
The study adhered to the Declaration of Helsinki for experiments involving humans (reference no. 22007675) and was approved by the Danish Data Protection Agency (RH-2018-34).
Development of the Copenhagen Cross-Linguistic Naming Test
The C-CLNT was based on MULTIMAP, a free open-access database of 218 standardized color drawings representing both objects and actions (Gisbert-Muñoz et al., Reference Gisbert-Muñoz, Quiñones, Amoruso, Timofeeva, Geng, Boudelaa, Pomposo, Gil-Robles and Carreiras2021). MULTIMAP includes relevant linguistic variables, including name agreement, frequency (per one million), and number of letters, across several languages (i.e., Spanish, Basque, Catalan, Italian, French, English, German, Mandarin Chinese, and Arabic). However, data on number of letters were not included in the present study as this variable seems less relevant when performance time is not an issue. Also, some languages, including Mandarin Chinese, are not written alphabetically, which makes this variable difficult to compare across languages. MULTIMAP name agreement was established through an online survey with 99 (English) to 128 (Mandarin Chinese) speakers of each language, and frequency data for the words in each language were extracted from text corpora in various online databases (Gisbert-Muñoz et al., Reference Gisbert-Muñoz, Quiñones, Amoruso, Timofeeva, Geng, Boudelaa, Pomposo, Gil-Robles and Carreiras2021).
Based on the original set of MULTIMAP drawings and the procedures described for developing cross-language combinations (Gisbert-Muñoz et al., Reference Gisbert-Muñoz, Quiñones, Amoruso, Timofeeva, Geng, Boudelaa, Pomposo, Gil-Robles and Carreiras2021), an initial set of 38 items (26 objects and 12 actions) was selected by considering MULTIMAP name agreement (≥ 80% for objects, ≥ 75% for actions) data across Spanish, Italian, French, English, German, Mandarin Chinese, and Arabic languages. The adopted name agreement cutoff for object items followed the recommendations for developing bilingual naming tests based on the MULTIMAP drawings (Gisbert-Muñoz et al., Reference Gisbert-Muñoz, Quiñones, Amoruso, Timofeeva, Geng, Boudelaa, Pomposo, Gil-Robles and Carreiras2021). However, as action word meanings are more variable across languages than object name meanings (Gentner, Reference Gentner, Hirsh-Pasek and Golinkoff2006), a slightly lower cutoff was used for action items in order to be able to include more items. Subsequently, eight items (glasses, horse, onion, egg, hand, table, weigh, hunt) were excluded due to their ambiguity in the Danish cultural context and to reduce cross-language differences in name agreement and frequency.
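The name agreement screening step can be sketched in Python as follows. This is an illustration only and not the authors' procedure. It assumes that an item had to meet its cutoff in every target language, which the text does not state explicitly, and the candidate items and percentages are invented rather than taken from MULTIMAP.

```python
OBJECT_CUTOFF = 80.0  # minimum name agreement (%) for object items
ACTION_CUTOFF = 75.0  # slightly lower minimum for action items


def passes_name_agreement(item_type, agreement_by_language):
    """True if name agreement meets the cutoff in every language (assumption)."""
    cutoff = OBJECT_CUTOFF if item_type == "object" else ACTION_CUTOFF
    return all(value >= cutoff for value in agreement_by_language.values())


def select_items(candidates):
    """Return the names of the candidate items that pass the screening."""
    return [
        name
        for name, (item_type, agreement) in candidates.items()
        if passes_name_agreement(item_type, agreement)
    ]


# Invented name agreement percentages for two of the seven languages.
candidates = {
    "sun": ("object", {"English": 100.0, "Arabic": 95.0}),
    "kite": ("object", {"English": 98.0, "Arabic": 60.0}),
    "drink": ("action", {"English": 90.0, "Arabic": 76.0}),
    "knit": ("action", {"English": 88.0, "Arabic": 70.0}),
}

print(select_items(candidates))
```

In this toy input, "drink" is retained because 76% is above the action cutoff, although it would have failed the stricter object cutoff.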
The final C-CLNT consisted of 30 standardized color drawings (20 objects and 10 actions) with comparable name agreement (F (6, 174) = 2.05, p = .06) and frequency (F (6, 174) = 1.62, p = .11) across seven languages (Supplementary Table S1). The items were ordered according to mean frequency across the target languages and pilot tested in 10 memory clinic patients (2 AD, 4 MCI, 4 affective disorders; 6 male/4 female; mean age 71.2 ± 10.8 years; mean education 14.3 ± 2.3 years). Based on pilot test performances, ambiguity in the scoring criteria for the item bone was resolved (i.e., “meat bone” and “chicken bone” were not accepted as correct in Danish). The final set of items selected for the C-CLNT is presented in Table 1, and examples of items are provided in Figure 1.
Table 1 note: MCI = mild cognitive impairment; N/A = not applicable.
Administration and scoring
Administration and scoring procedures for the C-CLNT are similar to those of the BNT (Kaplan et al., Reference Kaplan, Goodglass and Weintraub2001). Participants are shown the items one at a time and are allowed 20 seconds to respond to each. When appropriate (e.g., in case of a visual misperception), a semantic cue can be provided. However, a semantic cue is not provided if the incorrect response falls within the same semantic category as the correct response (e.g., if a nail is named a “screw” or a fly is named a “bee” or “wasp”). If the semantic cue fails to elicit a correct response, a phonemic cue may be given. Participants are allowed 5 seconds to respond following a semantic or phonemic cue. There is no discontinuation rule. The administration time is generally < 5 minutes.
The C-CLNT total score is the number of correct responses, including responses after semantic cues. Responses after phonemic cues are not added to the total score but may be noted to provide qualitative information about naming performance. In the context of multilingualism and inherent language mixing, participants are allowed to respond in any language, and a response that correctly names the item in any language is scored as correct.
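The scoring rule can be summarized in a brief Python sketch. The data structure and outcome labels are hypothetical and were chosen only for illustration. The logic follows the description above: spontaneous responses and responses after a semantic cue are credited, whereas responses after a phonemic cue are recorded but not credited.

```python
from dataclasses import dataclass


@dataclass
class ItemResponse:
    item: str
    # One of: "spontaneous", "semantic_cue", "phonemic_cue", "incorrect"
    outcome: str


# Outcomes that count toward the total score.
CREDITED_OUTCOMES = {"spontaneous", "semantic_cue"}


def score_cclnt(responses):
    """Return (total score, items named only after a phonemic cue)."""
    total = sum(1 for r in responses if r.outcome in CREDITED_OUTCOMES)
    phonemic_only = [r.item for r in responses if r.outcome == "phonemic_cue"]
    return total, phonemic_only


# Invented responses for four of the 30 items.
responses = [
    ItemResponse("sun", "spontaneous"),
    ItemResponse("umbrella", "semantic_cue"),
    ItemResponse("squirrel", "phonemic_cue"),
    ItemResponse("bone", "incorrect"),
]

total, phonemic_only = score_cclnt(responses)
print(total, phonemic_only)
```

Keeping the phonemically cued items in a separate list preserves the qualitative information without affecting the total score.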
Statistical analyses
The significance of group differences on continuous variables was determined using analysis of variance (ANOVA) with pretesting for homogeneity of variances. Welch’s ANOVA was used when the assumption of homogeneity of variances was not met. Effect sizes were calculated as partial eta squared (PES). Fisher’s Exact Test or Pearson’s χ² test was used to test the significance of group differences in the distribution of categorical variables. Internal consistency of the C-CLNT was determined by coefficient α as an approximation of scale reliability. To assess construct validity, Spearman’s rank correlation coefficient was used to assess associations between the C-CLNT and traditional language tests. The effect of years of education, age, sex, and immigrant background on neuropsychological test scores was evaluated using hierarchical regression analyses with plots of residuals as model control. To assess discriminant validity, receiver operating characteristic (ROC) curve analysis was used to examine the area under the curve (AUC), sensitivity, and specificity of the C-CLNT and other language tests for dementia. AUCs were compared using the method proposed by DeLong et al. (Reference DeLong, DeLong and Clarke-Pearson1988). Optimal cutoff values were established with Youden’s J (calculated as: J = sensitivity + specificity – 1). All analyses were performed with SPSS version 28.0. A p-value < .05 (two-tailed) was considered significant.
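For readers who want to see how a Youden-based cutoff is derived, the following Python sketch works through a toy example. It is not the SPSS procedure used in the study. It assumes that lower scores indicate impairment, so that a score at or below the cutoff counts as test-positive, and it resolves ties in J by keeping the lowest cutoff. The scores and diagnoses are invented.

```python
def youden_optimal_cutoff(scores, has_dementia):
    """Return (cutoff, J, sensitivity, specificity) maximizing Youden's J.

    A score <= cutoff is treated as test-positive (assumption: lower
    scores indicate impairment). Ties keep the lowest cutoff.
    """
    n_pos = sum(1 for d in has_dementia if d)
    n_neg = len(has_dementia) - n_pos
    best = None
    for cutoff in sorted(set(scores)):
        true_pos = sum(1 for s, d in zip(scores, has_dementia) if d and s <= cutoff)
        true_neg = sum(1 for s, d in zip(scores, has_dementia) if not d and s > cutoff)
        sensitivity = true_pos / n_pos
        specificity = true_neg / n_neg
        j = sensitivity + specificity - 1
        if best is None or j > best[1]:
            best = (cutoff, j, sensitivity, specificity)
    return best


# Invented test scores and diagnoses for eight participants.
scores = [20, 22, 24, 25, 27, 28, 29, 30]
has_dementia = [True, True, True, False, True, False, False, False]

cutoff, j, sensitivity, specificity = youden_optimal_cutoff(scores, has_dementia)
print(cutoff, j, sensitivity, specificity)
```

In this toy data set, two cutoffs reach the same J, so the tie rule matters. A clinical application might instead prefer the cutoff with the higher sensitivity.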
Results
A total of 186 participants were included in the study, of which 126 (68%) had a majority ethnic Danish background and 60 (32%) had an immigrant background. Among participants with immigrant backgrounds, 24 originated from a Middle Eastern country, 15 from a South or East Asian country, 13 from another European country, 4 from a North African country, 3 from a Sub-Saharan African country, and one from an Oceanian country. In total, 45 (75%) of the participants with immigrant background had a non-European language as their mother tongue. The mean KSI cultural distance between the original cultures and Danish culture was 86.2 ± 19.3, ranging from 18.3 (Sweden) to 120.9 (Iraq). Compared to majority ethnic Danish participants, participants with immigrant background were significantly younger (68.3 (range: 42–87) vs 73.4 (range: 48–91) years; F (1, 183) = 13.93, p < .001) and had fewer years of education (10.4 (range: 0–17) vs 13.2 (range: 7–17) years; Welch’s F (1, 72.73) = 11.89, p < .001). There was no significant difference in sex distribution.
Among patients, 56 were diagnosed with dementia (19 AD, 14 vascular dementia (VaD), 4 mixed AD/VaD, 6 dementia with Lewy bodies/Parkinson’s disease dementia, 2 frontotemporal dementia, 6 other specified dementia (normal pressure hydrocephalus, encephalitis, Wernicke-Korsakoff syndrome, HIV-associated neurocognitive disorder), and 5 unspecified dementia), 67 with MCI, 20 with affective disorder, and 19 with SCD. As patients with SCD did not have formal impairment on neuropsychological testing, or any other neurological or psychiatric diagnosis explaining cognitive complaints (Jessen et al., Reference Jessen, Amariglio, Van Boxtel, Breteler, Ceccaldi, Chételat, Dubois, Dufouil, Ellis and Van Der Flier2014), they were grouped with cognitively intact participants to form a control group. Participants’ characteristics and neuropsychological test performance for the resulting four groups are presented in Table 2.
Table 2 note: BNT = Boston Naming Test; C-CLNT = Copenhagen Cross-Linguistic Naming Test; MCI = mild cognitive impairment; MMSE = Mini-Mental State Examination; RUDAS = Rowland Universal Dementia Assessment Scale.
a Comparison based only on participants with an immigrant background.
There were no significant group differences in sex and years of education, but there were significant differences in age (F (3, 181) = 10.87, p < .001), proportion of participants with immigrant background (χ²(3, n = 185) = 15.18, p = .002), and MMSE/RUDAS scores (Welch’s F (3, 67.74) = 9.36, p < .001). Comparison of performances on the C-CLNT and other language tests displayed significant group differences for the C-CLNT (Welch’s F (3, 72.73) = 11.74, p < .001, PES = .26), full BNT (Welch’s F (3, 63.40) = 7.22, p < .001, PES = .09), BNT-15 (Welch’s F (3, 59.91) = 6.88, p < .001, PES = .08), and Category Fluency (Welch’s F (3, 64.44) = 26.66, p < .001, PES = .29). Participants with immigrant background obtained significantly lower scores on all language tests compared to participants with majority ethnic Danish background (see Figure 2). However, the differences were considerably smaller for the C-CLNT.
Table 1 lists the items of the C-CLNT ordered according to word frequency. There were four items (i.e., sun, tree, chair, drink) showing 100% correct responses across diagnostic groups. Several other items (i.e., hat, apple, umbrella, scissors, write, eat, measure) also had very few incorrect responses in all groups. The lowest percentage of correct responses was found on items 9 (bone), 13 (skirt), 14 (fly), 16 (nail), and 19 (squirrel). Examination of these low-accuracy items revealed typical error types. Participants frequently responded with “meat bone” (“kødben”) or “chicken bone” (“kyllingeben”) for bone (the Danish language differentiates between bones in living creatures (“knogle”) and bones for consumption), “bee” or “wasp” for fly, “screw” for nail, “dress” for skirt (only males), and “mouse,” “rat,” or “rabbit” for squirrel (mainly participants with immigrant background). These typical error types constituted 97%, 91%, 74%, 41%, and 26% of all errors on the bone, fly, nail, skirt, and squirrel items, respectively, and were more frequent in the dementia group compared to the other groups but only significantly so for bone (39% vs 19%; Fisher’s Exact Test, p < .001) and skirt (14% vs 5%; Fisher’s Exact Test, p = .03).
Scale reliability
Across all participant groups, coefficient α for the C-CLNT was .67, indicating acceptable scale reliability.
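As a reminder of how coefficient α is obtained from item-level data, the following Python sketch applies the standard formula, α = k/(k − 1) × (1 − Σ item variances / variance of total scores), to a small invented data set. It is not the SPSS output from the study, and the four participants and three items are hypothetical.

```python
from statistics import variance


def coefficient_alpha(item_scores):
    """Coefficient alpha for a participants-by-items matrix of 0/1 scores.

    item_scores: list of rows, one row per participant, one column per item.
    Uses sample variances (n - 1 in the denominator).
    """
    k = len(item_scores[0])
    item_variances = [variance([row[i] for row in item_scores]) for i in range(k)]
    total_variance = variance([sum(row) for row in item_scores])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Invented responses: 4 participants x 3 items (1 = correct, 0 = incorrect).
item_scores = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
]

print(coefficient_alpha(item_scores))
```

Items that every participant answers correctly have zero variance and therefore add nothing to the sum of item variances.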
Construct validity
The C-CLNT was significantly related to other language function and general cognitive function tests. The C-CLNT was most robustly correlated with the full BNT (r = .55, p < .001) and BNT-15 (r = .54, p < .001), followed by Category Fluency (r = .44, p < .001), and MMSE/RUDAS (r = .35, p < .001). When the analyses were repeated in majority ethnic Danish participants only, correlations with the full BNT (r = .72, p < .001) and BNT-15 (r = .67, p < .001) were stronger, while the correlation with Category Fluency (r = .42, p < .001) was similar, and the correlation with the MMSE (r = .19, p = .03) was weaker.
Discriminant validity
ROC curve analysis revealed that the C-CLNT was highly accurate in discriminating the group of patients with dementia from the other groups (control, affective disorder, MCI). AUCs for the C-CLNT, full BNT, BNT-15, and Category Fluency are illustrated in Figure 3, and AUC values, optimal cutoff scores, sensitivity, and specificity are presented in Table 3. The AUC value for the C-CLNT (AUC = .80) was significantly higher than for the full BNT (AUC = .64, z = 3.50, p < .001) and BNT-15 (AUC = .59, z = 4.46, p < .001), but comparable to Category Fluency (AUC = .83). In a subsample of patients with MCI and controls alone (n = 109), the AUC for the C-CLNT was .53.
AUC = area under the curve; BNT = Boston Naming Test; CI = confidence interval; C-CLNT = Copenhagen Cross-Linguistic Naming Test.
a Optimal cutoff for discriminating between patients with dementia and other groups based on Youden’s J.
Overall, the accuracy of the C-CLNT in discriminating patients with dementia from the other groups did not significantly differ between participants with majority ethnic Danish and immigrant background (AUC of .80, 95% CI [.70– .88] vs .86, 95% CI [.74– .98], z = .83, p = .41).
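The AUC and the Youden-based cutoff reported above can be illustrated with a short sketch. It assumes that lower naming scores indicate impairment and that a score at or below the cutoff is classified as dementia. The function names and the toy scores are hypothetical (not study data), and the statistical comparison between AUCs is not reproduced here.

```python
import numpy as np

def auc_low_score_positive(scores, has_dementia):
    """AUC via the Mann-Whitney statistic when LOWER scores indicate impairment."""
    scores = np.asarray(scores, dtype=float)
    y = np.asarray(has_dementia, dtype=bool)
    pos, neg = scores[y], scores[~y]
    # Count (patient, non-patient) pairs where the patient scores lower; ties count 0.5
    lower = (pos[:, None] < neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (lower + 0.5 * ties) / (len(pos) * len(neg))

def youden_cutoff(scores, has_dementia):
    """Return (cutoff, sensitivity, specificity) maximising J = sens + spec - 1,
    where a score <= cutoff is classified as impaired."""
    scores = np.asarray(scores, dtype=float)
    y = np.asarray(has_dementia, dtype=bool)
    best = None
    for c in np.unique(scores):
        pred = scores <= c
        sens = (pred & y).sum() / y.sum()
        spec = (~pred & ~y).sum() / (~y).sum()
        j = sens + spec - 1
        if best is None or j > best[0]:
            best = (j, c, sens, spec)
    return best[1], best[2], best[3]

# Hypothetical toy scores (NOT study data): 6 non-dementia, 4 dementia
toy_scores = [30, 30, 30, 29, 29, 27, 28, 26, 25, 22]
toy_dementia = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
auc = auc_low_score_positive(toy_scores, toy_dementia)
cutoff, sens, spec = youden_cutoff(toy_scores, toy_dementia)
print(auc, cutoff, sens, spec)
```

When several cutoffs share the same J, this sketch keeps the lowest one; other tie-breaking rules are equally defensible and may give a different cutoff.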
Effects of demographic variables
When combining the four groups and correcting for MMSE/RUDAS score, there was a significant positive correlation between years of education and the C-CLNT (r = .18, p = .01), the full BNT (r = .47, p < .001), BNT-15 (r = .42, p < .001), and Category Fluency (r = .15, p = .05), and between age and the full BNT (r = .20, p = .008) and BNT-15 (r = .21, p = .005). Sex was not significantly related to any of the tests.
When the influence of demographic variables on the C-CLNT and other language tests was evaluated with a series of hierarchical regression analyses controlling for MMSE/RUDAS score, significant effects of age, years of education, and immigrant background were present on all tests. However, the variance in test scores explained by immigrant background was 3% for the C-CLNT compared to 8%, 28%, and 34% for Category Fluency, BNT-15, and full BNT, respectively (see Table 5).
Repeating the regression analysis in participants with immigrant background and entering years of residence in Denmark, non-European mother tongue, and KSI cultural distance in the last block instead of immigrant background led to a similar picture (see Supplementary Table S2). In these analyses, the variance in test scores explained by years of residence in Denmark, non-European mother tongue, and KSI cultural distance was 3, 8, 18, and 24% for the C-CLNT, Category Fluency, BNT-15, and full BNT, respectively. Adding the use of an interpreter to the regression analyses did not show any significant effects.
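The "variance explained" by a block of predictors in a hierarchical regression is the increase in R² when that block is added to a model that already contains the earlier blocks. The sketch below illustrates this calculation with ordinary least squares. It is an illustration under stated assumptions rather than the authors' analysis: the block order, variable names, and simulated data are all hypothetical.

```python
import numpy as np

def r_squared(X, y):
    """R^2 of an ordinary least-squares fit of y on X (intercept added here)."""
    design = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    centered = y - y.mean()
    return 1.0 - (resid @ resid) / (centered @ centered)

def delta_r_squared(earlier_blocks, new_block, y):
    """Extra variance explained when new_block enters after earlier_blocks."""
    full = np.column_stack([earlier_blocks, new_block])
    return r_squared(full, y) - r_squared(earlier_blocks, y)

# Simulated toy data (NOT study data); all coefficients are arbitrary assumptions
rng = np.random.default_rng(0)
n = 200
cognition = rng.normal(25, 3, n)                  # stands in for MMSE/RUDAS
age = rng.normal(70, 8, n)
education = rng.normal(10, 4, n)
immigrant = rng.integers(0, 2, n).astype(float)   # 0/1 indicator
score = (20 + 0.2 * cognition + 0.1 * education
         - 1.0 * immigrant + rng.normal(0, 1, n))

earlier = np.column_stack([cognition, age, education])
extra = delta_r_squared(earlier, immigrant, score)
print(f"Additional variance explained by the last block: {extra:.3f}")
```

Because the models are nested, the increment cannot be negative apart from floating-point error; whether it is statistically significant would require an F-test on the change, which is omitted here.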
Abbreviated 20-item version of C-CLNT
An abbreviated version of the C-CLNT was created by excluding the 10 action items included in the full C-CLNT, leaving only the 20 object items. The 20-item C-CLNT was highly correlated with the full C-CLNT (r = .87, p < .001) and had comparable psychometric properties. The AUC of the 20-item C-CLNT for dementia was .78, 95% CI [.70– .86], which did not significantly differ from the AUC of full C-CLNT (z = 1.21, p = .23). At the cutoff ≤ 18, the 20-item C-CLNT had a sensitivity of .83 and a specificity of .70.
Discussion
In this study, we described the development and validation of the C-CLNT for the assessment of naming impairment in a culturally, linguistically, and educationally diverse memory clinic patient population in Denmark. The C-CLNT is based on a set of standardized color drawings with data on linguistic variables available across several languages. Items for the final 30-item version of the C-CLNT were selected by considering name agreement and frequency across five European and two non-European languages. The C-CLNT was found to have promising psychometric properties and diagnostic accuracy for detecting naming impairment in culturally, linguistically, and educationally diverse patients with dementia but not MCI. Concerning psychometric properties, the internal reliability of the C-CLNT was acceptable according to standard criteria (coefficient alpha = .67). The convergent validity of the C-CLNT was also good, with scores being moderately to strongly correlated with traditional confrontation naming tests, moderately with Category Fluency, and only weakly with general cognitive functioning. Correlations between the C-CLNT and BNT were strongest in the subsample of majority ethnic Danish participants, reflecting the suboptimal utility of the BNT in culturally, linguistically, and educationally diverse populations.
In the context of the cultural, linguistic, and educational diversity among patients in European memory clinics, it is desirable to have a single standardized confrontation naming test for the cross-linguistic assessment of naming impairment in neurocognitive disorders. The results from this study indicate that the C-CLNT is suitable for detecting naming impairment in patients referred for neuropsychological evaluation in such a setting. Compared with traditional language tests, the diagnostic accuracy of the C-CLNT for dementia (AUC of .80) was significantly better than that of the full BNT (AUC of .64) and BNT-15 (AUC of .59), but comparable to Category Fluency (AUC of .83). At the cutoff ≤ 28, the sensitivity of the C-CLNT was .75, which indicates that it does not suffer from low sensitivity and diagnostic accuracy as described for several other cross-linguistic naming tests (Abou-Mrad et al., Reference Abou-Mrad, Chelune, Zamrini, Tarabey, Hayek and Fadel2017; Araujo et al., Reference Araujo, Nielsen, Barca, Engedal, Marinho, Deslandes, Coutinho and Laks2020; Ardila, Reference Ardila2007; Dick et al., Reference Dick, L.Teng, Kempler, S.Davis, Taussig and Ferraro2002; Gálvez-Lara et al., Reference Gálvez-Lara, Moriana, Vilar-López, Fasfous, Hidalgo-Ruzzante and Pérez-García2015; Nielsen et al., Reference Nielsen, Segers, Vanderaspoilden, Beinhoff, Minthon, Pissiota, Bekkhus-Wetterberg, Bjorklof, Tsolaki, Gkioka and Waldemar2019b). Overall, the diagnostic accuracy of the C-CLNT was slightly lower than that reported for the MINT (AUC of .85; Stasenko et al., Reference Stasenko, Jacobs, Salmon and Gollan2019), ICMR-PNT (AUC of .81–1.00; Paplikar et al., Reference Paplikar, Varghese, Alladi, Vandana, Darshini, Iyer, Kandukuri, Divyaraj, Sharma and Dhaliwal2022), and NAME (AUC of .88; Franzen et al., Reference Franzen, van den Berg, Ayhan, Satoer, Türkoğlu, Akpulat, Visch-Brink, Scheffers, Kranenburg and Jiskoot2023). 
However, a direct head-to-head comparison of the tests is impossible as study methods and samples differed between studies. In future research, it would be interesting to make a head-to-head comparison of cross-linguistic naming tests in the same study population. Like other (cross-linguistic) naming tests (Li et al., Reference Li, Zeng, Neugroschl, Aloysi, Zhu, Xu, Teresi, Ocepek-Welikson, Ramirez and Joseph2022; Paplikar et al., Reference Paplikar, Varghese, Alladi, Vandana, Darshini, Iyer, Kandukuri, Divyaraj, Sharma and Dhaliwal2022; Stasenko et al., Reference Stasenko, Jacobs, Salmon and Gollan2019), the C-CLNT showed poor diagnostic accuracy for MCI. This is most likely because anomia is not a characteristic feature of MCI but typically presents in later stages of AD and other dementia disorders (McKhann et al., Reference McKhann, Knopman, Chertkow, Hyman, Jack, Kawas, Klunk, Koroshetz, Manly, Mayeux, Mohs, Morris, Rossor, Scheltens, Carrillo, Thies, Weintraub and Phelps2011).
Examination of error types revealed that typical errors generally reflected conceptual deficits and stimulus-bound responses (Rouleau et al., Reference Rouleau, Salmon, Butters, Kennedy and McGuire1992). For instance, participants frequently responded “bee” or “wasp” for fly, which was considered to reflect a stimulus-bound response as the drawing used to depict a fly has yellow and black stripes on its lower back (see Fig. 1). Also, patients with dementia more frequently responded “dress” for skirt and “meat bone” or “chicken bone” for bone, which was considered to represent a conceptual deficit as the Danish language differentiates between bones in living creatures and bones for consumption, like the difference between cow and beef or pig and pork in English. These error types may not be related to impaired lexical retrieval and anomia but rather reflect impaired semantic memory and/or visual perception, which are also commonly impaired in dementia disorders and known to be important for confrontation naming ability (Taler & Phillips, Reference Taler and Phillips2008).
C-CLNT scores were not associated with sex or age, and only negligibly with years of education. Only 6% of the variance in C-CLNT scores was explained by age and years of education, with slightly lower variance explained by immigrant background (3%). In comparison, 34, 28, and 8% of the variance in test scores was explained by immigrant background on the full BNT, BNT-15, and Category Fluency, respectively. Combined with the findings on diagnostic accuracy, these findings support the cross-linguistic properties of the C-CLNT in a diverse memory clinic setting and highlight important limitations of traditional confrontation naming tests. Conversely, when adapting administration procedures to reduce bias in interpreter-mediated assessment, Category Fluency proved to have high clinical utility for cross-linguistic assessment. This is in line with previous reports of Category Fluency being relatively uninfluenced by culture and language (Nielsen et al., Reference Nielsen, Segers, Vanderaspoilden, Bekkhus-Wetterberg, Minthon, Pissiota, Bjørkløf, Beinhoff, Tsolaki and Gkioka2018), and having high diagnostic accuracy for dementia in multicultural populations (Nielsen et al., Reference Nielsen, Segers, Vanderaspoilden, Beinhoff, Minthon, Pissiota, Bekkhus-Wetterberg, Bjorklof, Tsolaki, Gkioka and Waldemar2019b).
An abbreviated 20-item version of the C-CLNT, using only the object items, was highly correlated with the full C-CLNT, which examined both action and object naming, and showed comparable psychometric properties and diagnostic accuracy for dementia. Although assessment of action naming has been suggested to contribute to differential diagnostics as object and action naming may be differentially affected across dementia disorders (Cotelli et al., Reference Cotelli, Borroni, Manenti, Alberici, Calabria, Agosti, Arévalo, Ginex, Ortelli and Binetti2006; Parris & Weekes, Reference Parris and Weekes2001), in the present study action naming did not contribute to the overall classification of dementia. Thus, the 20-item C-CLNT may generally be adequate for cross-linguistic assessment of naming impairment in a memory clinic setting. However, future studies might test the effects of action naming with a larger sample of items than used in the present study and a more diverse patient group.
This study has some limitations. Although we were able to analyze C-CLNT performance across patients with affective disorders, MCI, and dementia, the clinical groups were not fully matched on age and proportion of participants with immigrant background, which may have exacerbated some group differences. Furthermore, our dementia sample was too small to analyze the C-CLNT across specific dementia subtypes. Also, the C-CLNT demonstrated a ceiling effect in all clinical groups, except the dementia group, and was not able to discriminate between participants in the control and MCI group. Although the C-CLNT appears to be sensitive to naming impairment in patients with mild dementia, this may indicate that the C-CLNT is not sensitive to more subtle naming impairment. The scale reliability of the C-CLNT was acceptable, but not high. This means that it may not be consistent in measuring naming ability, and the results should be interpreted with caution. Additionally, in interpreter-mediated assessments, interpreters often struggled with translations of responses for items of the BNT as they did not know the corresponding words in Danish. Also, interpreters assisted in determining whether a nonstandard response was a correct synonym or an incorrect response. On Category Fluency, instructing the interpreters to simply say “yes” for every new animal name did not allow for more careful inspection of repetitions, intrusions, or questionable responses (e.g., same animal name in two languages). Other approaches include having the interpreters write down the responses in the language of their choice or recording and transcribing the responses. However, as the use of interpreters did not differ between clinical groups, these issues are unlikely to have significantly influenced the results. 
Finally, although the diagnostic accuracy of the C-CLNT did not significantly differ between participants with majority ethnic Danish and immigrant backgrounds, further studies comparing larger cultural and language groups are needed to support the cross-linguistic, cross-cultural, and diagnostic properties of the C-CLNT. Also, reliability metrics, including test-retest, intra-rater, and inter-rater reliability, should be established to provide further support for the psychometric properties of the C-CLNT. As suggested by the European Consortium on Cross-Cultural Neuropsychology (ECCroN) (Franzen et al., Reference Franzen, Neuropsychology, Watermeyer, Pomati, Papma, Nielsen, Narme, Mukadam, Lozano-Ruiz and Ibanez-Casas2022), such studies should preferably take several diversity-related variables into account, including limited education and literacy, quality of education, and acculturation.
In conclusion, the novel C-CLNT has promising clinical utility for cross-linguistic assessment of naming impairment in culturally, linguistically, and educationally diverse older adults. Although the C-CLNT was developed by taking into consideration the cultural and linguistic diversity in Europe and was validated in a diverse memory clinic population, it may also be suitable for assessment of naming impairment in other cultural and clinical contexts, including culturally and linguistically diverse populations in other world regions, and patients with stroke or traumatic brain injury. However, further research is needed to establish the utility of the C-CLNT in these contexts before it is applied there.
BNT = Boston Naming Test; C-CLNT = Copenhagen Cross-Linguistic Naming Test; MMSE = Mini-Mental State Examination; RUDAS = Rowland Universal Dementia Assessment Scale.
Supplementary material
The supplementary material for this article can be found at https://doi.org/10.1017/S1355617723000437.
Acknowledgments
The authors report no competing interests. This research was supported by THE VELUX FOUNDATIONS (grant number 00042578), which had no role in the formulation of research questions, choice of study design, data collection, data analysis, or decision to publish. The Danish Dementia Research Centre is supported by the Danish Ministry of Health. The authors would like to thank all neuropsychologists at the Copenhagen University Hospital Memory Clinic at Rigshospitalet who contributed to the data collection, and Kasper Jørgensen who contributed with constructive comments to the article draft.