Over the last century, numerous memory assessment procedures have been developed to measure and differentiate normal from abnormal memory functioning. Most of these measures are highly face valid and focus on quantifying the amount of to-be-remembered information that can be recalled immediately after presentation and after a specified time delay. Most of these approaches are also highly verbal. Such procedures might evaluate the number of details from prose passages, the number of paired words, or the number of items on a list that can be recalled. Even recall of geometric figures can be susceptible to verbal labeling of shapes, suggesting that some linguistic processing occurs during purportedly visuospatial memory tasks. While most of these approaches effectively differentiate normal from abnormal memory functioning, performance on these measures can be influenced by demographic characteristics, such as sex, age, education, cultural and linguistic factors, and an examinee’s familiarity with the to-be-recalled test material. For example, females commonly show at least a slight verbal episodic memory performance advantage over males (Asperholm et al., 2019; Hirnstein et al., 2023). These variables can influence an individual’s processing efficiency and may lead to memory scores contaminated by demographic, cultural, and linguistic differences.
Though basic cognitive mechanisms are considered similar cross-culturally (Nell, 1999), the behavioral manifestations of higher-order processes are undeniably influenced by an individual’s culture (Fernández & Abe, 2018; Puente & Agranovich, 2004; Rivera Mindt et al., 2010). The influence of culture on cognitive performance typically favors individuals born and raised in the geographical region where a test was developed (Cole, 1998). Moreover, specific cognitive tasks may be more complex and require more cognitive and neural resources in one culture than another (Gutchess et al., 2011). In addition to these culture-specific effects, individual cognitive measures may assess different cognitive abilities depending on the examinee’s cultural background (Fasfous et al., 2013). Most cognitive tests are developed, standardized, and normed in predominantly “Western” and industrialized regions, such as the USA and UK, which share similar languages and origins and have high social and economic development rates. Several studies have reported that cross-cultural variability in socioeconomic and health status, as well as inequalities in educational opportunities, are confounds that can strongly influence cognitive test performance (Chin et al., 2012; Ferraro, 2016; Krch et al., 2015; Rosselli & Ardila, 2003; Schwartz et al., 2004; Weuve et al., 2018).
An essential cultural factor in neuropsychological assessment is the language the examinee speaks. Because most memory measures impose substantial verbal demands, the words comprising a word list likely carry both shared and language-specific meanings for speakers of a given language, and the language-specific meanings do not lend themselves to translation. Additionally, word frequencies may differ substantially across languages. Simple word-for-word translations from one language to another may also change word phonology, such as the number of syllables per word, and introduce subtle differences in the semantic meanings of words, which collectively may influence memory performance. These limitations make direct translations of word list items from one language to another flawed and minimally informative.
Conventional word list memory measures usually use highly familiar words or material frequently occurring in the examinee’s language. Familiarity with test stimuli could give some examinees an advantage on the task and may produce ceiling effects. Historically, Hermann Ebbinghaus (1885, 1964) made use of consonant-vowel-consonant (CVC) trigrams, a structured way of creating nonsense syllables, as stimuli in his quest to develop material that was devoid of inter-item associations and prior experience (Thorne & Henley, 2001). Nonword CVC trigrams are pronounceable combinations of letters (consonant-vowel-consonant) with no meaning or associations with other nonsense syllables. Nonword repetition depends more on the temporary storage of phonological representations in short-term memory during initial learning due to limited access to long-term lexical models that facilitate the recall of unfamiliar items. This manipulation eliminates the ability to use preexisting meaning, knowledge, or experience to facilitate recall.
To our knowledge, only one group has used CVC material within a list-learning task in two neuropsychological contexts. Bourke et al. (2012) presented the CVC nonsense syllables utilizing the structure of the Auditory Verbal Learning Test (AVLT; Rey, 1958) as part of a larger neuropsychological battery to study neuropsychological differences among persons with major depressive and social anxiety disorders, relative to matched controls. Persons with depression showed lower recall of CVC items across Trials 1–5 and a flatter learning curve than persons with social anxiety and matched controls. Delayed recall and recognition did not differ between groups. In a second study, Vierck et al. (2015) used the same paradigm to evaluate and screen persons for mild cognitive impairment. They reported that the CVC list-learning task showed psychometric characteristics similar to those of traditional list-learning tasks but with a reduced tendency for a ceiling effect. As in the first study, the CVC task was also sensitive to depression. These two studies suggest that using the AVLT paradigm with CVC material can evaluate processes associated with learning and memory, with concomitant reductions in the likelihood of ceiling effects. Importantly, this group does not appear to have investigated this procedure’s potential for cross-cultural applications across speakers of different languages.
In this study, we implemented a similar CVC list-learning procedure to that of Bourke et al. (2012) and Vierck et al. (2015) with independent samples of undergraduate students collected in the United States and Italy. Although the United States and Italy are considered “Western” and industrialized (Masuda et al., 2020), the predominant languages used in each country, a fundamental component of their respective cultures, differ phonologically and semantically. CVC trigrams eliminate the semantic aspect of words, and they hold the phonological structure constant across items. Beyond language, other critical cultural differences that could plausibly impact cognitive function exist between the US and Italy, such as access to education and health care (Petrelli et al., 2020), nutritional status (Zhang et al., 2015), climate, and other material, social, and cultural resources (Marks et al., 2006). Thus, we investigated whether similar learning and memory processes would be observed using CVC stimuli in native speakers of two different languages from the United States and Italy. If similar learning material across languages can evaluate the same memory processes, such a finding could lead to developing a standardized cross-cultural memory task. We hypothesized that both groups of participants would show a comparable and increasing number of items recalled over the five learning trials, a similar number of items retrieved on the first learning trial and the interference trial, and an equivalent ability to retain CVC material over a 20-min delay. We also hypothesized that the American and Italian samples would perform equally on all performance indexes.
Method
Participants
The American sample included 75 (21 males, 54 females) Wayne State University undergraduate psychology students. The Italian sample consisted of 104 undergraduate psychology students (38 males, 66 females) from the University of Bergamo. Participants were recruited through the online Sona System at Wayne State University or an online Google form at the University of Bergamo. All participants received research credit for participation. Exclusion criteria included reporting any current or previous history of cognitive or neurological deficit, major psychiatric disorders, or any concern that would make participating in the research study challenging. Informed consent was obtained from all participants after fully explaining the research protocols and before starting the experimental session. The individual studies were approved by the Institutional Review Boards of Wayne State University and the University of Bergamo, and the research was completed in accordance with the Helsinki Declaration.
Measures
During the 20-min delay, both groups completed a demographic and health history questionnaire. Salthouse’s Synonym and Antonym Test (Salthouse, 1993) and the Center for Epidemiologic Studies Depression Scale (CES-D) (Radloff, 1977) were also administered during the delay to examine general intellectual functioning and depressive symptoms, respectively, for the American sample only. No parallel standardized Italian measures were available for the Italian participants, but they were given unstandardized direct translations of the Synonym and Antonym Test and CES-D to fill the 20-min time delay. The Synonym and Antonym Test presents 20 target words with five response options for each word. Examinees must choose the correct synonym for the first ten target words, followed by selecting the correct antonym of the remaining ten target words. The CES-D consists of 20 statements reflecting depressive symptoms rated regarding their frequency of occurrence during the past week in four ordinal categories (Rarely/None of the Time to Most of the Time). We evaluated the possible contribution of these two constructs to performance on the CVC learning task, given that depression has reportedly been associated with performance on the CVC task (Bourke et al., 2012; Vierck et al., 2015) and crystallized intelligence is related to long-term memory (Hundal & Horn, 1977).
The CVC free recall task’s structure used the AVLT framework. The AVLT is a multi-trial list-learning approach for assessing immediate and delayed episodic memory, attention, and concentration (Lezak et al., 2012; Magalhães & Hamdan, 2010). Trigrams were presented using the PsychoPy software program (Version V3.0.0b11; Peirce et al., 2019) on a Macintosh 27-inch 2015 iMac computer for the American study and an Apple MacBook Pro (13-inch, Mid 2012) for the Italian study. The learning material consisted of 15 test trigrams (List A) and 15 interference trigrams (List B). Each trigram consisted of three English letters in a consonant-vowel-consonant pattern. Candidate items were randomly generated and selected. Trigrams that were actual words, meaningful acronyms, popular abbreviations, homophones, or challenging to pronounce were excluded. The items were displayed in black, bold Arial font on a white background on the computer screen to enhance illumination.
Because the Italian alphabet has only 21 letters (the letters j, k, w, x, and y are not used), some of the CVC trigrams originally developed for the American group were replaced with new items for the Italian sample (10 items for List A and five for List B). Table 1 shows the items used for each list for the two groups.
a Denotes List A trigrams that differed between nationality samples.
b Denotes List B trigrams that differed between nationality samples.
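The item-construction steps described above (random generation of CVC candidates, exclusion of unsuitable items, and substitution of items containing letters absent from the Italian alphabet) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the procedure the authors actually ran: the function names, the tiny exclusion set, and the example trigrams are all hypothetical, and in practice the screening for real words, acronyms, abbreviations, homophones, and pronounceability would require manual review.

```python
import random

VOWELS = "aeiou"
# Letters not used in the Italian alphabet, as listed in the text.
NON_ITALIAN = frozenset("jkwxy")


def consonants(excluded_letters=frozenset()):
    """Consonants of the 26-letter alphabet, minus any excluded letters."""
    return [c for c in "abcdefghijklmnopqrstuvwxyz"
            if c not in VOWELS and c not in excluded_letters]


def generate_cvc_list(n_items, excluded_items=(), excluded_letters=frozenset(), seed=None):
    """Randomly draw n_items unique CVC trigrams, skipping excluded items."""
    rng = random.Random(seed)
    cons = consonants(excluded_letters)
    blocked = {item.lower() for item in excluded_items}
    pool = [c1 + v + c2 for c1 in cons for v in VOWELS for c2 in cons
            if c1 + v + c2 not in blocked]
    if n_items > len(pool):
        raise ValueError("not enough candidate trigrams")
    return rng.sample(pool, n_items)


def substitute_items(items, excluded_letters, excluded_items=(), seed=None):
    """Replace only the trigrams that contain an excluded letter."""
    def needs_swap(trigram):
        return bool(set(trigram) & set(excluded_letters))

    n_new = sum(needs_swap(t) for t in items)
    # New items must not duplicate existing items or excluded items.
    fresh = iter(generate_cvc_list(n_new, set(excluded_items) | set(items),
                                   excluded_letters, seed))
    return [next(fresh) if needs_swap(t) else t for t in items]


# Hypothetical exclusion set standing in for the manual screening step.
REAL_WORDS = {"cat", "dog", "bed", "sun"}

list_a_english = generate_cvc_list(15, REAL_WORDS, seed=1)
list_a_italian = generate_cvc_list(15, REAL_WORDS, NON_ITALIAN, seed=1)

# Hypothetical three-item list: "dax" and "kib" contain letters that are
# not in the Italian alphabet, so only those two items are replaced.
adapted = substitute_items(["dax", "vop", "kib"], NON_ITALIAN, REAL_WORDS, seed=2)
```

Note that removing j, k, w, x, and y leaves 16 consonants, which together with the five vowels gives the 21 letters of the Italian alphabet mentioned above.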
Procedure
The American study involved one in-person study visit between May and December of 2019. The Italian study took place during the COVID-19 pandemic via an online Skype video call, with the list items presented through screen sharing. For the American study, participants were seated directly opposite the examiner, with the computer monitor, in a room illuminated by overhead fluorescent light. Before the start of the experiment, the experimenter briefed the participant on the study’s aims using the study information sheet. Informed consent was obtained, and all participants received a copy of the consent document. To begin the task, the participant faced the computer screen, seated about 91 cm (3 ft) from it. For the Italian online study, the experimenter briefed the participant on the study’s aims and presented an informed consent form. Once participants had read the form and agreed to participate, the study began. Participants were asked to be in a quiet environment and to indicate whether the screen was visible before the experiment commenced.
The CVC trigrams used for the American and Italian samples adopted the AVLT paradigm (Rey, 1958), replacing list words with CVC trigram nonwords. A 15-item list of nonword CVC trigrams (List A) was presented in a fixed order on five successive study-test recall trials. Trigrams appeared at a rate of one every 2 s, and participants read each item aloud as it was presented. At the end of each trial, participants were given a fixed period of 60 s to recall as many items as possible, and precisely 60 s separated the presentation of each trial. After the fifth trial, List B (consisting of a new 15-item list of nonword CVC trigrams) was presented, and participants were given 60 s to recall as many items as possible. Immediately after the recall of List B, participants were asked to identify as many trigrams from List A as possible within 60 s (Trial 6). Participants then completed a demographic questionnaire, the Salthouse Synonym and Antonym Test, and the CES-D to fill a 20-min delay period. Next, for the delayed recall trial, participants were asked to recall as many items from List A as possible within 60 s. Performance indices included the number of items recalled on each trial (Trials 1 through 6, List B, and 20-min Delayed Recall), and the sum of items recalled across Trials 1 through 5.
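As a rough illustration of how the performance indices described above could be computed from raw recall responses, consider the sketch below. It assumes that each item is credited once per trial and that repetitions and intrusions earn no credit; the text does not state how these were handled, so that scoring rule, the function names, and the example trigrams are all hypothetical.

```python
def score_trial(responses, target_list):
    """Count unique correct recalls; repeats and intrusions earn no credit."""
    targets = {t.lower() for t in target_list}
    recalled = {r.strip().lower() for r in responses}
    return len(recalled & targets)


def sum_learning_trials(trial_scores):
    """Sum of items recalled across Trials 1 through 5.

    trial_scores maps 'T1'..'T6', 'B', and 'DR' to the number of items
    recalled on that trial (the key names are an assumption of this sketch).
    """
    return sum(trial_scores[f"T{i}"] for i in range(1, 6))


# Hypothetical three-item target list and one participant's responses.
list_a = ["dax", "vop", "kib"]
responses = ["dax", "DAX", "zum", "kib"]  # one repeat, one intrusion
trial_1_score = score_trial(responses, list_a)

# Hypothetical per-trial scores for one participant on the 15-item lists.
scores = {"T1": 4, "T2": 6, "T3": 8, "T4": 9, "T5": 10, "B": 4, "T6": 7, "DR": 7}
total_t1_t5 = sum_learning_trials(scores)
```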
Data analysis
Bayesian independent groups t tests, correlations, and mixed-design analyses of variance (ANOVAs) were used to analyze the data. All analyses were conducted with JASP (JASP Team, 2023), version 0.17.1. A Bayesian analytic approach quantified the population parameter estimates most likely to underlie the observed data. The Bayes factor (BF10) evaluates the degree of support for the alternate hypothesis relative to the null hypothesis given the observed data. BF10 values between 3 and 10 suggest anecdotal evidence favoring the alternate hypothesis, while values greater than 10 strongly support the alternate hypothesis. In contrast, increasing support for the null hypothesis is indicated as BF10 values become smaller than 1/3. As Bayesian effect size indexes for ANOVAs are difficult to compute using available software, frequentist partial η² effect sizes are presented to convey the magnitudes of the main and interaction effects. The JASP default priors were as follows: ANOVAs used the uniform distribution as the prior; contingency table prior concentration was set to 1; t tests used the Cauchy prior scale 0.707; correlations used a stretched beta prior width of 1. Bayesian independent groups t tests report the BF10, the median Standardized Mean Difference (SMD) effect size, and the 95% Credible Interval (CI) of parameter estimates likely to have given rise to the data. Small, medium, and large SMD effect size benchmarks are typically considered to be 0.2, 0.5, and 0.8, respectively. Small, medium, and large partial η² effect size benchmarks are 0.01, 0.06, and 0.14, respectively. Because Bayesian analysis primarily focuses on the posterior distributions that are unaffected by the number of comparisons one wishes to make, and p-values are not used, corrections for multiple comparisons are unnecessary (Kruschke, 2015).
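The interpretive conventions stated above can be written down as a small lookup, which may help when reading the tables. The sketch below simply encodes the thresholds given in this section; the function names, the label for the range between 1/3 and 3 (which the text does not name), and the handling of values that fall exactly on a boundary are assumptions made for illustration.

```python
def describe_bf10(bf10):
    """Label a Bayes factor using the thresholds stated in the text."""
    if bf10 <= 0:
        raise ValueError("a Bayes factor must be positive")
    if bf10 > 10:
        return "strong support for the alternate hypothesis"
    if bf10 >= 3:
        return "anecdotal support for the alternate hypothesis"
    if bf10 < 1 / 3:
        return "support for the null hypothesis"
    # The text does not label this range; "inconclusive" is an assumption.
    return "inconclusive"


def describe_effect(value, small, medium, large):
    """Generic benchmark lookup; boundary handling is an assumption."""
    size = abs(value)
    if size >= large:
        return "large"
    if size >= medium:
        return "medium"
    if size >= small:
        return "small"
    return "negligible"


def describe_smd(smd):
    """Benchmarks of 0.2, 0.5, and 0.8 for standardized mean differences."""
    return describe_effect(smd, 0.2, 0.5, 0.8)


def describe_partial_eta_squared(eta_sq):
    """Benchmarks of 0.01, 0.06, and 0.14 for partial eta squared."""
    return describe_effect(eta_sq, 0.01, 0.06, 0.14)


# Illustrative values only.
print(describe_bf10(12.0))
print(describe_smd(-0.25))
print(describe_partial_eta_squared(0.07))
```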
Results
Demographic characteristics
The Italian sample (M age = 21.3 years, SD = 2.7 years, range = 18–29 years) was slightly older than the American sample (M age = 20.3 years, SD = 2.7 years, range = 18–30 years; BF10 = 3.7, SMD effect size = 0.37, 95% Credible Interval (95% CI) = 0.1–0.7). The American sample mean CES-D score was 15.6 (SD = 10.9, range = 0–56, 95% CI = 13.1–18.1) and the mean Synonym-Antonym score was 6.4 out of 20 (SD = 3.3, range = 1–16, 95% CI = 5.6–7.2). A Bayesian test of independence between nationality (Italy, USA) and biological sex produced a BF10 of 0.392, suggesting approximately equal distributions of men and women across the two studies.
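For readers who want to relate the reported SMD to a more familiar quantity, a classical Cohen's d can be computed directly from the summary statistics above. This is a frequentist point estimate based on the pooled standard deviation, not the posterior median that JASP reports, so the two need not agree exactly; the function names are ours.

```python
import math


def pooled_sd(sd1, n1, sd2, n2):
    """Pooled standard deviation of two independent groups."""
    return math.sqrt(((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2))


def cohens_d(m1, sd1, n1, m2, sd2, n2):
    """Standardized mean difference (group 1 minus group 2)."""
    return (m1 - m2) / pooled_sd(sd1, n1, sd2, n2)


# Age: Italian sample (M = 21.3, SD = 2.7, n = 104) versus
# American sample (M = 20.3, SD = 2.7, n = 75), as reported above.
d_age = cohens_d(21.3, 2.7, 104, 20.3, 2.7, 75)
print(round(d_age, 2))  # 0.37
```

With equal standard deviations in the two groups, the pooled value is simply 2.7, so d is 1.0 / 2.7, which is in line with the reported SMD of 0.37.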
Correlations between learning indexes, age, CES-D, and synonym-antonym test
For the overall sample (N = 179), there was no evidence of a relationship between age and performance on any learning trial or learning index, as evidenced by extremely small correlation coefficients (all r’s < 0.1) and BF10s (range = 0.09–0.24). Within the American sample only, none of the traditional learning measures correlated with the CES-D: Trial 1: r = 0.12 (BF10 = 0.2, 95% CI = −0.21 to 0.23); List B: r = 0.02 (BF10 = 0.15, 95% CI = −0.20 to 0.24); Trial 6: r = −0.12 (BF10 = 0.24, 95% CI = −0.41 to 0.02); Sum of Trials 1–5: r = −0.17 (BF10 = 0.42, 95% CI = −0.38 to 0.06); Delayed Recall: r = −0.21 (BF10 = 0.72, 95% CI = −0.41 to 0.02). Similarly, in the American sample, performance on the Synonym-Antonym Test was uncorrelated with these indexes: Trial 1: r = 0.11 (BF10 = 0.22, 95% CI = −0.12 to 0.32); List B (interference): r = 0.03 (BF10 = 0.15, 95% CI = −0.20 to 0.25); Trial 6: r = 0.23 (BF10 = 0.96, 95% CI = 0.00 to 0.43); Sum of Trials 1–5: r = 0.26 (BF10 = 1.72, 95% CI = 0.03 to 0.45); Delayed Recall: r = 0.21 (BF10 = 0.74, 95% CI = −0.02 to 0.41). Given the absence of correlations between the learning measures and age, depressive symptoms, or crystallized intelligence, these variables were not considered further as potential covariates.
Learning over trials
A 2 (Nationality: Italy, USA) × 2 (Sex: Male, Female) × 5 (Learning Trial: Trial 1 through Trial 5) Bayesian mixed-design ANOVA examined learning over trials. The dependent variable was the number of items correctly recalled on each trial. Of 19 possible models, including each combination of main and/or interaction effects, a model containing only the Learning Trial main effect had the highest BF10 (BFModel = 13.22, partial η² = 0.56). This finding indicates that there was no evidence for main effects of Sex or Nationality, nor for any interaction effects, as illustrated in the model-averaged results for each main and interaction effect in Table 2. Partial η² values for all other main and interaction effects were 0.01 or less. The BF10 for the model including the Learning Trial main effect was 5.1 × 10¹³, which indicates robust evidence in the data for the model that includes only Learning Trial. The BF10s for the other main and interaction effects were less than 0.28. Notably, there were no appreciable effects of Sex or Nationality on the learning curve.
Note: P(incl): prior probability associated with the plausibility of an effect before looking at the data; P(excl): prior probability associated with the implausibility of an effect before looking at the data; P(incl|data): posterior probability associated with the plausibility of an effect given the data; P(excl|data): posterior probability associated with the implausibility of an effect given the data; BFinclu: the change from the prior inclusion odds to the posterior inclusion odds for each effect, averaged by all models including the effect, broadly reflecting the support for the effect given the data.
Comparison of American and Italian samples on memory performance indexes
Table 3 presents the means, SDs, BF10s, and standardized mean group difference effect sizes from a series of independent groups Bayesian t-tests for each of the five learning trials, List B, Trial 6, Delayed Recall, and the sum of words recalled across Trials 1–5 for the American and Italian participants, separately. Figure 1 presents the means and 95% credible intervals for each index by each nationality. Except for performance on List B, no group differences on any other performance index were observed, as indicated by the minimal BF10 values. Italian participants performed slightly better than the American participants on List B by approximately one item. Except for List B, which showed a medium effect size, all performance indexes showed conventionally small or near-zero effect sizes.
Note: DR = Delayed Recall; T1–T5 Sum = Sum of words recalled across Trials 1–5; SD = standard deviation; SE = Standard Error; 95% Credible Interval reflects the lower and upper bounds of the 95% most likely mean value for each nationality; 95% Credible Interval for Effect Size reflects the upper and lower bounds for the 95% most likely standardized mean difference effect sizes.
Comparison of male and female performance on memory performance indexes
Table 4 presents the descriptive statistics for the AVLT learning and memory indexes separately for men and women, collapsed over nationality. Figure 2 displays the means and 95% credible intervals for each index in graphical form. As can be gleaned from the table, the small BF10 values suggest no group differences in memory performance between men and women. All effect sizes are near zero or in the conventionally small range.
Note: DR = Delayed Recall; T1–T5 Sum = Sum of words recalled across Trials 1–5; SD = standard deviation; SE = Standard Error; 95% Credible Interval reflects the lower and upper bounds of the 95% most likely mean value for each sex; 95% Credible Interval for Effect Size reflects the upper and lower bounds for the 95% most likely standardized mean difference effect sizes.
Susceptibility to proactive and retroactive interference
Susceptibility to proactive interference was investigated using a 2 (Nationality: Italy, USA) × 2 (Trial: Trial 1, List B) mixed-design Bayesian ANOVA. The model-averaged results in Table 5 demonstrate a robust main effect of Nationality (inclusion BF10 = 20.8, partial η² = 0.06), whereby the Italian sample performed better than the American sample, averaged over Trial 1 and List B. There was no evidence of susceptibility to proactive interference for the overall sample (Trial main effect inclusion BF10 = 0.1, partial η² = 2.1 × 10⁻⁴) or a different pattern for the two Nationalities (Nationality × Trial interaction effect inclusion BF10 = 0.3, partial η² = 0.02). These results are presented in Figure 3, whereby the Nationality main effect seems to be driven by the previously observed difference between the Italian and American samples, primarily on List B.
Note: P(incl): prior probability associated with the plausibility of an effect before looking at the data; P(excl): prior probability associated with the implausibility of an effect before looking at the data; P(incl|data): posterior probability associated with the plausibility of an effect given the data; P(excl|data): posterior probability associated with the implausibility of an effect given the data; BFinclu: the change from the prior inclusion odds to the posterior inclusion odds for each effect, averaged by all models including the effect, broadly reflecting the support for the effect given the data.
Another 2 (Nationality: Italy, USA) × 2 (Trial: Trial 5, Trial 6) Bayesian ANOVA examined possible pre- and postinterference differences in recall that would reflect susceptibility to retroactive interference. Table 6 displays the model-averaged effects suggesting a robust main effect of Trial (inclusion BF10 = 1.4 × 10¹⁴, partial η² = 0.56), no main effect of Nationality (inclusion BF10 = 1.0; partial η² = 4.8 × 10⁻⁴), and anecdotal evidence suggesting an interaction effect (inclusion BF10 = 3.9, partial η² = 0.04). Figure 4 illustrates the means and 95% credible intervals for the two Nationality groups on the preinterference (Trial 5) and postinterference (Trial 6) recall trials. As indicated by the overlapping 95% credible intervals for the two Nationality groups, the evidence favoring an interaction does not appear compelling. In contrast, there does appear to be robust evidence in favor of susceptibility to retroactive interference for both groups, given the substantial performance decline following the presentation of the interference list.
Note: P(incl): prior probability associated with the plausibility of an effect before looking at the data; P(excl): prior probability associated with the implausibility of an effect before looking at the data; P(incl|data): posterior probability associated with the plausibility of an effect given the data; P(excl|data): posterior probability associated with the implausibility of an effect given the data; BFinclu: the change from the prior inclusion odds to the posterior inclusion odds for each effect, averaged by all models including the effect, broadly reflecting the support for the effect given the data.
Discussion
The primary aim of this cross-cultural study was to determine whether a list of nonword CVC trigrams presented in the framework of the AVLT (Rey, 1958) could be used to evaluate learning and memory for native speakers of different languages. Our results suggested that this approach can assess auditory learning and delayed recall in native English and Italian speakers. No performance differences were observed between the Italian and American participants on any of the five learning trials, the postinterference recall trial, the sum of Trials 1–5, or the Delayed Recall trial. The Italian group performed better than the American participants only on List B by approximately one item. No evidence for susceptibility to proactive interference was observed for either group. In contrast, robust susceptibility to retroactive interference was observed for both groups. Susceptibility to retroactive interference is not commonly observed on the AVLT for healthy individuals in this age group, but it is more commonly seen after age 60 (Vakil et al., 2010). No performance differences were observed between men and women, which was surprising given the verbal memory advantage often observed for women (Asperholm et al., 2019; Crossley et al., 1997; Hirnstein et al., 2023). In addition, in the American sample, none of the memory outcome measures were related to age, depressive symptoms, as indexed by the CES-D, or crystallized intelligence, as measured by the Salthouse Synonym and Antonym Test.
Language is the most prominent issue when attempting to establish the cross-cultural equivalence of a standardized assessment. Standardized learning and memory assessments have historically tended to be heavily language-based and were developed primarily in “Western” cultures, thus biasing assessment results (Mushquash & Bova, 2007). For example, the most commonly used learning and memory assessments are auditory verbal list-learning tasks, such as the AVLT paradigm (Rey, 1958). Ideally, simple translations of the assessment items and instructions would yield comparable psychometric properties. However, previous research suggests that several significant considerations must be made during translation, and psychometric equivalence is not ensured (Cromer et al., 2013; Rendu et al., 2012). Simple translations, especially for heavily language-based tasks or complex verbal instructions, can be problematic because word meaning and usage vary as a function of language and culture (Ardila, 2021). As a result, the direct translation of word learning tasks seems to be insufficient due to factors such as preexisting semantic connections, frequency and familiarity, different cultural approaches to testing and conceptualizations of memory, and other cultural nuances (Ardila, 2021; Leger & Gutchess, 2021). These factors become further complicated when considering the role of language functioning in bilingual individuals (Rivera Mindt et al., 2008). Even within “Western” nations, ethnic diversity has dramatically increased in recent decades as international immigration has become more accessible. As a result, neuropsychologists must be prepared to encounter and assess individuals from diverse backgrounds (Franzen et al., 2021; Goudsmit et al., 2017). Thus, there is a pressing need for a more thorough development of standardized assessments of learning and memory that remain psychometrically useful across languages and, thus, cultures.
Our study offers a novel demonstration that performance on a verbal list-learning task of nonword CVC items was functionally equivalent across samples from the USA and Italy. Using nonword CVC items delivered in the AVLT paradigm precludes many issues inherent in language-based tasks, such as individual and cultural differences in preexisting semantic associations and familiarity with items. Thus, using unfamiliar nonwords standardized in form and length (e.g., CVC) reduces language-related factors, such as differences in phonology and semantics across languages, that may bias participant performance. Unlike most word list-learning tasks, participants read each item aloud as it was visually presented in our task. This procedure may have introduced a unique supportive learning process not seen in other measures. However, we still did not observe a ceiling effect. Despite these advantages, relatively little work has been done to modify the AVLT paradigm with nonword CVC items (Bourke et al., 2012; Vierck et al., 2015). The two previous studies using a similar approach to our current work were conducted with samples from New Zealand. With the addition of our research to the limited extant literature, there is evidence from three distinct cultures that using nonword CVC items in the AVLT paradigm is a valid measure of learning and memory, with participants performing similarly across cultures. The current results suggest that this modified paradigm is a suitable cross-cultural measure of auditory-verbal learning and memory, given that similar stimuli evoked roughly equivalent responses across two distinct cultures and languages.
No significant differences between males and females were observed on the learning, interference, or delayed recall trials. This finding was somewhat surprising, as females tend to perform better on verbal memory tasks than males (Crossley et al., Reference Crossley, D’Arcy and Rawson1997; Geffen et al., Reference Geffen, Moar, O’Hanlon, Clark and Geffen1990; Gordon & Clark, Reference Gordon and Clark1974; Graves et al., Reference Graves, Moreno, Seewald, Holden, Van Etten, Uttarwar, McDonald, Delano-Wood, Bondi, Woods, Delis and Gilbert2017; Kimura & Seal, Reference Kimura and Seal2003; Kramer et al., Reference Kramer, Delis and Daniel1988; Norman et al., Reference Norman, Evans, Miller and Heaton2000; Weiss et al., Reference Weiss, Ragland, Brensinger, Bilker, Deisenhammer and Delazer2006; Woodard, Reference Woodard and Poreh2006). Bleecker et al. (Reference Bleecker, Bolla-Wilson, Agnew and Meyers1988) found that women outperformed men on most AVLT performance indexes. Interestingly, Kimura and Seal (Reference Kimura and Seal2003) found that females outperformed males in recalling actual words but not nonsense words. One possible reason for the absence of sex differences in performance is the nonsemantic and nonassociative nature of the stimulus items. Because the CVC trigrams are equally unfamiliar to men and women, neither group had an advantage, resulting in no sex disparity in recall performance. These results imply that female superiority in verbal memory may result from using items that are familiar to the examinee, have preexisting associations, or have a unique semantic salience (e.g., “bird” may be more memorable than “kestrel” due to its greater frequency of use and because “bird” represents a superordinate category).
Performance on the CVC trigram memory test was also unrelated to age. This finding is unsurprising due to the restricted age range consisting only of younger adults. Future studies using this approach should consider the possible effects of age in a sample that spans a more extensive age range. More surprising, however, was the absence of a relationship between crystallized intelligence and memory performance. As noted earlier, crystallized intelligence has a demonstrated association with long-term memory (Hundal & Horn, Reference Hundal and Horn1977). In a more recent study, Rapport et al. (Reference Rapport, Axelrod, Theisen, Brines, Kalechstein and Ricker1997) showed strong relationships between Verbal IQ on the Wechsler Adult Intelligence Scale-Revised (Wechsler, Reference Wechsler1981) and learning indexes on the Wechsler Memory Scale-Revised (Wechsler, Reference Wechsler1987) and California Verbal Learning Test (Delis et al., Reference Delis, Kramer, Kaplan and Ober1987) in a young adult sample. The strong relationships between crystallized intelligence and conventional verbal memory scores imply a performance advantage for individuals with larger vocabularies or preexisting familiarity with the to-be-learned material. The absence of associations between crystallized intelligence and all learning indexes on the CVC trigram memory test in the American sample suggests that performance on this task may be a purer index of auditory memory, as the CVC trigrams are devoid of semantic information that could potentially confer any performance advantage.
Depression has been inconsistently related to verbal memory performance. Some studies show relatively intact memory functioning in depression (Egeland et al., Reference Egeland, Sundet, Rund, Asbjørnsen, Hugdahl, Landrø, Lund, Roness and Stordal2003; Hammar et al., Reference Hammar, Isaksen, Schmid, Årdal and Strand2011; Hammar & Årdal, Reference Hammar and Årdal2013), and others report impaired memory functioning (Chen et al., Reference Chen, Jiang, Wang, Ma, Li, Wu, Hashimoto and Gao2018; Lee et al., Reference Lee, Hermens, Porter and Redoblado-Hodge2012; Wang et al., Reference Wang, Xiongwei, Yang, Fan, Dou, Guo, Wang, Chen, Li and Ma2022), particularly for individuals with recurrent depression (Basso & Bornstein, Reference Basso and Bornstein1999). We found no relationships between CES-D scores and memory performance on the CVC trigram test in the American sample, suggesting that it may be relatively insensitive to depressive symptoms. However, this finding should be interpreted cautiously, as our sample was relatively young and not a clinical group, which would likely restrict the range of CES-D scores. In addition, the two previous studies using the CVC trigram memory test (Bourke et al., Reference Bourke, Porter, Carter, McIntosh, Jordan, Bell, Frances, Colhoun and Joyce2012; Vierck et al., Reference Vierck, Porter, Spittlehouse and Joyce2015) did observe modest but significant relationships with other measures of depression in older clinical and community samples (Bourke et al., Reference Bourke, Porter, Carter, McIntosh, Jordan, Bell, Frances, Colhoun and Joyce2012: mean age approximately 38 years, range 18–65 years; Vierck et al., Reference Vierck, Porter, Spittlehouse and Joyce2015: mean age not reported, range 49–51 years), and they included participants meeting criteria for major depressive disorder. Thus, greater depression severity would be expected in those studies than in our study, where major depressive disorder was an exclusion criterion. The other two studies also used a faster presentation time than in our study (1 s/item versus 2 s/item), and they provided external auditory presentation of list items instead of having participants read the items aloud. Future cross-cultural research with this measure should evaluate the possible effects of depression on performance in both community and clinical samples with a broad age range.
The single significant nationality difference across individual trials reflected better List B performance for the Italian participants than for the American group. Although the mean difference amounted to only one item, it was associated with a medium effect size. This isolated difference is puzzling. Possible explanations include an advantage conferred by online administration of the task or a List B trigram set that was slightly easier for the Italian sample than the set used for List B in the American group. A larger working memory capacity in the Italian sample relative to the American sample is a third possibility that could be directly tested in future research.
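To illustrate how a one-item mean difference can still correspond to a medium effect size, the following minimal sketch computes a standardized mean difference from summary statistics. It assumes the effect size is expressed as Cohen's d with a pooled standard deviation, which the text does not specify, and every number in it is hypothetical and chosen only for the arithmetic; none is taken from the study data.

```python
import math


def cohens_d(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Standardized mean difference using the pooled standard deviation."""
    pooled_var = ((n_a - 1) * sd_a ** 2 + (n_b - 1) * sd_b ** 2) / (n_a + n_b - 2)
    return (mean_a - mean_b) / math.sqrt(pooled_var)


# Hypothetical summary statistics (NOT the study's values):
# two groups that differ by one recalled item, each with an SD of two items.
d = cohens_d(mean_a=5.0, sd_a=2.0, n_a=50, mean_b=4.0, sd_b=2.0, n_b=50)
print(f"Cohen's d = {d:.2f}")
```

Under these assumed values, a raw difference of one item divided by a pooled standard deviation of two items gives d = 0.5, the conventional benchmark for a medium effect. When scores on a short list vary little between participants, even a small raw difference can therefore be moderate in standardized terms.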
Limitations and future directions
Though the results of this study provide preliminary evidence supporting the cross-cultural use of CVC trigrams to assess aspects of auditory-verbal learning and memory, several limitations must be considered. First, the administration contexts differed for the two nationality groups: the American participants were tested in person, whereas the Italian participants were tested online due to the COVID-19 pandemic. However, there were no group differences on any learning index, apart from List B, and this difference was relatively modest. This finding suggests that the administration format did not systematically influence CVC trigram learning task performance. Second, the CVC trigram lists shared many or most of the same items, but they were not identical because the original list of CVC trigrams used with the American sample included trigrams containing certain letters not used in the Italian alphabet. Again, except for List B, no systematic performance differences were observed between the two groups despite having somewhat different trigrams, suggesting that the specific composition of the individual trigrams may have minimal to no influence on performance. Nevertheless, future work with the cross-cultural application of the CVC trigram learning task should consider differences in alphabet composition and other phonological differences across languages when constructing a list of trigrams. In doing so, a common list of items could be developed for use across several languages, assuming they use a common set of Latin/Roman alphabetical characters. This approach may be more challenging to apply in languages that use non-Latin/Roman alphabetical characters. Ease of pronunciation of CVC trigrams across languages should also be carefully considered, as some trigrams may be challenging to enunciate for speakers of some languages or dialects. Finally, although the trigram memory task was not as limited by a ceiling effect as some word list-learning measures, CVC trigrams assess pure phonological verbal memory due to the nonsemantic nature of the items. Whether phonological verbal memory is as sensitive as word list learning to the effects of aging, neurodegenerative conditions, or brain injury would be worthwhile to investigate in future research. A direct comparison of this novel task with existing word list-learning measures in the same participants would be helpful to determine similarities and differences between the learning processes tapped by each measure. Because the task removes semantic information but holds item phonology constant, it will be interesting to contrast this measure’s sensitivity to preclinical Alzheimer’s disease with existing word list-learning tasks. Finally, using this procedure with individuals from “non-Western” and less industrialized geographical regions would be an essential next step in further establishing the clinical utility of this approach.
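The suggestion above of building a common trigram list from letters shared across alphabets can be sketched in a few lines. This is an illustration under stated assumptions and not the procedure used to build the lists in this study: the function name, the exclusion set of real words, the list length of 15, and the random seed are all hypothetical, and the sketch does not screen for ease of pronunciation, which would still require review by speakers of each language.

```python
import itertools
import random

# The standard Italian alphabet has 21 letters and omits j, k, w, x, and y.
ENGLISH = set("abcdefghijklmnopqrstuvwxyz")
ITALIAN = ENGLISH - set("jkwxy")
VOWELS = set("aeiou")


def shared_cvc_trigrams(alphabets, excluded=frozenset()):
    """Return all consonant-vowel-consonant strings built only from letters
    common to every alphabet, minus an exclusion set (e.g., real words)."""
    shared = set.intersection(*alphabets)
    consonants = sorted(shared - VOWELS)
    vowels = sorted(shared & VOWELS)
    trigrams = (c1 + v + c2
                for c1, v, c2 in itertools.product(consonants, vowels, consonants))
    return [t for t in trigrams if t not in excluded]


# Hypothetical exclusion set: a few real words in English or Italian.
# A real application would need a full lexicon check in every target language.
REAL_WORDS = frozenset({"cat", "dog", "per", "non", "con"})

candidates = shared_cvc_trigrams([ENGLISH, ITALIAN], excluded=REAL_WORDS)

# Draw a fixed-length list reproducibly (length and seed are illustrative).
study_list = random.Random(2024).sample(candidates, 15)
print(len(candidates), study_list)
```

Drawing from the intersection of the alphabets guarantees that no item contains a letter unfamiliar to either group, which addresses the list mismatch described above. The same pool could also be sampled repeatedly to generate alternate forms.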
Conclusion
Typical word list-learning measures that require examinees to recall words after the presentation of the word list commonly use semantically associated or nonassociated words or material already familiar to the learner. Because the word list stimuli typically include everyday items or high-frequency words, they may be less difficult to recall than nonsense syllables, especially among younger adults at the peak of their cognitive abilities. Nonword CVC trigrams are nonsemantic, eliminating familiarity with and preexisting associations among items. CVC trigram lists have a particular advantage over word lists in that the same common core set of trigrams can potentially be constructed to assess phonological memory across speakers of many different languages that use the same Latin/Roman alphabetical characters. The present study’s findings suggest that using nonword CVC trigrams in a list-learning paradigm can overcome the challenges posed by preexisting familiarity and associations among words used in traditional list-learning measures of memory. This approach can also be used for cross-cultural memory assessment, as we found minimal differences in performance between Italian and American young adults. Moreover, the task appeared to assess learning and memory equally well for males and females, regardless of nationality. Finally, the trigram memory test reduces the likelihood of ceiling effects, minimizes sex differences in verbal memory performance and performance advantages conferred by extensive vocabulary knowledge, and lends itself to creating many different alternate forms.
Acknowledgements
The authors gratefully acknowledge the assistance of Rebecca Campbell, Jonathan Lundblad, Mike Shihadeh, and Jonathan Sober from Wayne State University (USA) for their help with data collection.
Competing interests
The authors have no competing interests or sources of financial support to declare.