INTRODUCTION
The generalizability of behavioral research depends on random recruitment and freedom from selection bias. However, neuropsychological (NP) testing is an uncommon medical procedure that can be arduous and time consuming. If patients are not adequately reimbursed for their time, their motivation to participate may stem from idiosyncratic characteristics, thus creating a selection bias. In the multiple sclerosis (MS) literature, the recruitment context in NP research is often poorly described. Nelson and colleagues (1998) compared clinical and demographic measures between two MS groups: (a) those treated at a center of specialty care of MS and (b) patients randomly recruited from the surrounding community and treated by a non-MS-specialist neurologist. Patients treated at the center were younger, had more mobility impairment, had earlier diagnosis supported by laboratory tests, and more often reported recent disease worsening. The authors cautioned researchers that limiting research to MS center patients could lead to erroneous conclusions pertaining to demographics, disease course and clinical presentation. Another study, conducted in the university setting, investigated methods of recruiting healthy volunteers (Tomporowski et al., 1993). Cognitive tests obtained from university psychology students required to participate for class credit were compared to non-psychology students paid to undergo testing. The latter group performed better on tests of attention and memory. These studies suggest that differences in recruitment strategy and incentives for participation may seriously impact the conclusions drawn from NP research.
During the course of a recent cross-sectional study of 291 MS patients (Benedict et al., 2006), we found that seven NP tests discriminated MS patients from healthy controls, and relapsing-remitting (RR) from secondary-progressive (SP) patients. Tests that emphasized episodic memory and processing speed were most sensitive, and logistic regression models revealed that tests emphasizing processing speed, verbal memory, and executive function were predictive of vocational status. Depression was not significantly related to cognitive performance, consistent with some previous findings (Benedict et al., 2002a) but not others (Arnett et al., 1999; Arnett et al., 2002). Patients were drawn from a number of different sources, which were documented prior to testing. Therefore, we were able to retrospectively categorize patients into groups based on their recruitment context and incentive to participate. We decided to investigate whether bias arising from recruitment context may affect conclusions drawn about commonly asked research questions in the MS literature concerning the frequency of cognitive impairment, correlation with depression, and external validity.
METHOD
The data from Benedict et al. (2006), including 291 patients diagnosed with clinically definite MS and 56 normal controls, were reanalyzed. Exclusionary criteria were (a) medical disorder other than MS affecting cognitive function, (b) psychiatric disorder (American Psychiatric Association, APA, 2000) other than mood, personality, or behavioral change following the onset of MS, (c) drug or alcohol dependence or current abuse, (d) motor or sensory impairment that could compromise testing, or (e) relapse or corticosteroid treatment within four weeks of assessment. Prior to participation, all research participants signed consent forms approved by institutional review boards.
Patients were coded on a new variable called “recruitment context.” The Research Volunteer (RV) group included patients volunteering to undergo NP testing for financial compensation (n = 57); the Routine Monitoring (RM) group consisted of clinical patients referred for evaluation in order to monitor for changes in cognitive capacity (n = 106), and patients referred for evaluation of a specified clinical problem [repeated complaints of impairment (n = 64), alleged failure in work place (n = 31), differential diagnosis or psychiatric co-morbidity (n = 33)] comprised the Clinically Complex (CC) group (n = 128).
The MACFIMS (Minimal Assessment of Cognitive Function in MS) neuropsychological test battery was employed, as recommended by a consensus panel (Benedict et al., 2002b) and recently validated in MS (Benedict et al., 2006). The Controlled Oral Word Association Test (COWAT) (Benton & Hamsher, 1989) and the Judgment of Line Orientation Test (JLO) (Benton et al., 1994) were used to assess language and spatial processing abilities, respectively. We used two indices from each of two memory tests, the Total Learning (TL) and Delayed Recall (DR) scores. The California Verbal Learning Test—second edition (CVLT2) (Delis et al., 2000) was employed for auditory/verbal memory and the Brief Visuospatial Memory Test–Revised (BVMTR) (Benedict, 1997) was used for visual/spatial memory. Rao adaptations (Rao et al., 1991a) of the Symbol Digit Modalities Test (SDMT) (Smith, 1982) and the Paced Auditory Serial Addition Test (PASAT) (Gronwall, 1977) were used to assess processing speed and working memory. Number correct on the 3.0 and 2.0 ISI PASAT were added together and coded as a single score. The Delis-Kaplan Executive Function System Sorting Test (DKEFS) (Delis et al., 2001) was employed for the assessment of executive function, including the Correct Sorts and Descriptions scores. Finally, the Beck Depression Inventory—Fast Screen (BDIFS) (Beck et al., 2000), validated in the MS population (Benedict et al., 2003), was included as an index of depression.
As in our previous work (Benedict et al., 2006), patients were classified into two groups based on their employment/disability status. Working patients were employed at least 20 hours per week. Disabled patients were receiving formal disability benefits from either public or private sources or were unemployed for reasons reported by them or informants to be disease-related. The classification was determined prior to NP testing.
The analysis plan was to examine common research questions in the MS literature using conventional statistical techniques and determine if the conclusions reached would differ across the motivation groups (RV, RM, CC). We employed an alpha criterion of p < .05 for statistical significance throughout. ANOVA and Chi-square tests were utilized to compare disease characteristics. Effect sizes were calculated using Cohen's d statistic. Following our previous work (Benedict et al., 2006), z-scores were calculated for each individual NP test based on a demographically matched sample of 56 healthy control volunteers. Patients were classified as impaired if they performed in the impaired range (z ≤ −1.5, i.e., at least 1.5 standard deviations below the normal control mean) on two or more NP tests. Age-adjusted z-scores were also calculated. Using the control group, we regressed raw scores on age and saved the standardized residuals for each NP measure. We then applied the regression equations from the healthy controls to calculate age-predicted raw scores for each MS patient. These predicted scores were subtracted from the obtained scores and the differences were divided by the standard deviation of the control group's residuals.
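The age adjustment and impairment rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the analysis code used in the study: the function names, the use of the sample (n − 1) standard deviation for the control residuals, and the example values are all hypothetical, and higher raw scores are assumed to indicate better performance.

```python
from statistics import mean, stdev

def fit_line(ages, scores):
    """Ordinary least-squares fit of score = intercept + slope * age."""
    age_mean, score_mean = mean(ages), mean(scores)
    sxx = sum((a - age_mean) ** 2 for a in ages)
    sxy = sum((a - age_mean) * (s - score_mean) for a, s in zip(ages, scores))
    slope = sxy / sxx
    return score_mean - slope * age_mean, slope

def age_adjusted_z(control_ages, control_scores, patient_age, patient_score):
    """z = (obtained - age-predicted score) / SD of the control residuals."""
    intercept, slope = fit_line(control_ages, control_scores)
    residuals = [s - (intercept + slope * a)
                 for a, s in zip(control_ages, control_scores)]
    residual_sd = stdev(residuals)  # sample SD (n - 1): an assumption
    predicted = intercept + slope * patient_age
    return (patient_score - predicted) / residual_sd

def is_impaired(z_scores, cutoff=-1.5, min_tests=2):
    """Impaired if z <= cutoff on at least min_tests of the NP tests."""
    return sum(z <= cutoff for z in z_scores) >= min_tests

# Hypothetical control data and one hypothetical patient
control_ages = [30, 40, 50, 60]
control_scores = [61, 55, 51, 45]
z = age_adjusted_z(control_ages, control_scores, patient_age=45, patient_score=52)
```

Fitting the regression in the controls only means that a patient's z-score expresses distance from what a healthy person of the same age would be expected to score.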
In the first analysis we asked the question, what is the frequency of cognitive impairment in MS? To answer the question we used chi-square tests to determine if the proportion of cognitively impaired patients differed across groups.
Next, we asked, is cognitive impairment associated with depression in MS? The frequency of depression was compared statistically across groups using Chi-square tests. Partial correlations were examined between BDIFS and each cognitive measure controlling for age.
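A first-order partial correlation of the kind described here can be computed from the three pairwise Pearson correlations. The sketch below assumes that standard formula; the function names and the toy data are hypothetical and are not taken from the study.

```python
from math import sqrt
from statistics import mean

def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def partial_r(x, y, covariate):
    """Correlation between x and y with the covariate (e.g., age) partialled out."""
    r_xy = pearson_r(x, y)
    r_xz = pearson_r(x, covariate)
    r_yz = pearson_r(y, covariate)
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

# Hypothetical data: depression scores, NP test scores, and a covariate
depression = [1, 2, 3, 4]
np_score = [2, 1, 4, 3]
covariate = [1, -1, -1, 1]  # constructed to be uncorrelated with both variables
r_partial = partial_r(depression, np_score, covariate)
```

When the covariate is uncorrelated with both variables, as in this toy example, the partial correlation reduces to the ordinary Pearson correlation.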
Finally, ANOVA was used to compare differences in mean performance on NP tests between two vocational sub-groups of patients (employed vs. disabled). In the Benedict et al. (2006) paper, tests emphasizing memory, processing speed, and executive function were most valid for discriminating MS patients from healthy controls, and were most predictive of vocational status. Thus, we limited our analysis to tests in these domains to address this question.
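For readers less familiar with the technique, a one-way ANOVA on two vocational sub-groups can be sketched as follows. The function name and the z-score values are hypothetical; this is a generic textbook computation, not the software used in the study.

```python
from statistics import mean

def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of groups of scores."""
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    # Variation of the group means around the grand mean
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of individual scores around their own group mean
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Hypothetical age-corrected z-scores on one NP test
disabled = [1, 2, 3]
employed = [2, 3, 4]
f_stat, df_b, df_w = one_way_anova([disabled, employed])
```

With only two groups, the between-groups degrees of freedom equal 1 and the F statistic is the square of the corresponding independent-samples t statistic.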
RESULTS
RV patients were significantly younger (mean age ± SD; RV = 42.4 ± 8.5, RM = 46.8 ± 9.3, CC = 45.6 ± 8.6) than those in the RM group [F(3,343) = 3.53, p = .015]. The three groups did not differ in educational attainment, gender, or ethnicity. Expanded Disability Status Scale (Kurtzke, 1983) scores were also similar across groups: RV 2.7 ± 1.9, RM 3.2 ± 1.8, CC 3.1 ± 1.8; p = .33. Chi-square analysis showed that the percentage of patients with progressive disease course was significantly greater in the CC group than in the RV group [χ2(1) = 4.46, p = .035]. There were no group differences in disease duration.
Analysis A, Frequency of Cognitive Impairment
Frequency of impairment was 45.6% for RV patients, 59.4% for RM and 65.6% for CC patients. Chi-square analysis showed that the percentage in the RV group was significantly lower than in the CC group [χ2(1) = 6.55, p = .010]. Frequencies of cognitive impairment for each individual test can be found in Table 1. Chi-square analysis showed that more CC patients were impaired on CVLT2-DR as compared to the RV group (21%) [χ2(2) = 5.99, p = .05] and more CC patients were impaired on the PASAT than RV patients [χ2(2) = 7.42, p = .024]. ANOVAs evaluating age-corrected z-scores revealed group differences on the PASAT [F(2,288) = 8.52, p < .001]. Tukey post-hoc analyses showed that the RV group performed significantly better than the RM and CC groups.
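The RV versus CC comparison of overall impairment frequency can be illustrated with a short Pearson chi-square computation. The cell counts below are an assumption: they are reconstructed from the reported percentages and group sizes (45.6% of 57 is about 26 patients; 65.6% of 128 is about 84 patients) rather than taken from the raw data, and no continuity correction is applied. The function names are hypothetical.

```python
from math import erfc, sqrt

def chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(col_totals) - 1)
    return stat, df

def p_value_df1(stat):
    """Upper-tail p-value for a chi-square statistic with 1 degree of freedom."""
    return erfc(sqrt(stat / 2))

# Rows: RV, CC. Columns: impaired, not impaired (reconstructed counts).
rv_vs_cc = [[26, 31], [84, 44]]
stat, df = chi_square(rv_vs_cc)
p = p_value_df1(stat)
```

Under these reconstructed counts the statistic comes out at about 6.55 on 1 degree of freedom with p close to .010, which is consistent with the values reported above.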
Analysis B, Correlation with Depression
As in previous work (Benedict et al., 2003), patients were considered to be depressed if their score on the BDIFS was >3 (Beck et al., 2000). ANCOVA, with age as the covariate, comparing group means on the total score of the BDIFS showed no significant group differences. The incidence of depression did not differ significantly between groups: RV = 35.1%, RM = 47.2%, CC = 44.5%. Significant partial correlations (controlling for age) between the BDIFS and NP tests were revealed only within the RV group: JLO (r = −.49, p < .001), BVMTR-DR (r = −.32, p = .015), PASAT (r = −.33, p = .013), SDMT (r = −.42, p = .001), and DKEFS Sorting Test, correct sorts (r = −.40, p = .002).
Analysis C, Predicting Vocational Status
Chi-square analyses revealed significant differences [χ2(2) = 6.59, p = .037] for the proportion of disabled participants in each group. The proportion of disabled participants in the RV group (39.39%) was significantly lower than that of RM (62.11%) and CC (63.63%) patients [χ2(1) = 5.13, p = .023 and χ2(1) = 6.13, p = .013, respectively]. ANOVAs using the age-corrected cognitive data were conducted within each MS group to examine the effect of NP test on vocational status (Table 2). Within the RV group, significant differences between vocational sub-groups were revealed on the SDMT [F(1,31) = 4.85, p = .035]. Significant differences were noted on CVLT2-TL [F(1,93) = 8.22, p = .005], CVLT2-DR [F(1,93) = 8.48, p = .004], SDMT [F(1,93) = 8.11, p = .005], DKEFS Sorting-Correct Sorts [F(1,93) = 4.80, p = .031], and DKEFS Sorting Test-Description Score [F(1,93) = 7.69, p = .007] within the RM group. Significant differences between groups were noted on every NP test except the COWAT and JLO in the CC group, with p-values ranging from p = .007 to p < .001.
DISCUSSION
We investigated three basic questions addressed in many previously published MS studies and determined if the results would differ depending on the reasons patients were brought to the NP testing milieu. Patients were categorized into research volunteers (RV), clinical patients seen for routine monitoring of cognitive capacity (RM), and patients seen for more complex referral questions pertaining to differential diagnosis, psychiatric co-morbidity, determination of vocational capacity, and the like. We found that substantially different conclusions would be drawn about the frequency of cognitive impairment in MS, the relationship between depression and cognitive impairment, and to a lesser extent, the external validity of NP testing in MS, depending on the group studied.
Were our study limited to research volunteers, we would conclude that the frequency of cognitive impairment in MS is roughly 46%, a figure approximating the 43% figure reported in Rao et al.'s (1991a) seminal study of 100 MS patients recruited by advertisement from a US metropolitan area. Rao et al. (1991a) indicated that subjects were paid for their participation, but they did not disclose how much. In our study, patients were drawn from multiple studies and paid at a rate of roughly $20 per hour. It is important to consider this point. If research volunteers are not paid at a rate commensurate with or higher than their regular income, other factors may be motivating them to participate. NP testing requires considerable effort, and patients who are not cognitively impaired may be less threatened and more willing to volunteer their time to please doctors/researchers or help in the cause of understanding their disease. We did not query each volunteer about these issues. Presumably, a mixture of incentives contributed to the composition of the RV group.
The RM group seems most representative of the MS population. These patients were referred to monitor NP status (cognitive and psychiatric) such that changes would be detected as early as possible. Indeed, this was the intent of the MACFIMS battery: a streamlined collection of tests known to be sensitive to MS-associated impairment and administered in a relatively short time. Conclusions reached from the RM group data were intermediate between the RV and CC groups. For this group, the frequency of cognitive impairment was 59% and roughly half of the NP tests discriminated employed from disabled sub-groups.
The relationship between depression and cognitive impairment has been addressed in many MS studies. In general, correlations between NP testing and depression are modest, sometimes significant (e.g., Arnett et al., 1999; Arnett et al., 2002), sometimes not (e.g., Benedict et al., 2002a). Our analysis provided an opportunity to learn if such variation is related to recruitment context. While there were no significant differences in the incidence of depression, partial correlations between NP testing and BDIFS were significant only in the RV group. One explanation concerns the etiology of depression. The RV group may be composed of patients who are depressed in response to increased coping demands following the onset of cognitive impairment. In other words, these patients may have reactive depression, which motivates them to seek information about their mental state. Alternatively, these RV patients with time to volunteer for research may have less complicated disease, fewer side effects from pharmacological therapy, and less co-morbid illness. Fewer complications may facilitate the detection of cognition-depression associations in statistical analysis.
Finally, we examined the external validity of NP testing by searching for tests that distinguish employed vs. disabled patients. Technically, the same conclusion is reached within each group: there is a significant relationship between cognitive testing and work capability as shown in prior research (Benedict et al., 2006; Rao et al., 1991b). However, the effects are more consistent and stronger among the RM and CC patients. These patients are more cognitively impaired, hence the enhanced sensitivity of the analysis in these groups. Among patients in the RM group, we found that verbal memory, processing speed and higher executive function defects were significantly predictive of work disability.
In sum, this re-analysis of the Benedict et al. (2006) data highlights the importance of measuring and reporting the recruitment context and incentives to participate in NP testing. Such bias may lead to very different conclusions about fundamental questions in the neuropsychology of MS.
ACKNOWLEDGMENT
We would like to acknowledge the financial support of Biogen, Inc. for a portion of this study.