Introduction
Apathy, defined as a reduction in goal-directed behavior (Miller et al., 2021), is the most common neuropsychiatric symptom in Alzheimer’s disease (AD) (Zhao et al., 2016). It is associated with worse outcomes including impaired quality of life, faster disease progression, need for institutional care, and increased mortality, leading to increased caregiver burden and cost of care (Kruse et al., 2023). As such, apathy is an important target for treatment (Mortby et al., 2022).
Apathy has been shown to respond to several treatments such as methylphenidate (Mintzer et al., 2021; Padala et al., 2018a; Rosenberg et al., 2013), repetitive transcranial magnetic stimulation (Padala et al., 2018b), and possibly cholinesterase inhibitors (Lopez et al., 2008). While treatments for apathy have shown potential, there is currently little evidence to determine whether the observed change in apathy symptoms on rating scales is clinically meaningful.
Clinical trials for apathy have assessed outcomes with a variety of scales such as the informant-rated apathy sub-scale of the Neuropsychiatric Inventory (NPI-A) (Cummings et al., 1994), the Apathy Evaluation Scale (AES-I) (Marin et al., 1991), or the clinician-rated Dementia Apathy Interview and Rating scale (DAIR) (Strauss and Sperry, 2002), among others. These rating scales assess apathy symptoms (including loss of interest, loss of emotional expression, and difficulty initiating activities) by administering questionnaires to the clinician, care partner, or patient. These scales were developed to assess the severity of apathy symptoms, but it is not known whether they are sensitive to changes in symptom severity and, if so, what threshold of a change score is clinically meaningful. The need for such thresholds has been recognized for evaluating treatments for AD (Robert et al., 2010), and thresholds have been developed for outcome measures such as composite scores of cognition assessed on neuropsychological batteries and for integrated multidimensional measures (Lansdall et al., 2023; Papp et al., 2020; Robert et al., 2010; Schneider and Goldberg, 2020; Siemers et al., 2016).
Given the growing interest in developing treatments for apathy (Mortby et al., 2022), there is a similar need to identify thresholds of meaningful within-person change on apathy scales to evaluate treatment response.
Thresholds for meaningful change on clinical outcome measures can be estimated with anchor-based or distribution-based methods. Anchor-based methods utilize an anchoring measure of within-person change in symptom severity compared to pretreatment level such as a rating of improvement (or deterioration) by a clinician or care partner or patient. Those ratings are used to establish a threshold for meaningful change on a target measure that also assesses the same symptom concept. In contrast, distribution-based methods use the variance (standard deviation, effect sizes, etc.) in scores among participants on a target measure to determine a threshold for minimum clinically important difference at a group level. The former approach is favored by regulatory bodies as the symptom improvement due to treatment is determined for each individual rather than at the group level (Food & Drug Administration, 2019). However, distribution-based methods can provide supporting evidence for a threshold derived by an anchor-based method.
The aim of this study was to determine the minimal clinically important difference (MCID) thresholds for commonly used apathy scales (NPI-A, DAIR, and AES-I) using data from two randomized trials for treating apathy in AD.
Methods
Participants
This study analyzed data from the Apathy in Dementia Methylphenidate Trial (ADMET) (NCT01117181) and ADMET 2 (NCT02346201), which were phase 2 and 3, multicenter, randomized, double-blind, placebo-controlled, parallel-group studies of methylphenidate for the treatment of clinically significant apathy in patients with mild-to-moderate AD (Mintzer et al., 2021; Rosenberg et al., 2013). The studies were conducted between 2010 and 2011 (ADMET) and between 2016 and 2020 (ADMET 2). Inclusion and exclusion criteria for both studies were identical and are published elsewhere (Drye et al., 2013; Scherer et al., 2018); key criteria were as follows: diagnosis of possible or probable AD based on the National Institute of Neurological and Communicative Disorders and Stroke – Alzheimer’s Disease and Related Disorders Association criteria (McKhann et al., 1984); Mini-Mental State Exam (MMSE) score of 10–28; and clinically significant apathy as evidenced by an NPI-A score of 4 or more.
Participants were excluded if they had a Major Depressive Episode as defined in the Diagnostic and Statistical Manual of Mental Disorders – IV (TR); clinically significant agitation/aggression, delusions, or hallucinations on the NPI; recent changes to AD or antidepressant medications; use of trazodone >50 mg or lorazepam >0.5 mg for indications other than insomnia; failure to respond to past methylphenidate treatment for apathy; current or recent use of amphetamines, antipsychotics, bupropion, monoamine oxidase inhibitors, or tricyclic antidepressants; or a need for acute psychiatric hospitalization or suicidality (Drye et al., 2013; Scherer et al., 2018).
Intervention
Both ADMET and ADMET 2 consisted of a treatment and a placebo arm. Participants receiving methylphenidate were titrated to a target dose of 20 mg/day. The placebo arm received identical-appearing and identical-tasting over-encapsulated pills. Both groups also received a psychosocial intervention consisting of counseling, provision of education material on AD, its course, behavioral symptoms and treatment expectations, and 24-hour crisis support for the care partner. The intervention was given for 6 weeks (ADMET) or 6 months (ADMET 2). Both ADMET and ADMET 2 found methylphenidate to be a safe and efficacious treatment for apathy in AD (Mintzer et al., 2021; Rosenberg et al., 2013).
Measures
Clinical Global Impression of Change in Apathy symptoms (CGIC-A): The CGIC-A, administered in both studies, was used as the anchor measure. It is a 7-level ordinal scale where each level indicates a clinically meaningful change in symptoms compared to baseline and includes “Marked Improvement,” “Moderate Improvement,” “Minimal Improvement,” “No Change,” “Minimal Worsening,” “Moderate Worsening,” and “Marked Worsening.” It is rated by a clinician. The CGIC-A was a co-primary outcome in the ADMET 2 study.
Neuropsychiatric Inventory – apathy sub-scale (NPI-A): The NPI consists of twelve items that assess behavioral symptoms in the preceding four weeks including apathy and is administered to the care partner (Cummings et al., 1994). Each item is first rated for presence, and then for frequency on a 4-point scale [“Rarely,” “Sometimes,” “Often,” “Very Often”] and severity on a 3-point scale [“Mild,” “Moderate,” “Severe”]. The frequency × severity score yields the symptom score [range 0–12] with higher scores indicating more apathy. The NPI-A was a co-primary outcome in the ADMET 2 study.
Dementia Apathy Interview and Rating (DAIR): The DAIR is a 16-item questionnaire that is administered to the care partner by a clinician. It assesses apathy symptoms on a 4-point scale [0–3] to rate how often a specific behavior was observed in the last month, and whether the behavior was changed from its pre-illness level. The final score is a mean calculated as the sum of only those item scores where a change in symptoms from their pre-illness level is indicated, divided by the number of included items [range 0–3] (Strauss and Sperry, 2002).
Apathy Evaluation Scale – Informant (AES-I): The AES-I is an 18-item questionnaire with each item rated on a 4-point scale by the care partner. Individual item scores are summed to yield a total score [range: 18–72] with higher scores indicating more apathy (Clarke et al., 2007; Marin et al., 1991). The AES-I was the primary outcome in the ADMET study.
Mini Mental State Examination (MMSE): The MMSE is a screening instrument administered by a clinician for rapid assessment of cognitive functions. It consists of 11 items summed for a total score ranging from 0 to 30. Higher scores indicate better cognition (Folstein et al., 1975). Participants were categorized into “Mild” or “Moderate” AD if their MMSE score was 20 or more or between 10 and 19, respectively (Wattmo et al., 2013).
Timepoints: In ADMET, the NPI-A was completed at baseline and at the 6-week endpoint. The AES-I was collected at baseline and at the 2-, 4-, and 6-week visits. In ADMET 2, the NPI-A and DAIR were administered at baseline and each month (months 1 to 6). In both studies, the CGIC-A was administered at each visit beginning with the first follow-up visit. For this study, all visits were analyzed together as the aim was to determine the change scores on the target measures corresponding to the clinical ratings. Visit-level data are provided in Supplementary table S1, published as supplementary material online attached to the electronic version of this paper at https://www.cambridge.org/core/journals/international-psychogeriatrics.
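The scoring rules described above for the NPI-A, DAIR, and AES-I can be illustrated with a minimal Python sketch. The function names are hypothetical, and several details are assumptions not stated in the text: frequency is coded 1–4 and severity 1–3, AES-I item scores are taken as already keyed in the apathy direction, and a DAIR record with no changed items returns no score.

```python
from typing import Optional, Sequence


def npi_apathy_score(present: bool, frequency: int, severity: int) -> int:
    """NPI-A score: frequency (assumed coded 1-4) x severity (assumed coded 1-3).

    Returns 0 when the symptom is rated as absent, giving the 0-12 range.
    """
    if not present:
        return 0
    if not 1 <= frequency <= 4 or not 1 <= severity <= 3:
        raise ValueError("frequency must be 1-4 and severity must be 1-3")
    return frequency * severity


def dair_score(item_scores: Sequence[int], changed: Sequence[bool]) -> Optional[float]:
    """DAIR score: mean of the 0-3 item scores flagged as changed from pre-illness."""
    if len(item_scores) != len(changed):
        raise ValueError("item_scores and changed must have the same length")
    included = [score for score, flag in zip(item_scores, changed) if flag]
    if not included:
        return None  # assumption: no score when no item shows a change
    return sum(included) / len(included)


def aes_i_total(item_scores: Sequence[int]) -> int:
    """AES-I total: sum of 18 items rated 1-4 (assumed already keyed), range 18-72."""
    if len(item_scores) != 18 or any(not 1 <= s <= 4 for s in item_scores):
        raise ValueError("expected 18 item scores, each between 1 and 4")
    return sum(item_scores)
```

Under these assumptions, a symptom rated “Very Often” and “Severe” yields the maximum NPI-A score of 12, and the DAIR mean is taken only over items where a change from the pre-illness level was indicated.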
Statistical analyses
The CGIC-A served as the anchor measure and the NPI-A, DAIR, and AES-I were the target measures. Thresholds of meaningful within-person change in target measures were estimated by anchor-based analyses and supported by distribution-based analyses. Analyses were conducted for all timepoints with available data from both studies. Characteristics of measures of apathy were assessed by descriptive statistics (mean [standard deviation (SD)] or median values, proportions of categorical variables including floor and ceiling scores for target measures). Spearman correlations between CGIC-A ratings and change scores were assessed to determine appropriateness of anchor-based analyses, for which a correlation strength of ≥0.3 between the anchor and target measures is suggested (Lansdall et al., 2023). To estimate thresholds of change, the following linear mixed model was used: change in target measure ∼ 1 + CGIC-A + (1|Subj). This model estimated the least-squares means (LS means) and SD of change on the target measures for each change level of the CGIC-A using data from all visits. In addition, the mean and SD of change on the target measures at each visit for each rating level on the CGIC-A were determined.
Supporting the anchor-based estimates of within-person change on target measures, the empirical cumulative distribution function (eCDF) for each CGIC-A rating level across all visits was plotted. For levels that contained ≥10 participants, 95% confidence intervals for the eCDF were calculated based on the Kolmogorov–Smirnov D distribution, which produces a continuous distribution band (Bickel and Doksum, 2015), and was implemented with the function “ecdf.ksCI” in the “sfsmisc” package in R. A greater separation between the eCDF for CGIC-A levels at a change score on a target measure suggests a better ability to indicate a clinically meaningful benefit at that score, which determined the within-person thresholds for improvement. As both ADMET and ADMET 2 showed positive results and provided a psychosocial intervention for the drug and placebo groups, thresholds were only determined for improvement.
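As a rough illustration of the anchor-appropriateness check and the per-level descriptive summary described above, the following Python sketch computes a Spearman correlation and the mean and SD of change scores at each CGIC-A level. It is not the linear mixed model used in the study, since it ignores the per-subject random intercept. The function names and the numeric CGIC-A coding (1 = “Marked Improvement” through 7 = “Marked Worsening”) are assumptions for illustration.

```python
from collections import defaultdict
from statistics import mean, stdev


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of the 1-based positions in the tie
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the average ranks.

    Raises ZeroDivisionError if either input is constant.
    """
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


def change_by_level(cgic, change):
    """Mean and SD of change scores per CGIC-A level (SD is None for one visit)."""
    groups = defaultdict(list)
    for level, delta in zip(cgic, change):
        groups[level].append(delta)
    return {
        level: (mean(vals), stdev(vals) if len(vals) > 1 else None)
        for level, vals in sorted(groups.items())
    }
```

With this coding and change scores defined as follow-up minus baseline, a positive correlation means that better clinical impressions go with larger decreases on the target measure. A value of at least 0.3 would meet the suggested criterion for anchor-based analysis.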
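The eCDF comparison can be sketched in a few lines of Python. This is a simplified illustration with hypothetical function names: it evaluates the eCDF of change scores at a candidate threshold for two CGIC-A levels and reports their separation, and it omits the Kolmogorov–Smirnov confidence band that the study obtained from “ecdf.ksCI”.

```python
def ecdf_at(changes, threshold):
    """Proportion of change scores at or below the threshold (eCDF value)."""
    if not changes:
        raise ValueError("changes must not be empty")
    return sum(1 for c in changes if c <= threshold) / len(changes)


def ecdf_separation(changes_by_level, level_a, level_b, threshold):
    """Difference between two CGIC-A levels' eCDF values at one change score."""
    return ecdf_at(changes_by_level[level_a], threshold) - ecdf_at(
        changes_by_level[level_b], threshold
    )


# Made-up change scores (follow-up minus baseline), for illustration only.
example = {
    "Minimal Improvement": [-6, -5, -2, 0],
    "No Change": [-5, 0, 1, 2],
}
separation = ecdf_separation(example, "Minimal Improvement", "No Change", -5)
```

In this made-up example, half of the “Minimal Improvement” visits and a quarter of the “No Change” visits reach a 5-point decrease, so the separation at that score is 0.25.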
For distribution-based analyses, the test-retest reliability of the target measures was first determined by the intra-class correlation coefficient (ICC), using a two-way random-effects model with single measurement and consistency definition, between a target measure assessed at baseline and at the first follow-up visit among participants who were rated as “No Change” on the CGIC-A. A correlation of ≥0.7 is considered to provide adequate reliability. Distribution-based measures utilize the between-subject variance in a target measure to determine the minimum clinically important difference. These include half-SD [ $SD*0.5$ ], standard error of measurement (SEM) [ $SD*\sqrt{1-ICC}$ ], standard error of difference (Sdiff) [ $SEM*\sqrt{2}$ ], and reliable change index (RCI) [ $1.96*SEM*\sqrt{2}$ ] (Charter, 1997; Hays et al., 2005; Norman et al., 2003; Wyrwich et al., 1999). In progressive order, the obtained values provide cutoff values with increasing confidence of meaningful change. For instance, a half-SD change indicates a moderate effect size, whereas a change score greater than the RCI has a <5% probability of being attributable to measurement error alone. The distribution-based values are only intended to aid interpretation of the target measures.
The threshold scores determined from anchor-based methods provide within-person assessment of meaningful change. Performance of the thresholds for each scale was evaluated by defining “Responders” and “Non-responders” in ADMET and ADMET 2, and the following metrics were determined: sensitivity (probability of a positive test result among true positives), specificity (probability of a negative test result among true negatives), accuracy (ratio of correct predictions [true positives and true negatives] to all cases examined), precision (ratio of the number of true positives to all positive results), and F1 score (harmonic mean of precision and sensitivity, or $\frac{2*\text{true positives}}{2*\text{true positives}+\text{false positives}+\text{false negatives}}$ ).
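The four distribution-based quantities follow directly from the baseline SD and the test-retest ICC. A minimal Python sketch of the formulas given above is shown here. The function name and the example inputs are hypothetical and are not values from the study.

```python
import math


def distribution_thresholds(sd: float, icc: float) -> dict:
    """Half-SD, SEM, Sdiff, and RCI from a baseline SD and a test-retest ICC."""
    if sd < 0 or icc > 1:
        raise ValueError("sd must be non-negative and icc must not exceed 1")
    sem = sd * math.sqrt(1 - icc)   # standard error of measurement
    s_diff = sem * math.sqrt(2)     # standard error of difference
    return {
        "half_sd": 0.5 * sd,
        "sem": sem,
        "s_diff": s_diff,
        "rci": 1.96 * s_diff,       # reliable change index
    }


# Illustrative inputs only: SD = 10 points, ICC = 0.75.
example = distribution_thresholds(sd=10.0, icc=0.75)
```

For a fixed SD, a lower ICC increases the SEM, Sdiff, and RCI, so a less reliable scale requires a larger change before that change exceeds measurement error.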
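The performance metrics defined above can be computed from paired responder classifications. The sketch below is illustrative, with hypothetical function names. It assumes that a visit counts as a responder on a target measure when its change score is at or below the (negative) threshold, and that the anchor classification is a simple improved/not-improved flag derived from the CGIC-A.

```python
def is_responder(change: float, threshold: float) -> bool:
    """Assumed rule: a decrease at least as large as the threshold is a response."""
    return change <= threshold


def _ratio(numerator: int, denominator: int):
    """Return numerator/denominator, or None when the denominator is zero."""
    return numerator / denominator if denominator else None


def classification_metrics(anchor_improved, met_threshold) -> dict:
    """Sensitivity, specificity, accuracy, precision, and F1 from paired flags."""
    tp = fp = tn = fn = 0
    for truth, predicted in zip(anchor_improved, met_threshold):
        if truth and predicted:
            tp += 1
        elif truth:
            fn += 1
        elif predicted:
            fp += 1
        else:
            tn += 1
    return {
        "sensitivity": _ratio(tp, tp + fn),
        "specificity": _ratio(tn, tn + fp),
        "accuracy": _ratio(tp + tn, tp + fp + tn + fn),
        "precision": _ratio(tp, tp + fp),
        "f1": _ratio(2 * tp, 2 * tp + fp + fn),
    }


# Made-up flags for ten visits, for illustration only.
anchor = [True, True, True, True, False, False, False, False, False, False]
target = [True, True, True, False, True, True, False, False, False, False]
metrics = classification_metrics(anchor, target)
```

In this made-up example there are 3 true positives, 1 false negative, 2 false positives, and 4 true negatives, giving a sensitivity of 0.75, a precision of 0.6, and an accuracy of 0.7.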
Sensitivity analyses were performed to assess whether the severity of AD or apathy influenced the estimation of clinically significant change scores. AD severity was based on MMSE scores as described above. Apathy severity was based on the NPI – apathy subitem assessing severity as “mild,” “moderate,” or “severe.” As the severity score is multiplied with the frequency score, the change scores on the NPI-A were expected to differ by the level of apathy severity at baseline. The categorical variable for MMSE or apathy severity was included as an interaction term with the CGIC-A in linear mixed models as above, and the estimates of change scores were derived with subsets of the data with participants in each category for both variables. These analyses were done only in the larger ADMET 2 sample so that adequate data were available to assess interaction effects.
All analyses were carried out in R, and statistical significance was set at a two-sided p-value of <0.05.
Results
Participants
The analyses included participants enrolled in ADMET and ADMET 2 whose characteristics are provided in Table 1. Participants in both studies were of similar age, had similar neuropsychiatric symptom burdens and levels of cognition as measured by the MMSE. Males comprised 38.3% of ADMET participants and 65.5% of ADMET 2 participants. In both studies, the majority of participants were on cholinesterase inhibitors; in ADMET, nearly two-thirds of participants were on memantine as compared to four in ten participants in ADMET 2.
Table 1 notes. Abbreviations: ADMET, Apathy in Dementia Methylphenidate Trial; AES-I, Apathy Evaluation Scale – Informant version; ChEI, Cholinesterase Inhibitor; DAIR, Dementia Apathy Interview and Rating; MMSE, Mini-Mental State Exam; MPH, methylphenidate; NPI, Neuropsychiatric Inventory; NPI-A, Neuropsychiatric Inventory Apathy subscale; SD, standard deviation. ^Combination pill of donepezil and memantine.
Distribution of anchor and target measures
The distribution of ratings on the CGIC-A and the scores on the NPI-A, DAIR, and AES-I are shown in Table 2. On the CGIC-A in ADMET and ADMET 2 across follow-up visits, 54.4% and 45.3% were rated as having no change in symptoms, and 41.1% and 39.3% were rated as having at least minimal improvement at the endpoint visits. On the NPI-A in ADMET and ADMET 2 across follow-up visits (v = 57 and v = 1084), the change score showed improvement in 73.7% and 72.7%, no change in 21.1% and 19.2%, and worsening in 5.2% and 8.1%. The corresponding percentages on the DAIR (v = 1051) were 72.6%, 3.7%, and 23.7%, and on the AES-I (v = 165) were 50.3%, 6.7%, and 43.0%. The NPI-A showed floor and ceiling values in a maximum of 17.5% and 15.0% of participants in ADMET and 20.4% and 17.0% in ADMET 2. A maximum of 4.7% of participants were rated at the lowest score on the DAIR, whereas none of the participants were rated at the lower or higher end of the score range on the AES-I. The visit-level distributions of scores are shown in Supplementary table S1.
Table 2 notes. Abbreviations: ADMET, Apathy in Dementia Methylphenidate Trial; AES-I, Apathy Evaluation Scale – Informant version; DAIR, Dementia Apathy Interview and Rating; NPI, Neuropsychiatric Inventory; SD, standard deviation.
Thresholds for clinically meaningful change on the NPI-A, DAIR, and AES-I
The test-retest reliability of target measures was first determined with ICC (2,1) with consistency definition, calculated between the scores at baseline and at the first follow-up visit (week 2 for AES-I and week 6 for NPI-A in ADMET or Month 1 in ADMET 2) among participants rated as having “No Change” on the CGIC-A. In ADMET, the ICC for NPI-A was 0.32 (95% CI = −0.05 to 0.59; n = 31; F(30,30) = 1.93, p = 0.04) and for AES-I was 0.88 (95% CI = 0.75 to 0.94; n = 27; F(26,26) = 15.1, p < 0.001). In ADMET 2, the ICC for NPI-A was 0.31 (95% CI = 0.12 to 0.48; n = 94; F(93,93) = 1.91, p = 0.001) and for DAIR was 0.48 (95% CI = 0.30 to 0.62; n = 91; F(90,90) = 2.81, p < 0.001).
The responsiveness of target measures to assess change was determined by Spearman correlations between CGIC-A and the change score on target measures. The correlation between CGIC-A and (a) NPI-A (ADMET) was 0.24 (p = 0.07), (b) AES-I was 0.39 (p < 0.001), (c) NPI-A (ADMET 2) was 0.51 (p < 0.001), and (d) DAIR was 0.38 (p < 0.001). The thresholds for within-person change score were estimated for each level on the CGIC-A across all visits (Fig. 1, Supplementary table S2). For the NPI-A, the threshold change score for minimal improvement was −4.4 points (95% CI: −4.0 to −4.8, visits = 303) in ADMET 2 and −3.5 points (95% CI: −2.0 to −5.0, visits = 48) in ADMET. The corresponding score for DAIR was −0.56 points (95% CI: −0.47 to −0.65, visits = 303), and for AES-I was −3.2 points (95% CI: −0.9 to −5.4, visits = 48).
Figure 2 shows the empirical cumulative distribution functions (eCDF) of target measures for each level of the CGIC-A across all visits along with confidence interval bounds. A greater separation between the eCDFs of CGIC-A levels suggests better discrimination between clinical impressions of change at the corresponding change score on the target measure. Figure 2C shows that a change score (x-axis) of −5 points on the NPI-A was observed in 54% of ADMET 2 visits that were considered to show “Minimal improvement” and 30% of the visits where “No change” was indicated on the CGIC-A. Similarly, a 0.56-point improvement on the DAIR was found in 53% and 30% of visits indicated as “Minimal improvement” and “No change,” respectively (Fig. 2D).
Distribution-based analysis
The variance in target measures at baseline among participants was used to determine metrics of minimal clinically important change; these included the half-SD, SEM, Sdiff, and RCI. The estimated values for these metrics, in that order, were 1.2, 1.2, 1.8, and 3.5 points for the NPI-A in ADMET; 6.0, 4.2, 5.9, and 11.6 points for the AES-I; 1.2, 1.2, 1.8, and 3.6 points for the NPI-A in ADMET 2; and 0.2, 0.2, 0.3, and 0.5 points for the DAIR.
Application of proposed thresholds
The performance of the estimated thresholds was evaluated in the empirical data from ADMET and ADMET 2. In the evaluation, the threshold scores derived from the model were rounded to the closest clinically obtainable scores on the scales (NPI-A: −4, DAIR: −0.56, AES-I: −3). The numbers of participants who improved or did not improve as per the CGIC-A ratings and as per the estimated MCID on each target measure (same or larger change than the estimated threshold) are shown in Supplementary table S3. Figure 3 shows metrics of classification performance. The accuracy and precision (or positive predictive value) of the NPI-A (ADMET 2) and DAIR thresholds were found to be around 66% and ∼60%.
Influence of disease and symptom severity on clinical significance thresholds
In sensitivity analyses, MMSE categories showed an interaction effect with CGIC-A levels with NPI-A but not DAIR as the dependent variable (Supplementary table S4). Minimal improvement on the CGIC-A was associated with a larger change score (least-squares mean) among those with MMSE ≥ 20 compared to those with an MMSE score of 10–19 (B = 0.95, t = −2.3, p = 0.02) (Supplementary table S4). The thresholds for each CGIC-A level on both scales are shown in Supplementary table S5 and Supplementary figure S1. The threshold score for minimal improvement among participants with MMSE ≥ 20 was −4.8 (NPI-A) and −0.66 (DAIR), and for MMSE of 10–19 was −3.6 (NPI-A) and −0.60 (DAIR). Similar analyses assessing NPI-A severity categories showed an interaction effect with CGIC-A for both NPI-A and DAIR (Supplementary table S6). Compared to those with mild apathy, those with severe apathy showed a larger decrease for minimal improvement on the NPI-A (B = −1.59, t = −2.13, p = 0.03) and the DAIR (B = −4.22, t = −3.18, p = 0.001). On the DAIR, minimal worsening was also associated with a larger decrease in change scores among those with moderate (B = −0.33, t = −2.18, p = 0.03) and severe (B = −3.94, t = −2.35, p = 0.02) apathy compared to those with mild apathy. The estimated thresholds increased on both scales with an increase in baseline severity of apathy (Supplementary table S7 and Supplementary figure S2).
Discussion
This study estimated clinically meaningful within-person change scores on three apathy scales with anchor- and distribution-based methods utilizing data from two multisite, randomized, double-blind, placebo-controlled trials of methylphenidate for treating apathy in AD. Using anchor-based methods, the MCIDs were at least a 4-point decrease on the NPI-A, a 0.5-point decrease on the DAIR, and a 3-point decrease on the AES-I. These thresholds were supported by distribution-based methods that yielded similar values on the reliable change index. Additional sensitivity analyses suggest that baseline AD severity and apathy severity may affect the estimated thresholds. Specifically, minimal improvement on the CGIC-A was associated with a larger change score on the NPI-A among those with milder AD (higher MMSE scores), and with a larger change score on both the NPI-A and DAIR among those with more severe apathy. While the proposed thresholds will help evaluate treatments for apathy using apathy scales and may improve design of clinical trials for apathy, the metrics evaluating the performance of the thresholds indicate a need to consider additional scales that may be better correlated with the anchor measure.
This study included three target measures of apathy that are rated by either a trained rater following an interview with the care partner (NPI-A, AES-I) or by a clinician following an interview with the care partner (DAIR), which provides thresholds for several target outcomes for future studies. The estimated thresholds were consistent between the two trials (NPI-A) and with distribution-based measures (NPI-A and DAIR). These thresholds are expected to help assess within-person clinically meaningful improvement in apathy and evaluate efficacy of treatments (Assunção et al., Reference Assunção, Sperling, Ritchie, Kerwin, Aisen, Lansdall, Atri and Cummings2022; Webster et al., Reference Webster, Groskreutz, Grinbergs-Saull, Howard, O’Brien, Mountain, Banerjee, Woods, Perneczky, Lafortune, Roberts, McCleery, Pickett, Bunn, Challis, Charlesworth, Featherstone, Fox, Goodman, Jones, Lamb, Moniz-Cook, Schneider, Shepperd, Surr, Thompson-Coon, Ballard, Brayne, Burns, Clare, Garrard, Kehoe, Passmore, Holmes, Maidment, Robinson, Livingston and Quinn2017). As apathy is a predictor of several poor outcomes in people with AD, these thresholds may also aid assessment of apathy in trials where it is not a primary outcome. Neuropsychiatric symptoms are now recognized as a key feature of AD (Cummings, Reference Cummings2021), with a negative impact on affected persons and care partners (Naglie et al., Reference Naglie, Hogan, Krahn, Black, Beattie, Patterson, MacKnight, Freedman, Borrie, Byszewski, Bergman, Streiner, Irvine, Ritvo, Comrie, Kowgier and Tomlinson2011). As disease-modifying therapies show clinical benefits, these thresholds can help determine whether burdensome symptoms like apathy also improve with treatment at the individual level.
The implementation of these thresholds in prospective studies may need to consider the severity of apathy and stage of the disease. The inclusion criteria for the two trials required the presence of clinically significant apathy (minimum NPI-A score of 4 points at baseline) that can be expected to affect the range of change scores that were included in the analysis. However, 94% of participants in ADMET 2 met the diagnostic criteria for apathy, suggesting that the thresholds were estimated in those with clinically relevant apathy (Lanctôt et al., Reference Lanctôt, Scherer, Li, Vieira, Coulibaly, Rosenberg, Herrmann, Lerner, Padala, Brawman-Mintzer, van Dyck, Porsteinsson, Craft, Levey, Burke and Mintzer2021). Similarly, inclusion criteria in both studies aimed to include participants with mild to moderate AD. Although MMSE is a relatively crude measure of AD severity, the MCID on the NPI-A differed among those stratified into higher and lower MMSE. Moreover, as the prodromal stages of AD were not included in both studies, the MCID in those individuals may differ. As the frequency and severity of apathy increases with disease progression, the distribution of change scores included in the analyses may be influenced by disease stage of the sample. For example, the MCID for participants with higher MMSE scores was greater than the MCID for the complete sample. In other words, for a change to be considered clinically relevant, a larger improvement is needed among individuals with better cognition than those with worse cognition. 
Finally, while the current study included change scores at 2 weeks until 6 months (only for the NPI-A), providing a wide time range for the change scores, the duration of the intervention needs to be considered before applying these thresholds (Mintzer et al., Reference Mintzer, Lanctôt, Scherer, Rosenberg, Herrmann, van Dyck, Padala, Brawman-Mintzer, Porsteinsson, Lerner, Craft, Levey, Burke, Perin, Shade, Michalak, Ni, Good, Mecca, Salem‐Spencer, Keltz, Morales, Clark, Williams, Kindy, Freeman, Jamil, Schultz, Sami, Padala, Parkes, Lah, Vaughn, Hales, Rapoport, Gallagher, Li, Sandra, Vieira, Ruthirakuhan and Babani2021; Padala et al., Reference Padala, Padala, Lensing, Ramirez, Monga, Bopp, Roberson, Dennis, Petty, Sullivan and Burke2018a; Rosenberg et al., Reference Rosenberg, Lanctôt, Drye, Herrmann, Scherer, Bachman and Mintzer2013).
The results also show a limited correlation between within-person change in clinical impression and change on apathy rating scales. The recommended value for correlation between an anchor and a target measure is greater than or equal to 0.3 (Cohen, Reference Cohen1992; Hays et al., Reference Hays, Brodsky, Johnston, Spritzer and Hui2005; Revicki et al., Reference Revicki, Hays, Cella and Sloan2008). In addition, the test-retest correlations of the scales were relatively weak, especially for the NPI-A, suggesting that informants considered that apathy symptoms had changed among those participants who were not considered to have clinically meaningful change in symptoms by clinicians (who were blind to the ratings on the apathy scales). Although both scales (NPI-A and CGIC-A) showed group-level improvement in ADMET 2, the weak correlations indicate a discrepancy between informant and clinician rating. Considering that both types of correlations obtained in this study were on the lower end and that the metrics for performance of the thresholds ranged from 60 to 65%, the apathy rating scales included in this study may be limited in assessing within-person change in apathy symptoms. The NPI-A in particular also had substantial proportion of values at lower or upper end of the scoring range, suggesting that the scoring range may limit its ability to detect change in apathy symptoms. Continuous measures such as the DAIR may provide a better estimate of change in symptoms (as suggested by Fig. 2D); however, the performance metrics of the DAIR were similar to the NPI-A (Figure 3) and no treatment effect was detected with the DAIR in ADMET 2, unlike the NPI-A. Finally, while these rating scales are widely used, they do not account for the multidimensional nature of apathy (Le Heron et al., Reference Le Heron, Apps and Husain2018). 
The new consensus-based diagnostic criteria for apathy in neurocognitive disorders require that diminished function be present in at least two of the three dimensions of apathy (interest, initiative, and emotional expression/response) (Miller et al., Reference Miller, Robert, Ereshefsky, Adler, Bateman, Cummings, DeKosky, Fischer, Husain, Ismail, Jaeger, Lerner, Li, Lyketsos, Manera, Mintzer, Moebius, Mortby, Meulien, Pollentier, Porsteinsson, Rasmussen, Rosenberg, Ruthirakuhan, Sano, Zucchero Sarracini and Lanctôt2021). Thus, other rating scales such as the 12-item apathy sub-scale NPI-Clinician (de Medeiros et al., Reference de Medeiros, Robert, Gauthier, Stella, Politis, Leoutsakos, Taragano, Kremer, Brugnolo, Porsteinsson, Geda, Brodaty, Gazdag, Cummings and Lyketsos2010) may better assess apathy symptoms in line with the diagnostic criteria and may need to be considered to measure within-person change in apathy symptoms.
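To make the anchor adequacy benchmark discussed above (a correlation of at least 0.3 between the anchor and the target measure) concrete, the following minimal sketch computes a rank correlation between anchor ratings and change scores and compares it with that benchmark. It is an illustration only and not the analysis code used in this study: the function names, the choice of the Spearman coefficient, and the use of the absolute value of the correlation are all assumptions.

```python
from statistics import mean


def average_ranks(values):
    """Rank values from 1..n; tied values receive the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the value at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; undefined (division by zero) if x or y is constant."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def anchor_is_adequate(anchor, change, minimum=0.3):
    """Assumed rule: |rho| between anchor and change scores is at least `minimum`."""
    return abs(spearman(anchor, change)) >= minimum
```

In this sketch, `anchor` would hold one anchor rating per participant (for example, a CGIC-A level coded as an integer) and `change` the corresponding change score on the rating scale; a rank-based coefficient is used here because the anchor is ordinal.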
As noted above, this study has several limitations. First, the design of the parent studies, including participant inclusion criteria and study duration, constrains the findings. However, those studies included participants with clinically significant apathy who were assessed at multiple visits over a maximum of six months, providing a broad time window in which to assess within-person change in apathy. Second, the assessments were performed across several sites by various raters, and inter-rater reliability could not be assessed due to lack of data, which may affect the estimated thresholds. On the other hand, the thresholds may be generalizable to real-world use, and the large cohort, collected from multiple sites and representative of patient populations, is a strength of this study. Third, the replication of results was only possible for the NPI-A, which alone was measured in both trials. Fourth, the estimates for the AES-I and for the CGIC-A levels “Moderate worsening,” “Minimal worsening,” and “Marked improvement” were derived from a limited number of participants. As such, those estimates must be interpreted with caution. Fifth, while anchor-based methods for assessing meaningful change are favored, the CGIC-A may itself be influenced by the subjectivity of clinical raters; care partners who spend more time with the person may have better insight to identify symptomatic changes and meaningful improvement at the individual level. A care partner-rated measure of this kind was not available for this study and should be considered in future trials.
Conclusions
The current study estimated threshold change scores for within-person meaningful change on the NPI-A, DAIR, and AES-I using anchor- and distribution-based measures. The estimated thresholds were consistent across studies and methods and will allow for assessing the benefit of interventions for individual participants in clinical trials for apathy in AD. Given the recently published consensus criteria for apathy in AD (Miller et al., Reference Miller, Robert, Ereshefsky, Adler, Bateman, Cummings, DeKosky, Fischer, Husain, Ismail, Jaeger, Lerner, Li, Lyketsos, Manera, Mintzer, Moebius, Mortby, Meulien, Pollentier, Porsteinsson, Rasmussen, Rosenberg, Ruthirakuhan, Sano, Zucchero Sarracini and Lanctôt2021), symptoms from the revised criteria could be better “mapped” to apathy scales. This has been recently done for agitation in AD resulting in scales that are more sensitive to change (De Mauleon et al., Reference De Mauleon, Ismail, Rosenberg, Miller, Cantet, O'Gorman, Vellas, Lyketsos and Soto2021). Improved scales may lead to better clinical trial design and thus help us develop and validate improved treatments for apathy in AD.
Conflicts of interest
Dr Mintzer reported being an advisor for Praxis Bioresearch and Cerevel Therapeutics outside the submitted work. Dr Lanctôt reported grants from the National Institutes of Health during the conduct of the ADMET 2 study, the Bernick Chair in Geriatric Psychopharmacology from Sunnybrook Research Institute and the University of Toronto, and personal fees for serving on the advisory boards of BioXcel Therapeutics, Cerevel Therapeutics, Eisai, Exciva, Kondor Pharma, Lundbeck Otsuka, Novo Nordisk, and Sumitomo outside the submitted work. Dr Scherer reported grants from Johns Hopkins University during the conduct of the ADMET 2 study. Dr Rosenberg has received research grants from the National Institute on Aging, Alzheimer’s Clinical Trials Consortium, Richman Family Precision Medicine Center of Excellence on Alzheimer’s Disease, Eisai, Functional Neuromodulation, and Lilly; and honoraria from GLG, Leerink, Cerevel, Cerevance, BioXcel, Sunovion, Acadia, Medalink, Novo Nordisk, Noble Insights, TwoLabs, Otsuka, Lundbeck, MedaCorp, ExpertConnect, HMP Global, Synaptogenix, and Neurology Week, all outside the submitted work. Dr Herrmann reported grants from the National Institute on Aging during the conduct of the ADMET 2 study. Dr van Dyck reported grants from the National Institute on Aging during the conduct of the ADMET 2 study; personal fees for consulting for Roche, Eisai, Cerevel, and Ono Pharmaceutical outside the submitted work; and grants from Roche, Eisai, Eli Lilly and Company, Biogen, Biohaven Pharmaceuticals, Novartis, Janssen, Genentech, UCB, Cerevel, and Merck outside the submitted work. Dr Padala reported grants from Office of Research Development, Department of Veterans Affairs and National Institutes of Health during the conduct of the ADMET 2 study. Dr Brawman-Mintzer reported grants from the National Institute on Aging during the conduct of the ADMET 2 study.
Dr Porsteinsson reported grants from the National Institutes of Health during the conduct of the ADMET 2 study; personal fees for serving on the data and safety monitoring boards of Acadia Pharmaceuticals, Cadent Therapeutics, Functional Neuromodulation, Novartis, and Syneos outside the submitted work; grants from Avanir Pharmaceuticals, Biogen, Eisai, Eli Lilly and Company, Genentech/Roche, Biohaven, Athira, Alector, Vaccinex, and Novartis outside the submitted work; and personal fees from Avanir, Biogen, Eisai, Alzheon, MapLight Therapeutics, Premier Healthcare Solutions, Sunovion, IQVIA, and Ono Pharmaceuticals outside the submitted work.
Funding
Funding for ADMET was provided by the National Institute on Aging (R01 AG033032-01 and 1 K08 AG029157-01A1), and for ADMET 2 was provided by the National Institute on Aging (grant R01 AG046543). The CAN-TAP-TALENT is funded by the Canadian Institutes of Health Research (CIHR) – FRN 184898. The authors wish to acknowledge the CAN-TAP-TALENT for its role in supporting the completion of this CAN-TAP-TALENT Research Project. The sponsors had no role in data collection, analysis, or drafting of the manuscript.
Description of author(s)’ roles
ST and KLL conceived the study; NH, JP, PBR, AJL, JM, PRP, OBM, CHvD, APP, SC, AL, DS, and KLL acquired the data; ST analyzed the data; ST and KLL drafted the manuscript; NH, JP, PBR, AJL, JM, PRP, OBM, CHvD, APP, SC, AL, DS, and KLL were major contributors to critical revision of the manuscript for important intellectual content. All authors read and approved the final manuscript.
Acknowledgements
Not applicable.
Availability of data and materials
The data that support the findings of this study are available from the authors but restrictions apply to the availability of these data. The data are available from the corresponding author on request after approval of a proposal with a signed data sharing agreement.
Ethics approval and consent to participate
Participants in the ADMET and ADMET 2 studies were recruited at 3 (2 US clinics and 1 Canadian clinic) and 10 (9 US clinics and 1 Canadian clinic) centers, respectively, all specializing in dementia care. Participants or their legally authorized representatives, together with the participant’s primary caregiver, provided informed consent. The study adhered to the Declaration of Helsinki and was approved by the ethical review boards of each site.
Consent for publication
Not applicable.
Supplementary material
For supplementary material accompanying this paper visit https://doi.org/10.1017/S1041610224000711