Introduction
Depression and anxiety are leading causes of disability worldwide (GBD 2019 Mental Disorders Collaborators, 2022). Although effective interventions exist, current treatment success leaves room for improvement (Cuijpers, van Straten, Bohlmeijer, Hollon, & Andersson, 2010). Much of the research evaluating treatments to date has focused on sum scores. When sum scores are used, individual symptoms are assumed to be equivalent in value as common indicators of a disorder. Consequently, individual symptoms have received comparatively little attention (Fried & Nesse, 2015a). However, there may be important insights to be gained from better understanding individual symptoms and how they respond over the course of treatment (Fried & Nesse, 2015a).
Depression and anxiety are heterogeneous disorders. For example, the Diagnostic and Statistical Manual of Mental Disorders (DSM-5) lists nine depressive symptoms (APA, 2013). In order to meet the criteria for a diagnosis of depression, patients have to meet at least five of the nine symptom criteria, including at least one of the core symptoms – depressed mood or diminished interest/pleasure (APA, 2013). These criteria include compound symptoms that are grouped together, such as worthlessness or inappropriate guilt, as well as symptoms that lie at opposite ends of a spectrum, such as psychomotor agitation or retardation. Given the multitude of possible symptom combinations for depression alone, it is perhaps unsurprising that considerable symptom profile variability has been reported, with few exact symptom combinations being shared between patients (Fried & Nesse, 2015b). Individual symptoms do not appear to be equal: evidence suggests that they may vary in the degree to which they contribute to functional impairment (Fried & Nesse, 2014), in their risk factors (Fried, Nesse, Zivin, Guille, & Sen, 2014), and in their heritability (Jang, Livesley, Taylor, Stein, & Moon, 2004). Furthermore, they may differ in their response to different treatments (Bekhuis et al., 2018; Boschloo et al., 2019a) and in their association with clinical outcomes (O'Driscoll et al., 2021).
High variability in the presence of symptoms, combined with evidence suggesting that symptoms are not interchangeable, implies that, rather than being clearly bounded disorders, depression and anxiety have a complex and heterogeneous presentation. Sum scores, which assume equivalence of symptoms, may therefore hide important differences.
Outcome research for depression and anxiety is further complicated by the fact that psychotherapies such as cognitive-behavioural therapy (CBT) are often considered to be a ‘black box’ (Huibers, Lorenzo-Luaces, Cuijpers, & Kazantzis, 2021). They commonly target a broad group of symptoms (Eronen, 2020), and identifying how treatments work, and for whom, is arguably the main scientific challenge facing depression and anxiety outcome researchers (Carey, Griffiths, Dixon, & Hines, 2020; Huibers et al., 2021; Paul, 1967). Although multiple putative mechanisms have been proposed, reflecting both ‘active ingredients’ of specific psychotherapies and non-specific effects shared across psychotherapies, the evidence remains ambiguous, with no universal consensus concerning specific mechanisms of change (Carey et al., 2020; Huibers et al., 2021).
Exploring how symptoms respond to treatment, rather than sum scores alone, may lead to important insights and provide first steps toward precision psychiatry. If specific therapeutic interventions could be mapped onto change in specific symptoms, it may begin to explain how psychotherapies work and for whom. More pragmatically perhaps, understanding possible symptom-specific effects of treatments could also lead to a more personalised matching of symptom profiles to treatments and could help to explain heterogeneity in clinical outcomes. Some treatments may cause side effects that increase some of the very symptoms used to measure depression; for example, antidepressant medication can have the side effect of sleep disturbances (Wichniak, Wierzbicka, Walęcka, & Jernajczyk, 2017). These could potentially mask or dilute specific benefits on other symptoms when sum scores are used (Fried & Nesse, 2015a).
Overall, there is a strong argument for exploring symptom-specific effects. The evidence suggests that CBT is an effective treatment for depression and anxiety (Cuijpers, Cristea, Karyotaki, Reijnders, & Huibers, 2016). However, most evaluations of CBT focus on sum scores. As such, less is known about the symptom-specificity of CBT. In the present study, we aim to explore how individual symptoms of depression and generalised anxiety behave across a course of CBT in an observational, retrospective cohort by examining symptom trajectories and how they compare to one another.
Methods
Settings
Improving Access to Psychological Therapies (IAPT) is a national programme that delivers psychological therapy for depression and anxiety across England in primary care settings (Clark, 2011; Clark et al., 2018). IAPT offers a variety of psychological therapies, including both low-intensity therapy (LIT) and high-intensity therapy (HIT) (Clark, 2011; Clark et al., 2018). IAPT has implemented routine outcome monitoring, where detailed information is gathered about patients, their treatment, and their clinical outcomes (Clark, 2011). These data are collected on a session-by-session basis to increase complete-case recording.
Sample
The clinical records for the present study were obtained from ten IAPT practices across the southwest of England and London, who consented to share their data via Mayden, the providers of a patient management software used in IAPT, for the purposes of this research. The data from each participating service were extracted and fully anonymised by Mayden before being shared with us for processing and analysis. The present analysis contained data for referrals made from 2014 to mid-2019. Earlier years were excluded due to operational issues early in IAPT, such as missing or inappropriate diagnostic labels, which led to poorer outcomes as a result of inappropriate matching of treatment protocols to the clinical needs of the patient (Saunders et al., 2020b).
Referrals were included if they had received high-intensity CBT for depression or generalised anxiety. Patients receiving treatment for depression and generalised anxiety were identified by their diagnostic labels of depressive episode, recurrent depressive disorder, or generalised anxiety disorder, as well as by having the PHQ-9 (Kroenke, Spitzer, & Williams, 2001) and GAD-7 (Spitzer, Kroenke, & Williams, 2006) as their primary outcome measures, which are used to monitor treatment process and evaluate treatment response; other anxiety measures indicate treatment for different anxiety disorders. Treatment was defined as CBT if all the recorded treatment labels were CBT. Referrals receiving other therapies in addition to CBT were excluded due to difficulties isolating the effects of CBT. In order to examine the effect of CBT on specific questionnaire items, only patients who had at least one full set of item-level questionnaire scores recorded were included in the analyses. Due to the observational nature of the data, no fixed amount of treatment was provided. To counteract the issue of unequal treatment doses, a minimum treatment dose was defined as eight appointments, the overall average number of delivered appointments in IAPT (NHS Digital, 2021). Eight appointments may also be more likely to capture, at least partially, some of the later responses that are observed for some subgroups of patients in therapy and that can occur after approximately six appointments (Saunders et al., 2019). As such, all subsequent analyses examine the first eight appointments from each referral that had at least eight appointments. To minimise the impact of previous treatment effects, each patient's first referral that met the criteria above was used.
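The inclusion rules above can be sketched as a simple filter. This is a minimal illustration in Python (the study itself was carried out in R). The record layout and the field names `patient_id`, `referral_date`, `treatments`, and `has_item_scores` are hypothetical and are not taken from the IAPT dataset. The diagnosis and outcome-measure criteria are omitted for brevity.

```python
from dataclasses import dataclass
from typing import Dict, List

MIN_APPOINTMENTS = 8  # minimum treatment dose stated in the text


@dataclass
class Referral:
    patient_id: str           # hypothetical field name
    referral_date: str        # ISO date string, so string order is date order
    treatments: List[str]     # hypothetical: one treatment label per appointment
    has_item_scores: bool     # hypothetical: at least one full set of item-level scores


def eligible(ref: Referral) -> bool:
    """Apply the stated criteria: CBT only, item-level data, at least eight appointments."""
    all_cbt = len(ref.treatments) > 0 and all(t == "CBT" for t in ref.treatments)
    return all_cbt and ref.has_item_scores and len(ref.treatments) >= MIN_APPOINTMENTS


def select_referrals(referrals: List[Referral]) -> Dict[str, Referral]:
    """Keep each patient's earliest referral that meets the criteria."""
    selected: Dict[str, Referral] = {}
    for ref in sorted(referrals, key=lambda r: r.referral_date):
        if eligible(ref) and ref.patient_id not in selected:
            selected[ref.patient_id] = ref
    return selected


def analysis_window(ref: Referral) -> List[str]:
    """Return only the first eight appointments, which is the window that was analysed."""
    return ref.treatments[:MIN_APPOINTMENTS]
```

Sorting by date before filtering means that a patient whose earliest referral is ineligible can still enter through a later eligible referral, which is one reading of "first referral that met the criteria".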
Intervention
CBT was delivered by mental health professionals trained in accordance with the national curriculum (Department of Health, 2019). CBT is a time-limited, structured, and problem-focused psychological therapy (Fenn & Byrne, 2013). CBT explores the links between cognitions, emotions, and behaviours and supports patients in identifying and modifying less helpful cognitions and behaviours and in developing alternative, more adaptive ones (Fenn & Byrne, 2013).
Measures
Patient Health Questionnaire 9 item (PHQ-9)
The PHQ-9 is a 9-item self-report questionnaire that assesses the severity of depressive symptoms over the past 2 weeks (Kroenke et al., 2001). Each item is rated on a 4-point Likert scale ranging from 0 (‘not at all’) to 3 (‘nearly every day’). Patients are asked to rate the severity of symptoms relating to: (1) anhedonia, (2) low mood/hopelessness, (3) sleeping problems, (4) tired/low energy, (5) appetite, (6) guilt/worthlessness, (7) concentration, (8) psychomotor retardation/agitation, and (9) suicidal thoughts.
Generalised Anxiety Disorder 7 item scale (GAD-7)
The GAD-7 is a 7-item self-report questionnaire that assesses the severity of generalised anxiety over the past 2 weeks (Spitzer et al., 2006). Each item is rated on a 4-point Likert scale ranging from 0 (‘not at all’) to 3 (‘nearly every day’). Patients are asked to rate the severity of symptoms relating to: (1) nervous/anxious, (2) uncontrollable worry, (3) too much worry, (4) trouble relaxing, (5) restlessness, (6) irritability, and (7) fear.
Statistical analysis
All analyses were performed in the R statistical programming language (R Core Team, 2013). We used a generalised logistic mixed model to examine how items of the PHQ-9 and GAD-7 responded over the course of CBT treatment appointments, using the glmer function from the ‘lme4’ package (Bates, Maechler, Bolker, & Walker, 2014). To simplify modelling, the item scores were dichotomised to model the probability of being symptom free, where a score of 0 denotes being symptom free and scores of 1–3 denote having symptoms. The default call for glmer() uses Gauss-Hermite quadrature, which we used to estimate the model parameters but which is computationally expensive. To obtain the confidence intervals, we used bootstrap computation with the approximation nAGQ = 0 to make the calculation feasible.
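The dichotomisation step can be illustrated with a short sketch. It is written in Python for illustration only (the analysis itself was run in R), and the example scores are hypothetical rather than study data.

```python
from typing import List


def symptom_free(item_score: int) -> int:
    """Return 1 if the item score is 0 (symptom free), else 0 (scores of 1-3)."""
    if item_score not in (0, 1, 2, 3):
        raise ValueError(f"item scores must be between 0 and 3, got {item_score}")
    return 1 if item_score == 0 else 0


def dichotomise(item_scores: List[int]) -> List[int]:
    """Dichotomise every item score of one questionnaire administration."""
    return [symptom_free(score) for score in item_scores]


# Hypothetical PHQ-9 responses for one appointment (items 1 to 9).
phq9_example = [2, 3, 0, 1, 0, 3, 1, 0, 0]
print(dichotomise(phq9_example))  # [0, 0, 1, 0, 1, 0, 0, 1, 1]
```

Each dichotomised item then forms one binary observation per patient, appointment, and item in the mixed model.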
Two separate models were built for patients being treated for depression and generalised anxiety, using the PHQ-9 and GAD-7 respectively as outcome measures. All dichotomised item scores for each questionnaire were specified as the outcome, with appointment and questionnaire item specified as predictors together with their interaction. As such, all PHQ-9 items were analysed in one model and all GAD-7 items were analysed in a separate model. The models were adjusted for baseline characteristics: gender, age, ethnicity, employment status, Index of Multiple Deprivation (McLennan et al., 2019), disability status, long-term health condition status, diagnosis, sum score baseline PHQ-9, sum score baseline GAD-7, sum score baseline Work and Social Adjustment Scale (Mundt, Marks, Shear, & Greist, 2002), medication status, referral number, referral source, service, and year. IAPT services collect various information about patients at baseline in a core dataset, with some services collecting additional information. Baseline characteristics were chosen on the basis of data availability (the variables most consistently recorded across services), the data quality of the recorded variables, and the variables considered in previous literature using IAPT records (see for example Delgadillo & Gonzalez Salas Duhne, 2020; Delgadillo, Moreea, & Lutz, 2016; Green et al., 2015; Saunders, Buckman, & Pilling, 2020a). Missing baseline covariates were singly imputed using random forest with ‘missForest’ (Stekhoven & Bühlmann, 2012). The cost of computation and the complexity of the model made the preferred option of multiple imputation impractical.
Single imputation is adequate here because of (a) the small amount of missing baseline data (maximum 7.4%), (b) evidence that missForest is effective in similar scenarios (Stekhoven & Bühlmann, 2012; Waljee et al., 2013), and (c) a satisfactory out-of-bag error, which suggests the imputation was successful, with a normalised root mean square error of 0.39 for continuous variables and a proportion of falsely classified entries of 0.15 for categorical variables. Random intercepts were specified in all models for referral id and appointment id to account for repeated observations. As there is no natural reference group on questionnaires which would provide an appropriate comparator to measure all other items against, sum coding was used. Sum coding allows the trajectory of each individual question to be compared to the mean trajectory of all other questions, thus providing a comparison of how specific trajectories compare relative to all others. Variability estimates for all questions relative to other questions are provided, except for the last question of each questionnaire, as its estimate is obtained as the negative of the sum of the log odds of all other questions. Longitudinal models were used to predict the probability of being symptom free on the questionnaire items across CBT treatment appointments. Predictions were made for a hypothetical ‘average’ patient with continuous variables fixed at the mean and categorical covariates set to the most frequently occurring group. The first eight appointments were used as a basis for predictions of up to 20 and 15 appointments to depict how item trajectories might respond for the NICE-recommended durations of treatment for depression and generalised anxiety respectively (NICE, 2009, 2011). Bootstraps with 5000 simulations were used to approximate 95% confidence intervals for predictions.
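To make the sum coding concrete, the sketch below builds a sum (deviation) contrast matrix and recovers the last item's deviation as the negative sum of the others. It is an illustrative Python sketch and not the code used in the study. The function names and the numeric values are hypothetical.

```python
from typing import List


def sum_contrasts(n_levels: int) -> List[List[int]]:
    """Build a sum contrast matrix with n_levels rows and n_levels - 1 columns.

    Each of the first n_levels - 1 levels gets its own indicator column;
    the last level is coded -1 in every column, so each column sums to zero.
    """
    if n_levels < 2:
        raise ValueError("sum coding needs at least two levels")
    matrix = []
    for level in range(n_levels - 1):
        matrix.append([1 if column == level else 0 for column in range(n_levels - 1)])
    matrix.append([-1] * (n_levels - 1))
    return matrix


def last_level_deviation(estimated_deviations: List[float]) -> float:
    """Deviation of the last level from the mean: the negative sum of the others."""
    return -sum(estimated_deviations)


# Hypothetical log-odds deviations for six of seven items (not study estimates).
hypothetical = [0.10, 0.20, -0.05, 0.00, -0.15, 0.05]
item7 = last_level_deviation(hypothetical)
print(round(item7, 2))
```

Because the last level's deviation is a function of the other coefficients rather than a separately estimated parameter, the model output does not include a standard error or p value for it.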
The ‘ggplot2’ package was used to visualise questionnaire item trajectories (Wickham, 2011).
Results
Baseline characteristics
Of the 5306 referrals included in the main analysis, the majority of patients were female, White, with an average age of 38 years, experiencing severe depression and moderate/severe anxiety (Table 1).
All PHQ-9 and GAD-7 symptoms appeared to improve across CBT appointments. However, the rate at which they improved relative to the average rate of improvement across all questionnaire items varied.
Patient Health Questionnaire-9
We found the strongest evidence that PHQ-9 items 2 (low mood/hopelessness) and 6 (guilt/worthlessness) improved fastest across CBT appointments in comparison to the average rate of all questionnaire items (Table 2). Patients had 7% higher odds of having no low mood/hopelessness with every appointment compared to all other depression symptoms, or 56% across 8 appointments. Patients had 6% higher odds of having no guilt/worthlessness with every appointment compared to all other depression symptoms, or 48% across 8 appointments. We also found the strongest evidence that PHQ-9 items 3 (sleeping problems), 5 (appetite), and 8 (psychomotor retardation/agitation) improved at a slower rate across CBT appointments in comparison to the average response of all other questionnaire items. Patients had 6, 7, and 6% lower odds of having no symptoms with every additional CBT appointment on sleeping problems, appetite, and psychomotor retardation/agitation respectively, compared to all other depression symptoms; or 48, 56, and 48% across 8 appointments. Some evidence suggested that the odds of being symptom free were higher for PHQ-9 item 1 (anhedonia) and PHQ-9 item 4 (tired/low energy); however, these associations were weaker. PHQ-9 item 7 (concentration) and PHQ-9 item 9 (suicidal thoughts) did not appear to differ in their rate of improvement compared to the average rate.
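The figures quoted across 8 appointments match the per-appointment percentage multiplied by eight. For readers who want to relate them to the model scale: in a logistic model the log odds are additive, so a constant per-appointment odds ratio compounds multiplicatively across appointments. The sketch below computes both quantities in Python. It assumes a per-appointment odds ratio of 1.07, read off the 7% quoted for low mood/hopelessness. It is illustrative only and is not a reanalysis of the study data.

```python
def cumulative_odds_ratio(per_appointment_or: float, n_appointments: int) -> float:
    """Odds ratio after n appointments when the per-appointment odds ratio is constant."""
    return per_appointment_or ** n_appointments


def additive_percent(per_appointment_or: float, n_appointments: int) -> float:
    """Per-appointment percentage change in odds multiplied by the number of appointments."""
    return (per_appointment_or - 1.0) * 100.0 * n_appointments


per_appointment_or = 1.07  # assumption: taken from the 7% per appointment quoted in the text

print(round(additive_percent(per_appointment_or, 8)))          # 56
print(round(cumulative_odds_ratio(per_appointment_or, 8), 2))  # 1.72
```

Under the compounding reading, an odds ratio of 1.07 per appointment corresponds to an odds ratio of about 1.72 after eight appointments.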
Adjusted for main effects: age, gender, ethnicity, employment status, Index of Multiple Deprivation, disability status, long-term health conditions, diagnosis (for depression only), baseline Patient Health Questionnaire-9, baseline Generalised Anxiety Disorder Scale-7, baseline Work and Social Adjustment Scale, medication status, referral number, referral source, service, year, question, and appointment, as well as random effects of referral id and appointment id. p values for the reference group of the sum coding cannot be estimated.
Figure 1 shows that the probability of being symptom free has different starting points for the PHQ-9 items. The predicted probability of being symptom free is much higher at baseline for PHQ-9 items 8 (psychomotor retardation/agitation) and 9 (suicidal thoughts). This is a result of the distribution of the response scores on each item at baseline, with items 8 (psychomotor retardation/agitation) and 9 (suicidal thoughts) having a higher frequency of 0 scores at baseline and through treatment.
Generalised Anxiety Disorder scale-7
We found the strongest evidence that GAD-7 items 2 (uncontrollable worry) and 3 (too much worry) improved fastest across CBT appointments in comparison to the mean response of all questionnaire items (Table 2). For both items, patients had 9% higher odds of having no symptoms with every appointment compared to the average of all other anxiety symptoms, or 72% across 8 appointments. We also found the strongest evidence that GAD-7 items 5 (restlessness) and 6 (irritability) improved more slowly across CBT appointments in comparison to all other questionnaire items. Patients had 11% and 7% lower odds of having no symptoms of restlessness and irritability respectively with every appointment, compared to all other anxiety symptoms; or 88% and 56% across 8 appointments. There was some evidence to suggest that the odds of being symptom free were higher for GAD-7 item 1 (nervous/anxious); however, this association was weak. While no variability estimates are computed for the last question, it appears that the odds of being symptom free on GAD-7 item 7 (fear) improved at a slower pace compared to all other questions. GAD-7 item 4 (trouble relaxing) did not appear to differ in its rate of improvement compared to the average rate.
Figure 2 shows that the probability of being symptom free has different starting points for the GAD-7 items. The predicted probability of being symptom free is somewhat higher at baseline for GAD-7 items 5 (restlessness), 6 (irritability), and 7 (fear). This is a result of the distribution of the response scores on each item at baseline, with these items having a somewhat higher frequency of 0 scores at baseline and throughout treatment.
We performed a sensitivity analysis in a subset of patients who had all their item-level scores recorded and found effects of a similar magnitude and direction (online Supplementary Tables S5 and S6). We further examined whether the slope of the trajectories for each item varied by baseline medication status and found no evidence of a differential improvement across items by medication status (PHQ-9: p = 0.769 and GAD-7: p = 0.561). Furthermore, we assessed whether there were inherent differences in baseline patient characteristics between those who had item-level data recorded and those who did not, but found no evidence to suggest that this was the case, with all standardised mean differences between baseline characteristics falling below 0.25 (Panos & Mavridis, 2020; Rubin, 2001).
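The balance check can be sketched for a single continuous baseline characteristic as follows. The text does not state which formula for the standardised mean difference was used. The version below, which divides the difference in means by the square root of the average of the two group variances, is a common formulation and is an assumption here. The ages are hypothetical, and the sketch is in Python rather than the R used for the study.

```python
import math
from statistics import mean, variance
from typing import Sequence

BALANCE_THRESHOLD = 0.25  # threshold quoted in the text


def standardised_mean_difference(group_a: Sequence[float], group_b: Sequence[float]) -> float:
    """Difference in means divided by the pooled standard deviation of the two groups."""
    pooled_sd = math.sqrt((variance(group_a) + variance(group_b)) / 2)
    if pooled_sd == 0:
        raise ValueError("both groups have zero variance")
    return (mean(group_a) - mean(group_b)) / pooled_sd


def is_balanced(smd: float, threshold: float = BALANCE_THRESHOLD) -> bool:
    """A characteristic counts as balanced when the absolute SMD is below the threshold."""
    return abs(smd) < threshold


# Hypothetical ages for patients with and without item-level data (not study data).
with_items = [34, 41, 29, 38, 45, 36]
without_items = [35, 40, 30, 39, 44, 37]
smd = standardised_mean_difference(with_items, without_items)
print(is_balanced(smd))
```

Categorical characteristics would need a variant based on proportions, which is not shown here.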
Discussion
We examined the trajectories of individual depression and anxiety symptoms as measured by two widely used measures, the PHQ-9 and the GAD-7, in a large, retrospective, observational cohort of patients receiving CBT. We found evidence to suggest low mood/hopelessness and guilt/worthlessness on the PHQ-9, as well as uncontrollable worry and too much worry on the GAD-7 improved at a faster rate relative to other symptoms. We found that sleeping problems, appetite, and psychomotor agitation/retardation on the PHQ-9 and restlessness and irritability on the GAD-7 improved at a slower rate compared to the average response of all other symptoms.
Worry is clinically central to generalised anxiety (APA, 2013; NICE, 2011), and working with worry and tolerance of uncertainty forms a critical part of treatment protocols for generalised anxiety disorder (Department of Health, 2019; University College London, n.d.). Low mood is considered one of the core clinical features of depression (APA, 2013; NICE, 2009) and has been shown to be the biggest contributor of depressive symptoms to functional impairment, explaining ~20% of the variance, with self-blame accounting for ~6% (Fried & Nesse, 2014). Various cognitive and behavioural components are delivered during CBT for depression, commonly starting with behavioural work in severe depression (Department of Health, 2019). However, working with negative automatic thoughts and themes of guilt or self-blame is also emphasised (Department of Health, 2019). Working with hopelessness may also be relevant to clinicians due to its association with clinical risk (McMillan, Gilbody, Beresford, & Neilly, 2007). Our research suggests that some of the symptoms that traditionally feature strongly in the clinical conceptualisation and treatment protocols improved relatively faster than other symptoms. However, sleeping problems, poor appetite or overeating, and psychomotor agitation/retardation appeared to improve to a lesser degree in the present research, yet they account for ~4, 11, and 8% of impairment respectively and may still be relevant symptoms to patients that require additional attention (Fried & Nesse, 2014).
Our findings that depressive symptoms like low mood/hopelessness and guilt/worthlessness, as well as generalised anxiety symptoms such as uncontrollable worry and too much worry, change comparatively the most during early treatment resonate with findings from network analyses. These have previously suggested that symptoms relating to low mood or hopelessness and failure, guilt or worthlessness are amongst the most central symptoms (Beard et al., 2016; O'Driscoll et al., 2021). Symptoms relating to worry have been reported to be the most central anxiety symptoms (Beard et al., 2016). Some arguments suggest that central symptoms are critical elements within networks as they interlink closely with other symptoms (Beard et al., 2016). They are often considered to be highly relevant in the maintenance of a network and are potentially important treatment targets (Beard et al., 2016). However, caution should be taken to avoid overinterpretation given that centrality does not necessarily equate to clinical importance (Bringmann et al., 2019; O'Driscoll et al., 2021) or clinical utility, such as prognostic capacity (Buckman et al., 2021; O'Driscoll et al., 2021). As such, the literature appears ambiguous regarding the practical meaning and application of centrality metrics.
It is nonetheless encouraging that our findings suggest several of the symptoms thought to be central in networks, as well as clinical conceptualisations, change comparatively more during treatment. However, given these noted limitations more research is needed.
The present research has primarily focused on how symptoms change across CBT broadly. Ideally, these changes could be mapped onto specific therapeutic techniques. This may begin to explain which elements of CBT are acting on specific symptoms and why some symptoms may improve at a faster pace. Disentangling the complex nature of psychotherapies and how these relate to specific symptoms would have important implications for precision psychiatry: a more detailed understanding of which therapeutic techniques have an effect on specific symptoms could potentially lead to better treatment matching on both empirical and theoretical grounds. However, the absence of detailed information regarding the order and nature of specific techniques and the lack of fidelity measures in IAPT make this difficult to assess (Martin, Iqbal, Airey, & Marks, 2022). Component studies of CBT, which also look at symptoms rather than sum scores alone, may be beneficial in gaining more granular insights into CBT; however, to date, efforts to understand specific therapeutic techniques appear underpowered (Cuijpers, Cristea, Karyotaki, Reijnders, & Hollon, 2019). Including more detailed recording of therapeutic interventions as part of routine care may be a feasible way to generate large and rich datasets to begin to address some of these questions with sufficient statistical power.
Strengths & limitations
The present study was based on data from a large retrospective cohort of patients receiving CBT that was obtained from clinical practice to examine individual symptoms. The naturalistic setting adds to the ecological validity of the findings. We also used an interpretable model to examine how individual symptoms change relative to one another and used a treatment duration that is reasonably representative of the average adult being treated for depression and anxiety in primary care in England.
Despite these strengths, several limitations should be considered. The reliability and validity of using item-level questionnaire responses, rather than sum scores, is still under debate (Boschloo et al., 2019b). While examining how symptoms improve relative to one another is a strength, the lack of a control group limits causal conclusions regarding the symptom-specific effects of CBT. For example, the differential trajectories of specific symptoms might be explained by items with higher baseline scores, such as guilt/worthlessness and uncontrollable worry, having more scope to regress to the mean. However, not all symptoms that improved comparatively less had lower baseline scores; for example, sleeping problems improved comparatively less than other symptoms but had a similar baseline severity to symptoms that improved most. A further explanation may be that symptoms that improved the most may be more amenable to natural change/recovery. However, this seems unlikely given that symptoms that improved the most included low mood and worry, which are considered core symptoms of depression and anxiety (APA, 2013). We also modelled the probability of being symptom free. It may nonetheless be clinically meaningful if patients move from a score of 3 (experiencing symptoms ‘nearly every day’) to a score of 1 (experiencing symptoms ‘several days’). Due to the modelling choice, we do not capture this nuance.
While efforts were made to capture a sample of patients who received high-intensity CBT, there are limitations of examining treatment in observational data. While clinicians are trained in accordance with clinical protocols, the lack of treatment fidelity measures in IAPT (Martin et al., 2022) raises possible concerns about the consistency and fidelity to protocol of treatments that are delivered in routine care. A lack of fidelity is likely to dilute any observed effects. Relatedly, clinicians may already be tailoring interventions to individual clients, with clinical work placing importance on the prioritisation of problem areas (Department of Health, 2019). We also cannot exclude the possibility of data entry errors or of variations between clinicians and services in how data are recorded, both for treatment and for other recorded characteristics.
The main analysis was based on eight appointments; this has multiple implications. First, a proportion of patients will require fewer than eight appointments to improve, whilst others drop out of treatment. This likely affects the probability of overall response but is less likely to influence relative symptom trajectories, unless specific symptoms are predictors of faster response or dropout. Secondly, other symptoms might improve at a later stage of treatment, either by being targeted by specific CBT techniques which may be scheduled later in treatment or as a consequence of cascading effects of improvements in other symptoms. Given our choice of eight appointments, which is the average number of appointments delivered in IAPT and potentially covers some of the later responses observed for some patients after approximately six appointments (Saunders et al., 2019), we can be somewhat confident that these results may generalise to the average patient. However, we cannot exclude the possibility that the rate of change for symptoms may differ over the course of longer treatment. As predictions for appointments beyond eight are extrapolations based on the first eight appointments, they remain speculative and are primarily for illustrative purposes.
Implications
This study suggests that sum scores may obscure important differences in the effect of interventions on specific symptoms. However, we only captured symptoms recorded on the PHQ-9 and GAD-7, which are less extensive than other questionnaires. Given their suggested relevance, questionnaires that include non-DSM symptoms may be insightful (Fried, Epskamp, Nesse, Tuerlinckx, & Borsboom, 2016). Similarly, questionnaires that capture symptoms more granularly, such as disaggregating symptoms like psychomotor agitation and retardation, may be beneficial given their differential contribution to impairment (Fried & Nesse, 2014). A particular focus on anxiety may also be beneficial given that there is comparatively less focus on its individual symptoms.
While the findings of the present research require replication and more rigorous evaluation, they do provide a promising avenue for understanding the heterogeneity in clinical outcomes as well as for precision psychiatry/psychotherapy. It could potentially be possible to match patients to treatment based on the baseline presentation of symptoms, particularly if the symptom-specific effects of CBT and other interventions are better understood and evaluated in the future. Furthermore, it provides the opportunity for more targeted combinations of interventions that may work on symptoms which respond comparatively less. For example, in the present research, sleeping problems appeared to improve comparatively less. As such, augmenting CBT with an intervention that targets sleep problems, such as an internet-delivered insomnia programme, amongst patients with higher levels of insomnia may provide a relatively cost-effective way to improve treatment outcomes (Darden et al., 2021; Ye et al., 2016).
However, much of the evidence that will likely make such improvements in personalised psychotherapy more tangible is yet to be robustly established and replicated. In addition to a lack of consensus on how psychological therapy works, the pathophysiology and aetiological mechanisms of depression and anxiety are also poorly understood (Nemeroff, 2020). Further research that addresses both areas will undoubtedly contribute to improving clinical outcomes if therapeutic interventions can then be matched to clinical presentations based on their mechanisms.
Conclusion
Trajectories of symptoms appear to differ during treatment with CBT, highlighting the importance of examining individual symptoms rather than sum scores alone. It appears that CBT improves at least some of the core symptoms of depression and anxiety, or symptoms that could be considered very clinically/theoretically important, fastest. While the research is observational and warrants further evaluation, the findings have possible implications for clinical practice and provide potential avenues for future research. Examining symptoms may provide an avenue for precision psychiatry/psychotherapy, allowing for more targeted prescription of treatments and/or more evidence-based augmentation of treatments.
Supplementary material
The supplementary material for this article can be found at https://doi.org/10.1017/S0033291722001556.
Financial support
This work was conducted as part of a PhD studentship from the University of Bath awarded to Clarissa Bauer-Staeb.