Introduction
It has long been recognized that major depression is a heterogeneous disorder (Blumenthal, 1971). Indeed, it is increasingly appreciated that mental illnesses display both strong heterogeneity within disorders and strong pathophysiological and symptomatic overlap across disorders. Symptoms frequently transcend discrete, mutually exclusive diagnostic categories (Marshall, 2020): a Danish population-based study found that every mental illness is associated with an increased risk of every other mental illness (Plana-Ripoll et al., 2019). The same genetic variants often affect the risk of multiple mental illnesses (Anttila et al., 2018), to such an extent that all mental illnesses have been classified neurobiologically as variations along a single ‘p factor’ (Caspi et al., 2014), albeit with some disorder-specific variation (Shanmugan et al., 2016). Depressive symptoms in particular not only constitute an autonomous disorder, but may also arise reactively to the experience of environmental stressors or occur as a comorbidity in numerous other mental disorders, for instance in schizophrenia (Häfner et al., 2005).
These insights have driven a revisionary approach to psychiatric nosologies. The DSM-5 broadened the use of ‘specifiers’ (e.g. ‘with atypical features’, ‘with psychotic features’) in an attempt to refine clinical subtypes within a major depressive episode (American Psychiatric Association, 2013). The ICD-11 introduced an analogous notion of ‘qualifiers’, e.g. ‘with prominent anxiety symptoms’ (Stein et al., 2020). Clinical research, spurred partly by the US National Institute of Mental Health's Research Domain Criteria (RDoC) initiative (Insel et al., 2010), has described several symptom-based (van Loo, de Jonge, Romeijn, Kessler, & Schoevers, 2012) and biological (Beijers, Wardenaar, van Loo, & Schoevers, 2019; Drysdale et al., 2017) depressive subtypes.
The use of large-scale data is promising in its potential to identify latent dimensions of psychopathology (Weissman, 2020): the availability of large, deeply phenotyped cohort studies, perhaps most strongly exemplified by the UK Biobank (Bycroft et al., 2018), has the potential to enhance our understanding of neuropsychiatric disease. What the UK Biobank lacks in depth of psychiatric measurement it makes up for in breadth, with genotypes and thousands of phenotypes assessed on up to half a million British participants. Analyses of this large-scale sample suggest that there are no detectable genetic subgroups of depressed patients (Howard et al., 2020) and that atypical depression (defined based on the presence of both hypersomnia and weight gain) is associated with more severe symptoms and more frequent psychiatric and non-psychiatric comorbidities (Brailean, Curtis, Davis, Dregan, & Hotopf, 2020).
Rather than relying on established nosologies, we embarked upon an agnostic, data-driven approach. Specifically, we performed a factor analysis on nine questions from the UK Biobank Mental Health Questionnaire (Davis et al., 2020) pertaining to an individual's worst reported lifetime episode of depression. Factor analysis and other techniques for elucidating the underlying symptom structure of multi-dimensional data have a rich history in depression research, dating back to the development of early depression rating scales such as the Hamilton Depression Rating Scale (Hamilton, 1960), which found four symptom dimensions roughly corresponding to negative cognition/psychomotor retardation, gastrointestinal symptoms/initial insomnia/weight loss/anhedonia, anxiety/agitation and somatic anxiety/nighttime awakening. Other studies have since explored a richer set of symptoms by pooling questions across multiple depression rating scales (Ballard et al., 2018; Fried, 2017). For instance, one study found eight factors encompassing depressed mood, tension, negative cognition, suicidal thoughts, impaired sleep, reduced appetite, anhedonia and amotivation (Ballard et al., 2018).
However, the unique characteristics of the UK Biobank cohort allow us to go beyond previous studies of depressive symptoms in two key ways. First, its clinical heterogeneity enables a direct comparison between individuals with major depression and demographically similar individuals in the community with undiagnosed self-reported depression. Second, the UK Biobank's extraordinary breadth of phenotyping facilitates the painting of a rich clinical portrait of individual depressive symptom dimensions.
Methods
Participants
Participants were included from the UK Biobank (Fig. 1), a community-based cohort study with genetics and deep phenotyping on approximately half a million individuals from across the UK, aged 40–69 years at recruitment (Bycroft et al., 2018). A total of 157 338 participants completed an online Mental Health Questionnaire (Davis et al., 2020), of whom 33 414 (21%) reported ever being diagnosed with depression by a health professional, a case definition we call ‘major depression’ following the terminology of the Psychiatric Genomics Consortium (McIntosh, Sullivan, & Lewis, 2019).
A total of 85 943 participants of the 157 338 (55%) answered yes to the question ‘Have you ever had a time in your life when you felt sad, blue, or depressed for two weeks or more in a row?’. Note that this 55% is likely larger than the percentage of the general population who would endorse this question, because of selection bias in who responded to the emailed questionnaire invitation (Davis et al., 2020). This question is analogous to one of the two questions on the Patient Health Questionnaire-2 (PHQ-2) (Kroenke, Spitzer, & Williams, 2003), a clinically validated screening tool for major depressive disorder (MDD) (Levis et al., 2020), as well as to one of the two questions on the Composite International Diagnostic Interview short-form (Kessler, Andrews, Mroczek, Ustun, & Wittchen, 1998). This question was a prerequisite for being asked the nine questions included in the factor analysis.
Of the 33 414 participants reporting a diagnosis of major depression, almost all, 31 675 (95%), also reported ever feeling ‘sad, blue or depressed’ for 2 weeks or more. These 31 675 participants were further divided into self-reported White (N = 25 261) and non-White (N = 655) subsets.
Exploratory factor analysis
An exploratory factor analysis (maximum likelihood with oblimin rotation) was performed for the largest ancestry group, self-reported White participants, across nine questions from the Mental Health Questionnaire pertaining to an individual's worst reported lifetime episode of depression (Table 1), with the aim of identifying a small number of latent factors that could explain the majority of variance in responses across the nine questions. We note the possibility that these questions might pertain to a depressive episode even worse (from the patient's perspective) than the episode during which they were formally diagnosed, but we consider it unlikely that this even worse episode would not also merit the same formal depression diagnosis.
All questions pertain to an individual's worst lifetime episode of depression, and all were coded as binary variables unless otherwise indicated.
The exploratory factor analysis was conducted using version 1.9.12.31 of the psych package in version 3.5.3 of the R programming language. Specifically, the polychoric function was used to compute polychoric correlations among all pairs of the nine questions; then, a maximum likelihood factor analysis was run on the resulting correlation matrix using the fa function with oblimin rotation (Lawley, 1940). We selected the minimum number of factors with a high goodness of fit, defined as a Tucker–Lewis index (Tucker & Lewis, 1973) above 0.95 and root mean square error of approximation below 0.05. Correlation-preserving ‘ten Berge’ factor scores (ten Berge, Krijnen, Wansbeek, & Shapiro, 1999) were computed using the factor.scores function.
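The factor-count selection rule described above can be illustrated with a minimal Python sketch. It is not the authors' R code: the function name and the fit statistics for 1, 2, 3 and 5 factors are hypothetical placeholders, and only the four-factor values (Tucker–Lewis index 0.963, RMSEA 0.047) are taken from the Results.

```python
from typing import Mapping, Optional, Tuple


def min_factors_with_good_fit(
    fit_by_k: Mapping[int, Tuple[float, float]],
    tli_min: float = 0.95,
    rmsea_max: float = 0.05,
) -> Optional[int]:
    """Return the smallest number of factors whose (TLI, RMSEA) pair
    satisfies TLI > tli_min and RMSEA < rmsea_max, or None if none does."""
    for k in sorted(fit_by_k):
        tli, rmsea = fit_by_k[k]
        if tli > tli_min and rmsea < rmsea_max:
            return k
    return None


# Hypothetical fit statistics per candidate factor count (TLI, RMSEA).
# Only the k = 4 entry uses values reported in this paper.
fits = {
    1: (0.70, 0.13),
    2: (0.82, 0.10),
    3: (0.91, 0.07),
    4: (0.963, 0.047),
    5: (0.98, 0.03),
}
chosen = min_factors_with_good_fit(fits)
print(chosen)
```

Iterating in increasing order of k and stopping at the first model that passes both thresholds is what makes the selected solution the minimal one.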
Confirmatory factor analyses across diverse ancestries and depression case definitions
We performed several confirmatory factor analyses (CFAs) to replicate the symptom structure derived from the exploratory factor analysis of our primary cohort. First, to confirm generalizability to individuals of diverse ancestries, including those underrepresented in medical research (Smart & Harrison, 2017), we performed a CFA on 655 self-reported non-White participants. Second, we performed a CFA across sexes, in male and female White participants. Third, we performed a CFA on 7190 White participants with an ICD-10 code for MDD (F32 or F33) from linked inpatient or primary care records, which we call ‘ICD-coded MDD’ (Fig. 1) following the terminology of a recent genome-wide association study from the PGC (Howard et al., 2018). Finally, we performed CFA on 43 090 White participants who reported ever feeling ‘sad, blue, or depressed for two weeks or more in a row’ but not ever receiving a depression diagnosis, which we call ‘undiagnosed self-reported depression’ (Fig. 1). CFAs were conducted using the cfa function from version 0.6-6 of the lavaan R package (Rosseel, 2012), with default parameter settings.
Polygenic risk scores
Polygenic risk scores (PRSs) were derived from public genome-wide association study (GWAS) results for MDD (Wray et al., 2018; https://pgcdata.med.unc.edu/major_depressive_disorders/daner_pgc_mdd_meta_w2_no23andMe_rmUKBB.gz), bipolar disorder (Stahl et al., 2019; https://pgcdata.med.unc.edu/bipolar_disorder/daner_PGC_BIP32b_mds7a_0416a.gz) and schizophrenia (Pardiñas et al., 2018; https://pgcdata.med.unc.edu/schizophrenia/SCZ_wave3/PGC3_SCZ_wave3_public.v2.tsv.gz) across self-reported White participants.
The UK Biobank's imputed genotypes were filtered using version 2.00 of the plink GWAS analysis software (Chang et al., 2015). Non-autosomal variants, duplicates, indels and variants with imputation INFO score <0.8 were removed, as were variants with Hardy–Weinberg equilibrium p-value $< 10^{-10}$, over 5% missingness, or minor allele frequency below 0.1% across self-reported White participants. Summary statistics were harmonized with the UK Biobank imputed genotypes with respect to reference/alternate allele and strand, using the allele harmonization framework from munge_sumstats.py in the ldsc software package (Bulik-Sullivan et al., 2015). Ambiguous variants (A/T, C/G, G/C, T/A) and variants missing from the UK Biobank were excluded. Summary statistics were then subset to p < 0.05, a threshold found to be most predictive across all self-reported White participants in the UK Biobank (Table 2). Frequency-informed linkage disequilibrium (LD) pruning to $r^2 > 0.2$ across the self-reported White participants was then performed using a 500 kb sliding window. The remaining variants constituted the trait's PRS, with the variants’ effect sizes (beta coefficients for educational attainment, log odds ratios for the other three case-control studies) constituting the weights of the PRS. Finally, PRSs were scored on each individual in the study cohort by summing, across the variants in the PRS, the variant's weight times the individual's number of effect alleles of that variant; missing genotypes were mean-imputed.
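As an illustration of the final scoring step and the ambiguous-variant filter, the following is a minimal pure-Python sketch rather than the pipeline actually used (which relied on plink and ldsc). The variant identifiers, weights and genotypes are invented toy values, and the function names are hypothetical.

```python
from typing import Dict, List, Optional


def is_strand_ambiguous(allele_1: str, allele_2: str) -> bool:
    """True for A/T and C/G pairs (in either order), whose strand
    cannot be resolved from the alleles alone."""
    return {allele_1.upper(), allele_2.upper()} in ({"A", "T"}, {"C", "G"})


def mean_impute(dosages: List[Optional[float]]) -> List[float]:
    """Replace missing effect-allele counts (None) with the mean of the
    observed counts for that variant."""
    observed = [d for d in dosages if d is not None]
    mean = sum(observed) / len(observed) if observed else 0.0
    return [mean if d is None else d for d in dosages]


def score_prs(
    weights: Dict[str, float],
    genotypes: Dict[str, List[Optional[float]]],
) -> List[float]:
    """Sum, over variants, weight x number of effect alleles per individual."""
    n_individuals = len(next(iter(genotypes.values())))
    scores = [0.0] * n_individuals
    for variant, weight in weights.items():
        for i, dosage in enumerate(mean_impute(genotypes[variant])):
            scores[i] += weight * dosage
    return scores


# Toy data: two hypothetical variants, four individuals, None = missing call.
weights = {"rs_a": 0.10, "rs_b": -0.05}
genotypes = {"rs_a": [0, 1, 2, None], "rs_b": [2, None, 0, 1]}
prs = score_prs(weights, genotypes)
print(prs)
```

Mean imputation keeps every individual in the score while contributing no information for the missing variant beyond its cohort average.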
The area under the curve (AUC), also known as the area under the receiver operating characteristic curve (AUROC) or concordance statistic (C statistic), is the fraction of the time that the polygenic risk score would rank a randomly chosen case higher than a randomly chosen control.
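The pairwise definition of the AUC given above translates directly into code. The sketch below is illustrative only: the scores are toy values, and counting tied case-control pairs as half a win is a common convention assumed here, not something stated in the text.

```python
from typing import Sequence


def auc(case_scores: Sequence[float], control_scores: Sequence[float]) -> float:
    """Fraction of (case, control) pairs in which the case has the higher
    score; ties count as half (an assumed convention)."""
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Toy polygenic scores for three cases and two controls.
toy_auc = auc([0.9, 0.4, 0.7], [0.1, 0.5])
print(toy_auc)
```

This brute-force version is quadratic in cohort size; rank-based formulas give the same value far more efficiently on large cohorts.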
Associations with phenotypic variables and PRSs
The factors or ‘symptom dimensions’ resulting from factor analysis were associated with 19 mental illness-related fields, including 15 self-reported diagnosed mental illnesses, family history of severe depression, and the 3 PRSs described above. Statistical associations were performed using linear regression for continuous traits (i.e. PRSs), with results reported as effect sizes (β coefficients); or logistic regression for binary traits (i.e. diagnoses and family history), with results reported as odds ratios. All effect sizes and odds ratios were adjusted for age and sex; PRS associations were additionally adjusted for the top 10 genotype principal components.
Associations were performed using version 0.11.0 of the statsmodels Python package. For each phenotype and factor, a separate logistic (statsmodels.Logit; for binary phenotypes) or linear (statsmodels.OLS; for non-binary phenotypes) regression was conducted with the phenotype as the output variable and the factor scores as the input variable, with age and sex as covariates and the regression taking place across all individuals with non-missing values for the phenotype. Factor scores and non-binary phenotypes were both standardized to zero mean and unit variance, so effect sizes and odds ratios can be interpreted as being per standard deviation increase in the factor score. To avoid convergence issues due to the presence of sex as a covariate, rare binary phenotypes exhibited by fewer than five males or five females were excluded.
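For readers less familiar with logistic regression output, the sketch below shows how a fitted coefficient and its standard error map to an odds ratio with a 95% confidence interval. The Wald interval is an assumption, since the text does not state how intervals were computed, and the coefficient and standard error are hypothetical values chosen to give an odds ratio of 2.

```python
import math
from typing import Tuple

Z_95 = 1.959963984540054  # two-sided 95% critical value of the standard normal


def odds_ratio_with_ci(beta: float, se: float, z_crit: float = Z_95) -> Tuple[float, float, float]:
    """Convert a log-odds coefficient and its standard error into
    (odds ratio, lower bound, upper bound) using a Wald interval."""
    return (
        math.exp(beta),
        math.exp(beta - z_crit * se),
        math.exp(beta + z_crit * se),
    )


# Hypothetical coefficient per standard deviation of factor score.
beta, se = math.log(2.0), 0.1363
odds_ratio, lower, upper = odds_ratio_with_ci(beta, se)
print(f"OR = {odds_ratio:.2f} [{lower:.2f}, {upper:.2f}]")
```

Because the factor scores are standardized, an odds ratio of 2 would mean that each standard deviation increase in the factor score doubles the odds of the phenotype, after adjustment for the covariates.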
Two-tailed p-values were calculated from the factor's regression coefficient in the usual way, by dividing the coefficient by its standard error and then converting the resulting z-score to a p-value using the standard normal distribution; statistical significance was set at a false discovery rate of 5%. Since many phenotypes were correlated with most or all factors, but to varying degrees, we applied a difference-of-effect-sizes test to compare the effect sizes with each other. z-scores for the difference of two regression coefficients $\beta_1$ and $\beta_2$ with standard errors $\sigma_1$ and $\sigma_2$ were calculated using the formula $z_{\text{diff}} = (\beta_1 - \beta_2)/\sqrt{\sigma_1^2 + \sigma_2^2}$. p-values were then computed from these z-scores in the same way.
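The significance calculations in this paragraph can be sketched with the Python standard library alone. The coefficients and p-values below are toy numbers, and the Benjamini–Hochberg step-up procedure is an assumption: the text specifies a 5% false discovery rate but does not name the procedure used.

```python
import math
from typing import List, Sequence


def two_tailed_p(z: float) -> float:
    """Two-tailed p-value of a z-score under the standard normal distribution."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def z_diff(beta_1: float, se_1: float, beta_2: float, se_2: float) -> float:
    """z-score for the difference between two regression coefficients."""
    return (beta_1 - beta_2) / math.sqrt(se_1 ** 2 + se_2 ** 2)


def benjamini_hochberg(p_values: Sequence[float], fdr: float = 0.05) -> List[bool]:
    """Flag p-values that are significant at the given false discovery rate
    (Benjamini-Hochberg step-up procedure; assumed, not stated in the text)."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    max_rank = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= fdr * rank / m:
            max_rank = rank
    significant = [False] * m
    for rank, i in enumerate(order, start=1):
        if rank <= max_rank:
            significant[i] = True
    return significant


# Toy example: compare two hypothetical effect sizes, then apply FDR control.
z = z_diff(0.50, 0.03, 0.30, 0.04)
p = two_tailed_p(z)
flags = benjamini_hochberg([0.001, 0.04, 0.03, 0.20])
print(z, p, flags)
```

The difference test treats the two coefficient estimates as independent, which is what the $\sqrt{\sigma_1^2 + \sigma_2^2}$ denominator encodes.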
Results
Identification of depressive symptom dimensions
Four factors (‘symptom dimensions’) were identified by exploratory factor analysis of nine questions from the UK Biobank Mental Health Questionnaire pertaining to an individual's worst reported lifetime episode of depression (Table 3) across 25 261 individuals with major depression. Factors were labeled Factor A through Factor D, in order of decreasing variance explained, and roughly corresponded to the categories of atypical depressive symptoms (Factor A), functional impairment (Factor B), insomnia (Factor C) and negative cognition (Factor D). Four was the minimum number of factors required for high goodness of fit to the underlying questionnaire data (Methods), with a Tucker–Lewis index of 0.963 and root mean squared error of approximation of 0.047. As expected, these four factors do not fully represent the cohort's symptom structure; there is marked heterogeneity in which of the questions associated with each factor were endorsed individually or in combination. The internal structure of each factor is shown in Fig. 2.
The largest factor loading for each symptom is bolded; loadings with a magnitude >0.1 are underlined. Symptoms refer to specific questions from the UK Biobank Mental Health Questionnaire (Table 1).
Symptom dimensions are consistent across ancestries, sexes and depression case definitions
The identified factor structure replicated across ancestries, in 655 non-White participants with major depression ($\chi^2_{26} = 58.6$, $p = 0.0002$). It also replicated across sexes, in 7960 male ($\chi^2_{26} = 470.1$, $p = 5 \times 10^{-83}$) and 17 301 female ($\chi^2_{26} = 1302.9$, $p = 1 \times 10^{-258}$) White participants with major depression.
The factor structure also replicated across depression case definitions. It replicated in 7190 White participants with ICD-coded MDD ($\chi^2_{26} = 600.06$, $p = 6 \times 10^{-110}$). It also replicated in 43 090 White participants with undiagnosed self-reported depression, who reported ever feeling ‘sad, blue, or depressed for two weeks or more in a row’ but never receiving a depression diagnosis ($\chi^2_{26} = 2565.27$, $p = 3 \times 10^{-529}$), with a similar pattern of correlations among symptoms as in the full cohort (Fig. 3). This suggests that at least some individuals with undiagnosed self-reported depression would have met the criteria to be diagnosed with major depression, if they had only sought help at the time.
Associations with mental illness diagnoses, PRSs and family history
Strikingly, every symptom dimension was associated with increased risk of nearly every mental illness (according to self-report of professional diagnoses in the Mental Health Questionnaire) and PRS or family history thereof (Table 4). Forty-four of the 60 factor-illness associations were significant after multiple testing correction. All but one of these 44 associations were between higher factor scores and increased risk of mental illness. Similarly, 10 of 12 factor-PRS associations and 4 of 4 factor-family history associations were between higher factor scores and increased risk of mental illness. This is reminiscent of how every mental illness has been associated with an increased risk of every other mental illness (Plana-Ripoll et al., 2019).
Mental illnesses and polygenic risk scores are ordered in descending order by the largest odds ratio/effect size across the four factors. Significant associations (FDR < 0.1) are bolded while non-significant ones are italicized; associations significantly larger (↑) or smaller (↓) than for all other factors (p < 0.05, difference-of-effect-sizes test) are denoted in red and blue, respectively. For binary traits, N denotes the number of people with the trait. ADD/ADHD = attention-deficit (and hyperactivity) disorder, GAD = generalized anxiety disorder.
However, certain illnesses were particularly associated with specific symptom dimensions. For instance, anorexia nervosa (adjusted odds ratio [AOR] = 2.00 [1.53, 2.61]) and bulimia nervosa (AOR = 2.30 [1.60, 3.30]) were exclusively associated with Factor D (negative cognition). Factor D was also significantly more associated than other factors with social anxiety/phobia (AOR = 2.79 [2.33, 3.35]), reflective of transdiagnostic contributions of negative cognition to multiple psychiatric illnesses (Ehring & Watkins, 2008). Despite being associated with nearly every symptom dimension, PRSs and family history did not display significant differential associations between symptom dimensions.
Discussion
In this study, we analyzed the latent symptom structure of major depression in a population-based cohort of 25 261 self-reported White participants with a lifetime depression diagnosis. The identified symptom structure replicated across ancestries and case definitions. Each symptom dimension had a unique comorbidity profile, being associated with a specific combination of mental illnesses, though not showing any obvious genetic signatures relative to other dimensions. To our knowledge, this study represents the largest-ever analysis of the structure of depressive symptoms.
Every symptom dimension was associated with an increased risk of nearly every mental illness. In part, this reflects shared diagnostic criteria between depression and comorbid disorders. For instance, functional impairment (‘clinically significant distress or impairment in social, occupational, or other important areas of functioning’) forms part of the DSM-5 diagnostic criteria for both MDD and generalized anxiety disorder (American Psychiatric Association, 2013), which might help explain why anxiety is significantly more associated with the functional impairment symptom dimension than with any other dimension. It is also consistent with the notion of transdiagnostic subtypes – a key focus of RDoC (Insel et al., 2010) – in which neurobiologically similar subtypes cut across existing diagnostic categories (Grisanzio et al., 2018) and genetic variants have pleiotropic effects across multiple dimensions of psychopathology (Anttila et al., 2018).
Similarly, every symptom dimension was positively associated with every or nearly every PRS. However, unlike for diagnosed mental illnesses, symptom dimensions were not differentially associated with polygenic risk. This is concordant with a recent study finding no evidence of genetically defined depressive subtypes (Howard et al., 2020).
Dimensions showed highly distinct patterns of association with comorbid mental illnesses. Clinicians should be aware of these associations between specific types of depressive symptoms and specific comorbidities, as patients presenting with one may also have the other. Moreover, treating certain of these comorbidities may lead to concomitant improvement in depressive symptoms.
Despite the UK Biobank's substantial size, breadth and diversity of phenotyping, it has at least three major disadvantages for this application. First, the Mental Health Questionnaire does not fully correspond to established rating scales and DSM-5 specifiers. For instance, it lacks questions on rejection sensitivity and leaden paralysis that would improve ascertainment of atypical depression, and it lacks questions on psychomotor agitation that would improve ascertainment of depression with mixed features. While the Mental Health Questionnaire does ask about psychotic experiences, anxiety and mania, they are not asked with reference to a particular depressive episode, so we chose not to use them within our factor structure here. Second, the Mental Health Questionnaire's temporal ascertainment is limited: an individual's worst reported lifetime episode of depression may be only a small fraction of what is often a prolonged course of illness, with many relapses and remissions. Third, being specific to a single developed country, the dataset lacks a broad representation of the world's population, despite our trans-ancestral replication.
The use of self-report data is considered controversial (Abbasi, 2017), but has at least two related advantages for this application. First, it enables a direct comparison between major depression and undiagnosed self-reported depression among demographically similar individuals, which we find largely shares the same symptom structure. Thus, this enables ascertainment of a potentially broad segment of the population who may well have experienced an episode of bona fide major depressive disorder, but not sought help at the time (Boerema et al., 2016), and consequently go undiagnosed and untreated. For instance, a meta-analysis by the World Health Organization found that across 24 countries, 56.3% of individuals with depression and 56.0% with dysthymia did not receive any treatment for their illness (Kohn, Saxena, Levav, & Saraceno, 2004). Second, as the cohort is composed of a broad spectrum of individuals in the community, rather than merely those seeking treatment at a psychiatric research hospital, it is arguably more representative of the general population than the patients typically recruited into psychiatric research protocols. The high prevalence of undiagnosed self-reported depression with a concordant symptom structure to major depression is consistent with the notion of a large burden of untreated patients (Kohn et al., 2004) who might benefit from psychiatric care.
On the whole, this study provides perhaps the highest-resolution view to date of depressive symptom dimensions in the community. Additional research is needed to further elucidate the underlying neurobiological correlates of these dimensions in ways that can inform treatment decisions.
Acknowledgements
MW and SJT were funded by the Kavli Foundation, Krembil Foundation, CAMH Discovery Fund, the McLaughlin Foundation, NSERC (RGPIN-2020-05834 and DGECR-2020-00048) and CIHR (NGN-171423). PZ was funded by the Labatt Family Postdoctoral Fellowship in Depression Biology. DF is supported by the Koerner Family Foundation New Scientist Program. CH was funded by the CAMH Foundation, the Brain and Behavior Research Foundation, and the NIMH. This work was conducted under the auspices of UK Biobank application 61530, ‘Multimodal subtyping of mental illness across the adult lifespan through integration of multi-scale whole-person phenotypes’. The authors declare no conflicts of interest.