Introduction
Major depressive disorder (MDD) is a common, often chronic and recurrent condition, marked by persistent suffering and poor overall health and with deleterious effects on psychosocial, academic, vocational and family functioning. MDD is one of the most prevalent mental disorders and the leading cause of disability worldwide (World Health Organization, 2017), with lifetime prevalence estimates ranging from 7% to 21% (Kessler & Bromet, Reference Kessler and Bromet2013).
In 1991, the MacArthur Foundation Network on the Psychobiology of Depression concluded that the randomness with which investigators referred to key changes in clinical status of individuals with depression led to considerable confusion in the literature (Prien et al. Reference Prien, Carpenter and Kupfer1991). Subsequently, a task force was initiated to achieve consensus about the definition of key stages, change points and outcome definitions for MDD among clinical investigators and practicing clinicians. The resulting report by Frank et al. (Reference Frank, Prien, Jarrett, Keller, Kupfer, Lavori, Rush and Weissman1991) defined conceptualisations of an MDD episode, remission, recovery, relapse and recurrence (see Fig. 1 and supplementary Table) by a set of five parameters or thresholds: two severity scores (cut-offs for ‘asymptomatic’ and fully symptomatic ranges) and three durations (minimal consecutive time durations in the fully or a-symptomatic range before an episode, remission, or recovery can be declared).
Specific consensus-based recommendations for these thresholds were provided in Frank et al.’s (Reference Frank, Prien, Jarrett, Keller, Kupfer, Lavori, Rush and Weissman1991) report and revised in a follow-up report by Rush et al. (Reference Rush, Kraemer, Sackeim, Fava, Trivedi, Frank, Ninan, Thase, Gelenberg, Kupfer, Regier, Rosenbaum, Ray and Schatzberg2006). Both reports explicitly requested empirical validations of these now widely used consensus-based definitions. Therefore, the present paper reviews the accumulated evidence over the past 27 years to validate the proposed conceptualisations and operationalisations and to provide suggestions for future avenues.
Conceptual discussion
Here we focus on conceptualisations of MDD episode, remission, recovery, relapse and recurrence by Frank et al. (Reference Frank, Prien, Jarrett, Keller, Kupfer, Lavori, Rush and Weissman1991, see supplementary Table), which are based exclusively on severity (number/intensity) and duration of clinical symptoms, and each has its own rationale and clinical implications. An MDD episode means that illness is present and that treatment is indicated. When the state of remission (a relatively brief period without clinically relevant symptoms during or at the end of an episode) is reached, no intensified treatment regimen is required or justified. A recovery (a sustained period of absence of clinically relevant symptoms, i.e. a sustained remission) means that the episode has ended and treatment can be discontinued or aimed at preventing subsequent episodes. Relapse/recurrence imply a return of symptoms during remission/recovery, respectively, and indicate a need for treatment intensification. The implicit distinction between relapse and recurrence is that a relapse is thought to be a return of symptoms of an ongoing episode that was symptomatically suppressed, whereas a recurrence represents an entirely new episode.
Importantly, these concepts can only have the ascribed interpretations and treatment implications if they have substantial predictive value for a future course. For example, treatment is indicated for those experiencing an episode because they have a worse prognosis than those who are experiencing symptoms that do not meet episode criteria. Therefore, the operationalisations of these concepts (i.e. the choice of severity and duration thresholds) should be chosen in such a way that they have optimal prognostic significance.
In particular, it should be possible to distinguish remission from recovery (and therefore relapse from recurrence), which are different only in their duration, by a difference in prognosis. The hypothesis is that those in remission have not (yet) fully recovered from the latently present episode (i.e. they are still undergoing a healing process) and therefore have a relatively high relapse rate compared with those who recovered. Those who recovered have a low recurrence rate that is no longer dependent on the time since their last episode and equal to the incidence rate of a risk factor-comparable population who never experienced an episode. Similarly, in cancer research, ‘full remission’ is defined as the period during which any sign of the disease is lacking, but during which a patient is particularly vulnerable for a relapse of the tumour since latent disease might still be present. When the remission is of sufficiently long duration, the patient can be (retrospectively) considered to be recovered or ‘cured’ as the passing of even more time does not provide additional protection to disease recurrence, the risk of which is similar to the incidence risk of a comparable healthy population.
Some of the clinical status concepts that are the subject of this review are also defined in the Diagnostic and Statistical Manual of mental disorders (DSM-5; American Psychiatric Association, 2013) and the International Classification of Diseases (ICD-10; World Health Organization, 1993), as summarised in Table 1.
a If the symptoms are particularly severe and of very rapid onset, it may be justified to make the diagnosis after less than 2 weeks.
b Although the term ‘recovery’ is mentioned in the DSM-5 and ICD-10, it is not explicitly defined. ‘Relapse’ is not mentioned in DSM-5 and ICD-10, whereas a recurrent episode is defined in DSM-5 as a return of symptoms during a remission (i.e. equivalent to the concept of ‘relapse’ by Frank et al. (Reference Frank, Prien, Jarrett, Keller, Kupfer, Lavori, Rush and Weissman1991)) and in ICD-10 as a depressive episode separated from a previous episode by at least 2 months free from any significant mood symptoms.
Methods/literature search
This systematic review largely adhered to PRISMA guidelines (Moher et al. Reference Moher, Liberati, Tetzlaff and Altman2009). To review empirical evidence regarding the definitions proposed by Frank et al. (Reference Frank, Prien, Jarrett, Keller, Kupfer, Lavori, Rush and Weissman1991), we searched both Web of Science and Pubmed for studies that referenced them without imposing language restrictions (see supplementary PRISMA flow diagram). Duplicates and non-obtainable studies were excluded. Based on title and abstract, studies were excluded that (i) did not focus on individuals with MDD, (ii) were non-empirical, (iii) were of study types not expected to be useful for the purpose of this review (see online supplement), or (iv) focused on the evaluation of some association or cause-effect relation between variables.
The remaining articles were scrutinised for data that could (in)validate at least one of Frank's definitions. Because severity related criteria were necessarily instrument-specific we focused on articles determining cut-offs on the HAMD-17 and the Montgomery-Åsberg Depression Rating Scale (MADRS), which are the most widely used instruments (Zimmerman et al. Reference Zimmerman, Chelminski and Posternak2004a). Studies using different methodologies were included (see results section). Criteria to define state duration should be maximally predictive of remaining in that state (Frank et al. Reference Frank, Prien, Jarrett, Keller, Kupfer, Lavori, Rush and Weissman1991). Therefore, we sought studies that show the remission/recovery and relapse/recurrence of depressive episodes over time (via survival curves or equivalent).
Two authors (PLdZ, BFJ) extracted data independently and resolved discrepancies through discussion and consensus. References of included articles were searched for additional relevant studies. The literature search was last updated on August 30, 2017.
Results
The 1570 identified papers (supplementary eFigure) included 214 duplicates and 26 non-obtainable papers. The study selection criteria (as outlined above) reduced the number to 117 papers and yielded 49 additional records via reference checks. From these 166 papers, 110 were excluded based on the full-text assessment. Thus 56 studies covering 39 315 subjects were included, and summarised in Tables 2–5.
CGI-S, Clinical Global Impression – Severity scale (1, very much improved; 2, much improved; 3, minimally improved; 4, no change; 5, minimally worse; 6, much worse; 7=very much worse); HAMD, Hamilton Rating Scale for Depression (a.k.a. HRSD, HDRS); MADRS, Montgomery–Åsberg Depression Rating Scale; M, mean; min., minimal; N, number of participants; N/A, not available; R, range; s.d., standard deviation; SF-12 , 12-item short-form health survey; SOFAS, Social and Occupational Functioning Assessment Scale; S x, symptoms; T 1, baseline.
a Country codes (ISO Alpha-2 and 3): DE, Germany; DK, Denmark; ES, Spain; IT, Italy; USA, United States of America.
b cut-off preferred by authors, typically because this subgroup scored better on psychosocial functioning.
c High value attributed to specificity.
d Equal value placed on sensitivity and specificity (AUC: 0.81).
Gen.pop., general population; HAMD, Hamilton Rating Scale for Depression (a.k.a. HRSD, HDRS); MADRS, Montgomery–Åsberg Depression Rating Scale; M, mean; min., minimal; N, number of participants; N/A, not available; R, range; s.d., standard deviation; S x, symptoms; T 1, baseline. USA, United States of America.
a Based on a review of 10 studies for the MADRS and a review of 27 studies for the HAMD.
DSM, Diagnostic and Statistical Manual; Ex., Episode; Gen.pop., General population; HAMD, Hamilton Rating Scale for Depression (a.k.a. HRSD, HDRS); LIFE-PSR, Longitudinal Follow-up Examination (LIFE) Psychiatric Status Rating Scale (PSR); MADRS, Montgomery–Åsberg Depression Rating Scale; M, mean; min., minimal; N, number of participants; N/A, not available; R, range; PSR, Psychiatric Status Ratings**; RDC, Research Diagnostic Criteria; s.d., standard deviation; S x, symptoms; T 1, baseline; QIDS-SR16, Quick Inventory of Depressive Symptomatology 16-item self-rating scale.
a Country codes (ISO Alpha-2 and 3): DE, Germany; ES, Spain; GB, United Kingdom; IT, Italy; JP, Japan; USA, United States of America.
b Psychiatric status ratings: (1) asymptomatic (return to usual self); (2) residual/mild affective S x; (3) partial remission, moderate S x or impairment; (4) marked/major S x or impairment; (5) meets definite MDD criteria without prominent psychotic S x or extreme impairment; (6) meets definite criteria with prominent psychotic S x or extreme impairment.
c The authors state that relapse becomes less likely when the MADRS score is lower, but there is no single cut-off that has high sensitivity and specificity for predicting relapse: ‘This suggests that there is no particular cut-off that is sufficient to consider as ‘low enough’ to protect against future relapse, so the primary conclusion would be to strive for the lowest score possible’.
d No particular cut-off: those with a greater number of residual symptom domains (out of nine possible DSM-IV criterion symptom domains) had a greater probability of relapse.
ass., assessment; DSM, Diagnostic and Statistical Manual; CIDI, Composite International Diagnostic Interview; Dx, diagnosis; def., definition; Ex., Episode; Gen.pop., General population; HAMD, Hamilton Rating Scale for Depression; LCI, Life chart interview; M, mean; MDD, Major Depressive Disorder or unipolar depression; min., minimal; N, number of participants; N/A, not available; NIMH, National Institute of Mental Health; PSR, Psychiatric Status Ratingsb; R, range; RDC, Research Diagnosis Criteria; retr., retrospectively; SADS-L, Schedule for Affective Disorders and Schizophrenia-Lifetime interview; s.d., standard deviation; Sx, symptoms; T 1, baseline wave; T 2, follow-up wave.
a Country codes (ISO Alpha-2 and 3): CA, Canada; DE, Germany; DK, Denmark; ES, Spain; FI, Finland; GB, United Kingdom; IE, Ireland; IT, Italy; JP, Japan; MX, Mexico; NL, Netherlands; SE, Sweden; USA, United States of America.
b Psychiatric status ratings: (1) asymptomatic (return to usual self); (2) residual/mild affective S x; (3) partial remission, moderate S x or impairment; (4) marked/major S x or impairment; (5) meets definite MDD criteria without prominent psychotic S x or extreme impairment; (6) meets definite criteria with prominent psychotic S x or extreme impairment.
(1) Respondents rated whether they experienced ‘a time when you felt sad or blue and had some of these other problems (e.g., weight loss or sleeplessness)’.
(2) Response was defined in various ways, and each definition was tested for validity.
(3) Authors appear to mix up recurrence and relapse, but we denote time after patient recovered as recurrence.
(4) Medication use was seen as indication for not being healthy, thus these people were not at risk for recurrence.
(5) PSR ≥3 during some of these weeks count as residual S x after remission, i.e., the patient is not yet considered to be relapsed or recurred before PSR ≥5.
(6) The authors suggest that 8 week duration was the standard before their paper was published, mistakenly, see Rush et al. (Reference Rush, Kraemer, Sackeim, Fava, Trivedi, Frank, Ninan, Thase, Gelenberg, Kupfer, Regier, Rosenbaum, Ray and Schatzberg2006).
Severity thresholds
Frank et al. (Reference Frank, Prien, Jarrett, Keller, Kupfer, Lavori, Rush and Weissman1991) categorised the level of MDD symptomatology in three clinical ranges: a fully symptomatic range that can indicate the start of an episode, an asymptomatic range that can indicate the start of a full remission, and a partially symptomatic range in between. The ‘asymptomatic range’ is supposed to represent the normal range consistent with the absence of disorder. The term is a bit of a misnomer as this range includes the presence of a minor level of symptomatology associated with the ‘healthy’ (non-depressed) population, in which the average HAMD-17 score is about 3.2 (Zimmerman et al. Reference Zimmerman, Chelminski and Posternak2004b); however, for consistency, the term asymptomatic will be used throughout this review.
Two instrument-specific ‘thresholds’ need to be defined on the HAMD-17 and MADRS (most widely used as endpoints in clinical trials; Zimmerman et al. Reference Zimmerman, Chelminski and Posternak2004a) to operationalise these three different levels of symptomatology (see Fig. 1). Frank et al. (Reference Frank, Prien, Jarrett, Keller, Kupfer, Lavori, Rush and Weissman1991) defined HAMD-17 scores ⩾15 to correspond to the fully symptomatic range while HAMD-17 ⩽7 would indicate the asymptomatic range, the latter of which is roughly equivalent to MADRS ⩽10–11 (Zimmerman et al. Reference Zimmerman, Posternak and Chelminski2004c).
Regarding the severity thresholds, the 32 studies that provided data are summarised in Tables 2–4.
Severity threshold for the asymptomatic range
Studies focusing on the asymptomatic threshold could be roughly divided into three groups, reflecting differences in the used criteria for determining the ‘best’ threshold for the asymptomatic range.
The first group of studies selected the optimal threshold by maximising the correspondence to some gold standard (Hawley et al. Reference Hawley, Gale and Sivakumaran2002; Zimmerman et al. Reference Zimmerman, Posternak and Chelminski2004d, Reference Zimmerman, Posternak and Chelminski2005; Bandelow et al. Reference Bandelow, Baldwin, Dolberg, Andersen and Stein2006; Ballesteros et al. Reference Ballesteros, Bobes, Bulbena, Luque, Dal-Ré, Ibarra and Güemes2007; Riedel et al. Reference Riedel, Möller, Obermeier, Schennach-Wolff, Bauer, Adli, Kronmüller, Nickel, Brieger, Laux, Bender, Heuser, Zeiler, Gaebel and Seemüller2010; Romera et al. Reference Romera, Pérez, Menchón, Polavieja and Gilaberte2011; Leucht et al. Reference Leucht, Fennema, Engel, Kaspers-Janssen, Lepping and Szegedi2013; Sacchetti et al. Reference Sacchetti, Frank, Siracusano, Racagni, Vita and Turrina2015), most often the Clinical Global Impression-Severity scale (CGI-S) or some measure of functioning (see Table 2). The second group of studies based on the optimal asymptomatic threshold on the mean scores or statistical upper limits of the general population (Zimmerman et al. Reference Zimmerman, Chelminski and Posternak2004a, Reference Zimmerman, Chelminski and Posternakb; see Table 3). These two groups mentioned a variety of optimal asymptomatic thresholds for the HAMD-17 ranging from ⩽2 (Zimmerman et al. Reference Zimmerman, Posternak and Chelminski2005) to ⩽10 (Zimmerman et al. Reference Zimmerman, Chelminski and Posternak2004b) and for the MADRS ⩽4 (Zimmerman et al. Reference Zimmerman, Chelminski and Posternak2004a, Reference Zimmerman, Posternak and Chelminski2004d) to ⩽11 (Bandelow et al. Reference Bandelow, Baldwin, Dolberg, Andersen and Stein2006).
The third and largest group of studies compared the prognosis of patients with different levels of depressive symptomatology, usually in terms of relapse/recurrence risk (see Table 4). Based on this information, a threshold can be chosen that best distinguishes those with a favourable from those with a bad prognosis, argued by Zimmerman et al. (Reference Zimmerman, Chelminski and Posternak2004a) to be the best method of validating a threshold. Most of these studies show that the presence of ‘subthreshold’ symptoms (often called residual symptoms if occurring after an MDD episode) was associated with an enhanced risk of a (recurrent) episode or relapse (Maier et al. Reference Maier, Gänsicke and Weiffenbach1997; Riso et al. Reference Riso, Thase, Howland, Friedman, Simons and Tu1997; Judd et al. Reference Judd, Akiskal, Maser, Zeller, Endicott, Coryell, Paulus, Kunovac, Leon, Mueller, Rice and Keller1998, Reference Judd, Paulus, Schettler, Akiskal, Endicott, Leon, Maser, Mueller, Solomon and Keller2000, Reference Judd, Schettler, Rush, Coryell, Fiedorowicz and Solomon2016; Van Londen et al. Reference Van Londen, Molenaar, Goekoop, Zwinderman and Rooijmans1998; Fava et al. Reference Fava, Rafanelli, Grandi, Conti and Belluardo1999; Kanai et al. Reference Kanai, Takeuchi, Furukawa, Yoshimura, Imaizumi, Kitamura and Takahashi2003; Taylor et al. Reference Taylor, McQuoid, Steffens and Krishnan2004; Nierenberg et al. Reference Nierenberg, Husain, Trivede, Fava, Warden, Wisniewski, Miyahara and Rush2010; Dunlop et al. Reference Dunlop, Holland, Bao, Ninan and Keller2012; Kiosses & Alexopoulos, Reference Kiosses and Alexopoulos2013; Peselow et al. Reference Peselow, Tobia, Karamians, Pizano and IsHak2015). One study (Romera et al. Reference Romera, Pérez, Menchón, Polavieja and Gilaberte2011) did not find this increased risk. Often authors implicitly argued for a lower threshold for remission that does not encompass this level of symptomatology. Some studies also showed that remission as defined by Frank et al. (Reference Frank, Prien, Jarrett, Keller, Kupfer, Lavori, Rush and Weissman1991), HAMD-17 ⩽7, is associated with a better prognosis than not achieving this level of remission (Paykel et al. Reference Paykel, Ramana, Cooper, Hayhurst, Kerr and Barocka1995; Pintor et al. Reference Pintor, Torres, Navarro, Matrai and Gastó2004).
Saliently, some other noteworthy studies showed a large discrepancy between Frank's definition of depression and patient's own judgement regarding their remission (Zimmerman et al. Reference Zimmerman, Martinez, Attiullah, Friedman, Toba, Boerescu and Rahgeb2012a, Reference Zimmerman, Martinez, Attiullah, Friedman, Toba and Boerescub). Within the group of remitters as defined by Frank et al. (Reference Frank, Prien, Jarrett, Keller, Kupfer, Lavori, Rush and Weissman1991), a substantial heterogeneity was observed with respect to reported symptoms (Zimmerman et al. Reference Zimmerman, Martinez, Attiullah, Friedman, Toba and Boerescu2012c), psychosocial impairment (Zimmerman et al. Reference Zimmerman, Posternak and Chelminski2004e, Reference Zimmerman, Posternak and Chelminski2007) and a range of other relevant outcomes (Zimmerman et al. Reference Zimmerman, Martinez, Attiullah, Friedman, Toba, Boerescu and Rahgeb2012d) (see Table 3).
Severity threshold for the fully symptomatic range
Only one study focusing on the fully symptomatic threshold was obtained (see Table 2). By using the CGI-S of 2 or 3 as the gold standard, Leucht et al. (Reference Leucht, Fennema, Engel, Kaspers-Janssen, Lepping and Szegedi2013) advise a HAMD-17 threshold of ⩾7 or ⩾14, respectively.
Duration threshold for episode
Frank et al. (Reference Frank, Prien, Jarrett, Keller, Kupfer, Lavori, Rush and Weissman1991) categorised the symptomatic period following any non-depressive state using a time boundary, separating the time period before the symptoms were recognised as part of a depressive episode from the time period afterwards. The underlying assumption was that developing transient depressive symptoms is not necessarily pathological, as long as they do not culminate in a long-lasting depressive episode. Regarding the validation of this duration criterion, Frank et al. (Reference Frank, Prien, Jarrett, Keller, Kupfer, Lavori, Rush and Weissman1991) state that an episode should be declared ‘when it is unlikely that the patient will spontaneously recover in the next day or two’. Although rather arbitrary, the concept is clear: for the validation of this duration criterion, data are necessary that shed light on the prognosis of those with recently started depressive symptomatology.
Such data was provided by four studies (see Table 5). The meta-analysis by Whiteford et al. (Reference Whiteford, Harris, McKeon, Baxter, Pennell, Barendregt and Wang2013) covering the rate of spontaneous remission in untreated depression showed that this rate decreases continuously over time. However, the amount of data in the range of short duration of follow-up is rather scarce and the studied population (wait-list and primary care samples) is not representative of the general population with depressive symptoms.
One study in the general population showed that 25% of depressive episodes remitted after 4 weeks and 50% after 8–12 weeks, using a methodology in which onset and end of depressive episodes were retrospectively assessed by asking the respondents for their depressive symptomatology in the past (Eaton et al. Reference Eaton, Anthony, Gallo, Cai, Tien, Romanoski, Lyketsos and Chen1997). The finding of a median duration of 12 weeks was replicated in the NEMESIS study using a similar methodology, which also shows that the rate of recovery quickly diminishes after these 12 weeks (Spijker et al. Reference Spijker, De Graaf, Bijl, Beekman, Ormel and Nolen2002).
Wakefield & Schmitz (Reference Wakefield and Schmitz2013) argued that ‘uncomplicated’ depressive episodes, defined as <2 months in duration combined with the absence of certain ‘heavy’ symptoms such as suicidal ideation and psychomotor retardation, should not be classified as MDD. They argued that the risk of developing new depressive episodes for those who had such an uncomplicated episode is not higher than for the general population. Thus, this subgroup of patients does not seem to suffer from an underlying disorder that increases their risk of developing subsequent depressive episodes. This suggests that, at least for this subgroup, the depressive symptomatology should be at least 2 months of duration before it should be considered as a depressive episode.
Duration thresholds for remission and recovery
Frank et al. (Reference Frank, Prien, Jarrett, Keller, Kupfer, Lavori, Rush and Weissman1991) categorised the asymptomatic period following a fully symptomatic period with two time boundaries, yielding three distinct time periods: those (i) before the onset of full remission, (ii) following the onset of full remission but before declaration of recovery and (iii) after declaration of recovery. The underlying assumption is that these three successive periods are each associated with a certain ‘hazard’ for a return of symptoms, which diminishes significantly at each time boundary and becomes constant when recovery is declared.
In the available literature, the hazard for a return of symptoms for asymptomatic individuals is usually shown indirectly in the form of survival curves, showing the fraction of subjects without relapse/recurrence over time. An exponential survival curve is thus equivalent to a constant hazard, whereas a sudden decrease in a hazard (for example, when remission is achieved) should be visible as an upward discontinuity in the survival curve slope.
Survival curves (or equivalent) for asymptomatic individuals until relapse/recurrence or equivalent data were obtained from 31 studies (see Table 5). There is a substantial difference between studies in their studied populations (viz., general population, 1st, 2nd or 3rd line ambulant patients or inpatients), their operationalisations of remission, recovery, relapse and recurrence (because of different instruments or cut-offs on the same instruments) and in the involved treatments that are often uncontrolled.
Several studies show some indication of a sudden drop in relapse/recurrence rate a certain time after remission/recovery was obtained (Paykel et al. Reference Paykel, Ramana, Cooper, Hayhurst, Kerr and Barocka1995; Riso et al. Reference Riso, Thase, Howland, Friedman, Simons and Tu1997; Judd et al. Reference Judd, Akiskal, Maser, Zeller, Endicott, Coryell, Paulus, Kunovac, Leon, Mueller, Rice and Keller1998; Van Londen et al. Reference Van Londen, Molenaar, Goekoop, Zwinderman and Rooijmans1998; Heinze et al. Reference Heinze, Villamil and Cortés2002; Kanai et al. Reference Kanai, Takeuchi, Furukawa, Yoshimura, Imaizumi, Kitamura and Takahashi2003; Kennedy et al. Reference Kennedy, Abbott and Paykel2003; Naz et al. Reference Naz, Craig, Bromet, Finch, Fochtmann and Carlson2007; Holma et al. Reference Holma, Holma, Melartin, Rytsälä and Isometsä2008; de Jonge et al. Reference de Jonge, Conradi, Kaptein, Bockting, Korf and Ormel2010; O'Leary et al. Reference O'Leary, Hickey, Lagendijk and Webb2010; Kiosses & Alexopoulos, Reference Kiosses and Alexopoulos2013). However, the exact amount of time necessary to achieve this drop (as counted from the start of the asymptomatic period) differs per study, ranging from about 2 months (O'Leary et al. Reference O'Leary, Hickey, Lagendijk and Webb2010) to about 3 years (Judd et al. Reference Judd, Akiskal, Maser, Zeller, Endicott, Coryell, Paulus, Kunovac, Leon, Mueller, Rice and Keller1998). Other studies do not find such a sudden drop at all, instead suggesting that the diminishing hazard of return of symptoms is a gradual process rather than a discrete one (Maj et al. Reference Maj, Veltro, Pirozzi, Lobrace and Magliano1992; Shea et al. Reference Shea, Elkin, Imber, Sotsky, Watkins, Collins, Pilkonis, Beckham, Glass, Dolan and Parloff1992; Flint & Rifat, Reference Flint and Rifat1997; Kessing et al. Reference Kessing, Andersen, Mortensen and Bolwig1998; Van Weel-Baumgarten et al. Reference Van Weel-Baumgarten, Van den Bosch, Van den Hoogen and Zitman1998; Mueller et al. Reference Mueller, Leon, Keller, Solomon, Endicott, Coryell, Warshaw and Maser1999; O'Leary et al. Reference O'Leary, Costello, Gormley and Webb2000; Solomon et al. Reference Solomon, Keller, Leon, Mueller, Lavori, Shea, Coryell, Warshaw, Turvey, Maser and Endicott2000; Mattisson et al. Reference Mattisson, Bogren, Horstmann, Munk-Jörgensen and Nettelbladt2007; Dunlop et al. Reference Dunlop, Holland, Bao, Ninan and Keller2012; Martínez-Amorós et al. Reference Martínez-Amorós, Cardoner, Soria, Gálvez, Menchón and Urretavizcaya2012; Seemüller et al. Reference Seemüller, Meier, Obermeier, Musil, Bauer, Adli, Kronmüller, Holsboer, Brieger, Laux, Bender, Heuser, Zeiler, Gaebel, Riedel, Falkai and Möller2014; Peselow et al. Reference Peselow, Tobia, Karamians, Pizano and IsHak2015; Judd et al. Reference Judd, Schettler, Rush, Coryell, Fiedorowicz and Solomon2016). In particular, several studies of the long-term course of MDD show that recurrence rates stabilise only after many years, such as 2.5 years (Solomon et al. Reference Solomon, Keller, Leon, Mueller, Lavori, Shea, Coryell, Warshaw, Turvey, Maser and Endicott2000), 10 years (Mattisson et al. Reference Mattisson, Bogren, Horstmann, Munk-Jörgensen and Nettelbladt2007) or about 15 years (Kessing et al. Reference Kessing, Andersen, Mortensen and Bolwig1998). A third group of studies shows atypical survival curves where the time-specific risk of return of symptoms even increases over time during certain time intervals (Eaton et al. Reference Eaton, Anthony, Gallo, Cai, Tien, Romanoski, Lyketsos and Chen1997; Emslie et al. Reference Emslie, Rush, Weinberg, Gullion, Rintelmann and Hughes1997; Birmaher et al. Reference Birmaher, Williamson, Dahl, Axelson, Kaufman, Dorn and Ryan2004; Pintor et al. Reference Pintor, Torres, Navarro, Matrai and Gastó2004; Yiend et al. Reference Yiend, Paykel, Merritt, Lester, Doll and Burns2009).
Discussion
Severity thresholds
The obtained studies that aimed to identify the optimal thresholds for the asymptomatic and fully symptomatic depressive ranges differed widely in their methodologies (see Tables 2–4). Frank et al. (Reference Frank, Prien, Jarrett, Keller, Kupfer, Lavori, Rush and Weissman1991) postulated that these ranges should (i) correspond to what clinicians view as asymptomatic and fully symptomatic and (ii) that classification of patients within these ranges should be reasonably stable over time. Other theorists argued that the optimal thresholds should be selected based on their predictive value for the future course (Zimmerman et al. Reference Zimmerman, Posternak and Chelminski2004e), which would be most consistent with methods used in other medical fields (Zimmerman et al. Reference Zimmerman, Chelminski and Posternak2004a).
Severity threshold for the asymptomatic range
Multiple studies showed that those who scored below a certain threshold on depressive symptom scales had a better prognosis than those who scored above it (Paykel et al. Reference Paykel, Ramana, Cooper, Hayhurst, Kerr and Barocka1995; Maier et al. Reference Maier, Gänsicke and Weiffenbach1997; Riso et al. Reference Riso, Thase, Howland, Friedman, Simons and Tu1997; Judd et al. Reference Judd, Akiskal, Maser, Zeller, Endicott, Coryell, Paulus, Kunovac, Leon, Mueller, Rice and Keller1998; Reference Judd, Paulus, Schettler, Akiskal, Endicott, Leon, Maser, Mueller, Solomon and Keller2000, Reference Judd, Schettler, Rush, Coryell, Fiedorowicz and Solomon2016; Van Londen et al. Reference Van Londen, Molenaar, Goekoop, Zwinderman and Rooijmans1998; Fava et al. Reference Fava, Rafanelli, Grandi, Conti and Belluardo1999; Kanai et al. Reference Kanai, Takeuchi, Furukawa, Yoshimura, Imaizumi, Kitamura and Takahashi2003; Pintor et al. Reference Pintor, Torres, Navarro, Matrai and Gastó2004; Taylor et al. Reference Taylor, McQuoid, Steffens and Krishnan2004; Nierenberg et al. Reference Nierenberg, Husain, Trivede, Fava, Warden, Wisniewski, Miyahara and Rush2010; Dunlop et al. Reference Dunlop, Holland, Bao, Ninan and Keller2012; Kiosses & Alexopoulos, Reference Kiosses and Alexopoulos2013; Peselow et al. Reference Peselow, Tobia, Karamians, Pizano and IsHak2015). Often this finding was presented as evidence for the perspective that the asymptomatic threshold is currently too high (Judd et al. Reference Judd, Akiskal, Maser, Zeller, Endicott, Coryell, Paulus, Kunovac, Leon, Mueller, Rice and Keller1998). However, even though none of these studies systematically studied and compared the predictive value of all possible thresholds, we hypothesise that this is a general finding that can be obtained irrespective of the chosen threshold, as a lower score on a depressive symptom scale increases the ‘symptomatic distance’ to the fully symptomatic threshold and therefore the average time required for reaching that state. Indeed, some studies show that the currently often-used threshold (HAMD-17 ⩽7; Frank et al. Reference Frank, Prien, Jarrett, Keller, Kupfer, Lavori, Rush and Weissman1991) also differentiates in this regard (Paykel et al. Reference Paykel, Ramana, Cooper, Hayhurst, Kerr and Barocka1995; Pintor et al. Reference Pintor, Torres, Navarro, Matrai and Gastó2004). Studies using other methodologies for determining the best asymptomatic threshold – such as optimising correspondence to clinical impressions of clinicians (using the CGI-S as a gold standard), different functioning scales, or the general population – yield different optimal thresholds. The consensus among these authors seems to be that the currently often-used threshold of HAMD-17 ⩽7 is too high, as it leads to the inclusion of too many patients with poor functioning (Sacchetti et al. Reference Sacchetti, Frank, Siracusano, Racagni, Vita and Turrina2015), who are psychosocially impaired (Zimmerman et al. Reference Zimmerman, Posternak and Chelminski2007) and who do not consider themselves as remitted (Zimmerman et al. Reference Zimmerman, Martinez, Attiullah, Friedman, Toba, Boerescu and Rahgeb2012a).
Ultimately, the particular choice of asymptomatic threshold is rather arbitrary given the available evidence. Nonetheless, the currently often-used threshold seems to be too high. We, therefore, suggest lowering the asymptomatic threshold to ⩽4 on the HAMD-17; this is on the low side of the suggested values in the obtained studies – which we think is justified given the better functioning below this score (Sacchetti et al. Reference Sacchetti, Frank, Siracusano, Racagni, Vita and Turrina2015) – although still above the mean score in the general population (Zimmerman et al. Reference Zimmerman, Chelminski and Posternak2004b). It has been shown that some patients who scored ⩽7 on the HAMD-17 still met diagnostic criteria for MDD (Zimmerman et al. Reference Zimmerman, Posternak and Chelminski2004e), which is another argument for our suggestion to lower the asymptomatic threshold to ⩽4, as this largely prevents ‘remitted’ people from meeting the diagnostic criteria for MDD. This new HAMD-17 threshold is roughly equivalent to a threshold of ⩽5 on the MADRS (Mittmann et al. Reference Mittmann, Mitter, Borden, Herrmann, Naranjo and Shear1997), which is plausible given the reviewed evidence. Note that these thresholds are useful as endpoints in clinical studies, but do not necessarily mean that scoring below these thresholds should be the main treatment goal for clinicians, as treating individual patients by striving for the lowest score possible still improves prognosis (Taylor et al. Reference Taylor, McQuoid, Steffens and Krishnan2004).
Severity threshold for the fully symptomatic range
Only one study was obtained that provides some evidence for the fully symptomatic cut-off (Leucht et al. Reference Leucht, Fennema, Engel, Kaspers-Janssen, Lepping and Szegedi2013). This relative lack of evidence is understandable, as this seems to be the definition that least ‘needs’ empirical validation; this can be understood as a rather subjective clinical decision regarding the minimal level of symptomatology that can be considered to be a disorder. Therefore, there is not enough evidence to make any recommendations regarding this threshold.
Duration threshold for episode
Only a limited amount of studies showed data on the prognosis of those with ‘recent-onset’ depression (see Table 5). This can be explained by epidemiological investigations that typically include depressed populations, for which it is unclear how long the depressive symptoms have been present at the start of the studies. Although two studies show that half of the depressive episodes in the general population remit within 3 months after their onset (Eaton et al. Reference Eaton, Anthony, Gallo, Cai, Tien, Romanoski, Lyketsos and Chen1997; Spijker et al. Reference Spijker, De Graaf, Bijl, Beekman, Ormel and Nolen2002), it seems likely that many short ‘episodes’ of only a few days are missed since these episodes are infrequently retrospectively indicated, and short episodes are more easily forgotten than long ones (Moffitt et al. Reference Moffitt, Caspi, Taylor, Kokaua, Milne, Polanczyk and Poulton2010). Therefore, the rate of early remission is probably even higher than suggested by these studies.
In general, the reviewed data suggest that the rate of (spontaneous) remission of depressive symptoms is relatively high when the onset of these symptoms is recent, especially during the first 12 weeks, but diminishes quickly thereafter. This provides some justification for the suggestion by Frank et al. (Reference Frank, Prien, Jarrett, Keller, Kupfer, Lavori, Rush and Weissman1991) of requiring a certain amount of time at the fully symptomatic level before defining a depressive episode. However, the currently required ‘waiting time’ of only 2 weeks (see Table 1; DSM-5 criteria, APA, 2013; ICD-10 criteria, WHO, 1993) does not seem to be based on empirical evidence. The reviewed studies suggest that a longer time period might be advisable. Nonetheless, we refrain from a definitive conclusion, for which a prospective study in which the general population is screened with a high frequency (e.g. weekly) for depressive symptomatology is required but hitherto unavailable.
Duration thresholds for remission and recovery
A substantial body of literature studying depressive relapse/recurrence risk over time has been obtained (see Table 5), but comparing the studies is not straightforward; the studies differed in their studied populations, their operationalisations of remission, recovery, relapse and recurrence, and in the involved treatments. Some studies were consistent with the idea of a ‘point of rarity’ (Frank et al. Reference Frank, Prien, Jarrett, Keller, Kupfer, Lavori, Rush and Weissman1991) at which the relapse/recurrence risk suddenly drops or becomes stable. However, there is no consistency in the estimation of this time point. Combined with the fact that the majority of studies do not show such a point of rarity, the most likely conclusion is that prognosis gradually improves as remission/recovery duration is longer, rather than suddenly at a particular point in time.
The reviewed data do not suggest that any specific duration threshold to distinguish remission from recovery is warranted to add predictive value to the observation that prognosis improves over time as the duration of the asymptomatic period increases. Not only were the specific operationalisations of the duration criteria by Frank et al. (Reference Frank, Prien, Jarrett, Keller, Kupfer, Lavori, Rush and Weissman1991) and Rush et al. (Reference Rush, Kraemer, Sackeim, Fava, Trivedi, Frank, Ninan, Thase, Gelenberg, Kupfer, Regier, Rosenbaum, Ray and Schatzberg2006) not empirically supported, it seems that the whole concept of these duration criteria must be rejected. The idea that a reoccurrence of depressive symptoms shortly after their initial remission constitutes a ‘relapse’ of the previous episode, whereas their later reoccurrence is the first sign of an entirely new episode, is a model that lacks empirical support. Additionally, it is of no additional value to the patient or clinician as the assumed origin of the reoccurring symptoms has no implications for treatment or prognosis.
Thus, based on these results, the duration criteria for declaring remission and recovery seem unnecessary. We suggest that depressive remission can simply be defined as the asymptomatic state after a depressive episode, without applying any duration criterion. Stability of remission is then relatively low on the first day but increases gradually with its duration. The term recovery can then be used as a concept that includes more than just absence of symptoms, such as social functioning or subjective well-being, possibly including the absence of significant treatment as this would better fit the concept of recovery from a patient's perspective.
Limitations
Limitations of this review include the greatly varying study populations and treatments within the included studies (which is also a strength). Moreover, a substantial part of the data had to be extracted from survival curves that only rarely showed confidence intervals and often did not possess a clearly labelled time axis, making it difficult to assess exactly when the measurement began.
Conclusions
More than a quarter-century after the landmark paper in which Frank et al. (Reference Frank, Prien, Jarrett, Keller, Kupfer, Lavori, Rush and Weissman1991) provided their consensus-based definitions for depressive states (episode, remission, recovery, relapse, recurrence), we reviewed the empirical evidence. The data suggest that remission can best be defined as a less symptomatic state than assumed earlier (HAMD-17 ⩽4 instead of ⩽7), without applying a duration criterion. Specific duration thresholds to separate remission from recovery are not meaningful. Evidence suggests that the minimal duration of depressive symptoms before a depressive episode can be defined should be longer than 2 weeks, although further studies are required to recommend an exact duration threshold.
Supplementary material
The supplementary material for this article can be found at https://doi.org/10.1017/S2045796018000227
Acknowledgements
We thank the editor and reviewers for their comments.
Conflict of interest
None.
Ethical standards
None applicable.
Availability of data and materials
The data regarding the process of screening and selection of the articles included in this systematic review (after removal of identical articles) are available in an online supplement.