Introduction
In two recent meta-analyses of second-generation antidepressants versus placebo in mild to moderate major depression, data from all trials submitted to the US Food and Drug Administration (FDA) were used (Kirsch et al. 2008; Turner et al. 2008). In these meta-analyses the full Hamilton Depression Scale (HAMD; Hamilton, 1967) was employed as the outcome measure, and effect-size statistics were used to quantify clinical response. Both meta-analyses concluded that, in terms of effect size, the advantage of second-generation antidepressants over placebo in the acute 6–8 weeks' therapy of patients with major depressive episodes was minimal, thereby maintaining the myth of a merely placebo-like effect of antidepressant medication in mild to moderate depression.
First-generation antidepressant medication was introduced by Kuhn when, 50 years ago, he demonstrated that imipramine significantly improved the depressive symptoms of patients hospitalized for depressive illness over a treatment period of 4 weeks (Kuhn, 1958). In the 1960s, randomized, placebo-controlled clinical trials in hospitalized patients using the full HAMD confirmed that imipramine was superior to placebo (Beck, 1973).
When developing first-generation cognitive psychotherapy of depression, Beck and his group (Rush et al. 1977) used imipramine as an active comparator to assess the antidepressive effect of cognitive psychotherapy in depressed patients, because a placebo version of cognitive therapy was too difficult to deliver. The clinical evaluations in the Rush et al. (1977) study were obviously not blind to treatment assignment. The clinicians used the full HAMD as the outcome scale. The baseline HAMD score was ∼18, corresponding to mild depression. The results showed that both cognitive therapy and imipramine were effective, although no placebo arm was included. However, one of the first placebo-controlled imipramine trials in a family doctor setting failed to discriminate between active drug and placebo (Porter, 1970). Nevertheless, one consequence of the Rush et al. (1977) study was the gradual acceptance of imipramine for use even in patients with mild depression.
With the advent in the 1980s of second-generation antidepressants [specific serotonin reuptake inhibitors (SSRIs)], the use of this type of antidepressive medication in the milder forms of depression became popular because of the very favourable side-effect profile of SSRIs compared with that of imipramine. In the 1970s the FDA requirement for approval of an experimental antidepressant was substantial evidence from randomized, placebo-controlled trials including fixed, graded doses of the experimental drug (Leber, 1996). In such trials the side-effects of the experimental drug were also to be systematically evaluated. In the placebo-controlled clinical trials with SSRIs, imipramine was often used as an active comparator, and these trials showed that the advantage of SSRIs over imipramine lay in their more favourable side-effect profile rather than in their antidepressive effect (Øhrberg et al. 1992). Imipramine side-effects such as sedation and increased appetite might be considered desirable in the initial acute therapy of moderately to severely depressed patients but are generally regarded as serious side-effects in long-term therapy (e.g. Mayer, 1975).
In mild to moderate depressive illness, placebo control is still important for regulatory purposes, and the Kirsch et al. (2008) and Turner et al. (2008) meta-analyses, using the full HAMD as outcome measure, have certainly subscribed to the myth of the placebo-like effect of SSRIs in patients with mild to moderate depression.
To explore a myth is, as stated by Ryle (1949), not to deny the facts but rather to re-allocate them. In the following, the facts emerging from the Kirsch et al. (2008) and Turner et al. (2008) meta-analyses will not only be re-allocated but also itemized within the universe of items in the full HAMD, while effect-size statistics will be retained as a relevant measure of clinical response in the acute therapy of major depressive episodes.
Effect size as descriptive outcome statistics in the acute therapy of depression
Both the Kirsch et al. (2008) and the Turner et al. (2008) meta-analyses used effect size to indicate the antidepressive activity of an experimental drug versus placebo. In these meta-analyses, effect size is defined as the difference between the drug group and the placebo group in mean HAMD change from baseline to the respective time-point, divided by the pooled standard deviation of the two groups (Hedges & Olkin, 1985). According to Kirsch et al. (2008), the FDA seems to accept a mean drug–placebo difference in HAMD change score of ∼2 as evidence of an antidepressive effect, whereas the British National Institute for Clinical Excellence (NICE, 2004) recommends a drug–placebo difference in HAMD improvement score of 3.
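The effect-size definition above can be written out as a short calculation. The following is a minimal sketch, assuming that each group's improvement is expressed as baseline minus endpoint HAMD score (so a positive effect size favours the drug) and that the pooled standard deviation is the usual degrees-of-freedom-weighted average of the two group variances. The function names and the trial summary values are hypothetical and are not taken from any study cited here.

```python
import math


def pooled_sd(sd_drug: float, n_drug: int, sd_placebo: float, n_placebo: int) -> float:
    """Pooled standard deviation of the drug and placebo groups."""
    numerator = (n_drug - 1) * sd_drug**2 + (n_placebo - 1) * sd_placebo**2
    return math.sqrt(numerator / (n_drug + n_placebo - 2))


def effect_size(change_drug: float, change_placebo: float, sd: float) -> float:
    """Drug-placebo difference in mean HAMD improvement divided by the pooled SD.

    Improvement is taken as baseline minus endpoint, so a positive value
    favours the drug.
    """
    return (change_drug - change_placebo) / sd


# Hypothetical trial summary (not data from any study cited here)
sd = pooled_sd(sd_drug=7.5, n_drug=150, sd_placebo=7.5, n_placebo=150)
d = effect_size(change_drug=10.0, change_placebo=7.6, sd=sd)
print(f"pooled SD = {sd:.2f}, effect size = {d:.2f}")  # pooled SD = 7.50, effect size = 0.32
```

With equal group standard deviations the pooled value simply equals that common standard deviation; when the groups differ, the larger group carries more weight.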
When estimating the sample size for a randomized, placebo-controlled trial of duloxetine in patients with major depression, Detke et al. (2004) assumed a mean drug–placebo difference in HAMD improvement score of 3.25 and a pooled standard deviation of 7.0, giving an expected effect size of 0.46. In the placebo-controlled trial of desvenlafaxine, the most recent antidepressant approved by the FDA (Young & Plosker, 2008), the assumed mean drug–placebo difference in HAMD improvement score was 3.0 with a pooled standard deviation of 8.0, giving an expected effect size of 0.38 (Boyer et al. 2008).
The pooled standard deviation in placebo-controlled trials of antidepressants in patients with mild to moderate major depression thus seems to lie between 7 and 8 when the HAMD17 is used as outcome measure. With a pooled standard deviation of 7.5, the average effect size of 0.32 reported by Kirsch et al. (2008) corresponds to a drug–placebo difference in HAMD improvement score of about 2.4 (0.32 × 7.5). This meta-analysis focused on fluoxetine, paroxetine, nefazodone and venlafaxine. In the meta-analysis of Turner et al. (2008), which covered 12 different second-generation antidepressants, including the drugs analysed by Kirsch et al. (2008), the average effect size was 0.31.
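Because the effect size is a ratio, the planning assumptions and the back-calculation in the two preceding paragraphs are simple arithmetic. The sketch below reproduces them; the two helper functions are hypothetical names introduced only for illustration, and the inputs are the values quoted in the text (note that 3.0/8.0 is exactly 0.375, which the text rounds to 0.38).

```python
def expected_effect_size(mean_difference: float, sd_pooled: float) -> float:
    """Effect size implied by an assumed drug-placebo HAMD difference and pooled SD."""
    return mean_difference / sd_pooled


def implied_difference(effect_size: float, sd_pooled: float) -> float:
    """HAMD drug-placebo difference implied by a reported effect size and pooled SD."""
    return effect_size * sd_pooled


# Sample-size planning assumptions quoted in the text
print(f"duloxetine:     {expected_effect_size(3.25, 7.0):.3f}")  # 0.464
print(f"desvenlafaxine: {expected_effect_size(3.0, 8.0):.3f}")   # 0.375

# HAMD difference behind an average effect size of 0.32, across plausible pooled SDs
for sd in (7.0, 7.5, 8.0):
    print(f"SD {sd}: difference = {implied_difference(0.32, sd):.2f}")  # 2.24, 2.40, 2.56
```

Across the 7 to 8 range of pooled standard deviations, an effect size of 0.32 thus corresponds to a drug–placebo difference of roughly 2.2 to 2.6 HAMD points.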
Turner (2008) correctly states that when Cohen (1976) introduced effect size as a descriptive statistic in clinical trials, he recommended a value of 0.50, as adopted by NICE (2004), as indicating a clinically significant improvement. However, this threshold was based on Cohen's own subjective intuition, not on any reference to trials of antidepressants.
In both meta-analyses (Kirsch et al. 2008; Turner et al. 2008) the unbiased effect-size formula recommended by Hedges & Olkin (1985) seems to have been used. This formula (given elsewhere, see Bech, 2007) includes a correction for trials with fewer than 100 patients in each treatment arm. An effect size of 0.40 corresponds to an antidepressant advantage of 15–20% over placebo in the acute therapy of depression when the response criterion is a ⩾50% reduction in HAMD score from baseline to endpoint (Bech et al. 2000). This 15–20% advantage was also demonstrated in the most comprehensive review of first-generation antidepressants (Smith et al. 1969).
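The text does not reproduce the Hedges & Olkin correction, so the sketch below uses the standard small-sample correction factor J(df), with df = n₁ + n₂ − 2, in its exact gamma-function form and in its common approximation 1 − 3/(4·df − 1). It is an assumption that this matches the formula given by Bech (2007) in every detail; the loop simply shows how quickly the correction becomes negligible as the number of patients per arm grows.

```python
import math


def hedges_j(n_drug: int, n_placebo: int) -> float:
    """Exact small-sample correction factor J(df), df = n_drug + n_placebo - 2."""
    df = n_drug + n_placebo - 2
    # J(df) = Gamma(df/2) / (sqrt(df/2) * Gamma((df-1)/2)); lgamma avoids overflow
    return math.exp(math.lgamma(df / 2) - math.lgamma((df - 1) / 2)) / math.sqrt(df / 2)


def hedges_j_approx(n_drug: int, n_placebo: int) -> float:
    """Common approximation J = 1 - 3 / (4 * df - 1)."""
    df = n_drug + n_placebo - 2
    return 1 - 3 / (4 * df - 1)


def hedges_g(d: float, n_drug: int, n_placebo: int) -> float:
    """Bias-corrected (unbiased) effect size g = J * d."""
    return hedges_j(n_drug, n_placebo) * d


for n in (20, 50, 100, 200):
    print(f"n per arm = {n:3d}: J = {hedges_j(n, n):.4f}, g for d = 0.40: {hedges_g(0.40, n, n):.3f}")
```

With 100 patients per arm the correction shrinks an effect size by less than 0.5%, which is consistent with it mattering mainly in smaller trials.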
Re-allocating the HAMD items to measure response in the acute therapy of depression
The full HAMD was used in all the trials analysed by Kirsch et al. (2008) and in almost all trials analysed by Turner et al. (2008). However, neither analysis attempted to identify which version of the HAMD had been used, e.g. with regard to the number of items (HAMD17, HAMD21 or HAMD24; Bech, 2009). For example, the HAMD24 was recommended by Beck and his group (Riskind et al. 1987) to cover the cognitive triad of depression with the extra items of worthlessness, helplessness and hopelessness. However, as observed by Hamilton (1986), these three items are already covered by the HAMD17: ‘guilt’ covers worthlessness, ‘depressed mood’ covers hopelessness and ‘work and interests’ covers helplessness.
In the meta-analysis by Turner et al. (2008), the mean effect size calculated for each drug combines fixed, graded-dose trials and flexible-dose trials. Therefore, a drug such as mirtazapine, for which no dose–response relationship has been demonstrated (Pinder & Zivkov, 1998), has been favoured in that analysis compared with, for example, duloxetine or venlafaxine, for which a dose–response relationship has been demonstrated (Bech, 2009).
In our original study of the clinical validity of the HAMD17 for measuring the severity of depressive states, we identified six of the 17 items as corresponding symptomatologically to experienced psychiatrists' global assessment of depressive states (Bech et al. 1975). In subsequent studies (Bech et al. 1981, 1984) we used item response theory models to investigate to what extent the total score of these six symptoms [depressed mood, guilt feelings, work and interests, psychomotor retardation, psychic anxiety, and general somatic symptoms (tiredness)] was a sufficient statistic, i.e. whether the total score contains all the information about the severity of depression, so that the profile of the individual items is not needed. In contrast to factor analysis, which focuses on conceptual knowledge, item response theory models, e.g. the Rasch model (Rasch, 1980), focus on perceptual knowledge by placing the individual items on the graduated line along which the severity of the depressive state is ordered (Bech, 2009).
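What "sufficient statistic" means in the Rasch model can be checked numerically: once the total score is known, the probability of any particular item-response pattern no longer depends on the patient's severity θ. The sketch below assumes the simplest, dichotomous Rasch model with six hypothetical item locations. HAMD items are actually scored in several categories, and the polytomous Rasch extensions share the sufficiency property, but they are not implemented here.

```python
import itertools
import math


def pattern_prob(x, theta, b):
    """P(x | theta) for a dichotomous Rasch model with item locations b."""
    prob = 1.0
    for xi, bi in zip(x, b):
        odds = math.exp(theta - bi)
        prob *= (odds if xi else 1.0) / (1.0 + odds)
    return prob


def prob_given_total(x, theta, b):
    """P(x | total score, theta): normalise over all patterns with the same total."""
    r = sum(x)
    same_total = [y for y in itertools.product((0, 1), repeat=len(b)) if sum(y) == r]
    return pattern_prob(x, theta, b) / sum(pattern_prob(y, theta, b) for y in same_total)


b = [-1.0, -0.5, 0.0, 0.3, 0.8, 1.5]   # hypothetical locations of six items
x = (1, 1, 0, 1, 0, 0)                 # one response pattern with total score 3

for theta in (-1.0, 0.0, 2.0):         # three different severity levels
    print(theta, round(prob_given_total(x, theta, b), 6))
```

The three printed probabilities are identical, because θ cancels out of the ratio: the total score alone carries the information about severity, and the item profile adds nothing further.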
Whereas the total score of these six HAMD items (HAMD6) is a sufficient statistic for measuring the pure antidepressive effect, many of the other HAMD items (insomnia, agitation, somatic anxiety, gastrointestinal symptoms, weight loss and sexual problems) might be considered side-effects of antidepressant medication. The total score of these items is not a sufficient statistic, as side-effects must be analysed individually (Lingjærde et al. 1987). Adverse events increase significantly with antidepressant dose (Bollini et al. 1999).
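The distinction between a summed HAMD6 score and individually reported side-effect items can be sketched as follows. This assumes the conventional HAMD17 item order and per-item maxima (1 depressed mood, 2 guilt, 7 work and interests, 8 retardation, 10 psychic anxiety, 13 general somatic symptoms, with the insomnia, gastrointestinal, general somatic, genital, weight and insight items scored 0–2 and the rest 0–4). The dictionary names, the grouping of side-effect items and the patient ratings are hypothetical illustrations.

```python
# Maximum score per item in the conventional HAMD17 item order
HAMD17_MAX = {1: 4, 2: 4, 3: 4, 4: 2, 5: 2, 6: 2, 7: 4, 8: 4, 9: 4,
              10: 4, 11: 4, 12: 2, 13: 2, 14: 2, 15: 4, 16: 2, 17: 2}

# Depressed mood, guilt, work and interests, retardation, psychic anxiety, general somatic (tiredness)
HAMD6_ITEMS = (1, 2, 7, 8, 10, 13)

# Items the text lists as possible side-effects; reported one by one, never summed
SIDE_EFFECT_ITEMS = {
    "insomnia (early, middle, late)": (4, 5, 6),
    "agitation": (9,),
    "somatic anxiety": (11,),
    "gastrointestinal symptoms": (12,),
    "sexual problems": (14,),
    "weight loss": (16,),
}


def check(scores: dict[int, int]) -> None:
    """Raise if any item is missing or outside its permitted range."""
    for item, top in HAMD17_MAX.items():
        if not 0 <= scores[item] <= top:
            raise ValueError(f"item {item} must be scored 0-{top}, got {scores[item]}")


def hamd17_total(scores: dict[int, int]) -> int:
    check(scores)
    return sum(scores[item] for item in HAMD17_MAX)


def hamd6_total(scores: dict[int, int]) -> int:
    check(scores)
    return sum(scores[item] for item in HAMD6_ITEMS)


def side_effect_profile(scores: dict[int, int]) -> dict[str, tuple[int, ...]]:
    check(scores)
    return {name: tuple(scores[i] for i in items) for name, items in SIDE_EFFECT_ITEMS.items()}


# Hypothetical patient ratings (item number -> score)
patient = {1: 3, 2: 2, 3: 1, 4: 2, 5: 1, 6: 1, 7: 3, 8: 2, 9: 1,
           10: 2, 11: 2, 12: 1, 13: 2, 14: 1, 15: 0, 16: 1, 17: 0}
print(hamd17_total(patient), hamd6_total(patient))  # 25 14
print(side_effect_profile(patient))
```

Under these assumptions the HAMD6 ranges from 0 to 22 and the HAMD17 from 0 to 52.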
Table 1 shows the effect-size statistics in placebo-controlled trials of second-generation antidepressants in the acute therapy (6–8 weeks) of patients with mild to moderate major depression.
HAMD, Hamilton Depression Scale; n.a., not applicable.
In our analysis of placebo-controlled fluoxetine trials in patients with DSM-III major depression we obtained an effect size of 0.38 on the HAMD6 and of 0.30 on the HAMD17 (Table 1). When fluoxetine was used as an active comparator in placebo-controlled venlafaxine trials (Table 1), an effect size of 0.40 was obtained with the HAMD6, whereas the full HAMD17 yielded an effect size of only 0.24. In both fluoxetine analyses (Bech et al. 2000; Entsuah et al. 2002) the fluoxetine dose range was 20–60 mg/day and no dose–response relationship was found.
For citalopram, a dose–response relationship was identified (Bech et al. 2002), with 40 mg/day found to be the optimal dose. In most placebo-controlled escitalopram trials the Montgomery–Åsberg Depression Rating Scale (Montgomery & Åsberg, 1979) was used as the outcome measure, but in our analysis (Bech et al. 2004) the HAMD6 was also included, and we showed that 20 mg/day was the optimal dose, with an effect size of 0.61. For duloxetine (Bech et al. 2006), both the HAMD17 and the HAMD6 showed effect sizes >0.40, even at 60 mg/day.
For the SSRIs as well as for duloxetine, the HAMD6 effect sizes have all been higher than the HAMD17 effect sizes, whereas for mirtazapine (which has many different actions, e.g. antihistaminergic, adrenergic and serotonin 2A receptor-blocking effects) the full HAMD17 has a higher effect size than the HAMD6 (Bech, 2001).
Relapse prevention in the continuation treatment of major depressive episode
According to the European Guidelines for Clinical Investigation (European Union, 1994), a substance is accepted as an antidepressant only if its advantage over placebo in clinical effect can be demonstrated both in the acute phase and in the continuation phase. As discussed elsewhere, electroconvulsive therapy (ECT), as conventionally administered with about 12 sessions over 4 weeks, is effective only for 2–3 months after the last session (Lauritzen et al. 1996). However, in the same controlled trial, which followed patients for up to 6 months after the last ECT session, paroxetine was found to prevent relapse compared with placebo: 65% relapsed on placebo and 12% on paroxetine (p⩽0.05) (Lauritzen et al. 1996). A systematic review of relapse prevention with antidepressant drug treatment in depressive disorders showed a relapse rate of approximately 41% with placebo continuation therapy versus 18% with antidepressants, with no difference between SSRIs and tricyclic antidepressants (Geddes et al. 2003). This is similar to the results of Bent-Hansen et al. (2003), in which the recurrence rate during maintenance therapy was 43% with placebo and 13% with citalopram (p⩽0.01).
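The relapse proportions in the preceding paragraph can be turned into familiar summary measures. The sketch below computes the absolute risk reduction, the relative risk of relapse on drug, and the number needed to treat (rounded up, by convention) from the quoted percentages only. These derived figures are not reported in the text; they ignore differences in follow-up duration between the studies and carry no confidence intervals, so they are illustrative only.

```python
import math

# Relapse proportions (placebo, active drug) as quoted in the text
studies = {
    "Lauritzen et al. 1996 (paroxetine after ECT)": (0.65, 0.12),
    "Geddes et al. 2003 (systematic review)": (0.41, 0.18),
    "Bent-Hansen et al. 2003 (citalopram)": (0.43, 0.13),
}


def summarise(p_placebo: float, p_drug: float) -> dict:
    arr = p_placebo - p_drug              # absolute risk reduction
    return {
        "ARR": arr,
        "RR": p_drug / p_placebo,         # relative risk of relapse on drug
        "NNT": math.ceil(1 / arr),        # number needed to treat, rounded up
    }


for name, (p_placebo, p_drug) in studies.items():
    s = summarise(p_placebo, p_drug)
    print(f"{name}: ARR = {s['ARR']:.2f}, RR = {s['RR']:.2f}, NNT = {s['NNT']}")
```

On these figures, between two and five patients would need continuation treatment to prevent one relapse.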
In order to capture the pure antidepressive effect demonstrated by Kuhn (1958, 1970), the six HAMD items covering the clinical symptoms of depression (depressed mood, guilt, work and interests, tiredness, psychic anxiety, and psychomotor retardation) should be used as the outcome measure. These six items (HAMD6) have been found not only to reflect experienced psychiatrists' global assessment of depression (Bech et al. 1975) but also to fulfil the item response theory model, with the total score as a sufficient statistic, during a trial of antidepressants (Bech et al. 1984). These items constitute the dimension of manifest depression (Overall, 1962) and were identified by Steinmeyer & Möller (1992) using facet theory analysis of the HAMD during treatment with paroxetine or amitriptyline. When testing the sensitivity of the individual HAMD17 items to response to paroxetine across all placebo-controlled trials of this drug, Santen et al. (2008) identified the HAMD6 items as more sensitive than the other items.
Many of the other HAMD items (insomnia, agitation, somatic anxiety, gastrointestinal symptoms, weight loss and sexual problems) might be considered side-effects of antidepressants. Taken together, these items do not form a sufficient statistic, as side-effects have to be analysed individually (Lingjærde et al. 1987).
Discussion
Parker (2009) has discussed the meta-analysis of Kirsch et al. (2008), but not that of Turner et al. (2008), and concludes that if we ‘wish to reject the imputation that antidepressant drugs are little better than placebo, we need first to recognize limitations of current RCT (randomized clinical trial) procedures’. However, Parker (2009) does not seem to reject the need for randomized, placebo-controlled clinical trials but rather calls for better outcome scales as well as better diagnostic classification of depression.
By focusing on the HAMD6 rather than the HAMD17, we need not reject the Hamilton scale but can instead use it much more appropriately. As discussed by Angst (2007), antidepressants act on the target dimension of depression across disorders.
When Kuhn demonstrated the antidepressive effect of imipramine he had no access to the Hamilton Depression Scale, but he did note that depressed patients ‘recount absolutely nothing spontaneously about their depressive experience, and these come to light only on questioning’ (Kuhn, 1970). Hamilton developed his scale (Hamilton, 1987) to enable clinicians to focus on the current symptomatology of patients, implying that the HAMD17 total score should give a global impression of the burden of the illness (Bech, 2009). However, the dimension on which antidepressive drugs act clinically is that of the HAMD6 symptoms, which is also in accord with Kuhn (1958, 1970).
A review of item response theory models and health outcome measurement in the 21st century showed that these models, especially the Rasch model, have a potential advantage over classical models, e.g. factor analysis, for improving existing rating scales (Hays et al. 2000). Responsiveness to change during treatment is not a separate dimension of validity (Hays & Hadorn, 1992). In other words, the greater responsiveness of the HAMD6 is not a separate property but reflects its validity as a truly unidimensional depression scale.
Effect-size statistics are of particular importance when comparing the responsiveness of two scales, such as the HAMD17 versus the HAMD6, in the acute therapy of depression. In continuation therapy, it is the standardization of the scales that matters when defining relapse. In relapse-prevention studies, a HAMD17 score of ⩾16 and a HAMD6 score of ⩾9 are often used to indicate relapse (Bent-Hansen et al. 2003; Ruhé et al. 2005). In a placebo-controlled trial of the SSRI sertraline for the prevention of post-stroke depression, we demonstrated that sertraline was statistically superior to placebo on the HAMD6 (p⩽0.05) as early as after 6 weeks of therapy, whereas on the HAMD17 the advantage of sertraline over placebo only appeared after ∼20 weeks of therapy (Rasmussen et al. 2003).
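Applying the relapse cut-offs quoted in the preceding paragraph to follow-up ratings is straightforward. The sketch below flags the first visit at which a patient's total score reaches the cut-off for the chosen scale. The function names are hypothetical, and the visit data are invented only to exercise the code; they are not meant to illustrate any empirical pattern.

```python
from typing import Optional

# Relapse cut-offs quoted in the text (total score at or above the value)
RELAPSE_CUTOFF = {"HAMD17": 16, "HAMD6": 9}


def is_relapse(total: int, scale: str) -> bool:
    return total >= RELAPSE_CUTOFF[scale]


def first_relapse_week(visits: list[tuple[int, int]], scale: str) -> Optional[int]:
    """Return the week of the first visit meeting the relapse criterion, else None.

    visits: (week, total score) pairs in chronological order.
    """
    for week, total in visits:
        if is_relapse(total, scale):
            return week
    return None


# Hypothetical follow-up of one patient rated on both scales
hamd6_visits = [(0, 4), (6, 7), (12, 9), (20, 10)]
hamd17_visits = [(0, 8), (6, 12), (12, 15), (20, 17)]
print(first_relapse_week(hamd6_visits, "HAMD6"))    # 12
print(first_relapse_week(hamd17_visits, "HAMD17"))  # 20
```

Keeping the cut-offs in one mapping makes it easy to apply the same follow-up data to either scale.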
The prevention of post-stroke depression by SSRIs is still the best example of preventing depression in patients who had never previously had a depressive episode but who, because of their physical illness, belonged to a high-risk group for developing depression (Robinson & Jorge, 2009).
Conclusion
When the HAMD17 is used as the outcome scale in the acute therapy of depressed patients, the antidepressant effect of second-generation antidepressants does seem to be a myth. However, when the core items of depression (HAMD6) are used to measure antidepressive activity, the myth of mere placebo activity does not hold for second-generation antidepressants, for which even a dose–response relationship can be demonstrated in the acute phase. In relapse prevention during continuation therapy, the advantage of second-generation antidepressants over placebo is even more pronounced than in acute-phase treatment.
Declaration of Interest
During the 3 years up to August 2008, Professor Bech has occasionally received funding from, and been a speaker or member of advisory boards for, pharmaceutical companies with an interest in the drug treatment of affective disorders (AstraZeneca, Lilly, Lundbeck A/S, Lundbeck Foundation, Organon).