Dear Editor,
In their recent paper entitled ‘Selective serotonin reuptake inhibitors versus placebo in patients with major depressive disorder. A systematic review with meta-analysis and Trial Sequential Analysis’ (Reference Jakobsen, Katakam and Schou1), Jakobsen et al. conclude that ‘The observed harmful effects seem to outweigh the potential small beneficial clinical effects of SSRIs, if they exist’ (Reference Jakobsen, Katakam and Schou1). In a follow-up article in a Danish popular science journal (Videnskab.dk), Jakobsen is quoted as making the following statement on the selective serotonin reuptake inhibitors (SSRIs) (freely translated from Danish): ‘We are dealing with medicine that affects important neurotransmitters in the brain and has severe side effects. To justify giving this to people, we have to be sure that it works against depression. But it doesn’t’ (2). This message was broadly disseminated via the Danish news media in the days following the publication of the meta-analysis.
It is very important to communicate research findings to the general public. However, when making as blunt a statement as the one outlined above, which is likely to affect both the opinion and behaviour of individuals (for instance, the adherence to SSRI treatment or the likelihood of accepting SSRI treatment when indicated), researchers are ethically obliged to ensure that their interpretation of their results is completely unchallengeable. Below, I will make the argument that the interpretation made by Jakobsen et al. is far from unchallengeable, and that the statements in the paragraph above are therefore highly questionable.
Jakobsen et al. have performed a very extensive systematic search of both published and unpublished results of randomised clinical trials (RCTs) comparing the effect of SSRIs with that of placebo (Reference Jakobsen, Katakam and Schou1). They are to be complimented for that effort. Most of the RCTs included in the meta-analysis used the total score on the 17-item or the 21-item Hamilton Depression Rating Scale (HDRS) (Reference Hamilton3) as the outcome measure, and the primary results in the article by Jakobsen et al. (Reference Jakobsen, Katakam and Schou1) therefore also refer to the HDRS. The results of their meta-analyses ‘showed that SSRIs versus placebo significantly reduced the HDRS score (mean difference −1.94 points; 95% CI −2.50 to −1.37; p<0.00001)’ (Reference Jakobsen, Katakam and Schou1). Jakobsen et al. consider this ‘numerical’ superiority of the SSRIs over placebo on the HDRS to be below the threshold for clinical significance (3 points on the HDRS), which was suggested by the National Institute for Clinical Excellence (NICE) from the United Kingdom. Thus, the difference between 1.94 and 3 is essentially what leads Jakobsen et al. to the conclusion that there are only ‘small beneficial clinical effects of SSRIs, if they exist’ (Reference Jakobsen, Katakam and Schou1). The rationale for this conclusion may seem bulletproof, but there is a fundamental problem with the HDRS as outcome measure, which Jakobsen et al. do not take into consideration.
The HDRS was developed in 1960 (Reference Hamilton3) and consists of 17 symptom items in the original version (the 21-item version was never intended to be used for severity measurement in depression according to the scale’s developer Max Hamilton (Reference Hamilton4)). In the RCTs included in the meta-analysis by Jakobsen et al., the total score of these items is used as a measure of the overall severity of depression – and reduction in the HDRS total score over time is used as a measure of clinical improvement. In order for the total score to actually contain this clinical information, the HDRS must meet two fundamental criteria: (I) the total score of the items must correlate with evaluations of depressive severity made by clinical experts (gold standard), and (II) each of the items must convey unique information regarding the severity of the latent syndrome being measured, that is, depression (this is commonly referred to as ‘unidimensionality’ or ‘scalability’ (Reference Bech5)). In two landmark studies from 1975 (Reference Bech, Gram, Dein, Jacobsen, Vitger and Bolwig6) and 1981 (Reference Bech, Allerup and Gram7), respectively, Bech et al. demonstrated that the original HDRS met neither of these two criteria. This lack of psychometric validity of the HDRS has been confirmed in a large number of studies since then (Reference Ostergaard, Bech, Trivedi, Wisniewski, Rush and Fava8–Reference Korner, Lauritzen and Abelskov15). Therefore, the total score of the HDRS cannot be considered a clinically valid measure of the severity of depression (Reference Bech5,Reference Bagby, Ryder, Schuller and Marshall16). As the conclusions made by Jakobsen et al. (Reference Jakobsen, Katakam and Schou1) are based on the results of analyses of HDRS total scores, it follows that they are not clinically valid either.
The landmark studies by Bech et al. (Reference Bech, Gram, Dein, Jacobsen, Vitger and Bolwig6,Reference Bech, Allerup and Gram7) also demonstrated that although the total score of the HDRS is not a valid measure of depression severity, the scale contains a subscale of six items that meets both of the validity criteria (clinical validity and unidimensionality/scalability) described in the paragraph above. These six items are as follows: item 1 – depressed mood; item 2 – guilt feelings; item 7 – work and interests; item 8 – psychomotor retardation; item 10 – psychic anxiety; and item 13 – somatic symptoms general (Reference Bech, Gram, Dein, Jacobsen, Vitger and Bolwig6). The subscale defined by these six items is now commonly referred to as ‘Hamilton-6’ (HDRS6) (Reference Bech5). As opposed to the HDRS, the psychometric validity of the HDRS6 has been confirmed numerous times (Reference Ostergaard, Bech, Trivedi, Wisniewski, Rush and Fava8–Reference Korner, Lauritzen and Abelskov15) since its derivation from the HDRS in 1975 (Reference Bech, Gram, Dein, Jacobsen, Vitger and Bolwig6). Importantly, when using the total score on the HDRS6 as outcome measure in RCTs of SSRIs (and related antidepressants) versus placebo, the effect sizes are markedly larger than those obtained when using the total score of the HDRS as outcome measure (Reference Bech, Cialdella and Haugh17–Reference Bech, Boyer and Germain20). There are two reasons for this difference, namely (i) the superior psychometric properties of HDRS6 compared with HDRS, and (ii) the fact that three of the items in the HDRS (item 12 – somatic symptoms, gastrointestinal; item 14 – genital symptoms; and item 16 – loss of weight) tap into three common side effects of the SSRIs, namely diarrhoea/constipation, loss of libido, and loss of appetite. Indeed, in a recent meta-analysis of RCTs comparing SSRIs and placebo by Hieronymus et al. (Reference Hieronymus, Emilsson, Nilsson and Eriksson18), these three HDRS items were the only ones yielding negative effect sizes – whereas the effect sizes of the remaining 14 items were positive. Thus, the HDRS contains an inherent bias against the SSRIs due to the side-effect profile of this class of drugs. Notably, none of these three problematic items are included in the HDRS6. As the wanted effect (antidepressant effect in this context) and unwanted effects (side effects) are ideally evaluated independently in clinical studies (Reference Bech21–Reference Bech, Gefke, Lunde, Lauritzen and Martiny23), the HDRS6 is an ideal measure of the wanted effects of antidepressant agents (Reference Bech21,Reference Papakostas, Ostergaard and Iovieno24).
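To make the relationship between the two scores concrete, the following minimal sketch shows how an HDRS6 total could be derived from item-level HDRS ratings. The item numbers are those listed above; the function names and the example ratings are hypothetical and serve only as illustration, not as data from any trial.

```python
# Illustrative sketch only: the ratings below are made up, not trial data.
# Item numbers follow the 17-item HDRS numbering used in the text.

HDRS6_ITEMS = (1, 2, 7, 8, 10, 13)   # the six items of the HDRS6 subscale
SIDE_EFFECT_ITEMS = (12, 14, 16)     # items overlapping with common SSRI side effects


def hdrs17_total(item_scores):
    """Sum of all 17 item ratings (the conventional HDRS total score)."""
    return sum(item_scores[i] for i in range(1, 18))


def subscale_total(item_scores, items):
    """Sum of the ratings for a chosen subset of items."""
    return sum(item_scores[i] for i in items)


# Hypothetical ratings for one patient at one visit, keyed by item number.
example_scores = {
    1: 3, 2: 2, 3: 1, 4: 1, 5: 1, 6: 1, 7: 3, 8: 2, 9: 0,
    10: 2, 11: 1, 12: 1, 13: 1, 14: 1, 15: 0, 16: 1, 17: 0,
}

print("HDRS17 total:", hdrs17_total(example_scores))
print("HDRS6 total:", subscale_total(example_scores, HDRS6_ITEMS))
print("Side-effect items:", subscale_total(example_scores, SIDE_EFFECT_ITEMS))
```

Because items 12, 14 and 16 contribute to the 17-item total but not to the HDRS6, a side effect that raises any of these ratings changes the former score while leaving the latter untouched.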
It should be mentioned that RCTs using either the Montgomery–Asberg Depression Rating Scale (MADRS) (Reference Montgomery and Asberg25) or the Beck Depression Inventory (BDI) (Reference Beck, Ward, Mendelson, Mock and Erbaugh26) as outcome measures were also included in the meta-analysis by Jakobsen et al. (Reference Jakobsen, Katakam and Schou1). The psychometric problems associated with the MADRS (Reference Bech, Allerup, Larsen, Csillag and Licht27) and the BDI (Reference Bech, Gram, Dein, Jacobsen, Vitger and Bolwig6,Reference Bouman and Kok28) are, however, equivalent to those mentioned in relation to the HDRS, so this makes little difference.
When confronted with the shortcomings of the HDRS, Jakobsen stated (freely translated from Danish): ‘We have used the conducted research as point of reference. You may have a fantasy that if a different scale had been used, the result would have been different. But that is very theoretical’ (2). Using terms such as ‘fantasy’ and ‘theoretical’ in this context does not seem particularly fitting as there are published studies documenting that when using a psychometrically valid depression rating scale (HDRS6) as outcome measure, the clinical superiority of SSRIs over placebo is quite consistent (Reference Bech, Cialdella and Haugh17,Reference Hieronymus, Emilsson, Nilsson and Eriksson18).
For the reasons outlined above, I strongly suggest that not only independent researchers like Jakobsen et al. (Reference Jakobsen, Katakam and Schou1), but also organisations like the NICE, the pharmaceutical industry, and the pharmaceutical evaluation authorities, such as the US Food and Drug Administration and the European Medicines Agency, no longer consider the total score on the HDRS (or the MADRS or BDI for that matter) a valid outcome measure in studies of antidepressants – because this practice is in conflict with the results of a very large body of literature based on clinical psychometric research. Furthermore, it is my hope that Jakobsen et al. (Reference Jakobsen, Katakam and Schou1) will see this comment on their work as an encouragement to reanalyse their data using the HDRS6 total score as outcome measure. This would be a highly clinically relevant contribution to the literature.
Conflicts of Interest
The author declares no conflicts of interest.