Extracts of Hypericum perforatum (St John's wort) are widely used to treat depression. Systematic reviews published between 1996 and 2000 concluded that such extracts are more effective than placebo and are comparable with older antidepressants in the treatment of mild to moderate depression (Linde et al, 1996; Volz, 1997; Linde & Mulrow, 1998; Josey & Tacket, 1999; Gaster & Holroyd, 2000; Williams et al, 2000). Several older trials included in these reviews were criticised because they included patients with few or mild symptoms who did not meet criteria for major depression, were conducted by primary care physicians who were not experienced in depression research, or used low doses of comparator drugs (Shelton et al, 2001). Also, smaller trials included in the reviews tended to report larger treatment effects, which might be explained by publication bias or lower methodological quality of smaller trials (Sterne et al, 2000).
Several large studies, including some with negative findings, have been published recently (Montgomery et al, 2000; Shelton et al, 2001; Hypericum Depression Trial Study Group, 2002). We therefore updated our previous review (Linde et al, 1996; Linde & Mulrow, 1998), paying particular attention to factors such as type and severity of depression and trial size that might explain conflicting results. Our updated review addresses the following specific questions. Are extracts of St John's wort (Hypericum perforatum) more effective than placebo, and as effective as standard antidepressants, in improving symptoms in adults with depression? Are Hypericum extracts less effective in patients who meet criteria for major depression than in patients with depressive symptoms who may not meet criteria for major depression? Do trials show that Hypericum extracts have fewer adverse effects than standard antidepressants?
METHOD
Data sources
We searched for English and non-English language and published and unpublished trials indexed in the register of the Cochrane Collaborative Review Group for Depression, Anxiety and Neuroses (last search July 2003) and PubMed (text word HYPERICUM, search dates 1998 to May 2004). We also checked reference lists of trials and reviews, contacted manufacturers and experts in the field, and relied on our prior extensive searches (Linde et al, 1996; Linde & Mulrow, 1998). One reviewer (K.L.) initially screened reference lists to identify controlled clinical studies of Hypericum preparations in humans. At least two reviewers independently reviewed the full text of all such articles to assess whether they met inclusion criteria. Disagreements occurred for two studies; these were resolved by consensus.
Inclusion criteria
We selected studies that met the following criteria:
(a) study design - double-blind, randomised, controlled trial;
(b) participants - adult patients treated for depressive disorders;
(c) experimental intervention - Hypericum monopreparation for at least 4 weeks;
(d) control intervention - placebo or a synthetic standard antidepressant;
(e) outcome measure - assessment of symptoms with a depression scale or general assessment of clinical response.
These criteria were more restrictive than those used in our prior reviews, which allowed single-blind trials, controlled trials without explicit randomisation, trials shorter than 4 weeks, combinations of Hypericum and other plant extracts, and comparison groups that were treated with drugs other than standard antidepressants, for example diazepam (Linde et al, 1996; Linde & Mulrow, 1998).
Data extraction, outcome definition and assessment of methodological quality
Using a pre-tested form, two reviewers independently extracted information regarding trial participants, methods, interventions, outcomes and study quality. Authors and/or sponsors were contacted to provide missing information. Disagreements were resolved through discussion. We extracted the numbers of patients who were randomised and analysed and who completed protocols, the number and reasons for drop-outs and withdrawals, numbers of patients reporting adverse effects, and the number and type of adverse effects that were reported. We assessed numbers of patients who were classified as responders based on score improvements on the Hamilton Rating Scale for Depression (HRSD; first preference), the Clinical Global Impression index (CGI; sub-scale global improvement rating as at least ‘much improved’; second preference) or any other clinical response measurement (third preference). We used the Jadad scale (items on randomisation, masking and reporting of drop-outs and withdrawals) and a checklist developed by one of us (items on treatment allocation, concealment of allocation, baseline comparability, physician and patient masking, and selection bias after allocation) to help guide assessments of study quality (Jadad et al, 1996; Linde et al, 2001).
Statistical analyses
We considered the proportion of responders at the end of treatment as the main outcome measure, or in case of treatment phases longer than 6 weeks, at the time point defined for primary outcome measurement by the study investigators. We used response rate ratios (ratios of the number of patients classified as responders divided by the number of patients randomised to the respective group) and their 95% confidence intervals for the analysis of treatment response. Rate ratios greater than 1 indicate better response in the Hypericum group. The main outcome measure for the safety analysis was the number of patients who dropped out because of adverse effects. Secondary measures were the total number of patients who dropped out and the number of patients reporting adverse effects. Because of the highly variable frequency of side-effects or adverse effects reported, odds ratios instead of rate ratios were calculated. Odds ratios less than 1 indicate that fewer events occurred in the Hypericum group. We combined results on the rate ratio or odds ratio using fixed or random effects models, using the Cochrane Collaboration's Review Manager Software 4.1 (Update Software, Oxford, UK). In addition, meta-regression analyses were performed using Stata 8.0 (Stata Corporation, College Station, TX, USA). To investigate the degree of between-trial heterogeneity, the chi-squared test was performed and I² (Higgins et al, 2003) and τ² (Thompson & Sharp, 1999) were calculated. A statistical test of funnel plot asymmetry, which may indicate the presence of publication bias, was performed (Egger et al, 1997).
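The quantities described above can be illustrated with a minimal Python sketch. It is not the Review Manager or Stata procedure used for the analyses: it assumes a large-sample (Wald) confidence interval on the log scale, inverse-variance fixed-effect pooling and the DerSimonian-Laird estimate of τ², and the function names and the example counts are hypothetical.

```python
import math

def rate_ratio(resp_t, n_t, resp_c, n_c, z=1.96):
    """Responder rate ratio (treatment / control) with a Wald CI on the log scale."""
    rr = (resp_t / n_t) / (resp_c / n_c)
    # Large-sample standard error of log(RR).
    se = math.sqrt(1 / resp_t - 1 / n_t + 1 / resp_c - 1 / n_c)
    lo = math.exp(math.log(rr) - z * se)
    hi = math.exp(math.log(rr) + z * se)
    return rr, lo, hi, se

def pool(log_effects, ses):
    """Inverse-variance fixed-effect pooling with Cochran's Q, I^2 (%) and tau^2."""
    w = [1 / se ** 2 for se in ses]
    pooled = sum(wi * y for wi, y in zip(w, log_effects)) / sum(w)
    q = sum(wi * (y - pooled) ** 2 for wi, y in zip(w, log_effects))
    df = len(log_effects) - 1
    # I^2: share of total variation attributed to between-trial heterogeneity.
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    # DerSimonian-Laird moment estimator of the between-trial variance.
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    return math.exp(pooled), q, i2, tau2

# Hypothetical trial: 30/50 responders on Hypericum v. 20/50 on placebo.
rr, lo, hi, se = rate_ratio(30, 50, 20, 50)
print(f"RR {rr:.2f} (95% CI {lo:.2f}-{hi:.2f})")
```

A rate ratio above 1 in this sketch corresponds to a better response in the Hypericum group, matching the convention stated above.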
The extent to which one or more study-level variables explained heterogeneity in the treatment effects was then explored by fitting random effects meta-regression models (Thompson & Sharp, 1999; Sterne et al, 2001). The following variables were entered in the model: type of depression (major depression v. other); severity of depression (HRSD scores at baseline; as both the 17-item and the 21-item HRSD scales were used, baseline scores were standardised by multiplying the scores from the 21-item scale by 0.81 (17/21)); dosage of Hypericum extract (mg per day); type of extract (LI 160 v. other); study location (German-speaking Europe v. other); study duration (weeks); and year of publication. Two variables relating to the quality of trials were also included (whether or not an adequate method of allocation concealment was described, and whether or not patients dropping out were reported). Finally, we included the variance of the rate or odds ratio to explore the importance of small-study effects (the tendency for smaller studies to show larger treatment effects; Sterne et al, 2001). For simplicity, more precise studies (trials with smaller variance) are described in the results as larger trials, and less precise studies as smaller trials.
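The standardisation of baseline HRSD scores can be sketched as follows. This is an illustration under the stated rule only (21-item scores multiplied by 17/21); the function name is hypothetical and the unrounded factor 17/21 is used rather than 0.81.

```python
def standardise_hrsd(score, items):
    """Express a baseline HRSD score on the 17-item scale."""
    if items == 17:
        return score  # already on the 17-item scale
    if items == 21:
        return score * 17 / 21  # rescale 21-item scores (factor of about 0.81)
    raise ValueError("only the 17-item and 21-item HRSD versions are handled")

# A 21-item baseline score of 21.0 corresponds to 17.0 on the 17-item scale.
print(round(standardise_hrsd(21.0, 21), 1))
```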
RESULTS
Identification of eligible trials
Of 68 possible trials, 37 trials met inclusion criteria and contributed 26 comparisons with placebo and 14 comparisons with standard antidepressants (Fig. 1). We excluded 18 trials that involved either healthy volunteers (Herberg, 1991; Johnson et al, 1992, 1993; Schmidt et al, 1993; Schulz & Jobert, 1993; Staffeldt et al, 1993; Brockmöller et al, 1997; plus one unpublished trial by Wienert et al, described at the Third Phytotherapy Congress in Lübeck-Travemünde in 1991) or patients without depression (Bendre & Dharmadhikari, 1980; Panijel, 1985; Albertini, 1986; Werth, 1989; Dittmer, 1992; Maisenbacher et al, 1995; Häring et al, 1996; Hottenrott et al, 1997; Sindrup et al, 2000; Volz et al, 2002); five that lacked placebo or standard antidepressant control groups (Spielberger, 1985; Martinez et al, 1993; Lenoir et al, 1999; Zeller, 2000; plus one unpublished trial by Bernhardt et al described at the Fifth Phytotherapy Congress in Bonn in 1993); two that measured only physiological outcomes (electroencephalograph) (Czekalla et al, 1997; Kugler et al, 1990a); two that were not masked (Warnecke, 1986; Kugler et al, 1990b); and three that tested combinations of Hypericum and other plant extracts (Steger, 1985; Ditzler et al, 1994; Hiller & Rahlfs, 1995). Among the 30 excluded trials, seven had been included in previous versions of our reviews. We were unable to obtain the report of one trial (Agrawal et al, 1994) and only had a report from an oral presentation for another: Anonymous (2000) on a study by Bjerkenstedt et al. The latter trial was included in the descriptive review but not in meta-analyses. One trial was available only as a thesis (König, 1993). Published abstracts of two trials were supplemented with additional information from an author (Osterheider et al, 1992), and a detailed hand-out and additional information from a sponsor (Montgomery et al, 2000). Overall, we obtained additional information from authors, sponsors or both for 31 trials.
Placebo comparisons
Twenty-six trials involving 3320 patients had placebo-control groups (Table 1). Twenty-one originated from German-speaking countries (Germany, Austria and Switzerland), two from the USA and one each from the UK, France and Sweden. The latter five trials, as well as eight trials from German-speaking countries, were restricted to patients with a diagnosis of major depression according to DSM (III or later) (American Psychiatric Association, 1980, 1987, 1994) or ICD-10 (World Health Organization, 1993) criteria. Severity of depression was classified as mild to moderate in most trials.
Study | Country | n | Major depression | HRSD baseline score (version) | Duration (weeks) | Hypericum preparation | Hypericum dosage (mg) | Definition of response¹ |
---|---|---|---|---|---|---|---|---|
Hoffman & Kühl (1979) | Germany | 60 | No | | 6 | Hyperforat | NA | 4 |
Schlich et al (1987) | Germany | 49 | No | 31.3 (21) | 4 | Psychotonin M | 350 | 1 |
Schmidt et al (1989) | Germany | 40 | No | 29.4 (21) | 4 | Psychotonin M | 500 | 1 |
Halama (1991) | Germany | 50 | No | 18.2 (17) | 4 | LI 160³ | 900 | 1 |
Harrer et al (1991) | Austria | 120 | No | 21.3 (NA) | 6 | Psychotonin M | 500 | |
Osterheider et al (1992) | Germany | 47 | No | 22.2 (NA) | 8 | Psychotonin M | 500 | 3 |
Reh et al (1992) | Germany | 50 | No | 19.5 (21) | 8 | Neuroplant³ | 380 | 1 |
Hübner et al (1993) | Germany | 40 | No | 12.5 (17) | 4 | LI 160³ | 900 | 1 |
Lehrl & Woelk (1993) | Germany | 50 | Yes | 22.7 (21) | 4 | LI 160³ | 900 | 1 |
Schmidt & Sommer (1993) | Germany | 65 | No | 16.5 (21) | 6 | LI 160³ | 900 | 1 |
Quandt et al (1993) | Germany | 88 | No | 17.6 (21) | 4 | Psychotonin M | 500 | 1 |
König (1993) | Switzerland | 112 | No | | 6 | Z 90017 | 500-1000 | 4 |
Sommer & Harrer (1994) | Germany/Austria | 105 | No | 15.8 (21) | 4 | LI 160³ | 900 | 1 |
Witte et al (1995) | Germany | 97 | Yes | 23.6 (21) | 6 | Psychotonin f. | 240 | 1 |
Hänsgen & Vesper (1996) | Germany | 197 | Yes | 20.7 (21) | 4 | LI 160 | 900 | 1 |
Laakmann et al (1998) | Germany | 147⁴ | Yes | 21.1 (17) | 6 | WS 5572 | 900 | 2 |
Schrader et al (1998) | Germany | 162 | Yes | 19.4 (21) | 6 | ZE 117 | 500 | 1 |
Philipp et al (1999) | Germany | 263⁴ | Yes | 22.7 (17) | 8 (6)⁵ | STEI 300 | 1050 | 1 |
Winkel et al (2000) | Germany | 119 | No⁶ | 16.7 (21) | 6 | LI 160 | 900 | 3 |
Volz et al (2000) | Germany | 140 | Yes | 20.9 (21) | 6 | D 0496 | 500 | 5 |
Montgomery et al (2000) | UK | 247 | Yes | 21.5 (17) | 12 (6)⁵ | LI 160 | 900 | 1 |
Kalb et al (2001) | Germany | 72 | Yes | 19.9 (17) | 6 | WS 5572 | 900 | 2 |
Shelton et al (2001) | USA | 200 | Yes | 22.5 (17) | 8 | LI 160 | 900-1200 | 2 |
HDTSG (2002) | USA | 340⁴ | Yes | 22.9 (17) | 8 | LI 160 | 900-1500 | 1 |
Lecrubier et al (2002) | France | 375 | Yes | 21.9 (17) | 6 | WS 5570 | 900 | 2 |
Bjerkenstedt et al⁷ | Sweden | 170⁴ | Yes | NA | 6 | LI 160 | 900 | |
Older trials differed from more recent ones in several respects (Table 2). Older trials were exclusively performed in German-language countries. Newer trials had larger sample sizes, were of longer duration and more often used a placebo run-in design. Newer trials also were more often restricted to patients who met criteria for major depression, and tended to include patients with more severe depression (i.e. higher scores on depression scales). Indicators of methodological quality and daily dosage also were slightly higher in more recent trials.
Characteristic | Published 1979 to 1994 (n=13) | Published 1995 to 2002 (n=13) |
---|---|---|
Performed outside German-speaking Europe, n | 0 | 5 |
Number of patients randomised: mean (range) | 67 (40-120) | 188 (72-375) |
Placebo run-in period mentioned, n | 1 | 7 |
Sample met criteria for major depression, n | 1 | 12 |
Outcome assessment with 17-item HRSD, n | 3 | 7 |
Daily extract dosage at week 1, mg: mean (range) | 640 (350-900) | 800 (240-1050) |
Median HRSD baseline score (adjusted for version) | 18.2 | 20.5 |
Trial duration at least 6 weeks, n | 7 | 13 |
Jadad score: mean (range) | 3.6 (2-5) | 4.3 (3-5) |
Adequate method of concealment described, n | 9 | 10 |
Of 24 trials with data on response to treatment, 21 used HRSD scores to characterise response, but definitions of response were not uniform across trials (see Table 1). One trial (Osterheider et al, 1992) was excluded from pooled analyses because no response occurred in either group. For the remaining 23 trials responder rate ratios were heterogeneous (I²=75.4%, τ²=0.191, P<0.0001) and the funnel plot asymmetric (P<0.0001, Fig. 2). In univariate meta-regression analysis, larger trials with smaller variances of rate ratios (P<0.0001), trials limited to patients with major depression (P=0.026) and trials enrolling patients with higher HRSD scores (P=0.010) showed smaller treatment effects. Other factors associated with smaller treatment effects included more recent year of publication (P=0.001), origin from a non-German-speaking country (P=0.005) and longer trial duration (P=0.005). There was little evidence for an association of response with the daily dosage (P=0.33), the type of extract (P=0.74) or indicators of trial quality (method of concealment, P=0.15; reporting on drop-outs, P=0.12).
A bivariate model, which included the two variables related to our a priori hypotheses (type of depression and variance of rate ratio), explained a large proportion of between-trial heterogeneity (reducing τ² from 0.191 to 0.030). The results from this model are illustrated in Figure 3, which shows a fixed-effects meta-analysis stratified by type of depression (major v. other) and precision (above or below median of variance). In the six smaller trials that were restricted to patients with major depression, the combined response rate ratio was 2.06 (95% CI 1.65-2.59), whereas in the six larger trials it was 1.15 (95% CI 1.02-1.29). In trials not restricted to patients with major depression, the rate ratio was 6.13 (95% CI 3.63-10.38) in five smaller trials and 1.71 (95% CI 1.40-2.09) in six larger trials.
Response rates in both placebo and intervention groups changed over time (Fig. 4). Weighted linear regression analysis shows that response rates in the placebo groups increased by 1.5% per year (P=0.013), whereas rates decreased in the Hypericum groups by 1.1% per year (P=0.049).
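The kind of weighted linear regression reported above can be sketched in Python as follows. The weighting scheme is not stated in the text, so weighting by group size is an assumption here; the function name and the example data are hypothetical and chosen only so that the slope is easy to check.

```python
def weighted_slope(x, y, w):
    """Weighted least-squares slope and intercept of y on x."""
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    slope = sxy / sxx
    return slope, ybar - slope * xbar

# Hypothetical data: publication year, response rate (%) and group size (weight).
years = [1990, 1995, 2000]
rates = [20.0, 27.5, 35.0]
sizes = [50, 100, 200]
slope, intercept = weighted_slope(years, rates, sizes)
print(f"change in response rate: {slope:.1f} percentage points per year")
```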
Comparisons with standard antidepressants
Fourteen trials with a total of 2283 patients compared Hypericum extracts with standard antidepressants (Table 3); 13 provided sufficient data for efficacy and safety analyses. In six of these, the comparator drug was a selective serotonin reuptake inhibitor (SSRI; fluoxetine in three studies, sertraline in three). Eight studies were performed in German-speaking countries. All trials but one were restricted to patients with a diagnosis of major depression according to DSM or ICD-10 criteria. Responder rates were similar among patients receiving Hypericum extracts and those receiving standard antidepressants, with little evidence of between-trial heterogeneity (I²=4.2%, P=0.40) or funnel plot asymmetry (P=0.55). Combining trials using a fixed effects model gave a responder rate ratio of 1.01 (95% CI 0.93-1.10) for all 13 trials, a rate ratio of 1.03 (95% CI 0.93-1.14) for seven trials comparing Hypericum extracts with older antidepressants, and a rate ratio of 0.98 (95% CI 0.85-1.12) for six trials comparing Hypericum extracts with SSRIs (Fig. 5). In meta-regression analysis there was some evidence (P=0.033) that Hypericum extracts showed better results in the eight trials from German-speaking countries (RR 1.05, 95% CI 0.95-1.16) whereas in the five trials from other countries standard antidepressants were slightly more effective (RR 0.85, 95% CI 0.71-1.01).
Study | Country | n | HRSD baseline score (version) | Duration (weeks) | Hypericum preparation | Hypericum dosage (mg) | Antidepressant drug | Antidepressant dosage (mg) |
---|---|---|---|---|---|---|---|---|
Older antidepressants | | | | | | | | |
Bergmann et al (1993) | Germany | 80 | 15.6 (21) | 6 | Esbericum | NA | Amitriptyline | 30 |
Harrer et al (1993) | Austria | 102 | 21.0 (17) | 4 | LI 160 | 900 | Maprotiline | 75 |
Vorbach et al (1994) | Germany | 135 | 19.8 (17) | 6 | LI 160 | 900 | Imipramine | 75 |
Vorbach et al (1997) | Germany | 209 | 25.7 (17) | 6 | LI 160 | 1800 | Imipramine | 150 |
Wheatley (1997) | UK | 165 | 20.7 (17) | 6 | LI 160 | 900 | Amitriptyline | 75 |
Philipp et al (1999) | Germany | 263¹ | 22.7 (17) | 8 | STEI 300 | 1050 | Imipramine | 100 |
Woelk (2000) | Germany | 324 | 22.2 (17) | 6 | ZE 117 | 500 | Imipramine | 150 |
Selective serotonin reuptake inhibitors | | | | | | | | |
Harrer et al (1999) | Germany | 161 | 16.9 (17) | 6 | LoHyp-57 | 800 | Fluoxetine | 20 |
Brenner et al (2000) | USA | 30 | 21.5 (17) | 7 | LI 160 | 900 | Sertraline | 75 |
Schrader (2000) | Germany | 240 | 19.6 (21) | 6 | ZE 117 | 500 | Fluoxetine | 20 |
HDTSG (2002) | USA | 340² | 22.8 (17) | 8 | LI 160 | 900-1500 | Sertraline | 50-100 |
Behnke et al (2002) | Denmark | 70 | 20.4 (17) | 6 | Calmigen | 300 | Fluoxetine | 40 |
Van Gurp et al (2002) | Canada | 90 | 19.4 (17) | 12 | NA | 900 | Sertraline | 50-100 |
Bjerkenstedt et al² | Sweden | 174¹ | 26.3 (NA) | 6 | LI 160 | 900 | Fluoxetine | 20 |
Safety analysis
In all safety analyses there was little evidence of between-trial heterogeneity or funnel plot asymmetry. Comparing Hypericum extracts with placebo, there was a trend for fewer patients to drop out for any reason (OR 0.83, 95% CI 0.64-1.06), fewer to drop out because of adverse effects (OR 0.60, 95% CI 0.28-1.30) and less reporting of adverse effects (OR 0.79, 95% CI 0.61-1.03) among patients receiving Hypericum. In a comparison with older antidepressants, patients on Hypericum extracts were less likely to drop out (OR 0.65, 95% CI 0.46-0.92), to drop out owing to adverse effects (OR 0.25, 95% CI 0.14-0.45; Fig. 6) and to report adverse effects (OR 0.39, 95% CI 0.31-0.50). Compared with patients treated with SSRIs, patients treated with Hypericum extracts showed a trend towards a lower probability of dropping out because of adverse effects (OR 0.60, 95% CI 0.31-1.15; Fig. 6) and lower reporting of adverse effects (OR 0.75, 95% CI 0.52-1.08). In this comparison, the proportions of patients dropping out for any reason did not differ (OR 0.95, 95% CI 0.65-1.40).
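For orientation, an odds ratio of the kind reported above can be computed from a single trial's counts as in the following Python sketch. It assumes a Woolf (log-scale) confidence interval and does not handle zero cells; the function name and the counts are hypothetical, and it is not the pooling procedure used for the analyses.

```python
import math

def odds_ratio(events_t, n_t, events_c, n_c, z=1.96):
    """Odds ratio (treatment / control) with a Woolf CI on the log scale."""
    a, b = events_t, n_t - events_t  # treatment: events, non-events
    c, d = events_c, n_c - events_c  # control: events, non-events
    or_ = (a * d) / (b * c)
    # Standard error of log(OR); undefined if any cell is zero.
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lo = math.exp(math.log(or_) - z * se)
    hi = math.exp(math.log(or_) + z * se)
    return or_, lo, hi

# Hypothetical trial: 5/100 drop-outs on Hypericum v. 10/100 on the comparator.
or_, lo, hi = odds_ratio(5, 100, 10, 100)
print(f"OR {or_:.2f} (95% CI {lo:.2f}-{hi:.2f})")
```

An odds ratio below 1 in this sketch means fewer events in the Hypericum group, as in the convention used for the safety analysis.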
DISCUSSION
In this updated meta-analysis, we found that Hypericum perforatum extracts improved symptoms more than placebo and similarly to standard antidepressants in adults with mild to moderate depression. However, pooled analysis of six recent, large, more precise trials restricted to patients with major depression showed only minimal benefits of Hypericum extract compared with placebo. Hypericum extracts caused fewer adverse effects than older antidepressants, and might have caused slightly fewer adverse effects than SSRIs.
We cannot rule out the possibility that selective publication of over-optimistic results in small trials explains our finding that the older trials more often had positive results than the newer ones, although we doubt that this is the case. Extensive searches identified three ‘negative’ trials that were published only as abstracts or theses (Osterheider et al, 1992; König, 1993; Montgomery et al, 2000). However, we suspect that there are few (if any) additional unpublished trials; the five manufacturers whose products were tested in most of the trials told us they had no other unpublished research that met our criteria, apart from three trials currently being analysed or in the publication process.
We found no systematic difference between trials in major factors generally related to trial quality, but our subjective judgement was that more recent trials were of better overall quality than older trials. All trials were double-blind. Although adequacy of blinding was usually not formally assessed, achieving similarity between Hypericum extract and placebo preparations is not particularly difficult. Most trials concealed allocation assignments by using consecutively numbered identical medication containers, and drop-out rates were generally low. Some investigators in older trials might have had little experience with diagnostic standards and rating scales (Shelton et al, 2001), but even so such inexperience is unlikely to have biased findings in double-blind trials.
Newer trials more often included only patients with documented major depression and patients with higher HRSD values at baseline. Two of the newer trials from the USA (Shelton et al, 2001; Hypericum Depression Trial Study Group, 2002) included large proportions of patients who had been suffering from their current depressive episode for more than 2 years. Older trials were more often carried out in German-speaking countries where extracts are registered as drugs. Primary care physicians in these countries use Hypericum extracts mainly in patients with mild to moderate depressive complaints and use standard antidepressants in patients with more severe and/or long-lasting depression. Accordingly, older trials often included patients with neurotic depression (ICD-9 code 300.4; World Health Organization, 1977) or brief depression (309.0). Some explicitly excluded patients with a current depressive episode lasting longer than 6 months (Hänsgen & Vesper, 1996; Volz et al, 2000). Older trials could have involved more patients with atypical depressive features and somatisation, whereas newer trials could have involved more patients with melancholic symptoms who might be diagnosed as suffering from endogenous depression according to ICD-9 (Murck, 2002). If so, newer trials might have excluded groups that are particularly responsive to Hypericum extract.
Response rates observed in trials have changed over time. In trials of standard antidepressants, response rates increased over the past 20 years among both treatment and control groups (Walsh et al, 2002). In trials of Hypericum v. placebo, response rates in the placebo groups increased markedly over time, whereas response rates in the Hypericum groups decreased slightly over time. Explanations for these changes over time are not clear, but older trials with unusually low placebo response rates are likely to provide overoptimistic estimates of the benefits of Hypericum.
Most trials that compared Hypericum extracts with standard antidepressants were restricted to patients with major depression. They showed that Hypericum extracts and older and newer antidepressants had similar efficacy. Do these findings contradict those of the recent placebo-controlled Hypericum trials and prove the efficacy of these extracts in patients with major depression? We do not believe so. Although summary estimates of trials comparing antidepressants with placebo consistently show that antidepressants are better than placebo in treating major depression (Williams et al, 2000), a relevant proportion of placebo-controlled trials show no statistically significant benefits of antidepressants (Khan et al, 2000; Kirsch et al, 2002). It is possible that patients in the trials comparing Hypericum extracts with standard antidepressants did not benefit from either the extracts or the antidepressants. Several of the older trials used low dosages of standard antidepressants. More recent trials used dosages generally considered adequate, but still in the lower range of recommended dosages. Theoretically, the dosages used in the trials could have led to underestimates of the efficacy of standard antidepressants, although meta-analyses do not conclusively show that higher doses of standard antidepressants are more effective than lower doses (Furukawa et al, 2002; Kirsch et al, 2002). Three trials of Hypericum included both a placebo and a standard antidepressant control group; however, one of these is not fully published yet (Anonymous, 2000). One trial (Philipp et al, 1999) showed that Hypericum extract and standard antidepressants had similar efficacy and that both were superior to placebo, whereas the other (Hypericum Depression Trial Study Group, 2002) showed no statistically significant difference between any of the groups.
In summary, accumulating evidence regarding the efficacy of Hypericum extracts is complex. We believe that the heterogeneous findings of placebo-controlled trials of these extracts are partly due to an overestimation of their effects in smaller, older studies, and partly to variable efficacy of the extracts in different patient populations. Even though most available comparisons between Hypericum extracts and standard antidepressants suggest similar effects, we believe that current best evidence from placebo comparisons suggests only minor benefits of Hypericum in patients with major depression and no benefit in patients with prolonged duration of depression. There is no evidence about effectiveness in severe depression. We found that current best evidence, derived primarily from older studies in German-speaking countries in primary care settings, still suggests benefits in patients with mild to moderate depressive symptoms who do not necessarily meet criteria for major depression.
Many patients buy St John's wort products from health-food stores and might not disclose this to their physicians. Such uncontrolled use is problematic, because serious interactions can occur with a number of frequently used drugs: see systematic reviews by Hammerness et al (2003) and Knüppel & Linde (2004). Physicians should therefore regularly ask their patients about their Hypericum intake. Also, the quality of Hypericum preparations can differ considerably, and a number of products contain only minor amounts of bioactive constituents (Wurglics et al, 2003). Products that do not provide important information on the content, such as the amount of total extract (e.g. 900 mg), the extraction fluid (e.g. methanol 80% or ethanol 60%) and the ratio of raw material to extract (e.g. 3-6:1) should be avoided. Finally, current best evidence regarding efficacy of Hypericum extracts is not definitive. Mechanisms and specificity of actions of single components need further study. Ultimately, more trials that compare specific extracts with both placebo and standard synthetic antidepressants in clearly defined patient populations with and without major depression are needed.
Acknowledgements
We thank authors and manufacturers who provided additional information.