Cost-effectiveness studies conducted alongside interventions aimed at improving mental health are often limited in scope because the original study omitted to include a utility measure. Reference Knapp, Windmeijer, Brown, Kontodimas, Tzivelekis and Maria Haro1,Reference Barrett, Byford and Knapp2 A utility measure provides a common scale on which to compare the benefits of different interventions, Reference Drummond, Sculpher, Torrance, O'Brien and Stoddart3,Reference Sach, Barton, Doherty, Muir, Jenkinson and Avery4 where zero is equivalent to death and one is equivalent to full health. Without such a measure, it is often difficult to conclude whether the intervention in question represents a cost-effective use of scarce resources, or whether resources would be better spent elsewhere. This is highlighted by a number of recent cost-effectiveness studies in the area of mental health Reference Knapp, Thorgrimsen, Patel, Spector, Hallam and Woods5–Reference Beecham, Sleed, Knapp, Chiesa and Drahorad11 that have found an intervention to be both more costly and more effective, but as effectiveness was not measured in terms of utility it was not possible to compare the cost-effectiveness of these interventions with that for other healthcare interventions. Given the inferred reluctance to use the EQ–5D when evaluating mental health interventions, and the possibility that the EQ–5D may not be sensitive to changes in quality of life in this area, Reference Byford, Knapp, Greenshields, Ukoumunne, Jones and Thompson12 in this paper we seek to assess the appropriateness of using the EQ–5D to measure the benefits of providing social recovery oriented cognitive–behavioural therapy.
Akin to analyses undertaken in other clinical areas, for example Hurst et al Reference Hurst, Kind, Ruta, Hunter and Stubbings13 and Terwee et al, Reference Terwee, Dekker, Wiersinga, Prummel and Bossuyt14 we thereby assessed the validity and responsiveness of the EQ–5D in a group of people with psychosis. The importance of such an assessment is highlighted by the fact that the UK National Institute for Health and Clinical Excellence (NICE) has recently stated that the EQ–5D is the preferred measure to be used in cost-effectiveness analyses. 15
Method
Participants
All participants were taking part in the Improving Social Recovery in Early Psychosis (ISREP) trial. The methods of this study have been outlined elsewhere; Reference Fowler, Hodgekins, Painter, Reilly, Crane and Macmillan16 briefly they were as follows. The ISREP trial was designed to compare the effectiveness and cost-effectiveness of two interventions – case management alone and social recovery oriented cognitive–behavioural therapy in addition to case management, where social recovery oriented cognitive–behavioural therapy was available for a 9-month period post-randomisation. Ethical approval was granted by the Norfolk local research ethics committee and participants in this study were recruited from two secondary mental health services. The inclusion criteria were: a current diagnosis of affective or non-affective psychosis (including schizophrenia, schizoaffective disorder, bipolar disorder and psychotic depression); illness duration ⩽8 years; positive psychotic symptoms (hallucinations and delusions) in relative remission (denoted by a score of ⩽4 on individual symptoms on the Positive and Negative Syndrome Scale (PANSS)); Reference Kay, Fiszbein and Opler17 and currently unemployed or engaged in less than 16 h paid employment or education. Participants were excluded if the psychotic disorder was thought to have an organic basis, acute psychosis was present or the primary diagnosis was drug dependency on opiates or cocaine.
Outcome measures
Participants in the ISREP trial were rated according to seven measures of mental health, and the EQ–5D, at both baseline and 9 months post-randomisation (9-month assessment). Five of these seven measures (the Beck Anxiety Inventory (BAI), Reference Beck and Steer18 Beck Depression Inventory (BDI), Reference Beck, Steer and Brown19 Beck Hopelessness Scale (BHS), Reference Beck and Steer20 PANSS Reference Kay, Fiszbein and Opler17 and Global Assessment of Functioning Scale (GAF; symptom ratings only)) Reference Goldman, Skodol and Lave21 aim to capture the severity of various mental health symptoms, whereas the remaining two (the Quality of Life Scale (QLS), Reference Heinrichs, Hanlon and Carpenter22 and Social and Occupational Functioning Assessment Scale (SOFAS) 23 ) focus more on the level of functioning. The BAI, BDI, BHS and EQ–5D were completed by the study participants, whereas the PANSS, QLS, SOFAS and GAF were rated by a member of the ISREP study team.
The BAI assesses the extent to which an individual is bothered by a particular symptom Reference Beck and Steer18 and the BDI assesses the intensity of a particular depressive symptom. Reference Beck, Steer and Brown19 Both the BAI and BDI consist of 21 items which are scored on a 0–3 scale, where a higher score denotes more severe symptoms. The BHS consists of 20 items and is designed to assess the degree to which an individual holds negative perceptions about the future. Reference Beck and Steer20 A true or false response format is used and a higher score denotes more negative perceptions. The PANSS assesses the levels of positive, negative and cognitive symptoms that are associated with psychosis, Reference Kay, Fiszbein and Opler17 where 30 items are scored on a 1 (absent) to 7 (extreme) scale and a higher score reflects a greater psychopathology. The QLS is designed to assess the functional impairments associated with psychosis, Reference Heinrichs, Hanlon and Carpenter22 where 21 items are assessed on a 0–6 scale and high scores reflect normal or unimpaired function. The GAF assesses symptom level and psychological, social and occupational functioning, Reference Goldman, Skodol and Lave21 and the SOFAS measures social and occupational functioning. 23 Both measures are assessed on a 1–100 rating scale that is divided into ten deciles, each of which provides a description of functioning and symptom level. A lower score on both the GAF and SOFAS denotes a worse response.
The EQ–5D was developed by the EuroQol group (a consortium of researchers from Western Europe) in the 1990s and was informed by a survey of lay people's concepts of health, and a review of existing instruments. Reference Dolan, Gudex, Kind and Williams24 There are five questions, where the respondent is asked to report the level of problems they have (no problems, some/moderate problems, and severe/extreme problems) with regard to mobility, self-care, usual activities, pain, and anxiety/depression. Reference Brooks25 Responses to these five dimensions are also converted into one of 243 different EQ–5D health state descriptions, which range between no problems on all five dimensions (11111) and severe/extreme problems on all five dimensions (33333). We assigned a utility score to each of these 243 health states using the York A1 tariff, Reference Dolan26 giving a measure of health status which ranges between −0.594 and 1 (full health), and where death is equal to zero. This tariff was based on the preferences elicited from a survey of 3395 members of the UK population who used the time trade-off technique to value a number of potential EQ–5D states (the time trade-off seeks to establish how much one would be willing to reduce one's life expectancy by in order to obtain full health). Reference Dolan, Gudex, Kind and Williams24 It should thereby be noted that very few of those who valued the EQ–5D states will actually have experienced severe mental illness. The potential importance of this is highlighted by the fact that members of the public tend to estimate the loss in utility associated with particular health states to be higher than that reported by individuals who have actually experienced the health states in question. Reference de Wit, Busschbach and De Charro27 However, the argument for using valuations based on the general public is that, in a publicly funded system, the views of the general public are most relevant. Reference Dolan and Olsen28
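The mapping from the five three-level responses to a health state and then to a utility score can be sketched as follows. This is an illustration rather than the authors' scoring code: the decrements are the values commonly reported for the UK time trade-off tariff and are an assumption here, so they should be checked against the published tariff before any real use. The function and constant names are hypothetical.

```python
from itertools import product

DIMENSIONS = ("mobility", "self_care", "usual_activities",
              "pain_discomfort", "anxiety_depression")

# ASSUMPTION: decrements as commonly reported for the UK time trade-off
# tariff; verify against the published source before relying on them.
LEVEL_DECREMENTS = {
    "mobility":           {1: 0.0, 2: 0.069, 3: 0.314},
    "self_care":          {1: 0.0, 2: 0.104, 3: 0.214},
    "usual_activities":   {1: 0.0, 2: 0.036, 3: 0.094},
    "pain_discomfort":    {1: 0.0, 2: 0.123, 3: 0.386},
    "anxiety_depression": {1: 0.0, 2: 0.071, 3: 0.236},
}
ANY_PROBLEM = 0.081  # subtracted once if any dimension is above level 1
ANY_LEVEL_3 = 0.269  # subtracted once if any dimension is at level 3


def all_states():
    """Return every EQ-5D health state as a 5-digit string, e.g. '11111'."""
    return ["".join(str(level) for level in levels)
            for levels in product((1, 2, 3), repeat=len(DIMENSIONS))]


def utility(state):
    """Convert a 5-digit health state such as '11223' into a utility score."""
    levels = [int(ch) for ch in state]
    if len(levels) != len(DIMENSIONS) or any(l not in (1, 2, 3) for l in levels):
        raise ValueError(f"not a valid EQ-5D state: {state!r}")
    score = 1.0
    if any(l > 1 for l in levels):
        score -= ANY_PROBLEM
    if any(l == 3 for l in levels):
        score -= ANY_LEVEL_3
    for dimension, level in zip(DIMENSIONS, levels):
        score -= LEVEL_DECREMENTS[dimension][level]
    return round(score, 3)
```

Under these assumed decrements the worst state (33333) scores −0.594 and the best (11111) scores 1, which matches the range stated above.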
Analyses
Overview
Although NICE stated that the EQ–5D was its preferred measure to be used in cost-effectiveness analyses, it did note that the EQ–5D may not be appropriate in all circumstances. 15 Providing evidence of the latter is, however, not straightforward given that there is much conceptual and methodological difficulty associated with the decision as to whether a measure is ‘appropriate’, Reference Brazier29 and a myriad of differently defined terms have been used within such assessments. Reference Streiner and Norman30 That said, Fitzpatrick et al Reference Fitzpatrick, Davey, Buxton and Jones31 outlined a number of criteria on which evidence should be provided in order to select an appropriate outcome measure. Similarly, Brazier et al Reference Brazier, Deverill, Green, Harper and Booth32 developed a checklist for judging the merits of preference-based measures of health such as the EQ–5D. Informed by these papers, and the fact that very few papers have assessed utility measures with regard to such criteria, Reference Marra, Esdaile, Guh, Kopec, Brazier and Koehler33 we measured the performance of the EQ–5D with regard to the criteria of construct validity, convergent validity and responsiveness.
Validity
Validity was assessed in terms of both construct and convergent validity. Construct validity relates to whether a measure can discriminate between two patient groups, one which has a certain trait, and the other which does not. Reference Streiner and Norman30 Thus, we sought to assess whether participants in the ISREP trial, who had what might be considered milder scores (at baseline) according to each of the seven mental health measures, had different EQ–5D scores to those who might be considered to have more severe scores. For each of the seven mental health measures, participants were thereby categorised into two groups using the following methods.
On the BAI it has been recommended that scores of 0–9 points be interpreted as normal anxiety, 10–18 as mild–moderate, 19–29 as moderate–severe, and 30–63 as severe anxiety. Reference Beck and Steer18 Thus, we developed two groups – those who had BAI scores ⩽18 and those who had BAI scores ⩾19. On the BDI it has been recommended that scores of 0–13 correspond to minimal depression, 14–19 to mild depression, 20–28 to moderate depression, and 29–63 to severe depression. Reference Gleitman34 Accordingly, two groups of participants were created – BDI scores ⩽19 and BDI scores ⩾20. On the BHS a score of 0–3 can be considered minimal, 4–8 as mild, 9–14 as moderate, and 15–20 as severe. Reference Beck and Steer20 Therefore, we compared those with a BHS score ⩽8 with those who had a score ⩾9. On the GAF, a score in the range 51–60 is said to denote moderate symptoms or moderate difficulty in functioning, whereas a score of 61–70 denotes some mild symptoms or some difficulty in functioning. 23 Similarly, on the SOFAS, a score of 51–60 is said to denote moderate difficulty in functioning, whereas a score of 61–70 denotes some difficulty in functioning. 23 Accordingly, for both the GAF and SOFAS, those with scores ⩽60 were compared with those with scores ⩾61. For the remaining two measures we were unable to identify a range on the respective total scores that could be considered to denote mild symptoms (on the PANSS) or a mild level of functional impairment (on the QLS). Consequently, for these two measures, in an attempt to compare those who had milder scores with those who had more severe scores, we simply split the sample into two groups of approximately equal size and compared those who had higher scores with those who had lower scores.
Finally, where the aforementioned splits (for either the BAI, BDI, BHS, SOFAS or GAF) resulted in particularly unequal numbers in the milder/more severe groups, the same analysis was also performed using a different split that resulted in approximately equal numbers in each of the two groups.
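The grouping rules described above can be sketched in a few lines. This is a minimal illustration, assuming the published cut-offs quoted in the text for the BAI, BDI, BHS, GAF and SOFAS and a simple median split for the PANSS and QLS; the function names are hypothetical and the median split is only one way of obtaining two groups of approximately equal size.

```python
from statistics import median

# (cut-off, higher_is_worse). For symptom scales a score at or below the
# cut-off is "milder"; for functioning scales a score at or above it is.
CUTOFFS = {
    "BAI": (18, True),     # <=18 milder, >=19 more severe
    "BDI": (19, True),     # <=19 milder, >=20 more severe
    "BHS": (8, True),      # <=8 milder, >=9 more severe
    "GAF": (61, False),    # >=61 milder, <=60 more severe
    "SOFAS": (61, False),  # >=61 milder, <=60 more severe
}


def severity_group(measure, score):
    """Classify one baseline score using the published cut-offs."""
    cutoff, higher_is_worse = CUTOFFS[measure]
    if higher_is_worse:
        return "milder" if score <= cutoff else "more severe"
    return "milder" if score >= cutoff else "more severe"


def median_split_groups(scores, higher_is_worse):
    """Split scores at the sample median (used where no cut-off exists)."""
    mid = median(scores)
    if higher_is_worse:   # e.g. PANSS
        return ["milder" if s <= mid else "more severe" for s in scores]
    return ["milder" if s >= mid else "more severe" for s in scores]  # e.g. QLS
```

With many tied scores at the median the two groups will not be exactly equal, which is why the text speaks only of approximately equal numbers.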
In order to compare the mean EQ–5D score for participants in each of the above mild/more severe groups, using baseline scores, we conducted the t-test to assess whether the EQ–5D could discriminate between the two groups of participants (who had different scores according to each of the seven measures of mental health). However, as the t-test requires the data to be approximately normally distributed (this was assessed using the Kolmogorov–Smirnov Z-test) and it has been shown that responses to the EQ–5D do not tend to be normally distributed, Reference Gerard, Nicholson, Mullee, Mehta and Roderick35 we also compared the EQ–5D scores for both the mild/more severe groups using the Mann–Whitney U-test. Both the t-test and U-test were conducted in order to assess the hypothesis that non-parametric and parametric methods produce similar results and that the latter are thereby robust to the violation of the normality assumption that health-related quality-of-life data are likely to cause. Reference Walters and Campbell36 Finally, it should be noted that more complex regression approaches were considered superfluous as, in contrast to other studies, for example Hurst et al Reference Hurst, Kind, Ruta, Hunter and Stubbings13 who recruited consecutive out-patients, the strict entry criteria to this study reduced the need to adjust for other factors which might differ between the two groups of participants.
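The two test statistics compared in this paragraph can be sketched with the Python standard library alone. This is a simplified illustration, not the analysis code used in the study: it assumes a pooled-variance t statistic, uses average ranks for ties, and does not compute P values.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t_statistic(a, b):
    """Two-sample t statistic assuming equal variances in both groups."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / sqrt(pooled_var * (1 / na + 1 / nb))


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def mann_whitney_u(a, b):
    """Mann-Whitney U statistic (the smaller of the two U values)."""
    ranks = average_ranks(list(a) + list(b))
    u_a = sum(ranks[:len(a)]) - len(a) * (len(a) + 1) / 2
    u_b = len(a) * len(b) - u_a
    return min(u_a, u_b)
```

In practice a statistics package would normally be used to obtain the corresponding P values, for example `scipy.stats.ttest_ind` and `scipy.stats.mannwhitneyu`.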
We also assessed whether the mean difference between the EQ–5D scores, at baseline, in different groups could be considered to constitute a minimally important difference (MID). The MID is considered to be the smallest change in score that would necessitate a change in a person's management. Reference Walters and Brazier37 Previously the MID has, for example, been estimated by calculating the mean change in the EQ–5D score for those who reported that their general health was either somewhat better than a year ago, or somewhat worse than a year ago, in response to question two of the Short Form–36 (SF–36). Reference Walters and Brazier37 Here, in line with the assumption made in a previous paper Reference Barton, Sach, Avery, Jenkinson, Doherty and Muir38 that considered estimates of the MID which were reported in a number of studies, we assumed that a difference of >0.03 constituted a MID on the EQ–5D.
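The MID comparison amounts to checking each between-group difference against the assumed threshold. A minimal sketch, using the mean differences for the main splits reported in Table 2; the function name is hypothetical and the threshold of 0.03 is the assumption stated above.

```python
ASSUMED_MID = 0.03  # assumed minimally important difference on the EQ-5D


def exceeds_mid(difference, mid=ASSUMED_MID):
    """True when a mean difference (milder minus more severe) is above the MID."""
    return difference > mid


# Mean EQ-5D differences for the main split on each measure (Table 2).
table2_differences = {
    "BAI": 0.301, "BDI": 0.282, "BHS": 0.255, "PANSS": 0.074,
    "QLS": 0.044, "GAF": 0.129, "SOFAS": 0.124,
}

measures_above_mid = [m for m, d in table2_differences.items() if exceeds_mid(d)]
```

The comparison is strict, so a difference of exactly 0.03 would not count as exceeding the MID.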
Finally, as previously undertaken by Kind et al, Reference Kind, Dolan, Gudex and Williams39 in an attempt to explain the different EQ–5D scores we also estimated the proportion of participants who reported that they had a problem on each of the five dimensions of the EQ–5D. Here the chi-squared test was also conducted in order to assess whether the proportion of participants reporting a particular problem also differed between those with milder scores and those with more severe scores.
Convergent validity is determined by how closely a measure is related to other measures of the same construct. Reference Streiner and Norman30 Thus, we used the Spearman rank test to assess whether (baseline) scores on the EQ–5D were correlated with the scores on each of the seven measures of mental health in the direction that one would expect.
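Both statistics named here can be sketched with the standard library. This is an illustration under stated assumptions: the chi-squared statistic is the uncorrected Pearson form for a 2×2 table, the Spearman coefficient is computed as the Pearson correlation of average ranks, and neither function returns a P value.

```python
from math import sqrt


def chi_squared_2x2(a, b, c, d):
    """Pearson chi-squared statistic (no continuity correction) for the
    table [[a, b], [c, d]], e.g. rows = severity group, columns = problem/no problem."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((p - mx) * (q - my) for p, q in zip(rx, ry))
    return cov / sqrt(sum((p - mx) ** 2 for p in rx) * sum((q - my) ** 2 for q in ry))
```

As a usage example, the mobility row for the BAI in Table 3 corresponds to 7 of 41 participants with milder scores and 11 of 26 with more severe scores reporting a problem (counts back-calculated from the reported percentages). `chi_squared_2x2(7, 34, 11, 15)` exceeds 3.841, the 5% critical value on one degree of freedom, which is consistent with the P<0.05 marker in that table.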
Responsiveness
Responsiveness has been defined as the ability of a scale to detect changes. Reference Fayers and Machin40 In order to assess this criterion, we estimated the mean change in the EQ–5D score (between baseline and follow-up) for those whose scores improved post-intervention, according to each of the seven mental health measures, and compared it with the mean change for those whose scores did not improve (i.e. those whose score worsened or remained the same post-intervention). This definition of responsiveness was chosen in preference to others (see e.g. Terwee et al Reference Terwee, Dekker, Wiersinga, Prummel and Bossuyt14 ) as it is in line with the argument made by Claxton Reference Claxton41 – that decisions (with regard to cost-effectiveness) should be made on the basis of mean values, irrespective of whether such differences are considered for example clinically important or statistically significant. Nevertheless, for the reasons outlined previously, to assess the significance of any differences between these two groups we again conducted the t-test and the U-test. Additionally, the mean difference between these two groups was assessed in relation to the assumed EQ–5D MID of 0.03. Finally, the change in the proportion of participants who reported having no problems (post-intervention compared with pre-intervention on each of the five dimensions of the EQ–5D) was calculated for both those who improved and did not improve post-intervention according to each of the seven measures of mental health.
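The responsiveness comparison described above can be sketched as a simple grouping of change scores. This is a minimal illustration with a hypothetical function name and made-up records; it treats a change of zero as "not improved", as the text specifies.

```python
from statistics import mean


def responsiveness(records, lower_is_better):
    """Compare mean EQ-5D change between improvers and non-improvers.

    records: iterable of (change_in_measure, change_in_eq5d) pairs, where
    each change is the 9-month score minus the baseline score.
    lower_is_better: True for the BAI, BDI, BHS and PANSS; False for the
    QLS, GAF and SOFAS.
    Returns (mean change if improved, mean change if not improved, difference).
    """
    improved, not_improved = [], []
    for measure_change, eq5d_change in records:
        got_better = measure_change < 0 if lower_is_better else measure_change > 0
        (improved if got_better else not_improved).append(eq5d_change)
    mean_improved, mean_not = mean(improved), mean(not_improved)
    return mean_improved, mean_not, mean_improved - mean_not


# Hypothetical records, for illustration only.
example = [(-3, 0.10), (-1, 0.20), (0, -0.05), (2, 0.05)]
mean_improved, mean_not, difference = responsiveness(example, lower_is_better=True)
```

The returned difference is the quantity that is then compared with the assumed MID of 0.03.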
Results
Participants
Within the two secondary health centres in question between January 2005 and July 2006 a total of 200 participants were identified as meeting the aforementioned inclusion criteria. Of these, 88 gave consent to take part in the study, although a further 11 of these dropped out during the baseline assessment (5 became symptomatic, 5 stated that they were no longer interested, and 1 for personal reasons). Thus, 77 participants were recruited into the study, 55 (71.4%) of whom were male, 50 (64.9%) of whom had a diagnosis of non-affective psychosis, and the average age was 28.9 years (range 18–52). The average duration of illness and the average length of unemployment were 4.8 years and 209 weeks respectively.
At baseline, the EQ–5D was completed by 68 participants (88.3%), where the mean score was 0.676 (95% CI 0.604–0.748) compared with 0.743 (95% CI 0.671–0.816) at 9 months, giving a mean change of 0.043 (95% CI −0.034 to 0.122) (see Table 1 where the mean scores for the BAI, BDI, BHS, PANSS, QLS, GAF and SOFAS are also summarised). Across the five dimensions the proportion who reported a problem on each of the dimensions of the EQ–5D was 26.5% (mobility), 22.1% (self-care), 51.5% (usual activities), 39.7% (pain/discomfort) and 70.6% (anxiety/depression) at baseline (n = 68), compared with 18.8%, 12.5%, 43.8%, 31.3% and 66.7% respectively, at 9 months post-intervention (n = 48).
Measure | Baseline score, mean (n) | 9-month score, mean (n) | Change, mean (n) |
---|---|---|---|
Beck Anxiety Inventory | 16.97 (74) | 13.11 (62) | –3.50 (62) |
Beck Depression Inventory | 21.90 (73) | 14.05 (59) | –8.15 (59) |
Beck Hopelessness Scale | 8.80 (74) | 7.26 (57) | –1.21 (56) |
Positive and Negative Syndrome Scale | 56.74 (77) | 50.42 (62) | –6.30 (62) |
Quality of Life Scale | 64.54 (76) | 74.13 (63) | 8.96 (63) |
Global Assessment of Functioning Scale a | 56.83 (77) | 59.77 (70) | 2.73 (70) |
Social and Occupational Functioning Assessment Scale | 50.06 (77) | 54.25 (69) | 3.78 (69) |
EQ–5D | 0.676 (68) | 0.743 (48) | 0.043 (45) |
a. Symptom ratings only
Analyses
Construct validity
The two groups of participants that were created according to scores on each of the seven measures of mental health had approximately equal numbers, with the exception of the SOFAS. Thus, as well as comparing the EQ–5D scores for those with a SOFAS score of ⩾61 (n = 7) with those with a score ⩽60 (n = 61), we also compared the EQ–5D scores for those with a SOFAS score ⩾51 (n = 25) with those with a score ⩽50 (n = 43) (a score of 41–50 on the SOFAS denotes serious impairment in social, occupational or school functioning). 23 For each of the seven measures of mental health (including both SOFAS categorisations), those with milder scores were found to have higher mean scores according to the EQ–5D compared with those with more severe scores (Table 2). However, according to the Kolmogorov–Smirnov Z-test (z = 1.964, P<0.001, n = 68), one would reject the null hypothesis that the EQ–5D data were normally distributed. Thus, the significance levels according to the t-test, with regard to the mean difference in utility between those with milder/more severe scores, should be treated with caution, although they are near identical to the significance levels according to the U-test (Table 2). Additionally, it should be noted that the mean difference exceeded the assumed MID of 0.03 for all seven measures (range 0.044–0.301).
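The normality check reported above can be sketched as a one-sample Kolmogorov–Smirnov comparison against a normal distribution fitted to the data. This is an illustration only: it assumes that the reported Z is the square root of n multiplied by the maximum distance D between the empirical and fitted cumulative distributions, as some statistical packages define it, and it does not compute a P value.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def ks_z_normal(values):
    """Kolmogorov-Smirnov Z against a normal distribution with the sample
    mean and standard deviation (ASSUMPTION: Z = sqrt(n) * D)."""
    xs = sorted(values)
    n = len(xs)
    fitted = NormalDist(mean(xs), stdev(xs))
    d = 0.0
    for i, x in enumerate(xs, start=1):
        cdf = fitted.cdf(x)
        # Largest gap just after and just before each step of the
        # empirical distribution function.
        d = max(d, i / n - cdf, cdf - (i - 1) / n)
    return sqrt(n) * d
```

Utility data with many responses at the ceiling of 1 produce a large step in the empirical distribution, which is one reason a statistic of this kind tends to reject normality for EQ–5D scores.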
Measure and baseline score | EQ–5D score, mean (n) | Difference, mean (95% CI) |
---|---|---|
Beck Anxiety Inventory | ||
≤18 | 0.788 (41) | 0.301 *** , ‡ (0.171 to 0.431) |
≥19 (more severe) | 0.487 (26) | |
Beck Depression Inventory | ||
≤19 | 0.821 (30) | 0.282 *** , ‡ (0.151 to 0.412) |
≥20 (more severe) | 0.540 (36) | |
Beck Hopelessness Scale | ||
≤8 | 0.794 (36) | 0.255 *** , †† (0.122 to 0.389) |
≥9 (more severe) | 0.538 (31) | |
Positive and Negative Syndrome Scale | ||
≤55 | 0.712 (35) | 0.074 (–0.070 to 0.219) |
≥56 (more severe) | 0.638 (33) | |
Quality of Life Scale | ||
≥66 | 0.697 (31) | 0.044 (–0.105 to 0.193) |
≤65 (more severe) | 0.652 (35) | |
Global Assessment of Functioning Scale a | ||
≥61 | 0.759 (24) | 0.129 (–0.020 to 0.277) |
≤60 (more severe) | 0.631 (44) | |
Social and Occupational Functioning Assessment Scale | ||
≥61 | 0.787 (7) | 0.124 (–0.113 to 0.361) |
≤60 (more severe) | 0.663 (61) | |
≥51 | 0.702 (25) | 0.042 (–0.109 to 0.192) |
≤50 (more severe) | 0.661 (43) |
a. Symptom ratings only
* P<0.05, ** P<0.01, *** P<0.001 according to the t-test
† P<0.05, †† P<0.01, ‡ P<0.001 according to the U-test
Similar results were also achieved on each of the five dimensions of the EQ–5D. In Table 3 it can be seen that the proportion of participants who reported having problems with each of the dimensions of mobility, self-care, usual activities, pain/discomfort and anxiety/depression was lower for those with milder scores compared with those with more severe scores according to at least five of the seven measures of mental health.
Measure and baseline score | n | Mobility, % | Self-care, % | Usual activities, % | Pain/discomfort, % | Anxiety/depression, % |
---|---|---|---|---|---|---|
Beck Anxiety Inventory | ||||||
≤18 | 41 | 17.07 | 17.07 | 39.02 | 29.27 | 58.54 |
≥19 (more severe) | 26 | 42.31 * | 30.77 | 73.08 ** | 57.69 * | 92.31 ** |
Beck Depression Inventory | ||||||
≤19 | 30 | 23.33 | 13.33 | 30.00 | 23.33 | 53.33 |
≥20 (more severe) | 36 | 30.56 | 30.56 | 69.44 ** | 55.56 ** | 88.89 ** |
Beck Hopelessness Scale | ||||||
≤8 | 36 | 16.67 | 8.33 | 36.11 | 30.55 | 58.33 |
≥9 (more severe) | 31 | 38.71 * | 38.71 ** | 67.74 ** | 48.39 | 83.87 * |
Positive and Negative Syndrome Scale | ||||||
≤55 | 35 | 22.86 | 22.86 | 45.71 | 40.00 | 68.57 |
≥56 (more severe) | 33 | 30.30 | 21.21 | 57.58 | 39.39 | 72.73 |
Quality of Life Scale | ||||||
≥66 | 31 | 22.58 | 22.58 | 51.61 | 35.48 | 67.74 |
≤65 (more severe) | 35 | 28.57 | 22.86 | 51.43 | 42.86 | 74.29 |
Global Assessment of Functioning Scale a | ||||||
≥61 | 24 | 20.83 | 16.67 | 50.00 | 25.00 | 70.83 |
≤60 (more severe) | 44 | 29.55 | 25.00 | 52.27 | 47.73 | 70.45 |
Social and Occupational Functioning Assessment Scale | ||||||
≥61 | 7 | 28.57 | 14.29 | 42.86 | 42.86 | 57.14 |
≤60 (more severe) | 61 | 26.22 | 22.95 | 52.46 | 39.34 | 72.13 |
≥51 | 25 | 24.00 | 20.00 | 56.00 | 40.00 | 68.00 |
≤50 (more severe) | 43 | 27.91 | 23.26 | 48.84 | 39.54 | 72.09 |
a. Symptom ratings only
* P<0.05
** P<0.01
*** P<0.001
In terms of convergent validity the EQ–5D was correlated with each of the seven measures of mental health in the direction that one would expect, i.e. it was negatively correlated with those measures for which a higher score denotes more severe symptoms (the BAI, BDI, BHS and PANSS) and positively correlated with those for which a higher score denotes better functioning or milder symptoms (the QLS, GAF and SOFAS) (Table 4). However, it should be noted that the level of correlation was not significant for three of the measures (the PANSS, QLS and SOFAS).
Measure | Correlation with EQ–5D (n) |
---|---|
Beck Anxiety Inventory | –0.656 (67) *** |
Beck Depression Inventory | –0.360 (55) ** |
Beck Hopelessness Scale | –0.459 (67) *** |
Positive and Negative Syndrome Scale | –0.228 (68) |
Quality of Life Scale | 0.025 (68) |
Global Assessment of Functioning Scale a | 0.263 (68) * |
Social and Occupational Functioning Assessment Scale | 0.053 (68) |
a. Symptom ratings only
* P<0.05
** P<0.01
*** P<0.001
Responsiveness
In Table 5 it can be seen that the mean EQ–5D score was higher (post-intervention) for each of the participant groups who improved according to the seven measures of mental health, i.e. when scores on the BAI, BDI, BHS and PANSS decreased and scores on the QLS, GAF and SOFAS increased. However, contrary to expectations, those who did not improve according to the BHS actually achieved a larger increase in the mean EQ–5D score than was the case for the participants who actually had better BHS scores post-intervention (see Discussion for possible explanations). The significance levels associated with these mean differences are also denoted in Table 5 where, according to the Kolmogorov–Smirnov Z-test (z = 0.857, P = 0.455, n = 45), one cannot reject the null hypothesis that the EQ–5D change scores were normally distributed.
Change in measure | Change in EQ–5D score, mean (n) | Difference, mean (95% CI) |
---|---|---|
Change in Beck Anxiety Inventory | ||
≥0 | –0.069 (17) | 0.181 *, † (0.027 to 0.335) |
≤–1 (improved score) | 0.112 (28) | |
Change in Beck Depression Inventory | ||
≥0 | –0.089 (13) | 0.179 *, † (0.027 to 0.331) |
≤–1 (improved score) | 0.090 (30) | |
Change in Beck Hopelessness Scale | ||
≥0 | 0.032 (14) | –0.003 (–0.172 to 0.167) |
≤–1 (improved score) | 0.029 (28) | |
Change in Positive and Negative Syndrome Scale | ||
≥0 | –0.029 (9) | 0.103 (–0.096 to 0.301) |
≤–1 (improved score) | 0.073 (34) | |
Change in Quality of Life Scale | ||
≤0 | 0.012 (11) | 0.044 (–0.142 to 0.232) |
≥1 (improved score) | 0.057 (31) | |
Change in Global Assessment of Functioning Scale a | ||
≤0 | –0.052 (19) | 0.169 *, † (0.014 to 0.324) |
≥1 (improved score) | 0.117 (25) | |
Change in Social and Occupational Functioning Assessment Scale | ||
≤0 | 0.022 (15) | 0.033 (–0.138 to 0.205) |
≥1 (improved score) | 0.056 (29) |
a. Symptom ratings only
* P<0.05, ** P<0.01, *** P<0.001 according to the t-test
† P<0.05, †† P<0.01, ‡ P<0.001 according to the U-test
As for the individual dimensions of the EQ–5D (Table 6), it can be seen that of those who improved (post-intervention) according to each of the seven measures of mental health, a greater proportion tended to report having no problems (post-intervention compared with pre-intervention) with regard to each of the dimensions of mobility, self-care, usual activities, pain/discomfort and anxiety/depression. The main exceptions were that at the 9-month assessment (compared with the baseline) more of those who improved according to the PANSS (2.94%: n = 1 more) reported having a problem with regard to usual activities and more of those who improved according to the SOFAS (6.9%: n = 2 more) reported having a problem with regard to anxiety/depression.
Change in measure | n | Mobility, % | Self-care, % | Usual activities, % | Pain/discomfort, % | Anxiety/depression, % |
---|---|---|---|---|---|---|
Change in Beck Anxiety Inventory | ||||||
≥0 | 17 | 5.88 | 11.76 | –17.65 | 5.88 | –29.41 |
≤–1 (improved score) | 28 | 14.29 | 14.29 | 14.29 | 7.14 | 14.29 |
Change in Beck Depression Inventory | ||||||
≥0 | 13 | 7.69 | –7.69 | 15.38 | –7.69 | 23.08 |
≤–1 (improved score) | 30 | 16.67 | 13.33 | 13.33 | 6.67 | 10.00 |
Change in Beck Hopelessness Scale | ||||||
≥0 | 14 | 21.43 | 7.14 | –21.43 | 14.29 | –7.14 |
≤–1 (improved score) | 28 | 3.57 | 17.86 | 7.14 | 0.00 | 3.57 |
Change in Positive and Negative Syndrome Scale | ||||||
≥0 | 9 | 0.00 | 22.22 | 22.22 | 0.00 | 0.00 |
≤–1 (improved score) | 34 | 14.71 | 11.76 | –2.94 | 11.76 | 0.00 |
Change in Quality of Life Scale | ||||||
≤0 | 11 | 9.09 | 18.18 | –9.09 | 0.00 | 0.00 |
≥1 (improved score) | 31 | 9.68 | 9.68 | 3.23 | 9.68 | 0.00 |
Change in Global Assessment of Functioning Scale a | ||||||
≤0 | 19 | 0.00 | 5.26 | –10.53 | –10.53 | –10.53 |
≥1 (improved score) | 25 | 20.00 | 20.00 | 12.00 | 20.00 | 4.00 |
Change in Social and Occupational Functioning Assessment Scale | ||||||
≤0 | 15 | 6.67 | 13.33 | 0.00 | 6.67 | 6.67 |
≥1 (improved score) | 29 | 13.79 | 13.79 | 3.45 | 6.90 | –6.90 |
a. Symptom ratings only
Discussion
We have shown that the mean EQ–5D score for those who were considered to have milder scores was higher than that for those who had more severe scores, according to each of the BAI, BDI, BHS, PANSS, QLS, GAF and SOFAS, and that the mean difference was greater than the assumed MID of 0.03 (Table 2). The EQ–5D was also correlated with each of these seven measures of mental health, in that those with better scores on each of these measures also tended to have better EQ–5D scores, although the level of correlation was not significant for three of the measures (Table 4). Additionally, those who improved (post-intervention) according to each of the measures of mental health also had higher mean EQ–5D scores at the 9-month assessment compared with the baseline assessment (Table 5).
It can also be seen that those who had milder scores according to each of the seven measures of mental health, also tended to have fewer problems than those with more severe scores across each of the five dimensions of the EQ–5D (Table 3). Thus, this would suggest that many of the dimensions of the EQ–5D can be sensitive to changes in the level of mental health. Further support for this argument is provided by the fact that a greater proportion of those who improved, according to each of the seven measures of mental health, also tended to report having no problems on each of the five dimensions of the EQ–5D at follow-up compared with baseline (Table 6).
One seemingly unexpected result was, however, that those who did not improve according to the BHS actually had a slightly larger mean increase in EQ–5D score than those who did improve (Table 5). Looking at the results in Table 6, it can be seen that (post-intervention), as one might expect, a greater proportion of those who had more negative perceptions about the future also reported having problems with regard to the anxiety/depression dimension of the EQ–5D. However, a greater proportion also reported having no problems on other dimensions, and the 14 individuals whose perceptions about the future did not improve appear to have received other benefits (reflected in the mobility, self-care and pain/discomfort dimensions of the EQ–5D) that outweighed the increased levels of anxiety and depression, although we cannot account for why this occurred. In a similar way, the finding that a greater proportion of those who improved according to the SOFAS actually reported having problems with regard to anxiety and depression may not be contrary to expectations as the SOFAS focuses on social and occupational functioning. That said, the finding that the proportion of those who reported having a problem with regard to usual activities increased among those who improved according to the PANSS cannot be fully accounted for. Similarly, the argument that our results are far from conclusive is bolstered by the fact that only four of the seven measures of mental health were significantly correlated with the EQ–5D.
Comparisons with other studies
A number of previous cost-effectiveness studies in the area of mental health have included the EQ–5D as a measure of benefit, for example Byford et al, Reference Byford, Knapp, Greenshields, Ukoumunne, Jones and Thompson12 Palmer et al, Reference Palmer, Davidson, Tyrer, Gumley, Tata and Norrie42 Byford et al Reference Byford, Barrett, Roberts, Wilkinson, Dubicka and Kelvin43 and Hakkaart-van Roijen et al. Reference Hakkaart-van Roijen, van Straten, Al, Rutten and Donker44 However, in each of these studies, Reference Byford, Knapp, Greenshields, Ukoumunne, Jones and Thompson12,Reference Palmer, Davidson, Tyrer, Gumley, Tata and Norrie42–Reference Hakkaart-van Roijen, van Straten, Al, Rutten and Donker44 according to the EQ–5D, the intervention was estimated to be no more effective than its comparator (this was often termed ‘treatment as usual’). Such results could arise because there was an improvement associated with the intervention, but the EQ–5D may have been insensitive to that improvement, as was pointed out by Byford et al. Reference Byford, Knapp, Greenshields, Ukoumunne, Jones and Thompson12 Alternatively, it could be that any benefits do not constitute sufficient value, according to the EQ–5D, in order for them to increase a person's level of utility. Within this study we have tried to inform this debate by considering the validity and responsiveness of the EQ–5D in a group of people with psychosis. Our findings suggest that the EQ–5D can discriminate between those with milder and more severe scores, and that the EQ–5D is generally responsive to improvements in mental health as measured by the BAI, BDI, BHS, PANSS, QLS, GAF and SOFAS.
These findings are consistent with those of others who have found evidence to support the validity and/or responsiveness of the EQ–5D in people with schizophrenic, schizotypal or delusional disorders, Reference Konig, Roick and Angermeyer45 in a group of individuals receiving mental health services Reference van de Willige, Wiersma, Nienhuis and Jenner46 and in other areas of health. Reference Hurst, Kind, Ruta, Hunter and Stubbings13,Reference Brazier, Harper, Munro, Walters and Snaith47–Reference Russell, Conner-Spady, Mintz and Maksymowych49 However, this does not necessarily imply that similar results will be realised in other groups with severe mental illness.
Readers should also be aware that, in addition to the EQ–5D, other utility measures are available, including the Health Utilities Index Reference Feeny, Furlong, Torrance, Goldsmith, Zhu and DePauw50 and the SF–6D Reference Brazier, Roberts and Deverill51 (the latter of which is derived from the SF–36). Reference Ware and Sherbourne52 A number of studies have been undertaken to compare the utility scores derived from these different multi-attribute health status classification systems, where a common finding is that there are small but important differences between the utility scores estimated by each of the measures. Reference Bryan and Longworth53 This is in accordance with the conclusion of the only paper we know of that has compared the scores provided by two utility measures (the EQ–5D and SF–6D) in the area of mental health. Reference Lamers, Bouwmans, van Straten, Donker and Hakkaart54 In a group of people with either a major depressive disorder, dysthymic disorder, panic disorder, social phobia, or generalised anxiety, Lamers et al found that those who were less distressed (according to a checklist of 90 psychological symptoms) tended to have higher scores on the EQ–5D (compared with the SF–6D), whereas those who were more distressed tended to have higher scores on the SF–6D. Reference Lamers, Bouwmans, van Straten, Donker and Hakkaart54 Differing scores such as these are often explained by the fact that different measures use both different health-state descriptions and different valuation techniques. Reference Brazier29
Limitations
The main limitation of this study is that it is relatively small (based on 77 participants), and no single experiment can unequivocally prove a construct (evidence of validity can only be provided by a series of converging results). Reference Streiner and Norman30 Additionally, although the t-test has previously been used to analyse responses to the EQ–5D (e.g. Hurst et al Reference Hurst, Kind, Ruta, Hunter and Stubbings13 ), there may be some limitations with regard to the statistical analysis because the EQ–5D data were not normally distributed (a requirement for the t-test). That said, we demonstrated that the results of the t-test were robust, as the qualitative interpretation of these results (in relation to the P<0.05 cut-off) was identical to that obtained using the Mann–Whitney U-test. Furthermore, it should also be recognised that we did not collect sufficient data to enable all of the criteria developed by both Fitzpatrick et al Reference Fitzpatrick, Davey, Buxton and Jones31 and Brazier et al Reference Brazier, Deverill, Green, Harper and Booth32 to be fully assessed. Further research is therefore required before one can conclude that it is wholly appropriate to use the EQ–5D to measure the benefits in mental health evaluations. That said, our results suggest that, for this particular intervention (which focused on social recovery), Reference Fowler, Hodgekins, Painter, Reilly, Crane and Macmillan16 those who improved according to the BAI, BDI, BHS, PANSS, QLS, GAF and SOFAS also received benefits in most of the five dimensions of the EQ–5D. Therefore, had we only used a measure of mental health to estimate the benefits of social recovery oriented cognitive–behavioural therapy, the range of benefits associated with the intervention might have been underestimated.
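The robustness check described above (comparing the t-test with the Mann–Whitney U-test at the P<0.05 cut-off) can be sketched as follows. This is a minimal illustration only, not the analysis code used in the study: the function names, the choice of Welch's unequal-variance form of the t-test, the normal approximation used for the U-test and the example scores are all assumptions made for illustration, and the scores are invented rather than trial data.

```python
import math
from statistics import NormalDist, mean, variance


def t_upper_tail(x, df, steps=2000):
    """P(T > x) for x >= 0, via Simpson integration of the t density on [0, x]."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))

    def pdf(u):
        return math.exp(log_c - (df + 1) / 2 * math.log1p(u * u / df))

    h = x / steps  # steps must be even for Simpson's rule
    total = pdf(0.0) + pdf(x)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    return max(0.0, 0.5 - total * h / 3)


def welch_t_test(a, b):
    """Two-sided Welch t-test (assumed variant); returns (t, p)."""
    na, nb = len(a), len(b)
    va, vb = variance(a) / na, variance(b) / nb
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    # Welch-Satterthwaite degrees of freedom
    df = (va + vb) ** 2 / (va ** 2 / (na - 1) + vb ** 2 / (nb - 1))
    return t, 2.0 * t_upper_tail(abs(t), df)


def mann_whitney_u(a, b):
    """Two-sided Mann-Whitney U-test; returns (U for a, p).

    Uses the tie-corrected normal approximation without continuity correction.
    """
    pooled = sorted((v, g) for g, grp in enumerate((a, b)) for v in grp)
    n = len(pooled)
    ranks = [0.0] * n
    tie_term = 0
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        for k in range(i, j):
            ranks[k] = (i + 1 + j) / 2  # mean of ranks i+1 .. j
        tie_term += (j - i) ** 3 - (j - i)
        i = j
    na, nb = len(a), len(b)
    rank_sum_a = sum(r for r, (_, g) in zip(ranks, pooled) if g == 0)
    u = rank_sum_a - na * (na + 1) / 2
    sigma = math.sqrt(na * nb / 12 * ((n + 1) - tie_term / (n * (n - 1))))
    z = (u - na * nb / 2) / sigma
    return u, 2.0 * (1.0 - NormalDist().cdf(abs(z)))


def same_conclusion(a, b, alpha=0.05):
    """True if both tests fall on the same side of the alpha cut-off."""
    _, p_t = welch_t_test(a, b)
    _, p_u = mann_whitney_u(a, b)
    return (p_t < alpha) == (p_u < alpha), p_t, p_u


# Invented utility scores for two hypothetical groups (not trial data).
improved = [0.85, 0.76, 1.0, 0.69, 0.88, 0.81, 1.0, 0.73, 0.92, 0.80]
not_improved = [0.52, 0.62, 0.36, 0.59, 0.69, 0.41, 0.55, 0.26, 0.66, 0.48]

agree, p_t, p_u = same_conclusion(improved, not_improved)
print(f"t-test P={p_t:.4f}, Mann-Whitney P={p_u:.4f}, same conclusion: {agree}")
```

Agreement here is judged only on which side of the cut-off each P-value falls, mirroring the "qualitative interpretation" wording above. The two P-values themselves will generally differ, because the U-test compares ranks rather than means.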
Implications
Our results suggest that the EQ–5D can discriminate between those with milder and more severe scores, and that the EQ–5D is responsive to improvements in mental health as measured by seven measures of mental health. This suggests that the EQ–5D should be considered for use in future cost-effectiveness studies of mental health interventions. However, as not all of the results were in line with expectations, further research as to the appropriateness of using the EQ–5D in such areas is also warranted.
Acknowledgements
We thank all participants who completed the baseline and 9-month assessment questionnaire, the trial therapists (including Dorothy O'Connor, Annabella Houlden, Neil Harmer, Cas Wright, Mark Wright, Ian Bell, Nick Whitehouse, Patrick Wymbs), Carolyn Crane (Research Nurse), and the UK Mental Health Research Network staff who provided assistance with recruitment and assessment (including Angela Browne, Freya Mellor and Barbara Dickson).