Outcome measures in forensic mental health services: A systematic review of instruments and qualitative evidence synthesis

Howard Ryland; Jonathan Cook; Denis Yukhnenko; Raymond Fitzpatrick; Seena Fazel

doi:10.1192/j.eurpsy.2021.32

Outcome measures in forensic mental health services: A systematic review of instruments and qualitative evidence synthesis

Published online by Cambridge University Press: 28 May 2021

and

Howard Ryland*: Affiliation:
Department of Psychiatry, University of Oxford, Oxford, United Kingdom
Jonathan Cook: Affiliation:
Nuffield Department of Orthopaedics, Rheumatology and Musculoskeletal Sciences, University of Oxford, Oxford, United Kingdom
Denis Yukhnenko: Affiliation:
Department of Psychiatry, University of Oxford, Oxford, United Kingdom
Raymond Fitzpatrick: Affiliation:
Nuffield Department of Population Health, University of Oxford, Oxford, United Kingdom
Seena Fazel: Affiliation:
Department of Psychiatry, University of Oxford, Oxford, United Kingdom
*: *Author for correspondence: Howard Ryland, E-mail: [email protected]

Article contents

Abstract
Background
Methods
Results
Conclusions
Introduction
Methods
Results
Discussion
Financial Support
Conflicts of Interest
Data Availability Statement
Supplementary Materials
References

Abstract

Background

Outcome measurement in forensic mental health services can support service improvement, research, and patient progress evaluation. This systematic review aims to identify instruments available for use as outcome measures in this field and assess the evidence for the most common instruments, specific to the forensic context, which cover multiple outcome domains.

Methods

Studies were identified by searching seven online databases. Additional searches were then performed for 10 selected instruments to identify additional information on their psychometric properties. Instrument manuals and gray literature was reviewed for information about instrument development and content validity. The quality of evidence for psychometric properties was summarized for each instrument based on the COnsensus-based Standards for health Measurement INstruments (COSMIN) approach.

Results

A total of 435 different instruments or variants were identified. Psychometric information on the 10 selected instruments was extracted from 103 studies. All 10 instruments had a clinician reported component with only two having patient reported scales. Half of the instruments were primarily focused on risk. No instrument demonstrated adequate psychometric properties in all eight COSMIN categories assessed. Only one instrument, the Camberwell Assessment of Need: Forensic Version, had adequate evidence for its development and content validity. The most evidence was for construct validity, while none was identified for construct stability between groups.

Conclusions

Despite the large number of instruments potentially available, evidence for their use as outcome measures in forensic mental health services is limited. Future research and instrument development should involve patients and carers to ensure adequate content validity.

Keywords

Forensic mental health services outcome measurement psychometrics quality of life risk assessment

Type: Review/Meta-analysis
Information: European Psychiatry , Volume 64 , Issue 1 , 2021 , e37

DOI: https://doi.org/10.1192/j.eurpsy.2021.32 [Opens in a new window]
Creative Commons: This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted re-use, distribution, and reproduction in any medium, provided the original work is properly cited.
Copyright: © The Author(s), 2021. Published by Cambridge University Press on behalf of the European Psychiatric Association

Introduction

Forensic mental health services provide care for people with mental illness who pose a risk to others and have typically perpetrated acts of violence or other antisocial behaviors [Reference Crocker, Livingston and Leclair1]. The structure and legal framework governing such services varies considerably between and even within countries [Reference Sampson, Edworthy, Völlm and Bulten2,Reference Tomlin, Lega, Braun, Kennedy, Herrando and Barroso3]. Demand for such services is rising in many high income countries, with increasing inpatient capacity [Reference Jansman-Hart, Seto, Crocker, Nicholls and Côté4]. Long length of stay, high staffing ratios, and the need for complex security arrangements mean that such services are expensive [Reference Rutherford and Duggan5,Reference Pinals6]. Forensic mental health services can consume a disproportionate portion of overall health budgets given the small numbers of patients [Reference Wilson, James and Forrester7]. Patients frequently spend many years in secure settings and continue to be subject to restrictions on discharge [Reference Völlm8]. The consequences of recidivism are often severe for victims and their families [Reference Lund, Hofvander, Forsman, Anckarsäter and Nilsson9]. Despite the financial and human costs, outcomes of care remain poorly understood and measurement of progress often relies on the individual approach of clinicians [Reference Allnutt, Ogloff, Adams, O’Driscoll, Daffern and Carroll10].

Measuring the outcomes of forensic mental health services is complicated. Unlike most other healthcare services which focus exclusively on improving outcomes for patients, forensic mental health services also have the dual purpose of public protection. In many jurisdictions this is considered their primary, if not sole, purpose. In other forensic mental health systems however, there is an increasing recognition that patient-centered outcomes must also be prioritized [Reference Wallang, Kamath, Parshall, Saridar and Shah11,Reference Livingston12]. Previous research has frequently focused on objective outcomes, such as rehospitalization, reoffending and death, usually obtained from administrative datasets [Reference Fazel, Fimińska, Cocks and Coid13]. While such outcomes are clearly important, they are relatively uncommon and may only occur after considerable time has elapsed, limiting their usefulness to regularly monitor progress. Over the past three decades, there has been increasing interest in standardized questionnaires to quantify progress in a more nuanced way [Reference Cohen and Eastman14–Reference Fitzpatrick, Chambers, Burns, Doll, Fazel and Jenkinson16]. These questionnaires have predominately sought to reflect the assessment of the treating clinical teams, although more recent developments have also considered the views of patients themselves [Reference Davoren, Hennessy, Conway, Marrinan, Gill and Kennedy17,Reference Thomas, Slade, McCrone, Harty, Parrott and Thornicroft18]. In practice, what constitutes progress varies considerably between services. Progress may be formally defined and based on objective criteria, for example as a move to a lower level of security or discharge to the community [Reference Kennedy, O’Neill, Flynn, Gill and Davoren19]. Alternatively, progress may be shown by more internal, less externally measured changes, such as a psychological shift toward responsibility for previous injurious actions [Reference Kennedy, O’Reilly, Davoren, O’Flynn, O’Sullivan, Völlm and Braun20]. Progress should therefore address therapeutic as well as risk reduction interventions. The questionnaires used to assess progress in clinical practice have not always been explicitly developed for this purpose [Reference Ryland, Carlile and Kingdon21]. Thus, dynamic risk assessment, needs assessments, and decision aids for determining the level of security have all been used as measures of outcome in forensic mental health services.

Policy programs are increasingly concerned with measuring outcomes across health services [Reference Dawson, Doll, Fitzpatrick, Jenkinson and Carr22,Reference Calvert, Kyte, Price, Valderas and Hjollund23]. Driving principles highlight the need for measures to reflect the concerns of stakeholders, with adequate psychometric properties for their use as outcome measures [24]. The COnsensus-based Standards for health Measurement INstruments (COSMIN) group has developed a taxonomy to define the various qualities of an instrument that can make it a good outcome measure [Reference Mokkink, Terwee, Patrick, Alonso, Stratford and Knol25]. This includes aspects of validity, reliability, and responsiveness. Validity concerns whether an instrument actually measure the concept of interest, reliability whether it does so consistently, and responsiveness whether it is able to detect change over time. Instruments need to demonstrate good psychometric properties relevant to how they are used in practice. Measurement can be used at an individual level in determining a patient’s pathway. This can support patients to understand their own progress and to evaluate aspects of their treatment. It can also be used at a systemic level for quality assurance, allocation of resources, service evaluation, and research [Reference Black, Burke, Forrest, Ravens Sieberer, Ahmed and Valderas26]. International initiatives have agreed common sets of outcome measures for similar clinical services and to be used in clinical trials to facilitate synthesis of individual study findings [Reference Gargon, Gorst and Williamson27]. Understanding of psychometrics has evolved, placing greater emphasis on good content validity. Content validity asks the question of whether an instrument measures the concept that it is intended to measure. This concept should reflect those outcomes that are most important for stakeholders, including patients [Reference Terwee, Prinsen, Chiarotto, Westerman, Patrick and Alonso28].

Previous reviews of outcome measures in forensic settings have identified a large number of questionnaire-based instruments in clinical practice and research settings [Reference Fitzpatrick, Chambers, Burns, Doll, Fazel and Jenkinson16,Reference Shinkfield and Ogloff29]. These previous reviews noted a focus on risk and clinical symptoms, neglecting quality of life, and functional outcomes. They also highlight the lack of patient involvement in the development and rating of these instruments.

The present study seeks to update the evidence base, as previous reviews were completed almost a decade ago or only consider a small subset of measures [Reference Keulen-de Vos and Schepers30]. It aims to identify existing instruments from published literature which have been, or could be used, as outcome measures. To ensure that the full range of instruments used in practice is included, we used a wide definition of what constitutes an outcome measure. This includes all instruments with a dynamic component that could be used to measure change over time, regardless of whether these were originally designed to be, or are termed as, an “outcome measure.” In this context dynamic components measured indicators that vary with time, where this variation may have a significant effect on the measurement result, in contrast to static items that measure historical factors, such as previous behaviors, which will not change on repeated measurement. Observed changes could be the result of a number of factors, including response to treatment and variations in symptoms. We decided to focus our quality assessment on instruments that are multidimensional, as these are more likely to be relevant to routine clinical practice in forensic services, where multiple outcomes are assessed for each patient. Although it is possible to combine many different instruments that are narrowly focused on measuring single domains, this can be cumbersome and time-consuming in clinical practice, and multidimensional instruments can reduce clinician burden. We gave equal weight to patient centered and service outcomes and the four outcome dimensions we consider are risk, clinical symptoms, recovery (including functioning), and quality of life. We also prioritize instruments that are specific to the forensic context, over more generic instruments. We identify the 10 instruments most frequently occurring within the literature that are also multidimensional and forensic specific. We then assess their quality, including development and content validity, drawing on the latest consensus-based approaches for evaluating instruments for the purpose of measuring outcomes from COSMIN [Reference Prinsen, Mokkink, Bouter, Alonso, Patrick and de Vet31]. To the best of our knowledge, this is the first review of this type to apply the COSMIN criteria to outcome measures in forensic mental health services. The purpose of the quality assessment was to determine how well the selected instruments function as outcome measures, using the COSMIN criteria as a benchmark, and not to determine the appropriateness of other potential uses for the included instruments, such as risk prediction or needs assessment.

Methods

We report this review following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) reporting items, adapting where appropriate for this type of study [Reference Moher, Liberati, Tetzlaff and Altman32]. We followed an adapted version of the COSMIN protocol for systematic reviews, including their risk of bias tool for assessing study quality [Reference Prinsen, Mokkink, Bouter, Alonso, Patrick and de Vet31].The COSMIN approach is an internationally agreed standard for evaluating outcome measures. It can be used to assess all types of outcome measures, including both clinician and patient reported instruments [33]. The study protocol was registered on PROSPERO, an international prospective register of systematic reviews.

Step 1: Database search

We searched seven databases (MEDLINE, PsycINFO, CINAHL [Cumulative Index to Nursing and Allied Health Literature], EMBASE, National Criminal Justice Reference Service [NCJRS], the Cochrane Database, and Web of Science) from database inception until spring 2018 using a combination of terms including “tool”; “instrument”; “scale”; “outcome”; “recovery”; “risk”; “rehabilitation”; “quality of life”; “symptom”; “forensic”; “secure”; “unit”; “ward”; and “hospital”. See Supplementary Material 1 for an example of the full search strategy.

Step 2: Screening and eligibility criteria

We reviewed the titles and abstracts of identified records. Included papers needed to describe the use of relevant instruments in a forensic mental health setting. The full text had to be available in English. All types of empirical or review paper were included. Papers describing use in prison or general psychiatric services only were excluded. Papers describing assessments of personality, which are generally not dynamic, and competency to stand trial and malingering, which are outcomes related to the legal process, rather than treatment response, were also excluded.

Step 3: Full text review and identification of instruments

Papers meeting the screening criteria were reviewed in full text to identify relevant instruments described within. The format, type of study and geographical location were recorded. The frequency each instrument or subvariant appeared was noted. To determine the 10 most frequently appearing instruments, counts for all subvariants of each instrument were summed. We then considered each instrument, starting with the most frequently identified, to determine which met the criteria of being both multidimensional and designed for use in a forensic mental health context, until we had identified the 10 most frequently occurring within the literature. Multidimensional instruments included items on more than one of the four domains identified in previous reviews in this field (clinical symptoms, risk, recovery, and quality of life) [Reference Shinkfield and Ogloff15,Reference Fitzpatrick, Chambers, Burns, Doll, Fazel and Jenkinson16]. Forensic specific instruments were those concerned with mental health outcomes for offenders or outcomes for individuals assessed or treated in forensic mental health services. We then considered each of the 10 selected instruments to determine the most relevant version or variant to undergo quality assessment in the next stage of the review. This was either the most recent version or, for instruments that combined multiple components, those components designed to measure patient progress over time.

Step 4: Further searching for literature on selected instruments

We conducted additional searches for each of the 10 selected instruments. We searched the PubMed database using common variants of each instrument’s name combined with the COSMIN filter of psychometric terms [34]. We reviewed the manuals for each instrument and other gray literature for further information on instrument development. We reviewed the reference lists of all included papers and contacted experts in the field as necessary. All sources of information were included until the end of 2019.

Step 5: Data extraction

We developed a data extraction tool, based on the COSMIN systematic review protocol and risk of bias tool. A number of adaptations to the standard approach were necessary, as the identified instruments were predominantly clinician reported. Content validity focused on the qualitative comprehensiveness and relevance of items in relation to the concept of interest, while all other psychometric properties were assessed using quantitative studies of numerical scores generated by the instruments. Quantitative data were extracted on seven psychometric properties (structural validity, internal consistency, measurement invariance, reliability, measurement error, hypothesis testing, and responsiveness). According to COSMIN, the dimensionality of a scale should be determined by factor analysis before internal consistency is considered [Reference Mokkink, Prinsen, Patrick, Alonso, Bouter and de Vet35]. In this context, dimensionality considers whether there is statistical evidence that respondents answer an instrument’s items in a similar way, indicating that they relate to the same underlying construct. Measurement error refers to the systematic or random error of a patient’s score that is not attributable to true changes that have occurred. It requires a qualitative estimation of the minimal important change, which is the smallest change in a score that would be clinically meaningful. We assigned a quality rating to the evidence for each property for each instrument in each study in one of three categories (no concerns/quality of evidence unclear/quality of evidence inadequate).

Step 6: Overall strength of evidence

We assigned an overall rating to the strength of evidence available for each of the seven psychometric properties for each instrument based on all included studies in one of three categories. For properties with adequate evidence of good performance we assigned the highest category. Properties with either inadequate evidence of good measurement properties or evidence of inadequate measurement properties were assigned to the middle category. Those properties with no evidence were assigned to the lowest category.

We used the same categorization system for content validity, including the instrument development process. However, due to the lack of published studies, we used a qualitative synthesis of information available from a range of sources, including instrument manuals and other gray literature, based on the COSMIN methodology for assessing content validity, which focuses on establishing relevance and comprehensiveness in the target population [Reference Terwee, Prinsen, Chiarotto, Westerman, Patrick and Alonso36].

Results

Description of full text articles retrieved

The initial screening process identified 4,494 unique references, of which 502 met the inclusion criteria for full text review. Four hundred and fifty-six (91%) were articles in scientific journals. Almost half (49%; n = 247) were studies of the psychometric properties of instruments, while only 3% (n = 17) concerned interventional trials. Almost half (45%; n = 227) originated in the UK and Ireland (see Supplementary for a full description of the studies reviewed in full text).

Description of the instruments identified

Four hundred and thirty-five different instruments or their variants of were identified. It was necessary to review 14 instruments until we identified the 10th instrument most frequently occurring within the literature that also met the multidimensional and forensic-specific criteria (see Supplementary Material 3). The most frequently occurring instrument within the literature was the Historical, Clinical, Risk 20 (HCR-20) [Reference Douglas, Hart, Webster, Belfrage, Guy and Wilson37], which appeared 196 times, followed by the Short Term Assessment of Risk and Treatability (START) [Reference Webster, Nicholls, Martin, Desmarais and Brink38] with 53 mentions.

There was considerable variation in the format and stated purpose of the selected instruments. This included assessments of progress, risk factors, protective factors, patient need, and clinical decision aids.

Overview of the 10 instruments selected for the quality assessment

Half of the 10 instruments assessed were developed primarily as risk assessments (HCR-20, START, Sexual Violence Risk 20 [SVR-20], Violence Risk Scale [VRS], Level of Service: Case Management Inventory [LS/CMI]) [Reference Douglas, Hart, Webster, Belfrage, Guy and Wilson37–Reference Andrews, Bonta and Wormith42]. Two instruments explicitly included items on patients’ strengths or protective factors (START and SAPROF) [Reference Webster, Nicholls, Martin, Desmarais and Brink38,Reference de Vogel, de Ruiter, Bouman and de Vries Robbé43]. Only one instrument, the Health of the Nation Outcome Scale Secure (HoNOS Secure), was explicitly developed as a progress measure [Reference Dickens, Sugarman and Walker44]. All instruments included a clinician reported scale. Only one, the Camberwell Assessment of Need Forensic Version (CANFOR), was originally developed to include a patient reported scale [Reference Thomas, Slade, McCrone, Harty, Parrott and Thornicroft18]. A patient reported scale has subsequently been developed for the Dangerousness, Understanding, Recovery, and Urgency Manual (DUNDRUM) [Reference Davoren, Hennessy, Conway, Marrinan, Gill and Kennedy17]. The number of items ranged from 12 (DUNDRUM 3 and 4) to 150 Behavioral Status Index (BEST) [Reference O’Dwyer, Davoren, Abidin, Doyle, McDonnell and Kennedy45,Reference Woods, Reed and Robinson46]. See Table 1 for a full description of each of the instruments.

Table 1 An overview of the 10 outcome measurement instruments included in the quality assessment

Quality of evidence for the selected instruments

Eighty-six (17%) of the references identified by the review strategy contained relevant data on the psychometric properties of the 10 selected instruments. An extra 29 references were identified through the additional search techniques described in Step 4 of the methods (see Figure 1). See Supplementary Material 4 for details of the identified studies containing psychometric information about the selected instruments.

Figure 1. PRISMA flow diagram showing the flow of studies through the review.

All 10 selected instruments had some evidence of empirical processes to support their development, however, this often emphasized quantitative reviews of the literature on risk factors for violence, rather than considering the views of relevant stakeholders [Reference Terwee, Prinsen, Chiarotto, Westerman, Patrick and Alonso36]. When there was evidence of consultation with stakeholders, this was usually unstructured, with limited details on the methods used or individuals involved. Only one instrument, CANFOR, demonstrated adequate evidence of stakeholder involvement, including patients, in its development [Reference Thomas, Slade, McCrone, Harty, Parrott and Thornicroft18]. CANFOR also had evidence to support its relevance and comprehensiveness for the target population.

The degree of evidence for the remaining psychometric properties was mixed, with evidence on testing hypotheses for construct validity identified for every instrument, but none for measurement invariance [Reference Mokkink, Prinsen, Patrick, Alonso, Bouter and de Vet35]. Evidence for structural validity was available for three of the instruments (BEST, VRS, and DUNDRUM), none of which demonstrated adequate performance [Reference Wong and Gordon41,Reference O’Dwyer, Davoren, Abidin, Doyle, McDonnell and Kennedy45,Reference Woods, Reed and Collins47]. This was either due to insufficient numbers, the use of exploratory, rather than confirmatory factor analysis, or results that were not supportive of the hypothesized structure of the instrument [Reference Prinsen, Mokkink, Bouter, Alonso, Patrick and de Vet31]. There was evidence for internal consistency identified for 8 instruments out of 10. Despite the lack of evidence for structural validity, four instruments were deemed to have evidence of adequate internal consistency.

Nine instruments had some evidence for their reliability, which focused primarily on interrater, rather than test–retest reliability. Measurement error had limited evidence with studies identified for three instruments [Reference Horgan, Charteris and Ambrose48–Reference Longdon, Edworthy, Resnick, Byrne, Clarke and Cheung50]. The quality of evidence for measurement error in the review was consistently low, relying on quantitative methods alone, with no attempt to relate the statistical error to the minimal important clinical change [Reference Prinsen, Mokkink, Bouter, Alonso, Patrick and de Vet31]. Testing hypotheses for construct validity was the category with the greatest quantity of evidence. Three primary types of hypotheses were identified: prediction of future events, such as violence, self-harm, and victimization; difference between subgroups, based on characteristics such as sex, ward type or behavior; and correlation with other measures. Evidence for responsiveness was identified for seven instruments, with only two demonstrating adequate properties in this respect [Reference Horgan, Charteris and Ambrose48,Reference de Vries Robbe, de Vogel, Douglas and Nijman51]. See Table 2 for an overview of the evidence for the selected instruments and Supplementary Material 5 for a detailed summary.

Table 2. Summary synthesis of evidence for the 10 outcome measurement instruments included in the quality assessment.

Note: This table provides an overall summary of the evidence for the psychometric properties of each of the included measurement instruments. The eight psychometric properties assessed are listed at the top of the table and the 10 instruments on the left hand side. The numbers in the cells signify the number of studies identified which contain information about the relevant psychometric property for each instrument. Numbers are not included for content validity, as this was not possible to accurately quantify, due to the diverse range of sources of information for this property. The shading categorizes the level of evidence within each cell according to the schedule outlined below:

• Adequate evidence of good measurement properties
• Inadequate evidence of good measurement properties or evidence of inadequate measurement properties
• No evidence

Definition of terms used in Table 2.

Discussion

This systematic review aimed to provide an overview of instruments currently available for use as outcome measures in forensic mental health services. A broad definition of what constitutes an outcome measure ensured a wide range of instruments were considered. The review focused on instruments which are clinically relevant, to increase applicability of findings to real world settings. It assesses the quality of evidence for the 10 most frequently occurring instruments within the literature, which are also multidimensional and forensic specific. This review was based on a recognized quality assessment process and, to our knowledge, this is the first time that such a systematic approach has been applied in this field [Reference Mokkink, Prinsen, Patrick, Alonso, Bouter and de Vet35]. This quality assessment specifically considered the use of these instruments as broad outcome measures, covering a wide range of clinically relevant domains. It made no evaluation about the use of instruments for other purposes, such as risk prediction or needs assessment.

Key findings

Overall, the evidence for the appropriateness of the selected instruments as broad outcome measures is limited (see Table 2). At least half focused primarily on risk assessment and management, which is in line with previous similar reviews and unsurprising given the nature of forensic mental health services [Reference Shinkfield and Ogloff15–Reference Thomas, Slade, McCrone, Harty, Parrott and Thornicroft18]. The Overt Aggression Scale, developed to measure aggressive behavior in inpatients with intellectual disabilities, appeared frequently but was excluded from more detailed assessment due to not being multidimensional [Reference Yudofsky, Silver, Jackson, Endicott and Williams52]. Although clinical symptoms of mental illness featured in many of the selected instruments, this was not the primary focus of any. The Positive and Negative Symptoms Scale [Reference Kay, Opler and Lindenmayer53] and Brief Psychiatric Ratings Scale [Reference Faustman and Overall54] both appeared frequently, but were excluded from more detailed assessment as they only focused on symptoms and were not designed for use in a forensic context (see Supplementary Material 3). Recovery and quality of life were less prominent in the selected instruments, although there were both some generic and forensic specific measures of these domains in the other instruments identified (such as the Global Assessment of Functioning [Reference Aas55], which appeared frequently but was excluded as not forensic specific or multidimensional, or the forensic specific Lancashire Quality of Life Profile [Reference Eklund and Michalos56], which did not appear frequently enough to warrant more detailed assessment). In accordance with previous reviews, few instruments were reported by patients, with only 2 of the 10 selected instruments having a patient reported scale [Reference Shinkfield and Ogloff15,Reference Fitzpatrick, Chambers, Burns, Doll, Fazel and Jenkinson16]. The systematic gathering of the views of a wider group of stakeholders, especially patients, was rarely performed to inform content validity.

The differing attention to various aspects of validity and reliability in the quality assessment reflects the original purposes of the instruments. For example, as an assessment of patient need, the CANFOR has a much greater focus on content validity, while the HCR-20, as a risk assessment, focuses more on prediction of negative outcomes [Reference Thomas, Slade, McCrone, Harty, Parrott and Thornicroft18,Reference Douglas57]. Studies of the DUNDRUM quartet often focus on differences between levels of security, as an aid to support decisions on pathway placement rather than outcome measurement, while the HoNOS-Secure has the highest number of studies of responsiveness, commensurate with its role as a progress measure [Reference O’Dwyer, Davoren, Abidin, Doyle, McDonnell and Kennedy45,Reference Dickens, Sugarman, Picchioni and Long58].

Implications for research

The COSMIN guidelines emphasize the need for outcome measures to demonstrate adequate stakeholder involvement in their development [Reference Terwee, Prinsen, Chiarotto, Westerman, Patrick and Alonso28]. Even for clinician reported scales, this should include input from patients and carers. This holds for forensic services, which must balance the needs of patients with those of public protection. Evidence for instrument development was only adequate for the CANFOR [Reference Thomas, Slade, McCrone, Harty, Parrott and Thornicroft18]. Although other selected instruments had some stakeholder involvement, this was limited, unstructured and the reporting often inadequate. Subsequent empirical validation of instrument content was similarly lacking, again except for CANFOR. Testing of comprehensiveness and relevance should be completed in the population for which instruments are intended [Reference Terwee, Prinsen, Chiarotto, Westerman, Patrick and Alonso28]. This can take place after the instrument is available in its final form and does not have to occur contemporaneously with development [Reference Prinsen, Mokkink, Bouter, Alonso, Patrick and de Vet31]. Further research is therefore necessary to establish the content validity of these instruments as outcome measures in a forensic psychiatric population.

Overall, the evidence for the other psychometric properties of the instruments as outcome measures is limited, with numerous gaps in the published research. This lack of a comprehensive evidence base is perhaps surprising given the age and popularity of many of these instruments, but may reflect the diversity of their intended uses. Adequate evidence for other uses, such as risk predication, may be well established, but does not necessarily support their use as outcome measures. Further research should seek to ensure that the identified gaps in the evidence base are addressed, if these instruments are used as outcome measures. Certain properties, such as measurement error and measurement invariance are almost entirely overlooked, so should be considered in future studies. Evidence for other fundamental characteristics, such as structural validity, is also often absent or inadequate.

The ability to detect change over time was explicitly considered in the review under the category of responsiveness. While seven instruments had some evidence for responsiveness, this property was only deemed adequate for the VRS and SAPROF. Demonstrating reliable change in this population can be challenging, due to the long timescales involved [Reference Tomlin, Lega, Braun, Kennedy, Herrando and Barroso59]. Admissions to inpatient forensic psychiatric care often last years. The timeframe of most psychometric studies however, including many in this review, is limited to a few months [Reference Völlm8]. Despite these difficulties, it is essential for outcome measures to demonstrate responsiveness to change over a time period that is relevant for the population of interest [Reference Terwee, Dekker, Wiersinga, Prummel and Bossuyt60].

Authorship bias has been identified as a potential problem in the literature on risk assessments in the forensic context [Reference Singh, Grann and Fazel61]. While authorship bias was not formally assessed in this review, much of the evidence identified was produced by the teams that originally developed the instruments. Sufficient validation studies should therefore be conducted independently of the original authors.

New instruments are needed for forensic mental health services to enable clinicians and patients to report and measure individual and service outcomes. These should be developed according to the latest best practice guidelines, including the participation of relevant stakeholders, such as clinicians and patients [62,Reference De Vet, Terwee, Mokkink and Knol63]. Developing new instruments will require working with these stakeholders to identify and prioritize the most important outcomes. This should be followed by further work to develop an instrument that fits the needs of individuals and services. Finally, empirical studies should confirm adequate psychometric properties for the new instrument, such as content validity and responsiveness.

Implications for policy and practice

This review identified many instruments that have been, or could be, used as outcome measures in forensic mental health services. These vary considerably in format, content, length, stated purpose, and evidence base. Of the 10 instruments reviewed in detail, only HoNOS-Secure is designed with the sole primary purpose of measuring progress, although other instruments such as the VRS and LS/CMI are also intended to assess change over time [Reference Dickens, Sugarman and Walker44]. The ways that clinicians and researchers use instruments can differ considerably. Risk assessments, such as the HCR-20, can be used by clinicians to develop risk formulations, while researchers may use it to predict negative outcomes. Instruments can be used in practice or in research in several different ways, for example using the same instruments to predict the risk of future events and to establish if an intervention has already reduced that risk [Reference Hogan and Olver64]. This type of repurposing may be possible, but is limited by how to interpret scores. It will also need considerable additional work to establish relevant psychometric properties, in particular adequate content validity and responsiveness [Reference Terwee, Prinsen, Chiarotto, Westerman, Patrick and Alonso28,Reference Terwee, Dekker, Wiersinga, Prummel and Bossuyt60]. While some commonly used instruments, such as HCR-20 and START, have been used as outcome measures, the underlying evidence for their use in this way is weak. Use of such risk assessments as outcome measures in isolation may lead to an unbalanced view of progress, as they do not include important outcomes such as quality of life and social functioning. Services should therefore start by deciding which outcomes are important, before selecting high quality outcome measures that cover all such outcomes in a way that is practical to implement.

Most instruments identified in this review are reported by clinicians only. For instruments that do include a patient reported scale, these scales may have been designed after the development of the clinician reported ones, with limited patient input [Reference Davoren, Hennessy, Conway, Marrinan, Gill and Kennedy17,Reference van den Brink, Troquete, Beintema, Mulder, van Os and Schoevers65]. This risks inadequate attention to the patient perspective in the overall design and implementation of such measures [Reference Rothrock, Kaiser and Cella66]. In instruments selected in this review that include a patient reported scale, the patient reported scales mirror their clinician reported components. They contain identical items, reframed from the patient’s perspective, to allow direct comparison between the two scales. A disadvantage of this approach is that certain outcome areas, such as those related to subjective quality of life, may only meaningfully be rated by patients [Reference Boardman67]. A patient reported scale that exactly mirrors the clinician reported scale therefore risks neglecting such areas. Services wishing to implement patient reported measures should consult their own users and other key stakeholders, such as family members, when selecting scales, to ensure that they are fit for the purpose of measuring those outcomes deemed of greatest relevance [Reference Wallang, Kamath, Parshall, Saridar and Shah11].

Comprehensiveness is an essential quality for outcome measures [Reference Terwee, Prinsen, Chiarotto, Westerman, Patrick and Alonso28]. While risk and clinical symptoms are the dominant domains within the most frequently occurring instruments within the literature, quality of life and functional outcomes are either absent or remain of secondary importance. By relying on existing instruments services may overlook outcomes of importance, such as quality of life, and over-emphasize the importance of other domains, such as risk to others [Reference Connell, O’Cathain and Brazier68].

Limitations

Given the very large number of instruments identified, it was only possible to assess the quality of evidence for a small proportion of them. Some of the included instruments were not intended to be used as outcome measures, and their utility is not limited to this. A frequency based approach was chosen to select instruments for the quality assessment. This was deemed the most systematic method of identifying instruments that were likely to have a sufficient evidence base to judge their qualities against the COSMIN criteria. There may be instruments that did not meet our selection criteria that have the potential to perform well against the COSMIN criteria, when sufficient evidence is available. The use of frequency of appearance in the literature to select instruments for quality assessment has a number of drawbacks. Firstly, older tools are likely to appear in more published studies, simply by virtue of being in existence for longer. Secondly, some studies were published as multiple papers, meaning that a limited evidence base generates a disproportionate number of references. Thirdly, although we grouped variants of instruments together, there may be important differences between variants. Finally, all types of paper were included in the count and the proportion of studies that contained psychometric information on a particular instrument varied, so the overall count does not necessarily reflect the quantity of psychometric evidence available.

Language was a limitation in two ways. Firstly, the search was limited to those references where the full text was available in English. Secondly, studies involving translations of instruments were included, although evidence from a translated version may not always apply directly to the English version, due to subtle cultural and linguistic differences [Reference Sartorius and Kuyken69].

Assessing the quality of instrument development and content validity studies according to the full COSMIN criteria proved challenging [Reference Terwee, Prinsen, Chiarotto, Westerman, Patrick and Alonso28,Reference Mokkink, Prinsen, Patrick, Alonso, Bouter and de Vet35]. The review team simplified the COSMIN approach, to make it more pragmatic and streamlined. This included reducing the quality assessment to three levels, rather than four. The summary assessment of the quality of evidence for instrument development and content validity was also simplified, as the limited evidence in this area rendered the full process recommended by COSMIN unworkable. Despite these limitations, we think that the COSMIN framework is the most robust and relevant mechanism currently available for assessing instruments for use as outcome measures.

Conclusions

Although there are a large number of instruments available that can be used as outcome measures in forensic mental health services, the evidence base for their use in this way is limited. Despite recommendations from previous reviews, instruments that appear most frequently in the literature remain focused on risk and fail to adequately involve all stakeholders, especially patients [Reference Shinkfield and Ogloff15,Reference Fitzpatrick, Chambers, Burns, Doll, Fazel and Jenkinson16]. Repurposing instruments developed for other uses as outcome measures should be avoided where possible. This is particularly the case for risk assessment tools which cannot currently be recommended as outcome measures based on the standard guidelines we have outlined. When this is unavoidable, additional research is necessary to ensure that they demonstrate adequate psychometric properties to be used as outcome measures [Reference Mokkink, Prinsen, Patrick, Alonso, Bouter and de Vet35]. New outcome measures should be designed with input from all relevant stakeholder groups, especially patients and carers, who have hitherto been largely ignored [Reference Boardman67]. This should follow current best practice guidelines for outcome measure development, with a focus on ensuring adequate content validity [Reference Terwee, Prinsen, Chiarotto, Westerman, Patrick and Alonso28].

Acknowledgments

We would like to thank Nia Roberts for her help in developing the search strategy and retrieving full text papers.

Financial Support

Howard Ryland, Doctoral Research Fellow, DRF-2017-10-019, is funded by the National Institute for Health Research (NIHR) for this research project. The views expressed in this publication are those of the authors and not necessarily those of the NIHR, NHS, or the UK Department of Health and Social Care.

Conflicts of Interest

The authors report no conflict of interest.

Data Availability Statement

The data that support the findings of this study are available from the authors on reasonable request.

Supplementary Materials

To view supplementary material for this article, please visit http://dx.doi.org/10.1192/j.eurpsy.2021.32.

References

Crocker, A, Livingston, J, Leclair, M. Forensic mental health systems internationally. Handbook of forensic mental health services. International perspectives on forensic mental health. New York, NY: Taylor & Francis; 2017, p. 3–76.Google Scholar

Sampson, S, Edworthy, R, Völlm, B, Bulten, E. Long-term forensic mental health services: An exploratory comparison of 18 European countries. Int J Forens Ment Health. 2016;15:333–51. doi:10.1080/14999013.2016.1221484.CrossRef Google Scholar

Tomlin, J, Lega, I, Braun, P, Kennedy, H, Herrando, V, Barroso, R, et al. Forensic mental health in Europe: some key figures. Social Psychiatry Psychiatr Epidemiol. 2021;56:109–17. doi:10.1007/s00127-020-01909-6.CrossRef Google Scholar PubMed

Jansman-Hart, EM, Seto, MC, Crocker, AG, Nicholls, TL, Côté, G. International trends in demand for forensic mental health Services. Int J Forens Ment Health. 2011;10:326–36. doi:10.1080/14999013.2011.625591.CrossRef Google Scholar

Rutherford, M, Duggan, S. Forensic mental health services: facts and figures on current provision. Br J Forens Pract. 2008;10:4–10. doi:10.1108/14636646200800020.CrossRef Google Scholar

Pinals, DA. Forensic services, public mental health policy, and financing: charting the course ahead. J Am Acad Psychiatry Law Online. 2014;42:7–19.Google Scholar PubMed

Wilson, S, James, D, Forrester, A. The medium-secure project and criminal justice mental health. Lancet. 2011;378:110–1. doi:10.1016/s0140-6736(10)62268-4.CrossRef Google Scholar PubMed

Völlm, B. How long is (too) long? BJPsych Bull. 2019;43:151–3. doi:10.1192/bjb.2019.24.CrossRef Google Scholar

Lund, C, Hofvander, B, Forsman, A, Anckarsäter, H, Nilsson, T. Violent criminal recidivism in mentally disordered offenders: a follow-up study of 13–20 years through different sanctions. Int J Law Psychiatry. 2013;36:250–7. doi:https://doi.org/10.1016/j.ijlp.2013.04.015.CrossRef Google Scholar PubMed

Allnutt, S, Ogloff, J, Adams, J, O’Driscoll, C, Daffern, M, Carroll, A, et al. Managing aggression and violence: the clinician’s role in contemporary mental health care. Austr NZ J Psychiatry. 2013;47:728–36. doi:10.1177/0004867413484368.CrossRef Google Scholar PubMed

Wallang, P, Kamath, S, Parshall, A, Saridar, T, Shah, M. Implementation of outcomes-driven and value-based mental health care in the UK. Br J Hospital Med. 2018;79:322–7.CrossRef Google Scholar PubMed

Livingston, J. What does success look like in the forensic mental health system? Perspectives of service users and service providers. Int J Offender Ther Comp Criminol. 2016;62:208–28.CrossRef Google Scholar PubMed

Fazel, S, Fimińska, Z, Cocks, C, Coid, J. Patient outcomes following discharge from secure psychiatric hospitals: systematic review and meta-analysis. Br J Psychiatry. 2016;208:17–25. doi:10.1192/bjp.bp.114.149997.CrossRef Google Scholar PubMed

Cohen, A, Eastman, N. Needs assessment for mentally disordered offenders: measurement of ‘ability to benefit’ and outcome. Br J Psychiatry. 2000;177:493–8. doi:10.1192/bjp.177.6.493.CrossRef Google Scholar PubMed

Shinkfield, G, Ogloff, J. A review and analysis of routine outcome measures for forensic mental health services. Int J Forens Ment Health. 2014;13:252–71. doi:10.1080/14999013.2014.939788.CrossRef Google Scholar

Fitzpatrick, R, Chambers, J, Burns, T, Doll, H, Fazel, S, Jenkinson, C, et al. A systematic review of outcome measures used in forensic mental health research with consensus panel opinion. Health Technol Assess. 2010;14:1–94.CrossRef Google Scholar PubMed

Davoren, M, Hennessy, S, Conway, C, Marrinan, S, Gill, P, Kennedy, HG. Recovery and concordance in a secure forensic psychiatry hospital—the self rated DUNDRUM-3 programme completion and DUNDRUM-4 recovery scales. BMC Psychiatry. 2015;15:61. doi:10.1186/s12888-015-0433-x.CrossRef Google Scholar

Thomas, SD, Slade, M, McCrone, P, Harty, MA, Parrott, J, Thornicroft, G, et al. The reliability and validity of the forensic Camberwell Assessment of Need (CANFOR): a needs assessment for forensic mental health service users. Int J Methods Psychiatr Res. 2008;17:111–20.CrossRef Google Scholar PubMed

Kennedy, H, O’Neill, C, Flynn, G, Gill, P, Davoren, M. Dangerousness Understanding, Recovery and Urgency Manual (The DUNDRUM quartet): four structured professional judgement instruments for admission triage, urgency, treatment completion and recovery assessments Version 1.0.26. Dublin: Trinity College Dublin; 2013.Google Scholar

Kennedy, H, O’Reilly, K, Davoren, M, O’Flynn, P, O’Sullivan, O. How to measure progress in forensic care. In: Völlm, B, Braun, P, editors. Long-term forensic psychiatric care. Cham: Springer; 2019, p. 103–21. doi: 10.1007/978-3-030-12594-3_8CrossRef Google Scholar

Ryland, H, Carlile, J, Kingdon, D. A guide to outcome measurement in psychiatry. BJPsych Adv. 2020;1–9. doi:10.1192/bja.2020.58.CrossRef Google Scholar

Dawson, J, Doll, H, Fitzpatrick, R, Jenkinson, C, Carr, A. The routine use of patient reported outcome measures in healthcare settings. BMJ. 2010;340:c186. doi:10.1136/bmj.c186.CrossRef Google Scholar PubMed

Calvert, M, Kyte, D, Price, G, Valderas, J, Hjollund, N. Maximising the impact of patient reported outcome assessment for patients and society. BMJ. 2019;364:k5267. doi:10.1136/bmj.k5267.CrossRef Google Scholar PubMed

NHS England and NHS Improvement. Delivering the five year forward view for mental health: developing quality and outcomes measure. London, UK: NHS England and Improvement; 2016.Google Scholar

Mokkink, L, Terwee, C, Patrick, D, Alonso, J, Stratford, P, Knol, D, et al. The COSMIN study reached international consensus on taxonomy, terminology, and definitions of measurement properties for health-related patient-reported outcomes. J Clin Epidemiol. 2010;63:737–45.CrossRef Google Scholar PubMed

Black, N, Burke, L, Forrest, C, Ravens Sieberer, U, Ahmed, S, Valderas, J, et al. Patient-reported outcomes: pathways to better health, better services, and better societies. Quality Life Res. 2016;25:1103–12. doi:10.1007/s11136-015-1168-3.CrossRef Google Scholar PubMed

Gargon, E, Gorst, SL, Williamson, PR. Choosing important health outcomes for comparative effectiveness research: 5th annual update to a systematic review of core outcome sets for research. PLOS ONE. 2019;14:e0225980. doi:10.1371/journal.pone.0225980.CrossRef Google Scholar PubMed

Terwee, CB, Prinsen, CAC, Chiarotto, A, Westerman, MJ, Patrick, DL, Alonso, J, et al. COSMIN methodology for evaluating the content validity of patient-reported outcome measures: a Delphi study. Quality Life Res. 2018;27:1159–70. doi:10.1023/A:1023499322593.CrossRef Google Scholar PubMed

Shinkfield, G, Ogloff, J. Use and interpretation of routine outcome measures in forensic mental health. Int J Ment Health Nurs. 2015;24:11–8.CrossRef Google Scholar PubMed

Keulen-de Vos, M, Schepers, K. Needs assessment in forensic patients: a review of instrument suites. Int J Forens Ment Health. 2016;15(3):283–300. doi:10.1080/14999013.2016.1152614.CrossRef Google Scholar

Prinsen, CAC, Mokkink, LB, Bouter, LM, Alonso, J, Patrick, DL, de Vet, HCW, et al. COSMIN guideline for systematic reviews of patient-reported outcome measures. Quality Life Res. 2018;27(5):1147–57. doi:10.1007/s11136-018-1798-3.CrossRef Google Scholar PubMed

Moher, D, Liberati, A, Tetzlaff, J, Altman, DG. The PG. Preferred reporting items for systematic reviews and meta-analyses: the PRISMA statement. PLOS Med. 2009;6(7):e1000097. doi:10.1371/journal.pmed.1000097.CrossRef Google Scholar

Consensus-based standards for the selection of health measurement instruments. Guideline for systematic reviews of outcome measurement instruments. Amsterdam, The Netherlands: VU University Medical Centre; 2021.Google Scholar

COSMIN. Search filters. Amsterdam, The Netherlands: VU University Medical Centre; 2019.Google Scholar

Mokkink, LB, Prinsen, CAC, Patrick, DL, Alonso, J, Bouter, LM, de Vet, HCW, et al. COSMIN methodology for systematic reviews of patient‐reported outcome measures. Amsterdam, The Netherlands: VU University Medical Centre; 2018.Google Scholar PubMed

Douglas, KS, Hart, SD, Webster, CD, Belfrage, H, Guy, LS, Wilson, CM. Historical-clinical-risk management-20, Version 3 (HCR-20 V3): development and overview. Int J Forens Ment Health. 2014;13:93–108. doi:10.1080/14999013.2014.906519.CrossRef Google Scholar

Webster, C, Nicholls, T, Martin, M, Desmarais, S, Brink, J. Short-Term Assessment of Risk and Treatability (START): the case for a new structured professional judgment scheme. Behav Sci Law. 2006;24:747–66. doi:10.1002/bsl.737.CrossRef Google Scholar PubMed

Nicholls, T, Brink, J, Desmarais, S, Webster, C, Martin, M. The Short-Term Assessment of Risk and Treatability (START): a prospective validation study in a forensic psychiatric sample. Assessment. 2006;13:313–27.CrossRef Google Scholar

Boer, D. Manual for the sexual violence risk-20: professional guidelines for assessing risk of sexual violence. British Columbia, Canada: British Columbia Institute Against Family Violence; 1997.Google Scholar

Wong, S, Gordon, A. The validity and reliability of the Violence Risk Scale: a treatment-friendly violence risk assessment tool. Psychol Public Policy Law. 2006;12:279–309.CrossRef Google Scholar

Andrews, D, Bonta, J, Wormith, S. The Level of Service/Case Management Inventory (LS/CMI) technical brochure. Toronto, Canada: Multi-Health Systems; 2004.Google Scholar

de Vogel, V, de Ruiter, C, Bouman, Y, de Vries Robbé, M. SAPROF: guidelines for the assessment of protective factors for violence risk [English version of the Dutch original]. Utrecht, The Netherlands: Forum Educatief; 2009.Google Scholar

Dickens, G, Sugarman, P, Walker, L. HoNOS-secure: a reliable outcome measure for users of secure and forensic mental health services. J Forens Psychiatry Psychol. 2007;18:507–14.CrossRef Google Scholar

O’Dwyer, S, Davoren, M, Abidin, Z, Doyle, E, McDonnell, K, Kennedy, HG. The DUNDRUM Quartet: validation of structured professional judgement instruments DUNDRUM-3 assessment of programme completion and DUNDRUM-4 assessment of recovery in forensic mental health services. BMC Res Notes. 2011;4:229.CrossRef Google Scholar PubMed

Woods, P, Reed, V, Robinson, D. The Behavioural Status Index: therapeutic assessment of risk, insight, communication and social skills. J Psychiat Ment Health Nurs. 1999;6:79–90.CrossRef Google Scholar PubMed

Woods, P, Reed, V, Collins, M. Relationships among risk, and communication and social skills in a high security forensic setting. Issues Ment Health Nurs. 2004;25:769–82.CrossRef Google Scholar

Horgan, H, Charteris, C, Ambrose, D. The violence reduction programme: an exploration of posttreatment risk reduction in a specialist medium-secure unit. Crim Behav Ment Health. 2019;29:286–95. doi:10.1002/cbm.2123.CrossRef Google Scholar

Richter, MS, O’Reilly, K, O’Sullivan, D, O’Flynn, P, Corvin, A, Donohoe, G, et al. Prospective observational cohort study of ’treatment as usual’ over four years for patients with schizophrenia in a national forensic hospital. BMC Psychiatry. 2018;18:289. doi:10.1186/s12888-018-1862-0.CrossRef Google Scholar

Longdon, L, Edworthy, R, Resnick, J, Byrne, A, Clarke, M, Cheung, N, et al. Patient characteristics and outcome measurement in a low secure forensic hospital. Crim Behav Ment Health. 2018;28:255–69. doi:10.1002/cbm.2062.CrossRef Google Scholar

de Vries Robbe, M, de Vogel, V, Douglas, K, Nijman, H. Changes in dynamic risk and protective factors for violence during inpatient forensic psychiatric treatment: predicting reductions in postdischarge community recidivism. Law Hum Behav. 2015;39:53–61. doi:10.1037/lhb0000089.CrossRef Google Scholar PubMed

Yudofsky, SC, Silver, JM, Jackson, W, Endicott, J, Williams, D. The Overt Aggression Scale for the objective rating of verbal and physical aggression. Am J Psychiatry. 1986;143:35–9.Google Scholar PubMed

Kay, SR, Opler, LA, Lindenmayer, J-P. The Positive and Negative Syndrome Scale (PANSS): rationale and standardisation. Br J Psychiatry. 1989;155:59–65.CrossRef Google Scholar

Faustman, WO, Overall, JE. Brief Psychiatric Rating Scale. the use of psychological testing for treatment planning and outcomes assessment. 2nd ed. Mahwah, NJ: Lawrence Erlbaum Associates Publishers; 1999, p. 791–830.Google Scholar

Aas, IHM. Guidelines for rating Global Assessment of Functioning (GAF). Ann Gen Psychiatry. 2011;10:2. doi:10.1186/1744-859X-10-2.CrossRef Google Scholar

Eklund, M. Lancashire quality of life profile. In: Michalos, AC, editor. Encyclopedia of quality of life and well-being research. Dordrecht, The Netherlands: Springer; 2014, p. 3493–5.CrossRef Google Scholar

Douglas, KS. Version 3 of the historical-clinical-risk management-20 (HCR-20 V3): relevance to violence risk assessment and management in forensic conditional release contexts. Behav Sci Law. 2014;32:557–76.CrossRef Google Scholar

Dickens, G, Sugarman, P, Picchioni, M, Long, C. HoNOS-Secure: tracking risk and recovery for men in secure care. Br J Forens Practice. 2010;12:36–46.CrossRef Google Scholar

Tomlin, J, Lega, I, Braun, P, Kennedy, HG, Herrando, VT, Barroso, R, et al. Forensic mental health in Europe: some key figures. Social Psychiatry Psychiatr Epidemiol. 2021;56:109–17. doi:10.1007/s00127-020-01909-6.CrossRef Google Scholar PubMed

Terwee, CB, Dekker, FW, Wiersinga, WM, Prummel, MF, Bossuyt, PMM. On assessing responsiveness of health-related quality of life instruments: guidelines for instrument evaluation. Quality Life Res. 2003;12:349–62. doi:10.1023/A:1023499322593.CrossRef Google Scholar PubMed

Singh, JP, Grann, M, Fazel, S. Authorship bias in violence risk assessment? A systematic review and meta-analysis. PLOS ONE. 2013;8:e72484. doi:10.1371/journal.pone.0072484.CrossRef Google Scholar PubMed

U.S. Department of Health and Human Services Food and Drug Administration. Guidance for industry: patient-reported outcome measures: use in medical product development to support labeling llaims. Maryland: Food and Drug Administration; 2009.Google Scholar

De Vet, HC, Terwee, CB, Mokkink, LB, Knol, DL. Measurement in medicine: a practical guide. Cambridge: Cambridge University Press; 2011.CrossRef Google Scholar

Hogan, NR, Olver, ME. Assessing risk for aggression in forensic psychiatric inpatients: an examination of five measures. Law Hum Behav. 2016;40:233–43.CrossRef Google Scholar PubMed

van den Brink, RH, Troquete, NA, Beintema, H, Mulder, T, van Os, TW, Schoevers, RA, et al. Risk assessment by client and case manager for shared decision making in outpatient forensic psychiatry. BMC Psychiatry. 2015;15:120.CrossRef Google Scholar PubMed

Rothrock, NE, Kaiser, KA, Cella, D. Developing a valid patient-reported outcome measure. Clin Pharmacol Therapeut. 2011;90:737–42.CrossRef Google Scholar PubMed

Boardman, J. Routine outcome measurement: recovery, quality of life and co-production. Br J Psychiatry. 2018;212:4–5.CrossRef Google Scholar PubMed

Connell, J, O’Cathain, A, Brazier, J. Measuring quality of life in mental health: are we asking the right questions? Social Sci Med. 2014;120:12–20.CrossRef Google Scholar PubMed

Sartorius, N, Kuyken, W. Translation of health status instruments. Berlin, Germany: Springer; 1994.CrossRef Google Scholar

Table 1 An overview of the 10 outcome measurement instruments included in the quality assessment

Figure 1. PRISMA flow diagram showing the flow of studies through the review.

Table 2. Summary synthesis of evidence for the 10 outcome measurement instruments included in the quality assessment.

Ryland et al. supplementary material

File 193.5 KB

Submit a response

Comments

No Comments have been published for this article.

Article contents

Outcome measures in forensic mental health services: A systematic review of instruments and qualitative evidence synthesis

Abstract

Keywords

Introduction

Methods

Step 1: Database search

Step 2: Screening and eligibility criteria

Step 3: Full text review and identification of instruments

Step 4: Further searching for literature on selected instruments

Step 5: Data extraction

Step 6: Overall strength of evidence

Results

Description of full text articles retrieved

Description of the instruments identified

Overview of the 10 instruments selected for the quality assessment

Quality of evidence for the selected instruments

Discussion

Key findings

Implications for research

Implications for policy and practice

Limitations

Conclusions

Acknowledgments

Financial Support

Conflicts of Interest

Data Availability Statement

Supplementary Materials

References

Ryland et al. supplementary material

Comments

Save article to Kindle

Save article to Dropbox

Save article to Google Drive

Reply to: Submit a response

Your details

You have entered the maximum number of contributors

Conflicting interests