Introduction
The Improving Access to Psychological Therapies (IAPT) programme in England offers rapid access to evidence-based psychological therapies recommended by clinical guidelines for the treatment of anxiety and depressive disorders (National Institute for Health and Care Excellence, 2011). IAPT services follow a stepped-care model, where many patients initially access a low-cost and brief intervention, followed by more intensive psychotherapies for those who do not fully benefit from the initial steps of treatment (Bower and Gilbody, 2005). Over one million patients per year are referred to IAPT services and the majority of patients only access low-intensity psychological interventions (Clark, 2019). Therefore, the organizational efficiency and clinical effectiveness of the system are heavily dependent on the assessment and treatment skills of the practitioners working in the early steps of IAPT services.
In IAPT, initial assessments are conducted by Psychological Wellbeing Practitioners (PWPs). PWPs are trained via a national curriculum (UCL, 2015) to assess the suitability of patients with mild-to-moderate anxiety and depression for brief psychoeducational and low-intensity interventions. Shafran et al. (2021) defined one-to-one interventions delivered by PWPs as based on a self-help approach supported by didactic materials, where patients typically receive up to six hours of contact time, in sessions of 30 minutes each. Patients are then ‘stepped-up’ to more intensive therapies according to need, risk, and lack of response to any initial low-intensity approach. The role of the PWP was established with the implementation of the IAPT programme and workforce (Kellett et al., 2021). Although the PWP workforce is widely available across England, contributing to the assessment and treatment of thousands of patients per year, relatively little research is available on clinical decision making by PWPs.
Clinical judgement is known to be influenced by biases, which results in relatively poor agreement between psychological professionals and also variability in the quality and accuracy of clinical decisions (Garb, 2005; Grove, 2005; Grove and Meehl, 1996). When making diagnostic or prognostic assessments, clinical judgement has been found to be less accurate than structured algorithms or statistical models (Ægisdóttir et al., 2006). Clinical decision making is influenced by a myriad of variables such as the patient’s characteristics, attitudes, preferences, interpersonal relationships, and the confidence/competence of the practitioner (Anthony et al., 2010; Pilgrim et al., 1997; Stavrou et al., 2009; Sigel and Leiper, 2004; Visintini et al., 2007). In an IAPT context, Delgadillo et al. (2015) investigated ‘stepping decisions’ and found that four factors were associated with offering longer treatments for unresponsive patients: (a) when there were perceived obstacles in stepping up or referring on; (b) when the client was liked by the therapist; (c) if there was a positive therapeutic alliance; (d) when a positive outcome was envisaged through extending treatment. In that study, the retention of a patient in a treatment that was not resulting in reliable improvement was considered to be an obstacle to the adequate functioning of a stepped-care treatment system, where other (more intensive) treatments could be offered instead. Delgadillo et al. (2015) therefore concluded that incongruence and inaccuracy in decision making were due to a complex interplay of beliefs, attitudes, subjective norms, and perceptions of self-efficacy.
Despite the complex interplay of variables that impinge on clinical judgement, it is likely that psychological professionals integrate this information in an intuitive and ‘fast’ way, often using mental shortcuts. In his seminal work on decision making, Kahneman (2011) proposed that rather than combining multiple sources of information using complex mental processes to make decisions, people often rely on ‘system 1’ (i.e. ‘fast’ thinking), which uses heuristics and biases (Kahneman and Tversky, 1972).
Heuristics (i.e. unconscious mental shortcuts) and cognitive biases (i.e. systematic tendencies to jump to certain conclusions) influence how people make intuitive decisions in daily life (Tversky and Kahneman, 1974). Tversky and Kahneman (1974) introduced two cognitive processes that appear important when considering clinical decision making: (a) ‘anchoring and adjustment’ whereby excessive significance is placed on the first piece of information encountered when making a decision; and (b) the ‘halo effect’ whereby an impression formed from a single characteristic is allowed to influence multiple judgements of unrelated factors. Cognitive processes such as those described above have been proposed to influence decisions in routine clinical care (Garb, 2005), ultimately having an impact on patients’ treatment pathway, experiences and outcomes.
The most common method of investigating the influence of heuristics and biases is the use of case vignettes, which require participants to make judgements and decisions in situations where the ‘correct’ course of action is known (e.g. Berman et al., 2016; Garb, 1996; Spengler and Strohmer, 1994). However, the reliability and ecological validity of using generic case vignettes has been questioned (e.g. Hyler et al., 1982). Therefore, it is necessary to develop vignettes that have face validity with the professional groups being tested and the decisions that are encountered in routine practice. Given the scarcity of research related to clinical decision making in stepped-care psychological services, the present study developed and applied a case vignette methodology to study the potential influence of heuristics and biases in a PWP sample. PWPs were chosen because of their crucial role in screening referrals to the IAPT programme. We were interested in investigating whether cognitive processes such as the anchoring and halo effects may influence clinical decisions concerning treatment suitability (i.e. if therapy is an appropriate option), treatment fidelity (i.e. delivering a protocol-driven intervention) and treatment continuation (i.e. decision to lengthen the duration of a treatment). These aspects of decision making are critical for the efficient and effective use of stepped-care interventions (National Institute for Health and Care Excellence, 2011). Two linked studies were conducted. Study 1 used a qualitative design to develop an ecologically valid, clinical vignette-based method to assess the influence of heuristics and biases on clinical decision making. Study 2 used a randomized crossover experimental design using the clinical vignettes to examine the influence of heuristics and biases in a sample of PWPs.
We hypothesized that evoking heuristics and biases would prime counter-normative decisions (i.e. as opposed to normative and clinical guideline-adherent decisions). We also hypothesized that respondents’ general decision-making style (rational vs intuitive) would be significantly associated with clinical decision making.
Method
Ethical approval was granted by the University of Sheffield ethics committee (reference no. 017478). Informed consent to participate and for the results to be published was obtained. Participants’ right to privacy was also respected through the appropriate anonymisation of personally identifiable information.
Study 1
Development of a clinical case vignette methodology
Previous studies using case vignette methods prompted respondents to choose either ‘normative’ (i.e. logical/expected) or ‘counter-normative’ (i.e. unexpected/biased) choices/decisions (e.g. Kahneman and Tversky, 1972; Tversky and Kahneman, 1974). Following this paradigm, we developed case vignettes that would have ecological validity to typical decisions encountered by PWPs in stepped-care services. To design the vignettes, an inductive process was undertaken, informed by ethnographic decision tree modelling (EDTM; Gladwin, 1989), which included eight steps in the development of a composite group model. The structure of the decision-making task and the scoring system are presented in Fig. 1.
Participants and processes for focus group and piloting
The development process included three phases. (1) A preliminary draft of clinical case vignettes was developed by the research team, informed by the literature on heuristics and biases, and with a specific focus on anchoring and halo effects. (2) Two experienced PWP educators were recruited to contribute to a consultation focus group, to check the face validity of the proposed content of the clinical case vignettes. A semi-structured interview document was developed to guide the focus group and to acquire qualitative data relating to the structure and content of the measure. The focus group lasted one hour, and discussions were audio-recorded and fully transcribed. A ‘living document’ (Shanahan, 2015) was then developed and was reviewed by research supervisors until consensus was reached regarding content. (3) Ten PWP educators were then recruited for the piloting stage. Piloting entailed participants engaging in a simulation of the full study (see procedure listed in Study 2) and providing feedback. Feedback from the pilot study was then incorporated into the final case vignette design.
Analysis strategy
Data from the focus group were transcribed verbatim and analysed using the six phases of thematic analysis (Braun and Clarke, 2006). A second coder (trainee clinical psychologist) independently reviewed preliminary codes, themes, and sub-themes. The percentage agreement score was 97.8% and Krippendorff’s alpha was 0.79. The resulting case vignettes and experimental manipulations are described below.
Study 2
Design and participants
A randomized crossover experimental study was designed to evoke specific cognitive processes (i.e. anchoring and halo effects) and to measure their potential influence on clinical decision making.
Participants were recruited using a convenience sampling method from a national workforce of trainee and qualified PWPs working clinically within the IAPT programme. PWPs were approached by email through the Psychological Professions Network, Health Education England, the British Psychological Society PWP training committee, and the national PWP course directors network list. Prospective participants received an information sheet, and those who provided consent (using an electronic survey) were consecutively included in the experiment during a four-month study period.
Procedure: experimental manipulation
The experiment was conducted online using the Qualtrics platform (Qualtrics, 2002). The study followed ethical guidelines for internet-mediated studies (British Psychological Society, 2017). Participants were required to read and work through two case vignettes (named ‘Jack’ and ‘Chloe’), each of which prompted them to record the decisions they would make about each patient’s treatment. The experiment was designed in such a way that one of these vignettes contained an experimental manipulation designed to evoke anchoring and halo effects, while the other vignette served as a control condition. As participants completed two tasks, one of which was a control condition, the order in which they completed each scenario was decided by randomization. Furthermore, to control for the influence of spurious details of each case vignette (i.e. word count, gender of the patient, etc.), the inclusion of the experimental manipulation was also decided by randomization. For instance, if a participant was randomized to the sequence Jack–Chloe, where the experimental manipulation was randomized to the Chloe vignette, they would complete the control version of the Jack vignette.
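The two-level randomization described above can be sketched in a few lines of Python. This is only an illustration of the logic, not the Qualtrics configuration used in the study: the function name `allocate`, the seed, and the output format are assumptions introduced here.

```python
import random

VIGNETTES = ("Jack", "Chloe")

def allocate(rng):
    """Return one participant's plan: vignette order plus version of each vignette.

    Hypothetical helper; the study itself used the Qualtrics randomizer.
    """
    order = list(VIGNETTES)
    rng.shuffle(order)                    # randomize the presentation order
    experimental = rng.choice(VIGNETTES)  # randomize which vignette carries the manipulation
    return [
        (name, "experimental" if name == experimental else "control")
        for name in order
    ]

rng = random.Random(2023)  # arbitrary seed, used only to make the sketch repeatable
for participant_id in range(1, 5):
    print(participant_id, allocate(rng))
```

Each participant therefore sees both vignettes, exactly one of them in its experimental version, and both the order and the manipulated vignette vary independently across participants.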
Clinical case vignette methodology
Each case vignette contained a brief description of a patient and presented the participant with three clinical decision points. These decision points reflected common questions arising in routine care relating to (1) suitability (i.e. is this patient suitable for therapy?), (2) treatment fidelity (i.e. should I continue to offer a standard and protocol-driven treatment?), and (3) treatment continuation (i.e. should I continue to offer treatment A, or should I refer the patient to treatment B?). At each decision point, participants were asked to choose the statement (from a list of options) that best resembled what they would decide in routine care. Options were conceptualized a priori as either ‘normative’ (i.e. following clinical guidelines) or ‘counter-normative’ (i.e. deviating from clinical guidelines).
The experimental version of the vignettes was designed to evoke/prime heuristics and biases, in a way that might increase the likelihood of counter-normative responding. Informed by previous research in the context of stepped-care psychological services (Delgadillo et al., 2015), we hypothesized that emotionally evocative patient-features (i.e. complicated, highly distressed, potentially difficult to work with) would increase the likelihood of counter-normative (i.e. improvisational and intuitive) decisions which would deviate from those recommended by clinical guidelines. Such features were presented early in the case vignette, expecting that respondents may become influenced by their ‘first impression’ (i.e. the anchor) and then may make later decisions with reference to it (i.e. the halo effect). As each case vignette required three decisions, each ‘normative’ decision was coded ‘1’ and each counter-normative decision was coded ‘0’. Hence, each case vignette yielded a 0–3 clinical decision score (CDS), where higher scores denoted a greater propensity to normative decision making. The CDS was the primary outcome measure.
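The CDS coding rule can be expressed as a short Python sketch. The function name and the string labels for the two response types are assumptions made for illustration; only the 1/0 coding and the 0–3 range come from the text.

```python
def clinical_decision_score(decisions):
    """Sum the three decision points of one vignette into a 0-3 score.

    Each 'normative' decision is coded 1 and each 'counter-normative'
    decision is coded 0, so higher scores mean more guideline-adherent choices.
    """
    allowed = {"normative", "counter-normative"}
    if len(decisions) != 3:
        raise ValueError("each vignette has exactly three decision points")
    if any(d not in allowed for d in decisions):
        raise ValueError("decisions must be 'normative' or 'counter-normative'")
    return sum(1 for d in decisions if d == "normative")

# Hypothetical respondent: normative on suitability and continuation,
# counter-normative on treatment fidelity.
print(clinical_decision_score(["normative", "counter-normative", "normative"]))
```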
Measures
Participants completed validated measures of decision-making style, reflective capacity, and personality. The sequence in which questionnaires were presented to each participant was decided by randomization.
Cognitive Reflection Test (CRT; Frederick, 2005)
The CRT is a three-item measure of the tendency to over-ride an initial ‘gut’ response and engage in further reflection to find a correct answer. It has been shown to account for substantial unique variance (11.2%, p<.001) in decision-making choices after other measures of individual differences are statistically controlled.
Rational and Intuitive Decision Styles Scale (DSS; Hamilton et al., 2016)
This is a 10-item decision style scale capturing a broad range of the rational/intuitive thinking styles construct domains. Test–retest reliability has been reported to be high for both rational (r=.79, p<.01) and intuitive (r=.79, p<.01) dimensions. The DSS has demonstrated high internal consistency and a robust two-factor structure.
Statistical analysis
Sample size calculation
Toplak et al. (2011) demonstrated that the Cognitive Reflection Test (CRT) shows an average correlation of r=.49 with performance on heuristics and biases tasks (i.e. decision-making case vignettes). According to Cohen’s (1977) method, a sample size of n=38 participants per group would be necessary to detect a large effect size (r=.50), with an alpha or significance level of 0.05, and 80% power in an experimental design. This yielded a minimum sample size requirement of n=76.
Primary analysis
First, Kruskal–Wallis tests were used to examine differences in CDS between experimental and control tasks. Because this test is rank-based, it compares the distribution of scores across conditions rather than their arithmetic means. These tests were computed twice, once for each of the case vignettes (i.e. ‘Jack’ and ‘Chloe’), enabling us to examine the overall influence of the experimental manipulation across all clinical decisions featured in these scenarios.
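For readers unfamiliar with the test, the following self-contained Python sketch computes the Kruskal–Wallis H statistic with the usual tie correction, which matters here because a 0–3 score produces many tied ranks. It is not the authors' analysis code: the function names are hypothetical and the CDS values are invented purely for illustration.

```python
import math
from collections import Counter

def kruskal_wallis_h(*groups):
    """Kruskal-Wallis H statistic with tie correction (pure Python)."""
    pooled = sorted(x for g in groups for x in g)
    n = len(pooled)
    # Assign each distinct value the mean of the ranks it occupies (mid-ranks).
    ranks = {}
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j] == pooled[i]:
            j += 1
        ranks[pooled[i]] = (i + 1 + j) / 2  # mean of ranks i+1 .. j
        i = j
    rank_term = sum(sum(ranks[x] for x in g) ** 2 / len(g) for g in groups)
    h = 12 / (n * (n + 1)) * rank_term - 3 * (n + 1)
    tie_sizes = Counter(pooled).values()
    correction = 1 - sum(t ** 3 - t for t in tie_sizes) / (n ** 3 - n)
    if correction == 0:
        raise ValueError("all observations are identical; H is undefined")
    return h / correction

def p_value_two_groups(h):
    """Chi-square upper tail with 1 degree of freedom (valid for two groups only)."""
    return math.erfc(math.sqrt(h / 2))

# Invented CDS values (0-3) for one vignette, for illustration only.
experimental = [3, 2, 2, 3, 1, 2, 3, 2]
control = [3, 3, 2, 3, 2, 2, 3, 3]
h = kruskal_wallis_h(experimental, control)
print(round(h, 3), round(p_value_two_groups(h), 3))
```

With two groups the statistic has one degree of freedom, so the p-value can be obtained from the complementary error function without any external library.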
Secondary analyses
Next, linear regression analyses were used to examine if the experimental manipulation (independent variable) influenced CDS (dependent variable) after controlling for individual differences in general decision-making style (CRT, DSS rational/intuitive subscales). Separate regressions were computed for each case vignette. Finally, logistic regressions controlling for general decision-making style (as above) were used to assess if the experimental manipulation was associated with a higher probability of counter-normative decisions for the specific tasks relating to the second (treatment fidelity) and third (treatment continuation) decision points in each vignette. As the first decision point related to a patient’s suitability for treatment, and all participants decided the patients were suitable, there was no variability in the first decision point. Therefore, the logistic regressions were only applied to the second and third decision points.
Results
Study 1
Experimental and control versions of each of the two clinical case vignettes were co-produced through development, consultation and piloting phases. These case vignettes and further information relating to their development are available in the supplemental material. Sample characteristics were not collected in this small sample in order to safeguard anonymity. The vignettes were deemed suitable to then progress onto Study 2.
Study 2
Sample characteristics
One hundred and thirty-three participants who completed the experimental task were included (excluding n=57 who consented but did not complete the task). The participants worked across 16 counties in England. The majority described their gender as female (86.3%). The mean age was 32.86 years (SD=9.13). A large proportion were either qualified PWPs (54.9%) or senior PWPs (19.6%), with an average of 5 years of clinical experience in the PWP role (M=5.02; SD=3.37).
Primary analysis
Kruskal–Wallis tests indicated no significant differences in CDS between experimental and control conditions in either of the two clinical case vignettes.
Secondary analyses
Linear regression results are reported in Table 1. The regression coefficient for the experimental manipulation was not statistically significant in either model. The only significant predictor in the model was the DSS rational subscale score, which was positively associated with CDS; β=.19, p<.05.
Note to Table 1: *p<.05; B, unstandardized regression coefficient; SE B, standard error of the coefficient; β, standardized coefficient; exp, experimental; con, control.
Logistic regression results are presented in Tables 2 and 3. One of the two logistic regressions (case vignette ‘Chloe’) indicated that the experimental manipulation was significantly associated with a higher probability of counter-normative decisions regarding treatment fidelity. However, this did not replicate in the other case vignette (‘Jack’). One of the two logistic regressions (case vignette ‘Jack’) indicated that the experimental manipulation was significantly associated with a higher probability of normative decisions regarding treatment continuation. However, this did not replicate in the other case vignette (‘Chloe’).
Note to Tables 2 and 3: B, unstandardized regression coefficient; SE, standard error of the coefficient; d.f., degrees of freedom; 95% CI, confidence intervals; exp, experimental; con, control.
Discussion
This study investigated clinical decision making by PWPs working within IAPT stepped-care services in England. The study examined whether specific cognitive processes (anchoring and halo effects) and more general decision-making styles (i.e. rational vs intuitive) influenced decisions commonly encountered in stepped-care clinical practice. To this end, we first produced clinical case vignettes in Study 1 that had face validity with PWPs, resembled realistic clinical cases, and so required PWPs to consider typical dilemmas relating to evidence-based practice. In Study 2, we conducted a randomized crossover experiment and found that clinical care decisions were not significantly influenced by anchoring and halo effects or the general decision-making style (CRT and DSS scales) of PWPs. In short, the decisions regarding suitability, treatment fidelity and treatment continuation made by practising PWPs were not systematically biased by anchoring and halo effects.
It is noteworthy that, in one of the vignettes, the apparent effect of the experimental manipulation was contrary to our predictions, as it apparently increased normative rather than counter-normative decisions. This may be because all participants in the experiment were practising PWPs and therefore would have weekly case management supervision (UCL, 2015). In this type of supervision, PWPs are guided to practise making rational decisions concerning patient care. This practice effect may therefore have dampened the influence of the procedures implemented in the experiment. The normative versus counter-normative ratios suggest that this spurious result may also have been influenced by ‘noise’: variability due to haphazard aspects of the case vignette (e.g. gender, word count, etc.) or occasion (e.g. respondent fatigue, distraction, etc.), rather than the experimental manipulation itself. It is plausible that the case vignettes and priming tasks did not optimally capture the natural variability that may better characterize decision making in routine care. It is likely that there is considerable variability in decisions not only between PWPs but also within PWPs (i.e. over time and across multiple cases). This natural variability or ‘noise’ in professional judgements has been previously documented across various occupations (see Kahneman et al., 2021), and can be modelled by sampling multiple respondents’ judgements across multiple cases (rather than only one or two cases). Data from the present study reconfirmed that clinical decisions can vary from one respondent to another (i.e. some made normative, and others made counter-normative choices). This is consistent with data from previous studies that show wide variability in actual treatment selection decisions made by several psychological professionals across multiple cases in routine practice (e.g. Delgadillo et al., 2017).
Strengths, limitations and future research
This study followed a rigorous, theoretically informed and multi-phase method to design ecologically valid and realistic clinical case vignettes. This process involved PWPs working in the relevant clinical setting and recruited from nation-wide mailing lists. The experimental design had built-in controls for ordering effects and the recruited sample was adequately powered; the actual sample (n=133) was nearly twice as large as the minimum requirement (n=76).
However, the study also had several limitations that could inform the design of future studies. The convenience sampling approach used to recruit participants could be affected by self-selection bias. For example, very busy PWPs may have declined to participate, potentially limiting the ecological representativeness of the sample. The study only collected basic participant characteristics, and these were not linked to data from the experimental task to safeguard anonymity and to promote participation. Therefore, it was not possible to assess if these or other unmeasured features may have influenced decision making during experimental tasks. The CDS metric had a narrow range (0–3), which may have artificially constrained variability.
An analogue approach was employed through the use of vignettes, rather than studying decision making in a naturalistic setting. Whilst strengths of the analogue approach include tighter control of variables, it is acknowledged that participants might have been inclined to respond in a socially desirable manner (Hare-Mustin and Marecek, 1988). Emotional evocation in case vignettes is likely to be less intense than simulated (e.g. role play) or actual clinical encounters. Participants may have felt less connected and empathic towards the analogue patients than ‘real’ patients and this may have increased the likelihood of providing more normative responses. Results, therefore, may not be a true reflection of how PWPs respond in a real-life clinical setting. Future research should therefore seek to address limitations regarding the design of the dynamic measure such as the lack of variability in the scoring system and potential limitations regarding its ecological validity. Employing fictional video-recorded clinical scenarios rather than case vignettes whilst using a forced choice format to control for variability could be a methodological improvement in future studies. Comparing decision-making styles of participants according to their level of experience could also be appropriate.
Prior to this study, clinical decision making of PWPs had not been widely investigated. Therefore, previous research enabling comparison with the findings from this study was not available. Furthermore, literature related to clinical decision making more generally and the use of the case vignette method in research has not been updated for over 10 years. This is a drawback for the present study, especially compared with the relative newness of the PWP role (UCL, 2015). Another limitation relates to the fact that there is no explicit reference to risk in case vignette ‘Jack’, yet risk is more explicit in case vignette ‘Chloe’. This difference was not considered within the analysis. Furthermore, there is frequent reference to ‘Jack’s’ feelings of hopelessness without any specific risk assessment information. This may have influenced clinical decision making in this case vignette. This is a limitation, considering that hopelessness is a key area of risk assessment in clinical practice (Ribeiro et al., 2018) and therefore likely to be picked up on by participants. Whilst the study design was adequate to study systematic bias, the limited number of vignettes presented to each participant did not allow us to adequately model the true extent to which clinical decisions may be ‘noisy’.
Conclusion
Making clinical decisions is a key part of any clinical role and making rational decisions is in the best interest of the patient. PWPs, due to their place in stepped-care IAPT services, make many decisions about clinical care, and therefore minimising existing biases in these decisions is important. Clinical decisions can vary between practitioners who encounter similar clinical scenarios. Often, practitioners make decisions that accord with clinical guidelines, but sometimes decisions can be counter-normative. We found no convincing evidence that variability in decision making was systematically biased by anchoring and halo effects, in a clinical profession that is exposed to a regular check on decisions made (i.e. case management supervision). Variability in clinical decisions observed in routine care and in analogue tasks may be better explained by ‘noise’: individual differences and natural fluctuations in judgement quality.
Supplementary material
The supplementary material for this article can be found at https://doi.org/10.1017/S1352465823000115
Data availability statement
In line with the requirements of the ethics review board for this study, requests for access to data are to be made in writing to the corresponding author.
Acknowledgements
Thank you to all the PWPs who participated.
Author contributions
Benjamin Michael: Conceptualization (equal), Data curation (lead), Formal analysis (equal), Investigation (lead), Methodology (equal), Project administration (lead), Resources (lead), Writing – original draft (lead), Writing – review & editing (lead); Stephen Kellett: Conceptualization (equal), Formal analysis (supporting), Investigation (supporting), Methodology (supporting), Supervision (lead), Writing – original draft (supporting), Writing – review & editing (supporting); Jaime Delgadillo: Conceptualization (equal), Data curation (equal), Formal analysis (equal), Investigation (equal), Methodology (equal), Supervision (supporting), Writing – original draft (supporting), Writing – review & editing (supporting).
Financial support
This research received no specific grant from any funding agency, commercial or not-for-profit sectors.
Competing interests
The authors declare none.
Ethical standards
Informed consent to participate and for the results to be published has been obtained. Authors have abided by the Ethical Principles of Psychologists and Code of Conduct as set out by the BABCP and BPS.