Appraisal is not a new process. It has been used widely in the private business sector since the early 1980s, in a variety of forms and with various aims (Brown et al, 2003). Its adoption in the medical profession can be linked to a number of healthcare reports published in the 1980s. A report published in the USA in 2000 (Kohn et al) confirmed what had long been suspected – that there were unacceptable levels of preventable error in healthcare. That report, along with the events at Bristol Royal Infirmary (2007), the Royal Liverpool Children's Hospital (2007) and the Shipman Inquiry (2004) in the UK, increased public awareness of medical errors, and brought the need to restore public confidence in the medical profession to the forefront of the agendas of the government, the General Medical Council (GMC) and British Medical Association.
The Department of Health report, Supporting Doctors, Protecting Patients, states that ‘It is not the primary aim of appraisal to scrutinise doctors to see if they are performing badly but rather to help them consolidate and improve on good performance aiming towards excellence.’ (Department of Health, 1999).
It would seem, therefore, that appraisal has partly conflicting purposes: regulatory, to regain the public's trust, and developmental, to support and educate doctors.
The inclusion of outcome measures and 360-degree appraisal is recommended in the 2004 report from the Special Committee on Clinical Governance (Roy, 2004). However, cardiac surgeons found great difficulties in interpreting a clear-cut outcome, namely operative mortality (Keogh, 2004). They pointed out that operative mortality was influenced greatly by many factors independent of the individual surgeon: other team members such as the anaesthetist; the quality and proximity of intensive postoperative care; and in particular, case mix. They noted enormous difficulty in adjusting for all those factors. They concluded that the practice of an individual surgeon should be investigated only if his or her unadjusted mortality was very extreme: four or more standard deviations from the mean. Clearly in psychiatry there are additional difficulties with outcome measures: what measure to use, the reliability and validity of that measure, the problem of observer bias, getting adequate numbers for statistical comparisons before the data are out of date, and practical difficulties and expense in collecting unbiased data. Some of those problems also apply to 360-degree appraisal.
The dual purpose of appraisal had led to some disquiet as to its role at a local level. Therefore, we decided to ask consultants in the west of Scotland their opinion on the way appraisal is conducted. We anticipated that respondents would be negative about appraisal because of its proposed links to revalidation. We were particularly interested in their views on outcome measures and the 360-degree appraisal, given the problems those incur, as described above.
Method
We developed a questionnaire which looked at two aspects of consultant appraisal. The first part covered thoughts and feelings it evoked and the second part looked at the practical aspects; each part contained 11 questions. Part one covered the following areas: feelings about appraisal; purpose; feasibility of 360-degree appraisal; inclusion of 360-degree evidence; the use of outcome measures in psychiatry; factors influencing outcome measures; who should conduct appraisal. Part two asked for information on: who conducted appraisal; availability of dedicated time; impact on clinical duties; hours spent; accuracy of data included; training given and its adequacy; whether time limits were met; how stressful the process was and whether it had become easier.
Most questions required yes/no/don't know tick-box responses. For one item, feelings about appraisal, five-point Likert scales were used. The consultants chose one of the following: wholly negative; more negative than positive; neutral; more positive than negative; wholly positive.
In the section on outcome measures, we asked those who thought them to be useful in psychiatry to consider the difficulties with them described above. They were then asked whether they still considered them useful.
We asked also for comments, both about appraisal in general and about particular aspects of it.
The questionnaire was piloted on three consultants from the local hospital and modified, following feedback. A copy is available from the authors.
A list of all the consultant psychiatrists, regardless of specialty, was obtained by phoning the personnel departments and secretaries of general and psychiatric hospitals in the west of Scotland region. The questionnaire was sent out with a covering letter explaining its purpose and anonymous nature. Those who did not respond received a second questionnaire 6 weeks later.
Data analysis
Data were analysed using SPSS version 10.1 for Windows and presented as frequencies and percentages.
Each author independently classified the consultants’ invited freehand comments about appraisal into the same five categories as were used for the Likert scales, ranging from wholly negative to wholly positive. Each was unaware of the other's ratings. Interrater reliability (kappa) was analysed, yielding a satisfactory value of 0.71. Where there was a difference in the authors’ ratings, a consensus rating was obtained by choosing the one closer to (or equal to) neutral.
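The agreement statistic and the consensus rule described above can be illustrated with a minimal sketch. It assumes an unweighted Cohen's kappa (the text does not say whether a weighted form was used), and the function names and example ratings are hypothetical, not the study data.

```python
from collections import Counter

# Ordered categories, matching the five-point scale described in the Method.
CATEGORIES = [
    "wholly negative",
    "more negative than positive",
    "neutral",
    "more positive than negative",
    "wholly positive",
]
NEUTRAL_INDEX = CATEGORIES.index("neutral")


def cohen_kappa(ratings_a, ratings_b):
    """Unweighted Cohen's kappa for two raters' category indices (assumed form)."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(ratings_a)
    # Proportion of comments on which the two raters agree exactly.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Agreement expected by chance from each rater's marginal frequencies.
    counts_a = Counter(ratings_a)
    counts_b = Counter(ratings_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when expected agreement is 1")
    return (observed - expected) / (1.0 - expected)


def consensus(a, b):
    """Return the rating closer to (or equal to) neutral.

    If the two ratings are equally far from neutral on opposite sides,
    the text gives no rule; this sketch simply returns the first rating.
    """
    return min((a, b), key=lambda r: abs(r - NEUTRAL_INDEX))


# Hypothetical ratings for eight comments (indices into CATEGORIES).
rater_1 = [0, 1, 2, 3, 4, 1, 2, 3]
rater_2 = [0, 1, 2, 3, 4, 2, 2, 4]

kappa = cohen_kappa(rater_1, rater_2)
agreed = [consensus(a, b) for a, b in zip(rater_1, rater_2)]
print(f"kappa = {kappa:.2f}")
print([CATEGORIES[i] for i in agreed])
```

Choosing the rating nearer to neutral is a conservative rule: where the raters disagree, the consensus leans away from the extremes, so any imbalance in the final tally is less likely to be produced by a single rater's reading.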
Results
We identified 219 consultants: 15 were unavailable for the study because of maternity leave, long-term sickness or retirement. Of the remaining 204, 158 returned questionnaires after the second posting, yielding a response rate of 77%.
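The response rate follows directly from the counts given above; a short sketch of the arithmetic (variable names are illustrative only):

```python
# Counts reported in the Results.
identified = 219   # consultants identified in the region
unavailable = 15   # maternity leave, long-term sickness or retirement
returned = 158     # questionnaires returned after the second posting

# Denominator is the number of consultants who could have responded.
eligible = identified - unavailable
response_rate = returned / eligible

print(f"eligible = {eligible}, response rate = {response_rate:.0%}")
```

The denominator excludes the 15 unavailable consultants, which is why the rate is calculated on 204 rather than 219.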
Feelings about appraisal
Six per cent of consultants felt wholly negative about appraisal, 27% were more negative than positive, 27% neutral, 33% more positive than negative, 6% wholly positive, and 1% did not answer.
Purposes of appraisal
The answers to questions about the purposes of appraisal are shown in Table 1.
| Purpose of appraisal | Yes (%) | No (%) | No answer (%) |
|---|---|---|---|
| Protection of patients | 53 | 26 | 20 |
| Development of consultants | 72 | 12 | 15 |
| Political gimmick | 57 | 20 | 21 |
Other purposes of appraisal suggested by the consultants included protection of managers, offloading responsibility onto consultants, and avoiding sitting exams for revalidation.
360-degree appraisal
As regards 360-degree appraisal, 27% of the consultants felt that it was feasible compared with 39% who felt it was not and 30% who did not know. Only 20% included 360-degree appraisal information in their folder.
The material used for 360-degree appraisal included letters, cards and information about presents from patients and their relatives, and letters from colleagues and students.
Outcome measures
When asked if Health of the Nation Outcome Scales (HONOS) (Wing, 1998) should be used to measure outcome, 16% agreed, 39% disagreed, 36% did not know and 9% did not answer.
Other outcome measures they suggested include: Center for Outcomes Research, Clinical Global Impression, Hospital Anxiety and Depression Scale, Zung Self-Rating Depression Scale, Brief Psychiatric Rating Scale, Avon Mental Health Measure, and Global Assessment of Functioning.
When asked if collecting outcome measures was useful 60% said yes, 36% no, 1% did not know and 3% did not answer. Of the 60% who said yes, four-fifths still thought outcome measures were useful after being asked to consider the problems with them discussed above. Nearly all (71 of 76) of those who still thought they were useful made additional comments. While many expressed reservations about them and advised caution in their interpretation, several stated that a start had to be made, despite difficulties. Some thought they were useful for service development but not for comparisons among individuals. Others thought they were useful for serial comparisons within their own practice.
Practical aspects of appraisal
Of the respondents 94% had been appraised. The majority preferred to be appraised by clinical directors or consultant colleagues who knew them and understood about their specialty. Only 54% of those appraised had time dedicated for their appraisal, with 38% having to cancel clinical activities. The median time spent preparing for appraisal was 6 hours (interquartile range (IQR) 4–9). Only 22% of those appraised felt they had been provided with accurate data; 68% had been given training, but as many as 75% felt it was adequate. Only 52% of appraisals had been completed by the deadline. Five per cent found the process very stressful, 24% moderately stressful, 43% mildly stressful, and only 22% not stressful; 6% did not answer.
The majority of respondents (75%) had been appraised more than once. Of those, 64% found the process easier but 23% thought it became more difficult.
Additional comments about any aspect of appraisal
We received comments from 91 participants (58%). They were more negative than positive: 26 wholly negative, 23 more negative than positive, 21 neutral, 11 more positive than negative, and 10 wholly positive. (For examples of consultants’ comments see the online data supplement to this article).
Discussion
Since we received responses from 77% of our target population of consultant psychiatrists in the west of Scotland, we can have reasonable confidence in the results.
Our initial assumptions about appraisal were not confirmed by the study. There was a roughly Gaussian distribution of feelings about appraisal, with a very slight bias towards positive feelings. However, in the 58% of respondents who chose to make comments about appraisal, there was a clear bias towards negative ones.
Consultant appraisal was seen more as a means of developing consultants rather than protecting patients. Its political purpose was recognised by many.
There were some problems with the practical aspects of appraisal. While for most the process was not unduly stressful, whether undertaken for the first time or subsequently, there were some problems with dedicated time and with clinical activities having to be cancelled. More seriously, the vast majority reported they had not received accurate data from the management. Similar difficulties were described in a report by the Royal College of Physicians commissioned in 2006 by the Department of Health and the Welsh Assembly (Croft, 2006).
Irrespective of the opinions of the consultants we surveyed, it seems (at least in England) that appraisal will be used increasingly in a regulatory way as a means of revalidating doctors. The 2006 report by the Department of Health (Department of Health, 2006) and the subsequent government White Paper for England (Department of Health, 2007) proposed that revalidation have two components: relicensure, for all practising doctors, and recertification for specialists and general practitioners, and that ‘Both relicensure and recertification depend on an objective assessment of doctors against clear standards.’ (Department of Health, 2007).
One of the conditions for relicensure is that the ‘doctor has participated in an independent 360-degree feedback exercise in the workplace’ (Department of Health, 2007). It is of interest then that only 27% of our respondents felt the 360-degree appraisal was feasible, the main concern being the validity and reliability of the measurement tool, with even fewer (20%) including such material in their appraisal folders.
The data were collected before the 360-degree Appraisal for Consultant Psychiatrists (ACP360), an instrument for consultant appraisal developed by the Royal College of Psychiatrists, was established (Royal College of Psychiatrists, 2005). Clearly the ACP360 is a huge advance on letters and cards from other people. It compares the self-ratings of the consultants with those of colleagues and patients; it has been found to be reliable, but there are still some questions about its validity.
The GMC is also developing an instrument for 360-degree appraisal, but at the time of writing it is only at a pilot stage (General Medical Council, 2007). Sir Liam Donaldson in his report (Department of Health, 2006) points out that there is no agreed definition of a good doctor. So again the question of validity of the appraisal tools arises. Furthermore, at what point does a doctor become bad rather than good, and at what point does the doctor lose his or her licence? If the right to practise is at risk, then instruments to assess it need more than face validity.
There is a drive at both national and international levels to use outcome measures in routine clinical care (Department of Health, 1991, 1998, 1999, 1999; Trauer, 2003), and yet it has been shown that psychiatrists do not use these (Gilbody et al, 2002). It is then surprising that 60% of consultants in our study felt that collecting outcome measures was useful, indicating a willingness to do so despite uncertainty about which ones to use. About half of the consultants still believed outcome measures to be useful even after considering the problems with their reliability and validity. However, the comments they made showed that many recognised practical difficulties.
The 2007 White Paper proposes that, ‘Ideally, recertification will be supported by information that shows how clinically effective each doctor's treatment of his or her patients has been’ (Department of Health, 2007). It suggests adjustment of outcomes for case mix, with ‘robust clinical audit’ becoming ‘in time… an important component of recertification for most specialties’ (op. cit.).
There are clearly great problems in setting, with the full confidence of the profession, objective standards against which to validate relicensure and recertification. The results of our survey confirm that there is a long way to go.
Declaration of interest
None.
Acknowledgements
We thank our consultant colleagues in the west of Scotland region.