
Quantifying rater drift on the HAM-D in a sample of standardized rater training events: Implications for reliability and sample size calculations

Published online by Cambridge University Press:  16 April 2020

B. Rothman
Affiliation:
ProPhase LLC, New York, NY, USA
C. Yavorsky
Affiliation:
ProPhase LLC, New York, NY, USA
A. De Fries
Affiliation:
ProPhase LLC, New York, NY, USA
J. Gordon
Affiliation:
ProPhase LLC, New York, NY, USA
M. Opler
Affiliation:
ProPhase LLC, New York, NY, USA; New York University, New York, NY, USA

Abstract

Introduction/objectives/aims

Although rater drift in clinical trials has long been understood to adversely affect trial results, few studies have systematically quantified it. We examined training data for the 17-item HAM-D (Hamilton Depression Scale) at two time points to measure the magnitude of drift.

Methods

Raters participating in a standardized training program scored the HAM-D from two videotaped interviews of depressed patients. To assess drift, data from an initial session held after online training were compared with data obtained 12 months later. Intra-class correlation coefficients (Shrout & Fleiss, 1979) and concordance with expert ratings were compared across the two time points.
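As a sketch of the reliability metric, the function below computes one of the Shrout & Fleiss (1979) forms, ICC(2,1) (two-way random effects, absolute agreement, single rater), from its ANOVA mean squares. The abstract does not state which ICC form was used or how the rating matrix was laid out; choosing ICC(2,1), treating each row as one scored target (for example, one HAM-D item on one videotape) with one column per rater, and the helper name `icc_2_1` are all assumptions for illustration.

```python
def icc_2_1(ratings):
    """ICC(2,1) after Shrout & Fleiss (1979): two-way random effects,
    absolute agreement, single rater.

    Assumed layout (hypothetical): `ratings` is a list of rows, one row per
    target (e.g. one HAM-D item score on one tape), one column per rater.
    """
    n = len(ratings)  # number of targets
    if n < 2:
        raise ValueError("need at least two targets")
    k = len(ratings[0])  # number of raters
    if k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("need a complete n x k matrix with k >= 2")

    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Mean squares from the two-way ANOVA decomposition
    msr = k * sum((m - grand) ** 2 for m in row_means) / (n - 1)  # targets
    msc = n * sum((m - grand) ** 2 for m in col_means) / (k - 1)  # raters
    sse = sum(
        (ratings[i][j] - row_means[i] - col_means[j] + grand) ** 2
        for i in range(n)
        for j in range(k)
    )
    mse = sse / ((n - 1) * (k - 1))  # residual

    # Systematic rater offsets (msc) lower absolute agreement
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)
```

Because this form penalizes systematic differences between raters, a rater who scores every item one point higher than the expert lowers ICC(2,1) even though the rank order is preserved; a consistency form such as ICC(3,1) would not penalize that offset.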

Results

Intra-class correlation coefficients (ICCs) for raters (n = 167) following initial training were good to excellent for individual raters (.695–.976, p < .0001) and good for the overall cohort (.752, p < .0001). Concordance with expert ratings was excellent at 99.3%. At the second assessment the overall ICC fell to .730; although the upper bound of individual performance remained in the good to excellent range, the frequency of individual ICCs in the poor to fair range (< .65) increased. Concordance also fell, from 99.3% to 87%.

Conclusions

Rater drift occurred over 12 months, as measured by both reliability and concordance. Drift was apparent in only a limited portion of the cohort but lowered the overall ICC at the second time point. Because studies are generally powered on the assumption that the ICC remains stable, drift has implications for both the power calculation and the required sample size.
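One way to make the sample-size implication concrete is the classical measurement-error model, in which an outcome with reliability ρ attenuates the standardized effect from d to d·√ρ, so the required sample size scales with 1/ρ. The sketch below applies this to a two-arm, two-sided comparison of means using the normal approximation. Treating the cohort ICC as the outcome's reliability, the function name `n_per_arm`, and the effect size in the usage note are illustrative assumptions, not values or methods from this study.

```python
from math import ceil
from statistics import NormalDist


def n_per_arm(true_effect_size, reliability, alpha=0.05, power=0.80):
    """Per-arm n for a two-sample, two-sided comparison of means
    (normal approximation), with the standardized effect attenuated
    by outcome unreliability: d_observed = d_true * sqrt(reliability).

    Hypothetical helper; `reliability` stands in for the rater ICC.
    """
    if not 0 < reliability <= 1:
        raise ValueError("reliability must be in (0, 1]")
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for two-sided alpha
    z_beta = z.inv_cdf(power)           # quantile for the desired power
    d_obs = true_effect_size * reliability ** 0.5
    return ceil(2 * (z_alpha + z_beta) ** 2 / d_obs ** 2)
```

With a hypothetical true effect size of d = 0.4, α = .05, and 80% power, this sketch gives 99 subjects per arm at perfect reliability, 131 per arm at ICC = .752, and 135 per arm at ICC = .730. Under this model, the drop from .752 to .730 raises the required n by a factor of .752/.730, or about 3%.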

Type
P02-88
Copyright
Copyright © European Psychiatric Association 2011