To the Editor— I was interested to read the paper by Kerlin et al [1] published in the February 2017 issue of Infection Control and Hospital Epidemiology. The authors compared interrater reliabilities for ventilator-associated event (VAE) surveillance, traditional ventilator-associated pneumonia (VAP) surveillance, and clinical diagnosis of VAP by intensivists. In total, 150 charts from intensive care units (ICUs) within 5 hospitals, including all VAEs and traditionally defined VAPs identified during the primary study and randomly selected charts of patients without VAEs or VAPs, were selected for review. All charts were independently reviewed by 2 research assistants (RAs) for VAEs, by 2 hospital infection preventionists (IPs) for traditionally defined VAP, and by 2 intensivists for any episodes of pulmonary deterioration [1].
Based on their results, 93–96 VAEs were identified by the RAs, 31–49 VAPs were identified by the IPs, and 29–35 VAPs were diagnosed by the intensivists. Interrater reliability between RAs for VAEs was high (κ, 0.71) [1]. The correlation between VAE surveillance and the intensivists' clinical assessments was poor.
It is crucial to know that using the κ value to assess agreement is a common mistake in reproducibility analysis. There are 2 important weaknesses of using a κ value to assess agreement of a qualitative variable. First, it depends on the prevalence in each category, which means that it is possible to obtain different κ values with the same percentages of concordant and discordant cells. Second, the κ value depends on the number of categories [2–5]. In such situations, a weighted κ is the preferable test because it gives an unbiased result. Moreover, for reliability analysis, an individual-based approach should be applied rather than a global average, which is usually applied for assessing the validity (accuracy) of a test [2–5]. Finally, reproducibility (ie, precision, reliability, repeatability, calibration) and validity (ie, accuracy, discrimination) are completely different methodological issues that should be assessed using appropriate tests [6–10]. To assess validity, besides sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV), the most appropriate measures are the positive and negative likelihood ratios, as well as diagnostic accuracy and the diagnostic odds ratio [6–10].
Kerlin et al concluded that prospective surveillance using VAE criteria is more reliable than traditional VAP surveillance and clinical VAP diagnosis and that the correlation between VAEs and clinically recognized pulmonary deterioration is poor [1]. Such a conclusion may be misleading because of the inappropriate use of statistical tests to assess reliability and validity.
ACKNOWLEDGMENTS
Financial support: No financial support was provided relevant to this article.
Potential conflicts of interest: All authors report no conflicts of interest relevant to this article.