Hospital-acquired pneumonia (HAP) is the most common hospital-acquired infection worldwide and is associated with high morbidity, mortality, and healthcare costs.Reference Giuliano, Baker and Quinn1–Reference Micek, Chew, Hampton and Kollef3 Non–ventilator-associated HAP (NV-HAP) accounts for most HAP cases,Reference Giuliano, Baker and Quinn1 but prevention measures are hindered by the difficulty of measuring and tracking HAP incidence and outcomes using current definitions. Clinical and surveillance definitions for HAP are subjective, complex, and ambiguous because of the uncertainty inherent in the diagnosis of pneumonia.Reference Horan, Andrus and Dudeck4–Reference Ramirez Batlle, Klompas and Program7 Prior surveillance efforts using administrative claims data, chart review, or even histologic definitions have historically demonstrated poor sensitivity, low reproducibility, and only moderate accuracy.Reference van Mourik, van Duijn, Moons, Bonten and Lee5,Reference Klompas8–Reference Kerlin, Trick and Anderson10 More objective, consistent, and efficient surveillance may be feasible using readily available information in the electronic health record (EHR), including vital signs, oxygenation data, administration of antibiotics, and chest imaging.Reference Ji, McKenna and Ochoa6,Reference Ramirez Batlle, Klompas and Program7 However, this approach requires validation. We aimed (1) to implement an electronic surveillance definition for NV-HAP in a large healthcare system using granular clinical data for case detection rather than diagnosis codes or claims data, and (2) to conduct detailed chart reviews on a random subset of patients to assess the reliability, validity, and overlap between electronic surveillance and other existing NV-HAP definitions.
Methods
Setting and participants
We retrospectively applied an electronic NV-HAP surveillance definition to all hospitalizations at acute-care facilities within the Veterans’ Affairs (VA) healthcare system between January 1, 2015, and November 30, 2020, in patients aged ≥18 years. The VA network is the largest integrated healthcare network in the United States and includes 152 VA medical centers in all 50 US states.11 The study was approved by the Veterans’ Health Administration (VHA) and University of Utah Institutional Review Boards.
Electronic NV-HAP surveillance definition
The electronic surveillance definition was designed to identify nonventilated patients with new respiratory deterioration (≥2 days of decreased oxygen saturation or increase in supplemental oxygen after ≥2 days of stable or improving oxygenation) and concurrent fever or leukocytosis, performance of chest imaging, and the initiation of new antibiotics continued for at least 3 days (Table 1).Reference Ji, McKenna and Ochoa6 This definition was previously pilot tested in 4 hospitals, where it generated credible NV-HAP incidence and mortality estimates and was shown to detect pneumonia with accuracy similar to that of the Centers for Disease Control and Prevention’s National Healthcare Safety Network (CDC-NHSN) PNU1 surveillance criteria for NV-HAP.Reference Ji, McKenna and Ochoa6,Reference Ramirez Batlle, Klompas and Program7 Complete details and SAS code describing data extraction and definitions are available in Supplementary Appendix 1 (Supplementary Table 2: Oxygen Device Hierarchy, Supplementary Table 3: Definition of New Antibiotic, and SAS Code online). Patient demographics, comorbidities, and clinical outcomes were extracted using previously validated methods.Reference Charlson, Pompei, Ales and MacKenzie12 Electronic health record data were accessed through the Veterans’ Informatics and Computing Infrastructure, a platform that stores VA clinical data for research purposes.13
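The logic of this definition can be illustrated with a minimal sketch. It is not the authors' SAS implementation: the `HospitalDay` fields, the ordinal `support_level` score, and the choice to look for the concurrent criteria within the 2 deterioration days are all assumptions made for illustration, and the actual thresholds for fever and leukocytosis (Table 1) are represented only as a precomputed flag.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class HospitalDay:
    # All fields are hypothetical, precomputed daily summaries.
    support_level: int           # ordinal oxygenation score; higher = worse (assumption)
    fever_or_leukocytosis: bool  # thresholds are defined in Table 1, not reproduced here
    chest_imaging: bool
    new_antibiotic_days: int     # days a new antibiotic begun this day was continued (0 = none)


def first_nvhap_event(days: List[HospitalDay]) -> Optional[int]:
    """Return the index of the first day that starts a qualifying event, else None."""
    for i in range(2, len(days) - 1):
        baseline = days[i - 1].support_level
        # >=2 days of stable or improving oxygenation before day i
        stable_before = baseline <= days[i - 2].support_level
        # >=2 days of worse oxygenation than baseline starting on day i
        worse_after = (days[i].support_level > baseline
                       and days[i + 1].support_level > baseline)
        window = days[i:i + 2]  # assumed concurrency window: the 2 deterioration days
        if (stable_before and worse_after
                and any(d.fever_or_leukocytosis for d in window)
                and any(d.chest_imaging for d in window)
                and any(d.new_antibiotic_days >= 3 for d in window)):
            return i
    return None


# Hypothetical stay: stable for 3 days, then deterioration with all criteria on day 3.
stay = [HospitalDay(1, False, False, 0) for _ in range(3)]
stay += [HospitalDay(3, True, True, 4), HospitalDay(3, False, False, 0),
         HospitalDay(2, False, False, 0)]
print(first_nvhap_event(stay))
```

Returning only the first qualifying day mirrors the rule that a hospitalization contributes at most one electronic surveillance event.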
Table 1 note. WBC, white blood cell count.
Table 2 note. PABAK, prevalence-adjusted, bias-adjusted κ; CI, confidence interval; CDC, Centers for Disease Control and Prevention; NHSN, National Healthcare Safety Network; NV-HAP, non–ventilator-associated hospital-acquired pneumonia; N/A, not available.
a Each chart was reviewed by 2 clinicians. Estimates and 95% confidence intervals are shown.
b Of the 215 reviewed cases occurring after October 1, 2015.
Claims-based NV-HAP definition
We assessed the overlap between the electronic NV-HAP definition and a claims-based NV-HAP definition used by a VA quality improvement initiative. The claims-based criteria defined NV-HAP as the presence of a primary or secondary discharge diagnosis code for pneumonia (ICD-10 codes B95.3, B96.0, J13, J15.X, J16.X, J17.X, J18.X, J84.111, J84.116, J84.117, J84.2, J85.1, and J85.2) that was not present on admission.Reference Carey, Blankenhorn, Chen and Munro14
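As a rough illustration of how such a claims-based rule could be applied, the sketch below checks a list of discharge diagnoses against the code list above. The input format (dotted ICD-10 codes paired with a present-on-admission flag, where `"N"` means not present on admission) and the function names are assumptions, and the `.X` entries are interpreted as "any code in that category".

```python
from typing import Iterable, Tuple

# Codes listed individually in the claims-based definition.
EXACT_CODES = {"B95.3", "B96.0", "J13", "J84.111", "J84.116",
               "J84.117", "J84.2", "J85.1", "J85.2"}
# Categories written as J15.X, J16.X, J17.X, J18.X (any subcode is assumed to count).
CODE_FAMILIES = {"J15", "J16", "J17", "J18"}


def is_pneumonia_code(code: str) -> bool:
    """True if a dotted ICD-10 code is on the pneumonia list."""
    code = code.strip().upper()
    return code in EXACT_CODES or code.split(".")[0] in CODE_FAMILIES


def meets_claims_definition(diagnoses: Iterable[Tuple[str, str]]) -> bool:
    """diagnoses: (icd10_code, poa_flag) pairs; 'N' is assumed to mean not present on admission."""
    return any(is_pneumonia_code(code) and poa.strip().upper() == "N"
               for code, poa in diagnoses)


# Hypothetical discharge record: heart failure on admission, pneumonia arising in hospital.
print(meets_claims_definition([("I50.9", "Y"), ("J18.9", "N")]))
```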
Medical record review
We randomly selected 250 hospitalizations meeting the electronic NV-HAP surveillance definition for medical record review. Each case was independently reviewed by 2 of 3 clinician reviewers (S.E.S., M.A.C., and B.E.J.). Reviewers utilized a guide that specified a structured, standardized review process (Supplementary Appendix online). Reviewers underwent an iterative adjudication and training process with 4 batches of 10 charts per reviewer before beginning the formal case reviews for the study. Reviewers first confirmed the presence of worsening oxygenation during a 2-day period surrounding the potential NV-HAP index date (Supplementary Tables 1 and 2 online). They then reviewed all clinical notes and imaging results to assess for each of the following aspects: (1) whether the patient experienced a clinical deterioration, or a qualitative worsening of the clinical status, according to the reviewer; (2) CDC-NHSN PNU1 surveillance criteria15; (3) whether the treating clinician diagnosed NV-HAP; (4) whether the discharge summary mentioned a diagnosis of pneumonia; and (5) the reviewer’s net clinical impression of whether NV-HAP was suspected, possible, or unlikely based on the totality of data available (the patient’s clinical trajectory, vital signs, imaging, microbiology, response to treatment if provided, and whether there was an alternative diagnosis). CDC-NHSN criteria were modified to provide a more specific definition of criteria for oxygen deterioration and infiltrate on chest imaging (Supplementary Appendix 2 online). Reviewers also provided a summary narrative of the case, including their determination of the most likely etiology of clinical deterioration if present.
Statistical analysis
Among the entire population, we calculated the incidence of NV-HAP using both the electronic surveillance criteria and the claims-based criteria per 100 hospitalizations and 1,000 hospital days. For each hospitalization, only the first electronic surveillance event was counted. For the claims-based definition, we calculated the incidence only among hospitalizations occurring after October 1, 2015, the date when conversion from International Classification of Diseases, Ninth Revision (ICD-9) to ICD-10 codes and adoption of present-on-admission codes occurred.
Among the 250 cases, we assessed the interrater reliability of the reviewer assessments between the 2 reviewers for each of the 5 clinical definitions assessed by the reviewers: clinical deterioration, CDC-NHSN criteria, reviewer assessment of NV-HAP, treating clinician assessment of NV-HAP, and pneumonia diagnosis present in discharge summary. We calculated simple agreement (the number of cases in which both reviewers agreed divided by the total number of cases), the Cohen κ (kappa) statistic, and prevalence-adjusted bias-adjusted κ (PABAK). PABAK estimates the proportion of agreement beyond chance after adjusting for prevalence and bias. It provides more stable estimates of interrater reliability than the Cohen κ when a finding is very rare or very frequent, situations in which the Cohen κ can give paradoxical results.Reference Byrt, Bishop and Bias16
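The three agreement measures can be sketched for a yes/no rating as follows. This is an illustrative Python version, not the R code used in the study; the function name and the toy ratings are hypothetical. For two categories, PABAK equals twice the observed agreement minus 1.

```python
from typing import Sequence, Tuple


def agreement_stats(r1: Sequence[bool], r2: Sequence[bool]) -> Tuple[float, float, float]:
    """Return (simple agreement, Cohen kappa, PABAK) for paired binary ratings."""
    if len(r1) != len(r2) or len(r1) == 0:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(r1)
    po = sum(a == b for a, b in zip(r1, r2)) / n   # observed agreement
    p1 = sum(r1) / n                               # reviewer 1 "yes" rate
    p2 = sum(r2) / n                               # reviewer 2 "yes" rate
    pe = p1 * p2 + (1 - p1) * (1 - p2)             # agreement expected by chance
    kappa = float("nan") if pe == 1 else (po - pe) / (1 - pe)
    pabak = 2 * po - 1                             # two-category PABAK
    return po, kappa, pabak


# Hypothetical example in which "yes" is very frequent: the reviewers agree on
# 8 of 10 charts, yet the Cohen kappa is slightly negative while PABAK is positive.
reviewer_1 = [True] * 9 + [False]
reviewer_2 = [True] * 8 + [False, True]
po, kappa, pabak = agreement_stats(reviewer_1, reviewer_2)
print(f"agreement={po:.2f} kappa={kappa:.2f} PABAK={pabak:.2f}")
```

The toy data show the paradox described above: high raw agreement with a skewed prevalence drives the chance-expected agreement up and the Cohen κ down.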
We calculated the positive predictive value (PPV) of the electronic surveillance definition against each definition as the percent of cases identified by electronic NV-HAP surveillance criteria that were also positive according to (1) both reviewers, and (2) at least 1 reviewer. For reviewer assessment of NV-HAP, both “NV-HAP suspected” and “NV-HAP possible” were treated as a diagnosis of NV-HAP according to a reviewer. We created a matrix plot of intersecting sets using UpSetR to visualize the degree to which the electronic surveillance definition and the 6 existing definitions overlap with each other.Reference Conway, Lex and Gehlenborg17 All statistical analyses were performed using RStudio version 1.4 software (RStudio, PBC, Boston, MA, 2021).18
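The two PPV variants can be sketched as follows, assuming each flagged case carries one label per reviewer. The label strings and function names are hypothetical; the mapping of "suspected" and "possible" to a positive assessment follows the rule stated above.

```python
from typing import List, Tuple

POSITIVE_IMPRESSIONS = {"suspected", "possible"}  # "unlikely" counts as negative


def is_positive(impression: str) -> bool:
    """Map a reviewer's net clinical impression to a yes/no NV-HAP call."""
    return impression.strip().lower() in POSITIVE_IMPRESSIONS


def ppv_both_and_any(cases: List[Tuple[str, str]]) -> Tuple[float, float]:
    """Return (PPV requiring both reviewers, PPV requiring at least 1 reviewer).

    Every case in `cases` is assumed to have been flagged by electronic surveillance,
    so the denominator is simply the number of reviewed cases.
    """
    if not cases:
        raise ValueError("no reviewed cases")
    both = sum(is_positive(a) and is_positive(b) for a, b in cases)
    either = sum(is_positive(a) or is_positive(b) for a, b in cases)
    return both / len(cases), either / len(cases)


# Hypothetical reviews of 4 flagged cases.
reviews = [("suspected", "possible"), ("possible", "unlikely"),
           ("unlikely", "unlikely"), ("suspected", "suspected")]
print(ppv_both_and_any(reviews))
```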
Analysis of sources of discordance
Among the cases in which the 2 clinician reviewers (S.E.S. and B.E.J.) disagreed on whether a patient had NV-HAP according to CDC-NHSN criteria, reviewer assessment, or clinician documentation, the 2 reviewers conducted independent secondary reviews to identify sources of disagreement. These secondary reviews, which were free-text entries, were then classified by the reviewers together into categories to identify and explore the discrepancies posed by human review methods. False-positive NV-HAP cases identified by the electronic NV-HAP surveillance definition but in which both reviewers felt that NV-HAP was unlikely were secondarily reviewed in a similar fashion: the 2 clinician reviewers reviewed the cases again for alternative causes of the clinical event identified by the surveillance definition and classified cases into categories defined in the preliminary review.
Results
Implementation of electronic surveillance definition
Among 3.1 million hospitalizations and 17.9 million hospital days, 2.3 million hospitalizations had a length of stay ≥3 days and 14,023 met the electronic surveillance definition for NV-HAP, for an incidence of 0.45 per 100 admissions and 0.78 per 1,000 hospital days. Among the 2.7 million hospitalizations occurring after October 1, 2015, 11,264 cases of NV-HAP were detected using the claims-based definition, for an incidence of 0.42 per 100 admissions and 0.73 per 1,000 hospital days (Fig. 1).
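The reported rates can be reproduced from the counts above with a short calculation. The sketch uses the rounded denominators given in the text, so it only confirms the figures to 2 decimal places; the function name is illustrative.

```python
def incidence_per(events: int, denominator: float, per: int) -> float:
    """Events per `per` units of the denominator (admissions or hospital days)."""
    if denominator <= 0:
        raise ValueError("denominator must be positive")
    return events / denominator * per


# Electronic surveillance definition: 14,023 events.
print(round(incidence_per(14_023, 3.1e6, 100), 2))     # per 100 admissions -> 0.45
print(round(incidence_per(14_023, 17.9e6, 1000), 2))   # per 1,000 hospital days -> 0.78

# Claims-based definition: 11,264 events among 2.7 million hospitalizations.
print(round(incidence_per(11_264, 2.7e6, 100), 2))     # per 100 admissions -> 0.42
```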
Variability among reviewers
Among the 250 cases selected for medical record review, interrater reliability between 2 reviewers was moderate for CDC-NHSN criteria, NV-HAP according to a reviewer assessment, and NV-HAP according to a treating clinician, with simple agreement ranging from 75% to 82% with PABAK ranging from 0.50 to 0.64 (Table 2). Interrater reliability was highest for presence of pneumonia in discharge summary and presence of clinical deterioration (86% and 89% by simple agreement and PABAKs of 0.72 and 0.78, respectively).
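For a two-category rating, PABAK is a linear function of the observed proportion of agreement $P_o$, so the reported values can be checked directly against the simple agreement figures. The following is a worked check rather than part of the original analysis:

```latex
\[
\mathrm{PABAK} = 2P_o - 1
\]
\[
2(0.75) - 1 = 0.50, \qquad 2(0.82) - 1 = 0.64, \qquad
2(0.86) - 1 = 0.72, \qquad 2(0.89) - 1 = 0.78
\]
```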
Medical record review
The electronic surveillance definition for NV-HAP had moderate PPV compared to multiple definitions of NV-HAP by medical record review (Fig. 2, left margin, and Table 2). Clinical deterioration was deemed present in nearly all cases of electronic NV-HAP (87% by both reviewers, and 98% by at least 1 reviewer). CDC-NHSN criteria were met in 42% of cases according to both reviewers and in 67% of cases according to at least 1 reviewer. NV-HAP was present by reviewer assessment in 50% according to both reviewers and 71% according to at least 1 reviewer, and NV-HAP according to a treating clinician was present in 42% according to both reviewers and 60% according to at least 1 reviewer. A pneumonia diagnosis was listed in the discharge summary in less than half of all cases (35% according to both reviewers, 49% according to at least 1 reviewer). Among the 215 cases occurring after October 1, 2015, only 7.9% of reviewed patients were also identified by the claims-based definition (Table 2).
We found substantial but imperfect overlap between the existing definitions of NV-HAP in our medical record review (Fig. 2). Ten cases were positive by all 6 definitions, and 79 cases met all definitions except for the claims-based definition. Collectively, 206 (82%) of 250 cases met at least 1 of the reviewed definitions of NV-HAP (CDC-NHSN criteria, reviewer, clinician, or discharge summary diagnosis). Incorporating the claims-based definition in addition to chart review did not identify additional cases. There was more overlap between clinical criteria and bedside clinician diagnosis than there was with discharge or claims-based diagnosis: 123 cases had clinical deterioration, CDC-NHSN criteria, NV-HAP according to reviewer, and treating clinician diagnosis, versus 99 cases with these clinical criteria and a discharge summary or claims-based diagnosis. Moreover, 24 cases met all clinical criteria of NV-HAP by CDC-NHSN and reviewer diagnosis and clinical deterioration but lacked a diagnosis of pneumonia in the medical record according to treating clinician, discharge summary, or diagnostic coding.
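Overlap counts of this kind can be tallied in two ways: exclusively (cases meeting exactly a given combination of definitions, as in the bars of an UpSet-style plot) or inclusively (cases meeting at least a given set of definitions). The sketch below illustrates both with hypothetical data; the definition labels and function names are assumptions, and the study itself used the UpSetR package in R.

```python
from collections import Counter
from typing import Dict, FrozenSet, Iterable, List, Set


def exclusive_counts(cases: Iterable[Set[str]]) -> Dict[FrozenSet[str], int]:
    """Count cases by the exact combination of definitions each one meets."""
    return dict(Counter(frozenset(c) for c in cases))


def inclusive_count(cases: Iterable[Set[str]], required: Set[str]) -> int:
    """Count cases meeting at least every definition in `required`."""
    return sum(1 for c in cases if required <= set(c))


# Hypothetical cases, each described by the set of definitions it meets.
cases: List[Set[str]] = [
    {"deterioration", "cdc_nhsn", "reviewer", "clinician", "discharge", "claims"},
    {"deterioration", "cdc_nhsn", "reviewer", "clinician", "discharge"},
    {"deterioration", "cdc_nhsn", "reviewer"},
    {"deterioration"},
    set(),
]

clinical = {"deterioration", "cdc_nhsn", "reviewer"}
print(inclusive_count(cases, clinical))                    # cases meeting all 3 clinical criteria
print(exclusive_counts(cases)[frozenset(clinical)])        # cases meeting exactly those 3
print(sum(1 for c in cases if c & {"cdc_nhsn", "reviewer", "clinician", "discharge"}))
```

The last line mirrors the "at least 1 of the reviewed definitions" tally: a case counts if its set intersects the listed definitions at all.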
Sources of discordance between reviewers
Among 168 cases in which at least 1 reviewer thought CDC-NHSN criteria were met, there were 62 cases (37%) in which reviewers disagreed on the CDC-NHSN criteria. The most common source of discordance between reviewers was interpretation of chest imaging reports (60%). Among 178 cases in which at least 1 reviewer believed NV-HAP was present, there were 54 cases (30%) in which reviewers disagreed on the NV-HAP diagnosis; the most common source of discordance was the interpretation of chest imaging reports (56%). Among 151 cases in which at least 1 reviewer believed the treating clinicians diagnosed NV-HAP, there were 45 cases (30%) in which the reviewers disagreed on whether treating clinicians diagnosed NV-HAP. Discordance was due to differences in clinician attribution of deterioration between reviewers, including sepsis, aspiration, and pulmonary edema.
Sources of false-positive NV-HAP determinations
Among the 250 cases flagged by electronic NV-HAP surveillance criteria that underwent medical record review, both reviewers deemed that NV-HAP was not present in the final review in 72 cases (29%). Of these, 26 cases (36%) were attributable to perioperative airway management with increased respiratory support and antibiotics. The other false-positive results were attributable to sepsis or acute respiratory distress syndrome not caused by pneumonia (N = 22, 31%), community-acquired pneumonia or pneumonia that was present on arrival (N = 6, 8%), heart failure or pulmonary edema (N = 5, 7%), airway protection related to encephalopathy (N = 5, 7%), cardiac arrest (N = 2, 3%), chronic obstructive pulmonary disease or asthma (N = 2, 3%), VAP for which the existence of mechanical ventilation was not documented (N = 3, 4%), and progression of malignancy (N = 1, 1%).
Discussion
In a detailed chart review analysis of an electronic surveillance definition of NV-HAP using clinical data in a large healthcare system, we found moderate correlation between an electronic NV-HAP definition and existing manual surveillance criteria. The PPV was as high as 82% using the most permissive definition (NV-HAP according to either CDC-NHSN criteria, reviewer assessment, treating clinician diagnosis, or discharge summary diagnosis according to at least 1 reviewer) but as low as 42% using the strictest definition (both reviewers agreed that CDC-NHSN criteria were met). In contrast, a claims-based strategy to identify NV-HAP using diagnosis codes detected <10% of patients flagged by the electronic surveillance definition and correlated poorly with other definitions. The variable PPV of the electronic surveillance criteria mirrors the high rates of reviewer variability that we found in all strategies for identifying clinical diagnoses of NV-HAP. Agreement levels between reviewers were moderate regardless of whether reviewers were applying formal CDC-NHSN criteria (PABAK, 0.50), assessing whether bedside clinicians diagnosed NV-HAP (PABAK, 0.64), or assessing whether the discharge summary documented pneumonia (PABAK, 0.72). These findings underscore the complexity and subjectivity of NV-HAP diagnosis and surveillance using manual chart review, even when trained reviewers apply formal criteria.
The moderate accuracy and reviewer variability that we detected for the electronic criteria is similar to that of other definitions used to identify hospital-acquired pneumonia, including facility reporting, diagnosis codes, and medical record review. In a retrospective chart review by See et al,Reference See, Chang and Gualandi19 CDC medical epidemiologists independently reviewed 250 cases reported to the CDC-NHSN with pneumonia or lower respiratory tract infection and found that 8% of reported adult pneumonia cases did not meet CDC-NHSN criteria for NV-HAP and that 15% lacked clinician diagnoses. Similarly, Wolfensberger et alReference Wolfensberger, Meier, Kuster, Mehra, Meier and Sax20 found that the PPV of ICD codes for NV-HAP was 35% and sensitivity was 59% compared to validated surveillance definitions. A systematic review summarizing the accuracy of diagnosis codes for NV-HAP reported similar performance, with sensitivity and specificity of 40% compared to clinical review.Reference van Mourik, van Duijn, Moons, Bonten and Lee5
To address problems with diagnosis codes and manual evaluation of medical records, others have begun to develop methods that either augment or replace these approaches. Wolfensberger et alReference Wolfensberger, Jakob and Faes Hesse21 validated a semi-automated surveillance system for NV-HAP. By using an EHR-based surveillance definition to identify patients at risk for NV-HAP, they were able to rule out NV-HAP in 94% of patients and significantly reduce the workload of manual review with high sensitivity and negative predictive value (NPV).Reference Wolfensberger, Jakob and Faes Hesse21 Consistent with our findings, Ramirez Batlle et alReference Ramirez Batlle, Klompas and Program7 found that electronic surveillance criteria for NV-HAP and CDC-NHSN criteria had similar accuracy relative to expert chart review at a single center among 120 cases with oxygen deterioration. The electronic surveillance definition demonstrated sensitivity of 71%, PPV of 48%, and NPV of 90%. The CDC-NHSN definition demonstrated sensitivity of 61%, PPV of 59%, and NPV of 88%.Reference Ramirez Batlle, Klompas and Program7 These findings raise the possibility that EHR-based surveillance strategies could improve reproducibility and efficiency without dramatically reducing accuracy.
We detected substantial reviewer variability, despite adhering to a rigorous training process and formal consensus guide. This finding mirrors previous observations of low reliability of human assessment of hospital-acquired pneumonia.Reference Klompas8,Reference Kerlin, Trick and Anderson10,Reference Naidech, Liebling, Duran, Moore, Wunderink and Zembower22 Klompas et alReference Klompas8 reported 62% agreement with a κ = 0.40 among 3 infection control personnel using CDC criteria for the identification of VAP. Kerlin et alReference Kerlin, Trick and Anderson10 reported interreviewer agreement of 66%–83%, with a κ of 0.12 among infection preventionists and a κ of 0.34 among intensivists assessing VAP. In the same vein, humans demonstrate substantial variability in identifying pneumonia by chest imaging, both among reviewers interpreting reports and radiologists evaluating images.Reference See, Chang and Gualandi19,Reference Melbye and Dale23 Human review has historically been considered the gold standard for case detection, but our study adds to the growing evidence suggesting that it may not be an ideal form of measurement.Reference Vassar and Holzmann24 High levels of disagreement between reviewers despite using a common framework to apply agreed-upon definitions demonstrate the subjectivity of pneumonia diagnosis and the difficulty that human reviewers have applying complex definitions in a consistent fashion. This finding supports the development of surveillance approaches that are independent of human review to increase consistency, reduce burden, and ensure scalability.Reference Schreiber, Krauss, Blake, Boone and Almonte25
Our study had several limitations. The small sample size of charts reviewed may not be truly representative of the variability of the population. Additionally, by restricting our analysis to cases meeting the electronic NV-HAP surveillance definition, we did not assess its sensitivity, which could have been affected by missing data. To provide a reliable measure that is amenable to large-scale examination of system-wide quality improvement interventions, electronic surveillance requires high-quality and stable clinical data that are routinely collected and entered without variation across settings or time. The criteria require physical signs of pneumonia (oxygenation, WBC count, temperature) as well as clinical recognition and responses to those physical signs (antibiotic use and chest imaging). Thus, variation in diagnosis and treatment patterns across settings or time could also influence the surveillance measure. Although we have previously validated several of the data elements used for the surveillance criteria,Reference Jones, Haroldsen and Madaras-Kelly26 continuous validation and analysis of variation in the quality of detailed clinical data across settings, systems, and time is essential before facility comparisons or intervention tracking can be pursued. Finally, our estimates of the accuracy of surveillance strategies are limited by the challenges of identifying a reference standard of “true” pneumonia. Although it does not entirely circumvent these challenges, electronic surveillance that does not rely upon diagnostic labels increases reproducibility and efficiency with accuracy that appears consistent with other approaches.
Our findings have important implications for clinical care and public health. NV-HAP is one of the most common and morbid hospital-acquired infections.Reference Giuliano, Baker and Quinn1,Reference Baker and Quinn2,Reference Magill, O’Leary and Janelle27 Robust prevention programs are needed, and they in turn depend on robust surveillance to measure and inform progress.Reference Kazaure, Martin, Yoon and Wren28–Reference Lacerna, Patey and Block32 We cannot improve what we cannot measure, and measurement must be timely and consistent as well as accurate. Implementation of systemwide surveillance and prevention efforts is limited by the poor reliability and validity of current approaches, a reflection of the highly variable and often subjective clinical diagnosis of pneumonia. Electronic surveillance has the potential advantage of being more reproducible and more amenable to large-scale examination of systemwide quality-improvement interventions. Our analysis highlights the ongoing challenge of accurately identifying pneumonia but also suggests a potential strategy to increase the scale, efficiency, and reliability of surveillance. No surveillance approach for NV-HAP is perfect. However, applying clinical criteria using data that are routinely entered into the EHR may provide a practical means to characterize the frequency and morbidity of NV-HAP, catalyze prevention programs, and reliably measure impacts on NV-HAP rates and outcomes.
Supplementary material
To view supplementary material for this article, please visit https://doi.org/10.1017/ice.2022.302
Acknowledgments
Financial support
This work was supported by a grant from the US Centers for Disease Control and Prevention (grant no. 200-2019-05998).
Conflicts of interest
C.R. reports royalties from UpToDate, Inc. (authoring chapters related to procalcitonin use in pneumonia), and consulting fees from Cytovale (sepsis diagnostics) and Pfizer (Lyme disease surveillance). M.K. reports royalties from UpToDate (authoring chapters on pneumonia). B.J. is supported by a VHA HSR&D career development award (no. 150HX001240); and the VA HSR&D Informatics, Decision-Enhancement, and Analytic Sciences (IDEAS) Center of Innovation (grant no. CIN 13-414). All remaining authors report no conflicts of interest relevant to this article.