To the Editor: We were interested to read the May 2017 article by Bicking Kinsey et al. [1] The authors investigated an outbreak of Pseudomonas aeruginosa infections. They found that, compared with controls, case patients had higher odds of being in a room without a point-of-use filter (odds ratio [OR], 37.55; 95% confidence interval [CI], 7.16–∞). [1]
Although these results are interesting, some methodological and statistical issues should be considered. The estimated effect size for some risk factors, such as unfiltered water, is affected by sparse data bias. In other words, the data are inadequate to estimate a valid and precise OR. The main indicators of sparse data bias are a very large effect-size estimate and a remarkably wide confidence interval, sometimes with an infinite limit. [2] The most common strategy to adjust for sparse data bias is a correction of one-half, a conventional method in which one-half is added to each exposure–outcome cell count prior to statistical analysis. [2] The problem with the conventional method is that it can lead to implausible ratio estimates. [2] Greenland and Mansournia proposed a more advanced method, namely, penalization via data augmentation, to adjust for and minimize sparse data bias. [2,3] In this method, the effect size is assumed to fall within a plausible range, such as 1/40 to 40. Using penalization, the effect-size estimates are shrunk toward the specified range. [2] We reanalyzed the data presented by Bicking Kinsey et al. using the penalization method to test how the results can be influenced by sparse data bias. We found that unfiltered water in the univariable model had an estimated OR of 17.23 (95% CI, 3.56–83.19). Thus, we think a more valid estimate of the OR for unfiltered water differs from the 37.55 (95% CI, 7.16–∞) reported in the article.
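For illustration only, the two adjustments described above can be sketched in Python. This sketch does not reproduce our reanalysis. The 2×2 counts are hypothetical and are not the data of Bicking Kinsey et al. The sketch assumes a log-F(2,2) prior, whose 95% prior limits for the OR are approximately 1/39 to 39 (close to the 1/40 to 40 range mentioned above). That prior is added as one pseudo-record with 1 event in 2 trials, exposure set to 1, and the intercept column set to 0. The function names `fit_logistic`, `table_rows`, and `half_corrected_or` are ours, and Wald limits are used for simplicity.

```python
import math

def fit_logistic(rows, max_iter=100, tol=1e-10):
    """Newton-Raphson fit of logit(p) = a*c + b*x on grouped binomial rows.

    Each row is (c, x, events, trials), where c is the value of the
    intercept column (1 for real data, 0 for a prior pseudo-record).
    Returns (b, se_b): the log odds ratio for x and its standard error.
    """
    a = b = 0.0
    for _ in range(max_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for c, x, y, n in rows:
            p = 1.0 / (1.0 + math.exp(-(a * c + b * x)))
            resid = y - n * p          # score contribution
            w = n * p * (1.0 - p)      # information weight
            g0 += c * resid
            g1 += x * resid
            h00 += w * c * c
            h01 += w * c * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        da = (h11 * g0 - h01 * g1) / det
        db = (h00 * g1 - h01 * g0) / det
        a += da
        b += db
        if abs(da) + abs(db) < tol:
            break
    # Variance of b is the matching diagonal element of the inverse information.
    return b, math.sqrt(h00 / det)

def table_rows(cases_exp, controls_exp, cases_unexp, controls_unexp):
    """Turn a 2x2 table into grouped rows (intercept, exposure, events, trials)."""
    return [
        (1, 1, cases_exp, cases_exp + controls_exp),
        (1, 0, cases_unexp, cases_unexp + controls_unexp),
    ]

def half_corrected_or(cases_exp, controls_exp, cases_unexp, controls_unexp):
    """Conventional correction: add one-half to every cell before taking the OR."""
    return ((cases_exp + 0.5) * (controls_unexp + 0.5)) / (
        (controls_exp + 0.5) * (cases_unexp + 0.5)
    )

# Assumed log-F(m, m) prior with m = 2: m/2 events in m trials,
# exposure = 1, intercept column = 0 (so only the exposure effect is penalized).
LOGF_M = 2
PRIOR_ROW = (0, 1, LOGF_M / 2, LOGF_M)

# HYPOTHETICAL sparse table (not from the article): no unexposed cases,
# so the ordinary maximum-likelihood OR is infinite.
sparse = (12, 8, 0, 15)

b_pen, se_pen = fit_logistic(table_rows(*sparse) + [PRIOR_ROW])
lower = math.exp(b_pen - 1.96 * se_pen)
upper = math.exp(b_pen + 1.96 * se_pen)

print(f"Half-corrected OR: {half_corrected_or(*sparse):.2f}")
print(f"Penalized OR: {math.exp(b_pen):.2f} (95% Wald CI, {lower:.2f} to {upper:.2f})")
```

Because the pseudo-record carries no intercept, it pulls only the exposure coefficient toward the null. An analysis intended for publication would more likely rely on established software and profile-likelihood limits than on this hand-written fit.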
The take-home message for readers is that sparse data bias is common in biomedical research [4–7]; however, it is rarely addressed in analyses. Furthermore, sparse data bias can be minimized using efficient statistical methods.