
15 - Receiver Operating Characteristic Analysis: Basic Concepts and Practical Applications

from Part III - Perception Metrology

Published online by Cambridge University Press:  20 December 2018

Ehsan Samei, Duke University Medical Center, Durham
Elizabeth A. Krupinski, Emory University, Atlanta

Type: Chapter
Publisher: Cambridge University Press
Print publication year: 2018


