
2 - Psychometrics and Psychological Assessment

from Part I - General Issues in Clinical Assessment and Diagnosis

Published online by Cambridge University Press: 06 December 2019

Martin Sellbom, University of Otago, New Zealand
Julie A. Suhr, Ohio University

Summary

In this chapter, we address the key psychometric concepts of standardization, reliability, validity, norms, and utility. In doing so, we focus primarily on classical test theory (CTT) – the psychometric framework most commonly used in the clinical assessment literature – which disaggregates a person’s observed score into true score and error components. Given its growing use with psychological instruments, we also present basic information on aspects of item response theory (IRT). In contrast to CTT, IRT assumes that some test items are more relevant than other items for evaluating a person’s true score and that the extent to which an item accurately measures a person’s ability can differ across ability levels. After presenting the central aspects of these two frameworks, we conclude the chapter with a discussion of the need to consider cultural/diversity issues in the development, validation, and use of psychological instruments.
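For orientation, the contrast between the two frameworks can be stated in standard textbook form (these are general formulations, not reproduced from the chapter itself). Under CTT, an observed score $X$ is decomposed as

$$X = T + E,$$

where $T$ is the true score and $E$ is random measurement error, and reliability is the proportion of observed-score variance attributable to true scores, $\rho_{XX'} = \sigma_T^2 / \sigma_X^2$. Under one common IRT formulation, the two-parameter logistic model, the probability that a person at latent trait level $\theta$ responds to item $i$ in the keyed direction is

$$P_i(\theta) = \frac{1}{1 + e^{-a_i(\theta - b_i)}},$$

where the location (difficulty) parameter $b_i$ and the discrimination parameter $a_i$ differ across items. This is what allows items to vary in relevance, and in how precisely they measure, at different levels of the trait.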

Type: Chapter
Publisher: Cambridge University Press
Print publication year: 2019


