Measurement

doi:10.1017/9781009170123.021

20 - Measurement

Reliability, Construct Validation, and Scale Construction

from Part IV - Understanding What Your Data Are Telling You About Psychological Processes

Published online by Cambridge University Press: 12 December 2024

William Revelle and

Edited by Harry T. Reis (University of Rochester, New York), Tessa West (New York University), and Charles M. Judd (University of Colorado Boulder)

Summary

Adequate measurement of psychological phenomena is a fundamental aspect of theory construction and validation. Forming composite scales from individual items has a long and honored tradition, although, for predictive purposes, the power of using individual items should be considered. We outline several fundamental steps in the scale construction process, including (1) choosing between prediction and explanation; (2) specifying the construct(s) to measure; (3) choosing items thought to measure these constructs; (4) administering the items; (5) examining the structure and properties of composites of items (scales); (6) forming, scoring, and examining the scales; and (7) validating the resulting scales.
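As a rough illustration of steps (5) and (6), the following Python sketch reverse-scores negatively keyed items, forms sum scores, and computes Cronbach's alpha as one common internal-consistency estimate. It is not taken from the chapter: the function names, the 1-5 response format, and the five-respondent data set are all hypothetical, and alpha is only one of several reliability coefficients a researcher might report.

```python
from statistics import variance


def score_items(responses, reverse_keyed=(), low=1, high=5):
    """Return item scores with reverse-keyed items flipped.

    Assumes (hypothetically) that every item uses the same low..high
    response scale, so a reversed score is low + high - x.
    """
    return [
        [(low + high - x) if j in reverse_keyed else x for j, x in enumerate(row)]
        for row in responses
    ]


def cronbach_alpha(item_scores):
    """Cronbach's alpha: k/(k-1) * (1 - sum of item variances / total variance)."""
    k = len(item_scores[0])
    item_vars = [variance([row[j] for row in item_scores]) for j in range(k)]
    total_var = variance([sum(row) for row in item_scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


# Hypothetical data: 5 respondents x 3 items; the third item is reverse keyed.
responses = [
    [4, 5, 2],
    [2, 2, 4],
    [5, 4, 1],
    [3, 3, 3],
    [1, 2, 5],
]

scored = score_items(responses, reverse_keyed={2})
totals = [sum(row) for row in scored]  # one composite (sum) score per respondent
alpha = cronbach_alpha(scored)

print("scale scores:", totals)
print("alpha:", round(alpha, 3))
```

With real data, the same two functions would be applied only after the item structure has been examined, because alpha alone does not show that a set of items measures a single construct.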

Keywords

psychometrics, factor analysis, reliability, validity, scale development

Type: Chapter
Information: Handbook of Research Methods in Social and Personality Psychology, pp. 471–501

DOI: https://doi.org/10.1017/9781009170123.021

Publisher: Cambridge University Press

Print publication year: 2024
