
Measurement validity and the integrative approach

Published online by Cambridge University Press:  05 February 2024

Wendy C. Higgins*
Affiliation:
School of Psychological Sciences, Macquarie University, Sydney, NSW, Australia. [email protected], https://researchers.mq.edu.au/en/persons/wendy-higgins
Alexander J. Gillett
Affiliation:
Department of Philosophy, Macquarie University, Sydney, NSW, Australia. [email protected]
Eliane Deschrijver
Affiliation:
School of Psychological Sciences, Macquarie University, Sydney, NSW, Australia. [email protected], https://researchers.mq.edu.au/en/persons/eliane-deschrijver
Robert M. Ross
Affiliation:
Department of Philosophy, Macquarie University, Sydney, NSW, Australia. [email protected], https://researchers.mq.edu.au/en/persons/robert-ross

*Corresponding author.

Abstract

Almaatouq et al. propose a novel integrative approach to experiments. We provide three examples of how unaddressed measurement issues threaten the feasibility of the approach and its promise of promoting commensurability and knowledge integration.

Type
Open Peer Commentary
Copyright
Copyright © The Author(s), 2024. Published by Cambridge University Press

When scientists lack validity evidence for measures, they lack the necessary information to evaluate the overall validity of a study's conclusions.

Flake and Fried (2020, p. 457)

Questionable measurement practices are widespread in the social and behavioural sciences and raise serious questions about the interpretability of numerous studies (Flake & Fried, 2020; Lilienfeld & Strother, 2020; Vazire, Schiavone, & Bottesini, 2022). Because Almaatouq et al. do not explicitly address measurement, we argue that unresolved measurement issues may threaten the feasibility and utility of their integrative approach. Below, we present three measurement concerns.

First, the interpretability of findings from experiments designed using the integrative approach will depend on the validity of the measurements used. Consider the “Moral Machine” experiment (Awad et al., 2018, 2020), which Almaatouq et al. describe as “seminal.” Utilising a modified version of the trolley problem, this experiment evaluated participants' preferences for how autonomous vehicles should weight lives in life-or-death situations across nine different dimensions. Almaatouq et al. claim that, by assessing these dimensions simultaneously and collecting responses from millions of participants, this experiment “offers numerous findings that were neither obvious nor deducible from prior research or traditional experimental designs” (target article, sect. 4.1, para. 2). One key finding is that participants are willing to treat people differently based on demographic characteristics when the complexity of a moral decision is increased. However, the validity of this finding has been questioned because it may be an artefact of the forced-choice methodology that was used (Bigman & Gray, 2020). In addition, there is considerable debate in moral psychology about the external validity of the trolley problem and other sacrificial dilemmas (i.e., it is unclear whether responses to these tasks predict real-world decisions or ethical judgements; Bauman, McGraw, Bartels, & Warren, 2014; Bostyn, Sevenhant, & Roets, 2018). Thus, to our minds, this example demonstrates that no matter how large and integrative an experiment might be, evaluating the validity of its measurements is essential.
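The forced-choice concern can be made concrete with a toy simulation (ours, using hypothetical parameters rather than the Moral Machine data or Awad et al.'s analysis). Simulated agents who would prefer to treat two pedestrians equally, but who must nonetheless choose one, break the tie with a weak demographic cue; the aggregate then looks like a systematic demographic preference:

import random

random.seed(0)
N = 100_000            # simulated forced-choice trials
TIE_BREAK_BIAS = 0.55  # hypothetical weak cue agents fall back on when forced to choose

forced_spare_young = 0  # format A: participants must spare one pedestrian
chose_equality = 0      # format B: participants may answer "treat them equally"
for _ in range(N):
    # Every simulated agent is genuinely indifferent, but format A offers no
    # equality option, so the weak demographic cue decides the response.
    forced_spare_young += random.random() < TIE_BREAK_BIAS
    chose_equality += 1  # under format B, indifferent agents take the equality option

print(f"Forced choice: {forced_spare_young / N:.1%} spare the younger pedestrian")
print(f"With an equality option: {chose_equality / N:.1%} choose equal treatment")

Under the forced-choice format, the instrument reports an apparent 55% “preference” for sparing the young even though no simulated agent holds one; this is one way the artefact Bigman and Gray (2020) describe can arise.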

Second, the construction of design spaces and the mapping of experiments onto them rely on valid measurement of design space dimensions. However, the validity of measurements, including those obtained from widely used measures, cannot be assumed. Consider Almaatouq et al.'s identification of social perceptiveness as a relevant dimension of group synergy research. They cite four studies that measured social perceptiveness using the Reading the Mind in the Eyes Test (RMET; Almaatouq, Alsobay, Yin, & Watts, 2021; Engel, Woolley, Jing, Chabris, & Malone, 2014; Kim et al., 2017; Woolley, Chabris, Pentland, Hashmi, & Malone, 2010). However, it is unclear what psychological constructs the RMET measures. While the RMET has been used to measure multiple dimensions of social cognition, including “theory of mind,” “emotion recognition,” “empathy,” “emotional intelligence,” “mindreading,” “mentalising,” and “social perceptiveness,” there is ongoing debate about the relationships between these constructs and about which, if any, of them the RMET actually measures (Kittel, Olderbak, & Wilhelm, 2022; Oakley, Brewer, Bird, & Catmur, 2016; Silverman, 2022). Moreover, despite the extensive use of the RMET (cited over 7,000 times according to Google Scholar), serious questions have been raised about the reliability and validity of RMET scores (Higgins, Ross, Langdon, & Polito, 2023; Higgins, Ross, Polito, & Kaplan, 2023; Kittel et al., 2022; Olderbak et al., 2015). This means that any integrative experiment that uses the RMET to measure social perceptiveness as a dimension of group synergy research will be very difficult to interpret. Given that vast swathes of measures used in psychological and social science research lack good validity evidence (Flake & Fried, 2020), analogous validity concerns are likely to exist for measures of many dimensions of a given design space. Thus, measurement validation is a critical and nontrivial consideration for the construction and implementation of the design spaces at the heart of the integrative approach. Moreover, given that design spaces are likely to include large numbers of dimensions, a coherent strategy for handling these issues must be developed; otherwise, the integrative approach risks becoming unmanageable in magnitude and complexity.
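To illustrate why score reliability matters for design space dimensions, consider a minimal sketch (simulated binary responses with hypothetical item parameters, not actual RMET data) that estimates Cronbach's alpha for a 36-item test whose items carry little signal about the underlying ability:

import numpy as np

rng = np.random.default_rng(1)
n_people, n_items = 500, 36  # the RMET has 36 items; these responses are simulated
ability = rng.normal(size=(n_people, 1))
noise = rng.normal(size=(n_people, n_items))
# Weak items: responses are driven mostly by noise, only faintly by ability.
responses = ((0.2 * ability + noise) > 0).astype(float)

def cronbach_alpha(x):
    # alpha = k/(k-1) * (1 - sum of item variances / variance of total score)
    k = x.shape[1]
    item_vars = x.var(axis=0, ddof=1).sum()
    total_var = x.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_vars / total_var)

# Typically lands well below the conventional 0.70 benchmark for these parameters.
print(f"alpha = {cronbach_alpha(responses):.2f}")

A design space dimension operationalised by such a score would mostly index noise, and comparisons along that dimension would be uninterpretable; checks of this kind would need to be built into the construction of design spaces.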

Third, measurement incommensurability poses a substantial challenge to the feasibility and utility of the integrative approach because knowledge integration relies on valid and commensurable measurements. Consider depression, one of the most prevalent mental health conditions worldwide (Herrman et al., 2019). Fried, Flake, and Robinaugh (2022) recently identified over 280 different depression measures. Extensive variability in the symptoms assessed by these measures led them to conclude that different depression measures “seem to measure different ‘depressions’” (p. 360). Moreover, they found that depression measures frequently fail to show measurement invariance, meaning that they might measure different things when used in different groups or contexts. Fried and colleagues' examination of depression measures is an unusually thorough demonstration of just how serious measurement incommensurability problems can be. Nonetheless, there are indications that validity and commensurability problems extend to a diverse range of research areas that, troublingly, bear directly on human welfare, including child and adolescent psychopathology (Stevanovic et al., 2017); race-related attitudes, beliefs, and motivations (Hester, Axt, Siemers, & Hehman, 2023); and well-being (Alexandrova & Haybron, 2016). While Almaatouq et al. claim that their integrative approach “intrinsically promotes commensurability and continuous integration of knowledge” (target article, abstract), it is unclear how the approach can feasibly address incommensurability arising from the use of disparate measures and violations of measurement invariance. Left unaddressed, measurement incommensurability might substantially curtail the knowledge integration potential of the proposed approach.
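The practical force of an invariance violation is easy to demonstrate. In the sketch below (simulated data with hypothetical loadings and intercepts, not any published depression measure), two groups have identical latent severity, but two item intercepts differ across groups; the observed sum scores then differ, so a naive group comparison reflects the measure rather than the construct:

import numpy as np

rng = np.random.default_rng(2)
n = 10_000
latent = rng.normal(size=n)  # identical latent severity distribution in both groups

loadings = np.array([0.7, 0.6, 0.8, 0.5])
intercepts_a = np.zeros(4)
intercepts_b = np.array([0.5, 0.0, 0.4, 0.0])  # two items are "read" differently in group B

def observed_sum_score(latent, loadings, intercepts):
    # Linear factor model: item = intercept + loading * latent + noise.
    noise = rng.normal(size=(latent.size, loadings.size))
    return (intercepts + latent[:, None] * loadings + noise).sum(axis=1)

mean_a = observed_sum_score(latent, loadings, intercepts_a).mean()
mean_b = observed_sum_score(latent, loadings, intercepts_b).mean()
print(f"Group A mean: {mean_a:.2f}, Group B mean: {mean_b:.2f}")
# Group B scores roughly 0.9 points higher despite identical latent severity:
# an apparent group difference created entirely by non-invariant items.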

To summarise, although we are sympathetic to Almaatouq et al.'s ambitious attempt to tackle the substantial challenges in the psychological and behavioural sciences, their lack of engagement with the measurement literature raises serious questions about their approach. If it is to deliver its intended benefits of increased commensurability and knowledge integration, then measurement must be addressed explicitly. It is unclear to us whether this can be achieved while maintaining the feasibility of the proposed integrative approach.

Financial support

This work was supported by an Australian Government Research Training Program (RTP) Scholarship (W. C. H.), a Macquarie University Research Excellence Scholarship (W. C. H.), an Australian Research Council (ARC) Discovery Early Career Researcher Award (DECRA) (E. D., grant number DE220100087), and the John Templeton Foundation (R. M. R., grant number 62631; A. J. G., grant number 61924).

Competing interest

None.

References

Alexandrova, A., & Haybron, D. M. (2016). Is construct validation valid? Philosophy of Science, 83(5), 1098–1109. https://doi.org/10.1086/687941
Almaatouq, A., Alsobay, M., Yin, M., & Watts, D. J. (2021). Task complexity moderates group synergy. Proceedings of the National Academy of Sciences of the United States of America, 118(36), Article e2101062118. https://doi.org/10.1073/pnas.2101062118
Awad, E., Dsouza, S., Kim, R., Schulz, J., Henrich, J., Shariff, A., … Rahwan, I. (2018). The moral machine experiment. Nature, 563(7729), 59–64. https://doi.org/10.1038/s41586-018-0637-6
Awad, E., Dsouza, S., Kim, R., Schulz, J., Henrich, J., Shariff, A., … Rahwan, I. (2020). Reply to: Life and death decisions of autonomous vehicles. Nature, 579(7797), E3–E5. https://doi.org/10.1038/s41586-020-1988-3
Bauman, C. W., McGraw, A. P., Bartels, D. M., & Warren, C. (2014). Revisiting external validity: Concerns about trolley problems and other sacrificial dilemmas in moral psychology. Social and Personality Psychology Compass, 8(9), 536–554. https://doi.org/10.1111/spc3.12131
Bigman, Y. E., & Gray, K. (2020). Life and death decisions of autonomous vehicles. Nature, 579(7797), E1–E2. https://doi.org/10.1038/s41586-020-1987-4
Bostyn, D. H., Sevenhant, S., & Roets, A. (2018). Of mice, men, and trolleys: Hypothetical judgment versus real-life behavior in trolley-style moral dilemmas. Psychological Science, 29(7), 1084–1093. https://doi.org/10.1177/0956797617752640
Engel, D., Woolley, A. W., Jing, L. X., Chabris, C. F., & Malone, T. W. (2014). Reading the mind in the eyes or reading between the lines? Theory of mind predicts collective intelligence equally well online and face-to-face. PLoS ONE, 9(12), e115212. https://doi.org/10.1371/journal.pone.0115212
Flake, J. K., & Fried, E. I. (2020). Measurement schmeasurement: Questionable measurement practices and how to avoid them. Advances in Methods and Practices in Psychological Science, 3(4), 456–465. https://doi.org/10.1177/2515245920952393
Fried, E. I., Flake, J. K., & Robinaugh, D. J. (2022). Revisiting the theoretical and methodological foundations of depression measurement. Nature Reviews Psychology, 1(6), 358–368. https://doi.org/10.1038/s44159-022-00050-2
Herrman, H., Kieling, C., McGorry, P., Horton, R., Sargent, J., & Patel, V. (2019). Reducing the global burden of depression: A Lancet–World Psychiatric Association Commission. The Lancet, 393(10189), e42–e43. https://doi.org/10.1016/S0140-6736(18)32408-5
Hester, N., Axt, J. R., Siemers, N., & Hehman, E. (2023). Evaluating validity properties of 25 race-related scales. Behavior Research Methods, 55(4), 1758–1777. https://doi.org/10.3758/s13428-022-01873-w
Higgins, W. C., Ross, R. M., Langdon, R., & Polito, V. (2023). The “reading the mind in the eyes” test shows poor psychometric properties in a large, demographically representative U.S. sample. Assessment, 30(6), 1777–1789. https://doi.org/10.1177/10731911221124342
Higgins, W. C., Ross, R. M., Polito, V., & Kaplan, D. M. (2023). Three threats to the validity of the reading the mind in the eyes test: A commentary on Pavlova and Sokolov (2022). Neuroscience and Biobehavioral Reviews, 147, 105088. https://doi.org/10.1016/j.neubiorev.2023.105088
Kim, Y. J., Engel, D., Woolley, A. W., Lin, J. Y.-T., McArthur, N., & Malone, T. W. (2017). What makes a strong team? Using collective intelligence to predict team performance in League of Legends. In Proceedings of the 2017 ACM conference on computer supported cooperative work and social computing (pp. 2316–2329). Association for Computing Machinery. https://doi.org/10.1145/2998181.2998185
Kittel, A. F. D., Olderbak, S., & Wilhelm, O. (2022). Sty in the mind's eye: A meta-analytic investigation of the nomological network and internal consistency of the “reading the mind in the eyes” test. Assessment, 29(5), 872–895. https://doi.org/10.1177/1073191121996469
Lilienfeld, S. O., & Strother, A. N. (2020). Psychological measurement and the replication crisis: Four sacred cows. Canadian Psychology, 61(4), 281–288. https://doi.org/10.1037/cap0000236
Oakley, B. F., Brewer, R., Bird, G., & Catmur, C. (2016). Theory of mind is not theory of emotion: A cautionary note on the reading the mind in the eyes test. Journal of Abnormal Psychology, 125(6), 818–823. https://doi.org/10.1037/abn0000182
Olderbak, S., Wilhelm, O., Olaru, G., Geiger, M., Brenneman, M. W., & Roberts, R. D. (2015). A psychometric analysis of the reading the mind in the eyes test: Toward a brief form for research and applied settings. Frontiers in Psychology, 6, 1503. https://doi.org/10.3389/fpsyg.2015.01503
Silverman, C. (2022). How to read “reading the mind in the eyes”. Notes and Records of the Royal Society of London, 76(4), 683–697. https://doi.org/10.1098/rsnr.2021.0058
Stevanovic, D., Jafari, P., Knez, R., Franic, T., Atilola, O., Davidovic, N., … Lakic, A. (2017). Can we really use available scales for child and adolescent psychopathology across cultures? A systematic review of cross-cultural measurement invariance data. Transcultural Psychiatry, 54(1), 125–152. https://doi.org/10.1177/1363461516689215
Vazire, S., Schiavone, S. R., & Bottesini, J. G. (2022). Credibility beyond replicability: Improving the four validities in psychological science. Current Directions in Psychological Science, 31(2), 162–168. https://doi.org/10.1177/09637214211067779
Woolley, A. W., Chabris, C. F., Pentland, A., Hashmi, N., & Malone, T. W. (2010). Evidence for a collective intelligence factor in the performance of human groups. Science, 330(6004), 686–688. https://doi.org/10.1126/science.1193147