The National Institutes of Health (NIH) initiated the Patient-Reported Outcomes Measurement Information System® (PROMIS®) collaborative in 2004 to develop and provide access to standardized, state-of-the-science health-related quality of life (HRQOL) measures for use in health research and clinical practice (Cella et al., 2007b; Cella et al., 2010). The success of the NIH’s PROMIS project is due to the involvement of a broad range of experts in measurement methods and clinical research from academia, government, and industry working together to build, refine, and implement the PROMIS measurement tools in research and healthcare delivery settings. PROMIS includes over 300 measures of physical, mental, and social aspects of HRQOL that may be used in the general population and for individuals with chronic and acute health conditions. These include self-report measures for adults (18 years or older), self-report measures for children and adolescents (between 8 and 17 years), and proxy-report measures by caregivers for children between 5 and 17 years of age.
The adoption of PROMIS measures by the international community is evidenced by the existence of over 50 translations of at least one of the PROMIS measures (HealthMeasures, 2021; Alonso et al., 2013). As of December 2020, there were well over 2000 publications in the scientific literature about PROMIS.
The high quality of the PROMIS measures is due to the multi-method approaches used by the multi-disciplinary experts to design the measures. Initially, PROMIS investigators examined previous research and vetted existing HRQOL measures to identify the salient concepts that should be measured. Next, they derived an initial set of questions (or “items”) to capture each concept following best practices for patient-reported health surveys. Importantly, they conducted multiple rounds of cognitive testing to make sure the PROMIS items are clear, relevant to the patient experience, and content valid (DeWalt et al., 2007; Irwin et al., 2009). Then, they used a wide range of psychometric methods to evaluate the item and scale properties and to calibrate the items to enable the application of computerized adaptive testing (CAT) and static short form development (Reeve et al., 2007; Cella et al., 2007b; Liu et al., 2010).
For each PROMIS HRQOL domain (e.g., fatigue, depression, physical functioning), there is an item bank with a large number of items that capture the salient concepts the bank is intended to measure (i.e., the bank is content valid) and that allow the respondent’s level on the HRQOL domain to be estimated across a broad range of the continuum. Each item underwent extensive evaluation using qualitative and psychometric methods to make sure it is appropriate for measuring the HRQOL domain of interest (DeWalt et al., 2007; Reeve et al., 2007). Items were subsequently calibrated with unidimensional item response theory (IRT) models. For each method, different approaches were applied, recognizing that each has its own strengths and limitations. For example, tests for differential item functioning (DIF) included IRT-based and structural equation modeling (SEM)-based methods. The PROMIS research team held multiple scientific meetings to discuss their approach and seek feedback from the community.
It has been 17 years since the initiation of PROMIS and over 10 years since its primary measures were released to the public through the HealthMeasures.net website. PROMIS has received unprecedented attention for its quality and standards. This period has also given the broader scientific community enough time to consider alternate psychometric methods that may give more insight into item and scale performance and to consider how PROMIS measures may be used in clinical research to assess treatment efficacy. The set of papers in this special section of Psychometrika discusses some of the lessons learned and identifies future psychometric directions for HRQOL researchers.
The authors of Teresi et al. (2021) include original architects of the approaches used to test for differential item functioning (DIF) in the items of the PROMIS HRQOL item banks. In their article, they summarize the strengths and limitations of some of these approaches, including IRT-based and SEM-based methods. They highlight future work to examine DIF through the lens of models that account for the multidimensional nature of HRQOL data.
Schalet et al. (2021) discuss approaches to link PROMIS measures with established (“legacy”) patient-reported outcome (PRO) measures to allow the comparison or combination of data from multiple studies that use different PRO measures of the same HRQOL construct. They highlight the strengths and limitations of equipercentile, unidimensional IRT-based calibration, and calibrated projection methods.
Cai and Houts (2021) highlight the value of modeling HRQOL longitudinally. They summarize psychometric methods based on growth models, multi-level models, and latent variable models and provide examples with HRQOL data collected by PROMIS measures used in clinical trials.
Hays et al. (2021) contrast the IRT-based and the classical test theory (CTT) approaches to evaluating individual change in HRQOL data. Using PROMIS data from a longitudinal study of chronic low back pain and chronic neck pain patients, they find that the CTT-based approach overestimates change relative to the IRT-based approach.
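As a rough illustration of why the two approaches can disagree, the sketch below computes a change statistic both ways. It is not the analysis reported by Hays et al.; the function names, the T-score inputs, and the reliability and standard error values are all hypothetical. The CTT version assumes a single group-level standard error of measurement for every respondent, while the IRT version assumes each score estimate comes with its own standard error.

```python
import math


def rci_ctt(score_t1, score_t2, sd_baseline, reliability):
    """Reliable change index with one group-level standard error of measurement."""
    # SEM = SD * sqrt(1 - reliability); the same value is used for every person.
    sem = sd_baseline * math.sqrt(1.0 - reliability)
    # Standard error of a difference between two equally precise scores.
    se_diff = math.sqrt(2.0) * sem
    return (score_t2 - score_t1) / se_diff


def rci_irt(theta_t1, se_t1, theta_t2, se_t2):
    """Change statistic using the person-specific standard error at each occasion."""
    se_diff = math.sqrt(se_t1 ** 2 + se_t2 ** 2)
    return (theta_t2 - theta_t1) / se_diff


# Hypothetical respondent on a T-score metric (mean 50, SD 10).
ctt_value = rci_ctt(50.0, 60.0, sd_baseline=10.0, reliability=0.90)
irt_value = rci_irt(50.0, 4.0, 60.0, 4.0)

# Flag change as statistically reliable when |statistic| exceeds 1.96.
ctt_flag = abs(ctt_value) > 1.96
irt_flag = abs(irt_value) > 1.96
```

With these made-up inputs, the same 10-point change exceeds the conventional 1.96 threshold under the CTT formula but not under the IRT formula, because the assumed person-specific standard errors are larger than the group-level value.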
Finally, Reise et al. (2021) address an issue that is critical for all modeling methods: the selected approach should be based on a deep understanding of the concept and its distribution in the target population. Using PROMIS data, Reise et al. (2021) contrast Samejima’s graded response model with the log-logistic model to illustrate how the two methods (the log-logistic model being a nonlinear transformation of the graded response model with equivalent fit) provide different interpretations of the performance of the items (or set of items) for the HRQOL construct being modeled. These are important considerations when thinking about constructs that may be continuous in nature versus constructs (e.g., pain) that may be unipolar and skewed.
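The equivalence of fit noted above can be checked numerically. The sketch below assumes a slope-intercept form of the graded response model and one commonly cited form of the log-logistic model, in which the cumulative response odds equal lambda times theta raised to the power eta for a strictly positive theta. The parameter values are hypothetical, and this parameterization may differ in detail from the one used by Reise et al. Under these assumptions, setting the log-logistic trait to the exponential of the graded response trait reproduces the same category probabilities, so the fit is identical while the trait metric (unbounded versus bounded below by zero) differs.

```python
import math


def grm_cumulative(theta, slope, intercepts):
    """P(X >= k | theta) for k = 1..K-1 under a slope-intercept graded response model."""
    return [1.0 / (1.0 + math.exp(-(slope * theta + c))) for c in intercepts]


def ll_cumulative(theta_pos, eta, lambdas):
    """P(X >= k | theta) under the assumed log-logistic form; theta_pos must be > 0."""
    result = []
    for lam in lambdas:
        odds = lam * theta_pos ** eta
        result.append(odds / (1.0 + odds))
    return result


def category_probs(cumulative):
    """Turn cumulative probabilities into probabilities for categories 0..K-1."""
    bounds = [1.0] + list(cumulative) + [0.0]
    return [bounds[k] - bounds[k + 1] for k in range(len(bounds) - 1)]


# Hypothetical four-category item; intercepts must be in decreasing order.
slope = 2.0
intercepts = [1.0, -0.5, -2.0]
theta = 0.3

grm_probs = category_probs(grm_cumulative(theta, slope, intercepts))

# Map to the log-logistic metric: theta_LL = exp(theta), eta = slope, lambda_k = exp(c_k).
ll_probs = category_probs(
    ll_cumulative(math.exp(theta), slope, [math.exp(c) for c in intercepts])
)

max_gap = max(abs(g - l) for g, l in zip(grm_probs, ll_probs))
```

Here `max_gap` is zero up to floating-point error, which is one way to see that the choice between the two models rests on how the trait continuum is interpreted rather than on fit.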
We hope that this series of papers provides food for thought and stimulates future efforts to apply the most psychometrically appropriate methods in research and clinical practice with PROMIS and other HRQOL measures.