Correction for Item Response Theory Latent Trait Measurement Error in Linear Mixed Effects Models

Chun Wang; Gongjun Xu; Xue Zhang

doi:10.1007/s11336-019-09672-7

Published online by Cambridge University Press: 01 January 2025


Chun Wang*: University of Washington
Gongjun Xu: University of Michigan
Xue Zhang: Northeast Normal University
*: Correspondence should be made to Chun Wang, Measurement and Statistics, College of Education, University of Washington, 312E Miller Hall, Box 353600, Seattle, WA 98195-3600, USA. Email: [email protected]


Abstract

When latent variables are used as outcomes in regression analysis, a common way to account for the otherwise ignored measurement error is to take a multilevel perspective on item response theory (IRT) modeling. Although recent computational advances allow efficient and accurate estimation of multilevel IRT models, we argue that a two-stage divide-and-conquer strategy still has unique advantages. Within the two-stage framework, three methods that account for heteroscedastic measurement error in the dependent variable at stage II are introduced: closed-form marginal maximum likelihood estimation (MLE), the expectation maximization (EM) algorithm, and moment estimation. They are compared with naïve two-stage estimation and one-stage Markov chain Monte Carlo (MCMC) estimation. A simulation study compares the five methods in terms of model parameter recovery and standard error estimation. The pros and cons of each method are discussed to provide guidelines for practitioners. Finally, a real data example using the National Educational Longitudinal Survey data (NELS 88) illustrates the application of the methods.
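The idea of a moment-based stage II correction can be illustrated with a small simulation. The sketch below is not the authors' estimator, whose details are in the full text; it assumes a deliberately simplified single-level regression (no random effects) in which the stage I trait estimate equals the true trait plus independent normal error with a known, person-specific standard error. The function name `moment_corrected_fit` and all simulation settings are hypothetical. Under these assumptions, ordinary least squares still recovers the regression coefficients, while the naïve residual variance is inflated by roughly the average error variance, which a moment correction subtracts.

```python
import numpy as np


def moment_corrected_fit(theta_hat, se, X):
    """OLS coefficients plus naive and moment-corrected residual variances.

    theta_hat : stage I point estimates of the latent trait (length n)
    se        : their standard errors, treated as known (length n)
    X         : n x p design matrix, including an intercept column
    """
    n, p = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ theta_hat
    resid = theta_hat - X @ beta
    rss = float(resid @ resid)
    # Leverages h_ii = x_i' (X'X)^{-1} x_i
    leverage = np.einsum("ij,jk,ik->i", X, XtX_inv, X)
    # Naive estimate: ignores measurement error in the outcome
    naive_var = rss / (n - p)
    # Moment correction, using
    # E[RSS] = sigma2 * (n - p) + sum_i (1 - h_ii) * se_i^2
    corrected_var = (rss - np.sum((1.0 - leverage) * se**2)) / (n - p)
    return beta, naive_var, max(corrected_var, 0.0)


# Simulated data (all settings are illustrative assumptions)
rng = np.random.default_rng(0)
n = 5000
X = np.column_stack([np.ones(n), rng.normal(size=n)])
true_beta = np.array([0.5, 1.0])
true_var = 1.0

# True latent trait, then stage I estimates with heteroscedastic error
theta = X @ true_beta + rng.normal(scale=np.sqrt(true_var), size=n)
se = rng.uniform(0.3, 0.8, size=n)
theta_hat = theta + rng.normal(scale=se)

beta, naive_var, corrected_var = moment_corrected_fit(theta_hat, se, X)
print("beta:", beta)
print("naive residual variance:", naive_var)
print("corrected residual variance:", corrected_var)
```

In a linear mixed effects model the same logic would apply to the variance components rather than to a single residual variance, which is where the heteroscedastic error structure makes the correction less trivial than in this sketch.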

Keywords

item response theory; measurement error; marginal maximum likelihood estimation; expectation–maximization estimation; two-stage estimation

Type: Original Research
Information: Psychometrika, Volume 84, Issue 3, September 2019, pp. 673-700

DOI: https://doi.org/10.1007/s11336-019-09672-7
Copyright: © 2019 The Psychometric Society


