Multiple Equating of Separate IRT Calibrations

Published online by Cambridge University Press:  01 January 2025

Michela Battauz*
Affiliation:
University of Udine
* Correspondence should be made to Michela Battauz, Department of Economics and Statistics, University of Udine, Udine, Italy. Email: [email protected]; http://people.uniud.it/page/michela.battauz

Abstract

When test forms are calibrated separately, item response theory parameters are not comparable because they are expressed on different measurement scales. The equating process includes the conversion of item parameter estimates to a common scale and the determination of comparable test scores. Various statistical methods have been proposed to perform equating between two test forms. This paper generalizes the mean-geometric mean, the mean-mean, the Haebara, and the Stocking–Lord methods to multiple test forms. The proposed methods simultaneously estimate the equating coefficients that transform the parameters of all forms to the scale of the base form. Asymptotic standard errors of the equating coefficients are derived. A simulation study is presented to illustrate the performance of the methods.
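
As a rough illustration of what an equating coefficient does, the following Python sketch computes the classical two-form mean-mean and mean-geometric mean coefficients from common-item estimates. It is not the multiple-form estimator proposed in the paper. It assumes a two-parameter logistic model with discriminations a and difficulties b, and the linear scale conversion a* = a / A, b* = A·b + B from form X to the base form Y. All function and variable names are hypothetical, and the parameter values are invented for the example.

```python
import math

def mean_mean(a_x, b_x, a_y, b_y):
    """Two-form mean-mean coefficients (A, B) from common-item estimates."""
    n = len(a_x)
    A = (sum(a_x) / n) / (sum(a_y) / n)
    B = sum(b_y) / n - A * sum(b_x) / n
    return A, B

def mean_geometric_mean(a_x, b_x, a_y, b_y):
    """Two-form mean-geometric mean coefficients (A, B)."""
    n = len(a_x)
    # Geometric mean of the ratios a_x / a_y, computed on the log scale.
    A = math.exp(sum(math.log(ax / ay) for ax, ay in zip(a_x, a_y)) / n)
    B = sum(b_y) / n - A * sum(b_x) / n
    return A, B

def to_base_scale(a, b, A, B):
    """Convert form-X item parameters to the scale of the base form Y."""
    return [ai / A for ai in a], [A * bi + B for bi in b]

# Invented common-item parameters on the base-form (Y) scale.
a_y = [0.8, 1.1, 1.5, 0.9]
b_y = [-1.0, 0.2, 0.7, 1.4]

# The same items expressed on the form-X scale, built from known
# coefficients so that the recovered values can be checked.
A_true, B_true = 1.2, -0.5
a_x = [ay * A_true for ay in a_y]
b_x = [(by - B_true) / A_true for by in b_y]

A_mm, B_mm = mean_mean(a_x, b_x, a_y, b_y)
A_mgm, B_mgm = mean_geometric_mean(a_x, b_x, a_y, b_y)
a_conv, b_conv = to_base_scale(a_x, b_x, A_mm, B_mm)
```

With error-free parameters, as here, both methods return the coefficients used to build the form-X values. With estimated parameters they generally differ, which is one reason several methods coexist.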

Type
Original paper
Copyright
Copyright © 2016 The Psychometric Society

References

Baldwin, P. (2013). On mean-sigma estimators and bias. British Journal of Mathematical and Statistical Psychology, 66, 277–289. doi:10.1111/j.2044-8317.2012.02048.x.
Battauz, M. (2013). IRT test equating in complex linkage plans. Psychometrika, 78, 464–480. doi:10.1007/s11336-012-9316-y.
Battauz, M. (2015). equateIRT: An R package for IRT test equating. Journal of Statistical Software, 68, 1–22. doi:10.18637/jss.v068.i07.
Bock, R. D., & Aitkin, M. (1981). Marginal maximum likelihood estimation of item parameters: Application of an EM algorithm. Psychometrika, 46, 443–459. doi:10.1007/BF02293801.
Deming, W. E., & Stephan, F. F. (1940). On a least squares adjustment of a sampled frequency table when the expected marginal totals are known. The Annals of Mathematical Statistics, 11, 427–444. doi:10.1214/aoms/1177731829.
Goodman, L. A. (1968). The analysis of cross-classified data: independence, quasi-independence and interactions in contingency tables with or without missing entries. Journal of the American Statistical Association, 63, 1091–1131.
Haberman, S. J. (2009). Linking parameter estimates derived from an item response model through separate calibrations. ETS Research Report Series, 2009, i-9. doi:10.1002/j.2333-8504.2009.tb02197.x.
Haberman, S. J., Lee, Y. H., & Qian, J. (2009). Jackknifing techniques for evaluation of equating accuracy. ETS Research Report Series, 2009, i-37. doi:10.1002/j.2333-8504.2009.tb02196.x.
Haebara, T. (1980). Equating logistic ability scales by a weighted least squares method. Japanese Psychological Research, 22, 144–149. doi:10.4992/psycholres1954.22.144.
Kim, S., & Kolen, M. J. (2007). Effects on scale linking of different definitions of criterion functions for the IRT characteristic curve methods. Journal of Educational and Behavioral Statistics, 32, 371–397. doi:10.3102/1076998607302632.
Kolen, M. J., & Brennan, R. L. (2014). Test equating, scaling, and linking: methods and practices (3rd ed.). New York: Springer. doi:10.1007/978-1-4939-0317-7.
Lee, Y.-H., & Haberman, S. J. (2013). Harmonic regression and scale stability. Psychometrika, 78, 815–829. doi:10.1007/s11336-013-9337-1.
Loyd, B. H., & Hoover, H. D. (1980). Vertical equating using the Rasch model. Journal of Educational Measurement, 17, 179–193. doi:10.1111/j.1745-3984.1980.tb00825.x.
Marco, G. L. (1977). Item characteristic curve solutions to three intractable testing problems. Journal of Educational Measurement, 14, 139–160. doi:10.1111/j.1745-3984.1977.tb00033.x.
Michaelides, M. P., & Haertel, E. H. (2014). Selection of common items as an unrecognized source of variability in test equating: A bootstrap approximation assuming random sampling of common items. Applied Measurement in Education, 27, 46–57. doi:10.1080/08957347.2013.853069.
Mislevy, R. J., & Bock, R. D. (1990). BILOG 3. Item analysis and test scoring with binary logistic models. Mooresville, IN: Scientific Software.
Ogasawara, H. (2000). Asymptotic standard errors of IRT equating coefficients using moments. Economic Review (Otaru University of Commerce), 51, 1–23.
Ogasawara, H. (2001). Item response theory true score equatings and their standard errors. Journal of Educational and Behavioral Statistics, 26, 31–50. doi:10.3102/10769986026001031.
Ogasawara, H. (2001). Standard errors of item response theory equating/linking by response function methods. Applied Psychological Measurement, 25, 53–67. doi:10.1177/01466216010251004.
Ogasawara, H. (2003). Asymptotic standard errors of IRT observed-score equating methods. Psychometrika, 68, 193–211. doi:10.1007/BF02294797.
R Development Core Team. (2016). R: A Language and Environment for Statistical Computing. Vienna: R Foundation for Statistical Computing.
Rizopoulos, D. (2006). ltm: An R package for latent variable modelling and item response theory analyses. Journal of Statistical Software, 17, 1–25. doi:10.18637/jss.v017.i05.
Stocking, M. L., & Lord, F. M. (1983). Developing a common metric in item response theory. Applied Psychological Measurement, 7, 201–210. doi:10.1177/014662168300700208.
van der Linden, W. J., & Hambleton, R. K. (1997). Handbook of modern item response theory. New York: Springer. doi:10.1007/978-1-4757-2691-6.