
A Latent Hidden Markov Model for Process Data

Published online by Cambridge University Press: 01 January 2025

Xueying Tang*
Affiliation: University of Arizona

*Correspondence should be made to Xueying Tang, University of Arizona, 617 N. Santa Rita Ave., Tucson, AZ 85721, USA. Email: [email protected]

Abstract

Response process data from computer-based problem-solving items describe respondents’ problem-solving processes as sequences of actions. Such data provide a valuable source for understanding respondents’ problem-solving behaviors. Recently, data-driven feature extraction methods have been developed to compress the information in unstructured process data into relatively low-dimensional features. Although the extracted features can be used as covariates in regression or other models to understand respondents’ response behaviors, the results can be difficult to interpret, because the relationship between the extracted features and the original response process is often not explicitly defined. In this paper, we propose a statistical model that describes response processes and how they vary across respondents. The proposed model assumes that, given a respondent’s latent traits, the response process follows a hidden Markov model. The structure of hidden Markov models resembles that of problem-solving processes, with the hidden states interpreted as problem-solving subtasks or stages. Incorporating latent traits into hidden Markov models allows us to characterize the heterogeneity of response processes across respondents in a parsimonious and interpretable way. We demonstrate the performance of the proposed model through simulation experiments and case studies of PISA process data.
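To make the model structure described above more concrete, the following Python sketch computes the likelihood of one respondent's action sequence under a hidden Markov model whose emission probabilities depend on a scalar latent trait theta. The parameterization is an assumption made only for illustration. The abstract does not say how the latent traits enter the model, so the softmax emission model with a linear trait shift, the fixed transition matrix, the standard normal prior on theta, the grid approximation of the integral, and all function names (`emission_probs`, `log_likelihood`, `marginal_log_likelihood`) are hypothetical and should not be read as the paper's specification. The conditional likelihood uses the standard scaled forward algorithm for hidden Markov models.

```python
import math


def softmax(logits):
    """Numerically stable softmax of a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def emission_probs(base_logits, trait_slopes, theta):
    """Hypothetical emission model (an assumption, not the paper's): in hidden
    state k, action a has probability
    softmax_a(base_logits[k][a] + theta * trait_slopes[k][a])."""
    return [softmax([b + theta * s for b, s in zip(b_k, s_k)])
            for b_k, s_k in zip(base_logits, trait_slopes)]


def log_likelihood(actions, init, trans, base_logits, trait_slopes, theta):
    """log P(actions | theta) via the scaled forward algorithm.

    actions: observed action indices, e.g. [0, 2, 1]
    init:    initial hidden-state probabilities (length K)
    trans:   K x K matrix, trans[j][k] = P(state k next | state j now)
    """
    emit = emission_probs(base_logits, trait_slopes, theta)
    n_states = len(init)
    # Forward variables for the first action, then rescaled to sum to 1.
    alpha = [init[k] * emit[k][actions[0]] for k in range(n_states)]
    scale = sum(alpha)
    loglik = math.log(scale)
    alpha = [a / scale for a in alpha]
    for action in actions[1:]:
        # Propagate through the transition matrix, then weight by the emission.
        alpha = [sum(alpha[j] * trans[j][k] for j in range(n_states))
                 * emit[k][action] for k in range(n_states)]
        scale = sum(alpha)
        loglik += math.log(scale)  # log P(a_t | a_1..a_{t-1}, theta)
        alpha = [a / scale for a in alpha]
    return loglik


def marginal_log_likelihood(actions, init, trans, base_logits, trait_slopes,
                            n_grid=81, lim=4.0):
    """Approximate log of the integral of P(actions | theta) * N(theta; 0, 1)
    over theta, using a rectangle rule on a uniform grid over [-lim, lim]."""
    step = 2.0 * lim / (n_grid - 1)
    terms = []
    for i in range(n_grid):
        theta = -lim + i * step
        log_prior = -0.5 * theta ** 2 - 0.5 * math.log(2.0 * math.pi)
        terms.append(log_likelihood(actions, init, trans, base_logits,
                                    trait_slopes, theta)
                     + log_prior + math.log(step))
    top = max(terms)  # log-sum-exp for numerical stability
    return top + math.log(sum(math.exp(t - top) for t in terms))


if __name__ == "__main__":
    # Toy example: 2 hidden stages (subtasks), 3 possible actions.
    init = [0.6, 0.4]
    trans = [[0.7, 0.3], [0.2, 0.8]]
    base = [[1.0, 0.0, -1.0], [-0.5, 0.5, 0.0]]
    slopes = [[0.5, 0.0, -0.5], [0.0, 1.0, -1.0]]
    actions = [0, 1, 2, 1]
    for theta in (-1.0, 0.0, 1.0):
        print(theta, log_likelihood(actions, init, trans, base, slopes, theta))
    print("marginal:", marginal_log_likelihood(actions, init, trans, base, slopes))
```

Rescaling the forward variables at every step keeps the computation numerically stable for long action sequences, and the sum of the log scaling constants equals the log-likelihood. In this sketch, respondents with different values of theta share the same hidden stages and transitions but favor different actions within each stage, which is one simple way a latent trait could induce heterogeneity across response processes.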

Type: Theory and Methods
Copyright © 2023 The Author(s), under exclusive licence to The Psychometric Society.

