
Model Based Clustering of Large Data Sets: Tracing the Development of Spelling Ability

Published online by Cambridge University Press: 01 January 2025

Herbert Hoijtink* (Utrecht University)
Annelise Notenboom (Free University Amsterdam)
* Requests for reprints should be sent to Dr. Herbert Hoijtink, Department of Methodology and Statistics, Postbus 80140, University of Utrecht, 3508 TC Utrecht, NETHERLANDS.

Abstract

There are two main theories of the development of spelling ability: the stage model and the model of overlapping waves. In this paper, exploratory model-based clustering will be used to analyze the responses of more than 3500 pupils to subsets of 245 items. To evaluate the two theories, the resulting clusters will be ordered along a developmental dimension using an external criterion. Solutions to three statistical problems will be given: (1) an algorithm that can handle large data sets and yields only non-degenerate clusters; (2) a goodness-of-fit test that is not affected by the fact that the number of possible response vectors far exceeds the number of observed response vectors; and (3) a new technique, data expunction, that can be used to evaluate goodness-of-fit tests if the missing data mechanism is known.
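The abstract does not specify the estimation algorithm, so the following is only a generic illustration, not the authors' method. A common form of model-based clustering for right/wrong item responses is the latent class model: a mixture of classes within which items are answered independently with class-specific success probabilities. Because each pupil answers only a subset of the items, unadministered items are treated as missing and simply left out of each pupil's likelihood. The sketch below fits such a model by EM; the function name `latent_class_em` and its parameters are hypothetical, and it does not include the paper's safeguards against degenerate clusters, its goodness-of-fit test, or data expunction.

```python
import numpy as np


def latent_class_em(Y, n_classes, n_iter=200, seed=0, eps=1e-6):
    """Hypothetical sketch: latent class model (mixture of independent
    Bernoullis) fitted by EM, with items not administered coded as np.nan.

    Y: (n_persons, n_items) array, 1 = correct, 0 = incorrect, nan = missing.
    Returns class weights pi (K,), item probabilities theta (K, J),
    and posterior class memberships post (N, K).
    """
    rng = np.random.default_rng(seed)
    Y = np.asarray(Y, dtype=float)
    observed = (~np.isnan(Y)).astype(float)  # 1 where the item was administered
    Y0 = np.where(observed == 1.0, Y, 0.0)   # missing entries set to 0, masked below
    n_items = Y.shape[1]

    pi = np.full(n_classes, 1.0 / n_classes)
    theta = rng.uniform(0.25, 0.75, size=(n_classes, n_items))

    for _ in range(n_iter):
        # E-step: per-person, per-class log-likelihood using only observed items.
        ll = Y0 @ np.log(theta).T + ((1.0 - Y0) * observed) @ np.log1p(-theta).T
        ll += np.log(np.maximum(pi, 1e-300))
        ll -= ll.max(axis=1, keepdims=True)  # stabilize before exponentiating
        post = np.exp(ll)
        post /= post.sum(axis=1, keepdims=True)

        # M-step: class weights and item probabilities from observed data only.
        pi = post.mean(axis=0)
        num = post.T @ Y0
        den = post.T @ observed
        theta = np.clip(num / np.maximum(den, eps), eps, 1.0 - eps)

    return pi, theta, post
```

A usage pattern under these assumptions: simulate two classes of pupils with high and low success rates, delete a random share of responses to mimic item subsets, and check that the fitted item probabilities separate the classes. Clipping `theta` away from 0 and 1 only keeps the logarithms finite; it is not a substitute for a procedure that rules out degenerate clusters.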

Type: Application Reviews and Case Studies
Copyright © 2004 The Psychometric Society


Footnotes

Research supported by a grant (NWO 411-21-006) of the Dutch Organization for Scientific Research.
