Model Validation, Comparison, and Selection

doi:10.1017/9781108755610.042

Chapter 36, from Part V - General Discussion

Published online by Cambridge University Press: 21 April 2023

Leslie M. Blaha and Kevin A. Gluck

Edited by Ron Sun (Rensselaer Polytechnic Institute, New York)


Summary

Progress in the computational cognitive sciences depends critically on model evaluation. This chapter provides an accessible description of the key considerations and methods in model evaluation, with special emphasis on three forms of evaluation: validation, comparison, and selection. Major subtopics include qualitative and quantitative validation, parameter estimation, cross-validation, goodness of fit, and model mimicry. The chapter defines an assortment of key concepts, presents the relevant equations, and describes best practices and important considerations in the use of these evaluation methods. It concludes with high-level considerations regarding emerging directions and opportunities for continued improvement in model evaluation.

Keywords

quantitative evaluation; qualitative evaluation; model validation; model comparison; model selection; goodness of fit; cognitive modeling

Type: Chapter
Information: The Cambridge Handbook of Computational Cognitive Sciences, pp. 1165–1200

DOI: https://doi.org/10.1017/9781108755610.042

Publisher: Cambridge University Press

Print publication year: 2023
