
Going after the bigger picture: Using high-capacity models to understand mind and brain

Published online by Cambridge University Press: 06 December 2023

Hans Op de Beeck
Affiliation:
Leuven Brain Institute, KU Leuven, Leuven, Belgium [email protected] www.hoplab.be
Stefania Bracci
Affiliation:
Center for Mind/Brain Sciences, University of Trento, Rovereto, Italy [email protected] https://webapps.unitn.it/du/en/Persona/PER0076943/Curriculum

Abstract

Deep neural networks (DNNs) provide a unique opportunity to move towards a generic modelling framework in psychology. The high representational capacity of these models, combined with the possibility of further extensions, has already allowed us to investigate the forest, namely the complex landscape of representations and processes that underlies human cognition, without forgetting about the trees: individual psychological phenomena.

Type
Open Peer Commentary
Copyright
Copyright © The Author(s), 2023. Published by Cambridge University Press

Bowers et al. challenge the notion that deep neural networks (DNNs) are the best, or even a highly promising, model of human cognition, and recommend that future studies test specific psychological and neural phenomena and potential hypotheses by independently manipulating factors.

We agree with Bowers et al. that overall predictive power is not sufficient for a good model, in particular not when experiments lack diversity in the tested stimuli (Grootswagers & Robinson, 2021). Nevertheless, prediction is a necessary condition and a good starting point. DNNs have the power to serve as a generic model, and at the same time they can be tested on a variety of cognitive and psychological phenomena to go beyond prediction and provide insight into how a system functions. Strikingly, and in contrast to how Bowers et al. characterize the literature, the first wave of studies comparing DNNs to human vision already included studies that went beyond mere prediction on generic stimulus sets. To take just one example, the study by Kubilius, Bracci, and Op de Beeck (2016), which Bowers et al. characterize as a prediction-based experiment, tested a specific cognitive hypothesis (the role of nonaccidental properties; see Biederman, 1987) and independently manipulated shape and category similarity (Kubilius et al., 2016; see also Bracci & Op de Beeck, 2016; Zeman, Ritchie, Bracci, & Op de Beeck, 2020). More generally, the goal of explanation over prediction is already a central one, as shown by recent work testing underlying mechanisms of object perception (Singer, Seeliger, Kietzmann, & Hebart, 2022), category domains (Dobs, Martinez, Kell, & Kanwisher, 2022), or predictive coding (Ali, Ahmad, de Groot, van Gerven, & Kietzmann, 2022), to mention just a few. The wealth of data that the community has gathered with DNNs in less than a decade illustrates the potential of this approach.
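
To make the logic of such a design concrete, the sketch below shows how one can compare a DNN layer against two competing model representations (shape vs. category) with representational similarity analysis. All values here are hypothetical placeholders; in an actual study the activations would come from a forward pass through a trained network and the model dissimilarity matrices from the stimulus design.

```python
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr

# Hypothetical design: 16 stimuli in which shape similarity and
# category membership are manipulated independently.
n_stimuli = 16
rng = np.random.default_rng(0)

# Placeholder for DNN layer activations (stimuli x units); in a real
# study these come from a forward pass through a trained network.
layer_activations = rng.normal(size=(n_stimuli, 512))

# Model representational dissimilarity matrices (RDMs), condensed form.
# Stand-in shape model: distances between (made-up) 2-D shape coordinates.
shape_rdm = pdist(rng.normal(size=(n_stimuli, 2)))
# Category model: same category = 0, different category = 1.
category = np.repeat([0, 1], n_stimuli // 2)
category_rdm = pdist(category[:, None], metric="hamming")

# DNN RDM: correlation distance between activation patterns.
dnn_rdm = pdist(layer_activations, metric="correlation")

# Because the design decorrelates the two factors, both model RDMs can
# be compared on an equal footing.
rho_shape, _ = spearmanr(dnn_rdm, shape_rdm)
rho_category, _ = spearmanr(dnn_rdm, category_rdm)
print(f"shape: {rho_shape:.2f}, category: {rho_category:.2f}")
```

The crucial point is the decorrelation of the two predictors: only then does a difference between the two correlations speak to which factor the representation tracks, rather than to which factor happens to dominate the stimulus set.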

Bowers et al. provide many examples of failures of DNNs, acknowledging some of the successes and progress only in passing. Many of the failures show that vanilla DNNs, as is true of all models, are not perfect and do not capture all aspects of brain processing. Revealing such limitations is generally considered essential to moving the field forward towards making DNN computations more human-like (Firestone, 2020), and is no reason to abandon these models as long as there is an obvious road ahead with them. Proposed examples include the addition of optical limitations reminiscent of the human eye, which can make a network more robust to adversarial attacks (Elsayed et al., 2018), the implementation of intuitive physics (Piloto, Weinstein, Battaglia, & Botvinick, 2022), and considerations about the influence of visual system maturation and low visual acuity at birth (Avberšek, Zeman, & Op de Beeck, 2021; Jinsi, Henderson, & Tarr, 2023).
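
To illustrate how cheaply such extensions can be prototyped, the sketch below blurs a network's training images with a strength that decreases across epochs, loosely in the spirit of the acuity-maturation work cited above. It assumes PyTorch/torchvision; the schedule and all parameter values are our own hypothetical choices, not those of the cited studies.

```python
from torchvision import transforms

def acuity_transform(epoch: int, max_epoch: int = 20) -> transforms.Compose:
    """Image pipeline whose blur decreases over training, loosely
    mimicking the maturation of visual acuity after birth. The linear
    schedule is a hypothetical choice for illustration only."""
    # Blur strength shrinks linearly from strong to (almost) none.
    sigma = max(4.0 * (1 - epoch / max_epoch), 1e-3)
    return transforms.Compose([
        transforms.Resize(224),
        transforms.CenterCrop(224),
        transforms.GaussianBlur(kernel_size=21, sigma=sigma),
        transforms.ToTensor(),
    ])

# Usage: rebuild the dataset's transform at the start of each epoch, so
# that the network first learns from low-spatial-frequency input.
# dataset.transform = acuity_transform(epoch)
```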

It is difficult to reconcile the fundamental criticism that DNNs do not capture all psychological phenomena without further extensions with the proposal of Bowers et al. to switch to alternative strategies that are far more limited in how much of the full complexity of information processing, from input to output, they capture (e.g., Grossberg, 1987; Hummel & Biederman, 1992; McClelland, Rumelhart, & the PDP Research Group, 1986). These alternative models are very appealing but also narrower in scope. Consider, for example, the simplicity with which the well-known ALCOVE model explains categorization (Kruschke, 1992), compared to the complex high-dimensional space that is the actual reality of the underlying representations (for a review, see Bracci & Op de Beeck, 2023). To be clear, we consider these alternatives an excellent way to obtain a conceptual understanding of a phenomenon, and we ourselves very much build on this pioneering work with conceptually elegant models with few parameters (e.g., Ritchie & Op de Beeck, 2019). Nevertheless, scientists should not stop there. If we did, we would be left with a wide range of niche solutions and no progress towards a generic model that can be applied across domains, or at least a path towards one. Luckily, this path looks very promising for DNNs, given the large community of relatively junior scientists ready to make progress (e.g., Doerig et al., 2023; Naselaris et al., 2018). The necessary modifications will move the needle in various directions, such as elaborations in terms of front-ends, architecture, learning and optimization rules, learning regime, level of neural detail (e.g., spiking networks), the addition of attentional and working memory processes, and potentially the interaction with symbolic processing. None of that will lead to the dismissal of DNNs.
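
The elegance of ALCOVE is easy to convey: its core is a layer of exemplar nodes whose activation decays exponentially with the attention-weighted city-block distance to the stimulus, read out by learned association weights. The sketch below is a minimal paraphrase of that activation rule (Kruschke, 1992); the learning rules are omitted and all numerical values are made up for illustration.

```python
import numpy as np

def alcove_exemplar_activations(stimulus, exemplars, attention, c=1.0):
    """Activation of each exemplar node: exponential decay with the
    attention-weighted city-block distance to the stimulus, as in
    Kruschke's (1992) ALCOVE. Learning of attention and association
    weights is not shown here."""
    dist = np.sum(attention * np.abs(exemplars - stimulus), axis=1)
    return np.exp(-c * dist)

# Toy example: two stimulus dimensions, three stored exemplars.
exemplars = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0]])
attention = np.array([0.8, 0.2])   # dimension 1 is attended more
acts = alcove_exemplar_activations(np.array([0.9, 0.1]), exemplars, attention)

# Category outputs are a weighted sum of exemplar activations.
assoc = np.array([[1.0, -1.0], [1.0, -1.0], [-1.0, 1.0]])  # 3 exemplars -> 2 categories
print(acts @ assoc)
```

A handful of interpretable parameters suffice, which is exactly the appeal; the contrast with the thousands of dimensions of a DNN layer is the point of the paragraph above.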

We see the high capacity of DNNs as a feature, not a bug, and we are currently still on the part of the curve where higher capacity means better (Elmoznino & Bonner, 2022). In contrast to the alternatives, DNNs confront us upfront with the complexity of human information processing because they have to work with an actual stimulus as input. This is not just an incidental detail; it is a necessary condition for the ideal model. DNNs and related artificial intelligence (AI) models seem able to stand up to this challenge, even to the point that in some domains they can already predict empirical data about neural selectivity to real images better than professors of cognitive neuroscience can (Ratan Murty, Bashivan, Abate, DiCarlo, & Kanwisher, 2021). The general applicability of these models and the legacy of knowledge that has by now been accumulated provide a unique resource for testing a wide variety of psychological and neural phenomena (e.g., Duyck, Bracci, & Op de Beeck, 2022; Kanwisher, Gupta, & Dobs, 2023).
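
Capacity in this sense can itself be measured. One common estimate of the latent dimensionality of a representation, in the spirit of the analysis of Elmoznino and Bonner (2022), is the participation ratio of the eigenspectrum of the activation covariance; the sketch below uses hypothetical activation matrices.

```python
import numpy as np

def effective_dimensionality(activations: np.ndarray) -> float:
    """Participation ratio of the covariance eigenspectrum of
    `activations` (stimuli x units): (sum of eigenvalues)^2 divided by
    the sum of squared eigenvalues. It approaches the number of units
    when variance is spread evenly and 1 when one dimension dominates."""
    eigvals = np.linalg.eigvalsh(np.cov(activations, rowvar=False))
    eigvals = np.clip(eigvals, 0.0, None)  # guard against tiny negative values
    return float(eigvals.sum() ** 2 / np.sum(eigvals ** 2))

# Hypothetical layer activations: 1,000 stimuli, 256 units.
rng = np.random.default_rng(0)
low_d = rng.normal(size=(1000, 3)) @ rng.normal(size=(3, 256))   # 3 latent dims
high_d = rng.normal(size=(1000, 256))                            # many latent dims
print(effective_dimensionality(low_d))   # low: data span only 3 dimensions
print(effective_dimensionality(high_d))  # high: a large fraction of 256
```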

The way forward is to build better models, including DNN-based models that take the complexity of human vision and cognition seriously (Bracci & Op de Beeck, 2023). As has been true since the very early days of AI, we need continuous interaction and exchange between disciplines and their expertise at all levels (cognitive and computational psychologists, computer vision scientists, philosophers of mind, neuroscientists) to bring us towards the common goal of a human-like AI that we understand mechanistically. Solving the deep problem of understanding biological vision will not happen by dismissing DNNs too easily and missing out on their potential.

Financial support

H. O. B. is supported by FWO research project G073122N and KU Leuven project IDN/21/010.

Competing interest

None.

References

Ali, A., Ahmad, N., de Groot, E., van Gerven, M. A. J., & Kietzmann, T. C. (2022). Predictive coding is a consequence of energy efficiency in recurrent neural networks. Patterns, 3(12), 100639.
Avberšek, L. K., Zeman, A., & Op de Beeck, H. (2021). Training for object recognition with increasing spatial frequency: A comparison of deep learning with human vision. Journal of Vision, 21(10), 14.
Biederman, I. (1987). Recognition-by-components: A theory of human image understanding. Psychological Review, 94(2), 115–147.
Bracci, S., & Op de Beeck, H. (2016). Dissociations and associations between shape and category representations in the two visual pathways. Journal of Neuroscience, 36(2), 432–444.
Bracci, S., & Op de Beeck, H. P. (2023). Understanding human object vision: A picture is worth a thousand representations. Annual Review of Psychology, 74, 113–135.
Dobs, K., Martinez, J., Kell, A. J., & Kanwisher, N. (2022). Brain-like functional specialization emerges spontaneously in deep neural networks. Science Advances, 8(11), eabl8913.
Doerig, A., Sommers, R. P., Seeliger, K., Richards, B., Ismael, J., Lindsay, G. W., … Kietzmann, T. C. (2023). The neuroconnectionist research programme. Nature Reviews Neuroscience, 24(7), 431–450.
Duyck, S., Bracci, S., & Op de Beeck, H. (2022). A computational understanding of zoomorphic perception in the human brain. bioRxiv, 2022-09.
Elmoznino, E., & Bonner, M. F. (2022). High-performing neural network models of visual cortex benefit from high latent dimensionality. bioRxiv, 2022-07.
Elsayed, G., Shankar, S., Cheung, B., Papernot, N., Kurakin, A., Goodfellow, I., & Sohl-Dickstein, J. (2018). Adversarial examples that fool both computer vision and time-limited humans. Advances in Neural Information Processing Systems, 31.
Firestone, C. (2020). Performance vs. competence in human–machine comparisons. Proceedings of the National Academy of Sciences of the United States of America, 117(43), 26562–26571.
Grootswagers, T., & Robinson, A. K. (2021). Overfitting the literature to one set of stimuli and data. Frontiers in Human Neuroscience, 15, 682661.
Grossberg, S. (1987). Competitive learning: From interactive activation to adaptive resonance. Cognitive Science, 11(1), 23–63.
Hummel, J. E., & Biederman, I. (1992). Dynamic binding in a neural network for shape recognition. Psychological Review, 99(3), 480–517.
Jinsi, O., Henderson, M. M., & Tarr, M. J. (2023). Early experience with low-pass filtered images facilitates visual category learning in a neural network model. PLoS ONE, 18(1), e0280145.
Kanwisher, N., Gupta, P., & Dobs, K. (2023). CNNs reveal the computational implausibility of the expertise hypothesis. iScience, 105976.
Kruschke, J. K. (1992). ALCOVE: An exemplar-based connectionist model of category learning. Psychological Review, 99(1), 22–44.
Kubilius, J., Bracci, S., & Op de Beeck, H. P. (2016). Deep neural networks as a computational model for human shape sensitivity. PLoS Computational Biology, 12(4), e1004896.
McClelland, J. L., Rumelhart, D. E., & PDP Research Group. (1986). Parallel distributed processing (Vol. 2). MIT Press.
Naselaris, T., Bassett, D. S., Fletcher, A. K., Kording, K., Kriegeskorte, N., Nienborg, H., … Kay, K. (2018). Cognitive computational neuroscience: A new conference for an emerging discipline. Trends in Cognitive Sciences, 22(5), 365–367.
Piloto, L. S., Weinstein, A., Battaglia, P., & Botvinick, M. (2022). Intuitive physics learning in a deep-learning model inspired by developmental psychology. Nature Human Behaviour, 6(9), 1257–1267.
Ratan Murty, N. A., Bashivan, P., Abate, A., DiCarlo, J. J., & Kanwisher, N. (2021). Computational models of category-selective brain regions enable high-throughput tests of selectivity. Nature Communications, 12(1), 5540.
Ritchie, J. B., & Op de Beeck, H. (2019). A varying role for abstraction in models of category learning constructed from neural representations in early visual cortex. Journal of Cognitive Neuroscience, 31(1), 155–173.
Singer, J. J., Seeliger, K., Kietzmann, T. C., & Hebart, M. N. (2022). From photos to sketches – How humans and deep neural networks process objects across different levels of visual abstraction. Journal of Vision, 22(2), 4.
Zeman, A. A., Ritchie, J. B., Bracci, S., & Op de Beeck, H. (2020). Orthogonal representations of object shape and category in deep convolutional neural networks and human visual cortex. Scientific Reports, 10(1), 2453.