Clarifying status of DNNs as models of human vision

Jeffrey S. Bowers; Gaurav Malhotra; Marin Dujmović; Milton L. Montero; Christian Tsvetkov; Valerio Biscione; Guillermo Puebla; Federico Adolfi; John E. Hummel; Rachel F. Heaton; Benjamin D. Evans; Jeffrey Mitchell; Ryan Blything

doi:10.1017/S0140525X23002777

Clarifying status of DNNs as models of human vision

Published online by Cambridge University Press: 06 December 2023

John E. Hummel and

Jeffrey S. Bowers: Affiliation:
School of Psychological Science, University of Bristol, Bristol, UK [email protected]; https://jeffbowers.blogs.bristol.ac.uk/ [email protected] [email protected] [email protected] [email protected] [email protected]
Gaurav Malhotra: Affiliation:
School of Psychological Science, University of Bristol, Bristol, UK [email protected]; https://jeffbowers.blogs.bristol.ac.uk/ [email protected] [email protected] [email protected] [email protected] [email protected]
Marin Dujmović: Affiliation:
School of Psychological Science, University of Bristol, Bristol, UK [email protected]; https://jeffbowers.blogs.bristol.ac.uk/ [email protected] [email protected] [email protected] [email protected] [email protected]
Milton L. Montero: Affiliation:
School of Psychological Science, University of Bristol, Bristol, UK [email protected]; https://jeffbowers.blogs.bristol.ac.uk/ [email protected] [email protected] [email protected] [email protected] [email protected]
Christian Tsvetkov: Affiliation:
School of Psychological Science, University of Bristol, Bristol, UK [email protected]; https://jeffbowers.blogs.bristol.ac.uk/ [email protected] [email protected] [email protected] [email protected] [email protected]
Valerio Biscione: Affiliation:
School of Psychological Science, University of Bristol, Bristol, UK [email protected]; https://jeffbowers.blogs.bristol.ac.uk/ [email protected] [email protected] [email protected] [email protected] [email protected]
Guillermo Puebla: Affiliation:
National Center for Artificial Intelligence, Macul, Chile [email protected]
Federico Adolfi: Affiliation:
Ernst Strüngmann Institute (ESI) for Neuroscience in Cooperation with Max Planck Society, Frankfurt am Main, Germany [email protected]
John E. Hummel: Affiliation:
Psychology Department, University of Illinois Urbana–Champaign, Champaign, IL, USA [email protected] [email protected]
Rachel F. Heaton: Affiliation:
Psychology Department, University of Illinois Urbana–Champaign, Champaign, IL, USA [email protected] [email protected]
Benjamin D. Evans: Affiliation:
Department of Informatics, School of Engineering and Informatics, University of Sussex, Brighton, UK [email protected] [email protected]
Jeffrey Mitchell: Affiliation:
Department of Informatics, School of Engineering and Informatics, University of Sussex, Brighton, UK [email protected] [email protected]
Ryan Blything: Affiliation:
School of Psychology, Aston University, Birmingham, UK [email protected]

Article contents

Abstract
References

Get access

Rights & Permissions

Abstract

On several key issues we agree with the commentators. Perhaps most importantly, everyone seems to agree that psychology has an important role to play in building better models of human vision, and (most) everyone agrees (including us) that deep neural networks (DNNs) will play an important role in modelling human vision going forward. But there are also disagreements about what models are for, how DNN–human correspondences should be evaluated, the value of alternative modelling approaches, and impact of marketing hype in the literature. In our view, these latter issues are contributing to many unjustified claims regarding DNN–human correspondences in vision and other domains of cognition. We explore all these issues in this response.

Type: Authors' Response
Information: Behavioral and Brain Sciences , Volume 46 , 2023 , e415

DOI: https://doi.org/10.1017/S0140525X23002777 [Opens in a new window]
Copyright: Copyright © The Author(s), 2023. Published by Cambridge University Press

Access options

Get access to the full version of this content by using one of the access options below. (Log in options will check for institutional or personal access. Content may require purchase if you do not have access.)

Article purchase

Temporarily unavailable

References

Anciukevicius, T., Fox-Roberts, P., Rosten, E., & Henderson, P. (2022). Unsupervised causal generative understanding of images. Advances in Neural Information Processing Systems, 35, 37037–37054.Google Scholar

Baker, N., & Elder, J. H. (2022). Deep learning models fail to capture the configural nature of human shape perception. iScience, 25(9), 104913.CrossRef Google Scholar PubMed

Baker, N., Garrigan, P., & Kellman, P. J. (2021). Constant curvature segments as building blocks of 2D shape representation. Journal of Experimental Psychology: General, 150(8), 1556–1580.CrossRef Google Scholar PubMed

Biederman, I. (1972). Perceiving real-world scenes. Science (New York, N.Y.), 177, 77–80.CrossRef Google Scholar PubMed

Biederman, I. (1987). Recognition-by-components: A theory of human image understanding. Psychological Review, 94, 115–147.CrossRef Google Scholar PubMed

Biscione, V., & Bowers, J. S. (2022). Learning online visual invariances for novel objects via supervised and self-supervised training. Neural Networks, 150, 222–236.CrossRef Google Scholar PubMed

Biscione, V., Yin, D., Malhotra, G., Dujmović, M., Montero, M., Puebla, G., … Bowers, J. S, . (2023). Introducing the MindSet benchmark for comparing DNNs to human vision. PsyArXiv. https://doi.org/10.31234/osf.io/cneypGoogle Scholar

Bowers, J. S. (2022). Researchers comparing DNNs to brains need to adopt standard Methods of Science. Invited workshop talk at Neural Information Processing Systems, New Orleans.Google Scholar

Bowers, J. S., Malhotra, G., Adolfi, F. G., Dujmović, M., Montero, M. L., Biscione, V., … Heaton, R. F. (2023). On the importance of severely testing deep learning models of cognition. PsyArXiv, 1–34. https://doi.org/10.31234/osf.io/wzns2Google Scholar

Carpenter, G. A., & Grossberg, S. (1987). A massively parallel architecture for a self-organizing neural pattern recognition machine. Computer Vision, Graphics, and Image Processing, 37, 54–115.CrossRef Google Scholar

Caucheteux, C., Gramfort, A., & King, J. R. (2022). Deep language algorithms predict semantic comprehension from brain activity. Scientific Reports, 12, 1–10.CrossRef Google Scholar PubMed

Cavanagh, P., Hénaff, M. A., Michel, F., Landis, T., Troscianko, T., & Intriligator, J. (1998). Complete sparing of high-contrast color input to motion perception in cortical color blindness. Nature Neuroscience, 1, 242–247.CrossRef Google Scholar PubMed

Cowan, N. (2001). The magical number 4 in short-term memory: A reconsideration of mental storage capacity. Behavioral and Brain Sciences, 24, 87–114.CrossRef Google Scholar PubMed

Dagaev, N., Roads, B. D., Luo, X., Barry, D. N., Patil, K. R., & Love, B. C. (2023). A too-good-to-be-true prior to reduce shortcut reliance. Pattern Recognition Letters, 166, 164–171.CrossRef Google Scholar PubMed

Da Silva, L. E. B., Elnabarawy, I., & Wunsch, D. C. II. (2019). A survey of adaptive resonance theory neural network models for engineering applications. Neural Networks, 120, 167–203.CrossRef Google Scholar

de Vries, J. P., Akbarinia, A., Flachot, A., & Gegenfurtner, K. R. (2022). Emergent color categorization in a neural network trained for object recognition. eLife, 11, e76472. https://doi.org/10.7554/eLife.76472CrossRef Google Scholar

Doerig, A., Sommers, R. P., Seeliger, K., Richards, B., Ismael, J., Lindsay, G. W., … Kietzmann, T. C. (2023). The neuroconnectionist research programme. Nature Reviews Neuroscience, 24, 431–450. https://doi.org/10.1038/s41583-023-00705-wCrossRef Google Scholar PubMed

Duan, S., Matthey, L., Saraiva, A., Watters, N., Burgess, C. P., Lerchner, A., & Higgins, I. (2020). Unsupervised model selection for variational disentangled representation learning. In Proceedings of the 8th international conference on learning representations. https://openreview.net/forum? id=SyxL2TNtvr Google Scholar

Dujmović, M., Bowers, J. S., Adolfi, F., & Malhotra, G. (2023). Obstacles to inferring mechanistic similarity using Representational Similarity Analysis. bioRxiv. https://doi.org/10.1101/2022.04.05.487135Google Scholar

Evans, B. D., Malhotra, G., & Bowers, J. S. (2022). Biological convolutions improve DNN robustness to noise and generalisation. Neural Networks, 148, 96–110.CrossRef Google Scholar PubMed

Francis, G., Manassi, M., & Herzog, M. H. (2017). Neural dynamics of grouping and segmentation explain properties of visual crowding. Psychological Review, 124, 483–504.CrossRef Google Scholar PubMed

Fukushima, K. (1980). Neocognitron: A self-organizing neural network model for a mechanism of pattern recognition unaffected by shift in position. Biological Cybernetics, 36, 193–202.CrossRef Google Scholar PubMed

Garg, A. K., Li, P., Rashid, M. S., & Callaway, E. M. (2019). Color and orientation are jointly coded and spatially organized in primate primary visual cortex. Science (New York, N.Y.), 364(6447), 1275–1279. https://doi.org/10.1126/science.aaw5868CrossRef Google Scholar PubMed

Geirhos, R., Jacobsen, J. H., Michaelis, C., Zemel, R., Brendel, W., Bethge, M., & Wichmann, F. A. (2020). Shortcut learning in deep neural networks. Nature Machine Intelligence, 2, 665–673.CrossRef Google Scholar

George, D., Lazaro-Gredilla, M., Lehrach, W., Dedieu, A., & Zhou, G. (2020). A detailed mathematical theory of thalamic and cortical microcircuits based on inference in a generative vision model. bioRxiv, 2020-09.CrossRef Google Scholar

George, D., Lehrach, W., Kansky, K., Lázaro-Gredilla, M., Laan, C., Marthi, B., … Phoenix, D. S. (2017). A generative vision model that trains with high data efficiency and breaks text-based CAPTCHAs. Science (New York, N.Y.), 358(6368), eaag2612.CrossRef Google Scholar PubMed

Goodale, M. A., & Milner, A. D. (1992). Separate visual pathways for perception and action. Trends in Neurosciences, 15, 20–25.CrossRef Google Scholar PubMed

Goodale, M. A., & Milner, A. D. (2023). Shape perception does not require dorsal stream processing. Trends in Cognitive Sciences, 27, 333–334. https://doi.org/10.1016/j.tics.2022.12.007CrossRef Google Scholar PubMed

Grossberg, S. (1967). Nonlinear difference-differential equations in prediction and learning theory. Proceedings of the National Academy of Sciences of the United States of America, 58, 1329–1334.CrossRef Google Scholar PubMed

Grossberg, S. (1980). How does a brain build a cognitive code? Psychological Review, 87, 1–51.CrossRef Google Scholar PubMed

Grossberg, S. (2003). Filling-in the forms: Surface and boundary interactions in visual cortex. In Pessoa, L. & de Weerd, P. (Eds.), Filling-in (pp. 13–37). Oxford University Press.CrossRef Google Scholar

Grossberg, S. (2014). How visual illusions illuminate complementary brain processes: Illusory depth from brightness and apparent motion of illusory contours. Frontiers in Human Neuroscience, 8, 854. https://doi.org/10.3389/fnhum.2014.00854CrossRef Google Scholar PubMed

Grossberg, S. (2021). Conscious mind, resonant brain: How each brain makes a mind. Oxford University Press.CrossRef Google Scholar

Hermann, K. L., Chen, T., & Kornblith, S. (2020). The origins and prevalence of texture bias in convolutional neural networks. Advances in Neural Information Processing Systems, 33, 19000–19015.Google Scholar

Higgins, I., Matthey, L., Pal, A., Burgess, C., Glorot, X., Botvinick, M., … Lerchner, , A. (2017). Beta-vae: Learning basic visual concepts with a constrained variational framework. In 5th International conference on learning representations, Toulon, France.Google Scholar

Hinton, G. (1979). Some demonstrations of the effects of structural descriptions in mental imagery. Cognitive Science, 3, 231–250.Google Scholar

Hinton, G. (2022). How to represent part-whole hierarchies in a neural network. Neural Computation, 35, 413–452.CrossRef Google Scholar

Huang, K., Arehalli, S., Kugemoto, M., Muxica, C., Prasad, G., Dillon, B., & Linzen, T. (2023). Surprisal does not explain syntactic disambiguation difficulty: Evidence from a large-scale benchmark. PsyArXiv, 1–79. https://doi.org/10.31234/osf.io/z38u6Google Scholar

Hubel, D. H., & Wiesel, T. N. (1962). Receptive fields, binocular interaction and functional architecture in the cat's visual cortex. The Journal of Physiology, 160, 106–152.CrossRef Google Scholar PubMed

Hummel, J. E. (2001). Complementary solutions to the binding problem in vision: Implications for shape perception and object recognition. Visual Cognition, 8, 489–517.CrossRef Google Scholar

Hummel, J. E., & Biederman, I. (1992). Dynamic binding in a neural network for shape recognition. Psychological Review, 99, 480–517. https://doi.org/10.1037/0033-295X.99.3.480CrossRef Google Scholar

Hummel, J. E., & Stankiewicz, B. J. (1996). An architecture for rapid, hierarchical structural description. In Inui, T. & McClelland, J. (Eds.), Attention and performance XVI: Information integration in perception and communication (pp. 93–121). MIT Press.Google Scholar

Irwin, D. E. (1991). Information integration across saccadic eye movements. Cognitive Psychology, 23, 420–456.CrossRef Google Scholar PubMed

Izhikevich, E. M. (2004). Which model to use for cortical spiking neurons? IEEE Transactions on Neural Networks, 15(5), 1063–1070. https://doi.org/10.1109/TNN.2004.832719CrossRef Google Scholar PubMed

Kim, B., Reif, E., Wattenberg, M., Bengio, S., & Mozer, M. C. (2021). Neural networks trained on natural scenes exhibit gestalt closure. Computational Brain & Behavior, 4, 251–263.CrossRef Google Scholar

Kubilius, J., Bracci, S., & Op de Beeck, H. P. (2016). Deep neural networks as a computational model for human shape sensitivity. PLoS Computational Biology, 12, e1004896.CrossRef Google Scholar PubMed

Lavin, A., Guntupalli, J. S., Lázaro-Gredilla, M., Lehrach, W., & George, D. (2018). Explaining visual cortex phenomena using recursive cortical network. bioRxiv, 380048. https://doi.org/10.1101/380048Google Scholar

Livingstone, M., & Hubel, D. (1988). Segregation of form, color, movement, and depth: Anatomy, physiology, and perception. Science (New York, N.Y.), 240, 740–749.CrossRef Google Scholar PubMed

Locatello, F., Poole, B., Rätsch, G., Schölkopf, B., Bachem, O., & Tschannen, M. (2020, November). Weakly-supervised disentanglement without compromises. In International conference on machine learning, Vienna, Austria (pp. 6348–6359).Google Scholar

Malhotra, G., Dujmović, M., Hummel, J., & Bowers, J. S. (in press). Human shape representations are not an emergent property of learning to classify objects. Journal of Experimental Psychology: General.Google Scholar

Malhotra, G., Evans, B. D., & Bowers, J. S. (2020). Hiding a plane with a pixel: Examining shape-bias in CNNs and the benefit of building in biological constraints. Vision Research, 174, 57–68. https://doi.org/10.1016/j.visres.2020.04.013CrossRef Google Scholar PubMed

Marr, D. (1982). Vision: A computational investigation into the human representation and processing of visual information. MIT Press.Google Scholar

Matin, E. (1974). Saccadic suppression: A review and an analysis. Psychological Bulletin, 81, 899–917.CrossRef Google Scholar PubMed

Mayo, D. G. (2018). Statistical inference as severe testing. Cambridge University Press.CrossRef Google Scholar

Medin, D. L., & Schaffer, M. M. (1978). Context theory of classification learning. Psychological Review, 85, 207.CrossRef Google Scholar

Meijer, G. (2021). Neurons in the mouse brain correlate with cryptocurrency price: A cautionary tale. Peer Community Journal, 1, e29.CrossRef Google Scholar

Messina, N., Amato, G., Carrara, F., Gennaro, C., & Falchi, F. (2021). Solving the same-different task with convolutional neural networks. Pattern Recognition Letters, 143, 75–80.CrossRef Google Scholar

Mitchell, J., & Bowers, J. (2020, December). Priorless recurrent networks learn curiously. In Proceedings of the 28th international conference on computational linguistics (pp. 5147–5158). International Committee on Computational Linguistics. https://doi.org/10.18653/v1/2020.coling-main.451CrossRef Google Scholar

Montero, M., Bowers, J., Ponte Costa, R., Ludwig, C., & Malhotra, G. (2022). Lost in latent space: Examining failures of disentangled models at combinatorial generalisation. Advances in Neural Information Processing Systems, 35, 10136–10149.Google Scholar

Nakayama, K., & Shimojo, S. (1992). Experiencing and perceiving visual surfaces. Science (New York, N.Y.), 257, 1357–1363.CrossRef Google Scholar PubMed

Newell, A. (1973). You can't play 20 questions with nature and win: Projective comments on the papers of this symposium. In W. G. Chase (Ed.), Visual information processing: Proceedings of the eighth annual Carnegie symposium on cognition, held at the Carnegie-Mellon University, Pittsburgh, Pennsylvania, May 19, 1972. Academic Press.Google Scholar

Puebla, G., & Bowers, J. S. (2022). Can deep convolutional neural networks support relational reasoning in the same-different task?. Journal of Vision, 22, 11. https://doi.org/10.1167/jov.22.10.11CrossRef Google Scholar PubMed

Puebla, G., & Bowers, J. S. (2023). The role of object-centric representations, guided attention, and external memory on generalizing visual relations. arXiv preprint arXiv:2304.07091.Google Scholar

Ramachandran, V. S., & Gregory, R. L. (1991). Perceptual filling in of artificially induced scotomas in human vision. Nature, 350, 699–702.CrossRef Google Scholar PubMed

Ratcliff, R., & McKoon, G. (2008). The diffusion decision model: Theory and data for two-choice decision tasks. Neural Computation, 20, 873–922.CrossRef Google Scholar PubMed

Rayner, K. (1978). Eye movements in reading and information processing. Psychological Bulletin, 85, 618–660.CrossRef Google Scholar PubMed

Rescorla, R. A., & Wagner, A. R. (1972). A theory of Pavlovian conditioning: Variations in the effectiveness of reinforcement and nonreinforcement. In Black, A. H. & Prokasy, W. F. (Eds.), Classical conditioning II: Current research and theory (Vol. 2, pp. 64–69). Appleton-Century Crofts.Google Scholar

Rich, P., de Haan, R., Wareham, T., & van Rooij, I. (2021). How hard is cognitive science? In Proceedings of the annual meeting of the cognitive science society (Vol. 43, No. 43).Google Scholar

Rust, N. C., & Movshon, J. A. (2005). In praise of artifice. Nature Neuroscience, 8, 1647–1650. https://doi.org/10.1038/nn1606CrossRef Google Scholar PubMed

Sabour, S., Frosst, N., & Hinton, G. E. (2017). Dynamic routing between capsules. Advances in Neural Information Processing Systems, 30, 3856–3866.Google Scholar

Schrimpf, M., Blank, I. A., Tuckute, G., Kauf, C., Hosseini, E. A., Kanwisher, N., … Fedorenko, E. (2021). The neural architecture of language: Integrative modeling converges on predictive processing. Proceedings of the National Academy of Sciences of the United States of America, 118, e2105646118.CrossRef Google Scholar PubMed

Sexton, N. J., & Love, B. C. (2022). Reassessing hierarchical correspondences between brain and deep networks through direct interface. Science Advances, 8, eabm2219.CrossRef Google Scholar PubMed

Shepard, R. N. (1987). Toward a universal law of generalization for psychological science. Science (New York, N.Y.), 237, 1317–1323.CrossRef Google Scholar

Simmons, J. P., Nelson, L. D., & Simonsohn, U. (2011). False-positive psychology: Undisclosed flexibility in data collection and analysis allows presenting anything as significant. Psychological Science, 22, 1359–1366.CrossRef Google Scholar PubMed

Simons, D. J., & Levin, D. T. (1997). Change blindness. Trends in Cognitive Sciences, 1, 261–267.CrossRef Google Scholar PubMed

Storrs, K. R., Kietzmann, T. C., Walther, A., Mehrer, J., & Kriegeskorte, N. (2021). Diverse deep neural networks all predict human inferior temporal cortex well, after training and fitting. Journal of Cognitive Neuroscience, 33, 2044–2064.Google Scholar PubMed

Thorpe, S., Fize, D., & Marlot, C. (1996). Speed of processing in the human visual system. Nature, 381, 520–522.CrossRef Google Scholar PubMed

Tsvetkov, C., Malhotra, G., Evans, B. D., & Bowers, J. S. (2023). The role of capacity constraints in convolutional neural networks for learning random versus natural data. Neural Networks, 161, 515–524.CrossRef Google Scholar PubMed

Vannuscorps, G., Galaburda, A., & Caramazza, A. (2021). The form of reference frames in vision: The case of intermediate shape-centered representations. Neuropsychologia, 162, 108053.CrossRef Google Scholar PubMed

van Rooij, I. (2022). Psychological models and their distractors. Nature Reviews Psychology, 1, 127–128. https://doi.org/10.1038/s44159-022-00031-5CrossRef Google Scholar

Wagemans, J., Elder, J. H., Kubovy, M., Palmer, S. E., Peterson, M. A., Singh, M., & von der Heydt, R. (2012). A century of Gestalt psychology in visual perception: I. Perceptual grouping and figure-ground organization. Psychological Bulletin, 138, 1172.CrossRef Google Scholar PubMed

Winograd, T. (1971). Procedures as a representation for data in a computer program for understanding natural language. AITR-235. Retrieved from http://hdl.handle.net/1721.1/7095 Google Scholar

Yovel, G., Grosbard, I., & Abudarham, N. (2022). Computational models of perceptual expertise reveal a domain-specific inversion effect for objects of expertise. PsyXiv, 1–25.Google Scholar

Zeki, S. (1991). Cerebral akinetopsia (visual motion blindness). A review. Brain, 114, 811–824.CrossRef Google Scholar PubMed

Zhang, H., Zhang, Y. F., Liu, W., Weller, A., Schölkopf, B., & Xing, E. P. (2022). Towards principled disentanglement for domain generalization. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 8024–8034).CrossRef Google Scholar