Deep problems with neural network models of human vision

Jeffrey S. Bowers; Gaurav Malhotra; Marin Dujmović; Milton Llera Montero; Christian Tsvetkov; Valerio Biscione; Guillermo Puebla; Federico Adolfi; John E. Hummel; Rachel F. Heaton; Benjamin D. Evans; Jeffrey Mitchell; Ryan Blything

doi:10.1017/S0140525X22002813

Published online by Cambridge University Press: 01 December 2022

Jeffrey S. Bowers: School of Psychological Science, University of Bristol, Bristol, UK; https://jeffbowers.blogs.bristol.ac.uk/
Gaurav Malhotra: School of Psychological Science, University of Bristol, Bristol, UK
Marin Dujmović: School of Psychological Science, University of Bristol, Bristol, UK
Milton Llera Montero: School of Psychological Science, University of Bristol, Bristol, UK
Christian Tsvetkov: School of Psychological Science, University of Bristol, Bristol, UK
Valerio Biscione: School of Psychological Science, University of Bristol, Bristol, UK
Guillermo Puebla: School of Psychological Science, University of Bristol, Bristol, UK
Federico Adolfi: School of Psychological Science, University of Bristol, Bristol, UK; Ernst Strüngmann Institute (ESI) for Neuroscience in Cooperation with Max Planck Society, Frankfurt am Main, Germany
John E. Hummel: Department of Psychology, University of Illinois Urbana–Champaign, Champaign, IL, USA
Rachel F. Heaton: Department of Psychology, University of Illinois Urbana–Champaign, Champaign, IL, USA
Benjamin D. Evans: Department of Informatics, School of Engineering and Informatics, University of Sussex, Brighton, UK
Jeffrey Mitchell: Department of Informatics, School of Engineering and Informatics, University of Sussex, Brighton, UK
Ryan Blything: School of Psychology, Aston University, Birmingham, UK

Abstract

Deep neural networks (DNNs) have had extraordinary successes in classifying photographic images of objects and are often described as the best models of biological vision. This conclusion is largely based on three sets of findings: (1) DNNs are more accurate than any other model in classifying images taken from various datasets, (2) DNNs do the best job in predicting the pattern of human errors in classifying objects taken from various behavioral datasets, and (3) DNNs do the best job in predicting brain signals in response to images taken from various brain datasets (e.g., single cell responses or fMRI data). However, these behavioral and brain datasets do not test hypotheses regarding what features are contributing to good predictions, and we show that the predictions may be mediated by DNNs that share little overlap with biological vision. More problematically, we show that DNNs account for almost no results from psychological research. This contradicts the common claim that DNNs are good, let alone the best, models of human object recognition. We argue that theorists interested in developing biologically plausible models of human vision need to direct their attention to explaining psychological findings. More generally, theorists need to build models that explain the results of experiments that manipulate independent variables designed to test hypotheses, rather than compete on making the best predictions. We conclude by briefly summarizing various promising modeling approaches that focus on psychological data.
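Points (2) and (3) above refer to scoring how closely a model's behavior and internal responses match human data. The following is a hedged illustration only, not the authors' code or any benchmark's implementation. It assumes the standard definitions of error consistency (Cohen's kappa computed on trial-by-trial correctness) and of representational similarity analysis (RSA: rank-correlating the pairwise-dissimilarity structure of two systems' responses to the same stimuli). All function names and toy data are hypothetical.

```python
# Hypothetical helpers (names are illustrative, not from any benchmark's code)
# sketching two standard ways model-human agreement is scored.
from itertools import combinations
from math import sqrt


def error_consistency(correct_a, correct_b):
    """Cohen's kappa on two observers' trial-by-trial correctness (1 = right, 0 = wrong)."""
    n = len(correct_a)
    # Observed agreement: fraction of trials both got right or both got wrong.
    c_obs = sum(a == b for a, b in zip(correct_a, correct_b)) / n
    p_a = sum(correct_a) / n  # accuracy of observer A
    p_b = sum(correct_b) / n  # accuracy of observer B
    # Agreement expected if errors were independent, given the two accuracies.
    # (Undefined when c_exp == 1, i.e. both observers always right or always wrong.)
    c_exp = p_a * p_b + (1 - p_a) * (1 - p_b)
    return (c_obs - c_exp) / (1 - c_exp)


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def ranks(values):
    """1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=values.__getitem__)
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def rdm(patterns):
    """Upper triangle of a representational dissimilarity matrix (1 - Pearson r)."""
    return [1 - pearson(patterns[i], patterns[j])
            for i, j in combinations(range(len(patterns)), 2)]


def rsa(patterns_a, patterns_b):
    """Spearman correlation between two systems' RDMs for the same stimuli."""
    return pearson(ranks(rdm(patterns_a)), ranks(rdm(patterns_b)))


if __name__ == "__main__":
    # Toy data: equal accuracy (5/8) but errors on largely different trials.
    human = [1, 1, 0, 1, 0, 1, 1, 0]
    model = [1, 0, 1, 1, 0, 1, 0, 1]
    print(round(error_consistency(human, model), 3))  # -0.067

    # Toy "responses" of 4 units to 4 stimuli; system B only reorders A's units.
    system_a = [[1.0, 2.0, 3.0, 4.0], [2.0, 1.0, 4.0, 3.0],
                [4.0, 3.0, 2.0, 1.0], [1.0, 3.0, 2.0, 5.0]]
    system_b = [[p[3], p[1], p[0], p[2]] for p in system_a]
    print(round(rsa(system_a, system_b), 3))  # 1.0
```

In the toy run, two observers with identical accuracy score near zero on error consistency, while a system that merely relabels another's units reaches an RSA of 1.0. Both scores summarize agreement in outcomes or in response geometry; neither identifies which features produce the responses.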

Keywords

Brain-Score; computational neuroscience; deep neural networks; human vision; object recognition

Type: Target Article
Information: Behavioral and Brain Sciences, Volume 46, 2023, e385

DOI: https://doi.org/10.1017/S0140525X22002813
Copyright: Copyright © The Author(s), 2022. Published by Cambridge University Press

References

Adolfi, F., Bowers, J. S., & Poeppel, D. (2023). Successes and critical failures of neural networks in capturing human-like speech recognition. Neural Networks, 162, 199–211.CrossRef Google Scholar PubMed

Alcorn, M. A., Li, Q., Gong, Z., Wang, C., Mai, L., Ku, W. S., & Nguyen, A. (2019, June). Strike (with) a pose: Neural networks are easily fooled by strange poses of familiar objects. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, Long Beach, USA (pp. 4845–4854).CrossRef Google Scholar

Alexander, D. M., & Van Leeuwen, C. (2010). Mapping of contextual modulation in the population response of primary visual cortex. Cognitive Neurodynamics, 4(1), 1–24.CrossRef Google Scholar PubMed

Ba, J., Mnih, V., & Kavukcuoglu, K. (2014). Multiple object recognition with visual attention. arXiv preprint arXiv:1412.7755, 1–10.Google Scholar

Baker, N., Kellman, P. J., Erlikhman, G., & Lu, H. (2018a). Deep convolutional networks do not perceive illusory contours. In Proceedings of the 40th annual conference of the cognitive science society, Cognitive Science Society, Austin, TX (pp. 1310–1315).Google Scholar

Baker, N., Lu, H., Erlikhman, G., & Kellman, P. J. (2018b). Deep convolutional networks do not classify based on global object shape. PLoS Computational Biology, 14(12), e1006613.CrossRef Google Scholar

Barrett, D., Hill, F., Santoro, A., Morcos, A., & Lillicrap, T. (2018, July). Measuring abstract reasoning in neural networks. In International conference on machine learning, Stockholm, Sweden (pp. 511–520).Google Scholar

Bhattasali, N. X., Tomov, M., & Gershman, S. (2021, June). CCNLab: A benchmarking framework for computational cognitive neuroscience. In Thirty-fifth conference on neural information processing systems datasets and benchmarks track (round 1). Virtual conference.Google Scholar

Biederman, I. (1987). Recognition-by-components: A theory of human image understanding. Psychological Review, 94(2), 115–147.CrossRef Google Scholar PubMed

Biederman, I., & Ju, G. (1988). Surface versus edge-based determinants of visual recognition. Cognitive Psychology, 20(1), 38–64.CrossRef Google Scholar PubMed

Biscione, V., & Bowers, J. S. (2021). Convolutional neural networks are not invariant to translation, but they can learn to be. Journal of Machine Learning Research, 22, 1–28.Google Scholar

Biscione, V., & Bowers, J. S. (2022). Learning online visual invariances for novel objects via supervised and self-supervised training. Neural Networks, 150, 222–236. https://doi.org/10.1016/j.neunet.2022.02.017CrossRef Google Scholar PubMed

Biscione, V., & Bowers, J. S. (2023). Mixed evidence for gestalt grouping in deep neural networks. Computational Brain & Behavior. https://doi.org/10.1007/s42113-023-00169-2CrossRef Google Scholar

Blything, R., Biscione, V., & Bowers, J. (2020). A case for robust translation tolerance in humans and CNNs. A commentary on Han et al. arXiv preprint arXiv:2012.05950, 1–8.Google Scholar

Blything, R., Biscione, V., Vankov, I. I., Ludwig, C. J. H., & Bowers, J. S. (2021). The human visual system and CNNs can both support robust online translation tolerance following extreme displacements. Journal of Vision, 21(2), 9, 1–16. https://doi.org/10.1167/jov.21.2.9CrossRef Google Scholar PubMed

Bowers, J. S. (2017). Parallel distributed processing theory in the age of deep networks. Trends in Cognitive Science, 21, 950–961.CrossRef Google Scholar PubMed

Bowers, J. S., & Davis, C. J. (2012a). Bayesian just-so stories in psychology and neuroscience. Psychological Bulletin, 138, 389–414. doi:10.1037/a0026450CrossRef Google Scholar PubMed

Bowers, J. S., & Davis, C. J. (2012b). Is that what Bayesians believe? Reply to Griffiths, Chater, Norris, and Pouget (2012). Psychological Bulletin, 138, 423–426. doi:10.1037/a0027750CrossRef Google Scholar PubMed

Bowers, J. S., & Jones, K. W. (2007). Detecting objects is easier than categorizing them. Quarterly Journal of Experimental Psychology, 61, 552–557.CrossRef Google Scholar

Bowers, J. S., Vankov, I. I., & Ludwig, C. J. (2016). The visual system supports online translation invariance for object identification. Psychonomic Bulletin & Review, 23, 432–438.CrossRef Google Scholar PubMed

Burgess, C. P., Matthey, L., Watters, N., Kabra, R., Higgins, I., Botvinick, M., & Lerchner, A. (2019). Monet: Unsupervised scene decomposition and representation. arXiv preprint arXiv:1901.11390, 1–22.Google Scholar

Cao, Y., Grossberg, S., & Markowitz, J. (2011). How does the brain rapidly learn and reorganize view-invariant and position-invariant object representations in the inferotemporal cortex? Neural Networks, 24(10), 1050–1061.CrossRef Google Scholar PubMed

Carpenter, G. A., & Grossberg, S. (1981). Adaptation and transmitter gating in vertebrate photoreceptors. Journal of Theoretical Neurobiology, 1(1), 1–42.Google Scholar

Caucheteux, C., Gramfort, A., & King, J. R. (2022). Deep language algorithms predict semantic comprehension from brain activity. Scientific Reports, 12(1), 1–10.CrossRef Google Scholar PubMed

Cavanagh, P., Hénaff, M. A., Michel, F., Landis, T., Troscianko, T., & Intriligator, J. (1998). Complete sparing of high-contrast color input to motion perception in cortical color blindness. Nature Neuroscience, 1(3), 242–247.CrossRef Google Scholar PubMed

Cichy, R. M., Khosla, A., Pantazis, D., Torralba, A., & Oliva, A. (2016). Comparison of deep neural networks to spatiotemporal cortical dynamics of human visual object recognition reveals hierarchical correspondence. Scientific Reports, 6, 1–13.CrossRef Google Scholar PubMed

Cohen, J. (1960). A coefficient of agreement for nominal scales. Educational and Psychological Measurement, 20(1), 37–46.CrossRef Google Scholar

Cowan, N. (2001). The magical number 4 in short-term memory: A reconsideration of mental storage capacity. Behavioral and Brain Sciences, 24(1), 87–114.CrossRef Google Scholar PubMed

Crosby, M., Beyret, B., & Halina, M. (2019). The animal-AI Olympics. Nature Machine Intelligence, 1(5), 257–257.CrossRef Google Scholar

Doumas, L. A. A., Puebla, G., Martin, A. E., & Hummel, J. E. (2022). A theory of relation learning and cross-domain generalization. Psychological Review, 129(5), 999–1041. https://doi.org/10.1037/rev0000346CrossRef Google Scholar PubMed

Driver, J., & Baylis, G. C. (1996). Edge-assignment and figure-ground segmentation in short-term visual matching. Cognitive Psychology, 31(3), 248–306.CrossRef Google Scholar PubMed

Duan, S., Matthey, L., Saraiva, A., Watters, N., Burgess, C. P., Lerchner, A., & Higgins, I. (2019). Unsupervised model selection for variational disentangled representation learning. arXiv preprint arXiv:1905.12614, 1–29.Google Scholar

Dujmović, M., Bowers, J. S., Adolfi, F., & Malhotra, G. (2022). Some pitfalls of measuring representational similarity using representational similarity analysis. arXiv preprint, 1–48. https://www.biorxiv.org/content/10.1101/2022.04.05.487135v1 Google Scholar

Dujmović, M., Bowers, J. S., Adolfi, F., & Malhotra, G. (2023). Obstacles to inferring mechanistic similarity using Representational Similarity Analysis. bioRxiv. https://doi.org/10.1101/2022.04.05.487135Google Scholar

Dujmović, M., Malhotra, G., & Bowers, J. S. (2020). What do adversarial images tell us about human vision? eLife, 9, e55978.CrossRef Google Scholar PubMed

Duncan, J., & Humphreys, G. W. (1989). Visual search and stimulus similarity. Psychological Review, 96(3), 433–458.CrossRef Google Scholar PubMed

Elmoznino, E., & Bonner, M. F. (2022). High-performing neural network models of visual cortex benefit from high latent dimensionality. bioRxiv, 1–33. https://doi.org/10.1101/2022.07.13.499969Google Scholar

Erdogan, G., & Jacobs, R. A. (2017). Visual shape perception as Bayesian inference of 3D object-centered shape representations. Psychological Review, 124(6), 740–761.CrossRef Google Scholar PubMed

Evans, B. D., Malhotra, G., & Bowers, J. S. (2022). Biological convolutions improve DNN robustness to noise and generalisation. Neural Networks, 148, 96–110. https://doi.org/10.1016/j.neunet.2021.12.005CrossRef Google Scholar PubMed

Farah, M. J. (2004). Visual agnosia. MIT Press.CrossRef Google Scholar

Feather, J., Durango, A., Gonzalez, R., & McDermott, J. (2019). Metamers of neural networks reveal divergence from human perceptual systems. Advances in Neural Information Processing Systems, 32, 1–12.Google Scholar

Fleming, R. W., & Storrs, K. R. (2019). Learning to see stuff. Current Opinion in Behavioral Sciences, 30, 100–108.CrossRef Google Scholar PubMed

Fodor, J. A., & Pylyshyn, Z. W. (1988). Connectionism and cognitive architecture: A critical analysis. Cognition, 28(1–2), 3–71.CrossRef Google Scholar PubMed

Francis, G., Manassi, M., & Herzog, M. H. (2017). Neural dynamics of grouping and segmentation explain properties of visual crowding. Psychological Review, 124(4), 483–504.CrossRef Google Scholar PubMed

Funke, C. M., Borowski, J., Stosio, K., Brendel, W., Wallis, T. S., & Bethge, M. (2021). Five points to check when comparing visual perception in humans and machines. Journal of Vision, 21(3), 16, 1–23.CrossRef Google Scholar PubMed

Garrigan, P., & Kellman, P. J. (2008). Perceptual learning depends on perceptual constancy. Proceedings of the National Academy of Sciences of the United States of America, 105(6), 2248–2253.CrossRef Google Scholar PubMed

Gauthier, I., & Tarr, M. J. (2016). Visual object recognition: Do we (finally) know more now than we did? Annual Review of Vision Science, 2, 377–396.CrossRef Google Scholar PubMed

Geirhos, R., Jacobsen, J. H., Michaelis, C., Zemel, R., Brendel, W., Bethge, M., & Wichmann, F. A. (2020a). Shortcut learning in deep neural networks. Nature Machine Intelligence, 2(11), 665–673.CrossRef Google Scholar

Geirhos, R., Meding, K., & Wichmann, F. A. (2020b). Beyond accuracy: Quantifying trial-by-trial behaviour of CNNs and humans by measuring error consistency. Advances in Neural Information Processing Systems, 33, 13890–13902.Google Scholar

Geirhos, R., Narayanappa, K., Mitzkus, B., Thieringer, T., Bethge, M., Wichmann, F. A., & Brendel, W. (2021). Partial success in closing the gap between human and machine vision. Advances in Neural Information Processing Systems, 34, 23885–23899.Google Scholar

Geirhos, R., Rubisch, P., Michaelis, C., Bethge, M., Wichmann, F. A., & Brendel, W. (2019). ImageNet-trained CNNs are biased towards texture; increasing shape bias improves accuracy and robustness. In International conference on learning representations (ICLR), New Orleans. https://openreview.net/forum?id=Bygh9j09KX Google Scholar

Geirhos, R., Temme, C. R., Rauber, J., Schütt, H. H., Bethge, M., & Wichmann, F. A. (2018). Generalisation in humans and deep neural networks. Advances in Neural Information Processing Systems, 31, 7538–7550.Google Scholar

George, D., Lehrach, W., Kansky, K., Lázaro-Gredilla, M., Laan, C., Marthi, B., … Phoenix, D. S. (2017). A generative vision model that trains with high data efficiency and breaks text-based CAPTCHAs. Science (New York, N.Y.), 358(6368), eaag2612.CrossRef Google Scholar PubMed

German, J. S., & Jacobs, R. A. (2020). Can machine learning account for human visual object shape similarity judgments. Vision Research, 167, 87–99. https://doi.org/10.1016/j.visres.2019.12.001CrossRef Google Scholar PubMed

Goodale, M. A., & Milner, A. D. (1992). Separate visual pathways for perception and action. Trends in Neurosciences, 15(1), 20–25.CrossRef Google Scholar PubMed

Graves, A., Wayne, G., Reynolds, M., Harley, T., Danihelka, I., Grabska-Barwińska, A., … Hassabis, D. (2016). Hybrid computing using a neural network with dynamic external memory. Nature, 538(7626), 471–476.CrossRef Google Scholar PubMed

Greff, K., Kaufman, R. L., Kabra, R., Watters, N., Burgess, C., Zoran, D., … Lerchner, A. (2019, May). Multi-object representation learning with iterative variational inference. In International conference on machine learning, Long Beach, USA (pp. 2424–2433).Google Scholar

Greff, K., van Steenkiste, S., & Schmidhuber, J. (2020). On the binding problem in artificial neural networks. arXiv preprint arXiv:2012.05208, 1–75.Google Scholar

Griffiths, T. L., Chater, N., Norris, D., & Pouget, A. (2012). How the Bayesians got their beliefs (and what those beliefs actually are): Comment on Bowers and Davis (2012). Psychological Bulletin, 138(3), 415–422. https://doi.org/10.1037/a0026884CrossRef Google Scholar PubMed

Grill-Spector, K., & Kanwisher, N. (2005). Visual recognition: As soon as you know it is there, you know what it is. Psychological Science, 16(2), 152–160.CrossRef Google Scholar

Grossberg, S. (1980). How does a brain build a cognitive code? Psychological Review, 87, 1–51.CrossRef Google Scholar PubMed

Grossberg, S. (2000). The complementary brain: Unifying brain dynamics and modularity. Trends in Cognitive Sciences, 4, 233–246.CrossRef Google Scholar PubMed

Grossberg, S. (2021). Conscious mind, resonant brain: How each brain makes a mind. Oxford University Press.CrossRef Google Scholar

Grossberg, S., & Mingolla, E. (1985). Neural dynamics of form perception: Boundary completion, illusory figures, and neon color spreading. Psychological Review, 92(2), 173–211.CrossRef Google Scholar PubMed

Grossberg, S., & Mingolla, E. (1987). Neural dynamics of perceptual grouping: Textures, boundaries, and emergent segmentations. In The adaptive brain II (pp. 143–210). Elsevier.CrossRef Google Scholar

Guest, O., & Martin, A. E. (2023). On logical inference over brains, behaviour, and artificial neural networks. Computational Brain & Behavior, 6, 213–227.CrossRef Google Scholar

Gulordava, K., Bojanowski, P., Grave, E., Linzen, T., & Baroni, M. (2018, June). Colorless green recurrent networks dream hierarchically. In Proceedings of NAACL 2018 (pp. 1195–1205). New Orleans, Louisiana: ACL.CrossRef Google Scholar

Hacker, C., & Biederman, I. (2018). The invariance of recognition to the stretching of faces is not explained by familiarity or warping to an average face. arXiv preprint, 1–23. https://doi.org/10.31234/osf.io/e5hgxGoogle Scholar

Hannagan, T., Agrawal, A., Cohen, L., & Dehaene, S. (2021). Emergence of a compositional neural code for written words: Recycling of a convolutional neural network for reading. Proceedings of the National Academy of Sciences of the United States of America, 118(46), e210477911.Google Scholar PubMed

Heinke, D., Wachman, P., van Zoest, W., & Leek, E. C. (2021). A failure to learn object shape geometry: Implications for convolutional neural networks as plausible models of biological vision. Vision Research, 189, 81–92.CrossRef Google Scholar PubMed

Hermann, K., Chen, T., & Kornblith, S. (2020). The origins and prevalence of texture bias in convolutional neural networks. Advances in Neural Information Processing Systems, 33, 19000–19015.Google Scholar

Hochberg, J., & Brooks, V. (1962). Pictorial recognition as an unlearned ability: A study of one child's performance. The American Journal of Psychology, 75(4), 624–628.CrossRef Google Scholar PubMed

Holyoak, K. J., & Hummel, J. E. (2000). The proper treatment of symbols in a connectionist architecture. In Dietrich, E. & Markman, A. (Eds.), Cognitive dynamics: Conceptual change in humans and machines (pp. 229–264). MIT Press.Google Scholar

Huber, L. S., Geirhos, R., & Wichmann, F. A. (2022). The developmental trajectory of object recognition robustness: Children are like small adults but unlike big deep neural networks. arXiv preprint arXiv:2205.10144, 1–32.Google Scholar

Hummel, J. E. (2000). Where view-based theories break down: The role of structure in shape perception and object recognition. In Deitrich, E. & Markman, A. (Eds.), Cognitive dynamics: Conceptual change in humans and machines (pp. 157–185). Erlbaum.Google Scholar

Hummel, J. E. (2013). Object recognition. In Reisburg, D. (Ed.), Oxford handbook of cognitive psychology (pp. 32–46). Oxford University Press.Google Scholar

Hummel, J. E., & Biederman, I. (1992). Dynamic binding in a neural network for shape recognition. Psychological Review, 99, 480–517. https://doi.org/10.1037/0033-295X.99.3.480CrossRef Google Scholar

Hummel, J. E., & Stankiewicz, B. J. (1996). Categorical relations in shape perception. Spatial Vision, 10(3), 201–236.Google Scholar PubMed

Izhikevich, E. M. (2004). Which model to use for cortical spiking neurons? IEEE Transactions on Neural Networks 15(5), 1063–1070. https://doi.org/10.1109/TNN.2004.832719CrossRef Google Scholar PubMed

Jacob, G., Pramod, R. T., Katti, H., & Arun, S. P. (2021). Qualitative similarities and differences in visual object representations between brains and deep networks. Nature Communications, 12(1), 1–14.CrossRef Google Scholar PubMed

Kell, A. J. E., Yamins, D. L. K., Shook, E. N., Norman-Haignere, S. V., & McDermott, J. H. (2018). A task-optimized neural network replicates human auditory behavior, predicts brain responses, and reveals a cortical processing hierarchy. Neuron, 98(3), 630–644.e16.CrossRef Google Scholar PubMed

Khaligh-Razavi, S. M., & Kriegeskorte, N. (2014). Deep supervised, but not unsupervised, models may explain IT cortical representation. PLoS Computational Biology, 10(11), e1003915.CrossRef Google Scholar

Kheradpisheh, S. R., Ghodrati, M., Ganjtabesh, M., & Masquelier, T. (2016). Deep networks can resemble human feed-forward vision in invariant object recognition. Scientific Reports, 6(1), 1–24.CrossRef Google Scholar PubMed

Kiani, R., Esteky, H., Mirpour, K., & Tanaka, K. (2007). Object category structure in response patterns of neuronal population in monkey inferior temporal cortex. Journal of Neurophysiology, 97, 4296–4309. doi:10.1152/jn.00024.2007CrossRef Google Scholar PubMed

Kiat, J. E., Luck, S. J., Beckner, A. G., Hayes, T. R., Pomaranski, K. I., Henderson, J. M., & Oakes, L. M. (2022). Linking patterns of infant eye movements to a neural network model of the ventral stream using representational similarity analysis. Developmental Science, 25, e13155. https://doi.org/10.1111/desc.13155CrossRef Google Scholar PubMed

Kim, B., Reif, E., Wattenberg, M., Bengio, S., & Mozer, M. C. (2021). Neural networks trained on natural scenes exhibit Gestalt closure. Computational Brain & Behavior, 4, 251–263.CrossRef Google Scholar

Krauskopf, J. (1963). Effect of retinal image stabilization of the appearance of heterochromatic targets. Journal of the Optical Society of America, 53, 741–744.CrossRef Google Scholar PubMed

Kriegeskorte, N. (2015). Deep neural networks: A new framework for modeling biological vision and brain information processing. Annual Review of Vision Science, 1, 417–446.CrossRef Google Scholar PubMed

Kriegeskorte, N., Mur, M., & Bandettini, P. A. (2008a). Representational similarity analysis – Connecting the branches of systems neuroscience. Frontiers in Systems Neuroscience, 2(4), 1–28.Google Scholar PubMed

Kriegeskorte, N., Mur, M., Ruff, D. A., Kiani, R., Bodurka, J., Esteky, H., … Bandettini, P. A. (2008b). Matching categorical object representations in inferior temporal cortex of man and monkey. Neuron, 60(6), 1126–1141.CrossRef Google Scholar PubMed

Kriegeskorte, N., & Wei, X. X. (2021). Neural tuning and representational geometry. Nature Reviews Neuroscience, 22, 703–718. https://doi.org/10.1038/s41583-021-00502-3CrossRef Google Scholar PubMed

Kubilius, J., Bracci, S., & Op de Beeck, H. P. (2016). Deep neural networks as a computational model for human shape sensitivity. PLoS Computational Biology, 12(4), e1004896.CrossRef Google Scholar PubMed

Kubilius, J., Schrimpf, M., Kar, K., Hong, H., Majaj, N. J., Rajalingham, R., … DiCarlo, J. J. (2019). Brain-like object recognition with high-performing shallow recurrent ANNs. Advances in Neural Information Processing Systems, 32, 1–12.Google Scholar

Lehky, S. R., & Sejnowski, T. J. (1988). Network model of shape-from-shading: Neural function arises from both receptive and projective fields. Nature, 333(6172), 452–454.CrossRef Google Scholar PubMed

Lehman, J., & Stanley, K. O. (2011). Abandoning objectives: Evolution through the search for novelty alone. Evolutionary Computation, 19(2), 189–223.CrossRef Google Scholar PubMed

Lissauer, S. H. (1890). Ein Fall von Seelenblindheit nebst einem Beitrage zur Theorie derselben. Archiv für Psychiatrie und Nervenkrankheiten, 21(2), 222–270.CrossRef Google Scholar

Livingstone, M., & Hubel, D. (1988). Segregation of form, color, movement, and depth: Anatomy, physiology, and perception. Science, 240(4853), 740–749.CrossRef Google Scholar PubMed

Lonnqvist, B., Bornet, A., Doerig, A., & Herzog, M. H. (2021). A comparative biology approach to DNN modeling of vision: A focus on differences, not similarities. Journal of Vision, 21(10), 17–17. https://doi.org/10.1167/jov.21.10.17CrossRef Google Scholar

Lotter, W., Kreiman, G., & Cox, D. (2020). A neural network trained for prediction mimics diverse features of biological neurons and perception. Nature Machine Intelligence, 2(4), 210–219.CrossRef Google Scholar PubMed

Mack, M. L., Gauthier, I., Sadr, J., & Palmeri, T. J. (2008). Object detection and basic-level categorization: Sometimes you know it is there before you know what it is. Psychonomic Bulletin & Review, 15(1), 28–35.CrossRef Google Scholar PubMed

Macpherson, T., Churchland, A., Sejnowski, T., DiCarlo, J., Kamitani, Y., Takahashi, H., & Hikida, T. (2021). Natural and artificial intelligence: A brief introduction to the interplay between AI and neuroscience research. Neural Networks, 144, 603–613.CrossRef Google Scholar

Majaj, N. J., Hong, H., Solomon, E. A., & DiCarlo, J. J. (2015). Simple learned weighted sums of inferior temporal neuronal firing rates accurately predict human core object recognition performance. Journal of Neuroscience, 35(39), 13402–13418.CrossRef Google Scholar PubMed

Malhotra, G., Dujmovic, M., & Bowers, J. S. (2022). Feature blindness: A challenge for understanding and modelling visual object recognition. PLoS Computational Biology, 18, e1009572. https://doi.org/10.1101/2021.10.20.465074CrossRef Google Scholar PubMed

Malhotra, G., Dujmovic, M., Hummel, J., & Bowers, J. S. (2021). The contrasting shape representations that support object recognition in humans and CNNs. arXiv preprint, 1–51. https://doi.org/10.1101/2021.12.14.472546Google Scholar

Malhotra, G., Evans, B. D., & Bowers, J. S. (2020). Hiding a plane with a pixel: Examining shape-bias in CNNs and the benefit of building in biological constraints. Vision Research, 174, 57–68.CrossRef Google Scholar PubMed

Marcus, G. (2009). Kluge: The haphazard evolution of the human mind. Houghton Mifflin Harcourt.Google Scholar

Marcus, G. F. (1998). Rethinking eliminative connectionism. Cognitive Psychology, 37(3), 243–282.CrossRef Google Scholar PubMed

Marr, D. (1982). Vision: A computational investigation into the human representation and processing of visual information. Henry Holt, 2(4.2).Google Scholar

Mayo, D. G. (2018). Statistical inference as severe testing. Cambridge University Press.CrossRef Google Scholar

McClelland, J. L., Rumelhart, D. E., & PDP Research Group. (1986). Parallel distributed processing (Vol. 2). MIT Press.Google Scholar

Mehrer, J., Spoerer, C. J., Jones, E. C., Kriegeskorte, N., & Kietzmann, T. C. (2021). An ecologically motivated image dataset for deep learning yields better models of human vision. Proceedings of the National Academy of Sciences of the United States of America, 118(8), e2011417118.CrossRef Google Scholar PubMed

Mehrer, J., Spoerer, C. J., Kriegeskorte, N., & Kietzmann, T. C. (2020). Individual differences among deep neural network models. Nature Communications, 11(1), 1–12.CrossRef Google Scholar PubMed

Messina, N., Amato, G., Carrara, F., Gennaro, C., & Falchi, F. (2021). Solving the same-different task with convolutional neural networks. Pattern Recognition Letters, 143, 75–80.CrossRef Google Scholar

Millet, J., Caucheteux, C., Boubenec, Y., Gramfort, A., Dunbar, E., Pallier, C., & King, J. R. (2022). Toward a realistic model of speech processing in the brain with self-supervised learning. Advances in Neural Information Processing Systems, 35, 33428–33443.Google Scholar

Miozzo, M., & Caramazza, A. (1998). Varieties of pure alexia: The case of failure to access graphemic representations. Cognitive Neuropsychology, 15(1–2), 203–238.

Mitchell, J., & Bowers, J. (2020, December). Priorless recurrent networks learn curiously. In Proceedings of the 28th international conference on computational linguistics (pp. 5147–5158).

Mitchell, J., & Bowers, J. S. (2021). Generalisation in neural networks does not require feature overlap. arXiv preprint arXiv:2107.06872, 1–19.

Mnih, V., Heess, N., & Graves, A. (2014). Recurrent models of visual attention. In Advances in neural information processing systems (pp. 2204–2212).

Montero, M. L., Bowers, J. S., Ludwig, C. J., Costa, R. P., & Malhotra, G. (2022). Lost in latent space: Disentangled models and the challenge of combinatorial generalisation. arXiv preprint, 1–27. http://arxiv.org/abs/2204.02283

Montero, M. L., Ludwig, C. J., Costa, R. P., Malhotra, G., & Bowers, J. (2021). The role of disentanglement in generalisation. In International conference on learning representations. https://openreview.net/forum?id=qbH974jKUVy

Nakayama, K., Shimojo, S., & Silverman, G. H. (1989). Stereoscopic depth: Its relation to image segmentation, grouping, and the recognition of occluded objects. Perception, 18, 55–68.

Nguyen, A., Yosinski, J., & Clune, J. (2015). Deep neural networks are easily fooled: High confidence predictions for unrecognizable images. In Proceedings of the IEEE conference on computer vision and pattern recognition, Boston, USA (pp. 427–436).

Palmer, S. E. (1999). Color, consciousness, and the isomorphism constraint. Behavioral and Brain Sciences, 22(6), 923–943.

Palmer, S. E. (2003). Visual perception of objects. In Healy, A. F. & Proctor, R. W. (Eds.), Handbook of psychology: Experimental psychology (Vol. 4, pp. 177–211). John Wiley & Sons Inc.

Pang, Z., O'May, C. B., Choksi, B., & VanRullen, R. (2021). Predictive coding feedback results in perceived illusory contours in a recurrent neural network. Neural Networks, 144, 164–175.

Pater, J. (2019). Generative linguistics and neural networks at 60: Foundation, friction, and fusion. Language, 95(1), e41–e74.

Pepperberg, I. M., & Nakayama, K. (2016). Robust representation of shape in a grey parrot (Psittacus erithacus). Cognition, 153, 146–160.

Pessoa, L., Thompson, E., & Noë, A. (1998). Finding out about filling-in: A guide to perceptual completion for visual science and the philosophy of perception. Behavioral and Brain Sciences, 21(6), 723–748.

Peters, B., & Kriegeskorte, N. (2021). Capturing the objects of vision with neural networks. Nature Human Behaviour, 5, 1127–1144.

Peterson, J. C., Abbott, J. T., & Griffiths, T. L. (2018). Evaluating (and improving) the correspondence between deep neural networks and human representations. Cognitive Science, 42(8), 2648–2669.

Pinker, S., & Prince, A. (1988). On language and connectionism: Analysis of a parallel distributed processing model of language acquisition. Cognition, 28(1–2), 73–193.

Pomerantz, J. R., & Portillo, M. C. (2011). Grouping and emergent features in vision: Toward a theory of basic Gestalts. Journal of Experimental Psychology: Human Perception and Performance, 37(5), 1331–1349. doi:10.1037/a0024330

Puebla, G., & Bowers, J. S. (2022). Can deep convolutional neural networks support relational reasoning in the same-different task? Journal of Vision, 22(10), 11.

Pylyshyn, Z. W., & Storm, R. W. (1988). Tracking multiple independent targets: Evidence for a parallel tracking mechanism. Spatial Vision, 3, 179–197.

Raizada, R., & Grossberg, S. (2001). Context-sensitive bindings by the laminar circuits of V1 and V2: A unified model of perceptual grouping, attention, and orientation contrast. Visual Cognition, 8, 431–466.

Rajalingham, R., Issa, E. B., Bashivan, P., Kar, K., Schmidt, K., & DiCarlo, J. J. (2018). Large-scale, high-resolution comparison of the core visual object recognition behavior of humans, monkeys, and state-of-the-art deep artificial neural networks. Journal of Neuroscience, 38(33), 7255–7269.

Rajalingham, R., Schmidt, K., & DiCarlo, J. J. (2015). Comparison of object recognition behavior in human and monkey. Journal of Neuroscience, 35(35), 12127–12136.

Ramachandran, V. S. (1988). Perception of shape from shading. Nature, 331(6152), 163–166.

Ramachandran, V. S. (1992). Filling in gaps in perception: Part I. Current Directions in Psychological Science, 1(6), 199–205.

Ratan Murty, N. A., Bashivan, P., Abate, A., DiCarlo, J. J., & Kanwisher, N. (2021). Computational models of category-selective brain regions enable high-throughput tests of selectivity. Nature Communications, 12(1), 1–14.

Redmon, J., Divvala, S., Girshick, R., & Farhadi, A. (2016). You only look once: Unified, real-time object detection. In Proceedings of the IEEE conference on computer vision and pattern recognition, Las Vegas, USA (pp. 779–788).

Richards, B. A., Lillicrap, T. P., Beaudoin, P., Bengio, Y., Bogacz, R., Christensen, A., … Kording, K. P. (2019). A deep learning framework for neuroscience. Nature Neuroscience, 22(11), 1761–1770.

Ritter, S., Barrett, D. G., Santoro, A., & Botvinick, M. M. (2017, July). Cognitive psychology for deep neural networks: A shape bias case study. In International conference on machine learning, Sydney, Australia (pp. 2940–2949).

Rosenfeld, A., Zemel, R., & Tsotsos, J. K. (2018). The elephant in the room. arXiv preprint arXiv:1808.03305, 1–12.

Saarela, T. P., Sayim, B., Westheimer, G., & Herzog, M. H. (2009). Global stimulus configuration modulates crowding. Journal of Vision, 9(2), 5, 1–11.

Sabour, S., Frosst, N., & Hinton, G. E. (2017). Dynamic routing between capsules. Advances in Neural Information Processing Systems, 30, 3856–3866.

Santoro, A., Raposo, D., Barrett, D. G., Malinowski, M., Pascanu, R., Battaglia, P., & Lillicrap, T. (2017). A simple neural network module for relational reasoning. Advances in Neural Information Processing Systems, 30, 4967–4976.

Schaeffer, R., Khona, M., & Fiete, I. (2022). No free lunch from deep learning in neuroscience: A case study through models of the entorhinal–hippocampal circuit. Advances in Neural Information Processing Systems, 35, 16052–16067.

Schott, L., von Kügelgen, J., Träuble, F., Gehler, P., Russell, C., Bethge, M., … Brendel, W. (2021). Visual representation learning does not generalize strongly within the same domain. arXiv preprint arXiv:2107.08221, 1–34.

Schrimpf, M., Blank, I. A., Tuckute, G., Kauf, C., Hosseini, E. A., Kanwisher, N., … Fedorenko, E. (2021). The neural architecture of language: Integrative modeling converges on predictive processing. Proceedings of the National Academy of Sciences of the United States of America, 118(45), e2105646118.

Schrimpf, M., Kubilius, J., Hong, H., Majaj, N. J., Rajalingham, R., Issa, E. B., … DiCarlo, J. J. (2020a). Brain-Score: Which artificial neural network for object recognition is most brain-like? bioRxiv preprint, 1–9. https://doi.org/10.1101/407007

Schrimpf, M., Kubilius, J., Lee, M. J., Murty, N. A. R., Ajemian, R., & DiCarlo, J. J. (2020b). Integrative benchmarking to advance neurally mechanistic models of human intelligence. Neuron, 108(3), 413–423.

Shah, H., Tamuly, K., Raghunathan, A., Jain, P., & Netrapalli, P. (2020). The pitfalls of simplicity bias in neural networks. Advances in Neural Information Processing Systems, 33, 9573–9585.

Simons, D. J., & Levin, D. T. (1997). Change blindness. Trends in Cognitive Sciences, 1(7), 261–267.

Sperling, G. (1960). The information available in brief visual presentations. Psychological Monographs, 74, 1–29.

Stanley, K. O., Clune, J., Lehman, J., & Miikkulainen, R. (2019). Designing neural networks through neuroevolution. Nature Machine Intelligence, 1(1), 24–35.

Storrs, K. R., Kietzmann, T. C., Walther, A., Mehrer, J., & Kriegeskorte, N. (2021). Diverse deep neural networks all predict human inferior temporal cortex well, after training and fitting. Journal of Cognitive Neuroscience, 33(10), 2044–2064.

Treisman, A., & Schmidt, H. (1982). Illusory conjunctions in the perception of objects. Cognitive Psychology, 14(1), 107–141.

Treisman, A. M., & Gelade, G. (1980). A feature-integration theory of attention. Cognitive Psychology, 12(1), 97–136.

Truzzi, A., & Cusack, R. (2020, April). Convolutional neural networks as a model of visual activity in the brain: Greater contribution of architecture than learned weights. In Bridging AI and Cognitive Science workshop, International conference on learning representations. https://baicsworkshop.github.io/pdf/BAICS_13.pdf

Tsvetkov, C., Malhotra, G., Evans, B., & Bowers, J. (2020). Adding biological constraints to deep neural networks reduces their capacity to learn unstructured data. In Proceedings of the 42nd annual conference of the Cognitive Science Society 2020, Toronto, Canada.

Tsvetkov, C., Malhotra, G., Evans, B. D., & Bowers, J. S. (2023). The role of capacity constraints in convolutional neural networks for learning random versus natural data. Neural Networks, 161, 515–524. https://doi.org/10.1101/2022.03.31.486580

Tuli, S., Dasgupta, I., Grant, E., & Griffiths, T. L. (2021). Are convolutional neural networks or transformers more like human vision? arXiv preprint arXiv:2105.07197, 1–7.

Ullman, S. (1979). The interpretation of structure from motion. Proceedings of the Royal Society of London. Series B. Biological Sciences, 203(1153), 405–426.

Ullman, S., & Basri, R. (1991). Recognition by linear combination of models. IEEE Transactions on Pattern Analysis and Machine Intelligence, 13(10), 992–1006.

Vaina, L. M., Makris, N., Kennedy, D., & Cowey, A. (1988). The selective impairment of the perception of first-order motion by unilateral cortical brain damage. Visual Neuroscience, 15, 333–348.

Vankov, I. I., & Bowers, J. S. (2020). Training neural networks to encode symbols enables combinatorial generalization. Philosophical Transactions of the Royal Society B, 375(1791), 20190309.

Wagemans, J., Elder, J. H., Kubovy, M., Palmer, S. E., Peterson, M. A., Singh, M., & von der Heydt, R. (2012). A century of Gestalt psychology in visual perception: I. Perceptual grouping and figure-ground organization. Psychological Bulletin, 138(6), 1172–1217.

Wang, J., Zhang, Z., Xie, C., Zhou, Y., Premachandran, V., Zhu, J., … Yuille, A. (2018). Visual concepts and compositional voting. Annals of Mathematical Sciences and Applications, 2(3), 4.

Wang, R., Lehman, J., Rawal, A., Zhi, J., Li, Y., Clune, J., & Stanley, K. (2020, November). Enhanced POET: Open-ended reinforcement learning through unbounded invention of learning challenges and their solutions. In International conference on machine learning (pp. 9940–9951).

Webb, T. W., Sinha, I., & Cohen, J. D. (2021). Emergent symbols through binding in external memory. arXiv, 1–28. https://doi.org/10.48550/arXiv.2012.14601

Weerts, L., Rosen, S., Clopath, C., & Goodman, D. F. (2021). The psychometrics of automatic speech recognition. bioRxiv, 2021-04.

Wertheimer, M. (1912). Experimentelle Studien über das Sehen von Bewegung. Zeitschrift für Psychologie, 61, 161–265 (in German).

Wolfe, J. M. (1994). Guided Search 2.0: A revised model of visual search. Psychonomic Bulletin & Review, 1(2), 202–238.

Wolfe, J. M., Cave, K. R., & Franzel, S. L. (1989). Guided search: An alternative to the feature integration model for visual search. Journal of Experimental Psychology: Human Perception and Performance, 15(3), 419–433.

Wong, Y. K., Twedt, E., Sheinberg, D., & Gauthier, I. (2010). Does Thompson's Thatcher effect reflect a face-specific mechanism? Perception, 39(8), 1125–1141.

Woolley, B. G., & Stanley, K. O. (2011, July). On the deleterious effects of a priori objectives on evolution and representation. In Proceedings of the 13th annual conference on genetic and evolutionary computation, Dublin, Ireland (pp. 957–964).

Xu, K., Ba, J., Kiros, R., Cho, K., Courville, A., Salakhutdinov, R., … Bengio, Y. (2015, June). Show, attend and tell: Neural image caption generation with visual attention. In International conference on machine learning, Lille, France (pp. 2048–2057). PMLR.

Xu, Y., & Vaziri-Pashkam, M. (2021). Limits to visual representational correspondence between convolutional neural networks and the human brain. Nature Communications, 12(1), 1–16.

Yamins, D. L., Hong, H., Cadieu, C. F., Solomon, E. A., Seibert, D., & DiCarlo, J. J. (2014). Performance-optimized hierarchical models predict neural responses in higher visual cortex. Proceedings of the National Academy of Sciences of the United States of America, 111(23), 8619–8624.

Young, T. (1802). Bakerian lecture: On the theory of light and colours. Philosophical Transactions of the Royal Society of London, 92, 12–48. doi:10.1098/rstl.1802.0004

Zador, A. M. (2019). A critique of pure learning and what artificial neural networks can learn from animal brains. Nature Communications, 10(1), 1–7.

Zhang, C., Bengio, S., Hardt, M., Recht, B., & Vinyals, O. (2017). Understanding deep learning requires rethinking generalization. In 5th international conference on learning representations, Toulon, France, April 24–26.

Zhang, R. (2019, May). Making convolutional networks shift-invariant again. In International conference on machine learning, Long Beach, CA, USA (pp. 7324–7334). Proceedings of Machine Learning Research.

Zhao, Z. Q., Zheng, P., Xu, S. T., & Wu, X. (2019). Object detection with deep learning: A review. IEEE Transactions on Neural Networks and Learning Systems, 30, 3212–3232.

Zhou, Z., & Firestone, C. (2019). Humans can decipher adversarial images. Nature Communications, 10(1), 1–9.

Zhu, H., Tang, P., Park, J., Park, S., & Yuille, A. (2019). Robustness of object recognition under extreme occlusion in humans and computational models. arXiv preprint arXiv:1905.04598, 1–7.

Zhuang, C., Yan, S., Nayebi, A., Schrimpf, M., Frank, M. C., DiCarlo, J. J., & Yamins, D. L. (2021). Unsupervised neural network models of the ventral visual stream. Proceedings of the National Academy of Sciences of the United States of America, 118(3), e2014196118.