
Deep neural networks are not a single hypothesis but a language for expressing computational hypotheses

Published online by Cambridge University Press: 06 December 2023

Tal Golan
Affiliation:
Department of Cognitive and Brain Sciences, Ben-Gurion University of the Negev, Be'er Sheva, Israel [email protected] brainsandmachines.org
JohnMark Taylor
Affiliation:
Zuckerman Mind Brain Behavior Institute, Columbia University, New York, NY, USA [email protected] johnmarktaylor.com
Heiko Schütt
Affiliation:
Zuckerman Mind Brain Behavior Institute, Columbia University, New York, NY, USA [email protected] Center for Neural Science, New York University, New York, NY, USA
Benjamin Peters
Affiliation:
School of Psychology & Neuroscience, University of Glasgow, Glasgow, UK [email protected]
Rowan P. Sommers
Affiliation:
Department of Neurobiology of Language, Max Planck Institute for Psycholinguistics, Nijmegen, The Netherlands [email protected]
Katja Seeliger
Affiliation:
Max Planck Institute for Human Cognitive and Brain Sciences, Leipzig, Germany [email protected]
Adrien Doerig
Affiliation:
Institute of Cognitive Science, University of Osnabrück, Osnabrück, Germany [email protected] kietzmannlab.org
Paul Linton
Affiliation:
Zuckerman Mind Brain Behavior Institute, Columbia University, New York, NY, USA [email protected] https://linton.vision/ Presidential Scholars in Society and Neuroscience, Center for Science and Society, Columbia University, New York, NY, USA Italian Academy for Advanced Studies in America, Columbia University, New York, NY, USA
Talia Konkle
Affiliation:
Department of Psychology and Center for Brain Sciences, Harvard University, Cambridge, MA, USA [email protected] https://konklab.fas.harvard.edu/
Marcel van Gerven
Affiliation:
Donders Institute for Brain, Cognition and Behaviour, Nijmegen, The Netherlands artcogsys.com
Konrad Kording
Affiliation:
Departments of Bioengineering and Neuroscience, University of Pennsylvania, Philadelphia, PA, USA [email protected] kordinglab.com Learning in Machines and Brains Program, CIFAR, Toronto, ON, Canada
Blake Richards
Affiliation:
Learning in Machines and Brains Program, CIFAR, Toronto, ON, Canada [email protected] linclab.org Mila, Montreal, QC, Canada School of Computer Science, McGill University, Montreal, QC, Canada Department of Neurology & Neurosurgery, McGill University, Montreal, QC, Canada Montreal Neurological Institute, Montreal, QC, Canada
Tim C. Kietzmann
Affiliation:
Institute of Cognitive Science, University of Osnabrück, Osnabrück, Germany [email protected] kietzmannlab.org
Grace W. Lindsay
Affiliation:
Department of Psychology and Center for Data Science, New York University, New York, NY, USA [email protected] lindsay-lab.github.io
Nikolaus Kriegeskorte
Affiliation:
Zuckerman Mind Brain Behavior Institute, Columbia University, New York, NY, USA [email protected] Departments of Psychology, Neuroscience, and Electrical Engineering, Columbia University, New York, NY, USA

Abstract

An ideal vision model accounts for behavior and neurophysiology in both naturalistic conditions and designed lab experiments. Unlike psychological theories, artificial neural networks (ANNs) actually perform visual tasks and generate testable predictions for arbitrary inputs. These advantages enable ANNs to engage the entire spectrum of the evidence. Failures of particular models drive progress in a vibrant ANN research program of human vision.

Type
Open Peer Commentary
Copyright
Copyright © The Author(s), 2023. Published by Cambridge University Press

Bowers et al. discuss the limited connection between the psychological literature on human vision and recent work combining artificial neural networks (ANNs) and benchmark-based statistical evaluation. They are correct that the psychological literature has described behavioral signatures of human vision that ANNs should but do not currently explain. A model of human vision should ideally explain all available neural and behavioral data, including the unprecedentedly rich data from naturalistic benchmarks as well as data from experiments designed to address specific psychological hypotheses. None of the current models (ANNs, handcrafted computational models, and abstractly described psychological theories) meet this challenge.

Importantly, however, the failure of current ANNs to explain all available data does not amount to a refutation of neural network models in general. Falsifying the entire, highly expressive class of ANN models is impossible. ANNs are universal approximators of dynamical systems (Funahashi & Nakamura, 1993; Schäfer & Zimmermann, 2007) and hence can implement any potential computational mechanism. Future ANNs may contain different computational mechanisms that have not yet been explored. ANNs are therefore best understood not as a monolithic falsifiable theory but as a computational language in which particular falsifiable hypotheses can be expressed. Bowers et al.'s long list of cited studies presenting shortcomings of particular models demonstrates neither the failure of the ANN modeling framework in general nor a lack of openness of the field to falsifications of ANN models. Instead, their list of citations rather impressively illustrates the opposite: the emerging ANN research program (referred to as "neuroconnectionism" in Doerig et al., 2022) is progressive in the sense of Lakatos, generating a rich variety of falsifiable hypotheses (expressed in the language of ANNs) and advancing through model comparison (Doerig et al., 2022). Each shortcoming drives improvement. For example, the discovery of texture bias in ANNs (Geirhos et al., 2019) has led to a variety of alternative training methods that make ANNs rely more strongly on larger-scale structure in images (e.g., Geirhos et al., 2019; Hermann, Chen, & Kornblith, 2020; Nuriel, Benaim, & Wolf, 2021). Similarly, the discovery of the adversarial susceptibility of ANNs (Szegedy et al., 2013) has motivated much research on perceptual robustness (e.g., Cohen, Rosenfeld, & Kolter, 2019; Guo et al., 2022; Madry, Makelov, Schmidt, Tsipras, & Vladu, 2019).
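To make the adversarial-susceptibility example concrete, the following minimal sketch synthesizes an adversarial image with the fast gradient sign method (FGSM), one of the simplest attacks in this literature. It assumes PyTorch with a pretrained torchvision classifier; the random input image and the step size epsilon are illustrative stand-ins, not the setup of any study cited above.

    import torch
    import torchvision.models as models

    # Minimal FGSM sketch of adversarial susceptibility; all
    # hyperparameters here are illustrative, not from any cited study.
    model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT).eval()

    def fgsm(image, label, epsilon=0.03):
        """Perturb image (1 x 3 x H x W, values in [0, 1]) to raise the loss."""
        image = image.clone().requires_grad_(True)
        loss = torch.nn.functional.cross_entropy(model(image), label)
        loss.backward()
        # Step in the direction that increases the loss; keep pixels valid.
        return (image + epsilon * image.grad.sign()).clamp(0.0, 1.0).detach()

    x = torch.rand(1, 3, 224, 224)        # stand-in for a real image
    y = model(x).argmax(dim=1)            # the model's original prediction
    x_adv = fgsm(x, y)
    print(y.item(), model(x_adv).argmax(dim=1).item())  # often differ

A perceptually negligible perturbation often flips the model's decision, a model-specific behavior that can then be tested against human judgments.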

Bowers et al. create a false dichotomy between benchmark studies (e.g., Cichy, Roig, & Oliva, 2019; Kriegeskorte et al., 2008; Nonaka, Majima, Aoki, & Kamitani, 2021; Schrimpf et al., 2018) and controlled psychological experiments. Both approaches test model-based predictions of empirical data. Traditional psychological experiments are designed to test verbally defined theories, minimizing confounders of the independent variables of theoretical interest. In contrast, the numerous experimental conditions included in behavioral and neural benchmarks based on natural images are high-dimensional, complex, and ecologically relevant. Controlled experiments pose specific questions. They promise to give us theoretically important bits of information but are biased by theoretical assumptions and risk missing the computational challenge of task performance under realistic conditions (Newell, 1973; Olshausen & Field, 2005). Observational studies and experiments with large numbers of natural images pose more general questions. They promise evaluation of many models with comprehensive data under more naturalistic conditions, but risk inconclusive results because they are not designed to adjudicate among alternative computational mechanisms (Rust & Movshon, 2005). Between these extremes lies a rich space of neural and behavioral empirical tests for models of vision. The community should seek models that can account for data across this spectrum, not just one end of it.
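To illustrate how a benchmark-style evaluation proceeds in practice, here is a minimal representational similarity analysis sketch (RSA; Kriegeskorte et al., 2008). The random arrays stand in for real neural recordings and model-layer activations to the same stimulus set; only NumPy and SciPy are assumed.

    import numpy as np
    from scipy.spatial.distance import pdist
    from scipy.stats import spearmanr

    rng = np.random.default_rng(0)
    n_stimuli = 100

    # Stand-ins for measured responses (stimuli x neurons/voxels) and
    # model activations (stimuli x units) to the same stimuli.
    neural = rng.standard_normal((n_stimuli, 300))
    model_acts = rng.standard_normal((n_stimuli, 512))

    # Representational dissimilarity matrices (condensed form: one
    # entry per stimulus pair), abstracting away unit-to-neuron mappings.
    neural_rdm = pdist(neural, metric="correlation")
    model_rdm = pdist(model_acts, metric="correlation")

    # Rank-correlate the two RDMs; a higher value indicates a better
    # representational match between model and brain.
    rho, _ = spearmanr(neural_rdm, model_rdm)
    print(f"model-to-brain RDM correlation: {rho:.3f}")

Because the comparison is made at the level of stimulus-by-stimulus dissimilarity structure, the same pipeline can score any image-computable model against any recording modality.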

Despite their widely discussed shortcomings (e.g., Lindsay, 2021; Peters & Kriegeskorte, 2021; Serre, 2019), ANNs are sometimes referred to as the "current best" models of human vision. This characterization is justified on both a priori and empirical grounds. A priori, ANNs are superior to verbally defined cognitive theories in that they are image-computable, that is, they are fully computationally specified and take images as input. These properties enable ANNs to make quantitative predictions about a broad range of empirical phenomena, rendering ANNs more amenable to falsification. Being fully computationally specified enables them to make quantitative predictions of neural and behavioral responses (an advantage shared with other cognitive computational models). Taking images as inputs enables ANNs to make predictions about neural and behavioral responses to arbitrary visual stimuli. A model that explains only a particular psychological phenomenon is a priori inferior, ceteris paribus, to a model that predicts data across a wide range of conditions and dependent measures. The discrepancies between human vision and current ANNs are "bugs" of particular models, but the fact that we can discover these bugs is a feature of image-computable ANNs, fueling empirical progress. Because they are image-computable, ANNs enable severe tests of their predictions (superstimuli, adversarial examples, metamers; Bashivan, Kar, & DiCarlo, 2019; Dujmović, Malhotra, & Bowers, 2020; Feather, Durango, Gonzalez, & McDermott, 2019; Walker et al., 2019) and powerful model comparisons (controversial stimuli; Golan, Raju, & Kriegeskorte, 2020).
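As a toy illustration of the controversial-stimuli logic (Golan et al., 2020), the sketch below optimizes a single image so that two off-the-shelf classifiers assign it different classes. The models, class indices, and optimization settings are arbitrary placeholders, not the procedure of the cited paper, which additionally collected human judgments to adjudicate between models.

    import torch
    import torchvision.models as models

    # Push model A toward class a and model B toward class b on the same
    # image, so the two models come to disagree about what it depicts.
    model_a = models.resnet18(weights=models.ResNet18_Weights.DEFAULT).eval()
    model_b = models.squeezenet1_1(weights=models.SqueezeNet1_1_Weights.DEFAULT).eval()

    a, b = 207, 208                     # two arbitrary ImageNet classes
    x = torch.rand(1, 3, 224, 224, requires_grad=True)
    optimizer = torch.optim.Adam([x], lr=0.01)

    for _ in range(100):
        optimizer.zero_grad()
        loss = (torch.nn.functional.cross_entropy(model_a(x), torch.tensor([a]))
                + torch.nn.functional.cross_entropy(model_b(x), torch.tensor([b])))
        loss.backward()
        optimizer.step()
        with torch.no_grad():
            x.clamp_(0.0, 1.0)          # keep the image in a valid range

    # If synthesis succeeds, the models disagree about this image; human
    # responses to such images can then favor one model over the other.
    print(model_a(x).argmax(dim=1).item(), model_b(x).argmax(dim=1).item())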

The empirical reason why ANNs can be called the "current best" models of human vision is that they offer unprecedented mechanistic explanations of the human capacity to make sense of complex, naturalistic inputs. Most basically, ANNs are currently the only models that can recognize objects, parse scenes, or identify faces at levels comparable to human performance. Furthermore, they offer image-specific predictions of errors (e.g., Geirhos et al., 2021; Rajalingham et al., 2018) and reaction times (e.g., Spoerer, McClure, & Kriegeskorte, 2017). Their predictions are far from perfect but better than those of alternative models. Finally, the intermediate representations of ANNs currently best match the neural representations that underlie human visual capacities (e.g., Dwivedi, Bonner, Cichy, & Roig, 2021; Güçlü & van Gerven, 2015).
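For the representational comparisons mentioned above, a common recipe is to read out a model's intermediate activations and fit a cross-validated linear encoding model to measured responses. The sketch below, assuming PyTorch, torchvision, and scikit-learn, uses a forward hook on a pretrained network's pooling layer and a simulated voxel; the layer choice and regression settings are illustrative assumptions, not the procedure of any cited paper.

    import numpy as np
    import torch
    import torchvision.models as models
    from sklearn.linear_model import RidgeCV
    from sklearn.model_selection import cross_val_score

    model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT).eval()

    # Capture an intermediate layer's activations with a forward hook.
    acts = {}
    def hook(module, inputs, output):
        acts["features"] = output.flatten(start_dim=1).detach()
    model.avgpool.register_forward_hook(hook)

    images = torch.rand(80, 3, 224, 224)      # stand-in stimulus set
    with torch.no_grad():
        model(images)
    features = acts["features"].numpy()       # 80 stimuli x 512 units

    # Simulated voxel: a noisy linear readout of the model's features.
    rng = np.random.default_rng(0)
    weights = rng.standard_normal(features.shape[1])
    voxel = features @ weights + rng.standard_normal(len(images))

    # Cross-validated encoding performance of this layer for the voxel.
    scores = cross_val_score(RidgeCV(alphas=np.logspace(-2, 4, 7)),
                             features, voxel, cv=5, scoring="r2")
    print(f"mean cross-validated R^2: {scores.mean():.3f}")

Repeating the fit for every layer yields the layer-by-region correspondence profiles reported in studies of this kind.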

In sum, ANNs provide a language that enables us to express and test falsifiable computational models that have extraordinary power and can generalize to a broad range of empirical phenomena. Lakatos (1978) noted that all theories "are born refuted and die refuted" and stressed the importance of comparing competing theories in the light of the evidence. Our studies, then, should compare many models and report both their failures and their relative successes. It is through creation and comparison of many models that our field will progress.

Financial support

This research received no specific funding from any funding agency or commercial or not-for-profit entity.

Competing interest

None.

References

Bashivan, P., Kar, K., & DiCarlo, J. J. (2019). Neural population control via deep image synthesis. Science, 364(6439), eaav9436. https://doi.org/10.1126/science.aav9436
Cichy, R. M., Roig, G., & Oliva, A. (2019). The Algonauts project. Nature Machine Intelligence, 1(12), 613. https://doi.org/10.1038/s42256-019-0127-z
Cohen, J., Rosenfeld, E., & Kolter, Z. (2019). Certified adversarial robustness via randomized smoothing. In Chaudhuri, K., & Salakhutdinov, R. (Eds.), Proceedings of the 36th international conference on machine learning. Proceedings of Machine Learning Research, Long Beach, CA, USA (Vol. 97, pp. 1310–1320). PMLR. https://proceedings.mlr.press/v97/cohen19c.html
Doerig, A., Sommers, R., Seeliger, K., Richards, B., Ismael, J., Lindsay, G., … Kietzmann, T. C. (2022). The neuroconnectionist research programme. Nature Reviews Neuroscience, 24, 431–450. https://doi.org/10.1038/s41583-023-00705-w
Dujmović, M., Malhotra, G., & Bowers, J. S. (2020). What do adversarial images tell us about human vision? eLife, 9, e55978. https://doi.org/10.7554/eLife.55978
Dwivedi, K., Bonner, M. F., Cichy, R. M., & Roig, G. (2021). Unveiling functions of the visual cortex using task-specific deep neural networks. PLoS Computational Biology, 17(8), e1009267. https://doi.org/10.1371/journal.pcbi.1009267
Feather, J., Durango, A., Gonzalez, R., & McDermott, J. (2019). Metamers of neural networks reveal divergence from human perceptual systems. In Wallach, H., Larochelle, H., Beygelzimer, A., d'Alché-Buc, F., Fox, E., & Garnett, R. (Eds.), Advances in Neural Information Processing Systems, Vancouver, BC, Canada (Vol. 32, pp. 10078–10089). Curran Associates, Inc. https://proceedings.neurips.cc/paper/2019/file/ac27b77292582bc293a51055bfc994ee-Paper.pdf
Funahashi, K. I., & Nakamura, Y. (1993). Approximation of dynamical systems by continuous time recurrent neural networks. Neural Networks, 6(6), 801–806. https://doi.org/10.1016/S0893-6080(05)80125-X
Geirhos, R., Narayanappa, K., Mitzkus, B., Thieringer, T., Bethge, M., Wichmann, F. A., & Brendel, W. (2021). Partial success in closing the gap between human and machine vision. In Ranzato, M., Beygelzimer, A., Dauphin, Y., Liang, P. S., & Wortman Vaughan, J. (Eds.), Advances in Neural Information Processing Systems (Vol. 34, pp. 23885–23899). Curran Associates, Inc. https://proceedings.neurips.cc/paper/2021/file/c8877cff22082a16395a57e97232bb6f-Paper.pdf
Geirhos, R., Rubisch, P., Michaelis, C., Bethge, M., Wichmann, F. A., & Brendel, W. (2019). ImageNet-trained CNNs are biased towards texture; increasing shape bias improves accuracy and robustness. In International conference on learning representations, New Orleans, LA, USA. https://openreview.net/forum?id=Bygh9j09KX
Golan, T., Raju, P. C., & Kriegeskorte, N. (2020). Controversial stimuli: Pitting neural networks against each other as models of human cognition. Proceedings of the National Academy of Sciences of the United States of America, 117(47), 29330–29337. https://doi.org/10.1073/pnas.1912334117
Güçlü, U., & van Gerven, M. A. J. (2015). Deep neural networks reveal a gradient in the complexity of neural representations across the ventral stream. Journal of Neuroscience, 35(27), 10005–10014. https://doi.org/10.1523/JNEUROSCI.5023-14.2015
Guo, C., Lee, M., Leclerc, G., Dapello, J., Rao, Y., Madry, A., & DiCarlo, J. (2022). Adversarially trained neural representations are already as robust as biological neural representations. In Chaudhuri, K., Jegelka, S., Song, L., Szepesvari, C., Niu, G., & Sabato, S. (Eds.), Proceedings of the 39th international conference on machine learning. Proceedings of Machine Learning Research, Baltimore, MD, USA (Vol. 162, pp. 8072–8081). PMLR. https://proceedings.mlr.press/v162/guo22d.html
Hermann, K., Chen, T., & Kornblith, S. (2020). The origins and prevalence of texture bias in convolutional neural networks. In Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M. F., & Lin, H. (Eds.), Advances in Neural Information Processing Systems, Vancouver, BC, Canada (Vol. 33, pp. 19000–19015). Curran Associates, Inc. https://proceedings.neurips.cc/paper/2020/file/db5f9f42a7157abe65bb145000b5871a-Paper.pdf
Kriegeskorte, N., Mur, M., Ruff, D. A., Kiani, R., Bodurka, J., Esteky, H., … Bandettini, P. A. (2008). Matching categorical object representations in inferior temporal cortex of man and monkey. Neuron, 60(6), 1126–1141. https://doi.org/10.1016/j.neuron.2008.10.043
Lakatos, I. (1978). Science and pseudoscience. Philosophical Papers, 1, 1–7.
Lindsay, G. W. (2021). Convolutional neural networks as a model of the visual system: Past, present, and future. Journal of Cognitive Neuroscience, 33(10), 2017–2031. https://doi.org/10.1162/jocn_a_01544
Madry, A., Makelov, A., Schmidt, L., Tsipras, D., & Vladu, A. (2019). Towards deep learning models resistant to adversarial attacks. In International conference on learning representations, Vancouver, BC, Canada. https://openreview.net/forum?id=rJzIBfZAb
Newell, A. (1973). You can't play 20 questions with nature and win: Projective comments on the papers of this symposium. In W. G. Chase (Ed.), Visual information processing: Proceedings of the 8th annual Carnegie symposium on cognition, held at the Carnegie-Mellon University, Pittsburgh, Pennsylvania, May 19, 1972 (pp. 283–305). Academic Press.
Nonaka, S., Majima, K., Aoki, S. C., & Kamitani, Y. (2021). Brain hierarchy score: Which deep neural networks are hierarchically brain-like? iScience, 24(9), 103013. https://doi.org/10.1016/j.isci.2021.103013
Nuriel, O., Benaim, S., & Wolf, L. (2021). Permuted AdaIN: Reducing the bias towards global statistics in image classification. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 9482–9491). Online. https://openaccess.thecvf.com/content/CVPR2021/html/Nuriel_Permuted_AdaIN_Reducing_the_Bias_Towards_Global_Statistics_in_Image_CVPR_2021_paper.html
Olshausen, B. A., & Field, D. J. (2005). How close are we to understanding V1? Neural Computation, 17(8), 1665–1699. https://doi.org/10.1162/0899766054026639
Peters, B., & Kriegeskorte, N. (2021). Capturing the objects of vision with neural networks. Nature Human Behaviour, 5(9), 1127–1144. https://doi.org/10.1038/s41562-021-01194-6
Rajalingham, R., Issa, E. B., Bashivan, P., Kar, K., Schmidt, K., & DiCarlo, J. J. (2018). Large-scale, high-resolution comparison of the core visual object recognition behavior of humans, monkeys, and state-of-the-art deep artificial neural networks. Journal of Neuroscience, 38(33), 7255–7269. https://doi.org/10.1523/JNEUROSCI.0388-18.2018
Rust, N. C., & Movshon, J. A. (2005). In praise of artifice. Nature Neuroscience, 8(12), 1647–1650. https://doi.org/10.1038/nn1606
Schäfer, A. M., & Zimmermann, H. G. (2007). Recurrent neural networks are universal approximators. International Journal of Neural Systems, 17(4), 253–263. https://doi.org/10.1142/S0129065707001111
Schrimpf, M., Kubilius, J., Hong, H., Majaj, N. J., Rajalingham, R., Issa, E. B., … DiCarlo, J. J. (2018). Brain-Score: Which artificial neural network for object recognition is most brain-like? bioRxiv, 407007. https://doi.org/10.1101/407007
Serre, T. (2019). Deep learning: The good, the bad, and the ugly. Annual Review of Vision Science, 5, 399–426. https://doi.org/10.1146/annurev-vision-091718-014951
Spoerer, C. J., McClure, P., & Kriegeskorte, N. (2017). Recurrent convolutional neural networks: A better model of biological object recognition. Frontiers in Psychology, 8, 1551. https://doi.org/10.3389/fpsyg.2017.01551
Szegedy, C., Zaremba, W., Sutskever, I., Bruna, J., Erhan, D., Goodfellow, I., & Fergus, R. (2013). Intriguing properties of neural networks. arXiv preprint arXiv:1312.6199. https://doi.org/10.48550/arXiv.1312.6199
Walker, E. Y., Sinz, F. H., Cobos, E., Muhammad, T., Froudarakis, E., Fahey, P. G., … Tolias, A. S. (2019). Inception loops discover what excites neurons most using deep predictive models. Nature Neuroscience, 22(12), 2060–2065. https://doi.org/10.1038/s41593-019-0517-x