For human-like models, train on human-like tasks
Published online by Cambridge University Press: 06 December 2023
Abstract
Bowers et al. express skepticism about deep neural networks (DNNs) as models of human vision because DNNs fail to account for results from psychological research. We argue that to assess DNNs fairly, we must first train them on more human-like tasks, which we hypothesize will induce more human-like behaviors and representations.
Type: Open Peer Commentary
Copyright: © The Author(s), 2023. Published by Cambridge University Press
Target article
Deep problems with neural network models of human vision
Related commentaries (29)
Explananda and explanantia in deep neural network models of neurological network functions
A deep new look at color
Beyond the limitations of any imaginable mechanism: Large language models and psycholinguistics
Comprehensive assessment methods are key to progress in deep learning
Deep neural networks are not a single hypothesis but a language for expressing computational hypotheses
Even deeper problems with neural network models of language
Fixing the problems of deep neural networks will require better training data and learning algorithms
For deep networks, the whole equals the sum of the parts
For human-like models, train on human-like tasks
Going after the bigger picture: Using high-capacity models to understand mind and brain
Implications of capacity-limited, generative models for human vision
Let's move forward: Image-computable models and a common model evaluation scheme are prerequisites for a scientific understanding of human vision
Modelling human vision needs to account for subjective experience
Models of vision need some action
My pet pig won't fly and I want a refund
Neither hype nor gloom do DNNs justice
Neural networks need real-world behavior
Neural networks, AI, and the goals of modeling
Perceptual learning in humans: An active, top-down-guided process
Psychophysics may be the game-changer for deep neural networks (DNNs) to imitate the human vision
Statistical prediction alone cannot identify good models of behavior
The model-resistant richness of human visual experience
The scientific value of explanation and prediction
There is a fundamental, unbridgeable gap between DNNs and the visual cortex
Thinking beyond the ventral stream: Comment on Bowers et al.
Using DNNs to understand the primate vision: A shortcut or a distraction?
Where do the hypotheses come from? Data-driven learning in science and the brain
Why psychologists should embrace rather than abandon DNNs
You can't play 20 questions with nature and win redux
Author response
Clarifying status of DNNs as models of human vision