For human-like models, train on human-like tasks

Katherine Hermann; Aran Nayebi; Sjoerd van Steenkiste; Matt Jones

doi:10.1017/S0140525X23001516

For human-like models, train on human-like tasks

Published online by Cambridge University Press: 06 December 2023

Katherine Hermann

Aran Nayebi

Sjoerd van Steenkiste

and

Matt Jones

Show author details

Katherine Hermann: Affiliation:
Google DeepMind, Mountain View, CA, USA [email protected]
Aran Nayebi: Affiliation:
McGovern Institute, Massachusetts Institute of Technology, Cambridge, MA, USA [email protected] https://anayebi.github.io/
Sjoerd van Steenkiste: Affiliation:
Google Research, Mountain View, CA, USA [email protected] https://www.sjoerdvansteenkiste.com/
Matt Jones: Affiliation:
Google Research, Mountain View, CA, USA [email protected] https://www.sjoerdvansteenkiste.com/ Department of Psychology and Neuroscience, University of Colorado, Boulder, CO, USA [email protected] http://matt.colorado.edu

Article contents

Abstract
References

Get access

Rights & Permissions

Abstract

Bowers et al. express skepticism about deep neural networks (DNNs) as models of human vision due to DNNs' failures to account for results from psychological research. We argue that to fairly assess DNNs, we must first train them on more human-like tasks which we hypothesize will induce more human-like behaviors and representations.

Type: Open Peer Commentary
Information: Behavioral and Brain Sciences , Volume 46 , 2023 , e394

DOI: https://doi.org/10.1017/S0140525X23001516 [Opens in a new window]
Copyright: Copyright © The Author(s), 2023. Published by Cambridge University Press

Access options

Get access to the full version of this content by using one of the access options below. (Log in options will check for institutional or personal access. Content may require purchase if you do not have access.)

References

Arjovsky, M., Bottou, L., Gulrajani, I., & Lopez-Paz, D. (2019). Invariant risk minimization. arXiv preprint arXiv:1907.02893.Google Scholar

Baker, N., Lu, H., Erlikhman, G., & Kellman, P. J. (2018). Deep convolutional networks do not classify based on global object shape. PLoS Computational Biology, 14(12), e1006613.CrossRef Google Scholar

Beery, S., Van Horn, G., & Perona, P. (2018). Recognition in terra incognita. In Proceedings of the European conference on computer vision (ECCV) (pp. 456–473).CrossRef Google Scholar

Brohan, A., Chebotar, Y., Finn, C., Hausman, K., Herzog, A., Ho, D., … Fu, C. K. (2023). Do as I can, not as I say: Grounding language in robotic affordances. In Conference on robot learning (pp. 287–318). PMLR.Google Scholar

Chen, X., Wang, X., Changpinyo, S., Piergiovanni, A. J., Padlewski, P., Salz, D., … Soricut, R. (2023). Pali: A jointly-scaled multilingual language-image model. International conference on learning representations (ICLR).Google Scholar

Deng, J., Dong, W., Socher, R., Li, L. J., Li, K., & Fei-Fei, L. (2009, June). ImageNet: A large-scale hierarchical image database. In IEEE conference on computer vision and pattern recognition (pp. 248–255).CrossRef Google Scholar

Gan, C., Schwartz, J., Alter, S., Schrimpf, M., Traer, J., De Freitas, J., … Yamins, D. L. K. (2021). ThreeDWorld: A platform for interactive multi-modal physical simulation. Advances in Neural Information Processing Systems (NeurIPS).Google Scholar

Geirhos, R., Jacobsen, J. H., Michaelis, C., Zemel, R., Brendel, W., Bethge, M., & Wichmann, F. A. (2020). Shortcut learning in deep neural networks. Nature Machine Intelligence, 2(11), 665–673.CrossRef Google Scholar

Geirhos, R., Narayanappa, K., Mitzkus, B., Thieringer, T., Bethge, M., Wichmann, F. A., & Brendel, W. (2021). Partial success in closing the gap between human and machine vision. Advances in Neural Information Processing Systems (NeurIPS), 34, 23885–23899.Google Scholar

Geirhos, R., Rubisch, P., Michaelis, C., Bethge, M., Wichmann, F. A., & Brendel, W. (2019). ImageNet-trained CNNs are biased towards texture; increasing shape bias improves accuracy and robustness. International conference on learning representations (ICLR).Google Scholar

Greff, K., Belletti, F., Beyer, L., Doersch, C., Du, Y., Duckworth, D., … Tagliasacchi, A. (2022). Kubric: A scalable dataset generator. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 3749–3761).CrossRef Google Scholar

Haber, N., Mrowca, D., Wang, S., Fei-Fei, L. F., & Yamins, D. L. (2018). Learning to play with intrinsically-motivated, self-aware agents. Advances in Neural Information Processing Systems (NeurIPS), 31.Google Scholar

Hermann, K., Chen, T., & Kornblith, S. (2020). The origins and prevalence of texture bias in convolutional neural networks. Advances in Neural Information Processing Systems (NeurIPS), 33, 19000–19015.Google Scholar

Hill, F., Lampinen, A., Schneider, R., Clark, S., Botvinick, M., McClelland, J. L., & Santoro, A. (2020). Environmental drivers of systematicity and generalization in a situated agent. International conference on learning representations (ICLR).Google Scholar

Konkle, T., & Alvarez, G. A. (2022). A self-supervised domain-general learning framework for human ventral stream representation. Nature Communications, 13(1), 491.CrossRef Google Scholar PubMed

Kucker, S. C., Samuelson, L. K., Perry, L. K., Yoshida, H., Colunga, E., Lorenz, M. G., & Smith, L. B. (2019). Reproducibility and a unifying explanation: Lessons from the shape bias. Infant Behavior and Development, 54, 156–165.CrossRef Google Scholar

Kumar, M., Houlsby, N., Kalchbrenner, N., & Cubuk, E. D. (2022). Do better ImageNet classifiers assess perceptual similarity better? Transactions of Machine Learning Research.Google Scholar

Landau, B., Smith, L. B., & Jones, S. S. (1988). The importance of shape in early lexical learning. Cognitive Development, 3(3), 299–321.CrossRef Google Scholar

Malhotra, G., Evans, B. D., & Bowers, J. S. (2020). Hiding a plane with a pixel: Examining shape-bias in CNNs and the benefit of building in biological constraints. Vision Research, 174, 57–68.CrossRef Google Scholar PubMed

McCoy, R. T., Pavlick, E., & Linzen, T. (2020). Right for the wrong reasons: Diagnosing syntactic heuristics in natural language inference. In 57th annual meeting of the association for computational linguistics, ACL 2019 (pp. 3428–3448). Association for Computational Linguistics (ACL). https://aclanthology.org/P19-1334/Google Scholar

Muttenthaler, L., Dippel, J., Linhardt, L., Vandermeulen, R. A., & Kornblith, S. (2023). Human alignment of neural network representations. International conference on learning representations (ICLR).Google Scholar

Nayebi, A., Kong, N. C., Zhuang, C., Gardner, J. L., Norcia, A. M., & Yamins, D. L. (2021). Mouse visual cortex as a limited resource system that self-learns an ecologically-general representation. BioRxiv, 2021-06.Google Scholar

Nguyen, A., Yosinski, J., & Clune, J. (2015). Deep neural networks are easily fooled: High confidence predictions for unrecognizable images. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 427–436).CrossRef Google Scholar

Puig, X., Ra, K., Boben, M., Li, J., Wang, T., Fidler, S., & Torralba, A. (2018). VirtualHome: Simulating household activities via programs. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 8494–8502).CrossRef Google Scholar

Savva, M., Kadian, A., Maksymets, O., Zhao, Y., Wijmans, E., Jain, B., … Batra, D. (2019). Habitat: A platform for embodied AI research. In Proceedings of the IEEE/CVF international conference on computer vision (pp. 9339–9347).CrossRef Google Scholar

Schrimpf, M. (2022). Advancing system models of brain processing via integrative benchmarking. Doctoral dissertation, Massachusetts Institute of Technology.Google Scholar

Schrimpf, M., Kubilius, J., Hong, H., Majaj, N. J., Rajalingham, R., Issa, E. B., … DiCarlo, J. J. (2018). Brain-Score: Which artificial neural network for object recognition is most brain-like? BioRxiv, 407007.Google Scholar

Sun, C., Shrivastava, A., Singh, S., & Gupta, A. (2017). Revisiting unreasonable effectiveness of data in deep learning era. In Proceedings of the IEEE international conference on computer vision (pp. 843–852).CrossRef Google Scholar

Szegedy, C., Zaremba, W., Sutskever, I., Bruna, J., Erhan, D., Goodfellow, I., & Fergus, R. (2013). Intriguing properties of neural networks. arXiv preprint arXiv:1312.6199.Google Scholar

Weihs, L., Kembhavi, A., Ehsani, K., Pratt, S. M., Han, W., Herrasti, A., … Farhadi, A. (2021). Learning generalizable visual representations via interactive gameplay. International conference on learning representations (ICLR).Google Scholar

Xiang, F., Qin, Y., Mo, K., Xia, Y., Zhu, H., Liu, F., … Su, H. (2020). Sapien: A simulated part-based interactive environment. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 11097–11107).CrossRef Google Scholar

Xiao, K., Engstrom, L., Ilyas, A., & Madry, A. (2021). Noise or signal: The role of image backgrounds in object recognition. International conference on learning representations (ICLR).Google Scholar

Zhang, C., Bengio, S., Hardt, M., Recht, B., & Vinyals, O. (2017). Understanding deep learning requires rethinking generalization. International conference on learning representations (ICLR).Google Scholar

Zhuang, C., Xiang, V., Bai, Y., Jia, X., Turk-Browne, N., Norman, K., … Yamins, D. L. (2022). How well do unsupervised learning algorithms model human real-time and life-long learning? In Thirty-sixth conference on neural information processing systems datasets and benchmarks track.CrossRef Google Scholar

Zhuang, C., Yan, S., Nayebi, A., Schrimpf, M., Frank, M. C., DiCarlo, J. J., & Yamins, D. L. (2021). Unsupervised neural network models of the ventral visual stream. Proceedings of the National Academy of Sciences of the United States of America, 118(3), e2014196118.CrossRef Google Scholar PubMed