How Do Humans Search for Information?

Part II - How Do Humans Search for Information?

Published online by Cambridge University Press: 19 May 2022

Edited by Irene Cogliati Dezza, Eric Schulz and Charley M. Wu

Irene Cogliati Dezza: University College London
Eric Schulz: Max-Planck-Institut für biologische Kybernetik, Tübingen
Charley M. Wu: Eberhard-Karls-Universität Tübingen, Germany

Type: Chapter
Information: The Drive for Knowledge: The Science of Human Information Seeking, pp. 99–192

DOI: https://doi.org/10.1017/9781009026949

Publisher: Cambridge University Press

Print publication year: 2022

References

Attias, H. (2003). Planning by Probabilistic Inference. Paper presented at the Proc. of the 9th Int. Workshop on Artificial Intelligence and Statistics. https://proceedings.mlr.press/r4/attias03a.html.Google Scholar

Auer, P. (2002). Using confidence bounds for exploitation-exploration trade-offs. Journal of Machine Learning Research, 3(Nov.), 397–422.Google Scholar

Barlow, H. (1961). Possible principles underlying the transformations of sensory messages. In Rosenblith, W. (Ed.), Sensory Communication (pp. 217–234). MIT Press.Google Scholar

Barlow, H. B. (1974). Inductive inference, coding, perception, and language. Perception, 3, 123–134.Google Scholar

Barto, A. G. (2013). Intrinsic motivation and reinforcement learning. In Baldassarre, G & Mirolli, M, Intrinsically motivated learning in natural and artificial systems (pp. 17–47). Springer.Google Scholar

Barto, A., Mirolli, M., & Baldassarre, G. (2013). Novelty or Surprise? Frontiers in Psychology, 4. doi:10.3389/fpsyg.2013.00907. Retrieved from www.frontiersin.org/Journal/Abstract.aspx?s=196&name=cognitive_science&ART_DOI=10.3389/fpsyg.2013.00907.Google Scholar

Beal, M. J. (2003). Variational Algorithms for Approximate Bayesian Inference. PhD. Thesis, University College London. www.proquest.com/docview/1775215626?pq-origsite=gscholar&fromopenview=true.Google Scholar

Bellemare, M. G., Srinivasan, S., Ostrovski, G., Schaul, T., Saxton, D., & Munos, R. (2016). Unifying count-based exploration and intrinsic motivation. arXiv preprint arXiv:1606.01868.Google Scholar

Berger, J. O. (2011). Statistical decision theory and Bayesian analysis. Springer.Google Scholar

Blei, D. M., Kucukelbir, A., & McAuliffe, J. D. (2017). Variational inference: A review for statisticians. Journal of the American Statistical Association, 112(518), 859–877.CrossRef Google Scholar

Botvinick, M., & Toussaint, M. (2012). Planning as inference. Trends in Cognitive Science., 16(10), 485–488.Google Scholar

Burda, Y., Edwards, H., Storkey, A., & Klimov, O. (2018). Exploration by random network distillation. arXiv preprint arXiv:1810.12894.Google Scholar

Çatal, O., Wauthier, S., Verbelen, T., De Boom, C., & Dhoedt, B. (2020). Deep active inference for autonomous robot navigation. arXiv preprint arXiv:2003.03220.Google Scholar

Chaloner, K., & Verdinelli, I. (1995). Bayesian experimental design: A review. Statistical Science, 273–304.Google Scholar

Cullen, M., Davey, B., Friston, K. J., & Moran, R. J. (2018). Active inference in OpenAI gym: A paradigm for computational investigations into psychiatric illness. Biological Psychiatry: Cognitive Neuroscience and Neuroimaging, 3(9), 809–818.Google Scholar PubMed

Da Costa, L., Parr, T., Sajid, N., Veselic, S., Neacsu, V., & Friston, K. (2020). Active inference on discrete state-spaces: A synthesis. Journal of Mathematical Psychology, 99, 102447. Retrieved from www.sciencedirect.com/science/article/pii/S0022249620300857.Google Scholar

Da Costa, L., Sajid, N., Parr, T., Friston, K., & Smith, R. (2020). The relationship between dynamic programming and active inference: The discrete, finite-horizon case. arXiv preprint arXiv:2009.08111.Google Scholar

Fleming, W. H., & Sheu, S. J. (2002). Risk-sensitive control and an optimal investment model II. Annals of Applied Probability, 12(2), 730–767. Retrieved from https://projecteuclid.org:443/euclid.aoap/1026915623.Google Scholar

Fountas, Z., Sajid, N., Mediano, P. A., & Friston, K. (2020). Deep active inference agents using Monte-Carlo methods. arXiv preprint arXiv:2006.04176.Google Scholar

Friston, K. J. (2010). The free-energy principle: A unified brain theory? Nature Reviews Neuroscience, 11(2), 127–138. http://dx.doi.org/10.1038/nrn2787.Google Scholar

Friston, K. (2019). A free energy principle for a particular physics. arXiv preprint arXiv:1906.10184.Google Scholar

Friston, K., Da Costa, L., Hafner, D., Hesp, C., & Parr, T. (2020). Sophisticated inference. arXiv preprint arXiv:2006.04120.Google Scholar

Friston, K. J., Daunizeau, J., Kilner, J., & Kiebel, S. J. (2010). Action and behavior: A free-energy formulation. Biological Cybernetics, 102(3), 227–260.Google Scholar

Friston, K., FitzGerald, T., Rigoli, F., Schwartenbeck, P., O’Doherty, J., & Pezzulo, G. (2016). Active inference and learning. Neuroscience and Biobehavioral Reviews, 68, 862–879. Retrieved from http://www.ncbi.nlm.nih.gov/pubmed/27375276.Google Scholar

Friston, K., FitzGerald, T., Rigoli, F., Schwartenbeck, P., & Pezzulo, G. (2017). Active inference: A process theory. Neural Computation, 29(1), 1–49. Retrieved from https://www.ncbi.nlm.nih.gov/pubmed/27870614.Google Scholar

Friston, K. J., Lin, M., Frith, C. D., Pezzulo, G., Hobson, J. A., & Ondobaka, S. (2017). Active inference, curiosity and insight. Neural Computation, 29(10), 2633–2683. Friston, K. J., Parr, T., & de Vries, B. (2017). The graphical brain: Belief propagation and active inference. Network Neuroscience, 1(4), 381–414. Retrieved from http://www.ncbi.nlm.nih.gov/pubmed/29417960.Google Scholar

Friston, K. J., Parr, T., Yufik, Y., Sajid, N., Price, C. J., & Holmes, E. (2020). Generative models, linguistic communication and active inference. Neuroscience & Biobehavioral Reviews, 118, 42–64. https://doi.org/10.1016/j.neubiorev.2020.07.005.Google Scholar

Friston, K. J., Rigoli, F., Ognibene, D., Mathys, C., Fitzgerald, T., & Pezzulo, G. (2015). Active inference and epistemic value. Cognitive Neuroscience, 6(4), 187–224. Retrieved from http://dx.doi.org/10.1080/17588928.2015.1020053.CrossRef Google Scholar PubMed

Friston, K. J., Rosch, R., Parr, T., Price, C., & Bowman, H. (2018). Deep temporal models and active inference. Neuroscience and Biobehavioral Reviews, 90, 486–501.Google Scholar

Friston, K., Schwartenbeck, P., FitzGerald, T., Moutoussis, M., Behrens, T., & Dolan, R. J. (2014). The anatomy of choice: dopamine and decision-making. Philosophical Transactions of the Royal Society B: Biological Sciences, 369 (1655). Retrieved from http://www.ncbi.nlm.nih.gov/pubmed/25267823.Google Scholar

Gottlieb, J., Oudeyer, P.-Y., Lopes, M., & Baranes, A. (2013). Information-seeking, curiosity, and attention: computational and neural mechanisms. Trends in Cognitive Science, 17(11), 585–593. Retrieved from https://www.sciencedirect.com/science/article/pii/S1364661313002052.Google Scholar

Harsanyi, J. C. (1978). Bayesian decision theory and utilitarian ethics. The American Economic Review, 68(2), 223–228. Retrieved from www.jstor.org/stable/1816692.Google Scholar

Houthooft, R., Chen, X., Duan, Y., Schulman, J., De Turck, F., & Abbeel, P. (2016). Vime: Variational information maximizing exploration. Advances in Neural Information Processing Systems, 29, 1109–1117.Google Scholar

Itti, L., & Baldi, P. (2009). Bayesian surprise attracts human attention. Vision Research, 49(10), 1295–1306.Google Scholar

Jaynes, E. T. (1957). Information theory and statistical mechanics. Physical Review, 106(4), 620.Google Scholar

Kahneman, D., & Tversky, A. (1979). Prospect theory: An analysis of decision under risk. Econometrica, 47(2), 263–291.Google Scholar

Kaplan, R., & Friston, K. J. (2018). Planning and navigation as active inference. Biological Cybernetics, 112(4), 323–343.Google Scholar

Laureiro-Martínez, D., Brusoni, S., & Zollo, M. (2010). The neuroscientific foundations of the exploration−exploitation dilemma. Journal of Neuroscience, Psychology, and Economics, 3(2), 95.Google Scholar

Lindley, D. V. (1956). On a measure of the information provided by an experiment. The Annals of Mathematical Statistics, 986–1005.Google Scholar

Linsker, R. (1990). Perceptual neural organization: some approaches based on network models and information theory. Annual Review of Neuroscience, 13, 257–281.Google Scholar

Millidge, B., Tschantz, A., & Buckley, C. L. (2020). Whence the expected free energy? arXiv preprint arXiv:2004.08128.Google Scholar

Mirza, M. B., Adams, R. A., Mathys, C. D., & Friston, K. J. (2016). Scene construction, visual foraging, and active inference. Frontiers in Computational Neuroscience, 10 (56). Retrieved from http://journal.frontiersin.org/Article/10.3389/fncom.2016.00056/abstract. Mitchell, T., Sacks, J., & Ylvisaker, D. (1994). Asymptotic Bayes criteria for nonparametric response surface design. The Annals of Statistics, 22(2), 634–651.Google Scholar

Optican, L., & Richmond, B. J. (1987). Temporal encoding of two-dimensional patterns by single units in primate inferior cortex. II Information theoretic analysis. Journal of Neurophysiology, 57, 132–146.Google Scholar

Parr, T. (2019). The computational neurology of active vision. UCL (Unpublished doctoral thesis, University College London). https://discovery.ucl.ac.uk/id/eprint/10084391/Google Scholar

Parr, T., Da Costa, L., & Friston, K. (2020). Markov blankets, information geometry and stochastic thermodynamics. Philosophical Transactions of the Royal Society A, 378(2164), 20190159.Google Scholar

Parr, T., & Friston, K. J. (2019a). Attention or salience? Current Opinion in Psychology, 29, 1–5.Google Scholar

Parr, T., & Friston, K. J. (2019b). Generalised free energy and active inference. Biological Cybernetics, 113(5–6), 495–513.Google Scholar

Parr, T., Markovic, D., Kiebel, S. J., & Friston, K. J. (2019). Neuronal message passing using Mean-field, Bethe, and Marginal approximations. Scientific Reports, 9(1), 1889. Retrieved from https://doi.org/10.1038/s41598-018-38246-3.Google Scholar

Pathak, D., Agrawal, P., Efros, A. A., & Darrell, T. (2017). Curiosity-driven exploration by self-supervised prediction. Paper presented at the International Conference on Machine Learning.Google Scholar

Pukelsheim, F. (2006). Optimal design of experiments: SIAM.Google Scholar

Russo, D., Van Roy, B., Kazerouni, A., Osband, I., & Wen, Z. (2017). A tutorial on Thompson sampling. arXiv preprint arXiv:1707.02038.Google Scholar

Sacks, J., Welch, W. J., Mitchell, T. J., & Wynn, H. P. (1989). Design and analysis of computer experiments. Statistical Science, 4(4), 409–423.Google Scholar

Sajid, N., Ball, P. J., Parr, T., & Friston, K. J. (2021). Active inference: Demystified and compared. Neural Computation, 33(3), 674–712.Google Scholar

Savage, L. J. (1972). The foundations of statistics: Courier Corporation.Google Scholar

Schmidhuber, J. (1991a). Curious model-building control systems. In Proc. International Joint Conference on Neural Networks, Singapore. IEEE, 2, 1458–1463. https://mediatum.ub.tum.de/doc/814953/file.pdf.Google Scholar

Schmidhuber, J. (1991b). A possibility for implementing curiosity and boredom in model-building neural controllers. Paper presented at the Proc. of the international conference on simulation of adaptive behavior: From animals to animats. https://mediatum.ub.tum.de/doc/814958/file.pdf Google Scholar

Schmidhuber, J. (2006). Developmental robotics, optimal artificial curiosity, creativity, music, and the fine arts. Connection Science, 18(2), 173–187. https://doi.org/10.1080/09540090600768658.Google Scholar

Schulz, E., & Gershman, S. J. (2019). The algorithmic architecture of exploration in the human brain. Current Opinion in Neurobiology, 55, 7–14.Google Scholar

Schwartenbeck, P., Passecker, J., Hauser, T. U., FitzGerald, T. H., Kronbichler, M., & Friston, K. J. (2019). Computational mechanisms of curiosity and goal-directed exploration. eLife, 8, e.41707. https://doi.org/10.7554/eLife.41703.Google Scholar

Shewry, M. C., & Wynn, H. P. (1987). Maximum entropy sampling. Journal of Applied Statistics, 14(2), 165–170.Google Scholar

Stone, M. (1959). Application of a measure of information to the design and comparison of regression experiments. The Annals of Mathematical Statistics, 30(1), 55–70.Google Scholar

Sun, Y., Gomez, F., & Schmidhuber, J. (2011). Planning to be surprised: Optimal Bayesian exploration in dynamic environments. In Schmidhuber, J., Thórisson, K. R., & Looks, M. (Eds.), Artificial General Intelligence: 4th International Conference, AGI 2011, Mountain View, CA, USA, August 3–6,2011. Proceedings (pp. 41–51). Springer.Google Scholar

Sutton, R. S., & Barto, A. G. (1998). Introduction to Reinforcement Learning: MIT Press.Google Scholar

Todorov, E. (2008). General duality between optimal control and estimation. In 2008 47th IEEE Conference on Decision and Control (pp. 4286–4292). IEEE.Google Scholar

Tschantz, A., Seth, A. K., & Buckley, C. L. (2020). Learning action-oriented models through active inference. PLoS Computational Biology, 16(4), e1007805. Retrieved from https://doi.org/10.1371/journal.pcbi.1007805.Google Scholar

van den Broek, J. L., Wiegerinck, W. A. J. J., & Kappen, H. J. (2010). Risk-sensitive path integral control. UAI, 6, 1–8.Google Scholar

van der Himst, O., & Lanillos, P. (2020). Deep Active Inference for Partially Observable MDPs. In International Workshop on Active Inference (pp. 61–71). Springer.Google Scholar

Vasconcelos, M., Monteiro, T., & Kacelnik, A. (2015). Irrational choice and the value of information. Scientific Reports, 5(1), 13874. Retrieved from https://doi.org/10.1038/srep13874.Google Scholar

Vértes, E., & Sahani, M. (2018). Flexible and accurate inference and learning for deep generative models. arXiv preprint arXiv:1805.11051.Google Scholar

Von Neumann, J., & Morgenstern, O. (1944). Theory of games and economic behavior. Princeton University Press.Google Scholar

Wilson, R. C., Geana, A., White, J. M., Ludvig, E. A., & Cohen, J. D. (2014). Humans use directed and random exploration to solve the explore–exploit dilemma. Journal of Experimental Psychology: General, 143(6), 2074.Google Scholar

Zintgraf, L., Shiarlis, K., Igl, M., Schulze, S., Gal, Y., Hofmann, K., & Whiteson, S. (2019). VariBAD: A very good method for Bayes-adaptive deep RL via meta-learning. arXiv preprint arXiv:1910.08348.Google Scholar

References

Auer, P. (2002). Using confidence bounds for exploitation-exploration trade-offs. Journal of Machine Learning Research, 3(Nov.), 397–422.Google Scholar

Bellman, R. (1957). A Markovian decision process. Journal of Mathematics and Mechanics, 6(5), 679–684.Google Scholar

Binz, M., & Endres, D. (2019). Where do heuristics come from? In Proceedings of the 41st Annual Conference of the Cognitive Science Society (pp. 1402–1408). Montreal, QC: Cognitive Science Society.

Borji, A., & Itti, L. (2013). Bayesian optimization explains human active search. Advances in Neural Information Processing Systems, 26, 55–63.Google Scholar

Bramley, N. R., Dayan, P., Griffiths, T. L., & Lagnado, D. A. (2017). Formalizing Neurath’s ship: Approximate algorithms for online causal learning. Psychological Review, 124(3), 301.Google Scholar

Brändle, F., Wu, C. M., & Schulz, E. (2020). What are we curious about? Trends in Cognitive Sciences, 24(9), 685–687.Google Scholar

Chevalier-Boisvert, M., Willems, L., & Pal, S. (2018). Minimalistic gridworld environment for openai gym. https://github.com/maximecb/gym-minigrid. GitHub.Google Scholar

Cohen, J. D., McClure, S. M., & Yu, A. J. (2007). Should I stay or should I go? How the human brain manages the trade-off between exploitation and exploration. Philosophical Transactions of the Royal Society B: Biological Sciences, 362(1481), 933–942.Google Scholar

Colas, C., Karch, T., Sigaud, O., & Oudeyer, P.-Y. (2020). Intrinsically motivated goal-conditioned reinforcement learning: a short survey. arXiv preprint arXiv:2012.09830.Google Scholar

Daw, N. D., O’Doherty, J. P., Dayan, P., Seymour, B., & Dolan, R. J. (2006). Cortical substrates for exploratory decisions in humans. Nature, 441(7095), 876–879.Google Scholar

Findling, C., Skvortsova, V., Dromnelle, R., Palminteri, S., & Wyart, V. (2019). Computational noise in reward-guided learning drives behavioral variability in volatile environments. Nature Neuroscience, 22(12), 2066–2077.Google Scholar

Frank, M. J., Doll, B. B., Oas-Terpstra, J., & Moreno, F. (2009). Prefrontal and striatal dopaminergic genes predict individual differences in exploration and exploitation. Nature Neuroscience, 12(8), 1062.Google Scholar

Geana, A., Wilson, R. C., Daw, N., & Cohen, J. D. (2016). Boredom, information-seeking and exploration. In A. Papafragou, D. Grodner, D. Mirman, & J. C. Trueswell (Eds.), Proceedings of the 38th Annual Conference of the Cognitive Science Society (pp. 1751–1756). Austin, TX: Cognitive Science Society.Google Scholar

Gershman, S. J. (2018). Deconstructing the human algorithms for exploration. Cognition, 173, 34–42.Google Scholar

Griffiths, T. (2014, 12). Manifesto for a new (computational) cognitive revolution. Cognition, 135. https://doi.org/10.1016/j.cognition.2014.11.026.Google Scholar

Hills, T. T., Todd, P. M., Lazer, D., Redish, A. D., Couzin, I. D., & the Cognitive Search Research Group. (2015). Exploration versus exploitation in space, mind, and society. Trends in Cognitive Sciences, 19(1), 46–54.

Jaksch, T., Ortner, R., & Auer, P. (2010). Near-optimal regret bounds for reinforcement learning. Journal of Machine Learning Research, 11(4), 1563–1600.Google Scholar

Jinnai, Y., Park, J. W., Abel, D., & Konidaris, G. (2019). Discovering options for exploration by minimizing cover time. arXiv preprint arXiv:1903.00606.Google Scholar

Kaplan, F., & Oudeyer, P.-Y. (2004). Maximizing learning progress: an internal reward system for development. In Iida, F., Pfeifer, R., Steels, L., & Kuniyoshi, Y. (Eds.), Embodied artificial intelligence (pp. 259–270). Springer. https://doi.org/10.1007/978-3-540-27833-7_19.Google Scholar

Kidd, C., Piantadosi, S. T., & Aslin, R. N. (2012). The Goldilocks effect: Human infants allocate attention to visual sequences that are neither too simple nor too complex. PLoS One, 7(5), e36399.Google Scholar

Klyubin, A. S., Polani, D., & Nehaniv, C. L. (2005). All else being equal be empowered. In Capcarrère, M. S., Freitas, A. A., Bentley, P. J., Johnson, C. G., & Timmis, J. (Eds.), Advances in Artificial Life. ECAL 2005. Lecture Notes in Computer Science, vol. 3630. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11553090_75.Google Scholar

Köhler, W. (1925). The mentality of apes (Vol. 74). Kegan Paul, Trench, Trubner & Company, Limited.

Krebs, J. R., Kacelnik, A., & Taylor, P. (1978). Test of optimal sampling by foraging great tits. Nature, 275(5675), 27–31.CrossRef Google Scholar

Leibfried, F., Pascual-Díaz, S., & Grau-Moya, J. (2019). A unified Bellman optimality principle combining reward maximization and empowerment. Advances in Neural Information Processing Systems, 32, 7869–7880.

Lopes, M., Lang, T., Toussaint, M., & Oudeyer, P.-Y. (2012). Exploration in model-based reinforcement learning by empirically estimating learning progress. Advances in Neural Information Processing Systems, 25, 206–214.Google Scholar

Matusch, B., Ba, J., & Hafner, D. (2020). Evaluating agents without rewards. arXiv preprint arXiv:2012.11538.Google Scholar

Mehlhorn, K., Newell, B. R., Todd, P. M., Lee, M. D., Morgan, K., Braithwaite, V. A., … Gonzalez, C. (2015). Unpacking the exploration–exploitation tradeoff: A synthesis of human and animal literatures. Decision, 2(3), 191.Google Scholar

Mohamed, S., & Rezende, D. J. (2015). Variational information maximisation for intrinsically motivated reinforcement learning. In Advances in neural information processing systems (pp. 2125–2133).Google Scholar

Nair, A., Pong, V., Dalal, M., Bahl, S., Lin, S., & Levine, S. (2018). Visual reinforcement learning with imagined goals. arXiv preprint arXiv:1807.04742.Google Scholar

Osband, I., Russo, D., & Van Roy, B. (2013). (More) efficient reinforcement learning via posterior sampling. In Advances in neural information processing systems (pp. 3003–3011).Google Scholar

Oudeyer, P.-Y., & Kaplan, F. (2009). What is intrinsic motivation? A typology of computational approaches. Frontiers in Neurorobotics, 1, 6.Google Scholar

Oudeyer, P.-Y., Kaplan, F., & Hafner, V. V. (2007). Intrinsic motivation systems for autonomous mental development. IEEE Transactions on Evolutionary Computation, 11(2), 265–286.Google Scholar

Pong, V., Gu, S., Dalal, M., & Levine, S. (2018). Temporal difference models: Model-free deep RL for model-based control. arXiv preprint arXiv:1802.09081.Google Scholar

Rich, A. S., & Gureckis, T. M. (2018). Exploratory choice reflects the future value of information. Decision, 5(3), 177.Google Scholar

Salge, C., Glackin, C., & Polani, D. (2014). Empowerment–an introduction. In Prokopenko, M. (ed.), Guided self-organization: Inception (pp. 67–114). Springer. https://doi.org/10.1007/978-3-642-53734-9_4.CrossRef Google Scholar

Sanborn, A. N., & Chater, N. (2016). Bayesian brains without probabilities. Trends in Cognitive Sciences, 20(12), 883–893.CrossRef Google Scholar PubMed

Schaul, T., Horgan, D., Gregor, K., & Silver, D. (2015). Universal value function approximators. In International conference on machine learning (pp. 1312–1320). https://proceedings.mlr.press/v37/schaul15.html.Google Scholar

Schmidhuber, J. (1991). Curious model-building control systems. In Proc. international joint conference on neural networks (pp. 1458–1463). https://doi.org/10.1109/IJCNN.1991.170605.Google Scholar

Schmidhuber, J. (2010). Formal theory of creativity, fun, and intrinsic motivation (1990–2010). IEEE Transactions on Autonomous Mental Development, 2(3), 230–247.Google Scholar

Schulz, E., Bhui, R., Love, B. C., Brier, B., Todd, M. T., & Gershman, S. J. (2019). Structured, uncertainty-driven exploration in real-world consumer choice. Proceedings of the National Academy of Sciences, 116(28), 13903–13908. https://doi.org/10.1073/pnas.1821028116.Google Scholar

Schulz, E., & Gershman, S. J. (2019). The algorithmic architecture of exploration in the human brain. Current Opinion in Neurobiology, 55, 7–14.Google Scholar

Speekenbrink, M., & Konstantinidis, E. (2015). Uncertainty and exploration in a restless bandit problem. Topics in Cognitive Science, 7(2), 351–367.Google Scholar

Stafford, T., & Dewar, M. (2014). Tracing the trajectory of skill learning with a very large sample of online game players. Psychological Science, 25(2), 511–518. https://doi.org/10.1177/0956797613511466.Google Scholar

Steyvers, M., Lee, M. D., & Wagenmakers, E.-J. (2009). A Bayesian analysis of human decision-making on bandit problems. Journal of Mathematical Psychology, 53(3), 168–179.Google Scholar

Stojić, H., Analytis, P. P., & Speekenbrink, M. (2015). Human behavior in contextual multi-armed bandit problems. In Noelle, D. C., et al. (Eds.), Proceedings of the 37th Annual Meeting of the Cognitive Science Society (pp. 2290–2295). Cognitive Science Society.

Stojić, H., Schulz, E., Analytis, P. P., & Speekenbrink, M. (2020). It’s new, but is it good? How generalization and uncertainty guide the exploration of novel options. Journal of Experimental Psychology: General, 149(10), 1878.

Strehl, A. L., & Littman, M. L. (2008). An analysis of model-based interval estimation for Markov decision processes. Journal of Computer and System Sciences, 74(8), 1309–1331.Google Scholar

Strens, M. (2000). A Bayesian framework for reinforcement learning. In Proceedings of the Seventeenth International Conference on Machine Learning (ICML-2000), Stanford University, California, June 29–July 2, 2000 (Vol. 2000, pp. 943–950).

Sun, Y., Gomez, F., & Schmidhuber, J. (2011). Planning to be surprised: Optimal Bayesian exploration in dynamic environments. In International conference on artificial general intelligence (pp. 41–51).Google Scholar

Sutton, R. S., & Barto, A. G. (2018). Reinforcement learning: An introduction. MIT Press.Google Scholar

Sutton, R. S., Modayil, J., Delp, M., Degris, T., Pilarski, P. M., White, A., & Precup, D. (2011). Horde: A scalable real-time architecture for learning knowledge from unsupervised sensorimotor interaction. In Proc. of 10th Int. Conf. on Autonomous Agents and Multiagent Systems (AAMAS 2011), May, 2–6, 2011, Taipei, Taiwan (pp. 761–768).Google Scholar

Thompson, W. R. (1933). On the likelihood that one unknown probability exceeds another in view of the evidence of two samples. Biometrika, 25(3/4), 285–294.Google Scholar

Whittle, P. (1980). Multi-armed bandits and the Gittins Index. Journal of the Royal Statistical Society: Series B (Methodological), 42(2), 143–149.Google Scholar

Wilson, R. C., Bonawitz, E., Costa, V. D., & Ebitz, R. B. (2021). Balancing exploration and exploitation with information and randomization. Current Opinion in Behavioral Sciences, 38, 49–56.Google Scholar

Wilson, R. C., Shenhav, A., Straccia, M., & Cohen, J. D. (2019). The eighty five percent rule for optimal learning. Nature Communications, 10(1), 1–9.Google Scholar

Wimmer, G. E., Daw, N. D., & Shohamy, D. (2012). Generalization of value in reinforcement learning by humans. European Journal of Neuroscience, 35(7), 1092–1104.Google Scholar

Wu, C. M., Schulz, E., Garvert, M. M., Meder, B., & Schuck, N. W. (2020). Similarities and differences in spatial and non-spatial cognitive maps. PLoS Computational Biology, 16(9), e1008149.Google Scholar

Wu, C. M., Schulz, E., Speekenbrink, M., Nelson, J. D., & Meder, B. (2017). Mapping the unknown: The spatially correlated multi-armed bandit. bioRxiv, 106286.Google Scholar

Wu, C. M., Schulz, E., Speekenbrink, M., Nelson, J. D., & Meder, B. (2018). Generalization guides human exploration in vast decision spaces. Nature Human Behaviour, 2(12), 915–924.Google Scholar

Zhang, S., & Yu, A. J. (2013). Forgetful Bayes and myopic planning: Human learning and decision-making in a bandit setting. In NIPS (pp. 2607–2615). https://proceedings.neurips.cc/paper/2013/file/6c14da109e294d1e8155be8aa4b1ce8e-Paper.pdf.Google Scholar

Zheng, Z., Oh, J., Hessel, M., Xu, Z., Kroiss, M., Van Hasselt, H., … Singh, S. (2020, 13–18 Jul). What can learned intrinsic rewards capture? In Daumé III, H., & Singh, A. (Eds.), Proceedings of the 37th international conference on machine learning (Vol. 119, pp. 11436–11446). PMLR.

Zhu, Y., Mottaghi, R., Kolve, E., Lim, J. J., Gupta, A., Fei-Fei, L., & Farhadi, A. (2017). Target-driven visual navigation in indoor scenes using deep reinforcement learning. In 2017 IEEE international conference on robotics and automation (ICRA) (pp. 3357–3364), Singapore.Google Scholar

References

Apperly, I. (2010). Mindreaders: The cognitive basis of “theory of mind.” Psychology Press.Google Scholar

Atkisson, C., O’Brien, M. J., & Mesoudi, A. (2012). Adult learners in a novel environment use prestige-biased social learning. Evolutionary Psychology: An International Journal of Evolutionary Approaches to Psychology and Behavior, 10(3), 519–537.Google Scholar

Baker, C. L., Jara-Ettinger, J., Saxe, R., & Tenenbaum, J. B. (2017). Rational quantitative attribution of beliefs, desires and percepts in human mentalizing. Nature Human Behaviour, 1(4), 1–10.Google Scholar

Bandura, A. (1962). Social learning through imitation. Nebraska Symposium on Motivation, 330, 211–274.Google Scholar

Bhui, R., Lai, L., & Gershman, S. J. (2021). Resource-rational decision making. Current Opinion in Behavioral Sciences, 41, 15–21.Google Scholar

Botvinick, M., & Weinstein, A. (2014). Model-based hierarchical reinforcement learning and human action control. Philosophical Transactions of the Royal Society of London. Series B, Biological Sciences, 369(1655). https://doi.org/10.1098/rstb.2013.0480.Google Scholar

Boyd, R., & Richerson, P. J. (1988). Culture and the Evolutionary Process. University of Chicago Press.Google Scholar

Catmur, C., Walsh, V., & Heyes, C. (2009). Associative sequence learning: The role of experience in the development of imitation and the mirror system. Philosophical Transactions of the Royal Society of London. Series B, Biological Sciences, 364(1528), 2369–2380.Google Scholar

Charpentier, C. J., Iigaya, K., & O’Doherty, J. P. (2020). A neuro-computational account of arbitration between choice imitation and goal emulation during human observational learning. Neuron, 106(4), 687–699.e7.Google Scholar

Cogliati Dezza, I., Cleeremans, A., & Alexander, W. (2019). Should we control? The interplay between cognitive control and information integration in the resolution of the exploration-exploitation dilemma. Journal of Experimental Psychology. General, 148(6), 977–993.Google Scholar

Collette, S., Pauli, W. M., Bossaerts, P., & O’Doherty, J. (2017). Neural computations underlying inverse reinforcement learning in the human brain. eLife, 6. https://doi.org/10.7554/eLife.29718.Google Scholar

Cushman, F. (2020). Rationalization is rational. The Behavioral and Brain Sciences, 43, e28.Google Scholar

Cushman, F., & Morris, A. (2015). Habitual control of goal selection in humans. Proceedings of the National Academy of Sciences of the United States of America, 112(45), 13817–13822.Google Scholar

Dasgupta, I., & Gershman, S. J. (2021). Memory as a computational resource. Trends in Cognitive Sciences, 25(3), 240–251.Google Scholar

Dasgupta, I., Schulz, E., Goodman, N. D., & Gershman, S. J. (2018). Remembrance of inferences past: Amortization in human hypothesis generation. Cognition, 178, 67–81.Google Scholar

Daw, N. D., Niv, Y., & Dayan, P. (2005). Uncertainty-based competition between prefrontal and dorsolateral striatal systems for behavioral control. Nature Neuroscience, 8(12), 1704–1711.Google Scholar

Derex, M., Bonnefon, J.-F., Boyd, R., & Mesoudi, A. (2019). Causal understanding is not necessary for the improvement of culturally evolving technology. Nature Human Behaviour, 3(5), 446–452.Google Scholar

Dezfouli, A., & Balleine, B. W. (2013). Actions, action sequences and habits: Evidence that goal-directed and habitual action control are hierarchically organized. PLoS Computational Biology, 9(12), e1003364.Google Scholar

Foster, D. J. (2017). Replay comes of age. Annual Review of Neuroscience, 40, 581–602.Google Scholar

Gergely, G., & Csibra, G. (2003). Teleological reasoning in infancy: The naïve theory of rational action. In Trends in Cognitive Sciences (Vol. 7, Issue 7, pp. 287–292). https://doi.org/10.1016/s1364-6613(03)00128-1.

Gershman, S. J. (2020). Origin of perseveration in the trade-off between reward and complexity. Cognition, 204, 104394.Google Scholar

Gershman, S. J., Horvitz, E. J., & Tenenbaum, J. B. (2015). Computational rationality: A converging paradigm for intelligence in brains, minds, and machines. Science, 349(6245), 273–278.Google Scholar

Gershman, S. J., Markman, A. B., & Otto, A. R. (2014). Retrospective revaluation in sequential decision making: A tale of two systems. Journal of Experimental Psychology. General, 143(1), 182–194.Google Scholar

Gigerenzer, G., & Gaissmaier, W. (2011). Heuristic decision making. Annual Review of Psychology, 62, 451–482.Google Scholar

Gweon, H. (2021). Inferential Social Learning: How humans learn from others and help others learn. https://doi.org/10.31234/osf.io/8n34t.Google Scholar

Hayden, B. Y., & Niv, Y. (2021). The case against economic values in the orbitofrontal cortex (or anywhere else in the brain). Behavioral Neuroscience, 135(2), 192–201.Google Scholar

Henrich, J. (2017). The Secret of Our Success: How Culture Is Driving Human Evolution, Domesticating Our Species, and Making Us Smarter. Princeton University Press.Google Scholar

Henrich, J., & Gil-White, F. J. (2001). The evolution of prestige: Freely conferred deference as a mechanism for enhancing the benefits of cultural transmission. Evolution and Human Behavior: Official Journal of the Human Behavior and Evolution Society, 22(3), 165–196.Google Scholar

Herrmann, E., Call, J., Hernández-Lloreda, M. V., Hare, B., & Tomasello, M. (2007). Humans have evolved specialized skills of social cognition: The cultural intelligence hypothesis. Science, 317(5843), 1360–1366.

Heyes, C. (2001). Causes and consequences of imitation. Trends in Cognitive Sciences, 5(6), 253–261.Google Scholar

Heyes, C. (2002). Transformational and associative theories of imitation. Imitation in Animals and Artifacts, 607, 501–523.Google Scholar

Heyes, C. (2018). Cognitive Gadgets: The Cultural Evolution of Thinking. Harvard University Press.Google Scholar

Ho, M. K., MacGlashan, J., Littman, M. L., & Cushman, F. (2017). Social is special: A normative framework for teaching with and learning from evaluative feedback. Cognition, 167, 91–106.Google Scholar

Hoppitt, W., & Laland, K. N. (2013). Social Learning: An Introduction to Mechanisms, Methods, and Models. Princeton University Press.Google Scholar

Horner, V., & Whiten, A. (2005). Causal knowledge and imitation/emulation switching in chimpanzees (Pan troglodytes) and children (Homo sapiens). Animal Cognition, 8(3), 164–181.Google Scholar

Huys, Q. J. M., Lally, N., Faulkner, P., Eshel, N., Seifritz, E., Gershman, S. J., Dayan, P., & Roiser, J. P. (2015). Interplay of approximate planning strategies. Proceedings of the National Academy of Sciences of the United States of America, 112(10), 3098–3103.Google Scholar

Jara-Ettinger, J. (2019). Theory of mind as inverse reinforcement learning. Current Opinion in Behavioral Sciences, 29, 105–110.Google Scholar

Jara-Ettinger, J., Gweon, H., Schulz, L. E., & Tenenbaum, J. B. (2016). The Naïve Utility Calculus: Computational Principles Underlying Commonsense Psychology. Trends in Cognitive Sciences, 20(8), 589–604.Google Scholar

Jara-Ettinger, J., Gweon, H., Tenenbaum, J. B., & Schulz, L. E. (2015). Children’s understanding of the costs and rewards underlying rational action. Cognition, 140, 14–23.Google Scholar

Jern, A., Lucas, C. G., & Kemp, C. (2017). People learn other people’s preferences through inverse decision-making. Cognition, 168, 46–64.Google Scholar

Jiménez, Á. V., & Mesoudi, A. (2019). Prestige-biased social learning: Current evidence and outstanding questions. Palgrave Communications, 5(1), 20.Google Scholar

Keramati, M., Smittenaar, P., Dolan, R. J., & Dayan, P. (2016). Adaptive integration of habits into depth-limited planning defines a habitual-goal-directed spectrum. Proceedings of the National Academy of Sciences of the United States of America, 113(45), 12868–12873.Google Scholar

Kool, W., Cushman, F. A., & Gershman, S. J. (2018). Competition and cooperation between multiple reinforcement learning systems. In Morris, R., Bornstein, A., & Shenhav, A. (Eds.), Goal-directed decision making (pp. 153–178). Academic Press.Google Scholar

Kool, W., Gershman, S. J., & Cushman, F. A. (2017). Cost-benefit arbitration between multiple reinforcement-learning systems. Psychological Science, 28(9), 1321–1333.Google Scholar

Kool, W., Gershman, S. J., & Cushman, F. A. (2018). Planning complexity registers as a cost in metacontrol. Journal of Cognitive Neuroscience, 30(10), 1391–1404.Google Scholar

Legare, C. H., & Nielsen, M. (2015). Imitation and innovation: The dual engines of cultural learning. Trends in Cognitive Sciences, 19(11), 688–699.Google Scholar

Lieder, F., & Griffiths, T. L. (2020). Resource-rational analysis: Understanding human cognition as the optimal use of limited computational resources. In Behavioral and Brain Sciences (Vol. 43). https://doi.org/10.1017/s0140525x1900061x.Google Scholar

Liu, S., Brooks, N. B., & Spelke, E. S. (2019). Origins of the concepts cause, cost, and goal in prereaching infants. Proceedings of the National Academy of Sciences of the United States of America, 116(36), 17747–17752.Google Scholar

Lyons, D. E., Young, A. G., & Keil, F. C. (2007). The hidden structure of overimitation. Proceedings of the National Academy of Sciences of the United States of America, 104(50), 19751–19756.Google Scholar

Maisto, D., Friston, K., & Pezzulo, G. (2019). Caching mechanisms for habit formation in active inference. Neurocomputing, 359, 298–314.Google Scholar

McGuigan, N., Whiten, A., Flynn, E., & Horner, V. (2007). Imitation of causally opaque versus causally transparent tool use by 3- and 5-year-old children. Cognitive Development, 22(3), 353–364.Google Scholar

Miller, K. J., Botvinick, M. M., & Brody, C. D. (2017). Dorsal hippocampus contributes to model-based planning. Nature Neuroscience. https://doi.org/10.1101/096594.Google Scholar

Miller, K. J., Shenhav, A., & Ludvig, E. A. (2019). Habits without values. Psychological Review, 126(2), 292–311.Google Scholar

Miller, N. E., & Dollard, J. (1941). Social Learning and Imitation (Vol. 55). Yale University Press.Google Scholar

Momennejad, I., Otto, A. R., Daw, N. D., & Norman, K. A. (2018). Offline replay supports planning in human reinforcement learning. eLife, 7. https://doi.org/10.7554/eLife.32548.Google Scholar

Momennejad, I., Russek, E. M., Cheong, J. H., Botvinick, M. M., Daw, N. D., & Gershman, S. J. (2017). The successor representation in human reinforcement learning. Nature Human Behaviour, 1(9), 680–692.Google Scholar

Morin, O. (2016). How Traditions Live and Die. Oxford University Press.Google Scholar

Morris, A., & Cushman, F. (2018). A common framework for theories of norm compliance. Social Philosophy & Policy, 35(1), 101–127.Google Scholar

Najar, A., Bonnet, E., Bahrami, B., & Palminteri, S. (2020). The actions of others act as a pseudo-reward to drive imitation in the context of social reinforcement learning. PLoS Biology, 18(12), e3001028.Google Scholar

O’Donnell, T. J. (2015). Productivity and Reuse in Language: A Theory of Linguistic Computation and Storage. MIT Press.Google Scholar

Otto, A. R., Gershman, S. J., Markman, A. B., & Daw, N. D. (2013). The curse of planning: Dissecting multiple reinforcement-learning systems by taxing the central executive. Psychological Science, 24(5), 751–761.Google Scholar

Otto, A. R., Raio, C. M., Chiang, A., Phelps, E. A., & Daw, N. D. (2013). Working-memory capacity protects model-based learning from stress. Proceedings of the National Academy of Sciences of the United States of America, 110(52), 20941–20946.Google Scholar

Rendell, L., Boyd, R., Cownden, D., Enquist, M., Eriksson, K., Feldman, M. W., … & Laland, K. N. (2010). Why copy others? Insights from the social learning strategies tournament. Science, 328(5975), 208–213.Google Scholar

Rozenblit, L., & Keil, F. (2002). The misunderstood limits of folk science: An illusion of explanatory depth. Cognitive Science, 26(5), 521–562.Google Scholar

Russek, E. M., Momennejad, I., Botvinick, M. M., Gershman, S. J., & Daw, N. D. (2017). Predictive representations can link model-based reinforcement learning to model-free mechanisms. PLoS Computational Biology, 13(9), e1005768.Google Scholar

Scott-Phillips, T. C. (2017). A (simple) experimental demonstration that cultural evolution is not replicative, but reconstructive – and an explanation of why this difference matters. Journal of Cognition and Culture, 17(1–2), 1–11.Google Scholar

Shafto, P., Goodman, N. D., & Frank, M. C. (2012). Learning from others: The consequences of psychological reasoning for human learning. Perspectives on Psychological Science: A Journal of the Association for Psychological Science, 7(4), 341–351.Google Scholar

Shenhav, A., Botvinick, M. M., & Cohen, J. D. (2013). The expected value of control: An integrative theory of anterior cingulate cortex function. Neuron, 79(2), 217–240.Google Scholar

Skinner, B. F. (1950). Are theories of learning necessary? Psychological Review, 57(4), 193–216.Google Scholar

Solway, A., & Botvinick, M. M. (2012). Goal-directed decision making as probabilistic inference: A computational framework and potential neural correlates. Psychological Review, 119(1), 120–154.Google Scholar

Solway, A., & Botvinick, M. M. (2015). Evidence integration in model-based tree search. Proceedings of the National Academy of Sciences of the United States of America, 112(37), 11708–11713.Google Scholar

Solway, A., Diuk, C., Córdova, N., Yee, D., Barto, A. G., Niv, Y., & Botvinick, M. M. (2014). Optimal behavioral hierarchy. PLoS Computational Biology, 10(8), e1003779.Google Scholar

Sperber, D. (2006). Why a deep understanding of cultural evolution is incompatible with shallow psychology. In Enfield, N. J., & Levinson, S. C. (Eds.), Roots of human sociality (pp. 431–449). Routledge.

Strachan, J., Curioni, A., Constable, M., Knoblich, G., & Charbonneau, M. (2020). A methodology for distinguishing copying and reconstruction in cultural transmission episodes. In Denison, S., Mack, M., Xu, Y., Yang, A., & Armstrong, C. B. (Eds.), Proceedings of the 42nd Annual Conference of the Cognitive Science Society. https://researchportal.northumbria.ac.uk/ws/files/32896647/0831.pdf.

Sutton, R. S., & Barto, A. G. (2018). Reinforcement Learning: An Introduction (2nd ed.). MIT Press.

Tennie, C., Call, J., & Tomasello, M. (2009). Ratcheting up the ratchet: On the evolution of cumulative culture. Philosophical Transactions of the Royal Society of London. Series B, Biological Sciences, 364(1528), 2405–2415.Google Scholar

Thorndike, E. L. (1932). The fundamentals of learning. https://psycnet.apa.org/record/2006-04535-000.Google Scholar

Tolman, E. C. (1948). Cognitive maps in rats and men. Psychological Review (Vol. 55, Issue 4, pp. 189–208). https://doi.org/10.1037/h0061626.Google Scholar

Tomasello, M. (1996). Do apes ape? In Heyes, C. M., & Galef, B. G., Jr. (Eds.), Social Learning in Animals: The Roots of Culture (pp. 319–346). Academic Press. https://doi.org/10.1016/B978-012273965-1/50016-9.

Tomasello, M., Carpenter, M., Call, J., Behne, T., & Moll, H. (2005). Understanding and sharing intentions: The origins of cultural cognition. The Behavioral and Brain Sciences, 28(5), 675–691; discussion 691–735.Google Scholar

Tomasello, M., Davis-Dasilva, M., Camak, L., & Bard, K. (1987). Observational learning of tool-use by young chimpanzees. Human Evolution, 2(2), 175–183.Google Scholar

Vélez, N., & Gweon, H. (2021). Learning from other minds: An optimistic critique of reinforcement learning models of social learning. Current Opinion in Behavioral Sciences, 38, 110–115.Google Scholar

Vikbladh, O. M., Meager, M. R., King, J., Blackmon, K., Devinsky, O., Shohamy, D., Burgess, N., & Daw, N. D. (2019). Hippocampal contributions to model-based planning and spatial memory. Neuron, 102(3), 683–693.e4.Google Scholar

Whiten, A., & Ham, R. (1992). On the nature and evolution of imitation in the animal kingdom: Reappraisal of a century of research. Advances in the Study of Behavior, 21, 239.

Wu, C. M., Schulz, E., Gerbaulet, K., Pleskac, T. J., & Speekenbrink, M. (2021). Time to explore: Adaptation of exploration under time pressure. PsyArXiv. https://doi.org/10.31234/osf.io/dsw7q.Google Scholar

Zaki, J., Schirmer, J., & Mitchell, J. P. (2011). Social influence modulates the neural computation of value. Psychological Science, 22(7), 894–900.Google Scholar
