
Part II - How Do Humans Search for Information?

Published online by Cambridge University Press:  19 May 2022

Irene Cogliati Dezza, University College London
Eric Schulz, Max-Planck-Institut für biologische Kybernetik, Tübingen
Charley M. Wu, Eberhard-Karls-Universität Tübingen, Germany
Type: Chapter
In: The Drive for Knowledge: The Science of Human Information Seeking, pp. 99–192
Publisher: Cambridge University Press
Print publication year: 2022



References

Attias, H. (2003). Planning by Probabilistic Inference. Paper presented at the Proc. of the 9th Int. Workshop on Artificial Intelligence and Statistics. https://proceedings.mlr.press/r4/attias03a.html.Google Scholar
Auer, P. (2002). Using confidence bounds for exploitation-exploration trade-offs. Journal of Machine Learning Research, 3(Nov.), 397422.Google Scholar
Barlow, H. (1961). Possible principles underlying the transformations of sensory messages. In Rosenblith, W. (Ed.), Sensory Communication (pp. 217234). MIT Press.Google Scholar
Barlow, H. B. (1974). Inductive inference, coding, perception, and language. Perception, 3, 123134.Google Scholar
Barto, A. G. (2013). Intrinsic motivation and reinforcement learning. In Baldassarre, G & Mirolli, M, Intrinsically motivated learning in natural and artificial systems (pp. 1747). Springer.Google Scholar
Barto, A., Mirolli, M., & Baldassarre, G. (2013). Novelty or Surprise? Frontiers in Psychology, 4. doi:10.3389/fpsyg.2013.00907. Retrieved from www.frontiersin.org/Journal/Abstract.aspx?s=196&name=cognitive_science&ART_DOI=10.3389/fpsyg.2013.00907.Google Scholar
Beal, M. J. (2003). Variational Algorithms for Approximate Bayesian Inference. PhD. Thesis, University College London. www.proquest.com/docview/1775215626?pq-origsite=gscholar&fromopenview=true.Google Scholar
Bellemare, M. G., Srinivasan, S., Ostrovski, G., Schaul, T., Saxton, D., & Munos, R. (2016). Unifying count-based exploration and intrinsic motivation. arXiv preprint arXiv:1606.01868.Google Scholar
Berger, J. O. (2011). Statistical decision theory and Bayesian analysis. Springer.Google Scholar
Blei, D. M., Kucukelbir, A., & McAuliffe, J. D. (2017). Variational inference: A review for statisticians. Journal of the American Statistical Association, 112(518), 859877.CrossRefGoogle Scholar
Botvinick, M., & Toussaint, M. (2012). Planning as inference. Trends in Cognitive Science., 16(10), 485488.Google Scholar
Burda, Y., Edwards, H., Storkey, A., & Klimov, O. (2018). Exploration by random network distillation. arXiv preprint arXiv:1810.12894.Google Scholar
Çatal, O., Wauthier, S., Verbelen, T., De Boom, C., & Dhoedt, B. (2020). Deep active inference for autonomous robot navigation. arXiv preprint arXiv:2003.03220.Google Scholar
Chaloner, K., & Verdinelli, I. (1995). Bayesian experimental design: A review. Statistical Science, 273304.Google Scholar
Cullen, M., Davey, B., Friston, K. J., & Moran, R. J. (2018). Active inference in OpenAI gym: A paradigm for computational investigations into psychiatric illness. Biological Psychiatry: Cognitive Neuroscience and Neuroimaging, 3(9), 809818.Google ScholarPubMed
Da Costa, L., Parr, T., Sajid, N., Veselic, S., Neacsu, V., & Friston, K. (2020). Active inference on discrete state-spaces: A synthesis. Journal of Mathematical Psychology, 99, 102447. Retrieved from www.sciencedirect.com/science/article/pii/S0022249620300857.Google Scholar
Da Costa, L., Sajid, N., Parr, T., Friston, K., & Smith, R. (2020). The relationship between dynamic programming and active inference: The discrete, finite-horizon case. arXiv preprint arXiv:2009.08111.Google Scholar
Fleming, W. H., & Sheu, S. J. (2002). Risk-sensitive control and an optimal investment model II. Annals of Applied Probability, 12(2), 730767. Retrieved from https://projecteuclid.org:443/euclid.aoap/1026915623.Google Scholar
Fountas, Z., Sajid, N., Mediano, P. A., & Friston, K. (2020). Deep active inference agents using Monte-Carlo methods. arXiv preprint arXiv:2006.04176.Google Scholar
Friston, K. J. (2010). The free-energy principle: A unified brain theory? Nature Reviews Neuroscience, 11(2), 127138. http://dx.doi.org/10.1038/nrn2787.Google Scholar
Friston, K. (2019). A free energy principle for a particular physics. arXiv preprint arXiv:1906.10184.Google Scholar
Friston, K., Da Costa, L., Hafner, D., Hesp, C., & Parr, T. (2020). Sophisticated inference. arXiv preprint arXiv:2006.04120.Google Scholar
Friston, K. J., Daunizeau, J., Kilner, J., & Kiebel, S. J. (2010). Action and behavior: A free-energy formulation. Biological Cybernetics, 102(3), 227260.Google Scholar
Friston, K., FitzGerald, T., Rigoli, F., Schwartenbeck, P., O’Doherty, J., & Pezzulo, G. (2016). Active inference and learning. Neuroscience and Biobehavioral Reviews, 68, 862879. Retrieved from http://www.ncbi.nlm.nih.gov/pubmed/27375276.Google Scholar
Friston, K., FitzGerald, T., Rigoli, F., Schwartenbeck, P., & Pezzulo, G. (2017). Active inference: A process theory. Neural Computation, 29(1), 149. Retrieved from https://www.ncbi.nlm.nih.gov/pubmed/27870614.Google Scholar
Friston, K. J., Lin, M., Frith, C. D., Pezzulo, G., Hobson, J. A., & Ondobaka, S. (2017). Active inference, curiosity and insight. Neural Computation, 29(10), 26332683. Friston, K. J., Parr, T., & de Vries, B. (2017). The graphical brain: Belief propagation and active inference. Network Neuroscience, 1(4), 381414. Retrieved from http://www.ncbi.nlm.nih.gov/pubmed/29417960.Google Scholar
Friston, K. J., Parr, T., Yufik, Y., Sajid, N., Price, C. J., & Holmes, E. (2020). Generative models, linguistic communication and active inference. Neuroscience & Biobehavioral Reviews, 118, 4264. https://doi.org/10.1016/j.neubiorev.2020.07.005.Google Scholar
Friston, K. J., Rigoli, F., Ognibene, D., Mathys, C., Fitzgerald, T., & Pezzulo, G. (2015). Active inference and epistemic value. Cognitive Neuroscience, 6(4), 187224. Retrieved from http://dx.doi.org/10.1080/17588928.2015.1020053.CrossRefGoogle ScholarPubMed
Friston, K. J., Rosch, R., Parr, T., Price, C., & Bowman, H. (2018). Deep temporal models and active inference. Neuroscience and Biobehavioral Reviews, 90, 486501.Google Scholar
Friston, K., Schwartenbeck, P., FitzGerald, T., Moutoussis, M., Behrens, T., & Dolan, R. J. (2014). The anatomy of choice: dopamine and decision-making. Philosophical Transactions of the Royal Society B: Biological Sciences, 369 (1655). Retrieved from http://www.ncbi.nlm.nih.gov/pubmed/25267823.Google Scholar
Gottlieb, J., Oudeyer, P.-Y., Lopes, M., & Baranes, A. (2013). Information-seeking, curiosity, and attention: computational and neural mechanisms. Trends in Cognitive Science, 17(11), 585593. Retrieved from https://www.sciencedirect.com/science/article/pii/S1364661313002052.Google Scholar
Harsanyi, J. C. (1978). Bayesian decision theory and utilitarian ethics. The American Economic Review, 68(2), 223228. Retrieved from www.jstor.org/stable/1816692.Google Scholar
Houthooft, R., Chen, X., Duan, Y., Schulman, J., De Turck, F., & Abbeel, P. (2016). Vime: Variational information maximizing exploration. Advances in Neural Information Processing Systems, 29, 11091117.Google Scholar
Itti, L., & Baldi, P. (2009). Bayesian surprise attracts human attention. Vision Research, 49(10), 12951306.Google Scholar
Jaynes, E. T. (1957). Information theory and statistical mechanics. Physical Review, 106(4), 620.Google Scholar
Kahneman, D., & Tversky, A. (1979). Prospect theory: An analysis of decision under risk. Econometrica, 47(2), 263291.Google Scholar
Kaplan, R., & Friston, K. J. (2018). Planning and navigation as active inference. Biological Cybernetics, 112(4), 323343.Google Scholar
Laureiro-Martínez, D., Brusoni, S., & Zollo, M. (2010). The neuroscientific foundations of the exploration−exploitation dilemma. Journal of Neuroscience, Psychology, and Economics, 3(2), 95.Google Scholar
Lindley, D. V. (1956). On a measure of the information provided by an experiment. The Annals of Mathematical Statistics, 9861005.Google Scholar
Linsker, R. (1990). Perceptual neural organization: some approaches based on network models and information theory. Annual Review of Neuroscience, 13, 257281.Google Scholar
Millidge, B., Tschantz, A., & Buckley, C. L. (2020). Whence the expected free energy? arXiv preprint arXiv:2004.08128.Google Scholar
Mirza, M. B., Adams, R. A., Mathys, C. D., & Friston, K. J. (2016). Scene construction, visual foraging, and active inference. Frontiers in Computational Neuroscience, 10 (56). Retrieved from http://journal.frontiersin.org/Article/10.3389/fncom.2016.00056/abstract. Mitchell, T., Sacks, J., & Ylvisaker, D. (1994). Asymptotic Bayes criteria for nonparametric response surface design. The Annals of Statistics, 22(2), 634651.Google Scholar
Optican, L., & Richmond, B. J. (1987). Temporal encoding of two-dimensional patterns by single units in primate inferior cortex. II Information theoretic analysis. Journal of Neurophysiology, 57, 132146.Google Scholar
Parr, T. (2019). The computational neurology of active vision. UCL (Unpublished doctoral thesis, University College London). https://discovery.ucl.ac.uk/id/eprint/10084391/Google Scholar
Parr, T., Da Costa, L., & Friston, K. (2020). Markov blankets, information geometry and stochastic thermodynamics. Philosophical Transactions of the Royal Society A, 378(2164), 20190159.Google Scholar
Parr, T., & Friston, K. J. (2019a). Attention or salience? Current Opinion in Psychology, 29, 15.Google Scholar
Parr, T., & Friston, K. J. (2019b). Generalised free energy and active inference. Biological Cybernetics, 113(5–6), 495513.Google Scholar
Parr, T., Markovic, D., Kiebel, S. J., & Friston, K. J. (2019). Neuronal message passing using Mean-field, Bethe, and Marginal approximations. Scientific Reports, 9(1), 1889. Retrieved from https://doi.org/10.1038/s41598-018-38246-3.Google Scholar
Pathak, D., Agrawal, P., Efros, A. A., & Darrell, T. (2017). Curiosity-driven exploration by self-supervised prediction. Paper presented at the International Conference on Machine Learning.Google Scholar
Pukelsheim, F. (2006). Optimal design of experiments: SIAM.Google Scholar
Russo, D., Van Roy, B., Kazerouni, A., Osband, I., & Wen, Z. (2017). A tutorial on Thompson sampling. arXiv preprint arXiv:1707.02038.Google Scholar
Sacks, J., Welch, W. J., Mitchell, T. J., & Wynn, H. P. (1989). Design and analysis of computer experiments. Statistical Science, 4(4), 409423.Google Scholar
Sajid, N., Ball, P. J., Parr, T., & Friston, K. J. (2021). Active inference: Demystified and compared. Neural Computation, 33(3), 674712.Google Scholar
Savage, L. J. (1972). The foundations of statistics: Courier Corporation.Google Scholar
Schmidhuber, J. (1991a). Curious model-building control systems. In Proc. International Joint Conference on Neural Networks, Singapore. IEEE, 2, 14581463. https://mediatum.ub.tum.de/doc/814953/file.pdf.Google Scholar
Schmidhuber, J. (1991b). A possibility for implementing curiosity and boredom in model-building neural controllers. Paper presented at the Proc. of the international conference on simulation of adaptive behavior: From animals to animats. https://mediatum.ub.tum.de/doc/814958/file.pdfGoogle Scholar
Schmidhuber, J. (2006). Developmental robotics, optimal artificial curiosity, creativity, music, and the fine arts. Connection Science, 18(2), 173187. https://doi.org/10.1080/09540090600768658.Google Scholar
Schulz, E., & Gershman, S. J. (2019). The algorithmic architecture of exploration in the human brain. Current Opinion in Neurobiology, 55, 714.Google Scholar
Schwartenbeck, P., Passecker, J., Hauser, T. U., FitzGerald, T. H., Kronbichler, M., & Friston, K. J. (2019). Computational mechanisms of curiosity and goal-directed exploration. eLife, 8, e.41707. https://doi.org/10.7554/eLife.41703.Google Scholar
Shewry, M. C., & Wynn, H. P. (1987). Maximum entropy sampling. Journal of Applied Statistics, 14(2), 165170.Google Scholar
Stone, M. (1959). Application of a measure of information to the design and comparison of regression experiments. The Annals of Mathematical Statistics, 30(1), 5570.Google Scholar
Sun, Y., Gomez, F., & Schmidhuber, J. (2011). Planning to be surprised: Optimal Bayesian exploration in dynamic environments. In Schmidhuber, J., Thórisson, K. R., & Looks, M. (Eds.), Artificial General Intelligence: 4th International Conference, AGI 2011, Mountain View, CA, USA, August 3–6,2011. Proceedings (pp. 4151). Springer.Google Scholar
Sutton, R. S., & Barto, A. G. (1998). Introduction to Reinforcement Learning: MIT Press.Google Scholar
Todorov, E. (2008). General duality between optimal control and estimation. In 2008 47th IEEE Conference on Decision and Control (pp. 42864292). IEEE.Google Scholar
Tschantz, A., Seth, A. K., & Buckley, C. L. (2020). Learning action-oriented models through active inference. PLoS Computational Biology, 16(4), e1007805. Retrieved from https://doi.org/10.1371/journal.pcbi.1007805.Google Scholar
van den Broek, J. L., Wiegerinck, W. A. J. J., & Kappen, H. J. (2010). Risk-sensitive path integral control. UAI, 6, 18.Google Scholar
van der Himst, O., & Lanillos, P. (2020). Deep Active Inference for Partially Observable MDPs. In International Workshop on Active Inference (pp. 6171). Springer.Google Scholar
Vasconcelos, M., Monteiro, T., & Kacelnik, A. (2015). Irrational choice and the value of information. Scientific Reports, 5(1), 13874. Retrieved from https://doi.org/10.1038/srep13874.Google Scholar
Vértes, E., & Sahani, M. (2018). Flexible and accurate inference and learning for deep generative models. arXiv preprint arXiv:1805.11051.Google Scholar
Von Neumann, J., & Morgenstern, O. (1944). Theory of games and economic behavior. Princeton University Press.Google Scholar
Wilson, R. C., Geana, A., White, J. M., Ludvig, E. A., & Cohen, J. D. (2014). Humans use directed and random exploration to solve the explore–exploit dilemma. Journal of Experimental Psychology: General, 143(6), 2074.Google Scholar
Zintgraf, L., Shiarlis, K., Igl, M., Schulze, S., Gal, Y., Hofmann, K., & Whiteson, S. (2019). VariBAD: A very good method for Bayes-adaptive deep RL via meta-learning. arXiv preprint arXiv:1910.08348.Google Scholar

References

Auer, P. (2002). Using confidence bounds for exploitation-exploration trade-offs. Journal of Machine Learning Research, 3(Nov.), 397–422.
Bellman, R. (1957). A Markovian decision process. Journal of Mathematics and Mechanics, 6(5), 679–684.
Binz, M., & Endres, D. (2019). Where do heuristics come from? In Proceedings of the 41st Annual Conference of the Cognitive Science Society (pp. 1402–1408). Montreal, QB: Cognitive Science Society.
Borji, A., & Itti, L. (2013). Bayesian optimization explains human active search. Advances in Neural Information Processing Systems, 26, 55–63.
Bramley, N. R., Dayan, P., Griffiths, T. L., & Lagnado, D. A. (2017). Formalizing Neurath’s ship: Approximate algorithms for online causal learning. Psychological Review, 124(3), 301.
Brändle, F., Wu, C. M., & Schulz, E. (2020). What are we curious about? Trends in Cognitive Sciences, 24(9), 685–687.
Chevalier-Boisvert, M., Willems, L., & Pal, S. (2018). Minimalistic gridworld environment for openai gym. https://github.com/maximecb/gym-minigrid. GitHub.
Cohen, J. D., McClure, S. M., & Yu, A. J. (2007). Should I stay or should I go? How the human brain manages the trade-off between exploitation and exploration. Philosophical Transactions of the Royal Society B: Biological Sciences, 362(1481), 933–942.
Colas, C., Karch, T., Sigaud, O., & Oudeyer, P.-Y. (2020). Intrinsically motivated goal-conditioned reinforcement learning: a short survey. arXiv preprint arXiv:2012.09830.
Daw, N. D., O’Doherty, J. P., Dayan, P., Seymour, B., & Dolan, R. J. (2006). Cortical substrates for exploratory decisions in humans. Nature, 441(7095), 876–879.
Findling, C., Skvortsova, V., Dromnelle, R., Palminteri, S., & Wyart, V. (2019). Computational noise in reward-guided learning drives behavioral variability in volatile environments. Nature Neuroscience, 22(12), 2066–2077.
Frank, M. J., Doll, B. B., Oas-Terpstra, J., & Moreno, F. (2009). Prefrontal and striatal dopaminergic genes predict individual differences in exploration and exploitation. Nature Neuroscience, 12(8), 1062.
Geana, A., Wilson, R. C., Daw, N., & Cohen, J. D. (2016). Boredom, information-seeking and exploration. In A. Papafragou, D. Grodner, D. Mirman, & J. C. Trueswell (Eds.), Proceedings of the 38th Annual Conference of the Cognitive Science Society (pp. 1751–1756). Austin, TX: Cognitive Science Society.
Gershman, S. J. (2018). Deconstructing the human algorithms for exploration. Cognition, 173, 34–42.
Griffiths, T. (2014, 12). Manifesto for a new (computational) cognitive revolution. Cognition, 135. https://doi.org/10.1016/j.cognition.2014.11.026.
Hills, T. T., Todd, P. M., Lazer, D., Redish, A. D., Couzin, I. D., Group, C. S. R., et al. (2015). Exploration versus exploitation in space, mind, and society. Trends in Cognitive Sciences, 19(1), 46–54.
Houthooft, R., Chen, X., Duan, Y., Schulman, J., De Turck, F., & Abbeel, P. (2016). Vime: Variational information maximizing exploration. Advances in Neural Information Processing Systems, 29, 1109–1117.
Jaksch, T., Ortner, R., & Auer, P. (2010). Near-optimal regret bounds for reinforcement learning. Journal of Machine Learning Research, 11(4), 1563–1600.
Jinnai, Y., Park, J. W., Abel, D., & Konidaris, G. (2019). Discovering options for exploration by minimizing cover time. arXiv preprint arXiv:1903.00606.
Kaplan, F., & Oudeyer, P.-Y. (2004). Maximizing learning progress: an internal reward system for development. In Iida, F., Pfeifer, R., Steels, L., & Kuniyoshi, Y. (Eds.), Embodied artificial intelligence (pp. 259–270). Springer. https://doi.org/10.1007/978-3-540-27833-7_19.
Kidd, C., Piantadosi, S. T., & Aslin, R. N. (2012). The Goldilocks effect: Human infants allocate attention to visual sequences that are neither too simple nor too complex. PLoS One, 7(5), e36399.
Klyubin, A. S., Polani, D., & Nehaniv, C. L. (2005). All else being equal be empowered. In Capcarrère, M. S., Freitas, A. A., Bentley, P. J., Johnson, C. G., & Timmis, J. (Eds.), Advances in Artificial Life. ECAL 2005. Lecture Notes in Computer Science, vol. 3630. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11553090_75.
Köhler, W. (1925). The mentality of apes (Vol. 74). Paul, K., Trench, Trubner & Company, Limited.
Krebs, J. R., Kacelnik, A., & Taylor, P. (1978). Test of optimal sampling by foraging great tits. Nature, 275(5675), 27–31.
Leibfried, F., Pascual-Díaz, S., & Grau-Moya, J. (2019). A unified Bellman optimality principle combining reward maximization and empowerment. Advances in Neural Information Processing Systems, 32, 7869–7880.
Lopes, M., Lang, T., Toussaint, M., & Oudeyer, P.-Y. (2012). Exploration in model-based reinforcement learning by empirically estimating learning progress. Advances in Neural Information Processing Systems, 25, 206–214.
Matusch, B., Ba, J., & Hafner, D. (2020). Evaluating agents without rewards. arXiv preprint arXiv:2012.11538.
Mehlhorn, K., Newell, B. R., Todd, P. M., Lee, M. D., Morgan, K., Braithwaite, V. A., … Gonzalez, C. (2015). Unpacking the exploration–exploitation tradeoff: A synthesis of human and animal literatures. Decision, 2(3), 191.
Mohamed, S., & Rezende, D. J. (2015). Variational information maximisation for intrinsically motivated reinforcement learning. In Advances in neural information processing systems (pp. 2125–2133).
Nair, A., Pong, V., Dalal, M., Bahl, S., Lin, S., & Levine, S. (2018). Visual reinforcement learning with imagined goals. arXiv preprint arXiv:1807.04742.
Osband, I., Russo, D., & Van Roy, B. (2013). (More) efficient reinforcement learning via posterior sampling. In Advances in neural information processing systems (pp. 3003–3011).
Oudeyer, P.-Y., & Kaplan, F. (2009). What is intrinsic motivation? A typology of computational approaches. Frontiers in Neurorobotics, 1, 6.
Oudeyer, P.-Y., Kaplan, F., & Hafner, V. V. (2007). Intrinsic motivation systems for autonomous mental development. IEEE Transactions on Evolutionary Computation, 11(2), 265–286.
Pong, V., Gu, S., Dalal, M., & Levine, S. (2018). Temporal difference models: Model-free deep RL for model-based control. arXiv preprint arXiv:1802.09081.
Rich, A. S., & Gureckis, T. M. (2018). Exploratory choice reflects the future value of information. Decision, 5(3), 177.
Salge, C., Glackin, C., & Polani, D. (2014). Empowerment–an introduction. In Prokopenko, M. (Ed.), Guided self-organization: Inception (pp. 67–114). Springer. https://doi.org/10.1007/978-3-642-53734-9_4.
Sanborn, A. N., & Chater, N. (2016). Bayesian brains without probabilities. Trends in Cognitive Sciences, 20(12), 883–893.
Schaul, T., Horgan, D., Gregor, K., & Silver, D. (2015). Universal value function approximators. In International conference on machine learning (pp. 1312–1320). https://proceedings.mlr.press/v37/schaul15.html.
Schmidhuber, J. (1991). Curious model-building control systems. In Proc. international joint conference on neural networks (pp. 1458–1463). https://doi.org/10.1109/IJCNN.1991.170605.
Schmidhuber, J. (2010). Formal theory of creativity, fun, and intrinsic motivation (1990–2010). IEEE Transactions on Autonomous Mental Development, 2(3), 230–247.
Schulz, E., Bhui, R., Love, B. C., Brier, B., Todd, M. T., & Gershman, S. J. (2019). Structured, uncertainty-driven exploration in real-world consumer choice. Proceedings of the National Academy of Sciences, 116(28), 13903–13908. https://doi.org/10.1073/pnas.1821028116.
Schulz, E., & Gershman, S. J. (2019). The algorithmic architecture of exploration in the human brain. Current Opinion in Neurobiology, 55, 7–14.
Speekenbrink, M., & Konstantinidis, E. (2015). Uncertainty and exploration in a restless bandit problem. Topics in Cognitive Science, 7(2), 351–367.
Stafford, T., & Dewar, M. (2014). Tracing the trajectory of skill learning with a very large sample of online game players. Psychological Science, 25(2), 511–518. https://doi.org/10.1177/0956797613511466.
Steyvers, M., Lee, M. D., & Wagenmakers, E.-J. (2009). A Bayesian analysis of human decision-making on bandit problems. Journal of Mathematical Psychology, 53(3), 168–179.
Stojić, H., Analytis, P. P., & Speekenbrink, M. (2015). Human behavior in contextual multi-armed bandit problems. In Noelle, D. C., et al. (Eds.), Proceedings of the 37th Annual Meeting of the Cognitive Science Society (pp. 2290–2295). Cognitive Science Society.
Stojić, H., Schulz, E., Analytis, P. P., & Speekenbrink, M. (2020). It’s new, but is it good? How generalization and uncertainty guide the exploration of novel options. Journal of Experimental Psychology: General, 149(10), 1878.
Strehl, A. L., & Littman, M. L. (2008). An analysis of model-based interval estimation for Markov decision processes. Journal of Computer and System Sciences, 74(8), 1309–1331.
Strens, M. (2000). A Bayesian framework for reinforcement learning. In Proceedings of the Seventeenth International Conference on Machine Learning (ICML-2000), Stanford University, California, June 29–July 2, 2000 (Vol. 2000, pp. 943–950).
Sun, Y., Gomez, F., & Schmidhuber, J. (2011). Planning to be surprised: Optimal Bayesian exploration in dynamic environments. In International conference on artificial general intelligence (pp. 41–51).
Sutton, R. S., & Barto, A. G. (2018). Reinforcement learning: An introduction. MIT Press.
Sutton, R. S., Modayil, J., Delp, M., Degris, T., Pilarski, P. M., White, A., & Precup, D. (2011). Horde: A scalable real-time architecture for learning knowledge from unsupervised sensorimotor interaction. In Proc. of 10th Int. Conf. on Autonomous Agents and Multiagent Systems (AAMAS 2011), May, 2–6, 2011, Taipei, Taiwan (pp. 761–768).
Thompson, W. R. (1933). On the likelihood that one unknown probability exceeds another in view of the evidence of two samples. Biometrika, 25(3/4), 285–294.
Whittle, P. (1980). Multi-armed bandits and the Gittins Index. Journal of the Royal Statistical Society: Series B (Methodological), 42(2), 143–149.
Wilson, R. C., Bonawitz, E., Costa, V. D., & Ebitz, R. B. (2021). Balancing exploration and exploitation with information and randomization. Current Opinion in Behavioral Sciences, 38, 49–56.
Wilson, R. C., Geana, A., White, J. M., Ludvig, E. A., & Cohen, J. D. (2014). Humans use directed and random exploration to solve the explore–exploit dilemma. Journal of Experimental Psychology: General, 143(6), 2074.
Wilson, R. C., Shenhav, A., Straccia, M., & Cohen, J. D. (2019). The eighty five percent rule for optimal learning. Nature Communications, 10(1), 1–9.
Wimmer, G. E., Daw, N. D., & Shohamy, D. (2012). Generalization of value in reinforcement learning by humans. European Journal of Neuroscience, 35(7), 1092–1104.
Wu, C. M., Schulz, E., Garvert, M. M., Meder, B., & Schuck, N. W. (2020). Similarities and differences in spatial and non-spatial cognitive maps. PLoS Computational Biology, 16(9), e1008149.
Wu, C. M., Schulz, E., Speekenbrink, M., Nelson, J. D., & Meder, B. (2017). Mapping the unknown: The spatially correlated multi-armed bandit. bioRxiv, 106286.
Wu, C. M., Schulz, E., Speekenbrink, M., Nelson, J. D., & Meder, B. (2018). Generalization guides human exploration in vast decision spaces. Nature Human Behaviour, 2(12), 915–924.
Zhang, S., & Yu, A. J. (2013). Forgetful Bayes and myopic planning: Human learning and decision-making in a bandit setting. In NIPS (pp. 2607–2615). https://proceedings.neurips.cc/paper/2013/file/6c14da109e294d1e8155be8aa4b1ce8e-Paper.pdf.
Zheng, Z., Oh, J., Hessel, M., Xu, Z., Kroiss, M., Van Hasselt, H., … Singh, S. (2020, 13–18 Jul). What can learned intrinsic rewards capture? In Daumé III, H., & Singh, A. (Eds.), Proceedings of the 37th international conference on machine learning (Vol. 119, pp. 11436–11446). PMLR.
Zhu, Y., Mottaghi, R., Kolve, E., Lim, J. J., Gupta, A., Fei-Fei, L., & Farhadi, A. (2017). Target-driven visual navigation in indoor scenes using deep reinforcement learning. In 2017 IEEE international conference on robotics and automation (ICRA) (pp. 3357–3364), Singapore.

References

Apperly, I. (2010). Mindreaders: The cognitive basis of “theory of mind.” Psychology Press.
Atkisson, C., O’Brien, M. J., & Mesoudi, A. (2012). Adult learners in a novel environment use prestige-biased social learning. Evolutionary Psychology: An International Journal of Evolutionary Approaches to Psychology and Behavior, 10(3), 519–537.
Baker, C. L., Jara-Ettinger, J., Saxe, R., & Tenenbaum, J. B. (2017). Rational quantitative attribution of beliefs, desires and percepts in human mentalizing. Nature Human Behaviour, 1(4), 1–10.
Bandura, A. (1962). Social learning through imitation. Nebraska Symposium on Motivation, 330, 211–274.
Bhui, R., Lai, L., & Gershman, S. J. (2021). Resource-rational decision making. Current Opinion in Behavioral Sciences, 41, 15–21.
Botvinick, M., & Weinstein, A. (2014). Model-based hierarchical reinforcement learning and human action control. Philosophical Transactions of the Royal Society of London. Series B, Biological Sciences, 369(1655). https://doi.org/10.1098/rstb.2013.0480.
Boyd, R., & Richerson, P. J. (1988). Culture and the Evolutionary Process. University of Chicago Press.
Catmur, C., Walsh, V., & Heyes, C. (2009). Associative sequence learning: The role of experience in the development of imitation and the mirror system. Philosophical Transactions of the Royal Society of London. Series B, Biological Sciences, 364(1528), 2369–2380.
Charpentier, C. J., Iigaya, K., & O’Doherty, J. P. (2020). A neuro-computational account of arbitration between choice imitation and goal emulation during human observational learning. Neuron, 106(4), 687–699.e7.
Cogliati Dezza, I., Cleeremans, A., & Alexander, W. (2019). Should we control? The interplay between cognitive control and information integration in the resolution of the exploration-exploitation dilemma. Journal of Experimental Psychology. General, 148(6), 977–993.
Collette, S., Pauli, W. M., Bossaerts, P., & O’Doherty, J. (2017). Neural computations underlying inverse reinforcement learning in the human brain. eLife, 6. https://doi.org/10.7554/eLife.29718.
Cushman, F. (2020). Rationalization is rational. The Behavioral and Brain Sciences, 43, e28.
Cushman, F., & Morris, A. (2015). Habitual control of goal selection in humans. Proceedings of the National Academy of Sciences of the United States of America, 112(45), 13817–13822.
Dasgupta, I., & Gershman, S. J. (2021). Memory as a computational resource. Trends in Cognitive Sciences, 25(3), 240–251.
Dasgupta, I., Schulz, E., Goodman, N. D., & Gershman, S. J. (2018). Remembrance of inferences past: Amortization in human hypothesis generation. Cognition, 178, 67–81.
Daw, N. D., Niv, Y., & Dayan, P. (2005). Uncertainty-based competition between prefrontal and dorsolateral striatal systems for behavioral control. Nature Neuroscience, 8(12), 1704–1711.
Derex, M., Bonnefon, J.-F., Boyd, R., & Mesoudi, A. (2019). Causal understanding is not necessary for the improvement of culturally evolving technology. Nature Human Behaviour, 3(5), 446–452.
Dezfouli, A., & Balleine, B. W. (2013). Actions, action sequences and habits: Evidence that goal-directed and habitual action control are hierarchically organized. PLoS Computational Biology, 9(12), e1003364.
Foster, D. J. (2017). Replay comes of age. Annual Review of Neuroscience, 40, 581–602.
Gergely, G., & Csibra, G. (2003). Teleological reasoning in infancy: The naïve theory of rational action. In Trends in Cognitive Sciences (Vol. 7, Issue 7, pp. 287–292). https://doi.org/10.1016/s1364-6613(03)00128-1.
Gershman, S. J. (2020). Origin of perseveration in the trade-off between reward and complexity. Cognition, 204, 104394.
Gershman, S. J., Horvitz, E. J., & Tenenbaum, J. B. (2015). Computational rationality: A converging paradigm for intelligence in brains, minds, and machines. Science, 349(6245), 273–278.
Gershman, S. J., Markman, A. B., & Otto, A. R. (2014). Retrospective revaluation in sequential decision making: A tale of two systems. Journal of Experimental Psychology. General, 143(1), 182–194.
Gigerenzer, G., & Gaissmaier, W. (2011). Heuristic decision making. Annual Review of Psychology, 62, 451–482.
Gweon, H. (2021). Inferential Social Learning: How humans learn from others and help others learn. https://doi.org/10.31234/osf.io/8n34t.
Hayden, B. Y., & Niv, Y. (2021). The case against economic values in the orbitofrontal cortex (or anywhere else in the brain). Behavioral Neuroscience, 135(2), 192–201.
Henrich, J. (2017). The Secret of Our Success: How Culture Is Driving Human Evolution, Domesticating Our Species, and Making Us Smarter. Princeton University Press.
Henrich, J., & Gil-White, F. J. (2001). The evolution of prestige: Freely conferred deference as a mechanism for enhancing the benefits of cultural transmission. Evolution and Human Behavior: Official Journal of the Human Behavior and Evolution Society, 22(3), 165–196.
Herrmann, E., Call, J., Hernàndez-Lloreda, M. V., Hare, B., & Tomasello, M. (2007). Humans have evolved specialized skills of social cognition: the cultural intelligence hypothesis. Science, 317(5843), 1360–1366.
Heyes, C. (2001). Causes and consequences of imitation. Trends in Cognitive Sciences, 5(6), 253–261.
Heyes, C. (2002). Transformational and associative theories of imitation. Imitation in Animals and Artifacts, 607, 501–523.
Heyes, C. (2018). Cognitive Gadgets: The Cultural Evolution of Thinking. Harvard University Press.
Ho, M. K., MacGlashan, J., Littman, M. L., & Cushman, F. (2017). Social is special: A normative framework for teaching with and learning from evaluative feedback. Cognition, 167, 91–106.
Hoppitt, W., & Laland, K. N. (2013). Social Learning: An Introduction to Mechanisms, Methods, and Models. Princeton University Press.
Horner, V., & Whiten, A. (2005). Causal knowledge and imitation/emulation switching in chimpanzees (Pan troglodytes) and children (Homo sapiens). Animal Cognition, 8(3), 164–181.
Huys, Q. J. M., Lally, N., Faulkner, P., Eshel, N., Seifritz, E., Gershman, S. J., Dayan, P., & Roiser, J. P. (2015). Interplay of approximate planning strategies. Proceedings of the National Academy of Sciences of the United States of America, 112(10), 3098–3103.
Jara-Ettinger, J. (2019). Theory of mind as inverse reinforcement learning. Current Opinion in Behavioral Sciences, 29, 105–110.
Jara-Ettinger, J., Gweon, H., Schulz, L. E., & Tenenbaum, J. B. (2016). The Naïve Utility Calculus: Computational Principles Underlying Commonsense Psychology. Trends in Cognitive Sciences, 20(8), 589–604.
Jara-Ettinger, J., Gweon, H., Tenenbaum, J. B., & Schulz, L. E. (2015). Children’s understanding of the costs and rewards underlying rational action. Cognition, 140, 14–23.
Jern, A., Lucas, C. G., & Kemp, C. (2017). People learn other people’s preferences through inverse decision-making. Cognition, 168, 46–64.
Jiménez, Á. V., & Mesoudi, A. (2019). Prestige-biased social learning: Current evidence and outstanding questions. Palgrave Communications, 5(1), 20.
Keramati, M., Smittenaar, P., Dolan, R. J., & Dayan, P. (2016). Adaptive integration of habits into depth-limited planning defines a habitual-goal-directed spectrum. Proceedings of the National Academy of Sciences of the United States of America, 113(45), 12868–12873.
Kool, W., Cushman, F. A., & Gershman, S. J. (2018). Competition and cooperation between multiple reinforcement learning systems. In Morris, R., Bornstein, A., & Shenhav, A. (Eds.), Goal-directed decision making (pp. 153–178). Academic Press.
Kool, W., Gershman, S. J., & Cushman, F. A. (2017). Cost-benefit arbitration between multiple reinforcement-learning systems. Psychological Science, 28(9), 1321–1333.
Kool, W., Gershman, S. J., & Cushman, F. A. (2018). Planning complexity registers as a cost in metacontrol. Journal of Cognitive Neuroscience, 30(10), 1391–1404.
Legare, C. H., & Nielsen, M. (2015). Imitation and innovation: The dual engines of cultural learning. Trends in Cognitive Sciences, 19(11), 688–699.
Lieder, F., & Griffiths, T. L. (2020). Resource-rational analysis: Understanding human cognition as the optimal use of limited computational resources. In Behavioral and Brain Sciences (Vol. 43). https://doi.org/10.1017/s0140525x1900061x.
Liu, S., Brooks, N. B., & Spelke, E. S. (2019). Origins of the concepts cause, cost, and goal in prereaching infants. Proceedings of the National Academy of Sciences of the United States of America, 116(36), 17747–17752.
Lyons, D. E., Young, A. G., & Keil, F. C. (2007). The hidden structure of overimitation. Proceedings of the National Academy of Sciences of the United States of America, 104(50), 19751–19756.
Maisto, D., Friston, K., & Pezzulo, G. (2019). Caching mechanisms for habit formation in active inference. Neurocomputing, 359, 298–314.
McGuigan, N., Whiten, A., Flynn, E., & Horner, V. (2007). Imitation of causally opaque versus causally transparent tool use by 3- and 5-year-old children. Cognitive Development, 22(3), 353–364.
Miller, K. J., Botvinick, M. M., & Brody, C. D. (2017). Dorsal hippocampus contributes to model-based planning. Nature Neuroscience. https://doi.org/10.1101/096594.
Miller, K. J., Shenhav, A., & Ludvig, E. A. (2019). Habits without values. Psychological Review, 126(2), 292–311.
Miller, N. E., & Dollard, J. (1941). Social Learning and Imitation (Vol. 55). Yale University Press.
Momennejad, I., Otto, A. R., Daw, N. D., & Norman, K. A. (2018). Offline replay supports planning in human reinforcement learning. eLife, 7. https://doi.org/10.7554/eLife.32548.
Momennejad, I., Russek, E. M., Cheong, J. H., Botvinick, M. M., Daw, N. D., & Gershman, S. J. (2017). The successor representation in human reinforcement learning. Nature Human Behaviour, 1(9), 680–692.
Morin, O. (2016). How Traditions Live and Die. Oxford University Press.
Morris, A., & Cushman, F. (2018). A common framework for theories of norm compliance. Social Philosophy & Policy, 35(1), 101–127.
Najar, A., Bonnet, E., Bahrami, B., & Palminteri, S. (2020). The actions of others act as a pseudo-reward to drive imitation in the context of social reinforcement learning. PLoS Biology, 18(12), e3001028.
O’Donnell, T. J. (2015). Productivity and Reuse in Language: A Theory of Linguistic Computation and Storage. MIT Press.
Otto, A. R., Gershman, S. J., Markman, A. B., & Daw, N. D. (2013). The curse of planning: Dissecting multiple reinforcement-learning systems by taxing the central executive. Psychological Science, 24(5), 751–761.
Otto, A. R., Raio, C. M., Chiang, A., Phelps, E. A., & Daw, N. D. (2013). Working-memory capacity protects model-based learning from stress. Proceedings of the National Academy of Sciences of the United States of America, 110(52), 20941–20946.
Rendell, L., Boyd, R., Cownden, D., Enquist, M., Eriksson, K., Feldman, M. W., … & Laland, K. N. (2010). Why copy others? Insights from the social learning strategies tournament. Science, 328(5975), 208–213.
Rozenblit, L., & Keil, F. (2002). The misunderstood limits of folk science: An illusion of explanatory depth. Cognitive Science, 26(5), 521–562.
Russek, E. M., Momennejad, I., Botvinick, M. M., Gershman, S. J., & Daw, N. D. (2017). Predictive representations can link model-based reinforcement learning to model-free mechanisms. PLoS Computational Biology, 13(9), e1005768.
Scott-Phillips, T. C. (2017). A (simple) experimental demonstration that cultural evolution is not replicative, but reconstructive – and an explanation of why this difference matters. Journal of Cognition and Culture, 17(1–2), 1–11.
Shafto, P., Goodman, N. D., & Frank, M. C. (2012). Learning from others: The consequences of psychological reasoning for human learning. Perspectives on Psychological Science: A Journal of the Association for Psychological Science, 7(4), 341–351.
Shenhav, A., Botvinick, M. M., & Cohen, J. D. (2013). The expected value of control: An integrative theory of anterior cingulate cortex function. Neuron, 79(2), 217–240.
Skinner, B. F. (1950). Are theories of learning necessary? Psychological Review, 57(4), 193–216.
Solway, A., & Botvinick, M. M. (2012). Goal-directed decision making as probabilistic inference: A computational framework and potential neural correlates. Psychological Review, 119(1), 120–154.
Solway, A., & Botvinick, M. M. (2015). Evidence integration in model-based tree search. Proceedings of the National Academy of Sciences of the United States of America, 112(37), 11708–11713.
Solway, A., Diuk, C., Córdova, N., Yee, D., Barto, A. G., Niv, Y., & Botvinick, M. M. (2014). Optimal behavioral hierarchy. PLoS Computational Biology, 10(8), e1003779.
Sperber, D. (2006). Why a deep understanding of cultural evolution is incompatible with shallow psychology. In Enfield, N. J., & Levinson, S. C. (Eds.), Roots of human sociality (pp. 431–449). Routledge.
Strachan, J., Curioni, A., Constable, M., Knoblich, G., & Charbonneau, M. (2020). A methodology for distinguishing copying and reconstruction in cultural transmission episodes. In Denison, S., Mack, M., Xu, Y., Yang, A., & Armstrong, C. B. (Eds.), Proceedings of the 42nd Annual Conference of the Cognitive Science Society. https://researchportal.northumbria.ac.uk/ws/files/32896647/0831.pdf.
Sutton, R. S., & Barto, A. G. (2018). Reinforcement Learning, second edition: An Introduction. MIT Press.
Tennie, C., Call, J., & Tomasello, M. (2009). Ratcheting up the ratchet: On the evolution of cumulative culture. Philosophical Transactions of the Royal Society of London. Series B, Biological Sciences, 364(1528), 2405–2415.
Thorndike, E. L. (1932). The fundamentals of learning. https://psycnet.apa.org/record/2006-04535-000.
Tolman, E. C. (1948). Cognitive maps in rats and men. Psychological Review (Vol. 55, Issue 4, pp. 189–208). https://doi.org/10.1037/h0061626.
Tomasello, M. (1996). Do apes ape. In Heyes, C. M., & Galef, B. G., Jr. (Eds.), Social Learning in Animals: The Roots of Culture (pp. 319–346). Academic Press. https://doi.org/10.1016/B978-012273965-1/50016-9.
Tomasello, M., Carpenter, M., Call, J., Behne, T., & Moll, H. (2005). Understanding and sharing intentions: The origins of cultural cognition. The Behavioral and Brain Sciences, 28(5), 675–691; discussion 691–735.
Tomasello, M., Davis-Dasilva, M., Camak, L., & Bard, K. (1987). Observational learning of tool-use by young chimpanzees. Human Evolution, 2(2), 175–183.
Vélez, N., & Gweon, H. (2021). Learning from other minds: An optimistic critique of reinforcement learning models of social learning. Current Opinion in Behavioral Sciences, 38, 110–115.
Vikbladh, O. M., Meager, M. R., King, J., Blackmon, K., Devinsky, O., Shohamy, D., Burgess, N., & Daw, N. D. (2019). Hippocampal contributions to model-based planning and spatial memory. Neuron, 102(3), 683–693.e4.
Whiten, A., & Ham, R. (1992). Kingdom: Reappraisal of a century of research. Advances in the Study of Behavior, 21, 239.
Wu, C. M., Schulz, E., Gerbaulet, K., Pleskac, T. J., & Speekenbrink, M. (2021). Time to explore: Adaptation of exploration under time pressure. PsyArXiv. https://doi.org/10.31234/osf.io/dsw7q.
Zaki, J., Schirmer, J., & Mitchell, J. P. (2011). Social influence modulates the neural computation of value. Psychological Science, 22(7), 894–900.
