
Quo vadis, planning?

Published online by Cambridge University Press:  23 September 2024

Jacques Pesnot-Lerousseau
Affiliation:
Institute for Language, Communication, and the Brain, Aix-Marseille Univ, Marseille, France [email protected] Aix Marseille Univ, Inserm, INS, Inst Neurosci Syst, Marseille, France
Christopher Summerfield*
Affiliation:
Department of Experimental Psychology, University of Oxford, Oxford, UK [email protected] https://humaninformationprocessing.com/
*Corresponding author.

Abstract

Deep meta-learning is the driving force behind advances in contemporary AI research, and a promising theory of flexible cognition in natural intelligence. We agree with Binz et al. that many supposedly “model-based” behaviours may be better explained by meta-learning than by classical models. We argue that this invites us to revisit our neural theories of problem solving and goal-directed planning.

Type
Open Peer Commentary
Copyright
Copyright © The Author(s), 2024. Published by Cambridge University Press

The most impressive feats of natural intelligence are the most unfathomable. New Caledonian crows fashion hooks to retrieve grubs, honey badgers build ladders to escape from enclosures, and humans have worked out how to split the atom (de Waal, 2016). Humans and other animals are capable of remarkable feats of problem solving in open-ended environments, but we lack computational theories of how this might be achieved (Summerfield, 2022). In the target article, Binz et al. introduce neuroscientists to an exciting new tool: deep meta-learning. This computational approach provides an interesting candidate solution for some of nature's most startling and puzzling behaviours.

Across the twentieth century, superlative intelligence was synonymous with a capacity for planning, so early AI researchers believed that if a machine ever vanquished a human at chess, then AI would have been solved. Classical models conceive of the world as a list of states and their transition probabilities; planning requires efficient search, or mental exploration of possible pathways to reach a goal. Neuroscientists still lean heavily on these classical models to understand how rodents and primates solve problems like navigating to a spatial destination, assuming that these rely on "model-based" search or forms of offline rumination (Daw & Dayan, 2014). However, in contemporary machine learning, new explanations of complex sequential behaviours are emerging.
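As a concrete illustration, the classical picture can be sketched in a few lines: the world is an explicit table of transition probabilities, and planning amounts to iterating over imagined transitions until value propagates back from the goal. The toy chain environment and all names below are illustrative, not drawn from any specific model in the literature.

```python
import numpy as np

# Minimal sketch of classical "model-based" planning over an explicit world
# model: transition probabilities T[s, a, s'] and state rewards R[s].
# Planning here is value iteration - repeated mental exploration of the
# transition table until the best route to the rewarded goal is found.
n_states, n_actions, gamma = 4, 2, 0.9

# A simple chain 0-1-2-3: action 0 moves right, action 1 moves left.
T = np.zeros((n_states, n_actions, n_states))
for s in range(n_states):
    T[s, 0, min(s + 1, n_states - 1)] = 1.0
    T[s, 1, max(s - 1, 0)] = 1.0
R = np.array([0.0, 0.0, 0.0, 1.0])   # only the goal state 3 is rewarded

V = np.zeros(n_states)
for _ in range(50):                   # iterate the Bellman backup
    V = R + gamma * np.max(T @ V, axis=1)   # T @ V -> (state, action) values

policy = np.argmax(T @ V, axis=1)     # greedy policy: always move right
```

The point of the sketch is what the classical account presupposes: the agent must already possess the full table T, and all of the computation happens online, at decision time.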

Today – nearly three decades since the first electronic chess grandmaster – AI research is dominated by deep network models, which exploit massive training datasets to learn complex functions mapping inputs onto outputs. In fact, in machine learning, explicitly model-based solutions to open-ended problems have not lived up to their promise. To give one example: In 1997, the chess program Deep Blue defeated the world champion Garry Kasparov using alpha–beta tree search, ushering in an era in which computers played stronger chess than people. In 2017, DeepMind's hybrid network AlphaZero defeated the computer chess champion Stockfish by augmenting its search algorithm (Monte Carlo Tree Search) with a deep neural network that learned from (self-)play to evaluate board positions (Silver et al., 2018). In early 2024, performance comparable to the best human players was achieved using a deep network alone, without search, thanks to computational innovation (transformer networks) and increasing scale (millions of parameters). AI research has thus implied that when big brains are exposed to big data, explicit forms of lookahead play a limited role in their success. The authors encapsulate this view with a quote attributed to 1920s grandmaster José Raúl Capablanca: "I see only one move ahead, but it is always the correct one" (Ruoss et al., 2024).

Deep meta-learning applies powerful function approximation to sequential decision problems, where optimal policies may involve forms of exploration or active hypothesis testing to meet long-term objectives. In the domain of reinforcement learning (RL), deep meta-RL can account for human behaviours on benchmark problems thought to tap into model-based inference or planning, such as the "two-step" decision task, without invoking the need for search. This is because a deep neural network equipped with a stateful activation memory, and meta-trained on a wide range of sequential decision problems, can learn a policy that is intrinsically cognitively flexible. It learns to react on the fly to the twists and turns of novel sequential environments, and thus produce the sorts of behaviours that were previously thought to be possible only with model-based forms of inference. Paradoxically, although "meta-learning" means "learning to learn," inner loop learning can occur in "frozen" networks – those without parameter updates. This offers a plausible model of how recurrent neural systems for memory and control, housed in prefrontal cortex, allow us to solve problems we have never seen before without explicit forms of search (Wang et al., 2018).
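The architectural idea can be sketched as follows. The weights and the toy two-armed bandit below are illustrative stand-ins (here random, where in a real system they would be meta-trained across many tasks so that the hidden-state dynamics implement a learning rule); the point is the control flow: at deployment no parameter is updated, and any adaptation to the current task is carried entirely by the recurrent activation state.

```python
import numpy as np

rng = np.random.default_rng(0)

# Sketch of a deep meta-RL agent at deployment, on a two-armed bandit.
# All weights are frozen: the "inner loop" of learning lives in the
# hidden state h, which integrates past actions and rewards.
H, A = 16, 2                          # hidden units, number of arms
W_h = rng.normal(0, 0.3, (H, H))      # frozen recurrent weights
W_x = rng.normal(0, 0.3, (H, A + 1))  # input: one-hot prev action + prev reward
W_pi = rng.normal(0, 0.3, (A, H))     # frozen policy readout

def step(h, prev_action, prev_reward):
    """One deployment step: no weight changes anywhere - only h updates."""
    x = np.zeros(A + 1)
    x[prev_action] = 1.0
    x[A] = prev_reward
    h = np.tanh(W_h @ h + W_x @ x)    # recurrent state update
    logits = W_pi @ h
    p = np.exp(logits - logits.max())
    p /= p.sum()
    action = int(rng.choice(A, p=p))  # sample an arm from the policy
    return h, action

h = np.zeros(H)
arm_probs = [0.2, 0.8]                # hidden task: arm 1 pays off more often
action, reward = 0, 0.0
for t in range(20):
    h, action = step(h, action, reward)
    reward = float(rng.random() < arm_probs[action])
# W_h, W_x and W_pi never changed; any adaptation to this particular
# bandit is carried entirely by the activation state h.
```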

In AI research today, the most successful deep meta-learning systems are large transformer-based networks that are trained to complete sequences of tokenised natural language (Large Language Models or LLMs). These networks learn semantic and syntactic patterns that allow them to solve a very open-ended problem – constructing a relevant, coherent sentence. Researchers working with LLMs today call deep meta-learning "in-context learning" because, instead of using recurrent memory, transformers are purely feedforward networks that rely on autoregression – past outputs are fed back in as inputs, providing a context on which to condition the generative process. Although transformers do not resemble plausible neural algorithms, their striking success has opened up new questions concerning neural computation. For example, in-context learning proceeds faster when exemplar ordering is structured rather than random, like human learning but unlike traditional in-weight learning (Chan et al., 2022b), and in-context learning may be better suited to rule learning than in-weight learning is (Chan et al., 2022a). Meta-learning thus offers new tools for psychologists and neuroscientists interested in biological learning and memory in natural agents.
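This autoregressive loop can be sketched with a single causal self-attention layer, with random weights standing in for a trained LLM (all sizes and names below are illustrative): each generated token is appended to the context and fed back in, so inference-time "learning" is nothing but conditioning on that growing context.

```python
import numpy as np

rng = np.random.default_rng(1)

# Sketch of autoregressive generation with one causal self-attention layer.
# Weights are random stand-ins for a trained LLM; the point is the control
# flow, not the (meaningless) tokens this toy model produces.
V, D = 8, 16                          # toy vocabulary size, model width
E = rng.normal(0, 0.5, (V, D))        # token embeddings
W_q, W_k, W_v = (rng.normal(0, 0.5, (D, D)) for _ in range(3))
W_out = rng.normal(0, 0.5, (D, V))    # readout to vocabulary logits

def next_token(context):
    """Predict one token from the full context (purely feedforward)."""
    X = E[context]                            # (T, D)
    Q, K, Vl = X @ W_q, X @ W_k, X @ W_v
    scores = Q @ K.T / np.sqrt(D)
    mask = np.triu(np.ones_like(scores), k=1).astype(bool)
    scores[mask] = -np.inf                    # causal mask: no peeking ahead
    attn = np.exp(scores - scores.max(axis=-1, keepdims=True))
    attn /= attn.sum(axis=-1, keepdims=True)
    logits = (attn @ Vl)[-1] @ W_out          # logits at the last position
    return int(np.argmax(logits))             # greedy decoding

context = [3, 1]                      # the prompt; everything else is generated
for _ in range(5):
    context.append(next_token(context))       # output fed back in as input
```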

Binz et al. argue that we should take deep meta-learning seriously as an alternative to Bayesian decision theory. We would go further, arguing that meta-learning is a candidate general theory of flexible cognition. It explains why executive function improves dramatically with experience (Ericsson & Charness, 1994). Unlike Deep Blue, human world chess no. 1 Magnus Carlsen has improved since his first game at the age of five. Purely search-based accounts of flexible cognition are obliged to propose that performance will plateau as soon as the transition function (e.g., the rules of chess) is fully mastered, or else to posit unexplained ways in which search policies deepen or otherwise mutate with practice (Van Opheusden et al., 2023). In a world where states are heterogeneous and noisy, deep meta-learning explains how we can generalise sequential behaviours to novel states. In a world where speed and processing power are at a premium, meta-learning shifts the burden of inference to the training period, allowing for fast and efficient online computation. Meta-learning is a general theory of natural intelligence that is – more than its classical counterpart – fit for the real world.

Undoubtedly, humans and other animals do engage in explicit forms of planning, especially when the stakes are high. But many sequential behaviours that are thought to index this ability may rely more on deep meta-learning than classical planning.

Financial support

This work was supported by the Fondation Pour l'Audition (FPA RD-2021-2; J. P.-L.), the Institute for Language, Communication, and the Brain (ILCB; J. P.-L.), and a Wellcome Trust Discovery Award (227928/Z/23/Z; C. S.).

Competing interests

None.

References

Chan, S. C. Y., Dasgupta, I., Kim, J., Kumaran, D., Lampinen, A. K., & Hill, F. (2022a). Transformers generalize differently from information stored in context vs in weights. https://doi.org/10.48550/ARXIV.2210.05675
Chan, S. C. Y., Santoro, A., Lampinen, A. K., Wang, J. X., Singh, A., Richemond, P. H., … Hill, F. (2022b). Data distributional properties drive emergent in-context learning in transformers. https://doi.org/10.48550/ARXIV.2205.05055
Daw, N. D., & Dayan, P. (2014). The algorithmic anatomy of model-based evaluation. Philosophical Transactions of the Royal Society B, 369, 20130478. https://doi.org/10.1098/rstb.2013.0478
de Waal, F. (2016). Are we smart enough to know how smart animals are? W. W. Norton & Company.
Ericsson, K. A., & Charness, N. (1994). Expert performance: Its structure and acquisition. American Psychologist, 49, 725–747. https://doi.org/10.1037/0003-066X.49.8.725
Ruoss, A., Delétang, G., Medapati, S., Grau-Moya, J., Wenliang, L. K., Catt, E., … Genewein, T. (2024). Grandmaster-level chess without search.
Silver, D., Hubert, T., Schrittwieser, J., Antonoglou, I., Lai, M., Guez, A., … Hassabis, D. (2018). A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play. Science, 362, 1140–1144. https://doi.org/10.1126/science.aar6404
Summerfield, C. (2022). Natural general intelligence: How understanding the brain can help us build AI. Oxford University Press.
Van Opheusden, B., Kuperwajs, I., Galbiati, G., Bnaya, Z., Li, Y., & Ma, W. J. (2023). Expertise increases planning depth in human gameplay. Nature, 618, 1000–1005. https://doi.org/10.1038/s41586-023-06124-2
Wang, J. X., Kurth-Nelson, Z., Kumaran, D., Tirumala, D., Soyer, H., Leibo, J. Z., … Botvinick, M. (2018). Prefrontal cortex as a meta-reinforcement learning system. Nature Neuroscience, 21, 860–868. https://doi.org/10.1038/s41593-018-0147-8