Binz et al. craft a comprehensive outline for advancing meta-learning (MetaL) on the basis of several arguments concerning the tractability of optimal learning algorithms, manipulation of complexity, and integration into the rational aspects of cognition, all seen as basic requirements for a domain-general model of cognition. Architectural features include an inductive process from experience driven by repetitive interaction with the environment, necessitating (i) an inner loop of “base learning,” and (ii) an outer loop (or MetaL) process through which the system is effectively trained by the environment to improve its inner-loop learning algorithms. A key aspect of the model is its dependence on the relation between the typical duration of a (general, MetaL) problem-solving episode and the typical duration of a (particular, learned) solution.
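The two-loop architecture can be made concrete with a minimal sketch. Everything in it is an illustrative assumption rather than Binz et al.'s method: the toy task (fitting a single scalar target), the choice of a learning rate as the only metaparameter, the hill-climbing outer loop, and the names `base_learning`, `mean_loss`, and `meta_learn` are all hypothetical. The inner loop runs a short episode of gradient descent; the outer loop runs on a slower timescale, changing the metaparameter only after a whole batch of inner-loop episodes has been scored.

```python
import random


def base_learning(target, lr, steps=5):
    """Inner loop: a few gradient steps on the loss (w - target)**2, starting from w = 0."""
    w = 0.0
    for _ in range(steps):
        w -= lr * 2.0 * (w - target)  # gradient of (w - target)**2 is 2 * (w - target)
    return (w - target) ** 2  # loss remaining after the short episode


def mean_loss(targets, lr):
    """Average post-learning loss over a batch of tasks: the outer loop's 'test score'."""
    return sum(base_learning(t, lr) for t in targets) / len(targets)


def meta_learn(targets, lr=0.01, meta_steps=200, seed=0):
    """Outer loop (assumed hill climbing): keep a perturbed lr only if the score improves."""
    rng = random.Random(seed)
    best = mean_loss(targets, lr)
    for _ in range(meta_steps):
        candidate = lr + rng.gauss(0.0, 0.05)
        if candidate <= 0.0:
            continue  # a non-positive learning rate is not a valid metaparameter here
        score = mean_loss(targets, candidate)
        if score < best:
            lr, best = candidate, score
    return lr


# A hypothetical "environment": 20 scalar targets drawn once from a fixed distribution.
task_rng = random.Random(1)
tasks = [task_rng.uniform(-1.0, 1.0) for _ in range(20)]

initial_lr = 0.01
learned_lr = meta_learn(tasks, lr=initial_lr)
print("loss before meta-learning:", mean_loss(tasks, initial_lr))
print("loss after meta-learning: ", mean_loss(tasks, learned_lr))
```

Each outer-loop update here costs a full batch of inner-loop episodes, which is one simple way to see why the outer loop operates on the longer timescale.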
While Binz et al. focus on MetaL as a practical methodology for modeling human cognition, it is also interesting to ask how MetaL, as Binz et al. describe it, fits into the conceptual framework of cognition in general, and how it applies both to organisms other than humans and to artificial (or hybrid) systems operating in task environments very different from the human task environment. From a broad perspective, MetaL is one function of metacognition (e.g., Cox, 2005; Flavell, 1979; Shea & Frith, 2019). Both MetaL and metacognition more generally engage memory and attention as they are neurophysiologically enacted by brain regions including the default mode network (Glahn et al., 2010), as reviewed for the two theories in Wang (2021) and Kuchling, Fields, and Levin (2022), respectively.
When MetaL is viewed as implemented by a metaprocessor that is a proper component of a larger cognitive system, one can ask explicitly about the metaprocessor's task environment and how it relates to the larger system's task environment. MetaL operates in a task environment of learning algorithms and outcomes, or equivalently, a task environment of metaparameters and test scores. How the latter are measured is straightforward for a human modeler employing MetaL as a methodology, but is less straightforward when an explicit system-scale architecture must be specified. The question in this case becomes that of how the object-level components of a system use the feedback received from the external environment to train the metaprocessor. The answer cannot, on pain of infinite regress, be MetaL. The relative inflexibility of object-level components as “trainers” of their associated metaprocessors effectively bakes in some level of non-optimality in any multilayer system.
Binz et al. emphasize that MetaL operates on a longer timescale than object-level learning. Given a task environment that imposes selective pressures with different timescales, natural selection will drive systems toward layered architectures that exhibit MetaL (Kuchling et al., 2022). Indeed, the need for a “learning to learn” capability has long been emphasized in the active-inference literature (e.g., Friston et al., 2016). Active inference under the free-energy principle (FEP) is in an important sense “just physics” (Friston, 2019; Friston et al., 2023; Ramstead et al., 2022); indeed, the FEP itself is just a classical limit of the principle of unitarity, that is, of conservation of information (Fields et al., 2023; Fields, Friston, Glazebrook, & Levin, 2022). One might expect, therefore, that MetaL as defined by Binz et al. is not just useful, but ubiquitous in physical systems with sufficient degrees of freedom. As this is at bottom a question of mathematics, testing it does not require experimental investigation.
What does call out for experimental investigation is the extent to which MetaL can be identified in systems much simpler than humans. Biochemical pathways can be trained, via reinforcement learning, to occupy different regions of their attractor landscapes (Biswas, Manika, Hoel, & Levin, 2021; Biswas, Clawson, & Levin, 2022). Do sufficiently complex biochemical networks that operate on multiple timescales exhibit MetaL? Environmental exploration and learning are ubiquitous throughout phylogeny (Levin, 2022, 2023); is MetaL equally ubiquitous? Learning often amounts to changing the salience distribution over inputs, or in Bayesian terms, adjusting precision assignments to priors. To what extent can we describe the implementation of MetaL by organisms in terms of adjustments of sensitivity/salience landscapes – and hence attractor landscapes – on the various spaces that compose their umwelts?
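The Bayesian reading of salience can be illustrated with the textbook conjugate-Gaussian case. This is an assumption introduced here for illustration, not a model taken from the works cited: a prior with mean \mu_0 and precision \pi_0 is combined with a single observation x of precision \pi_x (all four symbols are hypothetical labels).

```latex
\mu_{\mathrm{post}} = \frac{\pi_0\,\mu_0 + \pi_x\,x}{\pi_0 + \pi_x},
\qquad
\pi_{\mathrm{post}} = \pi_0 + \pi_x
```

Raising \pi_0 relative to \pi_x makes the input x less salient, since the posterior mean then stays closer to \mu_0; lowering it has the opposite effect. On this reading, one candidate signature of MetaL would be a slower process that tunes such precision assignments across episodes.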
As Binz et al. point out, in the absence of a mechanism for concrete mathematical analysis, MetaL forsakes interpretable analytic solutions and hence generates an “explanation problem” (cf. Samek, Montavon, Lapuschkin, Anders, & Müller, 2021). As in the case of deep AI systems more generally, experimental techniques from cognitive psychology may be the most productive approach to this problem for human-like systems (Taylor & Taylor, 2021). Relevant to this is an associated spectrum of ideas, including how problem solving is innately perceptual, how inference is “Bayesian satisficing” rather than optimization (Chater, 2018; Sanborn & Chater, 2016), the relevance of heuristics (Gigerenzer & Gaissmaier, 2011; cf. Fields & Glazebrook, 2020), and how heuristics, biases, and confabulation limit reportable self-knowledge (Fields, Glazebrook, & Levin, 2024). Here again, the possibility of studying MetaL in more tractable experimental systems, in which the implementing architecture can be manipulated biochemically and bioelectrically, may offer a way forward not available with either human subjects or deep neural networks.
Financial support
The authors have received no funding towards this contribution.
Competing interests
None.