Binz et al. describe four different meta-learning approaches and focus on the last one – methods for learning arbitrary new tasks without the need for a priori hypotheses about brain or cognitive architectures. They show that this approach can be implemented in recurrent neural networks (RNNs) that are universal approximators (Hornik, Stinchcombe, & White, Reference Hornik, Stinchcombe and White1989), and argue that it can produce Bayesian (near-optimal) learning in an arbitrarily large set of cognitive tasks. While acknowledging the power of the proposed framework for artificial intelligence (AI), we question its usefulness for cognitive science and neuroscience research. We argue that an alternative approach, hyperparameter optimization (first proposed by Doya, Reference Doya2002, and mentioned but not discussed by Binz et al.), is far better suited to this role.
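To make the contrast concrete, the following is a minimal sketch of the memory-based meta-learning recipe Binz et al. advocate, assuming PyTorch; the two-armed bandit task, network sizes, and REINFORCE objective are our illustrative choices, not theirs. The key feature is that task learning is carried out by the RNN's hidden-state dynamics, with the weights frozen at test time.

```python
# Sketch of memory-based meta-learning with an RNN (illustrative, not Binz
# et al.'s implementation): slow weight learning across tasks in the outer
# loop; fast, in-context task learning in the hidden state in the inner loop.
import torch
import torch.nn as nn

class MetaRNN(nn.Module):
    def __init__(self, n_arms=2, hidden=32):
        super().__init__()
        # Input: one-hot previous action + previous reward.
        self.rnn = nn.GRU(input_size=n_arms + 1, hidden_size=hidden, batch_first=True)
        self.policy = nn.Linear(hidden, n_arms)

    def forward(self, prev_action_onehot, prev_reward, h=None):
        x = torch.cat([prev_action_onehot, prev_reward], dim=-1)  # (1, 1, n_arms+1)
        out, h = self.rnn(x, h)
        return self.policy(out), h  # action logits; "learning" lives in h

net = MetaRNN()
opt = torch.optim.Adam(net.parameters(), lr=1e-3)
for episode in range(500):                  # outer loop: slow weight learning
    p = torch.rand(2)                       # sample a new bandit task
    h, a_prev, r_prev = None, torch.zeros(1, 1, 2), torch.zeros(1, 1, 1)
    loss = 0.0
    for t in range(20):                     # inner loop: fast in-context learning
        logits, h = net(a_prev, r_prev, h)
        dist = torch.distributions.Categorical(logits=logits.squeeze())
        a = dist.sample()
        r = torch.bernoulli(p[a])
        loss = loss - dist.log_prob(a) * r  # REINFORCE, no baseline (for brevity)
        a_prev = torch.eye(2)[a].view(1, 1, 2)
        r_prev = r.view(1, 1, 1)
    opt.zero_grad(); loss.backward(); opt.step()
```

Note that every interpretable quantity (the task's reward probabilities, the agent's estimate of them) is implicit in thousands of weights and hidden units; nothing in the trained network names a learning rate, a belief, or a control signal.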
To be valuable for empirical research, a computational framework should generate models that are interpretable in neurocognitive terms and make predictions that can be falsified or confirmed through empirical tests. The internal computations used by the models should be analogous to those of neurocognitive systems (e.g., attention, memory, or valuation; Castelvecchi, Reference Castelvecchi2016), and should predict activity patterns that can be empirically validated. The framework advocated by Binz et al. has neither property: it generates models that are governed by immense numbers of free parameters (up to billions) and are not interpretable in cognitive terms, amounting to a “black box” data-driven approach.
A hyperparameter optimization approach alleviates these concerns by constraining the models it generates to emulate biologically plausible architectures. This allows researchers to formulate and test mechanistic hypotheses that are grounded in the established literature. The reinforcement meta-learner (RML) model is a good illustration of this framework in the context of executive function (Silvetti, Vassena, Abrahamse, & Verguts, Reference Silvetti, Vassena, Abrahamse and Verguts2018).
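As a concrete illustration, here is a minimal sketch of what hyperparameter optimization looks like in practice; the bandit task, parameter grid, and objective are our illustrative choices, not Doya's or the RML's. The cognitive architecture is a fixed, interpretable Q-learning agent, and the meta-level searches only over its two named hyperparameters.

```python
# Sketch of the hyperparameter-optimization alternative (in the spirit of
# Doya, 2002): the architecture is fixed and interpretable, and meta-learning
# tunes only a handful of named hyperparameters.
import numpy as np

rng = np.random.default_rng(0)

def run_agent(alpha, beta, p=(0.8, 0.2), trials=200):
    """Softmax Q-learner on a two-armed bandit; returns total reward."""
    q, total = np.zeros(2), 0.0
    for _ in range(trials):
        probs = np.exp(beta * q) / np.exp(beta * q).sum()  # softmax policy
        a = rng.choice(2, p=probs)
        r = float(rng.random() < p[a])
        q[a] += alpha * (r - q[a])                         # delta-rule update
        total += r
    return total

# Meta-level: search over interpretable hyperparameters (learning rate alpha,
# inverse temperature beta) rather than over millions of opaque weights.
best = max(((a, b) for a in (0.05, 0.1, 0.3, 0.6) for b in (1.0, 3.0, 10.0)),
           key=lambda ab: np.mean([run_agent(*ab) for _ in range(20)]))
print("best (alpha, beta):", best)
```

The crucial point is that the searched space is small and every parameter has a neurocognitive reading (learning rate, decision noise), so a fitted value is itself an interpretable, testable claim.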
Consistent with abundant empirical evidence on biological executive circuits (e.g., Shackman et al., Reference Shackman, Salomons, Slagter, Fox, Winter and Davidson2011; Silvetti, Seurinck, van Bochove, & Verguts, Reference Silvetti, Seurinck, van Bochove and Verguts2013; Varazzani, San-Galli, Gilardeau, & Bouret, Reference Varazzani, San-Galli, Gilardeau and Bouret2015; Yarkoni, Poldrack, Nichols, Van Essen, & Wager, Reference Yarkoni, Poldrack, Nichols, Van Essen and Wager2011), the RML emulates interactions between the medial prefrontal cortex (MPFC) and two catecholamine nuclei – the ventral tegmental area, releasing dopamine (DA), and the locus coeruleus, releasing norepinephrine (NE). The MPFC module monitors reward rates conveyed by DA and, when detecting a “need for control” (e.g., a decrease in the rates), calls for the release of NE and DA. In turn, these neurotransmitters are broadcast to task-specific cognitive modules and enhance their efficiency, thereby restoring performance and reward rates. The MPFC registers a boost of neurotransmitter release as a cost and uses Bayesian and reinforcement-learning (RL) optimization to learn control settings that maximize rewards while minimizing costs. The RML thus uses traditional Bayesian/RL optimization frameworks to simultaneously regulate motor output and internal cognitive computations, thereby modeling both first-order performance and its executive (meta-level) control.
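The control loop just described can be summarized in a schematic sketch (ours, not the published RML implementation; all quantities and functional forms are illustrative): a meta-level "MPFC" module learns by reinforcement how strongly to boost catecholamine release, trading the performance gain of a boost against its cost.

```python
# Schematic sketch of the RML-style control loop (illustrative only): the
# meta-level learns the value of each boost setting from reward minus cost.
import numpy as np

rng = np.random.default_rng(1)
boost_levels = np.array([0.0, 0.5, 1.0])   # candidate NE/DA boost settings
q_boost = np.zeros(len(boost_levels))      # meta-level value of each setting
alpha, cost_weight = 0.1, 0.4              # learning rate; cost of a boost

def task_module(boost, difficulty=0.7):
    """Task-specific module: a boost raises its efficiency, hence reward rate."""
    p_success = min(1.0, (1.0 - difficulty) + 0.6 * boost)
    return float(rng.random() < p_success)

for trial in range(2000):
    b = rng.integers(len(boost_levels)) if rng.random() < 0.1 \
        else int(np.argmax(q_boost))                     # epsilon-greedy control
    reward = task_module(boost_levels[b])
    net_value = reward - cost_weight * boost_levels[b]   # reward minus boost cost
    q_boost[b] += alpha * (net_value - q_boost[b])       # RL update at meta-level

print("learned boost values:", q_boost.round(2))
```

Every variable in such a scheme (reward rate, boost level, cost) maps onto a measurable quantity (behavioral performance, catecholamine release, effort), which is what makes the model's internal states empirically addressable.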
Recent studies have shown that the RML explains empirical findings that have long stumped traditional frameworks, including nonstandard reward modulations in visual areas (Horan, Daddaoua, & Gottlieb, Reference Horan, Daddaoua and Gottlieb2019; Silvetti, Lasaponara, Daddaoua, Horan, & Gottlieb, Reference Silvetti, Lasaponara, Daddaoua, Horan and Gottlieb2023) and curiosity – the intrinsic desire to obtain information in the absence of instrumental rewards (Daddaoua, Lopes, & Gottlieb, Reference Daddaoua, Lopes and Gottlieb2016; Horan et al., Reference Horan, Daddaoua and Gottlieb2019; Silvetti et al., Reference Silvetti, Lasaponara, Daddaoua, Horan and Gottlieb2023). By monitoring the volatility of the environment, the RML provides a meta-learning-based explanation of the empirical finding of volatility-sensitive learning rates (Silvetti et al., Reference Silvetti, Seurinck, van Bochove and Verguts2013, Reference Silvetti, Vassena, Abrahamse and Verguts2018). Moreover, when coupled to modules emulating memory, motor output, decision making, or attention, the RML reproduces a wide array of behavioral and neural results related, respectively, to memory capacity, motor effort, adaptive regulation of learning rates, and instrumental or curiosity-driven information gathering (Silvetti et al., Reference Silvetti, Vassena, Abrahamse and Verguts2018, Reference Silvetti, Lasaponara, Daddaoua, Horan and Gottlieb2023). Thus, despite its biologically constrained architecture, the RML gains considerable flexibility and generalizability because it can control different task-specific cognitive computations.
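The volatility result, in particular, admits a compact illustration. Below is a toy version of the mechanism (not the RML's actual equations): the meta-level tracks the magnitude of recent prediction errors and scales the learning rate up when the environment appears volatile.

```python
# Toy illustration of volatility-sensitive learning rates (illustrative, not
# the RML's equations): volatile environments yield larger learning rates.
import numpy as np

rng = np.random.default_rng(2)
v_est, pe_trace, k = 0.5, 0.0, 0.1     # value estimate, smoothed |PE|, meta rate

for t in range(400):
    p = 0.8 if (t // 100) % 2 == 0 else 0.2   # reward probability reverses
    r = float(rng.random() < p)
    pe = r - v_est
    pe_trace += k * (abs(pe) - pe_trace)      # meta-level volatility estimate
    lr = np.clip(pe_trace, 0.05, 0.5)         # volatile => larger learning rate
    v_est += lr * pe                          # first-order value update
```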
Because the RML uses a biologically plausible architecture with a parsimonious parameter set, it generates a rich set of novel predictions that can be tested against empirical data. These predictions involve possible relationships between behavior and neural activity, between neural activity and neurotransmitter release, and between activity in different brain structures. Existing versions of the RML make predictions about individual computations (e.g., how much memory effort to engage in a particular context), but future versions can be extended to probe how the brain arbitrates between computations (e.g., how it trades off between relying on memory and acquiring new sensory information when performing a task).
In conclusion, different meta-learning approaches can have very different strengths. The entirely unconstrained approach discussed by Binz et al. may be desirable for AI applications in which there is no need for biological constraints – for example, when developing an algorithm for a self-driving car or optimizing planning across multiple tasks. In contrast, we believe that a biologically constrained meta-learning framework is vastly superior for advancing cognitive science and neuroscience research (Marblestone, Wayne, & Kording, Reference Marblestone, Wayne and Kording2017). Such a framework is grounded in the neuroscientific literature and can generate testable, falsifiable hypotheses about the neurobiological processes underlying cognitive function.
Acknowledgments
Tim Vriens is a PhD student enrolled in the National PhD in Artificial Intelligence, XXXVII cycle, course on Health and life sciences, hosted by Università Campus Bio-Medico di Roma, Italy.
Financial support
M. S. is funded by the Italian Ministry of University and Research, PRIN 2022 program, Grant No. 64.20227MPSEH. M. H. is supported via the Sainsbury Wellcome Centre PhD Programme and has received grants from Reinholdt W. Jorck og Hustrus Fond, Knud Højgaards Fond, and the Anglo-Danish Society.
Competing interests
None.