Connectionist-versus-Bayesian debates have occurred in cognitive science for decades (e.g., Griffiths, Chater, Kemp, Perfors, & Tenenbaum, 2010; McClelland et al., 2010), with each side progressing in theory, models, and algorithms, in turn impelling the other side to advance, resulting in a cycle of fruitful engagement. Binz et al.'s summary of the meta-learning paradigm in the target article bridges the two by proposing how meta-learning in recurrent neural networks can address some of the traditional challenges of Bayesian approaches. But, by failing to recognize and engage with the latest iteration of Bayesian modeling approaches – including probabilistic programming as a unifying paradigm for probabilistic, symbolic, and differentiable computation (Cusumano-Towner, Saad, Lew, & Mansinghka, 2019) – the target article does not push the meta-learning paradigm as far as it could go.
The authors begin their defense of meta-learning by citing the intractability of exact Bayesian inference. However, this fails to address how and why meta-learning is superior to approximate inference for modeling cognition. As the authors themselves note, Bayesian modelers use a variety of approximate inference methods, including neural-network-powered variational inference (Dasgupta, Schulz, Tenenbaum, & Gershman, 2020; Kingma & Welling, 2013), Markov chain Monte Carlo (Ullman, Goodman, & Tenenbaum, 2012), and sequential Monte Carlo methods (Levy, Reali, & Griffiths, 2008; Vul, Alvarez, Tenenbaum, & Black, 2009), all of which have shown considerable success in modeling how humans perform inference (or fail to) in presumably intractable settings. As such, the intractability of exact inference is hardly an argument in favor of meta-learning – and against "traditional" Bayesian models.
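To make the contrast concrete, the following minimal sketch (ours, not drawn from any of the cited papers) shows one such approximate method, importance sampling, estimating a posterior that also has a known closed form, so the approximation can be checked:

```python
import random

def posterior_mean_bias(flips, n_samples=20000, seed=0):
    """Importance-sampling estimate of the posterior mean bias of a coin.

    Toy model: bias ~ Uniform(0, 1) prior, Bernoulli likelihood over the
    observed flips. This stands in for the approximate-inference methods
    cited above (variational inference, MCMC, SMC); it is not any
    specific published model.
    """
    rng = random.Random(seed)
    total_weight = 0.0
    weighted_sum = 0.0
    for _ in range(n_samples):
        bias = rng.random()                      # draw from the prior
        weight = 1.0
        for flip in flips:                       # likelihood weight
            weight *= bias if flip else (1.0 - bias)
        total_weight += weight
        weighted_sum += weight * bias
    return weighted_sum / total_weight           # self-normalized estimate
```

For three observed heads, the exact posterior is Beta(4, 1) with mean 0.8, and the sampler's estimate converges to that value; the point is simply that intractable-looking posteriors are routinely approximated well.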
This omission is just one way in which the article fails to engage with a modern incarnation of the Bayesian modeler's toolkit – probabilistic programming. In the past two decades, we have seen the development of probabilistic programming as a unifying formalism for modeling the probabilistic, symbolic, and data-driven aspects of human cognition (Lake, Salakhutdinov, & Tenenbaum, 2015), as embodied in probabilistic programming languages such as Church (Goodman, Mansinghka, Roy, Bonawitz, & Tenenbaum, 2012), webPPL (Goodman & Stuhlmüller, electronic), Pyro (Bingham et al., 2019), and Gen (Cusumano-Towner et al., 2019). These languages enable modelers to explore a much wider range of computational architectures than the standard meta-learning setup, which requires modelers to reformulate human cognition as a sequence prediction problem. Probabilistic programming allows modelers to unite the strengths of general-purpose predictors (i.e., neural networks) with theoretically informed constraints and model-based reasoning. For instance, Ong, Soh, Zaki, and Goodman (2021) showed how reasoning about others' emotions can be modeled by combining the constraints implied by cognitive appraisal theory with bottom-up representations learned via neural networks from emotional facial expressions.
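To give a flavor of the formalism, here is a hypothetical toy probabilistic program written in plain Python, in the spirit of Church and webPPL: the model is an ordinary function containing random choices, and inference inverts it by conditioning on an observation. (Rejection sampling stands in here for the far more capable inference engines these languages actually provide.)

```python
import random

def model(rng):
    """A generative model as code: sample latent causes, derive the data."""
    rain = rng.random() < 0.2          # latent cause 1 (prior prob. assumed)
    sprinkler = rng.random() < 0.4     # latent cause 2 (prior prob. assumed)
    wet = rain or sprinkler            # observable consequence
    return rain, wet

def rejection_query(n_samples=50000, seed=0):
    """Estimate P(rain | grass is wet) by rejection sampling."""
    rng = random.Random(seed)
    kept = rain_count = 0
    for _ in range(n_samples):
        rain, wet = model(rng)
        if wet:                        # condition on the observation
            kept += 1
            rain_count += rain
    return rain_count / kept
```

The estimate converges to the exact value 0.2 / 0.52 ≈ 0.38; the same program could instead be handed to MCMC or SMC without rewriting the model, which is exactly the modularity these languages offer.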
Similarly, several recent papers have shown how the linguistic abilities of large language models (LLMs) can be integrated with rational models of planning, communication, and inverse planning (Wong et al., 2023; Ying, Zhi-Xuan, Mansinghka, & Tenenbaum, 2023), modeling human inferences that LLM-based sequence prediction alone struggles with (Zhi-Xuan, Ying, Mansinghka, & Tenenbaum, 2024).
What flexibility does probabilistic programming afford over pure meta-learning? As the article notes, one potential benefit of meta-learning is that it avoids the need for a specific Bayesian model to perform inference over. Crucially, meta-learning achieves this by having access to sufficiently similar data at training and test time, such that the meta-learned algorithm is well-adapted to the implied class of data-generating processes. Human cognition is much more adaptive. We do not simply adjust our learning to fit past distributions; we also construct, modify, abstract, and refactor entire theories about how the world works (Rule, Tenenbaum, & Piantadosi, 2020; Tenenbaum, Kemp, Griffiths, & Goodman, 2011; Ullman & Tenenbaum, 2020), reasoning with such theories on downstream tasks (Tsividis et al., 2021). This capacity is not captured by pure meta-learning, which occurs "offline." By contrast, probabilistic programming allows modeling these patterns of thought: Theory building can be formulated as program induction (Lake et al., 2015; Saad, Cusumano-Towner, Schaechtle, Rinard, & Mansinghka, 2019), refactoring as program merging (Hwang, Stuhlmüller, & Goodman, 2011), and abstraction-guided reasoning as coarse-to-fine inference (Cusumano-Towner, Bichsel, Gehr, Vechev, & Mansinghka, 2018; Stuhlmüller, Hawkins, Siddharth, & Goodman, 2015).
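As a cartoon of theory building as program induction (our illustrative sketch, not the cited models), one can score a small hypothetical space of candidate programs by a simplicity prior combined with fit to observed input-output data, and select the best "theory":

```python
import math

# Hypothetical theory space: tiny programs paired with a size (in tokens)
# that drives a description-length prior. All names here are illustrative.
THEORIES = {
    "x + 1":   (lambda x: x + 1, 2),
    "2 * x":   (lambda x: 2 * x, 2),
    "x * x":   (lambda x: x * x, 2),
    "2*x + 1": (lambda x: 2 * x + 1, 4),
}

def best_theory(data, noise_sd=0.5):
    """Pick the program maximizing log prior + log likelihood.

    Prior: P(program) proportional to 2^(-size), favoring short programs.
    Likelihood: Gaussian error with standard deviation noise_sd.
    """
    def log_score(program, size):
        lp = -size * math.log(2)                       # simplicity prior
        for x, y in data:
            lp -= (program(x) - y) ** 2 / (2 * noise_sd ** 2)
        return lp
    return max(THEORIES, key=lambda name: log_score(*THEORIES[name]))
```

Given data generated by 2x + 1, the longer-but-exact program wins over shorter near-misses; real program-induction models search vastly larger, compositional spaces, but the prior-versus-fit trade-off is the same.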
Inference meta-programs (Cusumano-Towner et al., 2019; Lew et al., 2023) allow us to model how people invoke modeling and inference strategies as needed: One can employ meta-learned inference when one believes a familiar model applies, but also flexibly compute inferences when a model is learned, extended, or abstracted. On this view, meta-learning has an important role to play in modeling human cognition, but not for all of our cognitive capacities.
Another way of understanding the relationship between meta-learning and probabilistic programming is that the former uses implicit statistical assumptions while the latter's assumptions are explicit. Meta-learning assumes that the structure of the world is conveyed in the statistical structure of data across independent instances. With sufficient coverage of the training distribution, flexible deep learning approaches fit this structure and use it to generalize. But they may not do so in a way that provides any insight into the computational problem being solved by humans. Probabilistic programs, by contrast, explicitly hypothesize the statistical patterns to be found in data, providing constraints that, if satisfied, yield insights for cognition. This implicit–explicit distinction both frames the relative value of the approaches and suggests an alternative relation: A Bayesian model need not subsume or integrate what is learned by a deep learning model, but simply explicate it, at a higher level of analysis. Through this lens, having to specify an inference problem is not a limitation, but a virtue.
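A minimal illustration of the explicitness at stake (a toy sketch of ours, not a model from the literature): in a conjugate Bayesian model of a coin, every statistical assumption is readable directly from the code, whereas the equivalent assumptions of a meta-trained network are distributed opaquely across its weights.

```python
def beta_bernoulli_update(alpha, beta, flips):
    """Explicit posterior update for a coin with a Beta(alpha, beta) prior.

    The prior (Beta), the likelihood (Bernoulli), and the conjugate
    update rule are all visible on the surface of the code; nothing
    about the model is hidden in learned parameters.
    """
    heads = sum(flips)
    tails = len(flips) - heads
    return alpha + heads, beta + tails
```

Starting from a uniform Beta(1, 1) prior and observing heads, heads, tails yields a Beta(3, 2) posterior, and a reader can audit each assumption behind that answer.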
The best of both worlds would be to compose and further refine these paradigms, such as using deep amortized inference (effectively, meta-learning for probabilistic programming), using Bayesian tools (and other tools for mechanistic interpretation) to understand the results of meta-learning, or constructing neurosymbolic models (e.g., by grounding the outputs of meta-learned models in probabilistic programs, as in Wong et al., 2023). As a very recent example, Zhou, Feinman, and Lake (2024) proposed a neurosymbolic program induction model to capture human visual learning, using both Bayesian program induction and meta-learning, achieving the best of both approaches: Interpretability and parsimony, as well as capturing additional variance using flexible function approximators. We believe that the field should move beyond "Connectionist-versus-Bayesian" debates to instead explore hybrid "Connectionist-and-Bayesian" approaches.
Financial support
This research received no specific grant from any funding agency, commercial or not-for-profit sectors.
Competing interest
None.