Meta-learned models of cognition offer an exciting opportunity to address a central weakness of current cognitive models, whether Bayesian or not: Cognitive models generally do not “see” the experimental stimuli shown to participants. Experimenters instead feed models low-dimensional descriptions of the stimuli, often expressed in terms of psychological features imagined by the experimenter or, sometimes, as the psychological descriptions that best fit participants’ judgments (e.g., stimulus similarity judgments; Nosofsky, Sanders, Meagher, & Douglas, 2018).
For example, in studies of probability judgment, participants have been asked to judge the probability that “Bill plays jazz for a hobby” after having been given the description, “Bill is 34 years old. He is intelligent, but unimaginative, compulsive, and generally lifeless. In school, he was strong in mathematics but weak in social studies and humanities” (Tversky & Kahneman, 1983). Current probability judgment models reduce these descriptions to a single unknown number and attempt to find the latent probability that best fits the data (e.g., Zhu, Sanborn, & Chater, 2020).
Models trained on the underlying statistics of the environment, as meta-learned models are, can bypass this need to infer a latent variable, instead making predictions from the actual descriptions used. Indeed, even relatively simple models of semantics that locate phrases in a vector space produce judgments that correlate with the probabilities experimental participants give (Bhatia, 2017). Meta-learned models could thus explain a great deal of the variability in human behavior, and allow experimenters to generalize beyond the stimuli shown to participants.
However, used as descriptive models, normative meta-learned models of cognition inherit a fundamental problem from the Bayesian approach: people's reliable deviations from normative behavior. One compelling line of research shows that probability judgments are incoherent in a way that Bayesian models are not. Using the above example of Bill, Tversky and Kahneman (1983) found that participants ranked the probability of “Bill is an accountant who plays jazz for a hobby” as higher than that of “Bill plays jazz for a hobby.” This violates the extension rule of probability because the set of all accountants who play jazz for a hobby is a subset of all people who play jazz for a hobby, no matter how Bill is described.
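The extension rule invoked here can be written out in one line. The block below is only a worked restatement of the subset argument, with the event labels A and B introduced for illustration (they do not appear in the original studies).

```latex
% A = "Bill is an accountant", B = "Bill plays jazz for a hobby"
% (labels introduced here for illustration)
P(A \cap B) \;=\; P(B)\,P(A \mid B) \;\le\; P(B),
\qquad \text{because } 0 \le P(A \mid B) \le 1 .
```

Any coherent probability model, Bayesian or meta-learned, therefore cannot rank the conjunction above its constituent, whatever description of Bill it conditions on.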
The target article discusses constraining meta-learned models to better describe behavior, for example by reducing the number of hidden units or restricting the representational fidelity of units. These manipulations have produced a surprising and interesting range of biases, including stochastic and incoherent probability judgments (Dasgupta, Schulz, Tenenbaum, & Gershman, 2020). However, this is only a start toward explaining human biases. Even a single bias such as the conjunction fallacy has intricacies, such as the higher rate of conjunction fallacies when choosing versus estimating (Wedell & Moro, 2008), and greater variability in judgments of conjunctions than in those of simple events (Costello & Watts, 2017).
Cognitive process models aim to explain these biases in detail. For conjunction fallacies, a variety of well-supported models exist, based on ideas such as sampling events with noise in the retrieval process (Costello & Watts, 2014), sacrificing probabilistic coherence to improve the accuracy of judgments based on samples (Zhu et al., 2020), representing conjunctions as a weighted average of simple events (Juslin, Nilsson, & Winman, 2009), or using quantum probability (Busemeyer, Pothos, Franco, & Trueblood, 2011). These kinds of models capture many details of the empirical effects through simple and intuitive mechanisms, such as adjusting the amount of noise or the number of samples, which helps identify experiments to distinguish between them.
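To make the "noise in the retrieval process" idea concrete, here is a minimal Python sketch in the spirit of the noisy-sampling account (Costello & Watts, 2014). It is a simplified illustration, not the authors' implementation: the function names, the single shared noise rate `d`, and all numerical values are assumptions chosen for the example.

```python
import random

def noisy_estimate(p, n_samples, d, rng):
    """Estimate a probability by counting retrieved instances, where each
    instance is misread (flipped) with probability d."""
    count = 0
    for _ in range(n_samples):
        event = rng.random() < p      # true status of the retrieved instance
        flipped = rng.random() < d    # read-out error on this instance
        count += event != flipped     # misreads invert the recorded status
    return count / n_samples

def expected_estimate(p, d):
    """Mean of noisy_estimate: the true probability regressed toward 0.5."""
    return (1 - 2 * d) * p + d

def fallacy_rate(p_b, p_ab, n_samples, d, n_trials, seed=0):
    """Fraction of simulated trials on which the conjunction estimate
    exceeds the estimate of its constituent event."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_trials):
        est_b = noisy_estimate(p_b, n_samples, d, rng)
        est_ab = noisy_estimate(p_ab, n_samples, d, rng)
        hits += est_ab > est_b
    return hits / n_trials

# Illustrative values only: P(B) = 0.20, P(A and B) = 0.15.
rate = fallacy_rate(p_b=0.20, p_ab=0.15, n_samples=20, d=0.10, n_trials=2000)
print(f"simulated conjunction fallacy rate: {rate:.2f}")
```

Even though the underlying probabilities are coherent, random variation in the two estimates produces conjunction fallacies on a sizeable minority of trials, and the single parameter `d` controls how strongly mean estimates are pulled toward 0.5. This is the kind of intuitive knob the paragraph above refers to.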
Mechanistically modifying meta-learned models to explain cognitive biases at the level of detail that cognitive process models achieve appears difficult. Changes to network structure are powerful ways to induce different biases and could identify implementation-level constraints in the brain, but the effects of such changes are generally hard to intuit, and training constrained meta-learned models to test different manipulations will be slow and computationally expensive. Thus, it will be challenging to reproduce existing biases in detail or to design effective experiments for testing these constraints.
Combining meta-learned models with cognitive process models is more promising. One possibility is to have meta-learned models act as a “front end” that takes stimuli and converts them to a feature-based representation, which is then operated on by a cognitive process model. The parameters of the cognitive process model could be fit to human data. Alternatively, the cognitive process model could potentially be encoded into the network (e.g., Peterson, Bourgin, Agrawal, Reichman, & Griffiths, 2021), so that the front end and the cognitive process parameters are meta-learned together, end-to-end.
However, as meta-learned models of cognition produce posterior predictive distributions, rational process models offer a straightforward connection that does not require retraining meta-learned models. Rational process models do not directly use a posterior predictive distribution, but instead assume that the posterior predictive distribution is approximated (i.e., using the posterior mean, posterior median, or other summary statistic depending on task), most often using a statistical sampling algorithm (Griffiths, Vul, & Sanborn, 2012). Such a model can explain details of the conjunction fallacy, and also a wide range of other biases, such as stochastic choice, anchoring and repulsion effects in estimates, long-range autocorrelations in judgment, and the flaws in random sequence generation (Castillo, León-Villagrá, Chater, & Sanborn, 2024; Spicer, Zhu, Chater, & Sanborn, 2022; Vul, Goodman, Griffiths, & Tenenbaum, 2014; Zhu, León-Villagrá, Chater, & Sanborn, 2022; Zhu, Sundh, Spicer, Chater, & Sanborn, 2023). What these models have lacked, however, is a principled way to construct the posterior predictive distribution from environmental statistics, and here meta-learned models offer that exciting possibility.
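As a rough illustration of how a sampling-based read-out could sit on top of a model's output, the Python sketch below applies a small-sample estimator with a symmetric Beta prior, in the style of the sample-based account cited earlier (Zhu et al., 2020). Everything specific here is an assumption made for the example: the probabilities are plain numbers standing in for a meta-learned posterior predictive, and the function names, sample sizes, and `beta` value are hypothetical.

```python
import random

def sampled_judgment(p_model, n_samples, beta, rng):
    """One stochastic probability judgment read out from p_model.

    p_model   : probability supplied by an upstream model (here a plain
                number standing in for a posterior predictive probability)
    n_samples : number of samples drawn (assumed to be small)
    beta      : strength of a symmetric Beta(beta, beta) prior on the answer
    """
    successes = sum(rng.random() < p_model for _ in range(n_samples))
    return (successes + beta) / (n_samples + 2 * beta)

def mean_judgment(p_model, n_samples, beta):
    """Expected value of sampled_judgment: linear shrinkage toward 0.5."""
    return (n_samples * p_model + beta) / (n_samples + 2 * beta)

# Illustrative, coherent model probabilities: P(A and B) <= P(B).
p_jazz, p_accountant_and_jazz = 0.10, 0.08

# Assumption: conjunctions are judged from fewer samples than simple events.
simple = mean_judgment(p_jazz, n_samples=10, beta=1.0)
conjunction = mean_judgment(p_accountant_and_jazz, n_samples=2, beta=1.0)
print(simple < conjunction)  # True: the mean judgments violate the extension rule

# Repeated judgments of the same event vary from trial to trial.
rng = random.Random(1)
judgments = [sampled_judgment(p_jazz, 10, 1.0, rng) for _ in range(5)]
print(judgments)
```

The upstream probabilities stay coherent and no retraining is involved. The incoherence and the trial-to-trial variability both arise in the read-out stage, which is governed by two interpretable parameters.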
While rational process models offer what we think is a natural choice for integration, any sort of combination with existing cognitive models offers benefits. Explaining the details of biases, as cognitive process models do, while also being sensitive to the actual stimuli is a powerful combination that moves toward the long-standing goal of a general model of cognition. Overall, we see meta-learned models of cognition not as supplanting existing cognitive models, but as a way to make them much more powerful and relevant to understanding and predicting behavior.
Acknowledgments
None.
Financial support
A. N. S. and C. T. were supported by a European Research Council consolidator grant (817492 – SAMPLING). H. Y. was supported by a Chancellor's International Scholarship from the University of Warwick.
Competing interest
None.