Conviction Narrative Theory (CNT) claims that probabilistic formalisms cannot explain decision-making under radical uncertainty, where “probabilities cannot be assigned to outcomes.” Accordingly, it replaces statistical analysis with a folk-psychological account: Decision-making agents adopt narratives that “feel right” to explain the available data and to imagine and evaluate plausible futures.
Since one narrative is more convincing than another, they presumably both encode probable outcomes intuitively – which is why the narrative that feels right is adopted. If so, what, exactly, makes a candidate narrative “feel right” to an agent?
I have argued elsewhere (Solms, 2021; Solms & Friston, 2018) that feelings represent fluctuating uncertainty under active inference: Decreasing uncertainty (inverse precision) in a course of action feels “good” for an agent because it increases the agent's confidence that the action will minimize its variational free energy, and vice versa. This decision-making process can be reduced to an equation, one that has recently been translated into folk-psychological terms by Smith, Ramstead, and Kiefer (2022). In what follows, therefore, I paraphrase them closely.
The relevant formalism (the POMDP formulation of active inference) pivots on the concept of a “policy” (denoted by π). A policy is a possible course of action that decision-making agents can entertain; it predicts a specific sequence of state transitions. What CNT calls “adopting a narrative,” therefore, is formalized under active inference as “policy selection.” Another pivotal concept is “expected confidence” (denoted by γ), which is a precision estimate for beliefs about expected free energy. “Expected free energy” (denoted by G) is the future variational free energy (the future F) expected under the outcomes anticipated when following a policy, given a model.
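Here is the formalism:

$$
G(\pi) = \sum_{\tau} \Big( \underbrace{D_{KL}\big[\, q(o_\tau \mid \pi) \,\|\, p(o_\tau) \,\big]}_{\text{risk}} + \underbrace{\mathbb{E}_{q(s_\tau \mid \pi)}\big[\, H[\, p(o_\tau \mid s_\tau) \,] \,\big]}_{\text{ambiguity}} \Big)
$$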
It says that a policy is more likely to be selected if it minimizes risk plus ambiguity.
Technically, under active inference, “risk” encodes the anticipated Kullback–Leibler divergence (denoted by DKL) between two quantities. The first quantity (q(oτ|π)) corresponds to the observations an agent anticipates at each time-point (denoted by oτ) if it selects one policy versus another. This formalizes the folk-psychological notion of expectations. The second quantity (p(oτ)) corresponds to a policy-independent prior over observations which encodes the observations that are congruent with an agent's “prior preference distribution.” This term formalizes the folk-psychological notion of goals. Selecting a policy to minimize expected free energy (G(π)) entails minimizing the difference between these prior preferences and the anticipated outcomes. An agent tries to infer which policy will generate outcomes closest to its goals, and selects the one that it believes is most likely to achieve what it desires (or values).
“Ambiguity,” technically, encodes the expected entropy (denoted by H) of the likelihood function for a given state at each time-point (denoted by sτ). Entropy measures the uncertainty (inverse precision) of a distribution: The lower the entropy, the more precise the distribution. This means that an agent that minimizes G will actively seek states with the most precise mapping to observations (i.e., it will select policies expected to generate the most informative outcomes).
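To make the two terms concrete, the following is a minimal sketch of risk plus ambiguity for a single future time-point, assuming discrete states and observations. The function names, the two-state toy model, and all numerical values are hypothetical illustrations, not taken from Smith, Ramstead, and Kiefer (2022). Policy A is assumed to lead to a state that maps precisely onto the preferred observation; policy B is assumed to lead to a state whose observations are uninformative.

```python
import math

def kl_divergence(q, p):
    """D_KL[q || p] for two discrete distributions given as lists."""
    return sum(qi * math.log(qi / pi) for qi, pi in zip(q, p) if qi > 0)

def entropy(p):
    """Shannon entropy H[p] in nats; higher means less precise."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

def expected_free_energy(q_states, likelihood, preferences):
    """Risk + ambiguity for one policy at one time-point.

    q_states    -- q(s_tau | pi): beliefs about states under the policy
    likelihood  -- likelihood[s][o] = p(o_tau | s_tau)
    preferences -- p(o_tau): prior preference distribution (goals)
    """
    n_states, n_obs = len(q_states), len(preferences)
    # Anticipated observations: q(o | pi) = sum_s p(o | s) q(s | pi)
    q_obs = [sum(q_states[s] * likelihood[s][o] for s in range(n_states))
             for o in range(n_obs)]
    risk = kl_divergence(q_obs, preferences)
    ambiguity = sum(q_states[s] * entropy(likelihood[s])
                    for s in range(n_states))
    return risk + ambiguity, risk, ambiguity

# Hypothetical toy model: state 0 yields precise observations,
# state 1 yields maximally ambiguous ones.
likelihood = [[0.9, 0.1],
              [0.5, 0.5]]
preferences = [0.8, 0.2]          # the agent prefers observation 0

G_a, risk_a, amb_a = expected_free_energy([0.9, 0.1], likelihood, preferences)
G_b, risk_b, amb_b = expected_free_energy([0.1, 0.9], likelihood, preferences)

print(f"policy A: G={G_a:.3f} (risk={risk_a:.3f}, ambiguity={amb_a:.3f})")
print(f"policy B: G={G_b:.3f} (risk={risk_b:.3f}, ambiguity={amb_b:.3f})")
```

Under these assumed numbers, policy A scores lower on both terms, so it is the policy an agent minimizing G would favor.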
The posterior probability distribution over policies is

$$
p(\pi) = \sigma\big( \ln E(\pi) - F(\pi) - \gamma\, G(\pi) \big)
$$
This says that the most likely policies are those which minimize G, under constraints afforded by E(π) and F(π) – where σ denotes a function that converts the result back to a proper distribution with non-negative values summing to one. E(π) encodes a fixed prior over policies that can be used to model habitual actions, in line with past influences. F(π) scores the free energy of past and present observations under each policy (i.e., the evidence they provided for each policy). It reflects how well each policy predicts the observations that have already been received.
How much weight should an agent afford a policy based on evidence from the past relative to the expected free energy of observations in the future? It depends upon the precision estimate for beliefs about expected free energy (γ). This pivotal quantity controls how much model-based predictions about outcomes contribute to policy selection. It formalizes how much the predictions are trusted. Lower values for γ cause an agent to act habitually and with less confidence in its future plans.
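The role of γ can be illustrated with a small sketch of the posterior over policies, p(π) = σ(lnE(π) − F(π) − γG(π)), taking σ to be the softmax function. The two-policy setup and every number below are hypothetical: policy 0 is assumed to be the habitual one, policy 1 is assumed to have the lower expected free energy, and past observations are assumed not to discriminate between them.

```python
import math

def softmax(xs):
    """Normalized exponential: non-negative values summing to one."""
    m = max(xs)                       # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def policy_posterior(E, F, G, gamma):
    """p(pi) = softmax(ln E(pi) - F(pi) - gamma * G(pi))."""
    return softmax([math.log(e) - f - gamma * g
                    for e, f, g in zip(E, F, G)])

# Hypothetical values for two candidate policies.
E = [0.7, 0.3]   # fixed prior: habit favors policy 0
F = [1.0, 1.0]   # past observations fit both policies equally well
G = [2.0, 0.5]   # policy 1 has the lower expected free energy

low_confidence = policy_posterior(E, F, G, gamma=0.1)
high_confidence = policy_posterior(E, F, G, gamma=4.0)

print("gamma = 0.1:", [round(p, 3) for p in low_confidence])
print("gamma = 4.0:", [round(p, 3) for p in high_confidence])
```

With low γ the habitual policy dominates the posterior; with high γ the policy with the lowest G dominates, which is the sense in which γ sets how much the agent trusts its model-based predictions.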
The precision estimate is updated with each new observation, allowing an agent to modulate confidence in its model of the future. This updating is performed via a hyperprior (β), the rate parameter of the gamma distribution over γ:

$$
\begin{aligned}
p(\gamma) &= \Gamma(1, \beta), \qquad \mathbb{E}[\gamma] = \gamma = 1/\beta \\
\beta_{update} &\leftarrow \beta - \beta_0 + \big( p(\pi) - p(\pi_0) \big) \cdot \big( -G(\pi) \big) \\
\beta &\leftarrow \beta - \beta_{update} / \psi \\
\gamma &\leftarrow 1/\beta
\end{aligned}
$$

Here, the arrow (←) indicates iterative value-updating (until convergence), β₀ is a prior on β, ψ is a step size that controls the magnitude of each update, and p(π₀) corresponds to p(π) before an observation has been made to generate F(π). That is, p(π₀) = σ(lnE(π)−γG(π)). For our purposes, the quantity (p(π) − p(π₀))·(−G(π)) within the value that updates the hyperprior (β_update) is especially important since it concerns feelings. This is a type of prediction error indicating whether a new observation provides evidence for or against beliefs about G(π) – that is, whether or not G(π) is consistent with the F(π) generated by a new observation. When it leads to an increase in γ (when confidence in G(π) increases), it acts as evidence for a positive feeling, and vice versa.
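The direction of this update can be checked numerically. The sketch below assumes the iterative scheme commonly used in active inference tutorials: compute p(π₀) and p(π), take the prediction error (p(π) − p(π₀))·(−G(π)), form β_update = β − β₀ + error, step β by β_update/ψ, and set γ = 1/β. The step size ψ, the fixed iteration count, the function names, and all numbers are assumptions made for illustration. The sketch also does not guard against β becoming non-positive, which the chosen values avoid.

```python
import math

def softmax(xs):
    """Normalized exponential: non-negative values summing to one."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def update_gamma(E, F, G, beta_0=1.0, psi=2.0, n_iter=32):
    """Iteratively update the hyperprior beta and return gamma = 1 / beta.

    beta_0, psi and n_iter are illustrative assumptions; a fixed number of
    iterations stands in for "until convergence".
    """
    beta = beta_0
    gamma = 1.0 / beta
    for _ in range(n_iter):
        # p(pi_0): policy beliefs before the observation generates F(pi)
        pi_0 = softmax([math.log(e) - gamma * g for e, g in zip(E, G)])
        # p(pi): policy beliefs after taking F(pi) into account
        pi = softmax([math.log(e) - f - gamma * g
                      for e, f, g in zip(E, F, G)])
        # Prediction error: (p(pi) - p(pi_0)) . (-G(pi))
        g_error = sum((p - p0) * (-g) for p, p0, g in zip(pi, pi_0, G))
        beta_update = beta - beta_0 + g_error
        beta -= beta_update / psi
        gamma = 1.0 / beta
    return gamma

E = [0.5, 0.5]            # no habitual bias
G = [2.0, 0.5]            # the agent expects policy 1 to be better

# New observations agree with G (policy 1 also has the lower F) ...
gamma_support = update_gamma(E, F=[3.0, 0.5], G=G)
# ... or contradict it (policy 0 has the lower F).
gamma_conflict = update_gamma(E, F=[0.5, 3.0], G=G)

print(f"prior gamma:               {1.0:.3f}")
print(f"observations support G:    {gamma_support:.3f}")
print(f"observations contradict G: {gamma_conflict:.3f}")
```

Starting from a prior γ of 1/β₀ = 1, γ rises when the observations agree with G(π) and falls when they contradict it, which corresponds to the “good” and “bad” feelings described in the text.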
The equation for posteriors over policies determines how likely a policy is. In folk-psychological terms, it determines an agent's drive to select one narrative over another. We have seen that this arises from two influences: The prior over policies (E(π)) and the expected free energy (G(π)). While E(π) maps well to habitual policies, G(π) reflects the inferred value of each policy based on beliefs (e.g., q(oτ|π) and p(oτ|sτ)) and desired outcomes (i.e., p(oτ)). The policy with the lowest expected free energy therefore formalizes the intentions of an agent. In the absence of habitual influences, the intention would become the policy the agent feels most driven to choose in p(π).
The pivotal role of feelings is clear. The formalism shows that when new observations support a current policy (when they are consistent with G(π)), γ increases, which generates “good” feelings, and vice versa. This increase in γ boosts an agent's confidence in its beliefs about expected variational free energy, which in turn reduces the appeal of habitual policies, E(π).
In short, decision-making agents adopt policies that feel right to explain the available data and to imagine and evaluate plausible futures.
Acknowledgement
Ryan Smith kindly edited the MS for technical accuracy.
Financial support
This research received no specific grant from any funding agency, commercial or not-for-profit sectors.
Competing interest
None.