Introduction
Early life trauma, including emotional, physical, and sexual abuse, is a well-established risk factor for multiple forms of poor mental and physical health (Dube et al., 2001, 2003; Felitti et al., 1998). While the impact of early life trauma on behavioral and neurophysiological systems related to stress and threat responding is a primary mechanism explaining conferred risk for psychopathology (McEwen, 2004; McLaughlin, Sheridan, Humphreys, Belsky, & Ellis, 2021; Nemeroff, 2016), there has been growing recognition of, and interest in, the role of systems related to decision-making and reward learning as additional and non-mutually exclusive pathways to psychopathology (Fonzo, 2018; Hanson, Williams, Bangasser, & Peña, 2021; McLaughlin, DeCross, Jovanovic, & Tottenham, 2019). Further elaboration of specific mechanistic pathways will hopefully continue to inform development of prevention and intervention modalities.
Several lines of research support a mechanistic pathway of altered reward learning and decision-making mediating the relationship between early life trauma and psychopathology, particularly internalizing symptoms. Youth exposed to early life trauma learn reward contingencies more slowly and have decreased activation of striatum and dorsal anterior cingulate during reward learning tasks (Cisler et al., 2019; Gerin et al., 2017; Hanson, Hariri, & Williamson, 2015; Harms, Shannon Bowen, Hanson, & Pollak, 2018; Lenow, Scott Steele, Smitherman, Kilts, & Cisler, 2014). Similarly, youth with internalizing disorders demonstrate decreased striatal responses during the receipt and anticipation of reward (Auerbach, Admon, & Pizzagalli, 2014; Keren et al., 2018; Rappaport, Kandala, Luby, & Barch, 2020), consistent with altered neural reward responsiveness as a mechanism of observed clinical symptoms (e.g. anhedonia, avoidance of potentially rewarding activities). Indeed, prospective studies demonstrate that decreased striatal reactivity to rewards predicts development of future internalizing symptoms among youth (Hanson et al., 2015; Stringaris et al., 2015).
While decreased striatal responses to reward are more consistently observed among depressed youth (Tang et al., 2022), reduced striatal activation to reward has also been observed in large samples of youth with anxiety disorders (Auerbach et al., 2022), and altered striatal response to reward also predicts anxiety symptom reduction during treatment among youth with anxiety disorders (Sequeira et al., 2021), possibly by enabling greater engagement with therapy.
The role of prospective episodic memory and mental simulation represents an emerging area of interest in the study of reward learning and decision-making (Biderman, Bakkour, & Shohamy, 2020; Dasgupta & Gershman, 2021; Mattar & Lengyel, 2022; Schacter, Benoit, & Szpunar, 2017; Sosa & Giocomo, 2021), though these processes have never been examined among at-risk youth. Numerous lines of research using animal and human models demonstrate that neural patterns associated with memory representations for the possible outcomes of a choice are activated at the time of choice as a form of mental simulation of future events (i.e. neural ‘preplay’) (Biderman et al., 2020; Doll, Duncan, Simon, Shohamy, & Daw, 2015; Schacter et al., 2017; Shadlen & Shohamy, 2016; Sosa & Giocomo, 2021; Widloski & Foster, 2022; Wikenheiser & Redish, 2015; Yu & Frank, 2015; Zielinski, Tang, & Jadhav, 2020). For example, memory representations for an aversive outcome become active prior to selecting amongst choices where an aversive outcome is possible, and the magnitude of these representations predicts subsequent choices to avoid the expected aversive outcome (Castegnetti et al., 2020; Moughrabi et al., 2022).
One emerging model explaining these phenomena posits that reactivation of memory representations reflects a prospective planning process, whereby the learner imagines possible outcomes for different branches of a decision tree and uses these imagined outcomes to inform selection of an appropriate response given the current context and goals (Biderman et al., 2020; Doll et al., 2015; Schacter et al., 2017). Further, experimental studies suggest that engaging imagined future rewarding outcomes increases reward-related neural activity in the medial prefrontal cortex (Peters & Büchel, 2010). Note that mental simulation of imagined outcomes as a mechanism of reward decision-making is a separate, though likely related, process from reward anticipation.
Testing the hypothesis of altered reactivation of reward representations at the time of choice among at-risk youth has the potential to extend and complement prior work suggesting altered striatal and salience network activity during the anticipation and receipt of reward outcomes (Auerbach et al., 2022; Birn, Roeber, & Pollak, 2017; Cisler et al., 2019; Harms et al., 2018; Lenow et al., 2014). Indeed, understanding processes at the time of choice during laboratory tasks may help explain clinical behavior in this population, such as choices to behaviorally withdraw and/or avoid activities. For example, decreased mental simulation of reward might help explain behavioral withdrawal, such that youth who cannot engage a mental simulation of a rewarding outcome see little reason to exert effort to engage in the behavior. In the context of laboratory reinforcement learning tasks (e.g. bandit tasks), response selection is a separate, though related, process from response valuation. One concept related to selecting responses with varying degrees of expected value is the exploration-exploitation tradeoff (Daw, O'Doherty, Dayan, Seymour, & Dolan, 2006; Schulz & Gershman, 2019; Wilson, Bonawitz, Costa, & Ebitz, 2021). Exploitation broadly refers to a strategy that favors selecting responses that have a high expectation of value; exploration broadly refers to a strategy favoring a wider sampling of available responses.
Exploration has been differentiated into random exploration and information-directed exploration (Schulz & Gershman, 2019; Wilson et al., 2021). The latter refers to a strategy of sampling amongst available choices for the explicit purpose of gaining information about those choices. The former refers to an ostensibly stochastic process underlying response selection, such that choice is uncoupled from both the choice's expected outcome probability and the value of gaining information about the environment by selecting that choice. Whereas younger children tend to show random exploration, adolescents show increasingly structured information-directed exploration (Meder, Wu, Schulz, & Ruggeri, 2021; Somerville et al., 2017). In the context of prospective memory representations for reward and mental simulation as a mechanism for decision-making, it is plausible that individual differences in random exploration are explained by individual differences in mental simulation for reward. For example, youth exposed to trauma and/or with internalizing symptoms who have limited access to reward memory exemplars might be expected to make ostensibly stochastic decisions for reasons other than expected value, due to their difficulty generating prospective reward representations.
No prior research has tested this hypothesis about prospective memory representations, with only limited and inconsistent prior computationally-driven behavioral investigations of choice strategies during reward learning among youth with trauma exposure and/or internalizing symptoms (Cisler et al., 2019; Harms et al., 2018; Humphreys et al., 2015; Sheridan et al., 2018). Some studies using foraging tasks suggest increased exploitation among adults with significant histories of early life adversity (Lenow, Constantino, Daw, & Phelps, 2017; Lloyd, McKay, & Furl, 2022). A large sample of previously institutionalized youth demonstrated greater exploitation compared to typically developing youth on a risky decision-making task (Humphreys et al., 2015), though this task may better reflect risk-taking (Humphreys, Lee, & Tottenham, 2013; Lejuez et al., 2002) than exploration. By contrast, one small prior study using a three-arm bandit task found increased choice stochasticity during social decision-making among assaulted adolescent girls (Lenow, Cisler, & Bush, 2015), and a larger study of youth with mixed histories of assault and clinical symptoms completing a similar task did not identify significant relationships between trauma exposure variables and exploration / exploitation strategies (Cisler et al., 2019).
Among adults, a meta-analysis identified decreased reward sensitivity among depressed individuals (Huys, Pizzagalli, Bogdan, & Dayan, 2013), though as the authors note, their reward sensitivity parameter was mathematically interchangeable with an exploitation parameter, consistent with other research among depressed adults (Blanco, Otto, Maddox, Beevers, & Love, 2013; Dubois & Hauser, 2022). Accordingly, further investigation into choice selection strategies and their neurocircuitry mechanisms among youth exposed to trauma and/or with internalizing symptoms is necessary.
Here, we aim to investigate aberrant generation of prospective memory representations for reward and their relationships with reward learning strategies as well as trauma exposure and internalizing symptoms among youth.
Methods
Sixty-one adolescent girls, aged 11–17, participated in the study at two sites: Little Rock, AR and the surrounding area (n = 26 participants; n = 13 exposed to assault), and Madison, WI and the surrounding area (n = 35 participants; n = 18 exposed to assault). Participants were recruited through community-wide advertising, social media postings, and outpatient mental health clinic referrals. Healthy controls were recruited based on absence of current mental health disorders, trauma exposure, and psychiatric treatment histories. Inclusion criteria for the assaulted group consisted of a history of directly experienced physical or sexual assault that the participant could remember. Exclusion criteria for all participants included histories of psychotic symptoms, developmental disorders, major medical disorders, MRI contraindications, pregnancy, and loss of consciousness greater than 10 min. Psychotropic medication was not exclusionary for the assaulted adolescents; however, a stable dose on any medication for at least 4 weeks was required. Table 1 presents clinical and demographic characteristics. Imaging data were excluded for one participant, an assaulted girl, due to excessive head motion, and imaging data were unusable from two participants, both controls, due to technical error during scanning. The imaging analyses therefore included 58 participants, and all participants' data were used in behavioral analyses. All study procedures were approved by the local IRB committees.
Note. IQ was assessed from the Receptive One-Word Picture Vocabulary Test. CTQ, Childhood Trauma Questionnaire; UCLA PTSD RI, UCLA PTSD Reaction Index; CAPS, Clinician Administered PTSD Scale; CBCL, Child Behavior Checklist; CBCL values represent raw values; DERS, Difficulties in Emotion Regulation Scale. Psychopathology was assessed using the Mini-International Neuropsychiatric Interview for Children and Adolescents (MINI Kid). Bolded values represent a statistical difference, two-tailed (p < 0.05).
Portions of these data pertaining to the impact of trauma characteristics on outcome processing (i.e. prediction error encoding and latent state belief updating) have previously been published (Cisler et al., 2019; Letkiewicz, Cochran, Privratsky, James, & Cisler, 2022). The present analysis is a novel investigation of multivariate representations at the time of choice as a function of trauma exposure characteristics and internalizing symptoms.
Assessments
Internalizing symptoms were assessed with the caregiver-rated Child Behavior Checklist (CBCL; Achenbach, 1991), using the sum of the anxiety, depression, and somatic concern subscales. The Clinician Administered PTSD Scale, Child and Adolescent Version (CAPS) (Nader, Blake, Pynoos, Newman, & Weathers, n.d.), was used to assess PTSD symptoms, and PTSD diagnoses followed definitions established by prior studies among youth (Cohen, Deblinger, Mannarino, & Steer, 2004). The Mini-International Neuropsychiatric Interview for Children and Adolescents (MINI-KID) (Sheehan et al., 2010) assessed for current and lifetime comorbid mental health disorders. Assault exposure histories were defined using the trauma assessment section of the National Survey of Adolescents (NSA) (Kilpatrick et al., 2003). Participants also completed the Childhood Trauma Questionnaire (Bernstein et al., 1994), providing a continuous measure of the total severity of early life maltreatment and trauma across the domains of emotional abuse, physical abuse, sexual abuse, emotional neglect, and physical neglect. We also assessed participants' verbal IQ (Brownell, 2000).
MRI acquisition and image preprocessing
See online Supplemental material.
Reinforcement learning task
Participants completed a three-arm bandit task using social stimuli (Fig. 1a) in a counterbalanced order. Participants were directed to give $10 to one of three mock people who returned either $20 or $0. The probabilities of positive returns varied by arm, either 80, 50, or 20%. Probabilities changed across the mock people every 30 trials, for a total of 90 trials. The same faces were used for all trials. Participants were informed that their compensation would be proportional to task performance. Additional information is provided in online Supplemental material and Fig. 1 legend.
Modeling Reinforcement Learning. Behavior during the RL tasks was modeled using versions of the Rescorla-Wagner (RW) model (Sutton & Barto, 1998). Consistent with prior research (Hauser, Iannaccone, Walitza, Brandeis, & Brem, 2015; Ross, Lenow, Kilts, & Cisler, 2018), four different RW-based models were tested, which manipulated whether the model updated the expected value of the unchosen option (Hauser et al., 2015) and whether the model was risk-sensitive (Niv, Edlund, Dayan, & O'Doherty, 2012). Expected reward values for each arm were transformed into choice probabilities using a softmax function, providing individually varying βs that reflect the degree to which an individual's choices are driven by reward expectations. Model fitting was conducted using hierarchical Bayesian inference (Piray, Dezfouli, Heskes, Frank, & Daw, 2019). See online Supplemental material for additional information.
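To make the role of the softmax β concrete, the following is a minimal sketch of a basic Rescorla-Wagner learner with a softmax choice rule. It is not one of the four fitted models described above: it updates only the chosen arm, is not risk-sensitive, and is not fit with hierarchical Bayesian inference. The function names, the 0/1 reward coding, the starting value of 0.5, and the example parameter values are all assumptions made for illustration.

```python
import math
import random


def softmax(values, beta):
    """Turn expected values into choice probabilities.

    beta is the inverse temperature: beta = 0 gives uniform (fully stochastic)
    choice, and larger beta concentrates choice on the highest-valued arm.
    """
    top = max(values)  # subtract the max for numerical stability
    exps = [math.exp(beta * (v - top)) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def rw_update(values, chosen, reward, alpha):
    """Rescorla-Wagner update: move the chosen arm's value toward the reward."""
    updated = list(values)
    updated[chosen] += alpha * (reward - updated[chosen])
    return updated


def simulate(reward_probs, n_trials, alpha, beta, seed=0):
    """Simulate a bandit session; rewards are coded 1 (return) or 0 (no return)."""
    rng = random.Random(seed)
    values = [0.5] * len(reward_probs)  # assumed neutral starting expectation
    choices = []
    for _ in range(n_trials):
        probs = softmax(values, beta)
        chosen = rng.choices(range(len(values)), weights=probs)[0]
        reward = 1.0 if rng.random() < reward_probs[chosen] else 0.0
        values = rw_update(values, chosen, reward, alpha)
        choices.append(chosen)
    return choices, values
```

Calling `simulate([0.8, 0.5, 0.2], 30, alpha=0.3, beta=5.0)` would mimic one 30-trial block with the arm probabilities reported above; lowering `beta` toward 0 makes the simulated choices progressively less tied to expected value.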
Independent Component Analysis. An Independent Component Analysis (ICA; Calhoun, Adali, Pearlson, & Pekar, 2001) with a model order of 35 components was conducted on the full voxelwise fMRI timecourses. This model order delivered a good balance between component reliability estimated across 50 ICASSO iterations and interpretability of canonical networks. Eight of the 35 components were deemed functional networks of interest after visual inspection (see Fig. 3a below). Components arising from artifacts of head motion or CSF and components of non-interest (i.e. motor, sensorimotor, and visual networks), which are not hypothesized to be relevant for understanding trauma, internalizing symptoms, reward learning, or PTSD (Auerbach et al., 2022; Patel, Spreng, Shin, & Girard, 2012), were excluded.
Multivariate pattern analyses of prospective mental representations during choice
Figure 1b provides an overview of the analytical approach, which directly follows our previous MVPA investigation of prospective representations of reward and threat as a mechanism of decision-making (Moughrabi et al., 2022). The first step was to demonstrate that network activity patterns at the time of reward delivery could be accurately decoded. Each participant's trial-by-trial activation patterns at the time of reward delivery were characterized using 3dLSS. The timepoint × voxel matrices were centered within each timepoint to ensure no differences in overall activation across trials. Support vector machines (SVM), using a radial basis function kernel implemented in Matlab through libsvm (Chang & Lin, 2011), were used to decode reward outcomes (binary classification). We established the accuracy of the decoders using leave-one-out cross-validation across subjects (i.e. one subject was designated as the left-out test subject, decoders were trained on the remaining subjects [i.e. N−1 sample size], and the decoder was then tested on the independent left-out subject's data). This process was repeated until all subjects served as the left-out test subject. The reward decoder accuracy was defined as the mean of sensitivity and specificity.
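The cross-validation logic described above can be sketched as follows. To keep the example self-contained, a nearest-centroid classifier is swapped in for the radial basis function SVM used in the study, so this illustrates only the leave-one-subject-out loop, the within-trial centering, and the accuracy definition (mean of sensitivity and specificity). All function names and the data layout are assumptions.

```python
def center_rows(patterns):
    """Subtract each trial's mean across voxels so trials share overall activation."""
    return [[v - sum(row) / len(row) for v in row] for row in patterns]


def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity and specificity (1 = reward, 0 = no reward).

    Assumes both classes are present in y_true.
    """
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    n_pos = sum(1 for t in y_true if t == 1)
    n_neg = len(y_true) - n_pos
    return 0.5 * (tp / n_pos + tn / n_neg)


def fit_centroids(patterns, labels):
    """Stand-in decoder: the mean pattern for each class."""
    centroids = {}
    for label in (0, 1):
        rows = [x for x, t in zip(patterns, labels) if t == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return centroids


def predict(centroids, patterns):
    """Assign each trial to the class with the nearer centroid."""
    def sq_dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return [1 if sq_dist(x, centroids[1]) < sq_dist(x, centroids[0]) else 0
            for x in patterns]


def leave_one_subject_out(data):
    """data maps subject id -> (trial-by-voxel patterns, outcome labels).

    Each subject is held out once; the decoder is trained on all other
    subjects and scored on the held-out subject's trials.
    """
    scores = {}
    for held_out in data:
        train_x, train_y = [], []
        for subject, (patterns, labels) in data.items():
            if subject != held_out:
                train_x.extend(center_rows(patterns))
                train_y.extend(labels)
        centroids = fit_centroids(train_x, train_y)
        test_x, test_y = data[held_out]
        scores[held_out] = balanced_accuracy(
            test_y, predict(centroids, center_rows(test_x)))
    return scores
```

Because every decoder is trained without the held-out subject's data, each per-subject score reflects generalization to a new individual rather than within-subject fit.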
After testing accuracy of the reward decoders, the next step was to apply the reward decoders to participants' data at the time of choice. 3dLSS was used to define trial-by-trial activation at the time of choice. A leave-one-out approach was used, such that a subject was designated as the left-out test subject, the reward decoders were trained on all remaining participants' reward outcome data, and the resulting reward decoders were applied to the left-out participant's choice data. This process was repeated for each subject. This resulted in hyperplane distances representing the degree to which the trained multivariate patterns (reward outcomes) were active at the time of choice. This process was repeated separately for each ICA network of interest, resulting in unique predictions (i.e. hyperplane distances) about reward representation activation for each separate network.
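As an illustration of what a hyperplane distance expresses, the sketch below computes the signed distance of a trial's activation pattern from a linear decision boundary. This is a simplification: the study used a radial basis function kernel, for which the analogous quantity is computed in the kernel-induced feature space rather than from an explicit weight vector. The function name and arguments are hypothetical.

```python
import math


def signed_distance(weights, bias, pattern):
    """Signed distance of one trial's pattern from a linear hyperplane.

    Positive values fall on the 'reward' side of the boundary; larger
    magnitudes indicate a pattern that more strongly resembles that class.
    """
    norm = math.sqrt(sum(w * w for w in weights))
    score = sum(w * x for w, x in zip(weights, pattern)) + bias
    return score / norm
```

Applied to every choice-period trial of a left-out participant, a function of this kind yields one continuous value per trial, which is the form of the trial-by-trial measure entered into the later models.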
Our primary interest was investigating coupling between prospective reward representation at the time of choice and the expected reward value, derived from the computational model, of the chosen arm. That is, the degree to which a youth is expecting reward for a given choice should be related to the degree of activation of prospective reward representations at the time of that choice. To test this hypothesis, we conducted linear mixed effects models (LMEMs), in which trial-by-trial reward expectations (V of the chosen arm from the fitted computational model) were regressed onto the trial-by-trial hyperplane distances. We stringently controlled for multiple comparisons across the 8 ICA networks with Bonferroni correction, resulting in a corrected alpha of 0.0063. These models included covariates for age, IQ, and head motion. We included an additional covariate for each subject's cross-validation reward decoding accuracy (Greene et al., 2022). Main results without these covariates, which remain essentially unchanged, are included in the online Supplemental material. We modeled subject and site as random effects in all models, with subject nested within site.
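For reference, the corrected threshold and the general form of the coupling model can be written out as below. The family-wise alpha of 0.05 is assumed (it matches the rounded value reported above), and the model equation is only a sketch: the symbols are introduced here for illustration, and the random-effects structure is shown as random intercepts, which is an assumption rather than a detail stated in the text.

```latex
% Bonferroni-corrected threshold across the m = 8 networks
% (family-wise alpha of 0.05 is assumed)
\alpha_{\text{corrected}} = \frac{\alpha}{m} = \frac{0.05}{8} = 0.00625 \approx 0.0063

% Sketch of the coupling model for trial t of subject i at site s
% (notation and random-intercept structure are assumptions):
%   V_{ist}      expected value of the chosen arm
%   d_{ist}      hyperplane distance at the time of choice
%   \mathbf{c}_i covariates (age, IQ, head motion, decoding accuracy)
%   u_s, u_{i(s)} random intercepts for site and for subject nested in site
V_{ist} = \beta_0 + \beta_1 d_{ist} + \boldsymbol{\gamma}^{\top}\mathbf{c}_i
          + u_s + u_{i(s)} + \varepsilon_{ist}
```

Under this sketch, the coupling of interest corresponds to the coefficient on the hyperplane distance, tested separately in each of the eight networks against the corrected threshold.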
LMEMs then tested whether individual differences moderated the coupling between prospective reward representations (hyperplane distances) and expected reward, using identical models and including interaction terms with the individual difference variable. We first investigated associations of trauma exposure (continuous measure of log transformed CTQ total score or dichotomous assault exposure in separate LMEMs) with coupling of reward representations and expected reward. Subsequent models then retained trauma exposure severity (log transformed CTQ total score) as a covariate and tested CBCL internalizing symptoms, PTSD symptoms, and the constituent CBCL internalizing scales of depression, anxiety, and somatic complaints. While the study recruited controls and assaulted participants as separate groups, given the continuous distributions of CTQ total scores and internalizing symptoms (online Supplemental Fig. S1), we opted to use these continuous variables among the entire sample to conserve statistical power. Bonferroni correction again controlled for family-wise multiple comparisons. Mediation analyses tested the significance of hypothesized indirect effects through bootstrapping with replacement using 50 000 iterations, following contemporary recommendations for mediation analyses (Hayes & Rockwood, 2017).
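The bootstrapped test of an indirect effect can be sketched as follows for the simplest single-mediator case. This is an illustrative ordinary-least-squares version with a percentile interval and no covariates or random effects; it is not the authors' implementation, and the function names, the default of 2000 resamples (the study used 50 000), and the percentile method are assumptions.

```python
import random


def _cross(u, v):
    """Sum of cross-products of deviations from the mean."""
    mean_u, mean_v = sum(u) / len(u), sum(v) / len(v)
    return sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))


def indirect_effect(x, m, y):
    """Product a*b: a from m ~ x, b from y ~ x + m (effect of m given x)."""
    s_xx, s_mm, s_xm = _cross(x, x), _cross(m, m), _cross(x, m)
    s_xy, s_my = _cross(x, y), _cross(m, y)
    a = s_xm / s_xx
    b = (s_xx * s_my - s_xm * s_xy) / (s_xx * s_mm - s_xm ** 2)
    return a * b


def bootstrap_ci(x, m, y, n_boot=2000, level=0.95, seed=0):
    """Percentile bootstrap interval for the indirect effect.

    Participants are resampled with replacement; degenerate resamples
    (zero variance or perfect collinearity) are skipped.
    """
    rng = random.Random(seed)
    n = len(x)
    estimates = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        try:
            estimates.append(indirect_effect(
                [x[i] for i in idx], [m[i] for i in idx], [y[i] for i in idx]))
        except ZeroDivisionError:
            continue
    estimates.sort()
    lower = estimates[int((1 - level) / 2 * len(estimates))]
    upper = estimates[int((1 + level) / 2 * len(estimates)) - 1]
    return lower, upper
```

In this framing, x would correspond to the symptom measure, m to the neural coupling estimate, and y to the softmax β; an interval that excludes zero is taken as evidence for an indirect effect.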
Results
Relationship between learning parameters and clinical characteristics
We first investigated relationships between clinical variables and softmax βs from the best fitting model (Fig. 2a). Regression models, conducted separately for CTQ total scores and dichotomous control v. assault group comparisons, did not demonstrate significant relationships between softmax βs and either CTQ total scores, p = 0.76 (Fig. 2b), or dichotomous control v. assaulted group comparisons, p = 0.58. When controlling for CTQ total scores, identical models demonstrated that CBCL internalizing symptoms were significantly related to softmax βs, t(51) = −3.15, p = 0.003 (Fig. 2c), indicating decreased choice preference for high reward options and greater response stochasticity. Decomposing internalizing symptoms in separate models demonstrated similar relationships with depression symptoms, t(51) = −2.70, p = 0.009, anxiety, t(51) = −3.2, p = 0.002, and somatic complaints, t(51) = −2.37, p = 0.02 (online Supplemental Figs S1a–c). CAPS total symptom severity scores among the traumatized youth were similarly negatively related to softmax βs, t(25) = −2.54, p = 0.018. Neither trauma characteristics nor clinical variables were related to positive or negative learning rates (ps > 0.3).
Multivariate representations for reward at the time of choice and coupling with reward expectations
Leave-one-out cross-validation accuracy for reward outcomes was above chance for all ICA networks (Fig. 3b), demonstrating that reward (v. loss) outcomes in a left-out participant could accurately be decoded from the other participants' patterns of voxel activity. We also observed that classifier cross-validation accuracy was not correlated with trauma characteristics (ps > 0.31 for assault group, ps > 0.47 for CTQ total score), internalizing symptoms (ps > 0.19), or PTSD symptom severity (ps > 0.6), suggesting that decoded reward representations were equally accurate regardless of trauma or clinical symptoms.
SVM classifiers were then applied to left-out participants' voxel patterns at the time of choice, resulting in trial-by-trial predictions about the degree to which reward representations were active while the participant contemplated which arm of the task to select. LMEMs tested the degree to which these trial-by-trial prospective reward representations were coupled with trial-by-trial reward expectations (i.e. V) derived from the computational model fit to participants' observed behavior. These models demonstrated that prospective reward representations in each of the tested networks were strongly coupled with expected reward for the chosen arm (Fig. 3c).
We next tested whether this coupling between prospective reward representations and expected reward varied as a function of behavioral strategies on the task. LMEMs demonstrated that coupling between reward representations and expected reward was positively associated with softmax βs in the salience, t(4690) = 3.22, p = 0.001, medial PFC, t(4690) = 3.88, p < 0.001, anterior insula, t(4690) = 3.41, p < 0.001, and striatum networks, t(4690) = 3.39, p < 0.001 (Fig. 3d), such that individuals who generated greater prospective reward representations in proportion to the expected reward probabilities of the chosen arm also demonstrated behavioral strategies favoring the selection of high value arms.
Associations among clinical characteristics and coupling between reward representations and expected reward
LMEMs demonstrated that greater CBCL internalizing symptoms were associated with de-coupling of reward expectations for a chosen arm and activation of prospective reward representations in the striatum network, t(4847) = −3.66, p < 0.001 (Fig. 4a). Additional models decomposing CBCL internalizing symptoms demonstrated similar relationships with depression, t(4847) = 3.94, p = 0.001, anxiety, t(4847) = 3.07, p = 0.002, and somatic complaints, t(4847) = −2.01, p = 0.04. Neither trauma characteristics (all p > 0.42 for CTQ total score; all p > 0.06 for assault group comparisons) nor PTSD symptom severity among the assaulted adolescents (all p > 0.048) were associated with coupling of prospective reward representations and reward expectations in any network when controlling for multiple comparisons. While these models controlled for overall trauma severity (CTQ total score), we conducted an additional post-hoc analysis to differentiate associations with assault exposure (i.e. the variable used for inclusion into the study) and internalizing symptoms (see Fig. 4b and 4c).
As an additional test of specificity, we demonstrated that internalizing symptoms, but not externalizing symptoms, were related to altered coupling of reward representations in the striatum (see online Supplemental material).
Prospective reward representations mediate the association between internalizing symptoms and behavioral strategies during learning
We statistically tested whether coupling between prospective reward representations and reward expectation in the striatum mediated the association between internalizing symptoms and softmax βs (Fig. 5a). We observed a significant indirect effect of internalizing symptoms through prospective reward representations in the striatum when tested through bootstrapping with 50 000 iterations (p = 0.014, ab path B = −0.36, 95% CI −0.76 to −0.055) (Fig. 5b). Decomposing internalizing symptoms, the indirect pathway was also significant for depression symptoms (p = 0.013, ab path B = −0.51, 95% CI −1.07 to −0.085) and anxiety symptoms (p = 0.014, ab path B = −0.39, 95% CI −0.85 to −0.055), but not somatic complaints (p = 0.067, ab path B = −0.26, 95% CI −0.66 to 0.013) (online Supplemental Figs S1d–f).
Ruling out site differences as a confound
While we explicitly modeled site as a random factor in all analyses, we conducted additional analyses stratifying by site. As indicated in online Supplemental Figs S2a–c, effects were comparable at both sites, and interaction terms testing differences in effects between sites were all non-significant (ps > 0.19).
Discussion
We observed that internalizing symptoms among youth, but not child maltreatment or assault exposure, were related to a particular behavioral strategy during the task. Whereas youth with lower internalizing symptoms favored selecting task arms with higher expected value, youth with higher internalizing symptoms had less preference for selecting arms with higher expected value and instead demonstrated greater stochasticity in their choices. While softmax βs are linked with the well-known exploration/exploitation tradeoff, recent work on choice models during decision-making differentiates between directed and random exploration (Schulz & Gershman, 2019; Wilson et al., 2021). The former is exploration to obtain valuable information, whereas the latter reflects random noise in the decision-making process and is more akin to behavior captured by lower softmax βs. As such, the behavioral strategy observed among youth with higher internalizing symptoms appears less driven by expected reward probabilities and instead reflects underlying stochasticity in response selection.
To probe the mechanisms of this decision-making process and its relationship to reward expectations, we tested whether prospective representations of reward at the time of choice were coupled with expectations of reward. Consistent with hypotheses and the growing literature demonstrating a role for prospective memory representations as a fundamental mechanism of decision-making (Biderman et al., 2020; Doll et al., 2015; Gillespie et al., 2021; Moughrabi et al., 2022; Schacter et al., 2017), we observed significant coupling between reward expectations and magnitude of prospective reward representations. Our observation that multiple networks demonstrated significant coupling highlights a distributed network for reward encoding and is analogous to recent observations of the distributed, rather than localized, networks that encode subjective fear (Zhou et al., 2021). Further, coupling in the salience, medial PFC, anterior insula, and striatum networks was strongly associated with behavioral strategies characterized by favoring the selection of arms with higher expected value. That is, youth who favored choosing high reward arms also generated greater prospective representations of reward towards high reward arms.
Interest in understanding the mechanisms underlying noise in decision-making has recently increased (Collins & Shenhav, 2022; Schulz & Gershman, 2019; Wilson et al., 2021), and the current data, though correlational, support prospective representations of reward as a mechanism supporting a behavioral strategy characterized by favoring choices with higher expected value.
Next, we demonstrated that internalizing symptoms, but not assault exposure or maltreatment characteristics, were associated with less coupling between reward expectations and prospective representations of reward in the striatum network. Further, a statistical mediation model supported decreased coupling between reward expectations and prospective representations of reward as a mechanism mediating the association between internalizing symptoms and softmax βs. In this hypothesized model, the probability of reward for a given action does not engage a prospective representation for reward in the striatum among youth with internalizing symptoms. Consequently, youth with internalizing symptoms make decisions that are less governed by the likelihood of reward. These altered mechanisms of decision-making may help explain real-world behavior among youth with internalizing symptoms. For example, youth with depression symptoms may be biased to behaviorally withdraw and avoid ostensibly rewarding activities (e.g. social activities, going to school, extracurricular activities) due to a lack of generation of prospective mental representations of possible rewarding/meaningful occurrences during those activities.
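The logic of the mediation model can be illustrated with a minimal product-of-coefficients sketch in Python. This is an assumption-laden illustration rather than the estimation procedure used in the study, which is not detailed in this section: it uses ordinary least squares without covariates or bootstrapped confidence intervals, and the function names and the toy values are hypothetical placeholders, not study data.

```python
import numpy as np


def ols_slopes(y, *predictors):
    """Least-squares slopes of y on the predictors (intercept included, not returned)."""
    X = np.column_stack([np.ones(len(y)), *predictors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[1:]


def mediation_effects(x, m, y):
    """Return (indirect, direct, total) effects for the path x -> m -> y.

    x: predictor (e.g. internalizing symptoms)
    m: mediator (e.g. striatal coupling)
    y: outcome (e.g. softmax beta)
    """
    a = ols_slopes(m, x)[0]             # path a: x -> m
    c_prime, b = ols_slopes(y, x, m)    # direct effect c' and path b: m -> y
    total = ols_slopes(y, x)[0]         # total effect c
    return a * b, c_prime, total


# Synthetic placeholder values (illustrative only, not study data).
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
m = -0.5 * x + np.array([1.0, -1.0, 1.0, -1.0, 1.0, -1.0])
y = 2.0 * m + 0.3 * x

indirect, direct, total = mediation_effects(x, m, y)
```

In this sketch the indirect effect is the product of the x-to-m slope and the m-to-y slope, and for ordinary least squares the total effect equals the sum of the direct and indirect effects.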
The observation that internalizing symptoms, but not early life trauma (itself a robust risk factor for internalizing symptoms), were related to the brain and behavioral alterations suggests these novel deficits in prospection are more strongly linked with the expression of psychopathology than with risk for psychopathology. While prior research and theory suggest a link between childhood trauma and altered reward learning (Blair et al., 2022; Hanson et al., 2015; McLaughlin & Sheridan, 2016), it is not readily discernible why this link was not detected in the current study. It could be that prospective representations in the striatum are uniquely related to internalizing symptoms, whereas outcome processing of rewards is more linked with early life trauma (Cisler et al., 2019; Letkiewicz et al., 2022). Future research with larger sample sizes is necessary to continue to differentiate the unique impacts of trauma v. psychopathology on the various facets of reward learning and decision-making.
To our knowledge, this is the first demonstration of prospective multivariate representations of reward in the striatum as a possible mechanism of altered decision-making among youth with internalizing symptoms. Nonetheless, these data are fully consistent with related prior work demonstrating altered striatal activation during the anticipation and receipt of reward among youth with internalizing symptoms (Auerbach et al., 2022; Stringaris et al., 2015), youth with behavioral inhibition (Guyer et al., 2014), and adults with mood and anxiety disorders (Cooper, Arulpragasam, & Treadway, 2018), and provide further support for emerging models emphasizing the role of altered decision-making for reward as a mechanism of psychopathology following trauma (Cisler & Herringa, 2021; Fonzo, 2018; McLaughlin et al., 2019; McLaughlin, Colich, Rodman, & Weissman, 2020). While we observed associations between internalizing symptoms and prospective reward representations in the striatum, it will be important to investigate additional brain regions and networks associated with episodic future thinking and reward [e.g. medial PFC, hippocampus (Peters & Büchel, 2010; Schacter et al., 2017)] and to link these mechanisms with treatment response (Berwian et al., 2020; Webb, Murray, Tierney, Forbes, & Pizzagalli, 2022).
The current study is not without limitations. The sample was limited to adolescent girls, and generalization to males and adults needs to be established. We used a relatively simple three-arm bandit task of social reward learning with binary outcomes, and the degree to which the results generalize to more complex tasks [e.g. the two-stage Markov task (Daw, Gershman, Seymour, Dayan, & Dolan, 2011)] needs to be tested. Our sample was recruited based on the presence of assault exposure, and while this resulted in natural variation in the degree of internalizing symptoms in the current sample, testing among explicitly defined groups of youth with anxiety and depressive disorders is needed. Further, the effects we observed were limited to caregiver-report, and future studies should seek to extend these effects to additional modes of assessment.
Supplementary material
The supplementary material for this article can be found at https://doi.org/10.1017/S0033291723000478
Financial support
This work was supported by MH119132, MH108753, MH10680.