
Diminished prospective mental representations of reward mediate reward learning strategies among youth with internalizing symptoms

Published online by Cambridge University Press:  07 March 2023

Josh M. Cisler*
Affiliation:
Department of Psychiatry and Behavioral Sciences, Dell Medical School, University of Texas at Austin, USA Institute for Early Life Adversity Research, Dell Medical School, University of Texas at Austin, USA
Amanda J. F. Tamman
Affiliation:
Menninger Department of Psychiatry and Behavioral Sciences, Baylor College of Medicine, Houston, TX, USA
Greg A. Fonzo
Affiliation:
Department of Psychiatry and Behavioral Sciences, Dell Medical School, University of Texas at Austin, USA Institute for Early Life Adversity Research, Dell Medical School, University of Texas at Austin, USA Center for Psychedelic Research and Therapy, Dell Medical School, University of Texas at Austin, USA
*Author for correspondence: Josh M. Cisler, E-mail: [email protected]

Abstract

Background

Adolescent internalizing symptoms and trauma exposure have been linked with altered reward learning processes and decreased ventral striatal responses to rewarding cues. Recent computational work on decision-making highlights an important role for prospective representations of the imagined outcomes of different choices. This study tested whether internalizing symptoms and trauma exposure among youth impact the generation of prospective reward representations during decision-making and potentially mediate altered behavioral strategies during reward learning.

Methods

Sixty-one adolescent females with varying exposure to interpersonal violence (n = 31 with histories of physical or sexual assault) and varying severity of internalizing symptoms completed a social reward learning task during fMRI. Multivariate pattern analyses (MVPA) were used to decode neural reward representations at the time of choice.

Results

MVPA demonstrated that rewarding outcomes could accurately be decoded within several large-scale distributed networks (e.g. frontoparietal and striatum networks), that these reward representations were reactivated prospectively at the time of choice in proportion to the expected probability of receiving reward, and that youth with behavioral strategies that favored exploiting high reward options demonstrated greater prospective generation of reward representations. Youth internalizing symptoms, but not trauma exposure characteristics, were negatively associated with both the behavioral strategy of exploiting high reward options as well as the prospective generation of reward representations in the striatum.

Conclusions

These data suggest diminished prospective mental simulation of reward as a mechanism of altered reward learning strategies among youth with internalizing symptoms.

Type
Original Article
Creative Commons
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.
Copyright
Copyright © The Author(s), 2023. Published by Cambridge University Press

Introduction

Early life trauma, including emotional, physical, and sexual abuse, is a well-established risk factor for multiple forms of impaired mental and physical well-being (Dube et al., 2001, 2003; Felitti et al., 1998). While the impact of early life trauma on behavioral and neurophysiological systems related to stress and threat responding is a primary mechanism explaining conferred risk for psychopathology (McEwen, 2004; McLaughlin, Sheridan, Humphreys, Belsky, & Ellis, 2021; Nemeroff, 2016), there has been growing recognition of, and interest in, the role of systems related to decision-making and reward learning as additional and non-mutually exclusive pathways to psychopathology (Fonzo, 2018; Hanson, Williams, Bangasser, & Peña, 2021; McLaughlin, DeCross, Jovanovic, & Tottenham, 2019). Further elaboration of specific mechanistic pathways will hopefully continue to inform development of prevention and intervention modalities.

Several lines of research support a mechanistic pathway of altered reward learning and decision-making mediating the relationship between early life trauma and psychopathology, particularly internalizing symptoms. Youth exposed to early life trauma learn reward contingencies more slowly and have decreased activation of striatum and dorsal anterior cingulate during reward learning tasks (Cisler et al., 2019; Gerin et al., 2017; Hanson, Hariri, & Williamson, 2015; Harms, Shannon Bowen, Hanson, & Pollak, 2018; Lenow, Scott Steele, Smitherman, Kilts, & Cisler, 2014). Similarly, youth with internalizing disorders demonstrate decreased striatal responses during the receipt and anticipation of reward (Auerbach, Admon, & Pizzagalli, 2014; Keren et al., 2018; Rappaport, Kandala, Luby, & Barch, 2020), consistent with altered neural reward responsiveness as a mechanism of observed clinical symptoms (e.g. anhedonia, avoidance of potentially rewarding activities, etc.). Indeed, prospective studies demonstrate that decreased striatal reactivity to rewards predicts development of future internalizing symptoms among youth (Hanson et al., 2015; Stringaris et al., 2015).
While decreased striatal responses to reward are more consistently observed among depressed youth (Tang et al., 2022), reduced striatal activation to reward has also been observed in large samples of youth with anxiety disorders (Auerbach et al., 2022), and altered striatal response to reward also predicts anxiety symptom reduction during treatment among youth with anxiety disorders (Sequeira et al., 2021), possibly by enabling greater engagement with therapy.

The role of prospective episodic memory and mental simulation represents an emerging area of interest in the study of reward learning and decision-making (Biderman, Bakkour, & Shohamy, 2020; Dasgupta & Gershman, 2021; Mattar & Lengyel, 2022; Schacter, Benoit, & Szpunar, 2017; Sosa & Giocomo, 2021), though these processes have never been examined among at-risk youth. Numerous lines of research using animal and human models demonstrate that neural patterns associated with memory representations for the possible outcomes of a choice are activated at the time of choice as a form of mental simulation of future events (i.e. neural ‘preplay’) (Biderman et al., 2020; Doll, Duncan, Simon, Shohamy, & Daw, 2015; Schacter et al., 2017; Shadlen & Shohamy, 2016; Sosa & Giocomo, 2021; Widloski & Foster, 2022; Wikenheiser & Redish, 2015; Yu & Frank, 2015; Zielinski, Tang, & Jadhav, 2020). For example, memory representations for an aversive outcome become active prior to selecting amongst choices where an aversive outcome is possible, and the magnitude of these representations predicts subsequent choices to avoid the expected aversive outcome (Castegnetti et al., 2020; Moughrabi et al., 2022).
One emerging model explaining these phenomena posits that reactivation of memory representations reflects a prospective planning process, whereby the learner imagines possible outcomes for different branches of a decision tree and uses these imagined outcomes to inform selection of an appropriate response given the current context and goals (Biderman et al., 2020; Doll et al., 2015; Schacter et al., 2017). Further, experimental studies suggest that engaging imagined future rewarding outcomes increases reward-related neural activity in the medial prefrontal cortex (Peters & Büchel, 2010). Note that mental simulation of imagined outcomes as a mechanism of reward decision-making is a separate, though likely related, process from reward anticipation.

Testing the hypothesis of altered reactivation of reward representations at the time of choice among at-risk youth has the potential to extend and complement prior work suggesting altered striatal and salience network activity during the anticipation and receipt of reward outcomes (Auerbach et al., 2022; Birn, Roeber, & Pollak, 2017; Cisler et al., 2019; Harms et al., 2018; Lenow et al., 2014). Indeed, understanding processes at the time of choice during laboratory tasks may help explain clinical behavior in this population, such as choices to behaviorally withdraw and/or avoid activities. For example, decreased mental simulation of reward might help explain behavioral withdrawal, such that youth who cannot engage a mental simulation of a rewarding outcome see little reason to exert effort to engage in the behavior. In the context of laboratory reinforcement learning tasks (e.g. bandit tasks), response selection is a separate, though related, process from response valuation. One concept related to selecting responses with varying degrees of expected value is the exploration-exploitation tradeoff (Daw, O'Doherty, Dayan, Seymour, & Dolan, 2006; Schulz & Gershman, 2019; Wilson, Bonawitz, Costa, & Ebitz, 2021). Exploitation broadly refers to a strategy that favors selecting responses that have a high expectation of value; exploration broadly refers to a strategy favoring a wider sampling of available responses.
Exploration has been differentiated into random exploration and information-directed exploration (Schulz & Gershman, 2019; Wilson et al., 2021). The latter refers to a strategy of sampling amongst available choices for the explicit purpose of gaining information about those choices. The former refers to an ostensibly stochastic process underlying response selection, such that choice is uncoupled from both the choice's expected outcome probability and the value of gaining information about the environment by selecting that choice. Whereas younger children tend to show random exploration, adolescents show increasingly structured information-directed exploration (Meder, Wu, Schulz, & Ruggeri, 2021; Somerville et al., 2017). In the context of prospective memory representations for reward and mental simulation as a mechanism for decision-making, it is plausible that individual differences in random exploration are explained by individual differences in mental simulation for reward. For example, youth exposed to trauma and/or with internalizing symptoms with limited access to reward memory exemplars might be expected to make ostensibly stochastic decisions for reasons other than expected value due to their difficulty generating prospective reward representations.

No prior research has tested this hypothesis about prospective memory representations, with only limited and inconsistent prior computationally-driven behavioral investigations of choice strategies during reward learning among youth with trauma exposure and/or internalizing symptoms (Cisler et al., 2019; Harms et al., 2018; Humphreys et al., 2015; Sheridan et al., 2018). Some studies using foraging tasks suggest increased exploitation among adults with significant histories of early life adversity (Lenow, Constantino, Daw, & Phelps, 2017; Lloyd, McKay, & Furl, 2022). A large sample of previously institutionalized youth demonstrated greater exploitation compared to typically developing youth on a risky decision-making task (Humphreys et al., 2015), though this task may better reflect risk-taking (Humphreys, Lee, & Tottenham, 2013; Lejuez et al., 2002) than exploration. By contrast, one small prior study using a three-arm bandit task found increased choice stochasticity during social decision-making among assaulted adolescent girls (Lenow, Cisler, & Bush, 2015), and a larger study of youth with mixed histories of assault and clinical symptoms completing a similar task did not identify significant relationships between trauma exposure variables and exploration / exploitation strategies (Cisler et al., 2019).
Among adults, a meta-analysis identified decreased reward sensitivity among depressed individuals (Huys, Pizzagalli, Bogdan, & Dayan, 2013), though as the authors note, their reward sensitivity parameter was mathematically interchangeable with an exploitation parameter, consistent with other research among depressed adults (Blanco, Otto, Maddox, Beevers, & Love, 2013; Dubois & Hauser, 2022). Accordingly, further investigation into choice selection strategies and their neurocircuitry mechanisms among youth exposed to trauma and/or with internalizing symptoms is necessary.

Here, we aim to investigate aberrant generation of prospective memory representations for reward and their relationships with reward learning strategies as well as trauma exposure and internalizing symptoms among youth.

Methods

Sixty-one adolescent girls, age 11–17, participated in the study at two different sites: Little Rock, AR and the surrounding area (n = 26 participants; n = 13 exposed to assault), and Madison, WI and the surrounding area (n = 35 participants; n = 18 exposed to assault). Participants were recruited through community-wide advertising, social media postings, and outpatient mental health clinic referrals. Healthy controls were recruited based on absence of current mental health disorders, trauma exposure, and psychiatric treatment histories. Inclusion criteria for the assaulted group consisted of a history of directly experienced physical or sexual assault that the participant could remember. Exclusion criteria for all participants included histories of psychotic symptoms, developmental disorders, major medical disorders, MRI contraindications, pregnancy, and history of loss of consciousness greater than 10 min. Psychotropic medication was not exclusionary for the assaulted adolescents; however, a stable dose on any medication for at least 4 weeks was required. Table 1 presents clinical and demographic characteristics. Imaging data were excluded for one participant, an assaulted girl, due to excessive head motion, and imaging data were unusable from two participants, both controls, due to technical error during scanning. The imaging analyses therefore included 58 participants, and all participants' data were used in behavioral analyses. All study procedures were approved by the local IRB committees.

Table 1. Clinical and demographic characteristics of the participants

Note. IQ was assessed from the Receptive One-Word Picture Vocabulary Test. CTQ, Childhood Trauma Questionnaire; UCLA PTSD RI, UCLA PTSD Reaction Index; CAPS, Clinician Administered PTSD Scale; CBCL, Child Behavior Checklist; CBCL values represent raw values; DERS, Difficulties in Emotion Regulation Scale. Psychopathology was assessed using the Mini-International Neuropsychiatric Interview for Children and Adolescents (MINI Kid). Bolded values represent a statistical difference, two-tailed (p < 0.05).

Portions of these data pertaining to the impact of trauma characteristics on outcome processing (i.e. prediction error encoding and latent state belief updating) have previously been published (Cisler et al., 2019; Letkiewicz, Cochran, Privratsky, James, & Cisler, 2022). The present analysis is a novel investigation of multivariate representations at the time of choice as a function of trauma exposure characteristics and internalizing symptoms.

Assessments

Internalizing symptoms were assessed with the caregiver-rated Child Behavior Checklist (CBCL; Achenbach, 1991), consisting of the sum of anxiety, depression, and somatic concern subscales. The Clinician Administered PTSD Scale, Child and Adolescent Version (CAPS) (Nader, Blake, Pynoos, Newman, & Weathers, n.d.), was used to assess PTSD symptoms, and PTSD diagnoses followed definitions established by prior studies among youth (Cohen, Deblinger, Mannarino, & Steer, 2004). The Mini-International Neuropsychiatric Interview for Children and Adolescents (MINI-KID) (Sheehan et al., 2010) assessed for current and lifetime comorbid mental health disorders. Assault exposure histories were defined using the trauma assessment section of the National Survey of Adolescents (NSA) (Kilpatrick et al., 2003). Participants also completed the Childhood Trauma Questionnaire (Bernstein et al., 1994), providing a continuous measure of the total severity of early life maltreatment and trauma across the domains of emotional abuse, physical abuse, sexual abuse, emotional neglect, and physical neglect. We also assessed participants' verbal IQ (Brownell, 2000).

MRI acquisition and image preprocessing

See online Supplemental material.

Reinforcement learning task

Participants completed a three-arm bandit task using social stimuli (Fig. 1a) in a counterbalanced order. Participants were directed to give $10 to one of three mock people who returned either $20 or $0. The probabilities of positive returns varied by arm, either 80, 50, or 20%. Probabilities changed across the mock people every 30 trials, for a total of 90 trials. The same faces were used for all trials. Participants were informed that their compensation would be proportional to task performance. Additional information is provided in online Supplemental material and Fig. 1 legend.

Fig. 1. (a) Depiction of the social reward three-arm bandit task. Participants completed 90 trials. Trials began with presentation of three faces and participants chose one face in which to invest $10. The choice phase lasted until participants made a selection, which was then indicated with a blue box around it for 1s. An anticipation phase followed while they waited for the outcome of the choice, which consisted of a jittered fixation cross for 1.5–3s. The outcome phase was subsequently displayed and consisted of binary return of either $20 (net increase of $10) or no return (net loss of $10). The outcome phase presented the outcome of the trial (win or loss) for 2s, updated the points total for 1s, followed by a jittered fixation cross of 1.5–3s prior to starting the next trial. (b) Depiction of the MVPA pipeline. For each ICA network separately, trial × voxel matrices of beta coefficients for reward outcomes during the task are created for all participants except one left-out participant. Support vector machine classifiers are then trained on these data, resulting in a decoder for reward outcomes. Next, this reward decoder is applied to the trial × voxel matrix of beta coefficients at the time of choice for the participant who was left out of the training. This results in a prediction about the degree to which the reward representations are active at the time of choice, which can be compared to the magnitude of reward the participant was expecting for that given choice. This process is repeated until each participant has served as the left-out test participant.

Modeling Reinforcement Learning. Behavior during the reinforcement learning (RL) task was modeled using versions of the Rescorla-Wagner (RW) model (Sutton & Barto, 1998). Consistent with prior research (Hauser, Iannaccone, Walitza, Brandeis, & Brem, 2015; Ross, Lenow, Kilts, & Cisler, 2018), four different RW-based models were tested, which manipulated whether the model updated the expected value of the unchosen option (Hauser et al., 2015) and whether the model was risk-sensitive (Niv, Edlund, Dayan, & O'Doherty, 2012). Expected reward values for each arm were transformed into choice probabilities using a softmax function, providing individually varying βs that reflect the degree to which an individual's choices are driven by reward expectations. Model fitting was conducted using hierarchical Bayesian inference (Piray, Dezfouli, Heskes, Frank, & Daw, 2019). See online Supplemental material for additional information.
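For readers less familiar with this model family, the following is a minimal illustrative sketch of a Rescorla-Wagner value update combined with a softmax choice rule. It is not the authors' fitted model: the function names, the coding of reward as 1 (return) or 0 (no return), the starting values of 0.5, and the parameter values are all assumptions made for illustration. Separate learning rates for positive and negative prediction errors are used here as one common way to express risk sensitivity, and the update of the unchosen options is omitted because its details are given only in the Supplemental material.

```python
import math


def softmax_probs(values, beta):
    """Turn expected values into choice probabilities.

    Larger beta makes choices track expected value more closely
    (exploitation); smaller beta makes choices more stochastic.
    """
    m = max(values)  # subtract the max for numerical stability
    exps = [math.exp(beta * (v - m)) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def rw_update(values, choice, reward, alpha_pos, alpha_neg):
    """Apply one Rescorla-Wagner update to the chosen arm only.

    alpha_pos / alpha_neg are hypothetical learning rates for positive
    and negative prediction errors, respectively.
    """
    delta = reward - values[choice]  # prediction error
    alpha = alpha_pos if delta >= 0 else alpha_neg
    updated = list(values)
    updated[choice] += alpha * delta
    return updated, delta


# Illustrative single trial: arm 0 is chosen and returns a reward.
values = [0.5, 0.5, 0.5]  # assumed starting expectations
values, delta = rw_update(values, choice=0, reward=1.0,
                          alpha_pos=0.3, alpha_neg=0.3)
probs_low_beta = softmax_probs(values, beta=1.0)
probs_high_beta = softmax_probs(values, beta=10.0)
```

In this sketch, the same expected values yield a stronger preference for the rewarded arm under the larger β, which is the sense in which lower softmax βs are read as greater response stochasticity.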

Independent Component Analysis. An Independent Component Analysis (ICA; Calhoun, Adali, Pearlson, & Pekar, 2001) with a model order of 35 components was conducted on the full voxelwise fMRI timecourses. This model order delivered a good balance between component reliability estimated across 50 ICASSO iterations and interpretability of canonical networks. Eight of the 35 components were deemed functional networks of interest after visual inspection (see Fig. 3a below). Components arising from artifacts of head motion or CSF were excluded, as were components of non-interest (i.e. motor, sensorimotor, and visual networks), which are not hypothesized to be relevant for understanding trauma, internalizing symptoms, reward learning, or PTSD (Auerbach et al., 2022; Patel, Spreng, Shin, & Girard, 2012).

Multivariate pattern analyses of prospective mental representations during choice

Figure 1b provides an overview of the analytical approach, which is in direct accord with our previous MVPA investigation of prospective representations of reward and threat as a mechanism of decision-making (Moughrabi et al., 2022). The first step was to demonstrate that network activity patterns at the time of reward delivery could accurately be decoded. Each participant's trial-by-trial activation patterns at the time of reward delivery were characterized using 3dLSS. The timepoint × voxel matrices were centered within each timepoint to ensure no differences in overall activation across trials. Support vector machines (SVM), using a radial basis function kernel implemented in Matlab through libsvm (Chang & Lin, 2011), were used to decode reward outcomes (binary classification). We established the accuracy of the decoders using leave-one-out cross-validation across subjects (i.e. one subject was designated as the left-out test subject, decoders were trained on the remaining subjects [i.e. N−1 sample size], and the decoder was then tested on the independent left-out subject's data). This process was repeated until all subjects served as the left-out test subject. The reward decoder accuracy was defined as the mean of sensitivity and specificity.
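The cross-validation scheme and accuracy metric described above can be sketched as follows. This is an illustrative outline only: it shows how trials might be split by subject and how the mean of sensitivity and specificity is computed, and it deliberately leaves out the SVM itself (the study used libsvm in Matlab). The function names and the toy labels are assumptions.

```python
def leave_one_subject_out(subject_ids):
    """Yield (held_out, train_idx, test_idx) once per unique subject.

    subject_ids holds one entry per trial, naming the subject that
    trial belongs to. All trials of the held-out subject form the
    test set; every other trial forms the training set.
    """
    for held_out in sorted(set(subject_ids)):
        train_idx = [i for i, s in enumerate(subject_ids) if s != held_out]
        test_idx = [i for i, s in enumerate(subject_ids) if s == held_out]
        yield held_out, train_idx, test_idx


def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity and specificity for binary labels (1 = reward)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in y_true")
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2


# Toy example: five trials from three hypothetical subjects.
subject_ids = ["s1", "s1", "s2", "s3", "s3"]
folds = list(leave_one_subject_out(subject_ids))

# Toy predictions for one held-out subject: 3 reward trials, 1 loss trial.
score = balanced_accuracy([1, 1, 1, 0], [1, 1, 0, 0])
```

Averaging sensitivity and specificity keeps chance performance at 0.5 even when a subject has unequal numbers of reward and loss trials, which is one plausible reason for preferring it to raw accuracy here.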

After testing accuracy of the reward decoders, the next step was to apply the reward decoders to participants' data at the time of choice. 3dLSS was used to define trial-by-trial activation at the time of choice. A leave-one-out approach was used, such that a subject was designated as the left-out test subject, the reward decoders were trained on all remaining participants' reward outcome data, and the resulting reward decoders were applied to the left-out participant's choice data. This process was repeated for each subject. This resulted in hyperplane distances representing the degree to which the trained multivariate patterns (reward outcomes) were active at the time of choice. This process was repeated separately for each ICA network of interest, resulting in unique predictions (i.e. hyperplane distances) about reward representation activation for each separate network.

Our primary interest was investigating coupling between prospective reward representation at the time of choice and the expected reward value, derived from the computational model, of the chosen arm. That is, the degree to which a youth is expecting reward for a given choice should be related to the degree of activation of prospective reward representations at the time of that choice. To test this hypothesis, we conducted linear mixed effects models (LMEMs), in which trial-by-trial reward expectations (V of the chosen arm from the fitted computational model) were regressed onto the trial-by-trial hyperplane distances. We stringently controlled for multiple comparisons across the 8 ICA networks with Bonferroni correction, resulting in a corrected alpha of 0.0063. These models included covariates for age, IQ, and head motion. We included an additional covariate for each subject's cross-validation reward decoding accuracy (Greene et al., 2022). Main results without these covariates, which remain essentially unchanged, are included in the online Supplemental material. We modeled subject and site as random effects in all models, with subject nested within site.
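As a worked check on the corrected threshold, and assuming a conventional uncorrected alpha of 0.05 (the value used for the two-tailed tests in Table 1, though not stated explicitly for these models), the Bonferroni correction across m = 8 networks is:

```latex
\alpha_{\text{corrected}} = \frac{\alpha}{m} = \frac{0.05}{8} = 0.00625 \approx 0.0063
```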

LMEMs then tested whether individual differences moderated the coupling between prospective reward representations (hyperplane distances) and expected reward, using identical models and including interaction terms with the individual difference variable. We first investigated associations of trauma exposure (continuous measure of log transformed CTQ total score or dichotomous assault exposure in separate LMEMs) with coupling of reward representations with expected reward. Subsequent models then retained trauma exposure severity (log transformed CTQ total score) as a covariate and tested CBCL internalizing symptoms, PTSD symptoms, and the constituent CBCL internalizing scales of depression, anxiety, and somatic complaints. While the study recruited controls and assaulted participants as separate groups, given the continuous distributions of CTQ total scores and internalizing symptoms (online Supplemental Fig. S1), we opted to use these continuous variables among the entire sample to conserve statistical power. Bonferroni correction again controlled for family-wise multiple comparisons. Mediation analyses tested the significance of hypothesized indirect effects through bootstrapping with replacement using 50 000 iterations following contemporary recommendations for mediation analyses (Hayes & Rockwood, 2017).
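The bootstrapped test of an indirect effect can be illustrated with a minimal sketch. This is not the authors' analysis code: it uses a simple single-mediator model without covariates, a percentile confidence interval, synthetic data, and far fewer resamples than the 50 000 reported above. The variable roles (X as a symptom score, M as a neural coupling measure, Y as softmax β) and all names and parameter values are assumptions for illustration.

```python
import random


def _mean(xs):
    return sum(xs) / len(xs)


def _cov(xs, ys):
    mx, my = _mean(xs), _mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)


def indirect_effect(x, m, y):
    """Return a*b for a single-mediator model.

    a: slope from regressing M on X.
    b: partial slope of M from regressing Y on both M and X.
    """
    sxx, smm, sxm = _cov(x, x), _cov(m, m), _cov(x, m)
    sxy, smy = _cov(x, y), _cov(m, y)
    a = sxm / sxx
    b = (sxx * smy - sxm * sxy) / (sxx * smm - sxm ** 2)
    return a * b


def bootstrap_ci(x, m, y, n_boot=2000, level=0.95, seed=0):
    """Percentile bootstrap CI for the indirect effect (resampling cases)."""
    rng = random.Random(seed)
    n = len(x)
    estimates = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        estimates.append(indirect_effect([x[i] for i in idx],
                                         [m[i] for i in idx],
                                         [y[i] for i in idx]))
    estimates.sort()
    lower = estimates[int((1 - level) / 2 * n_boot)]
    upper = estimates[int((1 + level) / 2 * n_boot) - 1]
    return lower, upper


# Synthetic data with a built-in indirect path (X -> M -> Y).
data_rng = random.Random(1)
x = [data_rng.gauss(0, 1) for _ in range(60)]
m = [0.8 * xi + data_rng.gauss(0, 0.5) for xi in x]
y = [0.7 * mi + data_rng.gauss(0, 0.5) for mi in m]

point_estimate = indirect_effect(x, m, y)
ci_lower, ci_upper = bootstrap_ci(x, m, y)
```

Under this approach, an indirect effect is typically treated as significant when the bootstrap confidence interval excludes zero.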

Results

Relationship between learning parameters and clinical characteristics

We first investigated relationships between clinical variables and softmax βs from the best fitting model (Fig. 2a). Regression models, conducted separately for CTQ total scores and dichotomous control v. assault group comparisons, demonstrated no significant relationships between softmax βs and either CTQ total scores, p = 0.76 (Fig. 2b), or dichotomous control v. assaulted group status, p = 0.58. When controlling for CTQ total scores, identical models demonstrated that CBCL internalizing symptoms were significantly related to softmax βs, t(51) = −3.15, p = 0.003 (Fig. 2c), demonstrating decreased choice preference for high reward options and greater response stochasticity. Decomposing internalizing symptoms in separate models demonstrated similar relationships with depression symptoms, t(51) = −2.70, p = 0.009, anxiety, t(51) = −3.2, p = 0.002, and somatic complaints, t(51) = −2.37, p = 0.02 (online Supplemental Figs S1a–c). CAPS total symptom severity scores among the traumatized youth were similarly negatively related to softmax βs, t(25) = −2.54, p = 0.018. Neither trauma characteristics nor clinical variables were related to positive or negative learning rates (ps > 0.3).

Fig. 2. (a) Akaike Information Criterion values of model fit for the compared models. We tested a factorial manipulation of anticorrelated or not anticorrelated models (denoted with A+ or A−) and risk-sensitive or not risk-sensitive models (denoted with RS+ or RS−). Consistent with our past studies using Matlab's fmincon for model fitting (Cisler et al., 2019; Ross et al., 2018), our updated approach using hierarchical Bayesian inference (Piray et al., 2019) similarly demonstrated that the anticorrelated and risk-sensitive model fit the data best. (b) There were no relationships between Childhood Trauma Questionnaire total severity scores and softmax βs, representing individual differences in exploitation / exploration strategies on the task. (c) There was a significant inverse relationship between CBCL internalizing symptoms and softmax βs, suggesting decreased exploitative behavior among those with greater internalizing symptoms.

Multivariate representations for reward at the time of choice and coupling with reward expectations

Leave-one-out cross-validation accuracy for reward outcomes was above chance for all ICA networks (Fig. 3b), demonstrating that reward (v. loss) outcomes in a left-out participant could accurately be decoded from the other participants' patterns of voxel activity. We also observed that classifier cross-validation accuracy was not correlated with trauma characteristics (ps > 0.31 for assault group, ps > 0.47 for CTQ total score), internalizing symptoms (ps > 0.19), or PTSD symptom severity (ps > 0.6), suggesting that decoded reward representations were equally accurate regardless of trauma or clinical symptoms.

Fig. 3. (a) Depiction of spatial maps from the Independent Component Analysis. (b) Reward decoding performance for each ICA network. Decoding performance was defined as the mean of sensitivity and specificity in correctly classifying reward outcomes from the left-out participant using the model trained on the remaining participants' data. (c) β coefficients reflecting the degree to which value expectation, derived from the computational model, of the chosen arm on the task predicted the magnitude of MVPA-predicted reward representations (i.e. SVM hyperplane predictions) at the time of choice. All networks demonstrated significant coupling between reward expectation and magnitude of reward representations when controlling for multiple comparisons. (d) ICA networks demonstrating significant interactions between softmax βs and coupling between reward expectation and magnitude of reward representations (i.e. SVM hyperplane predictions), suggesting that those who generated greater reward representations in proportion to expected reward also tended to use behavioral strategies to exploit high reward arms.
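The decoding performance metric described for panel (b), the mean of sensitivity and specificity, can be sketched as follows. The function name `decoding_performance`, the label coding (reward = 1, loss = 0), and the example labels are assumptions made for illustration, not the study's code or data.

```python
def decoding_performance(true_labels, predicted_labels):
    """Mean of sensitivity and specificity (reward coded 1, loss coded 0)."""
    tp = tn = fp = fn = 0
    for truth, pred in zip(true_labels, predicted_labels):
        if truth == 1 and pred == 1:
            tp += 1  # reward trial classified as reward
        elif truth == 0 and pred == 0:
            tn += 1  # loss trial classified as loss
        elif truth == 0 and pred == 1:
            fp += 1  # loss trial classified as reward
        else:
            fn += 1  # reward trial classified as loss
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("Both reward and loss trials are required.")
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2

# Hypothetical outcomes for one left-out participant (illustration only).
true_outcomes = [1, 1, 1, 1, 0, 0]
decoded_outcomes = [1, 1, 1, 0, 0, 1]
performance = decoding_performance(true_outcomes, decoded_outcomes)
always_reward = decoding_performance(true_outcomes, [1] * 6)
```

One reason to prefer this metric over raw accuracy is that a classifier that always predicts the more frequent outcome still scores 0.5, so chance level does not depend on how many reward and loss trials a participant experienced.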

SVM classifiers were then applied to left-out participants' voxel patterns at the time of choice, resulting in trial-by-trial predictions about the degree to which reward representations were active while the participant contemplated which arm of the task to select. LMEMs tested the degree to which these trial-by-trial prospective reward representations were coupled with trial-by-trial reward expectations (i.e. V) derived from the computational model fit to participants' observed behavior. These models demonstrated that prospective reward representations in each of the tested networks were strongly coupled with expected reward for the chosen arm (Fig. 3c).
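The idea of coupling between trial-by-trial decoder output and model-derived reward expectation can be illustrated with a deliberately simplified sketch. The study used linear mixed-effects models; the version below instead computes a separate ordinary least squares slope for each participant, which is an assumption made only to keep the example self-contained. The participant identifiers and trial values are hypothetical.

```python
def ols_slope(x, y):
    """Ordinary least squares slope of y regressed on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x must vary across trials.")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx

def coupling_by_participant(trials):
    """Map participant id -> slope of decoder output on expected value.

    trials: dict of participant id -> list of (expected_value, decoder_output)
    pairs, one pair per trial.
    """
    return {
        pid: ols_slope([v for v, _ in rows], [d for _, d in rows])
        for pid, rows in trials.items()
    }

# Hypothetical trial-level data (illustration only).
trials = {
    "p01": [(0.2, -0.5), (0.5, 0.1), (0.8, 0.7)],  # decoder tracks expectation
    "p02": [(0.2, 0.1), (0.5, 0.1), (0.8, 0.1)],   # decoder ignores expectation
}
coupling = coupling_by_participant(trials)
```

In this toy example the first participant shows strong positive coupling and the second shows none. A mixed-effects model estimates the same kind of slope while pooling information across participants and accounting for nesting of trials within participants.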

We next tested whether this coupling between prospective reward representations and expected reward varied as a function of behavioral strategies on the task. LMEMs demonstrated that coupling between reward representations and expected reward was positively associated with softmax βs in the salience, t(4690) = 3.22, p = 0.001, medial PFC, t(4690) = 3.88, p < 0.001, anterior insula, t(4690) = 3.41, p < 0.001, and striatum networks, t(4690) = 3.39, p < 0.001 (Fig. 3d), such that individuals who generated greater prospective reward representations in proportion to the expected reward probabilities of the chosen arm also demonstrated behavioral strategies favoring the selection of high value arms.

Associations among clinical characteristics and coupling between reward representations and expected reward

LMEMs demonstrated that greater CBCL internalizing symptoms were associated with de-coupling of reward expectations for a chosen arm and activation of prospective reward representations in the striatum network, t(4847) = −3.66, p < 0.001 (Fig. 4a). Additional models decomposing CBCL internalizing symptoms demonstrated similar relationships with depression, t(4847) = 3.94, p = 0.001, anxiety, t(4847) = 3.07, p = 0.002, and somatic complaints, t(4847) = −2.01, p = 0.04. Neither trauma characteristics (all p > 0.42 for CTQ total score; all p > 0.06 for assault group comparisons) nor PTSD symptom severity among the assaulted adolescents (all p > 0.048) were associated with coupling of prospective reward representations and reward expectations in any network when controlling for multiple comparisons. Although these models controlled for overall trauma severity (CTQ total score), we conducted an additional post-hoc analysis to differentiate associations with assault exposure (i.e. the variable used for inclusion into the study) from associations with internalizing symptoms (see Figs 4b and 4c).

Fig. 4. (a) Scatter plot depicting relationship between CBCL internalizing symptoms and coupling between MVPA reward representations during choice and reward expectations. (b) Even though we controlled for CTQ total severity in our primary analyses, we conducted an additional analysis differentiating effects of assault exposure and internalizing symptoms. We used a median split to identify control adolescents with low v. high internalizing symptoms, and separately used a median split to identify assaulted adolescents with low v. high internalizing symptoms. Separating the sample in this manner allows differentiation of impacts due to assault exposure and internalizing symptoms. If coupling of prospective reward representations in the striatum were more strongly associated with assault exposure, we would expect that both assault groups would demonstrate impairment relative to both control groups, with relative homogeneity within groups. By contrast, if coupling of prospective reward representations in the striatum were more strongly associated with internalizing symptoms, we would instead expect coupling of prospective reward representations to follow the pattern of internalizing symptoms across the groups in accordance with panel B. (c) As can be seen in Fig. 4c, individual differences in coupling with prospective reward representations clearly tracked individual differences in internalizing symptoms and not assault exposure, t(51) = −3.14, p = 0.003 (regression model with group coded as follows in accordance with differences in CBCL internalizing symptoms [see panel B]: control low symptoms = 0, control high symptoms = 1; assault low symptoms = 1, assault high symptoms = 2).
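The group coding used for the post-hoc regression in panel (c) can be sketched as below. The within-group median split and the 0/1/1/2 coding follow the caption; the function name, the rule that scores strictly above the median count as "high", and the example scores are assumptions for illustration.

```python
from statistics import median

def ordinal_group_codes(assault, internalizing):
    """Code participants by assault exposure and within-group symptom level.

    control low = 0, control high = 1, assault low = 1, assault high = 2.
    'High' means strictly above the median of the participant's own group
    (tie handling is an assumption, not taken from the study).
    """
    medians = {}
    for flag in (False, True):
        scores = [s for a, s in zip(assault, internalizing) if a == flag]
        medians[flag] = median(scores)
    return [int(a) + int(s > medians[a]) for a, s in zip(assault, internalizing)]

# Hypothetical internalizing scores (illustration only).
assault = [False, False, False, False, True, True, True, True]
internalizing = [40, 45, 55, 60, 50, 58, 66, 72]
codes = ordinal_group_codes(assault, internalizing)
```

Because control high and assault low share a code of 1, the resulting regressor orders the groups by their expected symptom level rather than by exposure alone.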

As an additional test of specificity, we demonstrated that internalizing symptoms, but not externalizing symptoms, were related to altered coupling of reward representations in the striatum (see online Supplemental material).

Prospective reward representations mediate the association between internalizing symptoms and behavioral strategies during learning

We statistically tested whether coupling between prospective reward representations and reward expectation in the striatum mediated the association between internalizing symptoms and softmax βs (Fig. 5a). We observed a significant indirect effect of internalizing symptoms through prospective reward representations in the striatum when tested through bootstrapping with 50 000 iterations (p = 0.014, ab path B = −0.36, 95% CI −0.76 to −0.055) (Fig. 5b). Decomposing internalizing symptoms, the indirect effect was also significant for depression symptoms (p = 0.013, ab path B = −0.51, 95% CI −1.07 to −0.085) and anxiety symptoms (p = 0.014, ab path B = −0.39, 95% CI −0.85 to −0.055), but not somatic complaints (p = 0.067, ab path B = −0.26, 95% CI −0.66 to 0.013) (online Supplemental Figs S1d–f).

Fig. 5. (a) Graphical depiction of mediation model, where internalizing symptoms predict decreased coupling between MVPA reward representations and expectations of reward in the striatum (i.e. path a), and decreased coupling of reward representations in the striatum predicts decreased choices to exploit high reward arms on the task (i.e. path b). Path c refers to the total effect of internalizing symptoms on behavioral strategies on the task, and path c’ refers to the direct effect after accounting for the indirect effect (i.e. path ab) through MVPA reward representations. (b) The significance of the indirect effect was tested through 50 000 bootstrap iterations, which demonstrated that the 95% confidence interval does not include zero.
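A bootstrap test of an indirect effect of this kind can be sketched as follows. This is a minimal illustration under stated assumptions: it uses a percentile confidence interval, omits covariates such as CTQ total score and site, runs 2000 rather than 50 000 iterations, and operates on simulated data. The function names and all numeric values are hypothetical and do not reproduce the study's analysis.

```python
import random

def _cross(u, v):
    """Sum of centered cross-products of u and v."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    return sum((ui - mean_u) * (vi - mean_v) for ui, vi in zip(u, v))

def indirect_effect(x, m, y):
    """Product a*b: a from m ~ x, b = coefficient of m in y ~ x + m."""
    sxx, smm, sxm = _cross(x, x), _cross(m, m), _cross(x, m)
    sxy, smy = _cross(x, y), _cross(m, y)
    a = sxm / sxx
    b = (sxx * smy - sxm * sxy) / (sxx * smm - sxm ** 2)
    return a * b

def bootstrap_indirect_ci(x, m, y, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the indirect effect."""
    rng = random.Random(seed)
    n = len(x)
    estimates = []
    while len(estimates) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]  # resample participants
        try:
            estimates.append(indirect_effect(
                [x[i] for i in idx], [m[i] for i in idx], [y[i] for i in idx]))
        except ZeroDivisionError:
            continue  # skip degenerate resamples with no variance
    estimates.sort()
    lower = estimates[int((alpha / 2) * n_boot)]
    upper = estimates[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper

# Simulated data: x lowers m (path a < 0), m raises y (path b > 0).
sim = random.Random(1)
x = [sim.gauss(0, 1) for _ in range(100)]
m = [-0.8 * xi + sim.gauss(0, 0.5) for xi in x]
y = [0.8 * mi + sim.gauss(0, 0.5) for mi in m]
ci_low, ci_high = bootstrap_indirect_ci(x, m, y)
```

In the simulated data the indirect effect is negative by construction, so the resulting interval lies below zero, which is the same criterion the caption describes for panel (b).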

Ruling out site differences as a confound

Although we explicitly modeled site as a random factor in all analyses, we also conducted additional analyses stratifying by site. As indicated in online Supplemental Figs S2a–c, effects were comparable at both sites, and interaction terms testing differences in effects between sites were all non-significant (ps > 0.19).

Discussion

We observed that internalizing symptoms among youth, but not child maltreatment or assault exposure, were related to a particular behavioral strategy during the task. Whereas youth with lower internalizing symptoms favored selecting task arms with higher expected value, youth with higher internalizing symptoms had less preference for selecting arms with higher expected value and instead demonstrated greater stochasticity in their choices. While softmax βs are linked with the well-known exploration/exploitation tradeoff, recent work on choice models during decision-making differentiates between directed and random exploration (Schulz & Gershman, 2019; Wilson et al., 2021). The former is exploration to obtain valuable information, whereas the latter reflects random noise in the decision-making process and is more akin to behavior captured by lower softmax βs. As such, the behavioral strategy observed among youth with higher internalizing symptoms appears less driven by expected reward probabilities and instead reflects underlying stochasticity in response selection.

To probe the mechanisms of this decision-making process and its relationship to reward expectations, we tested whether prospective representations of reward at the time of choice were coupled with expectations of reward. Consistent with hypotheses and the growing literature demonstrating a role for prospective memory representations as a fundamental mechanism of decision-making (Biderman et al., 2020; Doll et al., 2015; Gillespie et al., 2021; Moughrabi et al., 2022; Schacter et al., 2017), we observed significant coupling between reward expectations and magnitude of prospective reward representations. Our observation that multiple networks demonstrated significant coupling highlights a distributed network for reward encoding and is analogous to recent observations of the distributed, rather than localized, networks that encode subjective fear (Zhou et al., 2021). Further, coupling in the salience, medial PFC, anterior insula, and striatum networks was strongly associated with behavioral strategies characterized by favoring the selection of arms with higher expected value. That is, youth who favored choosing high reward arms also generated greater prospective representations of reward towards high reward arms. Interest in understanding the mechanisms underlying noise in decision-making has increased recently (Collins & Shenhav, 2022; Schulz & Gershman, 2019; Wilson et al., 2021), and the current data, though correlational, support prospective representations of reward as a mechanism supporting a behavioral strategy characterized by favoring choices with higher expected value.

Next, we demonstrated that internalizing symptoms, but not assault exposure or maltreatment characteristics, were associated with less coupling between reward expectations and prospective representations of reward in the striatum network. Further, a statistical mediation model supported decreased coupling between reward expectations and prospective representations of reward as a mechanism mediating the association between internalizing symptoms and softmax βs. In this hypothesized model, the probability of reward for a given action does not engage a prospective representation for reward in the striatum among youth with internalizing symptoms. Consequently, youth with internalizing symptoms make decisions that are less governed by the likelihood of reward. These altered mechanisms of decision-making may help explain real-world behavior among youth with internalizing symptoms. For example, youth with depression symptoms may be biased to behaviorally withdraw and avoid ostensibly rewarding activities (e.g. social activities, going to school, extracurricular activities) due to a lack of generation of prospective mental representations of possible rewarding/meaningful occurrences during those activities.

The observation that internalizing symptoms, but not early life trauma, which is a robust risk factor for internalizing symptoms, were related to the brain and behavioral alterations suggests these novel deficits in prospection are more strongly linked with the expression of psychopathology than with risk for psychopathology. While prior research and theory suggest a link between childhood trauma and altered reward learning (Blair et al., 2022; Hanson et al., 2015; McLaughlin & Sheridan, 2016), it is not readily discernible why this link was not detected in the current study. It could be that prospective representations in the striatum are uniquely related to internalizing symptoms, whereas outcome processing of rewards is more linked with early life trauma (Cisler et al., 2019; Letkiewicz et al., 2022). Future research with larger sample sizes is necessary to continue to differentiate the unique impacts of trauma v. psychopathology on the various facets of reward learning and decision-making.

To our knowledge, this is the first demonstration of prospective multivariate representations of reward in the striatum as a possible mechanism of altered decision-making among youth with internalizing symptoms. Nonetheless, these data are fully consistent with related prior work demonstrating altered striatal activation during the anticipation and receipt of reward among youth with internalizing symptoms (Auerbach et al., 2022; Stringaris et al., 2015), youth with behavioral inhibition (Guyer et al., 2014), and adults with mood and anxiety disorders (Cooper, Arulpragasam, & Treadway, 2018), and provide further support for emerging models emphasizing the role of altered decision-making for reward as a mechanism of psychopathology following trauma (Cisler & Herringa, 2021; Fonzo, 2018; McLaughlin et al., 2019; McLaughlin, Colich, Rodman, & Weissman, 2020). While we observed associations between internalizing symptoms and prospective reward representations in the striatum, it will be important to investigate additional brain regions and networks associated with episodic future thinking and reward (e.g. medial PFC, hippocampus; Peters & Büchel, 2010; Schacter et al., 2017) and link these mechanisms with treatment response (Berwian et al., 2020; Webb, Murray, Tierney, Forbes, & Pizzagalli, 2022).

The current study is not without limitations. The sample was limited to adolescent girls, and generalization to males and adults needs to be established. We used a relatively simple three-arm bandit task of social reward learning with binary outcomes, and the degree to which the results generalize to more complex tasks (e.g. a two-stage Markov task; Daw, Gershman, Seymour, Dayan, & Dolan, 2011) needs to be tested. Our sample was recruited based on the presence of assault exposure, and while this resulted in a natural variation in the degree of internalizing symptoms in the current sample, testing among explicitly defined groups of youth with anxiety and depressive disorders is needed. Further, the effects we observed were limited to caregiver-reported symptoms, and future studies should seek to extend these effects to additional modes of assessment.

Supplementary material

The supplementary material for this article can be found at https://doi.org/10.1017/S0033291723000478

Financial support

This work was supported by MH119132, MH108753, MH10680.

Footnotes

Prospective mental representations for reward are diminished among youth with internalizing symptoms and mediate reward learning behavioral strategies

References

Achenbach, T. M. (1991). Manual for the child behavior checklist/4–18 and 1991 profile. Burlington, VT: Department of Psychiatry, University of Vermont.
Auerbach, R. P., Admon, R., & Pizzagalli, D. A. (2014). Adolescent depression: Stress and reward dysfunction. Harvard Review of Psychiatry, 22(3), 139–148. doi: 10.1097/HRP.0000000000000034
Auerbach, R. P., Pagliaccio, D., Hubbard, N. A., Frosch, I., Kremens, R., Cosby, E., … Pizzagalli, D. A. (2022). Reward-related neural circuitry in depressed and anxious adolescents: A human connectome project. Journal of the American Academy of Child & Adolescent Psychiatry, 61(2), 308–320. doi: 10.1016/j.jaac.2021.04.014
Bernstein, D. P., Fink, L., Handelsman, L., Foote, J., Lovejoy, M., Wenzel, K., … Ruggiero, J. (1994). Initial reliability and validity of a new retrospective measure of child abuse and neglect. The American Journal of Psychiatry, 151(8), 1132–1136. doi: 10.1176/ajp.151.8.1132
Berwian, I. M., Wenzel, J. G., Collins, A. G. E., Seifritz, E., Stephan, K. E., Walter, H., & Huys, Q. J. M. (2020). Computational mechanisms of effort and reward decisions in patients with depression and their association with relapse after antidepressant discontinuation. JAMA Psychiatry, 77(5), 513–522. doi: 10.1001/jamapsychiatry.2019.4971
Biderman, N., Bakkour, A., & Shohamy, D. (2020). What are memories for? The hippocampus bridges past experience with future decisions. Trends in Cognitive Sciences, 24(7), 542–556. doi: 10.1016/j.tics.2020.04.004
Birn, R. M., Roeber, B. J., & Pollak, S. D. (2017). Early childhood stress exposure, reward pathways, and adult decision making. Proceedings of the National Academy of Sciences, 114(51), 13549–13554. doi: 10.1073/pnas.1708791114
Blair, K. S., Aloi, J., Bashford-Largo, J., Zhang, R., Elowsky, J., Lukoff, J., … Blair, R. J. (2022). Different forms of childhood maltreatment have different impacts on the neural systems involved in the representation of reinforcement value. Developmental Cognitive Neuroscience, 53, 101051. doi: 10.1016/j.dcn.2021.101051
Blanco, N. J., Otto, A. R., Maddox, W. T., Beevers, C. G., & Love, B. C. (2013). The influence of depression symptoms on exploratory decision-making. Cognition, 129(3), 563–568.
Brownell, R. (2000). Receptive one-word picture vocabulary test: Manual. Novato, CA: Academic Therapy Publications.
Calhoun, V. D., Adali, T., Pearlson, G. D., & Pekar, J. J. (2001). A method for making group inferences from functional MRI data using independent component analysis. Human Brain Mapping, 14(3), 140–151.
Castegnetti, G., Tzovara, A., Khemka, S., Melinščak, F., Barnes, G. R., Dolan, R. J., & Bach, D. R. (2020). Representation of probabilistic outcomes during risky decision-making. Nature Communications, 11(1), 2419. doi: 10.1038/s41467-020-16202-y
Chang, C.-C., & Lin, C.-J. (2011). LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology, 2(3), 27:1–27:27. doi: 10.1145/1961189.1961199
Cisler, J. M., Esbensen, K., Sellnow, K., Ross, M., Weaver, S., Sartin-Tarm, A., … Kilts, C. D. (2019). Differential roles of the salience network during prediction error encoding and facial emotion processing among female adolescent assault victims. Biological Psychiatry: Cognitive Neuroscience and Neuroimaging, 4(4), 371–380.
Cisler, J. M., & Herringa, R. J. (2021). Posttraumatic stress disorder and the developing adolescent brain. Biological Psychiatry, 89(2), 144–151.
Cohen, J. A., Deblinger, E., Mannarino, A. P., & Steer, R. A. (2004). A multisite, randomized controlled trial for children with sexual abuse–related PTSD symptoms. Journal of the American Academy of Child & Adolescent Psychiatry, 43(4), 393–402.
Collins, A. G. E., & Shenhav, A. (2022). Advances in modeling learning and decision-making in neuroscience. Neuropsychopharmacology, 47(1), 104–118. doi: 10.1038/s41386-021-01126-y
Cooper, J. A., Arulpragasam, A. R., & Treadway, M. T. (2018). Anhedonia in depression: Biological mechanisms and computational models. Current Opinion in Behavioral Sciences, 22, 128–135. doi: 10.1016/j.cobeha.2018.01.024
Dasgupta, I., & Gershman, S. J. (2021). Memory as a computational resource. Trends in Cognitive Sciences, 25(3), 240–251. doi: 10.1016/j.tics.2020.12.008
Daw, N. D., Gershman, S. J., Seymour, B., Dayan, P., & Dolan, R. J. (2011). Model-based influences on humans’ choices and striatal prediction errors. Neuron, 69(6), 1204–1215.
Daw, N. D., O'Doherty, J. P., Dayan, P., Seymour, B., & Dolan, R. J. (2006). Cortical substrates for exploratory decisions in humans. Nature, 441(7095), 876–879. doi: 10.1038/nature04766
Doll, B. B., Duncan, K. D., Simon, D. A., Shohamy, D., & Daw, N. D. (2015). Model-based choices involve prospective neural activity. Nature Neuroscience, 18(5), 767–772. doi: 10.1038/nn.3981
Dube, S. R., Anda, R. F., Felitti, V. J., Chapman, D. P., Williamson, D. F., & Giles, W. H. (2001). Childhood abuse, household dysfunction, and the risk of attempted suicide throughout the life span: Findings from the adverse childhood experiences study. JAMA, 286(24), 3089–3096.
Dube, S. R., Felitti, V. J., Dong, M., Chapman, D. P., Giles, W. H., & Anda, R. F. (2003). Childhood abuse, neglect, and household dysfunction and the risk of illicit drug use: The adverse childhood experiences study. Pediatrics, 111(3), 564–572.
Dubois, M., & Hauser, T. U. (2022). Value-free random exploration is linked to impulsivity. Nature Communications, 13(1), 4542. doi: 10.1038/s41467-022-31918-9
Felitti, V. J., Anda, R. F., Nordenberg, D., Williamson, D. F., Spitz, A. M., Edwards, V., … Marks, J. S. (1998). Relationship of childhood abuse and household dysfunction to many of the leading causes of death in adults: The Adverse Childhood Experiences (ACE) study. American Journal of Preventive Medicine, 14(4), 245–258.
Fonzo, G. A. (2018). Diminished positive affect and traumatic stress: A biobehavioral review and commentary on trauma affective neuroscience. Neurobiology of Stress, 9, 214–230.
Gerin, M. I., Puetz, V. B., Blair, R. J. R., White, S., Sethi, A., Hoffmann, F., … McCrory, E. J. (2017). A neurocomputational investigation of reinforcement-based decision making as a candidate latent vulnerability mechanism in maltreated children. Development and Psychopathology, 29(5), 1689–1705. doi: 10.1017/S095457941700133X
Gillespie, A. K., Astudillo Maya, D. A., Denovellis, E. L., Liu, D. F., Kastner, D. B., Coulter, M. E., … Frank, L. M. (2021). Hippocampal replay reflects specific past experiences rather than a plan for subsequent choice. Neuron, 109(19), 3149–3163.e6. doi: 10.1016/j.neuron.2021.07.029
Greene, A. S., Shen, X., Noble, S., Horien, C., Hahn, C. A., Arora, J., … Constable, R. T. (2022). Brain–phenotype models fail for individuals who defy sample stereotypes. Nature, 609(7925), 109–118. doi: 10.1038/s41586-022-05118-w
Guyer, A. E., Benson, B., Choate, V. R., Bar-Haim, Y., Perez-Edgar, K., Jarcho, J. M., … Nelson, E. E. (2014). Lasting associations between early-childhood temperament and late-adolescent reward-circuitry response to peer feedback. Development and Psychopathology, 26(1), 229–243. doi: 10.1017/S0954579413000941
Hanson, J. L., Hariri, A. R., & Williamson, D. E. (2015). Blunted ventral striatum development in adolescence reflects emotional neglect and predicts depressive symptoms. Biological Psychiatry, 78(9), 598–605. doi: 10.1016/j.biopsych.2015.05.010
Hanson, J. L., Williams, A. V., Bangasser, D. A., & Peña, C. J. (2021). Impact of early life stress on reward circuit function and regulation. Frontiers in Psychiatry, 12, 744690. https://www.frontiersin.org/articles/10.3389/fpsyt.2021.744690
Harms, M. B., Shannon Bowen, K. E., Hanson, J. L., & Pollak, S. D. (2018). Instrumental learning and cognitive flexibility processes are impaired in children exposed to early life stress. Developmental Science, 21(4), e12596. doi: 10.1111/desc.12596
Hauser, T. U., Iannaccone, R., Walitza, S., Brandeis, D., & Brem, S. (2015). Cognitive flexibility in adolescence: Neural and behavioral mechanisms of reward prediction error processing in adaptive decision making during development. NeuroImage, 104, 347–354. doi: 10.1016/j.neuroimage.2014.09.018
Hayes, A. F., & Rockwood, N. J. (2017). Regression-based statistical mediation and moderation analysis in clinical research: Observations, recommendations, and implementation. Behaviour Research and Therapy, 98, 39–57. doi: 10.1016/j.brat.2016.11.001
Humphreys, K. L., Lee, S. S., Telzer, E. H., Gabard-Durnam, L. J., Goff, B., Flannery, J., & Tottenham, N. (2015). Exploration–exploitation strategy is dependent on early experience. Developmental Psychobiology, 57(3), 313–321. doi: 10.1002/dev.21293
Humphreys, K. L., Lee, S. S., & Tottenham, N. (2013). Not all risk taking behavior is bad: Associative sensitivity predicts learning during risk taking among high sensation seekers. Personality and Individual Differences, 54(6), 709–715. doi: 10.1016/j.paid.2012.11.031
Huys, Q. J., Pizzagalli, D. A., Bogdan, R., & Dayan, P. (2013). Mapping anhedonia onto reinforcement learning: A behavioural meta-analysis. Biology of Mood & Anxiety Disorders, 3(1), 12. doi: 10.1186/2045-5380-3-12
Keren, H., O'Callaghan, G., Vidal-Ribas, P., Buzzell, G. A., Brotman, M. A., Leibenluft, E., … Stringaris, A. (2018). Reward processing in depression: A conceptual and meta-analytic review across fMRI and EEG studies. The American Journal of Psychiatry, 175(11), 1111–1120. doi: 10.1176/appi.ajp.2018.17101124
Kilpatrick, D. G., Ruggiero, K. J., Acierno, R., Saunders, B. E., Resnick, H. S., & Best, C. L. (2003). Violence and risk of PTSD, major depression, substance abuse/dependence, and comorbidity: Results from the national survey of adolescents. Journal of Consulting and Clinical Psychology, 71(4), 692–700. doi: 10.1037/0022-006X.71.4.692
Lejuez, C. W., Read, J. P., Kahler, C. W., Richards, J. B., Ramsey, S. E., Stuart, G. L., … Brown, R. A. (2002). Evaluation of a behavioral measure of risk taking: The Balloon Analogue Risk Task (BART). Journal of Experimental Psychology: Applied, 8(2), 75–84.
Lenow, J., Cisler, J., & Bush, K. (2015). Altered trust learning mechanisms among female adolescent victims of interpersonal violence. Journal of Interpersonal Violence. doi: 10.1177/0886260515604411
Lenow, J. K., Constantino, S. M., Daw, N. D., & Phelps, E. A. (2017). Chronic and acute stress promote overexploitation in serial decision making. Journal of Neuroscience, 37(23), 5681–5689.
Lenow, J. K., Scott Steele, J., Smitherman, S., Kilts, C. D., & Cisler, J. M. (2014). Attenuated behavioral and brain responses to trust violations among assaulted adolescent girls. Psychiatry Research, 223(1), 18. doi: 10.1016/j.pscychresns.2014.04.005CrossRefGoogle ScholarPubMed
Letkiewicz, A. M., Cochran, A. L., Privratsky, A. A., James, G. A., & Cisler, J. M. (2022). Value estimation and latent-state update-related neural activity during fear conditioning predict posttraumatic stress disorder symptom severity. Cognitive, Affective, & Behavioral Neuroscience, 22(1), 199213. doi: 10.3758/s13415-021-00943-4CrossRefGoogle ScholarPubMed
Lloyd, A., McKay, R. T., & Furl, N. (2022). Individuals with adverse childhood experiences explore less and underweight reward feedback. Proceedings of the National Academy of Sciences of the United States of America, 119(4), e2109373119. doi: 10.1073/pnas.2109373119CrossRefGoogle ScholarPubMed
Mattar, M. G., & Lengyel, M. (2022). Planning in the brain. Neuron, 110(6), 914934.CrossRefGoogle ScholarPubMed
Mcewen, B. S. (2004). Protection and damage from acute and chronic stress: Allostasis and allostatic overload and relevance to the pathophysiology of psychiatric disorders. Annals of the New York Academy of Sciences, 1032(1), 17. doi: 10.1196/annals.1314.001CrossRefGoogle Scholar
McLaughlin, K. A., Colich, N. L., Rodman, A. M., & Weissman, D. G. (2020). Mechanisms linking childhood trauma exposure and psychopathology: A transdiagnostic model of risk and resilience. BMC Medicine, 18(1), 111.CrossRefGoogle ScholarPubMed
McLaughlin, K. A., DeCross, S. N., Jovanovic, T., & Tottenham, N. (2019). Mechanisms linking childhood adversity with psychopathology: Learning as an intervention target. Behaviour Research and Therapy, 118, 101–109.
McLaughlin, K. A., & Sheridan, M. A. (2016). Beyond cumulative risk: A dimensional approach to childhood adversity. Current Directions in Psychological Science, 25(4), 239–245.
McLaughlin, K. A., Sheridan, M. A., Humphreys, K. L., Belsky, J., & Ellis, B. J. (2021). The value of dimensional models of early experience: Thinking clearly about concepts and categories. Perspectives on Psychological Science, 16(6), 1463–1472.
Meder, B., Wu, C. M., Schulz, E., & Ruggeri, A. (2021). Development of directed and random exploration in children. Developmental Science, 24(4), e13095. doi: 10.1111/desc.13095
Moughrabi, N., Botsford, C., Gruichich, T. S., Azar, A., Heilicher, M., Hiser, J., … Cisler, J. M. (2022). Large-scale neural network computations and multivariate representations during approach-avoidance conflict decision-making. NeuroImage, 264, 119709. doi: 10.1016/j.neuroimage.2022.119709
Nader, K., Blake, D., Pynoos, R. S., Newman, E., & Weathers, F. (n.d.). Clinician-Administered PTSD Scale, Child and Adolescent Version. White River Junction, VT: National Center for PTSD.
Nemeroff, C. B. (2016). Paradise lost: The neurobiological and clinical consequences of child abuse and neglect. Neuron, 89(5), 892–909. doi: 10.1016/j.neuron.2016.01.019
Niv, Y., Edlund, J. A., Dayan, P., & O'Doherty, J. P. (2012). Neural prediction errors reveal a risk-sensitive reinforcement-learning process in the human brain. Journal of Neuroscience, 32(2), 551–562. doi: 10.1523/JNEUROSCI.5498-10.2012
Patel, R., Spreng, R. N., Shin, L. M., & Girard, T. A. (2012). Neurocircuitry models of posttraumatic stress disorder and beyond: A meta-analysis of functional neuroimaging studies. Neuroscience & Biobehavioral Reviews, 36(9), 2130–2142. doi: 10.1016/j.neubiorev.2012.06.003
Peters, J., & Büchel, C. (2010). Episodic future thinking reduces reward delay discounting through an enhancement of prefrontal-mediotemporal interactions. Neuron, 66(1), 138–148. doi: 10.1016/j.neuron.2010.03.026
Piray, P., Dezfouli, A., Heskes, T., Frank, M. J., & Daw, N. D. (2019). Hierarchical Bayesian inference for concurrent model fitting and comparison for group studies. PLOS Computational Biology, 15(6), e1007043. doi: 10.1371/journal.pcbi.1007043
Rappaport, B. I., Kandala, S., Luby, J. L., & Barch, D. M. (2020). Brain reward system dysfunction in adolescence: Current, cumulative, and developmental periods of depression. The American Journal of Psychiatry, 177(8), 754–763. doi: 10.1176/appi.ajp.2019.19030281
Ross, M. C., Lenow, J. K., Kilts, C. D., & Cisler, J. M. (2018). Altered neural encoding of prediction errors in assault-related posttraumatic stress disorder. Journal of Psychiatric Research, 103, 83–90.
Schacter, D. L., Benoit, R. G., & Szpunar, K. K. (2017). Episodic future thinking: Mechanisms and functions. Current Opinion in Behavioral Sciences, 17, 41–50. doi: 10.1016/j.cobeha.2017.06.002
Schulz, E., & Gershman, S. J. (2019). The algorithmic architecture of exploration in the human brain. Current Opinion in Neurobiology, 55, 7–14. doi: 10.1016/j.conb.2018.11.003
Sequeira, S. L., Silk, J. S., Ladouceur, C. D., Hanson, J. L., Ryan, N. D., Morgan, J. K., … Forbes, E. E. (2021). Association of neural reward circuitry function with response to psychotherapy in youths with anxiety disorders. American Journal of Psychiatry, 178(4), 343–351. doi: 10.1176/appi.ajp.2020.20010094
Shadlen, M. N., & Shohamy, D. (2016). Decision making and sequential sampling from memory. Neuron, 90(5), 927–939. doi: 10.1016/j.neuron.2016.04.036
Sheehan, D. V., Sheehan, K. H., Shytle, R. D., Janavs, J., Bannon, Y., Rogers, J. E., … Wilkinson, B. (2010). Reliability and validity of the Mini International Neuropsychiatric Interview for Children and Adolescents (MINI-KID). The Journal of Clinical Psychiatry, 71(03), 313–326. doi: 10.4088/JCP.09m05305whi
Sheridan, M. A., McLaughlin, K. A., Winter, W., Fox, N., Zeanah, C., & Nelson, C. A. (2018). Early deprivation disruption of associative learning is a developmental pathway to depression and social problems. Nature Communications, 9(1), 2216. doi: 10.1038/s41467-018-04381-8
Somerville, L. H., Sasse, S. F., Garrad, M. C., Drysdale, A. T., Abi Akar, N., Insel, C., & Wilson, R. C. (2017). Charting the expansion of strategic exploratory behavior during adolescence. Journal of Experimental Psychology: General, 146, 155–164. doi: 10.1037/xge0000250
Sosa, M., & Giocomo, L. M. (2021). Navigating for reward. Nature Reviews Neuroscience, 22(8), 472–487.
Stringaris, A., Vidal-Ribas Belil, P., Artiges, E., Lemaitre, H., Gollier-Briant, F., Wolke, S., … IMAGEN Consortium. (2015). The brain's response to reward anticipation and depression in adolescence: Dimensionality, specificity, and longitudinal predictions in a community-based sample. The American Journal of Psychiatry, 172(12), 1215–1223. doi: 10.1176/appi.ajp.2015.14101298
Sutton, R. S., & Barto, A. G. (1998). Reinforcement learning: An introduction. Cambridge, MA: MIT Press.
Tang, A., Harrewijn, A., Benson, B., Haller, S. P., Guyer, A. E., Perez-Edgar, K. E., … Fox, N. A. (2022). Striatal activity to reward anticipation as a moderator of the association between early behavioral inhibition and changes in anxiety and depressive symptoms from adolescence to adulthood. JAMA Psychiatry, 79(12), 1199–1208. doi: 10.1001/jamapsychiatry.2022.3483
Webb, C. A., Murray, L., Tierney, A. O., Forbes, E. E., & Pizzagalli, D. A. (2022). Reward-related predictors of symptom change in behavioral activation therapy for anhedonic adolescents: A multimodal approach. Neuropsychopharmacology, 48(4), 623–632. doi: 10.1038/s41386-022-01481-4
Widloski, J., & Foster, D. J. (2022). Flexible rerouting of hippocampal replay sequences around changing barriers in the absence of global place field remapping. Neuron, 110(9), 1547–1558.
Wikenheiser, A. M., & Redish, A. D. (2015). Hippocampal theta sequences reflect current goals. Nature Neuroscience, 18(2), 289–294.
Wilson, R. C., Bonawitz, E., Costa, V. D., & Ebitz, R. B. (2021). Balancing exploration and exploitation with information and randomization. Current Opinion in Behavioral Sciences, 38, 49–56. doi: 10.1016/j.cobeha.2020.10.001
Yu, J. Y., & Frank, L. M. (2015). Hippocampal–cortical interaction in decision making. Neurobiology of Learning and Memory, 117, 34–41. doi: 10.1016/j.nlm.2014.02.002
Zhou, F., Zhao, W., Qi, Z., Geng, Y., Yao, S., Kendrick, K. M., … Becker, B. (2021). A distributed fMRI-based signature for the subjective experience of fear. Nature Communications, 12(1), 6643. doi: 10.1038/s41467-021-26977-3
Zielinski, M. C., Tang, W., & Jadhav, S. P. (2020). The role of replay and theta sequences in mediating hippocampal-prefrontal interactions for memory and cognition. Hippocampus, 30(1), 60–72. doi: 10.1002/hipo.22821

Table 1. Clinical and demographic characteristics of the participants

Fig. 1. (a) Depiction of the social reward three-arm bandit task. Participants completed 90 trials. Each trial began with the presentation of three faces, and participants chose one face in which to invest $10. The choice phase lasted until participants made a selection, which was then indicated with a blue box for 1 s. An anticipation phase followed while they waited for the outcome of the choice, which consisted of a jittered fixation cross for 1.5–3 s. The outcome phase then delivered a binary return of either $20 (net increase of $10) or no return (net loss of $10). The outcome phase presented the outcome of the trial (win or loss) for 2 s and updated the points total for 1 s, followed by a jittered fixation cross of 1.5–3 s before the next trial. (b) Depiction of the MVPA pipeline. For each ICA network separately, trial × voxel matrices of beta coefficients for reward outcomes during the task are created for all participants except one left-out participant. Support vector machine classifiers are then trained on these data, resulting in a decoder for reward outcomes. Next, this reward decoder is applied to the trial × voxel matrix of beta coefficients at the time of choice for the participant who was left out of training. This yields a prediction of the degree to which the reward representations are active at the time of choice, which can be compared to the magnitude of reward the participant was expecting for that choice. This process is repeated until each participant has served as the left-out test participant.
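
The leave-one-participant-out logic of the pipeline in panel (b) can be sketched as follows. This is an illustration only, not the authors' implementation: the data are synthetic, all function and variable names are hypothetical, and the support vector machine is swapped for a simple difference-of-class-means linear decoder so that the sketch depends only on NumPy.

```python
import numpy as np

def fit_linear_decoder(X, y):
    """Fit a difference-of-class-means linear decoder.

    Stand-in for the linear SVM named in the caption (an assumption made
    to keep the sketch dependency-free). Returns weights w and bias b so
    that X @ w + b is positive for reward-like patterns.
    """
    mu_win = X[y == 1].mean(axis=0)
    mu_loss = X[y == 0].mean(axis=0)
    w = mu_win - mu_loss
    b = -0.5 * w @ (mu_win + mu_loss)
    return w, b

def leave_one_participant_out(outcome_betas, outcome_labels, choice_betas):
    """Train on outcome-phase patterns from all participants but one, then
    apply the decoder to the held-out participant's choice-phase patterns."""
    n = len(outcome_betas)
    choice_scores = []
    for test in range(n):
        train_X = np.vstack([outcome_betas[p] for p in range(n) if p != test])
        train_y = np.concatenate([outcome_labels[p] for p in range(n) if p != test])
        w, b = fit_linear_decoder(train_X, train_y)
        # Signed distance from the decision boundary at the time of choice.
        choice_scores.append(choice_betas[test] @ w + b)
    return choice_scores

# Synthetic trial x voxel matrices (hypothetical sizes except the 90 trials).
rng = np.random.default_rng(0)
n_participants, n_trials, n_voxels = 6, 90, 50
signal = rng.normal(size=n_voxels)  # shared "reward" pattern across participants
outcome_betas, outcome_labels, choice_betas, expected_value = [], [], [], []
for _ in range(n_participants):
    y = rng.integers(0, 2, size=n_trials)   # 1 = win, 0 = loss
    ev = rng.uniform(0, 1, size=n_trials)   # model-derived value of the chosen arm
    noise_o = rng.normal(size=(n_trials, n_voxels))
    noise_c = rng.normal(size=(n_trials, n_voxels))
    outcome_betas.append(np.outer(y - 0.5, signal) + noise_o)
    outcome_labels.append(y)
    choice_betas.append(np.outer(ev - 0.5, signal) + noise_c)
    expected_value.append(ev)

scores = leave_one_participant_out(outcome_betas, outcome_labels, choice_betas)
# Coupling: how strongly decoded reward at choice tracks expected value.
coupling = [np.corrcoef(s, ev)[0, 1] for s, ev in zip(scores, expected_value)]
```

Here `coupling` plays the role of the per-participant association between decoded reward representations and expected reward; the correlation coefficient is used only for brevity and is not necessarily the statistic used in the study.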

Fig. 2. (a) Akaike Information Criterion values of model fit for the compared models. We tested a factorial manipulation of anticorrelated or non-anticorrelated models (denoted with A+ or A−) and risk-sensitive or non-risk-sensitive models (denoted with RS+ or RS−). Consistent with our past studies using Matlab's fmincon for model fitting (Cisler et al., 2019; Ross et al., 2018), our updated approach using hierarchical Bayesian inference (Piray et al., 2019) similarly demonstrated that the anticorrelated and risk-sensitive model fit the data best. (b) There was no relationship between Childhood Trauma Questionnaire total severity scores and softmax βs, which represent individual differences in exploitation/exploration strategies on the task. (c) There was a significant inverse relationship between CBCL internalizing symptoms and softmax βs, suggesting decreased exploitative behavior among those with greater internalizing symptoms.
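
The two quantities in this figure can be illustrated with a short sketch: the softmax choice rule, in which β acts as an inverse temperature, and the standard AIC formula (2k − 2 ln L, lower is better). The arm values and β settings below are hypothetical, and the functions are a generic textbook formulation rather than the study's fitting code.

```python
import math

def softmax_choice_probs(values, beta):
    """Probability of choosing each arm given its expected value.

    A larger beta concentrates choices on the highest-value arm
    (exploitation); a smaller beta spreads them out (exploration).
    """
    scaled = [beta * v for v in values]
    m = max(scaled)                      # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def aic(log_likelihood, n_params):
    """Akaike Information Criterion: 2k - 2 ln L."""
    return 2 * n_params - 2 * log_likelihood

arm_values = [0.2, 0.5, 0.8]             # hypothetical expected values of three arms
low_beta = softmax_choice_probs(arm_values, beta=1.0)
high_beta = softmax_choice_probs(arm_values, beta=10.0)

for label, probs in (("beta=1", low_beta), ("beta=10", high_beta)):
    print(label, [round(p, 3) for p in probs])
```

With the same arm values, the higher β puts far more probability on the best arm, which is the sense in which lower softmax βs indicate less exploitative behavior.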

Fig. 3. (a) Depiction of spatial maps from the Independent Component Analysis. (b) Reward decoding performance for each ICA network. Decoding performance was defined as the mean of sensitivity and specificity in correctly classifying reward outcomes from the left-out participant using the model trained on the remaining participants' data. (c) β coefficients reflecting the degree to which the value expectation of the chosen arm, derived from the computational model, predicted the magnitude of MVPA-predicted reward representations (i.e. SVM hyperplane predictions) at the time of choice. All networks demonstrated significant coupling between reward expectation and magnitude of reward representations when controlling for multiple comparisons. (d) ICA networks demonstrating significant interactions between softmax βs and coupling between reward expectation and magnitude of reward representations (i.e. SVM hyperplane predictions), suggesting that those who generated greater reward representations in proportion to expected reward also tended to use behavioral strategies to exploit high reward arms.
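
The decoding performance metric in panel (b), the mean of sensitivity and specificity (often called balanced accuracy), can be computed as in the sketch below. The labels are made up for illustration and the function name is hypothetical.

```python
def decoding_performance(y_true, y_pred):
    """Mean of sensitivity and specificity for binary labels (1 = reward, 0 = no reward)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in y_true")
    sensitivity = tp / (tp + fn)   # proportion of reward trials correctly decoded
    specificity = tn / (tn + fp)   # proportion of no-reward trials correctly decoded
    return (sensitivity + specificity) / 2

# Made-up outcomes for one left-out participant: 4 reward and 6 no-reward trials.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]
perf = decoding_performance(y_true, y_pred)

# A decoder that always predicts "reward" scores at chance on this metric.
always_reward = decoding_performance(y_true, [1] * len(y_true))
```

Unlike raw accuracy, this metric stays at 0.5 for a decoder that always predicts the same class, which matters when win and loss trials are not equally frequent.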

Fig. 4. (a) Scatter plot depicting the relationship between CBCL internalizing symptoms and coupling between MVPA reward representations during choice and reward expectations. (b) Even though we controlled for CTQ total severity in our primary analyses, we conducted an additional analysis differentiating effects of assault exposure and internalizing symptoms. We used a median split to identify control adolescents with low v. high internalizing symptoms, and separately used a median split to identify assaulted adolescents with low v. high internalizing symptoms. Separating the sample in this manner allows differentiation of impacts due to assault exposure and internalizing symptoms. If coupling of prospective reward representations in the striatum were more strongly associated with assault exposure, we would expect both assault groups to demonstrate impairment relative to both control groups, with relative homogeneity within groups. By contrast, if coupling of prospective reward representations in the striatum were more strongly associated with internalizing symptoms, we would instead expect coupling of prospective reward representations to follow the pattern of internalizing symptoms across the groups shown in panel B. (c) Individual differences in coupling with prospective reward representations clearly tracked individual differences in internalizing symptoms and not assault exposure, t(51) = −3.14, p = 0.003 (regression model with group coded as follows in accordance with differences in CBCL internalizing symptoms [see panel B]: control low symptoms = 0, control high symptoms = 1; assault low symptoms = 1, assault high symptoms = 2).

Fig. 5. (a) Graphical depiction of the mediation model, where internalizing symptoms predict decreased coupling between MVPA reward representations and expectations of reward in the striatum (i.e. path a), and decreased coupling of reward representations in the striatum predicts decreased choices to exploit high reward arms on the task (i.e. path b). Path c refers to the total effect of internalizing symptoms on behavioral strategies on the task, and path c’ refers to the direct effect after accounting for the indirect effect (i.e. path ab) through MVPA reward representations. (b) The significance of the indirect effect was tested through 50 000 bootstrap iterations, which demonstrated that the 95% confidence interval does not include zero.
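
A bootstrap test of an indirect effect of this kind can be sketched as follows. The sketch makes several assumptions not stated in the caption: ordinary least squares for each path, a percentile confidence interval, synthetic data with made-up effect sizes, and 2000 resamples instead of 50 000 to keep it quick. All names are hypothetical.

```python
import numpy as np

def ols_coefs(X, y):
    """OLS coefficients of y on the columns of X, intercept first."""
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef

def indirect_effect(x, m, y):
    """Product of path a (x -> m) and path b (m -> y, controlling for x)."""
    a = ols_coefs(x, m)[1]
    b = ols_coefs(np.column_stack([m, x]), y)[1]
    return a * b

def bootstrap_ci(x, m, y, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the indirect effect (resampling participants)."""
    rng = np.random.default_rng(seed)
    n = len(x)
    estimates = np.empty(n_boot)
    for i in range(n_boot):
        idx = rng.integers(0, n, size=n)   # sample participants with replacement
        estimates[i] = indirect_effect(x[idx], m[idx], y[idx])
    lo, hi = np.percentile(estimates, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return lo, hi

# Synthetic data: x = symptoms, m = striatal coupling, y = softmax beta.
rng = np.random.default_rng(1)
n = 61                                     # sample size reported in the abstract
x = rng.normal(size=n)
m = -0.8 * x + rng.normal(scale=0.5, size=n)   # made-up path a
y = 0.8 * m + rng.normal(scale=0.5, size=n)    # made-up path b

ab = indirect_effect(x, m, y)
c_total = ols_coefs(x, y)[1]                          # path c
c_direct = ols_coefs(np.column_stack([m, x]), y)[2]   # path c'
lo, hi = bootstrap_ci(x, m, y)
print(f"ab = {ab:.3f}, 95% CI [{lo:.3f}, {hi:.3f}]")
```

For OLS with a single mediator, the total effect decomposes exactly as c = c’ + ab, so the indirect effect is the part of the symptom-behavior association carried through the mediator.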

Supplementary material: File

Cisler et al. supplementary material