1. Introduction
Moral cognition is a multidimensional neurocognitive domain implicated in decisions, judgments and inferences about what constitutes required or acceptable social behavior (Reese et al., Reference Reese, Bryant and Ethridge2020; Van Bavel et al., Reference Van Bavel, FeldmanHall and Mende-Siedlecki2015; Wong, Reference Wong and LaFollette2019; Yu et al., Reference Yu, Siegel and Crockett2019). Its deployment in daily life involves reasoning, impulse control, experience learning and conceptualizations of socially relevant values, traits and events (Greene, Reference Greene2015). Such mechanisms can be influenced by bilingual experience (Costa & Sebastián-Gallés, Reference Costa and Sebastián-Gallés2014; Titone & Tiv, Reference Titone and Tiv2022), prompting the prediction that moral decisions may change depending on whether scenarios are presented in the participants' first or second language (L1, L2). Yet, this moral foreign-language effect (MFLE) has been only inconsistently observed (Brouwer, Reference Brouwer2021; Čavar & Tytus, Reference Čavar and Tytus2018; Dylman & Champoux-Larsson, Reference Dylman and Champoux-Larsson2020; Winskel & Bhatt, Reference Winskel and Bhatt2020), suggesting that it may be affected by interindividual variability. Here we tackle the issue focusing on L2 proficiency (L2p, a person's current level of mastery of her or his L2), a factor that varies widely among bilinguals, has been measured in most MFLE studies, and systematically influences outcomes in relevant domains. Examining this topic is vital to illuminate the links between linguistic experience and moral cognition, constrain models of socio-affective processing in bilinguals and inform translational developments therefrom.
Most evidence on the MFLE comes from moral decision tasks. These place participants in a first-person position, confront them with a moral dilemma and require them to make a choice that will be beneficial for some people but detrimental (often deadly) to others (Bartels et al., Reference Bartels, Bauman, Cushman, Pizarro, McGraw, Keren and Wu2015; Tassy et al., Reference Tassy, Oullier, Mancini and Wicker2013; Yu et al., Reference Yu, Siegel and Crockett2019). The field has favored incongruent moral dilemmas, presenting a utilitarian option that maximizes aggregate welfare (e.g., letting one person die to save five others) and a deontological option based on moral norms (e.g., inhibiting action to save one person at the expense of five others) (Conway & Gawronski, Reference Conway and Gawronski2013).Footnote 1
These can be divided into impersonal dilemmas, in which utilitarian decisions involve no physical contact with the victim; and personal dilemmas, in which such decisions require direct use of force on the victim (Greene, Reference Greene, Gazzaniga and Mangun2014). The former include the trolley or switch dilemma (Thomson, Reference Thomson1985), where a trolley fast approaches a group of people on the rails and only the participant can press a switch that changes the trolley's direction, sacrificing a single person instead of multiple ones. On the other hand, a typical personal version of this task would be the footbridge dilemma (Thomson, Reference Thomson1976), where the participant, witnessing the scene from a footbridge, can save five lives by pushing another person in front of the vehicle.
A foundational study on bilinguals (Costa et al., Reference Costa, Foucart, Hayakawa, Aparici, Apesteguia, Heafner and Keysar2014b) reported more utilitarian choices when dilemmas were presented in L2 as opposed to L1, extending evidence of reduced intuitive biases during L2 processing (Costa et al., Reference Costa, Foucart, Arnon, Aparici and Apesteguia2014a; Keysar et al., Reference Keysar, Hayakawa and An2012). This has sparked the notion that bilinguals' moral cognition depends on the language used, arguably due to a combination of linguistic, executive and affective factors (Hayakawa et al., Reference Hayakawa, Costa, Foucart and Keysar2016; Pavlenko, Reference Pavlenko2017). Yet, subsequent evidence proved mixed, with only some studies replicating this finding and meta-analytical results revealing only small (Del Maschio et al., Reference Del Maschio, Crespi, Peressotti, Abutalebi and Sulpizio2022a) or small-to-moderate (Circi et al., Reference Circi, Gatti, Russo and Vecchi2021; Stankovic et al., Reference Stankovic, Biedermann and Hamamura2022) MFLEs. Results are inconsistent even for the footbridge dilemma, the task offering the strongest supporting evidence (Circi et al., Reference Circi, Gatti, Russo and Vecchi2021; Del Maschio et al., Reference Del Maschio, Crespi, Peressotti, Abutalebi and Sulpizio2022a; Stankovic et al., Reference Stankovic, Biedermann and Hamamura2022). Such heterogeneity indicates that the effect may be modulated by subject-level variables impinging on bilingual cognition, crucially including L2p.
L2p represents an individual's degree of L2 knowledge and skills to function in specific communicative situations and modalities (Hulstijn, Reference Hulstijn2011). The construct encompasses multiple subfactors, including productive and receptive abilities across phonological, lexico-semantic, morphosyntactic and pragmatic factors (Gullifer et al., Reference Gullifer, Kousaie, Gilbert, Grant, Giroud, Coulter, Klein, Baum, Phillips and Titone2021; Olson, Reference Olson2023). L2p ranks among the most widely studied individual variables in bilingualism research (Olson, Reference Olson2023; Park et al., Reference Park, Solon, Dehghan-Chaleshtori and Ghanbar2022), be it as a cut-off variable (for sample selection), as a controlled variable (for group matching) or as a manipulated variable (for testing its impact on specific outcome measures) (Hulstijn, Reference Hulstijn2012; Olson, Reference Olson2023). It can be measured with objective methods (e.g., standardized tests, experimental tasks) or via subjective ratings (e.g., self-report questionnaires). The latter are dominant in the literature, accounting for more than 60% of published studies (Olson, Reference Olson2023). The same is true for MFLE studies – in fact, 86% of those considered in this work used subjective measures exclusively (see the Supplementary material).
Based on conventional or sample-specific cut-offs, a distinction can be made between bilinguals with low, intermediate and high L2p, among other subdivisions. For example, studies using the Language Experience and Proficiency Questionnaire (e.g., Kaushanskaya et al., Reference Kaushanskaya, Blumenfeld and Marian2020; Marian et al., Reference Marian, Blumenfeld and Kaushanskaya2007) typically set a cut-off of 7 (out of 10) to define high L2p, while other works, including MFLE research (Corey et al., Reference Corey, Hayakawa, Foucart, Aparici, Botella, Costa and Keysar2017; Costa et al., Reference Costa, Foucart, Hayakawa, Aparici, Apesteguia, Heafner and Keysar2014b; Geipel et al., Reference Geipel, Hadjichristidis and Surian2015), split participants into low and high L2p groups based on the sample's median L2p – usually around 70% on a 0–100% scale. Importantly, although framing L2p as a continuum offers powerful avenues for correlational research, its discretization through cut-offs offers a useful heuristic given the often subtle nature of MFLEs.
Regardless of its measurement, this variable is known to modulate diverse cognitive domains. As such, higher L2p levels have been linked to heightened emotional processing (activation of affective mechanisms by arousing stimuli; Caldwell-Harris, Reference Caldwell-Harris2015; Harris et al., Reference Harris, Gleason, Ayçiçeǧi and Pavlenko2006; Imbault et al., Reference Imbault, Titone, Warriner and Kuperman2021; Pavlenko, Reference Pavlenko2017; Sutton et al., Reference Sutton, Altarriba, Gianico and Basnight-Brown2007), enriched mental imagery (visual or otherwise perceptual representations of events in the absence of direct sensory input; Hayakawa & Keysar, Reference Hayakawa and Keysar2018), increased inhibitory control (the capacity to suppress prepotent information to favor adequate task completion; Goral et al., Reference Goral, Campanelli and Spiro2015; Hui et al., Reference Hui, Yuan, Fong and Wang2020; Thanissery et al., Reference Thanissery, Parihar and Kar2020), more efficient lexico-semantic processing (access to and retrieval of words' meanings; Abutalebi, Reference Abutalebi2008; Bialystok & Craik, Reference Bialystok and Craik2010; Cuppini et al., Reference Cuppini, Magosso and Ursino2013; Dijkstra et al., Reference Dijkstra, Wahl, Buytenhuijs, Van Halem, Al-Jibouri, De Korte and Rekké2019; Ibáñez et al., Reference Ibáñez, Manes, Escobar, Trujillo, Andreucci and Hurtado2010; Keating, Reference Keating2017; Liberto et al., Reference Liberto, Nie, Yeaton, Khalighinejad, Shamma and Mesgarani2021; Zheng et al., Reference Zheng, Mobbs and Yu2020), stronger embodied resonance (reactivation of sensorimotor brain mechanisms subserving the bodily experiences denoted by linguistic material; Bergen et al., Reference Bergen, Lau, Narayan, Stojanovic and Wheeler2010; Birba et al., Reference Birba, Beltrán, Martorell Caro, Trevisan, Kogan, Sedeño, Ibáñez and García2020; Ibáñez et al., Reference Ibáñez, Manes, Escobar, Trujillo, Andreucci and Hurtado2010; Kogan et al., Reference Kogan, Muñoz, Ibáñez and García2020; Vukovic, Reference Vukovic2013), enhanced code switching flexibility (alternation between languages during continuous speech; Kootstra et al., Reference Kootstra, Van Hell and Dijkstra2012) and better numerical processing (the ability to perform mental operations involving digits and figures; Hoshino et al., Reference Hoshino, Dussias and Kroll2010; Van Rinsveld et al., Reference Van Rinsveld, Schiltz, Landerl, Brunner and Ugen2016). Higher L2p also impacts complex social phenomena, as it is related to more effective lying and lie detection (Caldwell-Harris & Ayçiçeǧi-Dinn, Reference Caldwell-Harris and Ayçiçeǧi-Dinn2009; Elliott & Leach, Reference Elliott and Leach2016), increased prosocial sentiments (Miller et al., Reference Miller, Solis-Barroso and Delgado2021), greater altruism (Liu et al., Reference Liu, Wang, Timmer and Jiao2022) and enhanced theory of mind capabilities (Nguyen & Astington, Reference Nguyen and Astington2014). Briefly, L2p is a key determinant of multiple operations in bilingual cognition.
Suggestively, the abovementioned domains are critically engaged during moral dilemma tasks. Consider the footbridge dilemma. In deciding whether to push the man or not, participants must tap into emotional processes (as affective reactions are commonly found in sacrificial dilemmas) (Chan et al., Reference Chan, Gu, Ng and Tse2016; Klenk, Reference Klenk2021), lexico-semantic processing (as conceptual information must be accessed to understand and perform the task), mental imagery (as the scene is either explicitly or implicitly visualized) (Hayakawa & Keysar, Reference Hayakawa and Keysar2018), action inhibition (as prepotent decisions may need to be suppressed for moral reasons) (Gawronski et al., Reference Gawronski, Armstrong, Conway, Friesdorf and Hütter2017), embodied resonance (as the notion of pushing the man likely engages sensorimotor simulations) (García et al., Reference García, Moguilner, Torquati, García-Marco, Herrera, Muñoz, Castillo, Kleineschay, Sedeño and Ibáñez2019; Greene, Reference Greene, Gazzaniga and Mangun2014) and numerical processing (as the number of people to “exchange” for a single life is a relevant decision factor) (Cao et al., Reference Cao, Zhang, Song, Wang, Miao and Peng2017). L2p may impact the MFLE by modulating these processes.
Indeed, during L2 tasks, low L2p reduces the vividness and sensorimotor reactivations of mental scenes (Altın et al., Reference Altın, Okur, Yalçın, Eraçıkbaş and Aktan-Erciyes2022; Birba et al., Reference Birba, Beltrán, Martorell Caro, Trevisan, Kogan, Sedeño, Ibáñez and García2020), hampers inhibitory reactions (Hui et al., Reference Hui, Yuan, Fong and Wang2020; Thanissery et al., Reference Thanissery, Parihar and Kar2020), lessens prosocial and empathic tendencies (Dewaele & Wei, Reference Dewaele and Wei2012; Ferré et al., Reference Ferré, Guasch, Stadthagen-Gonzalez and Comesaña2022) and limits numerical processing capacities (Garcia et al., Reference Garcia, Faghihi, Raola and Vaid2021; Van Rinsveld et al., Reference Van Rinsveld, Schiltz, Landerl, Brunner and Ugen2016). Accordingly, L2p could partly account for the heterogeneous results around the MFLE.
Some studies and reviews have tackled this hypothesis by design (Brouwer, Reference Brouwer2019, Reference Brouwer2021; Čavar & Tytus, Reference Čavar and Tytus2018; Circi et al., Reference Circi, Gatti, Russo and Vecchi2021; Costa et al., Reference Costa, Foucart, Hayakawa, Aparici, Apesteguia, Heafner and Keysar2014b; Del Maschio et al., Reference Del Maschio, Crespi, Peressotti, Abutalebi and Sulpizio2022a; Geipel et al., Reference Geipel, Hadjichristidis and Surian2015; Hayakawa & Keysar, Reference Hayakawa and Keysar2018; Hayakawa et al., Reference Hayakawa, Tannenbaum, Costa, Corey and Keysar2017; Shin & Kim, Reference Shin and Kim2017), while several others have acknowledged such a link, albeit briefly (Costa et al., Reference Costa, Corey, Hayakawa, Aparici, Vives and Keysar2019; Driver, Reference Driver2020; Dylman & Champoux-Larsson, Reference Dylman and Champoux-Larsson2020; Hayakawa et al., Reference Hayakawa, Costa, Foucart and Keysar2016; Miozzo et al., Reference Miozzo, Navarrete, Ongis, Mello, Girotto and Peressotti2020; Pavlenko, Reference Pavlenko2017; Winskel & Bhatt, Reference Winskel and Bhatt2020; Wong & Ng, Reference Wong and Ng2018). Moreover, meta-analytical evidence underscores correlations between self-reported L2p and the MFLE on personal dilemmas (Stankovic et al., Reference Stankovic, Biedermann and Hamamura2022). However, the literature lacks a systematic conceptual framework describing the multidimensional impact that L2p could exert on the MFLE. Some studies have factored it out, and roughly half the corpus considers it only for group-matching purposes. Similarly, most reviews address it only vaguely amidst several other potential subject-level confounds – an issue that is further complicated by the lack of standardized L2p measures across studies (Hulstijn, Reference Hulstijn2011; Tomoschuk et al., Reference Tomoschuk, Ferreira and Gollan2019; Zell & Krizan, Reference Zell and Krizan2014). 
Furthermore, these issues impinge on meta-analyses of the MFLE, particularly within the two that found no significant L2p modulations – pointing at measurement heterogeneity across the literature and low statistical power (Circi et al., Reference Circi, Gatti, Russo and Vecchi2021; Del Maschio et al., Reference Del Maschio, Crespi, Peressotti, Abutalebi and Sulpizio2022a). Crucially, too, a detailed rationale is lacking of how L2p might modulate multiple processes recruited during moral decision making, distancing the field from overarching accounts of the construct. In fact, no integrative work has focused at length on L2p as a potential modulator of the MFLE. Thus, an important gap emerges toward understanding how this intraindividual factor may impinge on core interindividual phenomena across bilingual persons.
Here, we propose an empirico-theoretical framework to conceptualize the impact of L2p on moral decision making. First, we provide a systematic review of bilingualism studies on incongruent moral dilemmas. Second, we distill the main findings regarding the MFLE in bilinguals with (a) intermediate and (b) high L2p. Third, we provide a rationale for interpreting how L2p might account for the observed patterns due to its influence on multiple task-relevant factors. Finally, we outline core challenges and opportunities for future research. Overall, we aim to lay the groundwork for strategic examinations of how bilingual experience may shape a fundamental aspect of daily social cognition.
2. Review criteria
The review was performed in accordance with the Preferred Reporting Items for Systematic Reviews and Meta-analysis (PRISMA) guideline (Page et al., Reference Page, McKenzie, Bossuyt, Boutron, Hoffmann, Mulrow, Shamseer, Tetzlaff, Akl, Brennan, Chou, Glanville, Grimshaw, Hróbjartsson, Lalu, Li, Loder, Mayo-Wilson, Mcdonald and Moher2021). Articles were retrieved via ScienceDirect (www.sciencedirect.com), PubMed (www.pubmed.ncbi.nlm.nih.gov), the Web of Science (www.webofscience.com) and Google Scholar (www.scholar.google.com), with a final search completed in December 2021 (Figure 1). Searches were done with the term “foreign language effect” alone as well as with the following combinations: “foreign language effect” OR “foreign language” OR “bilingual” AND “moral” OR “decisions” OR “dilemma” OR “emotions” OR “empathy.” The same terms along with the term “review” were introduced on Google's main search engine to detect papers absent in the online libraries. Additionally, the terms “moral,” “decisions,” “dilemma,” “emotions,” and “footbridge” were checked on the Bilingualism: Language and Cognition website. Three recent meta-analyses were also reviewed (Circi et al., Reference Circi, Gatti, Russo and Vecchi2021; Del Maschio et al., Reference Del Maschio, Crespi, Peressotti, Abutalebi and Sulpizio2022a; Stankovic et al., Reference Stankovic, Biedermann and Hamamura2022). Finally, the References sections of all papers, including five reviews, were screened for further relevant publications.
The above process yielded 74 records. Of these, we first excluded those that did not elicit decisions on moral dilemmas (e.g., those involving economic/framing dilemmas, such as the Asian disease problem, n = 43) and/or did not report original experiments (e.g., reviews, n = 6). We further excluded non-peer-reviewed articles (n = 3), as these lack fundamental checks of scientific quality and may report findings that deviate from those found in subsequent peer-reviewed versions, although we checked for discrepant evidence to account for potential publication bias effects. Application of such criteria led to 22 articles. These were screened for inclusion considering the following parameters: (i) presence of at least one group of bilingual participants, (ii) inclusion of at least one task involving decisions about one or more incongruent moral dilemmas (i.e., those involving a first-person moral “yes or no” decision), (iii) use of statistical tests on the presence or absence of an MFLE and (iv) reports of mean L2p based on a Likert-type scale. This resulted in the exclusion of 11 articles, which (a) failed to report L2p values (n = 4), (b) framed the use of regionalisms as L2 use (n = 1) or (c) quantified L2p via standardized exams or basic tasks that did not allow for normalization with the standard Likert scales used to measure L2p in most studies (n = 6).
The latter exclusion criterion was applied because Likert-based self-reports of L2p (a) represent the most common measure of the construct (Hulstijn, Reference Hulstijn2012), maximizing comparability of present conclusions with relevant literature; (b) constitute good predictors of objective proficiency (Gollan et al., Reference Gollan, Weissberger, Runnqvist, Montoya and Cera2012; Langdon et al., Reference Langdon, Wiig and Nielsen2005; Marian et al., Reference Marian, Blumenfeld and Kaushanskaya2007) and (c) allow for a linear normalization of outcomes to reveal potential proficiency-related modulations of the MFLE.
Our final database included 11 articles spanning 57 experiments (see the Supplementary material). These were systematically analyzed on a spreadsheet containing columns for the following aspects: title, authors, year of publication, number of participants, mean age, L1 and L2 of the sample(s), L2p (including method of measurement and reported value, if applicable), age of L2 acquisition, experimental stimuli, types of dilemma involved (personal/impersonal), results regarding the MFLE and additional relevant results. Clarification of the data from one paper was requested via e-mail from its corresponding author, given that it seemed to contain erroneous information (lower L1p than L2p scores). For details, see the Supplementary material (Table S1).
3. Literature overview
Experiments were organized based on the participants' Likert-based L2p estimations. These were derived from task-relevant macroskills (reading or listening) when available, since test modality might influence relevant L2p skills differentially (Hulstijn, Reference Hulstijn2011; McLean et al., Reference McLean, Stewart and Batty2020; Wagner, Reference Wagner and Kunnan2013) and MFLE meta-analyses have found significant L2p effects only when differentially analyzing reading and listening L2p (Stankovic et al., Reference Stankovic, Biedermann and Hamamura2022) – as opposed to average global L2p measures (Circi et al., Reference Circi, Gatti, Russo and Vecchi2021; Del Maschio et al., Reference Del Maschio, Crespi, Peressotti, Abutalebi and Sulpizio2022a). Ratings of global proficiency were considered only when such task-relevant results were not reported. Since the corpus encompassed different scale ranges, L2p ratings were normalized to ensure comparability across experiments. Each Likert scale was framed as a continuous variable from 0% to 100%, encompassing ten qualitative L2p levels (Figure 2). Each L2p mean value was normalized following a reported formula (Del Maschio et al., Reference Del Maschio, Crespi, Peressotti, Abutalebi and Sulpizio2022a): ((x − a)/(b − a)) × 100, where x is the reported L2p mean and a and b represent the minimum and maximum values of the scale, respectively. Studies were then sorted based on their samples' normalized L2p value, identifying those that yielded significant and non-significant MFLEs in impersonal and personal dilemmas (Figure 3). All plots start from the lower intermediate L2p level, as no lower proficiency studies were included for review.
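The normalization formula above can be illustrated with a minimal Python sketch. The function name `normalize_l2p` and the example ratings are hypothetical, introduced only for illustration; they are not values from the reviewed corpus.

```python
def normalize_l2p(x: float, a: float, b: float) -> float:
    """Map a mean Likert rating x on a scale [a, b] onto a 0-100% range.

    Implements ((x - a) / (b - a)) * 100, where a and b are the minimum
    and maximum values of the original scale.
    """
    if b <= a:
        raise ValueError("scale maximum b must exceed scale minimum a")
    if not a <= x <= b:
        raise ValueError("mean rating x must lie within the scale [a, b]")
    return (x - a) / (b - a) * 100


# Hypothetical example ratings (not taken from the reviewed studies):
print(normalize_l2p(7, 0, 10))  # mean of 7 on a 0-10 scale
print(normalize_l2p(4, 1, 7))   # mean of 4 on a 1-7 scale
```

Under this mapping, a mean of 4 on a 1–7 scale corresponds to 50%, since the scale minimum (here, 1) is subtracted before rescaling. This is why ratings from scales with different lower bounds become comparable once normalized.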
Considering the patterns in the figures above, together with compatible meta-analytical evidence (Stankovic et al., Reference Stankovic, Biedermann and Hamamura2022), studies are next reviewed for each of those L2p levels separately, yielding 28 experiments with intermediate L2p levels (<70% normalized L2p) and 29 with high L2p levels (≥70% normalized L2p).
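The 70% cut-off described above can be sketched as a simple grouping rule. The following Python snippet is an illustrative assumption rather than the procedure actually used for the review: the function name `classify_l2p` and the sample values are hypothetical.

```python
from collections import Counter


def classify_l2p(normalized: float, cutoff: float = 70.0) -> str:
    """Label a normalized L2p value (0-100%) as 'intermediate' or 'high'.

    Values below the cut-off count as intermediate and values at or above
    it count as high, mirroring the <70% / >=70% split used in the text.
    """
    if not 0.0 <= normalized <= 100.0:
        raise ValueError("normalized L2p must lie between 0 and 100")
    return "high" if normalized >= cutoff else "intermediate"


# Hypothetical normalized L2p means for five experiments (illustration only).
sample = [55.0, 62.5, 70.0, 81.0, 93.3]
counts = Counter(classify_l2p(value) for value in sample)
print(counts["intermediate"], counts["high"])
```

Note that a value of exactly 70% falls in the high L2p group under this rule, consistent with the "≥70%" boundary stated above.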
Finally, identification of mediating domains for our explanatory framework was also based on literature-driven criteria. Specifically, domains were deemed relevant if at least two MFLE studies implicated them in decision-making patterns, and if they were related to L2p in at least one further study from the general bilingualism literature.
3.1. The MFLE at intermediate L2p levels
Impersonal dilemmas consistently exhibit more utilitarian than non-utilitarian choices across studies. Yet, this pattern does not differ between the participants' two languages. This is true for the widely used switch/trolley dilemma, which yielded similar response patterns in both languages when based either on its typical question “Would you push the man?” (Corey et al., Reference Corey, Hayakawa, Foucart, Aparici, Botella, Costa and Keysar2017; Costa et al., Reference Costa, Foucart, Hayakawa, Aparici, Apesteguia, Heafner and Keysar2014b; Geipel et al., Reference Geipel, Hadjichristidis and Surian2015; Shin & Kim, Reference Shin and Kim2017; Figure 4A), or on an outcome-driven paraphrasis such as “Would you let five people die?” (Corey et al., Reference Corey, Hayakawa, Foucart, Aparici, Botella, Costa and Keysar2017). These results are robust irrespective of the participants' specific L1s and L2s (Costa et al., Reference Costa, Foucart, Hayakawa, Aparici, Apesteguia, Heafner and Keysar2014b; Geipel et al., Reference Geipel, Hadjichristidis and Surian2015). Yet, most studies on impersonal dilemmas had English as their L2, inviting further research on more diverse language pairs – a critical point given that overreliance on English has been shown to bias findings in related fields (Blasi et al., Reference Blasi, Henrich, Adamou, Kemmerer and Majid2022; García et al., Reference García, de Leon, Tee, Blasi and Gorno-Tempini2023).
A non-significant MFLE was also observed in the fumes dilemma (Shin & Kim, Reference Shin and Kim2017), which requires deciding whether toxic gas threatening three patients at a hospital should be redirected by pressing a switch, killing only one patient in another room (Greene et al., Reference Greene, Morelli, Lowenberg, Nystrom and Cohen2008). Moreover, non-significant effects were observed in studies comparing L2p subgroups or performing correlations between L2p and response type (Corey et al., Reference Corey, Hayakawa, Foucart, Aparici, Botella, Costa and Keysar2017; Costa et al., Reference Costa, Foucart, Hayakawa, Aparici, Apesteguia, Heafner and Keysar2014b; Geipel et al., Reference Geipel, Hadjichristidis and Surian2015; Shin & Kim, Reference Shin and Kim2017) – except in Corey et al. (Reference Corey, Hayakawa, Foucart, Aparici, Botella, Costa and Keysar2017), experiment 3a, where a significant negative correlation between L2p and the odds of making a utilitarian choice was found for a classic written switch dilemma. The only partial exception comes from Brouwer (Reference Brouwer2019), experiment 2, who observed a significant MFLE across six dilemmas (three impersonal, three personal), with no interaction between language and dilemma type. Suggestively, this is the only experiment with this L2p level in the corpus that used auditory stimuli. This point is noteworthy because auditory input can attenuate emotional responses during L1 (but not L2) processing, potentially prompting differential moral response patterns in each language (Jankowiak & Korpal, Reference Jankowiak and Korpal2018).
A different pattern emerges with personal dilemmas, which reveal consistent MFLEs across studies. At intermediate L2p levels, utilitarian choices are significantly more frequent in L2 than in L1. This was observed for English–Spanish (Costa et al., Reference Costa, Foucart, Hayakawa, Aparici, Apesteguia, Heafner and Keysar2014b, experiment 2; Figure 4A), Chinese–English (Chan et al., Reference Chan, Gu, Ng and Tse2016, footbridge only; Geipel et al., Reference Geipel, Hadjichristidis and Surian2015, experiment 2), Spanish–English (Corey et al., Reference Corey, Hayakawa, Foucart, Aparici, Botella, Costa and Keysar2017, experiments 3a and 3b), Korean–English (Shin & Kim, Reference Shin and Kim2017), Italian–English (Geipel et al., Reference Geipel, Hadjichristidis and Surian2015, experiment 1), Dutch–English (Brouwer, Reference Brouwer2019, Reference Brouwer2021), Swedish–French (Dylman & Champoux-Larsson, Reference Dylman and Champoux-Larsson2020, experiment 2b; Figure 4B) and Italian–German (Geipel et al., Reference Geipel, Hadjichristidis and Surian2015, experiment 1) bilinguals. The effect is most systematic for the footbridge dilemma in its classical version (Chan et al., Reference Chan, Gu, Ng and Tse2016; Costa et al., Reference Costa, Foucart, Hayakawa, Aparici, Apesteguia, Heafner and Keysar2014b; Dylman & Champoux-Larsson, Reference Dylman and Champoux-Larsson2020, experiment 2b; Geipel et al., Reference Geipel, Hadjichristidis and Surian2015; Shin & Kim, Reference Shin and Kim2017), even when comparing samples with different languages as L1 and L2 (Costa et al., Reference Costa, Foucart, Hayakawa, Aparici, Apesteguia, Heafner and Keysar2014b; Geipel et al., Reference Geipel, Hadjichristidis and Surian2015).
The MFLE on the footbridge dilemma was replicated when participants were given the option to sacrifice the man by pushing a button, barring physical brute force but maintaining the instrumental nature of the death – a modification that seemed to reduce the magnitude of the effect (Corey et al., Reference Corey, Hayakawa, Foucart, Aparici, Botella, Costa and Keysar2017, experiment 3a). An MFLE was also observed when the question was changed from “Would you push the man?” to “Would you let five people die?,” suggesting that it is robust even when consequences are highlighted (Corey et al., Reference Corey, Hayakawa, Foucart, Aparici, Botella, Costa and Keysar2017, experiment 3b). Significant MFLEs also emerge in other personal dilemmas (involving, e.g., decisions on suffocating a baby to save more people, and transplanting organs from a healthy patient to save five others) with both written (Shin & Kim, Reference Shin and Kim2017) and auditory (Brouwer, Reference Brouwer2019) stimuli. Reinforcing these patterns, the frequency of utilitarian responses in L2 was shown to be higher for subgroups with lower L2p and to correlate negatively with L2p (Costa et al., Reference Costa, Foucart, Hayakawa, Aparici, Apesteguia, Heafner and Keysar2014b; Geipel et al., Reference Geipel, Hadjichristidis and Surian2015; Shin & Kim, Reference Shin and Kim2017; Figure 4A) – although no such correlations emerged in the Corey et al. (Reference Corey, Hayakawa, Foucart, Aparici, Botella, Costa and Keysar2017) modified footbridge dilemmas (experiments 3a and 3b).
A noteworthy exception can be found in Chan et al. (Reference Chan, Gu, Ng and Tse2016), who failed to find an MFLE in analyses performed over 22 personal dilemmas. However, this study did find a significant MFLE when isolating responses to the footbridge dilemma. Interestingly, the MFLE in personal dilemmas seems to disappear when the participants' two languages are structurally or typologically similar, as seen for Swedish–Norwegian and high L2p Norwegian–Swedish bilinguals on the footbridge dilemma (Dylman & Champoux-Larsson, Reference Dylman and Champoux-Larsson2020, experiments 3a and 3b),Footnote 2 and the same lack of MFLE was found on moral choice decision tasks disregarding type for German–English and English–German bilinguals (Hayakawa et al., Reference Hayakawa, Tannenbaum, Costa, Corey and Keysar2017, experiments 1, 4, 5 and 6).
Overall, research on the MFLE at intermediate L2p levels reveals three tentative patterns. First, this effect seems mostly absent for impersonal dilemmas, as seen in roughly 80% of experiments. Second, it proves quite systematic for personal dilemmas, as seen in nearly 85% of experiments (especially those using the footbridge dilemma, yielding significant MFLEs in 90% of cases). Third, utilitarian decisions in L2 seem to increase as L2p decreases. Finally, the few exceptions to these patterns might be related to presentation modality and language similarity. These observations are discussed in section 4.
3.2. The MFLE at high L2p levels
As observed for mid-proficiency bilinguals, impersonal dilemmas also yield non-significant MFLEs in high L2p groups. Most studies examined the switch/trolley dilemma, all but one reporting non-significant MFLEs across written (Brouwer, Reference Brouwer2019, Reference Brouwer2021; Corey et al., Reference Corey, Hayakawa, Foucart, Aparici, Botella, Costa and Keysar2017; Costa et al., Reference Costa, Foucart, Hayakawa, Aparici, Apesteguia, Heafner and Keysar2014b; Dylman & Champoux-Larsson, Reference Dylman and Champoux-Larsson2020) and auditory (Brouwer, Reference Brouwer2021) modalities, even for native-like L2p participants (Winskel & Bhatt, Reference Winskel and Bhatt2020). Null effects were also reported upon switching languages between dilemma types, adding social identification factors or highlighting consequences and responsibilities of the action (Corey et al., Reference Corey, Hayakawa, Foucart, Aparici, Botella, Costa and Keysar2017). The same occurred with other impersonal dilemmas requiring participants to decide whether to keep the money upon finding a wallet, lie on their tax returns (Brouwer, Reference Brouwer2019, Reference Brouwer2021) or choose who should lose the prize money on a TV show (Winskel & Bhatt, Reference Winskel and Bhatt2020). Likewise, a meta-analysis of Corey et al. (Reference Corey, Hayakawa, Foucart, Aparici, Botella, Costa and Keysar2017) experiments showed no effects of L2p or language on choices for the switch dilemma overall – the only exception was experiment 1a, in which the odds of utilitarian choices on a regular switch dilemma increased for the L2 group, though less significantly than for the footbridge dilemma. Also, in Corey et al.'s (Reference Corey, Hayakawa, Foucart, Aparici, Botella, Costa and Keysar2017) study, only two out of its eight experiments yielded significant differences between higher and lower L2p subgroups, the former making more utilitarian choices on the switch dilemma.
Personal dilemmas, on the other hand, did not yield the same results observed for intermediate L2p levels. Far from consistent, MFLEs were less common across high L2p groups. Four experiments failed to find an MFLE on the footbridge dilemma. This happened in highly proficient Hindi–English bilinguals (Winskel & Bhatt, Reference Winskel and Bhatt2020), and in experiments 2a and 3b of Dylman and Champoux-Larsson (Reference Dylman and Champoux-Larsson2020), involving Swedish–English (with very high L2p; Figure 4B) and Norwegian–Swedish bilinguals, respectively. Notably, a non-significant MFLE was found when accounting for aversion by changing the question to “Would you let five people die by not pushing him?” (Corey et al., Reference Corey, Hayakawa, Foucart, Aparici, Botella, Costa and Keysar2017, experiment 3c). On the other hand, the classic written version of the footbridge dilemma did yield an MFLE in high L2p Spanish–English participants (Corey et al., Reference Corey, Hayakawa, Foucart, Aparici, Botella, Costa and Keysar2017, experiments 1a and 2a; Costa et al., Reference Costa, Foucart, Hayakawa, Aparici, Apesteguia, Heafner and Keysar2014b, experiment 2). The same occurred when the task made explicit the victims' nationalities (to test for social group identification) and when the utilitarian decision caused the man to be disabled for life instead of killing him (Corey et al., Reference Corey, Hayakawa, Foucart, Aparici, Botella, Costa and Keysar2017, experiments 2b and 3d).Footnote 3
The only report that checked for L2p as a possible mediator of the MFLE on personal dilemmas was Corey et al. (Reference Corey, Hayakawa, Foucart, Aparici, Botella, Costa and Keysar2017). Intra-experimental results were less conclusive, since experiments 1b, 2a, 2b and 3d failed to find differences, while only two (1a and 3c) found evidence of low L2p groups making more utilitarian choices than high L2p and L1 groups on the footbridge dilemma. Yet, a meta-analysis of all experiments in the study found that utilitarian choices on the footbridge dilemma increased as L2p decreased.
The inconsistency of the MFLE in personal dilemmas is not exclusive to the footbridge task. A non-significant MFLE was found in native-like Hindi–English bilinguals on two personal dilemmas in which direct actions on a TV show determined whether a family or a player would fall or be pushed into the water and lose all their prize money (Winskel & Bhatt, Reference Winskel and Bhatt2020). Likewise, no MFLE was observed in Dutch–English bilinguals across several personal dilemmas (Brouwer, Reference Brouwer2019). Contrastingly, a significant MFLE did emerge on a similar task and with a similar sample upon aggregating the results of the footbridge and other dilemmas in both written and auditory modalities (Brouwer, Reference Brouwer2021). A significant MFLE was also found in Spanish–English bilinguals for the terrorist dilemma, in which deciding to kill a terrorism hostage entails saving five other people (Corey et al., Reference Corey, Hayakawa, Foucart, Aparici, Botella, Costa and Keysar2017, experiment 1b).
Lastly, note that a battery of 24 moral dilemmas without personal/impersonal classifications also failed to yield evidence of an MFLE in Polish L1 speakers with either English, German, Spanish or French as their L2 (Białek et al., Reference Białek, Paruzel-Czachura and Gawronski2019). This study, like the one by Hayakawa et al. (Reference Hayakawa, Tannenbaum, Costa, Corey and Keysar2017), only found evidence of an MFLE when checking for more complex parameters than the utilitarian versus deontological distinction, such as those provided by process-dissociation paradigms (Conway & Gawronski, Reference Conway and Gawronski2013) or the consequences, norms and preference for inaction (CNI) model (Gawronski et al., Reference Gawronski, Armstrong, Conway, Friesdorf and Hütter2017; Hennig & Hütter, Reference Hennig and Hütter2021). Different accounts looking to complexify the MFLE will be addressed in section 4.
Finally, we checked three non-peer-reviewed MFLE articles for discrepant evidence to account for potential publication bias effects. Only one met our exclusion criteria, reporting a null MFLE for the footbridge dilemma across high L2p bilinguals, alongside a non-significant tendency toward an MFLE in intermediate L2p bilinguals (Zeybek, Reference Zeybek2021).
Overall, across high L2p groups, impersonal moral dilemmas usually yield non-significant MFLEs. Conversely, personal dilemmas yield unsystematic results, with half the experiments reporting non-significant MFLEs, even for the footbridge dilemma (44.4% non-significant MFLE). This pattern differs from that observed in intermediate L2p groups, who exhibited significant MFLEs in almost all personal dilemmas, especially the footbridge one. As discussed below, these patterns may be driven by numerous factors, including the self-reported nature of L2p levels, dilemma types and measurement methods.
4. Discussion
This systematic review examined the impact of L2p on the MFLE. Briefly, L2p rarely modulates responses to impersonal dilemmas, which typically yield non-significant MFLEs. Conversely, it does seem to impact personal dilemmas, with MFLEs proving consistent at intermediate L2p levels but unsystematic at high L2p levels. Below we discuss these findings, advance a multidimensional framework of the phenomenon and identify core challenges for the field.
The MFLE is systematically absent in impersonal dilemmas. The only exceptions correspond to a mild MFLE on the switch dilemma in Corey et al. (Reference Corey, Hayakawa, Foucart, Aparici, Botella, Costa and Keysar2017, experiment 1a) – though seven other impersonal dilemma experiments in the same report failed to find it – and to a battery of three auditory dilemmas in Brouwer (Reference Brouwer2019) – which also escaped replication in a later report (Brouwer, Reference Brouwer2021). More crucially for our current focus, the effect remains null irrespective of L2p, as moral decisions were almost always similar between languages in both intermediate and high L2p levels. Such is the case across different dilemmas and language pairs (Corey et al., Reference Corey, Hayakawa, Foucart, Aparici, Botella, Costa and Keysar2017; Geipel et al., Reference Geipel, Hadjichristidis and Surian2015). This observation aligns with meta-analytic evidence (Stankovic et al., Reference Stankovic, Biedermann and Hamamura2022) and is consistent with several reports performing L2p analyses of their experiments (Corey et al., Reference Corey, Hayakawa, Foucart, Aparici, Botella, Costa and Keysar2017; Costa et al., Reference Costa, Foucart, Hayakawa, Aparici, Apesteguia, Heafner and Keysar2014b; Geipel et al., Reference Geipel, Hadjichristidis and Surian2015; Shin & Kim, Reference Shin and Kim2017; Wong & Ng, Reference Wong and Ng2018). Overall, impersonal dilemmas fail to yield an MFLE across varied L2p levels.
Conversely, the MFLE does seem sensitive to lower L2p in the face of personal dilemmas. Utilitarian decisions in L2 increase systematically at intermediate L2p levels, with MFLEs emerging in 85% of studies. Yet, this pattern proves inconsistent at high L2p levels, as the MFLE appears in only 50% of studies. This discrepancy seems task-independent, as the effect has proven significant for intermediate- and null for high-proficiency bilinguals on the footbridge, the baby and the vitamins dilemmas (Brouwer, Reference Brouwer2019). Moreover, it has been reported in samples who speak typologically similar (e.g., Dutch–English; Brouwer, Reference Brouwer2019) and typologically different (e.g., Spanish–English; Corey et al., Reference Corey, Hayakawa, Foucart, Aparici, Botella, Costa and Keysar2017) languages. These patterns are noteworthy given that, as seen in different meta-analyses, the MFLE, at large, seems highly sensitive to task- and subject-level variables (Circi et al., Reference Circi, Gatti, Russo and Vecchi2021; Del Maschio et al., Reference Del Maschio, Crespi, Peressotti, Abutalebi and Sulpizio2022a). Indeed, a recent meta-analysis found a significant negative L2p effect on utilitarian decisions when exclusively targeting personal dilemmas (Stankovic et al., Reference Stankovic, Biedermann and Hamamura2022). This is also consistent with several reports performing L2p analyses of their experiments (Corey et al., Reference Corey, Hayakawa, Foucart, Aparici, Botella, Costa and Keysar2017; Costa et al., Reference Costa, Foucart, Hayakawa, Aparici, Apesteguia, Heafner and Keysar2014b; Geipel et al., Reference Geipel, Hadjichristidis and Surian2015; Shin & Kim, Reference Shin and Kim2017; Wong & Ng, Reference Wong and Ng2018). Interestingly, the two intermediate L2p experiments yielding a non-significant MFLE may not have met key requisites of the hypothesis. 
For instance, they depicted scenarios that may not actually represent incongruent personal dilemmas – e.g., killing your grandmother out of spite after she denies you a gift (Chan et al., Reference Chan, Gu, Ng and Tse2016). Also, they involved highly similar languages, such as Swedish and Norwegian (Dylman & Champoux-Larsson, Reference Dylman and Champoux-Larsson2020), unlike others yielding significant MFLEs (Chan et al., Reference Chan, Gu, Ng and Tse2016; Costa et al., Reference Costa, Foucart, Hayakawa, Aparici, Apesteguia, Heafner and Keysar2014b; Shin & Kim, Reference Shin and Kim2017; Winskel & Bhatt, Reference Winskel and Bhatt2020), which were based on cross-script bilinguals (Chinese, Korean, Hebrew or Hindi as L1 and English as L2). Overall, L2p seems to be a robust modulator of the MFLE in personal dilemmas.
These patterns call for a conceptual framework on the role of L2p in the MFLE. We propose that L2p influences this effect due to its influences on various domains recruited by moral cognition tasks. Four such factors would be critical in this sense, namely: mental imagery vividness, inhibitory control, prosocial tendencies and numerical processing, all analyzed under the scope of affective processing and cognitive control efforts driven by personal incongruent moral dilemmas (Figure 5). Note that, as stated in section 3, these domains were identified based on the presence of specific evidence in the literature. Accordingly, this list should not be deemed exhaustive, as the impact of L2p on the MFLE may also be shaped by other factors (e.g., level of overlap between L1 and L2 semantic systems).
Most MFLE theories account for it by reference to a classic dual decision-making system involving intuitive versus rational decisions based on fast emotional or slow normative responses, respectively (Tversky & Kahneman, Reference Tversky and Kahneman1981). The role of affect and rational deliberation as separate factors has been widely revised in the MFLE literature (Hadjichristidis et al., Reference Hadjichristidis, Geipel, Keysar and Srinivasan2019; Hayakawa et al., Reference Hayakawa, Costa, Foucart and Keysar2016, Reference Hayakawa, Tannenbaum, Costa, Corey and Keysar2017; Pavlenko, Reference Pavlenko2017), as discussed in section 5. Current trends highlight the role of affectivity and cognitive control when bilinguals face conflicting situations, such as incongruent moral dilemmas, and entail complex processes that are intrinsically related even at neurological levels (Inzlicht et al., Reference Inzlicht, Bartholow and Hirsh2015; Okon-Singer et al., Reference Okon-Singer, Hendler, Pessoa and Shackman2015). For example, in deciding whether one person should die to save five others based on one's direct action, individuals' responses exhibit more negative emotional valence and arousal (Christensen et al., Reference Christensen, Flexas, Calabrese, Gut, Gomila, Decety, Van, Stock and Leuven2014; Tasso et al., Reference Tasso, Sarlo and Lotto2017), alongside increased brain activation of emotional processing areas (Greene et al., Reference Greene, Sommerville, Nystrom, Darley and Cohen2001; Schaich Borg et al., Reference Schaich Borg, Hynes, Van Horn, Grafton and Sinnott-Armstrong2006; Xue et al., Reference Xue, Wang and Tang2013). Yet, concepts such as pure “emotion reduction” or “cognitive load excess” have been deemed too broad (McFarlane & Perez, Reference McFarlane and Perez2020) or empirically inconclusive (Hadjichristidis et al., Reference Hadjichristidis, Geipel, Keysar and Srinivasan2019), respectively, to be identified as direct originators of the MFLE.
Instead, it is likely that only specific processes of affectivity and cognitive control are involved in bilingual decision making on incongruent moral dilemmas. Compatibly, the present theoretical framework highlights four factors relevant to moral decision making in bilinguals, in light of evidence on processes related to affectivity and cognitive control, along with postulates on how low L2p might modulate them toward utilitarian choices in personal dilemmas.
4.1. Mental imagery
The scenarios in moral dilemma tasks evoke rich mental imagery, including conceptualizations and sensorimotor experiences associated with the situations at hand (Pearson et al., Reference Pearson, Naselaris, Holmes and Kosslyn2015). In personal dilemmas, reduced visual imaging of the intentional, instrumentalized, negative and harmful action has been proposed to increase utilitarian choices (Corey et al., Reference Corey, Hayakawa, Foucart, Aparici, Botella, Costa and Keysar2017; Hayakawa & Keysar, Reference Hayakawa and Keysar2018; Klenk, Reference Klenk2021). Conversely, harmful means tend to be more vivid than their beneficial ends (Amit & Greene, Reference Amit and Greene2012), activating affective and prosocial deterrents of hurtful behavior.
L2p can influence the vividness of mental imagery while reading moral dilemmas. In this sense, L2p correlates positively with imagery skills (Altın et al., Reference Altın, Okur, Yalçın, Eraçıkbaş and Aktan-Erciyes2022) and with vividness of motor imaging simulations (Hayakawa & Keysar, Reference Hayakawa and Keysar2018) during L2 processing. Indeed, the greater the L2p, the stronger the coupling of motor brain networks during L2 text reading, suggesting more consolidated embodied simulations that resemble L1 processing (Birba et al., Reference Birba, Beltrán, Martorell Caro, Trevisan, Kogan, Sedeño, Ibáñez and García2020). Therefore, lower L2p could entail reduced sensorimotor reactivations and less vivid mental visualizations of the action on the victim, dampening affective reactions against harmful behavior. This would increase interpersonal detachment, favoring more utilitarian choices in L2 than in L1.
4.2. Inhibitory control
In moral (and, more particularly, personal) dilemmas, difficulties with inhibitory control – namely, the capacity to suppress ongoing thoughts, actions and emotions (Lucifora et al., Reference Lucifora, Martino, Curcuruto, Salehinejad and Vicario2021; Petersen et al., Reference Petersen, Hoyniak, McQuillan, Bates and Staples2016) – can increase utilitarian decisions (Lucifora et al., Reference Lucifora, Martino, Curcuruto, Salehinejad and Vicario2021; van den Bos et al., Reference van den Bos, Müller and Damen2011), likely by reducing cognitive control mechanisms that are prompted against an action/inaction choice. Indeed, inhibitory behavior toward harmful actions in personal moral decision making is likely prompted by affective aversion reported as negative arousal (McDonald et al., Reference McDonald, Defever and Navarrete2017), and it correlates with an increase in inhibition-related neurotransmitters, such as serotonin (Crockett et al., Reference Crockett, Clark, Hauser and Robbins2010; Pattij & Schoffelmeer, Reference Pattij and Schoffelmeer2015).
L2p may affect bilinguals' inhibitory control. In this sense, higher L2p is associated with better inhibitory control in response to L2 stimuli, as seen in studies using the Simon task (Goral et al., Reference Goral, Campanelli and Spiro2015), the Stroop task (Hui et al., Reference Hui, Yuan, Fong and Wang2020) and other standard go/no-go inhibition tasks (Thanissery et al., Reference Thanissery, Parihar and Kar2020), likely because bilinguals have to develop stronger cognitive control systems as they process L2 stimuli more efficiently when they can inhibit and break away from L1 lexical schemas (Grant et al., Reference Grant, Legault, Li, Schwieter and Paradis2019). Since prepotent response suppression seems critical to process stimuli that provoke stronger preferences for deontological inaction, like personal moral dilemmas (Amit & Greene, Reference Amit and Greene2012; McDonald et al., Reference McDonald, Defever and Navarrete2017), a lower L2p could entail less inhibitory control on personal moral dilemmas, thus reducing affective and cognitive action deterrents and increasing utilitarian decisions.
4.3. Prosocial behavior
In sacrificial dilemmas, utilitarian decisions can be reduced by prosocial behavior – i.e., tendencies for positive social behavior toward others (Pfattheicher et al., Reference Pfattheicher, Nielsen and Thielmann2022) – as well as by more empathic concern (Djeriouat & Trémolière, Reference Djeriouat and Trémolière2014; Körner et al., Reference Körner, Deutsch and Gawronski2020; Takamatsu, Reference Takamatsu2018), more honest and humble personality traits (Djeriouat & Trémolière, Reference Djeriouat and Trémolière2014), an enhanced social context by public reveal of decisions (Andersson et al., Reference Andersson, Erlandsson, Västfjäll and Tinghög2020), adherence to social norms (Körner et al., Reference Körner, Deutsch and Gawronski2020) and reduced psychopathic traits (Körner et al., Reference Körner, Deutsch and Gawronski2020).
L2p could modulate prosocial traits in bilinguals. Notably, prosocial personality traits seem to weaken as L2p decreases. This has been shown, for example, in bilingualism studies tapping into altruism (Liu et al., Reference Liu, Wang, Timmer and Jiao2022), prosocial amicability (Miller et al., Reference Miller, Solis-Barroso and Delgado2021) and empathic concern (Dewaele & Wei, Reference Dewaele and Wei2012). The evidence further suggests that higher L2p might be correlated with enhanced fast emotional reactions to social contexts (Liu et al., Reference Liu, Wang, Timmer and Jiao2022), likely because proficient bilinguals have easier access to emotional and emotion-laden words related to socialization and cooperation (Ferré et al., Reference Ferré, Guasch, Stadthagen-Gonzalez and Comesaña2022; Miller et al., Reference Miller, Solis-Barroso and Delgado2021), and an empathic tendency toward learning their L2 properly (Dewaele & Wei, Reference Dewaele and Wei2012). Social detachment as a result of bilingual experience has often been proposed as a potential explanation of the MFLE, with different works discussing how specific contexts of L2 acquisition and use could blunt emotional and normative responses (Del Maschio et al., Reference Del Maschio, Del Mauro, Bellini, Abutalebi and Sulpizio2022b; Hadjichristidis et al., Reference Hadjichristidis, Geipel, Keysar and Srinivasan2019; Hayakawa et al., Reference Hayakawa, Costa, Foucart and Keysar2016; Miozzo et al., Reference Miozzo, Navarrete, Ongis, Mello, Girotto and Peressotti2020). In this sense, L2p influences on prosociality might further modulate the MFLE. Specifically, reduced altruism and empathy in low L2p individuals might favor more interpersonally detached decisions.
This would increase utilitarian choices on L2 moral decisions, potentially reflecting lower adherence to social norms (Białek et al., Reference Białek, Paruzel-Czachura and Gawronski2019; Hennig & Hütter, Reference Hennig and Hütter2021) against lesser access to affective and prosocial cognitive resources.
4.4. Numerical processing
Numerical words shape the development and integration of numerosity skills (Leibovich et al., Reference Leibovich, Katzin, Harel and Henik2017) by evoking sensory-motor and abstract connotations of their referents (Fischer, Reference Fischer2018). This domain is central to moral dilemmas, which hinge heavily on quantitative estimations. Indeed, the number of potential victims when choosing not to act predicts the probability of making a utilitarian decision (Cao et al., Reference Cao, Zhang, Song, Wang, Miao and Peng2017; Tassy et al., Reference Tassy, Oullier, Mancini and Wicker2013). Simply put, the more potential victims the moral dilemma presents, the more likely participants are to decide to push the person from the footbridge.
Suggestively, lower L2p individuals may find it harder to engage in context-sensitive quantitative processing in L2, favoring more literal and grammatical cues (Hoshino et al., Reference Hoshino, Dussias and Kroll2010). Indeed, they tend to engage non-relevant grammatical L1 mechanisms when weighing numerical magnitudes in L2, which affects processing of the latter (Van Rinsveld et al., Reference Van Rinsveld, Schiltz, Landerl, Brunner and Ugen2016). This likely occurs because L2 numerical processing in less-proficient L2 users is not as efficient as in L1, maybe even leading them to rely on L1 conceptual representations (Garcia et al., Reference Garcia, Faghihi, Raola and Vaid2021). Thus, implicit estimations of the number of victims might further bias moral decisions depending on L2p. Specifically, if lower L2p reduces sensitivity to conceptual and abstract quantities, it could also interfere with weighing the contextual impact of how many people would die in the dilemma, reducing affective and prosocial reactions. This would increase the chances of utilitarian decisions, and therefore an MFLE, in mid-proficiency relative to high-proficiency bilinguals.
4.5. Theoretical considerations and implications
Briefly, in the realm of personal dilemmas, we posit that the impact of L2p on the MFLE would be mediated by affective and cognitive factors, including at least mental imagery, inhibitory control, prosocial behavior tendencies and numerical processing.
Importantly, this view also accounts for the absence of L2p modulations, and of the MFLE at large, in impersonal dilemmas. Overall, relative to personal dilemmas, impersonal ones show no predominance of personal force (Bago et al., Reference Bago, Kovacs, Protzko, Nagy, Kekecs, Palfi, Adamkovic, Adamus, Albalooshi, Albayrak-Aydemir, Alfian, Alper, Alvarez-Solas, Alves, Amaya, Andresen, Anjum, Ansari, Arriaga and Aczel2022), an increased preference for action (Corey et al., Reference Corey, Hayakawa, Foucart, Aparici, Botella, Costa and Keysar2017; Stankovic et al., Reference Stankovic, Biedermann and Hamamura2022) and the already discussed lack of MFLE on bilinguals. In our proposed theoretical framework, reduced L2p would modulate different factors of affectivity and cognitive control that specifically increase preference for action on personal dilemmas. Yet, impersonal dilemmas already showcase higher utilitarian rates than the ones produced by MFLE on personal ones (Corey et al., Reference Corey, Hayakawa, Foucart, Aparici, Botella, Costa and Keysar2017; Costa et al., Reference Costa, Foucart, Hayakawa, Aparici, Apesteguia, Heafner and Keysar2014b; Geipel et al., Reference Geipel, Hadjichristidis and Surian2015; Stankovic et al., Reference Stankovic, Biedermann and Hamamura2022). In impersonal dilemmas, mental imagery of the victim can prove less vivid (Amit & Greene, Reference Amit and Greene2012), autonomic inhibitory reactions are reduced (McDonald et al., Reference McDonald, Defever and Navarrete2017), empathic traits exert little influence (Nasello & Triffaux, Reference Nasello and Triffaux2023) and so does the variance of number of lives (Cao et al., Reference Cao, Zhang, Song, Wang, Miao and Peng2017).
Broader evidence on affectivity and cognitive control shows that, compared with personal dilemmas, impersonal ones seem to involve reduced emotional engagement (Christensen et al., Reference Christensen, Flexas, Calabrese, Gut, Gomila, Decety, Van, Stock and Leuven2014), less sensitivity to negative arousal states (Chan et al., Reference Chan, Gu, Ng and Tse2016; McDonald et al., Reference McDonald, Defever and Navarrete2017; Wong & Ng, Reference Wong and Ng2018; Youssef et al., Reference Youssef, Dookeeram, Basdeo, Francis, Doman, Mamed, Maloo, Degannes, Dobo, Ditshotlo and Legall2012) and lower conflict processing demands (Xue et al., Reference Xue, Wang and Tang2013). Overall, if these are all factors less markedly involved in L1 impersonal dilemmas, then their modulation by L2p would be negligible during L2 tasks, which would account for the absence of an MFLE in a dilemma type that already reduces affective and rational action aversion by itself.
This work carries four main implications. First, reciprocal links between linguistic and socio-cognitive skills have been reported in varied populations. For example, comprehension of social concepts correlates with the integrity of social cognition networks in neurodegenerative patients (Birba et al., Reference Birba, Fittipaldi, Cediel Escobar, Gonzalez Campo, Legaz, Galiani, Díaz Rivera, Martorell Caro, Alifano, Piña-Escudero, Cardona, Neely, Forno, Carpinella, Slachevsky, Serrano, Sedeño, Ibáñez and García2022; Lopes da Cunha et al., Reference Lopes da Cunha, Fittipaldi, González Campo, Kauffman, Rodríguez-Quiroga, Yacovino, Ibáñez, Birba and García2023), and emotional language can bias moral judgments in laypersons but not in legal experts (Baez et al., Reference Baez, Patiño-Sáenz, Martínez-Cotrina, Aponte, Caicedo, Santamaría-García, Pastor, González-Gadea, Haissiner, García and Ibáñez2020). Our study adds to this trend, showing that socio-affective functions may also be shaped by individual language profiles. Second, different proposals have emerged to characterize bilingual social cognition (Hayakawa et al., Reference Hayakawa, Costa, Foucart and Keysar2016; Pavlenko, Reference Pavlenko2017), but these have failed to systematically account for the role of L2p. The present framework partly bridges this gap, offering more nuanced views of the phenomenon while identifying specific factors to be operationalized in future research. Also, to our knowledge, this is the first MFLE review focused on the impact of L2p, offering a fine-grained view that escapes previous meta-analytical and theoretical works (Circi et al., Reference Circi, Gatti, Russo and Vecchi2021; Del Maschio et al., Reference Del Maschio, Crespi, Peressotti, Abutalebi and Sulpizio2022a; Stankovic et al., Reference Stankovic, Biedermann and Hamamura2022). 
Moreover, no previous work has advanced a mechanistic account of the multifactorial impact of L2p on mediators of the MFLE, let alone while including a rationale of the null MFLE typically observed in impersonal dilemmas. Third, insofar as social cognition mediates daily educational events (Li & Jeong, Reference Li and Jeong2020; Sato, Reference Sato2017), understanding these links could inform L2 classroom management practices. For example, depending on their students' L2p, teachers could consider whether group activities requiring decisions from group leaders should be performed in L2 and/or supported by instructor's facilitation. Indeed, socio-cognitive domains play increasingly prominent roles in L2 learning models (Cancienne, Reference Cancienne and Mullen2019; Miri & Pishghadam, Reference Miri and Pishghadam2021; Pishghadam et al., Reference Pishghadam, Adamson and Shayesteh2013). Also, social cognition might impact clinical decision making, inviting reflections on how to manage L2-based interactions. For instance, when bilingual caregivers are faced with decisions on a relative's health and its impact on their family, establishing their L2p might be critical to establish which language should mediate communication with physicians – especially in cases when these do not speak the same L1 as the caregivers. Finally, since L2 research can inform public safety (Pavlenko, Reference Pavlenko2017) and educational policies (Garcia, Reference Garcia and Coulmas2017), important translational insights may be derived from systematic consideration of L2p in the field.
5. Outstanding challenges and future research
The evidence and the framework presented above enable new reflections on the role of L2p in moral cognition. Yet, many shortcomings can be identified, paving the way for further research. Here we discuss four core challenges to be addressed in future works.
In line with more than half of studies on bilingualism (Park et al., Reference Park, Solon, Dehghan-Chaleshtori and Ghanbar2022), L2p measures in our corpus are mainly restricted to subjective measures. Granted, these measures have been shown to correlate with objective outcomes and to predict behavioral performance in relevant tasks (Gollan et al., Reference Gollan, Weissbergr, Runnqvist, Montoya and Cera2012; Gullifer et al., Reference Gullifer, Kousaie, Gilbert, Grant, Giroud, Coulter, Klein, Baum, Phillips and Titone2021; Langdon et al., Reference Langdon, Wiig and Nielsen2005; Marian et al., Reference Marian, Blumenfeld and Kaushanskaya2007; Santilli et al., Reference Santilli, Vilas, Mikulan, Martorell Caro, Muñoz, Sedeño, Ibáñez and García2019). However, they are prone to self-image and desirability biases, and their results are often mis-analyzed as being normally distributed (Veríssimo, Reference Veríssimo2021). Importantly, responses to moral dilemmas might be influenced by aspects of proficiency that are often overlooked by standard instruments, such as how comfortable participants feel when using the L2 or how often they are exposed to the language. These factors might influence at least some of the modulating variables of our model (e.g., prosociality), ultimately shaping moral decision patterns and the MFLE. Future MFLE research should expand the standard operationalization of L2p to delve into these issues. We recognize that other L2p quantifications have been proposed in the literature and that participants' classification into higher or lower L2p groups can be affected by different criteria. Future works could explore whether the MFLE patterns established here remain stable across distinct L2p quantification systems.
Also, few studies control for intercultural and interlinguistic variables when comparing multiple samples using different L1 and L2 pairs, even though proficiency ratings differ between them (Hulstijn, Reference Hulstijn2011, Reference Hulstijn2012; Tomoschuk et al., Reference Tomoschuk, Ferreira and Gollan2019). Thus, the field would greatly profit from the addition of broad, objective assessments of general and specific L2 skills. Moreover, as recently proposed (Claussenius-Kalman et al., Reference Claussenius-Kalman, Hernandez and Li2021; Dewaele & Wei, Reference Dewaele and Wei2012; Gullifer et al., Reference Gullifer, Kousaie, Gilbert, Grant, Giroud, Coulter, Klein, Baum, Phillips and Titone2021), future works should enrich L2p assessments with measures of interacting factors, such as daily L2 usage (Del Maschio et al., Reference Del Maschio, Del Mauro, Bellini, Abutalebi and Sulpizio2022b; Sulpizio et al., Reference Sulpizio, Del Maschio, Del Mauro, Fedeli and Abutalebi2020), exposure (Gullifer et al., Reference Gullifer, Kousaie, Gilbert, Grant, Giroud, Coulter, Klein, Baum, Phillips and Titone2021) and L2 entropy – i.e., the balance of interactional contexts (Gullifer et al., Reference Gullifer, Kousaie, Gilbert, Grant, Giroud, Coulter, Klein, Baum, Phillips and Titone2021; Gullifer & Titone, Reference Gullifer and Titone2020). This could be achieved drawing from recent models (Hulstijn, Reference Hulstijn2020; Marian & Hayakawa, Reference Marian and Hayakawa2021; Titone & Tiv, Reference Titone and Tiv2022) that capture the influence of contextualized individual experience (including L2p) on bilingual profiles in general, and on moral cognition in particular.
Moreover, the mediating domains we identified above are also influenced by other aspects of bilingual experience that correlate with L2p, such as age of L2 acquisition (Bialystok, Reference Bialystok2015; Durand López, Reference Durand López2021; Gullifer & Titone, Reference Gullifer and Titone2020; Kapa & Colombo, Reference Kapa and Colombo2013), flexibility for communicative contexts of use (Gullifer et al., Reference Gullifer, Kousaie, Gilbert, Grant, Giroud, Coulter, Klein, Baum, Phillips and Titone2021) and L2 exposure (Anderson et al., Reference Anderson, Hawrylewicz and Bialystok2020; Gullifer et al., Reference Gullifer, Kousaie, Gilbert, Grant, Giroud, Coulter, Klein, Baum, Phillips and Titone2021; Tomoschuk et al., Reference Tomoschuk, Ferreira and Gollan2019; Vukovic, Reference Vukovic2013). Insofar as these use-related variables are key drivers of socio-emotional and cognitive effects in bilinguals, they may also influence the impact of L2p on the MFLE. Yet, depending on the task, the impact of L2p on different domains may be partly independent from other aspects of bilingual experience (Archila-Suerte et al., Reference Archila-Suerte, Zevin, Bunta and Hernandez2012; Del Maschio et al., Reference Del Maschio, Del Mauro, Bellini, Abutalebi and Sulpizio2022b; Oh et al., Reference Oh, Graham, Ng, Yeh, Chan and Edwards2019; Wartenburger et al., Reference Wartenburger, Heekeren, Abutalebi, Cappa, Villringer and Perani2003). New studies should be designed to disentangle the relative contributions of all these subject variables to the MFLE. This would allow compiling robust data on participants' bilingual experience factors and modeling their influence on moral dilemma responses in both L1 and L2.
Other key constructs would also benefit from more refined definitions and operationalizations. For example, the notion of reduced emotional responses in L2 has been proposed as a partial explanation of the MFLE since the first report on the topic (Costa et al., Reference Costa, Foucart, Hayakawa, Aparici, Apesteguia, Heafner and Keysar2014b). However, conceptualizations of emotional responses often overlook critical factors and fail to capture their full complexity (McFarlane & Perez, Reference McFarlane and Perez2020), which may partly account for the mixed results regarding emotional reduction in personal moral dilemmas (Chan et al., Reference Chan, Gu, Ng and Tse2016; McDonald et al., Reference McDonald, Defever and Navarrete2017; Wong & Ng, Reference Wong and Ng2018; Youssef et al., Reference Youssef, Dookeeram, Basdeo, Francis, Doman, Mamed, Maloo, Degannes, Dobo, Ditshotlo and Legall2012). Our proposed framework, and the field at large, could be enriched by more fine-grained approaches to this construct. Thorough screenings are required of culturally situated discretized emotions relevant to moral dilemmas (Michelini et al., Reference Michelini, Acuña, Guzmán and Godoy2019). Indeed, condensing complex emotions and emotion-laden stimuli into basic affective features such as valence and arousal risks missing cultural differences key for comparing samples (Ferré et al., Reference Ferré, Guasch, Stadthagen-Gonzalez and Comesaña2022; Lim, Reference Lim2016; Schiller et al., Reference Schiller, Yu, Alia-Klein, Becker, Cromwell, Dolcos, Eslinger, Frewen, Kemp, Pace-Schott, Raber, Silton, Stefanova, Williams, Abe, Aghajani, Albrecht, Alexander, Anders and Leonie2023; Yik et al., Reference Yik, Mues, Sze, Kuppens, Tuerlinckx, De Roover, Kwok, Schwartz, Abu-Hilal, Adebayo, Aguilar, Al-Bahrani, Anderson, Andrade, Bratko, Bushina, Choi, Cieciuch, Dru and Russell2023). 
Furthermore, harmonized parameters are needed to constrain assumptions on emotional state types, as are normative data to define emotion valence baselines between groups (McFarlane & Perez, Reference McFarlane and Perez2020). While MFLE research mainly aims to detect increments or reductions of emotional states, several studies lack control dilemmas, and general non-emotional baselines are typically absent, precluding robust comparisons even within studies.
By the same token, the standard dichotomy between a utilitarian and a deontological choice might oversimplify the processes underlying decisions in moral dilemmas. Interesting insights come from a method aimed at disentangling parameters of deontology and utilitarianism in moral dilemmas. Conway and Gawronski (Reference Conway and Gawronski2013) presented these parameters as components of a singular decision process and captured their probability of driving responses based on answers to congruent and incongruent moral dilemmas. With this approach, Hayakawa et al. (Reference Hayakawa, Tannenbaum, Costa, Corey and Keysar2017) found a reduction only in deontological responses in L2. This finding challenges MFLE accounts focused on heightened utilitarianism, as such a pattern was actually reduced in half the studies reported. Compatibly, applications of the CNI model (Gawronski et al., Reference Gawronski, Armstrong, Conway, Friesdorf and Hütter2017) have revealed a distinct decrease of sensitivity to social norms in L2 moral dilemmas (Białek et al., Reference Białek, Paruzel-Czachura and Gawronski2019; Feng & Liu, Reference Feng and Liu2022; Hennig & Hütter, Reference Hennig and Hütter2021). Despite criticism against these models' parameters (Baron & Goodwin, Reference Baron and Goodwin2020; Kunnari et al., Reference Kunnari, Sundvall and Laakasuo2020), a mosaic, dimensional view of deontological and utilitarian decisions could deepen our understanding of the MFLE and its links with L2p (Gawronski et al., Reference Gawronski, Conway, Hütter, Luke, Armstrong and Friesdorf2020; Kroneisen & Heck, Reference Kroneisen and Heck2020; Luke & Gawronski, Reference Luke and Gawronski2022; Zhang et al., Reference Zhang, Kong, Li, Zhao and Gao2018).
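To make the logic of this parameter-based approach concrete, the sketch below assumes the standard process dissociation equations, in which the probability of judging harm unacceptable is U + (1 − U) × D for congruent dilemmas and (1 − U) × D for incongruent ones. The function name, argument names and all numerical values are hypothetical and serve only as an illustration; they do not reproduce data from any study cited here.

```python
def process_dissociation(p_unacc_congruent, p_unacc_incongruent):
    """Estimate utilitarian (U) and deontological (D) parameters.

    Inputs are the proportions of 'harm is unacceptable' responses to
    congruent and incongruent dilemmas. Assumed equations:
        U = p(unacceptable | congruent) - p(unacceptable | incongruent)
        D = p(unacceptable | incongruent) / (1 - U)
    D is returned as None when U equals 1, since it is then undefined.
    """
    for p in (p_unacc_congruent, p_unacc_incongruent):
        if not 0.0 <= p <= 1.0:
            raise ValueError("proportions must lie between 0 and 1")
    u = p_unacc_congruent - p_unacc_incongruent
    d = None if u >= 1.0 else p_unacc_incongruent / (1.0 - u)
    return u, d


# Invented proportions for one hypothetical participant in each language.
u_l1, d_l1 = process_dissociation(0.9, 0.5)
u_l2, d_l2 = process_dissociation(0.8, 0.4)
print(f"L1: U = {u_l1:.2f}, D = {d_l1:.2f}")
print(f"L2: U = {u_l2:.2f}, D = {d_l2:.2f}")
```

With these invented inputs, U is equal across languages while D is lower in L2, which is the kind of selective pattern that a single utilitarian-versus-deontological response count could not reveal.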
Utilitarianism, in particular, is often described as a commonsensical moral process that weighs welfare. Yet, it has been proposed to represent an impartial universal principle seeking maximal welfare for everyone, irrespective of personal values, closeness to the victim and gravity of consequences, among other factors (Kahane, Reference Kahane2015). Therefore, reports of “utilitarian choices” in sacrificial dilemmas modulated by aversion to harming others, reduced empathic concern, or psychopathic or egotistic traits may actually describe a proto-utilitarian principle driven by how convenient it is to cause instrumental harm (Everett & Kahane, Reference Everett and Kahane2020). Recent instruments (Kahane et al., Reference Kahane, Everett, Earp, Caviola, Faber, Crockett and Savulescu2018) allow capturing these distinctions, which may illuminate important aspects of how cognitive variables, including L2p and its modulating factors, shape moral cognition.
In this sense, it would be interesting to examine whether lower L2p differentially impacts utilitarianism in both its “negative” (permissiveness toward instrumental harm) and “positive” (impartial, universal beneficence) dimensions. These dimensions differ based on respondents' nationality and personality (Everett et al., Reference Everett, Colombatto, Awad, Boggio, Bos, Brady, Chawla, Chituc, Chung, Drupp, Goel, Grosskopf, Hjorth, Ji, Kealoha, Kim, Lin, Ma, Maréchal and Crockett2021; Navajas et al., Reference Navajas, Heduan, Garbulsky, Tagliazucchi, Ariely and Sigman2021), highlighting the relevance of individual factors. L2p might be one such variable. In particular, lower L2p would be related to lower altruism, amicability and empathic concern, which are negatively associated with instrumental harm and psychopathy (Dewaele & Wei, Reference Dewaele and Wei2012; Everett & Kahane, Reference Everett and Kahane2020; Liu et al., Reference Liu, Wang, Timmer and Jiao2022; Miller et al., Reference Miller, Solis-Barroso and Delgado2021). Utilitarian choices could thus be increased based on the “negative” proto-utilitarian principle of instrumental harm for welfare. Strategic empirical studies would be needed to test this conjecture.
Four additional points should be noted for future research. First, building on studies with native-language tasks (Crockett et al., Reference Crockett, Siegel, Kurth-Nelson, Dayan and Dolan2017; Riva et al., Reference Riva, Manfrinati, Sacchi, Pisoni and Romero Lauro2019; Van Bavel et al., Reference Van Bavel, FeldmanHall and Mende-Siedlecki2015), the field could incorporate neuroscientific insights, including research on the neural regions and electrophysiological mechanisms underpinning between-language differences during moral decision making. This would be crucial, for instance, to find dissociations between moral decision processes and our framework's L2p-related modulators. Second, it would be useful to favor more naturalistic settings. Typical dilemmas allow for tight control of important variables, but they are distant from the dilemmas that people face daily. If, as recently proposed, moral decisions are influenced by the plausibility of dilemmas (Carron et al., Reference Carron, Blanc and Brigaud2022; Kneer & Hannikainen, Reference Kneer and Hannikainen2022; Körner et al., Reference Körner, Joffe and Deutsch2019) and participant engagement (Körner & Deutsch, Reference Körner and Deutsch2023), then current notions about the MFLE could be enriched or even challenged by more ecological paradigms. Third, utilitarian decisions seem to increase when made by groups rather than by individuals, arguably because group settings increase detachment from social norms (Keshmirian et al., Reference Keshmirian, Deroy and Bahrami2022) and from rational views in welfare discussions (Curşeu et al., Reference Curşeu, Fodor, Pavelea and Meslec2020). Yet, no study has assessed group-level moral judgment in bilinguals, let alone with a focus on L2p. This important gap should be addressed via novel designs in future research, comparing L1 performance with L2 outcomes in bilingual groups with varying L2p levels.
For example, a battery of moral dilemmas (Hayakawa et al., Reference Hayakawa, Tannenbaum, Costa, Corey and Keysar2017) could be presented in written form to each individual for self-completion, and then another set of comparable tasks could be administered to each group for communal discussion and consensual decision making – exclusively in L1 or in L2, respectively. This would enable comparisons between individual and group outcomes, revealing the extent to which distributed deliberation impinges on the MFLE across L2p levels. Audio recordings of the discussions could allow for automated transcription analyses to detect argumentative and otherwise communicative patterns in each group. Finally, although a systematic review was suitable for our aim of developing a theoretical framework, future work could employ complementary approaches, such as meta-analyses.
6. Conclusions
The MFLE seems sensitive to L2p, especially in the case of personal moral dilemmas. This effect may be mediated by mental imagery, inhibitory control, tendencies for prosocial behavior and numerical processing, all of which are sensitive to L2p. This multidimensional framework affords a synthetic explanation of diverse results in the current literature, opening rich avenues for systematic future research.
Supplementary material
The supplementary material for this article can be found at https://doi.org/10.1017/S1366728924000312.
Data availability statement
No data were used in this work other than the information retrieved from the papers reviewed, as summarized in the Supplementary material.
Acknowledgments
Federico Teitelbaum Dorfman was supported by an EVC-CIN 2019 grant from the Consejo Interuniversitario Nacional, Argentina. Pablo Barttfeld was supported by Agencia Nacional de Promoción Científica y Tecnológica, Argentina (grants 2018-0314 and 2021-CAT-I-00083). Adolfo M. García is an Atlantic Fellow at the Global Brain Health Institute (GBHI) and is partially supported by the National Institute on Aging of the National Institutes of Health (R01AG075775); ANID (FONDECYT Regular 1210176, 1210195); GBHI, Alzheimer's Association, and Alzheimer's Society (Alzheimer's Association GBHI ALZ UK-22-865742); Universidad de Santiago de Chile (DICYT 032351MA) and the Multi-partner Consortium to Expand Dementia Research in Latin America (ReDLat), which is supported by the Fogarty International Center and the National Institutes of Health, the National Institute on Aging (R01AG057234, R01AG075775, R01AG21051 and CARDS-NIH), Alzheimer's Association (SG-20-725707), Rainwater Charitable Foundation's Tau Consortium, the Bluefield Project to Cure Frontotemporal Dementia and the Global Brain Health Institute. The contents of this publication are solely the responsibility of the authors and do not represent the official views of these institutions.
Competing interests
None.