1 Introduction
This article investigates the size and direction of seven helping effects. A “helping effect” occurs whenever a situational factor increases or decreases helping. Situational factors can be related to victim characteristics (e.g., Fisher & Ma, 2014), presentation of the problem (Erlandsson et al., 2018), surrounding context (Carroll et al., 2011), psychological distance (Ein-Gar & Levontin, 2013), type of request (Feiler et al., 2012) and more. Helping effects are frequently studied in social and organizational psychology, health economics and judgement and decision making, and are typically investigated by experimentally manipulating a situational factor and then measuring prosocial motivation, behavioral intentions, or actual helping behavior.
This article examines seven well-known helping effects in a more rigorous and systematic way than has been done previously. This is done by using a unified experimental paradigm, testing each helping effect in its “weak form” (implying equal efficiency) and in its “strong form” (implying unequal efficiency), as well as in three different decision modes (separate evaluation, joint evaluation and forced choice). This approach makes it possible to compare how the attributes central in the seven helping effects differ in evaluability, justifiability, and prominence (Li & Hsee, 2019; Slovic, 1975).
To facilitate understanding of key terms, I will first describe one of the seven helping effects, and use that effect to illustrate how “weak” and “strong” helping effects differ, and how helping effects can be tested in different decision modes. I will then describe the other six helping effects.
1.1 The identified victim effect
The identified victim effect (IVE) predicts that people are more motivated to help when the victims who can be saved are identified than when they are non-identified (Butts et al., 2019; Kogut & Ritov, 2015; Lee & Feeley, 2016; Small & Loewenstein, 2003). Identifiability can be increased in several ways, such as adding a name, a photo, or a personal background story of the people in need (Kogut & Ritov, 2005a; Thornton et al., 1991). The underlying psychological mechanisms of the IVE are assumed to be different types of emotional reactions (e.g., distress, sympathy, anticipated warm glow if helping, or anticipated guilt if not helping) which are more intense when faced with identified victims than with non-identified statistical victims (Ahn et al., 2014; Erlandsson et al., 2015; Genevsky et al., 2013; Sah & Loewenstein, 2012). The IVE is an influential and widely investigated helping effect, but its robustness has been questioned (Hart et al., 2018; Lesner & Rasmussen, 2014; Perrault et al., 2015; Wiss et al., 2015), and the effect seems to come with several boundary conditions (Ein-Gar & Levontin, 2013; Friedrich & McGuire, 2010; Kogut, 2011; Kogut & Kogut, 2013; Kogut & Ritov, 2007; Smith et al., 2013).
Two studies in this article investigate the IVE. Study IVE1 takes place in a child cancer context whereas IVE2 takes place in a COVID-19 context. In both studies, helping motivation when faced with identified patients is compared against helping motivation when faced with non-identified patients.
1.2 The “weak” and “strong” forms of helping effects
Helping effects are sometimes seen as biases because they imply that lives are valued unequally (Baron & Szymanska, 2011; Caviola et al., 2020; Dickert et al., 2012; Slovic, 2007). A related distinction concerns the number of individuals who can be saved in the opposing help projects. Sometimes, two projects which differ on only one attribute are compared against each other when testing a helping effect. At other times, the projects also differ on an efficiency-related attribute such as the number of victims who can be saved, so that one option is superior on the efficiency attribute whereas the other is superior on the helping attribute of interest. I will refer to the former (equal efficiency) as a weak helping effect, and to the latter (unequal efficiency) as a strong helping effect. To exemplify, the “weak IVE” is here tested by comparing a project that can save 3 non-identified patients against a project that can save 3 identified patients, whereas the “strong IVE” is tested by comparing a project that can save 3 non-identified patients against a project that can save 1 identified patient.Footnote 1
Importantly, establishing that a helping effect exists in its weak form does not imply that it also exists in its strong form. One could, e.g., find evidence for a weak IVE (i.e., people help more when they can help 3 identified compared to 3 non-identified), but at the same time find a reversed strong IVE (i.e., people help more when they can help 3 non-identified than when they can help 1 identified). With a few exceptions (Mata, 2016), past research has not clearly distinguished weak from strong helping effects.Footnote 2
1.3 Testing helping effects in different decision modes: separate evaluation, joint evaluation and forced choice
Helping effects can be tested in different decision modes. Specifically, the distinction between separate evaluation and joint evaluation has been very influential for research on judgment and decision making (Bazerman et al., 1999; Bohnet et al., 2016; Hsee et al., 1999; Hsee et al., 2013; Hsee & Zhang, 2004; Paharia et al., 2009). In one famous study (Hsee, 1996), people had to express their willingness to pay for two dictionaries. Dictionary A had 5000 entries and was in mint condition whereas dictionary B had 10000 entries but a torn cover. When one group of participants valued only A and another group valued only B (separate evaluation), it was found that mean willingness to pay was greater for dictionary A. Among participants who saw and valued both A and B (joint evaluation), the willingness to pay was instead greater for dictionary B, indicating an evaluation mode-elicited preference reversal. The suggested explanation for this is that the number of entries (but not the quality of the cover) is difficult to evaluate in separate evaluation, but that joint evaluation brings meaning to the number of entries by introducing a reference point. In line with this, emotional reactions predict attitudes toward policies more in separate evaluation (Ritov & Baron, 2011), whereas joint evaluation makes us more attentive to efficiency-related attributes (Bazerman et al., 2011; Caviola et al., 2014; Garinther et al., 2021).
Moreover, joint evaluation incorporates several decision modes (Fischer & Hawkins, 1993; Skedgel et al., 2015). When observing the options together, participants can, e.g., express preferences by rating the attractiveness or stating their willingness to pay for each alternative (joint-evaluation–rating), by distributing limited resources between the alternatives (joint-evaluation–allocation), or by choosing one of the alternatives (forced choice). A key difference is that ratings and allocations allow people to express indifference by rating the alternatives as equally attractive or by distributing resources evenly, whereas the choice mode forces people to favor one alternative (Sharps & Schroeder, 2019), even if this can be done randomly, e.g., by flipping a coin (Broome, 1984; Keren & Teigen, 2010). As predicted by the prominence effect (Slovic, 1975; Tversky et al., 1988), people do not choose randomly when faced with two alternatives that they previously rated as equally attractive, but instead tend to choose the alternative that is superior on the relatively more prominent attribute. This applies both when choosing which product to buy (Nowlis & Simonson, 1997), and when making moral choices about which people to help (Erlandsson et al., 2020).
This article will test both forms of the IVE in all three decision modes. The IVE in separate evaluation is tested by randomly assigning participants to read and respond to either a project that can save 3 identified patients, 1 identified patient or 3 non-identified patients. The weak IVE is tested by comparing attractiveness-ratings and allocations done by participants in the 3 identified condition against participants in the 3 non-identified condition. The strong IVE is tested by comparing those in the 1 identified condition against those in the 3 non-identified condition.
The IVE in joint evaluation is tested by having participants read about two helping projects presented side by side, rate the attractiveness of both projects, and allocate resources between the two projects. The weak [strong] IVE is tested by comparing ratings and allocations to the 3[1] identified-project against ratings and allocations to the 3 non-identified-project when the projects are presented together.
In the forced choice mode, participants read about two helping projects presented side by side as in joint evaluation, but are simply asked which of the two projects they prefer to implement. The weak [strong] IVE is found if significantly more than 50% of the participants choose the project that can save 3[1] identified when it is pitted against the project that can save 3 non-identified.
1.4 Underlying theory: Evaluable and justifiable attributes
Li and Hsee (2019) argue that attributes in decision situations can differ in both evaluability (how easily the attribute can be understood in itself) and in justifiability (whether people think the attribute should influence decisions). In a helping context, the number of people who can be helped is a prime example of an attribute with high justifiability (as most people agree that it is preferable to help more than to help fewer people), but low evaluability (as it is difficult to assess if 3 patients are few or many without any comparison). As demonstrated in the above-mentioned dictionary study, moving from separate to joint evaluation increases evaluability. For example, willingness to donate was no higher when one could save 200 rather than 100 polar bears in separate evaluation, but almost twice as high in joint evaluation (Hsee et al., 2013).
Other attributes in helping situations are different. For example, the identifiability attribute might have a moderately high evaluability (because identified beneficiaries tend to make us experience compassionate emotions even without comparison), but a relatively low justifiability (because most people do not believe that adding a name and a face should make a person more valuable). Applied to the IVE, the theory predicts that people will prefer to help fewer identified beneficiaries when the effect is assessed in separate evaluation (because identifiability is more evaluable), but that they will prefer to help more non-identified beneficiaries when assessed in joint evaluation (because efficiency is more justifiable). Consistent with this, a study by Kogut and Ritov (2005b, Study 2) found that participants reading about a project that could help one identified child donated more than participants reading about a project that could help a group of identified children (separate evaluation), but that the two projects received equal amounts when evaluated side by side (joint evaluation), and further that the project saving a group of children was preferred more often when participants were forced to choose one of the projects (i.e., a reversed effect). It should however be noted that this study manipulated singularity rather than identifiability.
The current article is the first to apply the theories suggested by Li and Hsee (2019) and by Slovic (1975) to a moral domain, and it expands previous research in at least two ways: (A) By systematically testing both weak and strong helping effects in both separate and joint evaluation, it will be possible to determine the evaluability and justifiability profile of a specific help-situation attribute, as well as how that attribute interacts with the efficiency attribute. To exemplify, it is possible that people will prefer the identified project when the two projects are equally efficient (a weak IVE), but prefer the non-identified project when that is more efficient (a reversed strong IVE). (B) By testing joint evaluation preferences both with and without the option to express indifference, it is possible to detect choice-dependent moral preferences in line with the prominence theory. For example, people might prefer the identified and non-identified projects equally often in ratings or in allocations, but still prefer the identified project more often when forced to choose.
Up until now, the IVE has been used to exemplify a helping effect. Importantly, this article includes no less than seven helping effects – all tested in a unified paradigm. This makes it possible to compare how various help-situation attributes (one in each helping effect) differ in their evaluability, justifiability, and prominence profiles. Below, I briefly describe the other included helping effects.
1.5 Six other helping effects
The article has been inspired by Kogut and Ritov (2005b) in regard to decision modes, and by Mata (2016) in regard to weak and strong helping effects. But whereas these studies investigated one helping effect each, I test seven effects.
1.5.1 The proportion dominance effect
The proportion dominance effect (PDE) predicts that people are more motivated to help when the rescue proportion is high (e.g., you can help 99% of the patients in need), than when it is low (e.g., you can help 1% of the patients in need; Baron, 1997; Bartels, 2006; Bartels & Burnett, 2011; Fetherstonhaugh et al., 1997; Friedrich et al., 1999; Jenni & Loewenstein, 1997; Kleber et al., 2013). Related research suggests that anticipated warm glow and helping responses towards one child who can be saved are reduced when participants are informed about other children who cannot be helped (Dickert & Slovic, 2009; Västfjäll et al., 2015), and that people are more motivated to help when they can save 100% of 100 victims in need than when they can save 100 victims without any denominator specified (Li & Chapman, 2009; Zhang & Slovic, 2018). The PDE is mediated by perceived impact, meaning that supporting a project with a low rescue proportion seems like a “drop in a bucket”, whereas a high rescue proportion project seems more effective (Erlandsson et al., 2014, 2015).
In the two PDE-studies included in this article, the weak PDE is tested by comparing a project that can help 6 out of 6 patients in need against a project that can help 6 out of 100 patients in need. The strong PDE is tested by comparing a project that can help 4 out of 4 (in Study PDE1) or 4 out of 5 (in Study PDE2) against a project that can help 6 out of 100 (Mata, 2016).Footnote 3
1.5.2 The ingroup effect
The ingroup effect (IGE, also known as parochialism or ingroup-bias) is a well-established phenomenon that predicts that people will help more when the people in need are from the helpers’ ingroup than when they are from the helpers’ outgroup (Baron, 2012; Duclos & Barasch, 2014; James & Zagefka, 2017; Fiedler et al., 2018; Levine & Thompson, 2004; Schwartz-Shea & Simmons, 1991). The IGE has been suggested to be driven by attitudes (e.g., ingroup-love and outgroup-hate; Brewer, 1999; De Dreu et al., 2011), beliefs (e.g., anticipated consequences for oneself; Everett et al., 2015), and by a greater perceived obligation and responsibility to help the ingroup (Erlandsson et al., 2015; Tomasello, 2020). Importantly, there are many types of ingroups, such as family, spatial proximity, shared values, or cultural identity, and different ways to classify ingroups can be perceived as separate helping effects (Waytz et al., 2019).
Two IGE-studies are included in this article. Study IGE1 focuses on kin-based ingroup (Burnstein et al., 1994), and tests the weak [strong] IGE by comparing a project that can help 3 relatives [1 relative] against a project that can help 3 unknown non-relatives. Study IGE2 focuses on nationality-based ingroup (Baron et al., 2013), and tests the weak [strong] IGE by comparing a project that can help 6[4] fellow citizens against a project that can help 6 foreigners.
1.5.3 The age effect
The age effect predicts that people will be more motivated to help when the people in need are young (children and teenagers) than when they are old (adults; Li et al., 2010). There are several possible reasons for this effect (Tsuchiya et al., 2003). One is that the evolved instinct to protect one’s offspring can extrapolate to behavior towards children in general. Another is that children are perceived to be more dependent than adults, and unlike adults, young children are almost never held responsible for their own plight (Back & Lips, 1998). A third, more utilitarian reason for helping children is that the anticipated number of quality-adjusted life years is higher for a child than for an adult (Goodwin & Landy, 2014). Study AGE tests the weak [strong] age effect by comparing a project that can help 6[4] children and teenagers against a project that can help 6 adults.
1.5.4 The gender effect
The gender effect predicts that people will be more motivated to help when the people in need are women than when they are men (Dufwenberg & Muren, 2006; Eagly & Crowley, 1986; Weber et al., 2019). One explanation for this effect is that female participants are more motivated to help their gender-based ingroup, but males also seem to help women in need more than men in need. One reason for this is that helping by men can be used to signal affluence and agreeableness towards women (Raihani & Smith, 2015; van Vugt & Iredale, 2013). Another reason is that gender-stereotypes lead both men and women to perceive females as less aggressive, more delicate, and more disadvantaged than men, and therefore both more deserving and in more need of protection (Bradley et al., 2019; Curry et al., 2004; Paolacci & Yalcin, 2020). Study GENDER tests the weak [strong] gender effect by comparing a project that can help 6[4] female patients against a project that can help 6 male patients.
1.5.5 The existence effect
The existence effect (aka the immediacy bias or the present bias; Cropper et al., 1994; Huber et al., 2011; O’Donoghue & Rabin, 2015) predicts that people are more motivated to help when it is possible to help individuals who are suffering now (existing victims) than when it is possible to help individuals who will suffer at some later point in time (future victims). The existence effect is closely related to intertemporal choices and to the discounted utility model, which suggests that utilities in the future are discounted by their delay (Bischoff & Hansen, 2016; Chapman & Elstein, 1995; Samuelson, 1937). In addition, the existence effect can be seen as the main psychological barrier to combatting climate-related threats, as the primary beneficiaries of this type of helping are future generations (Wade-Benzoni & Tost, 2009). Study EXISTENCE tests the weak [strong] existence effect by comparing a project that will start right away and help 6[4] existing patients, against a project that will start one year later and help 6 future patients.
1.5.6 The innocence effect
The innocence effect predicts that people are more motivated to help when it is possible to aid individuals who are the victims of unfortunate circumstances (external factors) than when it is possible to aid individuals who fully or partially caused their own plight, or who do not try to help themselves (internal factors; Fong, 2007; Lee et al., 2014; Seacat et al., 2007; Weiner, 1993). People report feeling less compassion and have less neural activity in areas associated with emotions when hearing about “non-innocent” victims (Fehse et al., 2015), and one study found that people suffering because of a natural disaster were helped more than people suffering from a civil war, due to a belief that natural disaster-victims try to help themselves more, and are less responsible for their current situation (Zagefka et al., 2011). Study INNOCENCE tests the weak [strong] innocence effect by comparing a project that can help 6[4] “innocent” patients who are ill despite exercising and eating healthy against a project that can help 6 “non-innocent” patients who smoke, drink, and eat excessively.
1.6 The current studies
There are multiple ways to test helping effects, and different methods, measures and contexts can create very diverging results. Rigorous and well-powered research that tests different helping effects in a unified experimental paradigm is therefore much sought after. This paper aims to do just this, as well as to test the size (and direction) of each helping effect in three decision modes and two forms. This research can help us understand the relative evaluability and justifiability (Li & Hsee, 2019) as well as the relative prominence (Slovic, 1975) of different helping effect attributes, explain past and future inconsistencies in the literature, and motivate researchers to take decision modes and the “weak” and “strong” forms into account when investigating helping effects and other types of moral decision making.
The seven included helping effects are among the most frequently investigated in the prosocial decision making literature. The IVE, PDE and IGE were chosen in part because earlier research found that these effects are mediated by different psychological mechanisms (Erlandsson et al., 2015; 2017), suggesting that they might elicit different response patterns over the experimental manipulations. Still, the main contribution of this paper is not dependent on which helping effects are included, but rather that the included helping effects are tested much more systematically than has been done before.
Three effects are tested in two studies each whereas the other four are tested in one preregistered study each. As all ten studies are similarly well-powered and adopt the same experimental design, use identical dependent variables, and have most contextual features in common, it will be possible to compare response patterns across the seven helping effects. If all helping effects are driven by the same underlying psychological mechanism, they would arguably be similarly affected when going from the weak to the strong form, and when moving between different decision modes.
This article could be said to investigate at least 42 research questions which can be derived from the following sentence: Does the [weak/strong] form of [IVE / PDE / IGE / AGE / GENDER / EXISTENCE / INNOCENCE] appear in the [separate evaluation / joint evaluation / forced choice] decision mode?
For each of these questions, the answer can be expressed with a percentage, where 100% indicates a very large helping effect (e.g., identified patients much favored over non-identified), 50% indicates absence of an effect, and 0% indicates a very large reversed helping effect (e.g., non-identified patients much favored over identified).
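As an illustration of how the 42 research questions follow from the template sentence, the combinations of form, effect and decision mode can be enumerated in a few lines of Python. This is only a sketch; the variable and function names are hypothetical and are not taken from the study materials.

```python
from itertools import product

# The three factors named in the template sentence.
FORMS = ["weak", "strong"]
EFFECTS = ["IVE", "PDE", "IGE", "AGE", "GENDER", "EXISTENCE", "INNOCENCE"]
MODES = ["separate evaluation", "joint evaluation", "forced choice"]


def research_questions():
    """Return one question per (form, effect, mode) combination."""
    return [
        f"Does the {form} form of {effect} appear in the {mode} decision mode?"
        for form, effect, mode in product(FORMS, EFFECTS, MODES)
    ]


questions = research_questions()
print(len(questions))  # 2 forms x 7 effects x 3 modes = 42
print(questions[0])
```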
2 Method
All ten studies shared a similar core design and methodology. Participants were instructed to read and evaluate medical help projects, and randomly assigned to one out of seven conditions. Three conditions were used for testing the helping effects in separate evaluation, two were used for testing them in joint evaluation, and two were used for testing them in forced choice. I targeted 190–220 participants in each of the separate evaluation conditions and 60–70 participants in each of the joint evaluation and forced choice conditions.Footnote 4 Please refer to all tables and to the online supplement for additional information about each study.Footnote 5
2.1 Participants
Nine thousand one hundred and eighty-seven complete responses were collected over ten studies (see Table 1). Data for the different studies were collected at different times but all participants were recruited from either Amazon Mechanical Turk or Prolific and paid $0.3–0.5.Footnote 6
Note 1: Studies with “*” were preregistered.
Note 2: See the supplement for the number of participants in each experimental condition in each study.
2.2 Material and procedure
2.2.1 Separate evaluation
Participants assigned to any of the separate evaluation conditions read and evaluated a single help project. Participants in Condition A(X) read about Project A which could treat a specified number of patients for a specified amount of money. Participants assigned to Condition A(X-2) read an identical description except that two fewer patients could be treated for the same amount of money. Seven of the ten conducted studies used the numbers “6” and “4” treated patients to operationalize “(X)” and “(X-2)” respectively. The other three studies (IGE1, IVE1 and IVE2) used the numbers “3” and “1”. Participants assigned to Condition B(X) read about Project B which could treat as many patients as A(X), but differed on one help-situation attribute which varied between studies and illustrated the helping effect currently being tested (see Tables 4–10).
Project A was presumed to be more attractive than Project B on the varying attribute in all studies, meaning that Project A could save: a higher proportion of patients in need (in PDE-studies), ingroup patients (in IGE-studies), identified patients (in IVE-studies), patients suffering now (EXISTENCE-study), children and teenagers (AGE-study), innocent “gymmers” (INNOCENCE-study) or female patients (GENDER-study; see the tables in the result section and the supplement).
The help project was presented to participants in a tabular form in eight studies (see Table 2 for an example and the supplement for all stimuli material). In the two IVE-studies, participants learned about the help project in written text rather than from a table (see the supplement).
Participants first responded to three attention check questions, meaning that they repeated provided information about the project. Participants who could not do this were deemed inattentive and screened out (see Table 1).Footnote 7
Next, participants were asked to rate the attractiveness of the helping project based on the provided information by responding to three questions: “how good does Project A[B] seem to you”, “how worthy of financing does Project A[B] seem to you” and “how much do you approve of implementing Project A[B]”. Participants responded on a visual analog scale ranging from 0 (not at all) to 100 (extremely) without any additional labels. Participants could see the number for where the marker was currently placed. These three questions were aggregated into a single variable labeled “rating” (all αs > .80).Footnote 8
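The aggregation step can be sketched as follows. The text states only that the three questions were aggregated and that reliability exceeded .80; averaging the three answers is an assumption made here for illustration, the data are invented, and the function names are hypothetical. The reliability coefficient is computed with the standard Cronbach's alpha formula.

```python
from statistics import mean, variance


def composite_rating(items):
    """Combine one participant's three 0-100 answers.

    Averaging is an assumption; the article says only "aggregated".
    """
    return mean(items)


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of per-participant item lists."""
    k = len(responses[0])
    # Sample variance of each item across participants.
    item_vars = [variance([r[i] for r in responses]) for i in range(k)]
    # Sample variance of the participants' summed scores.
    total_var = variance([sum(r) for r in responses])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


# Hypothetical answers (good, worthy of financing, approve) from five people.
responses = [
    [80, 75, 85],
    [40, 50, 45],
    [60, 65, 55],
    [90, 95, 100],
    [20, 30, 25],
]
ratings = [composite_rating(r) for r in responses]
alpha = cronbach_alpha(responses)
print(ratings, round(alpha, 2))
```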
Thereafter, participants were asked to state how much of a hypothetical budget they wanted to earmark to the described project and to unspecified “other projects” respectively. In order to anchor participants’ responses, participants were told that “the default allocation for a help project is 20%” but that they could earmark more if they found the project specifically worthy of financing. The percentage they earmarked to the project at hand (0–100%, same type of scale as for ratings) was labeled “allocation”.Footnote 9
The weak helping effects (equal efficiency) were tested by comparing the rating- and allocation-scores of participants reading about Project A(X) against those reading about Project B(X), whereas the strong helping effects (unequal efficiency) were tested by comparing ratings and allocations of those reading A(X-2) against those reading B(X).
2.2.2 Joint evaluation
Participants assigned to the joint evaluation conditions read about two help projects presented next to each other, and evaluated both projects. In eight of the studies, this was done by adding a column in the tables so that participants could easily compare the two projects on each attribute (see Table 3). In the two IVE-studies, an additional paragraph of text described the second help project (see the supplement).
Half of the participants in joint evaluation read about Project A(X) and Project B(X) presented side by side (testing the weak effect), whereas the other half read about Project A(X-2) and Project B(X) presented side by side (testing the strong effect).
The attention check questions used in separate evaluation were used in joint evaluation as well, with the only difference that participants had to respond to questions regarding both Project A and Project B. The three questions used to assess attractiveness ratings were also used in joint evaluation (all αs > .80). Participants first responded to the three questions regarding Project A and then to the same three questions regarding Project B.
The allocation task in joint evaluation was different from the one used in separate evaluation. Participants were asked to allocate resources only between Projects A and B and explicitly told to allocate 50–50 in case they found both projects equally worthy of financing.
2.2.3 Forced choice
In the forced choice-conditions, participants read the same information and responded to the same attention check questions as in the joint evaluation-conditions. Half of the participants read A(X) vs. B(X) for testing the weak effect, the other half read A(X-2) vs. B(X) for testing the strong effect. However, rather than evaluating the projects using ratings and allocations, participants had to choose which of the two projects to implement. Participants could not refrain from choosing, but those who found the projects equally attractive were advised to use an embedded online number generator to guide their choice (see the supplement). The number of participants who used the number generator was not recorded.
3 Results
The results are organized so that the seven helping effects are presented one at a time, each beginning with a short summary of the results. The weak form (when the two projects can treat equally many patients) and the strong form (when Project A, presumed to be more attractive on the varying attribute, can help fewer patients) are presented separately for each effect.
The weak and strong forms of all helping effects were tested in separate evaluation (independent-sample t-test), joint evaluation (paired t-test) and forced choice (one-proportion binomial test).Footnote 10 Tables 4–10 (one table per helping effect) show cell means for ratings and allocations, the number of participants choosing each project, and the corresponding statistical test (unadjusted p-values).
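As a rough illustration of the forced-choice test, the sketch below computes an exact two-sided one-proportion binomial test against a 50% null in plain Python. The function name and the example counts are hypothetical and not taken from the studies. The article does not state which software or which two-sided convention was used, so the rule applied here (summing all outcomes that are no more likely than the observed one) is an assumption.

```python
from math import comb


def binomial_test_two_sided(k: int, n: int, p0: float = 0.5) -> float:
    """Exact two-sided p-value for observing k 'Project A' choices out of n.

    Sums the probabilities of all outcomes that are no more likely under
    the null proportion p0 than the observed count (an assumed convention).
    """
    pmf = [comb(n, i) * p0**i * (1 - p0) ** (n - i) for i in range(n + 1)]
    observed = pmf[k]
    # The small relative tolerance guards against floating-point ties.
    p_value = sum(q for q in pmf if q <= observed * (1 + 1e-7))
    return min(1.0, p_value)


# Hypothetical counts for illustration only (not data from the studies).
p = binomial_test_two_sided(k=40, n=50)
print(f"p = {p:.4f}")
```

With a 50% null, a choice share far from one half yields a small p-value, which corresponds to a helping effect (share above 50%) or a reversed helping effect (share below 50%).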
Beyond testing the size and direction of each helping effect, I also aimed to compare the effects in two ways: (1) separate vs. joint evaluation; (2) preferences expressed in joint evaluation vs. forced choices.
The first comparison is complicated by the fact that effect sizes from between-group comparisons are not easily comparable with effect sizes from within-subject comparisons, because of the unavoidable additional variance when comparing different subjects. I therefore compared mean differences instead. Specifically, for both separate and joint evaluation comparisons, I calculated a Project A minus Project B mean difference score for ratings and allocations (both measured on 0–100 scales). A positive mean difference score indicates a helping effect, a score around zero indicates absence of an effect, and a negative mean difference score indicates a reversed helping effect. I then compared mean difference scores obtained in separate and joint evaluation. A higher [lower] mean difference score in joint evaluation indicates that joint evaluation increases [reduces] the helping effect.
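The arithmetic behind this comparison can be sketched as follows, using the weak-PDE attractiveness-rating means from PDE1 that are reported in Section 3.1.1. The helper name is hypothetical and the snippet only reproduces the subtraction described above; it is not the author's analysis script.

```python
def mean_difference(mean_a: float, mean_b: float) -> float:
    """Project A minus Project B mean difference on the 0-100 scale."""
    return round(mean_a - mean_b, 2)


# Weak PDE, PDE1, attractiveness ratings (cell means as reported in the text).
se_diff = mean_difference(79.25, 44.24)  # separate evaluation: 35.01
je_diff = mean_difference(80.17, 57.28)  # joint evaluation: 22.89

# A negative change means that joint evaluation reduced the helping effect.
change = round(je_diff - se_diff, 2)  # -12.12
print(se_diff, je_diff, change)
```

Both difference scores are positive here, so the helping effect is present in both decision modes, while the negative change shows that it is smaller in joint evaluation for this measure.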
For the second comparison, I calculated the percentage of participants (in joint evaluation-conditions) who expressed a preference for Project A by rating it higher or by allocating more than 50% of the resources to it (see Tables 4–10). Participants who gave equal ratings or allocated 50–50 were split so that exactly half of them preferred each project (when an uneven number of participants gave equal ratings or allocations, one was excluded). These rating- and allocation-inferred preferences were then compared against the preferences expressed in forced choice with 2 × 2 chi-square tests.
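A minimal sketch of this procedure is given below. It assumes a Pearson chi-square test without continuity correction (the text does not say whether a correction was applied), and all function names and example numbers are hypothetical rather than taken from the studies.

```python
from math import erfc, sqrt


def inferred_preferences(pairs):
    """Count inferred preferences from (score_a, score_b) pairs.

    Ties are split evenly between the projects; with an odd number of
    ties, one tied participant is dropped, as described in the text.
    """
    prefer_a = sum(1 for a, b in pairs if a > b)
    prefer_b = sum(1 for a, b in pairs if a < b)
    ties = sum(1 for a, b in pairs if a == b)
    half = ties // 2
    return prefer_a + half, prefer_b + half


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square for the table [[a, b], [c, d]] with df = 1.

    Assumption: no continuity correction. Returns (chi2, p).
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        raise ValueError("every row and column needs at least one count")
    chi2 = n * (a * d - b * c) ** 2 / denom
    # For one degree of freedom the upper-tail probability is erfc(sqrt(x / 2)).
    return chi2, erfc(sqrt(chi2 / 2))


# Hypothetical joint-evaluation ratings and forced-choice counts.
ratings = [(80, 60), (50, 50), (50, 50), (50, 50), (40, 70)]
je_a, je_b = inferred_preferences(ratings)
chi2, p = chi_square_2x2(je_a, je_b, 20, 10)
print(je_a, je_b, round(chi2, 2), round(p, 3))
```

The first row of the 2 × 2 table holds the preferences inferred from joint evaluation and the second row holds the forced-choice counts, so a significant result means that the two ways of expressing preferences diverge.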
The percentage scores in the rightmost column in Tables 4–10 denote different things for different rows. The percentage for separate (SE) and joint evaluation (JE) ratings and allocations is the “common language effect size” for each comparison of means (Lakens, 2013; McGraw & Wong, 1992). For independent t-tests (separate evaluation), the percentage expresses the probability that a randomly sampled person reading about Project A (the project presumed to be more attractive on the varying attribute) has a higher observed value than a randomly sampled person reading about Project B. For paired t-tests (joint evaluation), the percentage indicates the likelihood that a randomly selected person rates Project A higher than Project B (Lakens, 2013).Footnote 11 The percentage score for rows labeled “preferences” denotes the proportion of participants who preferred Project A over Project B when they were forced to choose, and when preferences were inferred from ratings and allocations. A high percentage (green cells) indicates presence of a helping effect, a low percentage (orange cells) indicates presence of a reversed helping effect, and a percentage around 50% (yellow cells) indicates absence of any effect.
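For readers unfamiliar with the common language effect size, the sketch below shows one way to compute it under the normal-approximation formulas described by McGraw and Wong (1992) and Lakens (2013). The article does not report how the values were computed, so the formulas, the function names and the standard deviations used here are assumptions for illustration only.

```python
from math import sqrt
from statistics import NormalDist


def cles_independent(mean_a, sd_a, mean_b, sd_b):
    """P(random score from group A > random score from group B),
    assuming normally distributed scores in two independent groups."""
    z = (mean_a - mean_b) / sqrt(sd_a**2 + sd_b**2)
    return NormalDist().cdf(z)


def cles_paired(mean_diff, sd_diff):
    """P(a random participant scores Project A above Project B),
    based on the mean and SD of the within-person difference scores."""
    return NormalDist().cdf(mean_diff / sd_diff)


# Hypothetical means and standard deviations (SDs are not given in the text).
print(round(cles_independent(60, 10, 50, 10), 3))
print(round(cles_paired(5, 20), 3))
```

A value of 0.5 corresponds to no difference between the projects, which matches the interpretation of the yellow cells described above.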
3.1 Proportion dominance effect (Studies PDE1 and PDE2)
The weak PDE was found in all three decision modes and was not consistently affected by decision mode. The strong PDE was clearly present in separate evaluation, weaker in joint evaluation, and weaker still in forced choice.
3.1.1 Weak PDE (6 out of 6 patients vs. 6 out of 100 patients)
Separate evaluation
Participants reading about a high rescue proportion project gave higher attractiveness ratings than participants reading about a low rescue proportion project helping equally many (M = 79.25 vs. 44.24 in PDE1 and 78.38 vs. 49.92 in PDE2). Those reading about the high proportion project also earmarked more resources (M = 51.25 vs. 28.30 in PDE1 and 49.24 vs. 33.70 in PDE2).
Joint evaluation
When presented side by side, participants rated the high proportion project as more attractive than the low proportion project helping equally many (M = 80.17 vs. 57.28 in PDE1 and 83.05 vs. 54.09 in PDE2). They also allocated more resources to the high proportion project (M = 68.17 vs. 31.83 in PDE1 and 66.30 vs. 33.70 in PDE2).
The mean difference (Project A [6 of 6] minus Project B [6 of 100]) for ratings was around 12 points higher in separate than in joint evaluation (SE = 35.01, JE = 22.89) in PDE1, and about the same in PDE2 (SE = 28.46, JE = 28.96). In contrast, the mean difference in allocations was lower in separate evaluation in both studies (SE = 22.95, JE = 36.34 in PDE1; SE = 15.54, JE = 32.60 in PDE2).
Forced choice
83.82% in PDE1 and 76.79% in PDE2 chose to implement the high proportion project when the two projects helped equally many patients. When aggregating both PDE-studies, it was found that preferences expressed with forced choice did not differ from preferences inferred from joint evaluation ratings (χ² = 1.47, p = .226) or allocations (χ² = 1.13, p = .288).Footnote 12
3.1.2 Strong PDE (4 out of 4[5] patients vs. 6 out of 100 patients)
Separate evaluation
Participants reading about a high rescue proportion project treating four patients gave higher attractiveness ratings than participants reading about a low rescue proportion project treating six patients (M = 73.37 vs. 44.24 in PDE1 and 73.83 vs. 49.92 in PDE2). Those reading about the high proportion project also earmarked more resources (M = 43.67 vs. 28.30 in PDE1 and 46.38 vs. 33.70 in PDE2).
Joint evaluation
When presented side by side, participants rated the high proportion project treating four as more attractive than the low proportion project treating six patients (M = 79.88 vs. 63.44 in PDE1 and 70.01 vs. 57.53 in PDE2). However, they allocated resources about evenly (M = 54.47 vs. 45.53 in PDE1 and 50.74 vs. 49.26 in PDE2).
The mean difference (Project A [4 of 4 or 4 of 5] minus Project B [6 of 100]) for ratings was around 12 points higher in separate than in joint evaluation in both PDE1 (SE = 29.13, JE = 16.44) and PDE2 (SE = 23.91, JE = 12.48). Likewise, the mean difference for allocations was higher in separate evaluation in both PDE1 (SE = 15.37, JE = 8.94) and PDE2 (SE = 12.68, JE = 1.48). This indicates that joint evaluation slightly reduces the strong PDE.
Forced choice
46.48% (in PDE1) and 37.93% (in PDE2) chose to implement the high proportion project when that project helped fewer patients. When aggregating both PDE-studies, it was found that participants were slightly less likely to express preferences in line with the strong PDE in forced choice than in attractiveness ratings (χ² = 6.94, p = .008), but not than in resource allocations (χ² = 2.62, p = .105).
3.2 Ingroup effect (Studies IGE1 and IGE2)
The weak IGE was found in all three decision modes when the salient ingroup was family [IGE1], but only in joint evaluation allocations and in forced choice when the salient ingroup was nationality [IGE2]. Joint evaluation (for allocations) increased the weak IGE, and forced choice increased it further. The strong IGE was not found in any decision mode when the ingroup was fellow citizens [IGE2] but it was found in separate evaluation and forced choice when the ingroup was kin [IGE1]. Joint evaluation did not consistently affect the strong IGE, but expressing preferences with forced choice increased it.
3.2.1 Weak IGE (3[6] ingroup patients vs. 3[6] outgroup patients)
Separate evaluation
Participants reading about a project treating relatives gave higher attractiveness ratings (M = 72.54 vs. 63.42) and also earmarked more resources (M = 52.68 vs. 41.22) than those reading about a project treating equally many non-relatives in IGE1. In contrast, participants reading about a project treating fellow citizens gave similar ratings (M = 69.79 vs. 71.51) and earmarked similar amounts (M = 38.24 vs. 40.14) as those reading about a project treating equally many foreigners in IGE2.
Joint evaluation
When evaluated side by side, the project treating relatives was rated as more attractive than the project treating equally many non-relatives (M = 79.59 vs. 71.88), and also allocated more resources in IGE1 (M = 62.66 vs. 37.34). The project treating fellow citizens was rated as non-significantly more attractive than the project treating equally many foreigners (M = 76.04 vs. 72.83), and also allocated more resources in IGE2 (M = 56.89 vs. 43.11).
The mean difference (Project A [X ingroup patients] minus Project B [X outgroup patients]) in ratings was similar for separate and joint evaluations in both IGE1 (SE = 9.12, JE = 7.71) and in IGE2 (SE = −1.72, JE = 3.21). However, the mean difference in allocations was higher in joint evaluation in both studies (SE = 11.46, JE = 25.32 in IGE1; SE = −1.90, JE = 13.78 in IGE2).
Forced choice
85.71% in IGE1 (kin) and 85.07% in IGE2 (nationality) chose to help ingroup rather than outgroup patients when the two projects helped equally many patients. When aggregating both IGE-studies, it was found that participants were more likely to express preferences in line with the weak IGE in forced choice than in attractiveness ratings (χ² = 24.73, p < .001) or in resource allocations (χ² = 10.90, p < .001).
3.2.2 Strong IGE (1[4] ingroup patients vs. 3[6] outgroup patients)
Separate evaluation
Participants reading about a project treating one relative gave slightly higher attractiveness ratings than those reading about a project treating three unknown patients (M = 70.25 vs. 63.42), and they also earmarked more resources in IGE1 (M = 50.14 vs. 41.22). In contrast, participants reading about a project treating four fellow citizens gave similar ratings as those reading about a project treating six foreigners (M = 68.40 vs. 71.51), and they also earmarked similar amounts of resources in IGE2 (M = 42.27 vs. 40.14).
Joint evaluation
When evaluated side by side, the project treating more outgroup patients was rated as more attractive than the project treating fewer ingroup patients in both studies (M = 70.62 vs. 75.57 in IGE1 and 70.82 vs. 75.26 in IGE2). The two project-pairs were however allocated equal amounts of resources (M = 53.83 vs. 46.17 in IGE1 and 49.47 vs. 50.53 in IGE2).
The mean difference (Project A [fewer ingroup patients] minus Project B [more outgroup patients]) in ratings was around 12 points higher in separate evaluation when the ingroup was kin in IGE1 (SE = 6.83, JE = −4.95), but about the same when the ingroup was nationality in IGE2 (SE = −3.11, JE = −4.44). The mean difference in allocations was similar in both studies (SE = 8.92, JE = 7.66 in IGE1; SE = 2.13, JE = −1.06 in IGE2).
Forced choice
70.27% chose to treat one relative rather than three unknown patients (in IGE1), whereas only 46.88% chose to treat four fellow citizens rather than six foreigners (in IGE2). When aggregating both IGE-studies, it was found that participants were more likely to express preferences in line with the strong IGE in forced choice than in attractiveness ratings (χ² = 19.53, p < .001) or in resource allocations (χ² = 5.81, p = .016).
3.3 Identified victim effect (Studies IVE1 and IVE2)Footnote 13
The weak IVE was found to some extent in all three decision modes. In contrast, no strong IVE was found in any decision mode. Instead, participants expressed clear preferences for saving a greater number of non-identified victims in joint evaluation and forced choice (i.e., a reversed strong IVE).
3.3.1 Weak IVE (3 identified patients vs. 3 non-identified patients)
Separate evaluation
Participants reading about a project treating identified patients gave higher attractiveness ratings than those reading about a project treating equally many non-identified patients (M = 87.25 vs. 79.53 in IVE1 and 70.10 vs. 60.18 in IVE2). Still, the two groups earmarked similar amounts of resources (M = 59.12 vs. 54.65 in IVE1 and 32.71 vs. 33.11 in IVE2).
Joint evaluation
When presented side by side, the identified patients-project was rated as slightly more attractive than the non-identified patients-project helping equally many (M = 89.27 vs. 84.49 in IVE1 and 77.48 vs. 72.49 in IVE2). The project helping identified patients was also allocated more resources in both studies (M = 55.86 vs. 44.14 in IVE1 and 52.84 vs. 47.16 in IVE2).
The mean difference (Project A [3 identified] minus Project B [3 non-identified]) in ratings was slightly higher in separate evaluation (SE = 7.72, JE = 4.78 in IVE1; SE = 9.92, JE = 4.99 in IVE2). In contrast, the mean difference in allocations was slightly higher in joint evaluation (SE = 4.47, JE = 11.72 in IVE1; SE = −0.40, JE = 5.68 in IVE2).
Forced choice
72.58% in IVE1 and 64.84% in IVE2 chose to implement the project helping three identified patients rather than the project helping three non-identified. When aggregating both IVE-studies, it was found that preferences expressed with forced choice did not differ from preferences inferred from joint evaluation ratings (χ² = 0.56, p = .454) or allocations (χ² = 1.36, p = .244).
3.3.2 Strong IVE (1 identified patient vs. 3 non-identified patients)
Separate evaluation
Participants reading about a project treating one identified patient gave similar attractiveness ratings as those reading about a project treating three non-identified patients (M = 80.12 vs. 79.53 in IVE1 and 56.21 vs. 60.18 in IVE2). Earmarked resources were similar in the two groups when the patients were children in IVE1 (M = 56.58 vs. 54.65), but participants reading about a project treating three non-identified earmarked slightly more than those reading about one identified when the patients were adults in IVE2 (M = 27.93 vs. 33.11).Footnote 14
Joint evaluation
When presented side by side, the project helping three non-identified patients was rated as much more attractive than the project helping one identified (M = 63.35 vs. 86.77 in IVE1 and 49.15 vs. 76.82 in IVE2). It was also allocated a larger portion of the resources (M = 30.98 vs. 69.02 in IVE1 and 28.26 vs. 71.74 in IVE2).
The mean difference (Project A [1 identified] minus Project B [3 non-identified]) in ratings was around 24 points lower (more negative) in joint evaluation in both studies (SE = 0.59, JE = −23.42 in IVE1; SE = −3.97, JE = −27.67 in IVE2). Likewise, the mean difference in allocations was much lower in joint evaluation (SE = 1.93, JE = −38.04 in IVE1; SE = −5.18, JE = −43.48 in IVE2). This clearly indicates that joint evaluation reverses the strong IVE.
Forced choice
Only 13.24% (in IVE1) and 13.08% (in IVE2) chose to implement the project that could treat one identified over the project that could treat three non-identified patients. When aggregating both IVE-studies, it was found that preferences expressed with forced choice did not differ from preferences inferred from joint evaluation ratings or allocations (both χ² = 1.82, p = .178).
3.4 Existence effect
Both the weak and the strong form of the existence effect were found in joint evaluation and forced choice, but not in separate evaluation. Joint evaluation increased both the weak and, to a lesser extent, the strong existence effect. Forced choice slightly increased the strong existence effect.
3.4.1 Weak existence effect (6 patients now vs. 6 patients one year later)
Separate evaluation
Participants reading about a project helping six existing patients gave similar attractiveness ratings (M = 72.79 vs. 69.95), and earmarked similar amounts of resources (M = 45.39 vs. 44.02), compared to those reading about a project helping equally many patients one year later.
Joint evaluation
Yet, when evaluated side by side, the project helping six existing patients was rated as much more attractive than the project helping six patients one year later (M = 81.46 vs. 57.72), and also allocated more resources (M = 77.45 vs. 22.55).
The mean difference (Project A [6 existing] minus Project B [6 future]) in ratings was more than 20 points higher in joint evaluation (SE = 2.84, JE = 23.74). Likewise, the mean difference in allocations was more than 50 points higher in joint evaluation (SE = 1.37, JE = 54.90). This indicates that joint evaluation increases the weak existence effect.
Forced choice
98.53% chose to help 6 patients now rather than 6 patients in one year. Preferences expressed with forced choice did not differ from preferences inferred from joint evaluation ratings (χ² = 1.74, p = .188) or allocations (χ² < 0.01, p = .975).
3.4.2 Strong existence effect (4 patients now vs. 6 patients one year later)
Separate evaluation
Participants reading about a project helping four existing patients and participants reading about a project helping six future patients gave similar attractiveness ratings (M = 68.96 vs. 69.95), and earmarked similar amounts of resources (M = 43.97 vs. 44.02).
Joint evaluation
Still, when evaluated side by side, the project helping four existing patients was rated as more attractive than the project helping six patients one year later (M = 80.08 vs. 70.59), and also allocated more resources (M = 59.99 vs. 40.01).
The mean difference (Project A [4 existing] minus Project B [6 future]) was more than 10 points higher in joint evaluation in ratings (SE = −0.99, JE = 9.49), and more than 20 points higher in allocations (SE = −0.05, JE = 19.98). This means that joint evaluation increases the strong existence effect as well.
Forced choice
79.71% chose to help 4 patients now rather than 6 patients in one year. Participants were slightly more likely to express preferences in line with the strong existence effect in forced choice than in attractiveness ratings (χ² = 4.82, p = .028) or resource allocations (χ² = 3.88, p = .049).
3.5 Age effect
The weak age effect was not found in separate evaluation, but it was found in joint evaluation and to an even greater extent in forced choice. The strong age effect was generally absent, but preferences expressed with forced choices were slightly more in favor of helping 4 children over 6 adults, than preferences inferred from attractiveness ratings.
3.5.1 Weak age effect (6 children vs. 6 adults)
Separate evaluation
Participants reading about a project helping children and participants reading about a project helping equally many adults gave similar attractiveness ratings (M = 68.97 vs. 67.45) and earmarked similar amounts of resources (M = 41.15 vs. 40.74).
Joint evaluation
Still, when evaluated side by side, the project helping children was rated as slightly more attractive than the project helping equally many adults (M = 75.48 vs. 72.13), and also allocated more resources (M = 60.15 vs. 39.85).
The mean difference (Project A [6 children] minus Project B [6 adults]) in ratings was about the same (SE = 1.52, JE = 3.35), but the mean difference in allocations was almost 20 points higher in joint evaluation (SE = 0.41, JE = 20.30).
Forced choice
88.73% chose the project helping six children over the project helping six adults. Participants were slightly more likely to express preferences in line with the weak age effect in forced choice than in attractiveness ratings (χ² = 14.09, p < .001), but not than in resource allocations (χ² = 2.31, p = .129).
3.5.2 Strong age effect (4 children vs. 6 adults)
Separate evaluation
Participants reading about a project treating four children and participants reading about a project treating six adults gave similar attractiveness ratings (M = 65.38 vs. 67.45) and earmarked similar amounts of resources (M = 39.99 vs. 40.74).
Joint evaluation
When evaluated side by side, the project helping four children was rated as slightly less attractive than the project treating six adults (M = 73.68 vs. 76.24).Footnote 15 Still, the two projects were allocated about equal amounts of resources (M = 52.10 vs. 47.90).
The mean difference (Project A [4 children] minus Project B [6 adults]) in ratings was about the same (SE = −2.07, JE = −2.56). The mean difference in allocations was slightly larger in joint evaluation (SE = −0.75, JE = 4.20).
Forced choice
Nevertheless, 60.29% chose to help four children rather than six adults. Participants were slightly more likely to express preferences in line with the strong age effect in forced choice than in attractiveness ratings (χ² = 6.73, p = .009), but not than in resource allocations (χ² = 0.89, p = .345).
3.6 Innocence effect
The weak innocence effect was not found in separate evaluation, but it was found in joint evaluation and to a greater extent in forced choice. The strong innocence effect was generally absent, but preferences expressed with forced choices were slightly more in favor of helping 4 innocent gymmers rather than 6 non-innocent smokers, than preferences inferred from attractiveness ratings.
3.6.1 Weak innocence effect (6 “gymmers” vs. 6 “smokers”)
Separate evaluation
Participants reading about a project treating innocent patients and participants reading about a project treating equally many non-innocent patients gave similar attractiveness ratings (M = 63.80 vs. 65.41), and earmarked similar amounts of resources (M = 39.53 vs. 43.59).
Joint evaluation
When evaluated side by side, the project helping six innocent patients was rated as slightly more attractive than the project helping six non-innocent patients (M = 70.29 vs. 62.96), and also allocated more resources (M = 58.49 vs. 41.51).
The mean difference (Project A [6 innocent] minus Project B [6 non-innocent]) was larger in joint evaluation both in ratings (SE = −1.61, JE = 7.33) and in allocations (SE = −4.06, JE = 16.98). This indicates that joint evaluation increases the weak innocence effect.
Forced choice
81.25% chose to implement the project treating six innocent gymmers rather than the project treating six non-innocent smokers. Participants were more likely to express preferences in line with the weak innocence effect in forced choice than in attractiveness ratings (χ² = 9.23, p = .002) or in resource allocations (χ² = 4.50, p = .034).
3.6.2 Strong innocence effect (4 “gymmers” vs. 6 “smokers”)
Separate evaluation
Participants reading about a project treating four innocent patients and participants reading about a project treating six non-innocent patients gave similar attractiveness ratings (M = 64.75 vs. 65.41), and earmarked similar amounts of resources (M = 44.06 vs. 43.59).
Joint evaluation
When evaluated side by side, the project helping four innocent patients was rated as slightly less attractive than the project treating six non-innocent patients (M = 62.81 vs. 69.30). Still, the two projects were allocated about equal amounts of resources (M = 49.51 vs. 50.49).
The mean difference (Project A [4 innocent] minus Project B [6 non-innocent]) in ratings was slightly lower (more negative) in joint than in separate evaluation (SE = −0.66, JE = −6.49). The mean difference in allocations was about the same (SE = 0.43, JE = −0.98).
Forced choice
Despite this, 58.89% chose to implement the project helping four gymmers rather than the project helping six smokers. Participants were slightly more likely to express preferences in line with the strong innocence effect in forced choice than in attractiveness ratings (χ² = 6.87, p = .009), but not than in resource allocations (χ² = 1.00, p = .317).
3.7 Gender effect
The weak gender effect was not found in separate or joint evaluation, but it clearly appeared in forced choice. The strong gender effect was not found in any decision mode. On the contrary, participants expressed clear preferences for treating more males rather than fewer females in joint evaluation and in forced choice (i.e., a reversed strong gender effect).
3.7.1 Weak gender effect (6 females vs. 6 males)
Separate evaluation
Participants reading about a project helping females and participants reading about an otherwise identical project helping equally many males gave similar attractiveness ratings (M = 70.16 vs. 70.72), and earmarked similar amounts of resources (M = 41.71 vs. 42.86).
Joint evaluation
When evaluated side by side, participants rated the project helping six females and the project helping six males as similarly attractive (M = 70.69 vs. 69.71), and allocated resources evenly between the two projects (M = 50.72 vs. 49.28).
The mean difference (Project A [6 females] minus Project B [6 males]) was similar and around zero in both separate and joint evaluation for ratings (SE = −0.56, JE = 0.98) and for allocations (SE = −1.15, JE = 1.44).
Forced choice
Still, 77.50% chose the project helping six females over the project helping six males.Footnote 16 Participants were more likely to express preferences in line with the weak gender effect in forced choice than in attractiveness ratings (χ² = 11.09, p < .001) or in resource allocations (χ² = 9.45, p = .002).
3.7.2 Strong gender effect (4 females vs. 6 males)
Separate evaluation
Participants reading about a project helping four females and participants reading about a project helping six males gave similar attractiveness ratings (M = 66.81 vs. 70.72), and earmarked similar amounts of resources (M = 40.94 vs. 42.86).
Joint evaluation
When evaluated side by side, the project helping six males was rated as more attractive than the project helping four females (M = 65.59 vs. 73.48), and also allocated more resources (M = 43.65 vs. 56.35).
The mean difference (Project A [4 females] minus Project B [6 males]) in attractiveness ratings was slightly lower (more negative) in joint evaluation (SE = −3.91, JE = −7.89). Likewise, the mean difference in allocations was almost 11 points lower in joint evaluation (SE = −1.92, JE = −12.70). This indicates that joint evaluation reverses the strong gender effect.
Forced choice
30.67% chose to help four females rather than six males.Footnote 17 Preferences expressed with forced choice did not differ much from preferences inferred from joint evaluation ratings (χ² = 3.32, p = .068) or from allocations (χ² = 0.46, p = .498).
4 General discussion
The weak and strong forms of seven helping effects were systematically tested in three decision modes (separate evaluation, joint evaluation and forced choice) using a unified experimental paradigm and with over 9000 participants. I think there are at least three lessons to learn from this research.
The first lesson is that many helping effects are notably difficult to find in separate evaluation. When evaluated one at a time, projects helping children, innocent patients, existing patients, and fellow citizens were rated as no more attractive and allocated no more resources than identical projects helping equally many adults, “non-innocent” patients (smokers), future patients, and foreigners. This is noteworthy considering that most of these effects clearly emerged in joint evaluation.
The IGE-family and the IVE did emerge to some extent also in separate evaluation, but the PDE was the effect that clearly stood out. In both studies, people rated a high rescue proportion project as more attractive and allocated it more resources than people who read about a low rescue proportion project. This suggests that numerical attributes can influence moral preferences more than categorical attributes in separate evaluation, but only when the numbers are easily evaluable (e.g., expressed as proportions rather than absolute numbers; Bartels, 2006; Hsee & Zhang, 2010). In the terms used by Li and Hsee (2019), these results show that the seven included attributes differ in their relative evaluability. To exemplify, age and innocence of the beneficiary as well as existence are relatively difficult to evaluate in isolation, whereas rescue proportion, and to a lesser extent identifiability and family-belonging, are relatively easy to evaluate.
The second lesson is that the weak vs. the strong forms of helping effects elicited similar response patterns in separate evaluation, but quite different patterns in joint evaluation, for some of the effects. This confirms the assumption that the efficiency attribute (number of people possible to treat) matters to people, but is difficult to evaluate in isolation (similar to the number of entries in a dictionary; Hsee, 1996). A novel finding was that the relative importance of the efficiency attribute (compared to the contrasting attribute) differed considerably across helping effects. For example, the weak IVE was found in joint evaluation (people preferred 3 identified over 3 non-identified) but there were large reversed effects when testing its strong form (3 non-identified much preferred over 1 identified). This result suggests that the number-of-victims attribute is more justifiable than the identifiability attribute.
Likewise, the IGE, age and innocence effects appeared in their weak form in joint evaluation, but were absent or reversed in their strong form, whereas the PDE and especially the existence effect were found also in their strong forms. The gender effect was instead absent in its weak form (6 females equally preferred as 6 males) but reversed in its strong form (6 males preferred over 4 females). Together, these results suggest that people value both the number of individuals possible to save and other attributes. The identifiability, nationality and gender attributes have relatively low justifiability, and are thus easily outweighed by the number-of-patients attribute, whereas it is somehow easier for people to justify helping fewer existing over more future patients. Further elucidating why some attributes fare better and others fare worse when pitted against an efficiency attribute should be a prioritized research area in the future.Footnote 18
The third lesson is that preferences inferred from attractiveness ratings and resource allocations in joint evaluation do not always correspond with preferences obtained from forced choices, even though these decision modes share the joint evaluation feature (Erlandsson et al., 2020; Slovic, 1975). The clearest evidence of this is found in the weak gender effect, where participants expressed no preference between saving 6 male and 6 female patients when preferences were inferred from ratings or resource allocations, but a robust preference for helping females when they were forced to choose. One explanation for this is that participants are first and foremost motivated to express justifiable moral preferences (Capraro & Rand, 2018; Choshen-Hillel et al., 2015). The most easily justifiable preference is to claim that males and females are equally valuable, so most people do so in rating and allocation tasks. In the choice task it was impossible to express indifference, but rather than choosing randomly (which a truly indifferent decision-maker would do), it is possible that people go for the second most justifiable preference, which is to value females higher than males.
Additional support for the claim that the forced choice decision mode influences preferences was found in other helping effects. Compared to preferences inferred from attractiveness ratings, forced choice made people more in favor of helping ingroup rather than outgroup members (weak and strong IGE), helping children rather than adults (weak and strong age effect), helping fewer existing patients rather than more future patients (strong existence effect) and helping fewer innocent patients rather than more non-innocent patients (strong innocence effect). The opposite pattern was found for the strong PDE, where forced choice made people slightly more in favor of a low rescue proportion project helping more patients (e.g., 6 of 100) rather than a high rescue proportion project helping fewer patients (e.g., 4 of 5).
The prominence effect (Slovic, 1975; Tversky et al., 1988) holds that the relatively more prominent (important) attribute influences choices more than it influences other types of joint evaluation preference expressions (e.g., attractiveness ratings, allocations, or contingent valuation). In this light, the results reported here suggest that innocent patients, existing patients, children and especially one’s ingroup are more prominent attributes than the number of victims possible to save, whereas rescue proportion is less prominent.
4.1 Limitations
I am not suggesting that the unified paradigm used by me (hypothetical medical helping projects presented in tabular form) is superior to other unified paradigms that could test the same effects. Helping effects are unavoidably context-dependent so it is possible that some effects are more or less “easy to find” using different paradigms. I welcome not only direct replications, but also conceptual replications of these studies in order to determine how generalizable the results are.Footnote 19
Three related limitations are worth mentioning: (1) the difference in the efficiency attribute was small when testing the strong helping effects (6 vs. 4 patients or 3 vs. 1 patient); (2) this study suffers from the WEIRD problem, as all participants were English-speaking and recruited from MTurk or Prolific, and thus not representative of the global population (Henrich et al., 2010); and (3) all helping effects were tested using non-behavioral outcome variables. Possible ways to further improve helping-effect research include investigating where the efficiency-related tipping points are for different effects (e.g., how many future patients must be treated in order to surpass treating one existing patient; Dolan & Tsuchiya, 2011; Erlandsson et al., 2020), investigating cultural, demographic and personality-based differences in both weak and strong helping effects (e.g., Deshpande & Spears, 2016; Fiedler et al., 2018; Wang et al., 2015), and investigating whether different effects are differently affected when moving from hypothetical helping decisions to real (cost-incurring) helping decisions (Ferguson et al., 2019).
It is worth noting that results from attractiveness ratings did not always correspond with results from resource allocations. Differences in joint evaluation preferences were always largest between ratings and forced choices, with allocations falling somewhere in between. This pattern suggests that, as a decision mode, resource allocation lies between attractiveness ratings and forced choices. Expressed differently, allocations are more “choice-like” than ratings, but more “rating-like” than choices.
Lastly, a potential experimental confound is that the joint evaluation and forced choice modes differ not only in the possibility to express indifference (possible vs. impossible) but also in the type of elicitation task (rating/allocation vs. choice). It is worth pointing out that one can manipulate the possibility to express indifference in rating tasks (e.g., by making it possible or impossible to give equal ratings), in allocation tasks (by having people allocate even or uneven amounts) and in choice tasks (by giving or not giving participants the option to pass on the choice to someone else or to opt out from choosing altogether).
4.2 Conclusion
The main insight from this paper is that helping effects can be tested in different ways, that different effects are differently affected when moving from one type of test to another, and that these different response patterns can be understood as help-situation attributes differing in their evaluability, justifiability, and prominence. Some helping effects are present in joint evaluation but absent in separate evaluation, whereas other effects give rise to the opposite pattern, and yet other effects are found only when people cannot express indifference. My hope is that this article can inspire researchers to routinely investigate other helping effects as well, in both their weak and strong presentational forms and in multiple decision modes, as this will provide us with a more nuanced and multi-faceted perspective on the psychology of prosocial decision making.