1 Introduction
If asked, “How many cubic feet of air are in an empty box that is 3’ deep x 3’ wide x 3’ long?”Footnote 1 most (70%, N = 99) respondents correctly compute 27.Footnote 2 However, when asked the variant, “How many cubic feet of dirt are in an empty box that is 3’ deep x 3’ wide x 3’ long?”, most (54%, N = 102) still answer 27, compared to just 26% who recognize that an empty box contains 0 cubic feet of dirt. The presence of a possible calculation leads respondents to perform it, even though it is unnecessary. Though the calculation is not hard, it is hard enough to feel rewarding. We call this process “mindless math”.
In the present research, we find that mindless math is, somewhat ironically, more prevalent in problems involving slightly harder numeric operations. We believe that salient, simple, but non-trivial math tempts people to perform it, and in doing so distracts their attention from properly representing the problem, which is what the solution requires.
1.1 Problem representation and problem solving
To solve a problem, one must first represent it: to internally construct a problem space, with the problem’s elements, its goal, and the permissible operations one can use to meet that goal (Greeno & Simon, 1988; Newell & Simon, 1972; Reimann & Chi, 1989; Johnson-Laird, 1983; McNamara, 1994). For example, in the empty box problem at the start of the paper, “empty” is an important element of the problem to encode. If, however, a problem solver quickly represents the problem as a computation of spatial volume, they would focus on depth, width, and length as the key elements, and unnecessarily multiply them to reach an incorrect answer.
One of the barriers to solving the empty box problem is that one can easily produce an incorrect answer and never receive corrective feedback. It stands in contrast to many classic problems in creativity research (Wiley & Jarosz, 2012), such as the nine-dot (Chein, Weisberg, Streeter & Kwok, 2010) and matchstick problems (Ash & Wiley, 2006). For these problems, it is often clear that a solution has not been reached: the respondent feels “stumped” and is made to realize that their current representation is inaccurate. In contrast, many problems in everyday life are opaque: they map onto familiar structures and operations that allow a decision maker to reach a fast, seemingly satisfactory answer. They offer no obvious feedback that one has in fact failed. The ease of adopting familiar structures and operations reduces the tendency for problem solvers to scrutinize the accuracy of their approach, and so people don’t spend sufficient time representing these problems to reach correct answers.
The problems we will use to demonstrate mindless math have similar properties to the bat and ball problem — the first item of the Cognitive Reflection Test (Frederick, 2005): “A bat and a ball cost $110 in total. The bat costs $100 more than the ball. How much does the ball cost?” In this problem, there is no corrective feedback, and the presence of the erroneous intuition ($10) prevents the correct conclusion from being drawn even when checks are attempted (Meyer & Frederick, 2022). This is partly because even when checking this item, the problem solver can actively endorse $10 as a response (Meyer & Frederick, 2022) by explicitly affirming that the ball costs $10 and the bat costs $100, for a total of $110. In our problems, the presence of a possible calculation serves the role of the $10 intuition: it can direct the checking of the answer to verify the calculation was completed successfully, thus leading the supervisory system to actively endorse that the correct answer has been reached. This is different from “stumpers” (Bar-Hillel et al., 2018, 2019), where there is an erroneous, intuitive interpretation of the problem stem — people form a mental model based on incomplete information that makes the problem appear unsolvable — but this model is recognized as inappropriate.
1.2 Why do people do mindless math?
There is a long history of research showing that familiarity can impede insight. For example, when someone typically uses an object in a specific way, they can become unable to conceive of alternative uses. This is called functional fixedness (Duncker, 1945; Adamson, 1952). A related phenomenon is the Einstellung effect, where people develop predispositions to solving problems in a particular manner even when better alternatives are available. As an example, Sheridan and Reingold (2013) used eye tracking data to show that the presence of a familiar solution to a chessboard situation blocks the discovery of better solutions, even among expert chess players.
Given that most people have significant experience with simple math problems, calculation strategies are likely to be a familiar approach, and thus may impede consideration of other approaches. This will especially be the case if the content of a problem evokes a calculation strategy. Though our problems don’t require math, they take the form of problems that do. Problems that contain numbers evoke a script to perform mathematical operations (Bassok, 1996). This is generally an adaptive response. It aids people in solving everyday math problems, such as computing the cost of different combinations at a restaurant. But, in cases where the tempting and intuitively cued math is not the correct approach, this schematic response to numeric information will impede reasoners’ ability to correctly answer the problem (Cooper & Sweller, 1987; Luchins & Luchins, 1950).
Though it is reasonable to construe problems mentioning numbers as math problems, it is less clear how varying the difficulty of the tempting but unnecessary calculations will affect people’s propensity to perform the irrelevant math.
1.3 How does problem difficulty affect the rate of mindless math?
We have argued that the presence of numerical information can lead people to misrepresent a problem, leading them to return mathematical answers to fundamentally non-mathematical problems (mindless math). In the present research, we consider how these effects vary as the tempting but unnecessary math in a problem becomes harder.
Consider the following question: “Joey went to the store and bought a pack of chips. A bottle of water costs $3.00, a pack of chips costs $1.00 and a pack of gum costs $2.00. How much did he spend in total?” There are two primary answers that people reach when faced with this problem: the price of the pack of chips ($1), which is correct, and the sum of the three items ($6). Performing the sum to reach $6 is incorrect, but in a specific way. It is not random noise. People who reach this answer have misrepresented the problem as requiring a sum, and perform mindless math to reach their answer. In this example, where performing the tempting math was relatively straightforward, 74% of participants responded correctly with $1, and 24% with the mindless math answer of $6 (N = 196).
In contrast, consider the same problem with harder numbers: “Joey went to the store and bought a pack of chips. A bottle of water costs $1.05, a pack of chips costs $0.75 and a pack of gum costs $1.70. How much did he spend in total?” The numbers here are harder to add because the addition involves carrying digits to reach a final sum. The greater difficulty of this addition may steer people away from spending the time needed to correctly represent the problem, leading them to perform math when, in fact, none is required: Joey bought only a pack of chips, and the correct answer is $0.75. In response to this question, 61% of participants gave the correct answer of $0.75 and 35% of participants gave the mindless math answer of $3.50, the sum of the three items (N = 200). Importantly, the difficulty of the actual problem (i.e., respond with a single number) is unchanged; and when people are incorrect, they are not simply being lazy: performing the mindless math of adding the three numbers to reach $3.50 involves cognitive effort.
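To make the contrast in numeric demands concrete, here is a worked column addition for each version (an illustrative aside of ours, not part of the stimuli):

```latex
% Easier version: whole dollars, no carrying needed.
\[
\begin{array}{r}
  3.00 \\
  1.00 \\
+\,2.00 \\ \hline
  6.00
\end{array}
\qquad\text{vs.}\qquad
% Harder version: the hundredths column (5 + 5 + 0) and the tenths column
% (0 + 7 + 7, plus the carry) both require carrying before the dollars are summed.
\begin{array}{r}
  1.05 \\
  0.75 \\
+\,1.70 \\ \hline
  3.50
\end{array}
\]
```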
The effect of math difficulty is interesting because people give more math-based answers when the math is harder to compute. Though $3.50 is harder to compute than $6.00, it is a more common response. This contrasts with the literature on fluency, which suggests that (i) answers will be judged as truer if they are reached fluently (e.g., Alter & Oppenheimer, 2009), and (ii) disfluency may lead to deeper processing and greater accuracy. For example, Keysar et al. (2012) find that decision biases are reduced when people operate in a foreign language because they have to think more carefully.Footnote 3 Both of these findings suggest that the harder numbers — which are more disfluent — should activate analytical thinking and increase accuracy, rather than leading to more mindless math.
Is this effect limited to math? Consider a verbal formulation of the ‘chips problem’: “Joey went to the store and bought a pack of chips. A pack of chips has a C on the front, a box of Altoids has an A on the front, and a Twix has a T on the front. At home, he looks down. What does he see?” Participants fluently form the word “cat”, and this is reflected in their responses: 41% answered correctly with “c”, but 34% answered incorrectly with “cat” (N = 203).Footnote 4 When we make the tempting assembly of letters less fluent (by yielding a non-word) — “Joey went to the store and bought a pack of chips. A pack of chips has a C on the front, a Twix has a T on the front, and a box of Altoids has an A on the front. At home, he looks down. What does he see?” — people provide more correct answers (49%) and fewer mindless, “cta” answers (3%, N = 204). In this setting, people perform better when the letter combination (cta) is less fluent than when it is more fluent (cat). This is consistent with past fluency work (e.g., Alter & Oppenheimer, 2009). In contrast, when answering our problems that contain math, harder calculations lead people to undertake the unnecessary math more. This shows that something specific about the act of doing the math calculations affects people’s accuracy, rather than just the answer being disfluent in general.
Across three main pre-registered studies (total N = 3,193) and eight supplementary studies (N = 4,485), we investigated the relationship between numeric demands and individuals’ ability to represent and solve problems correctly. Study 1 showed that participants facing higher numeric demands were more likely to respond with mindless math answers and less likely to answer the problems correctly. Study 2 replicated the results of Study 1 using a modified paradigm to test the difficulty effect’s robustness, and collected self-reports of participants’ decision processes to help elucidate the mechanism. Study 3 tested the effect’s robustness across a wider range of numeric demands, and compared the effect of numeric demands on accuracy between the problems discussed so far, where the calculation conflicts with the right approach, and problems where math is the right approach.
2 Study 1
In Study 1, we manipulated the numeric demands of problems that contained tempting math that was unnecessary to reach the correct answer, and measured the rate of correct responses and the rate of incorrect responses that resulted from completing the tempting math (mindless math answers). In doing so, we hoped to investigate how changing the difficulty of tempting but unnecessary math affects people’s reasoning processes.
2.1 Methods
Participants.
We initially recruited 1,920 participants from the survey platform Prolific Academic. Of these, 126 were excluded prior to random assignment for failing a comprehension check, leaving a final sample of 1,794 participants (mean age = 34.9 years, 48.5% women, 2.1% non-binary), who completed the study for $0.18. Our sample size and analyses were pre-registered (https://osf.io/wuanj/).
Procedure.
Participants answered one of three possible questions (chips, pens, paper) with one of two levels of numeric demands (easier, harder) in a 3 x 2 full-factorial between-subjects design. The problems are listed in full in Table 1. Each problem contained tempting but unnecessary math that could lead to forming erroneous numeric problem representations. The chips problem tempted participants to do addition, the pens problem tempted subtraction, and the paper problem tempted division. After answering their problem, participants provided demographic information (age, gender, education, race).
Table 1 notes: (a) The exact wording of the chips problem varies slightly across studies; the minor changes did not affect our conclusions in any instance. (b) The third item differs from the first two in that the correct answer requires some math (10 / 5 = 2; 288 / 12 = 24), but does not require the additional step that people perform, erroneously responding with the number of crates needed.
For each question, numeric responses were coded for two dependent variables: whether or not the answer was an instance of mindless math (MM), in which participants gave the incorrect mathematical answer, and whether or not the answer was correct. Although these two dependent variables are related, they are not identical, because it is possible to give an incorrect answer that does not match the MM response (e.g., saying that the volume of a 3’ x 3’ x 3’ hole is 20 cubic feet). The joint use of both dependent variables allows us to distinguish between noise and MM responses. We pre-registered the MM responses for each problem.
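As an illustration of this coding scheme, the following minimal Python sketch (ours, not the authors’ analysis code) scores a single numeric response on both dependent variables for the easier chips problem, where the correct answer is $1.00 and the MM answer is $6.00:

```python
# Illustrative scoring of one response; the correct and MM values for each
# item and condition would come from the pre-registered codes (Table 1).

def score_response(answer: float, correct: float, mm: float, tol: float = 0.005):
    """Return (is_correct, is_mindless_math) for a numeric response."""
    is_correct = abs(answer - correct) < tol
    is_mm = abs(answer - mm) < tol
    return is_correct, is_mm

# A respondent who sums all three prices in the easier chips problem
print(score_response(6.00, correct=1.00, mm=6.00))   # (False, True)
# A respondent who answers with the price of the chips alone
print(score_response(1.00, correct=1.00, mm=6.00))   # (True, False)
# An "other" error that is neither correct nor the MM answer
print(score_response(20.0, correct=1.00, mm=6.00))   # (False, False)
```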
2.2 Results
As shown in Table 1, making the numeric elements more complex increased the rate of MM responses. These effects were significant for each problem and overall (all ps < .001 using Fisher’s exact test): in these problems, harder math led to a higher rate of mindless math responses. Further, the incidence of mindless math answers was high overall (21%), and MM responses made up the majority of errors (91%). There was a corresponding decrease in the rate of correct answers, both for each item and overall (all ps < .001).
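For concreteness, a sketch of this style of test in Python is below; the 2 x 2 cell counts are placeholders for illustration, since the actual per-condition counts are reported in Table 1 rather than in the text:

```python
# Fisher's exact test comparing the rate of MM responses across the easier
# and harder numeric-demands conditions for one problem. Counts are made up.
from scipy.stats import fisher_exact

#               MM response, non-MM response
easier_counts = [30, 270]   # placeholder easier-condition counts
harder_counts = [80, 220]   # placeholder harder-condition counts

odds_ratio, p_value = fisher_exact([easier_counts, harder_counts])
print(f"odds ratio = {odds_ratio:.2f}, p = {p_value:.4g}")
```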
2.3 Discussion
In Study 1, we presented three items where increasing the demands of the numerical content led to participants responding with more mindless math answers, and so less accurately. We have replicated this result with these items many times. We also ran two studies — Study S3 and Study S4 — that tested whether other items exhibited this property. These studies tested a total of 9 items, of which 7 displayed the pattern whereby mindless math was more common when the numeric demands were higher, and thus accuracy was lower. Three of these items displayed statistically significant effects of numeric demands on both correct and mindless math responding. In short, we found the mindless math effect across some, but not all, of our different problems. In these supplementary studies, we also found that participants who scored higher on the Cognitive Reflection Test (Frederick, 2005) and Berlin Numeracy Test (Cokely et al., 2012) were more accurate and provided fewer MM responses, but neither of these scales interacted with our manipulation of the numeric demands of the problems.
Subjects performed unnecessary mathematical operations more often when those operations were more difficult. In the next study, we first tried to understand whether participants who provided correct answers were able to avoid performing the mindless math completely, or whether they computed it but rejected it. Second, we investigated whether greater scrutiny of question wording arose because problems in the easy condition were “too easy”, leading respondents to suspect a trick.
3 Study 2
Study 1 suggests that problems whose solutions require no math are nevertheless solved at lower rates when the tempting but unnecessary math is more difficult, because people do this math and return more MM answers. Yet it is less clear how people arrive at correct answers. It could be the case that when the math is easier, people do the math, and then have time left over to scrutinize their answers. This extra time could lead people to realize that the sum is incorrect, and revisit their representation to reach a correct answer. Alternatively, when the math is easier, people may feel less pressure to start quickly, and spend more time representing the problem. We compare the evidence for each of these two explanations in Study 2.
We also wanted to test whether people perceived a trick from the experimenter at a higher rate when the math was easier. It could be the case that when the math is easy, people do not believe that someone would ask them to do it. This could lead people to perceive that the experimenter is trying to trick them, which could lead them to reread the question and reach the correct problem representation. To test this, we asked respondents to report whether they perceived a trick and whether they re-read the question.
Finally, to address the possibility that the observed mindless math effect was an artifact of the specific numbers we used — either because these numbers specifically inflated the temptation of the math, or because respondents inferred from their presence a specific intention from the experimenter — we designed a study where participants answered a single study item in which they selected all of the numbers themselves.
3.1 Methods
Participants.
Participants (N = 602) were recruited from Amazon’s Mechanical Turk platform (registration: https://osf.io/y35w2). A further 157 participants were excluded prior to random assignment for failing our comprehension check; we collected no data from these participants. The final sample (mean age = 38.2) contained 254 females and 348 males. Participants were paid $0.31 for participation.
Procedure.
All participants were given a version of the chips problem used in Study 1. As in Study 1, we manipulated the levels of the numeric demands. In this case, the manipulations were created by having subjects choose their own numbers from lists we manipulated: we attempted to induce participants to view the numbers as their choice, rather than the experimenter’s choice.Footnote 5
After providing consent, participants were asked to select three numbers from three identical dropdown menus. Half the subjects were randomly assigned to a dropdown list containing round values (0.5 to 5 in increments of 0.5). The other half saw a harder list containing less round values (e.g., 1.05, 3.80); see Table 2. After subjects chose three numbers, they read the chips problem with these values instantiated as the prices of the three items: “Imagine Joey is going to the store to buy a pack of chips. A bottle of water costs [N1], a pack of chips costs [N2] and a pack of gum costs [N3]. How much does he spend in total? (in dollars)”
After answering, participants were presented with a menu of possible strategies they might have used in answering the question (see Table 3). Participants were asked to report which process best described how they reached their answer and how many times they read the first sentence of the chips problem. If they answered that they read the first sentence more than once, they were then asked to select all options that applied to why they re-read the first sentence. Finally, participants were asked how difficult the problem was relative to their expectations, coded from -3 (a lot easier than expected) to 3 (a lot harder than expected).
Table 3 note: (a) It is possible that people interpreted this question as only asking about operations they performed to directly reach the final answer, and not earlier steps that were conducted prior to the processes that produced that final answer.
3.2 Results
We again found that participants in the harder numeric demands condition gave the MM answer at a higher rate (68% versus 53%, Fisher’s exact p < .001) and the correct answer at a lower rate (17% versus 37%, p < .001). This shows that the effect is robust when it is less likely that participants infer a specific intention from the experimenter about the presented numbers. Respondents are tempted to do the unnecessary math, even when it is of their own construction.
In comparing participants’ self-reports of their processes, we found that participants in the easier condition were significantly less likely to pursue the wrong strategy of doing the math (Proportion easier = 0.53 vs. Proportion harder = 0.69, p < .001), and more likely to pursue the normative strategy of doing no math (Proportion easier = 0.34 vs. Proportion harder = 0.18, p < .001). There were no significant differences in the rate at which participants reported that they completed the sum but did not incorporate it into their answer (Proportion easier = 0.13 vs. Proportion harder = 0.13, p = 1.00). This evidence suggests that participants in the easier numeric demands condition were more likely to represent the problem correctly the first time around, and thus avoid the need to do any math.
Accurate participants overwhelmingly reported that they did not complete the math (89% in the easier numeric demands condition and 88% in the harder numeric demands condition). A smaller percentage of accurate participants indicated that they completed math but did not use it (10% in each condition).Footnote 6 On the other hand, most MM participants (93% in the easier condition and 92% in the harder condition) reported that they completed math and used it (as we would expect). Overall, this evidence suggests the different numeric demand levels affected how participants represented and processed the problem prior to completing math, rather than through allowing them additional time to go back and check their answers.
One factor that could encourage people to scrutinize their representations and avoid doing math would be suspicion that an especially easy sum is not the correct operation. Though slightly more participants indicated they re-read the stem in the easier condition relative to the harder condition (48.5% vs. 45.8%), there was no significant difference in participants’ reports of the number of times they read the first sentence of the problem across numeric demands conditions (M easier = 1.58 vs. M harder = 1.62, t(585) = –0.60, p = .500), suggesting this is not the explanation.
To further probe this, we used the numeric demands and the number of times the participant read the item as predictors of whether a participant reached a correct answer. Both numeric demands (b = –1.18, p < .001) and times read (b = 0.591, p < .001) achieved statistical significance: the hard-easy effect remained even after controlling for the number of times a respondent read the problem. In fact, when including harder numeric demands, times read, and their two-way interaction in the model, there was a significant interaction (b = –0.655, p = .007), suggesting that harder numeric demands reduced the benefit of re-reading the problem.
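A sketch of these models in Python with statsmodels, assuming a per-participant data frame with columns correct (0/1), harder (a 0/1 numeric-demands dummy), and times_read (the self-reported count); the file and column names are ours, not the authors’:

```python
# Logistic regressions predicting a correct answer from the numeric-demands
# condition and the self-reported number of reads of the first sentence.
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("study2_responses.csv")  # hypothetical data file

main_model = smf.logit("correct ~ harder + times_read", data=df).fit()
interaction_model = smf.logit("correct ~ harder * times_read", data=df).fit()

print(main_model.summary())         # main effects of condition and re-reading
print(interaction_model.summary())  # adds the harder:times_read interaction
```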
Among the subset of participants who did re-read the problem (n = 284), there were no differences across numeric demands conditions in the reasons they (non-exclusively) reported for choosing to re-read. Specifically, subjects reported the following justifications for re-reading the question stem at similar rates across conditions: “I could not remember the question” (Proportion easier = 0.12 vs. Proportion harder = 0.14, p = .726), “The problem looked easier than I expected” (Proportion easier = 0.46 vs. Proportion harder = 0.51, p = .406), and “I thought the experimenter might be tricking me” (Proportion easier = 0.73 vs. Proportion harder = 0.65, p = .160).Footnote 7 It does not appear that the hard-easy effect is explained by suspicion of a trick in the easier numbers condition.Footnote 8
Participants in the easier numeric demands condition did report that the problem was significantly easier (relative to their expectations) than did participants in the harder numeric demands condition (M easier = –1.27 vs. M harder = –0.82, p = .002). This result suggests that participants’ perceptions of the problem’s difficulty may have contributed to whether they did the math, but not through suspicion of the experimenter leading them to re-read the question.
3.3 Discussion
In Study 2 we again found a higher rate of mindless math when numeric demands were harder, replicating Study 1’s results in a new design that allowed participants to select their own numbers. We also ruled out several possible explanations for the hard-easy effect: participants in the easier numeric demands condition did not indicate greater suspicion of experimenter tricks that led them to re-read the question, though it did appear that the returns to re-reading the stem may be higher when the numeric demands were easier. In other words, though suspicion of a trick from the experimenter does not appear to be a compelling explanation of the observed effect, harder numeric demands appear to impede people’s ability to correctly solve the problem even when they re-read the stem.
Although self-reported decision processes are subject to potential response biases (Austin & Delaney, 1998; Ericsson & Simon, 1998), Study 2 suggests that participants who are able to reach correct answers do not complete any math, but do re-read and scrutinize the problem more thoroughly. People got the answer right or wrong for the same reasons in both the easier and harder conditions, and further work is required to understand why the reasoning processes that led to correct answers were differentially utilized across the numeric demands conditions.
4 Study 3
In Study 1 we found evidence that, in our problems, increasing the numeric demands of tempting but unnecessary math caused participants to give more mindless math answers and fewer correct ones. In Study 2, we replicated this result with a new design, and found that the difference in participants’ performance (in terms of the rate of correct and MM responses) did not seem to be explained by participants re-reading the question or suspecting experimenter tricks at a different rate.
In Study 3, we sought to test the relationship between numeric demands and accuracy over a wider range of values and with a behavioral measure of math difficulty. Participants responded either to a conflict version of the chips problem (i.e., the original) or to a non-conflict version (where the correct answer did involve a sum). This design allowed us to analyze one set of participants’ accuracy in the conflict version, and another set of participants’ response times in the non-conflict version (i.e., how long the tempting math took to complete). The participants’ response times in the non-conflict version served as a measure of the difficulty of the math: accuracy was generally very high in these conditions, so this variable captured how long on average it took to do the math correctly. As a result, a negative correlation between the response times taken to solve the non-conflict versions and the rate of correct responding in the conflict versions would show that as the math got more difficult, people were less able to reach correct answers in the conflict versions where no math was required. We predicted that this would be the case, consistent with the difficulty effect observed between conditions in Studies 1 and 2.
4.1 Methods
Participants.
Participants (N = 1,005) were recruited from Amazon’s Mechanical Turk platform (registration: https://osf.io/mysut). A further 208 participants were prevented from completing the study for failing our comprehension check (prior to random assignment), leaving 797 complete observations.Footnote 9 The sample (mean age = 36.3) was 44.5% male. Participants were paid $0.20 for participation.
Procedure.
After providing consent, participants answered a version of the chips problem used in Studies 1 and 2. We manipulated two independent variables in an 8 x 2 factorial between-subjects design. The independent variables were the numeric demands of the math in the study item and whether the item was the conflict version of the problem, in which the math is tempting but unnecessary (as in Studies 1 and 2), or a non-conflict version, in which the tempting math is the correct approach. We use the term conflict to convey whether an intuitive response is cued that is different from the normative response (De Neys, 2012; Evans, 2010).Footnote 10
Table 4a shows the eight levels of math difficulty: the first three levels used only whole dollars (e.g., $2.00), the next two levels used dollar amounts that varied in the first decimal place (e.g., $1.30), and the final three levels used dollar amounts that varied in the first and second decimal places (e.g., $3.25). Table 4b shows two versions of a problem with the same level of numeric demands. On the left is the conflict version (as used in Studies 1 and 2), in which people may erroneously respond as if Joey is buying all three items. On the right is the non-conflict version, in which the representation of the problem as a sum is the correct one.
As we continued to increase the difficulty of the numeric demands conditions, people’s attempts to do math included more errors. Thus, here we focused on the rate of correct responses (which implies the absence of mindless math) rather than on mindless math responses (which demonstrate its presence). Further, there was no MM response to the non-conflict version, as the correct response was computing the sum. In addition to accuracy, we also measured participants’ response times.
4.2 Results
In Table 4a, we see that non-conflict items that were designed to be harder did in fact take more time on average to answer (as shown by the geometric mean response time increasing across the range of numeric demands). Accuracy on the non-conflict versions also declined across the range of numeric demands (from 98% to 88%).
On conflict items, accuracy decreased to a greater degree as the tempting math got harder, from 66% for the easiest to 41% for the hardest. A logistic regression predicting correct answers from the level of numeric demands (1–8),Footnote 11 whether the problem was the conflict version, and their two-way interaction revealed that the slope was significantly steeper for the conflict version (b = –0.026, p = .030). In other words, harder numeric demands had a larger negative effect on accuracy for the conflict items than for the non-conflict items (25 percentage points vs. 10).
A negative correlation between geometric mean response time in the non-conflict condition and accuracy in the conflict condition (r(6) = –0.86; 95% CI [–1.00, –0.51]) supports the central claim that participants are more likely to engage in mindless math (and less likely to be accurate) when the math is harder.
In a pre-registered one-tailed test of the Pearson’s product-moment correlation coefficient, this was statistically significant (p = .003). This relationship was robust if we included only correct answers when we calculated the average log time taken in the non-conflict condition (r = –0.86, p = .003), and when using our a priori ranking of the difficulty of numeric demands (1–8), rather than a response time based measure (r = –0.87, p = .002).
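A sketch of how this item-level analysis can be computed in Python, assuming trial-level data with columns level (1–8), conflict (0/1), rt (response time in seconds), and correct (0/1); the file and column names are ours, not the authors’:

```python
# Correlate, across the eight numeric-demands levels, the geometric mean
# response time in the non-conflict version with accuracy in the conflict version.
import numpy as np
import pandas as pd
from scipy.stats import pearsonr

df = pd.read_csv("study3_responses.csv")  # hypothetical data file

non_conflict = df[df["conflict"] == 0]
conflict = df[df["conflict"] == 1]

# Geometric mean RT per level = exp of the mean log response time
log_rt_mean = non_conflict.assign(log_rt=np.log(non_conflict["rt"])).groupby("level")["log_rt"].mean()
geo_rt = np.exp(log_rt_mean)
accuracy = conflict.groupby("level")["correct"].mean()

r, p_two_sided = pearsonr(geo_rt.values, accuracy.values)
p_one_sided = p_two_sided / 2 if r < 0 else 1 - p_two_sided / 2  # pre-registered direction: r < 0
print(f"r({len(geo_rt) - 2}) = {r:.2f}, one-tailed p = {p_one_sided:.3f}")
```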
4.3 Discussion
Study 3 complemented earlier findings by showing that a behavioral measure of math difficulty (average response time to reach a correct answer in a non-conflict version) was negatively associated with accuracy in conflict versions of problems over a range of numeric demands. For the non-conflict items, the effect was smaller. In the discussion, we offer a possible interpretation of why harder numeric demands might reduce accuracy and increase the rate of mindless math.
5 General Discussion
Across three pre-registered studies (total N = 3,193), we studied the relationship between the difficulty of tempting math, problem representation, and problem solving. We found that higher numeric demands led to people performing mindless math at a higher rate, giving answers based on unnecessary and inappropriate calculations. In self-reports, participants who reached correct answers generally said that they did not perform any math. This result suggests that harder numeric demands lead to reduced accuracy because they induce people to do math, rather than because they leave less time to scrutinize the math they’ve done.
There are several possible reasons why harder numeric demands might induce someone to do math, or otherwise make them less skeptical that math is the right thing to do. To identify some of these, in a supplementary study we collected item-level ratings of some properties of the problems from Study 3. Essentially, we looked for variables that covaried with math difficulty and that could mediate the relationship between difficulty and the likelihood of performing mindless math. We found two variables that were strongly associated with the difficulty of the math: the anticipated cognitive effort associated with the math (r = 0.93), and the belief that doing the math would be more impressive (r = 0.86). Each of these factors was associated with a reduced likelihood that participants solved the problem correctly and a greater likelihood of performing mindless math. These are two possible pathways that might explain the relationship between math difficulty and mindless math responses — the former by reallocating attention away from problem representation, and the latter by increasing the social returns to performing the math. For further details of these analyses and our consideration of alternative mechanisms, refer to the supplement.
At a high level, research in judgment and decision making argues that different processes guide intuition (System 1) and deliberation (System 2), where deliberation is often needed both to detect and to correct errors that arise from intuition (Kahneman, 2011; Sloman, 1996). The current research represents an interesting example where one System 2 process (performing math operations) distracts attention from a second System 2 process (monitoring for errors in problem representation). The errors come from System 2 being actively engaged in the wrong deliberative task, rather than not being engaged at all. People are not being lazy, but misguided. Kahneman (2000) notes that for System 2 to correct a System 1 reasoning error, the cues that evoke applying a rule must be present, and the rule must be known. The act of doing math appears to suppress rather than elucidate the fact that it is the wrong approach, which makes overcoming this error especially difficult. In this sense, working in an incorrect problem representation is like attempting to solve a problem without the necessary inferential rule — without this contextualizing information, trying harder may not be very helpful (Lawson et al., 2020).
Given that the benefits of engaging in deliberation are frequently contingent — either on possessing a rule (as in Lawson et al., 2020) or on forming a good problem representation — attempts to debias reasoning errors may benefit from guiding people toward a process as well as a decision speed (Larrick & Lawson, 2021). For example, consider Heath and Heath’s (2013) WRAP framework, which compels decision makers to: Widen your options, Reality-test your assumptions, Attain distance before deciding, and Prepare to be wrong. Such a framework provides people with guidance for what to do with additional time spent deliberating, which may otherwise be wasted. Similarly, Ralph Keeney’s (1996) “value-focused thinking” aims to provide guidance for how to create better alternatives for decision problems, which helps to guide how to productively use time spent deliberating. In the case of people performing mindless math, a debiasing intervention might go beyond telling people to deliberate — which may only lead to people more diligently performing unnecessary math — and instead ask decision makers to list the elements of the problem and draw a diagram that establishes how they relate to each other, before engaging in any computation.
People’s propensity to jump into tasks has widespread implications in organizational contexts. In strategic problem-solving, researchers have highlighted the presence of a “plunging-in bias”, where people start solving problems before understanding them (Bhardwaj, Crocker, Sims & Wang, 2018). The present research shows empirically how this tendency can be harmful, as failing to represent a problem correctly can lead to mindless math. In our problems, people would benefit from spending more time representing the problem up front, rather than jumping into the math. The general importance of this is summarized in a saying from the U.S. Navy SEALs: “slow is smooth, smooth is fast”.
A key question that follows from highlighting the pitfalls of getting underway too quickly is: how can we incentivize people to take the time to represent problems? One reason why people might jump into doing math is that the output of such operations is tangible and demonstrates effort, despite being wrong. The act of representing a problem is less tangible and therefore harder to hold up as demonstrable proof of work. Yet good processes — which sometimes can seem intangible — are essential to organizational success. In uncertain environments, organizations should want their members to fail quickly and often, and to constantly learn from their mistakes. This requires setting out in directions that provide opportunities to learn, and consistently re-evaluating one’s approach. The only way to achieve this is through a good process, rather than over-emphasizing specific outcomes. To avoid the errors of diving into problems too quickly and misrepresenting them, greater emphasis must be placed on these processes.
A notable implication of the hard-easy effect observed with mindless math is that the propensity to jump into doing something will be particularly pronounced when the task is difficult, which presumably warrants more, not less, time spent representing the problem. Organizations could seek to safeguard against this tendency by putting embargos on project tasks (times before which they cannot be completed) rather than deadlines. Setting an external time to start implementing could ensure that representing a problem is given sufficient attention, thus minimizing the likelihood of acting mindlessly.
5.1 Limitations
One limitation of the present work is the specificity of the problems — in the discussion of Study 1 we referenced some additional cases of the hard-easy mindless math effect, but this effect will not be present for most problems. As a result, our conclusions are limited to cases where a task invites people to dive into doing familiar operations, but these operations are not helpful for solving the task.
Second, though Study 3 showed that over some ranges increasing the numeric demands of tempting calculations increases the rate of mindless math in our problems, at some point this effect inverts, as debilitatingly difficult computations may also serve as a cue to reinspect one’s representation of a problem. Thus, increasing the numeric demands of problems may increase or decrease respondents’ use of mathematical strategies. We see this in Table 5: moving from the easiest problem in the first row to the harder one in the second, accuracy decreases, but for the most difficult computation (the last row), accuracy is higher relative to the moderately difficult version (row two).
Another limitation of the present research is that it contains data only from online samples facing low stakes. For example, though the MTurk population is highly attentive and engagedFootnote 12 (Hauser & Schwarz, 2016), it is not representative of the broader US population (Huff & Tingley, 2015), and hence our results may look different in a different sample. Yet it is unclear whether higher stakes would improve accuracy. Larger financial incentives, for example, might lead to broader allocation of attention to tasks such as representation, but could simply induce greater application of effort to the wrong strategy, which would not improve accuracy (Lawson et al., 2020; Enke et al., 2021). It would also be beneficial to expand the set of problems studied to those that involve more complex sequences of computations. We speculate that with greater complexity, the temptation to engage in mindless math may be stronger, and continue to impede identifying the correct path.
5.2 Conclusion
When faced with difficult tasks, people are often eager to start doing something. But this eagerness may preclude them from correctly representing the problems that they are so eager to start solving. When we ask people to “take their time”, we have to be more specific about when they should take it. The execution of operations gives the illusion of progress, but if the problem is represented incorrectly, it remains just that: an illusion.