1 Introduction
Performance at work is influenced by many factors, such as individual characteristics, leadership, work pressure, incentive schemes, and corporate structure (Hermalin & Weisbach, 1991; Perry & Porter, 1982; Wageman & Baker, 1997). The physical climate of the workplace is often overlooked as an important factor influencing performance. When it is mentioned, the dominant strain of research focuses on comfort, measured through self-reports on physical aspects of the environment and their effect on human performance. This is remarkable, as office buildings have undergone rigorous innovations in recent decades (for instance, Vermeulen & Hovens, 2006). Developments in the quality of insulation, ventilation, and air-conditioning are effectively changing the indoor environment to which workers are exposed. These innovations are typically motivated by effects on building efficiency and/or worker comfort. While there is ample research highlighting the effects of increased energy efficiency on building resource consumption (Eichholtz, Holtermans & Kok, 2019; Pérez-Lombard, Ortiz & Pout, 2008), the link between changes in indoor environmental conditions and human performance remains a topic of debate (MacNaughton et al., 2017; Satish et al., 2012; Zhang et al., 2017).
Research regarding the impact of the indoor environment on worker performance is hampered by the fact that high-skilled performance measures at work are difficult to obtain directly, and are hard to compare between disciplines. For example, Zivin and Neidell (2012) show that pear-pickers’ performance suffers from exposure to bad environmental quality conditions. However, the output of highly skilled workers who face cognitively demanding tasks (such as academics, managers, doctors, or investors) lacks such a direct outcome measure. It is exactly these high-skilled workers who spend considerable time in confined offices or meeting rooms, subject to specific indoor climate conditions. Parsons (2014) notes that individual factors often dominate performance outcomes, making it even more challenging to compare productivity between workers. Moreover, any output that is measurable is not easily traced back to a quantifiable time period of exposure to the physical indoor climate.
To circumvent the challenge of correctly assessing human performance, research has shifted from measuring performance to measuring comfort (Bluyssen, 2013). The implicit expectation is that when the climate is rated as “comfortable”, productivity increases. Comfort measures are an attractive proxy for productivity and performance, as they are easily and inexpensively assessed by self-report. Comfort could be treated as a measure of interest on its own (for instance, Nakamura et al., 2008), but whether self-assessed comfort levels are indeed an accurate proxy for performance remains an open question. Psychological research repeatedly suggests that self-reported introspection into one’s own subjective experience and emotions is unreliable (Engelbert & Carruthers, 2010).
In this paper, we assess the effect of indoor environmental conditions on human performance by investigating decision processes. Tversky and Kahneman (1974), amongst others, distinguish between “intuitive” and “rational” decision processes. Automated, intuitive rules of thumb, or heuristics, are “quick and dirty” and applied without much effort. The rational processes need more time and cognitive resources, are only scarcely applied, and are associated with high decisional quality. A mainstream account of the interplay between these fast and rational or effortful processes is the default-interventionist approach (Evans, 2007). It stipulates that the effortful processes can intervene in the fast heuristics when a wrongful application (a bias) in a given context is detected. Thus, whenever the effortful processes are hampered, for instance due to cognitive constraint resulting from environmental factors, increased bias-susceptibility generally lowers overall decisional quality (Gawronski & Bodenhausen, 2006; Muraven & Baumeister, 2000). In other words, we expect that bias detection and correction will (partially) suffer when temperature stress constrains the capacity of effortful processes.
1.1 Literature
1.1.1 Temperature and Cognition
Psychological and neurological research has attempted to identify the effects of temperature on cognitive functions. We elaborate on two relevant findings.
The most profound and general finding is that cognitive capacity is lowered by adverse temperature conditions. Wright, Hull and Czeisler (2002) find that changes in the temperature of the body and brain are correlated with changes in performance, such that temperatures deviating from the internal optimum worsen performance. Shibasaki, Namba, Oshiro, Kakigi and Nakata (2017) show that neurological inhibition processes suffer from heat stress. In decision-making, executive and inhibition processes coordinate which stimuli to act on (execute) and which not (inhibit). Both these biological processes are found to be weaker under heat stress. Van Ooijen, Van Marken Lichtenbelt, Van Steenhoven and Westerterp (2004) suggest that temperature could influence mental performance as a result of fatigue. This view is similar to the theoretical concept of mental depletion, the cognitive model stipulating limited mental “control” resources for self-regulation (Baumeister, Bratslavsky, Muraven & Tice, 1998). Mental depletion often results in more instinctive behaviour (such as aggression; Van Lange, Rindery & Bushman, 2017). In general, when external stimuli overstimulate, concentration and performance become more costly (MacLeod, 1991).Footnote 1 Indeed, Cheema and Patrick (2012) show that temperature generally lowers cognitive performance, but not for people who were already mentally depleted at the start of the task.
Although mental depletion is debated (Carter, Kofler, Forster & Mccullough, 2015; Hagger et al., 2016), the general notion of negative cognitive performance effects after enduring strain on mental capacity seems to be a common denominator in ongoing self-regulation discussions (Cunningham & Baumeister, 2016; Baumeister, Vohs & Tice, 2007; Lin, Saunders, Friese, Evans & Inzlicht, 2020; Hockey, 2013; for an overview, see Inzlicht, Werner, Briskin & Roberts, 2021).
The second key finding of research on temperature and cognition is that not all mental processes are affected equally. Lowered cognitive capacity appears theoretically very close to behavioural fatigue. However, it is important to understand that these two concepts are fundamentally and hierarchically distinct. Behavioural fatigue refers to a general lowering of behavioural activity (i.e., a "global" effect). A decrease in cognitive capacity does not have a uniform effect, but depends on the neurological area that suffers most (i.e., a "local" effect). Lan, Lian, Pan and Ye (2009) found performance to decrease with adverse temperatures, but the effects differ across tasks.
In sum, it is clear that temperature has a general, or global, effect on cognition and cognitive performance, and that some local effects can be identified as well.
1.1.2 Temperature and Intuition
The literature review by Hancock and Vasmatzidis (2003) suggests that high capacity and complex mental processes are more profoundly affected by temperature than automated processes. Automated tasks rely on a strong and fast relation between stimulus and response, making them less susceptible to mental constraints (Kahneman, 1973). Automated tasks are part of system I in Kahneman’s cognitive framework, also known as the intuitive system. They rely on intuition and on simple rules of thumb that are learned and are often successfully applied to predictable situations. System II is slow and costly on mental resources, but is generally associated with high-quality decision making.
Cognitive capacity and cognitive control are highly correlated (Engle & Kane, 2003), and the latter has also been found to be affected by temperature. Shibasaki, Namba, Oshiro, Kakigi and Nakata (2017) show that neurological inhibition processes suffer from heat stress. In decision making, inhibition and executive processes coordinate to achieve an optimal solution. As such, the effect of heat on performance can be twofold: not only do higher-order complex tasks suffer more than simple automated tasks (Grether, 1973), but the wrongful application of an automated process, or the application of a wrong automated process, might also be less likely to be corrected. In other words, even when the direct effect of heat on simple and automated processes is not evident (as stated by Zhang & de Dear, 2017), the outcome can still suffer in quality due to the lack of higher-order process intervention. Indeed, Hancock and Vasmatzidis (1998) found that highly skilled operators suffer less from performance decrease under heat stress, and they argued that this is most likely a result of performance depending on automated internalized processes.
The cognitive framework of Tversky and Kahneman leads to relevant predictions when we apply the findings of temperature on task complexity and intuition. The interaction found between temperature, automated tasks, and task complexity suggests that system I could be less affected than system II. The default-interventionist approach (Evans, 2007) states that both systems work parallel to each other, and that system II generally attempts to identify mistakes made by system I and intervenes if necessary. Recent advances in this field suggest that logical conclusions also manifest intuitively (De Neys & Pennycook, 2019). In this view, deliberation by system II is activated only when the heuristic and logic intuitions are of similar strength and conflicting. Thus, a correct response on the CRT, for instance, does not need deliberation when the logic intuition is stronger than the heuristic intuition. For both views, however, the wrongful application of heuristics would be more prevalent when the controlling function of system II fails as a consequence of heat stress.Footnote 2
We therefore expect that the distinct effect that heat has on cognition can be (partially) captured by the Kahneman framework. Recent research has investigated the effect on cognitive reflection (Chang & Kajackaite, 2019), but to date, no study has extended this investigation to the specific biased behavioural outcomes stemming from a predisposition to adhere too strongly to intuitive decision strategies. Although the CRT is highly correlated with specific behavioural biases, we directly test the effect of heat on bias sensitivity for an array of well-known biases. To our knowledge, no attempts have been made to distinguish the effects of heat on behaviour and cognition using this approach.
1.1.3 Temperature and Risk
Evidence suggests that temperature has a direct effect on the willingness to take risk. Wang (2017) shows that people making trading decisions at higher temperatures are more likely to pursue high-risk, high-yield options than people in a control condition.
Some indirect evidence on aggression also suggests that risky behaviour could follow from loss of control through the same channel. For instance, solely increasing the temperature makes people subjectively rate other people in the room to be more hostile (Anderson, Anderson, Dorr, DeNeve & Flanagan, 2000). Cao and Wei (2005) hypothesize that aggression leads to increased risk behaviour. Denson, DeWall and Finkel (2012) conclude that it is the loss of self-control that increases aggression. Finally, Frey, Pedroni, Mata, Rieskamp and Hertwig (2017) show self-control to be predictive of various risk behaviour outcomes. Overall, we expect the same channel that increases system I dependency will also increase risk-taking behaviour.
1.1.4 Temperature and Gender
Many individual characteristics mediate the effect heat has on cognition; however, the gender-related differences stand out.Footnote 3 Biological research (Kingma & Van Marken Lichtenbelt, 2015), metabolic research (Byrne, Hills, Hunter, Weinsier & Schutz, 2005), and psychological empirical research (Wyon, 1974) show that hot temperatures have a distinctly different effect on women as compared to men. The most profound example of this distinction and its neglect in the past decade is the temperature comfort level. The ‘default’ room temperature level of 21° C seems mainly based on male preferences (Kingma & Van Marken Lichtenbelt, 2015). Indeed, anecdotal evidence suggests that women perform better at slightly higher default room temperatures (Chang & Kajackaite, 2019).
As such, finding the effects of adverse temperature on cognition would be incomplete without taking gender-specific preferences into consideration. Without correcting for gender, female preference or tolerance for higher temperatures might influence the overall findings regarding the effect of adverse temperatures on performance. Given that women show a preference for somewhat higher temperatures, women will rate identical absolute temperature increases (subjectively) as less adverse as compared to men. Performance for women might thus also be expected to be less affected by heat.
1.2 This study
We hypothesize that heat exposure will decrease cognitive performance such that biased behaviour will be more prominent, as rational correction will require more effort under heat stress. Heat is a salient factor in the working environment and workers can often exert control over temperature themselves, making the relevance of our results apparent and immediately applicable. Moreover, by testing detectable temperature differences in each condition, we are able to assess the accuracy, and thus the relevance, of self-reported comfort measures for future research.
Additionally, we investigate the effect of heat on risk behaviour. Through the same channel, we expect that a combination of lack of effortful control and bodily discomfort will increase risk behaviour. This would be in line with aggression studies (for instance, American football players commit more aggressive fouls; Craig, Overbeek, Condon & Rinaldo, 2016). We test both the general self-reported risk attitude, which has generally been claimed to be a rather stable character trait, unaffected by heat (Dohmen et al., 2011), and actual risk behaviour, which we expect to increase following indoor temperature manipulation (see, for example, Wang, 2017).
Our experimental design has several key advantages over current practices in the literature. First, we actively strive to control a variety of factors influencing the physical experience of the environment. That is, we pre-expose all participants to the temperature manipulation for a defined adjustment period of one hour before starting the tasks. All participants are wearing similar clothing provided specifically for the experiment. We further control for the outdoor temperature of the period before testing. Second, we keep all other indoor climate factors constant. For instance, we manipulate the temperature while keeping air ventilation levels unchanged. As a result, CO2 levels, noise, lighting, and air refreshment are equal between manipulations. Some recent experiments manipulated temperature by opening and closing windows, without controlling for CO2 and fine particles between groups, and are therefore unable to isolate the effect of just temperature on task performance (Wang, 2017).
2 Method
2.1 Experimental conditions and design
We designed a controlled experiment to measure the effect of heat on decision quality. We employed a stratified random sampling method to recruit a total of 257 participants with an average age of 21.57 (SD = 2.41) years old using the Maastricht University Behavioral Experimental Economics laboratory database. Stratification ensures an equal gender distribution amongst manipulation groups. The final sample allows for a 10% deviation of gender within groups. All participants were proficient in reading and writing of the English language. Participants are randomly distributed to either the control or the experimental condition.Footnote 4 This between-subject design used temperature as the main independent variable. Given the clear gender differences in the temperature effect on performance and satisfaction in the literature, gender is the secondary independent variable in our analysis.
Participants were exposed to a controlled physical environment with either a hot temperature (28° C) or a neutral temperature (22° C). The choice of 28° C is derived from the body of literature focused on temperatures below 29° C / 85° F (for an overview, see Hancock & Vasmatzidis, 2003). More specifically, previous research repeatedly showed an effect of hot temperature on performance on neurobehavioural tests at 27–28° C (Lan, Lian, Pan & Ye, 2009; Lan & Lian, 2009).Footnote 5 In these conditions, a battery of validated tests included cognitive reflection tasks, a heuristics battery, lottery risk tasks, and self-reported risk preferences. Additionally, participants state their personal comfort levels and their subjective estimation as to what extent the environment influences their performance on the battery of tasks. The experiment was programmed using Qualtrics Software (Qualtrics, Provo, UT) and executed at the Behavioral Experimental Economics lab facilities at Maastricht University in the Netherlands. The laboratory is approximately 5 meters wide and 20 meters long. In this room, there are 33 cubicles (approx. 1.0 meter by 1.5 meters), all including a computer and table, which are closed off by shutters. All participants are tested in groups varying between 25 and 30 participants per group. Air quality is controlled using a climate system that holds the air refreshment rate constant.Footnote 6 The control condition of 22° C is reached running only the climate system. The “hot” condition of 28° C is reached using five 3 kW industrial heaters, each with a 115 m³ capacity. During the experiment, four heaters maintain a constant temperature. Manual adjustments to the thermostats of the individual heaters ensure a stable temperature.
All heaters also ran without heating during the control condition, such that the noise produced by the heaters is constant between conditions.Footnote 7
All participants were subject to strict clothing prescriptions. These requirements ensure that all participants have a similar physical experience of the heat. For instance, the possibility to remove layers of clothing could increase heterogeneity in the experienced heat within and between conditions. All participants are asked to wear long jeans. To fully ensure homogeneity, we provide all participants with long-sleeved black polyester thermoshirts. Participants are not allowed to wear anything underneath these shirts.Footnote 8
Participants arrived in the laboratory at 11 AM, one hour before the start of the actual experiment. This adaption time ensured that all participants experienced the indoor climate similarly, independent of the outdoor temperature or previous activity. During this adaption time, the temperature was kept at the same levels as during the experiment. After one hour, the test battery automatically started. All tasks were completed in English. Each task was presented to each participant only once. We did not impose a time schedule for the different tasks. The average completion time was roughly 45 minutes. Moreover, the outdoor temperature was measured on all testing days and compared between conditions. (Appendix Table 4 Panel A provides an overview of the indoor temperature during task and adaption, as well as the outdoor temperature between conditions.) The tasks were given in the order in which they are presented in Section 2.2.
2.2 Dependent measures
2.2.1 Performance measures
Cognitive Reflection Task:
The classic Cognitive Reflection Task (CRT) by Frederick (2005) measures participants’ propensity to rely on intuition or rational thinking. The test consists of three questions, each of which has a salient intuitive answer and a correct rational answer. Each of these questions is scored with 1 for a correct response or 0 for an incorrect response. The score for this task is the number of correctly answered questions, such that the score of the CRT lies between 0 (no correct answers) and 3 (all answers correct). Although this test is often used, Bialek and Pennycook (2017) find that multiple exposure does not reduce its validity.
Cognitive Reflection Task Expansion:
To increase the probability of capturing the distinction between intuitive and rational thinking in our sample, we added an expansion of the original CRT. This test (from Toplak, West & Stanovich, 2014) consists of three additional items, following the same structure. It is highly correlated with the original CRT.
Heuristics Battery:
The heuristic bias task battery by Toplak, West and Stanovich (2011) includes various questions about well-known economic biases. We select ten questions from this battery concerning causal base rate neglect, sample size problems, sensitivity towards regression to the mean, framing bias, outcome bias, the conjunction fallacy, probability matching, ratio bias, methodological reasoning, and the covariation problem.Footnote 9 Each of these questions is scored with 1 for a correct response or 0 for a biased and thus wrong response. The resulting score on this battery is thus between 0 and 10 points (M = 6.32, SD = 2.16), in line with the original authors.
2.2.2 Risk measures
Risk Elicitation Task:
The first measure of risk assessment is aimed at inducing or eliciting actual risk behaviour at the time of the experiment. Similar to the original task of Holt and Laury (2002), we showed the participants nine choices between two sets of lotteries. The first lottery is of relatively low risk, where the high and low payout options diverge only minimally (€6 versus €4.80, respectively). The second lottery can be considered high risk, as there is a strong divergence between the high (€11.55) and low (€0.30) payout option. For each consecutive choice, the probability of the high payout in both lotteries increases by 10 percentage points, such that in the first choice the probability of the high payout for each lottery is 10% and in the ninth and final choice this probability has become 90%. Note that the expected payout of the high-risk lottery surpasses the payout of the low-risk lottery from step 5 onwards (since then the expected payout is €5.93 for the high-risk versus €5.40 for the low-risk lottery). Participants are scored on a scale from 1–10, where the score reflects the switching point of the participants. Score 1 indicates a sustained preference for the high-risk lottery, labelling them as “risk-loving”. A score of 5 implies risk-neutral behaviour, as participants switch at the point from which the high-risk lottery has the higher expected payout. A score of 10 is assigned when participants never switch to the high-risk lottery. We label these participants as “risk averse”. Depending on the risk preference, all scores are considered rational, as even in step 1 or 9 there is still a 10% probability of a high win or loss, respectively. This lottery is incentivised, and participants are told that one of the lottery choices will be played at the end of the questionnaire. The outcome of their chosen lottery will be added to their total reimbursement.
To make this incentive at least 25% of the total reimbursement, the lottery outcomes are multiplied by a factor from the original (Holt & Laury, 2002). Participants who switched their choice of lottery more than once were excluded from the sample; 34 observations were thus excluded (16 male, 18 female).Footnote 10
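The expected-payout comparison behind the step-5 switching point can be checked with a short calculation. The following is a minimal sketch in Python, not the authors' code; the payouts and probabilities are taken from the task description above, and all function and variable names are illustrative assumptions.

```python
# Payouts in euros as stated in the task description: (high payout, low payout).
LOW_RISK = (6.00, 4.80)
HIGH_RISK = (11.55, 0.30)


def expected_payout(lottery, p_high):
    """Expected value of a two-outcome lottery with probability p_high of the high payout."""
    high, low = lottery
    return p_high * high + (1 - p_high) * low


def payout_table():
    """Return (step, low-risk EV, high-risk EV) for the nine choices (p = 10% ... 90%)."""
    rows = []
    for step in range(1, 10):
        p_high = step / 10
        rows.append((step,
                     expected_payout(LOW_RISK, p_high),
                     expected_payout(HIGH_RISK, p_high)))
    return rows


def first_step_high_risk_better():
    """First step at which the high-risk lottery has the higher expected payout."""
    for step, low_ev, high_ev in payout_table():
        if high_ev > low_ev:
            return step
    return None


if __name__ == "__main__":
    for step, low_ev, high_ev in payout_table():
        print(f"step {step}: low-risk {low_ev:.3f}  high-risk {high_ev:.3f}")
    print("high-risk lottery is better in expectation from step",
          first_step_high_risk_better())
```

At step 5 the sketch yields €5.40 for the low-risk and €5.925 (reported as €5.93) for the high-risk lottery, which is why a switch at step 5 is treated as risk-neutral.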
Risk Attitude Task:
In addition to a risk elicitation task, we asked participants how risk-loving they perceive themselves to be, both in general and in specific domains. Participants rated themselves on a 10-point scale, with the lowest score labelled risk-averse and the highest score labelled fully prepared to take risk. First, all participants state to what extent they are willing to take risk or avoid taking risk generally as a person. Second, their willingness to take or avoid risk is specified for the following domains: driving, financial matters, leisure and sport, their occupation, health, and faith in other people. This approach has been extensively validated and found to correlate with actual risk behaviour (Dohmen et al., 2011; Falk, Dohmen & Huffman, 2016).
2.2.3 Indoor climate satisfaction
Self-reported Indoor Climate Satisfaction and Hindrance:
Self-reported indoor environmental satisfaction was assessed by adapting the occupant indoor environment quality survey developed by Berkeley’s Centre for the Built Environment (Huizenga, Abbaszadeh, Zagreus & Arens, 2006). For temperature, air quality, noise, and lighting, all participants are asked to rate their satisfaction level on a scale from 1 to 7. Additionally, for all these factors, participants are asked to what extent they perceive it as hindering or supporting their ability to answer the questions in the questionnaire on a similar 7-point scale. The scores are recoded such that a score of 7 indicates that the factor fully supports their ability, and a score of 1 indicates that the factor fully hinders their ability to answer the questionnaire. We label the totality of these factor-specific measures “satisfaction measures”. In the analysis, we control for multiple testing.Footnote 11
2.2.4 Additional checks
CRT multiple exposure check:
After the three performance tasks (i.e., original CRT, extended CRT, and the Heuristics battery), all participants were asked to indicate whether they recognized any of these questions and, if so, whether they also remembered the correct answer. These questions are scored as 1 (yes), 2 (no), or 3 (unsure).
Clothing check:
All participants were asked to indicate whether they are indeed wearing the thermoshirts provided by the experimenter.Footnote 12 On a Likert-scale of 1 (bad) to 7 (good), participants indicate the fit, length, and the comfort of the shirt. Additionally, we ask to what extent the shirt influences the performance on the tasks using the same scale.
Temperature:
To be able to check for climate adjustment effects, three questions assessed the current and past climate experienced by the participants as well as their climate preference. Specifically, participants were asked to state in which country they grew up (most time spent until your 18th birthday), in which country they lived for the majority of the last five years, and what their preferred thermostat setting is (in degrees Celsius) in winter.
2.3 Incentives payoff
The payout was determined by adding the outcome of the preferred lottery of the risk elicitation task to the standard endowment of €15. The participants were told that for one of the steps, their chosen lottery will be played, but do not know which step this will be. The Qualtrics Internal Randomizer was used to draw an outcome (50/50 allocation) for the lottery chosen by the participant at step 5. The outcome was displayed at the end of the questionnaire. For the whole sample the average expected payoff of the risk task is 27% of the total payoff (with mean €5.98). No other performance tasks were incentivised, as these specific tasks are found not to be affected by incentives (Brañas-Garza, Kujal & Lenkei, 2019).
2.4 Statistical approach
To investigate the statistical significance of the variables of interest, we ran mean comparison tests between the two manipulation conditions. Specifically, we conducted independent samples t-tests using STATA software (StataCorp, 2017). When normality violations were detected (using Shapiro-Wilk normality tests), we tested for significance using Mann-Whitney U (Wilcoxon rank-sum) tests. For all results, we state whether parametric or nonparametric procedures are reported. Additionally, we apply the Benjamini-Hochberg procedure (Benjamini & Hochberg, 1995) as multiple testing correction when required.
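For readers unfamiliar with the Benjamini-Hochberg step-up procedure, the following minimal Python sketch illustrates how it decides which hypotheses to reject. It is not the STATA routine used in the analysis; the function name, the false discovery rate of 0.05, and the example p-values are assumptions chosen purely for illustration.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans: True where the hypothesis is rejected.

    Step-up rule: sort the m p-values, find the largest rank k with
    p_(k) <= (k / m) * alpha, and reject the k smallest p-values.
    """
    m = len(p_values)
    # Indices of the p-values, ordered from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])

    # Largest rank whose p-value falls under its rank-specific threshold.
    max_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            max_rank = rank

    # Reject every hypothesis ranked at or below max_rank.
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= max_rank:
            rejected[idx] = True
    return rejected


if __name__ == "__main__":
    # Hypothetical p-values, not taken from the study.
    example = [0.001, 0.008, 0.039, 0.041, 0.042, 0.060, 0.074, 0.205]
    print(benjamini_hochberg(example))
```

Because the rule looks for the largest qualifying rank, a p-value that misses its own threshold can still be rejected when a larger p-value further down the ordering meets its threshold.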
3 Results
3.1 Descriptives and Condition Manipulations
The recorded sample consists of 257 students ranging from 17 to 31 years old, of which 53.5% are female (see Appendix Table 3).Footnote 13 The recorded indoor and outdoor climate conditions are reported in Appendix Table 4. The average temperature in the control condition was 22.4° C and in the hot condition 28.3° C. Levels of indoor CO2, outdoor temperature of each test day during the morning, and outdoor temperature of the past three days do not differ significantly between manipulations.
3.2 Satisfaction measures
We first present the climate satisfaction measures in Table 1. Looking at the first column, it is confirmed that temperature (d= 0.77) and air quality (d= 1.53) are significantly less satisfactory in the hot condition. Additionally, both are predicted to hinder the performance on the performance measures. This confirms the notion that the high-temperature manipulation is considered uncomfortable.
Note: all scores are on a 1–7 scale, and all scores are recoded such that 1 is bad or low, and 7 is good or high. Significance levels are based on nonparametric analysis. Standard deviations are given in parentheses. * indicates p < .05, ** p < .01, and *** p < .001, after multiple testing correction.
Looking at the other indoor factors, and taking male and female participants together, we do not observe lighting satisfaction to be significantly different between conditions. The same holds for the effects of light on perceived performance. Similarly, we find no difference for noise satisfaction between conditions. However, it is reported to improve performance in the hot conditions. Here also, we note that noise was kept constant between conditions. Interestingly, participants actually predict noise to improve performance compared to the control condition. We suggest that in the control condition, when the heaters only produced noise, participants perceive the noise on its own as potentially hindering performance. In the hot conditions the noise of the heaters may be driven to the background by the more salient temperature. Also, in the hot condition there is a justification for the noise. Finally, clothing satisfaction and hindrance do not differ between conditions.
3.3 Gender Differences and Temperature
Following recent studies of gender differences and temperature effects on performance, we examine the satisfaction measures when controlling for gender. Interestingly, the general dissatisfaction and increased hindrance of temperature are reflected in our male sample only. These findings are presented in the middle two columns of Table 1. Our results are in line with Chang and Kajackaite (2019), in that men dislike hot temperatures and report suffering more from heat than women. This notion is further supported by the observation that temperature experience differs between genders when related factors do not. When we compare air quality satisfaction and its hindrance between the two conditions, we find that both men and women dislike the hot temperature condition equally compared to the control condition. We note that additional (marginally) significant inconsistencies are seen for rating factors that are stable between conditions, such as noise and light. Those discrepancies are correlated with the temperature manipulation (e.g., a potential demand effect; also see the Limitations section).Footnote 14
Summarizing, we find that, as expected from the manipulations, temperature significantly lowers satisfaction and the perceived performance on the task, but only for the male sample. Following the commonly used hypothesis that links comfort to productivity, we therefore expect to find a decrease in performance on the performance measures for men, but not for women.
3.4 Performance Measures
Panel A of Table 2 shows the non-parametric results for the performance measures. We find no significant difference between control and hot conditions on any of the three performance measurements for the full sample. Only for women do we find a marginally significant difference (T = −1.75, p = 0.08; d = 0.30) in performance on the original CRT between the control condition (M = 1.26, SD = 1.09) and the hot condition (M = 1.61, SD = 1.24).Footnote 15 Note that performance is increasing rather than decreasing. We conclude from these first results that the temperature has no direct effect on performance for men and women on our performance measures. If anything, we find weak support in line with Chang and Kajackaite (2019), as women seem to improve rather than decrease their performance on one of the three tasks in the hot temperature condition.Footnote 16
Note: For all panels except C, all significance levels are based on parametric analysis. For panel C, significance levels are based on nonparametric analysis. Standard deviations are given in parentheses.
* indicates p < .05,
** p < .01, and
*** p < .001.
3.5 Risk measures
Risk preference elicitation task
As expected from a strong body of research (for an overview, see Byrnes, Miller & Schafer, 1999), a baseline difference in risk behaviour is observed when comparing the control conditions, as can be seen in Table 2, panel B. Based on parametric independent sample t-tests, men (M = 5.70, SD = 1.85) are significantly more risk-taking than women (M = 6.48, SD = 1.57; t = −2.42, p < .05; d = 0.45), in line with the literature.
For the risk elicitation measure, participants in general do not differ between conditions. However, when we look at the gender subsamples, the picture changes. Although men do not differ significantly in risk preference between conditions, women are significantly more risk loving in the hot condition (M = 5.61, SD = 1.89) compared to the control condition (M = 6.48, SD = 1.57; t = 2.75, p < .01; d = 0.50). As such, for women the risk and heat hypothesis appears to be a valid prediction.Footnote 17
When comparing the risk preferences of women in the hot condition with those of men in the control condition, we observe that women not only become more risk loving in the hot condition, but that their risk preference becomes equal to that of men in the control condition.
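As a rough check, the effect sizes reported above can be approximated from the reported means and standard deviations alone. The sketch below is illustrative and not the authors' analysis code: it assumes Cohen's d with an equal-weight pooled standard deviation (i.e., roughly equal group sizes, since the per-group sample sizes are not given in this section), and the function name `cohens_d` is our own.

```python
import math

def cohens_d(mean_a, sd_a, mean_b, sd_b):
    """Absolute Cohen's d using an equal-weight pooled SD.

    Assumption: both groups are of similar size, so the two variances
    are simply averaged instead of weighted by degrees of freedom.
    """
    pooled_sd = math.sqrt((sd_a ** 2 + sd_b ** 2) / 2)
    return abs(mean_a - mean_b) / pooled_sd

# Means and SDs as reported in the text.
women_risk = cohens_d(5.61, 1.89, 6.48, 1.57)  # women, hot vs. control (risk task)
gender_gap = cohens_d(5.70, 1.85, 6.48, 1.57)  # men vs. women, control (risk task)
women_crt = cohens_d(1.61, 1.24, 1.26, 1.09)   # women, hot vs. control (original CRT)

print(round(women_risk, 2), round(gender_gap, 2), round(women_crt, 2))
# prints: 0.5 0.45 0.3
```

Under this assumption the values agree with the reported d = 0.50, d = 0.45, and d = 0.30; a pooled SD weighted by the actual group sizes could differ slightly.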
General risk attitude
For the general risk attitude question “Are you generally a person who is fully prepared to take risks or do you try to avoid taking risks?” (see Table 2, panel C), men report being less prepared to take risk when asked in a hot condition (Mdn = 6.5) compared to the control condition (Mdn = 6; z = 2.1, p < .05; d = 0.38).Footnote 18 This is surprising, as we explicitly ask participants to reflect on their general risk attitude. This question has repeatedly been shown to be stable over time and context independent, and as such, is supposed to be a stable predictor for risk behaviour. Women do report a stable attitude independent of conditions.Footnote 19
When looking at the domain-specific risk attitudes, only one differs significantly between conditions: men predict to be less risky on work-related issues in a hot condition (Mdn = 6) compared to the control condition (Mdn = 6.5; z = 2.19, p = 0.028; d = 0.42).Footnote 20 For an overview of these results, see Appendix Table 5. This result remains significant when applying the Benjamini-Hochberg rank-dependent multiple testing correction (Benjamini & Hochberg, 1995) on the critical p-value threshold with a Q (false discovery rate) of 15%.Footnote 21
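The rank-dependent logic of the Benjamini-Hochberg correction can be illustrated with a minimal sketch. This is not the authors' code. Only p = 0.028 is taken from the text; the other six p-values are hypothetical placeholders chosen for illustration, and the function name `benjamini_hochberg` is our own.

```python
def benjamini_hochberg(p_values, q):
    """Benjamini-Hochberg step-up procedure.

    Returns a list of booleans (in the input order) that is True where
    the null hypothesis is rejected at false discovery rate q.
    """
    m = len(p_values)
    # Indices of the p-values, sorted from smallest to largest p.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k whose p-value is at or below (k / m) * q.
    largest_k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * q:
            largest_k = rank
    # Reject every hypothesis ranked at or below that k.
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= largest_k:
            rejected[idx] = True
    return rejected

# 0.028 is the reported p-value; the remaining values are HYPOTHETICAL.
p_values = [0.028, 0.036, 0.20, 0.35, 0.50, 0.62, 0.81]

at_15 = benjamini_hochberg(p_values, 0.15)  # Q = 15%
at_05 = benjamini_hochberg(p_values, 0.05)  # Q = 5%
print(at_15)
print(at_05)
```

With these illustrative inputs, the smallest p-value (0.028) exceeds its own rank-1 threshold (0.15/7 ≈ 0.021), yet it is still rejected at Q = 15% because the second-ranked p-value falls below the rank-2 threshold (≈ 0.043). At Q = 5%, no test is rejected.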
4 Discussion
The increasing frequency of heatwaves, and outside temperatures that used to be exceptional, raises important questions about the impact of temperature on human performance. Of course, outdoor temperature does not need to be harmful given the mitigating effect of buildings, acting as a “shield” against temperature changes and pollution. There is evidence of a positive effect of building quality on human performance and productivity (e.g., Palacios, Eichholtz & Kok, 2020). But research measuring indoor climate also shows negative performance effects resulting from exposure to adverse indoor conditions (e.g., Künn, Palacios & Pestel, 2019; X. Zhang, Wargocki, Lian & Thyregod, 2017). Given that we spend roughly 90% of our time indoors, the effect of these adverse conditions warrants research. Understanding the effects of indoor temperature on human performance is crucial in determining and optimizing the daily indoor environment in workplaces and beyond.
The focus of this study is twofold: first, we assess the effect of hot temperatures on decision quality, and second, we ask whether people’s stated experiences regarding these temperatures are related to this decision quality. We assessed the effect of adverse temperature by manipulating the indoor temperature to 28° C over a two-hour period, compared to a control temperature of 22° C.
From the expectation that rational decision-making would suffer under adverse temperatures, more reliance on intuition would lead to a lower score on the Cognitive Reflection Task and to more biased responses in the Heuristic Battery. However, no significant differences in performance between the hot and control conditions were identified in this study. When looking at risk, a factor often associated with decisional quality and furthermore proposed to be correlated with the intuition-rational trade-off (Leith & Baumeister, 1996), we observe an increase of risk preference in hot conditions for women only.
Comparing these results with self-reported measures shows some essential discrepancies. First, in our sample, only men find the hot condition significantly less satisfactory than the control condition. Women do not seem to make a distinction between conditions. Furthermore, when asked to what extent temperature has an influence on performance, men predict that the hot temperature significantly hinders their performance. Again, women do not make this distinction.
The discrepancy between self-report and actual behaviour is of crucial importance for the literature regarding the effects of indoor climate. Currently, self-reported measures are commonly used as a proxy for performance or productivity, yet this study shows that men consistently overestimate the effect of adverse temperatures on performance. The discrepancy between the actual performance outcomes and the perceived hindrance from adverse temperature shows that men expected to perform better in the control condition, which they did not. Had policy makers assessed this self-perceived hindrance only, they might have spent significant effort and resources to improve indoor temperature conditions. Our study, however, shows that this would not have resulted in an actual increase in performance.
In the domain of risk, we find that men assess their own daily willingness to take risk, in general and in work situations, as lower when they are asked about this in the hot condition. This is surprising, since this measure is aimed at assessing the general self-reported risk preference, independent of any manipulation, and would thus be expected to be stable across conditions. For women, no significant difference between conditions is found. As for actual risk behaviour, we find no difference between conditions for men.
These results have at least two implications for future indoor temperature (and indoor climate) research. First, we repeatedly find inconsistencies between the self-reported and actual effects of the indoor climate on performance. Specifically, men overestimate the negative effect the temperature has on their performance. This shows that the use of self-reported measures as a proxy for actual performance is unreliable. Future research should focus on more direct measures of human performance and productivity than self-reported indoor climate satisfaction. Second, our research supports the recent findings of Chang and Kajackaite (2019) that gender plays a moderating part in the effect of temperature on performance. This underlines the conclusion from Kingma & Van Marken Lichtenbelt (2015) that one universal temperature standard does not fit the whole population. Gender differences have to be taken into account in any situation in which temperature is included as an influential factor.
4.1 Limitations
Three specific limitations are worth discussing. First, a multitude of factors could mediate our results. We control for many relevant variables, yet we cannot exclude the possibility that some factors confound our results. According to Zhang, De Dear and Hancock (2019), the following factors should be considered regarding the effect of the thermal environment on performance:
Environment-related factors include intensity and duration of the indoor environment. We carefully control temperature and keep all other relevant factors constant between conditions. We include an adaptation time that extends the total exposure time beyond most comparable studies. However, it is possible that higher temperatures would lead to differences in performance on the (heuristics) tasks battery (Parsons, 2014). For instance, Zhang, De Dear and Hancock (2019) found that reasoning declines from temperatures upwards of 28° C. We justify our decision for the temperature levels based on earlier research and our goal to generalize our finding to a realistic working environment of highly skilled workers. By doing so, we inevitably limit the external validity of our results for higher temperatures. Finally, although we measure a multitude of variables between conditions (see Appendix Table 4), unobserved variables could inadvertently influence the results.
Performance-related factors include all individual factors such as age, gender, skill level, acclimation level, and emotional state. We control for individual differences between groups regarding gender, math skill, education level, age, and thermostat preference (see Appendix Table 4). We apply random sampling to prevent unobserved variables, such as emotional state, from distorting our results. The sample size is limited as the adaptation (or acclimation) time required takes more resources than in comparable studies. However, we are confident that addressing the exposure time is a key advantage of our experiment relative to the current literature. Regarding participant age, the sample mainly consists of students around the age of 22 (M = 21.57, SD = 2.41). We attempted to recruit an age category representing an older population (older than 50), but recruitment turned out to be difficult. Moreover, the level of English language skills and task comprehension forced us to exclude a significant part of the successfully recruited "older" sample. The educational background of the majority of our sample (Business and Economics students) increased the likelihood of recognition of the type of tasks we assessed, and previous exposure to these constructs can influence results (we discuss the results of multiple exposure to the CRT test below). Usage of the relatively unfamiliar extension of the CRT (Toplak et al., 2014) and an unfamiliar heuristic battery (Toplak et al., 2011) at least partially alleviates this concern.
Task-related factors include the complexity and the type of task presented to the participant. Since all participants are performing the same tasks, no confounding effect of task type and complexity is to be expected. However, a new view on the underlying mechanism of the dual process model could explain why we do not find an effect of temperature on cognitive performance using our heuristics battery. De Neys and Pennycook (2019) suggest that the deliberate system is activated only when there is a clear conflict between a heuristic reaction and a logical reaction. It is possible that the nature of our task battery elicits either an intuitive response or a logical solution, but without a conflict between these two. The lack of conflict, according to De Neys and Pennycook, will not reveal any potential restrictions in the deliberate system because this system is not involved in the response. We deliberately test an extensive battery of well-known heuristic problems, which should increase the likelihood of conflicts in which the deliberate system is active. However, we cannot fully exclude the possibility that the lack of conflict (partially) explains why we find no difference between the two groups. We encourage further research to assess both neurologically measured deliberate system activation and the extent to which these tasks present an implicit conflict between logic and intuitive response.
Second, participants likely change behaviour in anticipation of the effect of the manipulation, which is unavoidable in an experiment with temperature manipulation. All participants in the manipulation condition (i.e., the “hot” temperature condition) are instantly aware of this manipulation when entering the laboratory. To create uniformity between groups and take away emphasis on the temperature, we asked participants in all conditions to wear a provided shirt, and in both conditions the industrial heaters were on. Moreover, the indoor climate quality scale was not limited to temperature, but included other important indoor climate variables, reducing the emphasis on temperature. However, when the participants were asked to state what they thought the experiment was about, those in the manipulation condition indeed stated that temperature and task performance was the major aim of the experiment. In the control condition, less than 10% stated temperature to be a decisive factor (popular guesses included the influence of “clothing” or “noise” on performance).
Finally, the choice for our test battery is the outcome of a careful trade-off between practical and theoretical considerations. Research has suggested that the CRT is robust under multiple exposure (Bialek & Pennycook, 2017; Meyer, Zhou & Frederick, 2018) and consistent over time (Stagnaro et al., 2018). Recognition of the original CRT is relatively high (46% recognized at least one question, and 20% recognized all questions).Footnote 22 For the extended CRT questions, however, only 13% recognized one or more questions. The fact that we observe no difference in performance between the classic and extended CRT supports the notion that these levels of recognition and recollection of answers do not affect the results of this study.
Welsh et al. (2013) propose that the CRT merely reflects mathematical skills. In our sample we see that self-reported math skills differ significantly between genders. Women report a proficiency of 59.07 out of 100, whereas men report 67.48 out of 100 (p < .001). We indeed find that in the total sample, men outperform women in the CRT. However, this does not affect the result, because we analyse the effect of temperature on performance specifically within gender. We furthermore find no interaction between math proficiency and the effect of temperature on the CRT. Nevertheless, we cannot exclude that the risk assessment is affected by the difference in math proficiency.
Appendix
Note. Statistics presented are mean values; standard deviations are presented in parentheses. Math Proficiency is on a 0–100 scale. Thermostat Preference is in °C, in winter. Extreme thermostat preferences were excluded (below zero degrees and above 30 degrees). p-values result from nonparametric independent sample t-tests.
* indicates a p-value < .05,
** a p-value < .01, and
*** a p-value < .001.
Note. Statistics presented are mean values; standard deviations are presented in parentheses. Panel A describes the indoor and outdoor climate conditions. ppm stands for parts per million. Panel B describes the individual characteristics per condition. Thermostat Preference stated is in winter conditions. Education level is on a 0 to 5 scale, where 0 is without high school diploma, and 5 is completed master's diploma. p-values result from parametric independent sample t-tests.
* indicates p < .05,
** p < .01, and
*** p < .001.
Note: All scores are on a 1-10 Likert scale, and all scores are recoded such that 1 is risk averse, and 10 is risk loving. Significance levels are based on nonparametric analysis. Standard deviations are given in parentheses.
* indicates p < .05,
** p < .01, and
*** p < .001.
Note. The Risk Elicitation task has missing values; the summary statistics exclude all risk attitude cases that are matched to missing values for the risk task. The correlation coefficient presented is Spearman’s rho, with the 95% confidence interval in brackets.
* indicates p < .05,
** p < .01, and
*** p < .001.
Note. The p-values are the result of nonparametric ranksum tests as shown in table 3. The levels of the False Discovery Rate (Q) are chosen given that Q=15% implies less than one false discovery per 7 tests. Q=5% is the most conservative FDR level, with the highest risk of False Negatives (McDonald, 2014). Applying the FDR formula (False Discovery Rate = Expected (False Positive / (False Positive + True Positive))) to the risk domain entails that the chance of two significant findings amongst 7 domains would be 28.6%. We find two significant findings (in the male sample) if we correct for an FDR as low as 15%. The significance of the general risk attitude in the male sample is robust against an FDR of 12%.
Note.
* The percentage in the remembering column is conditional on recognition. For example, for the Lilypads question, of the 45.52% who recognize the question, 44.03% do not remember the answer.
Note. Effect size sensitivity is reported per group size. The first rows apply to the majority of all presented results in the paper. The latter rows apply only to the risk elicitation task, due to some excluded cases in that sample. For each sample size, we present sensitivity estimates for parametric as well as non-parametric tests.