1. Introduction
A growing body of literature has reliably established that political candidates benefit from having a lower voice pitch (as summarized in the next section). Such candidates are perceived to have higher leadership abilities. Hence, keeping everything else the same, a lower voice pitch brings in more votes. This voice-pitch bias might be the reason why Margaret Thatcher, one of the most important political figures of the 20th century, worked with a voice coach early in her political career to lower her voice pitch.Footnote 1 This raises the question: what alternative strategies can a politician employ to overcome voice-pitch bias, and how effective are they?
A candidate's policy position is arguably the most important aspect according to which she is evaluated by voters. Hence, by proposing a more desirable policy than her opponent, a candidate can be expected to offset the vote disadvantage due to voice-pitch bias. But in relation to one another, how important are voice-pitch and policy position? And how does the trade-off between the two depend on the candidate's gender, the socioeconomic characteristics of the voter, and the policy dimensions on which the candidates contend? The objective of our study is to answer these questions using an experimental methodology. To the best of our knowledge, this is the first study to approach these questions.
To analyze these issues, we propose a new measure that we call persistence of voice-pitch bias: the amount of policy difference that makes a voter indifferent between two candidates who have a unit difference in voice pitch but are identical otherwise. More formally, persistence of voice-pitch bias is the marginal rate of substitution of voice pitch for policy in the voter's preferences.Footnote 2 Measuring its magnitude in an experimental setting, we find a significant level of persistence in voters’ revealed preferences. Furthermore, we find that the voice-pitch bias voters display is, on average, five times more persistent when evaluating women as opposed to men candidates. Hence, a man candidate disadvantaged due to voice-pitch bias needs a much smaller policy adjustment than a woman candidate in a similar situation. Our experiment also corroborates the earlier empirical and experimental literature discussed in the next section in that the average voter exhibits a significant voice pitch bias. Furthermore, we find that the average voter displays a higher voice-pitch bias when evaluating a man candidate than a woman candidate.Footnote 3
Our study has important implications for gender and politics. The female voice pitch typically ranges between 165 and 255 Hz, while the male voice typically has a much lower pitch (LP), with a range from 85 to 180 Hz. As a result, signals of social dominance and trust perceived from voice pitch favor men over women. Even when the two candidates are of the same gender, the effect of voice pitch on perception is prominent, for example favoring women candidates with a lower voice pitch. Elections between men candidates suffer from a similar gender bias, and as discussed above, sometimes to a greater extent. Furthermore, we theorize that policy declarations by men candidates are taken much more seriously by voters, and that a combination of these two biases results in our two main findings, namely that (i) voters exhibit a stronger voice-pitch bias when evaluating men candidates, and yet (ii) voters respond more strongly to policy differences between men candidates, making voice-pitch bias more persistent for women candidates.
We also analyze how participant gender affects voting behavior. We start with the case where there are no policy differences between candidates. For men participants, we find that a higher percentage vote for the LP candidate when choosing between men than women candidates. This is not the case for women participants. Hence, we conclude that it is men participants that drive the overall finding that voice-pitch bias is higher for men candidates than women candidates. On the other hand, both men and women participants strongly respond to a small policy difference between men candidates, but not between women candidates. Additionally, we analyze the implications of a variety of pre-treatment covariatesFootnote 4 and demonstrate that our findings are robust against their inclusion in the analysis.
As two potential mechanisms for voice-pitch bias and its persistence, we also measure how a candidate's voice pitch affects the participants’ perception of her/his competence and trustworthiness. Competence is closely related to social dominance and the literature discussed in the next section shows that a candidate with lower voice pitch is perceived as more competent. On the other hand, previous findings on trustworthiness, another important characteristic for a candidate, are mixed. We contribute to this discussion by showing that a lower voice pitch creates on our participants a perception of both higher competence and higher trustworthiness. Furthermore, the effect is stronger for voters evaluating men candidates. In Section 4 we discuss how these findings might provide an underlying mechanism for voice-pitch bias.
We focus on two policy dimensions, namely, per capita public spending on health and public spending on education, varied in a between-subjects design. Focusing on per capita public spending allows policy differences between candidates to be measured in monetary terms. We choose health and education since both are major valence issues, that is, issues where there is a broad consensus that an increase in the current level of public spending is desirable in Turkey, where we run our experiment, and both have been salient policy dimensions around the world, especially after the onset of COVID-19. Additionally, the Turkish public is predominantly satisfied (dissatisfied) with government policies on health (education), hence allowing us to control for attitudes toward government.
In our experiment, participants listen to two voice recordings (i.e., candidates). On a given policy dimension, each recording declares a policy stance (e.g., I will annually allocate X TLFootnote 5 per person for public health expenditures). To control for unobservable individual differences between candidates, the two voice recordings are obtained from the same person and digitally manipulated to either a higher pitch (HP) or an LP to create a difference of 1 equivalent rectangular bandwidth (ERB) (roughly 40 Hz) between them. We refer to these two recordings as the LP and the HP candidates.Footnote 6 In our design, a participant is exposed to candidates of only one gender. Our experiment was carried out online. Online voice-pitch experiments have been shown to produce results that are comparable to laboratory experiments (Feinberg et al., 2008).
2. Background and hypotheses
Voice pitch is a prominent vocal feature that has a significant effect on humans’ perception of others. It can be defined as the number of vibrations per second made by the vocal folds to produce a vocalization (Tusing and Dillard, 2000). Larger vocal folds generate lower frequencies due to slower vibrations, and hence produce lower sounding voices. Additionally, the human voice pitch is sexually dimorphic (Puts et al., 2007). The voice pitch of an average male is almost half of that of an average female (Titze, 1992; Feinberg et al., 2005a; Vieira et al., 2015). The discernible dimorphism can be attributed to factors beyond mere discrepancies in physical dimensions between genders. It has been observed that men tend to have a lower vocal pitch in comparison to women and prepubescent children of both sexes when taking into account their respective height and body volume (Titze, 2000). According to the literature, the emergence of sexual dimorphism in the human voice can be attributed to sexual selection, specifically through the process of female mate choice (Darwin, 1888; Collins, 2000).
The literature shows that the voice pitch is perceived to signal information about physical and psychological traits such as attractiveness (Collins, 2000; Collins and Missing, 2003; Feinberg et al., 2005a, 2005b; Jones et al., 2008), social and physical dominance (Puts et al., 2007; Tigue et al., 2012; Rezlescu et al., 2015; Schild et al., 2022), and reproductive capabilities (Feinberg et al., 2005b). Lower voices are perceived to signal masculinity, trustworthiness, competence, and strength (Feinberg et al., 2005b; Puts et al., 2007; Feinberg et al., 2008; Jones et al., 2010; Klofstad, 2016; Banai et al., 2017; Klofstad, 2017; O'Connor and Barclay, 2017). Hence, in environments where these traits are desirable, listeners are expected to exhibit a bias against higher-pitched speakers.
In the context of politics, a number of experiments demonstrate that candidates with lower voices receive more votes and have a higher probability of winning elections (Anderson and Klofstad, 2012; Klofstad et al., 2012; Tigue et al., 2012; Klofstad et al., 2015; Klofstad, 2016).Footnote 7 Banai et al. (2017) support these findings in an empirical study of 51 presidential elections around the world. Klofstad (2016) analyzes the 2012 US House Elections and provides overall support, with an exception that will be discussed below. Some studies also analyze how voice-pitch bias interacts with other variables. Laustsen et al. (2015) find in survey experiments that voters with a more conservative stance display a higher voice-pitch bias than more liberal voters. Using empirical data as well as survey experiments, Klofstad (2016) shows that older, well-educated, and politically engaged voters are the most biased in favor of candidates with lower voices. In an experimental study, Klofstad et al. (2015) establish candidate age as an important determinant of voter choice and show voice pitch to have an effect on perception of candidate age. In an empirical study, Klofstad and Anderson (2018) find no correlation between a politician's voice pitch and leadership ability.
The first main contribution of our paper is to this literature. First, we replicate the aforementioned finding that between two candidates who are identical in every aspect but their voice pitch, a voter is more likely to vote for the one with the lower voice.Footnote 8
More importantly, in a novel experimental design that has not been considered before, we differentiate the two candidates on the policy space and study how voice-pitch bias interacts with such differences. Specifically, we measure how much of a policy difference between the LP and HP candidates is sufficient to offset voice-pitch bias. As discussed in Section 1, this amount gives us an estimate of the voter's marginal rate of substitution between voice pitch and policy. It is what we refer to as persistence of voice-pitch bias.
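The definition of persistence can be written out as a minimal formal sketch. The notation is assumed here for illustration and is not taken from the study: a voter's utility U(v, x) is taken to be differentiable, decreasing in the candidate's voice pitch v (in ERB), and increasing in the candidate's proposed per capita spending x (in TL). The policy premium Δx that exactly compensates a one-unit increase in pitch is then approximated by the marginal rate of substitution:

```latex
% Assumed notation: U(v, x) with \partial U/\partial v < 0 and \partial U/\partial x > 0.
U(v + 1,\; x + \Delta x) = U(v,\; x)
\qquad\Longrightarrow\qquad
\Delta x \;\approx\; \left.\frac{dx}{dv}\right|_{U\ \text{constant}}
\;=\; -\,\frac{\partial U / \partial v}{\partial U / \partial x} \;>\; 0 .
```

Under these assumptions, a larger Δx means that a higher-pitched candidate must offer more spending to leave the voter indifferent, which is what a more persistent voice-pitch bias refers to.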
Hypothesis 1: By proposing a more desirable policy than her (his) opponent, a candidate can offset the vote disadvantage due to voice-pitch bias.
Gender is one of the most prominent traits voters perceive at first sight. Hence, its effects on politics have been the subject of extensive research. The literature indicates that voters believe women politicians to be warmer, more compassionate, better able to handle education, family, and women's issues, more liberal, and more feminist than men, whereas men politicians are seen as strong, intelligent, better suited to handle crime, defense, and foreign policy issues, and more conservative (see Johns and Shephard, 2007; Dolan, 2010 and the literature cited therein). These gender stereotypes affect voting behavior significantly. Koch (2000) analyzes data from the 1988–1992 Pooled Senate Election Study to show that even after candidates’ individuating ideological orientations are taken into account, candidate gender still exerts a substantial effect on how voters perceive a candidate's ideological orientation. Eagly et al. (2003) show that voters are more likely to vote for candidates who endorse a position typically favored more by their own gender. Dolan (2010) examines how gender stereotypes shape voters’ support for women candidates in various electoral circumstances. Johns and Shephard (2007) find that men voters are more inclined than women to see men candidates as stronger and to prioritize strength while voting. Bernhard (2023) finds that women candidates significantly benefit from being taller than their opponents while the benefit of height is not significant for men.
Mo (2015) analyzes how candidate quality and voter gender bias interact to determine candidate evaluation. Relatedly, Bauer (2020) finds that voters hold women politicians to higher qualifying requirements than men. These higher standards make it more difficult for women candidates to acquire electoral support. Hence, women candidates on average are more qualified than their men counterparts. Furthermore, Fox and Lawless (2004) uncover that, on average, women, even those with the highest levels of professional achievement, are less likely than men to consider running for political office.
In general, women candidates face gendered constraints when running for office, and are required to “double-bind” themselves by demonstrating the competence associated with masculinity and the tenderness associated with femininity (Bauer and Santia, 2022; Carpinella and Bauer, 2021). Relatedly, Schneider and Bos (2014) find that women politicians do not possess the traits attributed to women (e.g., warm, empathetic), and they have no advantage in terms of female-stereotypical characteristics. Dietrich et al. (2019) quantify lawmakers’ emotional intensity by analyzing minor voice-pitch fluctuations. They find that women display greater emotional intensity when talking about women-related issues than when talking about other topics, and greater intensity than their men colleagues. Using role congruity expectations as a framework, Boussalis et al. (2021) examine how candidate gender affects usage of facial, vocal, and textual communication in German federal election debates (2005–2017), as well as voters’ reaction to such communication. For example, they find that Angela Merkel expresses less anger than her male opponents, and that voters punish her for anger displays and reward her for happiness and general emotional displays. Carpinella and Bauer (2021) demonstrate that women candidates tend to blend male verbal assertions with feminine images such as presence of family, schools, and hospitals.
Regarding the effect of voice pitch, Searles et al. (2020) analyze the relative effectiveness of man and woman voices in political advertising, in relation to the considered issues being masculine or feminine. Candidate and voter gender turn out to be important for voice-pitch bias as well. Klofstad (2016) analyzes 2012 data on the US House Elections and finds that when facing a woman opponent, a higher voice pitch increases votes. Anderson and Klofstad (2012) find that when considering men candidates for feminine leadership roles, women voters do not respond to voice pitch (while men voters do).
The second main contribution of our paper is to the literature on the effect of gender on voter preferences, summarized above. We first test whether the amount of voice pitch bias depends on the candidate gender. More importantly, we also test whether candidate gender affects the marginal rate of substitution between voice pitch and policy. We hypothesize that the effect of voice pitch is more persistent in case of women candidates and hence, to offset voice-pitch bias an HP woman needs to offer a much more desirable policy than an HP man.
Hypothesis 2: Voters exhibit a more persistent voice-pitch bias when voting between women candidates than voting between men candidates.
The literature shows voice pitch to have a strong influence on the perception of characteristics related to social power, such as competence or social dominance (e.g., see Aung and Puts, 2020). Individuals with lower voices are perceived to be more competent (Klofstad et al., 2015), more socially dominant (Gregory, 1994; Puts et al., 2007; Ko et al., 2009; Jones et al., 2010; Wolff and Puts, 2010; Borkowska and Pawlowski, 2011; Tigue et al., 2012; Klofstad et al., 2015; Laustsen et al., 2015), and to have better leadership abilities (Nagel et al., 2012; Klofstad et al., 2015). The effect of voice pitch on perceptions of trustworthiness is, on the other hand, gender-dependent. O'Connor and Barclay (2017) and Klofstad et al. (2012) find in two alternative contexts that lower-pitch women voices are perceived to be more trustworthy. For men voices on the other hand, O'Connor and Barclay (2017) find that a higher pitch induces more trust, while Tigue et al. (2012) obtain the opposite finding. These contrasting findings present an interesting puzzle for us to focus on.
Our paper also contributes to this discussion by measuring voters’ perceptions of competence and trustworthiness of both men and women candidates when they are recorded making a policy-neutral statement “Vote for me.” In case of trustworthiness, our paper also serves to bring further evidence to the contrasting findings in the earlier literature.
3. Materials and methods
3.1 Experimental stimuli
We recorded six native Turkish speakers (three women with an average age of 38 and three men with an average age of 40) making the following policy statements in Turkish: “Please vote for me” and “I will annually allocate X TL per person.” Pisanski et al. (2021) show that studies on voice pitch obtain comparable results over different types of recordings, such as a series of vowels, a single word, or a sentence as in our case. We recorded multiple speakers to reduce any individual-level effect of other vocal characteristics such as tone of the voice, rhythm, or tempo. The monetary amount X took six values, starting from 10,000 Turkish Liras (TL) and decreasing by 200 TL at each step down to 9000 TL. Overall, we obtained seven recordings from each speaker.
We used monetary differences in per capita public spending to measure differences in policy. This is because monetary differences in per capita spending are easy to understand and their perception is uniform among participants (as opposed to e.g., differences in an abstract policy space). Also, they allow us to measure in objective units the persistence of voice-pitch bias (the trade-off between voice-pitch bias and policy differences).
As discussed in the previous section, we expect to corroborate the previous literature by finding that the LP candidate will receive significantly higher votes than the HP candidate. We then hypothesize that policy differences between candidates can first mitigate and then neutralize this voice-pitch bias, at which point the percentage of participants voting for the LP candidate will not be significantly different than 50 percent (see hypothesis 1). To measure how much of a policy difference is needed to neutralize the voice-pitch bias, we gradually made the policy declaration of the low-pitched candidate less desirable. After analyzing pre-test results, we determined these policy increments to be 200 TL.
To reduce any policy-specific effect on our findings, we chose two alternative policy dimensions. Our main objective in choosing education and health was to find valence issues where higher public spending would be almost unanimously considered to be desirable. Figure A.2 in the Appendix shows that this is indeed the case for our participants.Footnote 9 Another reason for our choice of education and health was that the Turkish public is predominantly satisfied (dissatisfied) with government policies on health (education), hence allowing us to control for attitudes toward government.Footnote 10 This is indeed corroborated by our data, as displayed in Figure A.3 in the Appendix. Hence, any finding that holds for both dimensions cannot be attributed to the participants’ attitude toward government. Finally, in a post-experiment survey, we asked the participants whether they think it is men or women in elected office that are better at handling each issue. As can be seen in Table 1 in the Appendix, more than 75 percent of our participants express no preference and the remaining group is equally divided between preference for men and women. Hence, we believe it is appropriate for our study to consider education and health as gender-neutral issues.
To find a common monetary unit for both education and health policy statements, we consulted the Organization for Economic Cooperation and Development education and health reports for Turkey, where the yearly per capita spending is stated to be approximately 2400 TL for both dimensions. We then chose our maximal amount (10,000 TL) to be significantly higher.
Voices were recorded as .mp4 files.Footnote 11 We inspected each audio file aurally and visually in Audacity (v.2.3.3).Footnote 12 Before converting the audio files into .wav format, we ensured that the recordings were without speech errors and background noise. We used the Get Pitch command in the Praat phonetic analysis program (Boersma and Weenink, 2020, v.6.1.15) to determine the mean pitch of each recording. For unaltered women voices, the mean pitch is 239 Hz and the standard deviation is 14 Hz. The mean pitch for unaltered men voices is 134 Hz and the standard deviation is 12 Hz. To create a lower-pitched and higher-pitched version of each recording, we used the Pitch-Synchronous Overlap Add (PSOLA) method in Praat.Footnote 13 Following the literature (e.g., see Jones et al., 2010; Klofstad and Anderson, 2018; Tigue et al., 2012), we altered each recording by ±0.5 ERB.Footnote 14 Hence, each recording was converted into a pair of recordings, one with an HP and one with an LP with 1 ERB difference between the two. The ±0.5 ERB manipulation creates natural sounding voices and accounts for a perceivable shift of roughly ±20 Hz. Manipulating the recordings by ERB corrects for the logarithmic difference between actual fundamental frequency and perceived fundamental frequency. Therefore, it produces a constant perceivable gap between the raised and lowered versions of a recording, regardless of its initial fundamental frequency. As explained in the Appendix, this is confirmed for our experiment as well.
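The ±0.5 ERB shift can be illustrated with a short Python sketch. It assumes the Hz-to-ERB conversion documented for Praat's `hertzToErb` and `erbToHertz` functions; whether the study's script used exactly this conversion is an assumption here, and the function name `shifted_pair` is hypothetical. The sketch only computes target mean pitches and does not perform the PSOLA resynthesis itself.

```python
import math


def hertz_to_erb(f_hz: float) -> float:
    """Convert Hz to ERB-rate (formula assumed from Praat's hertzToErb)."""
    return 11.17 * math.log((f_hz + 312.0) / (f_hz + 14680.0)) + 43.0


def erb_to_hertz(erb: float) -> float:
    """Inverse of hertz_to_erb (formula assumed from Praat's erbToHertz)."""
    d = math.exp((erb - 43.0) / 11.17)
    return (14680.0 * d - 312.0) / (1.0 - d)


def shifted_pair(mean_f0_hz: float, half_step_erb: float = 0.5) -> tuple[float, float]:
    """Return the (lowered, raised) target mean pitch in Hz for one recording."""
    center = hertz_to_erb(mean_f0_hz)
    return erb_to_hertz(center - half_step_erb), erb_to_hertz(center + half_step_erb)


# Mean pitch of the unaltered voices as reported in the text.
for label, f0 in [("men", 134.0), ("women", 239.0)]:
    low, high = shifted_pair(f0)
    gap_erb = hertz_to_erb(high) - hertz_to_erb(low)
    print(f"{label}: LP {low:.1f} Hz, HP {high:.1f} Hz, "
          f"gap {high - low:.1f} Hz = {gap_erb:.2f} ERB")
```

Under this formula the gap between the two versions is 1 ERB for any starting pitch, while the same gap expressed in Hz is somewhat larger for higher starting pitches. This is the sense in which the ERB manipulation keeps the perceived gap constant.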
3.2 Procedure
The experiment was carried out online, using Qualtrics software. Results obtained from online voice-pitch experiments have been shown to be comparable to those of laboratory experiments (Feinberg et al., 2008). As summarized in panel (a) of Figure 1, the experimental conditions were assigned following a 2 × 2 factorial design (with equal probability): the participants were randomly assigned to listen to policies about either only education or only health, and the candidates they evaluated were either all men or all women. Participants compared recordings of multiple speakers in order to minimize pseudoreplication bias, whereby the idiosyncratic characteristics of any one speaker might influence the results of the experiment (Machlis et al., 1985; Kroodsma, 1990). The experiment was not pre-registered.
In the first part of the experiment (Figure 1, first step of panel (a)), a participant listened to six individual recordings (obtained from three speakers) speaking the sentence “Please vote for me.” After listening to each recording, the participant rated the candidate in terms of trustworthiness and competence.Footnote 15
In the second part of the experiment (Figure 1, second step of panel (b)), a participant listened to 18 pairs of recordings obtained from three different speakers. After listening to each pair, the participant was asked to vote for one of the candidates (Figure A.1 in the Appendix presents a screenshot of a typical choice task). In the health policy treatment, each recording declared “I will annually allocate X TL per person for public health expenditures.” In each recording pair, both recordings were obtained from the same speaker but one was higher-pitched (HP candidate) and the other one, lower-pitched (LP candidate). While the HP recording always stated X = 10,000 TL, the LP recording declared an X between 9000 and 10,000 TL, in increments of 200 TL. The order of recording pairs, as well as the order of recordings in each pair, was determined randomly to eliminate order effects.
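The structure of the 18 choice tasks can be sketched as follows. This is an illustrative reconstruction from the description above, not the Qualtrics implementation: the speaker labels, field names, and the function `build_choice_tasks` are hypothetical.

```python
import random

HP_OFFER_TL = 10_000                                       # HP candidate always offers this
LP_OFFERS_TL = [10_000, 9_800, 9_600, 9_400, 9_200, 9_000]  # six LP offers, 200 TL apart
SPEAKERS = ["speaker_1", "speaker_2", "speaker_3"]          # hypothetical labels


def build_choice_tasks(rng: random.Random) -> list[dict]:
    """Build 3 speakers x 6 LP offers = 18 pairs, randomizing both orders."""
    tasks = []
    for speaker in SPEAKERS:
        for lp_offer in LP_OFFERS_TL:
            pair = [("HP", HP_OFFER_TL), ("LP", lp_offer)]
            rng.shuffle(pair)  # which recording of the pair plays first
            tasks.append({
                "speaker": speaker,
                "policy_difference_tl": HP_OFFER_TL - lp_offer,
                "playback_order": pair,
            })
    rng.shuffle(tasks)  # order in which the 18 pairs are presented
    return tasks


tasks = build_choice_tasks(random.Random(0))
print(len(tasks), sorted({t["policy_difference_tl"] for t in tasks}))
```

Passing a seeded `random.Random` instance keeps each participant's sequence reproducible, which is a design choice of this sketch rather than something reported in the study.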
We did not inform the participants about the gender of the candidates they are listening to. Hence, any differentiation between two participants evaluating candidates of different gender is solely based on these participants’ perceived gender norms regarding women and men, particularly about their voice pitch.
In the third and final part of the experiment, participants filled out a survey including questions on their birth year, sex, gender, education, family income, whether they voted in the last elections, political preferences, trust toward others, opinion on whether men or women politicians are better suited for public health/education issues, satisfaction level with public health/education services, and importance of government providing public health/education services. Participants were also asked if they had trouble listening to the recordings (Table 4 in Appendix) and the medium they used for listening (speakers versus earphones). Covariates on gender, age, ideology, income, turnout in the last general election, survey completion time, listening device (mobile phone versus computer), medium of listening (earphones versus speakers), and general trust show no significant differences across experimental groups (see Tables 3–5 in the Appendix).
3.3 Participants
Participants (N = 185) predominantly declared their gender as either woman (68 participants) or man (113 participants).Footnote 16, Footnote 17 Participants ranged in age from 19 to 29 (the mean age of the participants was 22 with a standard deviation of 0.12). Students who were enrolled in introductory economics courses received an email link that gave them access to the survey in the Qualtrics platform. Participants received course credit in exchange for their participation. They came from a diverse range of majors, belonging to the faculties of engineering, management, and social sciences. Anonymity of the participants was respected throughout the study, and their identities were kept confidential.Footnote 18
Additional socioeconomic characteristics of our sample are as follows. Prior to our study, 93 percent of participants had voted in a real-life election. This is not significantly different from the 86 percent turnout rate in the 2018 Turkish elections, as published by the Turkish Statistical Institute. The monthly median family income in our sample is between 15,000 TL (2170 USD) and 18,000 TL (2600 USD), with a standard deviation of 10,800 TL (1565 USD). The ideological distribution of our participants, another covariate that we are interested in, accumulates toward the mid-point, as seen in Table 1 (Appendix). Furthermore, as noted in Subsection 3.1, our participants consider higher public spending on both health and education to be desirable (Figure A.2, Appendix). Additionally, they are predominantly satisfied with public health services while the satisfaction numbers are significantly lower for education (Figure A.3, Appendix). Tables 1 and 2 in the Appendix respectively present descriptive statistics for each experimental group and results of balance tests.
4. Results
4.1 Voice-pitch bias
We summarize the results of our experiment in Figures 2 and 3. For each level of policy difference (taking values from 0 to 1000 on the x-axis), we conducted separate linear regressions where the randomly assigned candidate gender is the independent variable. For each participant, the dependent variable is her average vote for the LP candidate, taken over the three choice tasks (recordings).
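The regression described above has a simple structure: with a single binary regressor, the OLS slope equals the difference in group means. The following Python sketch illustrates this with made-up toy votes (not the study's data); the function name `ols_binary` and the coding of the indicator (1 for men candidates, 0 for women candidates) are assumptions for illustration.

```python
def ols_binary(y: list[float], x: list[int]) -> tuple[float, float]:
    """OLS of y on an intercept and one regressor x. Returns (intercept, slope)."""
    n = len(y)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


# Toy data, invented for illustration: (men_candidates indicator,
# three votes at one policy-difference level, 1 = voted for the LP candidate).
toy_participants = [
    (1, [1, 1, 1]),
    (1, [1, 0, 1]),
    (1, [1, 1, 0]),
    (0, [1, 0, 0]),
    (0, [0, 1, 1]),
    (0, [1, 0, 1]),
]

# Dependent variable: each participant's average vote over the three choice tasks.
y = [sum(votes) / len(votes) for _, votes in toy_participants]
x = [group for group, _ in toy_participants]

intercept, slope = ols_binary(y, x)
print(f"LP share, women candidates (intercept): {intercept:.3f}")
print(f"men minus women candidates (slope):     {slope:.3f}")
```

In this setup the intercept is the mean LP vote share in the women-candidate group and the slope is the gap between the two groups, which is the quantity plotted at each policy difference in a figure of treatment effects.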
Figure 2 displays—for both men and women candidates—how the percentage of participants voting for an LP candidate changes in response to the policy difference between LP and HP candidates. For both LP men and LP women, a policy difference of no more than 1000 TL is sufficient to take this percentage down to a neighborhood of 50 percent, providing support for hypothesis 1. We hence conclude that by offering a more desirable policy than an LP opponent, an HP candidate can increase their vote shares and offset voice pitch bias. However, the amount of policy difference needed depends on the candidate gender, as discussed in the next paragraph.
Figure 2 shows that the percentage of participants voting for an LP man displays a sharp decrease to almost 50 percent as the policy difference between the candidates increases to 200 TL (p > 0.1).Footnote 19 That is, by offering 200 TL more public spending than the LP man, the HP man is able to offset the vote disadvantages of voice-pitch bias. This observation is significantly different from what we see in elections between women candidates. As can be seen in Figure 2, the percentage of participants voting for an LP woman remains significantly higher than 50 percent (p < 0.05) as the policy difference increases from 0 up to 800 TL. That is, even by offering an 800 TL more favorable policy than her opponent, an HP woman cannot offset the detrimental effect of voice-pitch bias on her votes. It is only when the policy difference reaches 1000 TL that an LP and an HP woman receive more or less the same number of votes (p > 0.1). Overall, a comparison of men and women candidates shows us that voters exhibit a more persistent voice-pitch bias when voting between women candidates than between men candidates. Hence, our findings support hypothesis 2.
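The logic of locating the policy difference that offsets voice-pitch bias can be sketched in Python. This is a hedged illustration only: the text does not specify the exact test, so the sketch assumes a two-sided test of the mean LP vote share against 0.5 with a normal approximation, uses the p > 0.1 threshold mentioned above, and runs on invented toy values. The function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev
from typing import Optional


def share_vs_half(avg_votes: list[float]) -> tuple[float, float]:
    """Return (LP share, two-sided p-value) for H0: mean share = 0.5.

    Uses a normal approximation (an assumption of this sketch) and
    requires some variation across participants.
    """
    share = mean(avg_votes)
    se = stdev(avg_votes) / sqrt(len(avg_votes))
    z = (share - 0.5) / se
    return share, 2.0 * (1.0 - NormalDist().cdf(abs(z)))


def offsetting_difference(votes_by_diff: dict[int, list[float]],
                          alpha: float = 0.1) -> Optional[int]:
    """Smallest policy difference (TL) at which the LP share is no longer
    statistically distinguishable from 50 percent, or None if there is none."""
    for diff in sorted(votes_by_diff):
        _, p_value = share_vs_half(votes_by_diff[diff])
        if p_value > alpha:
            return diff
    return None


# Toy per-participant average votes for the LP candidate (invented values).
toy_votes = {
    0:   [1.0, 1.0, 2 / 3, 1.0, 2 / 3, 1.0, 2 / 3, 1.0],
    200: [2 / 3, 1 / 3, 2 / 3, 1 / 3, 1.0, 0.0, 2 / 3, 1 / 3],
}
print(offsetting_difference(toy_votes))
```

Applied separately to each candidate gender, a larger returned value would correspond to a more persistent voice-pitch bias in the sense used in this section.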
Figure 2 also provides support for the earlier literature by showing that voters exhibit a significant voice-pitch bias: when there is no policy difference between the LP and HP candidates, the percentage of participants voting for the LP candidate is 64.50 percent, which is significantly higher than 50 percent (p < 0.01). We also analyze subsamples where all participants vote between either all men or all women candidates. We find the percentage of participants voting for an LP man to be 69.12 percent (p < 0.01), in comparison to 59.63 percent for an LP woman (p < 0.01). Hence, in both experimental groups the probabilities are significantly higher than 50 percent.
Finally, Figures 2 and 3 together show that the magnitude of voice-pitch bias responds to candidate gender. In Figure 3, when the horizontal axis is at 0 TL (i.e., when both candidates offer the same policy), we see a significant (p < 0.05) Intention-To-Treat (hereafter, ITT) effect of 9.49 percentage points. This means that an LP man receives 9.49 percentage points more votes than an LP woman. Hence, we conclude that voters exhibit a higher voice-pitch bias when voting between men candidates than when voting between women candidates. Figure 3 also shows that the effect is reversed in the case of a 200 TL policy difference. That is, an LP man receives 12.49 percentage points fewer votes than an LP woman (p < 0.05). This is because an HP man offering 200 TL more than his LP opponent can overcome voice-pitch bias, decreasing his opponent's votes to approximately 50 percent, while an HP woman cannot, which is in line with hypothesis 2.
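As a consistency check, the ITT effect at 0 TL can be recovered from the two subsample vote shares reported earlier in this section (69.12 percent for an LP man and 59.63 percent for an LP woman). The symbols below are introduced here for illustration only and do not appear in the original text.

```latex
\[
\widehat{\mathrm{ITT}}_{0\,\mathrm{TL}}
  = \bar{V}^{\mathrm{LP}}_{\mathrm{men}} - \bar{V}^{\mathrm{LP}}_{\mathrm{women}}
  = 69.12 - 59.63
  = 9.49 \text{ percentage points}
\]
```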
4.2 Perceptions of trustworthiness and competence
Figure 4 displays the effect of a switch in voice pitch from HP to LP on perceptions of competence and trustworthiness. For candidates of each gender, we conducted separate linear regressions with the candidate voice pitch as a binary independent variable and, for each participant, her average competence (respectively trustworthiness) evaluation as a dependent variable.
Over all experimental groups, the effect on a participant's competence rating of a switch from an HP to an LP version of the same recording is 9.43 percentage points (p < 0.01). Hence, perceptions of competence provide a possible mechanism for the effect of voice pitch on voting behavior. When the two experimental groups (participants who only listened to women candidates versus participants who only listened to men candidates) are analyzed separately, the effect is 6.20 percentage points (p < 0.05) in the case of men candidates and 12.84 percentage points (p < 0.01) in the case of women candidates.
We next analyze another theoretical mediator, trustworthiness. Figure 4 shows that, over all experimental groups, the effect on a voter's trustworthiness rating of a switch from an HP to an LP version of the same recording is 5.13 percentage points (p < 0.05). However, trustworthiness perception seems to be a less likely mechanism underlying voice-pitch bias. Namely, when the two experimental groups (participants who only listened to women candidates versus participants who only listened to men candidates) are analyzed separately, the effect on trustworthiness ratings largely diminishes in significance. In the case of men candidates, the effect is 4.34 percentage points and not significant (p > 0.1). In the case of women candidates, the effect is slightly larger at 5.97 percentage points, and significant only at the 10 percent level (p < 0.1).
4.3 Exploratory analysis: participant gender
In this section, we demonstrate that participant gender is an important factor in our findings on candidate gender, namely that (i) voters exhibit a higher voice-pitch bias when evaluating men candidates but (ii) they exhibit a more persistent voice-pitch bias when evaluating women candidates. Figure 5 displays the effect of participant gender on voting behavior. We first look at the case when the policy difference between candidates is 0 TL. We find that men participants vote 18.3 percentage points more for an LP candidate in elections between men than in elections between women (p < 0.01). In contrast, women participants vote 3.7 percentage points less for an LP candidate in elections between men than in elections between women, though the difference is not significant (p > 0.1). Hence, we conclude that it is men participants that drive the overall finding that voice-pitch bias is higher for men candidates than for women candidates.
For comparison, in Figure 5 we also present the case when the policy difference between candidates is 200 TL.Footnote 20 Contrary to the previous case, we now find a treatment effect in the same direction for both men and women participants. More specifically, women participants vote for an LP candidate 17.40 percentage points less in elections between men than in elections between women (p < 0.05). Similarly, men participants vote for an LP candidate 8.46 percentage points less in elections between men than in elections between women, though the difference is not significant (p > 0.1). Together with Figure 2, the above analysis suggests that for participants of both genders, a 200 TL difference between the candidates’ policies results in a significant decrease in the probability of voting for an LP man, providing a mechanism for hypothesis 2.
5. Conclusion
By and large, our findings corroborate the stated hypotheses as well as the earlier literature. Our participants exhibit both voice-pitch bias and persistence of voice-pitch bias (hypothesis 1). Furthermore, voice-pitch bias is higher in elections between men candidates while it is more persistent in elections between women candidates (hypothesis 2). We find that men participants are mainly responsible for gender dependence of voice-pitch bias while both men and women participants drive gender dependence of the persistence of voice-pitch bias. We also identify the effect of voice pitch on perceptions of competence and trustworthiness as an important mechanism for voice-pitch bias.
Our results are robust to the inclusion of pre-treatment covariates, including participant gender, age, income level, turnout, ideology, and general trust toward others, as well as survey completion time (a proxy for participant attention), listening device, and medium of listening. Additionally, our findings are consistent across the two policy dimensions, health and education, that we considered. Since the Turkish public is predominantly satisfied (dissatisfied) with government policies on health (education), we conclude that our findings are not driven by participants’ attitude toward government.
While we find that candidates with a lower voice pitch are perceived to be both more competent and more trustworthy, our results also display an interesting pattern. As can be seen in Figure 4, the effect of voice pitch on perception is higher for voters evaluating women candidates. This difference is particularly pronounced in competence ratings. Even though our participants overwhelmingly declare both education and healthcare to be gender-neutral issues, the earlier literature establishes them to be women-congruent (Dolan, 2014). In relation to this literature, our study then shows that, on women-congruent issues, even a vocal characteristic signaling masculinity (voice pitch) can have a more significant effect on perceptions regarding women than men.
Supplementary material
The supplementary material for this article can be found at https://doi.org/10.1017/psrm.2023.51. Replication material for this article is available at https://doi.org/10.7910/DVN/QQJGG6.
Acknowledgments
We thank Fikret Adaman, Nejat Anbarcı, Abdurrahman Aydemir, Rachel Bernhard, Florian Foos, Sara Hobolt, Özge Kemahlıoğlu, Casey Klofstad, Korhan Kocak, Katharina Lawall, Mert Moral, and Emre Selçuk, as well as EPOP 2021, MPSA 2022, EPSA 2022, Oxford RAI Political Behaviour Workshop, and LSE PSPE participants for comments and suggestions. The paper also immensely benefited from the inputs of the editor and two reviewers.