
Are Judicial Performance Evaluations Fair to Women and Minorities? A Cautionary Tale from Clark County, Nevada

Published online by Cambridge University Press:  01 January 2024


Abstract

Because voters rely on judicial performance evaluations when casting their ballots, policymakers should work diligently to compile valid, reliable, and unbiased information about our sitting judges. Although some claim that judicial performance evaluations are fair, the systematic research needed to establish such a proposition has not been done. Using attorney judicial performance survey data from Clark County, Nevada, this analysis shows that objective measures of judicial performance cannot explain away differences in scores based on race and sex. Minority judges and female judges score consistently and significantly lower than do their white and male counterparts, all other things being equal. These results are consistent with the hypothesis that judicial performance evaluation surveys may carry with them unexamined and unconscious gender and race biases. Future research must compare judicial performance evaluation structure, content, and execution across states in order to identify those evaluation mechanisms least susceptible to unconscious gender and race bias.

© 2011 Law and Society Association.

Ensuring that judges are accountable to the people is essential to maintaining the legitimacy of the judiciary. Accountability protects the independence of the judiciary (O'Connor 2009). This helps shield the judicial branch from the low public confidence ratings that are common in the executive and legislative branches. For these reasons, various judiciary-oriented organizations have been championing judicial performance assessments as an important tool in maintaining the judiciary's accountability to the people (Laube et al. 2007). These state judicial performance evaluation (JPE) programs are promoted in pursuit of two goals: to provide information to the judges for the purposes of self-improvement and to provide information to the public about the performance of their judges.

State JPE programs should be “designed and administered in a way that does not inadvertently harm the principles they are intended to promote” (Esterling 1998: 207). More specifically, JPEs should evaluate fairly the performance of all judges, regardless of their gender, race, or other personal demographic attributes. This is especially important because voters are influenced by performance evaluations (Esterling 1998). In judicial elections, one of the biggest causes of low voter participation is a lack of relevant information about the candidates (Bonneau & Hall 2009). Voters in most judicial elections get relatively little information about the candidates prior to the elections (Baum 2001).

In vigorously contested elections, citizens will learn about candidates through campaign advertisements. Voters in partisan elections see a voting cue in the form of party identification on their ballots. In retention elections, however, there is very little information available to voters besides the results of the JPEs (Esterling 1998). Exit polls of voters in four states using retention elections found that, “[a]mong those respondents who were familiar with the evaluation reports, most report that the evaluation information either determined or assisted their vote choice in retention elections” (Esterling & Sampson 1998: 3). If the evaluation reports provided to voters are systematically biased based on gender or race, this will lead to less diversity on the bench as voters, in reaction to these evaluations, fail to retain women and minority judges.

The appraisal of judicial performance, like other types of performance evaluations, “is fraught with many potential pitfalls and problems” (Kearney 1999: 483). Many state programs have struggled to create fair and workable programs (Pelander 1998). Nonetheless, evidence about the validity, reliability, and fairness of judicial evaluation programs is scarce (Kearney 1999). While many states have put in place judicial performance evaluation systems, none has conducted a systematic review to answer the basic question of whether these systems are valid, reliable, and fair.

This is a critically important issue at a time when the judiciary is still struggling to increase gender and racial representativeness. Women judges make up only a quarter of the federal bench and 32.4 percent of state supreme court judgeships, far below their representation in the legal profession (Commission on Women in the Profession 2009). As Supreme Court nominee Elena Kagan noted in an address about women in the legal field, “there's a problem here, and we need to figure out why it exists” (Kagan 2006). Part of this problem may be related to role stereotyping. Early research on this topic showed that “[t]he image of the judicial office thus seems to affect negatively a female candidate's chances” (Hedlund et al. 1979: 520). Interest groups continue to push hard for the transition to and reliance on these JPEs in merit-based selection systems. While some groups advocate “for strong judicial performance evaluation (JPE) programs in every state” (Institute for the Advancement of the American Legal System 2010), they do so without the benefit of research to confirm that these programs will not contribute to systematic bias in the selection of our state judges.

Do judicial performance evaluations manifest unconscious bias through the stereotypes of the judicial role and of candidates for judgeships? This research is aimed at remedying this information gap. After a brief discussion of judicial performance evaluations in the American states, we present a theory as to why existing systems may be expected to yield biased results based on gender and race. Next, we present an analysis of judicial performance evaluations administered in Clark County, Nevada, from 1998 to 2008. Our results show that there is reason to question whether attorney polls reliably avoid problems of unconscious gender and race bias.

Assessing Judicial Performance with Surveys

Judicial performance evaluation is nothing new; polls of attorneys seeking opinions on judicial performance date as far back as the 1870s (Feeney 1987). In large part, the more recent movement toward state-mandated judicial performance reviews began in tandem with the shift toward merit plan judicial selection. These state-run evaluations arose as a reaction to the dearth of candidate information available to voters in retention elections (Pelander 1998).

In 1985, in anticipation of the growing need for accountability and independence of the judiciary, and with the first judicial performance evaluation program in Alaska nearly a decade old, the American Bar Association's (ABA) Special Committee on the Evaluation of Judicial Performance published guidelines intended to direct states toward the “best practices” in the implementation of JPEs (American Bar Association 1985). Although the primary goal was to increase the ability of judges to improve themselves through constructive criticism and feedback, a secondary goal of the ABA report was to provide voters with valuable information on the quality of judicial candidates who were facing retention elections. Essentially, the ABA attempted to provide a roadmap for states seeking to create JPE programs by highlighting key criteria on which judges could adequately and appropriately be evaluated.

Since the issuance of the ABA's guidelines, 23 states have created or are currently in the process of developing JPE programs for the purposes of voter information and awareness, judicial self-improvement, and judicial education (Rottman 2004). Among other things, the ABA report laid out five main categories of criteria on which judges should be evaluated: legal ability, integrity/impartiality, communication skills, professionalism/temperament, and administrative capacity. The first column of Table 1 presents the subcategories from the ABA report.

Table 1. Comparing Questions between the ABA Guidelines, the LVRJ Survey, and Other State Surveys

a Nevada's proposed state-sponsored JPE program would have been implemented if SJR-2 had passed in November 2010 (see Thomas et al. 2009).

Among the states that have adopted some form of JPE program, nearly all utilize a survey as the key instrument to measure judicial quality (Pelander 1998). States go about the survey process in different ways. Surveys can be state-sponsored, bar association–run, or even privately sponsored. The newspaper-sponsored poll of the state bar that is the subject of analysis here, the Las Vegas Review-Journal's (LVRJ) “Judging the Judges” survey,Footnote 1 is an example of the latter type. All of the state JPE surveys exhibit variations in terms of the groups to which they are distributed, the number of questions, and the survey's contribution to the overall judicial evaluation process. However, all hew closely to the ABA guidelines of 1985 in terms of both the categories of information surveyed and the survey questions themselves. All survey-based evaluation systems currently in use canvass attorneys. State-sponsored programs tend to be more ambitious, and many also survey jurors, litigants, and/or other constituent groups. The officials compiling the data for the final evaluation, however, report that the lawyer polls provide the most helpful information (Esterling & Sampson 1998).

The widespread use of surveys to assess judicial performance stands in stark contrast to the limited use of objective information. Even though scholars (Kearney 1999) and judges (Institute for the Advancement of the American Legal System 2008)Footnote 2 agree that survey data should not stand alone as a determinant of judicial quality (Brody 2000: 341), few states formally integrate objective data alongside the survey data. Objective measures such as discipline records, caseload evaluations, reversal rates, recusal rates, and completion of continuing judicial education requirements provide a good picture of judicial performance (American Bar Association 2005; Andersen 2000). Unfortunately, states have found it difficult to collect and utilize these alternative measures of judicial performance (Thomas et al. 2009). This leaves most states relying very heavily or entirely on survey data alone.

The Potential for Race and Gender Bias in the Survey Component of JPEs

The limited use of objective data in evaluating judges is troubling because objective metrics could help counter two types of systemic errors that arise from the use of survey data. First, JPE surveys often contain methodological errors that can be traced back to lack of professional guidance and failure to incorporate statistical controls (Brody 2000: 341).Footnote 3 Second, judicial performance surveys, like teacher evaluations (Sprague & Massoni 2005) and other job performance assessments (Martell 1991), could be subject to unconscious gender and race bias. In this analysis, we focus on the possibility of the latter type of systemic error.

Judges have long expressed concern about the fairness of evaluation systems (Griffin 1994; Institute for the Advancement of the American Legal System 2008). One main concern is the possibility that unconscious gender and race bias may color the recommendations of the judicial evaluation committees. Of the 22 states that have state-sponsored JPE systems in place, not one has undertaken the work of systematically reviewing judicial performance surveys for evidence of such unconscious gender and race bias. As other researchers have noted, “there is a self-evident need to acquire more factual data about the operation of gender in the judicial performance evaluation process” (Durham 2000: 16). The same must be said for racial bias.

Conscious or explicit bias is the kind of prejudice that people openly embrace, but unconscious or implicit biases are automatic, unintentional, and often unexamined. This latter type of bias can cause people who believe they are neutral, and who espouse principles of race and gender equity, to nonetheless make erroneous judgments influenced by race, gender, or other stereotypes (Lane et al. 2007). Even psychologists (Steinpreis et al. 1999) and trial court judges (Rachlinski et al. 2008) have been found to exhibit this type of unconscious bias.

Within the discipline of psychology there has been extensive research establishing that stereotyping, as part of the normal unconscious cognitive processes of categorization, leads to inaccurate and unfair judgments of women and minorities (American Psychological Association 1991). A stereotype can be defined as a “set of attributes ascribed to a group and imputed to its individual members simply because they belong to that group” (Heilman 1983). Stereotypes act as cognitive shortcuts that form the bases for cognitive categorizations, leading us to generalize about individuals and circumstances. For our purposes, racial and gender stereotypes are negative generalizations based on culture and myth. Social norms rooted in the past justified racial and gender hierarchy based on “truths” about gender and race. These “truths” disadvantaged the minority status group and legitimized the dominant group's privileged status. Although we have moved to a more equitable society, these stereotypes linger and continue to affect how we perceive those in the minority group.

Recent research shows just how strongly stereotypes influence the actions of actors who believe themselves to be without prejudice (Bertrand & Mullainathan 2004; Correll et al. 2007). Within the body of anti-discrimination law, the powerful effect of these unconscious biases in the evaluation process has long been recognized. For example, in Price Waterhouse v. Hopkins (1989), the U.S. Supreme Court found that the evaluation of a female candidate for partner in an accounting firm was likely driven by unconscious bias. Researchers have found that women's performance ratings in jobs that are sex-typed as male (e.g., police officers) suffer when they are evaluated in comparison with men's ratings, and vice versa (Heilman et al. 2004; Nieva & Gutek 1980).

Women judges perform work stereotypically considered to be “a man's job.” Women in male-typed jobs are not conforming to gender role expectations (Valian 1998). This theory predicts that women judges will be at a disadvantage, failing to be recognized and rewarded when they are doing a good job. Certainly, some individual female judges will be able to defy stereotypes, but, by and large, most will not overcome this disadvantage. Many women judges must work harder and do more things right to earn a score that would still be considered low for a male judge.

Similar biases related to race-based stereotypes, especially in the context of the law-and-order world of judging, may drive inequities in retention scores based on race. Trial judges may exhibit unconscious race bias in their own work (Rachlinski et al. 2008), and stereotypes surrounding the racial characteristics of criminal defendants may exacerbate the racial disparities in evaluating minority judges. Social science research shows that, while the underlying racial stereotypes are different from those underlying gender bias, “the consequences of skewed racial distributions for the social psychology of stereotyping and outgroup bias are similar to those resulting from gender imbalance, as are the resulting barriers to career advancement” (Bielby 2000: 123). When a candidate's true qualifications are somewhat ambiguous, minority applicants are consistently given less support than their white counterparts (Dovido & Gaertner 2000). Minority evaluatees are judged more fairly when they compose at least 30 percent of the pool (Sackett et al. 1991), but minority judges fall far short of that threshold in the “Judging the Judges” survey pool, numbering just 5 of the 95 judges evaluated (see Table 2).

Table 2. Descriptive Statistics and Difference of Means Tests by Gender and Race

N = 95 judges (33 female and 5 minority), weighted by observations for each judge in the data (total = 364)

* The independent variable in the model has been weighted by the judges' bench means. There is no significant difference between these weighted scores based on gender.

This number is calculated by subtracting the mean for male judges from the mean for female judges. Where the sign is negative, the mean value for female judges is lower.

This number is calculated by subtracting the mean for white judges from the mean for minority judges, as above. Note that there are only 5 minority judges in the sample.

Many states have recognized the potential for unconscious bias—especially gender bias—in the judicial system. To date, 45 states have implemented some type of task force charged with evaluating the level of gender bias in state court systems (Schafran & Wikler 2001). These task forces are generally asked to determine the severity and type of bias. Many of them have found evidence of substantial gender bias in courtrooms, offices of court administration, and in the legal profession more generally (Kearney & Sellers 1996). Unfortunately, gender bias in the selection process is “still a problem” (Schafran & Wikler 2001: 34).

The most highly publicized debate about possible gender and race bias in judicial performance evaluations occurred in Missouri in 2007, when revisions were made to the attorney poll that had been administered by the Missouri Bar since 1948 (Missouri Bar Judicial Evaluation Survey Committee 2007). The revisions were motivated by a desire to improve the survey, and they reflected the Bar's hope that these improved judicial evaluations would have greater influence on voter behavior.Footnote 4 In 2004, the Missouri Bar revamped the survey, and the process resulted in a longer, more comprehensive questionnaire. The new survey instrument more closely conformed to the 1985 ABA guidelines.Footnote 5 Surprisingly, this more detailed survey resulted in greater scoring gaps by gender and race. By moving from a simple “yes/no” retention question to a more sophisticated survey structure, Missouri increased the evaluation disparities by gender and race. The new survey resulted in negative recommendations for two black female associate circuit court judges, each of whom had previously received “do retain” ratings (Lauck 2006).

At the request of the Mound City Bar Association, Dr. Gary Burger undertook an analysis of Missouri's judicial performance evaluations for a 10-year period (Burger 2007). The Burger analysis found that there was a significant difference in the Missouri attorney poll results according to gender and race. Female judges, as a group, were ranked much lower than male judges, and black judges ranked lower than white judges (Burger 2007).Footnote 6 Of the four groups that could be formed by classifying judges by both gender and race, female African-American judges were rated significantly lower than white female judges, and African-American male judges were rated significantly lower than Caucasian male judges (Burger 2007).

Burger's study does not ascribe a reason for these significant differences in ratings. In fact, he specifically states that “[t]here are a variety of possible explanations for the differences documented in this study … [such as] bias on the part of the raters, real differences in performance, problems with the rating scales and process, or some combination of these factors” (Burger 2007: 6). It is clear that a controlled analysis is needed in order to provide evidence as to which of these factors is driving score disparities in JPE surveys.

Data and Methodology

The purpose of this study is to remedy this research gap. We supplement attorney survey results with alternative measures of judicial quality in order to test the hypothesis that unconscious gender and race bias are influencing bar poll results. The dependent variable in our analysis comes from the Las Vegas Review-Journal's “Judging the Judges” biennial survey (1998–2008) of local attorneys rating Clark County judges and Nevada's Supreme Court justices. Until 1996, this survey was conducted jointly with the local bar association. Since 1998, the “Judging the Judges” survey has been conducted solely by the Las Vegas Review-Journal, a private newspaper, with the help of an external research firm. The LVRJ makes a redacted version of the survey results public on its Web site.Footnote 7

The surveys cover all of the judgeships that appear on the Clark County ballot. This includes Clark County trial judges and the justices sitting on the Nevada Supreme Court.Footnote 8 In all, there are 95 judges in this dataset, 33 of whom are female and 5 of whom are of minority racial status. Because many judges are rated in successive biennial surveys, there are a total of 364 observations. The eligible pool of respondents consists of all attorneys who hold a bar card in that jurisdiction (Whitely 2006). As many as 800 lawyers respond to a given survey. Each attorney is asked to affirm that he or she has had direct experience dealing with the sitting judge that he or she is rating (Hopkins 2008), although no validation process is in place to ensure that attorneys limit their responses to only judges before whom they have appeared during the evaluation period.

Our data include the universe of judicial evaluations from these biennial surveys from 1998 to 2008. The survey includes the questions outlined in the second column of Table 1.Footnote 9 These questions address most of the categories recommended in the ABA's guidelines (American Bar Association 1985). The last question in the survey reads, “Taking everything into account, would you recommend retaining this judge on the bench?” The percentage of respondents answering “yes” to this question represents the retention score. This is the dependent variable in our analysis.Footnote 10 As Table 2 shows, the average retention score in these data is 73.84 out of a possible 100.
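To make the construction of this dependent variable concrete, the following minimal sketch (with made-up responses and hypothetical column names, not the LVRJ's actual data layout) computes a retention score as the percentage of “yes” answers each judge receives:

```python
import pandas as pd

# Hypothetical raw survey responses: one row per attorney rating of a judge.
responses = pd.DataFrame({
    "judge":  ["A", "A", "A", "B", "B"],
    "retain": ["yes", "no", "yes", "yes", "yes"],  # the retention question
})

# Retention score = percentage of "yes" answers per judge, on a 0-100 scale.
retention_score = (
    responses.assign(voted_yes=lambda d: d["retain"].eq("yes"))
             .groupby("judge")["voted_yes"]
             .mean()
             .mul(100)
)
print(retention_score)  # judge A: 66.7, judge B: 100.0
```

A judge rated by 202 attorneys, 149 of whom answer “yes,” would thus receive a retention score of about 73.8, close to the sample average reported in Table 2.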

Table 1 presents a comparison between the ABA guidelines, the “Judging the Judges” bar poll, and the attorney surveys used in other states. There is remarkable uniformity regardless of which organization is sponsoring the survey. States generally hew to the guidelines issued by the ABA. The greatest variability among these is in the length of the surveys; for example, Colorado's attorney survey for district court judges comprises 36 questions.Footnote 11 Although many states use surveys that are significantly longer than the “Judging the Judges” bar poll, there is no evidence that longer surveys produce better evaluative results (Burger 2007). This comparison shows that the results of this analysis of the LVRJ's “Judging the Judges” bar poll may be generalizable to state-sponsored programs elsewhere.

The independent variables of interest are gender and minority status. A dummy variable is included to indicate whether the judge is female, and another indicates whether the judge is of a minority race. Similar to the findings of the Burger study (Burger 2007), the retention score exhibits a consistent difference among judges based on both gender and race. Table 2 shows the frequency of votes in favor of retention broken down by gender and race. The first line of Table 2 shows that the mean for male judges is higher, by more than 10 points, than for female judges, a statistically significant difference. The mean difference between minority and nonminority judges is even bigger, at more than 12 points, although that difference is not statistically significant, perhaps because there are only 5 minority judges in our sample. The distribution of scores for male judges is also skewed, with more high scores than would be predicted by the normal distribution (see Figure 1).

Figure 1. “Yes” votes for retention in LVRJ Bar Poll Survey by gender, 1998–2008.

It is important, of course, to rule out the possibility that this difference in ratings is attributable to female and minority judges being, on average, less qualified and less able on the bench. For this reason, a number of alternative indicators of judicial performance are included in the model. If there is a qualitative difference in the abilities of women judges versus male judges and minority versus nonminority judges, then the gap in scores represents a true difference in judicial performance. In that case, the observed differences would be attributed not to bias, but to true differences in these alternative measures of judicial quality. In order to assess the relative quality of the judges based on gender and race, we included proxy measures for the quality of the judges in the study; specifically, we include weighted reversal rates for each judge, whether a judge was initially appointed to the bench, quality of legal education, experience on the bench, and ethical record. The descriptive statistics for these variables are included in Table 2, along with the results of difference of means tests by judge gender and minority status.

Reversal Rates

One widely used source of objective information about legal ability is the reversal rate (Brody 2000; Posner 2000). When a judge has a low reversal rate, this is an indication that the judge is interpreting the law correctly in the eyes of the appellate court (Feeney 1987). For this study, we gathered reversal rate figures using a Lexis–Nexis search for each judge spanning the years 1998–2009.Footnote 12 Table 2 shows that there is no significant difference in reversal rate by judge gender or race. Because our sample includes judges from across the spectrum of jurisdictions, the measure of reversal rate included in the model is calculated as the distance between an individual judge's reversal rate and the mean for his or her court. The mean value of this variable is zero, and this represents the mean reversal rate for the level of court. In our dataset, the reversal rate distance ranges from −0.35 (which is a lower, or “better,” reversal rate than the court mean) to 0.13.
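The within-court centering described above can be sketched in a few lines. This is a minimal illustration with invented rates and hypothetical column names, not the actual Lexis–Nexis data:

```python
import pandas as pd

# Illustrative judges; reversal rates here are invented for the example.
judges = pd.DataFrame({
    "judge":         ["A", "B", "C", "D"],
    "court":         ["district", "district", "supreme", "supreme"],
    "reversal_rate": [0.10, 0.45, 0.20, 0.33],
})

# Reversal rate distance = judge's rate minus the mean rate for his or her
# court, so zero marks the court-level average and negative values are
# "better" (less often reversed) than the court mean.
judges["reversal_distance"] = (
    judges["reversal_rate"]
    - judges.groupby("court")["reversal_rate"].transform("mean")
)
print(judges[["judge", "reversal_distance"]])
```

Centering within courts in this way keeps a supreme court justice from being compared against trial judges, whose baseline exposure to reversal differs.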

We must entertain the possibility that the reversal rate itself is partially driven by race and gender bias. Attorney prejudice or bias as to the reversibility of the judge's decision may influence the attorney's decision as to whether there is a better than average probability that the judge will be reversed on appeal. If attorneys have a low opinion of female or minority judges' legal abilities, reflecting the attorneys' own biases, then female and minority judges' rulings may be more often subject to appeal. The higher frequency of requests for appellate review may result in higher reversal rates, regardless of whether the individual judge is competent. As Walker and Barrow observe, “if practicing attorneys perceive [female and black judges] to be vulnerable to reversal on appeal, then a pattern of relatively high appeal rates for nontraditional judges should occur” (Walker & Barrow 1985: 612).

To test this possibility, difference of means tests were conducted on both the reversal rate distance and the appeal rate distance. The appeal rate distance is calculated the same way as the reversal rate distance, but using the number of times a judge was appealed as compared with the mean of the relevant court. Because this measure necessarily uses a raw number as opposed to a rate, it is weighted by the number of evaluation years we have for each judge. The difference of means tests reveal no significant bias effects in either of these variables.Footnote 13 This is in keeping with findings of previous research on diversity on the federal bench (Walker & Barrow 1985).
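Footnote 13 reports these tests as F statistics, the form a one-way analysis of variance produces (with two groups, F is simply the square of the t statistic). A sketch of such a test, using simulated placeholder values rather than the study's actual distances, might look like this:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=0)

# Simulated reversal-rate distances for illustration only; the paper's
# tests used the actual values for the 33 female and 62 male judges.
female_distances = rng.normal(loc=0.0, scale=0.05, size=33)
male_distances   = rng.normal(loc=0.0, scale=0.05, size=62)

# One-way ANOVA across the two groups.
f_stat, p_value = stats.f_oneway(female_distances, male_distances)
print(f"F = {f_stat:.3f}, p = {p_value:.3f}")
```

F values like those reported in footnote 13 (0.487 and 0.515 for female judges) fall, as the authors note, well short of conventional significance thresholds.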

Initially Appointed to the Bench

Another proxy measure for judicial quality is whether or not the judge was first appointed to the bench. Nevada's constitution provides that the justices and judges who serve in the courts of Nevada shall be elected (Nev. Const., Art. VI sec. 2, 3A). Unlike the constitutions of most other states with competitive judicial elections, however, the Nevada constitution stipulates that judicial vacancies be filled by the governor from a roster of recommended nominees selected by a committee of citizens, which is largely made up of lawyers.Footnote 14 So, in the case of vacancies only, Nevada employs a merit selection plan of sorts (Flango & Ducat 1979). Empirical studies have found that having been appointed at first instance confers benefits on judges; most notably, they gain incumbency status without first having to stand for election (Glick 1978). This suggests that, from the voter's perspective, judges who are appointed in the first instance have desirable qualities that make them worthy of being judges.

The evidence is mixed as to whether judges selected through a merit plan are of higher quality than elected judges. In fact, there is empirical work that concludes that appointed judges are not qualitatively superior to elected judges (Bonneau & Hall 2009; Choi et al. 2007; Glick & Emmert 1986). Regardless of the empirical evidence, attorney respondents may believe that appointed judges are more highly qualified. First, attorneys tend to support merit plans as the best method of selecting judges (O'Connor 2009). This may be because the state bar plays such a critical role in the selection process. Lawyers usually dominate the merit selection committees that make recommendations to governors, and state bars have great influence on who gets to sit on these committees. Recommended nominees already have close connections with the state bar and would be highly regarded by members of the bar. In other words, because the bar has a hand in choosing merit-selected judges, attorneys may be inclined to see these judges as more highly qualified than those selected by popular election.

Second, there is an argument that in nonpartisan elective systems like Nevada's, judges initially appointed to the bench may indeed be of higher quality than elected judges. Elected judges in nonpartisan systems face no vetting process prior to taking their place on the ballot. In Nevada, the only constitutional and statutory requirements to run for some judgeships are residency, minimum age, and collecting sufficient signatures on a petition.Footnote 15 The absence of a vetting process (which, in partisan elections, is often provided through the political party system) may indeed lead to lower-quality candidates.

In this study, we include a dummy variable to indicate whether the judge was appointed to the bench at first instance. Table 2 shows that there is no significant difference between men and women in terms of appointment rates. The mean indicates that 42 percent of the judges in this sample were appointed at first instance. In our sample, all five of the judges with minority race status were first appointed to the bench.

Legal Education

The nature of a judge's legal education is another proxy for judicial quality, “though like most proxies a rough one” (Posner 2005). The prestige of a judge's legal education carries with it certain implications for the judge's perceived level of legal ability (Glick 1978).Footnote 16 Research on the American Bar Association's ratings of federal judicial nominees indicates that law school prestige is an important covariate of the ABA's quality scores (Slotnick 1983). For this analysis, we use a variable representing the current tier ranking of the judge's law school education as per the 2010 rankings from U.S. News & World Report.Footnote 17 Alumni information was gathered from various online sources, including official judge biographical blurbs, résumés, and newspaper articles.

Of the 95 judges in this dataset, 36 came from the very best schools and 34 came from second-tier schools. On the lower-prestige end, 6 judges came from tier-three schools, 16 from tier-four schools, and 3 did not graduate from law school at all. Table 2 shows that the average level of law school prestige is a second-tier school. The prestige of school attended by the female judges is, on average, about three-quarters of a tier lower. There is no significant difference between minority and nonminority judges, but it is interesting to note that minority judges have a mean that is about one-fifth of a tier higher than their nonminority peers.

Judicial Experience

Previous judicial experience is a time-tested measure of judicial quality, particularly with respect to the federal bench (Epstein et al. 2003). In the realm of judicial quality, it stands to reason that a judge who has served longer has had more opportunity to learn and master the trade of judging (Haire 2001). Research on federal court judges suggests that recently appointed judges experience acclimation effects as they familiarize themselves with the job of judging (Brenner & Hagle 1996; Hettinger et al. 2003). Of course, it is possible that the longer judges stay on the bench, the more likely respondent attorneys are to see them as “old-timers,” or otherwise past their prime. This is certainly part of the argument for imposing a mandatory retirement age on judges.

As with the information about law schools, judicial experience information was gathered from publicly available sources. Judges in our sample ranged from 0 to 30 years on the bench. The average tenure of judges in our sample is nearly 8 years. Female and minority judges had slightly shorter tenures, on average, but this difference is not statistically significant (see Table 2).

Ethical Record

Disciplinary complaints are an important signal about a judge's performance (Posner 2005), and may be the single most important objective measure of judicial integrity available to researchers. To be sure, some lawyers may be unwilling to file formal complaints for fear of future retribution (Jackson 2007); however, filing a disciplinary complaint is also relatively easy. We have elected to code a disciplinary record variable to reflect both whether a complaint was filed and the outcome of the disciplinary complaint, rather than just the filing of the complaint. These outcomes were ranked on an ordinal scale of 0 to 8.Footnote 18

We have also included a separate variable that reflects reported scandals and accusations of impropriety. Not all scandals and press reports alleging that a judge has behaved improperly rise to the level of disciplinary complaints. However, some respondents may follow the philosophy of “where there is smoke, there is fire.” Even the mere suggestion of impropriety may drive attorney opinion against a judge who is the subject of public reports of misconduct. Alternatively, respondents may view a judge's connection to public scandal as, at the very least, unbecoming a member of the judiciary. Attorneys without recent, first-hand experience with the judge may be particularly susceptible to being swayed by this sort of information.Footnote 19

For this reason, we have included a dummy variable to flag those judges who have been publicly connected—rightly or wrongly—to scandal. This information comes from publicly available news sources, mostly from the pages of the Review-Journal itself. In our study, a full 21 percent of the judges are involved in a reported scandal (see Table 2). This high number may reflect Clark County's reputation for being one of America's preeminent “judicial hellholes” (American Tort Reform Foundation 2008). Alternatively, it may reflect the fact that the media in Las Vegas are particularly vigilant in reporting the actual and alleged missteps of elected judges. As Table 2 demonstrates, there is no significant difference based on gender or race on this measure.

Methodology

An important consideration in an analysis of this sort is the relative weight given to the various judge ratings in the LVRJ poll. While the response rates of this poll often fall below the generally accepted threshold of 50 percent of solicited respondents (Brody 2000), assessing the response rate is actually quite a bit more complicated than it seems. In the LVRJ methodology, as in many of the other state surveys, the respondents are instructed to answer questions only if they have professional experience with the judge. In 2008, for example, researchers sent out 4,237 survey invitations, but only 799 attorneys responded (Downey 2008). This yields a very low response rate of just 18.9 percent. Beyond this, none of the responding attorneys had experience with all of the 68 judges evaluated in the 2008 survey. For the 2008 results, an average of 202 attorneys evaluated each judge, with a range between 44 and 387. Given the limitations on the information about which attorneys actually had professional experience with each judge during the evaluation period (Thomas et al. 2009), it is impossible to calculate a “true” response rate of qualified attorneys. Even so, results calculated from the input of fewer attorneys will be less reliable. As such, a weighted model of retention scores is used here.

Our dataset includes biennial data for Clark County judges and justices of the Nevada Supreme Court from 1998 to 2008. Of course, most of the judges in the dataset served on the bench for only some of this time. This means that we have an unbalanced panel dataset. To analyze these data, we estimate a pooled weighted least-squares (WLS) model with Driscoll and Kraay (1998) standard errors. This adaptation of the Beck and Katz (1995) approach allows the regression to be adapted on the basis of a robust estimate of the error structure of the model (Hoechle 2007).Footnote 20 As we have several biennial estimates for most judges, we cannot assume that these observations are independent of one another. This model allows the standard errors to be calculated with this systematic dependence in mind.
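A sketch of such an estimation in Python follows. It assumes the linearmodels package, whose PanelOLS estimator offers Driscoll-Kraay-style standard errors via cov_type="kernel"; the file name and variable names are hypothetical stand-ins for the study's actual data:

```python
import pandas as pd
from linearmodels.panel import PanelOLS

# Hypothetical file: one row per judge-survey observation.
df = pd.read_csv("judging_the_judges.csv")
df = df.set_index(["judge", "year"])  # entity-time index for panel data

# Weighted least squares: each observation is weighted by the number of
# attorneys who rated the judge, as described in the text.
model = PanelOLS.from_formula(
    "retention ~ 1 + female + minority + appointed + tenure"
    " + lawschool_tier + reversal_distance + discipline + scandal",
    data=df,
    weights=df["n_raters"],
)

# cov_type="kernel" requests Driscoll-Kraay standard errors, which are
# robust to heteroskedasticity and to dependence across the repeated
# biennial observations of the same judge.
results = model.fit(cov_type="kernel")
print(results)
```

This illustrates the estimator class the authors describe, not a reproduction of their exact specification; Table 3 reports the actual model.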

Results

The results of our analysis are presented in Table 3. None of the objective measures of judicial quality and judicial performance that we utilize in our model mitigates the difference in scores based on race and sex. There remains a large, unexplained gap between the ratings of female and minority judges and those of their male and nonminority counterparts, all other measures of judicial quality being equal. This is demonstrated by the large, statistically significant coefficients on the race and gender variables in the model. These results are consistent with the hypothesis that judicial performance evaluation surveys may carry with them unexamined and unconscious gender and race biases (Durham 2000).

Table 3. Pooled OLS Model* of LVRJ “Judging the Judges” Retention Scores

* Pooled ordinary least-squares regression with Driscoll & Kraay (1998) standard errors. N(obs) = 364; N(judges) = 95; F(8,93) = 1980.93***; root mean squared error = 316.73.

In the model, only two of the control variables fail to help explain variation in retention recommendation rates. The prestige of law school education is insignificant, as is the reversal rate distance from the court mean. Part of the reason may be that, as compared with measures of integrity, legal knowledge questions make up a small fraction of the questions on the LVRJ survey. Because the insignificance of these legal knowledge variables is interesting in its own right, it will be addressed separately in the discussion section below.

The rest of the variables are statistically significant in the model. The intercept for this model is about 89 points, the estimated baseline for favorable attorney retention ratings when all other variables are set to zero. Judges who are appointed at first instance get a boost in scores of 1.76 points. Substantively, this is not of a particularly high magnitude, and it may reflect the legal profession's general preference for an appointment system of judicial selection (Standing Committee on Judicial Independence 2000). In 2010, 68 percent of responding Clark County lawyers favored a move to a merit plan for judicial selection (McMurdo 2010). Indeed, those appointed at first instance are able to achieve incumbency without first having to withstand the rigors of a contested election, and this may also help to protect their image among attorneys.

The number of years of experience on the bench is significant and negative. For each additional year of judicial tenure, the judge loses more than a third of a point. Lawyer respondents tended to disfavor retention at a higher rate for judges with more years accumulated on the bench.Footnote 21 The integrity measures are also statistically significant. As a judge's disciplinary result moves one step up the scale of severity, the judge can expect to lose about two points. When the judge is connected with a public scandal, he or she can expect a loss of 5.65 points.

Of course, the independent variables of interest are both statistically significant and of very high magnitude. When all of these control variables are equal, we can expect female judges to score 11.27 points lower than their male colleagues. Minority judges score more than 14 points lower than nonminority judges. Even after controlling for the alternative measures of judicial quality, these judges' demographic variables still account for much of the variation in retention scores.
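A back-of-envelope calculation using only the coefficients quoted in the text illustrates the size of these gaps. The tenure, discipline, and minority values below are approximations of “more than a third of a point,” “about two points,” and “more than 14 points”; the insignificant law school and reversal rate terms are omitted, and Table 3 holds the exact estimates:

```python
# Coefficients as reported in the text (some rounded; see Table 3).
INTERCEPT  = 89.0    # "about 89 points"
APPOINTED  = 1.76    # appointed at first instance
TENURE     = -0.35   # per year on the bench (approximate)
DISCIPLINE = -2.0    # per step on the 0-8 disciplinary scale (approximate)
SCANDAL    = -5.65   # connection to a public scandal
FEMALE     = -11.27
MINORITY   = -14.0   # "more than 14 points" (approximate)

def predicted_retention(appointed, years, discipline, scandal, female, minority):
    """Predicted percentage of attorneys recommending retention."""
    return (INTERCEPT + APPOINTED * appointed + TENURE * years
            + DISCIPLINE * discipline + SCANDAL * scandal
            + FEMALE * female + MINORITY * minority)

# Two judges identical on every control, differing only in gender:
print(predicted_retention(1, 8, 0, 0, female=0, minority=0))  # about 88.0
print(predicted_retention(1, 8, 0, 0, female=1, minority=0))  # about 76.7
```

On these figures, a male judge would need roughly 32 additional years of tenure penalties before his predicted score fell to that of an otherwise identical female judge.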

Discussion

The results of this analysis of the LVRJ “Judging the Judges” bar poll show that the gender and race disparity in retention scores is not driven by qualitative differences in judicial performance. If real performance differences were responsible for the gender- and race-based differences in scores, much of this disparity would fall away with the inclusion in our model of the objective measures of performance quality. This, however, is not the case. While many of the measures were significant, they failed to cancel out the gender and race disparities.

Interestingly, two of the more theoretically compelling measures of judicial quality were completely insignificant in our model. First, we find that the prestige of a judge's legal education is not related to the retention scores in the “Judging the Judges” surveys. This contradicts previous research about judicial quality (Glick 1978; Slotnick 1983). One explanation is that the key skills that judges need to be good judges are not necessarily learned in law school. It may be that the attorney respondents have a more skeptical assessment of the usefulness of traditional legal education to the art of judging. Law school education has been widely criticized for failing to address the tools needed by attorneys and judges (Sternlight 1996). While law schools do a good job in teaching legal theory, they do a relatively poor job training students in key practical skills that they would need in legal practice (American Bar Association 1992). So, perhaps the reason why legal practitioners find law school pedigree to be noninfluential in their ratings is that standard legal education does not teach the skills that are most relevant to being a good judge. Indeed, “[d]espite the fact that law school graduation is the path to judgedom, American law schools devote not one minute of education to preparing law graduates to enter the world of judging” (Cordell 2008: 639). Even so, this does not explain why previous research on state and federal judges has found strong ties between the prestige of a judge's legal education and his or her subsequent quality ratings (Glick 1978; Slotnick 1983).

Reversal rates were also insignificant in our model of retention scores. Judges who make more “correct” decisions are not rewarded for it by the respondent attorneys in these surveys. In theory, reversal rates are among the most relevant and quantifiable objective measures we have. That they are not related to retention scores in the “Judging the Judges” survey shows that Clark County attorneys do not place much weight on a judge's reversal rates. This result may reflect the particularities of Nevada. It is one of only 10 states without an intermediate appellate court. Of these states,Footnote 22 Nevada is by far the most populous. Because all appeals are heard by the Nevada Supreme Court, the chances of a case being heard on appeal are relatively low. Nevada's 77 trial courts dispose of close to a half million nontraffic cases on a yearly basis (Titus et al. 2009). In 2009, only 1,759 appeals were filed and granted review, about 0.4 percent of all cases (Titus et al. 2009). By contrast, a recent Department of Justice analysis of 48 large counties showed an appeal rate of nearly 15 percent in civil cases alone (Cohen 2006). Rates of criminal appeals tend to be higher still, with as many as one in five criminal cases resulting in an appeal (Baum 2001). Because cases in Nevada have a lower probability of being appealed in the first place, Nevada's lower court judges have a lower overall probability of being reversed than do trial court judges in other jurisdictions, all other things being equal.

Those alternative measures of judicial quality that are significant in the model also present some interesting stories. The two measures of ethical integrity—the presence of a scandal and the outcome of disciplinary complaints—have a high magnitude of impact on attorney ratings. These results, taken together, reflect the importance of perceived judicial integrity to attorney ratings of judges. There exists a long-standing ethic of judicial neutrality and integrity in the United States, and there is an expectation that judges will act in an unbiased, fair, and impartial manner (Committee on Judicial Independence 2006; Brody 2008). For judges to be able to administer justice, they must be perceived as neutral parties who can render judgment without bias. The preamble of the Model Code of Judicial Conduct states that a judge “must respect and honor the judicial office as a public trust and strive to maintain and enhance confidence in the legal system” (American Bar Association 2007: 1). In the recent case of Caperton v. Massey (2009), the U.S. Supreme Court found that the appearance of improper ethics can rise to the level of a constitutional due process violation.

Additionally, reported improprieties influence attorney ratings above and beyond the actual adjudication of an ethical violation. This may reflect the fact that press information about alleged scandals is more current and plentiful. Because of the wide reach of media reports, these reported scandals are likely to be more salient to attorneys evaluating a judge's judicial ethics. Although there is a significant amount of overlap between disciplinary actions and media reports, formal complaints that result in discipline will understate the extent to which a given judge lacks integrity. Press reports are filtered only by what the newspaper deems newsworthy, and they often present allegations in more memorable and sensational terms. By the time a judge is disciplined, which generally happens about a year after the complaint was filed, the judge's improprieties are “old news,” and the reputational market among attorneys will already have accounted for the misdeed.

The main story here, however, is the performance of the independent variables of interest: race and gender. These variables are both statistically significant and of very high magnitude. When all of the alternative measures of judicial quality are equal, we can expect female judges to score 11 points lower than their male colleagues, and minority judges to score 14 points lower than nonminority judges. These differences, then, are not due to a qualitative difference in the abilities of women and minority judges, at least not one captured by the alternative measures included here.

The hypothesis that the difference in means based on race and gender in this attorney poll is due to differences in talents and competence is not a convincing explanation for the systematically lower retention scores of female and minority judges. Even when these judges are on par with their peers in terms of experience, education, and integrity, their scores are still drastically lower than those of their white male counterparts. Of course, one could still make a case that our study has not entirely eliminated the possibility that the gap in ratings is due to true differences in talents and competency. We concede that there are alternative objective measures of judge quality not included in our model (e.g., Choi et al. 2009). Unfortunately, it is exceedingly difficult to obtain additional quality data reliably in state judiciaries. This issue is an appropriate subject for further inquiry and innovation.

Nonetheless, a significant finding of this study is how little effect our alternative measures of judicial quality have on attorney assessments of judicial performance in the survey of Clark County attorneys. If it were true that the gap is a result of real qualitative performance differences, we would expect the objective measures in our model to explain much of the variance in retention scores. This is not what we find. The control variables, which represent this alternative hypothesis, do not overtake in magnitude the effects of gender and race in the model. This result illustrates the need to examine more closely the content and structure of existing JPE attorney surveys, as well as the central role of these surveys in determining committee retention recommendations in state-sponsored JPEs. While the differences between state surveys may mitigate some of the bias found in the “Judging the Judges” survey, empirical evidence must be amassed to support this conclusion.

If we reject the hypothesis that the difference in ratings is due to an inferior pool of talent among women and minority judges, the alternative explanation that bias may be influencing results must be strongly considered. Certainly, the few attorney comments publicly reported from the “Judging the Judges” poll show that stereotyping is at play in this bar poll. The Las Vegas Review-Journal itself acknowledged the possibility that respondents might be evaluating women judges more harshly when it reported one attorney commenting that “[t]he female judges [in Family Court] are too emotional and biased, and they try too hard to make everyone happy” (Whitely 2006). This attorney is likely being influenced by stereotypes about women, namely that women are too emotional and nonconfrontational to be competent judges.

Women judges at every level report that their authority in the courtroom is challenged in ways that do not mirror what happens to their male colleagues. Should they fail to take control of their courtroom, they are perceived as weak and indecisive—or even incompetent. If they assert their authority, though, they will be labeled as discourteous or unduly punitive. In either case, their evaluations by members of the bar are negatively affected. These concerns are heightened further for minority women judges, who have faced even more intense scrutiny and challenges to their competence (Burger 2007).Footnote 23

There are reasons to think that the results of the “Judging the Judges” bar poll are not entirely aberrational. As Table 1 shows, judicial survey instruments used across the country are variations of the same ABA blueprint. Accordingly, the “Judging the Judges” survey closely tracks the bar surveys of other jurisdictions. An important area of further inquiry is to determine whether attorney surveys in other jurisdictions reflect the same unexplained gender and race biases. It is possible that the magnitude of unexplained bias is greater or lesser in other jurisdictions. It will be critical to identify these differences in comparative perspective so that we can isolate the characteristics of these programs that tend to minimize or exacerbate gender and race bias.

While unconscious bias cannot be entirely eliminated, questions could be asked in a way that minimizes the problem. These alternative formulations can be identified through careful comparative research. Experimental research may also help to develop a more comprehensive set of best practice recommendations for attorney survey instruments. Research shows that behaviorally oriented questions, which solicit information based on experiences, behaviors, and skills, may yield different responses than do attitudinal questions, which tend to be less specific (Dillman 2000; Schaeffer & Presser 2003).Footnote 24 As Missouri's experience illustrates, however, question wording is not a silver bullet for eliminating the effects of unconscious bias (Burger 2007). By comparing the gender and race gaps in differently structured surveys, we can determine the degree to which various alternative constructions may mitigate the problem of unconscious bias in judicial performance evaluations.

Conclusion

The findings of possible unconscious bias in the attorney bar poll raise significant questions about whether JPEs, as they are currently instituted, are fair. Certainly, the evidence is not good for the LVRJ's privately run JPE. A poll that consistently scores women and minority judges lower, without regard to other observable measures of judicial quality, has an undesirable impact on democracy. Women and minority candidates will be more vulnerable to electoral challenges.

As there are increasing calls for reliance on JPEs as a way of ensuring quality standards, it is imperative that these processes not reproduce—even inadvertently—a system that disfavors groups like women and minorities, who have been historically underrepresented in the judiciary. Retention election outcomes have been linked to performance scores on judicial evaluations. Brody finds a significant linear relationship between JPE scores and voting outcomes, such that the higher the score a judge received on the JPE, the higher the voting percentage in favor of retention (Brody 2008). While similar research has yet to be completed on the specific electoral effects of the LVRJ “Judging the Judges” poll, it is reasonable to expect that a systematic downward bias in scores would harm the chances of women and minorities at the polls.

Unfair and biased evaluations not only harm the individuals subject to them but also have far-reaching and deleterious effects on the judiciary as an institution. Voters who have access to state-sponsored judicial performance evaluations rely on that information when casting their votes (Esterling 1998). If women and minorities are unfairly rated lower, this disparity will reproduce itself in judicial election returns, compromising the diversity of our state benches. For this reason, it is fundamentally important to the democratic process of judicial elections that JPEs refrain from providing biased, uneven, or unfair ratings of judicial candidates.

Footnotes

For helpful comments on previous versions of this article, the authors wish to thank Chris W. Bonneau, Eric Waltenburg, Udi Sommer, and Malia Reddick.

1. Information about the “Judging the Judges” survey can be found on the Las Vegas Review-Journal Web site: http://www.lvrj.com/hottopics/in_depth/judges/.

2. According to the Institute for the Advancement of the American Legal System's 2008 survey of sitting Colorado judges, close to three-quarters believed that objective data, such as case management data, should be part of Colorado's JPE program.

3. Judges themselves are concerned with the possibility that JPEs are based on unreliable survey data. According to the 2008 IAALS survey of sitting Colorado judges, a majority of judges perceived that low response rates were a “major problem” in the survey component of JPEs (Institute for the Advancement of the American Legal System 2008).

4. In 1990, according to the Missouri Bar Report, “it became apparent that the election results did not mirror the legal community's perception of the judges' performance, raising a concern that the public did not have sufficient knowledge and understanding of the judges' qualifications” (Missouri Bar Judicial Evaluation Survey Committee 2007).

5. The ratings categories increased from 5 to 16. Attorneys were asked to detail whether a judge gave reasons for rulings, engaged in ex parte communications, made correct decisions based on law and facts, had settlement skills, and demonstrated impartiality, decisiveness, courteousness, and fairness (Burger 2007).

6. In the Burger study, male judges were rated significantly higher (85.84 percent) than were female judges (76.69 percent). Caucasians were rated significantly higher (85.44 percent) than were African Americans (75.19 percent). This ratings gap was present in all three courts studied.

7. The redacted form of the survey results from 2000–2008 can be found at the LVRJ's Web site: http://www.lvrj.com/hottopics/in_depth/judges. This version reports aggregate results only. Attorney comments are also redacted, although the newspaper has selectively included some of these comments in its reporting.

8. Nevada currently has no intermediate appellate court.

9. Between the 2000 and 2002 surveys, the questionnaire was shortened from 15 questions to 13. The new, shorter version did not omit questions but instead collapsed some of them.

10. In some of the survey reports, an "adequacy" score was also provided. The questions on the survey, with the exception of the retention question, asked attorneys to rate judges on a three-point scale: not adequate, adequate, or more than adequate. The adequacy score was calculated as the percentage of "adequate" or "more than adequate" ratings a judge received. We ran our model with this adequacy score as the dependent variable, and the results were quite similar to those presented here.
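
For concreteness, a minimal sketch of this calculation might look as follows; the data and column names here are hypothetical illustrations, not the actual survey records:

    import pandas as pd

    # Hypothetical response-level data: one row per attorney rating of a judge.
    responses = pd.DataFrame({
        "judge": ["A", "A", "A", "B", "B"],
        "rating": ["not adequate", "adequate", "more than adequate",
                   "adequate", "more than adequate"],
    })

    # Adequacy score: percentage of responses rated "adequate" or better.
    adequate = responses["rating"].isin(["adequate", "more than adequate"])
    adequacy_score = adequate.groupby(responses["judge"]).mean() * 100
    print(adequacy_score)  # one score per judge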

11. Colorado's survey presents no conceptually new questions or categories; rather than asking new questions, it simply fractures existing ones (Colorado Office of Judicial Performance Evaluation 2008).

12. This yields a single reversal rate for each judge in the database, and this rate spans his or her time as a judge in Nevada. For example, a supreme court judge's reversal rate includes reversals from decisions made in his or her previous judicial posts.
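
In the same spirit, a career-spanning reversal rate can be computed by pooling each judge's appealed decisions across all of his or her judicial posts. The sketch below uses made-up data and column names, not the actual appellate records:

    import pandas as pd

    # Hypothetical case-level data: one row per appealed decision,
    # pooled across every post the judge has held in Nevada.
    appeals = pd.DataFrame({
        "judge": ["A", "A", "A", "B", "B"],
        "reversed": [1, 0, 0, 1, 1],  # 1 = decision reversed on appeal
    })

    # One career reversal rate per judge: reversals / appealed decisions.
    reversal_rate = appeals.groupby("judge")["reversed"].mean()
    print(reversal_rate)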

13. The difference of means tests for female judges (on reversal rate, F = 0.487; on appeal rate, F = 0.515) and for minority judges (on reversal rate, F = 1.003; on appeal rate, F = 0.089) yielded no statistically significant differences in either reversal rates or appeal rates.
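
A difference of means test of this kind can be run as a one-way ANOVA; with two groups, the resulting F statistic equals the square of the familiar t statistic. The following sketch uses hypothetical judge-level data, not the study's dataset:

    import pandas as pd
    from scipy import stats

    # Hypothetical judge-level data: reversal rates and a gender indicator.
    judges = pd.DataFrame({
        "female": [0, 0, 0, 0, 1, 1, 1, 1],
        "reversal_rate": [0.10, 0.12, 0.08, 0.09, 0.11, 0.09, 0.13, 0.10],
    })

    male = judges.loc[judges["female"] == 0, "reversal_rate"]
    female = judges.loc[judges["female"] == 1, "reversal_rate"]

    # One-way ANOVA across the two groups.
    f_stat, p_value = stats.f_oneway(male, female)
    print(f"F = {f_stat:.3f}, p = {p_value:.3f}")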

14. Nev. Const. Art. VI, Sec. 20(1). In most states, vacancies are filled by gubernatorial appointment as opposed to a merit-style plan.

15. Nev. Rev. Stat. Sec. 5.020. Additional statutory requirements for justices of the peace include a high-school diploma and, in some jurisdictions, membership in the state bar association (Nev. Rev. Stat. Sec. 4.010). Additional statutory requirements for District Court judges include a minimum of 10 years as a member of the bar in an American state or the District of Columbia, 2 years of which must have been in Nevada (Nev. Rev. Stat. Sec. 3.060). Additional statutory requirements for the justices of the Nevada Supreme Court include a minimum of 15 years as a member of the bar in an American state or the District of Columbia, 2 years of which must have been in Nevada (Nev. Rev. Stat. Sec. 2.020).

16. Glick (1978) also suggests that the in-state/out-of-state law school distinction is important. In Nevada, this distinction is less helpful. Nevada's first law school, the William S. Boyd School of Law at the University of Nevada, Las Vegas, did not graduate its first class until 2001.

17. These rankings can be found online at http://grad-schools.usnews.rankingsandreviews.com.

18. The scale for this ordinal variable is as follows: 0 = no complaint; 1 = complaint was filed but dismissed; 2 = required course; 3 = required course and public apology; 4 = public reprimand; 5 = public reprimand and fine; 6 = censure, required course, and fine; 7 = removal from bench; 8 = removal from bench and permanently barred from holding public office in Nevada.

19. The Nevada bar poll, like other attorney polls, asks respondents to evaluate only those judges before whom they have appeared during the evaluation period. However, many judges in Nevada and elsewhere report that the number of respondents who rate them in attorney polls exceeds the number of attorneys who actually appeared in their courtrooms.

20. The models in this paper were also estimated without the number of respondents as a weight, and again using the original Beck and Katz (1995) panel corrected standard errors method. The results of all of these analyses were similar to those reported here.
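
These robustness checks can be sketched roughly as follows. The variable names and synthetic data are hypothetical, and because the exact Beck and Katz panel corrected standard errors estimator is not built into Python's statsmodels library, the unweighted variant below substitutes standard errors clustered by judge, which is similar in spirit but not identical:

    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    # Synthetic judge-year panel, for illustration only.
    rng = np.random.default_rng(0)
    n_judges, n_years = 10, 6
    df = pd.DataFrame({
        "judge_id": np.repeat(np.arange(n_judges), n_years),
        "female": np.repeat(rng.integers(0, 2, n_judges), n_years),
        "minority": np.repeat(rng.integers(0, 2, n_judges), n_years),
        "reversal_rate": rng.uniform(0.0, 0.2, n_judges * n_years),
        "experience": rng.integers(1, 25, n_judges * n_years),
        "n_respondents": rng.integers(20, 120, n_judges * n_years),
    })
    df["retention_score"] = rng.uniform(50, 95, len(df))

    # Pooled model weighted by the number of attorney respondents.
    weighted = smf.wls(
        "retention_score ~ female + minority + reversal_rate + experience",
        data=df,
        weights=df["n_respondents"],
    ).fit()

    # Unweighted model with standard errors clustered by judge.
    clustered = smf.ols(
        "retention_score ~ female + minority + reversal_rate + experience",
        data=df,
    ).fit(cov_type="cluster", cov_kwds={"groups": df["judge_id"]})

    print(weighted.params)
    print(clustered.bse)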

21. Because the model showed no significant relationship between experience and retention scores, a nonlinear version of this relationship was also estimated, operating on the premise that judicial experience may have diminishing returns over time. In other words, we tested the hypothesis that some experience would increase scores, but judges who had been on the bench for decades would be penalized for being “too old” or “out of touch.” We found no evidence to support this hypothesis.
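
Continuing the hypothetical panel from the preceding sketch (again an illustration, not the authors' estimation code), the diminishing-returns specification simply adds a squared experience term to the formula:

    # Quadratic specification: does experience have diminishing returns?
    # Reuses the synthetic `df` and the `smf` import from the sketch above.
    quad = smf.ols(
        "retention_score ~ female + minority + reversal_rate"
        " + experience + I(experience ** 2)",
        data=df,
    ).fit()
    print(quad.summary())  # inspect the linear and squared experience terms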

22. These states include Delaware, Maine, Montana, Nevada, New Hampshire, Rhode Island, South Dakota, Vermont, West Virginia, and Wyoming.

23. We have not tested this interaction term here because, unfortunately, there are no minority women judges in Nevada.

24. For example, on the topic of timeliness, a behavioral question might read, “In your appearances before Judge X in the past year, how many times has he or she been late to court?” An attitudinal question may present the following statement, “Judge X is routinely late to court,” followed by a scaled answer set.

References

American Bar Association (1985) "Guidelines for the Evaluation of Judicial Performance." Washington, DC: American Bar Association.
American Bar Association (1992) Legal Education and Professional Development: An Educational Continuum. Washington, DC: American Bar Association.
American Bar Association (2005) "Guidelines for the Evaluation of Judicial Performance with Commentary." Chicago, IL: American Bar Association.
American Bar Association (2007) ABA Model Code of Judicial Conduct. Washington, DC: American Bar Association.
American Psychological Association (1991) "In the Supreme Court of the United States: Price Waterhouse v. Ann B. Hopkins: Amicus Curiae Brief for the American Psychological Association," 46 American Psychologist 1061–70.
American Tort Reform Foundation (2008) "Judicial Hellholes 2008/2009." Washington, DC: American Tort Reform Foundation.
Andersen, Seth S. (2000) "Judicial Retention Evaluation Programs," 34 Loyola L.A. Law Rev. 1375–89.
Baum, Lawrence (2001) American Courts: Process and Policy, 5th ed. Boston: Houghton Mifflin.
Beck, Nathaniel, & Katz, Jonathan (1995) "What To Do (and Not To Do) With Time-Series Cross-Section Data," 89 American Political Science Rev. 634–47.
Bertrand, Marianne, & Mullainathan, Sendhil (2004) "Are Emily and Greg More Employable than Lakisha and Jamal? A Field Experiment on Labor Market Discrimination," 94 American Economic Rev. 991–1013.
Bielby, William T. (2000) "Minimizing Workplace Gender and Racial Bias," 29 Contemporary Sociology 120–9.
Bonneau, Chris W., & Hall, Melinda Gann (2009) In Defense of Judicial Elections. New York: Routledge.
Brenner, Saul, & Hagle, Timothy M. (1996) "Opinion Writing and Acclimation Effects," 18 Political Behavior 235–61.
Brody, David C. (2000) "Judicial Performance Evaluations by State Governments: Informing the Public While Avoiding the Pitfalls," 21 Justice System J. 333–56.
Brody, David C. (2008) "The Use of Judicial Performance Evaluation to Enhance Judicial Accountability, Judicial Independence, and Public Trust," 86 Denver Univ. Law Rev. 1–42.
Burger, Gary K. (2007) "Attorney's Ratings of Judges: 1998–2006." Mound City, MO: Report to the Mound City Bar.
Choi, Stephen J., et al. (2007) "Professionals or Politicians: The Uncertain Empirical Case for an Elected Rather than Appointed Judiciary," Olin Working Paper, University of Chicago Law and Economics, Chicago, IL.
Choi, Stephen J., et al. (2009) "Are Judges Overpaid? A Skeptical Response to the Judicial Salary Debate," 1 J. of Legal Analysis 47–117.
Cohen, Thomas (2006) "Appeals from General Civil Trials in 46 Large Counties, 2001–2005." Washington, DC: Bureau of Justice Statistics, U.S. Department of Justice.
Colorado Office of Judicial Performance Evaluation (2008) "Judicial Performance Reviews." Available online at http://www.coloradojudicialperformance.gov/review.cfm. Accessed March 17, 2010.
Commission on Women in the Profession (2009) "A Current Glance at Women in the Law 2009." Chicago, IL: American Bar Association.
Committee on Judicial Independence (2006) "Resource Kit on Fair and Impartial Courts." Chicago, IL: American Bar Association.
Cordell, La Doris H. (2008) "The Joy of Judging," 43 Harvard Civil Rights-Civil Liberties Law Rev. 639–43.
Correll, Joshua, et al. (2007) "Across the Thin Blue Line: Police Officers and Racial Bias in the Decision to Shoot," 92 J. of Personality and Social Psychology 1006–23.
Dillman, Don (2000) "The Role of Behavioral Survey Methodologists in National Statistical Agencies," 68 International Statistical Rev. 200–13.
Dovidio, John F., & Gaertner, Samuel L. (2000) "Aversive Racism in Selection Decisions: 1989 and 1999," 11 Psychological Science 315–19.
Downey, Nancy (2008) "Las Vegas Review Journal Judicial Performance Evaluation 2008." Las Vegas, NV: Downey Research Associates.
Driscoll, John C., & Kraay, Aart C. (1998) "Consistent Covariance Matrix Estimation With Spatially Dependent Panel Data," 80 Rev. of Economics and Statistics 549–60.
Durham, Christine M. (2000) "Gender and Professional Identity: Unexplored Issues in Judicial Performance Evaluation," 39 Judges' J. 13–16.
Epstein, Lee, et al. (2003) "The Norm of Prior Judicial Experience and Its Consequences for Career Diversity on the U.S. Supreme Court," 91 California Law Rev. 903–65.
Esterling, Kevin M. (1998) "Judicial Accountability the Right Way," 82 Judicature 206–15.
Esterling, Kevin M., & Sampson, Kathleen M. (1998) Judicial Retention Evaluation Programs in Four States: A Report with Recommendations. Chicago, IL: American Judicature Society.
Feeney, Floyd (1987) "Evaluating Trial Court Performance," 12 Justice System J. 148–70.
Flango, Victor Eugene, & Ducat, Craig R. (1979) "What Difference Does Method of Judicial Selection Make? Selection Procedures in State Courts of Last Resort," 5 Justice System J. 25–45.
Glick, Henry R. (1978) "The Promise and the Performance of the Missouri Plan: Judicial Selection in the Fifty States," 32 Univ. of Miami Law Rev. 509–43.
Glick, Henry R., & Emmert, Craig F. (1986) "Selection Systems and Judicial Characteristics: The Recruitment of State Supreme Court Judges," 70 Judicature 228–35.
Griffin, Jacqueline R. (1994) "Judging the Judges," 21 Litigation 5.
Haire, Susan B. (2001) "Rating the Ratings of the American Bar Association Standing Committee on Federal Judiciary," 22 Justice System J. 1–18.
Hedlund, Ronald D., et al. (1979) "The Electability of Women Candidates: The Effects of Sex Role Stereotypes," 41 The J. of Politics 513–24.
Heilman, Madeline E. (1983) "Sex Bias in Work Settings: The Lack of Fit Model," 5 Research in Organizational Behavior 269–98.
Heilman, Madeline E., et al. (2004) "Penalties for Success: Reactions to Women Who Succeed at Male Gender-Typed Tasks," 89 J. of Applied Psychology 416–27.
Hettinger, Virginia A., et al. (2003) "Acclimation Effects and Separate Opinion Writing on the United States Courts of Appeals," 84 Social Science Q. 792–810.
Hoechle, Daniel (2007) "Robust Standard Errors for Panel Regressions with Cross-Sectional Dependence," 7 Stata J. 281–312.
Hopkins, A. D. (2008) "Judging the Judges: Nearly 800 Lawyers Turn in Judicial Performance Evaluation Grades," Las Vegas Review-Journal May 18.
Institute for the Advancement of the American Legal System (2008) "The Bench Speaks on Judicial Performance Evaluation: A Survey of Colorado Judges." Denver, CO: Institute for the Advancement of the American Legal System.
Institute for the Advancement of the American Legal System (2010) "Current Projects: Judicial Performance Evaluation." Available online at http://www.du.edu/legalinstitute. Accessed August 11, 2010.
Jackson, Jeffrey (2007) "Beyond Quality: First Principles in Judicial Selection and Their Application to a Commission-Based Selection System," 34 Fordham Urban Law J. 125–61.
Kagan, Elena (2006) "Women and the Legal Profession–A Status Report," 61 The Record 37–48.
Kearney, Richard C. (1999) "Judicial Performance Evaluation in the States," 22 Public Administration Q. 468–89.
Kearney, Richard C., & Sellers, Holly (1996) "Sex on the Docket: Reports of State Task Forces on Gender Bias," 56 Public Administration Rev. 587–92.
Lane, Kristin A., et al. (2007) "Implicit Social Cognition and the Law," 19 Annual Rev. of Law and Social Science 125.
Laube, Heather, et al. (2007) "The Impact of Gender on the Evaluation of Teaching: What We Know and What We Can Do," 19 National Women's Studies Association J. 87–104.
Lauck, Scott (2006) "Missouri Bar Poll Has Little Pull with Voters," Missouri Lawyers Weekly November 13.
Martell, Richard F. (1991) "Sex Bias at Work: The Effects of Attentional and Memory Demands on Performance Ratings of Men and Women," 21 J. of Applied Social Psychology 1939–60.
McMurdo, Doug (2010) "Judging the Judges: Many Lawyers Seek Appointment System," Las Vegas Review-Journal May 14.
Missouri Bar Judicial Evaluation Survey Committee (2007) "Report of the Judicial Evaluation Survey Committee." Jefferson City, MO: The Missouri Bar.
Nieva, Veronica F., & Gutek, Barbara A. (1980) "Sex Effects on Evaluation," 5 Academy of Management Rev. 267–76.
O'Connor, Sandra Day (2009) "Judicial Independence and Civic Education," 22 Utah Bar J. 10–19.
Pelander, A. John (1998) "Judicial Performance Review in Arizona: Goals, Practical Effects and Concerns," 30 Arizona State Law J. 643.
Posner, Richard A. (2000) "Is the Ninth Circuit Too Large? A Statistical Study of Judicial Quality," 29 The J. of Legal Studies 711–19.
Posner, Richard A. (2005) "Judicial Behavior and Performance: An Economic Approach," 32 Florida State Univ. Law Rev. 1259–79.
Rachlinski, Jeffrey J., et al. (2008) "Does Unconscious Racial Bias Affect Trial Judges?," 84 Notre Dame Law Rev. 1195–246.
Rottman, David B. (2004) "Trends and Issues in the State Courts: Challenges and Achievements," in The Book of the States: 2004. Lexington, KY: Council of State Governments.
Sackett, Paul R., et al. (1991) "Tokenism in Performance Evaluation: The Effects of Work Group Representation on Male-Female and White-Black Differences in Performance Ratings," 76 J. of Applied Psychology 263–7.
Schaeffer, Nora Cate, & Presser, Stanley (2003) "The Science of Asking Questions," 29 Annual Rev. of Sociology 65–88.
Schafran, Lynn Hecht, & Wikler, Norma J. (2001) "Gender Fairness in the Courts: Action in the New Millennium." Washington, DC: National Judicial Education Program, NOW Legal Defense and Education Fund.
Slotnick, Elliot E. (1983) "The ABA Standing Committee on Federal Judiciary: A Contemporary Assessment, Part 1," 66 Judicature 349–62.
Sprague, Joey, & Massoni, Kelley (2005) "Student Evaluations and Gendered Expectations: What We Can't Count Can Hurt Us," 53 Sex Roles 779–93.
Standing Committee on Judicial Independence (2000) "Standards on State Judicial Selection: Report of the Commission on State Judicial Selection Standards." Chicago, IL: American Bar Association.
Steinpreis, Rhea E., Anders, Katie A., & Ritzke, Dawn (1999) "The Impact of Gender on the Review of the Curricula Vitae of Job Applicants," 41 Sex Roles 509–28.
Sternlight, Jean (1996) "Symbiotic Legal Theory and Legal Practice: Advocating a Common Sense Jurisprudence of Law and Practical Applications," 50 Univ. of Miami Law Rev. 707–78.
Thomas, Rebecca M., et al. (2009) "Nevada Judicial Evaluation Pilot Project: Final Report." Reno, NV: Grant Sawyer Center for Justice Studies.
Titus, Ronald R., et al. (2009) "Annual Report of the Nevada Judiciary, Fiscal Year 2009." Carson City, NV: Administrative Office of the Courts.
Valian, Virginia (1998) Why So Slow? The Advancement of Women. Cambridge, MA: MIT Press.
Walker, Thomas G., & Barrow, Deborah J. (1985) "The Diversification of the Federal Bench: Policy and Process Ramifications," 47 The J. of Politics 596–617.
Whitely, Joan (2006) "Judging the Judges: About the Survey," Las Vegas Review-Journal April 30.

Cases Cited

Caperton v. A. T. Massey Coal Co., Inc. (2009) 556 U.S. ___.
Price Waterhouse v. Hopkins (1989) 490 U.S. 228.
Table 1. Comparing Questions between the ABA Guidelines, the LVRJ Survey, and Other State Surveys

Table 2. Descriptive Statistics and Difference of Means Tests by Gender and Race

Figure 1. "Yes" votes for retention in LVRJ Bar Poll Survey by gender, 1998–2008.

Table 3. Pooled OLS Model of LVRJ "Judging the Judges" Retention Scores