Published online by Cambridge University Press: 20 November 2018
Tests of statistical significance have increasingly been used in employment discrimination cases since the Supreme Court's decision in Hazelwood. In that case, the United States Supreme Court ruled that “in a proper case” statistical evidence can suffice for a prima facie showing of employment discrimination. The Court also discussed the use of a binomial significance test to assess whether the difference between the proportion of black teachers employed by the Hazelwood School District and the proportion of black teachers in the relevant labor market was substantial enough to indicate discrimination. The Equal Employment Opportunity Commission has proposed a somewhat stricter standard for evaluating how substantial a difference must be to constitute evidence of discrimination. Under the so-called 80% rule promulgated by the EEOC, the difference must not only be statistically significant; the hire rate for the group allegedly discriminated against must also be less than 80% of the rate for the favored group. This article argues that a binomial significance test standing alone is unsatisfactory for evaluating allegations of discrimination, because many of the assumptions on which such tests rest do not hold in employment settings, and that the 80% rule is a more appropriate standard for deciding whether a difference in hire rates should be treated as a prima facie showing of discrimination.
1 401 U.S. 424 (1971).
2 42 U.S.C. §§ 2000e to 2000e-17 (1976 & Supp. IV 1980).
3 For the business necessity defense, see generally Note, Business Necessity Under Title VII of the Civil Rights Act of 1964: A No-Alternative Approach, 84 Yale L.J. 98 (1974); Comment, The Business Necessity Defense to Disparate-Impact Liability Under Title VII, 46 U. Chi. L. Rev. 911 (1979).
4 Barbara Lerner, Washington v. Davis: Quantity, Quality and Equality in Employment Testing, 1976 Sup. Ct. Rev. 263.
5 See generally Barbara Lindeman Schlei & Paul Grossman, Employment Discrimination Law (Washington, D.C.: Bureau of National Affairs, 1976).
6 International Bhd. of Teamsters v. United States, 431 U.S. 324, 339–40 n.20 (1977).
7 433 U.S. 299 (1977).
8 Id. at 309 n.14, 311 n.17.
9 See generally David C. Baldus & James W. L. Cole, Statistical Proof of Discrimination (Colorado Springs, Colo.: Shepard's Inc., 1980).
10 John Henry Wigmore, The Science of Judicial Proof (3d ed. 1937); Albert S. Osborn, Questioned Documents (2d ed. Albany, N.Y.: Boyd Printing Co., 1929); Charles T. McCormick, Handbook of the Law of Evidence (1st ed. St. Paul, Minn.: West Publishing Co., 1954); Robinson v. Mandell, 20 Fed. Cas. 1027 (1868); People v. Risley, 214 N.Y. 75, 108 N.E. 200 (1915); People v. Collins, 68 Cal. 2d 319, 325–32, 66 Cal. Rptr. 497, 500–505 (1968); Finkelstein, Michael O. & Fairley, William B., A Bayesian Approach to Identification Evidence, 83 Harv. L. Rev. 489 (1970); Finkelstein, Michael O. & Fairley, William B., A Comment on “Trial by Mathematics,” 84 Harv. L. Rev. 1801 (1971). In the Howland Will case the distinguished mathematician Benjamin Peirce gave an elaborate (for the time) statistical analysis. See Meier, Paul & Zabell, Sandy, Benjamin Peirce and the Howland Will, 75 J. Am. Statistical A. 497 (1980).
11 See generally Tribe, Laurence H., Trial by Mathematics: Precision and Ritual in the Legal Process, 84 Harv. L. Rev. 1329 (1971).
12 Ellman, Ira Mark & Kaye, David, Probabilities and Proof: Can HLA and Blood Group Testing Prove Paternity? 54 N.Y.U. L. Rev. 1131 (1979).
13 Sprowls, R. Clay, The Admissibility of Sample Data into a Court of Law: A Case History, 4 U.C.L.A. L. Rev. 222 (1957); Zeisel, Hans, The Uniqueness of Survey Evidence, 45 Cornell L.Q. 322 (1960).
14 Supra note 5.
15 International Bhd. of Teamsters v. United States, 431 U.S. 324, 342 n.23 (1977), quoting United States v. T.I.M.E.-D.C., 517 F.2d 299, 315.
16 433 U.S. 299, 307–8 (1977).
17 Id.
18 430 U.S. 482 (1977).
19 Id. at 497 n.17.
20 Zeisel, Hans, Dr. Spock and the Case of the Vanishing Women Jurors, 37 U. Chi. L. Rev. 1 (1969).
21 433 U.S. at 311 n.17.
22 Id. at 307–8.
23 29 C.F.R. § 1607.4(d) (1983). The rule is quoted at the beginning of sec. 6 infra.
24 102 S. Ct. 2525, 2529 n.4 (1982).
25 Shoben, Elaine W., Comment: Differential Pass-Fail Rates in Employment Testing: Statistical Proof Under Title VII, 91 Harv. L. Rev. 793 (1978); Cohn, Richard M., On the Use of Statistics in Employment Discrimination Cases, 55 Ind. L.J. 493 (1980); Baldus & Cole, supra note 9; Booth, Dean & MacKay, James L., Legal Constraints on Employment Testing and Evolving Trends in the Law, 29 Emory L.J. 121 (1980); Van Bowen, Jacob Jr., & Riggins, C. Allen, A Technical Look at the Eighty Per Cent Rule as Applied to Employee Selection Procedures, 12 U. Rich. L. Rev. 647 (1978).
26 Cohn, supra note 25; Shoben, Elaine W., In Defense of Disparate Impact Analysis Under Title VII: A Reply to Dr. Cohn, 55 Ind. L.J. 515 (1980); Cohn, Richard M., Statistical Laws and the Use of Statistics in Law: A Rejoinder to Professor Shoben, 55 Ind. L.J. 537 (1980).
27 E.g., Baldus & Cole, supra note 9; Shoben, Elaine W., Probing the Discriminatory Effects of Employee Selection Procedures with Disparate Impact Analysis Under Title VII, 56 Tex. L. Rev. 1 (1977); Shoben, supra note 25; Cohn, supra note 25; Smith, Arthur B. Jr., & Abram, Thomas G., Quantitative Analysis and Proof of Employment Discrimination, 1981 U. Ill. L. Rev. 33; Spradlin, B. C. & Drane, J. W., Additional Comments on the Application of Statistical Analysis to Differential Pass-Fail Rates in Employment Testing, 17 Duq. L. Rev. 777 (1978–79); Braun, Louis J., Statistics and the Law: Hypothesis Testing and Its Application to Title VII Cases, 32 Hastings L.J. 59 (1980).
28 433 U.S. 259 (1977).
29 Id. at 311.
30 The notion of statistical independence has some subtlety and is not equivalent to the colloquial use of the term; see sec. 3.
31 The general formula in the one-sample case for the standard error is $\sqrt{p(1-p)/n}$, where $p$ is the population proportion or standard and $n$ is the sample size or number of hires.
32 433 U.S. 299, 311 (1977). The proportion of blacks among qualified teachers in St. Louis County excluding the city of St. Louis was specified as 5.7%, and thus was the standard adopted by the court. The corresponding proportion in St. Louis County including the city was 15.4%, and the government, representing the plaintiffs in a class action, asserted that this was the proper standard.
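Note 31's standard error, applied with the 5.7% standard of note 32, can be checked in a few lines. This is a minimal sketch of the normal approximation, not the Court's own computation; it assumes the Hazelwood counts of 15 black teachers among 405 hires, which come from the opinion and are not restated in these notes. The one-sided tail it prints is close to the 4.2% figure that note 121 attributes to the normal approximation.

```python
import math

def one_sample_z(observed, n, p):
    """Normal approximation for an observed count of group hires.

    Returns (standard error of the proportion, z score, one-sided
    lower-tail probability), using SE = sqrt(p * (1 - p) / n).
    """
    se = math.sqrt(p * (1 - p) / n)
    z = (observed / n - p) / se                   # negative when the group is under-represented
    p_lower = 0.5 * math.erfc(-z / math.sqrt(2))  # standard normal CDF evaluated at z
    return se, z, p_lower

# Assumed Hazelwood counts: 15 black teachers among 405 hires; 5.7% standard (note 32).
se, z, p_lower = one_sample_z(15, 405, 0.057)
print(f"SE = {se:.4f}, z = {z:.2f}, one-sided P = {p_lower:.3f}")
```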
33 Id. at 311–12.
34 Alternative approximations, their relationship, and the exact calculation for the Hazelwood data are described in app. 2.
35 The fluctuations are approximately normal with standard error $\sqrt{p_C(1-p_C)(1/n_B + 1/n_W)}$, where $p_C$ is the assumed common pass rate, $n_B$ is the number of blacks (48) in the sample, and $n_W$ is the number of whites (259). Since $p_C$ is unknown, we estimate it by $\hat{p}_C$, the observed total pass rate; in Teal, $\hat{p}_C = (26 + 206)/(48 + 259) = 232/307 = .7557$, and consequently the standard error is estimated to be .068. See app. 2.
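The arithmetic of note 35 can be reproduced as a sketch of the pooled two-sample normal approximation. The function name is our own, and the z value it prints is derived here rather than quoted in the note.

```python
import math

def pooled_two_sample_z(x_b, n_b, x_w, n_w):
    """Pooled normal approximation for the difference of two pass rates.

    Returns (estimated common rate p_C, standard error, z), where z is
    (black rate - white rate) / SE, so a black shortfall gives z < 0.
    """
    p_c = (x_b + x_w) / (n_b + n_w)                        # observed total pass rate
    se = math.sqrt(p_c * (1 - p_c) * (1 / n_b + 1 / n_w))
    z = (x_b / n_b - x_w / n_w) / se
    return p_c, se, z

# Teal: 26 of 48 black and 206 of 259 white candidates passed.
p_c, se, z = pooled_two_sample_z(26, 48, 206, 259)
print(f"p_C = {p_c:.4f}, SE = {se:.3f}, z = {z:.2f}")
```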
36 The adverse impact of the test itself was acknowledged by the state, and the decision makes no comment on statistical significance.
37 The Fisher exact test is both the appropriate conditional procedure and the UMPU, or uniformly most powerful unbiased, test. Ronald A. Fisher, Statistical Methods for Research Workers 96–97 (14th ed. Edinburgh: Oliver & Boyd, 1973); Tocher, K. D., Extension of the Neyman-Pearson Theory of Tests of Discontinuous Variates, 37 Biometrika 130 (1950); Erich L. Lehmann, Testing Statistical Hypotheses (New York: John Wiley & Sons, 1959); 2 Maurice Kendall & Alan Stuart, The Advanced Theory of Statistics 570–76 (3d ed. New York: Hafner Press, 1973); J. Pratt & J. D. Gibbons, Concepts of Nonparametric Theory 238–41 (New York: Springer-Verlag New York, 1981).
Strictly speaking, in both the one- and two-sample cases the UMPU tests have randomized versions that may become appropriate if testing at a predetermined level of significance is required. This detail is of interest in the theoretical comparison of tests but has no relevance for our present purpose. In any event, the attained level of significance is the same for both the randomized and nonrandomized versions of the tests. For further discussion, see app. 2.
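The one-sided Fisher exact test of note 37 reduces to a hypergeometric tail sum once both margins of the 2 × 2 table are fixed. The sketch below computes that nonrandomized tail directly with integer binomial coefficients; the helper name is hypothetical, and it is applied, for illustration only, to the Teal counts of note 35.

```python
from math import comb

def fisher_lower_tail(x_b, n_b, x_w, n_w):
    """One-sided Fisher exact test (nonrandomized).

    With both margins fixed, the number of black passes is hypergeometric;
    return the probability of x_b or fewer black passes.
    """
    n = n_b + n_w                   # all candidates
    k = x_b + x_w                   # all passes
    lo = max(0, k - n_w)            # fewest black passes the margins allow
    tail = sum(comb(k, i) * comb(n - k, n_b - i) for i in range(lo, x_b + 1))
    return tail / comb(n, n_b)      # exact integer ratio, converted once to float

# Teal counts from note 35: 26 of 48 blacks and 206 of 259 whites passed.
p = fisher_lower_tail(26, 48, 206, 259)
print(f"one-sided Fisher exact P = {p:.2g}")
```

Summing over the whole support returns exactly 1, which is a convenient check that the tail sum is set up correctly.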
38 See app. 2.
39 Spradlin & Drane, supra note 27.
40 Another example will further illustrate the difference between the two conditional probabilities. In the celebrated Collins case (68 Cal. 2d 319, 438 P.2d 33, 66 Cal. Rptr. 497 (1968)), a woman had been robbed by a blond white woman wearing a ponytail, who then fled the scene of the crime in a yellow automobile driven by a black man with a beard and a moustache. At trial the prosecutor had a mathematical expert witness testify that the chance that a randomly chosen couple would answer to this description was 1 in 12,000,000. (On appeal, the calculation on which this number was based was rightly and successfully challenged, but that is not relevant to the point presently at issue.) Symbolically, what was being asserted was $P(\text{specified traits} \mid \text{couple selected at random}) = 1/12{,}000{,}000$.
This is not the same as $P(\text{Collins couple innocent} \mid \text{specified traits})$, although that is what the prosecutor thought. (He argued that if there were only a 1 in 12 million chance that a random couple would possess these traits, then there was only a 1 in 12 million chance that another couple existed, other than the Collinses, answering to the same description, and hence only a 1 in 12 million chance that the Collinses were innocent.)
To see why the two need not coincide, consider the simplified but essentially identical case of a smudged, partial fingerprint with characteristics that occur in 1 out of 10,000 people. Suppose a person with a matching fingerprint is found. Would the probability that he was innocent be 1 in 10,000? Obviously not; it would depend on the size of the pool of potential suspects and on what other evidence was available. In a city of one million, one would expect about 100 people to match such a partial print and, lacking any further evidence, the probability of guilt for any one of them would be only about 1 in 100 (and therefore the probability of innocence about 99 in 100). If, on the other hand, there were other, substantial evidence leading the authorities to suspect a particular individual (so that his probability of guilt prior to considering the evidence of the partial print was quite high, say 50%), then the additional probative force of the matching partial print would be very great indeed.
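The fingerprint argument is Bayes' rule in odds form. This sketch assumes, as the note's reasoning implies, that the true culprit matches with certainty and that an innocent person matches with probability 1/10,000; the function name is illustrative.

```python
def posterior_guilt(prior, match_freq):
    """Posterior probability of guilt after a matching partial print.

    Assumes P(match | guilty) = 1 and P(match | innocent) = match_freq.
    """
    prior_odds = prior / (1 - prior)
    likelihood_ratio = 1 / match_freq        # P(match | guilty) / P(match | innocent)
    post_odds = prior_odds * likelihood_ratio
    return post_odds / (1 + post_odds)

# No other evidence: each of 1,000,000 residents is equally likely to be the culprit.
p_pool = posterior_guilt(prior=1 / 1_000_000, match_freq=1 / 10_000)
# Substantial other evidence: prior probability of guilt 50%.
p_strong = posterior_guilt(prior=0.5, match_freq=1 / 10_000)
print(f"{p_pool:.4f} {p_strong:.4f}")
```

With no other evidence the posterior is about 1 in 101, the note's "about 1 in 100"; with a 50% prior it exceeds .999.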
41 See text supra, sec. 1.
42 E.g., Chicano Police Officer's Ass'n v. Stover, 526 F.2d 431, 439 (10th Cir. 1975), vacated, 426 U.S. 944 (1976). For a detailed discussion of this point, see Williams v. City & County of San Francisco, 483 F. Supp. 335, 341–42 (N.D. Cal. 1979).
43 See supra note 9.
44 Fisher was the leading figure in the development of statistical methodology during the first half of the twentieth century.
45 Fisher, supra note 37, 1st ed. 1925, 14th ed. 1973.
46 The later elaboration of significance testing by Neyman, J. & Pearson, E. (On the Problem of the Most Efficient Tests of Statistical Hypotheses, 231 Phil. Trans. Roy. Soc. A 289 (1933)) offset the centrality of the significance level with a balancing concern for the probability of failing to detect a disparity when one exists. (The significance level is concerned, as we have noted, with the probability of detecting a disparity where none exists.) Abraham Wald's formulation of statistical decision theory (Statistical Decision Functions (New York: John Wiley & Sons, 1950)) and his view of statistical inference as a form of decision making continued this deemphasis of the significance level. At present there is even debate about whether significance testing is appropriate for inference at all, some insisting on a so-called Bayesian formulation in which a prior distribution of belief plays a prominent role, with the effect of replacing the subjectivity of a 5% significance level by the subjectivity of the prior distribution. Thus, there is no professional consensus about the proper use of significance levels, or about which level of significance is critical enough to claim the law's particular attention. See generally William H. Kruskal, Significance, Tests of, in William H. Kruskal & Judith A. Tanur, eds., International Encyclopedia of Statistics 944 (New York: Free Press, 1978).
47 433 U.S. 299, 318 n.5 (1977) (Stevens, J., dissenting). See app. 2 infra at pp. 176–78.
48 While there are examples where the one-sided versus two-sided issue has surfaced in court, we do not regard the matter as one of great import. For further discussion, and some useful cautionary notes on the use of one-sided tests, see Freedman, infra note 68, at 494–96.
49 433 U.S. 299, 311 n.17 (1977) (emphasis added).
50 In statistical parlance the word “independence” is used to denote the property permitting the application of the product rule to the outcomes WWBBW. In situations like the ones we discuss, the property of independence derives from the assumption of random sampling.
51 The binomial distribution function specifies the probability of observing, say, $x$ blacks in a sample of size $n$ when sampling at random from a pool in which a proportion $p$ is black. There being $\binom{n}{x}$ distinct sequences of choices resulting in the selection of $x$ blacks (and $n - x$ whites), each of which has probability $p^x q^{n-x}$ with $q = 1 - p$, the sum of these probabilities is $b(x) = \binom{n}{x} p^x q^{n-x}$ (see, e.g., Frederick Mosteller, Robert E. K. Rourke, & George B. Thomas, Jr., Probability with Statistical Applications (2d ed. Reading, Mass.: Addison-Wesley Publishing Co., 1970)).
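Notes 50 and 51 connect the product rule for a single ordered sequence with the binomial formula, which sums over all orderings. A sketch, using a hypothetical pool that is 30% black; the function names are our own:

```python
from math import comb

def sequence_prob(seq, p):
    """Probability of one particular ordered sequence of B/W selections
    under random sampling (product rule); p is the black share of the pool."""
    prob = 1.0
    for choice in seq:
        prob *= p if choice == "B" else 1 - p
    return prob

def binom_pmf(x, n, p):
    """b(x) = C(n, x) p^x q^(n-x): probability of exactly x blacks in n selections."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

# Hypothetical pool that is 30% black; five selections, two of them black.
one_order = sequence_prob("WWBBW", 0.3)   # a single ordering
all_orders = binom_pmf(2, 5, 0.3)         # all C(5, 2) = 10 orderings
print(one_order, all_orders)
```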
52 Shoben, supra note 27, at 801.
53 Smith & Abram, supra note 27, at 42.
54 Following the calculation done for Teal in sec. 2, supra note 35, the difference of observed cluster rates is $10/25 - 15/25 = .40 - .60 = -.20 = -20\%$. Since $\hat{p}_C = 25/50 = .5$, the estimated standard error is $\sqrt{.5 \times .5 \times (1/25 + 1/25)} = .141$, and therefore the difference in cluster rates is 1.41 standard errors. The difference of individual rates is $40/100 - 60/100 = -.20 = -20\%$, with estimated standard error $\sqrt{.5 \times .5 \times (1/100 + 1/100)} = .071$, so the difference in individual rates is 2.83 standard errors.
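Note 54's contrast between cluster and individual counts can be checked with the same pooled formula used for Teal. The sketch assumes, as the note's figures imply, four hires per cluster, so that 25 clusters correspond to 100 individuals in each group; identical rates give very different z values depending on which unit is treated as independent.

```python
import math

def z_diff(x1, n1, x2, n2):
    """Pooled two-sample z for the difference of proportions x1/n1 - x2/n2."""
    pc = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pc * (1 - pc) * (1 / n1 + 1 / n2))
    return (x1 / n1 - x2 / n2) / se

# Same observed rates (40% vs 60%), counted two ways:
z_cluster = z_diff(10, 25, 15, 25)        # 25 clusters per group
z_individual = z_diff(40, 100, 60, 100)   # 100 individuals per group
print(f"{z_cluster:.2f} {z_individual:.2f}")
```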
55 See Bickel, P. J., Hammel, E. A., & O'Connell, J. W., Sex Bias in Graduate Admissions: Data from Berkeley, 187 Science 398 (1975).
56 See Wagner, Clifford H., Simpson's Paradox in Real Life, 36 Am. Statistician 46 (1982).
57 Id. at 87.
58 As a simple example, consider two hypothetical tax brackets in 1976, say, 20% and 40%, with 90% of the taxpayers in the 20% bracket and 10% in the 40% bracket. If in 1979 the brackets decline to 19% and 39%, but now 90% of the taxpayers are in the higher bracket, the overall rate of income paid out in taxes will rise from 22% in 1976 to 37% in 1979.
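The tax example of note 58 is a weighted average whose weights shift. A sketch reproducing the note's two overall rates (the function name is our own):

```python
def overall_rate(brackets):
    """Weighted average rate; brackets is a list of (rate_percent, share_of_taxpayers)."""
    return sum(rate * share for rate, share in brackets)

r1976 = overall_rate([(20, 0.9), (40, 0.1)])
r1979 = overall_rate([(19, 0.1), (39, 0.9)])   # both bracket rates fell by one point
print(round(r1976, 1), round(r1979, 1))         # 22.0 37.0
```

Every bracket rate falls, yet the overall rate rises, purely because taxpayers moved into the higher bracket.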
59 William H. Kruskal, Letters, in William B. Fairley & Frederick Mosteller, eds., Statistics and Public Policy 127–30 (Reading, Mass.: Addison-Wesley Publishing Co., 1977).
60 426 U.S. 229 (1976).
61 See, e.g., John Maynard Keynes, A Treatise on Probability ch. 14 (London: Macmillan & Co., 1921); Meier & Zabell, supra note 10, at 500–501.
62 431 U.S. 324, 340 n. 20 (1977).
63 Dyer, A. et al., Relationship of Relative Weight and Body Mass Index to 14-Year Mortality in the Chicago Peoples Gas Company Study, 28 J. Chronic Diseases 109 (1975).
64 Nature, May 1982.
65 426 U.S. 229 (1976). The Court took note in this case that the D.C. Police Department was engaged in affirmative action, seeking to increase the number of black recruits. Its campaign urged blacks to take the qualification test, even if they doubted that they would succeed. Not surprisingly, the pass rate for blacks during this period fell, but, in view of the recruitment campaign, the Court did not find in this disparity of pass rates any evidence of “intent to discriminate” against blacks.
66 433 U.S. at 310–11.
67 Thus, our use of the term “relevance” refers not only to the weight of the statistical evidence for discrimination, which would be greater if the disparity were larger, but also to a more standard usage: when does statistical evidence have any relation at all to discriminatory behavior?
68 It might be thought that when large employers are concerned, the size of the numbers involved would result in a canceling out of such differences, and thus ameliorate many of the difficulties of interpretation and analysis just discussed. But there are many examples in the sampling literature where exactly the opposite has occurred, with the larger sample size merely exacerbating the problems caused by the nonrandom nature of the sample. The famous Literary Digest poll of 1936, for example, which predicted FDR's defeat in his reelection bid for president, was based on the largest number of people ever replying to a poll, some 2.4 million persons. But size didn't help the Literary Digest. Due to selection and nonresponse biases it predicted that Landon would defeat Roosevelt by a 57% to 43% margin, whereas Roosevelt ended up winning by a landslide 62% to 38%; see David Freedman, Robert Pisani, & Roger Purves, Statistics 302–4 (New York: W. W. Norton, 1978), and references therein. As Freedman et al. state, “When a selection procedure is biased, taking a large sample doesn't help. This just repeats the mistake on a larger scale.” Id. at 303. (Bias is used here in a statistical, rather than a discriminatory, sense.)
69 Even an employer with a modest work force faces the same difficulty except that, rather than be found at fault with near certainty when there is only a small difference, the smaller employer runs the risk of being found at fault with substantial probability. For an employer who has 320 black applicants and 320 white applicants and uses an employment test on which the pass rates are 65% and 70%, there is approximately a 23% chance that the data will show adverse impact.
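Note 69's 23% can be approximated by a power calculation. The note does not say which test or level it used; this sketch assumes a one-sided pooled z-test at the .025 level (z > 1.96) under the normal approximation, so it lands near, not exactly on, the note's figure: roughly 24% to 27%, depending on whether a continuity correction is applied. The function names are hypothetical.

```python
import math

def norm_cdf(x):
    """Standard normal CDF via the complementary error function."""
    return 0.5 * math.erfc(-x / math.sqrt(2))

def prob_flagged(n, p_b, p_w, z_crit=1.96, continuity=False):
    """Approximate chance that a one-sided pooled two-sample z-test, run on
    n applicants per group, finds the black shortfall significant."""
    gap = p_w - p_b                                          # true difference in pass rates
    p_bar = (p_b + p_w) / 2                                  # expected pooled rate (equal n)
    se_null = math.sqrt(p_bar * (1 - p_bar) * 2 / n)         # SE the test uses
    se_true = math.sqrt((p_b * (1 - p_b) + p_w * (1 - p_w)) / n)  # actual SE of the gap
    needed = z_crit * se_null + (1 / n if continuity else 0.0)     # observed gap needed
    return 1 - norm_cdf((needed - gap) / se_true)

plain = prob_flagged(320, 0.65, 0.70)
corrected = prob_flagged(320, 0.65, 0.70, continuity=True)
print(f"{plain:.3f} {corrected:.3f}")
```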
70 Lerner, supra note 4.
71 Even if the employer adopts an affirmative action policy to make the proportions equal, the holding in Teal suggests that the use of nonvalidated tests is prohibited.
72 See supra note 11.
73 See supra note 68.
74 Here and below, there is necessarily an element of caricature in so compressing a position. We cite these authorities as representative of a stance rather than describing their actual positions with precision. Neither specifically addresses the employment discrimination problem; rather, they discuss the wider question of the proper use of statistics in law (Tribe) and in general (Freedman).
75 Freedman, supra note 68.
76 Even a Benthamite can have too much of a good thing: “It is obvious, too, that even when the probabilities are derived from observation and experiment, a very slight improvement in the data, by better observations, or by taking into fuller consideration the special circumstances of the case, is of more use than the most elaborate application of the calculus to probabilities founded on the data in their previous state of inferiority.” 2 John Stuart Mill, A System of Logic n.65 (1st ed. 1843).
77 The issue of practical vs. statistical significance has been discussed in the legal literature (see, e.g., Baldus & Cole, supra note 9, at 317–18; Smith & Abram, supra note 27, at 53). The matter does not seem to carry weight in Shoben, supra note 25, at 806, where it is stated that the “flaws in the four-fifths rule can be eliminated by replacing it with a test of the statistical significance of differences in pass rate proportions.” The Supreme Court has not addressed the problem, but the lower courts have recognized the issue. E.g., in Moore v. Southwestern Bell Tel. Co., 593 F.2d 607, 608 n.1 (5th Cir. 1979), a test used to qualify clerks for promotion resulted in 248 of 277 blacks passing (pass rate = .895) while 453 of 469 whites passed (pass rate = .966). The difference in pass rates is 3.94 standard errors and therefore highly statistically significant. The district court did not find the difference to be substantial and was upheld on appeal.
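Moore illustrates the gap between statistical and practical significance. The sketch below recomputes both quantities from the counts in note 77 under the pooled two-sample normal approximation. With this formula the z value comes out at about 3.9, close to the note's 3.94 (the small difference presumably reflects the exact variance formula used), while the ratio of pass rates, about .93, is well above 4/5.

```python
import math

def pooled_z(x1, n1, x2, n2):
    """Pooled two-sample z for the difference x1/n1 - x2/n2."""
    pc = (x1 + x2) / (n1 + n2)
    return (x1 / n1 - x2 / n2) / math.sqrt(pc * (1 - pc) * (1 / n1 + 1 / n2))

# Moore v. Southwestern Bell (note 77): 248 of 277 blacks and 453 of 469 whites passed.
z = pooled_z(453, 469, 248, 277)          # white rate minus black rate, in SE units
ratio = (248 / 277) / (453 / 469)         # black pass rate as a fraction of the white rate
print(f"z = {z:.2f}, ratio = {ratio:.3f}, below 4/5: {ratio < 0.8}")
```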
78 A similar concern for the “robustness” of conclusions against modest deviations from the hypothesized model is characteristic of much recent statistical theory.
79 487 F. Supp. 389 (N.D. Tex. 1980).
80 Id. at 393.
81 Shoben, supra note 25.
82 If pB = .70 and pW = .80 are the black and white selection rates, then the absolute difference is 10%. The rejection rates qB = 1 - pB = .30 and qW = 1 - pW = .20 also have an absolute difference of .30 - .20 = 10%. On the other hand, pB/pW = .70/.80 = .875, while qW/qB = .20/.30 = .67.
83 I.e., the minimum of pB/pW and qW/qB.
84 See Joseph L. Fleiss, Statistical Methods for Rates and Proportions (2d ed. New York: John Wiley & Sons, 1982). The odds ratio is (pB/qB)/(pW/qW).
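The competing indices of notes 82 to 84, applied to the note's hypothetical rates of .70 and .80. The function name and dictionary keys are our own labels:

```python
def disparity_measures(p_b, p_w):
    """Indices from notes 82-84 for black and white selection rates p_b, p_w."""
    q_b, q_w = 1 - p_b, 1 - p_w                  # rejection rates
    sel_ratio = p_b / p_w                        # ratio of selection rates
    rej_ratio = q_w / q_b                        # ratio of rejection rates, oriented so < 1 is adverse to blacks
    return {
        "abs_difference": p_w - p_b,
        "selection_ratio": sel_ratio,
        "rejection_ratio": rej_ratio,
        "min_ratio": min(sel_ratio, rej_ratio),  # note 83
        "odds_ratio": (p_b / q_b) / (p_w / q_w), # note 84
    }

m = disparity_measures(0.70, 0.80)
print({k: round(v, 3) for k, v in m.items()})
```

The same 10-point gap looks mild on the selection scale (.875) but well below 4/5 on the rejection scale (.67), which is why the choice of index has to come first.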
85 If we pursue a statistical analysis here, then the impact of the different numbers of applicants will be felt no matter which measure (the ratio or the absolute difference) is used. In any case, the choice of index necessarily precedes any statistical considerations.
86 440 U.S. 568 (1979).
87 523 F.2d 1290 (8th Cir. 1975). In Green, of the 3,282 black applicants and 5,206 white applicants, 174 blacks and 118 whites were rejected on grounds of having a prior conviction. Here qB = 174/3282 = .05 and qW = 118/5206 = .02, so the data have the same rejection (and selection) rates as in the example. The court found the use of a conviction record to be arbitrary and to have adverse impact on blacks and found for the plaintiff, whose conviction resulted from refusal to be drafted into military service.
88 29 C.F.R. § 1607.4(d) (1982).
89 E.g., Cormier v. P.P.G. Indus., 519 F. Supp. 211 (W.D. La. 1981).
90 Shoben, supra note 25; Cohn, supra note 25.
91 In contrast to its original clear and direct statement of the rule, a “Questions and Answers” supplement later published by the EEOC is confusing to the point of incoherence.
92 533 F. Supp. 844 (D. Del. 1982).
93 Historically, the 80% rule, as it appears in the EEOC agency guidelines, evolved from a California regulation that required both the 4/5 disparity in selection rate ratios and statistical significance for an administrative finding of adverse impact. Subsequently, compliance officers had difficulty properly performing statistical significance tests in the field, and the role of statistical significance was much reduced in the application of the regulation. This experience and lengthy discussion led, at the federal level, to the adoption of the present rule, which reaffirms the role of statistical significance but is couched in terms permitting generally easier administrative application. The sources of misinterpretation of the rule have been the error of focusing only on the 4/5 disparity and the failure to recognize that at no time was it suggested that the 4/5 disparity alone be adopted as having probative force in the courtroom. We are grateful to William Burns, David Rose, and Philip Sklover for this information.
94 E.g., in the case of the “inexorable zero.” International Bhd. of Teamsters v. United States, 431 U.S. 324, 342 n.23 (1977), quoting United States v. T.I.M.E.-D.C., 517 F.2d 299, 315. Where there are such gross disparities as in United States v. Commonwealth, 454 F. Supp. 1077 (E.D. Va. 1978), modified, 620 F.2d 1018 (4th Cir. 1980), cert. denied, 449 U.S. 1021 (1980), sophistication is not needed to judge the statistics.
95 So that the selection rate for the most favored group is at least 50% greater.
96 Griggs v. Duke Power Co., 401 U.S. 424, 433–34 (1971).
97 An instance like this appears in one of the myriad data sets in Cormier v. P.P.G. Indus., 519 F. Supp. 211 (W.D. La. 1981), where there were 2,499 white and 1,050 black applicants for utility crew jobs in the year 1977, of which 86 whites and 24 blacks were hired. The hire rates are pB = .023 and pW = .034, with a ratio pB/pW = .664. The two-sample tests show that this disparity is not statistically significant; thus neither the 80% rule nor the two-sample binomial test would find adverse impact.
98 In Jackson v. Nassau County Civil Serv. Comm'n, 424 F. Supp. 1162, 1167 (E.D.N.Y. 1976), 113 identified whites and 55 blacks took an examination for the position of community service assistant. Of these, 99 whites and 40 blacks passed. The pass rates are pB = .727 and pW = .876, with ratio pB/pW = .830. Thus the 80% rule would not find substantial disparity. The two-sample binomial test is significant (with Zc = 2.178, computed as described in app. 2, infra at p. 183).
A case in this domain, but where the sample sizes are smaller and the disparity is greater, resulting in a borderline situation, is Reynolds v. Sheet Metal Workers Local 102, 498 F. Supp. 952, 960, 965 (D.D.C. 1980), where, of 44 black and 80 nonblack applicants to an apprenticeship training program, 14 blacks and 41 (51.3%) nonblacks were selected. The selection rates were pB = .318 and pW = .513; the ratio is .621 (the reported calculations in the opinion are in error). The one-sided two-sample binomial test shows significance at the .029 level, which is a bit larger than the .025 standard. Strictly speaking, neither the 80% rule nor the two-sample binomial test would find disparity. The court agonized about the .025 standard and decided, on the basis of collateral evidence and the inessentiality of such a precise standard, that a prima facie case was established.
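Note 93 reads the EEOC rule as requiring both a 4/5 disparity and statistical significance. The sketch below applies that combined reading to the data of notes 97 and 98. The procedure is our assumption: a one-sided .025 level and the continuity-corrected pooled z-test. Under these assumptions it matches the Zc = 2.178 quoted for Jackson and gives, for Reynolds, a one-sided probability of about .029, the figure quoted for the two-sample binomial test.

```python
import math

def norm_sf(z):
    """Upper-tail probability of the standard normal."""
    return 0.5 * math.erfc(z / math.sqrt(2))

def adverse_impact(x_b, n_b, x_w, n_w, alpha=0.025):
    """Return (ratio, z_c, p, flagged) under the combined reading:
    black/white rate ratio below 4/5 AND a one-sided continuity-corrected
    pooled two-sample z-test significant at alpha."""
    p_b, p_w = x_b / n_b, x_w / n_w
    ratio = p_b / p_w
    pc = (x_b + x_w) / (n_b + n_w)
    se = math.sqrt(pc * (1 - pc) * (1 / n_b + 1 / n_w))
    z_c = ((p_w - p_b) - 0.5 * (1 / n_b + 1 / n_w)) / se   # continuity-corrected
    p = norm_sf(z_c)
    return ratio, z_c, p, (ratio < 0.8 and p < alpha)

cases = {
    "Cormier (utility crew, 1977)": (24, 1050, 86, 2499),
    "Jackson": (40, 55, 99, 113),
    "Reynolds": (14, 44, 41, 80),
}
for name, counts in cases.items():
    ratio, z_c, p, flagged = adverse_impact(*counts)
    print(f"{name}: ratio={ratio:.3f} z_c={z_c:.3f} p={p:.3f} flagged={flagged}")
```

Each case fails one half of the rule: Cormier and Reynolds fall short on significance, Jackson on the 4/5 ratio.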
99 68 Cal. 2d 319, 438 P.2d 33, 66 Cal. Rptr. 497.
100 550 F.2d 577, 584 (10th Cir. 1976).
101 Chance v. Board of Examiners, 458 F.2d 1167 (2d Cir. 1972); Chicano Police Officer's Ass'n v. Stover, 526 F.2d 431 (10th Cir. 1975), vacated, 426 U.S. 944 (1976); Harless v. Duck, 14 Fair Empl. Prac. Cas. (BNA) 1616 (N.D. Ohio 1977), rev'd, 619 F.2d 611 (6th Cir. 1980), cert. denied, 449 U.S. 872 (1980); Lee v. City of Richmond, 456 F. Supp. 756 (E.D. Va. 1978); Williams v. City & County of San Francisco, 483 F. Supp. 335 (N.D. Cal. 1979); Dendy v. Washington Hosp. Center, 431 F. Supp. 873 (D.C. 1977), remanded per curiam, 581 F.2d 990 (D.C. Cir. 1978); Bridgeport Guardians, Inc. v. Members of Bridgeport Civil Serv. Comm'n, 354 F. Supp. 778 (D. Conn.), aff'd in relevant part, 482 F.2d 1333 (2d Cir. 1973); Educational Equality League v. Tate, 472 F.2d 612 (3d Cir. 1973), rev'd sub nom. Mayor of Philadelphia v. Educational Equality League, 415 U.S. 605 (1974); Wade v. New York Tel. Co., 500 F. Supp. 1170 (S.D.N.Y. 1980).
102 The Supreme Court has twice given its imprimatur to such concerns but in each case, characteristically, has not elaborated. See Mayor of Philadelphia v. Educational Equality League, 415 U.S. 605, 621 (1974); International Bhd. of Teamsters v. United States, 431 U.S. 324, 340 n.20 (1977).
103 Lee v. City of Richmond, 456 F. Supp. 756, 766 (E.D. Va. 1978); Harless v. Duck, 14 Fair Empl. Prac. Cas. (BNA) 1616 (N.D. Ohio 1977), rev'd, 619 F.2d 611 (6th Cir.), cert. denied, 449 U.S. 872 (1980) (reference to “cell of five” rule a misstatement of rule-of-thumb for applicability of chi-squared approximation); hey v. Western Elec. Co., 23 Fair Empl. Prac. Cas. (BNA) 1024 (N.D. Ga. 1977).
104 Most notably in Williams v. City & County of San Francisco, 483 F. Supp. 335, 341–42 (N.D. Cal. 1979), where the court misread the Department of Justice guidelines (and a large number of cases) as equating small sample size with lack of statistical significance. See also Chicano Police Officer's Ass'n v. Stover, 526 F.2d 431, 439 (10th Cir. 1975), vacated, 426 U.S. 944 (1976).
105 456 F. Supp. 756 (E.D. Va. 1978).
106 Id. at 766.
107 Chicano Police Officer's Ass'n v. Stover, 526 F.2d 431 (10th Cir. 1975), vacated, 426 U.S. 944 (1976).
108 Id. at 439.
109 Id.
110 330 F. Supp. 203 (S.D.N.Y. 1971), aff'd, 458 F.2d 1167 (2d Cir. 1972).
111 Id. at 212.
112 Educational Equality League v. Tate, 472 F.2d 612 (3d Cir. 1973), rev'd sub nom. Mayor of Philadelphia v. Educational Equality League, 415 U.S. 605 (1974); Jackson v. Nassau County Civil Serv. Comm'n, 424 F. Supp. 1162, 1168 (E.D.N.Y. 1976).
113 Wade v. New York Tel. Co., 500 F. Supp. 1170, 1180 (S.D.N.Y. 1980). Cf. Bridgeport Guardians Inc. v. Members of Bridgeport Civil Serv. Comm'n, 354 F. Supp. 788 (D. Conn.), aff'd in relevant part, 482 F.2d 1333 (2d Cir. 1973). “While this probability may have some statistical significance, the numbers on which it is based are too small to have constitutional significance. If only one more non-White candidate had passed the exam, the comparative passing rates would be 68% and 50%, too slight a difference to establish a prima facie case of discrimination. Even if the 68% to 40% comparison is sufficient under Chance, these figures cannot be relied upon when a different result achieved by a single candidate could so drastically alter the comparative figures.” Id. at 795.
114 431 F. Supp. 873 (D.C. 1977), remanded per curiam, 581 F.2d 990 (D.C. Cir. 1978).
115 21 Fair Empl. Prac. Cas. (BNA) 200, 205 (N.D. Ill. 1979).
116 431 F. Supp. 873, 876 (D.C. 1977), remanded per curiam, 581 F.2d 990 (D.C. Cir. 1978): To be persuasive, statistical evidence must rest on data large enough to mirror the reality of the employment situation. If, on the one hand, the courts were to ignore broadly based statistical data, that would be manifestly unfair to Title VII complainants. But if, on the other hand, the courts were to rely heavily on statistics drawn from narrow samples, that would inevitably upset legitimate employment practices for reasons of appearance rather than substance. The courts must be astute to safeguard both of these conflicting interests….
In the instant matter, the Court is convinced that the data offered by plaintiffs represents too slender a reed on which to rest a weighty remedy of preliminary relief.
117 21 Fair Empl. Prac. Cas. (BNA) 200, 205 (N.D. Ill. 1979). In this case 562 whites and 12 blacks took the 1973 fire captain examination, and 417 whites and 5 blacks passed. The ratio of pass rates is .562. From the opinion one gathers that no significance test was performed, but the 4/5 part of the 80% rule is appealed to by the plaintiffs (“in the absence of testimony as to the probability of chance” as a factor in the figure for the pass rate for blacks given the small sample of blacks compared to the larger sample for whites. This consideration, along with others discussed below, also influences the weight to be accorded the application of the “80 percent rule.” Id. at 207.) In fact, the F.E.T. establishes statistical significance at the .019 level, which is below the standard .025! At the same time, the court expressed concern about the small number of black applicants, “especially in the light of the absence of thorough testimony on the statistical significance.” Id. at 207.
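Note 117 reports both halves of the rule for the fire captain examination: a pass-rate ratio of .562 and an F.E.T. level of .019. A sketch recomputing both from the counts given in the note, using the hypergeometric tail sum of a one-sided Fisher exact test (helper name hypothetical):

```python
from math import comb

def fisher_lower_tail(x_b, n_b, x_w, n_w):
    """One-sided Fisher exact P: chance of x_b or fewer black passes, margins fixed."""
    n, k = n_b + n_w, x_b + x_w
    lo = max(0, k - n_w)
    tail = sum(comb(k, i) * comb(n - k, n_b - i) for i in range(lo, x_b + 1))
    return tail / comb(n, n_b)

# 1973 fire captain examination (note 117): 5 of 12 blacks, 417 of 562 whites passed.
ratio = (5 / 12) / (417 / 562)
p = fisher_lower_tail(5, 12, 417, 562)
print(f"ratio = {ratio:.3f}, one-sided F.E.T. P = {p:.3f}")
print("both parts of the rule met:", ratio < 0.8 and p < 0.025)
```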
118 See Mosteller et al., supra note 51, at § 4–5.
119 Id. at ch. 8.
120 433 U.S. 299 n.17 (1977).
121 The simple relationship between one-sided and two-sided probabilities that holds exactly for the normal distribution is only approximate in the binomial case, because the binomial is not quite symmetric about its mean. In the Hazelwood case, the two-sided test would be based on the probability of a departure in either direction from the expected proportion, 5.7%, as large as or larger than 5.7% - 3.7% = 2.0%. An observed proportion as large as or larger than 5.7% + 2.0% = 7.7% corresponds to a number of black hires at least equal to 23.1 + 8.1 = 31.2, that is, 32 or more. Exact calculation of this probability, added to the previously calculated value of 4.6%, gives an exact two-sided probability quite close to the 2 × 4.2% = 8.4% given by the normal approximation.
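The exact two-sided calculation of note 121 sums binomial probabilities over both tails. This sketch assumes the Hazelwood figures of 405 hires and a 5.7% standard (23.1 expected black hires); the lower tail it prints agrees with the 4.6% the note cites, and the two-sided total lands near the normal approximation's 8.4%.

```python
from math import comb

def binom_pmf(x, n, p):
    """Exact binomial probability of x successes in n trials."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

n, p = 405, 0.057                                             # assumed hires and standard
lower = sum(binom_pmf(x, n, p) for x in range(0, 16))         # 15 or fewer black hires
upper = sum(binom_pmf(x, n, p) for x in range(32, n + 1))     # 32 or more black hires
print(f"one-sided {lower:.3f}, two-sided {lower + upper:.3f}")
```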
122 Frederick Mosteller & Robert E. Rourke, Sturdy Statistics: Nonparametrics and Order Statistics 26–27 (Reading, Mass.: Addison-Wesley, 1973).
123 Id. at 170–72.
124 Id. at 113–14.
125 See Mosteller et al., supra note 51, at §§ 9–2, 9–3.
126 Ronald A. Fisher & Frank Yates, Statistical Tables for Biological, Agricultural and Medical
127 102 S. Ct. 2525 (1982).
128 431 F. Supp. 873 (D.C. 1977), remanded per curiam, 581 F.2d 990 (D.C. Cir. 1978).
129 Mosteller et al., supra note 51, at § 2–3.
130 Chicano Police Officer's Ass'n v. Stover, 526 F.2d 431 (10th Cir. 1975), vacated, 426 U.S. 944 (1976).
131 102 S. Ct. 2525 (1982).
132 See Mosteller et al., supra note 51, at 315–19.
133 Id. at 319–22.
134 See Mosteller & Rourke, supra note 122, ch. 11.
135 William G. Cochran, The χ² Test of Goodness of Fit, 23 Annals Mathematical Statistics 315 (1952).
136 Mosteller et al., supra note 51, § 10–5.
137 The equivalence stems from the fact that appendix table 1 and its analysis do not depend on whether we use the rows or the columns to compare proportions. The proportions related to the rows are pB and pW; the proportions related to the columns are the variants cited in this paragraph.
138 See generally Kendall & Stuart, supra note 37, ch. 24.
139 G² is undefined in any cell with a 0, and the chi-squared distribution is an unreliable approximation when there are very small cell values.
140 Spradlin & Drane, supra note 27, at 781, criticize the 80% rule and also criticize Shoben's use of the Z-test by presenting a hypothetical table (blacks: 1 pass, 149 fail; whites: 19 pass, 481 fail) for which G² = 5.078, while the chi-squared value (which, as we have noted, is equal to Z²) is 3.799. The significance probability (based on the chi-squared approximation) is .024 for G², .051 for chi-squared, and .093 for the continuity-corrected chi-squared. For the F.E.T. we would get, for the same problem, a significance probability of .07, based on doubling the corresponding one-sided calculation as in Teal. The wide discrepancy between G² and the F.E.T. is a consequence of the small number in the “black pass” cell of the table in this note (see supra note 139).
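The statistics compared in note 140 can all be computed from the four cell counts. A sketch in plain Python, assuming the table reads blacks 1 pass / 149 fail and whites 19 pass / 481 fail; the chi-squared tail uses the identity P(χ²₁ > x) = erfc(√(x/2)), and the Fisher value doubles the one-sided tail as the note does.

```python
import math
from math import comb

table = [[1, 149],    # blacks: pass, fail
         [19, 481]]   # whites: pass, fail
(a, b), (c, d) = table
n = a + b + c + d
rows, cols = [a + b, c + d], [a + c, b + d]
cells = [(table[i][j], rows[i] * cols[j] / n) for i in range(2) for j in range(2)]

chi2 = sum((o - e) ** 2 / e for o, e in cells)                # Pearson chi-squared (= Z^2)
g2 = 2 * sum(o * math.log(o / e) for o, e in cells)           # likelihood-ratio G^2; needs every o > 0
chi2_yates = n * (abs(a * d - b * c) - n / 2) ** 2 / (rows[0] * rows[1] * cols[0] * cols[1])

def chi2_1df_sf(x):
    """Upper-tail probability of chi-squared with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2))

# One-sided Fisher exact tail (black passes this low or lower), margins fixed.
k = cols[0]                                                   # total passes
fet_one = sum(comb(k, i) * comb(n - k, rows[0] - i)
              for i in range(max(0, k - rows[1]), a + 1)) / comb(n, rows[0])

print(f"chi2={chi2:.3f} p={chi2_1df_sf(chi2):.3f}")
print(f"G2={g2:.3f} p={chi2_1df_sf(g2):.3f}")
print(f"Yates p={chi2_1df_sf(chi2_yates):.3f}  F.E.T. doubled p={2 * fet_one:.3f}")
```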