Published online by Cambridge University Press: 06 January 2021
Statistical adjustment is a ubiquitous practice in all quantitative fields that is meant to correct for improprieties or limitations in observed data, to remove the influence of nuisance variables or to turn observed correlations into causal inferences. These adjustments proceed by reporting not what was observed in the real world, but instead modeling what would have been observed in an imaginary world in which specific nuisances and improprieties are absent. These techniques are powerful and useful inferential tools, but their application can be hazardous or deleterious if consumers of the adjusted results mistake the imaginary world of models for the real world of data. Adjustments require decisions about which factors are of primary interest and which are imagined away, and yet many adjusted results are presented without any explanation or justification for these decisions. Adjustments can be harmful if poorly motivated, and are frequently misinterpreted in the media’s reporting of scientific studies. Adjustment procedures have become so routinized that many scientists and readers lose the habit of relating the reported findings back to the real world in which we live.
1 See The Mutual Construction of Statistics and Society 66 (Ann Rudinow Saetnan, Heidi Mork Lomell & Svein Hammer eds., 2010) [hereinafter Mutual Construction of Statistics] (treating statistical knowledge as producing the “spatiotemporal framework for the experience of populations, nations, classes, and social problems” and finding that “statistics do the work of holding together knowledge, practice, and the State.”) (emphasis in original).
2 See generally Alain Desrosières, The Politics of Large Numbers: A History of Statistical Reasoning (Camille Naish trans., 1998) (examining the long history of statistics and its connection with the construction, unification and administration of the State).
3 Mutual Construction of Statistics, supra note 1, at 66 (“Statistics has become, at least in some forms of practice, the epistemic flagship of the modern sciences – be it in biology, physics, informatics, or sociology.”).
4 See generally Statistics in Society: The Arithmetic of Politics (Daniel Dorling & Stephen Simpson eds., 1999) (explaining the need for widespread comprehension of statistical insights and skills on topics such as gender, ethnicity, religion, poverty, race, health, education, unemployment and politics, among others).
5 See generally Sander Greenland, Concepts and Pitfalls in Measuring and Interpreting Attributable Fractions, Prevented Fractions, and Causation Probabilities, 25 Annals Epidemiology 155 (2015) (explaining how attributive theory is intimately connected to the function and proliferation of causal theories and models).
6 See Susan Dean & Barbara Illowsky, Introductory Statistics 7 (2017) (ebook).
7 Id. at 113.
8 Id. at 116.
9 Id. at 105–106.
10 Id. at 240 (“A discrete probability distribution function has two characteristics: 1. Each probability is between zero and one, inclusive. 2. The sum of the probabilities is one.”).
11 Sander Greenland, Summarization, Smoothing, and Inference in Epidemiologic Analysis, 21 Scandinavian J. Soc. Med. 227, 228 (1993) (“Smoothing is usually viewed as the combination of data with a model to obtain data expected under the model – more precisely, the data one should expect to see in a replicate of the study if the model is correct and the replication is perfect with respect to all identified study and subject characteristics.”).
12 Id. at 230.
13 See Niels Keiding & David Clayton, Standardization and Control for Confounding in Observational Studies: A Historical Perspective, 29 Stat. Sci. 529 (2014).
14 See id. (describing the emergence of the regression modeling approach and the refinement of the weighting approach for confounder control during the twentieth century).
15 Id. at 529 (explaining that methods of standardization of rates compare predicted marginal summaries to target populations).
16 The 2000 Census: Counting Under Adversity (Constance F. Citro, Daniel L. Cork & Janet L. Norwood eds., 2004).
17 Laurence S. Freedman et al., Dealing with Dietary Measurement Error in Nutritional Cohort Studies, 103 J. Nat’l Cancer Inst. 1086, 1089–90 (2011).
18 Jennifer Weuve et al., Accounting for Bias Due to Selective Attrition: The Example of Smoking and Cognitive Decline, 23 Epidemiology 119, 121 (2012).
19 See Keiding & Clayton, supra note 13, at 542.
20 See id. at 541.
21 Guido W. Imbens & Donald B. Rubin, Causal Inference in Statistics, Social, and Biomedical Sciences 24 (2015).
22 George Maldonado & Sander Greenland, Estimating Causal Effects, 31 Int’l J. Epidemiology 422, 424, 428 (2002).
23 Richard J. Klein & Charlotte A. Schoenborn, Age Adjustment Using the 2000 Projected U.S. Population, 20 Health People 2010 Stat. Notes 1, 1 (2001).
24 Victor J. Schoenbach & Wayne D. Rosamund, Understanding the Fundamentals of Epidemiology: An Evolving Text 132 (2000) (ebook).
25 Id.
26 Id.
27 Id.
28 Id.
29 Id.
30 Sam Harper et al., Implicit Value Judgments in the Measurement of Health Inequalities, 88 Milbank Q. 4 (2010) (discussing the involvement of value judgments in the adjustment of statistical analyses).
31 The directly standardized rate is $\sum_k r_k N_k / \sum_k N_k$, where $r_k$ is the rate in the $k$th stratum of the study population and $N_k$ is the number of people in the $k$th stratum of the standard population.
32 Nancy Krieger & David R. Williams, Changing to the 2000 Standard Million: Are Declining Racial/Ethnic and Socioeconomic Inequalities in Health Real Progress or Statistical Illusion?, 91 Am. J. Pub. Health 1209 (2001).
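The calculation that note 31 describes, in which each stratum-specific rate r_k from the study population is weighted by the standard-population count N_k, can be sketched in a few lines of Python. This is only an illustrative sketch: the function name, the three age strata, and every number below are hypothetical and do not come from any cited source.

```python
def directly_standardized_rate(stratum_rates, standard_counts):
    """Weighted average of stratum rates r_k, weighted by standard-population counts N_k."""
    if len(stratum_rates) != len(standard_counts):
        raise ValueError("need exactly one rate per standard-population stratum")
    total = sum(standard_counts)
    return sum(r * n for r, n in zip(stratum_rates, standard_counts)) / total


# Hypothetical age-specific death rates (young, middle, old) in one study population
rates = [0.001, 0.005, 0.040]

# Two hypothetical standard populations: one younger, one older
young_standard = [600, 300, 100]
old_standard = [300, 400, 300]

rate_young = directly_standardized_rate(rates, young_standard)
rate_old = directly_standardized_rate(rates, old_standard)
print(rate_young, rate_old)
```

The same observed stratum rates give an adjusted rate more than twice as large under the older standard as under the younger one, which is why the choice of standard population is a substantive decision rather than a technicality.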
33 See Klein & Schoenborn, supra note 23, at 1–2.
34 See Krieger & Williams, supra note 32, at 1211.
35 Jay S. Kaufman et al., The Relation Between Income and Mortality in U.S. Blacks and Whites, 9 Epidemiology 147, 148 (1998).
36 Id. at 152.
37 But see Age Standardization of Death Rates: Implementation of the Year 2000 Standard, Nat’l Vital Stat. Rep. (CDC), Oct. 7, 1998 (explaining the impact of the implementation of the year 2000 population standard on statistical variability).
38 Matthew M. Chingos, Urban Inst., Breaking the Curve: Promises and Pitfalls of Using NAEP Data to Assess the State Role in Student Achievement (2015), http://www.urban.org/sites/default/files/publication/72411/2000484-Breaking-the-Curve-Promises-and-Pitfalls-of-Using-NAEP-Data-to-Assess-the-State-Role-in-Student-Achievement.pdf [https://perma.cc/5E6U-XWNH].
39 NAEP Overview, Nat’l Ctr. For Educ. Stat., https://nces.ed.gov/nationsreportcard/about/ [https://perma.cc/LXU6-SX7B] (last updated Mar. 30, 2016).
40 Chingos, supra note 38, at 1.
41 Id. at 3. Note that the label “blacks” is used in this paper to refer to African Americans, and “whites” to European Americans. Where some of the papers referenced have used other terminology, I have generally converted the terminology to these labels for reasons of simplicity and consistency. In all cases, the terms refer to self-identified groups within the United States.
42 Id. at 4.
43 Id. at app. A, Table A.1.
44 Id.
45 Id. at 2.
46 Id. at 4–5.
47 See generally Jay S. Kaufman et al., Socioeconomic Status and Health in Blacks and Whites: The Problem of Residual Confounding and the Resiliency of Race, 8 Epidemiology 621 (1997) (discussing potential sources for residual confounding in statistical adjustments based on socioeconomic factors); Tyler J. VanderWeele & Whitney R. Robinson, On the Causal Interpretation of Race in Regressions Adjusting for Confounding and Mediating Variables, 25 Epidemiology 473 (2014) (analyzing the effect of adjustment to socioeconomic distributions when race is used as an exposure variable).
48 Naihua Duan et al., Disparities in Defining Disparities: Statistical Conceptual Frameworks, 27 Stat. Med. 3941 (2008).
49 Id. at 3942 (quoting Inst. of Med., Unequal Treatment: Confronting Racial and Ethnic Disparities in Health Care 32 (Brian D. Smedley et al. eds., 2002)).
50 Id.
51 Id.
52 Id.
53 See id.
54 See id.
55 See, e.g., Greenland, supra note 5 (using examples of control group patients without providing justification); Keiding & Clayton, supra note 13 (controlling for variables in order to make comparisons, but failing to provide justification for comparison); Eric J. Tchetgen Tchetgen, Identification and Estimation of Survivor Average Causal Effects, 33 Stat. Med. 3601 (2014) (utilizing control variables to account for confounding problems without providing rationale behind controls).
56 Inst. of Med., Unequal Treatment: Confronting Racial and Ethnic Disparities in Health Care 77–79 (Brian D. Smedley et al. eds., 2002).
57 See generally Enrique F. Schisterman et al., Overadjustment Bias and Unnecessary Adjustment in Epidemiologic Studies, 20 Epidemiology 488 (2009) (utilizing a causal model analysis to illustrate and clarify the definition of overadjustment bias).
58 Tchetgen Tchetgen, supra note 55, at 3602.
59 See Duan et al., supra note 48, at 3944–55.
60 See generally Duan et al., supra note 48.
61 Miguel A. Hernán, David Clayton & Niels Keiding, The Simpson’s Paradox Unraveled, 40 Int’l J. Epidemiology 780 (2011).
62 See, e.g., Alexander Persoskie & Bryan Leyva, Blacks Smoke Less (and More) than Whites: Simpson’s Paradox in U.S. Smoking Rates, 2008 to 2012, 26 J. Health Care Poor & Underserved 951, 952 (2015).
63 Anna L. Beavis et al., Hysterectomy-Corrected Cervical Cancer Mortality Rates Reveal a Larger Racial Disparity in the United States, 123 Cancer 1044 (2017).
64 Id. at 1044.
65 Id.
66 Id. at 1045.
67 Id. at 1046.
68 Id.
69 Id. at 1047.
70 See Katharine M. Esselen et al., Health Care Disparities in Hysterectomy for Gynecologic Cancers, 126 Obstetrics & Gynecology 1029 (2015) (concluding that there were striking racial disparities associated with the use of minimally invasive hysterectomy for uterine and cervical cancers).
71 Beavis et al., supra note 63, at 1044.
72 Jemma Mytton et al., Removal of All Ovarian Tissue Versus Conserving Ovarian Tissue at Time of Hysterectomy in Premenopausal Patients with Benign Disease: Study Using Routine Data and Data Linkage, 356 Brit. Med. J. 1, 5 (2017).
73 Sarah M. Temkin et al., The End of the Hysterectomy Epidemic and Endometrial Cancer Incidence: What Are the Unintended Consequences of Declining Hysterectomy Rates?, 6 Frontiers Oncology 1, 3 (2016).
74 See, e.g., Wiley, Cervical Cancer Mortality Rates May Be Underestimated, ScienceDaily (Jan. 23, 2017), www.sciencedaily.com/releases/2017/01/170123094748.htm [https://perma.cc/WW35-GRSB].
75 Id.
76 See, e.g., Anne F. Rositch et al., Increased Age and Race-Specific Incidence of Cervical Cancer After Correction for Hysterectomy Prevalence in the United States from 2000 to 2009, 120 Cancer 2032, 2035 (2014).
77 Id.
78 Wiley, supra note 74.
79 Sharmilee M. Nyenhuis et al., Race is Associated with Differences in Airway Inflammation in Patients with Asthma, J. Allergy Clinical Immunology, 1 (2017).
80 Id. at 2.
81 Id. at 4.
82 Id. at 3.
83 Id. at 3.
84 Id. “FEV1” is the estimated volume of air that can be forced out of the lungs in one second, expressed as a percent of the total exhaled volume, and “IgE” is immunoglobulin E, the component of the immune system that drives allergic reactions.
85 Id. at 5.
86 Id. at 3–4.
87 See id. at 3–4.
88 Id. at 405. Logistic regression is a generalized linear model with a logit link and a binomial error distribution, where logit refers to ln[p/(1-p)], p is the risk of the outcome, and p/(1-p) is referred to as the odds of the outcome. Exponentiated regression coefficients from this model have an odds ratio interpretation. See, e.g., David W. Hosmer, Jr. et al., Applied Logistic Regression 1–33 (3d ed. 2013).
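The link between the logit, the odds, and the odds ratio in note 88 can be illustrated with a short Python sketch. The two risks are hypothetical, and the sketch assumes a model with a single binary covariate, in which the fitted coefficient equals the difference in log-odds between the two groups.

```python
import math


def logit(p):
    """Log-odds ln[p / (1 - p)] for a risk p strictly between 0 and 1."""
    return math.log(p / (1.0 - p))


def odds_ratio_from_coefficient(beta):
    """Exponentiate a logistic regression coefficient to obtain an odds ratio."""
    return math.exp(beta)


# Hypothetical risks of the outcome in an exposed and an unexposed group
risk_exposed, risk_unexposed = 0.20, 0.10

# Assumption: one binary covariate, so the coefficient is the log-odds difference
beta = logit(risk_exposed) - logit(risk_unexposed)
odds_ratio = odds_ratio_from_coefficient(beta)
print(round(odds_ratio, 2))  # 2.25
```

With these hypothetical risks the odds ratio (2.25) is larger than the ratio of the risks themselves (2.0), a reminder that an odds ratio should not be read as a risk ratio unless the outcome is rare.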
89 Nyenhuis et al., supra note 79, at 3.
90 Id. at 5, 7.
91 Id. at 2.
92 See supra text accompanying notes 85–87.
93 See supra text accompanying note 89.
94 Univ. of Ill. at Chi., Why is Asthma Worse in Black Patients?, ScienceDaily (Jan. 6, 2017), www.sciencedaily.com/releases/2017/01/170106133056.htm [https://perma.cc/KBZ7-ZVTL].
95 Id. (emphasis added).
96 See Nyenhuis et al., supra note 79, at 4–5.
97 Supra text accompanying notes 85–93.
98 See Paul R. Rosenbaum & Donald B. Rubin, The Central Role of the Propensity Score in Observational Studies for Causal Effects, 70 Biometrika 41 (1983).
99 See Schisterman et al., supra note 57, at 493 (“For estimation of total causal effects, it is not only unnecessary but likely harmful to adjust for a variable on a causal path from exposure to disease, or for a descending proxy of a variable on a causal path from exposure to disease.”).
100 See supra text accompanying notes 89–90.
101 Nyenhuis et al., supra note 79, at 2; see also Jay S. Kaufman & Richard S. Cooper, Commentary: Considerations for Use of Racial/Ethnic Classification in Etiologic Research, 154 Am. J. Epidemiology 291, 293 (2001); see also supra text accompanying notes 91–94.
102 See Nyenhuis et al., supra note 79, at 4, Table 1; see also id. at 7 (“[O]ur analysis relied mostly on a single assessment of airway inflammation using induced sputum. Although induced sputum is a direct and noninvasive measure of airway inflammation, other measures, including blood eosinophil counts, exhaled nitric oxide levels, and total serum IgE levels, have been associated with greater asthma severity, yet have not consistently been shown to predict ICS treatment responsiveness to the same degree as sputum eosinophils have. Blood eosinophil counts and total serum IgE levels were measured in a subset of patients in this analysis. No difference was found in blood eosinophil counts, although African American subjects had significantly higher total serum IgE levels compared with white subjects. Larger studies examining additional measures of airway inflammation, such as blood eosinophil counts, activated airway eosinophil counts, exhaled nitric oxide levels, and total serum IgE levels, should be pursued to fully address airway inflammatory differences that might exist in African American and white patients with asthma.”).
103 See Nyenhuis et al., supra note 79, at 2.
104 A numerical example may make the point more concrete. For the three variables race (1=black; 0=white), high vs. low academic performance (1 vs. 0) and admission to Harvard (1 vs. 0), in that order, suppose that the eight cells of the 2x2x2 table are: 1,1,1=20; 1,0,1=5; 1,1,0=20; 1,0,0=20; 0,1,1=44; 0,0,1=20; 0,1,0=20 and 0,0,0=20. In this case, high academic performance is a cause of getting into Harvard (OR=2.6), black race impedes Harvard admission (OR=0.4), and academic performance is completely uncorrelated with race (OR=1). Nonetheless, when one considers only those students who are attending Harvard, the odds of high academic performance are about eighty percent higher in blacks (20/5=4.0) than in whites (44/20=2.2), an association that appears only because the analysis is restricted to admitted students. In the causal inference literature, this phenomenon is known as “collider stratification bias.” See Stephen R. Cole et al., Illustrating Bias Due to Conditioning on a Collider, 39 Int’l J. Epidemiology 417 (2010).
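The odds ratios in note 104 can be recomputed directly from the eight cell counts. The following Python sketch copies the counts from the note. The helper function and variable names are hypothetical, and each odds ratio compares the group coded 1 with the group coded 0.

```python
# Cell counts keyed by (race, performance, admitted), copied from note 104:
# race 1 = black, 0 = white; performance 1 = high, 0 = low; admitted 1 = yes, 0 = no
cells = {
    (1, 1, 1): 20, (1, 0, 1): 5, (1, 1, 0): 20, (1, 0, 0): 20,
    (0, 1, 1): 44, (0, 0, 1): 20, (0, 1, 0): 20, (0, 0, 0): 20,
}

RACE, PERFORMANCE, ADMITTED = 0, 1, 2


def odds_ratio(a_index, b_index, keep=lambda key: True):
    """Odds ratio between two binary variables, collapsing over the third.

    `keep` optionally restricts the calculation to a subset of cells.
    """
    table = [[0, 0], [0, 0]]
    for key, count in cells.items():
        if keep(key):
            table[key[a_index]][key[b_index]] += count
    return (table[1][1] * table[0][0]) / (table[1][0] * table[0][1])


or_performance_admission = odds_ratio(PERFORMANCE, ADMITTED)
or_race_admission = odds_ratio(RACE, ADMITTED)
or_race_performance = odds_ratio(RACE, PERFORMANCE)

# Same race-performance comparison, restricted to admitted students (the collider)
or_race_performance_admitted = odds_ratio(
    RACE, PERFORMANCE, keep=lambda key: key[ADMITTED] == 1
)

print(round(or_performance_admission, 2), round(or_race_admission, 2),
      round(or_race_performance, 2), round(or_race_performance_admitted, 2))
```

In the full table the race-performance odds ratio is exactly 1, but among admitted students it is about 1.8 (black relative to white), so the association is produced entirely by restricting attention to those admitted.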
105 See, e.g., James P. Scanlan, Can We Actually Measure Health Disparities?, 19 Chance 47 (Mar. 1, 2006).
106 See Nyenhuis et al., supra note 79, at 9.e4; see also Jay S. Kaufman & Richard S. Cooper, Commentary: Considerations for Use of Racial/Ethnic Classification in Etiologic Research, 154 Am. J. Epidemiology 291, 293–94 (2001).
107 See Tyler J. VanderWeele & Ilya Shpitser, On the Definition of a Confounder, 41 Annals Stat. 196 (2013).
108 See Jay S. Kaufman & Richard S. Cooper, Seeking Causal Explanations in Social Epidemiology, 150 Am. J. Epidemiology 113 (1999).
109 See Jay S. Kaufman, Race: Ritual, Regression, and Reality, 25 Epidemiology 485 (2014); Tyler J. VanderWeele & Whitney R. Robinson, On the Causal Interpretation of Race in Regressions Adjusting for Confounding and Mediating Variables, 25 Epidemiology 473 (2014).
110 VanderWeele & Robinson, supra note 109, at 474.
111 See Schisterman et al., supra note 57.
112 See, e.g., Beavis et al., supra note 63; see also, e.g., Nyenhuis et al., supra note 79.
113 Mohammad H. Forouzanfar et al., Global Burden of Hypertension and Systolic Blood Pressure of at Least 110 to 115 mm Hg, 1990-2015, 317 JAMA 165 (2017).
114 Id.
115 Id. at app. 36, 90–91.
116 For examples of statistical estimation for an entirely counterfactual entity, see Alberto Abadie et al., Comparative Politics and the Synthetic Control Method, 59 Am. J. Pol. Sci. 495 (2015). This estimate can never be refuted with observed data because it pertains to an entity that is counter to historical fact.
This also invokes the philosophical debate about the demarcation between science and metaphysics. Positivist philosophers proposed that what distinguishes “science” is that its explanatory theories must be refutable based on observable evidence, but imaginary countries are never subject to direct observation. Sven Ove Hansson, Science and Pseudo-Science, Stanford Encyclopedia of Philosophy (Edward N. Zalta ed., 2017), https://plato.stanford.edu/entries/pseudo-science/.
117 Erik Corona et al., Analysis of the Genetic Basis of Disease in the Context of Worldwide Human Relationships and Migration, PLOS Genetics (May 23, 2013), http://journals.plos.org/plosgenetics/article?id=10.1371/journal.pgen.1003447 [https://perma.cc/TE53-CNYA].
118 Id.
119 Id.
120 Ecoronap, Comment to Analysis of the Genetic Basis of Disease in the Context of Worldwide Human Relationships and Migration, PLOS Genetics (July 6, 2013, 1:22 PM), http://journals.plos.org/plosgenetics/article/comment?id=10.1371/annotation/5dcab322-d620-4f9e-bfb0-6383bd42be9d [https://perma.cc/TE53-CNYA].
121 Id.
122 Id.
123 Id.
124 G.E.P. Box, Robustness in the Strategy of Scientific Model Building, in Robustness in Statistics 201, 202–03 (Robert L. Launer & Graham N. Wilkinson eds., 1979).
125 See Steve Wing, Whose Epidemiology, Whose Health?, 28 Int’l J. Health Servs. 241 (1998).
126 Victor J. Schoenbach & Wayne D. Rosamund, Understanding the Fundamentals of Epidemiology: An Evolving Text 132 (2000) (ebook) (modified from original table).