
The Use and Misuse of Biomedical Data: Is Bigger Really Better?

Published online by Cambridge University Press:  06 January 2021

Sharona Hoffman
Affiliation:
Case Western Reserve University School of Law; Wellesley College; Harvard Law School; University of Houston
Andy Podgurski
Affiliation:
Case Western Reserve University, University of Massachusetts

Extract

Very large biomedical research databases, containing electronic health records (EHR) and genomic data from millions of patients, have been heralded recently for their potential to accelerate scientific discovery and produce dramatic improvements in medical treatments. Research enabled by these databases may also lead to profound changes in law, regulation, social policy, and even litigation strategies. Yet, is “big data” necessarily better data?

This paper makes an original contribution to the legal literature by focusing on what can go wrong in the process of biomedical database research and what precautions are necessary to avoid critical mistakes. We address three main reasons for approaching such research with care and being cautious in relying on its outcomes for purposes of public policy or litigation. First, the data contained in biomedical databases is surprisingly likely to be incorrect or incomplete. Second, systematic biases, arising from both the nature of the data and the preconceptions of investigators, are serious threats to the validity of research results, especially in answering causal questions. Third, data mining of biomedical databases makes it easier for individuals with political, social, or economic agendas to generate ostensibly scientific but misleading research findings for the purpose of manipulating public opinion and swaying policymakers.

In short, this paper sheds much-needed light on the problems of credulous and uninformed acceptance of research results derived from biomedical databases. An understanding of the pitfalls of big data analysis is of critical importance to anyone who will rely on or dispute its outcomes, including lawyers, policymakers, and the public at large. The Article also recommends technical, methodological, and educational interventions to combat the dangers of database errors and abuses.

Type
Article
Copyright
Copyright © American Society of Law, Medicine and Ethics and Boston University 2013


References

1 Coleman, Priscilla K. et al., Induced Abortion and Anxiety, Mood, and Substance Abuse Disorders: Isolating the Effects of Abortion in the National Comorbidity Survey, 43 J. Psychiatric Res. 770, 773 (2009), available at http://www.journalofpsychiatricresearch.com/article/S0022-3956(08)00238-0/abstract.

2 Id. at 770.

3 Sharon Begley, Journal Disavows Study Touted by U.S. Abortion Foes, Reuters (Mar. 7, 2012, 3:11 P.M.), http://www.reuters.com/article/2012/03/07/us-usa-abortion-psychiatry-idUSTRE8261UD20120307 (stating that the study had been “widely cited by legislators and advocates to argue that abortion raises a woman's risk of mental illness and to push for laws requiring providers” to inform women of this danger).

4 Counseling and Waiting Periods for Abortion, State Policies In Brief (Guttmacher Inst., New York, N.Y.), May 1, 2012, at 1, 3, available at http://www.guttmacher.org/statecenter/spibs/spib_MWPA.pdf.

5 Kessler, Ronald C. & Schatzberg, Alan F., Reply to Letter to the Editor, Commentary on Abortion Studies of Steinberg and Finer (Social Science & Medicine 2011; 72:72–82) and Coleman (Journal of Psychiatric Research 2009;43:770–6 & Journal of Psychiatric Research 2011;45:1133–4), 46 J. Psychiatric Res. 410, 410-11 (2012).

6 Id. at 410.

7 Blumenthal, David & Tavenner, Marilyn, The “Meaningful Use” Regulation for Electronic Health Records, 363 New Eng. J. Med. 501, 501 (2010). Others may call EHRs electronic medical records (EMR). For the sake of simplicity, we use “EHR” consistently throughout and do not believe there is a substantive distinction between the two terms. See Peter Garrett & Joshua J. Seidman, EMR vs EHR–What Is the Difference?, Healthitbuzz (Jan. 4, 2011, 12:07 P.M.), http://www.healthit.gov/buzz-blog/electronic-health-and-medical-records/emr-vs-ehr-difference/ (“Some people use the terms ‘electronic medical record’ and ‘electronic health record’ (or ‘EMR’ and ‘EHR’) interchangeably. But here at the Office of the National Coordinator for Health Information Technology (ONC), you’ll notice we use electronic health record or EHR almost exclusively.”).

8 Hoffman, Sharona & Podgurski, Andy, Balancing Privacy, Autonomy, and Scientific Needs in Electronic Health Records Research, 65 SMU L. Rev. 85, 91-94 (2012).

9 Hoffman, M.A., The Genome-Enabled Electronic Medical Record, 40 J. Biomedical Informatics 44, 44 (2006); Kohane, Isaac S., Using Electronic Health Records to Drive Discovery in Disease Genomics, 12 Nature Rev. Genetics 417, 417 (2011).

10 Arthur M. Lesk, Introduction To Genomics 104-05 (2d ed. 2012).

11 Hoffman & Podgurski, supra note 8, at 97-102.

12 See Kho, Abel N. et al., Electronic Medical Records for Genetic Research: Results of the eMERGE Consortium, 3 Sci. Transl. Med. 78re1, 5 (2011); Safran, Charles, Toward a National Framework for the Secondary Use of Health Data: An American Medical Informatics Association White Paper, 14 J. Am. Med. Informatics Ass’n 1, 2 (2007); Weiner, Mark G. & Embi, Peter J., Toward Reuse of Clinical Data for Research and Quality Improvement: The End of the Beginning?, 151 Ann. Intern. Med. 359, 359-60 (2009).

13 See infra notes 225-226.

14 42 U.S.C. § 1320e (Supp IV. 2010); see infra notes 97-100.

15 Food and Drug Administration Amendments Act of 2007, Pub. L. No. 110-85, 121 Stat. 823 (codified as amended in scattered sections of 21 U.S.C.); see infra Part II.B.3.

16 Peterson, Pamela N. & Varosy, Paul D., Observational Comparative Effectiveness Research: Comparative Effectiveness and Caveat Emptor, 5 Circulation Cardiovasc. Quality & Outcomes 150, 151 (2012) (warning that “a primary determinant of the quality of any study is the quality of the data” and that “how the results of observational studies are interpreted and used” is of critical importance).

17 See infra note 37 and accompanying text for definition of federated database system. In other contexts, biomedical databases can also consist of data collected from large-scale clinical studies. Nadkarni, Prakash M., Managing Attribute-Value Clinical Trials Data Using the ACT/DB Client-Server Database System, 5 J. Am. Med. Informatics Ass’n 139, 139 (1998) (stating that “complex trials need sophisticated database expertise not readily available to individual investigators”).

18 Charles P. Friedman & Jeremy C. Wyatt, Evaluation Methods In Biomedical Informatics 369 (Kathryn Hannah & Marion Ball eds., 2d ed. 2006) (defining observational studies as involving an “[a]pproach to study design that entails no experimental manipulation”); Bryan F. J. Manly, The Design And Analysis Of Research Studies 1 (1992) (explaining that observational studies involve the collection of data “by observing some process which may not be well-understood”); Paul R. Rosenbaum, Observational Studies vii (2d ed. 2001) (stating that an observational study is “an empiric investigation of treatments, policies, or exposures and the effects they cause, but it differs from an experiment in that the investigator cannot control the assignment of treatments to subjects”). When using the term “observational studies,” we refer only to studies involving the review of existing records or data.

19 Manly, supra note 18, at 1 (explaining that experimental clinical studies involve “the collection of data on a process when there is some manipulation of variables that are assumed to affect the outcome of a process, keeping other variables constant as far as possible”); Hoffman & Podgurski, supra note 8, at 98-102 (contrasting clinical trials and observational studies).

20 See infra notes 115-119 and accompanying text.

21 David L. Faigman et al., Modern Scientific Evidence: Standards, Statistics, And Research Methods 338-42 (student ed. 2008).

22 Id.

23 Id. at 339-40 (explaining that epidemiological evidence has already played an important role in many mass tort cases); Gold, Steve C., The More We Know, the Less Intelligent We Are?–How Genomic Information Should, and Should Not, Change Toxic Tort Causation Doctrine, 34 Harv. Envtl. L. Rev. 369, 412-17 (2010) (discussing genes and other toxins as alternate causes of plaintiffs’ injuries).

24 See infra Part III.

25 Reed Abelson et al., Medicare Bills Rise as Records Turn Electronic, N.Y. Times, Sept. 21, 2012, at A1, A3, http://www.nytimes.com/2012/09/22/business/medicare-billing-rises-at-hospitals-with-electronic-records.html?_r=0.

26 Id.

27 See infra Part IV.

28 Kleinberg, Samantha & Hripcsak, George, A Review of Causal Inference for Biomedical Informatics, 44 J. Biomed. Informatics 1102, 1102 (2011) (defining causal inference as “the process of uncovering causal relationships from data”).

29 See Miguel A. Hernán & Sarah L. Taubman, Does Obesity Shorten Life? The Importance of Well-Defined Interventions to Answer Causal Questions, 32 INT’L J. OBESITY S8 (2008).

30 Id. at S13.

31 Id.

32 See infra Part V.

33 See supra notes 1-6 and accompanying text.

34 See Staudt, Nancy C. & VanderWeele, Tyler J., Methodological Advances and Empirical Legal Scholarship: A Note on Cox and Miles's Voting Rights Act Study, 109 Colum. L. Rev. Sidebar 42, 43 (2009) (asserting that by 2009 the methodology of causal diagrams had “become popular in a number of disciplines – including statistics, biostatistics, epidemiology, and computer science … [but had yet] to appear in the empirical law literature”).

35 Hoffman & Podgurski, supra note 8, at 128-30.

36 Wilson D. Pace Et Al., Agency For Health Care Res. & Quality, Distributed Ambulatory Research In Therapeutic Network (Dartnet): Summary Report ii (2009), available at http://www.effectivehealthcare.ahrq.gov/ehc/products/53/151/2009_0728DEcIDE_DARTNet.pdf.

37 Weber, Griffin M. et al., The Shared Health Research Information Network (SHRINE): A Prototype Federated Query Tool for Clinical Data Repositories, 16 J. Am. Med. Informatics Ass’n 624, 624 (2009). A federated network can be defined as one that “links geographically and organizationally separate databases to allow a single query to pull information from multiple databases while maintaining the privacy and confidentiality of each database.” PACE, supra note 36, at ii.

38 Hoffman & Podgurski, supra note 8, at 131-33.

39 Id. at 91.

40 Ancker, Jessica S. et al., Root Causes Underlying Challenges to Secondary Use of Data, AMIA Annual Symposium Proceedings 57, 57 (2011); Botsis, Taxiarchis et al., Secondary Use of EHR: Data Quality Issues and Informatics Opportunities, AMIA Joint Summits on Transl. Sci. 1, 1 (2010).

41 Press Release, Office of Sci. & Tech. Policy, Exec. Office of the President, Obama Administration Unveils “Big Data” Initiative: Announces $200 Million in New R & D Investments (Mar. 29, 2012), available at http://www.whitehouse.gov/sites/default/files/microsites/ostp/big_data_press_release_final_2.pdf.

42 Id.

43 Id. The agencies are the Office of Science and Technology Policy, National Science Foundation, National Institutes of Health, Department of Defense, Department of Energy, and U.S. Geological Survey.

44 Id. The international 1000 Genome Project “aims to find most genetic variants that have frequencies of at least 1 percent in the populations studied.” According to the National Institutes of Health, it is the world's largest human genetic variation data set, with 200 terabytes – “the equivalent of 16 million file cabinets filled with text, or more than 30,000 standard DVDs.” The information is available on the Amazon Web Services cloud. Baumann, Jeannie, White House Initiative Aims to Improve Use of Large Digital Databases for R & D, 11 Med. Res. L. & Pol’y 217, 217-18 (2012).

45 See, e.g., Ctrs. for Medicare & Medicaid Servs. (CMS), About Chronic Conditions Data Warehouse, Chronic Conditions Data Warehouse, https://www.ccwdata.org/web/guest/about-ccw (last visited Oct. 16, 2013); FDA's Sentinel Initiative, U.S. FOOD & DRUG ADMIN. (Sept. 4, 2013), http://www.fda.gov/safety/FDAsSentinelInitiative/ucm2007250.htm; Million Veteran Program: A Partnership with Veterans, U.S. Dep't of Veterans Affairs (Mar. 6, 2013), http://www.research.va.gov/mvp/veterans.cfm.

46 Million Veteran Program, supra note 45.

47 Id.

48 CMS, supra note 45.

49 Id. CCW was created pursuant to section 723 of the Medicare Modernization Act of 2003. Medicare Modernization Act of 2003 § 723, 42 U.S.C. § 1395b-8 (2006).

50 McGraw, Deven et al., A Policy Framework for Public Health Uses of Electronic Health Data, 21(S1) Pharmacoepi. & Drug Safety 18, 18 (2012); FDA's Sentinel Initiative, supra note 45. The Sentinel initiative was authorized by Congress in the Food and Drug Administration Amendments Act of 2007. FDA's Sentinel Initiative-Background, U.S. Food & Drug Admin. (Sept. 22, 2013), http://www.fda.gov/Safety/FDAsSentinelInitiative/ucm149340.htm.

51 McGraw et al., supra note 50, at 18.

52 Id. at 19. See Hoffman & Podgurski, supra note 8, at 131-33 (discussing distributed databases).

53 McGraw et al., supra note 50, at 19.

54 Welcome to MedMining, MEDMINING, http://www.medmining.com/index.html (last visited Oct. 13, 2013).

55 Id.

56 Explorys Overview, EXPLORYS, https://www.explorys.com/docs/data-sheets/explorys-overview.pdf (last visited Oct. 16, 2013).

57 Id.

58 Id.

59 Emerge Network, http://emerge.mc.vanderbilt.edu/ (last visited Oct. 16, 2013). The seven sites are: Group Health Cooperative with the University of Washington, Geisinger, Marshfield Clinic, Mayo Clinic, Mount Sinai School of Medicine, Northwestern University, and Vanderbilt University. National Human Genome Research Institute, Electronic Medical Records and Genomics (Emerge) Network, Genome.Gov, http://www.genome.gov/27540473 (last updated Aug. 29, 2013).

60 Emerge Network, supra note 59; Appropriations Subcommittee Statement on the Fiscal Year 2013 Budget (Mar. 23, 2012), NAT’L INST. GEN. MED. SCIS., http://www.nigms.nih.gov/About/Budget/Statements/March23_2012.htm.

61 McCarty, Catherine A. et al., The eMERGE Network: A Consortium of Biorepositories Linked to Electronic Medical Records Data for Conducting Genomic Studies, 4 BMC Med. Genomics 13, 14 (2011).

62 Id. A recent study found that data captured from EHRs could identify disease characteristics with sufficient accuracy to be used in genome-wide association studies. Kho et al., supra note 12, at 4-5.

63 Dartnet Institute: Informing Practice, Improving Care, http://www.dartnet.info/ (last visited Oct. 16, 2013); About DARTNet, DARTNET INST., http://www.dartnet.info/AboutDI.htm (last visited Oct. 15, 2013).

64 History of the Organization, DARTNET INST., http://www.dartnet.info/organization.htm (last visited Oct. 15, 2013); Networks, DARTNET INST., http://www.dartnet.info/networks.htm (last visited Oct. 15, 2013).

65 See About DARTNet, supra note 63.

66 Technology, DARTNET INST., http://www.dartnet.info/Technology.htm (last visited Oct. 15, 2013).

67 Software Tools, NAT’L CANCER INST., http://www.cancer.gov/clinicaltrials/international/answers/softwaretools (last visited Oct. 15, 2013) (stating that the initiatives’ goal is to “[b]uild or adapt tools for collecting, analyzing, integrating, and disseminating information associated with cancer research and care”).

68 INTERMACS Description, Interagency Registry for Mechanically Assisted Circulatory Support (INTERMACS), http://www.uab.edu/ctsresearch/intermacs/description.htm (last visited Oct. 22, 2013) (explaining that analysis of the collected data is expected to improve patient care and “influence future research”).

69 ELSO Registry Information Data Policy, ECMO REGISTRY EXTRACORPOREAL LIFE SUPPORT ORG., http://www.elso.med.umich.edu/DataRequests.html (last updated Oct. 12, 2010) (providing details concerning the collection of data with most identifiers removed, submission of queries, and release of query results to members in aggregate form).

70 Data, UNITED NETWORK FOR ORGAN SHARING (UNOS), http://www.unos.org/donation/index.php?topic=data (last visited Oct. 15, 2013) (discussing the creation of UNet, an online database system that “contains data regarding every organ donation and transplant event occurring in the United States since 1986”).

71 See Hyman, David A. & Silver, Charles, The Poor State of Health Care Quality in the U.S.: Is Malpractice Liability Part of the Problem or Part of the Solution? 90 Cornell L. Rev. 893, 952 (2005) (observing that a “great deal of uncertainty exists about the ‘best’ treatment for particular clinical conditions, and about the ‘best’ way to perform those treatments” and that the “efficacy of most medical treatments has never been proven”); Stewart, Walter F. et al., Bridging the Inferential Gap: The Electronic Health Record and Clinical Evidence, 26 Health Aff. w181, w181 (2007) (discussing the “inferential gap” between “the paucity of what is proved to be effective for selected groups of patients versus the infinitely complex clinical decisions required for individual patients”).

72 John Carey, Medical Guesswork, Businessweek, May 29, 2006, at 73, available at http://www.businessweek.com/stories/2006-05-28/medical-guesswork (asserting that many physicians “say the portion of medicine that has been proven effective is still outrageously low – in the range of 20% to 25%”).

73 Hoffman & Podgurski, supra note 8, at 97-102 (discussing the benefits of EHR-based research).

74 Etheredge, Lynn M., A Rapid-Learning Health System, 26 Health Aff. w107, w111 (2007), available at http://content.healthaffairs.org/cgi/content/full/26/2w107; Hoffman & Podgurski, supra note 8, at 97-102; Liang, Louise, The Gap Between Evidence and Practice, 26 Health Aff. w119, w120 (2007) (asserting that EHRs “have the potential to take over where clinical trials and evidence-based research leave off, by providing real-world evidence of drugs’ and treatments’ effectiveness across subpopulations and over longer periods of time”); see Ware, James H. & Hamel, Mary Beth, Pragmatic Trials – Guides to Better Patient Care?, 364 New Eng. J. Med. 1685, 1685 (2011) (discussing the shortcomings of clinical trials).

75 Clinical studies involve “the collection of data on a process when there is some manipulation of variables that are assumed to affect the outcome of a process, keeping other variables constant as far as possible.” BRYAN F. J. MANLY, THE DESIGN AND ANALYSIS OF RESEARCH STUDIES 1 (1992). Thus, they involve actual experimentation on human subjects rather than just review of their medical records.

76 Smith, Sheila Weiss, Sidelining Safety–The FDA's Inadequate Response to the IOM, 357 New Eng. J. Med. 960, 961 (2007).

77 See Ioannidis, John P.A., Why Most Published Research Findings Are False, 2 PLoS Med. 696, 700 (2005).

78 See Moher, David et al., Statistical Power, Sample Size, and Their Reporting in Randomized Controlled Trials, 272 J. Am. Med. Ass’n 122, 122-24 (1994).

79 Hoffman & Podgurski, supra note 8, at 98-99; Vandenbroucke, Jan P., The HRT Controversy: Observational Studies and RCTs Fall in Line, 373 Lancet 1233, 1234 (2009).

80 Silverman, Stuart L., From Randomized Controlled Trials to Observational Studies, 122 Am. J. Med. 114, 114 (2009) (explaining that “[o]bservational studies may be an important addition to the clinician's resources by complementing randomized controlled trial data with information on efficacy, safety, and patient compliance in a population of real-world patients”); Stewart et al., supra note 71, at 73 (stating that analysis of EHR data should help bridge the “inferential gap” between “the paucity of what is proved to be effective for selected groups of patients versus the infinitely complex clinical decisions required for individual patients”).

81 Vandenbroucke, Jan P., Observational Research, Randomised Trials, and Two Views of Medical Science, 5 PLoS Med. 339, 341 (2008), available at http://www.plosmedicine.org/article/info%3Adoi%2F10.1371%2Fjournal.pmed.0050067 (explaining that adverse effects are generally unexpected and unpredictable, and therefore are not subject to “confounding by indication” and can be determined through observational studies). See infra notes 244-247 and accompanying text for discussion of confounding by indication.

82 Vandenbroucke, supra note 81, at 343 (asserting that “[m]uch good can come from going down the wrong alley and detecting why it is wrong, or playing with a seemingly useless hypothesis; the real breakthrough might come from that experience”).

83 Benson, Kjell & Hartz, Arthur J., A Comparison of Observational Studies and Randomized, Controlled Trials, 342 New Eng. J. Med. 1878, 1878 (2000) (mentioning “greater timeliness” as an advantage of observational studies); Kaelber, David et al., Patient Characteristics Associated with Venous Thromboembolic Events: A Cohort Study Using Pooled Electronic Health Record Data, 19 J. Am. Med. Informatics Ass’n 965, 966 (2012); Port, Friedrich K., Role of Observational Studies Versus Clinical Trials in ESRD Research, 57 Kidney Int’l S3, S4 (2000), available at http://www.nature.com/ki/journal/v57/n74s/full/4491615a.html. For further details about the benefits of observational studies, see Hoffman & Podgurski, supra note 8, at 97-102.

84 Wakefield, Andrew J. et al., Ileal-Lymphoid-Nodular Hyperplasia, Non-Specific Colitis, and Pervasive Developmental Disorder in Children, 351 Lancet 637, 641 (1998).

85 Id. at 637.

86 Murch, Simon H. et al., Retraction of an Interpretation, 363 Lancet 750, 750 (2004). Dr. Wakefield did not join the retraction.

87 Taylor, Brent et al., Autism and Measles, Mumps, and Rubella Vaccine: No Epidemiological Evidence for a Causal Association, 353 Lancet 2026, 2026-29 (1999). Furthermore, the British General Medical Council found that Dr. Wakefield was guilty of multiple transgressions, including dishonesty, financial misconduct, and “callous disregard” of the suffering of the children who were his research subjects. General Medical Council, Dr. Andrew Jeremy Wakefield, Determination on Serious Professional Misconduct (SPM) and Sanction 8 (May 24, 2010), available at http://www.gmc-uk.org/Wakefield_SPM_and_SANCTION.pdf_32595267.pdf.

88 Measles, Mumps, and Rubella (MMR) Vaccine, Ctrs. For Disease Control And Prevention, http://www.cdc.gov/vaccinesafety/Vaccines/MMR/MMR.html (last updated Feb. 7, 2011).

89 Kohane, supra note 9, at 417.

90 Juran, Brian D. & Lazaridis, Konstantinos N., Genomics in the Post-GWAS Era, 31 Sem. in Liver Disease 215, 215 (2011); Lambert, Christophe G. & Black, Laura J., Learning From Our GWAS Mistakes: From Experimental Design to Scientific Method, 13 Biostatistics 195, 196 (2012).

91 Dictionary of Cancer Terms, Nat’l Cancer Inst., http://www.cancer.gov/dictionary?cdrid=636779 (last visited Oct. 25, 2013).

92 Nat’l Human Genome Res. Inst., A Catalog of Published Genome-Wide Association Studies, GENOME.GOV, http://www.genome.gov/gwastudies/ (last visited Oct. 26, 2013).

93 Hunter, David J., Lessons from Genome-Wide Association Studies for Epidemiology, 23 Epidemiology 363, 363 (2012) (stating that “GWAS-discovered variants are relatively ‘weak’ risk factors” and “are not modifiable factors with direct potential to reduce disease incidence” but will improve “understanding of disease mechanisms” and perhaps facilitate “identification of persons at higher or lower risk of specific diseases”); Juran & Lazaridis, supra note 90, at 215-16.

94 Lambert & Black, supra note 90, at 196-97.

95 Holmans, P.A. et al., Genomewide Linkage Scan of Schizophrenia in a Large Multicenter Pedigree Sample Using Single Nucleotide Polymorphisms, 14 Molecular Psych. 786, 786-87 (2009).

96 An allele “is one of two or more versions of a gene.” Thus, the term “allele” is used when there is “variation among genes.” Nat’l Human Genome Res. Inst., Allele, Genome.Gov, http://www.genome.gov/glossary/?id=4 (last visited Oct. 25, 2013).

97 Holmans et al., supra note 95, at 787.

98 42 U.S.C. § 1320e (Supp IV. 2010); Inst. Of Med., Initial National Priorities For Comparative Effectiveness Research (2009), available at http://www.iom.edu/Reports/2009/ComparativeEffectivenessResearchPriorities.aspx (emphasizing the need for CER and proposing initial CER priorities).

99 42 U.S.C. § 1320e(a)(2)(A) (Supp. IV 2010).

100 See id. § 1320e(d)(2)(A). See Concato, John et al., Observational Methods in Comparative Effectiveness Research, 123 Am. J. Med. e16, e16 (2010); Schneeweiss, S. et al., Assessing the Comparative Effectiveness of Newly Marketed Medications: Methodological Challenges and Implications for Drug Development, 90 Clin. Pharmacology & Therapeutics 777, 777 (2011) (discussing the use of “secondary health-care data, including electronic medical records” for purposes of CER); Vandenbroucke, supra note 81, at 340.

101 See 42 U.S.C. § 1320e(d)(2)(A) (Supp. IV 2010); Manchikanti, L. et al., Facts, Fallacies, and Politics of Comparative Effectiveness Research: Part 1. Basic Consideration, 13 Pain Physician E23, E39 (2010); Elshaug, Adam G. & Garber, Alan M., How CER Could Pay for Itself – Insights from Vertebral Fracture Treatments, 364 New Eng. J. Med. 1390, 1392-93 (2011).

102 Chan, Kitty S. et al., Electronic Health Records and the Reliability and Validity of Quality Measures: A Review of the Literature, 67 Med. Care Res. & Rev. 503, 504 (2010).

103 Id.; Parsons, Amanda et al., Validity of Electronic Health Record-Derived Quality Measurement for Performance Monitoring, 19 J. Am. Med. Informatics Ass’n 604, 609 (2012) (finding that “EHR-derived quality measurement has limitations due to several factors, most notably variations in EHR content, structure and data format, as well as local data capture and extraction procedures”); Roski, Joachim & McClellan, Mark, Measuring Health Care Performance Now, Not Tomorrow: Essential Steps to Support Effective Health Reform, 30 Health Aff. 682, 683 (2011).

104 See Horvath, Monica M. et al., The DEDUCE Guided Query Tool: Providing Simplified Access to Clinical Data for Research and Quality Improvement, 44 J. Biomed. Informatics 266, 273 (2011) (stating that Duke University Hospital sought data in order to evaluate the effects of new health information technology that it had implemented).

105 See Chan et al., supra note 102, at 504; Tang, Paul C. et al., Comparison of Methodologies for Calculating Quality Measures Based on Administrative Data Versus Clinical Data from an Electronic Health Record System: Implications for Performance, 14 J. Am. Med. Informatics Ass’n 10, 10 (2007).

106 Ross, Joseph S. et al., State-Sponsored Public Reporting of Hospital Quality: Results Are Hard to Find and Lack Uniformity, 29 Health Aff. 2317, 2318-19 (2010); HANYS QUALITY INST., UNDERSTANDING PUBLICLY REPORTED HOSPITAL QUALITY MEASURES: INITIAL STEPS TOWARD ALIGNMENT, STANDARDIZATION, AND VALUE, 1-3 (2007), available at http://www.hanys.org/publications/upload/hanys_quality_report_card.pdf.

107 See Ross et al., supra note 106, at 2318; What Is Hospital Compare?, U.S. DEP’T. HEALTH & HUMAN SERVS., http://www.hospitalcompare.hhs.gov/About/WhatIs/What-Is-HOS.aspx (last visited Oct. 22, 2013).

108 See Evans, Barbara J., Seven Pillars of a New Evidentiary Paradigm: The Food, Drug, and Cosmetic Act Enters the Genomic Era, 85 Notre Dame L. Rev. 419, 479–85 (2010) (discussing “infrastructure to develop evidence for postmarket observational studies”).

109 FDAAA of 2007, Pub. L. No. 110-85, 121 Stat. 823 (codified as amended in scattered sections of 21 U.S.C.).

110 21 U.S.C. § 355(o)(3) (Supp. IV 2010) (discussing post-approval studies).

111 See id. § 355(o)(3)(D) (stating that clinical trials should be conducted only if other types of studies would be inadequate).

112 See supra notes 45, 50-52 and accompanying text.

113 Port, Friedrich K., Role of Observational Studies Versus Clinical Trials in ESRD Research, 57 Kidney Int’l S3, S3 (2000), available at http://www.nature.com/ki/journal/v57/n74s/full/4491615a.html (stating that “[r]andomized controlled clinical trials have been considered by many to be the only reliable source for information in health services research”). See also Hoffman, Sharona, The Use of Placebos in Clinical Trials: Responsible Research or Unethical Practice?, 33 Conn. L. Rev. 449, 452-54 (2001) (describing different designs of clinical trials).

114 See Evans, supra note 108, at 439-50; Vandenbroucke, supra note 81, at 339.

115 Evans, supra note 108, at 439-50 (arguing that observational research and randomized clinical trials are each preferable in different circumstances, depending on the particulars of the research hypothesis).

116 21 U.S.C. § 355-1 (Supp. IV 2010).

117 Id. § 355(o)(4).

118 Id. § 355(e).

119 Xanodyne Agrees to Remove Propoxyphene from U.S. Market, U.S. FOOD & DRUG ADMIN., (Nov. 19, 2010), http://www.fda.gov/NewsEvents/Newsroom/PressAnnouncements/ucm234350.htm (stating that the FDA based its request in part on a review of “postmarketing safety databases”).

120 Coloma, Preciosa M. et al., Combining Electronic Healthcare Databases in Europe to Allow for Large-Scale Drug Safety Monitoring: The EU-ADR Project, 20 Pharmacoepi. & Drug Safety 1 (2011); Trifirò, Gianluca et al., Data Mining on Electronic Health Record Databases for Signal Detection in Pharmacovigilance: Which Events to Monitor? 18 Pharmacoepi. & Drug Safety 1176, 1177 (2009).

122 See WELCOME TO THE EU-ADR WEBSITE, available at http://euadr-project.org/drupal/ (last visited Nov. 9, 2013).

123 See id.; Coloma et al., supra note 120, at 2; Trifirò et al., supra note 120, at 1177.

124 Resnic, Frederic S. & Normand, Sharon-Lise T., Postmarketing Surveillance of Medical Devices – Filling in the Gaps, 366 New Eng. J. Med. 875, 875 (2012).

125 See id.

126 See id.; Hauser, Robert G., Here We Go Again – Another Failure of Postmarketing Device Surveillance, 366 New Eng. J. Med. 873, 873-74 (2012).

127 Hauser, supra note 126, at 874.

128 Resnic & Normand, supra note 124, at 876; Welcome to INTERMACS, UAB SCHOOL MED., http://www.uab.edu/medicine/intermacs/ (last visited Oct. 22, 2013).

129 Resnic & Normand, supra note 119, at 877.

130 Leslie Lenert & David Sundwall, Public Health Surveillance and Meaningful Use Regulations: A Crisis of Opportunity, 102 AM. J. PUB. HEALTH e1, e1 (2012).

131 Resnic & Normand, supra note 124, at 876; Welcome to INTERMACS, supra note 128.

132 Lenert & Sundwall, supra note 130, at e1-e2 (arguing that the infrastructure of contemporary public health authorities is inadequate for the task of receiving and processing such large amounts of information).

133 Hoffman, Sharona & Podgurski, Andy, Big Bad Data: Law, Public Health, and Biomedical Databases, 41 J.L. Med. & Ethics 56, 56 (2013).

134 See, e.g., eHealth in Public Health, CAL. DEPT. HEALTH, http://www.cdph.ca.gov/data/informatics/Pages/eHealth.aspx (last visited Oct. 25, 2013); Disease Prevention, MONROE CNTY., http://www2.monroecounty.gov/health-diseases.php (last visited Oct. 25, 2013); HIV/AIDS Program Home, IOWA DEPT. PUB. HEALTH, http://www.idph.state.ia.us/HivStdHep/HIV-AIDS.aspx?prog=Hiv&pg=HivHome (last visited Oct. 25, 2013).

135 Hazlehurst, Brian et al., Detecting Possible Vaccine Adverse Events in Clinical Notes of the Electronic Medical Record, 27 Vaccine 2077, 2077 (2009).

136 Id. at 2081.

137 See State Vaccination Requirements, CTRS. FOR DISEASE CONTROL & PREVENTION, http://www.cdc.gov/vaccines/vac-gen/laws/state-reqs.htm (last modified Sept. 30, 2011); State Law and Vaccine Requirements, NAT’L VACCINE INFO. CTR., http://www.nvic.org/vaccine-laws/state-vaccine-requirements.aspx (last visited Oct. 22, 2013).

138 See Paneth-Pollak, Rachel et al., Using STD Electronic Medical Record Data to Drive Public Health Program Decisions in New York City, 100 Am. J. Pub. Health 586, 586 (2010).

139 Id.

140 Id. at 589.

141 Herwehe, Jane et al., Implementation of an Innovative, Integrated Electronic Medical Record (EMR) and Public Health Information Exchange for HIV/AIDS, 19 J. Am. Med. Informatics Ass’n 448, 448 (2012).

142 Id. at 448-49.

143 Id. at 452 (Louisiana has developed similar alerts for tuberculosis patients in need of follow-up care.).

144 Hoffman, Sharona & Podgurski, Andy, Finding A Cure: The Case for Regulation and Oversight of Electronic Health Record Systems, 22 Harv. J.L. & Tech. 103, 124 (2008).

145 See Inpatient Hospital Discharge Data, CAL. DIABETES PROGRAM, http://www.caldiabetes.org/content.cfm?contentID=487&CategoriesID=31&CFID=5020870&CFTOKEN=92167121 (last visited Oct. 22, 2013); Health Care Information User Manual, Texas Hospital Inpatient Discharge Public Use Data File, TEX. DEP't STATE HEALTH SERVS. http://www.dshs.state.tx.us/thcic/hospitals/Inpatientpudf.shtm (last updated Aug. 12, 2013); VUHDDS Frequently Asked Questions, VT. DEP't HEALTH, http://healthvermont.gov/research/hospital-utilization/VHUR_FAQS.aspx (last visited Oct. 25, 2013).

146 See Herwehe et al., supra note 141, at 452; Rodwin, Marc A., Patient Data: Property, Privacy & the Public Interest, 36 Am. J.L. & Med. 586, 615 (2010). See also Hoffman & Podgurski, supra note 8, at 95-97, 104-07, 128-31 (discussing de-identification of data).

147 See Welcome to MedMining, MEDMINING, http://www.medmining.com/index.html (last visited Oct. 22, 2013); Request Data, STRATEGIC HEALTHCARE PROGRAMS, LLC, https://www.shpdata.com/company/requestdata.aspx (last visited Oct. 22, 2013).

148 Rodwin, supra note 146, at 590.

149 Id. at 589.

150 FAIGMAN ET AL., supra note 20, at 339-40.

151 Id. at 341; Norris v. Baxter Healthcare Corp., 397 F.3d 878, 882 (10th Cir. 2005) (noting that “the body of epidemiology largely finds no association between silicone breast implants and immune system diseases”).

152 Milberger, Sharon et al., Tobacco Manufacturers’ Defence Against Plaintiffs’ Claims of Cancer Causation: Throwing Mud at the Wall and Hoping Some of It Will Stick, 15 Tobacco Control iv17, iv22 (Supp. IV 2006).

153 See Bowen v. E.I. Du Pont De Nemours & Co., No. Civ.A. 97C-06-194 CH, 2005 WL 1952859, at *4 (Del. Super. Ct. 2005) (involving a claim by defendant that the injuries and condition in question “constitute CHARGE Syndrome, which is generally thought to be genetic, as opposed to environmental, in origin”).

154 Saccone, Nancy L. et al., Multiple Independent Loci at Chromosome 15q25.1 Affect Smoking Quantity: a Meta-Analysis and Comparison with Lung Cancer and COPD, 8 PLoS Genetics 1, 3 (2010); Thorgeirsson, Thorgeir E. et al., Sequence Variants at CHRNB3-CHRNA6 and CYP2A6 Affect Smoking Behavior, 42 Nature Genetics 448, 448 (2010).

155 Brennan, Paul et al., Genetics of Lung-Cancer Susceptibility, 12 Lancet Oncology 399, 403-04 (2011); Broderick, Peter et al., Deciphering the Impact of Common Genetic Variation on Lung Cancer Risk: A Genome-Wide Association Study, 69 Cancer Res. 6633, 6633 (2009); Cho, Michael H. et al., A Genome-Wide Association Study of COPD Identifies A Susceptibility Locus on Chromosome 19q13, 21 Human Molecular Genetics 947, 948-49 (2012); Saccone et al., supra note 154, at 3; Thorgeirsson et al., supra note 154, at 448.

156 Hakim, Alan J. et al., The Genetic Contribution to Carpal Tunnel Syndrome in Women: A Twin Study, 47 Arthritis & Rheumatism 275, 277 (2002); Lozano-Calderon, Santiago et al., The Quality and Strength of Evidence for Etiology: Example of Carpal Tunnel Syndrome, 33A J. Hand Surgery Am. 525, 532-33 (2008).

157 Gold, supra note 23, at 412; Hoffman, Diane E. & Rothenberg, Karen H., Judging Genes: Implications of the Second Generation of Genetic Tests in the Courtroom, 66 Md. L. Rev. 858, 867 (2007); Marchant, Gary E., Genetic Data in Toxic Tort Litigation, 14 J.L. & Pol’y 7, 12 (2006); Poulter, Susan, Genetic Testing in Toxic Injury Litigation: The Path to Scientific Certainty or Blind Alley?, 41 Jurimetrics J. 211, 217-20 (2001).

158 EEOC v. Burlington N. and Santa Fe Ry. Co., No. 02-C-0456, 2002 WL 32155386, at *1 (E.D. Wis. 2002).

159 Id. The case settled before trial.

160 Milberger et al., supra note 152, at iv22 tbl. 6; Mehlman v. Philip Morris, Inc., No. L-1141-99 (Sup. Ct. N.J. filed Feb. 4, 1999), available at http://legacy.library.ucsf.edu/tid/ekz52d00/pdf (Legacy Tobacco Documents Library).

161 Milberger et al., supra note 152, at iv22; Stephen D. Sugarman, Address at the Robert Wood Johnson Foundation's SAPRP Conference: Tobacco Litigation Update (revised as of November 5, 2001) 2 (Nov. 14, 2001), available at http://www.law.berkeley.edu/sugarman/tobacco_litigation_upate_october_2001_.doc. The decedent, plaintiff's wife, had stopped smoking 30 years before her death.

162 See infra Part III.C.

163 See supra Part II.A. (discussing database initiatives).

164 See infra Parts III.D., IV (discussing software failures and the challenges of causal inference).

165 See infra Part VI.A.

166 WIN PHILLIPS & YANG GONG, HUMAN COMPUTER INTERACTION: INTERACTING IN VARIOUS APPLICATION DOMAINS 589, 591 (Julie A. Jacko ed., 2009).

167 Ancker et al., supra note 40, at 61; Botsis et al., supra note 40, at 3-4; Hoffman, Sharona & Podgurski, Andy, E-Health Hazards: Provider Liability and Electronic Health Record Systems, 24 Berkeley Tech. L.J. 1523, 1544-45 (2009) (discussing input errors).

168 Hirschtick, Robert E., Copy-and-Paste, 295 J. Am. Med. Ass’n 2335, 2335-36 (2006); Roszell, Sheila & Stewart, Cheryl, E-charting Point-of-Care Data Entry Dilemma, 38 J. Nursing Admin. 417, 417 (2008).

169 PHILLIPS & GONG, supra note 166, at 591.

170 Id.

171 Goldberg, Saveli I. et al., Analysis of Data Errors in Clinical Research Databases, 2008 AMIA Ann. Symp. Proc. 242, 244.

172 Id. A second study by the same authors examined weight measurement errors. An algorithm checked the weight records of 25,000 patients, including 420,469 weight entries. It found errors in 0.58% of entries in the records of “up to 7% of all patients.” See Goldberg, Saveli et al., A Weighty Problem: Identification, Characteristics and Risk Factors for Errors in EMR Data, 2010 AMIA Ann. Symp. Proc. 251, 253-54.

173 Nahm, Meredith L. et al., Quantifying Data Quality for Clinical Trials Using Electronic Data Capture, 3 PLoS ONE 1, 1 (2008) (discussing a literature review of “42 articles that provided source-to-database error rates, primarily from registries” and finding that the “average error rate across these publications was 976 errors per 10,000 fields”). See also Haerian, Krystl et al., Use of Clinical Alerting to Improve the Collection of Clinical Research Data, 2009 AMIA Ann. Symp. Proc. 218, 218 (discussing data error rates pertaining to research databases).

174 Shachak, Aviv et al., Primary Care Physicians’ Use of an Electronic Medical Record System: A Cognitive Task Analysis, 24 J. Gen. Internal Med. 341, 342-44 (2009).

175 Pippins, Jennifer R. et al., Classifying and Predicting Errors of Inpatient Medication Reconciliation, 23 J. Gen. Internal Med. 1414, 1414 (2008).

176 Id. at 1416.

177 Id.

178 Id.

179 Id. at 1417.

180 Hripcsak, George et al., Bias Associated with Mining Electronic Health Records, 6 J. Biomed. Discovery & Collaboration 48, 52 (2011).

181 Gallivan, Steve & Pagel, Christina, Modelling of Errors in Databases, 11 Health Care Mgmt. Sci. 35, 39 (2008); Pagel, Christina & Gallivan, Steve, Exploring Potential Consequences on Mortality Estimates of Errors in Clinical Databases, 20 IMA J. Mgmt. Mathematics 385, 391 (2009).

182 See Jennifer Dobner, Fallout Grows from Hacking of Utah Health Database, REUTERS (Apr. 9, 2012), http://www.reuters.com/article/2012/04/10/us-usa-hackers-utah-idUSBRE83904G20120410 (discussing an incident in which Eastern European hackers gained access to state health records of over 780,000 patients).

183 Greenland, Sander, Multiple-Bias Modelling for Analysis of Observational Data, 168 J. Royal Stat. Soc’y: Series A (Stat. in Soc’y) 267, 267-68 (2005).

184 Cox, Murray P. et al., SolexaQA: At-A-Glance Quality Assessment of Illumina Second-Generation Sequencing Data, 11 BMC Bioinformatics 485, 485 (2010); Klimke, William et al., Solving the Problem: Genome Annotation Standards Before the Data Deluge, 5 Standards in Genomic Scis. 168, 168 (2011); Schnoes, Alexandra M. et al., Annotation Error in Public Databases: Misannotation of Molecular Function in Enzyme Superfamilies, 5 PLoS Computational Biology 1, 1 (Dec. 2009).

185 Schnoes et al., supra note 184, at 2, 6.

186 Klimke et al., supra note 184, at 169.

187 Id. at 168.

188 Id. at 170.

189 Newgard, Craig D. et al., Electronic Versus Manual Data Processing: Evaluating the Use of Electronic Health Records in Out-of-Hospital Clinical Research, 19 Acad. Emergency Med. 217, 224 (2012).

190 Brookhart, M. Alan et al., Confounding Control in Healthcare Database Research: Challenges and Potential Approaches, 48 Med. Care S114, S115 (2010) (explaining that one of the limitations of healthcare databases is that “because the data were not collected as part of [a] designed study, many variables that the researcher might wish to have access to remain unrecorded”).

191 Hripcsak et al., supra note 180, at 50. It is especially challenging to analyze the effects of treatments or exposures in the face of data with missing items if they are not missing completely at random. Such non-random omissions create the potential for biased results. Mallinckrodt, Craig H. et al., Assessing and Interpreting Treatment Effects in Longitudinal Clinical Trials with Missing Data, 53 Biological Psychiatry 754, 755 (2003).

192 Newgard et al., supra note 189, at 225.

193 Id.

194 Interoperable systems can communicate with each other, exchange data, and operate seamlessly and in a coordinated fashion across organizations. BIOMEDICAL INFORMATICS: COMPUTER APPLICATIONS IN HEALTH CARE & BIOMEDICINE 952 (Edward H. Shortliffe & James J. Cimino eds., 2006).

195 Botsis et al., supra note 40, at 4 (stating that the EHR system that was mined for purposes of the study did not contain records of patients who were transferred to dedicated cancer centers because of the severity of their disease or who had initially been treated elsewhere).

196 Ramakrishnan, Naren et al., Mining Electronic Health Records, Computer 95, 96 (2010), available at http://people.cs.vt.edu/ramakris/papers/ehrmining10.pdf (discussing “the lack of data standards”).

197 Id. at 95.

198 HHS Issues Final ICD-10 Sets and Updated Electronic Transaction Standards Rules, U.S. DEP't OF HEALTH & HUMAN SERVS. (Jan. 15, 2009), http://www.hhs.gov/news/press/2009pres/01/20090115f.html; ICD-10, CTRS. FOR MEDICARE & MEDICAID SERVS., http://www.cms.gov/Medicare/Coding/ICD10/index.html?redirect=/ICD10 (last modified Sept. 9, 2013) (indicating that HHS published a proposed rule that would delay the compliance date, setting it at October 1, 2014 rather than October 1, 2013); ICD-10 Code Set to Replace ICD-9, AM. MED. ASS’N, http://www.ama-assn.org/ama/pub/physician-resources/solutions-managing-your-practice/coding-billing-insurance/hipaahealth-insurance-portability-accountability-act/transaction-code-set-standards/icd10-code-set.page (last visited Oct. 22, 2010).

199 de Lusignan, Simon et al., Routinely-Collected General Practice Data Are Complex, but With Systematic Processing Can Be Used for Quality Improvement and Research, 14 Informatics in Primary Care 59, 62 (2006) (analyzing a “picking list … taken from a general practice computer system”).

200 Ramakrishnan et al., supra note 196, at 95; Jensen, Peter V. et al., Mining Electronic Health Records: Towards Better Research Applications and Clinical Care, 13 Nature Revs. Genetics 395, 401 (2012) (mentioning “systematic erroneous use of disease terminology codes caused by strategic billing”); Kohane, supra note 9, at 424 (asserting that “the primary driver of EHR implementation has been clinical reimbursement rather than the potential for reuse of the clinical data for research”).

201 OFFICE INSPECTOR GEN., U.S. DEPT. HEALTH & HUMAN SERVS., TOP MANAGEMENT & PERFORMANCE CHALLENGES (2012), https://oig.hhs.gov/reports-and-publications/top-challenges/2012/issue09.asp.

202 Brunt, Christopher S., CPT Fee Differentials and Visit Upcoding Under Medicare Part B, 20 Health Econ. 831, 840 (2011). The 2.13 billion figure is in 2007 dollars. Id.

203 Id.

204 Andrea K. Walker, Medical Billing a Target of Fraud Investigations, BALT. SUN, Jan. 12, 2012, http://articles.baltimoresun.com/2012-01-12/health/bs-hs-umms-malnutrition-response-2-20120112_1_health-care-fraud-coding-billing.

205 See, e.g., Liaw, Siaw-Teng et al., Data Quality and Fitness for Purpose of Routinely Collected Data – A General Practice Case Study from an Electronic Practice-Based Research Network (ePBRN), 2011 AMIA Ann. Symp. Proc. 785, 789 (noting a “lack of implemented terminology and coding standards”).

206 Botsis et al., supra note 40, at 4.

207 See AM. MED. ASS’N, supra note 198.

208 De Lusignan et al., supra note 199, at 62.

209 Id.

210 Ancker et al., supra note 40, at 61.

211 Id.

212 Christopher G. Chute, Medical Concept Representation, in MEDICAL INFORMATICS: KNOWLEDGE MANAGEMENT & DATA MINING IN BIOMEDICINE 163, 170 tbl. 6-1 (Hsinchun Chen et al. eds., 2010).

213 Rosenbloom, S. Trent et al., Data from Clinical Notes: A Perspective on the Tension Between Structure and Flexible Documentation, 18 J. Am. Med. Informatics Ass’n 181, 181-82 (2011).

214 Id. at 184 (stating that some physicians prefer the flexibility and expressivity of notes); Ramakrishnan et al., supra note 196, at 96-97 (explaining that “much of the relevant data is ‘locked up’ in free text documents”).

215 Ramakrishnan et al., supra note 196, at 97.

216 Kohane, supra note 9, at 420.

217 Kho et al., supra note 12, at 2-4; Ramakrishnan et al., supra note 196, at 97.

218 Benin, Andrea L. et al., How Good Are the Data? Feasible Approach to Validation of Metrics of Quality Derived from an Outpatient Electronic Health Record, 26 Am. J. Med. Quality 441, 441 (2011).

219 See Hoffman & Podgurski, supra note 167, at 1552.

220 Hatton, Les, The Chimera of Software Quality, 40 Computer 104, 104 (2007).

221 See Kelly, Diane F., A Software Chasm: Software Engineering and Scientific Computing, 24 IEEE Software 118, 118-20 (Nov.-Dec. 2007); Hatton, supra note 220, at 104; Sanders, Rebecca & Kelly, Diane, Dealing with Risk in Scientific Software Development, 25 IEEE Software 21, 27 (July-Aug. 2008).

222 See Kelly, supra note 221, at 118.

223 See Henderson-MacLennan, Nicole K. et al., Pathway Analysis Software: Annotation Errors and Solutions, 101 Molecular Genetics & Metabolism 134, 137-38 (2010); Sanders & Kelly, supra note 221, at 25.

224 See supra Part III.

225 KENNETH J. ROTHMAN ET AL., MODERN EPIDEMIOLOGY 148-49 (3d ed. 2008).

226 Id. at 149. According to one source, a “confidence interval calculated for a measure of treatment effect shows the range within which the true treatment effect is likely to lie (subject to a number of assumptions).” Huw T. O. Davies & Iain K. Crombie, What Are Confidence Intervals and P-Values?, WHAT IS…? SERIES (Apr. 2009), available at http://www.medicine.ox.ac.uk/bandolier/painres/download/whatis/what_are_conf_inter.pdf.

227 See DAVID L. FAIGMAN ET AL., MODERN SCIENTIFIC EVIDENCE: THE LAW AND SCIENCE OF EXPERT TESTIMONY § 4:16 (2008); ROTHMAN ET AL., supra note 225, at 196.

228 Miller, Franklin G., Research on Medical Records Without Informed Consent, 36 J.L. Med. & Ethics 560, 560 (2008); see COMM. ON HEALTH RESEARCH & THE PRIVACY OF INFO.: THE HIPAA PRIVACY RULE, INST. OF MED. (IOM), BEYOND THE HIPAA PRIVACY RULE: ENHANCING PRIVACY, IMPROVING HEALTH THROUGH RESEARCH 209 (Sharyl J. Nass et al., 2009) [hereinafter IOM REPORT].

229 See IOM REPORT, supra note 228, at 213-14.

230 Id. at 212.

231 Hernán, Miguel A. et al., A Structural Approach to Selection Bias, 15 Epidemiology 615, 615 (2004) (explaining that “the common consequence of selection bias is that the association between exposure and outcome among those selected for analysis differs from the association among those eligible”).

232 See IOM REPORT, supra note 228, at 209.

233 Cole, Stephen R., Illustrating Bias Due to Conditioning on a Collider, 39 Int’l J. Epidemiology 417, 417 (2010).

234 ROTHMAN ET AL., supra note 225, at 185.

235 Hernán et al., supra note 231, at 618.

236 Hernán, Miguel A., A Definition of Causal Effect for Epidemiological Research, 58 J. Epidemiology & Community Health 265, 265 (2004).

237 Id.

238 ROTHMAN ET AL., supra note 225, at 185.

239 Hernán et al., supra note 231, at 617-18.

240 Collider-stratification bias may also occur because of a poorly conceived attempt to adjust for confounding bias, discussed below. Hernán et al., supra note 231, at 620 (stating that “[a]lthough stratification is commonly used to adjust for confounding, it can have unintended effects”).

241 See Greenland, Sander, Quantifying Biases in Causal Models: Classical Confounding vs. Collider-Stratification Bias, 14 Epidemiology 300, 306 (2003).

242 See id. at 301.

243 Hernán et al., supra note 231, at 615.

244 See Psaty, Bruce M. & Siscovick, David S., Minimizing Bias Due to Confounding by Indication in Comparative Effectiveness Research, 304 J. Am. Med. Ass’n 897, 897 (2010).

245 Hernán et al., supra note 231, at 618.

246 See Bosco, Jaclyn L.F. et al., A Most Stubborn Bias: No Adjustment Method Fully Resolves Confounding by Indication in Observational Studies, 63 J. Clinical Epidemiology 64, 70 (2010).

247 See Brookhart et al., supra note 190, at S115.

248 Id.

249 See ROTHMAN ET AL., supra note 225, at 158.

250 Brookhart et al., supra note 190, at S114.

251 Bosco et al., supra note 246, at 64 (stating that “confounding is best controlled by a randomized design”).

252 See, e.g., id.

253 Id. at 64-65.

254 Id. at 65.

255 Psaty & Siscovick, supra note 244, at 898.

256 Id.

257 Id.

258 ROTHMAN ET AL., supra note 225, at 146-47 (discussing generalizability).

259 Id. at 147.

260 Id. at 266.

261 Id. at 271.

262 Hammer, Gael P. et al., Avoiding Bias in Observational Studies, 106 Deutsches Ärzteblatt Int’l 664, 665 (2009).

263 Id.

264 Id.

265 See id.

266 See id.

267 Brookhart et al., supra note 190, at S116. See supra Part III for discussion of deficiencies in EHR documentation.

268 ROTHMAN ET AL., supra note 225, at 137-38.

269 Hernán, Miguel A. & Cole, Stephen R., Causal Diagrams and Measurement Bias, 170 Am. J. Epidemiology 959, 960 (2009).

270 ROTHMAN ET AL., supra note 225, at 144-45.

271 Davey Smith, George, Big Business, Big Science?, 37 Int’l J. Epidemiology 1, 1 (2008) (stating that “corporate influences can distort the knowledge base of epidemiology” when epidemiologists work “as the hired guns of industry”).

272 See Prescription Drug Advertising: Questions and Answers, U.S. FOOD & DRUG ADMIN., http://www.fda.gov/Drugs/ResourcesForYou/Consumers/PrescriptionDrugAdvertising/UCM076768.htm#control_advertisements (last updated Sept. 13, 2012).

273 See Rivlin, Richard S., Can Garlic Reduce Risk of Cancer? 89 Am. J. Clinical Nutrition 17, 17 (2009) (asserting that “the very strict criteria required to make a health claim [about the benefits of garlic consumption] may not be met by the limited number of studies conducted to date that are currently available”).

274 See supra notes 1-6 and accompanying text.

275 Id.

276 Daniele Fanelli, Do Pressures to Publish Increase Scientists’ Bias? An Empirical Support from US States [sic] Data, 5 PLOS ONE 1, 4 (2010).

277 Id.

278 Id. at 1.

279 Id.

280 See, e.g., supra Part IV.

281 See Rossouw, Jacques E. et al., Postmenopausal Hormone Therapy and Risk of Cardiovascular Disease by Age and Years Since Menopause, 297 J. Am. Med. Ass’n 1465, 1465 (2007).

282 Vandenbroucke, supra note 79, at 1233.

283 Id. at 1233-34.

284 Id. at 1235. In addition, the risk of heart disease was found to increase in the first years of HRT use but then waned. Id. at 1234.

285 Moffat, Viva R., Regulating Search, 22 Harv. J.L. & Tech. 475, 481 (2009).

286 Goldman, Eric, Search Engine Bias and the Demise of Search Engine Utopianism, 8 Yale J.L. & Tech. 188, 193 (2006).

287 Facts about Google and Competition, GOOGLE, http://www.google.com/competition/howgooglesearchworks.html (last visited Oct. 22, 2013).

288 Google continually acquires new information by sending “automated ‘spiders’ and ‘crawlers’ onto the Web.” Moffat, supra note 285, at 481.

289 Goldman, supra note 286, at 193.

290 See Hoffman & Podgurski, supra note 8, at 97-102 (discussing the benefits of EHR-based research).

291 Kalra, Dipak et al., ARGOS Policy Brief on Semantic Interoperability, 170 Stud. in Health Tech. & Informatics 1, 5 (2011); see also supra Parts III.B.-III.C.

292 Goble, Carole & Stevens, Robert, State of the Nation in Data Integration for Bioinformatics, 41 J. Biomed. Informatics 687, 687 (2008) (stating that “the integration of resources – a prerequisite for most bioinformatics analysis – is a perennial and costly challenge”).

293 See supra notes 191-192 and accompanying text.

294 Veltman, Kim H., Syntactic and Semantic Interoperability: New Approaches to Knowledge and the Semantic Web, 7 New Rev. Info. Networking 159, 167 (2001). See also Dolin, Robert H. & Alschuler, Liora, Approaching Semantic Interoperability in Health Level Seven, 18 J. Am. Med. Informatics Ass’n 99, 99-100 (2010) (providing alternative definitions of “semantic interoperability”).

295 See Botsis et al., supra note 40, at 4 (stating that incompleteness “could be mitigated using health information exchange (HIE) methods”); Herwehe et al., supra note 141, at 448 (explaining that “[e]lectronic health information exchange (HIE) offers a provider-acceptable means of utilizing information from multiple sources”); Jensen et al., supra note 200, at 403 (stating that “EHR data need to be merged across regional barriers in order to provide the strongest basis for research”).

296 Ceusters, Werner & Smith, Barry, Semantic Interoperability in Healthcare State of the Art in the US, St. U.N.Y. Buffalo 1, 4 (2010), http://ontology.buffalo.edu/medo/Semantic_Interoperability.pdf.

297 Id.

298 M. Alexander Otto, Despite Small Steps, EHR Interoperability Remains Elusive, INTERNAL MED. NEWS (Jan. 31, 2011), http://www.internalmedicinenews.com/news/more-top-news/single-view/despite-small-steps-ehr-interoperability-remains-elusive/71b93edeb0.html.

299 Mike Miliard, EHR/HIE Interoperability Workgroup Agrees on Connectivity Specs, HEALTHCARE IT NEWS (Nov. 9, 2011), http://www.healthcareitnews.com/news/ehrhie-interoperability-workgroup-agrees-connectivity-specs; Official PR: 10 States Now Unified to Standardize Health Data Interoperability, EHR/HIE INTEROPERABILITY WORKGROUP (Feb. 20, 2012), http://interopwg.org/news/OFFICIAL-PR-10-States-Now-Unified-to-Standardize-Health-Data-Interoperability.html.

300 See Hoffman, Sharona & Podgurski, Andy, Finding A Cure: The Case for Regulation and Oversight of Electronic Health Record Systems, 22 Harv. J.L. & Tech. 103, 152-53 (2008) (recommending the development of a common exchange representation).

301 See 45 C.F.R. §§ 170.205, 170.207 (2012) (providing current health information exchange standards).

302 See EHR Incentive Program, CTRS. FOR MEDICARE & MEDICAID SERVS., https://www.cms.gov/Regulations-and-Guidance/Legislation/EHRIncentivePrograms/index.html?redirect=/EHRIncentivePrograms/30_Meaningful_Use.asp (last modified June 26, 2013).

303 See supra Part III.

304 Blanchet, Kevin D., Remote Patient Monitoring, 14 Telemed. & E-Health 127, 128-30 (2008); Technologies for Remote Patient Monitoring in Older Adults, CTR. FOR TECH. & AGING 1, 4 (2009), http://www.techandaging.org/RPMpositionpaperDraft.pdf.

305 Technologies for Remote Patient Monitoring, supra note 304, at 4.

306 See Michael E. Wiklund, Making Medical Device Interfaces More User-Friendly, in DESIGNING USABILITY INTO MEDICAL PRODUCTS 151–60 (Michael E. Wiklund & Stephen B. Wilcox eds., 2005) (discussing user-interface problems and techniques for enhancing the user-friendliness of medical device interfaces); Williams, Adrian, Design for Better Data: How Software and Users Interact Onscreen Matters to Data Quality, 77 J. Am. Health Info. Mgmt. Inst. 56, 56 (2006) (stating that “[p]oorly designed software that confronts the user with confusing screens, excessive data entry fields, or unclear navigational tools … threatens the quality of the data that users enter”).

307 See supra note 306; Terry, Ken, Voice Recognition Moves Up a Notch: When the Computer Can Type While You Talk, You Save Money and Time, 81 Med. Econ. Tcp 11 (2004).

308 See supra note 216 and accompanying text.

309 Haerian, Krystl et al., Use of Clinical Alerting to Improve the Collection of Clinical Research Data, 2009 AMIA Ann. Symp. Proc. 218, 219-20.

310 Id. at 219.

311 Id.

312 Id. at 220.

313 See supra Part II.B.1.

314 U.S. GOV't ACCOUNTABILITY OFFICE, GAO-06-54, HOSPITAL QUALITY DATA: CMS NEEDS MORE RIGOROUS METHODS TO ENSURE RELIABILITY OF PUBLICLY RELEASED DATA 5 (2006) (discussing the Centers for Medicare and Medicaid Services’ process “for ensuring the accuracy of the quality data submitted by hospitals for the APU program”); Fine, Leon G. et al., How to Evaluate and Improve the Quality and Credibility of an Outcomes Database: Validation and Feedback Study on the UK Cardiac Surgery Experience, 326 Brit. Med. J. 25, 25-26 (2003).

315 See Curran-Everett, Douglas & Benos, Dale J., Guidelines for Reporting Statistics in Journals Published by the American Physiological Society, 18 Physiology Genomics 249, 250 (2004) (discussing the importance of reporting uncertainty).

316 Id.

317 Kahn, Michael G. et al., A Pragmatic Framework for Single-Site and Multisite Data Quality Assessment in Electronic Health Record-Based Clinical Research, 50 Med. Care S21, S22 (2012).

318 Klimke et al., supra note 184, at 168 (describing methods to assess annotation quality, including combining different pieces of evidence “in order to assign confidence levels to a particular annotation”).

319 Ioannidis, John P. A. et al., Assessment of Cumulative Evidence on Genetic Associations: Interim Guidelines, 37 Int’l J. Epidemiology 120, 122 (2008).

320 Id. at 126.

321 FOOD & DRUG ADMIN., BEST PRACTICES FOR CONDUCTING AND REPORTING PHARMACOEPIDEMIOLOGIC SAFETY STUDIES USING ELECTRONIC HEALTHCARE DATA SETS (2013), available at http://www.fda.gov/downloads/Drugs/GuidanceComplianceRegulatoryInformation.Guidances/UCM243537.pdf. See also Sanderson, Simon et al., Tools for Assessing Quality and Susceptibility to Bias in Observational Studies in Epidemiology: A Systematic Review and Annotated Bibliography, 36 Int’l J. Epidemiology 666, 666-74 (2007) (providing guidance concerning observational studies but not specifically about EHR-based research).

322 See Boffetta, Paolo et al., Recommendations and Proposed Guidelines for Assessing the Cumulative Evidence on Joint Effects of Genes and Environments on Cancer Occurrence in Humans, 41 Int’l J. Epidemiology 686, 686-704 (2012); see generally Ioannidis et al., supra note 319.

323 Gibbons, Michael Christopher, Use of Health Information Technology Among Racial and Ethnic Underserved Communities, 8 Persp. in Health Info. Mgmt. 1, 6 (2011), available at http://perspectives.ahima.org/PDF/Winter_2011/Use_of_HIT_Among_Racial_and_Ethnic_Underserved_Communities/Use_of_HIT_Among_Racial_and_Ethnic_Underserved_Communities_final.pdf.

324 Brabham, Daren C., Crowdsourcing as a Model for Problem Solving: An Introduction and Cases, 14 Convergence 75, 76 (2008); Jeff Howe, Crowdsourcing: A Definition, CROWDSOURCING (June 2, 2006), http://crowdsourcing.typepad.com/cs/2006/06/crowdsourcing_a.html.

325 Tang, Paul C. et al., Personal Health Records: Definitions, Benefits, and Strategies for Overcoming Barriers to Adoption, 13 J. Am. Med. Informatics Ass’n 121, 122 (2006) (citing MARKLE FOUND., CONNECTING FOR HEALTH: THE PERSONAL HEALTH WORKING GROUP FINAL REPORT (2003) (defining a personal health record as “an electronic application through which individuals can access, manage and share their health information, and that of others for whom they are authorized, in a private, secure, and confidential environment”)).

326 In some cases, patients will be wrong about the existence of an error, and thus clinicians must scrutinize error reports before changing EHR entries.

327 See HIPAA Privacy Rule, 45 C.F.R. § 164.526 (2012) (“An individual has the right to have a covered entity amend protected health information or a record about the individual in a designated record set.”).

328 See Hoffman & Podgurski, supra note 157, at 1530, 1549 (describing secure messaging).

329 See Pennisi, Elizabeth, Proposal to ‘Wikify’ GenBank Meets Stiff Resistance, 319 Sci. 1598, 1598 (2008) (describing a controversy regarding the process for correcting errors in GenBank, “the U.S. public archive of sequence data”).

330 JUDEA PEARL, CAUSALITY 65-68 (2d ed. 2009); VanderWeele, Tyler J. & Staudt, Nancy C., Causal Diagrams for Empirical Legal Research: Methodology for Identifying Causation, Avoiding Bias, and Interpreting Results, 10 L. Probability & Risk 329, 329-30 (2011).

331 VanderWeele & Staudt, supra note 330, at 333; Swanson, Jeffrey & Ibrahim, Jennifer, Picturing Public Health Law Research: Using Causal Diagrams to Model and Test Theory, Pub. Health L. Res. 1, 6 (2011), http://publichealthlawresearch.org/sites/default/files/SwansonIbrahim-CausalDiagrams-March2012.pdf.

332 See supra note 307.

333 Swanson & Ibrahim, supra note 331, at 6.

334 Id.

335 VanderWeele & Staudt, supra note 330, at 332.

336 Id. at 329.

337 Brookhart et al., supra note 190, at S116.

338 Id.; Swanson & Ibrahim, supra note 331, at 1.

339 VanderWeele & Staudt, supra note 330, at 335.

340 PEARL, supra note 330, at 65-68; Shpitser, Ilya et al., On the Validity of Covariate Adjustment for Estimating Causal Effects, 26th Ann. Conf. on Uncertainty in Artificial Intell. (UAI-10) 527, 527-36 (2010); VanderWeele, Tyler J. & Shpitser, Ilya, A New Criterion for Confounder Selection, 67 Biometrics 1406, 1406 (2011).

341 PEARL, supra note 330, at 79-81 (explaining the “back-door criterion”).

342 Id.

343 VanderWeele & Staudt, supra note 330, at 335.

344 PEARL, supra note 330, at 72-76 (discussing how the effect of interventions is computed).

345 Id.

346 Id.

347 See supra notes 252-254 and accompanying text.

348 Id.

349 Attia, John et al., How to Use an Article About Genetic Association B: Are the Results of the Study Valid?, 301 J. Am. Med. Ass’n 191, 191 (2009); Geneletti, Sara et al., Assessing Causal Relationships in Genomics: From Bradford-Hill Criteria to Complex Gene-Environment Interactions and Directed Acyclic Graphs, 8 Emerging Themes in Epidemiology 1, 5 (2011). For example, researchers have found that standard statistical approaches for estimating or testing direct genetic effects may yield biased estimates when there is a non-genetic link between the target phenotype and another phenotype. Vansteelandt, Stijn et al., On the Adjustment for Covariates in Genetic Association Analysis: A Novel, Simple Principle to Infer Direct Causal Effects, 33 Genetic Epidemiology 394, 395 (2009).

350 Sheehan, Nuala A. et al., Mendelian Randomisation: A Tool for Assessing Causality in Observational Epidemiology, in Genetic Epidemiology 153, 153-66 (M. Dawn Teare ed., 2011); Alekseyenko, Alexander V. et al., Causal Graph-Based Analysis of Genome-Wide Association Data in Rheumatoid Arthritis, 6 Biology Direct 25, 26 (2011); Coughlin, Steven S., Quantitative Models for Causal Analysis in the Era of Genome Wide Association Studies, 4 Open Health Serv. Pol’y J. 118, 120 (2011). For example, Geneletti et al. present a framework for assessing causal relationships in clinical genomics that integrates Austin Bradford Hill's influential guidelines for assessing causality, on one hand, with the use of graphical models (depicting both causal and non-causal associations), on the other hand. See Geneletti et al., supra note 349, at 5-6.

351 Lefebvre, Celine et al., Reverse-Engineering Human Regulatory Networks, 4 Wiley Interdiscip. Rev. Syst. Biology Med. 311, 311 (2012). Such regulation occurs indirectly, via the products of gene expression, namely RNA and proteins. Id. at 312.

352 Barabási, Albert-László et al., Network Medicine: A Network-Based Approach to Human Disease, 12 Nature Rev. Genetics 56, 56 (2011).

353 Swanson & Ibrahim, supra note 331, at 1; Anderson, Evan et al., Measuring Statutory Law and Regulations for Empirical Research, Pub. Health L. Res. Program 1, 12 (2012), http://publichealthlawresearch.org/sites/default/files/MeasuringLawRegulationsforEmpiricalResearch-Monograph-AndersonTremper-March2012.pdf (stating that “[b]y forcing researchers to identify plausible links between the law and health outcomes, causal diagrams help flush out the legal inputs relevant to the question of interest”).

354 In addition, statistical analysis of causal effects based on a causal diagram is valid only if certain strong assumptions hold that relate the diagram to the underlying probability distribution of the variables. Dawid, A. Philip, Beware of the DAG, 6 J. Machine Learning Res. 59, 68 (2008), available at http://jmlr.csail.mit.edu/proceedings/papers/v6/dawid10a/dawid10a.pdf.

355 See Brookhart et al., supra note 190, at S116 (explaining that “in many studies of medical interventions, the available subject-matter knowledge is inadequate to specify with any degree of certainty the causal connections between variables”).

356 See supra notes 231, 233-38 and accompanying text. Assume the “collider” variable S indicates whether a study subject is lost to follow-up (1: yes, 0: no) and is influenced both by the disease outcome O (1: cured, 0: not cured) under investigation and by treatment T (1: drug A, 0: drug B). If the analysis includes only participants for whom S is zero (indicating “not lost to follow-up”), it implicitly conditions on S, so the path T→S←O is open and may create a spurious association between T and O, resulting in selection bias. For example, suppose that a number of study subjects stopped going to the doctor because of unpleasant side effects of drug A (assume drug B has no side effects) or because they experienced no improvement in their disease symptoms and became discouraged. Among subjects who received drug A, those who completed the treatment regimen might have experienced an atypically strong therapeutic effect from A, since they were willing to tolerate its side effects. Consequently, treatment A might appear more effective overall, when compared to treatment B, than it really is.
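
The selection-bias mechanism described in note 356 can be illustrated with a small simulation. The sketch below is only an illustration under assumed numbers, not an analysis drawn from any cited study: the function name `simulate`, the 50% cure rate, and the dropout probabilities in `p_lost` are all hypothetical. Treatment T is assigned at random and, by construction, has no effect on the outcome O; loss to follow-up S depends on both T and O, with uncured subjects on drug A the most likely to drop out.

```python
import random


def simulate(n=200_000, seed=0):
    """Compare cure rates by treatment arm, with and without selecting on S = 0."""
    rng = random.Random(seed)
    # Hypothetical probabilities of being lost to follow-up, keyed by (T, O).
    # Uncured subjects on drug A (side effects plus discouragement) drop out most.
    p_lost = {(1, 0): 0.6, (1, 1): 0.2, (0, 0): 0.2, (0, 1): 0.0}
    # counts[group][t] = [number cured, number of subjects]
    counts = {
        "all": {0: [0, 0], 1: [0, 0]},
        "completers": {0: [0, 0], 1: [0, 0]},
    }
    for _ in range(n):
        t = rng.randint(0, 1)                 # T: 1 = drug A, 0 = drug B
        o = 1 if rng.random() < 0.5 else 0    # O: cured with the same chance in both arms
        s = 1 if rng.random() < p_lost[(t, o)] else 0  # S: 1 = lost to follow-up
        groups = ["all"] if s else ["all", "completers"]
        for g in groups:
            counts[g][t][0] += o
            counts[g][t][1] += 1
    return {
        g: {t: cured / total for t, (cured, total) in arms.items()}
        for g, arms in counts.items()
    }


rates = simulate()
for group, by_arm in rates.items():
    print(group, "drug A:", round(by_arm[1], 3), "drug B:", round(by_arm[0], 3))
```

Under these assumed probabilities, the expected cure rate is 0.50 in both arms when every subject is counted. Among completers only, it is about 0.67 for drug A (0.40 / 0.60) and about 0.56 for drug B (0.50 / 0.90), so drug A appears more effective even though the simulated treatment has no effect at all.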

357 Genetic Disease Information – Pronto, HUMAN GENOME PROJECT INFO., http://web.archive.org/web/20130430183952/http://www.ornl.gov/sci/techresources/Human_Genome/medicine/assist.shtml (last modified Mar. 07, 2012) (accessed by searching for Human Genome Project Information in the Internet Archive); Understanding Human Genetic Variation, NAT’L INSTS. OF HEALTH OFFICE OF SCI. EDUC., http://science.education.nih.gov/supplements/nih1/genetic/guide/genetic_variation1.htm (last visited Oct. 15, 2013).

358 Genetic Disease Information – Pronto, supra note 357 (indicating that many other diseases are multi-factorial, chromosomal, and mitochondrial).

359 Understanding Human Genetic Variation, supra note 357.

360 Id.

361 See supra Parts II.A, II.B.1.

362 See supra Part V.

363 See supra Parts IV, VI.A.1.

364 See, e.g., Kuran, Timur & Sunstein, Cass R., Availability Cascades and Risk Regulation, 51 Stan. L. Rev. 683, 685 (1999) (discussing “the availability heuristic, a pervasive mental shortcut whereby the perceived likelihood of any given event is tied to the ease with which its occurrence can be brought to mind”); Tversky, Amos & Kahneman, Daniel, Availability: A Heuristic for Judging Frequency and Probability, 5 Cognitive Psychology 207, 207 (1973) (proposing “that when faced with the difficult task of judging probability or frequency, people employ a limited number of heuristics which reduce these judgments to simpler ones”).

365 Galesic, Mirta & Garcia-Retamero, Rocio, Statistical Numeracy for Health: A Cross-Cultural Comparison with Probabilistic National Samples, 170 Archives Internal Med. 462, 467 (2010). In addition, “almost 30% could not answer whether 1 in 10, 1 in 100 or 1 in 1000 represents the largest risk” and nearly 30% “could not state what percentage 20 of 100 is.” Id.

366 Gigerenzer, Gerd et al., Helping Doctors and Patients Make Sense of Health Statistics, 8 Psychology Sci. Pub. Int. 53, 54 (2007).

367 See Hoffman & Podgurski, supra note 8, at 140-41 (developing a more detailed proposal for educational programs regarding EHR databases).

368 DARRELL HUFF, HOW TO LIE WITH STATISTICS 8 (1954).

369 Id. at 9.

370 von Elm, Erik et al., The Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) Statement: Guidelines for Reporting Observational Studies, 61 J. Clinical Epidemiology 344, 346-47 (2008).

371 See Instructions for Authors, BMJ OPEN, http://bmjopen.bmj.com/site/about/guidelines.xhtml (last visited Oct. 15, 2013); JAMA Instructions for Authors, JAMA NETWORK, http://jama.jamanetwork.com/public/instructionsForAuthors.aspx (last updated Sept. 10, 2013); Types of Article and Manuscript Requirements, LANCET, http://www.thelancet.com/lancet-neurology-information-for-authors/article-types-manuscript-requirements (last visited Oct. 15, 2013).

372 Ioannidis, supra note 77, at 696.

373 Twain, Mark, Chapters from My Autobiography XX, 186 N. Am. Rev. 465, 471 (1907), reprinted in MARK TWAIN, CHAPTERS FROM MY AUTOBIOGRAPHY ch. 20, at 471 (Shelley Fisher Fishkin ed., Oxford Univ. Press 1996).