1. Introduction
In a recent issue of BMJ Evidence-Based Medicine (EBM), a letter signed by an interdisciplinary group of 42 scholars urged medical researchers to abandon the priority placed on randomized controlled trials (RCTs) in the evaluation of medical evidence (Anjum et al. 2020). The signatories included Nancy Cartwright, Jacob Stegenga, Alex Broadbent, and other notable philosophers of science. This preponderance of philosophers on the list is less surprising given that the effort to extend the notion of causation has been spearheaded by a network of philosophers of medicine collectively working under the label of “EBM+.” [Footnote 1] Considering this origin, what is impressive is the group’s success at reaching outside of philosophy to medical researchers, including those at the center of EBM (e.g., Jeffrey Aronson and Trish Greenhalgh at the Centre for Evidence-Based Medicine at the University of Oxford). This is doubly impressive given the sweeping changes the letter encourages in relation to medical evidence in general.
Though it might be the received view amongst philosophers of medicine, we argue that there are critical weaknesses in EBM+ that militate against some of the sweeping reforms its advocates propose. To clarify the stakes of the debate, we briefly summarize previous critiques of EBM+ and the responses offered in its defense in section 2. We contend that a difference in focus has led to proponents and critics talking past each other. Whereas proponents have focused on medical evidence in multiple contexts, the critics have focused on the narrow role that medical evidence plays in pharmaceutical regulation. Though both EBM and EBM+ purport to be a superior means of evaluating medical evidence in any context, it is at least possible that, for example, EBM+ is superior as a tool for making individual medical decisions yet inferior to traditional EBM approaches in regulatory science.
In this article, we leave other domains aside and focus on the role of RCTs and mechanistic evidence in pharmaceutical regulation. In section 2, we review past critiques of EBM+ and argue they have not been satisfactorily responded to. In section 3, we consider the case of aducanumab, which was recently approved for treating Alzheimer’s disease on grounds that are very similar to those advocated by EBM+. We argue that from the perspective of EBM+ its approval should be viewed as a likely success and that from the perspective of EBM its approval should be seen as a mistake. Because the ultimate value of aducanumab is yet to be determined, the case is not proposed as evidence for or against the EBM+ standard, but rather as an example of the kind of case that EBM+ proponents should be weighing in on to counter their critics and establish evidence for the reforms they propose. [Footnote 2] The advantage of engaging with “science in action” is argued for in section 4, where we take up and reply to objections. We conclude that future debate on EBM+ should be confined to particular domains of application.
2. The debate on EBM+
The first sustained critique of EBM+ arises in the work of Jeremy Howick (2011), whose concern echoes that of traditional EBM advocates: Mechanistic evidence is too unreliable. He argued, for example, that the antiarrhythmic drug tragedy was the result of relying on an incomplete mechanistic understanding of cardiac arrest. It was thought that cardiac arrests were precipitated by ventricular extra beats (VEBs) and thus that preventing VEBs would prevent their lethal sequelae. However, while the drugs were approved because they were successful at reducing VEBs, postapproval RCTs showed that antiarrhythmics dramatically increased patient mortality and had likely killed tens, if not hundreds, of thousands of people.
More recently, Bennett Holman (2019b) has argued that a fuller examination of the antiarrhythmic drug tragedy reveals that the fault was not with an unwarranted reliance on mechanistic reasoning, but with undue industry influence. However, while rejecting Howick’s argument, he offers EBM+ little support. Rather, Holman claims that both views are “friction-free” accounts of medicine in that they have been abstracting away from the influence of the pharmaceutical industry (i.e., the metaphorical friction). The essence of his critique is that both parties are debating the philosophical merits of different views of causation, without consideration of how those evidential regimes would perform in the actual social context of medical research, replete with nonevidential social forces that significantly impact the course and outcomes of medical research. A similar critique has been advanced by Mattia Andreoletti and David Teira (2019), who criticize EBM+ for taking a “platonic approach” in which epistemology precedes application.
While Howick (2019) has accepted and built from these critiques, EBM+ proponents have offered counterarguments. Most notably, Jon Williamson has responded that “association studies and mechanistic studies have complementary strengths … because they make up for one another’s deficiencies … their combined evidential value is more than the sum of their parts” (Williamson 2021, 204). But notice that this is not a response to the concerns raised by the critics; this is simply a restatement of the same argument for evidential pluralism that critics have already found wanting. Though EBM+ advocates have at times attempted to illustrate their view with case studies (Abdin et al. 2019; Auker-Howlett and Wilde 2019), they do not consider the types of cases most central to the concerns expressed by critics (i.e., safety and efficacy judgments pertaining to market entry). [Footnote 3] Yet not all members of the EBM+ group are similarly acontextual in their advocacy for medical pluralism. For example, Jacob Stegenga (2018) puts forward an argument that draws heavily on concerns around commercial influences on pharmaceutical research.
In response to both types of arguments, critics have issued essentially the same challenge: Provide detailed evidence from case studies that demonstrate society would be better off if it adopted the EBM+ standard (Andreoletti and Teira 2019; Holman 2019a; Howick 2019). However, if one is already inclined to accept the EBM+ standard of evidence, one might reasonably make the same demand of EBM proponents: Provide detailed evidence from case studies that demonstrate society is better off by having adopted the EBM standard. Both of these counterfactual arguments can be difficult to make. Especially because EBM is the dominant standard, it is difficult to show that if the EBM+ standard had been followed, a regulator would have correctly approved a drug they would have otherwise wrongly rejected, because a drug rejected in this way would never see the light of day. However, a recent approval by the US Food and Drug Administration (FDA) provides just such a test case, which we discuss in the following section.
3. EBM+ and pharmaceutical regulation: The case of aducanumab
On June 7, 2021, the FDA approved Biogen’s aducanumab (a.k.a. Aduhelm) for treatment of Alzheimer’s disease (AD) (FDA 2021). The approval gained considerable attention because aducanumab was the first AD medication to be approved by the FDA since 2003. However, due to the uncertainty surrounding the evidentiary basis of the approval, the decision immediately led to a controversy regarding the drug’s safety and efficacy. In section 3.1, we provide the background for the approval decision. Section 3.2 reviews how mechanistic evidence was used to supplement evidence of association to gain market approval and why this should be seen as a victory for the EBM+ standard of evidence. Section 3.3 argues that from an EBM perspective the approval is a mistake and thus why EBM predicts that the drug has a high likelihood of ultimately needing to be recalled. Finally, section 3.4 reprises the evidential significance of the case.
3.1 Background
Aducanumab is an amyloid-beta-directed antibody whose mechanism is based on the theory that the accumulation of amyloid-beta plaques in the brain contributes to the development of AD (Hardy 2009). From the mechanistic standpoint, aducanumab is designed to treat AD by binding to amyloid-beta plaques and removing them through a microglia-mediated phagocytosis mechanism (FDA 2020b, 14). Biogen’s development of aducanumab was a long and complex journey whose history can be traced back to the mid-2000s. [Footnote 4] After initially promising signs and high expectations, Biogen conducted two identically designed Phase 3 clinical trials (called EMERGE and ENGAGE) involving nearly 3,300 patients with AD from 2015 to 2018. Two-thirds of the trial participants were randomized to receive monthly infusions containing either a low or high dose of the drug, whereas the remaining participants were given placebos. The data regarding aducanumab’s treatment potential for slowing cognitive impairment were acquired by assessing the participants’ health status 18 months after the treatment. Biogen was forced to halt both trials because an interim analysis by an independent monitoring group concluded that the drug showed no potential for treatment.
Although Biogen declared the aducanumab program a failure following the interim analysis, there was evidence that aducanumab had successfully prevented the accumulation of amyloid-beta plaques, and the company subsequently determined that one of the two trials showed a potential benefit through an internal post-hoc analysis of the data. Consequently, Biogen established contact with several regulators at the FDA to probe the possibility of approval on an alternative evidential basis. After close collaborations with the FDA, Biogen revived the program (as Project Onyx) and eventually filed for accelerated approval.
A meeting of the Peripheral and Central Nervous System (PCNS) Drugs Advisory Committee preceded the FDA’s final decision and provided the FDA with independent advice on the advisability of approval (FDA 2020c). Biogen presented its ENGAGE, EMERGE, and safety/tolerability study (designated as Study 301, Study 302, and Study 103, respectively) as the evidence of aducanumab’s safety and efficacy (FDA 2020b, 2020d). With regard to efficacy, the first two were submitted as RCTs while the third was submitted as mechanistic evidence.
Citing traditional EBM requirements of evidence, the PCNS committee, save one abstention, unanimously recommended rejection (FDA 2020e; Feuerstein et al. 2021). Yet upon the basis of the combination of association studies and mechanistic studies, the FDA granted the drug accelerated approval. This track grants conditional approval when there are surrogate outcomes that are reasonably likely to be translated into the clinical outcomes of interest (FDA 2021; Naci et al. 2017). This is distinct from the traditional FDA requirements for approval, which require at least two well-conducted RCTs to establish efficacy. While granting the right to market and sell the treatment, the accelerated approval requires the producer to verify the clinical benefit of its product in a postapproval trial.
3.2 The EBM+ case for approving aducanumab
There is no dispute that the analysis of EMERGE and ENGAGE demonstrates a statistical association between taking high doses of the drug early and symptomatic improvement. At the same time, there is no dispute that the trials alone would not suffice to ground aducanumab’s efficacy due to complications regarding their design and execution. In cases such as this, “it can be useful to consider the evidence in favour of the hypothesised mechanism of action. A well-established mechanism of action can support the efficacy claim” (Aronson et al. 2018, 1170; Parkkinen et al. 2018; Williamson 2021). This is precisely what Biogen did by introducing Study 103.
Study 103 was a Phase 1, 12-month, multicenter, randomized, double-blind, placebo-controlled, dose-ranging, staggered cohort study involving participants with prodromal and mild AD (FDA 2020a, 2020b, 2020d). The study provided mechanistic evidence concerning multiple biomarkers and clinical endpoints. In particular, its secondary endpoints included serum pharmacokinetic characteristics, immunogenicity, and changes in amyloid positron emission tomography (PET) scans (obtained during Week 26). Additionally, the exploratory endpoints included the Clinical Dementia Rating Sum of Boxes (CDRSB), the Mini-Mental State Examination (MMSE), and changes in amyloid PET scans (obtained during Week 54).
As argued by Biogen during the PCNS committee meeting, the study provides significant mechanistic evidence that the drug affects the clinical mechanisms that cause AD (FDA 2020a, 2020b, 2020e). First, the pharmacodynamic biomarkers demonstrated dose- and time-dependent reduction of amyloid-beta plaques. Moreover, the analysis demonstrates that, on average, patients who experienced the reduction of amyloid-beta plaques showed improvement on some of the clinical endpoints described in the preceding text. Although the analyses of these clinical endpoints were exploratory, Biogen complemented the study with a sensitivity analysis. In short, Biogen combined Study 103 with the post-hoc analysis of the two RCTs to advance a “totality of evidence” or weight-of-evidence argument in favor of approval. The crux of the argument was that Study 302 demonstrated a statistical association that supported an efficacy claim and that Study 103 strengthened Study 302 by providing evidence that the drug positively affected the mechanism responsible for developing AD.
From the perspective of EBM+, the case for approval should be seen as an example of how regulators can make use of different types of evidence that complement each other and “further reduce the influence of subjectivity” (Williamson 2021, 202). Indeed, as noted by Veli-Pekka Parkkinen et al. (2018, 15), “by considering evidence of mechanisms in conjunction with clinical study evidence, decisions can be made earlier: one can reduce the time taken for a drug to reach market.” On this view, the FDA approval of aducanumab would be a brave rejection of the overly rigid requirements of the traditional EBM hierarchy of evidence. Moreover, adopting the standard would pave the way for an effective treatment for patients who would otherwise be left to suffer.
3.3 The EBM case for rejecting aducanumab
As laid out by Andreoletti and Teira (2019), there is a difference between rules and standards in the regulatory context. An ideal rule is an if-then statement, such as, if two RCTs demonstrate that the treatment is safe and effective, then approve the drug. In contrast, a standard admits of flexible interpretation. EBM+ provides an evidence assessment checklist that helps to assess the strength of the evidence while advising a holistic judgment on whether the evidence is sufficient (Parkkinen et al. 2018, 42). In contrast, given the traditional rules for market approval, an EBM approach would, as the advisory committee recommended, reject the drug for failing to satisfy the rule’s antecedent condition. [Footnote 5] The case is essentially as follows.
Studies 301 and 302 were Phase 3, multicenter, randomized, double-blind, placebo-controlled, parallel-group trials involving participants with early AD (FDA 2020a, 2020b, 2020d). Their primary objective was to assess the efficacy of monthly doses of aducanumab, with the primary endpoint being the change from the CDRSB scores during Week 78. The secondary endpoints were scores on various cognitive tests conducted at the time of primary endpoint measurement. The study participants were randomly assigned to three treatment groups (high dose, low dose, and placebo), and the stratification was conducted on the basis of the participants’ genetic profiles.
Biogen merged the two studies (as identical) for the interim futility analysis because the assumption of identity would maximize the chance of achieving statistical significance (Feuerstein et al. 2021). However, this created another complication. Though the interim analysis indicated futility, the post-hoc analysis revealed that the studies were discordant. To be specific, Study 301 yielded a negative result (i.e., no treatment effect relative to placebo) whereas Study 302 yielded a positive result. Moreover, the discordance occurred at high-dose levels, with both studies showing similarly negative results at low-dose levels. In sum, Biogen was left with two incomplete and (partially) discordant RCTs.
Biogen’s post-hoc analysis provides evidence that the results of Study 302 are reliable and, with a few caveats, supported by Study 301 (FDA 2020b, 2020e). First, Biogen noted that the apparent failure of Study 301 can be accounted for by a few “rapid progressors.” In short, some subjects were outliers and underwent rapid cognitive decline due to AD pathology, and excluding these anomalous patients would bring the results of the apparently failed study into alignment with Study 302. Second, Biogen presented a subgroup analysis that identifies subpopulations within both studies that showed improvement. Finally, Biogen and the FDA agreed on a procedure to simulate the end of the trial based on data already collected, both to understand the range of plausible outcomes had Study 302 not been terminated early and to shore up confidence that the result would likely have held up had the trial been completed as planned. On this basis, Biogen (and its collaborators at the FDA) argued that Study 301 does not (significantly) detract from the evidence of aducanumab’s efficacy (as shown by Study 302).
Irrespective of the extent to which these analyses increase our confidence that the results in Study 302 are accurate and the results of Study 301 are either consonant or do not seriously detract, the EBM-style rules require that such analyses be prespecified to be acceptable (FDA 2020d, 2020e). Neither Biogen’s account regarding rapid progressors nor Biogen’s subgroup analysis was prespecified. Likewise, Biogen’s simulated probability of type I error treated all endpoints as statistically equivalent rather than assigning due weight to the primary endpoint to reflect design prespecification. In sum, while EBM+ may view mechanistic evidence as able to largely assuage one’s fears that these analyses have generated a false positive, the EBM view is that the post-hoc analysis supplemented with the mechanistic evidence provided by Study 103 may be fruitful for exploratory purposes (i.e., hypothesis generation), but is not sufficient to satisfy the regulatory rule regarding efficacy establishment. While these strictures might appear to be overly rigid to the point of unnecessarily discarding or ignoring supporting evidence, they are intended to reduce the latitude for pharmaceutical companies to engage in discreet manipulation of the evidence.
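The worry about analyses that are not prespecified can be made concrete with a simplified calculation. The sketch below is an illustration only: it assumes k independent post-hoc analyses, each tested at a nominal significance level α, applied to a drug with no true treatment effect. The symbols k and α and the independence assumption are introduced here for illustration and do not describe Biogen’s actual analyses.

```latex
% Illustrative assumption: k independent analyses, each at level \alpha,
% under the null hypothesis of no treatment effect.
\[
  P(\text{at least one false positive}) = 1 - (1 - \alpha)^{k}
\]
% Worked values for \alpha = 0.05:
\[
  k = 1:\; 0.05, \qquad
  k = 5:\; 1 - 0.95^{5} \approx 0.226, \qquad
  k = 10:\; 1 - 0.95^{10} \approx 0.401
\]
```

Under these assumptions, an analyst free to choose among ten analyses after seeing the data would find at least one nominally significant result about 40 percent of the time, even for an inert drug. Real post-hoc analyses are typically correlated, so the exact figure would differ, but this inflation is the kind of latitude that a prespecification rule is meant to remove.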
3.4 Hypotheses fingo
In this section, we have argued both that the approval of aducanumab was a triumph of EBM+ standards and that it was a failure from the standpoint of EBM. While we believe the latter portends an eventual need for withdrawal of the drug, there is no noncircular argument for this claim. Some might view this as a major limitation, perhaps even one that nullifies any import of the case. We disagree. Regulatory standards are not meant to be infallible; they are meant to balance public safety and the need to get efficacious drugs in use quickly. In the future, we will have significantly more research on aducanumab and presumably sufficient grounds to know whether the drug should be kept on the market irrespective of one’s preferred regulatory policy. While no individual case is dispositive, a series of such cases can inform us on whether our regulatory policy demands too much or too little evidence for market entry. As to why philosophers should take up such cases before the outcome is known, we address this concern in the next section.
4. Objections and replies
Given that the import of the aducanumab story is unclear, one might think that philosophers must wait until more evidence is in before we can bring the case to bear on our disputes. While seemingly plausible, the antiarrhythmic drug case illustrates the problem with this approach. Howick (2011) uses the case to illustrate the frailties of mechanistic evidence. Parkkinen et al. (2018, 19) claim that the case illustrates the importance of mechanistic evidence: “it looks as though insufficient attention had been paid to mechanistic evidence. In particular, there was little reason to think that reducing arrhythmia was a good surrogate outcome for reducing mortality due to heart attacks.… In this case, properly considering the mechanistic evidence may have led to not recommending anti-arrhythmic drugs.” Finally, Holman (2019b) argues that the primary issue was a matter of social epistemology rather than epistemology. It would seem that once the outcome is known, all parties have little trouble looking at the case as supporting their view. Accordingly, while we do not know immediately how the aducanumab case will turn out, this is a strength of our argument rather than a weakness.
Of course, it depends on EBM+ advocates endorsing our claim that they should think aducanumab is likely to be a success. However, they may choose to reject this claim for one reason or another. Provided they do so before we know the drug’s ultimate fate, the case fails to serve its intended purpose; however, it would still continue to be instructive. To this point, they have offered numerous cases in support of their view, but they fail to be the kind of case required. Accordingly, even if they did not endorse the approval of aducanumab, this argument continues to provide a model of the type of case they need to provide (and provide repeatedly) to establish a track record that would be convincing.
Moreover, suppose for a moment that EBM+ advocates contend that the approval of aducanumab on mechanistic grounds was actually, contrary to appearances, a perversion of the view they offer. This would still provide prima facie support for Andreoletti and Teira’s (2019) contention that standards are more easily exploited than rules. Indeed, the concern that the accelerated approval standard was abused has been raised (Feuerstein et al. 2021), and the US House Committee on Oversight and Reform and the Committee on Energy and Commerce have initiated an investigation of the approval process (Cohrs 2021). It is not enough to say that the EBM+ standard would work better in a perfect world if the world is far from perfect; there remains the institutional challenge of what we call “epistemic gatekeeping,” that is, ensuring that epistemic rules and standards are not abused as rhetoric or framing tactics for unduly influencing regulatory decisions. [Footnote 6]
Another objection would be that the preceding argument contains little that is new. Don’t we already know that the commercial aspects of science wreak havoc with philosophical views that ignore them (e.g., Biddle 2007; Fernández Pinto 2015)? We are inclined to agree that this general point has been made before. However, the persistence of acontextual views makes it manifest that the argument still needs to be made more convincingly. Indeed, in this case, EBM+ is the dominant view in philosophy of medicine, and it is seeking to influence medical science. Consequently, it seems that even if the general argument has been made, its specific application to EBM+ remains pressing.
Beyond this, another contribution of this article is a proposal to separate out the context of regulation from the rest of medical science, including (but not limited to) individual treatment decisions. Williamson (2021, 200) claims that EBM+ “offers a general methodology for assessing causation in medicine.” However, given that the main line of critique of EBM+ has focused narrowly on pharmaceutical regulation and defenses of it have focused on other aspects of medical science, we have proposed that the debate would continue more fruitfully if it were confined within specific domains. Accordingly, even if one thought some of the themes here have already been rehearsed by others, the argument still advances the debate in philosophy of medicine.
5. Conclusion
Advocates and critics of EBM+ have largely been talking past each other. We argue that the reason for this is that the critics have focused on regulatory issues while EBM+ advocates have focused on medical science writ large. We suggest the debate might continue more fruitfully by confining arguments to specific domains, as it may well be possible that EBM+ is superior in some domains but dysfunctional in others. Here, we focus on the domain of pharmaceutical regulation, where the argument against EBM+ appears to be strongest. Beyond this distinction, our main contribution is our willingness to apply our view of medical evidence prospectively to cases in which we do not yet know the outcome and where we risk making a failed prediction. If EBM+ advocates want to create convincing grounds to enact the sweeping reforms they propose, they should be willing to do the same.
Acknowledgments
The authors thank the editor, the reviewers, the participants of PSA 2022, and Dr. Stephen D. Nightingale for their help and feedback.