1. Introduction
Theory assessment is widely regarded as a one-way road: Theories are assessed on the basis of the evidence they explain; the evidence for (or against) theories, however, is assessed independently of the theories explaining the evidence. I call this view the Independence Thesis of Theory Assessment (ITTA). This view is intuitive and rationally compelling: If ITTA were wrong, theory assessment would seem to be circular. And yet, as I want to argue in this article, ITTA has been violated in some of the best scientific achievements of which we know. These cases, I will argue, indicate that the explanatory power of a theory—usually measured in terms of its simplicity, unifying power, and so forth—can help stabilize evidence in cases of evidential uncertainty.Footnote 1 These cases suggest a model of theory assessment that competes with ITTA. I call this model the Mutual Stabilization Model of Theory Assessment (MuSTA). While I will not argue here that ITTA should necessarily be given up entirely, I do think that the cases to be discussed should give us pause for thought, especially because MuSTA is not as problematic as it might first appear.
ITTA is a view that can be found in many corners of the philosophy of science. For example, ITTA is assumed in the “inference to the best explanation”: Given some evidence, one compares several hypotheses and infers the likely truth of the hypothesis that explains the evidence “best” or that is the “loveliest” (Harman 1965; Lipton 1991/2004). The debate has centered squarely on the question of whether or not the inference from best explanation to truth is warranted (see, e.g., Douven 2017). The requirement that the hypotheses must fit all the evidence has been taken for granted in this debate.
Bayes’s theorem—the holy grail of Bayesianism, the dominant theory of confirmation—relates prior and posterior probabilities of hypotheses in the light of some evidence. On the face of it, explanatory concerns play no special role at all in Bayesianism: Whether a hypothesis is explanatorily virtuous or not, it is subjected to the same kind of machinery. Accordingly, explanatory concerns cannot possibly make a difference to a theory’s empirical confirmation, very much in accordance with ITTA.Footnote 2 Yet, intriguingly, a recent wave of Bayesian accounts has indeed sought to incorporate explanatory concerns. Whether this amounts to a departure from ITTA will be discussed later in this article.
Some philosophers seemingly reject ITTA. Douglas (2009), for example, has argued that in many contexts in science “values” may legitimately affect how the evidence is assessed. Douglas conceives of values very broadly, so as to include social, ethical, and what she calls “epistemic” or “cognitive” values, meaning simplicity, unifying power, and so forth; in other words, properties that I previously referred to as explanatory concerns. But no matter their nature, values may only play an “indirect,” not a “direct” role in the assessment of evidence on Douglas’s view: Whereas values in their indirect role may legitimately help weigh the uncertainty of a claim given some evidence (e.g., in setting the type I and type II errors in statistical testing), in their direct role, values would illegitimately function as “reasons in themselves to accept a theory” (Douglas 2009, 96). Douglas is particularly adamant that explanatory concerns must not affect evidential concerns (102–8). Douglas also explicitly rejects the idea that explanatory concerns could ever legitimately make one reconsider the reliability of certain data so as to “override the evidence” (102). Despite first appearances, Douglas can thus also be described as adhering to ITTA.Footnote 3
Finally, there is work that genuinely seems to depart from ITTA, but perhaps not in a fully relevant sense. Forster and Sober (1994) have argued that simpler curve-fitting models tend to be preferable to more complex models. That is because models that fully fit the data risk overfitting the data, that is, they accommodate noise in the data. It is questionable, however, whether simplicity in Forster and Sober’s sense—namely, the number of free parameters in a model—can be viewed as an explanatory property, and whether their conclusions would extrapolate at all from the context of data models to theories one or two steps removed from the data (Schindler 2018).
The paper is structured as follows. In section 2, I articulate the challenge to ITTA coming from situations of evidential uncertainty and propose MuSTA as an alternative, which I then support with two detailed case studies. In section 3, I address objections to my interpretation of the case studies. In section 4, I discuss how the new wave of Bayesianism has sought to incorporate explanatory concerns and how it fares in the face of my cases. Section 5 concludes the article.
2. Explanation and evidential uncertainty
The real world is a messy affair. And so is the running of experiments. It takes a great deal of skill to get things right, control for confounders and errors, and get the experimental apparatus to work in the intended way. Accordingly, experiments regularly produce conflicting results—simply by virtue of some experiments not working correctly, or experiments failing to control for confounders. And oftentimes, when the target phenomenon is novel or ill-understood, it is simply unclear to scientists what part of the produced data is reliable evidence. Even when scientists repeat experiments, there is no guarantee that data conflicts will go away (Franklin 2002; Schindler 2013; Bailey 2017). Scientists are thus regularly faced with situations of evidential uncertainty.
I will argue that in situations of evidential uncertainty scientists sometimes look to their theories for guidance. More specifically, I will show that there are cases in which the explanatory power of the assessed theory decreased confidence in the evidence that stood in apparent contradiction to the theory. More precisely, and more comprehensively:
1. There was evidential uncertainty: Some data spoke for and some data spoke against theory T about phenomenon P; let e1 refer to the former and e2 to the latter.
2. T had significant explanatory virtues.
3. T was held by a group of scientists G who, based on T’s explanation of e1, decreased their credence in e2.
4. G considered T confirmed by e1; e2 did not negatively affect G’s credence in T.
5. Another group of scientists F sought to accommodate both e1 and e2.
6. Because T did not fit both e1 and e2, F’s credence in T was not high.
7. Guided by T, G made an important discovery about P, whereas F did not.
On ITTA, cases with this structure must seem curious: ITTA does not allow explanatory concerns to weigh in on evidential concerns. Instead, the cases are better captured by MuSTA, according to which the evidence not only helps us narrow down the set of plausible theories but also, vice versa, the explanatory power of a theory can help stabilize situations of evidential conflict. MuSTA shows scientists a way out of desperate situations like these by recommending that scientists take into consideration explanatory concerns: Theories that explain at least some of the evidence in a highly satisfying and revelatory way will cast doubt on evidence that is in apparent conflict with the theory.Footnote 4
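The seven points above can be pictured with a toy probabilistic sketch (my illustration, not a model from the article or the literature; all priors and likelihoods are hypothetical): let R stand for “the apparatus producing e2 was reliable.” If e2 is very unlikely given that T is true and the apparatus was reliable, then a group with high credence in T that observes e2 will substantially lower its credence in R, whereas a group with low credence in T will barely change its credence in R.

```python
# Toy sketch of points (1)-(7): high credence in theory T can rationally
# lower credence in the reliability R of apparently conflicting data e2.
# All priors and likelihoods below are hypothetical illustration values.

def p_reliable_given_e2(p_T, p_R):
    """P(R | e2): posterior reliability of the e2-producing apparatus."""
    # Likelihoods P(e2 | T, R): e2 is near-impossible if T is true AND
    # the apparatus was reliable; otherwise e2 is unsurprising.
    like = {(True, True): 0.05, (True, False): 0.5,
            (False, True): 0.6, (False, False): 0.5}
    num = sum(like[(t, True)] * (p_T if t else 1 - p_T) * p_R
              for t in (True, False))
    den = num + sum(like[(t, False)] * (p_T if t else 1 - p_T) * (1 - p_R)
                    for t in (True, False))
    return num / den

# Group G: high credence in T (after T explained e1); group F: low credence.
g = p_reliable_given_e2(p_T=0.7, p_R=0.8)   # drops well below the prior 0.8
f = p_reliable_given_e2(p_T=0.3, p_R=0.8)   # stays close to the prior
```

Whether such a reconstruction captures the role MuSTA assigns to explanatory power, rather than merely redescribing it in Bayesian terms, is taken up in section 4.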
I believe that there are many historical cases that contradict ITTA and support MuSTA (Schindler 2013, 2018).Footnote 5 Here I will have to focus on laying out just two cases in more detail: the discoveries of the periodic table of chemical elements and the structure of DNA.
2.1. Mendeleev, Meyer, and the periodic table
Dimitri Mendeleev published his first version of the periodic table of chemical elements in 1869. Mendeleev’s insight was to order all known chemical elements primarily according to their atomic weight. It is well known that this resulted in several gaps in the table, which led to Mendeleev’s famous predictions of three new chemical elements, namely gallium, scandium, and germanium, which were all discovered in subsequent years. It is less well known that Mendeleev also made so-called counterpredictions, that is, predictions against what was assumed to be established knowledge at the time (Brush 1996; Schindler 2014). For example, uranium was widely believed to have an atomic weight of 120. This would have put uranium between tin (weight of 118) and antimony (122). But chemically, uranium did not quite fit there; for example, uranium has a much higher density than these elements. Mendeleev therefore suggested in 1870 that the weight of uranium be doubled (by doubling its valence), moving it to a group with other heavy atomic elements. Mendeleev sought empirical evidence for his theoretical hunch in the form of specific heat experiments (carried out by an assistant of his), but they failed to deliver a favorable result. Still, Mendeleev had convinced himself of this change and retained U = 240 in his tables, even though it was not until several years later that Mendeleev’s prediction was experimentally confirmed (see Smith 1976, 332–35).
Interestingly, Mendeleev highlighted counterpredictions such as his change of the weight of uranium in his later reflections on the periodic table: “Where, then, lies the secret of the special importance which has since been attached to the periodic law?” he asked. “In the first place,” his answer was, “we have the circumstance that, as soon as the law made its appearance, it demanded a revision of many facts which were considered by chemists as fully established by existing experience” (Mendeleev 1889, in Jensen 2005, 167; added emphasis). Here Mendeleev emphasized specifically the guidance provided by the periodic table in the assessment of empirical facts:
before the periodic law was formulated, the atomic weights of the elements were purely empirical numbers so that [their chemical properties] could only be tested by critically examining the methods of determination; in short, we were compelled to move in the dark, to submit to the facts, instead of being masters of them. (Mendeleev 1889, in Jensen 2005, 178; added emphasis)
In another place, Mendeleev wrote similarly that before the periodic table, there “was only a grouping, a scheme, a subordination to a given fact; while the periodic law furnishes the facts” (Mendeleev 1879, in Jensen 2005, 138; added emphasis). Previous attempts to classify the chemical elements, Mendeleev explained, were hampered “because the facts, and not the law, stood foremost in all attempts” (Mendeleev 1889, in Jensen 2005, 166–67). Doubtless, then, Mendeleev attached great weight to counterpredictions.
Mendeleev’s approach contrasted with Lothar Meyer’s, who is often mentioned as the codiscoverer of the periodic table. Before the publication of Mendeleev’s first periodic table in 1869, Meyer published a table of elements in his book Die Modernen Theorien der Chemie (Meyer 1864). Here, Meyer classified the known elements in a main table consisting of 28 elements and two further, smaller tables, containing 16 and 6 elements, respectively. Only in the main table were the elements arranged so that their weight consistently increased from left to right and from top to bottom. Strangely, Meyer did not discuss this aspect of the table at all, but instead laid his main focus on the differences in atomic weights and did not maintain a consistent application of the weight criterion.Footnote 6 Later Meyer would acknowledge explicitly that it was Mendeleev who in 1869 had “emphasized more decisively than it had happened until that point” that the properties of the chemical elements are a periodic function of their atomic weight and that this was the case for all the elements, “including even those with highly uncertain atomic weights” (Meyer 1872, 298).
Only after Mendeleev’s first publication of his periodic table in 1869 did Meyer publish a note in which he sought to stake his claim as codiscoverer of the periodic table, stating that his table was “essentially identical to that given by Mendeleev” (Meyer 1870).Footnote 7 Even though Meyer considered the possibility of changes of atomic weights, he was—quite in contrast to Mendeleev—not ready to suggest any changes. As he put it, “It would be rash to change the accepted atomic weights on the basis of so uncertain a starting-point [as the weight ordering criterion of the periodic table]” (ibid.). In the ensuing priority dispute, Mendeleev later granted Meyer an understanding of the “outer meaning of the periodic law,” but did not think that Meyer had “penetrat[ed] its inner meaning” until after Mendeleev’s first publication in 1869 (Mendeleev 1880, in Jensen 2005, 146). This allegation Mendeleev justified by saying that Meyer had only repeated his table in another form and “left underdeveloped those aspects of the subject…which alone could have proven the correctness and universality of the law.” Those aspects included the prediction of the properties of unknown elements and the change in atomic weights of extant elements (ibid.).Footnote 8
Meyer was certainly much more cautious than Mendeleev in changing atomic weights. Commenting on his first fragmentary periodic table, Meyer, on the one hand, was confident that “a certain lawfulness governs the numeric values of the atomic weights” but, on the other hand, he also thought it “quite unlikely that it would be as simple as it appears” (Meyer 1864, 139). Although Meyer believed that some of the empirically determined atomic weights would turn out to be wrong, he doubted that all discrepancies could be accounted for in this way. In particular, he criticized as unjustified “arbitrary” corrections and changes of atomic weights (made by Proutians) only “for the sake of a presumed regularity.” He insisted that only experiment could replace any current values (ibid.). In sum, he took it to be a “great danger” to assess and correct empirical results on the basis of any theoretical considerations (ibid.).
What can we learn from this case? As per point (1) of the aforementioned scheme, chemists who recognized atomic weight as a decent ordering criterion faced a situation of evidential uncertainty in the mid-1860s. Although atomic weight clearly allowed a classification of the chemical elements into groups of similar chemical properties, a number of chemical elements resisted this classification. For chemists like Meyer, who were keen to stick closely to the (apparent) facts, this meant that a coherent classification of all of the known chemical elements was not possible (as per point 6). Mendeleev, however, saw in the periodic recurrence of chemical properties within the table more than just a description, namely a law of nature that transcended the apparent facts. The presupposition that this regularity was an exceptionless law then allowed Mendeleev to predict new elements and correct the values of already known ones.Footnote 9
Why was Mendeleev so adamant that he had discovered a law of nature, rather than just a regularity describing only a subset of the chemical elements? The answer, I think, has to be sought in the table’s explanatory power. In spite of the fact that Mendeleev believed that the “periodic law” required explanation and that he saw himself incapable of providing such an explanation (Smith 1976, 300), there is indeed a sense in which the periodic table explained the properties of the chemical elements: If the chemical properties of elements are determined by their atomic weight and their position in the periodic table, then membership in a particular group in the table explains why an element must have the properties that it does have. To put it in terms of Lange’s recent account of explanation by constraint (Lange 2017), one may say that Mendeleev’s table set necessary constraints on the elements, first and foremost the constraint of ascending atomic weights and group membership. On that take, the periodic table explains why element X has chemical properties Y because X could not possibly have had properties Y, had X’s weight been significantly different. Of course, strictly speaking the ascending weight constraint is not in fact necessary, as there are exceptions (see footnote 8), and because the more fundamental ordering criterion is atomic number. But Mendeleev certainly treated the criterion as if it were necessary.
That Mendeleev subscribed to such a minimal notion of explanation is indicated by him also speaking of the chemical elements being “fragmentary, incidental in Nature” prior to the periodic table and that there had been no “special reasons” to expect the discovery of new elements (Mendeleev 1889, in Jensen 2005, 178). The “special reasons” and the properties of chemical elements ceasing to appear “incidental,” I would argue, derived from this minimally explanatory power of the table. Compare: Had the table been merely a description of the facts, there would have been no good reason for Mendeleev to want to change any of the element weights. This is clearly illustrated by the tables produced by all those chemists who did not dare to go beyond the apparent facts and who remained content in finding regularities between those apparent facts. About his precursors Mendeleev complained that “they merely wanted the boldness necessary to place the whole question at such a height that its reflection on the facts could be seen clearly” (Mendeleev 1889, in Jensen 2005, 167). In other words, Mendeleev saw his contribution in elevating the periodicity of the elements to a “law” that had to be respected, even if that required revision of apparently established weights.
In sum, we can conclude that Mendeleev’s periodic table had significant explanatory virtues, as in point (2) of the aforementioned scheme, and Mendeleev was keenly aware of these virtues. Mendeleev decreased his credences in those atomic weights that did not obey his “periodic law,” as in (3), and he took the congruent weights to confirm his “periodic law,” as in (4). This was a major reason for Mendeleev’s successful discovery of the periodic table and Meyer’s (and others’) failure (by points 5–7).
2.2. Watson and Crick’s model of the DNA structure and Franklin’s evidence
The second case I want to highlight is the discovery of the DNA structure. The case is of course well known in popular culture, not least because of the sexism that Rosalind Franklin may have been subjected to, by way of Watson and Crick downplaying her contribution to the discovery. I do not want to take any stance on this controversial question here. Instead, our focus shall lie on the approach that these historical figures chose in deciphering the structure of DNA.
It is well known that Watson and Crick sought to discover the structure of DNA by constructing physical models representing all the relevant stereochemical constraints, such as interatomic distances and angles of the DNA molecules. It is also well known that they made a number of (sometimes embarrassing) mistakes on the way (Judson 1996). However, the model-building approach was not the approach Franklin and her colleague Wilkins at King’s College in London chose. Instead, they tried to infer the structure from pictures that they took of the structure of DNA by using x-ray crystallography, in which Franklin was an undisputed expert (Schindler 2008; Bolinska 2018).Footnote 10 Using a mathematical technique known as Fourier transforms, Franklin sought to reconstruct the structure of the molecules. A massive hurdle to this approach, though, was that the phases of x-rays cannot be reconstructed from x-ray photographs. Franklin had to resort to so-called Patterson functions, which allowed the reconstruction only of intermolecular distances, but not molecule locations in space (see Klug 2004; Schindler 2008). Retrospectively, relying on this approach looks rather haphazard, but Franklin made a conscious decision not to use model building. Her PhD student Gosling would later describe her attitude thus: “We are not going to speculate, we are going to wait, we are going to let the spots on this photograph [via the Patterson synthesis] tell us what the structure is” (Gosling, cited in Judson 1996, 127).Footnote 11 Similarly, Franklin’s colleague Wilkins relayed an episode in which he encouraged a student by the name of Bruce Fraser to engage in model building: “Rosalind dismissed our excitement [about model-building] by saying that model-building is what you do after you have found the structure” (Wilkins 2003, 160; added emphasis).
Although Franklin was not keen on model building, she produced some very crucial evidence for the DNA structure with her x-ray crystallographic work. In 1952 she and her student Gosling had discovered that there were two forms of DNA: a “dry,” or “crystalline,” form and a “wet” form (with a water content of more than 75 percent). Before, x-ray photographs had contained mixtures between the two forms, making interpretation difficult. In particular, picture #51 of the B form, which would later also gain popular prominence, provided unequivocal evidence for the structure of DNA being helical. Unequivocal, because the basic X-shaped pattern could be deduced from helical diffraction theory, which Crick was involved in developing (Schindler 2008). Picture #51 also indicated, almost at a glance, the size of the axial repeat and the axial spacing of the helix (Klug 1968, 810).Footnote 12 Franklin accepted that much, but she could not convince herself that the same applied to the dry form as well. Quite the opposite: after she noticed “a very definite asymmetry” in the A form, she came to firmly believe that the A form was irreconcilable with a helical structure (ibid., 843). With Gosling, she even wrote a little note announcing the “death of D.N.A helix (crystalline)” (Maddox 2002, 184–85). Even Wilkins, with whom Franklin had a difficult relationship, and their joint colleague Stokes “could see no way round the conclusion that Rosalind had reached after months of careful work” (Wilkins 2003, 182).
Unfortunately for Franklin, she decided to focus her efforts on the A form, which gave rise to sharper spots on the x-ray photographs and was thus much more amenable to her Patterson synthesis. Franklin’s notebooks show that she was considering various kinds of structures for the A form, such as “sheets, rods made of two chains running in opposite directions…and also a pseudohelical structure…which looked like a figure of eight in projection” (Klug 1968, 844). Franklin then tried to accommodate both her antihelical A form and helical B form. Klug comments, “In her notebooks we see her shuttling backwards and forwards between the data for the two forms, applying helical diffraction theory to the B form and trying to fit the Patterson function of the A form” (ibid.). Klug sympathizes with the dilemma in which Franklin found herself: “The stage reached by Franklin at the time is a stage recognizable to many scientific workers, when there are apparently contradictory, or discordant, observations jostling for one’s attention and one does not know which are the clues to select for solving the puzzle” (1968, 844).
Watson and Crick were quite aware of Franklin’s apparently antihelical evidence. But Crick was confident that the apparent asymmetry visible in pictures of the A form resulted from “a small perturbation” in the molecular structure due to their close packing in the crystal lattice (Klug 1968, 810). Crick later remarked that he had learned from another episode that “it was important not to place too much reliance on any single piece of experimental evidence” and relayed Watson’s “brasher” conclusion according to which “no good model ever accounted for all the facts, since some data was bound to be misleading if not plain wrong. A theory that did fit all the data would have been ‘carpentered’ to do this and would thus be open to suspicion” (Crick 1988, 59–60; original emphasis). This sentiment was not only shared by Crick and Watson. Wilkins later admitted that he and Franklin should not have been so perturbed by the antihelical evidence and that they instead should have followed Admiral Nelson, who “won the battle of Copenhagen [in 1801] by putting his blind eye to the telescope so that he did not see the signal to stop fighting” (Wilkins 2003, 166).
To sum up this case, the DNA researchers in the early 1950s faced a situation of conflicting evidence: Some of the x-ray diffraction pictures spoke for and some against the structure of DNA being helical, as in (1) in the aforementioned scheme. The model developed by Watson and Crick had a number of explanatory virtues (point 2 in the scheme): Most importantly, perhaps, it was internally consistent, as it successfully accommodated all stereochemical constraints, and it was also externally consistent because it accommodated important crystallographic information (such as the repeat and rise of the helix). The model also offered a possible mechanism of how genetic replication might work and set off an incredibly successful research program. In other words, the model was theoretically fertile, and Crick and Watson already guessed in their discovery paper that it might be. Upon seeing the X-shaped picture #51, Watson and Crick knew that the DNA had to be helical. In contrast to their colleagues in London, though, who convinced themselves that the A form could not be helical (by points 5 and 6), Watson and Crick disregarded the apparently antihelical evidence and took their helical model to be confirmed by the B form (by points 3 and 4). This was a crucial factor in Watson and Crick’s success and in the failure of Franklin’s attempts at reconstructing the structure of DNA (by point 7).
3. Objections
The two cases discussed in the previous section—and further cases (see Schindler 2013, 2018)—challenge ITTA: Explanatory concerns clearly affected evidential concerns and helped disambiguate situations of evidential uncertainty. Instead, the cases support MuSTA, which allows explanatory concerns to stabilize situations of evidential uncertainty. There are four objections one may want to raise against my way of trying to make sense of the historical cases:
Objection 1: One may object that it is not really surprising that scientists want their pet theories to be true. Sometimes they may regrettably go as far as questioning the evidence contradicting their theories. In the cases at hand scientists were just lucky that history vindicated them. This tells us nothing about theory assessment, let alone how theory assessment ought to be; it merely tells us something about human psychology. There is thus no reason to give up on ITTA.
Reply: I agree that to question negative evidence purely because one is personally attached to one’s theory would be utterly bad practice. And I certainly would not want to claim that scientists are never overattached to their theories. But whether attached or not, the group G scientists in our case studies had excellent reasons for believing their theories: In the case of Mendeleev, the vast majority of known chemical elements fit into the periodic table just fine and the table nicely explained the properties of the elements; and, in the case of Watson and Crick, the B form provided indisputable evidence for a helical structure of DNA and the double helix offered a plausible mechanism for replication. So group G scientists may have been biased toward their theory, but that bias was well justified.
Objection 2: If explanatory theories really were allowed to arbitrate the data relevant to their confirmation, then confirmation would be circular. More specifically, the worry is that if h affects which evidence e we should trust and if this self-same evidence is the evidence that confirms h, then h justifies e and e justifies h. But, again, this is circular.
Reply: The circularity concern would be warranted if h entirely determined e (Brown 1994). But that clearly was not the case in either of the two discoveries discussed here: The data in question were neither fabricated nor falsified. And Mendeleev, Crick, and Watson did seek to accommodate most of the (entirely contingent) data in their models: Mendeleev sought to accommodate most of the known elements by his “periodic law” and Crick and Watson sought to accommodate all the stereochemical constraints set by the components of DNA in their models. It was on the basis of the explanation that they had devised for most of the data that they developed doubt about data that did not fit their models. In other words, their doubts about the apparent counterevidence were epistemically justified.Footnote 13
Objection 3: Group G did not take the positive evidence to confirm their theories; they merely pursued theoretical possibilities. Because the evidential demands on pursuit are much lower than on confirmation (see, e.g., Nyrup 2015), these cases are not apt to show us anything about ITTA.
Reply: I do not think that this interpretation of the cases is supported by the historical evidence. Had group G entertained the theories in question just to explore theoretical possibilities, they would have had no grounds for questioning the negative evidence and for viewing the theory as confirmed by the positive evidence. But they did both. In the absence of the epistemic guidance provided by their theories, group G was not in a position to level criticisms of the relevant experiments: They had no expert knowledge regarding the relevant experiments, nor were they in a position to properly assess what may have gone wrong in the experiments.
Objection 4: The historical case studies are easily accommodated by confirmational holism. According to confirmational holism, theories are not tested in isolation but rather as part of “nets of belief.” So when there is apparent negative evidence, we need not necessarily change or reject our theory, but we may consider revising our beliefs about the evidence. Confirmational holism is widely held among philosophers and ITTA may thus simply be a straw man.Footnote 14
Reply: A problem entailed by confirmational holism is that a negative test result does not tell us where in the net of our beliefs the error lies (this is, of course, the classical “Duhem problem”). My thesis in this article—which I think is supported by my case studies—is that explanatory power can guide researchers in deciding where to put the blame. MuSTA thus clearly goes beyond saying that any assumption in our net of belief is up for grabs.
While it may be true that confirmational holism is widely held, it interestingly has not affected debates about IBE, underdetermination, values in science, and so forth to such an extent that ITTA would have been called into question. For example, in debates about IBE, it is presumed that the evidence is stable prior to the consideration of explanatory alternatives (Douven 2017).Footnote 15 Changing this presupposition would give the debate a rather different outlook. It is thus not the case that ITTA is just a straw man.
But suppose we abandon ITTA and commit to confirmational holism. Could we then not construe our cases as cases in which scientists used IBE to infer from the best explanation of some or most of their data that some other data are mistaken? I would not have anything against such a construal, as it concedes the point I was making in this article, namely that ITTA is questionable and may have to be given up, at least in some cases.
Having disarmed these four objections, I will now proceed to assess how Bayesianism fares in the face of my cases.
4. Explanation and New Wave Bayesianism
Bayesianism is the standard theory of confirmation. As mentioned in the introduction, Bayesianism does not have much to say about explanation over and above a theory's confirmation by the evidence. It is thus hard to see how standard Bayesianism could accommodate my cases. However, there has recently been a "new wave" of Bayesianism that does try to incorporate explanatory concerns. In what follows, I will first outline how this integration has been carried out and consider whether these attempts have been compelling. I will then assess whether New Wave Bayesianism can accommodate my cases. To anticipate, I will argue that New Wave Bayesianism does not reserve a very substantial role for explanation and that there are no obvious ways in which Bayesians could accommodate my cases.
4.1. Direct and indirect strategies
Standard Bayesianism is seemingly blind to explanatory concerns. At the core of Bayesianism lies Bayesian conditionalization (BC), according to which one's degree of belief in a proposition h upon learning evidence e is equal to one's original degree of belief in h conditional on e. In short, $P_n(h) = P_o(h|e)$, i.e., the new probability is equivalent to the old probability conditional on the evidence, whereby the conditional probability is determined by Bayes's theorem: $P_o(h|e) = \frac{P_o(e|h) \times P_o(h)}{P_o(e)}$. $P_o(h)$ is also known as the "prior," $P_o(h|e)$ as the "posterior," and $P_o(e|h)$ as the "likelihood" of e on h. Whether h is explanatorily virtuous or not, BC works just the same. However, there has recently been what one might call a "new wave" of Bayesian measures of confirmation that seek to incorporate explanatory concerns. More specifically, Bayesians have sought to reconcile BC with the inference to the best explanation (IBE). There are basically two strategies for achieving this reconciliation, which I will refer to as direct and indirect.
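To make the machinery concrete, here is a minimal numerical sketch of BC (all probabilities are invented for illustration; note that BC itself is indifferent to whether h is explanatorily virtuous):

```python
# Bayesian conditionalization (BC): the new credence in h equals the old
# credence in h conditional on e, computed via Bayes's theorem.
# All numbers are invented for illustration.

def posterior(prior_h, likelihood, prob_e):
    """P_o(h|e) = P_o(e|h) * P_o(h) / P_o(e)."""
    return likelihood * prior_h / prob_e

# Invented example: P_o(h) = 0.3, P_o(e|h) = 0.8, P_o(e) = 0.4
p_new_h = posterior(prior_h=0.3, likelihood=0.8, prob_e=0.4)
print(round(p_new_h, 6))  # 0.6: learning e raises the credence in h from 0.3 to 0.6
```

When $P_o(e)$ is not directly available it can be obtained by the law of total probability; the sketch simply takes it as given.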
The direct strategy awards bonus points to the posteriors of the most explanatory hypothesis. Unfortunately, this turns out to be inconsistent with BC and the axioms of probability, generating "Dutch Book" situations in which agents are bound to accept a series of bets that individually seem reasonable but that jointly are guaranteed to result in losses (van Fraassen Reference van Fraassen1989). This seems to be the least appealing strategy, even though a handful of philosophers have argued not only that betting behavior is the wrong model for BC but also that—in computer simulations—explanatory models of belief updating can outperform BC models (Douven and Schupbach Reference Douven and Schupbach2015; Douven Reference Douven, McCain and Poston2017). I will not comment further on this strategy here (see e.g., Pettigrew Reference Pettigrew2021), but if it were successful, it would be quite similar to IBE, and therefore very much in line with ITTA: First check whether the hypotheses are likely conditional on the evidence and then award extra points for explanation.
By far the most popular strategy of reconciling BC with IBE is the indirect strategy. The indirect strategy tries to reconcile IBE with BC by arguing that the posteriors are affected by explanatory concerns boosting the priors and/or the likelihoods in BC (Salmon Reference Salmon1990; Lipton Reference Lipton1991/2004; Okasha Reference Okasha2000; Salmon Reference Salmon2001; Henderson Reference Henderson2013).Footnote 16 Two varieties of this strategy should be distinguished: Sometimes IBE is viewed as complementing or as “filling holes” left open by BC (Lipton Reference Lipton1991/2004; Okasha Reference Okasha2000). Other times, BC and IBE are presented as fully aligned, and BC as simply explicating IBE in probabilistic terms (Salmon Reference Salmon2001; Henderson Reference Henderson2013). One may refer to this version of the indirect strategy as “reductive.”Footnote 17 But the outcome of each of the two versions of the indirect strategy is the same. As Henderson puts it succinctly, “If the theory is more plausible, it gets a higher prior; if it is lovelier [i.e., more explanatory], it gets a higher likelihood” (Henderson Reference Henderson2013, 712).
To illustrate the basic idea of the indirect strategy: When assessing whether a patient has COVID-19 or just the flu, a doctor will have certain priors, for example, COVID-19 sometimes involves the loss of smell, whereas influenza does not. Upon hearing about the patient’s symptoms (i.e., the evidence), which involve an increased temperature but also a loss of smell (meaning that the COVID-19 hypothesis is much more likely than the influenza hypothesis), the doctor concludes (calculating her posteriors) that the patient’s symptoms are better explained by COVID-19 than by influenza. Bayesian updating and explanation, so it would seem, go hand in hand.
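The doctor's reasoning can be rendered numerically as follows (all probabilities are hypothetical and chosen only to mirror the qualitative story; none come from the text):

```python
# Hypothetical diagnostic priors and likelihoods (invented numbers).
# The likelihood of the symptoms (fever plus loss of smell) is high on the
# COVID-19 hypothesis and low on the influenza hypothesis, because influenza
# rarely involves loss of smell.
priors = {"covid": 0.5, "flu": 0.5}        # doctor initially indifferent
likelihoods = {"covid": 0.6, "flu": 0.05}  # P(symptoms | hypothesis)

# P(e) via the law of total probability over the two hypotheses
p_e = sum(priors[h] * likelihoods[h] for h in priors)

# Bayes's theorem for each hypothesis
posteriors = {h: priors[h] * likelihoods[h] / p_e for h in priors}
print({h: round(p, 3) for h, p in posteriors.items()})
# COVID-19 comes out at roughly 0.92, influenza at roughly 0.08
```

The hypothesis with the higher likelihood on the symptoms ends up with the higher posterior, which is exactly the sense in which Bayesian updating and explanation seem to go hand in hand here.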
Even though the indirect strategy of reconciling Bayesianism with explanation at first sight may succeed, there are some general problems. Let us first focus on the priors and then proceed to the likelihoods.
4.2. Explanation and priors
If the priors were the only place in which explanatory concerns could figure in BC, the role of explanation in BC would be rather feeble. This is so because BC is usually sold in tandem with the idea that the priors "wash out" with repeated updating. In fact, the washing out of priors almost seems a prerequisite for any reasonable version of the widely preferred subjectivist brand of Bayesianism: Without washing out, the fact that different subjects will set their priors differently would inevitably result in diverging posteriors for the same evidence. But with the same kind of evidence, we want subjects to converge on similar posteriors, even when they start out in different places. Abandoning the idea that the priors wash out would also make the "problem of priors," that is, the problem of how the priors are to be determined, much more pressing, especially for the dominant view of subjective Bayesianism (see e.g., Titelbaum Reference Titelbaum2022).Footnote 18 So either there is washing out of priors and explanatory concerns are negligible on Bayesianism, or there is no washing out of priors and the Bayesian approach loses a great deal of its plausibility.Footnote 19 With regard to ITTA, we can conclude that, if explanatory concerns were to enter the Bayesian calculus via the priors only, their role for confirmation would be negligible indeed. ITTA would seem to remain untouched.
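The washing-out phenomenon can be illustrated with a small simulation (data, priors, and hypotheses all invented): two agents with very different priors about a coin converge once enough evidence accumulates.

```python
# Washing out of priors (toy sketch, invented numbers): h says the coin
# lands heads with probability 0.7; the alternative says the coin is fair.
def update(prior_h, heads, tails, p_h=0.7, p_alt=0.5):
    """Posterior of h after observing the toss counts, via Bayes's theorem."""
    like_h = (p_h ** heads) * ((1 - p_h) ** tails)
    like_alt = (p_alt ** heads) * ((1 - p_alt) ** tails)
    return like_h * prior_h / (like_h * prior_h + like_alt * (1 - prior_h))

# The same invented data (70 heads, 30 tails) for both agents
optimist = update(prior_h=0.90, heads=70, tails=30)
skeptic = update(prior_h=0.01, heads=70, tails=30)
print(round(optimist, 3), round(skeptic, 3))
# despite starting at 0.9 vs. 0.01, both agents end up close to 1
```

The point the simulation makes is the one in the text: precisely because the data eventually dominate, any explanatory boost lodged in the priors becomes negligible with repeated updating.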
4.3. Explanation and likelihoods
While the idea that explanatorily more appealing theories should result in higher priors seems intuitively plausible (but ultimately problematic), things get more involved with the likelihoods. In its simplest incarnation, though, the idea is intuitive enough: If $h_1$ is explanatorily more virtuous than $h_2$, the corresponding likelihoods stand in the following inequality: $P(e|h_1) \gt P(e|h_2)$, mutatis mutandis (see e.g., Okasha Reference Okasha2000). In other words, if $h_1$ is explanatorily more appealing than $h_2$, then e is more likely, or less "surprising," on $h_1$ than it is on $h_2$. The basic idea has been widely endorsed (Salmon Reference Salmon1990; Lipton Reference Lipton1991/2004; Okasha Reference Okasha2000; Henderson Reference Henderson2013).
The measures proposed for explanatory power can be fairly complex. For example, Schupbach and Sprenger (Reference Schupbach and Sprenger2011) define explanatory power as the degree of expectedness of e given h and, adding some conditions, propose the following measure: $\varepsilon(h,e) = \frac{P(h|e) - P(h|\neg e)}{P(h|e) + P(h|\neg e)}$.Footnote 20 There are several other measures (e.g., McGrew Reference McGrew2003; Myrvold Reference Myrvold2003; Crupi and Tentori Reference Crupi and Tentori2012), but many of them boil down to the aforementioned inequality when used to compare two explanatory hypotheses (Schupbach Reference Schupbach2005, Reference Schupbach, McCain and Poston2017).
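A minimal sketch of the Schupbach–Sprenger measure (inputs invented): the measure is positive when e raises the probability of h relative to ¬e, and zero when h is probabilistically irrelevant to e.

```python
# Schupbach and Sprenger's measure of explanatory power (invented inputs):
# E(h, e) = (P(h|e) - P(h|not-e)) / (P(h|e) + P(h|not-e)).
def explanatory_power(p_h_given_e, p_h_given_not_e):
    return (p_h_given_e - p_h_given_not_e) / (p_h_given_e + p_h_given_not_e)

# h and e positively relevant: positive explanatory power
print(round(explanatory_power(0.8, 0.2), 6))  # 0.6
# h probabilistically irrelevant to e: zero explanatory power
print(explanatory_power(0.5, 0.5))  # 0.0
```

By construction the measure is confined to the interval [-1, 1], with negative values when h makes e more surprising rather than less.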
Bayesian measures of explanatory power in terms of likelihoods have been critiqued by Glymour (Reference Glymour2015) and Lange (Reference Lange2022a, Reference Lange2022b). The upshot of their criticism is that higher explanatory power is not necessarily reflected in higher likelihoods or priors.Footnote 21 For example, Newtonian mechanics is strictly speaking false, its prior accordingly zero, and its likelihood therefore undefined, despite its unquestionably high explanatory power (Glymour Reference Glymour2015), and different explanations may have the same likelihood because they all deductively entail the evidence (Lange Reference Lange2022a). What is more, there seems to be nothing in the New Wave Bayesian measures of explanation that would prevent a nonexplanatory theory from having as high a likelihood as an explanatorily highly virtuous theory. But that would mean that Bayesian measures of explanation are rather limited indeed for modeling real theory-choice scenarios.
4.4. Bayesianism and cases of evidential uncertainty
Regardless of the aforementioned general concerns about Bayesians’ attempts to incorporate explanatory power into their measures, let us see whether Bayesians can accommodate the cases that I discussed in section 2.
Standard Bayesianism implies a strongly foundationalist epistemology, in which we are certain about the evidence (see e.g., Weisberg Reference Weisberg2009a). This is of course incompatible with cases such as the ones discussed here, in which there was considerable uncertainty about the evidence. But Bayesians have found a straightforward way of modeling evidential uncertainty, known as Jeffrey conditionalization:

$${P_n}\left( h \right) = {P_o}\left( {h|e} \right) \times {P_n}\left( e \right) + {P_o}\left( {h|\neg e} \right) \times {P_n}\left( {\neg e} \right),$$
where ${P_o}\left( h \right)$ represents the prior of h before new information about the evidence is received and ${P_n}\left( h \right)$ the new probability of h after this information has been received. ${P_n}\left( h \right)$ is then the sum of the old conditional probabilities ${P_o}\left( {h{\rm{|}}e} \right)$ and ${P_o}(h|\neg e)$ , each weighted with the degrees of belief in $e$ occurring or not occurring, namely ${P_n}\left( e \right)$ and ${P_n}\left( {\neg e} \right)$ , respectively.
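Jeffrey conditionalization can be sketched as follows (all probabilities invented for illustration):

```python
# Jeffrey conditionalization:
# P_n(h) = P_o(h|e) * P_n(e) + P_o(h|not-e) * P_n(not-e).
# All probabilities are invented for illustration.
def jeffrey_update(p_h_given_e, p_h_given_not_e, p_new_e):
    return p_h_given_e * p_new_e + p_h_given_not_e * (1 - p_new_e)

# The agent becomes 0.8 (not 1.0) confident that e occurred
print(round(jeffrey_update(0.9, 0.2, 0.8), 6))  # 0.76

# With P_n(e) = 1, Jeffrey conditionalization reduces to strict BC
print(jeffrey_update(0.9, 0.2, 1.0))  # 0.9
```

The second call shows the design point of the rule: strict conditionalization is the special case in which the evidence is learned with certainty.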
Now, in situations of evidential conflict, where both ${P_n}\left( {\neg e} \right)$ and ${P_n}\left( e \right)$ are $ \gt 0$, it is not at all clear whether the new probability ${P_n}\left( h \right)$ should be $ \lt $ or $ \gt .5$. Bayesians should therefore recommend the suspension of judgment—independently of whether they believe that explanatory power is manifested in a theory's likelihood or not. Although scientist group F seemed to follow this recommendation, group G decided to decrease their credence ${P_n}\left( {\neg e} \right)$. According to MuSTA, group G was justified in doing so: Theories that explain at least some of the evidence in a highly satisfying and revelatory way may cast doubt on evidence that is in apparent conflict with the theory. In Bayesian terms, explanatory theories may decrease credence in the data inconsistent with the theory, namely ${P_n}\left( {\neg e} \right)$.
Suppose now that you are a confirmational holist and you think my historical cases can easily be accommodated by allowing beliefs about theories to affect beliefs about the evidence (but see my reply to objection 4 in the previous section). Could you give us a convincing Bayesian rendering of my cases? Strevens (Reference Strevens2001) has argued that the Duhem-Quine problem—which has it that hypotheses are never confirmed or disconfirmed in isolation, but only in conjunction with other assumptions—can be solved within the Bayesian framework. In particular, Strevens derives the conclusion that the higher one's prior probability of a hypothesis h, the less h will be blamed when the conjunction of h and an auxiliary a is refuted by experiment, and that the magnitude of the impact of e on h is greater the higher the prior probability of a. True, scientists in my cases had high priors in their theories when dismissing apparent counterevidence, for which they had low priors, but Strevens gives no guidance as to when and why such scenarios arise and when it would be justified for scientists to have high priors for their theories and low priors for their auxiliaries. I have argued that the reason why group G scientists opted for giving more weight to their theories than to the auxiliaries supporting the negative evidence is that they were guided by the explanatory power of their theories, whereas group F scientists were not. The Duhemian Bayesian à la Strevens could of course try to accommodate our cases by modeling explanatory power in terms of higher priors, but, as we have seen, this will not be a satisfactory solution either, at least not in a fully coherent Bayesian account.
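To see how priors distribute blame in a Duhem-Quine scenario, here is a toy rendering in the spirit of Strevens's analysis (numbers invented; it crudely assumes that the refutation is learned as not-(h and a) with certainty and that h and a are a priori independent, whereas Strevens's treatment is more general):

```python
# Toy Duhem-Quine blame assignment (invented numbers): an experiment
# refutes the conjunction of theory h and auxiliary a; h and a are
# assumed probabilistically independent a priori.
def blame_after_refutation(p_h, p_a):
    """Posteriors of h and a after conditioning on not-(h and a)."""
    p_not_conj = 1 - p_h * p_a
    post_h = p_h * (1 - p_a) / p_not_conj  # only the (h, not-a) case survives for h
    post_a = p_a * (1 - p_h) / p_not_conj  # only the (a, not-h) case survives for a
    return post_h, post_a

# High prior for the theory, middling prior for the auxiliary
post_h, post_a = blame_after_refutation(p_h=0.9, p_a=0.6)
print(round(post_h, 2), round(post_a, 2))  # 0.78 0.13: the auxiliary takes the blame
```

The sketch reproduces Strevens's qualitative conclusion: the theory with the higher prior retains most of its credence while the auxiliary absorbs the blame. What it cannot say, as the text notes, is why the priors should be distributed that way in the first place.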
Bayesianism may in fact be altogether incompatible with confirmational holism. In a seminal paper, Weisberg (Reference Weisberg2009a) has shown that Bayesianism cannot simultaneously satisfy both holism and commutativity, where holism is understood as belief revision in the light of changing background beliefs.Footnote 22 This characterization of confirmational holism in fact fits the bill very well in my cases: Mendeleev's periodic law and Crick and Watson's DNA model had scientists change their background beliefs about the relevant experiments, so that beliefs about the evidence changed. What is more, Weisberg has even argued that holism is incompatible with the two standard forms of Bayesian conditionalization ("strict" and Jeffrey), as both require "rigidity" of conditional probabilities on the evidence, namely that ${P_n}\left( {h{\rm{|}}e} \right) = {P_o}\left( {h{\rm{|}}e} \right)$. Attempts to address Weisberg's challenge have resulted in the abandonment of both of these standard forms of conditionalization (Gallow Reference Gallow2014). Even if Bayesianism were reconcilable with holism in some way, nothing would as yet have been said about the role of explanation in guiding researchers in cases of evidential uncertainty. It is thus not clear whether Bayesians can model MuSTA at all.
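The "rigidity" property at issue can be checked numerically (joint distribution invented): Jeffrey conditionalization reallocates probability between the e-cells and the not-e-cells but leaves the conditional probability of h on e untouched.

```python
# Rigidity under Jeffrey conditionalization: P_n(h|e) = P_o(h|e).
# Invented joint distribution over (h, e).
joint = {("h", "e"): 0.30, ("h", "not_e"): 0.10,
         ("not_h", "e"): 0.20, ("not_h", "not_e"): 0.40}

p_old_e = joint[("h", "e")] + joint[("not_h", "e")]   # 0.5
p_old_h_given_e = joint[("h", "e")] / p_old_e         # 0.6

# Jeffrey update: rescale the e-cells to the new credence P_n(e) = 0.9
# and the not-e-cells to 1 - P_n(e), preserving ratios within each group
p_new_e = 0.9
scale = {"e": p_new_e / p_old_e, "not_e": (1 - p_new_e) / (1 - p_old_e)}
new_joint = {(h, e): p * scale[e] for (h, e), p in joint.items()}

p_new_h_given_e = new_joint[("h", "e")] / (new_joint[("h", "e")] + new_joint[("not_h", "e")])
print(round(p_old_h_given_e, 6), round(p_new_h_given_e, 6))  # 0.6 0.6: rigid
```

This is exactly the fixed point that, on Weisberg's argument, blocks a holist from letting changed background beliefs about the theory reach back and revise how the evidence bears on it.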
5. Conclusion
Cases like the ones discussed here challenge ITTA and its decree of independence of empirical and explanatory concerns: Scientists were guided by the explanatory power of their theories when assessing their theories' empirical confirmation.Footnote 23 More specifically, explanatory power guided the disambiguation of data conflicts. On ITTA, cases like these must appear to be methodologically problematic. But given that the cases we are concerned with here are ground-breaking scientific discoveries that have shaped the development of the disciplines to this very day, and given that we retrospectively can say that Mendeleev, Crick, and Watson made the right decisions in doubting the apparent counterevidence to their theories, I think we should keep an open mind about revising our philosophical norms in the face of these cases.
I have argued that cases like the ones discussed here support what I have called MuSTA. According to this model, evidential concerns do not only help weed out theories; vice versa, explanatory concerns may also contribute to the stabilization of the empirical basis. On that model, scientists ought to consider whether the evidence can be explained by a plausible, that is, explanatorily virtuous, and empirically supported theory. In cases in which a theory explains some of the available evidence in a highly virtuous way but is at odds with some other evidence, MuSTA—but not ITTA—allows scientists to be guided by the explanatory power of their theories when assessing the apparent counterevidence. Although MuSTA is a normative thesis, it makes no prescriptions as to whether scientists ought to question apparent counterevidence in every instance; this has to be a case-by-case judgment that is better left to the scientists.
I have also argued that Bayesianism, the most widely accepted model of theory confirmation in science, struggles to accommodate cases like the ones discussed here. Standard Bayesianism reserves no role for explanatory factors to begin with. Although proponents of what I called New Wave Bayesianism have sought to remedy this, I do not think they have succeeded in carving out a very substantial role for explanation. Bayesianism combined with confirmational holism, while not incompatible with my cases, still needs to provide a role for explanation that mirrors my cases. Worse, the very notion of Bayesian holism may even be incoherent. It is therefore not clear at all that Bayesians could successfully model MuSTA.
MuSTA entails that explanatory power is at heart an epistemic, that is, truth-conducive concern; otherwise, the explanatory power of a theory could never be allowed to cast doubt on apparent counterevidence. Such a view of explanatory power, and its associated virtues such as simplicity, unifying power, fertility, and consistency, will certainly be welcomed by scientific realists (Psillos Reference Psillos1999; Schindler Reference Schindler2018) and proponents of the IBE (Lipton Reference Lipton1991/2004; Douven Reference Douven, McCain and Poston2017). Skeptics of explanatory power will be less enthused (van Fraassen Reference van Fraassen1980; Douglas Reference Douglas2009); but they will have to contend with cases of the kind presented here.Footnote 24
The arguments provided in this article need not entail that ITTA is never an adequate model of how science is done or should be done; ITTA could be adequate in many or even most contexts. But ITTA's scope could in fact be more limited than one might think, especially if data conflicts were more common than generally appreciated (Franklin Reference Franklin2002; Schindler Reference Schindler2013; Bailey Reference Bailey2017). The need for the stabilizing presence of explanatory theories for the empirical basis of science may be most apparent where this presence is lacking. For example, the recent replication crisis in the social sciences has been read by some as resulting from an absence of decent psychological theories making definitive predictions. In particular, the lack of theories capable of guiding researchers when experimental results are confusing has been lamented (Klein Reference Klein2014; Muthukrishna and Henrich Reference Muthukrishna and Henrich2019; Lavelle Reference Lavelle2022). Somewhat ironically, ITTA may thus be an illusion brought about by the very thing it wants to prohibit, namely an impact of explanatory on empirical concerns.
Acknowledgements
I wish to thank the anonymous referees for this journal and Eli Lichtenstein for their critical comments and suggestions for improvement. I also thank the audiences at the following events for their feedback: the Mainz Colloquium for History and Philosophy of Science (and especially Cornelis Menke), the workshop on Theoretical Virtues and Non-Empirical Guides in Scientific Theory Development at Aarhus University in November 2022, and the workshop on The Role of Theory Confirmation in Science at the University of Stockholm in May 2022. Special thanks to Richard Dawid for encouragement.