1. Introduction
Keith Markus’ (2021) comparison of the causal frameworks associated with Pearl, Rubin and Lewis is a gift to scholars of causation. The differences between Pearl’s and Rubin’s frameworks – called structural causal models (SCMs) and Rubin causal models (RCMs), respectively – have been especially obscure to outsiders not already committed to one of them. As each has influenced a wide swath of disciplines (which tend to adopt one or the other), the question of whether they differ in style or in substance is significant for causal methodology. Markus’ article offers both a guide for those perplexed by the competition between these frameworks and a demonstration that comparing them is philosophically worthwhile. I hope that Markus’ article will serve as the starting point for a fruitful literature comparing the approaches, and I offer this commentary evaluating what it has and has not established.
2. Strong and weak equivalence
Although I will focus on SCMs and RCMs, a brief comparison of Pearl and Lewis will help situate Markus’ discussion. Lewis (1973) provides counterfactuals with a ‘possible worlds’ semantics. He views possible worlds as conceptually prior to causes, in the sense that he explicates causation using counterfactuals. Galles and Pearl (1998) discuss Lewis’ counterfactual semantics in isolation from his philosophical commitments, and prove that were one (contra Lewis) to interpret claims about the closeness of possible worlds in terms of interventions on variables in causal models, doing so would require no restrictions beyond those already in Lewis’ framework. While they do not claim to have shown that Pearl’s and Lewis’ frameworks are equivalent, Pearl does claim this elsewhere (Pearl et al. 2016: 126).
Markus argues that Pearl’s and Lewis’ frameworks are not ‘strongly equivalent’ in the sense of saying ‘the same thing in different ways’ (p. 3). At best, Galles and Pearl show that one can take Lewis’ notation and assimilate it into Pearl’s framework. This might demonstrate ‘weak equivalence’ (p. 3), meaning that the formal expressions in one framework can be given an interpretation within the other. But it does not show that the expressions within each framework express the same things. Of course, Pearl can argue that the ability to do without a possible-worlds metaphysics is an advantage of his account (Pearl 2009: § 7.4.1; Woodward 2003: § 3.6), but this advantage does not derive from the frameworks being strongly equivalent. Clearly, they are not.
Markus sees Pearl’s comparison of Rubin’s framework with his own as flawed in a similar way. Pearl adopts RCM notation to express causal counterfactuals, and interprets these counterfactuals within his own framework. As with Pearl’s discussion of Lewis, this strategy can only establish the weak equivalence of the frameworks. This opens the door to asking whether RCMs and SCMs are in fact strongly equivalent, and Markus argues they are not. He also raises concerns about whether they are even weakly equivalent (see Halpern 2000), though here I will focus on his arguments against strong equivalence.
Markus’ arguments against strong equivalence highlight ways in which a model within one framework expresses something different from the corresponding model in the other. Footnote 1 In my view, this is an unsatisfactory way to evaluate strong equivalence. Since strong equivalence concerns the expressive power of the frameworks, the relevant question is not whether a particular model within one framework says the same thing as a model within the other, but rather whether any scenario expressible within one framework can be expressed by some (set of) model(s) within the other. Evaluating strong equivalence by pairwise comparison of models amounts to adopting the unreasonably stringent requirement that the models of strongly equivalent frameworks stand in a one-to-one, meaning-preserving correspondence.
More concretely, consider Markus’ discussions of correlated disturbances (pp. 7, 11). With SCMs, one either (A) assumes that the modelled variable set includes all common causes of variables in the set (this is called causal sufficiency) or (B) uses ‘bi-directed arcs’ to indicate possible unmeasured common causes. Assuming that disturbances (or ‘error terms’) correspond to unmeasured causes of measured variables, this entails that if two variables are not connected by a bi-directed arc, their disturbances have no common cause, and thus will be uncorrelated. Markus emphasizes that within SCMs, accepting a model in which disturbances are uncorrelated amounts to ruling out the possibility that there is a correlation. Such assumptions underwrite results about when a causal effect is identifiable from a probability distribution (Pearl 2009: Ch. 3). This contrasts with RCMs, which allow for uncertainty regarding whether disturbances are uncorrelated. Markus presents a scenario in which an SCM rules out correlated errors but the corresponding RCM does not, and takes this to show that the frameworks are not strongly equivalent.
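A small simulation may make this modelling convention concrete. It is a sketch under stated assumptions, not anything taken from Markus or Pearl: the function names and the linear, Gaussian set-up are hypothetical. If two disturbances share an unmeasured cause (the situation a bi-directed arc flags), they come out correlated; if the arc is omitted, the SCM treats the disturbances as generated independently, so their population correlation is zero.

```python
import random

def sample_disturbances(n, latent_common_cause, seed=0):
    """Draw n pairs (u_x, u_y) of disturbances for two measured variables.

    Hypothetical set-up: with latent_common_cause=True both disturbances share
    an unmeasured cause L (what a bi-directed arc would flag); otherwise they
    are drawn independently, as the arc-free model assumes.
    """
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        latent = rng.gauss(0, 1) if latent_common_cause else 0.0
        u_x = latent + rng.gauss(0, 1)
        u_y = latent + rng.gauss(0, 1)
        pairs.append((u_x, u_y))
    return pairs

def correlation(pairs):
    """Pearson correlation of the two coordinates in a list of pairs."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs) / n
    var_x = sum((x - mean_x) ** 2 for x, _ in pairs) / n
    var_y = sum((y - mean_y) ** 2 for _, y in pairs) / n
    return cov / (var_x * var_y) ** 0.5

independent = sample_disturbances(20000, latent_common_cause=False)
confounded = sample_disturbances(20000, latent_common_cause=True)
```

With these assumptions the correlation for `independent` should be near 0, and for `confounded` near 0.5 (a shared unit-variance cause over a total variance of 2).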
Markus regards it as beside the point (p. 7) that SCMs can represent correlated disturbances using a model different from the one he considers. His point is that there are cases where an RCM allows for correlated disturbances but its SCM counterpart does not. But what one should care about is that some SCM can represent the same scenario as the RCM. If it could be shown that any scenario represented in one framework could be represented in the other, this would establish that each framework can say ‘the same thing in different ways’ and would vindicate Pearl’s treatment of the frameworks as inter-translatable. This is not to deny that each framework might be better suited to different aims. Given Pearl’s aim of giving a general account of identifiability, it makes sense to design models that allow the user to specify unambiguously that the errors are uncorrelated. To express uncertainty about whether certain error terms are in fact uncorrelated, the SCM modeller could link the relevant variables with a bi-directed arc. Footnote 2 But the model does not internally distinguish between inserting an arc to indicate belief in a latent common cause and inserting it to indicate uncertainty, so the ability of RCMs to represent uncertainty explicitly might be construed as an advantage. Such pragmatic differences merit philosophical attention, but are not relevant to semantic questions about framework equivalence.
A further argument against strong equivalence relies on the fact that SCMs, but not RCMs, explicitly refer to a causal model in their notation. That is, while Rubin’s potential outcomes are primitives denoting how individuals would counterfactually respond to experimental treatments, Pearl’s counterfactuals are evaluated by reference to a model describing an individual. Footnote 3 Markus claims that this rules out strong equivalence. An SCM modeller who adopts a false causal model for an individual will be forced to accept false causal counterfactuals about that individual. In contrast, an RCM modeller can denote counterfactuals about an individual without committing to a particular causal model. Accordingly, SCMs lack the notation to represent a mismatch between one’s causal model and the empirical individual one uses it to represent. This, however, does not show that RCMs can represent scenarios that SCMs cannot. In cases where a particular SCM fails to represent an individual, there is an available model that can represent her – namely, the correct model.
Markus’ final argument against strong equivalence concerns ‘non-identical but necessarily numerically equal’ (p. 6) variables. Consider the equation $X = Z$. Interpreted within SCM, this is a ‘modular’ structural equation, meaning that its right-hand side may be replaced (for instance, by an intervention setting $X$ to a constant) while the other equations are left unchanged. This contrasts with Rubin (1974), whom Markus reads as allowing $X$ and $Z$ to necessarily take on the same values. Markus suggests that the SCM with $X = Z$ allows for a wider range of possibilities than the corresponding RCM, as only the former allows the variables to vary independently. Considered in terms of expressive power, however, this appears to be a further example in which RCMs represent a possibility that SCMs cannot: SCMs cannot represent two variables as both distinct and necessarily equivalent.
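The modularity at issue can be illustrated with a toy encoding. This is a hypothetical sketch, not the API of any causal-inference library: a model is an ordered mapping from each variable to the function that computes it, and an intervention do(X = x) swaps out X’s function alone. Under the intervention X and Z come apart, which is exactly what a constraint of necessary numerical identity would forbid.

```python
from typing import Callable, Dict

# Hypothetical toy encoding: each variable maps to a function of the values
# computed so far. Dicts keep insertion order, which here follows causal order.
Values = Dict[str, float]
Equations = Dict[str, Callable[[Values], float]]

def solve(equations: Equations, noise: Values) -> Values:
    """Compute every variable in order, starting from the exogenous noise terms."""
    values = dict(noise)
    for var, f in equations.items():
        values[var] = f(values)
    return values

def do(equations: Equations, var: str, value: float) -> Equations:
    """Intervene on var: replace its equation with a constant, keep the rest."""
    new = dict(equations)  # copy, so the original model is left untouched
    new[var] = lambda _values, v=value: v
    return new

model: Equations = {
    "Z": lambda vals: vals["u_z"],
    "X": lambda vals: vals["Z"],  # the structural equation X = Z
    "Y": lambda vals: 2 * vals["X"],
}

noise = {"u_z": 1.0}
observed = solve(model, noise)                  # X and Z coincide: both 1.0
intervened = solve(do(model, "X", 5.0), noise)  # X is 5.0 while Z stays 1.0
```

Because only X’s entry is replaced, Z’s equation and Y’s dependence on X survive the intervention unchanged, so Y becomes 10.0 in the intervened model.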
In what sense are the variables in question ‘necessarily’ equivalent? One possibility is that they are necessarily equal because they denote the same quantity. Since such an equivalence might be non-transparent, one might permit one’s framework to represent the variables separately. But this would be a modeller’s convenience rather than an extension in the framework’s ability to represent states of the world. Another possibility is that the variables refer to distinct quantities that must match due to standing in some non-causal necessitation relationship. Would this prove that RCMs can represent non-causal relationships that SCMs cannot? Not necessarily. A framework can represent causal relationships among a variable set containing non-causally related variables without thereby providing a semantics for the non-causal relationships modelled. Such a framework might allow non-causally related variables into the model, but treat them as a nuisance to be cordoned off to facilitate causal inference. If so, then the framework should not be understood as extending the worldly relationships that can be modelled, but rather as loosening the restrictions on which variables are allowed within causal models.
3. Case study: consistency
For the reasons provided, I deny that Markus has shown the frameworks to be not strongly equivalent. Markus would respond that he has, insofar as he has shown that the corresponding models are interpreted differently across the frameworks. The strong/weak distinction is Markus’ and he is free to use the terms as he wishes. What matters is whether the distinction supports his critique that when Pearl uses notation from alternative frameworks within his own, it means something different than when interpreted within those frameworks. I have suggested that whether formalisms share an interpretation should not be evaluated based on one-to-one correspondence, but rather based on whether the frameworks can express the same causal scenarios. I now motivate this position by appeal to a prior debate between RCM and SCM proponents.
Recall Markus’ appeal to the fact that only SCMs explicitly refer to models in their notation. Within SCM, the bridge between models and reality is provided by a theorem called consistency. It says that given that a person actually receives a treatment, the observed outcome (i.e. effect) is the one that the model says the individual would have were they to receive that treatment (in SCM notation: $X(u) = x \Rightarrow Y(u) = {Y_x}(u)$ ). The status of this principle has been a source of debate between SCM and RCM theorists, and thus serves as a test case for comparisons of the frameworks. Footnote 4
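A toy sketch may show why, within SCM, consistency is a theorem rather than an extra assumption. The functions and numbers below are hypothetical. The potential outcome $Y_x(u)$ is computed by evaluating the same structural equation for $Y$ with $X$ held at $x$, so whenever $X(u) = x$ the factual and counterfactual outcomes coincide by construction.

```python
# Hypothetical toy SCM: u bundles an individual's background factors.
def treatment(u):
    """X(u): the treatment the individual actually receives."""
    return 1 if u["motivation"] > 0.5 else 0

def outcome_equation(x, u):
    """Structural equation for Y, evaluated at a given value of X."""
    return u["baseline"] + 3 * x

def factual_outcome(u):
    """Y(u): the outcome generated when X follows its own equation."""
    return outcome_equation(treatment(u), u)

def potential_outcome(x, u):
    """Y_x(u): the outcome under do(X = x), i.e. X's equation replaced by x."""
    return outcome_equation(x, u)

def consistency_holds(u):
    """Check X(u) = x  =>  Y(u) = Y_x(u) for one individual."""
    x = treatment(u)
    return factual_outcome(u) == potential_outcome(x, u)

individuals = [
    {"motivation": m, "baseline": b} for m in (0.2, 0.9) for b in (0.0, 1.5)
]
print(all(consistency_holds(u) for u in individuals))  # True
```

An RCM modeller who instead treats potential outcomes as primitives has no shared structural equation of this kind, which is why consistency can there be treated as a separate assumption.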
Within Lewis’ counterfactual theory (Lewis 1973: § 1.7), consistency follows from the assumption that every world is closest to itself. From the counterfactual ‘Were I to paint the wall red, my uncle would be happy’, it follows that if I actually paint the wall red, my uncle is happy. One might worry that the antecedent could obtain, but with side effects producing an outcome different from that given by the consequent. If the red paint is toxic, my uncle’s happiness would be a dubious proposition. A consistency defender would reply that if the paint is toxic, one should not accept the stated counterfactual. This back-and-forth regarding consistency is recapitulated among causal modellers (Cole and Frangakis 2009; VanderWeele 2009; Pearl 2010). Consider an SCM licensing the counterfactual that participating in a job programme would increase participants’ employability. Yet participants who are forced into the programme might be resentful and consequently fail to benefit from it. Pearl’s response: if so, then one should not accept a model entailing that those individuals would be helped.
Many RCM modellers will not be satisfied with Pearl’s response. An experimenter testing the effects of a voluntary job programme that does not produce resentment might have no position on whether the programme would produce resentment as a side-effect among those who are forced to participate. Yet Pearl’s approach requires that if one models the treatment as ‘job programme’ and the outcome as ‘employability’, one must take a stance on the general causal relationship between these variables, and thus places a burden on modellers to answer questions they might not want to address. This motivates VanderWeele (2009) to avoid building consistency into the axioms of RCMs, and instead treat it as an empirical assumption requiring case-by-case evaluation. RCMs interpreted without consistency have fewer implications than the corresponding SCMs. But this is compatible with the frameworks being in an important sense interchangeable. RCM modellers can express the content of SCMs by accepting consistency. And SCM modellers can respond to alleged counterexamples to consistency by providing a model satisfying it. Yet the debate teaches us more than that the frameworks can express the same scenarios. Although it implies the semantic non-equivalence of corresponding SCM and RCM models, at its core it is a pragmatic dispute over modelling methodology. There is a trade-off between requiring more assumption-laden models representing the general relationships among a set of variables and allowing less general models that make fewer commitments, but which are limited to modelling variables within localized experimental contexts. Faced with this trade-off, RCM and SCM modellers make different choices.
4. Manipulability
Although Markus’ primary target is the strong equivalence of the frameworks, he briefly considers whether they ‘assume different forms of causation’ (p. 8). His most direct evidence that they do is that Pearl asserts, while some RCM theorists deny, Footnote 5 that so-called ‘non-manipulable’ variables can be causes (Pearl 2019; Holland 1986, 2008). Race and gender, which arguably cannot be experimentally manipulated, are key examples of such variables. The disagreement over whether certain variables can be causes suggests that the frameworks make different commitments regarding causation, and is at odds with the more conciliatory treatment of the frameworks I have been defending.
My response is that although advocates of the frameworks adopt conflicting positions regarding certain variables, these positions are not forced upon them by their frameworks. When one moves away from thorny variables such as race and gender and looks at debates regarding slightly less contentious variables such as obesity (Hernán and Taubman 2008; Pearl 2018), the modelling issues in play overlap significantly with those arising in the consistency debate. Whereas RCM modellers link potential outcomes to particular experimental manipulations, SCM modellers represent manipulations by formally applying the do-operator to variables in a graph. Let us reserve the term ‘interventions’ for manipulations of variables represented by this operator. Provided that the treatment variable is not ‘ambiguous’ (Spirtes and Scheines 2004), the effects of interventions on an outcome will be invariant across distinct ways of manipulating the treatment. The first-order debate thus appears to concern not the difference between manipulable and non-manipulable variables, but rather whether causal claims should be linked to particular manipulations or instead characterized as interventions on variables that allow for distinct manipulations.
Admittedly, Pearl does assert that one can intervene upon gender without specifying a manipulation. He would, however, require ‘do(gender)’ to be well defined, which requires that there be at least hypothetical manipulations of gender (perhaps available only to ‘Lady Nature herself’; Pearl 2018: 4). Whether such a manipulation is coherent is debatable, and resolving this debate would require careful attention to the purportedly non-manipulable variable. Given SCM modellers’ willingness to characterize interventions in a way that abstracts away from concrete manipulations, it is unsurprising that they would have a higher tolerance than RCM modellers for talk of hypothetical manipulations. Yet the frameworks themselves do not settle what one should say about particular ‘non-manipulable’ variables.
5. Individuals and populations
While I have here focused on Markus’ central philosophical thesis, my argument in no way undermines the value of Markus’ characterization of the differences between the approaches, summarized in his table 2 (p. 12). My criticisms only target his explanation of these differences by appeal to the non-equivalence of the frameworks. I further endorse his positions that Pearl has not established strong equivalence, and that even if he had, comparing the frameworks would still be worthwhile.
I will now highlight one benefit of considering the potential outcomes framework alongside Pearl’s. The former, by including a subscript for the individual in its notation, forces the user to attend to issues of aggregation and abstraction in a way that SCM does not. This is evident from the centrality to RCM of ‘the fundamental problem of causal inference’ (Holland 1986). The crux of the problem is that although an individual’s causal effect is the difference between her outcomes under treatment and control conditions, one only ever observes one of these outcomes. The solution is that, in the limit, randomization ensures that the difference in expected outcomes between the treatment and control groups measures the average effect across individuals. Note that the average effect is equally identifiable within the SCM framework, and that the RCM framework never in fact identifies the effect for an individual characterized using a maximally fine-grained description. But the RCM framework makes salient how population-level causal relationships aggregate over individual-level effects, in a way that may not be transparent when one uses an SCM to identify these relationships from a joint probability distribution.
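The fundamental problem and its statistical solution can be sketched in a short simulation. Everything here is hypothetical (Gaussian outcomes, an average effect of 1.5, the function name): because the simulation generates both potential outcomes, the true average effect can be computed, yet the estimator sees only one outcome per individual. Under randomized assignment, the difference in group means should approximate that average, while no individual effect is ever recovered.

```python
import random

def simulate_trial(n, seed=0):
    """Return (true average effect, difference-in-means estimate) for n individuals."""
    rng = random.Random(seed)
    # Each individual has two potential outcomes, but only one will be observed.
    y0 = [rng.gauss(10, 2) for _ in range(n)]
    effects = [rng.gauss(1.5, 1) for _ in range(n)]  # heterogeneous individual effects
    y1 = [a + e for a, e in zip(y0, effects)]
    true_ate = sum(effects) / n  # computable only because both outcomes were simulated

    # Randomized assignment: a fair coin decides which outcome we get to see.
    treated = [rng.random() < 0.5 for _ in range(n)]
    observed_treated = [y1[i] for i in range(n) if treated[i]]
    observed_control = [y0[i] for i in range(n) if not treated[i]]
    estimate = (sum(observed_treated) / len(observed_treated)
                - sum(observed_control) / len(observed_control))
    return true_ate, estimate

true_ate, estimate = simulate_trial(20000)
```

With a large sample the two numbers should be close, illustrating how the population-level quantity aggregates over individual effects that remain unobserved.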
One might suppose that RCMs’ emphasis on individual-level causes indicates that they interpret causation differently from SCMs. Yet population- and individual-level causes need not be understood as picking out distinct causal concepts. The view that they are is encouraged by the position (Sober 1984) that ‘type’ and ‘token’ causation pick out two metaphysical relations, one between properties and the other between events. Yet careful observers of SCMs have denied that claims about populations and individuals employ distinct metaphysical concepts (Woodward 2003: 40; Hausman 2005). Individuals can be considered either as concrete tokens or as types characterized by their properties, and type-level causal relationships generalize over counterfactuals about token individuals. Footnote 6 RCM discussions of the ‘fundamental problem’ support this analysis. The view that claims about individuals and populations pick out distinct causal concepts remains prevalent, but in my view should be abandoned.
6. Conclusion
I conclude that Markus has shown neither that RCM and SCM are strongly non-equivalent nor that they employ distinct notions of causation. Disputes between proponents of the frameworks are better understood as what Weinberger and Bradley (2020) call a ‘non-factual disagreement’. Non-factual disagreements do not concern some first-order fact within the domain of dispute, but instead reflect different views of the aims and methods for studying that domain. Regarding SCM and RCM modellers, Markus succinctly captures their distinct aims as follows:
SCM seeks to encapsulate general scientific knowledge represented in multi-purpose causal models and use them to guide estimation of various causal effects included in the model. In contrast, RCM instead emphasizes the representation of specific events in the context of a specific study. (Markus 2021: 9)
The dispute arises because proponents of each framework see their aims as primary and view the tools of the other as being ill-suited for addressing the questions they view as most important. Should the frameworks turn out to be strongly equivalent, this would not motivate focusing on one framework to the exclusion of another, as there are insights that arise when using each that are less transparent when using the other. But the insights to be gleaned pertain not to metaphysics, but to modelling.
Acknowledgements
I am grateful to Keith Markus for detailed and extensive commentary on earlier drafts and for emails clarifying his position. Thank you to Felix Elwert, Patrick Klössel, and two anonymous reviewers for helpful feedback. This article was written during a postdoc that was generously funded by the Alexander von Humboldt Foundation.
Naftali Weinberger is a postdoctoral research fellow in the Munich Center for Mathematical Philosophy at Ludwig Maximilian University of Munich. He works on foundational questions related to causal modelling and its applications in the social sciences and life sciences. He currently has ongoing projects on causation in dynamical systems, and on causally modelling discrimination.