1. Introduction
Patrick Suppes (1962) was the first to provide a systematic account of how data are used in experiments to draw inferences about the scientific hypotheses under scrutiny. Suppes suggested that this kind of inference is drawn through what he calls a hierarchy of models (HoM), which consists of three main levels, ranging from low-level data models, via models of experiment, to high-level theoretical models. Suppes' HoM account was later elaborated by Deborah Mayo (1996), and it has since come to be regarded as the standard philosophical account of what can be called model-based scientific experimentation.
In a recently published paper, Karaca (2018) disputed the scope of the HoM account on the ground that it lacks a modelling concept specific to the process of data acquisition, which he calls the model of data acquisition. Karaca's (2018) paper does not consider (computer) simulation models, which are regarded as indispensable for designing and performing present-day experiments in the physical and biological sciences. While the various uses of simulation models have received considerable attention from philosophers of science over the past two decades or so (see e.g. Winsberg 2019), only recently have their uses in experimental data acquisition and data analysis been discussed, especially in the context of the Large Hadron Collider (LHC) experiments (see e.g. Morrison 2015; Massimi and Bhimji 2015; Boge and Zeitnitz 2020; Mättig 2021). However, in this literature, the effects of simulation models on the structure of model-based experimentation have not been given due attention. In this paper, I will examine the model-based structure of the Higgs boson search carried out in the ATLAS experiment, where simulation models are used in addition to experimental and theoretical models. I will argue that the foregoing types of models relate to each other through a network-like structure, as opposed to the linear and hierarchical model-based structure suggested by Suppes and Mayo.
The plan of the present paper is roughly as follows. In the next section, I will revisit the HoM account to set the stage for the discussion in the subsequent sections, where I will examine how theoretical, experimental and simulation models are used and relate to each other in the ATLAS Higgs boson search, an experiment representative of present-day high-energy physics (HEP) experiments. In the last two sections, I shall take stock of the findings of the previous sections to advance a novel account of model-based scientific experimentation. The discussions in sections 3, 4 and 6 of the present paper in part rely on Karaca (2018). Since both papers deal with the use of models in the ATLAS experiment, this reliance is unavoidable, but it is justified as the present paper uses the same case to develop a novel account.
2. The HoM account
The HoM that Suppes proposed for the model-based structure of scientific experimentation consists of three distinct but related levels. At the highest level of the Suppean HoM lie models of theory, which primarily represent the natural phenomena under scrutiny in an experiment. According to Suppes' definition, a model of theory is a set-theoretical structure of a comprehensive theory, such as Newton's theory of motion, the kinetic theory of gases, or Maxwell's theory of electromagnetism. One step down the proposed hierarchy lie models of experiment, which are primarily created to relate testable conclusions of models of theory to experimental data. These models fulfill this task by specifying various factors in an experiment, including the testing rule, the choice of experimental parameters, and the number of trials. Models of experiment are thus linkage models between models of theory and models of data, which constitute the third level of the Suppean HoM. Every model of data is a possible realization of the experimental data, but not every possible realization counts as a model of data. In order for a possible realization of experimental data to count as a model of data, it must satisfy certain canonical features of data, such as homogeneity and stationarity. In this sense, in Suppes' account, models of data offer canonical representations of experimental data and thereby "incorporate all the information about the experiment which can be used in statistical tests of the adequacy of the theory" (Suppes 1962, 258).
At the bottom of the Suppean HoM lie two more levels for which no specific models are provided. The first of these levels is the level of experimental design concerning experimental procedures, such as the calibration of instruments and the randomization of data, which directly relate to the formation of the models of data. Below this level lie ceteris paribus conditions, namely auxiliary factors that contain “detailed information about the distribution of physical parameters characterizing the experimental environment” (ibid.). These conditions might include auxiliary factors such as control of loud noises, bad odors, wrong times of day or season and so forth, which involve no formal statistics.
Deborah Mayo provided a more detailed and systematic account of the HoM within the context of her error-statistical account of scientific experimentation (Mayo 1996, chapter 5). At the top of her version of the Suppean HoM lie primary models that serve to:
Break down [experimental] inquiry into questions that can be addressed by canonical models for testing hypotheses and estimating values of parameters in equations and theories
Test hypotheses by applying procedures of testing and estimation to models of data (Mayo 1996, 140)
The second function stated above is rather misleading, because, in Mayo’s account, hypothesis testing is undertaken by models of experiment that serve to:
Break down questions into tests of experimental hypotheses, select relevant canonical models of error for performing primary tests
Specify experiments: choice of experimental model, sample size, experimental variables, and test statistics
Specify analytical methods to answer questions framed in terms of the experiment: choice of testing or estimating procedures, specification of a measure of fit and of test characteristics (error probabilities), e.g., significance level (Ibid.)
As in Suppes’ account, in Mayo’s version of the HoM account, the next level is populated by models of data that serve to:
Put raw data into a canonical form to apply analytical methods and run hypothesis tests
Test whether assumptions of the experimental model hold for the actual data (remodel data, run statistical tests for independence and for experimental control), test for robustness (Ibid.)
Unlike Suppes, Mayo combined the level of ceteris paribus conditions and the level of experimental design into a single level that serves “[p]lanning and executing data generation procedures” (Ibid.). As in Suppes’ account, no specific models are provided for these procedures in Mayo’s account.
For the ensuing discussion, it is important to note that, despite the encompassing term model of experiment, what is in fact intended in the HoM account by this type of model is rather a model of statistical hypothesis testing that serves to test the testable predictions of a theoretical model against data models. This is more explicit in Mayo's account, where the relation between models of theory and models of data is statistical in nature and thus needs to be modeled statistically:
Because of the many sources of approximation and error that enter into arriving at the data, the data would rarely if ever be expected to agree exactly with theoretical predictions. As such, the link between data model and experimental hypothesis or question may often be modeled statistically, whether or not the primary theory or hypothesis is statistical. This statistical link can be modeled in two ways: the experimental prediction can itself be framed as a statistical hypothesis, or the statistical considerations may be seen to be introduced by the test (in its using a statistical test rule). (Mayo 1996, 134)
The HoM account recognizes only the aforementioned types of models, which relate to each other through a linear hierarchical structure. In the following sections, I will argue that the involvement of models in present-day HEP experiments is far more varied and complicated than the HoM account presents. I will show that the experimental process in these experiments requires additional types of models relating to each other, and to the ones recognized by the HoM account, in ways that cannot be accommodated within a linear hierarchical structure. In what follows, I shall examine in turn the types of models that are used in the ATLAS Higgs boson search.
3. The primary theoretical model and its role in data selection
ATLAS (ATLAS Collaboration 2008) is a multi-purpose experiment primarily aimed at testing not only the Higgs boson hypothesis of the standard model (SM) of elementary particle physics but also the predictions of what are called the models beyond the SM (the BSM models). The latter are a group of HEP models that have been offered as possible extensions of the SM, such as extra-dimensional and supersymmetric models (see e.g. Lykken 2010). The aims of the ATLAS experiment also include searching for novel physics processes that are not predicted by the present HEP models, as well as performing precision measurements within and beyond the SM.
Theoretical HEP models are involved in present-day HEP experiments through their specific predictions regarding high transverse-momentum (pT) and transverse-energy (ET) types of signatures, i.e. stable decay products, which are detected at the LHC. In the context of these experiments, high pT and ET refer to energy and momentum values that are approximately of the order of 10 GeV and 100 GeV for particles and jets, respectively. The signatures predicted by the SM for the Higgs boson and those predicted by the BSM models for the new particles—such as new heavy gauge bosons W′ and Z′, supersymmetric particles and gravitons—are high pT photons, leptons and jets, as well as high missing and total ET. At the LHC, the foregoing high pT and ET types of signatures might result from the decay processes involving the Higgs boson and the aforementioned new particles predicted by the BSM models. These signatures might also result from unforeseen processes occurring at high energy scales. The collision events containing the aforementioned high pT and ET decay signatures are distinguished from the rest of the collision events and considered interesting for the process of data analysis, in that they are relevant to the objectives of the ATLAS experiment. The events considered interesting are therefore selected from the rest of the collision events by means of selection criteria consisting mainly of the foregoing signatures.
In this paper, for the sake of brevity, I shall focus on the ATLAS Higgs boson search, in which the SM Higgs boson hypothesis was tested. The SM consists of two main gauge theories, namely the electroweak theory of the weak and electromagnetic interactions and the theory of quantum chromodynamics (QCD), which accounts for strong interactions (Karaca 2013). Even though the Higgs boson (hypothesis) is a direct result of the electroweak theory, QCD is also needed to account for the production and decay processes of the Higgs boson. Therefore, the SM should be regarded as the primary theoretical model tested in the ATLAS Higgs boson search, where the following decay channels of the SM Higgs boson (H) were considered: H → WW*; H → ZZ*; and H → γγ. In these physics processes, the Higgs boson decays respectively into two W bosons, two Z bosons, and two photons. According to the SM, the W and Z bosons produced in the foregoing decays could subsequently decay into leptons, including electrons, electron neutrinos and muons (μ). Therefore, events having at least one high ET electron or muon can include the first two of the aforementioned decay processes of the Higgs boson, while events having at least two high ET photons can include the third. This in turn means that selection signatures consisting of at least one high ET electron or muon, and those consisting of at least two high ET photons, are appropriate for testing the SM's prediction of the Higgs boson. The selection signatures "e25i" and "μ20i," which require at least one isolated electron or muon with an ET threshold of 25 GeV and 20 GeV respectively, and the selection signature "2γ20i," which requires at least two isolated photons each with an ET threshold of 20 GeV, exemplify selection signatures appropriate for this test. Therefore, the data selection criteria used in the ATLAS Higgs boson search are determined in reference to the predictions of the SM concerning the decay channels of the Higgs boson. In the next section, I shall discuss how data selection criteria are applied to collision events in the ATLAS experiment.
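For illustration, the following schematic Python sketch (not actual ATLAS trigger software) shows how selection signatures of this kind can be expressed as predicates over the reconstructed objects in an event. The thresholds follow the examples just quoted; the Particle record, the event layout, and the helper names are hypothetical simplifications.

```python
# Schematic illustration of selection signatures as predicates over events.
from dataclasses import dataclass

@dataclass
class Particle:
    kind: str       # "electron", "muon", "photon", ...
    et: float       # transverse energy in GeV
    isolated: bool

def n_isolated(event, kind, et_threshold):
    """Count isolated particles of a given kind at or above an E_T threshold."""
    return sum(1 for p in event if p.kind == kind and p.isolated and p.et >= et_threshold)

# Selection signatures quoted in the text, e.g. "e25i": at least one
# isolated electron with an E_T threshold of 25 GeV.
TRIGGER_MENU = {
    "e25i":      lambda ev: n_isolated(ev, "electron", 25.0) >= 1,
    "mu20i":     lambda ev: n_isolated(ev, "muon", 20.0) >= 1,
    "2gamma20i": lambda ev: n_isolated(ev, "photon", 20.0) >= 2,
}

def passes_menu(event):
    """An event counts as 'interesting' if it fires at least one trigger."""
    return any(signature(event) for signature in TRIGGER_MENU.values())

# Example: a diphoton candidate event fires "2gamma20i".
event = [Particle("photon", 32.0, True), Particle("photon", 24.5, True)]
assert passes_menu(event)
```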
4. Model of data acquisition
In present-day HEP experiments, the technical limitations in terms of data storage capacity and data processing time make it necessary to apply the data selection criteria to the collision events in real-time, i.e. during the course of particle collisions inside the LHC. As a result of these technical limitations, only a minute fraction of the interesting events (approximately 5 in every 1 million events) can be selected for the process of data analysis in the ATLAS experiment. As I shall describe in this section, the acquisition of interesting events in the ATLAS experiment is modeled as a three-level selection process through which a set of predetermined data selection criteria, called a trigger menu, is applied to the collision events in real-time by trigger systems at different levels (ATLAS Collaboration 2003). Since the ATLAS experiment is a multi-purpose experiment, the trigger menu is sufficiently diversified in terms of the types of selection signatures that are appropriate for the various objectives of the experiment (for details, see Karaca 2017).
The level-1 selection process is performed by the level-1 hardware-based trigger system. Since the level-1 trigger decision time is extremely short (∼ 2.5 microseconds), the level-1 selection can identify only the regions of the ATLAS detector that contain signals (for particles, jets, and missing and total energy) satisfying the energy threshold conditions specified in the trigger menu. Therefore, at the end of the level-1 selection, the information regarding the location, momentum, and energy of particles and jets, or missing energy, contained in a selected event is fragmented across the different subsystems of the ATLAS detector system (see Figure 1), and the different pieces of this fragmented information, called event fragments, are not assembled yet, meaning that the full descriptions of the selected events are missing in this stage of the selection process.
The level-2 and level-3 selection processes are carried out by the level-2 and level-3 software-based trigger systems, which are jointly referred to as the high-level trigger system. The event fragments identified at the level-1 selection are assembled through the level-2 selection in order to obtain the full descriptions of the selected events. The level-2 selection consists of two sub-stages. In the first stage, the results of the level-1 selection are passed to the level-2 trigger system for more refined trigger decisions. In the second stage, called event building, the event fragments satisfying the conditions specified by the trigger menu are assembled, and thereby the selected events are reconstructed using specialized software algorithms. At the level-3 selection, called event filtering, the reconstructed events undergo a filtering process through which specialized software algorithms are used to further refine event selections according to the trigger menu. The events that have passed this event-filtering process are then sent to the data-storage unit for offline data analysis.
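The three-level selection just described can be summarized in the following schematic sketch. The data structures, function names, and threshold value are assumptions introduced for illustration; they do not reproduce the actual ATLAS trigger implementation.

```python
# Schematic sketch of the three-level trigger selection.
def level1(raw_regions, threshold=20.0):
    """Level 1: flag detector regions whose summed signal exceeds an energy
    threshold. The full event is not yet assembled at this stage."""
    return [r for r in raw_regions if sum(r["signals"]) >= threshold]

def level2_build(regions_of_interest):
    """Level 2 (event building): assemble the event fragments from the
    flagged regions into a single reconstructed event record."""
    return {"objects": [obj for r in regions_of_interest for obj in r["fragments"]]}

def level3_filter(event, menu):
    """Level 3 (event filtering): apply the refined trigger menu, given here
    as a list of predicates, to the fully built event."""
    return any(trigger(event) for trigger in menu)

def acquire(raw_regions, menu):
    """Chain the three levels; only surviving events reach permanent storage."""
    regions_of_interest = level1(raw_regions)
    if not regions_of_interest:
        return None                      # rejected at level 1
    event = level2_build(regions_of_interest)
    return event if level3_filter(event, menu) else None
```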
The above discussion indicates that a model of data acquisition is an essential component of the experimental process in the ATLAS experiment. This model serves to specify and organize the experimental procedures through which the chosen data selection criteria are applied to acquire the interesting collision events that are considered relevant to the objectives of the ATLAS experiment. The above discussion also shows that in this experiment data generation and data acquisition are two distinct but related processes. Neither Suppes nor Mayo differentiates between these processes, and what they call experimental design includes both. As the case of the ATLAS experiment illustrates, the process of data generation is concerned with the production of collision events by colliding proton beams inside the collider, while the process of data acquisition is concerned with the selection of the interesting events out of the collision events that have already been generated and detected by the collider and detector systems.
5. Simulation models
In the context of present-day HEP experiments, a simulation model is basically a mathematical model consisting of a set of partial differential equations that are numerically solvable through a simulation program executed on a computer. I shall suggest that simulation models used in present-day HEP experiments (ATLAS Collaboration 2010a) can be divided into three main categories with respect to the representations they provide: namely, as I shall call them, simulation models of instruments, simulation models of collision events, and hybrid simulation models that are combinations of simulation models of collision events and simulation models of instruments.
Simulation models of instruments represent the geometry and material composition of the instruments used in HEP experiments, including detector and trigger systems. These models involve theoretical elements such as considerations from the SM, Maxwell’s theory of electromagnetism, atomic and nuclear physics as well as engineering models concerning the material composition of the experimental set-up. They also involve empirical elements including numerical results from previous experiments about certain physical parameters, such as their upper or lower limits.
Simulation models of (target) phenomena represent collision events, without consideration of the geometry and material composition of the instruments used to generate and detect collision events. The need for these models in present-day HEP experiments arises mainly from the fact that the SM and the BSM models are too complex to be solved analytically for the descriptions of the proton-proton collisions. Moreover, approximation methods like perturbation theory fail at low energies in the QCD sector of the SM, which accounts for the strong interactions between protons. Simulation models of proton-proton collision events are essentially numerical implementations of (or combinations of) the phenomenological models of the SM and those of the BSM models through computer programs (based on the Monte Carlo approach) called event generators, which are used to simulate collision events, such as those resulting from the proton-proton collisions at the LHC. The simulation models of the proton-proton collision events are constructed by simulating their various aspects by means of different event generators in accordance with the SM and the BSM models. The simulation model of a proton-proton collision event primarily involves the simulation (by an event generator) of what is called the hard scattering process, which is the part of a proton-proton collision with the highest momentum transfer. The hard scattering process in turn produces the SM-related processes, including hard QCD processes (such as quark-gluon scattering), jet production, W and Z boson production, and SM Higgs boson production, as well as the processes related to the BSM models, such as the production of new gauge bosons and supersymmetric particles. For the simulation of all these physics processes resulting from the hard-scattering process, the event generators must rely on the SM and the BSM models. The physics processes relevant to the ATLAS Higgs boson search are those concerning the production of the SM Higgs boson and the SM background. These processes are simulated by using different general-purpose event generators, as shown in Table 1.
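As a toy illustration of the Monte Carlo idea behind event generators (and emphatically not a stand-in for any of the general-purpose generators shown in Table 1), the following sketch samples a Higgs candidate mass from a resonance shape around 125 GeV and decays it into two photons in the Higgs rest frame. The width and all other implementation details are illustrative assumptions.

```python
# Toy Monte Carlo "event generator" for H -> gamma gamma in the Higgs rest frame.
import math
import random

def breit_wigner(m0=125.0, width=0.4):
    """Sample a resonance mass (GeV) from a nonrelativistic Breit-Wigner
    (Cauchy) shape via inverse-CDF sampling. The width is illustrative, not
    the physical value, and the unbounded Cauchy tails are left untreated."""
    return m0 + 0.5 * width * math.tan(math.pi * (random.random() - 0.5))

def generate_diphoton_event():
    """Return the four-momenta (E, px, py, pz) of two photons from a Higgs
    decay, generated isotropically in the Higgs rest frame (each photon
    carries E = m/2, and the photons are back to back)."""
    m = breit_wigner()
    e = m / 2.0
    cos_t = random.uniform(-1.0, 1.0)
    sin_t = math.sqrt(1.0 - cos_t ** 2)
    phi = random.uniform(0.0, 2.0 * math.pi)
    p = (e * sin_t * math.cos(phi), e * sin_t * math.sin(phi), e * cos_t)
    photon1 = (e, *p)
    photon2 = (e, -p[0], -p[1], -p[2])
    return [photon1, photon2]
```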
Hybrid simulation models are constructed by integrating simulation models of phenomena into simulation models of instruments. Unlike the latter models, hybrid models also represent the passage of the final decay products (such as jets, leptons, photons resulting from particle collisions) inside the instruments (such as detector and trigger systems) as well as the interaction between these instruments and the final decay products. Since hybrid models can represent the actual experimental environment and process, the use of computer simulation as an experimental method in present-day HEP experiments is mainly by virtue of these models.
In the ATLAS experiment, a simulation toolkit called Geant4 is used to simulate the passage of particles through matter (Allison et al. 2006; Agostinelli et al. 2003). In addition, a toolkit of basic geometrical shapes, called GeoModel, is used for the simulation model of the geometrical structure of the ATLAS detector, which consists of several sub-systems as shown in Figure 1 (Clark 2011; Boge and Zeitnitz 2020). In order to construct a hybrid simulation model of the response of the ATLAS detector system to the LHC collision events (resulting from proton-proton collisions), the simulation model of the ATLAS detector, which exemplifies the simulation models of instruments, is first transferred to Geant4, whereby the simulation models of proton-proton collisions are incorporated into the simulation model of the ATLAS detector. The resulting hybrid simulation model represents the geometrical structure and the material composition of the ATLAS detector as well as the process of digitization, which concerns the conversion of the effects of the impinging particles into detector signals. Figure 2 shows a schematic view of an event represented by a hybrid simulation model.
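The following toy sketch, which substitutes for neither Geant4 nor GeoModel, illustrates what a hybrid simulation model adds on top of an event generator: the detector's response to the generated particles. Here digitization is caricatured as Gaussian energy smearing plus a readout threshold; the resolution and threshold values are assumptions.

```python
# Toy caricature of detector response and digitization.
import random

def detector_response(true_energy, resolution=0.10, threshold=5.0):
    """Smear a particle's true energy (GeV) with a fractional Gaussian
    resolution; return None if the smeared signal falls below the readout
    threshold, mimicking a particle that leaves no usable detector signal."""
    measured = random.gauss(true_energy, resolution * true_energy)
    return measured if measured >= threshold else None

# Passing generated photons through the toy detector yields "detector-level"
# simulated signals, the analogue of the digitized output described above.
true_photon_energies = [62.5, 62.5]     # e.g. from a 125 GeV Higgs at rest
signals = [detector_response(e) for e in true_photon_energies]
```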
The simulation of the ATLAS (LHC) data selection process requires the simulation of the response of the ATLAS detector and trigger systems to the LHC collision events. This means that the level-1 trigger system, which is hardware-based, also needs to be simulated through a simulation model of an instrument. There is no need to simulate the high-level trigger system, which consists of the level-2 and level-3 trigger systems, as these are software systems consisting of selection and reconstruction algorithms. The simulation model of the level-1 trigger is integrated into the hybrid simulation model of the response of the ATLAS detector system to the LHC collision events. In this way, the process of LHC data selection in the ATLAS experiment can be simulated by means of a hybrid simulation model. This model, which I shall call the simulation model of data acquisition, is used to select the simulated events according to the selection criteria given in the ATLAS trigger menu. The selection of the simulated events is similar to the selection of the LHC events. The simulated events that have passed the level-1 trigger selection are then passed through the high-level trigger system. Those simulated events satisfying the trigger menu are stored as the simulation data to be used in the process of data analysis. As I shall discuss in the next section, the simulation data are used to realistically estimate the amount and composition of the SM background as well as the signal expectation for a Higgs boson.
6. Model of statistical hypothesis testing and models of data
The model of hypothesis testing used in the ATLAS experiment to test the SM Higgs boson hypothesis is based on the consideration that the Higgs boson signals (in a given decay channel) are distinguished from the SM background processes through what are called discriminating variables (ATLAS Collaboration 2012b). Since the SM Higgs boson processes are expected to be detected as observed excesses of events in the distributions of either invariant mass—also called rest mass—or transverse mass, relative to the background expectation, these quantities are taken to be the relevant discriminating variables in the model of testing adopted in the ATLAS Higgs boson search. This model of testing is essentially a statistical model in the sense that it quantifies the level of agreement between the data and the hypothesis being tested through the local p value, which signifies "the probability that the background can produce a fluctuation greater than or equal to the excess observed in data" (ATLAS Collaboration 2012a, 11).
In the ATLAS experiment, the statistical testing of the SM Higgs boson hypothesis is carried out by using the sets of proton-proton collision events produced at a center-of-mass energy of √s = 7 TeV in 2011, and those produced at √s = 8 TeV in 2012, for the following Higgs boson decay channels: H → ZZ* → 4l, where l stands for an electron or a muon; H → γγ, where γ denotes a photon; and H → WW* → eνμν, where e, ν and μ denote respectively an electron, a neutrino and a muon (ATLAS Collaboration 2012a). The discriminating variable in the ATLAS statistical model of testing is taken to be the four-lepton invariant mass for the channel H → ZZ* → 4l, and the diphoton invariant mass (mγγ) for the channel H → γγ. Instead of invariant mass, transverse mass (mT) is taken to be the discriminating variable for the channel H → WW* → eνμν, because the neutrino is invisible in the detector and thus one cannot reconstruct the mass of the W bosons from the invariant masses of their decay products (Barr et al. 2009, 1).
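For reference, the discriminating variables just mentioned rest on standard kinematic definitions. The following display gives the invariant mass of a system of decay products and a commonly used transverse-mass variable for the eνμν final state; the second formula is of the kind discussed by Barr et al. (2009), and the exact ATLAS definition may differ in detail.

```latex
% Invariant mass of a system of decay products with energies E_i and momenta \vec{p}_i:
m^{2} = \Big(\sum_i E_i\Big)^{2} - \Big\lvert \sum_i \vec{p}_i \Big\rvert^{2}
% A transverse-mass variable for H -> WW* -> e\nu\mu\nu, built from the dilepton
% system (\ell\ell) and the missing transverse momentum, which stands in for the neutrinos:
m_T = \sqrt{\big(E_T^{\ell\ell} + E_T^{\mathrm{miss}}\big)^{2}
          - \big\lvert \vec{p}_T^{\,\ell\ell} + \vec{p}_T^{\,\mathrm{miss}} \big\rvert^{2}}
```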
For the sake of brevity, I shall consider only the data analysis performed for the four-lepton decay channel H → ZZ* → 4l, where the SM background has both reducible and irreducible components. The irreducible component arises from the production of ZZ via quark-antiquark annihilation or gluon fusion, resulting in events having the same final states as the Higgs boson, while the reducible component mainly consists of events containing leptons resulting from the production of a Z boson or a top quark in association with jets that can be misidentified as leptons. In the ATLAS Higgs boson search, the irreducible background is estimated by using exclusively the simulation data, while the reducible background is estimated by using both the simulation data and the LHC data. In Figure 3, the purple and red colored histograms show respectively the reducible and irreducible components of the SM background in the four-lepton channel. The blue colored histogram is also based on simulated events and gives an estimation of the distribution of a SM Higgs boson signal near mH = 125 GeV, which is consistent with the excess of SM Higgs signal events (the LHC data) observed near this energy, as shown by the black dots in the same figure. Unlike the four-lepton channel, the background in the two-photon channel is determined mainly from the LHC data, while both the LHC and the simulation data are used to estimate the background in the decay channel H → WW* → eνμν. For the search results combined for all three decay channels, the local p value was computed to be 1.7 × 10−9, corresponding to a statistical significance of 5.9 standard deviations. For the ATLAS Collaboration, this result means that the excess of events observed "is compatible with the production and decay of the [SM] Higgs boson," with a mass of 126.0 ± 0.4 (stat) ± 0.4 (sys) GeV, indicating an experimental confirmation of the SM Higgs boson hypothesis (ATLAS Collaboration 2012a, 1). This in turn resulted in the ATLAS Collaboration's claim of the discovery of the SM Higgs boson in 2012.
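As a quick arithmetic check (assuming only the standard one-sided Gaussian convention that HEP uses to translate p values into standard deviations, and that SciPy is available), the quoted local p value reproduces the quoted significance:

```python
# Convert a local p value into a Gaussian significance: Z = Phi^{-1}(1 - p).
from scipy.stats import norm

p_local = 1.7e-9                        # local p value quoted in the text
z = norm.isf(p_local)                   # inverse survival function of the standard normal
print(f"significance = {z:.1f} sigma")  # prints: significance = 5.9 sigma
```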
In the ATLAS Higgs boson search, the simulation data are used to estimate the SM background and thereby to quantify the statistical significance of the excess of the Higgs boson events observed. The strong dependence of the background estimation in the four-lepton channel on simulation data arises from the fact that there are far fewer observed events associated with this channel and that the corresponding statistical errors are thus large, as shown in Figure 3. This makes the simulation data preferable to the LHC data for an accurate estimation of the background in the four-lepton decay channel. It is worth noting that HEP experimenters seek to reduce the dependence of results on simulation data, albeit without compromising their accuracy. For instance, in an analysis recently conducted by the ATLAS Collaboration, the background in the four-lepton decay channel is estimated by using, in addition to simulation data, a technique driven by the LHC data that allows a reduction in the systematic uncertainty (ATLAS Collaboration 2019). However, this does not mean that simulation data cannot be constitutive of experimental results. On the contrary, the ATLAS Higgs boson search shows that in present-day HEP experiments, simulation data can be preferred over collider data if the former enable more accurate experimental results.
According to the HoM account, in order for data sets to be usable for hypothesis testing, they must be put into models of data that satisfy the statistical requirements imposed by the model of statistical testing. In the ATLAS experiment, the SM Higgs boson hypothesis is tested against the transverse and invariant mass distributions in the LHC data consisting of selected SM Higgs signal and background events. Therefore, these mass distributions can be regarded as data models in the sense defined by the HoM account, namely that they represent the data forms that bring out the features of the LHC data that are relevant to the statistical testing of the SM Higgs boson hypothesis. The ATLAS statistical model of testing also applies to the simulation data consisting of selected simulated SM Higgs signal and background events, because the mass distributions are also calculated for these simulated events. One can therefore regard these mass distributions as models of the simulation data, which are distinct from the above-mentioned models of the LHC data. For instance, Figure 3 shows the invariant four-lepton mass distributions based on the LHC data and the simulation data for the decay channel H → ZZ* → 4l. These mass distributions can be regarded as the LHC data models and simulation data models, respectively.
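In computational terms, such a data model is just the binned mass distribution extracted from the selected events. A minimal sketch, with bin edges and toy mass values that are purely illustrative assumptions:

```python
import numpy as np

# Toy invariant-mass values (GeV) for a handful of selected events; illustrative only.
selected_masses = np.array([124.6, 125.3, 118.2, 131.9, 125.1])

# The binned distribution is the "model of data" against which the hypothesis
# is tested (cf. the real distributions in Figure 3).
counts, edges = np.histogram(selected_masses, bins=np.arange(100.0, 160.0, 2.0))
```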
The above considerations indicate that the sets of procedures involved in the acquisition of the data in the ATLAS experiment and those involved in the statistical testing of the SM Higgs boson hypothesis are distinct from each other and thus modeled through different types of models, namely a model of data acquisition and a model of statistical testing, respectively. In present-day HEP experiments, the selected interesting events are the outputs of the model of data acquisition. During the stage of data analysis, these events are put into the models of data against which the primary theoretical model is statistically tested, as I have illustrated in the case of the ATLAS Higgs boson search. Therefore, in present-day HEP experiments, the model of data acquisition precedes both the models of data and the model of hypothesis testing in the experimental process. This in turn means that in these experiments, there exists no overarching model that can be referred to as the model of the experiment—as suggested by the HoM account—that encompasses the procedures for both hypothesis testing and data acquisition.
7. The model-to-model relations in the ATLAS Higgs boson search
The discussion in the previous sections indicates that performing the ATLAS Higgs boson search requires using different types of models. The relations among these models are shown in Figure 4, where an arrow pointing towards a given model denotes an input to that model from the model from which the arrow originates. In these model-to-model relations, the SM acts as the main theoretical model, in the sense that it provides the SM Higgs boson hypothesis for the testing of which the ATLAS Higgs boson search experiment was designed and performed. This testing consists in determining the statistical significance of the agreement between the SM Higgs boson hypothesis and the excess of SM Higgs events observed above the SM background events. For this purpose, a model of statistical hypothesis testing is used, and, as shown in Figure 4, this model requires as inputs not only the SM Higgs boson hypothesis but also the models of the LHC data and the models of simulation data. These data models consist of the mass distributions for the selected LHC and simulated events (i.e. SM Higgs signal and background events). The (unidirectional) relation between the model of LHC data acquisition and the models of LHC data reflects the fact that the SM Higgs hypothesis is tested against the mass distributions for the selected LHC events that are provided by the model of LHC data acquisition. Similarly, the (unidirectional) relation between the simulation model of data acquisition and the models of simulation data reflects the fact that the relevant SM backgrounds, as well as the signal expectation for a SM Higgs boson, are estimated by using the mass distributions for the selected simulated events that are provided by the simulation model of data acquisition.
The foregoing model-to-model relations are necessary to analyze the selected data—namely, the LHC and simulated events—and thereby to test the SM Higgs boson hypothesis against the resulting data models—namely, the models of LHC and simulation data. Both the model of LHC data acquisition and the simulation model of data acquisition are designed to perform event selection in accordance with the chosen selection criteria given in a trigger menu, which are also referred to as triggers. As shown in Figure 4, the predictions (or conclusions) of the SM concerning the Higgs decay channels and the background production processes are the SM's inputs to the model of LHC data acquisition and to the simulation model of data acquisition, in the sense that these predictions serve to determine the triggers, which can be seen as the (data selection) parameters of the above models of data acquisition. As also shown in the same figure, the predictions of the SM concerning the Higgs decay channels and the background production processes are also inputs to the simulation models of proton-proton collisions (namely, general-purpose event generators), in the sense that their construction is (partly) based on these predictions. The results of these models are the simulated LHC events, which are in turn inputs to the simulation model of data acquisition.
The simulation models of proton-proton collisions involve "a number of relatively free parameters which must be tweaked if [they are] to describe experimental data" (Buckley et al. 2010, 331). The LHC events selected by the model of LHC data acquisition are used to optimize the free parameters of the simulation models of proton-proton collisions. This process of parameter optimization is called tuning. Prior to the discovery of the Higgs boson in 2012, the event generators used in the ATLAS experiment were tuned twice. The first tuning was based on the data obtained in previous HEP experiments, namely the CDF and D0 experiments (ATLAS Collaboration 2010b). The second tuning was based on the LHC data obtained in 2010 as well as the data from the CDF experiment, but with a greater reliance on the LHC data. It was this second tuning that served to optimize the free parameters of the event generators used in the ATLAS Higgs boson search (ATLAS Collaboration 2011). This shows that the LHC data, which are the outputs of the ATLAS data-acquisition model, are also inputs to the simulation models of the proton-proton collisions, as they are used to tune the free parameters of these models. Similarly, the model of LHC data acquisition also needs as inputs the simulated SM Higgs signal and background events provided by the simulation model of data acquisition, because simulation studies based on these events are performed in order to optimize the performance of the trigger menu. To this end, the rates of the triggers and their efficiencies are determined from these simulation studies (ATLAS Collaboration 2012c; 2017). In this way, the weight of each trigger in the total trigger rate (for each level of selection) is estimated, and thereby both the composition of the trigger menu and the associated threshold energies are adjusted so as to optimize the efficiencies of the triggers. The aim of this optimization is to ensure that the events considered interesting are selected with high efficiency (ATLAS Collaboration 2008). Therefore, as explained above and also shown in Figure 4, in the ATLAS Higgs boson search there exists a feedback loop between the model of LHC data acquisition and the simulation model of data acquisition via the simulation models of proton-proton collisions, because the outputs of these models contribute to the inputs that they receive from each other and thereby affect their own outputs. As the previous discussion indicates, this feedback loop serves to carry out the experimental design necessary to optimize the trigger menu—namely, the data selection parameters—and the free parameters of the simulation models of proton-proton collisions. It is therefore important to note that in the ATLAS experiment there exists no separate model of experimental design for data acquisition; this design is carried out through the aforementioned model-to-model relations.
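The logic of tuning can be sketched as follows. Real generator tunes optimize many parameters against many observables using dedicated machinery of the kind described by Buckley et al. (2010); this toy instead fits a single assumed parameter by chi-square minimization, so every name and number in it is an illustrative assumption.

```python
# Toy sketch of generator tuning: fit a free parameter to observed data.
import numpy as np
from scipy.optimize import minimize_scalar

observed = np.array([12.0, 30.0, 55.0, 31.0, 11.0])   # illustrative binned data

def predicted(theta, bins=5):
    """Toy generator prediction: a Gaussian-shaped spectrum whose width is
    controlled by the free parameter theta (a stand-in for, e.g., a shower
    or hadronization parameter), normalized to the observed total."""
    x = np.arange(bins) - (bins - 1) / 2.0
    shape = np.exp(-0.5 * (x / theta) ** 2)
    return observed.sum() * shape / shape.sum()

def chi2(theta):
    """Pearson chi-square between the toy prediction and the observed counts."""
    pred = predicted(theta)
    return float(np.sum((observed - pred) ** 2 / pred))

tune = minimize_scalar(chi2, bounds=(0.3, 5.0), method="bounded")
print(f"tuned parameter: {tune.x:.2f}")
```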
In order for the above-mentioned model-to-model relations to be established in accordance with the objective of the ATLAS Higgs boson search, the model of LHC data acquisition and the simulation model of data acquisition need to satisfy the requirement of high selectivity for the events considered interesting. This modelling requirement serves to ensure the relevance of the LHC and simulation data models to the testing of the SM Higgs boson hypothesis. It was satisfied by determining the trigger menu in accordance with the SM's predictions concerning the decay channels of the Higgs boson, as well as by optimizing its efficiency based on the results of simulation studies. The previous discussion indicates that the experimental work necessary to satisfy this modelling requirement is performed through the (unidirectional) relations between the SM and the models of data acquisition (for both the LHC and simulation data) as well as through the feedback loop among the latter models. Moreover, the simulation models of proton-proton collisions need to satisfy an important requirement, namely the accuracy of their results in reproducing the LHC data. This modelling requirement must be satisfied to ensure the accuracy of the estimation of the SM background events, as this background is estimated through the simulation data models based on the results of the simulation models of proton-proton collisions. In the ATLAS experiment, the requirement of accuracy was satisfied by tuning the free parameters of these simulation models to the LHC events acquired through the model of data acquisition.
The failure to satisfy the modelling requirements of high selectivity and accuracy would endanger the objective of the ATLAS Higgs search experiment, as their fulfillment is necessary to ensure the relevance and accuracy of the data models used in the testing of the SM Higgs boson hypothesis. The experimental work that goes into satisfying these modelling requirements is an important part of the process of experimental design in the ATLAS Higgs boson search. This illustrates Mayo's characterization of the process of experimental design, which she takes to consist of "all of the considerations in the data generation that relate explicitly to the recorded data, that is, to some feature of the data models" (Mayo 1996, 139). In Mayo's account, experimental design, which lies beneath the data models in the HoM, serves to ensure that what she calls experimental assumptions are satisfied:
The worry about experimental assumptions occurs at the level of the data model in the hierarchy. However, the work that goes into satisfying the experimental assumptions would be placed at the levels below the data model. (Mayo 1996, 156)
However, according to the HoM account, experimental design is not a model-based process, whereas in the case of the ATLAS Higgs boson search experiment, the experimental design concerning the acquisition of both the LHC and simulation data proceeds through the relations among the primary theoretical model, the models of data acquisition for both the LHC data and the simulation data, and the simulation models of proton-proton collisions. These model-to-model relations (including a feedback loop) need to be established before the experimental process enters the stage of data acquisition, meaning that they are not part of the Suppean HoM that governs the statistical testing of the SM Higgs hypothesis. Rather, they enable the experimental design that is appropriate to acquire the LHC and simulation data used in performing this testing.
Since the ATLAS experiment is a multi-purpose experiment, the process of experimental design concerning data acquisition is carried out not only for the SM Higgs boson search, but also for the other types of searches, including supersymmetry, extra dimensions, and top quark (see ATLAS Collaboration 2008). The models of data acquisition (for both the LHC and simulation data) and the general-purpose simulation models of proton-proton collisions are jointly used for different searches carried out in the ATLAS experiment. This means that the experimental design aiming at the optimization of the trigger menu and of the free parameters of the simulation models of proton-proton collisions is common to all the foregoing searches. The model-to-model relations underlying this design process differ from each other by their primary theoretical models that are different in different searches, such as the SM and the BSM models.
8. Conclusions
The main tenet of the HoM account is that experimental procedures concerning data analysis and hypothesis testing are governed by models of different types. Since the formation of data models always precedes the testing of theoretical models, a linear hierarchy is the only type of relation that can exist among the types of models recognized by the HoM account. The present case study illustrates that the model-based structure of present-day HEP experiments also involves models of data acquisition and simulation models. Since these are not types of models recognized by the HoM account, this account is not equipped to provide a model-based characterization of the inference underlying the acquisition of data in present-day HEP experiments. The HoM account is thus more of an account of (statistical) hypothesis testing against available data than an account of scientific experimentation, which involves other processes such as experimental design and data acquisition. In fact, what Suppes aimed at with his HoM account was "to show that exact analysis of the relation between empirical theories and relevant data calls for a hierarchy of models of different logical type" (Suppes 1962, 260). In the HoM account, there exists a boundary between theory and experiment in the sense that "a whole hierarchy of models stands between the model of the basic theory and the complete experimental practice" (ibid.). This means that theory is barred from having bottom-up effects on the aspects of the experimental process concerning data acquisition, as these aspects lie at the bottom of the Suppean HoM. As a result, in the HoM account, theory can have only top-down effects on the aspects of the experimental process concerning hypothesis testing, and these effects are brought about by virtue of the relation of the model of theory to the model of hypothesis testing.
The case of the ATLAS Higgs boson search illustrates that in present-day HEP experiments the chain of inference leading from the collisions of particles to the testing of HEP models is entirely model-based. In the ATLAS case, this inference consists of two parts. The first part is concerned with the acquisition of the LHC and simulation data—namely, the LHC and simulated SM Higgs signal events and the relevant SM background events—while the second part is concerned with the use of the acquired data for the determination of the statistical significance of the excess of the LHC SM Higgs signal events above the SM background. As indicated by the discussion in the previous section, this two-fold inference is drawn through the interrelations among the various models involved in the ATLAS Higgs boson search. In order for the first part of the above-mentioned inference to be reliably drawn, it is necessary that the SM, the models of LHC and simulation data acquisition, and the simulation models of proton-proton collisions relate to each other in the ways shown in Figure 4. These model-to-model relations indicate that no HoM is required to draw the inference leading to the acquisition of the LHC and simulation data. The second part of this inference process, by contrast, which is directly concerned with the statistical testing of the SM Higgs boson hypothesis, is drawn through a Suppean HoM that leads from the models of the LHC and simulation data and proceeds through the statistical model of testing towards the SM. The SM lies at the top of this hierarchy as it provides the Higgs boson hypothesis tested in the ATLAS experiment. Therefore, the model-to-model relations underlying the chain of inference leading to the discovery of the SM Higgs boson constitute what is akin to a network of models (NoM). This NoM subsumes the Suppean HoM as the model-based characterization of the part of the foregoing chain of inference leading from the data to the validity or invalidity of the theoretical model tested in a HEP experiment. The part of the proposed NoM falling outside the Suppean HoM is concerned with the relations that exist among the primary theoretical model, the models of data acquisition for both the collider data and the simulation data, and the simulation models of particle collisions. The model-to-model relations in this part of the NoM lie within a non-linear structure through which the experimental design concerning the acquisition of both the collider data and the simulation data is performed. This is unlike the linear structure of the model-to-model relations constituting the Suppean HoM.
The fact that the part of the NoM concerning the experimental design in the ATLAS Higgs boson search involves the SM as the primary theoretical model is indicative of the bottom-up effects of theoretical considerations about the phenomena of interest on the process of data acquisition. These effects illustrate that in present-day HEP experiments the data are theory-laden in the sense that their acquisition is guided by theoretical considerations based on the HEP models, such as the SM and BSM models tested in the LHC experiments. In these experiments, theory-ladenness is intensified by the fact that the required data partly consist of simulation data that are largely theoretical, in the sense that they are composed of the results of simulation models of particle collisions, which are numerical solutions of the phenomenological models of HEP. Moreover, simulation data are also used in the acquisition of the collider data. Therefore, the NoM proposed in this paper can account for both the top-down and bottom-up effects of theoretical considerations on the experimental process, and it thereby provides a model-based characterization of the theory-ladenness of experimental results.
Acknowledgments
I am grateful to Christian Zeitnitz and Michael Krämer for helpful conversations concerning simulations in present-day high energy physics experiments. I would also like to thank three anonymous referees of this journal for their comments and suggestions on earlier versions of this paper.