Behavioristic, Evidentialist, and Learning Models of Statistical Testing

Published online by Cambridge University Press:  01 April 2022

Deborah G. Mayo*
Affiliation:
Department of Philosophy, Virginia Polytechnic Institute and State University

Abstract

While orthodox (Neyman-Pearson) statistical tests enjoy widespread use in science, the philosophical controversy over their appropriateness for obtaining scientific knowledge remains unresolved. I shall suggest an explanation and a resolution of this controversy. The source of the controversy, I argue, is that orthodox tests are typically interpreted as rules for making optimal decisions as to how to behave, where optimality is measured by the frequency of errors the test would commit in a long series of trials. Most philosophers of statistics, however, view the task of statistical methods as providing appropriate measures of the evidential-strength that data affords hypotheses. Since tests appropriate for the behavioral-decision task fail to provide measures of evidential-strength, philosophers of statistics claim the use of orthodox tests in science is misleading and unjustified. What critics of orthodox tests overlook, I argue, is that the primary function of statistical tests in science is neither to decide how to behave nor to assign measures of evidential strength to hypotheses. Rather, tests provide a tool for using incomplete data to learn about the process that generated it. This they do, I show, by providing a standard for distinguishing differences (between observed and hypothesized results) due to accidental or trivial errors from those due to systematic or substantively important discrepancies. I propose a reinterpretation of a commonly used orthodox test to make this learning model of tests explicit.

Type
Research Article
Copyright
Copyright © 1985 by the Philosophy of Science Association

Footnotes

I am grateful to Ronald Giere, Norman Gilinsky, I. J. Good, Oscar Kempthorne, Henry Kyburg, and Larry Laudan for very helpful comments. I thank Jim Fetzer for first suggesting I spell out my (learning) model by contrasting it to the existing (behavioristic and evidentialist) models of statistical tests.
