Published online by Cambridge University Press: 01 April 2022
This paper attempts to define Exploratory Data Analysis (EDA) more precisely than usual, and to produce the beginnings of a philosophy of this topical and somewhat novel branch of statistics.
A data set is, roughly speaking, a collection of k-tuples for some k. In both descriptive statistics and EDA, these k-tuples, or functions of them, are represented in a manner matched to human and computer abilities, with a view to finding patterns that are not “kinkera”. A kinkus is a pattern that has a negligible probability of being even partly potentially explicable. A potentially explicable pattern is one for which there probably exists a hypothesis of adequate “explicativity”, another technical probabilistic concept. A pattern can be judged probably potentially explicable even if we cannot find an explanation. The theory of probability adopted here is one of partially ordered (interval-valued), subjective (personal) probabilities. Other topics relevant to a philosophy of EDA include the “reduction” of data; Francis Bacon's philosophy of science; the automatic formulation of hypotheses; the successive deepening of hypotheses; neurophysiology; and rationality of type II.
I am grateful to John W. Pratt for some useful criticisms. This work was supported in part by N.I.H. Grant R01-GM18770.