Why Data Never Speak for Themselves

S. Nassir Ghaemi

doi:10.1017/9781108887526.002

Statistics grew out of the need to better understand clinical experience. The early “numerical method” simply meant counting, which is already an improvement over unaided clinical impression. But counting alone is not enough, because confounding factors distort what we observe; hence the need to understand statistical methods.

The beginning of wisdom is to recognize our own ignorance. We mental health clinicians need to start by acknowledging that we are ignorant: we do not know what to do; if we did, we would not need to read anything, much less this book – we could then just treat our patients with the infallible knowledge that we already possess. Although there are dogmatists (and many of them) of this variety – who think that they can be good mental health professionals by simply applying the truths of, say, Freud (or Prozac) to all – this book is addressed to those who know that they do not know, or who at least want to know more.

When faced with persons with mental illnesses, we clinicians need to first determine what their problems are, and then what kinds of treatments to give them. In both cases, in particular the matter of treatment, we need to turn somewhere for guidance: how should we treat patients?

We no longer live in the era of Galen: pointing to the opinions of a wise man is insufficient (though some still do this). Many have accepted that we should turn to science; some kind of empirical research should guide us.

If we accept this view – that science is our guide – then the first question is how are we to understand science?

Science Is Not Simple

This book would be unnecessary if science were simple. I would like to disabuse the reader of any simple notion of science, specifically “positivism”: the view that science consists of positive facts, piled one upon another, each of which represents an absolute truth or an independent reality, our business being simply to discover those truths or realities.

This is simply not the case. Science is much more complex.

For the past century scientists and philosophers have debated this matter, and it comes down to this: Facts cannot be separated from theories; science involves deduction, not just induction. In this way, no facts are observed without a preceding hypothesis. Sometimes, the hypothesis is not even fully formulated or even conscious; I may have a number of assumptions that direct me to look at certain facts. It is in this sense that philosophers say that facts are “theory-laden”; between fact and theory, no sharp line can be drawn.

How Statistics Came to Be

A broad outline of how statistics came to be is as follows (Salsburg 2001a): Statistics were developed in the eighteenth century because scientists and mathematicians began to recognize the inherent role of uncertainty in all scientific work. In physics and astronomy, for instance, Pierre-Simon Laplace realized that some error was inherent in all calculations. Instead of ignoring the error, he chose to quantify it, and the field of statistics was born. He even showed that there was a mathematical distribution to the likelihood of errors observed in given experiments. Statistical notions were first explicitly applied to human beings by the nineteenth-century Belgian Lambert Adolphe Quetelet, who applied them to the normal population, and the nineteenth-century French physician Pierre Louis, who applied them to sick persons. In the late nineteenth century, Francis Galton, a founder of genetics and a mathematical leader, applied them to human psychology (studies of intelligence) and worked out the probabilistic nature of statistical inference more fully. His student, Karl Pearson, then took Laplace one step further and showed that not only is there a probability to the likelihood of error, but even our own measurements are probabilities: “Looking at the data accumulated in biology, Pearson conceived the measurements themselves, rather than errors in the measurement, as having a probability distribution” (Salsburg 2001a, p. 16). Pearson called our observed measurements “parameters” (Greek for “almost measurements”), and he developed staple notions such as the mean and standard deviation. Pearson’s revolutionary work laid the basis for modern statistics. But if he was the Marx of statistics (he actually was a socialist), the Lenin of statistics would be the early-twentieth-century geneticist Ronald Fisher, who introduced randomization and p-values, followed by A. Bradford Hill in the mid-twentieth century, who applied these concepts to medical illnesses and founded clinical epidemiology. (The reader will see some of these names repeatedly in the rest of this book; the ideas of these thinkers form the basis of understanding statistics.)

It was Fisher who first coined the term “statistic” (Louis had called it the “numerical method”), by which he meant the observed measurements in an experiment, seen as a reflection of all possible measurements. It is “a number that is derived from the observed measurements and that estimates a parameter of the distribution” (Salsburg 2001a, p. 89). He saw the observed measurement as a random number among the possible measurements that could have been made, and thus “since a statistic is random, it makes no sense to talk about how accurate a single value of it is … What is needed is a criterion that depends on the probability distribution of the statistic.” Fisher's question, in effect, was: how probable is it that the observed measurement is valid? Statistical tests are all about establishing these probabilities, and statistical concepts are about how we can use mathematical probability to know whether our observations are more or less likely to be correct.
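Fisher's point that a statistic is itself random can be made concrete with a small simulation. The sketch below is only an illustration under stated assumptions: the population (normally distributed, mean 100, standard deviation 15), the sample sizes, and the function names are all hypothetical choices, not anything taken from Fisher or Salsburg. It repeats the same "study" many times and looks at how much the sample mean varies from one repetition to the next.

```python
import random
import statistics


def sample_mean(rng, n, mu=100.0, sigma=15.0):
    """Mean of n measurements drawn from an assumed normal population."""
    return statistics.fmean([rng.gauss(mu, sigma) for _ in range(n)])


def sampling_distribution(n, repeats=2000, seed=0):
    """Repeat the same hypothetical study many times; collect each study's mean."""
    rng = random.Random(seed)
    return [sample_mean(rng, n) for _ in range(repeats)]


# Each list holds 2000 values of the same statistic (the sample mean).
means_small = sampling_distribution(n=10)
means_large = sampling_distribution(n=100)

# The spread of the statistic across repetitions is what tells us how far
# any single observed value can be trusted.
spread_small = statistics.stdev(means_small)
spread_large = statistics.stdev(means_large)

print(f"spread of the mean with  10 subjects per study: {spread_small:.2f}")
print(f"spread of the mean with 100 subjects per study: {spread_large:.2f}")
```

No single mean in either list is "accurate" or "inaccurate" on its own; what can be described is the distribution of the statistic, which in this sketch is visibly narrower when each study includes more subjects.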

A Scientific Revolution

This process really was a revolution; it was a major change in our thinking about science. Prior to these developments, even the most enlightened thinkers (such as the French Encyclopedists of the eighteenth century, and Auguste Comte in the nineteenth century) saw science as the process of developing absolutely certain knowledge through refinements of sense-observation. Statistics rests on the concept that scientific knowledge, derived from observation using our five senses and aided by technologies, is not absolute. Hence, “the basic idea behind the statistical revolution is that the real things of science are distributions of number, which can then be described by parameters. It is mathematically convenient to embed that concept into probability theory and deal with probability distributions” (Salsburg 2001a, pp. 307–8).

It is thus not an option to avoid statistics if one cares about science. And if one understands science correctly, not as a matter of absolute positive knowledge but as a much more complex probabilistic endeavor (see Chapter 11), then statistics are part and parcel of science.

Some doctors hate statistics, yet they claim to support science. They cannot have it both ways.

A Benefit to Mankind

Statistics thus developed outside of medicine, in other sciences in which researchers realized that uncertainty and error were in the nature of science. Once the wish for absolute truth was jettisoned, statistics would become an essential aspect of all science. And if physics involves uncertainty, how much more uncertainty is there in medicine? Human beings are much more uncertain than atoms and electrons.

The practical results of statistics in medicine are undeniable. If nothing else had been achieved but two things – in the nineteenth century, the end of bleeding, purging, and leeching as a result of Louis’ studies; in the twentieth century, the proof of cigarette-smoking-related lung cancer as a result of Hill’s studies – we would have to admit that medical statistics have delivered humanity from two powerful scourges.

Numbers Do Not Stand Alone

The history of science shows us that scientific knowledge is not absolute, and that all science involves uncertainty. These truths lead us to a need for statistics. Thus, in learning about statistics, the reader should not expect pure facts; the result of statistical analyses is not unadorned and irrefutable fact; all statistical inference is an act of interpretation, and the result of statistics is more interpretation. This is, in reality, the nature of all science: it is interpretation of facts, not simply facts by themselves.

This statistical reality – the fact that data do not speak for themselves and that therefore positivistic reliance on facts is wrong – is called confounding bias. As discussed in the next chapter, observation is fallible: we sometimes think we see what is not in fact there. This is especially the case in research on human beings. Consider the assertion that coffee causes cancer. Numerous studies have shown this; the observation has been made over and over again: coffee use is higher among those with cancer than among those without. Those are the unadorned facts – and they are wrong. Why? Because coffee drinkers also smoke cigarettes more than non-coffee drinkers. Cigarettes are a confounding factor in this observation, and our lives are chock full of such confounding factors. Meaning: We cannot believe our eyes. Observation is not enough for science; one must try to observe accurately, by removing confounding factors. How? In two ways:

1. Experiment, by which we control all other factors in the environment except one, thus knowing that any changes are due to the impact of that one factor. This can be done with animals in a laboratory, but human beings cannot (ethically) be controlled in this way. Enter the randomized clinical trial (RCT) – RCTs are how we experiment with humans to be able to observe accurately.
2. Statistics: Certain methods (like regression modeling; see Chapter 6) have been devised to mathematically correct for the impact of measured confounding factors.
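The coffee example can be sketched as a simulation. Everything in it is an assumption made for illustration: the probabilities are invented, the function names are hypothetical, and coffee is given no effect on cancer by construction. The adjustment shown is simple stratification by the measured confounder, a simpler relative of the regression modeling mentioned above, not regression itself.

```python
import random


def simulate(n=200_000, seed=1):
    """Simulate (smoker, coffee, cancer) for n hypothetical people."""
    rng = random.Random(seed)
    people = []
    for _ in range(n):
        smoker = rng.random() < 0.3
        # Assumed: smokers are more likely to drink coffee.
        coffee = rng.random() < (0.8 if smoker else 0.4)
        # Assumed: cancer risk depends on smoking only, never on coffee.
        cancer = rng.random() < (0.10 if smoker else 0.01)
        people.append((smoker, coffee, cancer))
    return people


def risk(people, coffee, smoker=None):
    """Cancer risk among coffee drinkers or non-drinkers, optionally within one smoking stratum."""
    group = [p for p in people
             if p[1] == coffee and (smoker is None or p[0] == smoker)]
    return sum(p[2] for p in group) / len(group)


people = simulate()

# Crude comparison: ignores smoking, so coffee appears harmful.
crude_rr = risk(people, True) / risk(people, False)

# Stratified comparison: compare like with like within each smoking group.
rr_smokers = risk(people, True, smoker=True) / risk(people, False, smoker=True)
rr_nonsmokers = risk(people, True, smoker=False) / risk(people, False, smoker=False)

print(f"crude risk ratio:           {crude_rr:.2f}")
print(f"risk ratio, smokers only:   {rr_smokers:.2f}")
print(f"risk ratio, nonsmokers only: {rr_nonsmokers:.2f}")
```

Under these assumptions the crude risk ratio is well above 1, while the ratios within each smoking stratum sit close to 1. The correction only works because smoking was measured; a confounder that nobody recorded could not be stratified away.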

We thus need statistics, either through the design of RCTs or through special analyses, so that we can make our observations accurate and so that we can correctly (and not spuriously) accept or reject our hypotheses.
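Why randomization helps can also be shown in a few lines. The sketch below assumes a hypothetical population in which 30% of people smoke, and compares two ways of deciding who receives an exposure: self-selection, in which smokers are assumed to choose the exposure more often, and a coin toss, as in an RCT. The function name and all probabilities are illustrative assumptions.

```python
import random


def confounder_gap(randomized, n=100_000, seed=2):
    """Smoking prevalence among the exposed minus that among the unexposed."""
    rng = random.Random(seed)
    exposed, unexposed = [], []
    for _ in range(n):
        smoker = rng.random() < 0.3
        if randomized:
            # RCT: a coin toss decides exposure, regardless of smoking.
            takes_exposure = rng.random() < 0.5
        else:
            # Observation: smokers are assumed to choose the exposure more often.
            takes_exposure = rng.random() < (0.8 if smoker else 0.4)
        (exposed if takes_exposure else unexposed).append(smoker)
    return sum(exposed) / len(exposed) - sum(unexposed) / len(unexposed)


gap_observational = confounder_gap(randomized=False)
gap_randomized = confounder_gap(randomized=True)

print(f"difference in smoking rates, observational groups: {gap_observational:.3f}")
print(f"difference in smoking rates, randomized groups:    {gap_randomized:.3f}")
```

With self-selection the two groups differ markedly in how many smokers they contain, so any difference in outcome could be due to smoking. With a coin toss the groups end up nearly balanced on smoking, and the same would hold, on average, for factors that were never measured.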

Science is about hypotheses and hypothesis testing, about confirmation and refutation, about confounding bias and experiment, about randomized clinical trials and statistical analysis: in a word, it is not just about facts. Facts always need to be interpreted. And that is the job of statistics: not to tell us the truth, but to help us get closer to the truth by understanding how to interpret the facts.

Knowing Less, Doing More

That is the goal of this book. If you are a researcher, perhaps this book will explain why you do some of the things you do in your analyses and studies, and how you might improve them. If you are a clinician, hopefully it will put you in a place where you can begin to make independent judgments about studies and not simply be at the mercy of the interpretations of others. It may help you realize that the facts are much more complex than they seem; you may end up “knowing” less than you do now, in the sense that you will realize that much that passes for knowledge is only one among other interpretations. At the same time I hope this statistical wisdom proves liberating: you will be less at the mercy of numbers and more in charge of knowing how to interpret numbers. You will know less but, at the same time, what you do know will be more valid and more solid, and thus you will become a better clinician: applying accurate knowledge rather than speculation, and being more clearly aware of where the region of our knowledge ends and where the realm of our ignorance begins.

Clinical Implications

The founder of modern statistics, Pierre-Simon Laplace, reportedly said that what we know is much less than what we don’t know. This perspective applies to anyone who has decent statistical knowledge and a scientific attitude. Such a person will realize that the application of scientific concepts involves the refutation of hypotheses, not just their confirmation. They will realize that most of our best scientific data refute false beliefs without necessarily replacing them immediately with true beliefs. Hence, as we become more scientific, we realize how many of our ideas were false, and thus we give up beliefs that we used to have. The result is that we will know less, but what we do know will be more solid.

A truly scientifically oriented person must become comfortable with the idea of not knowing many things. Most clinicians believe too strongly that what they think is true. They have a very high threshold for changing their ideas based on scientific evidence, even though they had a very low threshold for accepting those ideas to begin with. This is exactly the opposite of the scientific attitude. A knowledge of statistics is a good vaccine against this kind of common antiscientific attitude, held by many clinicians and even researchers.
