Statistical models and inference

Coryn A. L. Bailer-Jones

doi:10.1017/9781108123891.004

In this chapter I will introduce the principles of probabilistic inference. We will see how to set up models and will learn about the prior, likelihood, posterior, and evidence. I will show how inference works in practice using two simple examples: model comparison in the context of medical testing, and parameter estimation in astronomy. All of the issues covered in this chapter form the basis for deeper exploration in later chapters.

Introduction to data modelling

We perform experiments or make observations in order to learn about a phenomenon. We may describe the resulting data by calculating statistics and making plots. Such data explorations and summaries are useful – even essential – to get a feel for the data, but they are just a first step. To interpret the data we usually have to model them.

Typically we can only observe a phenomenon in part, and the data we obtain on it are noisy. Inference is the process of making general statements about a phenomenon, via a model, using noisy and incomplete data. The model represents the data in a form that gives us scientific meaning.

To do inference we must describe both the phenomenon itself and the measurement process. Consider modelling the orbit of a planet around its host star. We first define a relevant model M. This might describe the orbit as an ellipse (Keplerian orbit), as opposed to an oval or rosette. The model will have some parameters θ that describe the specific properties of the model. In the elliptical orbit case these would include the size (semi-major axis) and shape (eccentricity) of the ellipse, as well as its orientation in space. But when we observe the motion of a planet about a star, we do not directly observe the shape of the orbit or any of the other parameters. We instead see the planet at different positions (and with different velocities) at different times. The generative model (also called the forward model) is the theoretical entity that generates (or simulates) the observable data from the model parameters. Normally this is a mathematical equation. In this example it would be a deterministic equation derived from the physical laws of mechanics and gravity: given the model parameters and time of the observation, the position and velocity of the planet can be simulated exactly.
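To make the idea of a generative model concrete, the following is a minimal sketch in Python of a deterministic forward model for a Keplerian orbit. It is an illustration under stated assumptions, not the chapter's own implementation: the function names (solve_kepler, orbit_position), the choice of parameters (semi-major axis, eccentricity, period, time of periapsis) and the use of Newton's method are all choices made here. The orientation of the orbit in space and the planet's velocity are omitted for brevity, so the position is given in the orbital plane with the star at the origin and periapsis along the positive x axis.

```python
import math


def solve_kepler(mean_anomaly, eccentricity, tol=1e-12, max_iter=50):
    """Solve Kepler's equation E - e*sin(E) = M for the eccentric anomaly E.

    Uses Newton's method; assumes a bound orbit with 0 <= eccentricity < 1.
    """
    # Starting guess: M works well for modest eccentricity, pi is safer otherwise.
    E = mean_anomaly if eccentricity < 0.8 else math.pi
    for _ in range(max_iter):
        residual = E - eccentricity * math.sin(E) - mean_anomaly
        delta = residual / (1.0 - eccentricity * math.cos(E))
        E -= delta
        if abs(delta) < tol:
            break
    return E


def orbit_position(t, semi_major_axis, eccentricity, period, t_periapsis=0.0):
    """Generative model: orbital parameters and a time -> (x, y) position.

    Coordinates are in the orbital plane, with the star at the origin (a focus
    of the ellipse) and periapsis on the positive x axis. Units of the result
    follow those of semi_major_axis; t, period and t_periapsis share one unit.
    """
    # Mean anomaly: fraction of the period elapsed since periapsis, as an angle.
    phase = ((t - t_periapsis) / period) % 1.0
    mean_anomaly = 2.0 * math.pi * phase
    E = solve_kepler(mean_anomaly, eccentricity)
    x = semi_major_axis * (math.cos(E) - eccentricity)
    y = semi_major_axis * math.sqrt(1.0 - eccentricity**2) * math.sin(E)
    return x, y


# Hypothetical parameter values, chosen only for illustration.
for t in (0.0, 0.25, 0.5, 0.75):
    print(t, orbit_position(t, semi_major_axis=1.0, eccentricity=0.3, period=1.0))
```

Given the same parameters and time, this function always returns the same position, which is what makes it a deterministic generative model. A description of the measurement process (the noise) would be added on top of it to simulate realistic data.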
