5 - Logistic regression
Published online by Cambridge University Press: 05 June 2012
Summary
No, no, you're not thinking; you're just being logical.
Niels Bohr
Introduction
We introduce here the logistic regression model. It is a widely used statistical technique in biomedical data analysis, for a number of reasons.
First, it estimates the probability of a case belonging to a group – in principle this provides more information than simply deciding (yes, no) to which group a case belongs. Moreover, the probability estimates can be turned into predictions.
Second, it is fairly easy to interpret, as the (log odds of the) probability is expressed as a linear weighted sum of the features, much like a regression analysis.
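As a minimal sketch of the first two points, the snippet below computes the log odds as a linear weighted sum of the features, converts it to a probability with the logistic function, and turns that probability into a yes/no prediction. The weights, intercept, feature values, and the 0.5 threshold are hypothetical values chosen only for illustration; they are not taken from the book.

```python
import math

def log_odds(features, weights, intercept):
    """Linear weighted sum of the features: the log odds of group membership."""
    return intercept + sum(w * x for w, x in zip(weights, features))

def probability(features, weights, intercept):
    """Logistic function applied to the log odds gives a probability in (0, 1)."""
    z = log_odds(features, weights, intercept)
    return 1.0 / (1.0 + math.exp(-z))

def predict(features, weights, intercept, threshold=0.5):
    """Turn the probability estimate into a yes/no prediction (threshold is an assumption)."""
    return probability(features, weights, intercept) >= threshold

# Hypothetical model and case, for illustration only.
weights = [0.8, -1.2]
intercept = -0.5
case = [2.0, 1.0]

z = log_odds(case, weights, intercept)      # -0.5 + 0.8*2.0 - 1.2*1.0
p = probability(case, weights, intercept)
label = predict(case, weights, intercept)
print(f"log odds = {z:.3f}, probability = {p:.3f}, predicted member = {label}")
```

The probability carries more information than the label alone: two cases on the same side of the threshold can still differ considerably in how confidently they are assigned.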
Third, like regression analysis, the coefficients in the model can (with similar care) be interpreted as positive or negative associations of the variables with the predicted probability. For example, individual genes or clinical findings can be assigned protective or risk values expressed as log odds.
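One common way to read such coefficients, sketched here under stated assumptions, is to exponentiate each one to obtain an odds ratio: a value above 1 suggests a risk association and a value below 1 a protective one, with the other features held fixed. The feature names and coefficient values are hypothetical, and the labels describe associations in a fitted model, not causal effects.

```python
import math

# Hypothetical fitted coefficients on the log-odds scale (illustration only).
coefficients = {"risk_marker": 0.7, "protective_marker": -0.4}

def odds_ratio(coef):
    """Multiplicative change in the odds for a one-unit increase in the feature."""
    return math.exp(coef)

def direction(coef):
    """Label the sign of the association with the predicted probability."""
    if coef > 0:
        return "risk"
    if coef < 0:
        return "protective"
    return "neutral"

summary = {name: (odds_ratio(c), direction(c)) for name, c in coefficients.items()}

for name, (ratio, label) in summary.items():
    print(f"{name}: odds ratio = {ratio:.2f} ({label})")
```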
As will be discussed in Chapters 8 and 9, there are problems with interpreting any regression model, yet compared to the other statistical learning machines we eventually discuss, logistic regression is far easier to interpret. This is why we suggest first applying learning machines to the data to identify the most informative features, and then using just those features to generate a simpler, equally accurate model, namely logistic regression. The logistic regression model serves as an endpoint or reference model throughout this book.
- Type: Chapter
- Information: Statistical Learning for Biomedical Data, pp. 91-117
- Publisher: Cambridge University Press
- Print publication year: 2011