Book contents
- Frontmatter
- Dedication
- Contents
- Preface
- Acknowledgments
- Notation
- Part I Classic Statistical Inference
- Part II Early Computer-Age Methods
- Part III Twenty-First-Century Topics
- 15 Large-Scale Hypothesis Testing and FDRs
- 16 Sparse Modeling and the Lasso
- 17 Random Forests and Boosting
- 18 Neural Networks and Deep Learning
- 19 Support-Vector Machines and Kernel Methods
- 20 Inference After Model Selection
- 21 Empirical Bayes Estimation Strategies
- Epilogue
- References
- Author Index
- Subject Index
15 - Large-Scale Hypothesis Testing and FDRs
from Part III - Twenty-First-Century Topics
Published online by Cambridge University Press: 05 July 2016
Summary
By the final decade of the twentieth century, electronic computation fully dominated statistical practice. Almost all applications, classical or otherwise, were now performed on a suite of computer platforms: SAS, SPSS, Minitab, Matlab, S (later R), and others.
The trend accelerates when we enter the twenty-first century, as statistical methodology struggles, most often successfully, to keep up with the vastly expanding pace of scientific data production. This has been a two-way game of pursuit, with statistical algorithms chasing ever-larger data sets, while inferential analysis labors to rationalize the algorithms. Part III of our book concerns topics in twenty-first-century statistics.
The word “topics” is intended to signal selections made from a wide catalog of possibilities. Part II was able to review a large portion (though certainly not all) of the important developments during the postwar period. Now, deprived of the advantage of hindsight, our survey will be more illustrative than definitive.
For many statisticians, microarrays provided an introduction to large-scale data analysis. These were revolutionary biomedical devices that enabled the assessment of individual activity for thousands of genes at once — and, in doing so, raised the need to carry out thousands of simultaneous hypothesis tests, with the prospect of finding only a few interesting genes among a haystack of null cases. This chapter concerns large-scale hypothesis testing and the false-discovery rate, the breakthrough in statistical inference it elicited.
Large-Scale Testing
The prostate cancer data, Figure 3.4, came from a microarray study of n = 102 men, 52 prostate cancer patients and 50 normal controls. Each man's gene expression levels were measured on a panel of N = 6033 genes, yielding a 6033 × 102 matrix of measurements xij.
For each gene i, a two-sample t statistic ti (2.17) was computed comparing gene i's expression levels for the 52 patients with those for the 50 controls. Under the null hypothesis H0i that the patients' and the controls' responses come from the same normal distribution of gene i expression levels, ti follows a standard Student t distribution with 100 degrees of freedom, t100.
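The calculation above can be sketched in a few lines of code. The snippet below is a minimal illustration, not the study's actual analysis: it uses simulated null data in place of the real prostate expression matrix (so every gene is null by construction), and the function name `two_sample_t` is our own. It computes the pooled-variance two-sample t statistic for each of the N = 6033 genes at once; under H0i each statistic follows a t distribution with 52 + 50 − 2 = 100 degrees of freedom.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated stand-in for the prostate data: rows are N = 6033 genes,
# columns are n = 102 men (52 patients, then 50 controls).
N, n1, n2 = 6033, 52, 50
X = rng.standard_normal((N, n1 + n2))

patients, controls = X[:, :n1], X[:, n1:]

def two_sample_t(a, b):
    """Pooled-variance two-sample t statistic, one value per row (gene)."""
    na, nb = a.shape[1], b.shape[1]
    mean_diff = a.mean(axis=1) - b.mean(axis=1)
    pooled_var = ((na - 1) * a.var(axis=1, ddof=1)
                  + (nb - 1) * b.var(axis=1, ddof=1)) / (na + nb - 2)
    return mean_diff / np.sqrt(pooled_var * (1 / na + 1 / nb))

t = two_sample_t(patients, controls)
print(t.shape)  # one t statistic per gene: (6033,)
```

With all genes null, the 6033 statistics should look like a sample from t100: mean near 0, standard deviation near sqrt(100/98) ≈ 1.01. The same per-gene statistics could also be obtained with `scipy.stats.ttest_ind` applied along the gene axis.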
Computer Age Statistical Inference: Algorithms, Evidence, and Data Science, pp. 271–297. Cambridge University Press, 2016.