Book contents
- Frontmatter
- Contents
- Preface
- 1 Basic Concepts in Probability and Statistics
- 2 Hypothesis Tests
- 3 Confidence Intervals
- 4 Statistical Tests Based on Ranks
- 5 Introduction to Stochastic Processes
- 6 The Power Spectrum
- 7 Introduction to Multivariate Methods
- 8 Linear Regression: Least Squares Estimation
- 9 Linear Regression: Inference
- 10 Model Selection
- 11 Screening: A Pitfall in Statistics
- 12 Principal Component Analysis
- 13 Field Significance
- 14 Multivariate Linear Regression
- 15 Canonical Correlation Analysis
- 16 Covariance Discriminant Analysis
- 17 Analysis of Variance and Predictability
- 18 Predictable Component Analysis
- 19 Extreme Value Theory
- 20 Data Assimilation
- 21 Ensemble Square Root Filters
- Appendix
- References
- Index
10 - Model Selection
Published online by Cambridge University Press: 03 February 2022
Summary
This chapter discusses the problem of selecting predictors in a linear regression model, which is a special case of model selection. One might think that the best model is the one with the most predictors. However, each predictor is associated with a parameter that must be estimated, and errors in the estimation add uncertainty to the final prediction. Thus, when deciding whether to include certain predictors or not, the associated gain in prediction skill should exceed the loss due to estimation error. Model selection is not easily addressed using a hypothesis testing framework because multiple testing is involved. Instead, the standard approach is to define a criterion for preferring one model over another. One criterion is to select the model that gives the best predictions of independent data. By independent data, we mean data that is generated independently of the sample that was used to inform the model building process. Criteria for identifying the model that gives the best predictions in independent data include Mallows’ Cp, Akaike’s Information Criterion, Bayesian Information Criterion, and cross-validated error.
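The trade-off described above can be sketched numerically. The example below is a minimal illustration, not the chapter's own method. It assumes synthetic data in which only two of seven candidate predictors matter, considers nested models built from the first k columns, and uses common textbook forms of the criteria, which may differ from the chapter's definitions by additive constants. The function names `fit_sse`, `loo_cv_mse`, and `selection_criteria` are hypothetical.

```python
import numpy as np

def fit_sse(X, y):
    """Least squares fit; returns the sum of squared residuals."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)

def loo_cv_mse(X, y):
    """Leave-one-out cross-validated mean squared error.

    Uses the identity e_i / (1 - h_i) for linear least squares, where h_i
    is the i-th leverage; assumes X has full column rank.
    """
    Q, _ = np.linalg.qr(X)            # reduced QR: Q is n-by-k
    leverage = np.sum(Q**2, axis=1)   # diagonal of the hat matrix
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(np.mean((resid / (1.0 - leverage)) ** 2))

def selection_criteria(X_full, y):
    """Evaluate each nested model made of the first k columns of X_full."""
    n, k_max = X_full.shape
    # Noise variance estimated from the largest model (an assumption of Cp).
    sigma2_full = fit_sse(X_full, y) / (n - k_max)
    rows = []
    for k in range(1, k_max + 1):
        X = X_full[:, :k]
        sse = fit_sse(X, y)
        rows.append({
            "k": k,
            "sse": sse,
            "cp": sse / sigma2_full - n + 2 * k,        # Mallows' Cp
            "aic": n * np.log(sse / n) + 2 * k,         # AIC, Gaussian errors
            "bic": n * np.log(sse / n) + k * np.log(n), # BIC, Gaussian errors
            "cv": loo_cv_mse(X, y),                     # cross-validated error
        })
    return rows

# Synthetic data (assumed for illustration): intercept plus 7 candidate
# predictors, of which only the first two influence y.
rng = np.random.default_rng(0)
n = 200
predictors = rng.standard_normal((n, 7))
X_full = np.column_stack([np.ones(n), predictors])
y = 1.0 + 2.0 * predictors[:, 0] - 1.5 * predictors[:, 1] + rng.standard_normal(n)

results = selection_criteria(X_full, y)
for name in ("cp", "aic", "bic", "cv"):
    best = min(results, key=lambda r: r[name])["k"]
    print(f"{name.upper():>3} selects the model with {best} parameters")
```

The residual sum of squares never increases as predictors are added, so it cannot be used for selection by itself. Each criterion adds a penalty, or uses held-out data, to account for the estimation error that comes with extra parameters.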
- Type: Chapter
- Information: Statistical Methods for Climate Scientists, pp. 237-254
- Publisher: Cambridge University Press
- Print publication year: 2022