Book contents
- Frontmatter
- Dedication
- Contents
- Figures
- Preface
- 1 Learning from Data, and Tools for the Task
- 2 Generalizing from Models
- 3 Multiple Linear Regression
- 4 Exploiting the Linear Model Framework
- 5 Generalized Linear Models, and Survival Analysis
- 6 Time Series Models
- 7 Multilevel Models, and Repeated Measures
- 8 Tree-Based Classification and Regression
- 9 Multivariate Data Exploration and Discrimination
- Appendix A The R System: a Brief Overview
- References
- References to R Packages
- Index of R Functions
- Index of Terms
1 - Learning from Data, and Tools for the Task
Published online by Cambridge University Press: 11 May 2024
Summary
We begin by illustrating the interplay between questions of scientific interest and the use of data in seeking answers. Graphs provide a window through which meaning can often be extracted from data. Numeric summary statistics and probability distributions provide a form of quantitative scaffolding for models of random as well as nonrandom variation. Simple regression models foreshadow the issues that arise in the more complex models considered later in the book. Frequentist and Bayesian approaches to statistical inference are contrasted, the latter primarily using the Bayes Factor to complement the limited perspective that p-values offer. The Akaike Information Criterion (AIC) and related "information" statistics provide a further perspective. Resampling methods, where the one available dataset is used to provide an empirical substitute for a theoretical distribution, are introduced. The remaining topics are of a more general nature. RStudio is one of several tools that can help in organizing and managing work. The checks provided by independent replication at another time and place are an indispensable complement to statistical analysis. Questions of data quality, of relevance to the questions asked, of the processes that generated the data, and of generalization, remain just as important for machine learning and other new analysis approaches as for more classical methods.
- Type: Chapter
- Information: A Practical Guide to Data Analysis Using R: An Example-Based Approach, pp. 1-87
- Publisher: Cambridge University Press
- Print publication year: 2024