4 - Start with Data Science
Published online by Cambridge University Press: 28 April 2022
Summary
Introduction
As interest in data science and computing increases, educators have an opportunity to introduce students to quantitative studies through the lens of data science. Data science courses and programs are appearing everywhere; however, the jury is still out on how to teach data science effectively and efficiently to students with little to no background in computing and statistical thinking. Additionally, how can we equip students with the skills and tools for reasoning with data? Finally, and most importantly, how can we ensure students leave a data science course wanting to learn more? This chapter describes an introductory data science course that provides an answer to these questions.
Many university curricula require that students take at least one quantitative course. Most students fulfill this requirement with an introductory statistics course. While many of these courses incorporate real datasets, these datasets tend to be small and clean, unlike real datasets caught in the wild. These courses also tend to focus primarily on statistical inference for small to medium-sized data. Few provide guidance on what to do when these conditions do not hold, as is the case for most real data. Moreover, the heavy focus on inference has meant that little time is spent on other important data analysis steps such as importing data, cleaning data, and performing thorough exploratory data analysis.
The increased availability of data and the recent emergence of the field of data science are two of the main influences behind the most recent modifications to the Guidelines for Assessment and Instruction in Statistics Education (GAISE) (Carver et al., 2016). These modifications include increased emphasis on teaching statistical thinking in introductory statistics classes. Specifically, the guidelines emphasize that introductory statistics should be taught as an investigative cycle of asking questions and obtaining answers, particularly those involving the relationships between multiple variables. The recommendations also stress using technology to explore concepts and analyze data. Data analysis comprises so much more than just inference and modeling. As Grolemund and Wickham (2018) suggest in their book R for Data Science, data analysis comprises a full lifecycle from importing data sources to communicating results (https://r4ds.had.co.nz/introduction.html).
- Type: Chapter
- Information: Data Science in the Library: Tools and Strategies for Supporting Data-Driven Research and Instruction, pp. 67-80. Publisher: Facet. Print publication year: 2021.