Part III - Statistical Framework
Published online by Cambridge University Press: 05 July 2014
Summary
Statistical Framework
If big data are to be used for the public good, the inferences drawn from them must be valid for different, targeted populations. For that to occur, statisticians must have access to the data so that they can understand the data-generating process, know whether the assumptions of their statistical model are met, and see what relevant information is included or excluded. It is clear from earlier chapters in this book that the utility of big data lies in being able to study small groups in real time, using new data analytic techniques such as machine learning or data mining. These demands pose real challenges for anonymization and statistical analysis. The essays in this part of the book identify the issues, spell out the statistical framework for both analysis and data release, and outline key directions for future research.
A major theme of the essays is that neither the data-generating process nor the data collection process is well understood for big data. As Kreuter and Peng argue, almost all statistical experience with human subjects is based on survey data, and over time statisticians have parsed the sources of error neatly into a total survey error framework. But the data-generating process of many data streams – such as administrative data or big data – is less transparent and is not under the control of the researcher; therefore, access to the data themselves is critical to building the necessary understanding. Continuous effort will be needed to develop standards of transparency in the collection of big data. Transparency is also needed on the ‘back end’ – any linkage, data preparation and processing, analysis, and reporting – to ensure reproducibility. Kreuter and Peng point out that much more research is needed on linkage and matching, because the resulting knowledge will not only enrich possible analyses but also help to evaluate the quality of the linked sources.
- Type: Chapter
- Privacy, Big Data, and the Public Good: Frameworks for Engagement, pp. 253-256
- Publisher: Cambridge University Press
- Print publication year: 2014