Book contents
- Frontmatter
- Contents
- Contributors
- Editors’ Note
- 1 Opportunities and Challenges: Lessons from Analyzing Terabytes of Scanner Data
- 2 Is Big Data a Big Deal for Applied Microeconomics?
- 3 Low-Frequency Econometrics
- 4 Shocks, Sign Restrictions, and Identification
- 5 Macroeconometrics – A Discussion
- 6 On the Distribution of the Welfare Losses of Large Recessions
- 7 Computing Equilibria in Dynamic Stochastic Macro-Models with Heterogeneous Agents
- 8 Recent Advances in Empirical Analysis of Financial Markets: Industrial Organization Meets Finance
- 9 Practical and Theoretical Advances in Inference for Partially Identified Models
- 10 Partial Identification in Applied Research: Benefits and Challenges
- Index
1 - Opportunities and Challenges: Lessons from Analyzing Terabytes of Scanner Data
Published online by Cambridge University Press: 27 October 2017
Summary
This paper seeks to better understand what makes big data analysis different, what we can and cannot do with existing econometric tools, and what issues need to be dealt with in order to work with the data efficiently. As a case study, I set out to extract any business cycle information that might exist in four terabytes of weekly scanner data. The main challenge is to handle the volume, variety, and characteristics of the data within the constraints of our computing environment. Scalable and efficient algorithms are available to ease the computation burden, but they often have unknown statistical properties and are not designed for the purpose of efficient estimation or optimal inference. As well, economic data have unique characteristics that generic algorithms may not accommodate. There is a need for computationally efficient econometric methods as big data is likely here to stay.
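As a purely illustrative aside (a minimal sketch, not drawn from the chapter itself), one common scalable strategy of the kind alluded to above is to compute an estimator from sufficient statistics accumulated over chunks of data too large to hold in memory. The file name, variable names, and chunk size below are hypothetical assumptions.

```python
# Minimal sketch (hypothetical data layout): out-of-core OLS by accumulating
# the cross-products X'X and X'y one chunk at a time, so memory use does not
# grow with the number of observations.
import numpy as np
import pandas as pd

XtX = None  # running sum of X'X
Xty = None  # running sum of X'y

# Stream the (hypothetical) scanner-data file in manageable chunks.
for chunk in pd.read_csv("weekly_scanner_sales.csv", chunksize=1_000_000):
    y = chunk["log_sales"].to_numpy()
    X = np.column_stack([
        np.ones(len(chunk)),               # intercept
        chunk["log_price"].to_numpy(),     # hypothetical regressor
        chunk["promotion"].to_numpy(),     # hypothetical regressor
    ])
    if XtX is None:
        XtX = X.T @ X
        Xty = X.T @ y
    else:
        XtX += X.T @ X
        Xty += X.T @ y

# The full-sample OLS estimate follows from the accumulated cross-products.
beta_hat = np.linalg.solve(XtX, Xty)
print(beta_hat)
```

Only the small accumulator matrices are kept in memory, which is what makes the computation feasible on terabytes of data; as the Summary notes, however, such generic devices are designed for computational convenience rather than for efficient estimation or optimal inference.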
INTRODUCTION
The goal of a researcher is often to extract signals from data, and without data, no theory can be validated or falsified. Fortunately, we live in a digital age with an abundance of data. According to the website Wikibon (www.wikibon.org), there are some 2.7 zettabytes of data in the digital universe. The US Library of Congress had collected 235 terabytes of data as of 2011. Facebook alone stores and analyzes over 30 petabytes of user-generated data. Google processed 20 petabytes of data daily back in 2008, and undoubtedly much more is being processed now. Walmart handles more than one million customer transactions per hour. Data from financial markets are available at fractions of a second. We now have biometric data on fingerprints, handwriting, medical images, and, last but not least, genes. The 1000 Genomes Project stored 464 terabytes of data in 2013, and the size of the database is still growing. Even if these numbers are a bit off, there is a lot of information out there to be mined. These data can potentially lead economists to a better understanding of consumer and firm behavior, as well as the design and functioning of markets.
- Type: Chapter
- Information: Advances in Economics and Econometrics: Eleventh World Congress, pp. 1-34
- Publisher: Cambridge University Press
- Print publication year: 2017