Book contents
- Frontmatter
- Contents
- Contributors
- Editors’ Note
- 1 Opportunities and Challenges: Lessons from Analyzing Terabytes of Scanner Data
- 2 Is Big Data a Big Deal for Applied Microeconomics?
- 3 Low-Frequency Econometrics
- 4 Shocks, Sign Restrictions, and Identification
- 5 Macroeconometrics – A Discussion
- 6 On the Distribution of the Welfare Losses of Large Recessions
- 7 Computing Equilibria in Dynamic Stochastic Macro-Models with Heterogeneous Agents
- 8 Recent Advances in Empirical Analysis of Financial Markets: Industrial Organization Meets Finance
- 9 Practical and Theoretical Advances in Inference for Partially Identified Models
- 10 Partial Identification in Applied Research: Benefits and Challenges
- Index
1 - Opportunities and Challenges: Lessons from Analyzing Terabytes of Scanner Data
Published online by Cambridge University Press: 27 October 2017
Summary
This paper seeks to better understand what makes big data analysis different, what we can and cannot do with existing econometric tools, and what issues need to be dealt with in order to work with the data efficiently. As a case study, I set out to extract any business cycle information that might exist in four terabytes of weekly scanner data. The main challenge is to handle the volume, variety, and characteristics of the data within the constraints of our computing environment. Scalable and efficient algorithms are available to ease the computation burden, but they often have unknown statistical properties and are not designed for the purpose of efficient estimation or optimal inference. As well, economic data have unique characteristics that generic algorithms may not accommodate. There is a need for computationally efficient econometric methods as big data is likely here to stay.
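As a purely illustrative aside (a minimal sketch, not drawn from the chapter itself), one common scalable strategy of the kind alluded to above is to compute an estimator from sufficient statistics accumulated over chunks of data too large to hold in memory. The file name, variable names, and chunk size below are hypothetical assumptions.

```python
# Minimal sketch (hypothetical data layout): out-of-core OLS by accumulating
# the cross-products X'X and X'y one chunk at a time, so memory use does not
# grow with the number of observations.
import numpy as np
import pandas as pd

XtX = None  # running sum of X'X
Xty = None  # running sum of X'y

# Stream the (hypothetical) scanner-data file in manageable chunks.
for chunk in pd.read_csv("weekly_scanner_sales.csv", chunksize=1_000_000):
    y = chunk["log_sales"].to_numpy()
    X = np.column_stack([
        np.ones(len(chunk)),               # intercept
        chunk["log_price"].to_numpy(),     # hypothetical regressor
        chunk["promotion"].to_numpy(),     # hypothetical regressor
    ])
    if XtX is None:
        XtX = X.T @ X
        Xty = X.T @ y
    else:
        XtX += X.T @ X
        Xty += X.T @ y

# The full-sample OLS estimate follows from the accumulated cross-products.
beta_hat = np.linalg.solve(XtX, Xty)
print(beta_hat)
```

Only the small accumulator matrices are kept in memory, which is what makes the computation feasible on terabytes of data; as the Summary notes, however, such generic devices are designed for computational convenience rather than for efficient estimation or optimal inference.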
INTRODUCTION
The goal of a researcher is often to extract signals from data, and without data, no theory can be validated or falsified. Fortunately, we live in a digital age with an abundance of data. According to the website Wikibon (www.wikibon.org), there are some 2.7 zettabytes of data in the digital universe. The US Library of Congress had collected 235 terabytes of data as of 2011. Facebook alone stores and analyzes over 30 petabytes of user-generated data. Google processed 20 petabytes of data daily back in 2008, and undoubtedly much more is being processed now. Walmart handles more than one million customer transactions per hour. Data from financial markets are available at fractions of a second. We now have biometric data on fingerprints, handwriting, medical images, and, last but not least, genes. The 1000 Genomes Project stored 464 terabytes of data in 2013, and the size of the database is still growing. Even if these numbers are a bit off, there is a lot of information out there to be mined. These data can potentially lead economists to a better understanding of consumer and firm behavior, as well as the design and functioning of markets.
- Type: Chapter
- Information: Advances in Economics and Econometrics: Eleventh World Congress, pp. 1-34
- Publisher: Cambridge University Press
- Print publication year: 2017