The Value of Big Data for Urban Science

doi:10.1017/CBO9781107590205.009

6 - The Value of Big Data for Urban Science

Published online by Cambridge University Press: 05 July 2014

Steven E. Koonin and Michael J. Holland

Edited by Julia Lane, Victoria Stodden, Stefan Bender, and Helen Nissenbaum

Steven E. Koonin: New York University
Michael J. Holland: New York University
Julia Lane: American Institutes for Research, Washington DC
Victoria Stodden: Columbia University, New York
Stefan Bender: Institute for Employment Research of the German Federal Employment Agency
Helen Nissenbaum: New York University

Summary

Introduction

The past two decades have seen rapid advances in sensors, database technologies, search engines, data mining, machine learning, statistics, distributed computing, visualization, and modeling and simulation. These technologies, which collectively underpin ‘big data’, are allowing organizations to acquire, transmit, store, and analyze all manner of data in greater volume, with greater velocity, and of greater variety. Cisco, the multinational manufacturer of networking equipment, estimates that by 2017 there will be three networked devices for every person on the globe. The ‘instrumenting of society’ that is taking place as these technologies are widely deployed is producing data streams of unprecedented granularity, coverage, and timeliness.

The tsunami of data increasingly affects the commercial and academic spheres. A decade ago, it was news that Walmart was using predictive analytics to anticipate inventory needs in the face of upcoming severe weather events. Today, retail (inventory management), advertising (online recommendation engines), insurance (improved stratification of risk), finance (investment strategy, fraud detection), real estate, entertainment, and political campaigns routinely acquire, integrate, and analyze large amounts of societal data to improve their performance. Scientific research is also seeing the rise of big data technologies. Large federated databases are now an important asset in physics, astronomy, the earth sciences, and biology. The social sciences are beginning to grapple with the implications of this transformation. The traditional data paradigm of social science relies upon surveys and experiments, both qualitative and quantitative, as well as the exploitation of administrative records created for non-research purposes. Well-designed surveys generate representative data from comparatively small samples, and the best administrative datasets provide high-quality data covering a total population of interest. The opportunity now exists to understand how these traditional tools can be complemented by the large volumes of ‘organic’ data that are generated as a natural part of a modern, technologically advanced society. Depending upon how sampling errors, coverage errors, and biases are accounted for, we believe the combination can yield new insights into human behavior and social norms.

Type: Chapter
Information: Privacy, Big Data, and the Public Good: Frameworks for Engagement, pp. 137–152

DOI: https://doi.org/10.1017/CBO9781107590205.009

Publisher: Cambridge University Press

Print publication year: 2014
