Data in the cloud

Gautam Shroff

doi:10.1017/CBO9780511778476.015

Since the 1980s, relational database technology has been the ‘default’ mechanism for storing and retrieving data in the vast majority of enterprise applications. Relational databases originated in the 1970s with System R and Ingres, which introduced the new paradigm as a general-purpose replacement for hierarchical and network databases in the most common business computing task of the time, namely transaction processing.

In building a planetary-scale web search service, Google in particular has developed a massively parallel, fault-tolerant distributed file system (GFS), along with a data organization (BigTable) and a programming paradigm (MapReduce) that are markedly different from the traditional relational model. Such ‘cloud data strategies’ are particularly well suited to large-volume, massively parallel text processing, and possibly to other tasks such as enterprise analytics. The public cloud computing offerings from Google (i.e. App Engine), as well as those from other vendors, have made similar data models (Google's Datastore, Amazon's SimpleDB) and programming paradigms (Hadoop on Amazon's EC2) available to users as part of their cloud platforms.
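As a rough illustration of the MapReduce paradigm, the sketch below counts words in a small collection of documents, the canonical text-processing example. It is a single-process Python toy and assumes nothing about Hadoop's or Google's actual APIs: the names `map_phase`, `shuffle`, `reduce_phase` and `word_count` are hypothetical. On a real cluster the map calls would run in parallel on many machines and the shuffle would move intermediate pairs across the network; here the three phases simply run one after another.

```python
from collections import defaultdict
from itertools import chain
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(doc_id: str, text: str) -> Iterator[Tuple[str, int]]:
    """Map: emit (word, 1) for every word in one document.
    doc_id is the input key; word count does not need it."""
    for word in text.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all intermediate values by their key."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(word: str, counts: List[int]) -> Tuple[str, int]:
    """Reduce: combine all values collected for one key."""
    return word, sum(counts)


def word_count(docs: Dict[str, str]) -> Dict[str, int]:
    # Sequential stand-in for parallel map tasks over the input documents.
    intermediate = chain.from_iterable(map_phase(d, t) for d, t in docs.items())
    grouped = shuffle(intermediate)
    # Sequential stand-in for parallel reduce tasks, one per key.
    return dict(reduce_phase(w, c) for w, c in grouped.items())


if __name__ == "__main__":
    docs = {"d1": "the cloud stores data", "d2": "the data in the cloud"}
    print(word_count(docs))
    # → {'the': 3, 'cloud': 2, 'stores': 1, 'data': 2, 'in': 1}
```

Because each map call depends only on its own document and each reduce call only on one key's values, both phases can be spread across machines without coordination; only the shuffle requires data movement.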

At the same time, there have been advances in building specialized database organizations optimized for analytical data processing, in particular column-oriented databases such as Vertica. It is instructive to note that the BigTable-based data organization underlying cloud databases shares some similarities with column-oriented databases.
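The similarity can be sketched with a toy, in-memory example. This is an illustration under stated assumptions, not how Vertica or BigTable are actually implemented: the `orders` table, its columns, and the column family names `info` and `sales` are hypothetical, and BigTable's timestamps, tablets and on-disk files are left out. In a column-oriented layout each column's values are stored together; in BigTable, cells are addressed by row key and column, and column families can be grouped so that they are stored separately. In both cases an aggregate over one attribute can read just that attribute's data, whereas a row-oriented layout typically reads every field of every row.

```python
from collections import defaultdict
from typing import Any, Dict, List, Tuple

# Hypothetical "orders" table used only for illustration.
COLUMNS = ("order_id", "region", "amount")
ROWS: List[Tuple[str, str, float]] = [
    ("o1", "east", 120.0),
    ("o2", "west", 80.0),
    ("o3", "east", 50.0),
]


def to_column_store(rows) -> Dict[str, List[Any]]:
    """Column-oriented layout: each column's values are stored together."""
    return {name: [row[i] for row in rows] for i, name in enumerate(COLUMNS)}


def to_family_store(rows) -> Dict[str, Dict[Tuple[str, str], Any]]:
    """BigTable-like layout: cells keyed by (row key, column qualifier),
    grouped by column family so each family can be stored and scanned
    separately. Timestamps (versions) are omitted for brevity."""
    families: Dict[str, Dict[Tuple[str, str], Any]] = defaultdict(dict)
    for order_id, region, amount in rows:
        families["info"][(order_id, "region")] = region
        families["sales"][(order_id, "amount")] = amount
    return dict(families)


def total_from_columns(store: Dict[str, List[Any]]) -> float:
    # Reads only the 'amount' column; the other columns are never touched.
    return sum(store["amount"])


def total_from_families(families: Dict[str, Dict[Tuple[str, str], Any]]) -> float:
    # Reads only the 'sales' family; the 'info' family is never touched.
    return sum(families["sales"].values())


if __name__ == "__main__":
    print(total_from_columns(to_column_store(ROWS)))   # → 250.0
    print(total_from_families(to_family_store(ROWS)))  # → 250.0
```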

Chapter 10 - Data in the cloud