Book contents
- Frontmatter
- Contents
- List of Contributors
- 1 Data-Intensive Computing: A Challenge for the 21st Century
- 2 Anatomy of Data-Intensive Computing Applications
- 3 Hardware Architectures for Data-Intensive Computing Problems: A Case Study for String Matching
- 4 Data Management Architectures
- 5 Large-Scale Data Management Techniques in Cloud Computing Platforms
- 6 Dimension Reduction for Streaming Data
- 7 Binary Classification with Support Vector Machines
- 8 Beyond MapReduce: New Requirements for Scalable Data Processing
- 9 Let the Data Do the Talking: Hypothesis Discovery from Large-Scale Data Sets in Real Time
- 10 Data-Intensive Visual Analysis for Cyber-Security
- Index
- References
5 - Large-Scale Data Management Techniques in Cloud Computing Platforms
Published online by Cambridge University Press: 05 December 2012
Summary
Introduction
Over the last two decades, the continuous increase in computational power has produced an overwhelming flow of data, calling for a paradigm shift in computing architectures and large-scale data processing mechanisms. In a speech given just a few weeks before he was lost at sea off the California coast in January 2007, Jim Gray, a database software pioneer and Microsoft researcher, called this shift the “fourth paradigm” [32]. The first three paradigms were experimental, theoretical, and, more recently, computational science. Gray argued that the only way to cope with this fourth paradigm is to develop a new generation of computing tools to manage, visualize, and analyze the data flood. Moreover, current computer architectures are increasingly imbalanced: the latency gap between multicore CPUs and mechanical hard disks grows every year, making the challenges of data-intensive computing harder to overcome [6]. There is therefore a crucial need for a systematic and generic approach to tackle these problems with an architecture that can also scale into the foreseeable future. In response, Gray argued that the new trend should focus on supporting cheaper clusters of computers to manage and process all this data, rather than on building the biggest and fastest single computer. Figure 5.1 illustrates an example of the explosion in scientific data, which creates major challenges for cutting-edge scientific projects. For example, modern high-energy physics experiments, such as DZero, typically generate more than one terabyte of data per day.
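To make the cluster-oriented approach concrete, the sketch below is a minimal, single-machine approximation of the data-parallel (map/reduce-style) pattern that commodity clusters popularized: each chunk of data is processed independently in parallel, and only the small partial results are merged. It is not taken from the chapter; the sample chunks, function names, and the use of Python's multiprocessing pool are illustrative assumptions standing in for file blocks and worker machines.

```python
from collections import Counter
from multiprocessing import Pool

# Illustrative input: in a real deployment each "chunk" would be a block of a
# large dataset spread across the disks of many cheap machines.
CHUNKS = [
    "data intensive computing moves the computation to the data",
    "clusters of cheap machines replace one big fast computer",
    "the data flood keeps growing every year",
]

def map_chunk(chunk):
    """Map phase: count words in one chunk, independently of all other chunks."""
    return Counter(chunk.split())

def reduce_counts(partials):
    """Reduce phase: merge the per-chunk counts into a single global result."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total

if __name__ == "__main__":
    # The map phase is embarrassingly parallel, so capacity grows by adding
    # workers (here: local processes; in a cluster: whole machines).
    with Pool() as pool:
        partial_counts = pool.map(map_chunk, CHUNKS)
    print(reduce_counts(partial_counts).most_common(5))
```

The design point the sketch illustrates is the one Gray emphasized: per-chunk work needs no coordination, so throughput scales with the number of cheap workers instead of the speed of a single large machine.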
- Type: Chapter
- Information: Data-Intensive Computing: Architectures, Algorithms, and Applications, pp. 85–123
- Publisher: Cambridge University Press
- Print publication year: 2012