Book contents
- Frontmatter
- Contents
- List of Contributors
- 1 Data-Intensive Computing: A Challenge for the 21st Century
- 2 Anatomy of Data-Intensive Computing Applications
- 3 Hardware Architectures for Data-Intensive Computing Problems: A Case Study for String Matching
- 4 Data Management Architectures
- 5 Large-Scale Data Management Techniques in Cloud Computing Platforms
- 6 Dimension Reduction for Streaming Data
- 7 Binary Classification with Support Vector Machines
- 8 Beyond MapReduce: New Requirements for Scalable Data Processing
- 9 Let the Data Do the Talking: Hypothesis Discovery from Large-Scale Data Sets in Real Time
- 10 Data-Intensive Visual Analysis for Cyber-Security
- Index
- References
6 - Dimension Reduction for Streaming Data
Published online by Cambridge University Press: 05 December 2012
Summary
Introduction
With sensors becoming ubiquitous, there is increasing interest in mining sensor data as they are being collected. The analysis of such streaming data, or data streams, presents new challenges to analysis algorithms. The data can be massive, especially when the sensors number in the thousands and are sampled at a high frequency. The data can be non-stationary, with statistics that vary over time. Real-time analysis is often required, either to avoid untoward incidents or to better understand an interesting phenomenon. These factors make the analysis of streaming data, whether from sensors or other sources, very data- and compute-intensive. One possible approach to making this analysis tractable is to identify the important data streams and focus on them. This chapter describes the different ways in which this can be done, given that what makes a stream important varies from problem to problem and can often change with time within a single problem. I illustrate these techniques by applying them to data from a real problem and discuss the challenges faced in this emerging field of streaming data analysis.
This chapter is organized as follows: first, I define what is meant by streaming data and use examples from practical problems to discuss the challenges in the analysis of these data. Next, I describe the two main approaches used to handle the streaming nature of the data – the sliding window approach and the forgetting factor approach.
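As a rough illustration of the two approaches named above, the following Python sketch maintains a running mean of a single stream in both ways. It is a generic textbook formulation, not necessarily the one used in the chapter: the class names, the `window_size` parameter, and the forgetting factor `lam` are assumptions introduced here for illustration only.

```python
from collections import deque


class SlidingWindowMean:
    """Mean over the most recent `window_size` samples (hypothetical helper)."""

    def __init__(self, window_size):
        if window_size < 1:
            raise ValueError("window_size must be at least 1")
        self.window = deque(maxlen=window_size)
        self.total = 0.0

    def update(self, x):
        # When the window is full, the oldest sample is about to be evicted,
        # so subtract it from the running total first.
        if len(self.window) == self.window.maxlen:
            self.total -= self.window[0]
        self.window.append(x)
        self.total += x
        return self.total / len(self.window)


class ForgettingFactorMean:
    """Exponentially weighted mean; older samples decay by `lam` per step
    (hypothetical helper, assumes 0 < lam < 1)."""

    def __init__(self, lam):
        if not 0.0 < lam < 1.0:
            raise ValueError("lam must lie strictly between 0 and 1")
        self.lam = lam
        self.weighted_sum = 0.0
        self.weight = 0.0

    def update(self, x):
        # Discount everything seen so far, then add the new sample
        # with weight 1.
        self.weighted_sum = self.lam * self.weighted_sum + x
        self.weight = self.lam * self.weight + 1.0
        return self.weighted_sum / self.weight


if __name__ == "__main__":
    sw = SlidingWindowMean(window_size=3)
    ff = ForgettingFactorMean(lam=0.5)
    for sample in [1.0, 2.0, 3.0, 4.0, 5.0]:
        print(sample, sw.update(sample), ff.update(sample))
```

The sliding window discards old samples abruptly and needs memory proportional to the window length, whereas the forgetting factor keeps only two numbers per stream and lets the influence of old samples fade gradually. Both allow the estimate to track non-stationary data.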
- Type: Chapter
- Information: Data-Intensive Computing: Architectures, Algorithms, and Applications, pp. 124-156
- Publisher: Cambridge University Press
- Print publication year: 2012