Dimension Reduction for Streaming Data

doi:10.1017/CBO9780511844409.006

6 - Dimension Reduction for Streaming Data

Published online by Cambridge University Press: 05 December 2012

Chandrika Kamath

Edited by

Ian Gorton and

Deborah K. Gracio

Show author details

Chandrika Kamath: Affiliation:
Lawrence Livermore National Laboratory
Ian Gorton: Affiliation:
Pacific Northwest National Laboratory, Washington
Deborah K. Gracio: Affiliation:
Pacific Northwest National Laboratory, Washington

Book contents

Get access

Summary

Introduction

With sensors becoming ubiquitous, there is an increasing interest in mining the data from these sensors as the data are being collected. This analysis of streaming data, or data streams, is presenting new challenges to analysis algorithms. The size of the data can be massive, especially when the sensors number in the thousands and the data are sampled at a high frequency. The data can be non-stationary, with statistics that vary over time. Real-time analysis is often required, either to avoid untoward incidents or to understand an interesting phenomenon better. These factors make the analysis of streaming data, whether from sensors or other sources, very data- and compute-intensive. One possible approach to making this analysis tractable is to identify the important data streams to focus on them. This chapter describes the different ways in which this can be done, given that what makes a stream important varies from problem to problem and can often change with time in a single problem. The following illustrate these techniques by applying them to data from a real problem and discuss the challenges faced in this emerging field of streaming data analysis.

This chapter is organized as follows: first, I define what is meant by streaming data and use examples from practical problems to discuss the challenges in the analysis of these data. Next, I describe the two main approaches used to handle the streaming nature of the data – the sliding window approach and the forgetting factor approach.

Type: Chapter
Information: Data-Intensive Computing
Architectures, Algorithms, and Applications
, pp. 124 - 156

DOI: https://doi.org/10.1017/CBO9780511844409.006 [Opens in a new window]

Publisher: Cambridge University Press

Print publication year: 2012

Access options

Get access to the full version of this content by using one of the access options below. (Log in options will check for institutional or personal access. Content may require purchase if you do not have access.)

Book purchase

Temporarily unavailable

References

1. Abed-Meraim, K., Chkeif, A., and Hua, Y. “Fast Orthonormal PAST Algorithm.” IEEE Signal Processing Letters 7 (2000): 60–62.CrossRef Google Scholar

2. Achlioptas, D. “Database-Friendly Random Projections: Johnson-Lindenstrauss with Binary Coins.” Journal of Computer and System Sciences 66 (2003): 671–87.CrossRef Google Scholar

3. Aggarwal, C. C., Ed. Data Streams: Models and Algorithms. New York: Springer, 2007.CrossRef

4. Ali, I., Kim, D. N., and Jeong, T. T. “A New Subspace Tracking Algorithm Using Approximation of Gram-Schmidt Procedure.” In Proceedings, IEEE International Conference on Information and Multimedia Technology (2009).Google Scholar

5. Bertoni, A., and Valentini, G. “Random Projections for Assessing Gene Expression Cluster Stability.” In IEEE International Joint Conference on Neural Networks (2005), pp. 149–154.Google Scholar

6. Bingham, E., and Mannila, H. “Random Projection in Dimensionality Reduction: Applications to Image and Text Data.” In Proceedings, ACM International Conference on Knowledge Discovery and Data Mining (2001), pp. 245–250.Google Scholar

7. Brand, M. “Incremental Singular Value Decomposition of Uncertain Data with Missing Value.” In Proceedings, European Conference on Computer Vision (ECCV), Lecture Notes in Computer Science, Volume 2350 (2002), pp. 707–720.Google Scholar

8. Brand, M. “Fast Online SVD Revisions for Lightweight Recommender Systems.” In Proceedings, SIAM International Conference on Data Mining (2003), pp. 37–46.Google Scholar

9. Brand, M. “Fast Low-Rank Modifications of the Thin Singular Value Decomposition.” Linear Algebra and its Applications 415 (2006), 20–30.CrossRef Google Scholar

10. Chlorine data set. http:// www.cs.cmu.edu/afs/cs/project/s pirit-1/www/. Accessed June 1, 2012.

11. Dasgupta, S. “Experiments with Random Projection.” In Proceedings, Sixteenth Conference on Uncertainty in Artificial Intelligence (2000), pp. 143–151.Google Scholar

12. DIII-D Website. https://fusion.gat.com/global/DIII-D. Accessed June 1, 2012.

13. Fern, X. Z., and Brodley, C. “Random Projection for High Dimensional Data Clustering: A Cluster Ensemble Approach.” In Proceedings of 20th International Conference on Machine Learning (2003), pp. 186–193.Google Scholar

14. Gama, J.Knowledge Discovery from Data Strams. Boca Raton, FL: CRC Press, 2010.CrossRef Google Scholar

15. Gama, J., and Gaber, M. M., Eds. Learning from Data Strams. New York: Springer, 2007.CrossRef

16. Hall, M. A.Correlation-Based Feature Selection for Machine Learning. PhD thesis, Department of Computer Science, University of Waikato, New Zealand, 1998.Google Scholar

17. Jolliffe, I. T.Principal Components Analysis, second ed. New York: Springer, 2002.Google Scholar

18. Kamath, C.Scientific Data Mining: A Practical Pers pective. Philadephia: SIAM, 2009.CrossRef Google Scholar

19. LAPACK Website. http://www.netlib.org/lapack/. Accessed June 1, 2012.

20. Menon, A., Pham, G. V. A., Chawla, S., and Viglas, A. “An Incremental Data-Stream Sketch Using Sparse Random Projections.” In Proceedings, SIAM International Conference on Data Mining (2007), pp. 563–568.Google Scholar

21. Papadimitriou, S., Sun, J., and Faloutsos, C. “Streaming Pattern Discovery in Multiple Time Series.” In Proceedings, 31st VLDB Conference (2005).Google Scholar

22. Real, E. C., Tufts, D. W., and Cooley, J. “Two Algorithms for Fast Approximate Subspace Tracking.” IEEE Transactions on Signal Processing 47, 7 (July 1999),1936–1945.CrossRef Google Scholar

23. Saad, Y.Numerical Methods for Large Eigenvalue Problems. Manchester, UK: Manchester University Press, 1992. Available online from http://www- users.cs.umn.edu/~saad/books.html.Google Scholar

24. Saad, Y.Numerical Methods for Large Eigenvalue Problems, Second ed. Philadelphia: SIAM, 2011. Avaialble online from http://www-users.cs.umn.edu/~saad/books.html.CrossRef Google Scholar

25. Sayed, A.Fundamentals of Adaptive Filtering. New York: John Wiley and Sons, 2003.Google Scholar

26. Stewart, G. W. “On the Early History of the Singular Vaue Decomposition.” SIAM Review 35, no. 4 (1993): 551–66.CrossRef Google Scholar

27. Strobach, P.Low Rank Adaptive Filters. IEEE Trans. on Signal Processing 44, no. 12 (1996): 2932–47.CrossRef Google Scholar

28. Strobach, P.The Fast Recursive Row-Householder Subspace Tracking Algorithm. Signal Processing 89 (2009): 2514–28.CrossRef Google Scholar

29. Teixeira, P. H. S.Data Stream Anomaly Detection Through Principal Subspace Tracking. Master's thesis, Pontifícia Universidade Católica do Rio de Janeiro, 2009.Google Scholar

30. Tufts, D. W., Real, E. C., and Cooley, J. “Fast Approximate Subspace Tracking (FAST).” In Proceedings, IEEE Int. Conf. Acoust., Speech, Signal Processing (1997), vol. 1.Google Scholar

31. Yang, B.Projection Approximation Subspace Tracking. IEEE Transactions on Signal Processing 43, no. 1 (January 1995): 95–107.Google Scholar