
6 - Dimension Reduction for Streaming Data

Published online by Cambridge University Press:  05 December 2012

Chandrika Kamath
Affiliation:
Lawrence Livermore National Laboratory
Ian Gorton
Affiliation:
Pacific Northwest National Laboratory, Washington
Deborah K. Gracio
Affiliation:
Pacific Northwest National Laboratory, Washington

Summary

Introduction

With sensors becoming ubiquitous, there is increasing interest in mining the data from these sensors as the data are being collected. This analysis of streaming data, or data streams, presents new challenges to analysis algorithms. The size of the data can be massive, especially when the sensors number in the thousands and the data are sampled at high frequency. The data can be non-stationary, with statistics that vary over time. Real-time analysis is often required, either to avoid untoward incidents or to better understand an interesting phenomenon. These factors make the analysis of streaming data, whether from sensors or other sources, very data- and compute-intensive. One possible approach to making this analysis tractable is to identify the important data streams and focus on them. This chapter describes the different ways in which this can be done, given that what makes a stream important varies from problem to problem and can often change with time within a single problem. In the following, I illustrate these techniques by applying them to data from a real problem and discuss the challenges faced in this emerging field of streaming data analysis.
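One simple way to identify important streams (a minimal sketch of my own; the chapter stresses that importance is problem-dependent, so the variance criterion and the function name below are illustrative assumptions, not the chapter's method) is to rank the streams by a statistic of their recent samples, such as variance, and keep only the top few for detailed analysis:

```python
import numpy as np

def rank_streams_by_variance(window, k):
    """Rank sensor streams by the variance of their recent samples.

    window : 2-D array of shape (n_samples, n_streams) holding the most
             recent samples from each stream.
    k      : number of streams to keep.

    Returns the indices of the k highest-variance streams, most variable
    first. Variance is just one possible importance measure.
    """
    variances = window.var(axis=0)          # per-stream variance over the window
    return np.argsort(variances)[::-1][:k]  # indices, descending by variance

# Three synthetic streams: one noisy, one slowly drifting, one constant.
rng = np.random.default_rng(0)
data = np.column_stack([
    rng.normal(0.0, 5.0, 100),    # stream 0: high variance
    np.linspace(0.0, 1.0, 100),   # stream 1: small, slow drift
    np.full(100, 3.0),            # stream 2: constant, zero variance
])
print(rank_streams_by_variance(data, 2))  # stream 0 ranks first
```

In practice, the statistic would be recomputed as new samples arrive, which is where the windowing and forgetting-factor techniques described below come in.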

This chapter is organized as follows: first, I define what is meant by streaming data and use examples from practical problems to discuss the challenges in the analysis of these data. Next, I describe the two main approaches used to handle the streaming nature of the data – the sliding window approach and the forgetting factor approach.
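The two approaches can be contrasted with a small sketch (my own hedged illustration, using a running mean as the statistic; the chapter applies these ideas to dimension reduction rather than to means). A sliding window keeps only the most recent samples and drops older ones outright, while a forgetting factor keeps every past sample but geometrically downweights its contribution:

```python
from collections import deque

def sliding_window_mean(stream, w):
    """Mean over only the w most recent samples; older samples are dropped."""
    buf = deque(maxlen=w)  # deque discards the oldest sample automatically
    means = []
    for x in stream:
        buf.append(x)
        means.append(sum(buf) / len(buf))
    return means

def forgetting_factor_mean(stream, lam):
    """Exponentially weighted mean: every past sample contributes, but its
    weight decays by the factor lam (0 < lam < 1) at each new sample."""
    num = 0.0  # weighted sum of samples
    den = 0.0  # sum of weights (normalizer)
    means = []
    for x in stream:
        num = lam * num + x
        den = lam * den + 1.0
        means.append(num / den)
    return means

# A non-stationary stream whose level jumps from 0 to 10 halfway through.
stream = [0.0] * 50 + [10.0] * 50
print(sliding_window_mean(stream, 10)[-1])     # window has fully passed the jump: exactly 10.0
print(forgetting_factor_mean(stream, 0.9)[-1]) # near 10, but old samples still contribute slightly
```

The window length `w` and the factor `lam` play analogous roles: both control how quickly the estimate forgets the past, which matters precisely because streaming data can be non-stationary.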

Type: Chapter
In: Data-Intensive Computing: Architectures, Algorithms, and Applications, pp. 124–156
Publisher: Cambridge University Press
Print publication year: 2012


