Book contents
- Frontmatter
- Contents
- List of Contributors
- 1 Data-Intensive Computing: A Challenge for the 21st Century
- 2 Anatomy of Data-Intensive Computing Applications
- 3 Hardware Architectures for Data-Intensive Computing Problems: A Case Study for String Matching
- 4 Data Management Architectures
- 5 Large-Scale Data Management Techniques in Cloud Computing Platforms
- 6 Dimension Reduction for Streaming Data
- 7 Binary Classification with Support Vector Machines
- 8 Beyond MapReduce: New Requirements for Scalable Data Processing
- 9 Let the Data Do the Talking: Hypothesis Discovery from Large-Scale Data Sets in Real Time
- 10 Data-Intensive Visual Analysis for Cyber-Security
- Index
- References
3 - Hardware Architectures for Data-Intensive Computing Problems: A Case Study for String Matching
Published online by Cambridge University Press: 05 December 2012
Summary
Introduction
Data-intensive applications have special characteristics that, in many cases, prevent them from executing efficiently on traditional cache-based processors. Some have highly irregular access patterns with very little locality, which do not match the expectations of automatically controlled caches. Others, such as applications that process streaming data, have no temporal locality at all and only limited spatial locality, which reduces the effectiveness of caches.
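The toy model below makes the locality argument concrete. It counts hits in a fully associative LRU cache for two access patterns. The first is a streaming scan that touches each byte exactly once, in order. The second is a series of random lookups scattered over a structure much larger than the cache. The 64-byte line, the 512-line (32 KiB) capacity, and all function names are illustrative assumptions, not parameters from the chapter. Real caches are set-associative and use prefetchers, so their detailed behavior differs.

```python
from collections import OrderedDict
import random

LINE_SIZE = 64  # bytes per cache line (assumed, not from the chapter)


class LRUCache:
    """Fully associative LRU cache model that only counts hits and misses."""

    def __init__(self, num_lines):
        self.num_lines = num_lines
        self.lines = OrderedDict()  # line tag -> None, ordered by recency
        self.hits = 0
        self.misses = 0

    def access(self, address):
        tag = address // LINE_SIZE  # cache line that holds this byte
        if tag in self.lines:
            self.hits += 1
            self.lines.move_to_end(tag)  # mark as most recently used
        else:
            self.misses += 1
            self.lines[tag] = None
            if len(self.lines) > self.num_lines:
                self.lines.popitem(last=False)  # evict least recently used

    def hit_rate(self):
        total = self.hits + self.misses
        return self.hits / total if total else 0.0


def streaming_hit_rate(n_bytes, num_lines):
    """Touch every byte once, in order: spatial locality only, no reuse."""
    cache = LRUCache(num_lines)
    for addr in range(n_bytes):
        cache.access(addr)
    return cache.hit_rate()


def irregular_hit_rate(footprint, n_accesses, num_lines, seed=0):
    """Jump to random byte addresses inside a large structure."""
    rng = random.Random(seed)
    cache = LRUCache(num_lines)
    for _ in range(n_accesses):
        cache.access(rng.randrange(footprint))
    return cache.hit_rate()


if __name__ == "__main__":
    cache_lines = 512  # 32 KiB cache (assumed size)
    print("streaming:", streaming_hit_rate(1 << 18, cache_lines))  # 0.984375
    print("irregular:", irregular_hit_rate(1 << 24, 100_000, cache_lines))
```

With these parameters, the streaming scan hits on 63 of every 64 accesses. Every one of those hits comes from spatial locality: each line is fetched once and never reused. The random pattern hits only when its target line happens to still be resident, which is roughly the cache capacity divided by the footprint.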
We present an application-driven study of several architectures that are suitable for data-intensive algorithms. Our chosen application is high-speed string matching, which exhibits two key properties of data-intensive codes: highly irregular access patterns and high-speed streaming data. Irregular access patterns arise in string matching when traversing the graph-based representations of the pattern dictionaries. String matching is typically used in cybersecurity applications to scan incoming network traffic or files for signatures (specific sequences of symbols) that may correspond to attack patterns, viruses, or other malware.
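A common graph-based dictionary representation is the Aho-Corasick automaton, which is a trie of all signatures augmented with failure links. This excerpt does not say which algorithm the chapter evaluates, so the following is a minimal Python sketch under that assumption. The names `build_automaton` and `scan` and the sample signatures are hypothetical. The sketch shows both properties named above. The input is consumed as a stream, one byte at a time. The next state visited depends on the data, so successive lookups jump around the state graph with little locality.

```python
from collections import deque


def build_automaton(patterns):
    """Build an Aho-Corasick automaton over byte strings.

    Returns (goto, fail, output): goto[s] maps an input byte to a child state,
    fail[s] is the failure link of state s, and output[s] lists every pattern
    that ends when the automaton is in state s. State 0 is the root.
    """
    goto, fail, output = [{}], [0], [[]]
    # Step 1: insert each pattern into a trie (the dictionary graph).
    for pat in patterns:
        state = 0
        for byte in pat:
            if byte not in goto[state]:
                goto.append({})
                fail.append(0)
                output.append([])
                goto[state][byte] = len(goto) - 1
            state = goto[state][byte]
        output[state].append(pat)
    # Step 2: compute failure links breadth-first; depth-1 states keep fail = 0.
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for byte, child in goto[state].items():
            queue.append(child)
            # Follow failure links until some state has an edge for `byte`.
            f = fail[state]
            while f and byte not in goto[f]:
                f = fail[f]
            fail[child] = goto[f].get(byte, 0)
            # Inherit matches that end at the failure target (suffix patterns).
            output[child] = output[child] + output[fail[child]]
    return goto, fail, output


def scan(stream, automaton):
    """Yield (end_offset, pattern) for every signature found in `stream`.

    `stream` is any iterable of byte values (ints 0-255), e.g. a bytes object.
    """
    goto, fail, output = automaton
    state = 0
    for offset, byte in enumerate(stream):
        # The next state depends on the data, so these lookups are irregular.
        while state and byte not in goto[state]:
            state = fail[state]
        state = goto[state].get(byte, 0)
        for pat in output[state]:
            yield offset, pat


if __name__ == "__main__":
    signatures = [b"he", b"she", b"his", b"hers"]  # hypothetical signatures
    ac = build_automaton(signatures)
    print(list(scan(b"ushers", ac)))  # [(3, b'she'), (3, b'he'), (5, b'hers')]
```

The current state is the only information carried from one byte to the next. As a result, `scan` accepts any iterable of byte values. A signature that straddles two packets is still detected if the packets are chained into one stream, for example with `itertools.chain.from_iterable(packets)`.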
String Matching
String matching algorithms detect the presence of one or more known symbol sequences inside the analyzed data sets. Besides their well-known applications in databases and text processing, they are the basis of several other critical, real-world applications. String matching algorithms are key components of DNA and protein sequencing, data mining, machine learning, anti-virus software, and security systems such as Intrusion Detection Systems (IDS) for Networks (NIDS), Applications (APIDS), Protocols (PIDS), or Systems (host-based IDS [HIDS]).
- Type: Chapter
- Information: Data-Intensive Computing: Architectures, Algorithms, and Applications, pp. 24-47. Publisher: Cambridge University Press. Print publication year: 2012