Hardware Architectures for Data-Intensive Computing Problems: A Case Study for String Matching

doi:10.1017/CBO9780511844409.003

3 - Hardware Architectures for Data-Intensive Computing Problems: A Case Study for String Matching

Published online by Cambridge University Press: 05 December 2012

Antonino Tumeo, Oreste Villa, and Daniel Chavarría-Miranda

Edited by Ian Gorton and Deborah K. Gracio


Antonino Tumeo: Pacific Northwest National Laboratory
Oreste Villa: Pacific Northwest National Laboratory
Daniel Chavarría-Miranda: Pacific Northwest National Laboratory
Ian Gorton: Pacific Northwest National Laboratory, Washington
Deborah K. Gracio: Pacific Northwest National Laboratory, Washington


Summary

Introduction

Data-intensive applications have special characteristics that in many cases prevent them from executing well on traditional cache-based processors. They can have highly irregular access patterns with very little locality, which do not match the expectations of automatically controlled caches. In other cases, such as when they process streaming data, they have no temporal locality at all and only limited spatial locality, which reduces the effectiveness of caches.

We present an application-driven study of several architectures that are suitable for data-intensive algorithms. Our chosen application is high-speed string matching, which exhibits two key properties of data-intensive codes: highly irregular access patterns and high-speed streaming data. Irregular access patterns appear in string matching when traversing graph-based representations of the pattern dictionaries being used. String matching is typically used in cybersecurity applications to scan incoming network traffic or files for the presence of signatures (such as specific sequences of symbols), which may relate to attack patterns, viruses, or other malware.
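To make the idea of traversing a graph-based representation of a pattern dictionary concrete, here is a minimal sketch. It assumes an Aho-Corasick-style automaton, which this excerpt does not name as the chapter's method; the function names `build_automaton` and `scan` and the sample dictionary are hypothetical and chosen only for illustration. The scan also records the state visited for each input symbol, to show that the next memory location depends on the data itself rather than on a regular stride.

```python
from collections import deque


def build_automaton(patterns):
    """Build goto, failure, and output tables for a pattern dictionary.

    Illustrative sketch only: states are list indices, state 0 is the root.
    """
    goto = [{}]   # goto[state][symbol] -> next state
    fail = [0]    # fail[state] -> fallback state on a mismatch
    out = [[]]    # out[state] -> patterns that end at this state

    # Phase 1: insert every pattern into a trie.
    for pat in patterns:
        state = 0
        for ch in pat:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].append(pat)

    # Phase 2: breadth-first pass that fills in the failure links.
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            # Inherit patterns that end at the fallback state.
            out[nxt] = out[nxt] + out[fail[nxt]]
    return goto, fail, out


def scan(text, automaton):
    """Stream over text once; return (matches, visited states).

    Each match is a (start_index, pattern) pair.
    """
    goto, fail, out = automaton
    state = 0
    matches = []
    visited = []
    for i, ch in enumerate(text):
        # Follow failure links until a transition on ch exists (or root).
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        visited.append(state)
        for pat in out[state]:
            matches.append((i - len(pat) + 1, pat))
    return matches, visited


if __name__ == "__main__":
    automaton = build_automaton(["he", "she", "his", "hers"])
    matches, visited = scan("ushers", automaton)
    print(matches)
    print(visited)
```

In this sketch the input is read exactly once, in order, which corresponds to the streaming side of the problem, while the sequence of visited states is determined by the input symbols. With a large dictionary those states are spread across memory, which is the kind of access pattern that caches handle poorly.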

String Matching

String matching algorithms detect the presence of one or more known symbol sequences inside the analyzed data sets. Besides their well-known application to databases and text processing, they are the basis of several other critical, real-world applications. String matching algorithms are key components of DNA and protein sequencing; data mining; security systems, such as Intrusion Detection Systems (IDS) for Networks (NIDS), Applications (APIDS), Protocols (PIDS), or Systems (Host-based IDS [HIDS]); anti-virus software; and machine learning problems.

Type: Chapter
Information: Data-Intensive Computing: Architectures, Algorithms, and Applications, pp. 24–47

DOI: https://doi.org/10.1017/CBO9780511844409.003

Publisher: Cambridge University Press

Print publication year: 2012

