Book contents
- Frontmatter
- Contents
- Preface
- Chapter 1 Algorithms on Words
- Chapter 2 Structures for Indexes
- Chapter 3 Symbolic Natural Language Processing
- Chapter 4 Statistical Natural Language Processing
- Chapter 5 Inference of Network Expressions
- Chapter 6 Statistics on Words with Applications to Biological Sequences
- Chapter 7 Analytic Approach to Pattern Matching
- Chapter 8 Periodic Structures in Words
- Chapter 9 Counting, Coding, and Sampling with Words
- Chapter 10 Words in Number Theory
- References
- General Index
Chapter 8 - Periodic Structures in Words
Published online by Cambridge University Press: 05 June 2013
Summary
Introduction
Repetitions (periodicities) in words are important objects that play a fundamental role both in the combinatorics of words and in applications to string processing, such as compression and biological sequence analysis. Properties of repetitions can also be used to speed up pattern matching algorithms.
The problem of efficiently identifying repetitions in a given word is one of the classical pattern matching problems. Recently, searching for repetitions in strings has received new motivation from biosequence analysis. In DNA sequences, successively repeated fragments often carry important biological information, and their presence is characteristic of many genomic structures (such as telomere regions). From a practical viewpoint, satellites and Alu repeats are involved in chromosome analysis and genotyping, and are therefore of major interest to genomic researchers. Consequently, many biological studies based on the analysis of tandem repeats have been carried out, and databases of tandem repeats have even been compiled for certain species.
In this chapter, we present a general, efficient approach to computing different periodic structures in words. It relies on two main algorithmic techniques, both described in Section 8.3: a special factorization of the word and the so-called longest extension functions. Sections 8.4 through 8.8 describe different applications of this method. They are preceded by Section 8.2, which is devoted to enumerative combinatorial properties of repetitions: bounding the maximal number of repetitions is necessary for proving the complexity bounds of the corresponding search algorithms.
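To give a concrete feel for the two techniques mentioned in the introduction, here is a short Python sketch. It is not the chapter's own code, and the function names are hypothetical. Two assumptions are made. First, a "longest extension" is taken to be the length of the longest prefix of a pattern that starts at each position of a text, computed with the standard Z-algorithm. Second, the "special factorization" is taken to be a Lempel-Ziv-style factorization in which each factor is either a fresh letter or the longest prefix of the remaining suffix that also starts at an earlier position, with overlaps allowed. The factorization loop is deliberately naive (quadratic time); linear-time constructions typically rely on a suffix tree or suffix array.

```python
from typing import List, Sequence


def z_function(s: Sequence) -> List[int]:
    """z[i] = length of the longest common prefix of s and s[i:] (z[0] = len(s))."""
    n = len(s)
    z = [0] * n
    if n == 0:
        return z
    z[0] = n
    left, right = 0, 0  # s[left:right] is the rightmost window known to match a prefix of s
    for i in range(1, n):
        if i < right:
            # Reuse the value computed for the mirrored position inside the window.
            z[i] = min(right - i, z[i - left])
        while i + z[i] < n and s[z[i]] == s[i + z[i]]:
            z[i] += 1
        if i + z[i] > right:
            left, right = i, i + z[i]
    return z


def prefix_extensions(pattern: str, text: str) -> List[int]:
    """Hypothetical helper: for each position i of text, the length of the
    longest prefix of pattern that starts at text[i] (one kind of extension function)."""
    combined = list(pattern) + [None] + list(text)  # None never equals a letter
    offset = len(pattern) + 1
    return z_function(combined)[offset:]


def lz_factorization(w: str) -> List[str]:
    """Hypothetical helper: naive Lempel-Ziv-style factorization (assumed variant).
    Each factor is the longest prefix of w[i:] that also starts at some j < i
    (overlapping occurrences allowed), or a single fresh letter."""
    factors: List[str] = []
    n, i = len(w), 0
    while i < n:
        # Longest common prefix of w[i:] with w[j:] over all earlier starts j < i.
        z = z_function(list(w[i:]) + [None] + list(w))
        offset = n - i + 1  # index of w[0] inside the combined sequence
        longest = max(z[offset:offset + i], default=0)
        length = max(longest, 1)  # a letter with no earlier occurrence forms its own factor
        factors.append(w[i:i + length])
        i += length
    return factors
```

For example, `lz_factorization("abaab")` returns `['a', 'b', 'a', 'ab']`, and `lz_factorization("aaaa")` returns `['a', 'aaa']`, where the second factor overlaps its own earlier occurrence.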
- Applied Combinatorics on Words, pp. 430-477. Publisher: Cambridge University Press. Print publication year: 2005.