Hostname: page-component-78c5997874-s2hrs Total loading time: 0 Render date: 2024-11-05T06:09:05.807Z Has data issue: false hasContentIssue false

A practical parameterised algorithm for the individual haplotyping problem MLF

Published online by Cambridge University Press:  27 October 2010

MINZHU XIE
Affiliation:
School of Information Science and Engineering, Central South University, Changsha 410083, P.R.China and College of Physics and Information Science, Hunan Normal University, Changsha 410081, P.R.China Email: [email protected]
JIANXIN WANG
Affiliation:
School of Information Science and Engineering, Central South University, Changsha 410083, P.R.China Email: [email protected]
JIANER CHEN
Affiliation:
School of Information Science and Engineering, Central South University, Changsha 410083, P.R.China and Department of Computer Science, Texas A&M University, College Station, TX 77843, U.S.A. Email: [email protected]

Abstract

Haplotypes are more useful in complex disease gene mapping than single-nucleotide polymorphisms (SNPs). However, haplotypes are difficult to obtain directly using biological experiments, which has prompted research into efficient computational methods for determining haplotypes. The individual haplotyping problem called Minimum Letter Flip (MLF) is a computational problem that, given a set of aligned DNA sequence fragment data of an individual, induces the corresponding haplotypes by flipping minimum SNPs. There has been no practical exact algorithm for solving the problem. Due to technical limits in DNA sequencing experiments, the maximum length of a fragment sequenced directly is about 1kb. In consequence, with a genome-average SNP density of 1.84 SNPs per 1 kb of DNA sequence, the maximum number k1 of SNP sites that a fragment covers is usually small. Moreover, in order to save time and money, the maximum number k2 of fragments that cover an SNP site is usually no more than 19. Building on these fragment data properties, the current paper introduces a new parameterised algorithm with running time O(nk22k2 + mlogm + mk1), where m is the number of fragments and n is the number of SNP sites. In practical biological applications, the algorithm solves the MLF problem efficiently even if m and n are large.

Type
Paper
Copyright
Copyright © Cambridge University Press 2010

Access options

Get access to the full version of this content by using one of the access options below. (Log in options will check for institutional or personal access. Content may require purchase if you do not have access.)

References

Bonizzoni, P., Vedova, G. D., Dondi, R. and Li, J. (2003) The haplotyping problem: an overview of computational models and solutions. J. Comp. Sci. Technol. 18 (6)675688.Google Scholar
Chen, C. T. L., Wang, J. C. and Cohen, B. A. (2007) The strength of selection on ultraconserved elements in the human genome. The American Journal of Human Genetics 80 (4)692704.CrossRefGoogle ScholarPubMed
Chen, Z., Fu, B., Schweller, R., Yang, B., Zhao, Z. and Zhu, B. (2008) Linear time probabilistic algorithms for the singular haplotype reconstruction problem from SNP fragments. J. Comput. Biol. 15 (5)535546.CrossRefGoogle ScholarPubMed
Gabriel, S. B. et al. (2002) The structure of haplotype blocks in the human genome. Science 296 (5576) 22252229.CrossRefGoogle ScholarPubMed
Greenberg, H. J., Hart, W. E. and Lancia, G. (2004) Opportunities for combinatorial optimization in computational biology. INFORMS J. Comput. 16 (3)211231.Google Scholar
Hinds, D. A. et al. (2005) Whole-genome patterns of common DNA variation in three human populations. Science 307 (5712) 10721079.CrossRefGoogle ScholarPubMed
Hüffner, F. (2005) Algorithm engineering for optimal graph bipartization. In: Proceedings of the 4th International Workshop on Efficient and Experimental Algorithms, WEA 2005. Springer-Verlag Lecture Notes in Computer Science 3503 240252.CrossRefGoogle Scholar
Huson, D. H., Halpern, A. L., Lai, Z., Myers, E. W., Reinert, K. and Sutton, G. G (2001) Comparing assemblies using fragments and mate-pairs. In: Proceedings of the 1st Workshop on Algorithms in BioInformatics, WABI 2001. Springer-Verlag Lecture Notes in Computer Science 2149 294306.CrossRefGoogle Scholar
IHC (The International HapMap Consortium) (2003) The international hapmap project. Nature 426 (6968)789796.CrossRefGoogle Scholar
IHC (The International HapMap Consortium) (2005) A haplotype map of the human genome. Nature 437 (7063) 12991320.CrossRefGoogle Scholar
IHGS (the International Human Genome Sequencing Consortium) (2001) Initial sequencing and analysis of the human genome. Nature 409 (6822)860921.CrossRefGoogle Scholar
Lancia, G., Bafna, V., Istrail, S., Lippert, R. and Schwartz, R. (2001) SNPs problems, complexity and algorithms. In: Proceedings of the 9th Annual European Symposium on Algorithms, ESA 2001. Springer-Verlag Lecture Notes in Computer Science 2161 182193.CrossRefGoogle Scholar
Levy, S. et al. (2007) The diploid genome sequence of an individual human. PLoS Biology 5 (10)e254.CrossRefGoogle ScholarPubMed
Lippert, R., Schwartz, R., Lancia, G. and Istrail, S. (2002) Algorithmic strategies for the single nucleotide polymorphism haplotype assembly problem. Brief. Bioinform 3 (1)19.CrossRefGoogle ScholarPubMed
Myers, G. (1999) A dataset generator for whole genome shotgun sequencing. In: Proceedings of the 7th International Conference on Intelligent Systems for Molecular Biology, ISMB, AAAI Press 202210.Google Scholar
Panconesi, A. and Sozio, M. (2004) Fast hare: a fast heuristic for single individual SNP haplotype reconstruction. In: Proceedings of the 4th Workshop on Algorithms in BioInformatics, WABI 2004. Springer-Verlag Lecture Notes in Computer Science 3240 266277.Google Scholar
Tachmazidou, I., Verzilli, C. J. and Iorio, M. D. (2007) Genetic association mapping via evolution-based clustering of haplotypes. PLoS Genet. 3 (7)e111.Google Scholar
Venter, J. C. et al. (2001) The sequence of the human genome. Science 291 (5507)13041351.CrossRefGoogle ScholarPubMed
Wang, R., Wu, L., Li, Z. and Zhang, X. (2005) Haplotype reconstruction from SNP fragments by minimum error correction. Bioinformatics 21 (10)24562462.Google Scholar
Wernicke, S. (2003) On the algorithmic tractability of single nucleotide polymorphism (SNP) analysis and related problems, Ph.D. Thesis, University Tübingen.Google Scholar
Xie, M., Chen, J. and Wang, J. (2007) Research on parameterized algorithms of the individual haplotyping problem. Journal of Bioinformatics and Computational Biology 5 (3)795816.Google Scholar
Xie, M. and Wang, J. (2008) An improved (and practical) parameterized algorithm for the individual haplotyping problem MFR with mate-pairs. Algorithmica 52 (2)250266.CrossRefGoogle Scholar
Zhang, X., Wang, R., Wu, L. and Chen, L. (2006) Models and algorithms for haplotyping problem. Current Bioinformatics 1 (1)105114.CrossRefGoogle Scholar
Zhao, Y., Wu, L., Zhang, J., Wang, R. and Zhang, X. (2005) Haplotype assembly from aligned weighted SNP fragments. Computational Biology and Chemistry 29 (4)281287.CrossRefGoogle ScholarPubMed