Data mining parasite genomes

M. BERRIMAN

doi:10.1017/S0031182004006857

Published online by Cambridge University Press: 12 May 2005

Affiliation: Wellcome Trust Sanger Institute, The Wellcome Trust Genome Campus, Hinxton, CB10 1SA, UK

Abstract

The term ‘data mining’ can be used to describe any process where useful information is extracted from data with a large background of ‘noise’. In the context of a genome project, several stages involve data mining. Amongst the sequence data, ‘signals’ need to be detected that indicate the presence of interesting features. Often this involves differentiating between transcribed and non-transcribed bases to predict coding regions. After detection, defining the roles of these sequences involves sifting through multiple lines of evidence. If these roles are accurately reflected in genome annotation, they can be used by researchers to frame queries and interrogate the data further.
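As a rough illustration of detecting a 'signal' against background sequence, the sketch below scans the three forward reading frames of a DNA string for open reading frames (an ATG followed in-frame by a stop codon). It is not the method described in the article: practical gene finders rely on statistical models and multiple lines of evidence. The function name `find_orfs` and the `min_codons` threshold are hypothetical, chosen only for this example.

```python
# Illustrative sketch only: a naive open-reading-frame (ORF) scan.
# The names and the min_codons threshold are assumptions, not from the article.

STOP_CODONS = {"TAA", "TAG", "TGA"}
START_CODON = "ATG"


def find_orfs(sequence, min_codons=3):
    """Return (start, end, frame) tuples for forward-strand ORFs.

    start is the 0-based index of the ATG, end is the index just past
    the stop codon, and frame is 0, 1 or 2. An ORF is reported only if
    it has at least min_codons codons before the stop (start included).
    """
    seq = sequence.upper()
    orfs = []
    for frame in range(3):
        start = None  # index of the current candidate start codon
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == START_CODON:
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None  # resume the search after this stop
    return orfs


if __name__ == "__main__":
    for start, end, frame in find_orfs("CCATGAAATTTGGGTAACC"):
        print(f"frame {frame}: {start}-{end}")
```

Raising `min_codons` trades sensitivity for specificity: short ORFs that arise by chance are filtered out, at the cost of missing genuinely short genes. The reverse strand and introns are ignored here for brevity.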

Keywords

Annotation; genome; gene ontology; gene prediction

Type: Research Article
Information: Parasitology, Volume 128, Issue S1, October 2004, pp. S23–S31

DOI: https://doi.org/10.1017/S0031182004006857
Copyright: © 2004 Cambridge University Press
