Coding Sequence Prediction with Dicodons

Rex A. Dwyer

doi:10.1017/CBO9781139164764.015

14 - Coding Sequence Prediction with Dicodons

Published online by Cambridge University Press: 05 June 2012

Rex A. Dwyer, The BioAlgorithmic Consultancy

Summary

Once a new segment of DNA is sequenced and assembled, researchers are usually most interested in knowing what proteins, if any, it encodes. We have learned that much DNA does not encode proteins: some encodes catalytic RNAs, some regulates the rate of production of proteins by varying the ease with which transcriptases or ribosomes bind to coding sequences, and much has no known function. If study of proteins is the goal, how can their sequences be extracted from the DNA? This question is the main focus of gene finding or gene prediction.

One approach is to look for open reading frames (ORFs). An open reading frame is simply a sequence of codons beginning with ATG for methionine and ending with one of the stop codons TAA, TGA, or TAG. To gain confidence that an ORF really encodes a gene, we can translate it and search for homologous proteins in a protein database. However, there are several difficulties with this method.

It is ineffective in eukaryotic DNA, in which coding sequences for a single gene are interrupted by introns.
It is ineffective when the coding sequence extends beyond either end of the available sequence.
Random DNA contains many short ORFs that don't code for proteins. This is because, if all 64 codons are equally likely, one of every 64 random codons is ATG (coding for methionine, M) and three of every 64 are stop codons.
The proteins it detects will probably not be that interesting since they will be very similar to proteins with known functions.
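
The ORF definition above, and the point about random DNA, can be illustrated with a minimal Python sketch. It is not the chapter's own code: the names find_orfs, prob_no_stop, and min_codons are hypothetical, only the forward strand is scanned, and the probability calculation assumes that all 64 codons are equally likely.

```python
STOP_CODONS = {"TAA", "TGA", "TAG"}
START_CODON = "ATG"


def find_orfs(dna, min_codons=1):
    """Return sorted (start, end) index pairs of forward-strand ORFs.

    An ORF runs from an ATG to the first in-frame stop codon; `end` is
    the index just past that stop codon. `min_codons` counts every codon
    in the ORF, including the start and stop codons. An ORF that runs
    off the end of the sequence without a stop codon is not reported.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None  # index of the ATG that opened the current ORF
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == START_CODON:
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None
    return sorted(orfs)


def prob_no_stop(n_codons):
    """Chance that n uniformly random codons contain no stop codon."""
    return (61 / 64) ** n_codons


if __name__ == "__main__":
    print(find_orfs("CCATGAAATAGCC"))
    print(prob_no_stop(100))
```

Under the uniform-codon assumption, a run of 100 random codons avoids all three stop codons less than one percent of the time, so a threshold such as min_codons filters out most chance ORFs. It does nothing for the other difficulties listed above, such as introns or coding sequences cut off at the ends of the available sequence.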

Type: Chapter
Information: Genomic Perl: From Bioinformatics Basics to Working Code, pp. 231-244

DOI: https://doi.org/10.1017/CBO9781139164764.015

Publisher: Cambridge University Press

Print publication year: 2002
