Fold assignments for proteins from the Escherichia coli
genome are carried out using two methods: BASIC, a profile–profile
alignment algorithm recently tested on fold recognition
benchmarks and on the Mycoplasma genitalium genome,
and PSI-BLAST, the newest generation of the de facto standard
in homology search algorithms. The fold assignments are
followed by automated modeling, and the resulting three-dimensional
models are analyzed for possible function prediction.
Close to 30% of the proteins encoded in the E. coli
genome can be recognized as homologous to a protein family
with known structure. Most of these homologies (23% of
the entire genome) can be recognized by both the PSI-BLAST
and BASIC algorithms, but BASIC recognizes an additional
260 homologies. Previous estimates suggested that only
10–15% of E. coli proteins can be characterized
this way. This dramatic increase in the number of recognized
homologies between E. coli proteins and structurally
characterized protein families is partly due to the rapid
growth of the database of known protein structures, but
mostly to significant improvements in prediction
algorithms.
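
The coverage figures above can be checked with a short calculation. The sketch below is only illustrative: the genome size of 4,288 protein-coding genes is an assumption taken from the commonly cited E. coli K-12 annotation, not a number given in this text, and the function name `coverage_breakdown` is hypothetical. It also ignores any homologies found by PSI-BLAST alone, which the text does not quantify.

```python
# Assumption (not stated in the text): about 4,288 annotated
# protein-coding genes in the E. coli K-12 genome.
ASSUMED_GENOME_SIZE = 4288


def coverage_breakdown(genome_size, shared_fraction, extra_by_second):
    """Combine a shared fraction with extra hits from a second method.

    genome_size      -- total number of proteins considered
    shared_fraction  -- fraction recognized by both methods (0.23 here)
    extra_by_second  -- homologies found only by the second method (260 here)
    """
    shared = round(genome_size * shared_fraction)
    total = shared + extra_by_second
    return {
        "shared": shared,
        "extra": extra_by_second,
        "total": total,
        "extra_fraction": extra_by_second / genome_size,
        "total_fraction": total / genome_size,
    }


result = coverage_breakdown(ASSUMED_GENOME_SIZE, 0.23, 260)
print(f"recognized by both methods: {result['shared']}")
print(f"additional by BASIC:        {result['extra']} "
      f"({result['extra_fraction']:.1%} of the genome)")
print(f"combined coverage:          {result['total']} "
      f"({result['total_fraction']:.1%} of the genome)")
```

Under the assumed genome size, the 260 additional assignments correspond to roughly 6% of the genome, which together with the shared 23% is consistent with the "close to 30%" total quoted above.
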
Knowing a protein's structure adds a new dimension to our understanding
of its function, and the predictions presented here can
be used to predict function for uncharacterized proteins.
Examples analyzed in more detail in this paper
include the DPS protein, which protects DNA from oxidative damage
(predicted to be homologous to ferritin, with an iron ion acting
as a reducing agent), and the ahpC/tsa family of proteins,
which provides resistance to various oxidizing agents (predicted
to be homologous to glutathione peroxidase).