Comparative protein structure prediction is limited
mostly by errors in alignment and loop modeling. We
describe here a new automated modeling technique that significantly
improves the accuracy of loop predictions in protein structures.
The positions of all nonhydrogen atoms of the loop are
optimized in a fixed environment with respect to a pseudo
energy function. The energy is a sum of many spatial restraints
that include the bond length, bond angle, and improper
dihedral angle terms from the CHARMM-22 force field, statistical
preferences for the main-chain and side-chain dihedral
angles, and statistical preferences for nonbonded atomic
contacts that depend on the two atom types, their distance
through space, and separation in sequence. The energy function
is optimized with the method of conjugate gradients combined
with molecular dynamics and simulated annealing. Typically,
the predicted loop conformation corresponds to the lowest
energy conformation among 500 independent optimizations.
Predictions were made for 40 loops of known structure at
each length from 1 to 14 residues. The accuracy of loop
predictions is evaluated as a function of thoroughness
of conformational sampling, loop length, and structural
properties of native loops. When accuracy is measured by
local superposition of the model on the native loop, 100,
90, and 30% of 4-, 8-, and 12-residue loop predictions,
respectively, had <2 Å RMSD error for the main-chain
N, Cα, C, and O atoms; the average accuracies
were 0.59 ± 0.05, 1.16 ± 0.10, and 2.61 ±
0.16 Å, respectively. To simulate real comparative
modeling problems, the method was also evaluated by predicting
loops of known structure in only approximately correct
environments with errors typical of comparative modeling
without misalignment. When the RMSD distortion of the main-chain
stem atoms was 2.5 Å, the average loop prediction
error increased by 180, 25, and 3% for 4-, 8-, and 12-residue
loops, respectively. The accuracy of the lowest energy
prediction for a given loop can be estimated from the structural
variability among a number of low energy predictions. The
relative value of the present method is gauged by (1) comparing
it with one of the most successful previously described
methods, and (2) describing its accuracy in recent blind
predictions of protein structure. Finally, it is shown
that the average accuracy of prediction is limited primarily
by the accuracy of the energy function rather than by the
extent of conformational sampling.
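
The selection and confidence steps described above can be illustrated with a minimal sketch. It assumes each independent optimization yields an (energy, coordinates) pair, picks the lowest-energy conformation, and reports the structural variability among a few low-energy predictions as a rough proxy for the expected error. The function names, the `n_low` parameter, and the toy data are hypothetical and are not taken from the method itself. As a simplification, RMSD is computed directly in the shared frame of the fixed environment, without the local superposition used for the accuracy figures quoted above.

```python
import math
from statistics import mean


def rmsd(a, b):
    """RMSD between two equal-length lists of (x, y, z) points.

    No superposition is applied: both conformations are assumed to be
    expressed in the same frame (the fixed environment of the loop).
    """
    if len(a) != len(b) or not a:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    squared = sum(
        (p - q) ** 2
        for atom_a, atom_b in zip(a, b)
        for p, q in zip(atom_a, atom_b)
    )
    return math.sqrt(squared / len(a))


def pick_lowest_energy(models):
    """Return the (energy, coords) pair with the lowest energy."""
    return min(models, key=lambda m: m[0])


def spread_among_low_energy(models, n_low=3):
    """Mean RMSD between the best model and the next n_low - 1 models.

    A large spread suggests that the lowest-energy prediction is less
    reliable; n_low is an illustrative choice, not a published value.
    """
    ranked = sorted(models, key=lambda m: m[0])
    best_coords = ranked[0][1]
    others = ranked[1:n_low]
    if not others:
        return 0.0
    return mean(rmsd(best_coords, coords) for _, coords in others)


# Toy data: four "optimizations" of a two-atom loop (hypothetical values).
models = [
    (-3.0, [(0.0, 0.0, 3.0), (1.0, 0.0, 3.0)]),
    (-5.0, [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]),
    (2.0, [(5.0, 5.0, 5.0), (6.0, 5.0, 5.0)]),
    (-4.0, [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]),
]

best_energy, best_coords = pick_lowest_energy(models)
spread = spread_among_low_energy(models, n_low=3)
print(f"lowest energy: {best_energy}, spread among low-energy models: {spread:.2f} A")
```

In practice the list would hold the results of the many independent optimizations (typically 500, per the text above), and the spread would be calibrated against loops of known structure before being used as an error estimate.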