CORA is a suite of programs for multiply aligning and analyzing
protein structural families to identify the consensus positions
and capture their most conserved structural characteristics
(e.g., residue accessibility, torsional angles, and global
geometry as described by inter-residue vectors/contacts).
Knowledge of these structurally conserved positions, which
are mostly in the core of the fold, and of their properties
significantly improves the identification and classification
of newly determined relatives. This information is encoded in
a consensus three-dimensional (3D) template, and relatives are
found by a sensitive alignment method that employs a
new scoring scheme based on conserved residue contacts.
By encapsulating these critical “core” features,
templates perform more reliably in recognizing distant
structural relatives than searches with representative
structures.
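
As a rough illustration of the idea of a contact-based consensus template, the
following minimal Python sketch collects the residue contacts shared by a set
of pre-aligned structures and scores a candidate by the fraction of those
contacts it reproduces. It is not the CORA algorithm: the function names, the
8.0 distance cutoff, the one-point-per-position representation, and the
assumption that all structures are already aligned to the same number of
positions are hypothetical choices made only for this example.

```python
import math
from itertools import combinations


def contacts(coords, cutoff=8.0):
    """Return index pairs (i, j), i < j, whose points lie within `cutoff`.

    `coords` holds one (x, y, z) point per aligned position (an assumption
    of this sketch; the cutoff value is likewise illustrative).
    """
    return {
        (i, j)
        for (i, a), (j, b) in combinations(enumerate(coords), 2)
        if math.dist(a, b) <= cutoff
    }


def conserved_contacts(family, cutoff=8.0, min_fraction=1.0):
    """Contacts present in at least `min_fraction` of the family members."""
    counts = {}
    for coords in family:
        for pair in contacts(coords, cutoff):
            counts[pair] = counts.get(pair, 0) + 1
    needed = min_fraction * len(family)
    return {pair for pair, n in counts.items() if n >= needed}


def template_score(template, coords, cutoff=8.0):
    """Fraction of template contacts that the candidate reproduces."""
    if not template:
        return 0.0
    return len(template & contacts(coords, cutoff)) / len(template)
```

In this toy setting, a candidate that preserves only half of the conserved
contacts of the family receives a score of 0.5, whereas a representative
structure would also be compared at all of its non-conserved positions.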
Parameters for 3D-template generation and alignment were
optimized for each structural class (mainly-α, mainly-β,
α-β), using representative superfold families.
For all families selected, the templates gave significant
improvements in sensitivity and selectivity in recognizing
distant structural relatives. Furthermore, since templates
contain fewer than 70% of fold positions and compare fewer
positions when aligning structures, scans are at least
an order of magnitude faster than scans using selected
structures. CORA was subsequently tested on eight other
broad structural families from the CATH database.
Diagnostic plots are generated automatically and provide
qualitative assistance for classifying newly determined
relatives. They are demonstrated here by application to
the large globin-like fold family. CORA templates for both
homologous superfamilies and fold families will be stored
in CATH and used to improve the classification and analysis
of newly determined structures.