β-Galactosidase (lacZ) from Escherichia
coli is a 464 kDa homotetramer. Each subunit consists
of five domains, the third being an α/β barrel
that contains most of the active site residues. Each domain
is compared with a large set of proteins representative of
all structures in the Protein Data Bank. Many structures
include an α/β barrel.
Those that are most similar to the α/β barrel of
E. coli β-galactosidase have similar catalytic
residues and belong to the so-called “4/7 superfamily”
of glycosyl hydrolases. The structure comparison suggests
that β-amylase should also be included in this family.
Of three structure comparison methods tested, the “ProSup”
procedure of Feng and Sippl and the “Superimpose”
procedure of Diederichs were slightly superior in discriminating
the members of this superfamily, although all procedures
were very powerful in identifying related protein structures.
Domains 1, 2, and 4 of E. coli β-galactosidase
have topologies related to “jelly-roll barrels”
and “immunoglobulin constant” domains. This
fold also occurs in the cellulose-binding domains (CBDs)
of a number of glycosyl hydrolases. The fold of domain
1 of E. coli β-galactosidase is closely related
to some CBDs, and the domain contributes to substrate binding,
but in a manner unrelated to cellulose binding by the CBDs.
This is typical of domains 1, 2, 4, and 5, which appear
to have been recruited to play roles in β-galactosidase
that are unrelated to the functions that such domains provide
in other contexts. It is proposed that β-galactosidase
arose from a prototypical single-domain α/β barrel
with an extended active site cleft. The subsequent incorporation
of elements from other domains could then have reduced
the size of the active site from a cleft to a pocket to
better hydrolyze the disaccharide lactose and, at the same
time, to facilitate the production of the inducer, allolactose.