INTRODUCTION
The extreme diversity and plasticity of prokaryotic genomes, manifest both at the level of gene loss and acquisition via horizontal gene transfer and at the level of gene order rearrangement, are arguably among the major generalizations of comparative genomics (Coenye et al., 2005; Koonin & Galperin, 1997, 2002; Snel et al., 2002). Bacteria and archaea have numerous partially conserved operons, probably owing in part to the ‘selfish’ behaviour of operons, but beyond the operon level little conservation of genome organization is seen at large evolutionary distances (Dandekar et al., 1998; Lawrence, 1999; Mushegian & Koonin, 1996; Watanabe et al., 1997; Wolf et al., 2001). To detect traces of such long-range conservation, computational methods have been specially designed to identify ‘überoperons’, or partially conserved gene neighbourhoods (Lathe et al., 2000; Rogozin et al., 2002).
As a case study to test the methods that we have developed for conserved neighbourhood analysis, we characterized an extensive gene set that included several proteins related to DNA or RNA metabolism and was mostly specific to thermophiles (Makarova et al., 2002). These genes comprise a complex array of overlapping neighbourhoods that are partially conserved but highly diversified, in terms of both gene composition and gene order, and are represented in all archaeal and many bacterial genomes. At the time, we hypothesized that these genes encoded an uncharacterized, versatile repair system, largely associated with the thermophilic lifestyle.