“Multiple introns, and the prospect that
these occur within several genes in the same metabolic pathway, suggest
a regulatory role for splicing … ” (1). That statement was made
more than a decade ago by our colleagues and ourselves in reference to
the phage T4 group I introns, all three of which reside in genes
involved in nucleotide metabolism. The intervening years were
distinguished by a striking absence of any demonstration of a
regulatory role for these introns. Furthermore, they seem not to confer
any selective advantage to T4 phage, and, despite the discovery of more
introns in genes of DNA metabolism, the question of “why exclusively
in those genes?” has been all but forgotten. Now we are
faced with a report in this issue of the Proceedings of an
unrelated phage and a different type of splicing element, an intein;
not only does this element coexist with a group I intron in the same
gene but also the gene is one of nucleotide metabolism (2). With
lightning apparently striking twice in the same place, we are again
forced to confront our doubts about pure chance. What might account for
the colocalization of two different types of intervening sequence in
the same gene on the same metabolic pathway?
The now familiar group I introns self-splice at the RNA level (3),
whereas inteins, which are in-frame protein fusions, self-splice at the
protein level (ref. 4; Fig. 1).
The first inteins were described just a few years ago, and immediately
examples emerged in all three biological kingdoms, the archaea,
bacteria, and eukarya (5–7). The recent identification of more than 50
inteins, mostly by sequence comparisons (refs. 4, 8, and 9; New England
Biolabs Intein Database Intein Registry at http://www.neb.com; S.
Pietrokovski at
http://www.blocks.fhcrc.org/~pietro/inteins) has shown
them to be present in a wide range of organisms. However, until now,
none had been found in bacteriophage. Until, that is, the
above-mentioned report, which identifies an intein and a group I intron
in the gene for a putative ribonucleotide reductase subunit (the
bnrdE gene) in the Bacillus subtilis
bacteriophage SPβ (2).
Lazarevic et al. (2) also identified a group I intron
in the neighboring bnrdF gene, encoding a second putative
subunit of SPβ ribonucleotide reductase. The bnrdE and
bnrdF introns are in the same general family as the three
group I introns of phage T4 (1, 10) and the introns in
Bacillus phages β22 (11) and SPO1 and three of its close
relatives (12). Remarkably, all but one of the introns and the single
intein so far identified in bacteriophage are located in genes involved
in DNA metabolism (10–13). The SPO1 intron is in the DNA polymerase
gene, whereas β22 and T4 have introns in their thymidylate synthase
genes. The remaining T4 introns are in nrdB, which encodes a
small subunit of ribonucleotide reductase, and sunY, renamed
nrdD, encoding an anaerobic ribonucleotide reductase. Given
the small number of group I introns so far discovered in bacteriophage
genes and the apparently very low occurrence of inteins in these
organisms, these findings are provocative indeed.
Although self-splicing introns in bacteria and lower eukaryotes are not
confined to genes of DNA metabolism, but rather are situated in a
different spectrum of loci, including tRNA, rRNA, and energy metabolism
genes (14), the genes playing host to inteins are once again heavily
biased toward DNA metabolism (refs. 4, 8, and 9; New England Biolabs
Intein Database Intein Registry at http://www.neb.com; S.
Pietrokovski at http://www.blocks.fhcrc.org/~pietro/inteins).
Approximately 70% of known inteins are confined to such functions.
They include DNA polymerases, helicases, gyrases, RecA recombinase,
and, once again, ribonucleotide reductases, both aerobic and
anaerobic.
In addition to splicing at the RNA or protein levels, many group
I introns and inteins are mobile genetic elements at the DNA level, by
virtue of endonucleases encoded within them (Figs. 1 and
2). These endonucleases cleave
intron-minus or intein-minus alleles of their cognate genes, initiating
a unidirectional gene conversion event that results in copying of the
intein or intron DNA (Fig. 2). There has been much discussion of the
evolution of mobile group I introns and inteins. It has been argued
that endonuclease genes are the ancestral mobile elements that
colonized ancient introns and inteins (15–20). Endonuclease invasion
of essential genes would naturally be lethal because it would abolish
gene expression. However, insertion into DNA encoding introns or
inteins would afford endonuclease genes safe haven because of the
ability of the intron or intein to splice and therefore to preserve
expression of the host gene. The intron or intein in turn acquires
mobility. While the presence of nonmobile introns in the same genes of
widely differing organisms is consistent with their being of ancient
lineage, the mobility of the endonuclease-containing elements allows
for lateral transfer between genes and organisms in more recent times.
Why then is the distribution of particular introns and the
inteins so narrow? We offer four different possibilities. First, back
to querying the original suggestion: do these elements regulate the
genes in which they reside, and thereby give a selective advantage to
the organism, too subtle to be measured under standard laboratory
conditions, but manifest under the nutrient-deprived, oxygen-limited,
temperature-stressed environments prevalent in their natural habitats?
Although no evidence has yet been forthcoming for a regulatory role for
introns or inteins, they may confer a selective advantage by other
means. It has been shown, for example, that in the archaeon
Sulfolobus acidocaldarius an rDNA intron provides cells with
an advantage over intronless cells (21). Additionally, it would appear
that the intron endonucleases can confer a selective advantage. The
endonuclease of the subtilis phage SP82 preferentially cleaves the DNA
of its close relative SPO1 in the vicinity of its intron, thereby
effecting exclusion of SPO1 markers and promoting the propagation of
its own in mixed infections (22). Furthermore, very different inteins
reside at different positions in the recA genes of
Mycobacterium tuberculosis and Mycobacterium
leprae but are not found in the nonpathogenic mycobacteria. This
has been interpreted to suggest that inteins may confer some advantage
to their pathogenic hosts (23).
Second, might facilitated entry of introns and inteins into DNA
account for their prevalence at particular loci? Possibilities here are
that distinctive DNA or nucleoid structures could provide easy access
to invasive elements. Such an explanation has been suggested for the
high incidence of some mobile elements, such as pathogenicity islands
and Ty retrotransposons, in highly transcribed regions of
DNA (24, 25).
Third, might the sites of intron and intein occupancy be in
functions that cannot readily tolerate their loss? Because of the
powerful invasive potential of these elements, they would tend to
simply reenter sites of perfect excision by endonuclease-mediated
mobility. Imperfect excision, on the other hand, might abolish gene
function, resulting in a lethal event, thereby favoring retention of
the element. Indeed, it has been noted that archaeal and group I
introns are often positioned in functionally important regions of genes
(26), and a recent systematic alignment of the sites of intein
insertion shows them to be at or close to residues involved in
catalysis or substrate or cofactor binding (8).
Finally, might some of the preferred target genes in phage act as sinks
for intervening sequences because they encode duplicated functions
already present in the bacterial host? Thymidylate synthase, and both
aerobic and anaerobic ribonucleotide reductases in the coliphage- and
subtilis phage-host systems represent such duplicated functions. The
“short-term” disadvantage afforded by insertion of a novel intron
or intein, which may splice poorly until the element and host
environment can adapt to one another, might be tolerated under these
circumstances if the bacterial enzymes can be utilized to maintain
phage survival.
Determining which, if any, of these possibilities apply, must await
more experimental data and genome analyses. However, it is not
unreasonable to suspect that one or another of these alternatives may
apply to different inteins and introns in different organisms and at
different genetic addresses.