Rules of Nomenclature
Our nomenclature scheme implements the following guidelines:
(i) Gene products will have names that parallel the genes encoding them, e.g., aroA encodes AroA.
(This does not prevent other descriptions of the gene product; e.g., AroA is the universal acronym
proper for the enzyme known as DAHP synthase.) Standard 4-letter gene and
gene-product acronyms are referred to as the 'acronym proper'. Note that additional
symbols and conventions that convey important information, as described herein,
are appended to the acronym proper.
(ii) Genes in a pathway or pathway segment are named in the
order of the reactions catalyzed by the gene products, e.g., AroA, AroB and AroC, which catalyze the
first three reactions of aromatic biosynthesis, are encoded by aroA, aroB,
and aroC.
(iii) The smallest unit of naming is at the level of discrete catalytic or allosteric
domains. Multi-domain fusions are designated with intervening bullets, e.g., tyrA•aroF encodes
fused catalytic domains that abound elsewhere as stand-alone tyrA and aroF
domains. Note that a catalytic domain such as TyrA is actually a "supradomain"
consisting of an N-terminal cofactor domain and a C-terminal catalytic domain,
and potentially these could be named separately. In fact, the cofactor domain
(Chothia, et al., 2003) is widely distributed in
combination with domains other than the TyrA catalytic domain. However, the
two separate domains of TyrA are thus far not known to possess any functional
roles as independent entities. TyrA catalytic domains always coexist with the
cofactor domain to form an intimate functional unit, and the functional site may
very well be created between the two domains (Sun, et al., 2006).
In this case, the TyrA name is currently applied to the supradomain, so named as the smallest functional
unit at the present time. Hence, in this context of function, we sometimes
apply "supradomain" as an equivalent of "domain" in those cases where the
smallest functional unit appears to be the supradomain.
Identical functional roles will be associated with identical
acronym-proper labels, regardless of whether the gene products are homologs or
analogs. On the other hand, note that it is occasionally possible for enzymes
catalyzing different reactions to carry the same acronym proper if they are
embedded in the same overall metabolic conversion (see rule x).
(iv) If an enzyme consists of subunits, the corresponding gene
and gene-product names are designated with additional lower-case letters. For example, the anthranilate synthase complex consists
of the large aminase subunit TrpAa and the small amidotransferase subunit
TrpAb, which are encoded by trpAa and trpAb, respectively.
(v) Distinct allosteric domains are designated with 3 capital letters. One example is pheAACT encoding PheAACT.
(vi) Different homology classes (analogs) that have independently acquired the same function are designated with Roman-numeral subscripts, e.g., aroAI and aroAII encode analogs that catalyze the same reaction. The latter exemplifies a case where, on structural grounds, the apparent analogs could possibly be distant homologs that diverged sufficiently to mask definitive recognition of the homology (given the limitation of current resources). However, we do not infer homology if it cannot be proven.
If a homology class consists of distinct, well-separated
subgroups, additional lower-case Greek subscripts can be appended to designate
them, as illustrated by AroAIα and AroAIβ (Fig.
2). If there were no known analogs, then any well-separated sub-homolog groups
would be designated without Roman-numeral subscripts, as is exemplified by TyrAα
and TyrAβ.
(vii) If different enzyme reactions converge upon a common
intermediate as is the case in early aromatic biosynthesis, genes within one of
the convergent branches are designated with a 'prime'. Usually this would apply to the least widely
distributed branch. Thus, AroA and AroB, on the one hand, and AroA' and AroB',
on the other hand, describe different initial routes that converge to provide
exactly the same product (dehydroquinate) to AroC for use as its substrate (White, 2004). AroA and AroA' therefore each
catalyze the first committed step of aromatic biosynthesis in different
organisms, but the particular reactions catalyzed are not the same. The
same is true of AroB and AroB'.
Our subsystem coverage does not yet include the large number of
connecting links that will be added.
Of these, the metabolic linkage to NAD biosynthesis via quinolinate
comes to mind because the alternative tryptophan-to-quinolinate and
aspartate-to-quinolinate pathways (Kurnasov, et al., 2003) will exemplify another
instance of pathway convergence to a common intermediate.
(viii) Paralogs, which originated from recent gene duplications and which have no obvious differential functional specializations (one-function paralog family), are distinguished from one another with underscore numbers. Recent gene duplicates (e.g., trpD_1, trpD_2, and trpD_3) might have selective value via manifestation of a gene-dose effect, or they might include pseudogenes destined for elimination (apparently a common phenomenon). If one of the multiple paralogs seems to be uniquely suited to carry out the function corresponding to that of a well-characterized single-gene ortholog in organismal relatives, the preference would be to label it trpD_1. For example, such a scenario would apply in the situation (see Xie, et al., 2003) where trpD_1 occupies a perfect and complete tryptophan operon in some cyanobacteria, whereas the extra-operonic trpD_2 and trpD_3 paralogs exhibit especially long branches on a protein tree and lack one or more amino-acid residues known to be important for catalysis (thus being likely pseudogenes). In comparisons of the same genes in a collection of organisms where some of the organisms possess multiple paralogs and others do not, the single genes cannot properly be labeled the same as a particular paralog member present in a multi-paralog organism. Thus, for example, in organisms with a single trpD gene, that gene would simply be denoted trpD, since it has equally orthologous relationships with each of the recent paralogs present in sister organisms (Fitch, 2000).
(ix) Same-function ancient paralogs that vary in some
specialized feature carry appropriate underscore notations. Ancient paralogs
arose from gene duplications that preceded speciation (Fitch, 2000). Ancient paralogs are usually differentially
specialized, and those with different catalytic functions will carry names that
reflect different pathway roles. (The ancient AroA and KdsA paralogs of Fig. 2
would be examples). However, occasional ancient paralogs have retained the same
enzymatic function and hence share the same acronym proper, but they are
differentially specialized in some other way. For example, the trio of paralogous
DAHP synthases in enteric bacteria are AroAIα proteins that are
subject to differential regulation by feedback inhibition: AroAIα_W by tryptophan, AroAIα_F
by phenylalanine, and AroAIα_Y
by tyrosine.
Occasionally some member species of two-paralog lineages possess a single remnant paralog, which, in addition to its usual function, has acquired the function of the lost sister paralog, thus being bifunctional. In such cases the name of the surviving paralog (identified by homology or by operon context) is given first and in bold fonts, and separated from the name of the missing paralog by a double 'slash'. Examples of such relatively rare bifunctional proteins covered in this article are PabAb // TrpAb and AroJIβ // HisG in a small clade of Bacillus species, as well as HisD // TrpCII in Actinomycete bacteria.
(x) Different substrate specificities of homologs typically support different functional roles in different pathways, e.g., the aforementioned DAHP synthase/KDOP synthase dichotomy (Fig. 2). If homologs having different substrate specificities are embedded within the flow route of the same pathway such that they perform equivalent functional roles at the overall pathway level, they will share the same acronym proper, with the differing specificities indicated with subscript identifiers. This is exemplified by the alternative flow routes between prephenate and L-tyrosine in Fig. 1. The tyrosine-pathway dehydrogenases catalyze different reactions, being specific for prephenate, for L-arogenate, or able to utilize both. These three variant specificities are indicated with lower-case, rightward subscripts: TyrAp, TyrAa, and TyrAc, respectively. However, at the broader pathway level, the functional role of each of the three is identical. Namely, in each case the cyclohexadienyl substrate is aromatized via an oxidative reaction that is driven by elimination of the ring-attached carbon dioxide.
If substrate ambiguity of such one-pathway homologs extends to a
second substrate, specificity for a second substrate can be designated with
leftward subscripts, e.g., NADTyrAp
is a tyrosine-pathway dehydrogenase specific for the NAD+/prephenate couple, whereas NADPTyrAa refers to
specificity for the NADP+/arogenate
couple.
(xi) Genes that encode cleavable signal (or transit) peptides
are denoted by leading-asterisk superscripts.
Thus, aroHIα encodes cytoplasmic
chorismate mutase, whereas *aroHIα encodes periplasmic (or secreted) chorismate mutase.
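Rules (i), (ii), and (viii) are mechanical enough to sketch in code. The following is a minimal illustration in Python, not part of the nomenclature itself. The function names (step_letter, gene_and_product, paralog_labels) are hypothetical, and the sketch assumes a three-letter pathway stem and no more than 26 reactions per pathway.

```python
from __future__ import annotations

import string


def step_letter(position: int) -> str:
    """Rule (ii): the 1st reaction gets 'A', the 2nd 'B', and so on."""
    if not 1 <= position <= 26:
        raise ValueError("this sketch only covers pathways of 1 to 26 steps")
    return string.ascii_uppercase[position - 1]


def gene_and_product(stem: str, position: int) -> tuple[str, str]:
    """Rule (i): the product name parallels the gene name (aroA -> AroA)."""
    letter = step_letter(position)
    gene = stem.lower() + letter
    product = stem.lower().capitalize() + letter
    return gene, product


def paralog_labels(gene: str, copies: int) -> list[str]:
    """Rule (viii): recent one-function paralogs get _1, _2, ...

    A gene present as a single copy keeps the plain name, because it is
    equally orthologous to every paralog in sister organisms.
    """
    if copies < 1:
        raise ValueError("copies must be at least 1")
    if copies == 1:
        return [gene]
    return [f"{gene}_{i}" for i in range(1, copies + 1)]
```

For instance, gene_and_product("aro", 3) pairs the gene aroC with its product AroC, and paralog_labels("trpD", 3) yields the trpD_1, trpD_2, trpD_3 series used in rule (viii). Which copy deserves the _1 label is a biological judgment (operon context, tree position) that the sketch does not attempt.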
Overall rationale in support of the acronym scheme. The above
nomenclature scheme strives to relate the acronym library to the evolutionary
thread. This is not absolutely necessary to the extent that the single critical
need is to implement a consistent, universal assemblage of acronyms. However, a
significant advantage of the system proposed is that a given acronym is
designed to convey a large amount of biochemical and evolutionary knowledge.
Information is conveyed, not only by what is present in the acronym, but also
by what is absent. For example, consider the hypothetical Xyz pathway in which
one encounters the gene product XyzCII, encoded by xyzCII.
Even being unfamiliar with the Xyz pathway,
one knows (because of the 'C') that reference is being made to the third enzyme
in the pathway. The Roman-numeral subscript reveals that this enzyme is one of
at least two analog classes, and the bullet indicates that there is a C-terminal
fusion. XyzCII cannot be a subunit component; otherwise there would
be a lowercase letter immediately after the acronym proper. There is no
cleavable signal or transit peptide; otherwise there would be a leading asterisk.
It is not a member of a one-function paralog family; otherwise we would see
underscore notations. It is not a member of a homolog family that separates
into distinct subgroups; otherwise an α, β, etc. would follow the
Roman-numeral subscript. XyzCII has not expanded its functional
repertoire by "borrowing" a second functional role that is exercised elsewhere
in the lineage by a paralog relative;
otherwise the acronym for the "borrowed" functional role of the lost
paralog would be applied (with separation by a "double slash") after that of
the surviving paralog.
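The reading-off procedure described above can be sketched as a small decoder. This is an illustration under stated assumptions, not a definitive parser. It works on the plain-text (flattened) form of an acronym, and the function name decode, the dictionary keys, and the assumed order of the optional symbols are all hypothetical. Fusions (bullets), double-slash bifunctional names, allosteric-domain labels, and the substrate-specificity subscripts of rule (x) are deliberately left out, because in plain text a specificity subscript cannot be told apart from a subunit letter.

```python
import re

# Assumed plain-text layout: [*] stem step ['] [subunit] [Roman] [Greek] [_tag]
_ACRONYM = re.compile(
    r"^(?P<signal>\*)?"            # rule (xi): leading asterisk
    r"(?P<stem>[A-Za-z][a-z]{2})"  # three-letter pathway stem
    r"(?P<step>[A-Z])"             # rule (ii): reaction order in the pathway
    r"(?P<prime>')?"               # rule (vii): convergent alternative branch
    r"(?P<subunit>[a-z])?"         # rule (iv): subunit letter
    r"(?P<analog>[IVX]+)?"         # rule (vi): analog class
    r"(?P<subgroup>[α-ω])?"        # rule (vi): homolog subgroup
    r"(?:_(?P<tag>[A-Za-z0-9]+))?$"  # rules (viii) and (ix): underscore tag
)


def decode(name: str) -> dict:
    """Report which features are present in, or absent from, an acronym."""
    m = _ACRONYM.match(name)
    if m is None:
        raise ValueError(f"acronym not covered by this sketch: {name!r}")
    return {
        "pathway_stem": m.group("stem").lower(),
        "step_number": ord(m.group("step")) - ord("A") + 1,
        "alternative_branch": m.group("prime") is not None,
        "subunit": m.group("subunit"),
        "analog_class": m.group("analog"),
        "subgroup": m.group("subgroup"),
        "underscore_tag": m.group("tag"),
        "signal_peptide": m.group("signal") is not None,
    }
```

Calling decode("XyzCII") reports step number 3 and analog class II, with the subunit, subgroup, and underscore fields empty and no signal peptide, which mirrors the "information by absence" argument made in the text.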
Los Alamos National Laboratory • Est. 1943. Operated by Los Alamos National Security, LLC for the U.S. Department of Energy's National Nuclear Security Administration. © Copyright 2006 LANS LLC. All rights reserved.