The DDBJ/EMBL/GenBank
Feature Table:
Definition
Version 8 Oct 2008
DNA Data Bank of Japan, Mishima, Japan.
EMBL Nucleotide Sequence Database, Cambridge, UK.
GenBank, NCBI, Bethesda, MD, USA.
1 Introduction
2 Overview of the Feature Table format
2.1 Format Design
2.2 Key aspects of this feature table design
2.3 Feature Table Terminology
3 Feature table components and format
3.1 Naming conventions
3.2 Feature keys
3.2.1 Purpose
3.2.2 Format and conventions
3.2.3 Key groups and hierarchy
3.2.4 Feature key examples
3.3 Qualifiers
3.3.1 Purpose
3.3.2 Format and conventions
3.3.3 Qualifier values
3.3.4 Qualifier examples
3.4 Feature labels
3.4.1 Purpose
3.4.2 Format and conventions
3.4.3 Examples of feature labels
3.5 Location
3.5.1 Purpose
3.5.2 Format and conventions
3.5.3 Location examples
4 Feature table Format
4.1 Format examples
4.2 Definition of line types
4.3 Data item positions
4.4 Use of blanks
5 Examples of sequence annotation
5.1 Eukaryotic gene
5.2 Bacterial operon
5.3 Artificial cloning vector (circular)
5.4 Plasmid
5.5 Repeat element
5.6 Immunoglobulin heavy chain
5.7 T-cell receptor
5.8 Transfer RNA
6 Limitations of this feature table design
7 Appendices
7.1 Appendix I EMBL, GenBank and DDBJ entries
7.1.1 EMBL Format
7.1.2 GenBank Format
7.1.3 DDBJ Format
7.2 Appendix II Feature table: Backus-Naur form
7.3 Appendix III: Feature keys reference
7.4 Appendix IV: Summary of qualifiers for feature keys
7.4.1 Qualifier List
7.4.2 Feature qualifiers - mapped to Feature keys
7.5 Appendix V: Controlled vocabularies
7.5.1 Nucleotide base codes (IUPAC)
7.5.2 Modified base abbreviations
7.5.3 Amino acid abbreviations
7.5.4 Modified and unusual Amino Acids
7.5.5 Genetic Code Tables
7.5.6 Country Names
1 Introduction
Nucleic acid sequences provide the fundamental starting point for describing
and understanding the structure, function, and development of genetically
diverse organisms. The GenBank, EMBL, and DDBJ nucleic acid sequence data
banks have from their inception used tables of sites and features to describe
the roles and locations of higher order sequence domains and elements within
the genome of an organism.
In February, 1986, GenBank and EMBL began a collaborative effort (joined by
DDBJ in 1987) to devise a common feature table format and common standards for
annotation practice.
2 Overview of the Feature Table format
The overall goal of the feature table design is to provide an extensive
vocabulary for describing features in a flexible framework for manipulating
them. The Feature Table documentation represents the shared rules that allow
the three databases to exchange data on a daily basis.
The range of features to be represented is diverse, including regions which:
* perform a biological function,
* affect or are the result of the expression of a biological function,
* interact with other molecules,
* affect replication of a sequence,
* affect or are the result of recombination of different sequences,
* are a recognizable repeated unit,
* have secondary or tertiary structure,
* exhibit variation, or have been revised or corrected.
2.1 Format Design
The format design is based on a tabular approach and consists of the following
items:
Feature key - a single word or abbreviation indicating functional group
Location - instructions for finding the feature
Qualifiers - auxiliary information about a feature
2.2 Key aspects of this feature table design
* Feature keys allow specific annotation of important sequence features.
* Related features can be easily specified and retrieved.
Feature keys are arranged hierarchically, allowing complex and compound
features to be expressed. Both location operators and the feature keys show
feature relationships even when the features are not contiguous. The hierarchy
of feature keys allows broad categories of biological functionality, such as
rRNAs, to be easily retrieved.
* Generic feature keys provide a means for entering new or undefined features.
A number of "generic" or miscellaneous feature keys have been added to permit
annotation of features that cannot be adequately described by existing feature
keys. These generic feature keys will serve as an intermediate step in the
identification and addition of new feature keys. The syntax has been designed
to allow the addition of new feature keys as they are required.
* More complex locations (fuzzy and alternate ends, for example) can be specified.
Each end point of a feature may be specified as a single point, an alternate
set of possible end points, a base number beyond which the end point lies, or
a region which contains the end point.
* Features can be combined and manipulated in many different ways.
The location field can contain operators or functional descriptors specifying
what must be done to the sequence to reproduce the feature. For example, a
series of exons may be "join"ed into a full coding sequence.
* Standardized qualifiers provide precision and parsability of descriptive details.
A combination of standardized qualifiers and their controlled-vocabulary
values enables free-text descriptions to be avoided.
* The nature of supporting evidence for a feature can be explicitly indicated.
Features, such as open reading frames or sequences showing sequence similarity
to consensus sequences, for which there is no direct experimental evidence can
be annotated. Therefore, the feature table can incorporate contributions from
researchers doing computational analysis of the sequence databases. However,
all features that are supported by experimental data will be clearly marked as
such.
* The table syntax has been designed to be machine parsable.
A consistent syntax allows machine extraction and manipulation of sequences
coding for all features in the table.
2.3 Feature Table Terminology
The format and wording in the feature table use common biological research
terminology whenever possible. For example, an item in the feature table such as:
Key Location/Qualifiers
CDS 23..400
/product="alcohol dehydrogenase"
/gene="adhI"
might be read as:
The feature CDS is a coding sequence beginning at base 23 and ending at base
400, has a product called 'alcohol dehydrogenase' and is coded for by a gene
called "adhI".
A more complex description:
Key Location/Qualifiers
CDS join(544..589,688..>1032)
/product="T-cell receptor beta-chain"
which might be read as:
This feature, which is a partial coding sequence, is formed by joining the
indicated elements to form one contiguous sequence encoding a product called
T-cell receptor beta-chain.
The following sections contain detailed explanations of the feature table
design showing conventions for each component of the feature table, examples
of how the format might be implemented, a description of the exact column
placement of all the data items and examples of complete sequence entries that
have been annotated using the new format. The last section of this document
describes known limitations of the current feature table design.
Appendix I gives an example database entry for the DDBJ, GenBank and EMBL
formats.
Appendix II describes the format in Backus-Naur Form (BNF). This information
will not be presented in future editions of this document.
Appendices III and IV provide reference manuals for the feature table keys and
qualifiers, respectively.
Appendix V includes controlled vocabularies such as nucleotide base codes,
modified base abbreviations, genetic code tables etc.
This document defines the syntax and vocabulary of the feature table. The
syntax is sufficiently flexible to allow expression of a single biological
entity in numerous ways. In such cases, the annotation staffs at the databases
will propose conventions for standard means of denoting the entities.
This feature table format is shared by GenBank, EMBL and DDBJ. Comments,
corrections, and suggestions may be submitted to any of the database staffs.
New format specifications will be added as needed.
3 Feature table components and format
3.1 Naming conventions
Feature table components, including feature keys, qualifiers, accession
numbers, database name abbreviations, feature labels, and location operators,
are all named following the same conventions. Component names may be no more
than 20 characters long (Feature keys 15, Feature qualifiers 20) and must
contain at least one letter. Case should not be regarded as significant in
comparing feature labels ("Prot1" and "pROT1" are the same). The following
characters are permitted to occur in feature table component names:
* Uppercase letters (A-Z)
* Lowercase letters (a-z)
* Numbers (0-9)
* Underscore (_)
* Hyphen (-)
* Single quotation mark or apostrophe (')
* Asterisk (*)
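The naming rules above can be checked mechanically. The following is a minimal sketch, not part of the definition itself; the function names is_valid_component_name and same_label are hypothetical, and the length limit is passed in by the caller (15 for feature keys, 20 otherwise, as stated above).

```python
import re

# Characters permitted in component names (Section 3.1):
# letters, digits, underscore, hyphen, apostrophe and asterisk.
_ALLOWED = re.compile(r"[A-Za-z0-9_\-'*]+")


def is_valid_component_name(name: str, max_len: int = 20) -> bool:
    """Return True if `name` follows the Section 3.1 naming rules."""
    if not name or len(name) > max_len:
        return False
    if not _ALLOWED.fullmatch(name):
        return False
    # A name must contain at least one letter.
    return any(ch.isalpha() for ch in name)


def same_label(a: str, b: str) -> bool:
    """Compare two feature labels, ignoring letter case."""
    return a.lower() == b.lower()
```

For example, is_valid_component_name("-35_signal", max_len=15) accepts the key because it contains letters, while a purely numeric name such as "12345" is rejected.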
3.2 Feature keys
3.2.1 Purpose
Feature keys indicate
(1) the biological nature of the annotated feature or
(2) information about changes to or other versions of the sequence.
The feature key permits a user to quickly find or retrieve similar features or
features with related functions.
3.2.2 Format and conventions
There is a defined list of allowable feature keys, which is shown in Appendix
III. Each feature must contain a feature key.
3.2.3 Key groups and hierarchy
The feature keys fall into families which are in some sense similar in
function and which are annotated in a similar manner. A functional family may
have a "generic" or miscellaneous key, which can be recognized by the 'misc_'
prefix, that can be used for instances not covered by the other defined keys of
that group.
The feature key groups are listed below with a short definition and an
annotation example:
1. Difference and change features
Indicate ways in which a sequence should be changed to produce a different
"version":
misc_difference location
/replace="change_location"
2. Expression signal features
Indicate regions containing a signal that alters a biological function:
misc_signal location
3. Transcript features
Indicate products made by a region:
misc_RNA location
4. Binding features
Indicate that a sequence or nucleotide is covalently, non-covalently, or
otherwise bound to something else:
misc_binding location
/bound_moiety="bound molecule"
5. Repeat features
Indicate repetitive sequence elements:
repeat_region location
6. Recombination features
Indicate regions that have been either inserted or deleted by recombination:
misc_recomb location
7. Structure features
Indicate sequence for which there is secondary or tertiary structural
information:
misc_structure location
In addition to the functional groupings shown above, the feature keys can also
be arranged in a hierarchical tree based on the degree of specificity or level
of detail known about a feature. This hierarchy is shown in outline form in
Appendix III where the most general level is the 'misc_feature' key and other
keys are arranged in increasing level of detail. By using more general keys,
features can be annotated even if their biological functions are
insufficiently well characterized to assign them more specific keys.
3.2.4 Feature key examples
Key Description
CDS Protein-coding sequence
RBS ribosome binding site
rep_origin Origin of replication
protein_bind Protein binding site on DNA
tRNA mature transfer RNA
See Appendix III for descriptions of all feature keys.
3.3 Qualifiers
3.3.1 Purpose
Qualifiers provide a general mechanism for supplying information about
features in addition to that conveyed by the key and location.
3.3.2 Format and conventions
Qualifiers take the form of a slash (/) followed by the qualifier name and, if
applicable, an equal sign (=) and a value. Each qualifier should have a single
value; if multiple values are necessary, these should be represented by
iterating the same qualifier, e.g.:
Key Location/Qualifiers
CDS 1..1000
/codon=(seq:"cug",aa:Ser)
/codon=(seq:"tga",aa:Trp)
If the location descriptor does not need a continuation line, the first
qualifier begins a new line in the feature location column. If the location
descriptor requires a continuation line, the first qualifier may follow
immediately after the location. Any necessary continuation lines begin in the
same column. See Section 4 for a complete description of data item positions.
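As an illustration of the qualifier form described above, the sketch below splits a qualifier into its name and optional value and groups repeated qualifiers. The helper names parse_qualifier and collect_qualifiers are hypothetical, and the code assumes each qualifier has already been assembled into a single string (continuation lines joined).

```python
from typing import Dict, Iterable, List, Optional, Tuple


def parse_qualifier(text: str) -> Tuple[str, Optional[str]]:
    """Split '/name=value' into (name, value); value is None if absent."""
    text = text.strip()
    if not text.startswith("/"):
        raise ValueError("a qualifier must begin with a slash: %r" % text)
    # Only the first '=' separates the name from the value.
    name, sep, value = text[1:].partition("=")
    if not name:
        raise ValueError("missing qualifier name: %r" % text)
    return name, (value if sep else None)


def collect_qualifiers(items: Iterable[str]) -> Dict[str, List[Optional[str]]]:
    """Group qualifiers by name, keeping repeated values in their order."""
    grouped: Dict[str, List[Optional[str]]] = {}
    for item in items:
        name, value = parse_qualifier(item)
        grouped.setdefault(name, []).append(value)
    return grouped
```

Values are returned exactly as written, including any surrounding quotation marks, so that the caller can decide how to interpret each value format.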
3.3.3 Qualifier values
Since qualifiers convey many different types of information, there are several value formats:
1. Free text
2. Controlled vocabulary or enumerated values
3. Citation or reference numbers
4. Sequences
5. Feature labels
3.3.3.1 Free text
Most qualifier values will be a descriptive text phrase which must be enclosed
in double quotation marks. When the text occupies more than one line, a single
set of quotation marks is required at the beginning and at the end of the
text. The text itself may be composed of any printable characters (ASCII
values 32-126 decimal). If double quotation marks are used within a free text
string, each set (") must be 'escaped' by placing a second double quotation
mark immediately before it (""). For example:
/note="This is an example of ""escaped"" quotation marks"
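The escaping rule can be sketched as a pair of helper functions. The names quote_free_text and unquote_free_text are hypothetical; the character range check follows the ASCII 32-126 limit given above.

```python
def quote_free_text(text: str) -> str:
    """Wrap free text in double quotes, doubling any embedded quotes."""
    for ch in text:
        if not 32 <= ord(ch) <= 126:
            raise ValueError("non-printable or non-ASCII character: %r" % ch)
    return '"' + text.replace('"', '""') + '"'


def unquote_free_text(value: str) -> str:
    """Reverse quote_free_text: strip the outer quotes and undouble the rest."""
    if len(value) < 2 or value[0] != '"' or value[-1] != '"':
        raise ValueError("free text must be enclosed in double quotation marks")
    return value[1:-1].replace('""', '"')
```

Applying unquote_free_text to the value in the example above gives back the plain sentence with single pairs of quotation marks around the word "escaped".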
3.3.3.2 Controlled vocabulary or enumerated values
Some qualifiers require values from a controlled vocabulary and are entered
without quotation marks. For example, the '/direction' qualifier has only
three values: 'left', 'right' or 'both'. Qualifier value controlled
vocabularies, like feature table component names, must be treated as
completely case insensitive: they may be entered and displayed in any
combination of upper and lower case ('/direction=Left', '/direction=left' and
'/direction=LEFT' are all legal and all convey the same meaning). The database
staffs reserve the right to regularize the case of qualifier values in the
interest of readability, unlike the case of feature labels where the databases
will maintain the case as originally entered (see Section 3.4.2). Qualifier
value controlled vocabularies will be maintained by the cooperating database
staffs. Examples of controlled vocabularies can be found in Appendices IV and
V. The database staff should be contacted for the current lists.
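A case-insensitive check of a controlled-vocabulary value might look like the sketch below. Only the three '/direction' values named above are used; the function name normalize_direction and the choice of lower case as the regularized form are assumptions made for illustration.

```python
# The three values listed above for the '/direction' qualifier.
DIRECTION_VALUES = frozenset({"left", "right", "both"})


def normalize_direction(value: str) -> str:
    """Validate a '/direction' value case-insensitively; return it in lower case."""
    lowered = value.lower()
    if lowered not in DIRECTION_VALUES:
        raise ValueError("not a legal /direction value: %r" % value)
    return lowered
```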
3.3.3.3 Citation or reference numbers
The citation or published reference number (as enumerated in the entry
'REFERENCE' or 'RN' data item) should be enclosed in square brackets
(e.g., [3]) to distinguish it from other numbers.
3.3.3.4 Sequences
Literal sequences of nucleotide bases in location descriptors, e.g.,
join(12..45,"atgcatt",988..1050), have been illegal since the implementation of
version 2.1 of the Feature Table Definition Document (December 15, 1998).
3.3.4 Qualifier examples
Key Location/Qualifiers
source 1..1509
/organism="Mus musculus"
/strain="CD1"
/mol_type="genomic DNA"
promoter <1..9
/gene="ubc42"
mRNA join(10..567,789..1320)
/gene="ubc42"
CDS join(54..567,789..1254)
/gene="ubc42"
/product="ubiquitin conjugating enzyme"
/function="cell division control"
3.4 Feature labels
The /label= qualifier takes as its value a feature label. Feature labels
follow the same naming conventions as other feature table components (e.g.,
keys and qualifiers). While feature labels are optional, attaching a label to
a feature allows it to be referred to unambiguously. For example, the feature
label can be used to refer unambiguously to a coding region that exists in a
different entry to the exons of which it is comprised.
3.4.1 Purpose
The feature label identifies a feature item within an entry and, when combined
with the entry's primary accession number and the name of the database from
which it came, is a permanent internationally unique tag for that feature.
There are, however, certain situations in which a "permanent" feature may
"disappear" from the distributed version of the database and others in which it
may be desirable to change a feature's label.
3.4.2 Format and conventions
Each feature in a feature table may have a label which must be unique within
that entry, but which may be the same as feature labels used in other entries.
A feature can be given any label. However, labels containing meaningful
abbreviations will be much more easily remembered than non-descriptive labels.
Because letter case is not significant, two features within one entry cannot
have labels that differ only in case: '16S_rRNA' and '16s_rRNA' could not both
be used in the same entry.
The full feature name syntax is as follows:
Database name::primary accession number:feature label
References to a feature should use as much of the full feature name as
required to unambiguously identify the feature.
3.4.3 Examples of feature labels
Feature label Description
adhI adhI gene coding for alcohol dehydrogenase
tfp35 tail fiber protein 35
3'-ltr long terminal repeat
a1col_x51 prepro-alpha-1-collagen, exon 51
X10045:diff1 first conflict for the sequence of entry X10045
GB::K10675:catexA feature with label catexA in entry K10675 of the
GenBank databank
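The full feature name syntax, including the shortened forms shown in the examples, can be split into its parts as follows. This is a sketch only; the FeatureName type and parse_feature_name function are hypothetical names, and no check is made that the database abbreviation or accession number actually exists.

```python
from typing import NamedTuple, Optional


class FeatureName(NamedTuple):
    database: Optional[str]   # e.g. "GB"; None when omitted
    accession: Optional[str]  # primary accession number; None when omitted
    label: str                # the feature label itself


def parse_feature_name(text: str) -> FeatureName:
    """Split 'Database::accession:label', allowing the leading parts to be absent."""
    database = None
    rest = text
    if "::" in rest:
        database, rest = rest.split("::", 1)
    accession = None
    label = rest
    if ":" in rest:
        accession, label = rest.split(":", 1)
    if not label:
        raise ValueError("missing feature label in %r" % text)
    return FeatureName(database, accession, label)
```

Because a colon is not among the characters permitted in component names, splitting on the separators is unambiguous.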
3.5 Location
3.5.1 Purpose
The location indicates the region of the presented sequence which corresponds
to a feature.
3.5.2 Format and conventions
The location contains at least one sequence location descriptor and may
contain one or more operators with one or more sequence location descriptors.
Base numbers refer to the numbering in the entry. This numbering designates
the first base (5' end) of the presented sequence as base 1.
Base locations beyond the range of the presented sequence may not be used in
location descriptors, the only exception being location in a remote entry (see
3.5.2.1, e).
Location operators and descriptors are discussed in more detail below.
3.5.2.1 Location descriptors
The location descriptor can be one of the following:
(a) a single base number
(b) a site between two indicated adjoining bases
(c) a single base chosen from within a specified range of bases (not allowed for new
entries)
(d) the base numbers delimiting a sequence span
(e) a remote entry identifier followed by a local location descriptor
(i.e., a-d)
A site between two adjoining nucleotides, such as an endonucleolytic cleavage
site, is indicated by listing the two points separated by a caret (^). The
permitted formats for this descriptor are n^n+1 (for example 55^56), or, for
circular molecules, n^1, where "n" is the full length of the molecule, i.e.,
1000^1 for a circular molecule with length 1000.
A single base chosen from a range of bases is indicated by the first base
number and the last base number of the range separated by a single period
(e.g., '12.21' indicates a single base taken from between the indicated
points). From October 2006 the usage of this descriptor is restricted:
it is illegal to use "a single base from a range" (c) either on its own or
in combination with the "sequence span" (d) descriptor for newly created entries.
Existing entries that contain such descriptors are going to be retrofitted.
Sequence spans are indicated by the starting base number and the ending base
number separated by two periods (e.g., '34..456'). The '<' and '>' symbols may
be used with the starting and ending base numbers to indicate that an end
point is beyond the specified base number. The starting and ending base
positions can be represented as distinct base numbers ('34..456') or a site
between two indicated adjoining bases.
A location in a remote entry (not the entry to which the feature table
belongs) can be specified by giving the accession number and sequence version
of the remote entry, followed by a colon ":", followed by a location
descriptor which applies to that entry's sequence (e.g., J12345.1:1..15; see
also the examples below).
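The descriptor forms (a) to (e) can be recognized with a few regular expressions. The sketch below is illustrative rather than normative: the Descriptor type, the parse_descriptor function and the pattern used for the accession.version prefix are assumptions. The '<' and '>' symbols are accepted only on spans, and the restriction on form (c) for new entries is not enforced.

```python
import re
from typing import NamedTuple, Optional


class Descriptor(NamedTuple):
    kind: str               # "base", "site", "one_of" or "span"
    start: int
    end: int
    start_open: bool        # '<' given before the starting base
    end_open: bool          # '>' given before the ending base
    remote: Optional[str]   # accession.version of a remote entry, if any


_REMOTE = re.compile(r"([A-Za-z][A-Za-z0-9_]*\.[0-9]+):(.+)")
_SPAN = re.compile(r"(<?)([0-9]+)\.\.(>?)([0-9]+)")
_ONE_OF = re.compile(r"([0-9]+)\.([0-9]+)")
_SITE = re.compile(r"([0-9]+)\^([0-9]+)")
_BASE = re.compile(r"[0-9]+")


def parse_descriptor(text: str) -> Descriptor:
    """Classify one location descriptor (no operators) and return its parts."""
    remote = None
    m = _REMOTE.fullmatch(text)
    if m:  # form (e): remote entry identifier followed by a local descriptor
        remote, text = m.group(1), m.group(2)
    m = _SPAN.fullmatch(text)
    if m:  # form (d): sequence span
        return Descriptor("span", int(m.group(2)), int(m.group(4)),
                          m.group(1) == "<", m.group(3) == ">", remote)
    m = _ONE_OF.fullmatch(text)
    if m:  # form (c): a single base from within a range
        return Descriptor("one_of", int(m.group(1)), int(m.group(2)),
                          False, False, remote)
    m = _SITE.fullmatch(text)
    if m:  # form (b): site between adjoining bases, n^n+1 or n^1 (circular)
        start, end = int(m.group(1)), int(m.group(2))
        if end != start + 1 and end != 1:
            raise ValueError("site must be n^n+1 or n^1: %r" % text)
        return Descriptor("site", start, end, False, False, remote)
    if _BASE.fullmatch(text):  # form (a): single base number
        return Descriptor("base", int(text), int(text), False, False, remote)
    raise ValueError("unrecognized location descriptor: %r" % text)
```

Note that the n^1 form is accepted without knowing the molecule length, so a caller that has the sequence at hand would still need to confirm that n equals that length.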
3.5.2.2 Operators
The location operator is a prefix that specifies what must be done to the
indicated sequence to find or construct the location corresponding to the
feature. A list of operators is given below with their definitions and most
common format.
complement(location)
Find the complement of the presented sequence in the span specified by
"location" (i.e., read the complement of the presented strand in its 5'-to-3'
direction)
join(location,location, ... location)
The indicated elements should be joined (placed end-to-end) to form one
contiguous sequence
order(location,location, ... location)
The elements can be found in the
specified order (5' to 3' direction), but nothing is implied about the
reasonableness of joining them
Note: the location operator "complement" can be used in combination with either
"join" or "order" within the same location; combinations of "join" and "order"
within the same location (nested operators) are illegal.
3.5.3 Location examples
The following is a list of common location descriptors with their meanings:
Location Description
467 Points to a single base in the presented sequence
340..565 Points to a continuous range of bases bounded by and
including the starting and ending bases
<345..500 Indicates that the exact lower boundary point of a feature
is unknown. The location begins at some base previous to
the first base specified (which need not be contained in
the presented sequence) and continues to and includes the
ending base
<1..888 The feature starts before the first sequenced base and
continues to and includes base 888
1..>888 The feature starts at the first sequenced base and
continues beyond base 888
102.110 Indicates that the exact location is unknown but that it is
one of the bases between bases 102 and 110, inclusive
123^124 Points to a site between bases 123 and 124
join(12..78,134..202) Regions 12 to 78 and 134 to 202 should be joined to form
one contiguous sequence
complement(34..126) Start at the base complementary to 126 and finish at the
base complementary to base 34 (the feature is on the strand
complementary to the presented strand)
complement(join(2691..4571,4918..5163))
Joins regions 2691 to 4571 and 4918 to 5163, then
complements the joined segments (the feature is on the
strand complementary to the presented strand)
join(complement(4918..5163),complement(2691..4571))
Complements regions 4918 to 5163 and 2691 to 4571, then
joins the complemented segments (the feature is on the
strand complementary to the presented strand)
J00194.1:100..202 Points to bases 100 to 202, inclusive, in the entry (in
this database) with primary accession number 'J00194'
join(1..100,J00194.1:100..202)
Joins region 1..100 of the existing entry with the region
100..202 of remote entry J00194
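To make the meaning of the operators concrete, the following sketch extracts the bases of a feature from a sequence held as a Python string. It is a simplified illustration under stated assumptions: the function names extract and reverse_complement are hypothetical; only single bases, spans, "join" and "complement" are handled; '<' and '>' markers are ignored; and "order", remote entries, sites and "one of" descriptors are not supported.

```python
from typing import List

# Complement table for unambiguous bases only (an assumption of this sketch).
_COMPLEMENT = str.maketrans("acgtACGT", "tgcaTGCA")


def reverse_complement(seq: str) -> str:
    """Return the complementary strand read in its own 5'-to-3' direction."""
    return seq.translate(_COMPLEMENT)[::-1]


def _split_top_level(args: str) -> List[str]:
    """Split an operand list on commas that are not inside parentheses."""
    parts, current, depth = [], [], 0
    for ch in args:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        if ch == "," and depth == 0:
            parts.append("".join(current))
            current = []
        else:
            current.append(ch)
    parts.append("".join(current))
    return parts


def extract(seq: str, location: str) -> str:
    """Return the bases of `seq` described by `location` (base 1 is seq[0])."""
    location = location.strip()
    if location.startswith("complement(") and location.endswith(")"):
        inner = location[len("complement("):-1]
        return reverse_complement(extract(seq, inner))
    if location.startswith("join(") and location.endswith(")"):
        inner = location[len("join("):-1]
        return "".join(extract(seq, part) for part in _split_top_level(inner))
    # A single base ("467") or a span ("340..565"); '<' and '>' are dropped.
    start_text, sep, end_text = location.partition("..")
    start = int(start_text.lstrip("<"))
    end = int(end_text.lstrip(">")) if sep else start
    if not 1 <= start <= end <= len(seq):
        raise ValueError("location outside the presented sequence: %r" % location)
    return seq[start - 1:end]
```

With this sketch the two equivalent forms shown in the table, complement(join(...)) and join(complement(...),complement(...)) with the segments listed in reverse order, yield the same string.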
4 Feature table Format
The examples below show the preferred sequence annotations for a number of
commonly occurring sequence types. These examples may not be appropriate in
all cases but should be used as a guide whenever possible. This section
describes the columnar format used to write this feature table in "flat-file"
form for distributions of the database.
4.1 Format examples
Feature table format example (EMBL):
source 1..1859
/db_xref="taxon:3899"
/organism="Trifolium repens"
/tissue_type="leaves"
/clone_lib="lambda gt10"
/clone="TRE361"
/mol_type="genomic DNA"
CDS 14..1495
/db_xref="MENDEL:11000"
/db_xref="SWISS-PROT:P26204"
/note="non-cyanogenic"
/EC_number="3.2.1.21"
/product="beta-glucosidase"
/protein_id="CAA40058.1"
/translation="MDFIVAIFALFVISSFTITSTNAVEASTLLDIGNLSR.......
---------+---------+---------+---------+---------+---------+---------+---------
1 10 20 30 40 50 60 70 79
Feature table format example (GenBank):
source 1..8959
/organism="Homo sapiens"
/db_xref="taxon:9606"
/mol_type="genomic DNA"
gene 212..8668
/gene="NF1"
CDS 212..8668
/gene="NF1"
/note="putative"
/codon_start=1
/product="GAP-related protein"
/protein_id="AAA59924.1"
/translation="MAAHRPVEWVQAVVSRFDEQLPIKTGQQNTHTKVSTE.......
---------+---------+---------+---------+---------+---------+---------+---------
1 10 20 30 40 50 60 70 79
Feature table format example (DDBJ):
source 1..2136
/clone="pK28"
/organism="Rattus norvegicus"
/strain="Sprague-Dawley"
/tissue_type="kidney"
/mol_type="genomic DNA"
mRNA 19..2128
CDS 31..1212
/codon_start=1
/function="Dual specificity protein tyrosine/threonine
kinase"
/product="MAP kinase kinase"
/protein_id="BAA02603.1"
/translation="MPKKKPTPIQLNPAPDGSAVNGTSSAETNLEALQKKL.......
---------+---------+---------+---------+---------+---------+---------+---------
1 10 20 30 40 50 60 70 79
4.2 Definition of line types
The feature table consists of a header line, which contains the column titles
for the table, and the individual feature entries. Each feature entry is
composed of a feature descriptor line and qualifier and continuation lines,
if needed. The feature descriptor line contains the feature's name, key, and
location. If the location cannot be contained on the first line of the feature
descriptor, it is continued on a continuation line immediately following the
descriptor line. If the feature requires further attributes, feature qualifier
lines immediately follow the corresponding feature descriptor line (or its
continuation). Qualifier information that cannot be contained on one line
continues on the following continuation lines as necessary.
Thus, there are 4 types of feature table lines:
Line type            Content                  #/entry      #/feature
---------            -------                  -------      ---------
Header               Column titles            1*           N/A
Feature descriptor   Key and location         1 to many*   1
Feature qualifiers   Qualifiers and values    N/A          0 to many
Continuation lines   Feature descriptor or    0 to many    0 to many
                     qualifier continuation
4.3 Data item positions
The position of the data items within the feature descriptor line is as follows:
column position      data item
---------------      ---------
1-5                  blank
6-20                 feature key
21                   blank
22-80                location
Data on the qualifier and continuation lines begins in column position 22 (the
first 21 columns contain blanks). The EMBL format for all lines differs from
the GenBank / DDBJ formats in that it includes a line type abbreviation in
columns 1 and 2.
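The column layout can be turned into a small reader for flat-file feature tables. The sketch below is an illustration under several assumptions: the names format_line and parse_feature_table are hypothetical; any line with text in columns 1-5 other than the EMBL "FT" abbreviation (for example a header line) is skipped; location continuations are joined without a space and qualifier continuations with a single space; and a continuation line that happens to begin with a slash would be misread as a new qualifier.

```python
from typing import Dict, Iterable, List


def format_line(key: str, data: str) -> str:
    """Lay out one line: feature key in columns 6-20, data from column 22."""
    return " " * 5 + key.ljust(15) + " " + data


def parse_feature_table(lines: Iterable[str]) -> List[Dict]:
    """Group descriptor, qualifier and continuation lines into features."""
    features: List[Dict] = []
    in_location = False
    for raw in lines:
        line = raw.rstrip("\n")
        if line.startswith("FT"):
            # EMBL lines carry a line type abbreviation in columns 1 and 2.
            line = "  " + line[2:]
        if line[:5].strip():
            continue  # header or other non-feature line
        key = line[5:20].strip()
        data = line[21:].strip()
        if key:
            # Feature descriptor line: key plus (start of) location.
            features.append({"key": key, "location": data, "qualifiers": []})
            in_location = True
        elif not features or not data:
            continue
        elif data.startswith("/"):
            features[-1]["qualifiers"].append(data)
            in_location = False
        elif in_location:
            features[-1]["location"] += data
        else:
            features[-1]["qualifiers"][-1] += " " + data
    return features
```

Joining qualifier continuations with a space suits wrapped free text; a value such as /translation, which is wrapped without spaces, would need the inserted blanks removed afterwards.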
4.4 Use of blanks
Blanks (spaces) may, in general, be used within the feature location and
qualifier values to make the construction more readable. The following rules
should be observed:
* Names of feature table components may not contain blanks (see Section 3.1)
* Operator names may not be separated from the following open parenthesis (the
beginning of the operand list) by blanks.
* Qualifiers may not be separated from the preceding slash or the following
equals sign (if any) by blanks.
5 Examples of sequence annotation
The examples below show the preferred sequence annotations for a number of
commonly occurring sequence types. These examples may not be appropriate in
all cases but should be used as a guide whenever possible.
5.1 Eukaryotic gene
source 1..1509
/organism="Mus musculus"
/strain="CD1"
/mol_type="genomic DNA"
promoter <1..9
/gene="ubc42"
mRNA join(10..567,789..1320)
/gene="ubc42"
CDS join(54..567,789..1254)
/gene="ubc42"
/product="ubiquitin conjugating enzyme"
/function="cell division control"
/translation="MVSSFLLAEYKNLIVNPSEHFKISVNEDNLTEGPPDTLY
QKIDTVLLSVISLLNEPNPDSPANVDAAKSYRKYLYKEDLESYPMEKSLDECS
AEDIEYFKNVPVNVLPVPSDDYEDEEMEDGTYILTYDDEDEEEDEEMDDE"
exon 10..567
/gene="ubc42"
/number=1
intron 568..788
/gene="ubc42"
/number=1
exon 789..1320
/gene="ubc42"
/number=2
polyA_signal 1310..1317
/gene="ubc42"
5.2 Bacterial operon
source 1..9430
/organism="Lactococcus sp."
/strain="MG1234"
/mol_type="genomic DNA"
operon 160..6865
/operon="gal"
-35_signal 160..165
/operon="gal"
/experiment="experimental evidence, no additional details
recorded"
-10_signal 179..184
/operon="gal"
/experiment="experimental evidence, no additional details
recorded"
CDS 405..1934
/operon="gal"
/gene="galA"
/product="galactose permease"
/function="galactose transporter"
/experiment="experimental evidence, no additional details
recorded"
CDS 2003..3001
/operon="gal"
/gene="galM"
/product="aldose 1-epimerase"
/EC_number="5.1.3.3"
/function="mutarotase"
CDS 3235..4537
/operon="gal"
/gene="galK"
/product="galactokinase"
/EC_number="2.7.1.6"
/experiment="experimental evidence, no additional details
recorded"
mRNA 189..6865
/operon="gal"
/experiment="experimental evidence, no additional details
recorded"
5.3 Artificial cloning vector (circular)
source 1..5300
/organism="Cloning vector pABC"
/lab_host="Escherichia coli"
/mol_type="other DNA"
/focus
source 1..5138
/organism="Escherichia coli"
/mol_type="other DNA"
/strain="K12"
source 5139..5247
/organism="Aequorea victoria"
/mol_type="other DNA"
/dev_stage="adult"
source 5248..5300
/organism="Escherichia coli"
/mol_type="other DNA"
/strain="K12"
CDS join(complement(<1..799),complement(5080..5120))
/gene="mob1"
/product="mobilization protein 1"
CDS complement(1697..2512)
/gene="Km"
/product="kanamycin resistance protein"
CDS 3037..3711
/gene="rep1"
/product="replication protein 1"
CDS complement(4170..4829)
/gene="Cm"
/product="chloramphenicol resistance protein"
CDS 5139..5247
/gene="GFP"
/product="green fluorescent protein"
5.4 Plasmid
source 1..2245
/organism="Escherichia coli"
/plasmid="Plasmid XYZ"
/strain="K12"
/mol_type="genomic DNA"
rep_origin 6
/direction=LEFT
/note="ori"
CDS join(complement(567..795),complement(21..349))
/gene="trbC"
/product="transfer protein C"
CDS 803..1344
/gene="traN"
/product="transfer protein N"
CDS 1559..1985
/gene="incA"
/product="incompatibility protein A"
CDS join(2004..2195,3..20)
/gene="finP"
/product="fertility inhibition protein P"
5.5 Repeat element
source 1..1011
/organism="Homo sapiens"
/clone="pha281u/1DO"
/mol_type="genomic DNA"
repeat_region 80..401
/rpt_type=DISPERSED
/rpt_family="Alu-J"
5.6 Immunoglobulin heavy chain
source 1..321
/organism="Mus musculus"
/strain="BALB/c"
/cell_line="hybridoma 1A4"
/rearranged
/mol_type="mRNA"
CDS <1..>321
/codon_start=1
/gene="VFM1-DFL16.1-JH4"
/product="immunoglobulin heavy chain"
V_region 1..277
/gene="VFM1"
/product="immunoglobulin heavy chain variable region"
5.7 T-cell receptor
source 1..402
/organism="Homo sapiens"
/sex="male"
/cell_type="CD4+ T-lymphocyte"
/rearranged
/clone="TCR1A.12"
/mol_type="mRNA"
sig_peptide 1..54
/gene="TCR1A"
CDS 1..402
/gene="TCR1A"
/product="T-cell receptor alpha chain"
mat_peptide 55..399
/gene="TCR1A"
/product="T-cell receptor alpha chain"
V_region 55..327
/gene="TCR1A"
J_segment 328..393
/gene="TCR1A"
C_region 394..399
/gene="TCR1A"
5.8 Transfer RNA
source 1..2345
/organism="Yersinia sp."
/strain="IP134"
/mol_type="genomic DNA"
-35_signal 644..650
/gene="tRNA-Leu(UUR)"
tRNA 655..730
/gene="tRNA-Leu(UUR)"
/anticodon=(pos:678..680,aa:Leu)
/product="transfer RNA-Leu(UUR)"
6 Limitations of this feature table design
During the development of the feature table design numerous choices between
simplicity and representational power had to be made. In order to create a
design which was capable of representing the most common features of
biological significance, a certain degree of complexity in the syntax was
guaranteed. However, to limit that level of complexity, certain limitations of
the design syntax have been accepted.
7 Appendices
7.1 Appendix I EMBL, GenBank and DDBJ entries
7.1.1 EMBL Format
ID X64011; SV 1; linear; genomic DNA; STD; PRO; 756 BP.
XX
AC X64011; S78972;
XX
SV X64011.1
XX
DT 28-APR-1992 (Rel. 31, Created)
DT 30-JUN-1993 (Rel. 36, Last updated, Version 6)
XX
DE Listeria ivanovii sod gene for superoxide dismutase
XX
KW sod gene; superoxide dismutase.
XX
OS Listeria ivanovii
OC Bacteria; Firmicutes; Bacillus/Clostridium group;
OC Bacillus/Staphylococcus group; Listeria.
XX
RN [1]
RX MEDLINE; 92140371.
RA Haas A., Goebel W.;
RT "Cloning of a superoxide dismutase gene from Listeria ivanovii by
RT functional complementation in Escherichia coli and characterization of the
RT gene product.";
RL Mol. Gen. Genet. 231:313-322(1992).
XX
RN [2]
RP 1-756
RA Kreft J.;
RT ;
RL Submitted (21-APR-1992) to the EMBL/GenBank/DDBJ databases.
RL J. Kreft, Institut f. Mikrobiologie, Universitaet Wuerzburg, Biozentrum Am
RL Hubland, 8700 Wuerzburg, FRG
XX
FH Key Location/Qualifiers
FH
FT source 1..756
FT /db_xref="taxon:1638"
FT /organism="Listeria ivanovii"
FT /strain="ATCC 19119"
FT /mol_type="genomic DNA"
FT RBS 95..100
FT /gene="sod"
FT terminator 723..746
FT /gene="sod"
FT CDS 109..717
FT /db_xref="SWISS-PROT:P28763"
FT /transl_table=11
FT /gene="sod"
FT /EC_number="1.15.1.1"
FT /db_xref="GOA:P28763"
FT /db_xref="HSSP:P00448"
FT /db_xref="InterPro:IPR001189"
FT /db_xref="UniProtKB/Swiss-Prot:P28763"
FT /product="superoxide dismutase"
FT /protein_id="CAA45406.1"
FT /translation="MTYELPKLPYTYDALEPNFDKETMEIHYTKHHNIYVTKLNEAVSG
FT HAELASKPGEELVANLDSVPEEIRGAVRNHGGGHANHTLFWSSLSPNGGGAPTGNLKAA
FT IESEFGTFDEFKEKFNAAAAARFGSGWAWLVVNNGKLEIVSTANQDSPLSEGKTPVLGL
FT DVWEHAYYLKFQNRRPEYIDTFWNVINWDERNKRFDAAK"
XX
SQ Sequence 756 BP; 247 A; 136 C; 151 G; 222 T; 0 other;
cgttatttaa ggtgttacat agttctatgg aaatagggtc tatacctttc gccttacaat 60
gtaatttctt .......... 120
//
7.1.2 GenBank Format
LOCUS LISOD 756 bp DNA linear BCT 30-JUN-1993
DEFINITION Listeria ivanovii sod gene for superoxide dismutase.
ACCESSION X64011 S78972
VERSION X64011.1 GI:44010
KEYWORDS sod gene; superoxide dismutase.
SOURCE Listeria ivanovii
ORGANISM Listeria ivanovii
Bacteria; Firmicutes; Bacillales; Listeriaceae; Listeria.
REFERENCE 1 (bases 1 to 756)
AUTHORS Haas,A. and Goebel,W.
TITLE Cloning of a superoxide dismutase gene from Listeria ivanovii by
functional complementation in Escherichia coli and characterization
of the gene product
JOURNAL Mol. Gen. Genet. 231 (2), 313-322 (1992)
MEDLINE 92140371
REFERENCE 2 (bases 1 to 756)
AUTHORS Kreft,J.
TITLE Direct Submission
JOURNAL Submitted (21-APR-1992) J. Kreft, Institut f. Mikrobiologie,
Universitaet Wuerzburg, Biozentrum Am Hubland, 8700 Wuerzburg, FRG
FEATURES Location/Qualifiers
source 1..756
/organism="Listeria ivanovii"
/strain="ATCC 19119"
/db_xref="taxon:1638"
/mol_type="genomic DNA"
RBS 95..100
/gene="sod"
gene 95..746
/gene="sod"
CDS 109..717
/gene="sod"
/EC_number="1.15.1.1"
/codon_start=1
/transl_table=11
/product="superoxide dismutase"
/db_xref="GI:44011"
/db_xref="GOA:P28763"
/db_xref="InterPro:IPR001189"
/db_xref="UniProtKB/Swiss-Prot:P28763"
/protein_id="CAA45406.1"
/translation="MTYELPKLPYTYDALEPNFDKETMEIHYTKHHNIYVTKLNEAVS
GHAELASKPGEELVANLDSVPEEIRGAVRNHGGGHANHTLFWSSLSPNGGGAPTGNLK
AAIESEFGTFDEFKEKFNAAAAARFGSGWAWLVVNNGKLEIVSTANQDSPLSEGKTPV
LGLDVWEHAYYLKFQNRRPEYIDTFWNVINWDERNKRFDAAK"
terminator 723..746
/gene="sod"
ORIGIN
1 cgttatttaa ggtgttacat agttctatgg aaatagggtc tatacctttc gccttacaat
61 gtaatttctt ..........
//
7.1.3 DDBJ Format
LOCUS LISOD 756 bp DNA linear BCT 30-JUN-1993
DEFINITION Listeria ivanovii sod gene for superoxide dismutase.
ACCESSION X64011 S78972
VERSION X64011.1 GI:44010
KEYWORDS sod gene; superoxide dismutase.
SOURCE Listeria ivanovii
ORGANISM Listeria ivanovii
Bacteria; Firmicutes; Bacillales; Listeriaceae; Listeria.
REFERENCE 1 (bases 1 to 756)
AUTHORS Haas,A. and Goebel,W.
TITLE Cloning of a superoxide dismutase gene from Listeria ivanovii by
functional complementation in Escherichia coli and characterization
of the gene product
JOURNAL Mol. Gen. Genet. 231 (2), 313-322 (1992)
MEDLINE 92140371
REFERENCE 2 (bases 1 to 756)
AUTHORS Kreft,J.
TITLE Direct Submission
JOURNAL Submitted (21-APR-1992) J. Kreft, Institut f. Mikrobiologie,
Universitaet Wuerzburg, Biozentrum Am Hubland, 8700 Wuerzburg, FRG
FEATURES Location/Qualifiers
source 1..756
/organism="Listeria ivanovii"
/strain="ATCC 19119"
/db_xref="taxon:1638"
/mol_type="genomic DNA"
RBS 95..100
/gene="sod"
gene 95..746
/gene="sod"
CDS 109..717
/gene="sod"
/EC_number="1.15.1.1"
/codon_start=1
/transl_table=11
/product="superoxide dismutase"
/db_xref="GOA:P28763"
/db_xref="HSSP:P00448"
/db_xref="InterPro:IPR001189"
/db_xref="UniProtKB/Swiss-Prot:P28763"
/protein_id="CAA45406.1"
/db_xref="SWISS-PROT:P28763"
/translation="MTYELPKLPYTYDALEPNFDKETMEIHYTKHHNIYVTKLNEAVS
GHAELASKPGEELVANLDSVPEEIRGAVRNHGGGHANHTLFWSSLSPNGGGAPTGNLK
AAIESEFGTFDEFKEKFNAAAAARFGSGWAWLVVNNGKLEIVSTANQDSPLSEGKTPV
LGLDVWEHAYYLKFQNRRPEYIDTFWNVINWDERNKRFDAAK"
terminator 723..746
/gene="sod"
BASE COUNT 247 a 136 c 151 g 222 t
ORIGIN
1 cgttatttaa ggtgttacat agttctatgg aaatagggtc tatacctttc gccttacaat
61 gtaatttctt ..........
//
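The feature table portion of the EMBL-format entry above can be read
mechanically. The following Python sketch is illustrative only and is not part
of the format definition: the function name parse_ft_lines and the layout of
the returned records are assumptions made for this example. It relies on
whitespace rather than on the fixed column positions described in Section 4.3,
keeps qualifier values exactly as written (including their quotes), and does
not handle a location that continues over several lines.

```python
def parse_ft_lines(lines):
    """Group EMBL-style 'FT' lines into feature records.

    Each record is a dict with 'key', 'location' and 'qualifiers',
    where 'qualifiers' is a list of (name, raw_value) tuples.
    """
    features = []
    open_quote = False  # True while a quoted value continues on the next line
    for raw in lines:
        if not raw.startswith("FT"):
            continue  # header (FH), spacer (XX) and other line types
        body = raw[2:].strip()
        if not body:
            continue
        if open_quote:
            # Continuation of a quoted value such as /translation.
            # Joined without a separator; a wrapped /note would need a space.
            name, value = features[-1]["qualifiers"][-1]
            features[-1]["qualifiers"][-1] = (name, value + body)
            # An odd number of quote characters closes the value.
            open_quote = body.count('"') % 2 == 0
        elif body.startswith("/"):
            if not features:
                raise ValueError("qualifier found before any feature key")
            name, _, value = body[1:].partition("=")
            features[-1]["qualifiers"].append((name, value))
            open_quote = value.count('"') % 2 == 1
        else:
            key, _, location = body.partition(" ")
            features.append(
                {"key": key, "location": location.strip(), "qualifiers": []}
            )
    return features
```

Applied to the FT lines of entry X64011, this would yield one record per
feature key (source, RBS, terminator, CDS), each with its qualifiers in the
order in which they appear.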
7.2 Appendix II Feature table: Backus-Naur form
This information will not be presented in future editions of this document.
The feature table is a mandatory part of an entry. The full entry syntax is
specified elsewhere.
feature_table ::= <feature_table_header><feature_table_body>
feature_table_header ::=
FH Key Location/Qualifiers |
FEATURES Location/Qualifiers
feature_table_body ::= <feature> | <feature_table_body><feature>
At least one feature is required.
feature ::= <feature_key><feature_details>
Key is required, location required, qualifier list optional
feature_key ::= <symbol> | -
feature_details ::= <location><qualifier_list> | <location>
There exists a table of legal keys.
location ::= <absolute_location> | <feature_name> |
<functional_operator>(<location_list>)
absolute_location ::= <local_location> | <path> : <local_location>
path ::= <database> :: <primary_accession> | <primary_accession>
feature_name ::= <path>:<feature_label> | <feature_label>
feature_label ::= <symbol>
local_location ::= <base_position> | <between_position> | <base_range>
location_list ::= <location> | <location_list>,<location>
functional_operator ::= <symbol>
base_position ::= <integer> | <low_base_bound> | <high_base_bound> |
<two_base_bound>
low_base_bound ::= > <integer>
high_base_bound ::= < <integer>
two_base_bound ::= <base_position>.<base_position>
between_position ::= <base_position>^<base_position>
base_range ::= <base_position>..<base_position>
database ::= <symbol>
primary_accession ::= <symbol>
sequence_character ::= a | b | c | d | g | h | k | m | n | r | s | t | u | v | w | y
qualifier_list ::= <qualifier> | <qualifier_list><qualifier>
qualifier ::= /<qualifier_name> | /<qualifier_name>=<value>
qualifier_name ::= <symbol>
value ::= <simple_value> | (<value_list>) | (<tagged_value_list>)
simple_value ::= <integer> | <location> | <reference_number> | "<text_string>" |
<symbol>
value_list ::= <value> | <value_list>,<value>
tagged_value_list ::= <tagged_value> | <tagged_value_list>,<tagged_value>
tagged_value ::= <tag>:<value>
tag ::= <symbol>
reference_number ::= [ <unsigned_integer> ]
symbol ::= <letter> | <symbol><symbol_character> | <symbol_character><symbol>
text_string ::= <string_character>| <text_string><string_character>
unsigned_integer ::= <digit> | <unsigned_integer><digit>
integer ::= <unsigned_integer> | - <unsigned_integer>
string_character ::= <letter> | <digit> | <punctuation> | ""
symbol_character ::= <up_case_letter> | <low_case_letter> |<digit> | _ | - | ' | *
letter ::= <up_case_letter> | <low_case_letter>
up_case_letter ::= A | B| ... | Z
low_case_letter ::= a | b | ... | z
digit ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
punctuation ::= <space> | ! | # | $ | % | & | ' | ( | ) | * | + | , |
- | . | / | : | ; | < | = | > | ? | @ | [ | \ | ] | ^ | _ | ` | { |
<bar> | } | ~
bar ::= |
space ::= ascii 32
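As an informal illustration of the location productions above, the following
Python sketch parses local locations and functional operators into nested
tuples. It is not a reference implementation. The function name
parse_location and the tuple tags ("base", "range", "between", "one_of") are
assumptions made for this example, and paths (<database>::<primary_accession>)
and feature-name references are deliberately left out, as is a two-base bound
used as the end point of a range.

```python
import re

_POS = r"[<>]?\d+"  # an integer, optionally marked with '<' or '>'


def parse_location(text):
    """Parse a location string into a nested tuple structure."""
    node, rest = _location(text.replace(" ", ""))
    if rest:
        raise ValueError(f"unexpected trailing text: {rest!r}")
    return node


def _location(s):
    # <functional_operator>(<location_list>)
    m = re.match(r"([A-Za-z_]+)\(", s)
    if m:
        items, rest = [], s[m.end():]
        while True:
            item, rest = _location(rest)
            items.append(item)
            if rest.startswith(","):
                rest = rest[1:]
            elif rest.startswith(")"):
                return (m.group(1), items), rest[1:]
            else:
                raise ValueError(f"expected ',' or ')' at: {rest!r}")
    # <base_range>, <between_position> or <two_base_bound>
    m = re.match(rf"({_POS})(\.\.|\^|\.)({_POS})", s)
    if m:
        kind = {"..": "range", "^": "between", ".": "one_of"}[m.group(2)]
        return (kind, m.group(1), m.group(3)), s[m.end():]
    # a single <base_position>
    m = re.match(_POS, s)
    if m:
        return ("base", m.group(0)), s[m.end():]
    raise ValueError(f"cannot parse location at: {s!r}")
```

Positions are kept as strings so that the '<' and '>' markers survive the
parse; a caller that needs numeric coordinates would strip the marker and
convert the remainder.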
7.3 Appendix III: Feature keys reference
The feature keys in this appendix are described using the following format:
Feature Key the feature key name
Definition the definition of the key
Mandatory qualifiers qualifiers required with the key; if there are no
mandatory qualifiers, this field is omitted.
Optional qualifiers optional qualifiers associated with the key
Organism scope valid organisms for the key; if the scope is any
organism, this field is omitted.
Molecule scope valid molecule types; if the scope is any molecule
type, this field is omitted.
References citations of published reports, usually supporting the
feature consensus sequence
Comment comments and clarifications
Abbreviations:
accnum an entry primary accession number
<amino_acid> abbreviation for amino acid
<base_range> location descriptor for a simple range of bases
<bool> Boolean truth value. Valid values are yes and no
feature_label the feature label (follows naming conventions for all
feature table components)
<integer> unsigned integer value
<location> general feature location descriptor
<modified_base> abbreviation for modified nucleotide base
[number] integer representing number of citation in entry's
reference list
<repeat_type> value indicating the organization of a repeated
sequence.
"text" any text or character string. Since the string is
delimited by double quotes, double quotes may only
appear as part of the string if they appear in pairs.
For example, the sentence:
The feature label "ops-tata" is used with the
"promoter" feature key
would be formatted thus:
"The feature label ""ops-tata"" is used with the
""promoter"" feature key"
Feature Key attenuator
Definition 1) region of DNA at which regulation of termination of
transcription occurs, which controls the expression
of some bacterial operons;
2) sequence segment located between the promoter and the
first structural gene that causes partial termination
of transcription
Optional qualifiers /allele="text"
/citation=[number]
/db_xref="<database>:<identifier>"
/experiment="text"
/gene="text"
/gene_synonym="text"
/inference="TYPE[ (same species)][:EVIDENCE_BASIS]"
/label=feature_label
/locus_tag="text" (single token)
/map="text"
/note="text"
/old_locus_tag="text" (single token)
/operon="text"
/phenotype="text"
Organism scope prokaryotes
Molecule scope DNA
Feature Key C_region
Definition constant region of immunoglobulin light and heavy
chains, and T-cell receptor alpha, beta, and gamma
chains; includes one or more exons depending on the
particular chain
Optional qualifiers /allele="text"
/citation=[number]
/db_xref="<database>:<identifier>"
/experiment="text"
/gene="text"
/gene_synonym="text"
/inference="TYPE[ (same species)][:EVIDENCE_BASIS]"
/label=feature_label
/locus_tag="text" (single token)
/map="text"
/note="text"
/old_locus_tag="text" (single token)
/product="text"
/pseudo
/standard_name="text"
Parent Key CDS
Organism scope eukaryotes
Feature Key CAAT_signal
Definition CAAT box; part of a conserved sequence located about 75
bp up-stream of the start point of eukaryotic
transcription units which may be involved in RNA
polymerase binding; consensus=GG(C or T)CAATCT [1,2].
Optional qualifiers /allele="text"
/citation=[number]
/db_xref="<database>:<identifier>"
/experiment="text"
/gene="text"
/gene_synonym="text"
/inference="TYPE[ (same species)][:EVIDENCE_BASIS]"
/label=feature_label
/locus_tag="text" (single token)
/map="text"
/note="text"
/old_locus_tag="text" (single token)
Organism scope eukaryotes and eukaryotic viruses
Molecule scope DNA
References [1] Efstratiadis, A. et al. Cell 21, 653-668 (1980)
[2] Nevins, J.R. "The pathway of eukaryotic mRNA formation"
Ann Rev Biochem 52, 441-466 (1983)
Feature Key CDS
Definition coding sequence; sequence of nucleotides that
corresponds with the sequence of amino acids in a
protein (location includes stop codon);
feature includes amino acid conceptual translation.
Optional qualifiers /allele="text"
/citation=[number]
/codon=(seq:"codon-sequence",aa:<amino_acid>)
/codon_start=<1 or 2 or 3>
/db_xref="<database>:<identifier>"
/EC_number="text"
/exception="text"
/experiment="text"
/function="text"
/gene="text"
/gene_synonym="text"
/inference="TYPE[ (same species)][:EVIDENCE_BASIS]"
/label=feature_label
/locus_tag="text" (single token)
/map="text"
/note="text"
/number=unquoted text (single token)
/old_locus_tag="text" (single token)
/operon="text"
/product="text"
/protein_id="<identifier>"
/pseudo
/ribosomal_slippage
/standard_name="text"
/translation="text"
/transl_except=(pos:<base_range>,aa:<amino_acid>)
/transl_table=<integer>
/trans_splicing
Comment /codon_start has valid value of 1 or 2 or 3, indicating
the offset at which the first complete codon of a coding
feature can be found, relative to the first base of
that feature;
/transl_table defines the genetic code table used if
other than the universal genetic code table;
genetic code exceptions outside the range of the specified
tables are reported in /codon or /transl_except qualifiers
/protein_id consists of a stable ID portion (3+5 format
with 3 position letters and 5 numbers) plus a version
number after the decimal point; when the protein
sequence encoded by the CDS changes, only the version
number of the /protein_id value is incremented; the
stable part of the /protein_id remains unchanged and as
a result will permanently be associated with a given
protein;
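The two rules in the comment above lend themselves to a short illustration.
The Python sketch below is a hedged example, not part of the definition: the
function names complete_codons and split_protein_id are hypothetical, and the
assumption that the three letters of a /protein_id are upper case is taken
from the example value "CAA45406.1" rather than stated by the text.

```python
import re


def complete_codons(sequence, codon_start=1):
    """Split a coding feature's bases into complete codons.

    codon_start is the 1-based offset of the first complete codon
    relative to the first base of the feature.
    """
    if codon_start not in (1, 2, 3):
        raise ValueError("/codon_start must be 1, 2 or 3")
    body = sequence[codon_start - 1:]
    usable = len(body) - len(body) % 3  # drop a trailing partial codon
    return [body[i:i + 3] for i in range(0, usable, 3)]


# Assumed shape: three upper-case letters, five digits, '.', version number.
_PROTEIN_ID = re.compile(r"([A-Z]{3}\d{5})\.(\d+)")


def split_protein_id(value):
    """Return (stable_id, version) for a /protein_id value."""
    m = _PROTEIN_ID.fullmatch(value)
    if m is None:
        raise ValueError(f"not in 3+5.version form: {value!r}")
    return m.group(1), int(m.group(2))
```

When the encoded protein sequence changes, only the second element returned
by split_protein_id would be expected to increase; the first stays fixed.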
Feature Key conflict
Definition independent determinations of the "same" sequence differ
at this site or region;
Mandatory qualifiers /citation=[number]
Or
/compare=[accession-number.sequence-version]
Optional qualifiers /allele="text"
/db_xref="<database>:<identifier>"
/experiment="text"
/gene="text"
/gene_synonym="text"
/inference="TYPE[ (same species)][:EVIDENCE_BASIS]"
/label=feature_label
/locus_tag="text" (single token)
/map="text"
/note="text"
/old_locus_tag="text" (single token)
/replace="text"
Comment use /replace="" to annotate deletion, e.g.
conflict 4..5
/replace=""
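The effect of a /replace value on the annotated span can be sketched as
follows. This Python fragment is an illustration under stated assumptions: the
function name apply_replace is hypothetical, the location is taken to be a
simple base range on the forward strand, and the replacement text is the
qualifier value with its enclosing quotes already removed.

```python
def apply_replace(sequence, start, end, replacement):
    """Return the sequence with bases start..end replaced.

    start and end are 1-based and inclusive, as in a feature location.
    An empty replacement models /replace="" (a deletion).
    """
    if not 1 <= start <= end <= len(sequence):
        raise ValueError("location lies outside the sequence")
    return sequence[:start - 1] + replacement + sequence[end:]
```

For the example above, apply_replace(sequence, 4, 5, "") would return the
sequence with bases 4 and 5 removed.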
Feature Key D-loop
Definition displacement loop; a region within mitochondrial DNA in
which a short stretch of RNA is paired with one strand
of DNA, displacing the original partner DNA strand in
this region; also used to describe the displacement of a
region of one strand of duplex DNA by a single stranded
invader in the reaction catalyzed by RecA protein
Optional qualifiers /allele="text"
/citation=[number]
/db_xref="<database>:<identifier>"
/experiment="text"
/gene="text"
/gene_synonym="text"
/inference="TYPE[ (same species)][:EVIDENCE_BASIS]"
/label=feature_label
/locus_tag="text" (single token)
/map="text"
/note="text"
/old_locus_tag="text" (single token)
Molecule scope DNA
Feature Key D_segment
Definition Diversity segment of immunoglobulin heavy chain, and
T-cell receptor beta chain;
Optional qualifiers /allele="text"
/citation=[number]
/db_xref="<database>:<identifier>"
/experiment="text"
/gene="text"
/gene_synonym="text"
/inference="TYPE[ (same species)][:EVIDENCE_BASIS]"
/label=feature_label
/locus_tag="text" (single token)
/map="text"
/note="text"
/old_locus_tag="text" (single token)
/product="text"
/pseudo
/standard_name="text"
Parent Key CDS
Organism scope eukaryotes
Feature Key enhancer
Definition a cis-acting sequence that increases the utilization of
(some) eukaryotic promoters, and can function in either
orientation and in any location (upstream or downstream)
relative to the promoter;
Optional qualifiers /allele="text"
/bound_moiety="text"
/citation=[number]
/db_xref="<database>:<identifier>"
/experiment="text"
/label=feature_label
/gene="text"
/gene_synonym="text"
/inference="TYPE[ (same species)][:EVIDENCE_BASIS]"
/locus_tag="text" (single token)
/map="text"
/note="text"
/old_locus_tag="text" (single token)
/standard_name="text"
Organism scope eukaryotes and eukaryotic viruses
Feature Key exon
Definition region of genome that codes for portion of spliced mRNA,
rRNA and tRNA; may contain 5'UTR, all CDSs and 3' UTR;
Optional qualifiers /allele="text"
/citation=[number]
/db_xref="<database>:<identifier>"
/EC_number="text"
/experiment="text"
/function="text"
/gene="text"
/gene_synonym="text"
/inference="TYPE[ (same species)][:EVIDENCE_BASIS]"
/label=feature_label
/locus_tag="text" (single token)
/map="text"
/note="text"
/number=unquoted text (single token)
/old_locus_tag="text" (single token)
/product="text"
/pseudo
/standard_name="text"
Feature Key gap
Definition gap in the sequence
Mandatory qualifiers /estimated_length=unknown or <integer>
Optional qualifiers /experiment="text"
/inference="TYPE[ (same species)][:EVIDENCE_BASIS]"
/map="text"
/note="text"
Comment the location span of the gap feature for an unknown
gap is 100 bp, with the 100 bp indicated as 100 "n"'s in
the sequence. Where estimated length is indicated by
an integer, this is indicated by the same number of
"n"'s in the sequence.
No upper or lower limit is set on the size of the gap.
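The relationship between /estimated_length and the sequence described in the
comment above can be checked mechanically. The Python sketch below is a
minimal illustration: the function name check_gap is hypothetical, and it
assumes a simple 1-based, inclusive location on the forward strand.

```python
def check_gap(sequence, start, end, estimated_length):
    """Check a gap feature against the bases it spans.

    estimated_length is either the string "unknown" or an integer.
    The span must consist only of "n" and have the expected length:
    100 for an unknown gap, otherwise the stated integer.
    """
    span = sequence[start - 1:end]
    if set(span.lower()) != {"n"}:
        return False
    expected = 100 if estimated_length == "unknown" else int(estimated_length)
    return len(span) == expected
```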
Feature Key GC_signal
Definition GC box; a conserved GC-rich region located upstream of
the start point of eukaryotic transcription units which
may occur in multiple copies or in either orientation;
consensus=GGGCGG;
Optional qualifiers /allele="text"
/citation=[number]
/db_xref="<database>:<identifier>"
/experiment="text"
/gene="text"
/gene_synonym="text"
/inference="TYPE[ (same species)][:EVIDENCE_BASIS]"
/label=feature_label
/locus_tag="text" (single token)
/map="text"
/note="text"
/old_locus_tag="text" (single token)
Organism scope eukaryotes and eukaryotic viruses
Feature Key gene
Definition region of biological interest identified as a gene
and for which a name has been assigned;
Optional qualifiers /allele="text"
/citation=[number]
/db_xref="<database>:<identifier>"
/experiment="text"
/function="text"
/gene="text"
/gene_synonym="text"
/inference="TYPE[ (same species)][:EVIDENCE_BASIS]"
/label=feature_label
/locus_tag="text" (single token)
/map="text"
/note="text"
/old_locus_tag="text" (single token)
/operon="text"
/product="text"
/pseudo
/phenotype="text"
/standard_name="text"
/trans_splicing
Comment the gene feature describes the interval of DNA that
corresponds to a genetic trait or phenotype; the feature is,
by definition, not strictly bound to its positions at the
ends; it is meant to represent a region where the gene is
located.
Feature Key iDNA
Definition intervening DNA; DNA which is eliminated through any of
several kinds of recombination;
Optional qualifiers /allele="text"
/citation=[number]
/db_xref="<database>:<identifier>"
/experiment="text"
/function="text"
/gene="text"
/gene_synonym="text"
/inference="TYPE[ (same species)][:EVIDENCE_BASIS]"
/label=feature_label
/locus_tag="text" (single token)
/map="text"
/note="text"
/number=unquoted text (single token)
/old_locus_tag="text" (single token)
/standard_name="text"
Molecule scope DNA
Comment e.g., in the somatic processing of immunoglobulin genes.
Feature Key intron
Definition a segment of DNA that is transcribed, but removed from
within the transcript by splicing together the sequences
(exons) on either side of it;
Optional qualifiers /allele="text"
/citation=[number]
/db_xref="<database>:<identifier>"
/experiment="text"
/function="text"
/gene="text"
/gene_synonym="text"
/inference="TYPE[ (same species)][:EVIDENCE_BASIS]"
/label=feature_label
/locus_tag="text" (single token)
/map="text"
/note="text"
/number=unquoted text (single token)
/old_locus_tag="text" (single token)
/pseudo
/standard_name="text"
Feature Key J_segment
Definition joining segment of immunoglobulin light and heavy
chains, and T-cell receptor alpha, beta, and gamma
chains;
Optional qualifiers /allele="text"
/citation=[number]
/db_xref="<database>:<identifier>"
/experiment="text"
/gene="text"
/gene_synonym="text"
/inference="TYPE[ (same species)][:EVIDENCE_BASIS]"
/locus_tag="text" (single token)
/map="text"
/note="text"
/old_locus_tag="text" (single token)
/product="text"
/pseudo
/standard_name="text"
Parent Key CDS
Organism scope eukaryotes
Feature Key LTR
Definition long terminal repeat, a sequence directly repeated at
both ends of a defined sequence, of the sort typically
found in retroviruses;
Optional qualifiers /allele="text"
/citation=[number]
/db_xref="<database>:<identifier>"
/experiment="text"
/function="text"
/gene="text"
/gene_synonym="text"
/inference="TYPE[ (same species)][:EVIDENCE_BASIS]"
/label=feature_label
/locus_tag="text" (single token)
/map="text"
/note="text"
/old_locus_tag="text" (single token)
/standard_name="text"
Feature Key mat_peptide
Definition mature peptide or protein coding sequence; coding
sequence for the mature or final peptide or protein
product following post-translational modification; the
location does not include the stop codon (unlike the
corresponding CDS);
Optional qualifiers /allele="text"
/citation=[number]
/db_xref="<database>:<identifier>"
/EC_number="text"
/experiment="text"
/function="text"
/gene="text"
/gene_synonym="text"
/inference="TYPE[ (same species)][:EVIDENCE_BASIS]"
/label=feature_label
/locus_tag="text" (single token)
/map="text"
/note="text"
/old_locus_tag="text" (single token)
/product="text"
/pseudo
/standard_name="text"
Feature Key misc_binding
Definition site in nucleic acid which covalently or non-covalently
binds another moiety that cannot be described by any
other binding key (primer_bind or protein_bind);
Mandatory qualifiers /bound_moiety="text"
Optional qualifiers /allele="text"
/citation=[number]
/db_xref="<database>:<identifier>"
/experiment="text"
/function="text"
/gene="text"
/gene_synonym="text"
/inference="TYPE[ (same species)][:EVIDENCE_BASIS]"
/label=feature_label
/locus_tag="text" (single token)
/map="text"
/note="text"
/old_locus_tag="text" (single token)
Comment note that the key RBS is used for ribosome binding sites
Feature Key misc_difference
Definition feature sequence is different from that presented
in the entry and cannot be described by any other
Difference key (conflict, unsure, old_sequence,
variation, or modified_base);
Optional qualifiers /allele="text"
/citation=[number]
/clone="text"
/compare=[accession-number.sequence-version]
/db_xref="<database>:<identifier>"
/experiment="text"
/gene="text"
/gene_synonym="text"
/inference="TYPE[ (same species)][:EVIDENCE_BASIS]"
/label=feature_label
/locus_tag="text" (single token)
/map="text"
/note="text"
/old_locus_tag="text" (single token)
/phenotype="text"
/replace="text"
/standard_name="text"
Comment the misc_difference feature key should be used to
describe variability that arises as a result of
genetic manipulation (e.g. site directed mutagenesis);
use /replace="" to annotate deletion, e.g.
misc_difference 412..433
/replace=""
Feature Key misc_feature
Definition region of biological interest which cannot be described
by any other feature key; a new or rare feature;
Optional qualifiers /allele="text"
/citation=[number]
/db_xref="<database>:<identifier>"
/experiment="text"
/function="text"
/gene="text"
/gene_synonym="text"
/inference="TYPE[ (same species)][:EVIDENCE_BASIS]"
/label=feature_label
/locus_tag="text" (single token)
/map="text"
/note="text"
/number=unquoted text (single token)
/old_locus_tag="text" (single token)
/phenotype="text"
/product="text"
/pseudo
/standard_name="text"
Comment this key should not be used when the need is merely to
mark a region in order to comment on it or to use it in
another feature's location
Feature Key misc_recomb
Definition site of any generalized, site-specific or replicative
recombination event where there is a breakage and
reunion of duplex DNA that cannot be described by other
recombination keys or qualifiers of source key
(/proviral);
Optional qualifiers /allele="text"
/citation=[number]
/db_xref="<database>:<identifier>"
/experiment="text"
/gene="text"
/gene_synonym="text"
/inference="TYPE[ (same species)][:EVIDENCE_BASIS]"
/label=feature_label
/locus_tag="text" (single token)
/map="text"
/note="text"
/old_locus_tag="text" (single token)
/standard_name="text"
Molecule scope DNA
Feature Key misc_RNA
Definition any transcript or RNA product that cannot be defined by
other RNA keys (prim_transcript, precursor_RNA, mRNA,
5'UTR, 3'UTR, exon, CDS, sig_peptide, transit_peptide,
mat_peptide, intron, polyA_site, ncRNA, rRNA and tRNA);
Optional qualifiers /allele="text"
/citation=[number]
/db_xref="<database>:<identifier>"
/experiment="text"
/function="text"
/gene="text"
/gene_synonym="text"
/inference="TYPE[ (same species)][:EVIDENCE_BASIS]"
/label=feature_label
/locus_tag="text" (single token)
/map="text"
/note="text"
/old_locus_tag="text" (single token)
/operon="text"
/product="text"
/pseudo
/standard_name="text"
/trans_splicing
Feature Key misc_signal
Definition any region containing a signal controlling or altering
gene function or expression that cannot be described by
other signal keys (promoter, CAAT_signal, TATA_signal,
-35_signal, -10_signal, GC_signal, RBS, polyA_signal,
enhancer, attenuator, terminator, and rep_origin).
Optional qualifiers /allele="text"
/citation=[number]
/db_xref="<database>:<identifier>"
/experiment="text"
/function="text"
/gene="text"
/gene_synonym="text"
/inference="TYPE[ (same species)][:EVIDENCE_BASIS]"
/label=feature_label
/locus_tag="text" (single token)
/map="text"
/note="text"
/old_locus_tag="text" (single token)
/operon="text"
/phenotype="text"
/standard_name="text"
Feature Key misc_structure
Definition any secondary or tertiary nucleotide structure or
conformation that cannot be described by other Structure
keys (stem_loop and D-loop);
Optional qualifiers /allele="text"
/citation=[number]
/db_xref="<database>:<identifier>"
/experiment="text"
/function="text"
/gene="text"
/gene_synonym="text"
/inference="TYPE[ (same species)][:EVIDENCE_BASIS]"
/label=feature_label
/locus_tag="text" (single token)
/map="text"
/note="text"
/old_locus_tag="text" (single token)
/standard_name="text"
Feature Key modified_base
Definition the indicated nucleotide is a modified nucleotide and
should be substituted for by the indicated molecule
(given in the mod_base qualifier value)
Mandatory qualifiers /mod_base=<modified_base>
Optional qualifiers /allele="text"
/citation=[number]
/db_xref="<database>:<identifier>"
/experiment="text"
/frequency="text"
/gene="text"
/gene_synonym="text"
/inference="TYPE[ (same species)][:EVIDENCE_BASIS]"
/label=feature_label
/locus_tag="text" (single token)
/map="text"
/note="text"
/old_locus_tag="text" (single token)
Comment value is limited to the restricted vocabulary for
modified base abbreviations;
Feature Key mRNA
Definition messenger RNA; includes 5'untranslated region (5'UTR),
coding sequences (CDS, exon) and 3'untranslated region
(3'UTR);
Optional qualifiers /allele="text"
/citation=[number]
/db_xref="<database>:<identifier>"
/experiment="text"
/function="text"
/gene="text"
/gene_synonym="text"
/inference="TYPE[ (same species)][:EVIDENCE_BASIS]"
/label=feature_label
/locus_tag="text" (single token)
/map="text"
/note="text"
/old_locus_tag="text" (single token)
/operon="text"
/product="text"
/pseudo
/standard_name="text"
/trans_splicing
Feature Key ncRNA
Definition a non-protein-coding gene, other than ribosomal RNA and
transfer RNA, the functional molecule of which is the RNA
transcript;
Mandatory qualifiers /ncRNA_class="TYPE"
Optional qualifiers /allele="text"
/citation=[number]
/db_xref="<database>:<identifier>"
/experiment="text"
/function="text"
/gene="text"
/gene_synonym="text"
/inference="TYPE[ (same species)][:EVIDENCE_BASIS]"
/label=feature_label
/locus_tag="text" (single token)
/map="text"
/note="text"
/old_locus_tag="text" (single token)
/product="text"
/pseudo
/standard_name="text"
/trans_splicing
/operon="text"
Example /ncRNA_class="miRNA"
/ncRNA_class="siRNA"
/ncRNA_class="scRNA"
Comment the ncRNA feature is not used for ribosomal and transfer
RNA annotation, for which the rRNA and tRNA feature keys
should be used, respectively;
Feature Key N_region
Definition extra nucleotides inserted between rearranged
immunoglobulin segments.
Optional qualifiers /allele="text"
/citation=[number]
/db_xref="<database>:<identifier>"
/experiment="text"
/gene="text"
/gene_synonym="text"
/inference="TYPE[ (same species)][:EVIDENCE_BASIS]"
/label=feature_label
/locus_tag="text" (single token)
/map="text"
/note="text"
/old_locus_tag="text" (single token)
/product="text"
/pseudo
/standard_name="text"
Parent Key CDS
Organism scope eukaryotes
Feature Key old_sequence
Definition the presented sequence revises a previous version of the
sequence at this location;
Mandatory qualifiers /citation=[number]
Or
/compare=[accession-number.sequence-version]
Optional qualifiers /allele="text"
/db_xref="<database>:<identifier>"
/experiment="text"
/gene="text"
/gene_synonym="text"
/inference="TYPE[ (same species)][:EVIDENCE_BASIS]"
/label=feature_label
/locus_tag="text" (single token)
/map="text"
/note="text"
/old_locus_tag="text" (single token)
/replace="text"
Comment /replace="" is used to annotate deletion, e.g.
old_sequence 12..15
/replace=""
NOTE: This feature key is not valid in entries/records
created from 15-Oct-2007.
Feature Key operon
Definition region containing a polycistronic transcript,
comprising genes that encode enzymes in the same
metabolic pathway, together with regulatory sequences
Mandatory qualifiers /operon="text"
Optional qualifiers /allele="text"
/citation=[number]
/db_xref="<database>:<identifier>"
/experiment="text"
/function="text"
/inference="TYPE[ (same species)][:EVIDENCE_BASIS]"
/label=feature_label
/map="text"
/note="text"
/phenotype="text"
/pseudo
/standard_name="text"
Feature Key oriT
Definition origin of transfer; region of a DNA molecule where transfer is
initiated during the process of conjugation or mobilization
Optional qualifiers /allele="text"
/bound_moiety="text"
/citation=[number]
/db_xref="<database>:<identifier>"
/direction=value
/experiment="text"
/gene="text"
/gene_synonym="text"
/inference="TYPE[ (same species)][:EVIDENCE_BASIS]"
/label=feature_label
/locus_tag="text" (single token)
/map="text"
/note="text"
/old_locus_tag="text" (single token)
/rpt_family="text"
/rpt_type=<repeat_type>
/rpt_unit_range=<base_range>
/rpt_unit_seq="text"
/standard_name="text"
Molecule scope DNA
Comment rep_origin should be used for origins of replication;
/direction has legal values RIGHT, LEFT and BOTH, however only
RIGHT and LEFT are valid when used in conjunction with the oriT
feature;
origins of transfer can be present in the chromosome;
plasmids can contain multiple origins of transfer
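The restriction on /direction described in the comment above amounts to a
small rule. The Python sketch below is illustrative only; the function name
direction_is_valid is hypothetical, and comparing values without regard to
case is an assumption of this example rather than a stated rule.

```python
LEGAL_DIRECTIONS = {"RIGHT", "LEFT", "BOTH"}


def direction_is_valid(feature_key, value):
    """Return True if value is an acceptable /direction for feature_key."""
    direction = value.upper()  # assumption: case is not significant
    if direction not in LEGAL_DIRECTIONS:
        return False
    # BOTH is a legal value in general but not in conjunction with oriT.
    if feature_key == "oriT" and direction == "BOTH":
        return False
    return True
```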
Feature Key polyA_signal
Definition recognition region necessary for endonuclease cleavage
of an RNA transcript that is followed by polyadenylation;
consensus=AATAAA [1];
Optional qualifiers /allele="text"
/citation=[number]
/db_xref="<database>:<identifier>"
/experiment="text"
/gene="text"
/gene_synonym="text"
/inference="TYPE[ (same species)][:EVIDENCE_BASIS]"
/label=feature_label
/locus_tag="text" (single token)
/map="text"
/note="text"
/old_locus_tag="text" (single token)
Organism scope eukaryotes and eukaryotic viruses
References [1] Proudfoot, N. and Brownlee, G.G. Nature 263, 211-214
(1976)
Feature Key polyA_site
Definition site on an RNA transcript to which adenine residues
will be added by post-transcriptional polyadenylation;
Optional qualifiers /allele="text"
/citation=[number]
/db_xref="<database>:<identifier>"
/experiment="text"
/gene="text"
/gene_synonym="text"
/inference="TYPE[ (same species)][:EVIDENCE_BASIS]"
/label=feature_label
/locus_tag="text" (single token)
/map="text"
/note="text"
/old_locus_tag="text" (single token)
Organism scope eukaryotes and eukaryotic viruses
Feature Key precursor_RNA
Definition any RNA species that is not yet the mature RNA product;
may include 5' untranslated region (5'UTR), coding
sequences (CDS, exon), intervening sequences (intron)
and 3' untranslated region (3'UTR);
Optional qualifiers /allele="text"
/citation=[number]
/db_xref="<database>:<identifier>"
/experiment="text"
/function="text"
/gene="text"
/gene_synonym="text"
/inference="TYPE[ (same species)][:EVIDENCE_BASIS]"
/label=feature_label
/locus_tag="text" (single token)
/map="text"
/note="text"
/old_locus_tag="text" (single token)
/operon="text"
/product="text"
/standard_name="text"
/trans_splicing
Comment used for RNA which may be the result of
post-transcriptional processing; if the RNA in question
is known not to have been processed, use the
prim_transcript key.
Feature Key prim_transcript
Definition primary (initial, unprocessed) transcript; includes 5'
untranslated region (5'UTR), coding sequences
(CDS, exon), intervening sequences (intron) and 3'
untranslated region (3'UTR);
Optional qualifiers /allele="text"
/citation=[number]
/db_xref="<database>:<identifier>"
/experiment="text"
/function="text"
/gene="text"
/gene_synonym="text"
/inference="TYPE[ (same species)][:EVIDENCE_BASIS]"
/label=feature_label
/locus_tag="text" (single token)
/map="text"
/note="text"
/old_locus_tag="text" (single token)
/operon="text"
/standard_name="text"
Feature Key primer_bind
Definition non-covalent primer binding site for initiation of
replication, transcription, or reverse transcription;
includes site(s) for synthetic (e.g., PCR) primer elements;
Optional qualifiers /allele="text"
/citation=[number]
/db_xref="<database>:<identifier>"
/experiment="text"
/gene="text"
/gene_synonym="text"
/label=feature_label
/inference="TYPE[ (same species)][:EVIDENCE_BASIS]"
/locus_tag="text" (single token)
/map="text"
/note="text"
/old_locus_tag="text" (single token)
/standard_name="text"
/PCR_conditions="text"
Comment used to annotate the site on a given sequence to which a primer
molecule binds - not intended to represent the sequence of the
primer molecule itself; PCR components and reaction times may
be stored under the "/PCR_conditions" qualifier;
since PCR reactions most often involve pairs of primers,
a single primer_bind key may use the order() operator
with two locations, or a pair of primer_bind keys may be
used.
Feature Key promoter
Definition region on a DNA molecule involved in RNA polymerase
binding to initiate transcription;
Optional qualifiers /allele="text"
/bound_moiety="text"
/citation=[number]
/db_xref="<database>:<identifier>"
/experiment="text"
/function="text"
/gene="text"
/gene_synonym="text"
/inference="TYPE[ (same species)][:EVIDENCE_BASIS]"
/label=feature_label
/locus_tag="text" (single token)
/map="text"
/note="text"
/old_locus_tag="text" (single token)
/operon="text"
/phenotype="text"
/pseudo
/standard_name="text"
Molecule scope DNA
Feature Key protein_bind
Definition non-covalent protein binding site on nucleic acid;
Mandatory qualifiers /bound_moiety="text"
Optional qualifiers /allele="text"
/citation=[number]
/db_xref="<database>:<identifier>"
/experiment="text"
/function="text"
/gene="text"
/gene_synonym="text"
/inference="TYPE[ (same species)][:EVIDENCE_BASIS]"
/label=feature_label
/locus_tag="text" (single token)
/map="text"
/note="text"
/old_locus_tag="text" (single token)
/operon="text"
/standard_name="text"
Comment note that RBS is used for ribosome binding sites.
Feature Key RBS
Definition ribosome binding site;
Optional qualifiers /allele="text"
/citation=[number]
/db_xref="<database>:<identifier>"
/experiment="text"
/gene="text"
/gene_synonym="text"
/inference="TYPE[ (same species)][:EVIDENCE_BASIS]"
/label=feature_label
/locus_tag="text" (single token)
/map="text"
/note="text"
/old_locus_tag="text" (single token)
/standard_name="text"
References [1] Shine, J. and Dalgarno, L. Proc Natl Acad Sci USA
71, 1342-1346 (1974)
[2] Gold, L. et al. Ann Rev Microb 35, 365-403 (1981)
Comment in prokaryotes, known as the Shine-Dalgarno sequence; it is
located 5 to 9 bases upstream of the initiation codon;
consensus GGAGGT [1,2].
Feature Key repeat_region
Definition region of genome containing repeating units;
Optional qualifiers /allele="text"
/citation=[number]
/db_xref="<database>:<identifier>"
/experiment="text"
/function="text"
/gene="text"
/gene_synonym="text"
/inference="TYPE[ (same species)][:EVIDENCE_BASIS]"
/label=feature_label
/locus_tag="text" (single token)
/map="text"
/mobile_element="<mobile_element_type>
[:<mobile_element_name>]"
/note="text"
/old_locus_tag="text" (single token)
/rpt_family="text"
/rpt_type=<repeat_type>
/rpt_unit_range=<base_range>
/rpt_unit_seq="text"
/satellite="<satellite_type>[:<class>][ <identifier>]"
/standard_name="text"
Comment mobile_element qualifier replaced /transposon and
/insertion_seq qualifiers in December 2006
Feature Key rep_origin
Definition origin of replication; starting site for duplication of
nucleic acid to give two identical copies;
Optional qualifiers /allele="text"
/citation=[number]
/db_xref="<database>:<identifier>"
/direction=value
/experiment="text"
/gene="text"
/gene_synonym="text"
/label=feature_label
/inference="TYPE[ (same species)][:EVIDENCE_BASIS]"
/locus_tag="text" (single token)
/map="text"
/note="text"
/old_locus_tag="text" (single token)
/standard_name="text"
Comment /direction has valid values: RIGHT, LEFT, or BOTH.
Feature Key rRNA
Definition mature ribosomal RNA; RNA component of the
ribonucleoprotein particle (ribosome) which assembles
amino acids into proteins.
Optional qualifiers /allele="text"
/citation=[number]
/db_xref="<database>:<identifier>"
/experiment="text"
/function="text"
/gene="text"
/gene_synonym="text"
/inference="TYPE[ (same species)][:EVIDENCE_BASIS]"
/label=feature_label
/locus_tag="text" (single token)
/map="text"
/note="text"
/old_locus_tag="text" (single token)
/operon="text"
/product="text"
/pseudo
/standard_name="text"
Comment rRNA sizes should be annotated with the /product
qualifier.
Feature Key S_region
Definition switch region of immunoglobulin heavy chains;
involved in the rearrangement of heavy chain DNA leading
to the expression of a different immunoglobulin class
from the same B-cell;
Optional qualifiers /allele="text"
/citation=[number]
/db_xref="<database>:<identifier>"
/gene="text"
/gene_synonym="text"
/experiment="text"
/label=feature_label
/inference="TYPE[ (same species)][:EVIDENCE_BASIS]"
/locus_tag="text" (single token)
/map="text"
/note="text"
/old_locus_tag="text" (single token)
/product="text"
/pseudo
/standard_name="text"
Parent Key misc_signal
Organism scope eukaryotes
Feature Key sig_peptide
Definition signal peptide coding sequence; coding sequence for an
N-terminal domain of a secreted protein; this domain is
involved in attaching nascent polypeptide to the
membrane; leader sequence;
Optional qualifiers /allele="text"
/citation=[number]
/db_xref="<database>:<identifier>"
/experiment="text"
/function="text"
/gene="text"
/gene_synonym="text"
/inference="TYPE[ (same species)][:EVIDENCE_BASIS]"
/label=feature_label
/locus_tag="text" (single token)
/map="text"
/note="text"
/old_locus_tag="text" (single token)
/product="text"
/pseudo
/standard_name="text"
Feature Key source
Definition identifies the biological source of the specified span of
the sequence; this key is mandatory; more than one source
key per sequence is allowed; every entry/record will have, as a
minimum, either a single source key spanning the entire
sequence or multiple source keys, which together, span the
entire sequence.
Mandatory qualifiers /organism="text"
/mol_type="genomic DNA", "genomic RNA", "mRNA", "tRNA",
"rRNA", "other RNA", "other DNA", "transcribed
RNA", "viral cRNA", "unassigned DNA",
"unassigned RNA"
Optional qualifiers /bio_material="[<institution-code>:[<collection-code>:]]<material_id>"
/cell_line="text"
/cell_type="text"
/chromosome="text"
/citation=[number]
/clone="text"
/clone_lib="text"
/collected_by="text"
/collection_date="text"
/country="<country_value>[:<region>][, <locality>]"
/cultivar="text"
/culture_collection="<institution-code>:[<collection-code>:]<culture_id>"
/db_xref="<database>:<identifier>"
/dev_stage="text"
/ecotype="text"
/environmental_sample
/focus
/frequency="text"
/germline
/haplotype="text"
/host="text"
/identified_by="text"
/isolate="text"
/isolation_source="text"
/label=feature_label
/lab_host="text"
/lat_lon="text"
/macronuclear
/map="text"
/mating_type="text"
/note="text"
/organelle=<organelle_value>
/PCR_primers="[fwd_name: XXX, ]fwd_seq: xxxxx,
[rev_name: YYY, ]rev_seq: yyyyy"
/plasmid="text"
/pop_variant="text"
/proviral
/rearranged
/segment="text"
/serotype="text"
/serovar="text"
/sex="text"
/specimen_voucher="[<institution-code>:[<collection-code>:]]<specimen_id>"
/strain="text"
/sub_clone="text"
/sub_species="text"
/sub_strain="text"
/tissue_lib="text"
/tissue_type="text"
/transgenic
/variety="text"
Molecule scope any
Comment transgenic sequences must have at least two source feature
keys; in a transgenic sequence the source feature key
describing the organism that is the recipient of the DNA
must span the entire sequence;
see Appendix IV /organelle for a list of <organelle_value>
Feature Key stem_loop
Definition hairpin; a double-helical region formed by base-pairing
between adjacent (inverted) complementary sequences in a
single strand of RNA or DNA.
Optional qualifiers /allele="text"
/citation=[number]
/db_xref="<database>:<identifier>"
/experiment="text"
/function="text"
/gene="text"
/gene_synonym="text"
/inference="TYPE[ (same species)][:EVIDENCE_BASIS]"
/label=feature_label
/locus_tag="text" (single token)
/map="text"
/note="text"
/old_locus_tag="text" (single token)
/operon="text"
/standard_name="text"
Feature Key STS
Definition sequence tagged site; short, single-copy DNA sequence
that characterizes a mapping landmark on the genome and
can be detected by PCR; a region of the genome can be
mapped by determining the order of a series of STSs;
Optional qualifiers /allele="text"
/citation=[number]
/db_xref="<database>:<identifier>"
/experiment="text"
/gene="text"
/gene_synonym="text"
/inference="TYPE[ (same species)][:EVIDENCE_BASIS]"
/label=feature_label
/locus_tag="text" (single token)
/map="text"
/note="text"
/old_locus_tag="text" (single token)
/standard_name="text"
Molecule scope DNA
Parent key misc_binding
Comment STS location to include primer(s) in primer_bind key or
primers.
Feature Key TATA_signal
Definition TATA box; Goldberg-Hogness box; a conserved AT-rich
heptamer found about 25 bp before the start point of
each eukaryotic RNA polymerase II transcript unit which
may be involved in positioning the enzyme for correct
initiation; consensus=TATA(A or T)A(A or T) [1,2];
Optional qualifiers /allele="text"
/citation=[number]
/db_xref="<database>:<identifier>"
/experiment="text"
/gene="text"
/gene_synonym="text"
/inference="TYPE[ (same species)][:EVIDENCE_BASIS]"
/label=feature_label
/locus_tag="text" (single token)
/map="text"
/note="text"
/old_locus_tag="text" (single token)
Organism scope eukaryotes and eukaryotic viruses
Molecule scope DNA
References [1] Efstratiadis, A. et al. Cell 21, 653-668 (1980)
[2] Corden, J., et al. "Promoter sequences of
eukaryotic protein-encoding genes" Science 209,
1406-1414 (1980)
Feature Key terminator
Definition sequence of DNA located at the end of the
transcript that causes RNA polymerase to terminate
transcription;
Optional qualifiers /allele="text"
/citation=[number]
/db_xref="<database>:<identifier>"
/experiment="text"
/gene="text"
/gene_synonym="text"
/inference="TYPE[ (same species)][:EVIDENCE_BASIS]"
/label=feature_label
/locus_tag="text" (single token)
/map="text"
/note="text"
/operon="text"
/old_locus_tag="text" (single token)
/standard_name="text"
Molecule scope DNA
Feature Key tmRNA
Definition transfer messenger RNA; tmRNA acts as a tRNA first,
and then as an mRNA that encodes a peptide tag; the
ribosome translates this mRNA region of tmRNA and attaches
the encoded peptide tag to the C-terminus of the
unfinished protein; this attached tag targets the protein for
destruction or proteolysis;
Optional qualifiers /allele="text"
/citation=[number]
/db_xref="<database>:<identifier>"
/experiment="text"
/function="text"
/gene="text"
/gene_synonym="text"
/inference="TYPE[ (same species)][:EVIDENCE_BASIS]"
/label=feature_label
/locus_tag="text" (single token)
/map="text"
/note="text"
/old_locus_tag="text" (single token)
/product="text"
/pseudo
/standard_name="text"
/tag_peptide=<base_range>
Comment the tmRNA feature key became valid on 15-Dec-2007
Feature Key transit_peptide
Definition transit peptide coding sequence; coding sequence for an
N-terminal domain of a nuclear-encoded organellar
protein; this domain is involved in post-translational
import of the protein into the organelle;
Optional qualifiers /allele="text"
/citation=[number]
/db_xref="<database>:<identifier>"
/experiment="text"
/function="text"
/gene="text"
/gene_synonym="text"
/inference="TYPE[ (same species)][:EVIDENCE_BASIS]"
/label=feature_label
/locus_tag="text" (single token)
/map="text"
/note="text"
/old_locus_tag="text" (single token)
/product="text"
/pseudo
/standard_name="text"
Feature Key tRNA
Definition mature transfer RNA, a small RNA molecule (75-85 bases
long) that mediates the translation of a nucleic acid
sequence into an amino acid sequence;
Optional qualifiers /allele="text"
/anticodon=(pos:<base_range>,aa:<amino_acid>)
/citation=[number]
/db_xref="<database>:<identifier>"
/experiment="text"
/function="text"
/gene="text"
/gene_synonym="text"
/inference="TYPE[ (same species)][:EVIDENCE_BASIS]"
/label=feature_label
/locus_tag="text" (single token)
/map="text"
/note="text"
/old_locus_tag="text" (single token)
/product="text"
/pseudo
/standard_name="text"
/trans_splicing
Feature Key unsure
Definition author is unsure of exact sequence in this region;
Optional qualifiers /allele="text"
/citation=[number]
/compare=[accession-number.sequence-version]
/db_xref="<database>:<identifier>"
/experiment="text"
/gene="text"
/gene_synonym="text"
/inference="TYPE[ (same species)][:EVIDENCE_BASIS]"
/label=feature_label
/locus_tag="text" (single token)
/map="text"
/note="text"
/old_locus_tag="text" (single token)
/replace="text"
Comment use /replace="" to annotate deletion, e.g.
unsure 11..15
/replace=""
Feature Key V_region
Definition variable region of immunoglobulin light and heavy
chains, and T-cell receptor alpha, beta, and gamma
chains; codes for the variable amino terminal portion;
can be composed of V_segments, D_segments, N_regions,
and J_segments;
Optional qualifiers /allele="text"
/citation=[number]
/db_xref="<database>:<identifier>"
/experiment="text"
/gene="text"
/gene_synonym="text"
/inference="TYPE[ (same species)][:EVIDENCE_BASIS]"
/label=feature_label
/locus_tag="text" (single token)
/map="text"
/note="text"
/old_locus_tag="text" (single token)
/product="text"
/pseudo
/standard_name="text"
Parent Key CDS
Organism scope eukaryotes
Feature Key V_segment
Definition variable segment of immunoglobulin light and heavy
chains, and T-cell receptor alpha, beta, and gamma
chains; codes for most of the variable region (V_region)
and the last few amino acids of the leader peptide;
Optional qualifiers /allele="text"
/citation=[number]
/db_xref="<database>:<identifier>"
/experiment="text"
/gene="text"
/gene_synonym="text"
/inference="TYPE[ (same species)][:EVIDENCE_BASIS]"
/label=feature_label
/locus_tag="text" (single token)
/map="text"
/note="text"
/old_locus_tag="text" (single token)
/product="text"
/pseudo
/standard_name="text"
Parent Key CDS
Organism scope eukaryotes
Feature Key variation
Definition a related strain contains stable mutations from the same
gene (e.g., RFLPs, polymorphisms, etc.) which differ
from the presented sequence at this location (and
possibly others);
Optional qualifiers /allele="text"
/citation=[number]
/compare=[accession-number.sequence-version]
/db_xref="<database>:<identifier>"
/experiment="text"
/frequency="text"
/gene="text"
/gene_synonym="text"
/inference="TYPE[ (same species)][:EVIDENCE_BASIS]"
/label=feature_label
/locus_tag="text" (single token)
/map="text"
/note="text"
/old_locus_tag="text" (single token)
/phenotype="text"
/product="text"
/replace="text"
/standard_name="text"
Comment used to describe alleles, RFLPs, and other naturally occurring
mutations and polymorphisms; variability arising as a result
of genetic manipulation (e.g. site directed mutagenesis) should
be described with the misc_difference feature;
use /replace="" to annotate deletion, e.g.
variation 4..5
/replace=""
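The effect of an empty /replace value can be illustrated with a short Python
sketch. The function name apply_replace and the sample sequence are
hypothetical and serve only this example; locations are taken as 1-based and
inclusive, as in the example above, and only a simple start..end span is
handled.

```python
def apply_replace(sequence, start, end, replacement):
    """Return sequence with the 1-based inclusive span start..end replaced."""
    if not 1 <= start <= end <= len(sequence):
        raise ValueError("span lies outside the sequence")
    # Python slices are 0-based and end-exclusive, hence start - 1 and end.
    return sequence[:start - 1] + replacement + sequence[end:]

# variation 4..5 with /replace="" removes bases 4 and 5 (a deletion).
print(apply_replace("aaccggtt", 4, 5, ""))
```

A non-empty replacement string would describe a substitution over the same
span instead of a deletion.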
Feature Key 3'UTR
Definition region at the 3' end of a mature transcript (following
the stop codon) that is not translated into a protein;
Optional qualifiers /allele="text"
/citation=[number]
/db_xref="<database>:<identifier>"
/experiment="text"
/function="text"
/gene="text"
/gene_synonym="text"
/inference="TYPE[ (same species)][:EVIDENCE_BASIS]"
/label=feature_label
/locus_tag="text" (single token)
/map="text"
/note="text"
/old_locus_tag="text" (single token)
/standard_name="text"
/trans_splicing
Feature Key 5'UTR
Definition region at the 5' end of a mature transcript (preceding
the initiation codon) that is not translated into a
protein;
Optional qualifiers /allele="text"
/citation=[number]
/db_xref="<database>:<identifier>"
/experiment="text"
/function="text"
/gene="text"
/gene_synonym="text"
/inference="TYPE[ (same species)][:EVIDENCE_BASIS]"
/label=feature_label
/locus_tag="text" (single token)
/map="text"
/note="text"
/old_locus_tag="text" (single token)
/standard_name="text"
/trans_splicing
Feature Key -10_signal
Definition Pribnow box; a conserved region about 10 bp upstream of
the start point of bacterial transcription units which
may be involved in binding RNA polymerase;
consensus=TAtAaT [1,2,3,4];
Optional qualifiers /allele="text"
/citation=[number]
/db_xref="<database>:<identifier>"
/experiment="text"
/gene="text"
/gene_synonym="text"
/inference="TYPE[ (same species)][:EVIDENCE_BASIS]"
/label=feature_label
/locus_tag="text" (single token)
/map="text"
/note="text"
/old_locus_tag="text" (single token)
/operon="text"
/standard_name="text"
Organism scope prokaryotes
Molecule scope DNA
References [1] Schaller, H., Gray, C., and Hermann, K. Proc Natl
Acad Sci USA 72, 737-741 (1975)
[2] Pribnow, D. Proc Natl Acad Sci USA 72, 784-788 (1975)
[3] Hawley, D.K. and McClure, W.R. "Compilation and
analysis of Escherichia coli promoter DNA sequences"
Nucl Acid Res 11, 2237-2255 (1983)
[4] Rosenberg, M. and Court, D. "Regulatory sequences
involved in the promotion and termination of RNA
transcription" Ann Rev Genet 13, 319-353 (1979)
Feature Key -35_signal
Definition a conserved hexamer about 35 bp upstream of the start
point of bacterial transcription units; consensus=TTGACa
or TGTTGACA;
Optional qualifiers /allele="text"
/citation=[number]
/db_xref="<database>:<identifier>"
/experiment="text"
/gene="text"
/gene_synonym="text"
/inference="TYPE[ (same species)][:EVIDENCE_BASIS]"
/label=feature_label
/locus_tag="text" (single token)
/map="text"
/note="text"
/old_locus_tag="text" (single token)
/operon="text"
/standard_name="text"
Organism scope prokaryotes
Molecule scope DNA
References [1] Takanami, M., et al. Nature 260, 297-302 (1976)
[2] Moran, C.P., Jr., et al. Molec Gen Genet 186,
339-346 (1982)
[3] Maniatis, T., et al. Cell 5, 109-113 (1975)
7.4 Appendix IV: Summary of qualifiers for feature keys
7.4.1 Qualifier List
The following is a list of available qualifiers for feature keys and their usage.
The information is arranged as follows:
Qualifier name of qualifier; qualifier requires a value if followed by an equal
sign
Definition definition of the qualifier
Value format format of value, if required
Example example of qualifier with value
Comment comments, questions and clarifications
Qualifier /allele=
Definition name of the allele for the given gene
Value format "text"
Example /allele="adh1-1"
Comment all gene-related features (exon, CDS etc) for a given
gene should share the same /allele qualifier value;
the /allele qualifier value must, by definition, be
different from the /gene qualifier value; when used with
the variation feature key, the allele qualifier value
should be that of the variant.
Qualifier /anticodon=
Definition location of the anticodon of tRNA and the amino acid for which
it codes
Value format (pos:<base_range>,aa:<amino_acid>) where base_range
is the position of the anticodon and amino_acid is the
abbreviation for the amino acid encoded
Example /anticodon=(pos:34..36,aa:Phe)
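As an illustration only, the value format above can be read mechanically.
The following Python sketch extracts the base range and the amino acid
abbreviation from the example value. The function name parse_anticodon and
the regular expression are assumptions made for this example and are not
part of the feature table definition; only a simple start..end base range is
handled.

```python
import re

# Hypothetical helper; matches the form (pos:<start>..<end>,aa:<amino_acid>).
_ANTICODON_RE = re.compile(r"^\(pos:(\d+)\.\.(\d+),aa:([A-Za-z]+)\)$")

def parse_anticodon(value):
    """Return (start, end, amino_acid) for a simple /anticodon value."""
    match = _ANTICODON_RE.match(value)
    if match is None:
        raise ValueError("not a simple /anticodon value: %r" % value)
    start, end, amino_acid = match.groups()
    return int(start), int(end), amino_acid

print(parse_anticodon("(pos:34..36,aa:Phe)"))
```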
Qualifier /bio_material=
Definition identifier for the biological material from which the nucleic
acid sequenced was obtained, with optional institution code and
collection code for the place where it is currently stored.
Value format "[<institution-code>:[<collection-code>:]]<material_id>"
Example /bio_material="CGC:CB3912" <- Caenorhabditis stock centre
Comment the bio_material qualifier should be used to annotate the
identifiers of material in biological collections that are not
appropriate to annotate as either /specimen_voucher or
/culture_collection; these include zoos and aquaria, stock
centres, seed banks, germplasm repositories and DNA banks;
material_id is mandatory, institution_code and collection_code
are optional; institution code is mandatory where collection
code is present;
the /bio_material qualifier became legal on 15-Dec-2007;
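The optional and mandatory parts of the value can be separated as in the
following Python sketch. The function name parse_bio_material is
hypothetical, and the sketch assumes that none of the three components
contains a colon, which the definition above does not state explicitly.

```python
def parse_bio_material(value):
    """Split a /bio_material value into (institution, collection, material_id)."""
    parts = value.split(":")
    if len(parts) > 3 or any(part == "" for part in parts):
        raise ValueError("unexpected /bio_material value: %r" % value)
    material_id = parts[-1]                                   # always present
    institution_code = parts[0] if len(parts) >= 2 else None  # optional
    collection_code = parts[1] if len(parts) == 3 else None   # needs institution
    return institution_code, collection_code, material_id

print(parse_bio_material("CGC:CB3912"))
```

Because a collection code is only read from the middle position of a
three-part value, it can never appear without an institution code, in line
with the comment above.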
Qualifier /bound_moiety=
Definition name of the molecule/complex that may bind to the
given feature
Value format "text"
Example /bound_moiety="GAL4"
Comment Multiple /bound_moiety qualifiers are legal on "promoter"
and "enhancer" features. A single /bound_moiety qualifier
is legal on the "misc_binding", "oriT" and "protein_bind"
features.
Qualifier /cell_line=
Definition cell line from which the sequence was obtained
Value format "text"
Example /cell_line="MCF7"
Qualifier /cell_type=
Definition cell type from which the sequence was obtained
Value format "text"
Example /cell_type="leukocyte"
Qualifier /chromosome=
Definition chromosome (e.g. Chromosome number) from which
the sequence was obtained
Value format "text"
Example /chromosome="1"
Qualifier /citation=
Definition reference to a citation listed in the entry reference field
Value format [integer-number] where integer-number is the number of the
reference as enumerated in the reference field
Example /citation=[3]
Comment used to indicate the citation providing the claim of and/or
evidence for a feature; brackets are used for conformity.
Qualifier /clone=
Definition clone from which the sequence was obtained
Value format "text"
Example /clone="lambda-hIL7.3"
Comment not more than one clone should be specified for a given source
feature; to indicate that the sequence was obtained from
multiple clones, multiple source features should be given.
Qualifier /clone_lib=
Definition clone library from which the sequence was obtained
Value format "text"
Example /clone_lib="lambda-hIL7"
Qualifier /codon=
Definition specifies a codon which is different from any found in the
reference genetic code
Value format (seq:"codon-sequence",aa:<amino_acid>) where
"codon-sequence" contains the bases of the codon and <amino_acid> is
the abbreviation for the translated amino acid, the abbreviation
for a modified or unusual amino acid from section 7.5,
or the word OTHER
Example /codon=(seq:"ttt",aa:Leu)
Comment used to specify unusual genetic codes, organellar codes, etc,
that are different from the "normal" code for the organism;
the codon specified by "seq" codes for the amino acid or stop
codon specified by "aa";
the codon that is specified is used throughout the CDS;
amino acids that are not on the controlled vocabulary list can be
annotated by using "aa:OTHER" as the amino acid designation, and
by giving the name of the residue in a /note qualifier;
only nucleotides a, g, c or t can be used in "codon-sequence";
multiple /codon qualifiers should be used to describe ambiguous
nucleotides.
Qualifier /codon_start=
Definition indicates the offset at which the first complete codon of a
coding feature can be found, relative to the first base of that
feature.
Value format 1 or 2 or 3
Example /codon_start=2
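The value is 1-based, so /codon_start=2 means that the first complete codon
begins at the second base of the feature. A minimal Python sketch of this
offset arithmetic follows; the function name codons_from and the sample
sequence are hypothetical.

```python
def codons_from(feature_seq, codon_start):
    """Split a coding feature into complete codons, honouring /codon_start."""
    if codon_start not in (1, 2, 3):
        raise ValueError("/codon_start must be 1, 2 or 3")
    offset = codon_start - 1          # 1-based qualifier value to 0-based index
    usable = feature_seq[offset:]
    full_length = len(usable) - len(usable) % 3   # drop a trailing partial codon
    return [usable[i:i + 3] for i in range(0, full_length, 3)]

print(codons_from("gatggcttaa", 2))
```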
Qualifier /collected_by=
Definition name of the person who collected the specimen
Value format "text"
Example /collected_by="Dan Janzen"
Qualifier /collection_date=
Definition date that the specimen was collected
Value format "DD-Mmm-YYYY", "Mmm-YYYY" or "YYYY"
Example /collection_date="21-Oct-1952"
/collection_date="Oct-1952"
/collection_date="1952"
Comment full date format DD-Mmm-YYYY is preferred; where day and/or month
of collection is not known either "Mmm-YYYY" or "YYYY" can be used;
three-letter month abbreviation can be one of the following: Jan,
Feb, Mar, Apr, May, Jun, Jul, Aug, Sep, Oct, Nov, Dec.
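The three permitted layouts can be checked as in the following Python
sketch. The function name is_valid_collection_date is hypothetical; the
additional calendar check on the day (for example rejecting 31-Feb) goes
beyond what is stated above and is an assumption of this example.

```python
import re
from datetime import datetime

MONTHS = ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")

# Optional "DD-" inside an optional "Mmm-", followed by a mandatory "YYYY".
_DATE_RE = re.compile(r"^(?:(?:(\d{2})-)?([A-Z][a-z]{2})-)?(\d{4})$")

def is_valid_collection_date(value):
    """Check a value against DD-Mmm-YYYY, Mmm-YYYY or YYYY."""
    match = _DATE_RE.match(value)
    if match is None:
        return False
    day, month, year = match.groups()
    if month is not None and month not in MONTHS:
        return False
    if day is not None:
        try:
            # Month names are looked up in MONTHS to stay locale-independent.
            datetime(int(year), MONTHS.index(month) + 1, int(day))
        except ValueError:
            return False
    return True

for text in ("21-Oct-1952", "Oct-1952", "1952", "21-oct-1952"):
    print(text, is_valid_collection_date(text))
```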
Qualifier /compare=
Definition Reference details of an existing public INSD entry
to which a comparison is made
Value format [accession-number.sequence-version]
Example /compare=AJ634337.1
Comment This qualifier may be used on the following features:
misc_difference, conflict, unsure, old_sequence
and variation. The features "old_sequence" and "conflict" must
have either a /citation or a /compare qualifier. Multiple /compare
qualifiers with different contents are allowed within a
single feature.
This qualifier is not intended for large-scale annotation
of variations, such as SNPs.
Qualifier /country=
Definition locality of isolation of the sequenced organism indicated in
terms of political names for nations, oceans or seas, followed
by regions and localities
Value format "<country_value>[:<region>][, <locality>]" where
country_value is any value from the controlled vocabulary at
http://www.insdc.org/page.php?page=country
Example /country="Canada:Vancouver"
/country="France:Cote d'Azur, Antibes"
/country="Atlantic Ocean:Charlie Gibbs Fracture Zone"
Comment Intended to provide a reference to the site where the source
organism was isolated or sampled. Regions and localities should
be indicated where possible. Note that the physical geography of
the isolation or sampling site should be represented in
/isolation_source.
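The three parts of the value can be separated as in the following Python
sketch. The function name parse_country is hypothetical. The sketch does
not consult the controlled vocabulary, and it assumes that the country and
region parts contain neither a colon nor the sequence comma-space; that
assumption is made for this example only.

```python
def parse_country(value):
    """Split a /country value into (country_value, region, locality)."""
    region = None
    locality = None
    country, colon, rest = value.partition(":")
    if colon:
        # "<country_value>:<region>[, <locality>]"
        region, comma, tail = rest.partition(", ")
        if comma:
            locality = tail
    else:
        # "<country_value>[, <locality>]"
        country, comma, tail = value.partition(", ")
        if comma:
            locality = tail
    return country, region, locality

print(parse_country("France:Cote d'Azur, Antibes"))
```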
Qualifier /cultivar=
Definition cultivar (cultivated variety) of plant from which sequence was
obtained.
Value format "text"
Example /cultivar="Nipponbare"
/cultivar="Tenuifolius"
/cultivar="Candy Cane"
/cultivar="IR36"
Comment 'cultivar' is applied solely to products of artificial
selection; use the variety qualifier for natural, named
plant and fungal varieties;
Qualifier /culture_collection=
Definition institution code and identifier for the culture from which the
nucleic acid sequenced was obtained, with optional collection
code.
Value format "<institution-code>:[<collection-code>:]<culture_id>"
Example /culture_collection="ATCC:26370"
Comment the /culture_collection qualifier should be used to annotate
live microbial and viral cultures, and cell lines that have been
deposited in curated culture collections; microbial cultures in
personal or laboratory collections should be annotated in strain
qualifiers;
annotation with a culture_collection qualifier implies that the
sequence was obtained from a sample retrieved (by the submitter
or a collaborator) from the indicated culture collection, or
that the sequence was obtained from a sample that was deposited
(by the submitter or a collaborator) in the indicated culture
collection; annotation with more than one culture_collection
qualifier indicates that the sequence was obtained from a sample
that was deposited (by the submitter or a collaborator) in more
than one culture collection.
culture_id and institution_code are mandatory, collection_code
is optional;
the /culture_collection qualifier became legal on 15-Dec-2007;
Qualifier /db_xref=
Definition database cross-reference: pointer to related information in
another database.
Value format "<database>:<identifier>" where database is
the name of the database containing related information, and
identifier is the internal identifier of the related information
according to the naming conventions of the cross-referenced
database.
Example /db_xref="UniProtKB/Swiss-Prot:P28763"
Comment the complete list of allowed database types is kept at
http://www.insdc.org/page.php?page=db_xref
Qualifier /dev_stage=
Definition if the sequence was obtained from an organism in a specific
developmental stage, it is specified with this qualifier
Value format "text"
Example /dev_stage="fourth instar larva"
Qualifier /direction=
Definition direction of DNA replication
Value format left, right, or both where left indicates toward the 5' end of
the entry sequence (as presented) and right indicates toward
the 3' end
Example /direction=LEFT
Qualifier /EC_number=
Definition Enzyme Commission number for enzyme product of sequence
Value format "text"
Example /EC_number="1.1.2.4"
/EC_number="1.1.2.-"
/EC_number="1.1.2.n"
Comment valid values for EC numbers are defined in the list prepared by the
Nomenclature Committee of the International Union of Biochemistry and
Molecular Biology (NC-IUBMB) (published in Enzyme Nomenclature 1992,
Academic Press, San Diego, or a more recent revision thereof).
The format represents a string of four numbers separated by full
stops; up to three numbers starting from the end of the string can
be replaced by dash "-" to indicate uncertain assignment.
Symbol "n" can be used in the last position instead of a number
where the EC number is awaiting assignment. Please note that such
incomplete EC numbers are not approved by NC-IUBMB.
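The format rules above can be checked as in the following Python sketch.
The function name is_valid_ec_number is hypothetical. The sketch reads
"up to three numbers starting from the end" as meaning that dashes must
form an unbroken run at the end of the string; that reading is an
assumption. It checks the format only, not whether the number appears in
the NC-IUBMB list.

```python
import re

_NUMBER = re.compile(r"[0-9]+")

def is_valid_ec_number(value):
    """Check the four-field EC number layout, allowing trailing "-" and final "n"."""
    parts = value.split(".")
    if len(parts) != 4 or not _NUMBER.fullmatch(parts[0]):
        return False                      # first field must always be a number
    seen_dash = False
    for position, part in enumerate(parts[1:], start=2):
        if part == "-":
            seen_dash = True
        elif seen_dash:
            return False                  # nothing but dashes may follow a dash
        elif part == "n" and position == 4:
            continue                      # number awaiting assignment
        elif not _NUMBER.fullmatch(part):
            return False
    return True

for text in ("1.1.2.4", "1.1.2.-", "1.1.2.n", "1.-.2.4"):
    print(text, is_valid_ec_number(text))
```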
Qualifier /ecotype=
Definition a population within a given species displaying genetically
based, phenotypic traits that reflect adaptation to a local habitat.
Value format "text"
Example /ecotype="Columbia"
Comment an example of such a population is one that has adapted hairier
than normal leaves as a response to an especially sunny habitat.
'Ecotype' is often applied to standard genetic stocks of
Arabidopsis thaliana, but it can be applied to any sessile
organism.
Qualifier /environmental_sample
Definition identifies sequences derived by direct molecular
isolation from a bulk environmental DNA sample
(by PCR with or without subsequent cloning of the
product, DGGE, or other anonymous methods) with no
reliable identification of the source organism.
Environmental samples include clinical samples,
gut contents, and other sequences from anonymous
organisms that may be associated with a particular
host. They do not include endosymbionts that can be
reliably recovered from a particular host, organisms
from a readily identifiable but uncultured field sample
(e.g., many cyanobacteria), or phytoplasmas that can be
reliably recovered from diseased plants (even though
these cannot be grown in axenic culture).
Value format none
Example /environmental_sample
Comment used only with the source feature key; source feature
keys containing the /environmental_sample qualifier
should also contain the /isolation_source qualifier;
entries including /environmental_sample must not include
the /strain qualifier.
Qualifier /estimated_length=
Definition estimated length of the gap in the sequence
Value format unknown or <integer>
Example /estimated_length=unknown
/estimated_length=342
Qualifier /exception=
Definition indicates that the coding region cannot be translated using
standard biological rules
Value format "RNA editing", "reasons given in citation",
"rearrangement required for product"
Example /exception="RNA editing"
/exception="reasons given in citation"
/exception="rearrangement required for product"
Comment only to be used to describe biological mechanisms such
as RNA editing; where the exception cannot easily be described
a published citation must be referred to; protein translation of
/exception CDS will be different from the corresponding conceptual
translation;
- must not be used where transl_except would be adequate,
e.g. in case of stop codon completion use:
/transl_except=(pos:6883,aa:TERM)
/note="TAA stop codon is completed by addition of 3' A residues to
mRNA".
- must not be used for ribosomal slippage, instead use join operator,
e.g.: CDS join(486..1784,1787..4810)
/note="ribosomal slip on tttt sequence at 1784..1787"
Qualifier /experiment=
Definition a brief description of the nature of the experimental
evidence that supports the feature identification or assignment.
Value format "text"
Example /experiment="Northern blot"
/experiment="heterologous expression system of Xenopus laevis
oocytes"
Comment detailed experimental information should not be included, and would
normally be found in the cited publications; value
"experimental evidence, no additional details recorded" was used to
replace instances of /evidence=EXPERIMENTAL in December 2005
Qualifier /focus
Definition identifies the source feature of primary biological
interest for records that have multiple source features
originating from different organisms and that are not
transgenic.
Value format none
Example /focus
Comment the source feature carrying the /focus qualifier
identifies the main organism of the entry, this
determines: a) the name displayed in the organism
lines, b) if no translation table is specified, the
translation table, c) the DDBJ/EMBL/GenBank taxonomic
division in which the entry will appear; only one
source feature with /focus is allowed in an entry; the
/focus and /transgenic qualifiers are mutually exclusive
in an entry.
Qualifier /frequency=
Definition frequency of the occurrence of a feature
Value format text representing the proportion of a population carrying the
feature expressed as a fraction
Example /frequency="23/108"
/frequency="1 in 12"
/frequency=".85"
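Since the value is free text, software that needs a number has to interpret
it. The following Python sketch converts the three example notations above
into an exact fraction. The function name frequency_as_fraction is
hypothetical, and the set of notations it accepts ("a/b", "a in b" and a
plain decimal) is taken from the examples only; other text raises an
exception.

```python
import re
from decimal import Decimal
from fractions import Fraction

_RATIO_RE = re.compile(r"(\d+)\s*(?:/|in)\s*(\d+)")

def frequency_as_fraction(text):
    """Interpret a /frequency value such as "23/108", "1 in 12" or ".85"."""
    text = text.strip()
    match = _RATIO_RE.fullmatch(text)
    if match:
        return Fraction(int(match.group(1)), int(match.group(2)))
    # Fall back to a decimal literal; Decimal keeps ".85" exact.
    return Fraction(Decimal(text))

for value in ("23/108", "1 in 12", ".85"):
    print(value, frequency_as_fraction(value))
```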
Qualifier /function=
Definition function attributed to a sequence
Value format "text"
Example /function="essential for recognition of cofactor"
Comment /function is used when the gene name and/or product name do not
convey the function attributable to a sequence.
Qualifier /gene=
Definition symbol of the gene corresponding to a sequence region
Value format "text"
Example /gene="ilvE"
Qualifier /gene_synonym=
Definition synonymous, replaced, obsolete or former gene symbol
Value format "text"
Example /gene_synonym="Hox-3.3"
in a feature where /gene="Hoxc6"
Comment used where it is helpful to indicate a gene symbol
synonym; when used, a primary gene symbol must always be
indicated in /gene
Qualifier /germline
Definition the sequence presented in the entry has not undergone somatic
rearrangement as part of an adaptive immune response; it is the
unrearranged sequence that was inherited from the parental
germline
Value format none
Example /germline
Comment /germline should not be used to indicate that the source of
the sequence is a gamete or germ cell;
/germline and /rearranged cannot be used in the same source
feature;
/germline and /rearranged should only be used for molecules that
can undergo somatic rearrangements as part of an adaptive immune
response; these are the T-cell receptor (TCR) and immunoglobulin
loci in the jawed vertebrates, and the unrelated variable
lymphocyte receptor (VLR) locus in the jawless fish (lampreys
and hagfish);
/germline and /rearranged should not be used outside of the
Craniata (taxid=89593)
Qualifier /haplotype=
Definition name for a specific set of alleles that are linked together
on the same physical chromosome. In the absence of
recombination, each haplotype is inherited as a unit, and may
be used to track gene flow in populations.
Value format "text"
Example /haplotype="Dw3 B5 Cw1 A1"
Qualifier /host=
Definition natural (as opposed to laboratory) host to the organism from
which sequenced molecule was obtained
Value format "text"
Example /host="Homo sapiens"
/host="Homo sapiens 12 year old girl"
/host="Rhizobium NGR234"
Qualifier /identified_by=
Definition name of the taxonomist who identified the specimen
Value format "text"
Example /identified_by="John Burns"
Qualifier /inference=
Definition a structured description of non-experimental evidence that supports
the feature identification or assignment.
Value format "TYPE[ (same species)][:EVIDENCE_BASIS]"
where TYPE is one of the following:
"non-experimental evidence, no additional details recorded"
"similar to sequence"
"similar to AA sequence"
"similar to DNA sequence"
"similar to RNA sequence"
"similar to RNA sequence, mRNA"
"similar to RNA sequence, EST"
"similar to RNA sequence, other RNA"
"profile"
"nucleotide motif"
"protein motif"
"ab initio prediction"
"alignment"
where the optional text "(same species)" is included when the inference comes
from the same species as the entry.
where the optional "EVIDENCE_BASIS" is either a reference to a database entry
(including accession and version) or an algorithm (including version), e.g.
'INSD:AACN010222672.1', 'InterPro:IPR001900', 'ProDom:PD000600',
'Genscan:2.0', etc.
Example /inference="similar to DNA sequence:INSD:AY411252.1"
/inference="similar to RNA sequence, mRNA:RefSeq:NM_000041.2"
/inference="similar to DNA sequence (same
species):INSD:AACN010222672.1"
/inference="profile:tRNAscan:2.1"
/inference="protein motif:InterPro:IPR001900"
/inference="ab initio prediction:Genscan:2.0"
/inference="alignment:Splign:1.0"
Comment /inference="non-experimental evidence, no additional details
recorded" was used to replace instances of
/evidence=NOT_EXPERIMENTAL in December 2005;
recommendations for choice of resource acronym for
[EVIDENCE_BASIS] are provided in the /inference qualifier
vocabulary recommendation document
(http://www.insdc.org/page.php?page=inference).
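The structure "TYPE[ (same species)][:EVIDENCE_BASIS]" can be split mechanically. The following Python sketch is illustrative only; the names INFERENCE_TYPES and parse_inference are hypothetical, and the sketch assumes that line wrapping has already been removed from the qualifier value. It does not check the EVIDENCE_BASIS against any database.

```python
# TYPE values copied from the Value format list of /inference
INFERENCE_TYPES = [
    "non-experimental evidence, no additional details recorded",
    "similar to sequence",
    "similar to AA sequence",
    "similar to DNA sequence",
    "similar to RNA sequence",
    "similar to RNA sequence, mRNA",
    "similar to RNA sequence, EST",
    "similar to RNA sequence, other RNA",
    "profile",
    "nucleotide motif",
    "protein motif",
    "ab initio prediction",
    "alignment",
]

SAME_SPECIES = " (same species)"


def parse_inference(value):
    """Return (type, same_species, evidence_basis) or None if malformed."""
    # Try the longest TYPE first so that "similar to RNA sequence, mRNA"
    # is preferred over its prefix "similar to RNA sequence".
    for inference_type in sorted(INFERENCE_TYPES, key=len, reverse=True):
        if not value.startswith(inference_type):
            continue
        rest = value[len(inference_type):]
        same_species = rest.startswith(SAME_SPECIES)
        if same_species:
            rest = rest[len(SAME_SPECIES):]
        if rest == "":
            return (inference_type, same_species, None)
        if rest.startswith(":") and len(rest) > 1:
            return (inference_type, same_species, rest[1:])
    return None
```

The evidence basis is returned as one string (for example "INSD:AY411252.1") because the resource acronym and the identifier are themselves separated by a colon.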
Qualifier /isolate=
Definition individual isolate from which the sequence was obtained
Value format "text"
Example /isolate="Patient #152"
/isolate="DGGE band PSBAC-13"
Qualifier /isolation_source=
Definition describes the physical, environmental and/or local
geographical source of the biological sample from which
the sequence was derived
Value format "text"
Examples /isolation_source="rumen isolates from standard
Pelleted ration-fed steer #67"
/isolation_source="permanent Antarctic sea ice"
/isolation_source="denitrifying activated sludge from
carbon_limited continuous reactor"
Comment used only with the source feature key;
source feature keys containing an /environmental_sample
qualifier should also contain an /isolation_source
qualifier; the /country qualifier should be used to
describe the country and major geographical sub-region.
Qualifier /label=
Definition a label used to permanently tag a feature
Value format feature_label
Example /label=Alb1_exon1
Comment feature labels follow the naming conventions
for all feature table objects
(see Sections 3.1 and 3.4)
Qualifier /lab_host=
Definition scientific name of the laboratory host used to propagate the
source organism from which the sequenced molecule was obtained
Value format "text"
Example /lab_host="Gallus gallus"
/lab_host="Gallus gallus embryo"
/lab_host="Escherichia coli strain DH5 alpha"
/lab_host="Homo sapiens HeLa cells"
Comment the full binomial scientific name of the host organism should
be used when known; extra conditional information relating to
the host may also be included
Qualifier /lat_lon=
Definition geographical coordinates of the location where the specimen was
collected
Value format "text"
Example /lat_lon="47.94 N 28.12 W"
/lat_lon="45.01 S 4.12 E"
Comment degrees latitude and longitude in format "d[d.dd] N|S d[dd.dd] W|E"
(see the examples)
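A value in the format "d[d.dd] N|S d[dd.dd] W|E" can be converted to signed decimal degrees. The following Python sketch is illustrative only; the function name parse_lat_lon is hypothetical. As assumptions of the sketch, any number of decimal places is accepted, and latitudes above 90 or longitudes above 180 degrees are rejected.

```python
import re

# degrees latitude, hemisphere, degrees longitude, hemisphere
_LAT_LON = re.compile(r"(\d{1,2}(?:\.\d+)?) ([NS]) (\d{1,3}(?:\.\d+)?) ([WE])")


def parse_lat_lon(value):
    """Return (latitude, longitude) in signed decimal degrees, or None."""
    m = _LAT_LON.fullmatch(value)
    if m is None:
        return None
    latitude = float(m.group(1))
    longitude = float(m.group(3))
    # Range check (an assumption of this sketch, not stated in the definition)
    if latitude > 90 or longitude > 180:
        return None
    if m.group(2) == "S":
        latitude = -latitude
    if m.group(4) == "W":
        longitude = -longitude
    return (latitude, longitude)
```

South and west are returned as negative numbers, following the usual convention for decimal degrees.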
Qualifier /locus_tag=
Definition a submitter-supplied, systematic, stable identifier for a gene
and its associated features, used for tracking purposes
Value Format "text" (single token)
but not "<1-5 letters><5-9 digit integer>[.<integer>]"
Example /locus_tag="ABC_0022"
/locus_tag="A1C_00001"
Comment /locus_tag can be used with any feature that /gene
can be used with;
identical /locus_tag values may be used within an entry/record,
but only if the identical /locus_tag values are associated
with the same gene; in all other circumstances the /locus_tag
value must be unique within that entry/record. Multiple /locus_tag
values are not allowed within one feature for entries created
after 15-OCT-2004.
If a /locus_tag needs to be re-assigned the /old_locus_tag qualifier
should be used to store the old value. The /locus_tag value should
not be in a format which resembles INSD accession numbers,
accession.version, or /protein_id identifiers.
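The two format constraints on /locus_tag (single token, and not of the form "<1-5 letters><5-9 digit integer>[.<integer>]") can be checked with one pattern. The following Python sketch is illustrative only; the function name is hypothetical, and "single token" is interpreted here as "contains no whitespace", which is an assumption.

```python
import re

# Shape that a /locus_tag must NOT have:
# 1-5 letters, then a 5-9 digit integer, then an optional ".<integer>"
_ACCESSION_LIKE = re.compile(r"[A-Za-z]{1,5}\d{5,9}(?:\.\d+)?")


def is_acceptable_locus_tag(value):
    """True if value is a single token that does not resemble an accession."""
    if not value or any(ch.isspace() for ch in value):
        return False
    return _ACCESSION_LIKE.fullmatch(value) is None
```

The uniqueness rules for /locus_tag within an entry are not covered by this check; they need the full set of features in the record.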
Qualifier /map=
Definition genomic map position of feature
Value format "text"
Example /map="8q12-13"
Qualifier /macronuclear
Definition if the sequence shown is DNA and from an organism which
undergoes chromosomal differentiation between macronuclear and
micronuclear stages, this qualifier is used to denote that the
sequence is from macronuclear DNA.
Value format none
Example /macronuclear
Qualifier /mating_type=
Definition mating type of the organism from which the sequence was
obtained; mating type is used for prokaryotes, and for
eukaryotes that undergo meiosis without sexually dimorphic
gametes
Value format "text"
Examples /mating_type="MAT-1"
/mating_type="plus"
/mating_type="-"
/mating_type="odd"
/mating_type="even"
Comment /mating_type="male" and /mating_type="female" are
valid in the prokaryotes, but not in the eukaryotes;
for more information, see the entry for /sex.
Qualifier /mobile_element=
Definition type and name or identifier of the mobile element which is
described by the parent feature
Value format "<mobile_element_type>[:<mobile_element_name>]" where
mobile_element_type is one of the following:
"transposon", "retrotransposon", "integron",
"insertion sequence", "non-LTR retrotransposon",
"SINE", "MITE", "LINE", "other".
Example /mobile_element="transposon:Tnp9"
Comment /mobile_element is legal on repeat_region feature key only.
Mobile element should be used to represent both elements which
are currently mobile, and those which were mobile in the past.
Value "other" requires a mobile_element_name.
/mobile_element qualifier replaced /transposon and /insertion_seq
qualifiers in December 2006
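The value format "<mobile_element_type>[:<mobile_element_name>]" and the rule that "other" requires a name can be checked as follows. This Python sketch is illustrative only; the names MOBILE_ELEMENT_TYPES and parse_mobile_element are hypothetical, and rejecting a colon followed by an empty name is an assumption of the sketch.

```python
# Types copied from the Value format list of /mobile_element
MOBILE_ELEMENT_TYPES = {
    "transposon", "retrotransposon", "integron",
    "insertion sequence", "non-LTR retrotransposon",
    "SINE", "MITE", "LINE", "other",
}


def parse_mobile_element(value):
    """Return (type, name_or_None), or None if the value is not valid."""
    element_type, colon, name = value.partition(":")
    if element_type not in MOBILE_ELEMENT_TYPES:
        return None
    if colon and not name:
        return None  # a colon was given but no name follows it
    if element_type == "other" and not name:
        return None  # value "other" requires a mobile_element_name
    return (element_type, name or None)
```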
Qualifier /mod_base=
Definition abbreviation for a modified nucleotide base
Value format modified_base
Example /mod_base=m5c
Comment modified nucleotides not found in the restricted vocabulary
list can be annotated by entering '/mod_base=OTHER' with
'/note="name of modified base"'
Qualifier /mol_type=
Definition in vivo molecule type of sequence
Value format "genomic DNA", "genomic RNA", "mRNA", "tRNA", "rRNA", "other
RNA", "other DNA", "transcribed RNA", "viral cRNA", "unassigned
DNA", "unassigned RNA"
Example /mol_type="genomic DNA"
Comment all values refer to the in vivo or synthetic molecule for
primary entries and the hypothetical molecule in Third Party
Annotation entries; the value "genomic DNA" does not imply that
the molecule is nuclear (e.g. organelle and plasmid DNA should
be described using "genomic DNA"); ribosomal RNA genes should be
described using "genomic DNA"; "rRNA" should only be used if the
ribosomal RNA molecule itself has been sequenced; /mol_type is
mandatory on every source feature key; all /mol_type values
within one entry/record must be the same; values "other RNA" and
"other DNA" should be applied to synthetic molecules, values
"unassigned DNA", "unassigned RNA" should be applied where in
vivo molecule is unknown
Qualifier /ncRNA_class=
Definition a structured description of the classification of the
non-coding RNA described by the ncRNA parent key
Value format "TYPE"
Example /ncRNA_class="miRNA"
/ncRNA_class="siRNA"
/ncRNA_class="scRNA"
Comment TYPE is a term taken from the INSDC controlled vocabulary for ncRNA
classes (http://www.insdc.org/page.php?page=rna_vocab); on
15-Oct-2008, the following terms were valid:
"antisense_RNA"
"autocatalytically_spliced_intron"
"ribozyme"
"hammerhead_ribozyme"
"RNase_P_RNA"
"RNase_MRP_RNA"
"telomerase_RNA"
"guide_RNA"
"rasiRNA"
"scRNA"
"siRNA"
"miRNA"
"piRNA"
"snoRNA"
"snRNA"
"SRP_RNA"
"vault_RNA"
"Y_RNA"
"other"
ncRNA classes not yet in the INSDC /ncRNA_class controlled
vocabulary can be annotated by entering
'/ncRNA_class="other"' with '/note="[brief explanation of
novel ncRNA_class]"';
Qualifier /note=
Definition any comment or additional information
Value format "text"
Example /note="This qualifier is equivalent to a comment."
Qualifier /number=
Definition a number to indicate the order of genetic elements (e.g.,
exons or introns) in the 5' to 3' direction
Value format unquoted text (single token)
Example /number=4
/number=6B
Comment text limited to integers, letters or combination of integers and/or
letters represented as an unquoted single token (e.g. 5a, XIIb);
any additional terms should be included in /standard_name.
Example: /number=2A
/standard_name="long"
Qualifier /old_locus_tag=
Definition feature tag assigned for tracking purposes
Value Format "text" (single token)
Example /old_locus_tag="RSc0382"
/locus_tag="YPO0002"
Comment /old_locus_tag can be used with any feature where /gene is valid and
where a /locus_tag qualifier is present.
Identical /old_locus_tag values may be used within an entry/record,
but only if the identical /old_locus_tag values are associated
with the same gene; in all other circumstances the /old_locus_tag
value must be unique within that entry/record.
Multiple /old_locus_tag qualifiers with distinct values are
allowed within a single feature; /old_locus_tag and /locus_tag
values must not be identical within a single feature.
Qualifier /operon=
Definition name of the group of contiguous genes transcribed into a
single transcript to which that feature belongs.
Value format "text"
Example /operon="lac"
Comment currently valid only on Prokaryota-specific features
Qualifier /organelle=
Definition type of membrane-bound intracellular structure from which the
sequence was obtained
Value format mitochondrion, nucleomorph, plastid, mitochondrion:kinetoplast,
plastid:chloroplast, plastid:apicoplast, plastid:chromoplast,
plastid:cyanelle, plastid:leucoplast, plastid:proplastid
Examples /organelle="chromatophore"
/organelle="hydrogenosome"
/organelle="mitochondrion"
/organelle="nucleomorph"
/organelle="plastid"
/organelle="mitochondrion:kinetoplast"
/organelle="plastid:chloroplast"
/organelle="plastid:apicoplast"
/organelle="plastid:chromoplast"
/organelle="plastid:cyanelle"
/organelle="plastid:leucoplast"
/organelle="plastid:proplastid"
Comments modifier text limited to values from controlled list
Qualifier /organism=
Definition scientific name of the organism that provided the
sequenced genetic material.
Value format "text"
Example /organism="Homo sapiens"
Comment the organism name which appears on the OS or ORGANISM line
will match the value of the /organism qualifier of the
source key in the simplest case of a one-source sequence.
Qualifier /partial
Definition differentiates between complete regions and partial ones
Value format none
Example /partial
Comment not to be used for new entries from 15-DEC-2001;
use '<' and '>' signs in the location descriptors to
indicate that the sequence is partial.
Qualifier /PCR_conditions=
Definition description of reaction conditions and components for PCR
Value format "text"
Example /PCR_conditions="Initial denaturation:94degC,1.5min"
Comment used with primer_bind key
Qualifier /PCR_primers=
Definition PCR primers that were used to amplify the sequence.
A single /PCR_primers qualifier should contain all the primers used
for a single PCR reaction. If multiple forward or reverse primers are
present in a single PCR reaction, multiple sets of fwd_name/fwd_seq
or rev_name/rev_seq values will be present.
Value format /PCR_primers="[fwd_name: XXX1, ]fwd_seq: xxxxx1,[fwd_name: XXX2,]
fwd_seq: xxxxx2, [rev_name: YYY1, ]rev_seq: yyyyy1,
[rev_name: YYY2, ]rev_seq: yyyyy2"
Example /PCR_primers="fwd_name: CO1P1, fwd_seq: ttgattttttggtcayccwgaagt,
rev_name: CO1R4, rev_seq: ccwvytardcctarraartgttg"
/PCR_primers=" fwd_name: hoge1, fwd_seq: cgkgtgtatcttact,
rev_name: hoge2, rev_seq: cg<i>gtgtatcttact"
/PCR_primers="fwd_name: CO1P1, fwd_seq: ttgattttttggtcayccwgaagt,
fwd_name: CO1P2, fwd_seq: gatacacaggtcayccwgaagt, rev_name: CO1R4,
rev_seq: ccwvytardcctarraartgttg"
Comment fwd_seq and rev_seq are both mandatory; fwd_name and rev_name are
both optional. Both sequences should be presented in 5'>3' order.
The sequences should be given in the IUPAC degenerate-base alphabet,
except for the modified bases; those must be enclosed within angle
brackets <>
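A /PCR_primers value can be split into ordered name and sequence items and checked against the rules in the comment above. The following Python sketch is illustrative only; the function name parse_pcr_primers is hypothetical. It assumes that primer names contain no commas or colons, and it does not check the text inside angle brackets against the modified base vocabulary.

```python
import re

_KEYS = ("fwd_name", "fwd_seq", "rev_name", "rev_seq")
# IUPAC nucleotide codes, plus modified bases enclosed in angle brackets
_SEQ = re.compile(r"(?:[acgtumrwsykvhdbn]|<[^<>]+>)+", re.IGNORECASE)


def parse_pcr_primers(value):
    """Return a list of (key, text) pairs in order, or None if malformed."""
    pairs = []
    for item in value.split(","):
        key, colon, text = item.partition(":")
        key, text = key.strip(), text.strip()
        if not colon or key not in _KEYS or not text:
            return None
        if key.endswith("_seq") and _SEQ.fullmatch(text) is None:
            return None
        pairs.append((key, text))
    keys = [key for key, _ in pairs]
    # fwd_seq and rev_seq are both mandatory; the names are optional
    if "fwd_seq" not in keys or "rev_seq" not in keys:
        return None
    return pairs
```

Returning an ordered list rather than a dictionary keeps multiple forward or reverse primers from the same reaction distinct.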
Qualifier /phenotype=
Definition phenotype conferred by the feature, where phenotype is defined as a
physical, biochemical or behavioural characteristic or set of
characteristics
Value format "text"
Example /phenotype="erythromycin resistance"
Qualifier /pop_variant=
Definition name of a variation that characterizes a particular
sub-population within a given species. The variation could be
in the genotype or the phenotype.
Value format "text"
Example /pop_variant="pop1"
/pop_variant="Bear Paw"
Qualifier /plasmid=
Definition name of naturally occurring plasmid from which the sequence was
obtained, where plasmid is defined as an independently replicating
genetic unit that cannot be described by /chromosome or /segment
Value format "text"
Example /plasmid="C-589"
Qualifier /product=
Definition name of the product associated with the feature, e.g. the mRNA of an
mRNA feature, the polypeptide of a CDS, the mature peptide of a
mat_peptide, etc.
Value format "text"
Example /product="trypsinogen" (when qualifier appears in CDS feature)
/product="trypsin" (when qualifier appears in mat_peptide feature)
/product="XYZ neural-specific transcript" (when qualifier appears in
mRNA feature)
Qualifier /protein_id=
Definition protein identifier, issued by International collaborators;
this qualifier consists of a stable ID portion (3+5 format
with 3 position letters and 5 numbers) plus a version number
after the decimal point.
Value format <identifier>
Example /protein_id="AAA12345.1"
Comment when the protein sequence encoded by the CDS changes, only
the version number of the /protein_id value is incremented;
the stable part of the /protein_id remains unchanged and as a
result will permanently be associated with a given protein;
this qualifier is valid only on CDS features which translate
into a valid protein.
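The split between the stable portion and the version number can be illustrated as follows. This Python sketch is not part of this definition; the function names are hypothetical, and the pattern assumes upper-case letters in the 3+5 format, as in the example.

```python
import re

# 3 letters + 5 digits (stable ID), a decimal point, then the version number
_PROTEIN_ID = re.compile(r"([A-Z]{3}\d{5})\.(\d+)")


def split_protein_id(value):
    """Return (stable_id, version) or None if the value is malformed."""
    m = _PROTEIN_ID.fullmatch(value)
    if m is None:
        return None
    return (m.group(1), int(m.group(2)))


def next_protein_id(value):
    """Return the identifier that follows a change in the protein sequence."""
    parts = split_protein_id(value)
    if parts is None:
        return None
    stable_id, version = parts
    # Only the version is incremented; the stable part never changes.
    return "%s.%d" % (stable_id, version + 1)
```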
Qualifier /proviral
Definition this qualifier is used to flag sequence obtained from a virus or
phage that is integrated into the genome of another organism
Value format none
Example /proviral
Qualifier /pseudo
Definition indicates that this feature is a non-functional version of the
element named by the feature key
Value format none
Example /pseudo
Qualifier /rearranged
Definition the sequence presented in the entry has undergone somatic
rearrangement as part of an adaptive immune response; it is not
the unrearranged sequence that was inherited from the parental
germline
Value format none
Example /rearranged
Comment /rearranged should not be used to annotate chromosome
rearrangements that are not involved in an adaptive immune
response;
/germline and /rearranged cannot be used in the same source
feature;
/germline and /rearranged should only be used for molecules that
can undergo somatic rearrangements as part of an adaptive immune
response; these are the T-cell receptor (TCR) and immunoglobulin
loci in the jawed vertebrates, and the unrelated variable
lymphocyte receptor (VLR) locus in the jawless fish (lampreys
and hagfish);
/germline and /rearranged should not be used outside of the
Craniata (taxid=89593)
Qualifier /replace=
Definition indicates that the sequence identified in a feature's intervals is
replaced by the sequence shown in "text"; if no sequence is
contained within the qualifier, this indicates a deletion.
Value format "text"
Example /replace="a"
/replace=""
Qualifier /ribosomal_slippage
Definition during protein translation, certain sequences can program
ribosomes to change to an alternative reading frame by a
mechanism known as ribosomal slippage
Value format none
Example /ribosomal_slippage
Comment a join operator, e.g. [join(486..1784,1787..4810)], should be used
in the CDS spans to indicate the location of ribosomal_slippage
Qualifier /rpt_family=
Definition type of repeated sequence; "Alu" or "Kpn", for example
Value format "text"
Example /rpt_family="Alu"
Qualifier /rpt_type=
Definition organization of repeated sequence
Value format tandem, inverted, flanking, terminal, direct, dispersed, and other
Example /rpt_type=INVERTED
Comment the values are case-insensitive, i.e. both "INVERTED" and "inverted"
are valid;
Definitions of the values:
tandem, a repeat that exists adjacent to another in the same
orientation;
inverted, a repeat which occurs as part of a set (normally a pair)
organized in the reverse orientation;
flanking, a repeat lying outside the sequence for which it has
functional significance (eg. transposon insertion target sites);
terminal, a repeat at the ends of and within the sequence for which
it has functional significance (eg. transposon LTRs);
direct, a repeat that exists not always adjacent but is in the same
orientation;
dispersed, a repeat that is found dispersed throughout the genome;
other, a repeat exhibiting important attributes that cannot be
described by other values.
Qualifier /rpt_unit_range=
Definition identity of a repeat range
Value format <base_range>
Example /rpt_unit_range=202..245
Comment used to indicate the base range of the sequence that constitutes
a repeated sequence specified by the feature keys oriT and
repeat_region; qualifiers /rpt_unit_range and /rpt_unit_seq
replaced qualifier /rpt_unit in December 2005
Qualifier /rpt_unit_seq=
Definition identity of a repeat sequence
Value format "text"
Example /rpt_unit_seq="aagggc"
/rpt_unit_seq="ag(5)tg(8)"
/rpt_unit_seq="(AAAGA)6(AAAA)1(AAAGA)12"
Comment used to indicate the literal sequence that constitutes a
repeated sequence specified by the feature keys oriT and
repeat_region; qualifiers /rpt_unit_range and /rpt_unit_seq
replaced qualifier /rpt_unit in December 2005
Qualifier /satellite=
Definition identifier for a satellite DNA marker, composed of many tandem
repeats (identical or related) of a short basic repeated unit;
Value format "<satellite_type>[:<class>][ <identifier>]"
where satellite_type is one of the following
"satellite", "microsatellite", "minisatellite"
Example /satellite="satellite: S1a"
/satellite="satellite: alpha"
/satellite="satellite: gamma III"
/satellite="microsatellite: DC130"
Comment many satellites have base composition or other properties
that differ from those of the rest of the genome, which allows
them to be identified.
Qualifier /segment=
Definition name of viral or phage segment sequenced
Value format "text"
Example /segment="6"
Qualifier /serotype=
Definition serological variety of a species characterized by its
antigenic properties
Value format "text"
Example /serotype="B1"
Comment used only with the source feature key;
the Bacteriological Code recommends the use of the
term 'serovar' instead of 'serotype' for the
prokaryotes; see the International Code of Nomenclature
of Bacteria (1990 Revision) Appendix 10.B "Infraspecific
Terms".
Qualifier /serovar=
Definition serological variety of a species (usually a prokaryote)
characterized by its antigenic properties
Value format "text"
Example /serovar="O157:H7"
Comment used only with the source feature key;
the Bacteriological Code recommends the use of the
term 'serovar' instead of 'serotype' for prokaryotes;
see the International Code of Nomenclature of Bacteria
(1990 Revision) Appendix 10.B "Infraspecific Terms".
Qualifier /sex=
Definition sex of the organism from which the sequence was obtained;
sex is used for eukaryotic organisms that undergo meiosis
and have sexually dimorphic gametes
Value format "text"
Examples /sex="female"
/sex="male"
/sex="hermaphrodite"
/sex="unisexual"
/sex="bisexual"
/sex="asexual"
/sex="monoecious" [or monecious]
/sex="dioecious" [or diecious]
Comment /sex should be used (instead of /mating_type)
in the Metazoa, Embryophyta, Rhodophyta & Phaeophyceae;
/mating_type should be used (instead of /sex)
in the Bacteria, Archaea & Fungi;
neither /sex nor /mating_type should be used
in the viruses;
outside of the taxa listed above, /mating_type
should be used unless the value of the qualifier
is taken from the vocabulary given in the examples
above
Qualifier /specimen_voucher=
Definition identifier for the specimen from which the nucleic acid
sequenced was obtained
Value format /specimen_voucher="[<institution-code>:[<collection-code>:]]<specimen_id>"
Example /specimen_voucher="UAM:Mamm:52179"
/specimen_voucher="AMCC:101706"
/specimen_voucher="USNM:field series 8798"
/specimen_voucher="personal collection:Dan Janzen:99-SRNP-2003"
/specimen_voucher="99-SRNP-2003"
Comment the /specimen_voucher qualifier is intended to annotate a
reference to the physical specimen that remains after the
sequence has been obtained;
if the specimen was destroyed in the process of sequencing,
electronic images (e-vouchers) are an adequate substitute for a
physical voucher specimen; ideally the specimens will be
deposited in a curated museum, herbarium, or frozen tissue
collection, but often they will remain in a personal or
laboratory collection for some time before they are deposited in
a curated collection;
there are three forms of specimen_voucher qualifiers; if the
text of the qualifier includes one or more colons it is a
'structured voucher'; structured vouchers include
institution-codes (and optional collection-codes) taken from a
controlled vocabulary that denotes the museum or herbarium
collection where the specimen resides;
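The value format "[<institution-code>:[<collection-code>:]]<specimen_id>" can be split on its colons. The following Python sketch is illustrative only; the function name parse_specimen_voucher is hypothetical, and the institution and collection codes are not checked against the controlled vocabulary.

```python
def parse_specimen_voucher(value):
    """Return (institution_code, collection_code, specimen_id) or None.

    Codes that are absent from the value are returned as None.
    """
    # At most two colons separate the codes from the specimen_id
    parts = value.split(":", 2)
    if any(not part.strip() for part in parts):
        return None  # an empty component is not meaningful
    if len(parts) == 1:
        return (None, None, parts[0])
    if len(parts) == 2:
        return (parts[0], None, parts[1])
    return (parts[0], parts[1], parts[2])
```

A value without any colon is returned with both codes set to None, which corresponds to a voucher that is not structured.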
Qualifier /standard_name=
Definition accepted standard name for this feature
Value format "text"
Example /standard_name="dotted"
Comment use /standard_name to give full gene name, but use /gene to
give gene symbol (in the above example /gene="Dt").
Qualifier /strain=
Definition strain from which sequence was obtained
Value format "text"
Example /strain="BALB/c"
Comment entries including /strain must not include
the /environmental_sample qualifier
Qualifier /sub_clone=
Definition sub-clone from which sequence was obtained
Value format "text"
Example /sub_clone="lambda-hIL7.20g"
Comment the comments on /clone apply to /sub_clone
Qualifier /sub_species=
Definition name of sub-species of organism from which sequence was
obtained
Value format "text"
Example /sub_species="lactis"
Qualifier /sub_strain=
Definition name or identifier of a genetically or otherwise modified
strain from which sequence was obtained, derived from a
parental strain (which should be annotated in the /strain
qualifier)
Value format "text"
Example /sub_strain="abis"
Comment If the parental strain is not given, this should
be annotated in the strain qualifier instead of sub_strain.
Either:
/strain="K-12"
/sub_strain="MG1655"
or:
/strain="MG1655"
Qualifier /tag_peptide=
Definition base location encoding the polypeptide for proteolysis tag of
tmRNA and its termination codon;
Value format <base_range>
Example /tag_peptide=90..122
Comment it is recommended that the amino acid sequence corresponding
to the /tag_peptide be annotated by describing a 5' partial
CDS feature; e.g. CDS <90..122;
the /tag_peptide qualifier (and tmRNA feature) will become
valid on 15-Dec-2007
Qualifier /tissue_lib=
Definition tissue library from which sequence was obtained
Value format "text"
Example /tissue_lib="tissue library 772"
Qualifier /tissue_type=
Definition tissue type from which the sequence was obtained
Value format "text"
Example /tissue_type="liver"
Qualifier /transgenic
Definition identifies the source feature of the organism which was
the recipient of transgenic DNA.
Value format none
Example /transgenic
Comment transgenic sequences must have at least two source feature keys;
the source feature key having the /transgenic qualifier must
span the whole sequence; the source feature carrying the
/transgenic qualifier identifies the main organism of the entry,
this determines: a) the name displayed in the organism lines,
b) if no translation table is specified, the translation table;
only one source feature with /transgenic is allowed in an entry;
the /focus and /transgenic qualifiers are mutually exclusive in
an entry.
Qualifier /translation=
Definition automatically generated one-letter abbreviated amino acid
sequence derived from either the universal genetic code or the
table as specified in /transl_table and as determined by
exceptions in the /transl_except and /codon qualifiers
Value format IUPAC one-letter amino acid abbreviation, "X" is to be used
for AA exceptions.
Example /translation="MASTFPPWYRGCASTPSLKGLIMCTW"
Comment to be used with CDS feature only; this is a mandatory qualifier
in the CDS feature key except where /pseudo is shown;
see /transl_table for definition and location of genetic code
Tables.
Qualifier /transl_except=
Definition translational exception: single codon the translation of which
does not conform to genetic code defined by Organism and /codon=
Value format (pos:location,aa:<amino_acid>) where amino_acid is the
amino acid coded by the codon at the base_range position
Example /transl_except=(pos:213..215,aa:Trp)
/transl_except=(pos:1017,aa:TERM)
/transl_except=(pos:2000..2001,aa:TERM)
/transl_except=(pos:X22222:15..17,aa:Ala)
Comment if the amino acid is not on the restricted vocabulary list use
e.g., '/transl_except=(pos:213..215,aa:OTHER)' with
'/note="name of unusual amino acid"';
for modified amino-acid selenocysteine use three letter code
'Sec' (one letter code 'U' in amino-acid sequence)
/transl_except=(pos:1002..1004,aa:Sec);
for partial termination codons where TAA stop codon is
completed by the addition of 3' A residues to the mRNA
either a single base_position or a base_range is used, e.g.
if partial stop codon is a single base:
/transl_except=(pos:1017,aa:TERM)
if partial stop codon consists of two bases:
/transl_except=(pos:2000..2001,aa:TERM) with
'/note="stop codon completed by the addition of 3' A residues
to the mRNA"'.
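The value format "(pos:location,aa:<amino_acid>)" can be parsed and checked for a plausible codon length. The following Python sketch is illustrative only; the function name parse_transl_except is hypothetical. As assumptions of the sketch, only simple locations of the kinds shown in the examples are handled (a single base, a base range, or either one prefixed by an accession), and a span shorter than three bases is accepted only for aa:TERM, reflecting the partial stop codon cases described in the comment. The amino acid abbreviation is not checked against the restricted vocabulary.

```python
import re

_TRANSL_EXCEPT = re.compile(
    r"\(pos:(?:([A-Za-z0-9_.]+):)?(\d+)(?:\.\.(\d+))?,aa:([A-Za-z]+)\)"
)


def parse_transl_except(value):
    """Return (accession_or_None, start, end, amino_acid) or None."""
    m = _TRANSL_EXCEPT.fullmatch(value)
    if m is None:
        return None
    accession, start, end, amino_acid = m.groups()
    start = int(start)
    end = int(end) if end is not None else start
    length = end - start + 1
    # A full codon spans 3 bases; a partial stop codon spans 1 or 2 bases.
    if length == 3 or (amino_acid == "TERM" and length in (1, 2)):
        return (accession, start, end, amino_acid)
    return None
```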
Qualifier /transl_table=
Definition definition of genetic code table used if other than universal
genetic code table. Tables used are described in appendix V,
section 7.5.5.
Value format <integer>; 1=universal table 1; 2=non-universal table 2; ...
Example /transl_table=4
Comment genetic code exceptions outside range of specified tables are
reported in /codon or /transl_except qualifiers.
Qualifier /trans_splicing
Definition indicates that exons from two RNA molecules are ligated in
intermolecular reaction to form mature RNA
Value format none
Example /trans_splicing
Comment should be used on features such as CDS, mRNA and other features
that are produced as a result of a trans-splicing event. This
qualifier should be used only when the splice event is indicated in
the "join" operator, eg join(complement(69611..69724),139856..140087)
Qualifier /variety=
Definition variety (= varietas, a formal Linnaean rank) of organism
from which sequence was derived.
Value format "text"
Example /variety="insularis"
Comment use the cultivar qualifier for cultivated plant
varieties, i.e., products of artificial selection;
varieties other than plant and fungal varietas should be
annotated via /note, e.g. /note="breed:Cukorova"
7.4.2 Feature qualifiers - mapped to Feature keys
The following is a list of available qualifiers mapped to the list of feature keys on which each qualifier is legal.
QUALIFIER FEATURE KEY
/allele -10_signal
/allele -35_signal
/allele 3'UTR
/allele 5'UTR
/allele attenuator
/allele C_region
/allele CAAT_signal
/allele CDS
/allele conflict
/allele D_segment
/allele D-loop
/allele enhancer
/allele exon
/allele GC_signal
/allele gene
/allele iDNA
/allele intron
/allele J_segment
/allele LTR
/allele mat_peptide
/allele misc_binding
/allele misc_difference
/allele misc_feature
/allele misc_recomb
/allele misc_RNA
/allele misc_signal
/allele misc_structure
/allele modified_base
/allele mRNA
/allele N_region
/allele old_sequence
/allele operon
/allele oriT
/allele polyA_signal
/allele polyA_site
/allele precursor_RNA
/allele prim_transcript
/allele primer_bind
/allele promoter
/allele protein_bind
/allele RBS
/allele rep_origin
/allele repeat_region
/allele rRNA
/allele S_region
/allele sig_peptide
/allele stem_loop
/allele STS
/allele TATA_signal
/allele terminator
/allele transit_peptide
/allele tRNA
/allele unsure
/allele V_region
/allele V_segment
/allele variation
/anticodon tRNA
/bio_material source
/bound_moiety enhancer
/bound_moiety misc_binding
/bound_moiety oriT
/bound_moiety promoter
/bound_moiety protein_bind
/cell_line source
/cell_type source
/chromosome source
/citation -10_signal
/citation -35_signal
/citation 3'UTR
/citation 5'UTR
/citation attenuator
/citation C_region
/citation CAAT_signal
/citation CDS
/citation conflict
/citation D_segment
/citation D-loop
/citation enhancer
/citation exon
/citation GC_signal
/citation gene
/citation iDNA
/citation intron
/citation J_segment
/citation LTR
/citation mat_peptide
/citation misc_binding
/citation misc_difference
/citation misc_feature
/citation misc_recomb
/citation misc_RNA
/citation misc_signal
/citation misc_structure
/citation modified_base
/citation mRNA
/citation N_region
/citation old_sequence
/citation operon
/citation oriT
/citation polyA_signal
/citation polyA_site
/citation precursor_RNA
/citation prim_transcript
/citation primer_bind
/citation promoter
/citation protein_bind
/citation RBS
/citation rep_origin
/citation repeat_region
/citation rRNA
/citation S_region
/citation sig_peptide
/citation source
/citation stem_loop
/citation STS
/citation TATA_signal
/citation terminator
/citation transit_peptide
/citation tRNA
/citation unsure
/citation V_region
/citation V_segment
/citation variation
/clone misc_difference
/clone source
/clone_lib source
/codon CDS
/codon_start CDS
/collected_by source
/collection_date source
/compare conflict
/compare misc_difference
/compare old_sequence
/compare variation
/compare unsure
/country source
/cultivar source
/culture_collection source
/db_xref -10_signal
/db_xref -35_signal
/db_xref 3'UTR
/db_xref 5'UTR
/db_xref attenuator
/db_xref C_region
/db_xref CAAT_signal
/db_xref CDS
/db_xref conflict
/db_xref D_segment
/db_xref D-loop
/db_xref enhancer
/db_xref exon
/db_xref GC_signal
/db_xref gene
/db_xref iDNA
/db_xref intron
/db_xref J_segment
/db_xref LTR
/db_xref mat_peptide
/db_xref misc_binding
/db_xref misc_difference
/db_xref misc_feature
/db_xref misc_recomb
/db_xref misc_RNA
/db_xref misc_signal
/db_xref misc_structure
/db_xref modified_base
/db_xref mRNA
/db_xref N_region
/db_xref old_sequence
/db_xref operon
/db_xref oriT
/db_xref polyA_signal
/db_xref polyA_site
/db_xref precursor_RNA
/db_xref prim_transcript
/db_xref primer_bind
/db_xref promoter
/db_xref protein_bind
/db_xref RBS
/db_xref rep_origin
/db_xref repeat_region
/db_xref rRNA
/db_xref S_region
/db_xref sig_peptide
/db_xref source
/db_xref stem_loop
/db_xref STS
/db_xref TATA_signal
/db_xref terminator
/db_xref transit_peptide
/db_xref tRNA
/db_xref unsure
/db_xref V_region
/db_xref V_segment
/db_xref variation
/dev_stage source
/direction oriT
/direction rep_origin
/EC_number CDS
/EC_number exon
/EC_number mat_peptide
/ecotype source
/environmental_sample source
/estimated_length gap
/exception CDS
/experiment -10_signal
/experiment -35_signal
/experiment 3'UTR
/experiment 5'UTR
/experiment attenuator
/experiment C_region
/experiment CAAT_signal
/experiment CDS
/experiment conflict
/experiment D_segment
/experiment D-loop
/experiment enhancer
/experiment exon
/experiment GC_signal
/experiment gene
/experiment iDNA
/experiment intron
/experiment J_segment
/experiment LTR
/experiment mat_peptide
/experiment misc_binding
/experiment misc_difference
/experiment misc_feature
/experiment misc_recomb
/experiment misc_RNA
/experiment misc_signal
/experiment misc_structure
/experiment modified_base
/experiment mRNA
/experiment N_region
/experiment old_sequence
/experiment operon
/experiment oriT
/experiment polyA_signal
/experiment polyA_site
/experiment precursor_RNA
/experiment prim_transcript
/experiment primer_bind
/experiment promoter
/experiment protein_bind
/experiment RBS
/experiment rep_origin
/experiment repeat_region
/experiment rRNA
/experiment S_region
/experiment sig_peptide
/experiment stem_loop
/experiment STS
/experiment TATA_signal
/experiment terminator
/experiment transit_peptide
/experiment tRNA
/experiment unsure
/experiment V_region
/experiment V_segment
/experiment variation
/focus source
/frequency modified_base
/frequency source
/frequency variation
/function 3'UTR
/function 5'UTR
/function CDS
/function exon
/function gene
/function iDNA
/function intron
/function LTR
/function mat_peptide
/function misc_binding
/function misc_feature
/function misc_RNA
/function misc_signal
/function misc_structure
/function mRNA
/function operon
/function precursor_RNA
/function prim_transcript
/function promoter
/function protein_bind
/function repeat_region
/function rRNA
/function sig_peptide
/function stem_loop
/function transit_peptide
/function tRNA
/gene -10_signal
/gene -35_signal
/gene 3'UTR
/gene 5'UTR
/gene attenuator
/gene C_region
/gene CAAT_signal
/gene CDS
/gene conflict
/gene D_segment
/gene D-loop
/gene enhancer
/gene exon
/gene GC_signal
/gene gene
/gene iDNA
/gene intron
/gene J_segment
/gene LTR
/gene mat_peptide
/gene misc_binding
/gene misc_difference
/gene misc_feature
/gene misc_recomb
/gene misc_RNA
/gene misc_signal
/gene misc_structure
/gene modified_base
/gene mRNA
/gene N_region
/gene old_sequence
/gene oriT
/gene polyA_signal
/gene polyA_site
/gene precursor_RNA
/gene prim_transcript
/gene primer_bind
/gene promoter
/gene protein_bind
/gene RBS
/gene rep_origin
/gene repeat_region
/gene rRNA
/gene S_region
/gene sig_peptide
/gene stem_loop
/gene STS
/gene TATA_signal
/gene terminator
/gene transit_peptide
/gene tRNA
/gene unsure
/gene V_region
/gene V_segment
/gene variation
/gene_synonym -10_signal
/gene_synonym -35_signal
/gene_synonym 3'UTR
/gene_synonym 5'UTR
/gene_synonym attenuator
/gene_synonym C_region
/gene_synonym CAAT_signal
/gene_synonym CDS
/gene_synonym conflict
/gene_synonym D_segment
/gene_synonym D-loop
/gene_synonym enhancer
/gene_synonym exon
/gene_synonym GC_signal
/gene_synonym gene
/gene_synonym iDNA
/gene_synonym intron
/gene_synonym J_segment
/gene_synonym LTR
/gene_synonym mat_peptide
/gene_synonym misc_binding
/gene_synonym misc_difference
/gene_synonym misc_feature
/gene_synonym misc_recomb
/gene_synonym misc_RNA
/gene_synonym misc_signal
/gene_synonym misc_structure
/gene_synonym modified_base
/gene_synonym mRNA
/gene_synonym N_region
/gene_synonym old_sequence
/gene_synonym oriT
/gene_synonym polyA_signal
/gene_synonym polyA_site
/gene_synonym precursor_RNA
/gene_synonym prim_transcript
/gene_synonym primer_bind
/gene_synonym promoter
/gene_synonym protein_bind
/gene_synonym RBS
/gene_synonym rep_origin
/gene_synonym repeat_region
/gene_synonym rRNA
/gene_synonym S_region
/gene_synonym sig_peptide
/gene_synonym stem_loop
/gene_synonym STS
/gene_synonym TATA_signal
/gene_synonym terminator
/gene_synonym transit_peptide
/gene_synonym tRNA
/gene_synonym unsure
/gene_synonym V_region
/gene_synonym V_segment
/gene_synonym variation
/germline source
/haplotype source
/host source
/identified_by source
/inference -10_signal
/inference -35_signal
/inference 3'UTR
/inference 5'UTR
/inference attenuator
/inference C_region
/inference CAAT_signal
/inference CDS
/inference conflict
/inference D_segment
/inference D-loop
/inference enhancer
/inference exon
/inference GC_signal
/inference gene
/inference iDNA
/inference intron
/inference J_segment
/inference LTR
/inference mat_peptide
/inference misc_binding
/inference misc_difference
/inference misc_feature
/inference misc_recomb
/inference misc_RNA
/inference misc_signal
/inference misc_structure
/inference modified_base
/inference mRNA
/inference N_region
/inference old_sequence
/inference operon
/inference oriT
/inference polyA_signal
/inference polyA_site
/inference precursor_RNA
/inference prim_transcript
/inference primer_bind
/inference promoter
/inference protein_bind
/inference RBS
/inference rep_origin
/inference repeat_region
/inference rRNA
/inference S_region
/inference sig_peptide
/inference stem_loop
/inference STS
/inference TATA_signal
/inference terminator
/inference transit_peptide
/inference tRNA
/inference unsure
/inference V_region
/inference V_segment
/inference variation
/isolate source
/isolation_source source
/lab_host source
/label -10_signal
/label -35_signal
/label 3'UTR
/label 5'UTR
/label attenuator
/label C_region
/label CAAT_signal
/label CDS
/label conflict
/label D_segment
/label D-loop
/label enhancer
/label exon
/label GC_signal
/label gene
/label iDNA
/label intron
/label J_segment
/label LTR
/label mat_peptide
/label misc_binding
/label misc_difference
/label misc_feature
/label misc_recomb
/label misc_RNA
/label misc_signal
/label misc_structure
/label modified_base
/label mRNA
/label N_region
/label old_sequence
/label operon
/label oriT
/label polyA_signal
/label polyA_site
/label precursor_RNA
/label prim_transcript
/label primer_bind
/label promoter
/label protein_bind
/label RBS
/label rep_origin
/label repeat_region
/label rRNA
/label S_region
/label sig_peptide
/label source
/label stem_loop
/label STS
/label TATA_signal
/label terminator
/label transit_peptide
/label tRNA
/label unsure
/label V_region
/label V_segment
/label variation
/lat_lon source
/locus_tag -10_signal
/locus_tag -35_signal
/locus_tag 3'UTR
/locus_tag 5'UTR
/locus_tag attenuator
/locus_tag C_region
/locus_tag CAAT_signal
/locus_tag CDS
/locus_tag conflict
/locus_tag D_segment
/locus_tag D-loop
/locus_tag enhancer
/locus_tag exon
/locus_tag GC_signal
/locus_tag gene
/locus_tag iDNA
/locus_tag intron
/locus_tag J_segment
/locus_tag LTR
/locus_tag mat_peptide
/locus_tag misc_binding
/locus_tag misc_difference
/locus_tag misc_feature
/locus_tag misc_recomb
/locus_tag misc_RNA
/locus_tag misc_signal
/locus_tag misc_structure
/locus_tag modified_base
/locus_tag mRNA
/locus_tag N_region
/locus_tag old_sequence
/locus_tag oriT
/locus_tag polyA_signal
/locus_tag polyA_site
/locus_tag precursor_RNA
/locus_tag prim_transcript
/locus_tag primer_bind
/locus_tag promoter
/locus_tag protein_bind
/locus_tag RBS
/locus_tag rep_origin
/locus_tag repeat_region
/locus_tag rRNA
/locus_tag S_region
/locus_tag sig_peptide
/locus_tag stem_loop
/locus_tag STS
/locus_tag TATA_signal
/locus_tag terminator
/locus_tag transit_peptide
/locus_tag tRNA
/locus_tag unsure
/locus_tag V_region
/locus_tag V_segment
/locus_tag variation
/macronuclear source
/map -10_signal
/map -35_signal
/map 3'UTR
/map 5'UTR
/map attenuator
/map C_region
/map CAAT_signal
/map CDS
/map conflict
/map D_segment
/map D-loop
/map enhancer
/map exon
/map GC_signal
/map gap
/map gene
/map iDNA
/map intron
/map J_segment
/map LTR
/map mat_peptide
/map misc_binding
/map misc_difference
/map misc_feature
/map misc_recomb
/map misc_RNA
/map misc_signal
/map misc_structure
/map modified_base
/map mRNA
/map N_region
/map old_sequence
/map operon
/map oriT
/map polyA_signal
/map polyA_site
/map precursor_RNA
/map prim_transcript
/map primer_bind
/map promoter
/map protein_bind
/map RBS
/map rep_origin
/map repeat_region
/map rRNA
/map S_region
/map sig_peptide
/map source
/map stem_loop
/map STS
/map TATA_signal
/map terminator
/map transit_peptide
/map tRNA
/map unsure
/map V_region
/map V_segment
/map variation
/mating_type source
/mobile_element repeat_region
/mod_base modified_base
/mol_type source
/ncRNA_class ncRNA
/note -10_signal
/note -35_signal
/note 3'UTR
/note 5'UTR
/note attenuator
/note C_region
/note CAAT_signal
/note CDS
/note conflict
/note D_segment
/note D-loop
/note enhancer
/note exon
/note GC_signal
/note gap
/note gene
/note iDNA
/note intron
/note J_segment
/note LTR
/note mat_peptide
/note misc_binding
/note misc_difference
/note misc_feature
/note misc_recomb
/note misc_RNA
/note misc_signal
/note misc_structure
/note modified_base
/note mRNA
/note N_region
/note old_sequence
/note operon
/note oriT
/note polyA_signal
/note polyA_site
/note precursor_RNA
/note prim_transcript
/note primer_bind
/note promoter
/note protein_bind
/note RBS
/note rep_origin
/note repeat_region
/note rRNA
/note S_region
/note sig_peptide
/note source
/note stem_loop
/note STS
/note TATA_signal
/note terminator
/note transit_peptide
/note tRNA
/note unsure
/note V_region
/note V_segment
/note variation
/number CDS
/number exon
/number iDNA
/number intron
/number misc_feature
/old_locus_tag -10_signal
/old_locus_tag -35_signal
/old_locus_tag 3'UTR
/old_locus_tag 5'UTR
/old_locus_tag attenuator
/old_locus_tag C_region
/old_locus_tag CAAT_signal
/old_locus_tag CDS
/old_locus_tag conflict
/old_locus_tag D_segment
/old_locus_tag D-loop
/old_locus_tag enhancer
/old_locus_tag exon
/old_locus_tag GC_signal
/old_locus_tag gene
/old_locus_tag iDNA
/old_locus_tag intron
/old_locus_tag J_segment
/old_locus_tag LTR
/old_locus_tag mat_peptide
/old_locus_tag misc_binding
/old_locus_tag misc_difference
/old_locus_tag misc_feature
/old_locus_tag misc_recomb
/old_locus_tag misc_RNA
/old_locus_tag misc_signal
/old_locus_tag misc_structure
/old_locus_tag modified_base
/old_locus_tag mRNA
/old_locus_tag N_region
/old_locus_tag old_sequence
/old_locus_tag oriT
/old_locus_tag polyA_signal
/old_locus_tag polyA_site
/old_locus_tag precursor_RNA
/old_locus_tag prim_transcript
/old_locus_tag primer_bind
/old_locus_tag promoter
/old_locus_tag protein_bind
/old_locus_tag RBS
/old_locus_tag rep_origin
/old_locus_tag repeat_region
/old_locus_tag rRNA
/old_locus_tag S_region
/old_locus_tag sig_peptide
/old_locus_tag stem_loop
/old_locus_tag STS
/old_locus_tag TATA_signal
/old_locus_tag terminator
/old_locus_tag transit_peptide
/old_locus_tag tRNA
/old_locus_tag unsure
/old_locus_tag V_region
/old_locus_tag V_segment
/old_locus_tag variation
/operon -10_signal
/operon -35_signal
/operon attenuator
/operon CDS
/operon gene
/operon misc_RNA
/operon misc_signal
/operon mRNA
/operon operon
/operon precursor_RNA
/operon prim_transcript
/operon promoter
/operon protein_bind
/operon rRNA
/operon stem_loop
/operon terminator
/organelle source
/organism source
/PCR_conditions primer_bind
/PCR_primers source
/phenotype attenuator
/phenotype gene
/phenotype misc_difference
/phenotype misc_feature
/phenotype misc_signal
/phenotype operon
/phenotype promoter
/phenotype variation
/plasmid source
/pop_variant source
/product C_region
/product CDS
/product D_segment
/product exon
/product gene
/product J_segment
/product mat_peptide
/product misc_feature
/product misc_RNA
/product mRNA
/product N_region
/product precursor_RNA
/product rRNA
/product S_region
/product sig_peptide
/product transit_peptide
/product tRNA
/product V_region
/product V_segment
/product variation
/protein_id CDS
/proviral source
/pseudo C_region
/pseudo CDS
/pseudo D_segment
/pseudo exon
/pseudo gene
/pseudo intron
/pseudo J_segment
/pseudo mat_peptide
/pseudo misc_feature
/pseudo misc_RNA
/pseudo mRNA
/pseudo N_region
/pseudo operon
/pseudo promoter
/pseudo rRNA
/pseudo S_region
/pseudo sig_peptide
/pseudo transit_peptide
/pseudo tRNA
/pseudo V_region
/pseudo V_segment
/rearranged source
/replace conflict
/replace misc_difference
/replace old_sequence
/replace unsure
/replace variation
/ribosomal_slippage CDS
/rpt_family oriT
/rpt_family repeat_region
/rpt_type oriT
/rpt_type repeat_region
/rpt_unit_range oriT
/rpt_unit_range repeat_region
/rpt_unit_seq oriT
/rpt_unit_seq repeat_region
/satellite repeat_region
/segment source
/serotype source
/serovar source
/sex source
/specimen_voucher source
/standard_name -10_signal
/standard_name -35_signal
/standard_name 3'UTR
/standard_name 5'UTR
/standard_name C_region
/standard_name CDS
/standard_name D_segment
/standard_name enhancer
/standard_name exon
/standard_name gene
/standard_name iDNA
/standard_name intron
/standard_name J_segment
/standard_name LTR
/standard_name mat_peptide
/standard_name misc_difference
/standard_name misc_feature
/standard_name misc_recomb
/standard_name misc_RNA
/standard_name misc_signal
/standard_name misc_structure
/standard_name mRNA
/standard_name N_region
/standard_name operon
/standard_name oriT
/standard_name precursor_RNA
/standard_name prim_transcript
/standard_name primer_bind
/standard_name promoter
/standard_name protein_bind
/standard_name RBS
/standard_name rep_origin
/standard_name repeat_region
/standard_name rRNA
/standard_name S_region
/standard_name sig_peptide
/standard_name stem_loop
/standard_name STS
/standard_name terminator
/standard_name transit_peptide
/standard_name tRNA
/standard_name V_region
/standard_name V_segment
/standard_name variation
/strain source
/sub_clone source
/sub_species source
/sub_strain source
/tag_peptide tmRNA
/tissue_lib source
/tissue_type source
/transgenic source
/transl_except CDS
/transl_table CDS
/translation CDS
/trans_splicing CDS
/trans_splicing gene
/trans_splicing misc_RNA
/trans_splicing mRNA
/trans_splicing precursor_RNA
/trans_splicing tRNA
/trans_splicing 3'UTR
/trans_splicing 5'UTR
/variety source
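The mapping above can be read as a lookup table from each qualifier to the set of feature keys on which it is legal. The following Python sketch is illustrative only: it copies a small excerpt of the pairs listed above, and the names QUALIFIER_KEYS and is_allowed are hypothetical helpers rather than part of any database tool.

```python
# Excerpt of the qualifier-to-feature-key mapping listed above.
# Only a few qualifiers are included; extend the dict from the full table.
QUALIFIER_KEYS = {
    "/anticodon": {"tRNA"},
    "/bound_moiety": {"enhancer", "misc_binding", "oriT", "promoter",
                      "protein_bind"},
    "/codon_start": {"CDS"},
    "/direction": {"oriT", "rep_origin"},
    "/EC_number": {"CDS", "exon", "mat_peptide"},
    "/estimated_length": {"gap"},
    "/mod_base": {"modified_base"},
    "/transl_table": {"CDS"},
}


def is_allowed(qualifier: str, feature_key: str) -> bool:
    """Return True if the excerpt lists `qualifier` as legal on `feature_key`.

    A qualifier missing from the excerpt raises KeyError instead of
    returning False, because the excerpt is deliberately incomplete.
    """
    return feature_key in QUALIFIER_KEYS[qualifier]


if __name__ == "__main__":
    print(is_allowed("/codon_start", "CDS"))  # True
    print(is_allowed("/anticodon", "CDS"))    # False
```

Raising on an unknown qualifier is a design choice for this sketch: with a partial table, "not listed" should not be mistaken for "not permitted".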
7.5 Appendix V: Controlled vocabularies
This appendix contains information on the restricted vocabulary fields used in
the Feature Table. The information contained in this appendix is subject to
change; please contact the database staff for the most recent information
concerning controlled vocabularies. This appendix is organized as follows:
Authority   The organization with authority to define the vocabulary
Reference   Publications of (or about) the vocabulary
Contact     Name of database staff responsible for maintaining
            the database copy of the vocabulary
Scope       Feature Table qualifiers which take members of this vocabulary
            as values
Listing     A listing of the current vocabulary with definitions or
            explanations
This appendix includes reference lists for the following controlled vocabulary
fields:
- Nucleotide base codes (IUPAC)
- Modified base abbreviations
- Amino acid abbreviations
- Modified and unusual Amino Acids
- Genetic Code Tables
- Country Names
7.5.1 Nucleotide base codes (IUPAC)
Authority Nomenclature Committee of the International Union of
Biochemistry
Reference Cornish-Bowden, A. Nucl Acid Res 13, 3021-3030 (1985)
Contact EMBL-EBI
Scope Location descriptors
Listing
Symbol Meaning
------ -------
a a; adenine
c c; cytosine
g g; guanine
t t; thymine in DNA; uracil in RNA
m a or c
r a or g
w a or t
s c or g
y c or t
k g or t
v a or c or g; not t
h a or c or t; not g
d a or g or t; not c
b c or g or t; not a
n a or c or g or t
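The listing above amounts to a map from each symbol to the set of bases it may stand for. A minimal Python sketch of that map follows; the names IUPAC_BASES, expand and compatible are assumptions introduced here for illustration.

```python
# Ambiguity codes from the listing above, mapped to the bases they stand for.
IUPAC_BASES = {
    "a": "a", "c": "c", "g": "g", "t": "t",
    "m": "ac", "r": "ag", "w": "at", "s": "cg", "y": "ct", "k": "gt",
    "v": "acg", "h": "act", "d": "agt", "b": "cgt",
    "n": "acgt",
}


def expand(symbol: str) -> set:
    """Return the set of concrete bases that a symbol can represent."""
    return set(IUPAC_BASES[symbol.lower()])


def compatible(sym1: str, sym2: str) -> bool:
    """Return True if two symbols share at least one possible base."""
    return bool(expand(sym1) & expand(sym2))


if __name__ == "__main__":
    print(sorted(expand("r")))   # ['a', 'g']
    print(compatible("y", "k"))  # True, both can be t
    print(compatible("r", "y"))  # False, purine versus pyrimidine
```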
7.5.2 Modified base abbreviations
Authority Sprinzl, M. and Gauss, D.H.
Reference Sprinzl, M. and Gauss, D.H. Nucl Acid Res 10, r1 (1982).
(note that in Cornish-Bowden, A. Nucl Acid Res 13, 3021-3030
(1985) the IUPAC-IUB declined to recommend a set of
abbreviations for modified nucleotides)
Contact NCBI
Scope /mod_base
Abbreviation Modified base description
------------ -------------------------
ac4c 4-acetylcytidine
chm5u 5-(carboxyhydroxylmethyl)uridine
cm 2'-O-methylcytidine
cmnm5s2u 5-carboxymethylaminomethyl-2-thiouridine
cmnm5u 5-carboxymethylaminomethyluridine
d dihydrouridine
fm 2'-O-methylpseudouridine
gal q beta,D-galactosylqueosine
gm 2'-O-methylguanosine
i inosine
i6a N6-isopentenyladenosine
m1a 1-methyladenosine
m1f 1-methylpseudouridine
m1g 1-methylguanosine
m1i 1-methylinosine
m22g 2,2-dimethylguanosine
m2a 2-methyladenosine
m2g 2-methylguanosine
m3c 3-methylcytidine
m5c 5-methylcytidine
m6a N6-methyladenosine
m7g 7-methylguanosine
mam5u 5-methylaminomethyluridine
mam5s2u 5-methoxyaminomethyl-2-thiouridine
man q beta,D-mannosylqueosine
mcm5s2u 5-methoxycarbonylmethyl-2-thiouridine
mcm5u 5-methoxycarbonylmethyluridine
mo5u 5-methoxyuridine
ms2i6a 2-methylthio-N6-isopentenyladenosine
ms2t6a N-((9-beta-D-ribofuranosyl-2-methylthiopurine-6-yl)carbamoyl)threonine
mt6a N-((9-beta-D-ribofuranosylpurine-6-yl)N-methyl-carbamoyl)threonine
mv uridine-5-oxyacetic acid-methylester
o5u uridine-5-oxyacetic acid (v)
osyw wybutoxosine
p pseudouridine
q queosine
s2c 2-thiocytidine
s2t 5-methyl-2-thiouridine
s2u 2-thiouridine
s4u 4-thiouridine
t 5-methyluridine
t6a N-((9-beta-D-ribofuranosylpurine-6-yl)carbamoyl)threonine
tm 2'-O-methyl-5-methyluridine
um 2'-O-methyluridine
yw wybutosine
x 3-(3-amino-3-carboxypropyl)uridine, (acp3)u
OTHER (requires /note= qualifier)
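The rule implied by the listing is that a /mod_base value must be one of the abbreviations above, or OTHER accompanied by a /note qualifier. The Python sketch below illustrates that check under stated assumptions: MOD_BASE_EXCERPT holds only a few of the listed abbreviations, and check_mod_base is a hypothetical helper, not a database validator.

```python
# Excerpt of the abbreviations listed above; the full listing is longer.
MOD_BASE_EXCERPT = {
    "ac4c": "4-acetylcytidine",
    "d": "dihydrouridine",
    "i": "inosine",
    "m1a": "1-methyladenosine",
    "m5c": "5-methylcytidine",
    "p": "pseudouridine",
    "q": "queosine",
    "yw": "wybutosine",
}


def check_mod_base(value: str, has_note: bool) -> bool:
    """Accept a listed abbreviation, or OTHER when a /note is present."""
    if value == "OTHER":
        return has_note
    return value in MOD_BASE_EXCERPT


if __name__ == "__main__":
    print(check_mod_base("m5c", has_note=False))    # True
    print(check_mod_base("OTHER", has_note=False))  # False
    print(check_mod_base("OTHER", has_note=True))   # True
```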
7.5.3 Amino acid abbreviations
Authority IUPAC-IUB Joint Commission on Biochemical Nomenclature.
Reference IUPAC-IUB Joint Commission on Biochemical Nomenclature.
Nomenclature and Symbolism for Amino Acids and
Peptides.
Eur. J. Biochem. 138:9-37(1984).
IUPAC-IUBMB JCBN Newsletter, 1999
http://www.chem.qmul.ac.uk/iubmb/newsletter/1999/item3.html
Scope /anticodon, /codon, /transl_except
Contact EMBL-EBI
Listing (note that the abbreviations are legal values for amino acids, not the full names)
Abbreviation Amino acid name
------------ ---------------
Ala A Alanine
Arg R Arginine
Asn N Asparagine
Asp D Aspartic acid (Aspartate)
Cys C Cysteine
Gln Q Glutamine
Glu E Glutamic acid (Glutamate)
Gly G Glycine
His H Histidine
Ile I Isoleucine
Leu L Leucine
Lys K Lysine
Met M Methionine
Phe F Phenylalanine
Pro P Proline
Pyl O Pyrrolysine
Ser S Serine
Sec U Selenocysteine
Thr T Threonine
Trp W Tryptophan
Tyr Y Tyrosine
Val V Valine
Asx B Aspartic acid or Asparagine
Glx Z Glutamine or Glutamic acid
Xaa X Any amino acid
Xle J Leucine or Isoleucine
TERM termination codon
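Because only the abbreviations, not the full names, are legal values, a tool that reads these qualifiers needs a lookup keyed on the abbreviation. The Python sketch below builds one from the listing above. The names THREE_TO_ONE, is_legal_aa and one_letter are hypothetical, and accepting only the three-letter form plus TERM is an assumption of this sketch.

```python
# Three-letter abbreviation -> one-letter symbol, from the listing above.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Pyl": "O", "Ser": "S", "Sec": "U", "Thr": "T", "Trp": "W",
    "Tyr": "Y", "Val": "V", "Asx": "B", "Glx": "Z", "Xaa": "X",
    "Xle": "J",
}


def is_legal_aa(value: str) -> bool:
    """Return True for a listed three-letter abbreviation or TERM.

    Full names such as "Alanine" are rejected, following the note
    in the listing header.
    """
    return value == "TERM" or value in THREE_TO_ONE


def one_letter(abbreviation: str) -> str:
    """Convert a three-letter abbreviation to its one-letter symbol."""
    return THREE_TO_ONE[abbreviation]


if __name__ == "__main__":
    print(is_legal_aa("Sec"))             # True
    print(is_legal_aa("Selenocysteine"))  # False
    print(one_letter("Pyl"))              # O
```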
7.5.4 Modified and unusual Amino Acids
Abbreviation Amino acid
------------ ---------
Aad 2-Aminoadipic acid
bAad 3-Aminoadipic acid
bAla beta-Alanine, beta-Aminopropionic acid
Abu 2-Aminobutyric acid
4Abu 4-Aminobutyric acid, piperidinic acid
Acp 6-Aminocaproic acid
Ahe 2-Aminoheptanoic acid
Aib 2-Aminoisobutyric acid
bAib 3-Aminoisobutyric acid
Apm 2-Aminopimelic acid
Dbu 2,4-Diaminobutyric acid
Des Desmosine
Dpm 2,2'-Diaminopimelic acid
Dpr 2,3-Diaminopropionic acid
EtGly N-Ethylglycine
EtAsn N-Ethylasparagine
Hyl Hydroxylysine
aHyl allo-Hydroxylysine
3Hyp 3-Hydroxyproline
4Hyp 4-Hydroxyproline
Ide Isodesmosine
aIle allo-Isoleucine
MeGly N-Methylglycine, sarcosine
MeIle N-Methylisoleucine
MeLys 6-N-Methyllysine
MeVal N-Methylvaline
Nva Norvaline
Nle Norleucine
Orn Ornithine
OTHER (requires /note=)
7.5.5 Genetic Code Tables
Authority International Nucleotide Sequence Database Collaboration
Contact NCBI
Scope /transl_table qualifier
URL http://www.ncbi.nlm.nih.gov/Taxonomy/Utils/wprintgc.cgi?mode=c
Genetic Code [1]
Standard Code (transl_table=1)
AAs = FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG
Starts = ---M---------------M---------------M----------------------------
Base1 = TTTTTTTTTTTTTTTTCCCCCCCCCCCCCCCCAAAAAAAAAAAAAAAAGGGGGGGGGGGGGGGG
Base2 = TTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGG
Base3 = TCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAG
Genetic Code [2]
Vertebrate Mitochondrial Code (transl_table=2)
AAs = FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSS**VVVVAAAADDEEGGGG
Starts = --------------------------------MMMM---------------M------------
Base1 = TTTTTTTTTTTTTTTTCCCCCCCCCCCCCCCCAAAAAAAAAAAAAAAAGGGGGGGGGGGGGGGG
Base2 = TTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGG
Base3 = TCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAG
Genetic Code [3]
Yeast Mitochondrial Code (transl_table=3)
AAs = FFLLSSSSYY**CCWWTTTTPPPPHHQQRRRRIIMMTTTTNNKKSSRRVVVVAAAADDEEGGGG
Starts = ----------------------------------MM----------------------------
Base1 = TTTTTTTTTTTTTTTTCCCCCCCCCCCCCCCCAAAAAAAAAAAAAAAAGGGGGGGGGGGGGGGG
Base2 = TTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGG
Base3 = TCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAG
Genetic Code [4]
Mold, Protozoan, Coelenterate Mitochondrial Code & Mycoplasma/Spiroplasma
Code (transl_table=4)
AAs = FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG
Starts = --MM---------------M------------MMMM---------------M------------
Base1 = TTTTTTTTTTTTTTTTCCCCCCCCCCCCCCCCAAAAAAAAAAAAAAAAGGGGGGGGGGGGGGGG
Base2 = TTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGG
Base3 = TCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAG
Genetic Code [5]
Invertebrate Mitochondrial Code (transl_table=5)
AAs = FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSSSSVVVVAAAADDEEGGGG
Starts = ---M----------------------------MMMM---------------M------------
Base1 = TTTTTTTTTTTTTTTTCCCCCCCCCCCCCCCCAAAAAAAAAAAAAAAAGGGGGGGGGGGGGGGG
Base2 = TTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGG
Base3 = TCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAG
Genetic Code [6]
Ciliate, Dasycladacean and Hexamita Nuclear Code (transl_table=6)
AAs = FFLLSSSSYYQQCC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG
Starts = -----------------------------------M----------------------------
Base1 = TTTTTTTTTTTTTTTTCCCCCCCCCCCCCCCCAAAAAAAAAAAAAAAAGGGGGGGGGGGGGGGG
Base2 = TTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGG
Base3 = TCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAG
Genetic Code [9]
Echinoderm and Flatworm Mitochondrial Code (transl_table=9)
AAs = FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIIMTTTTNNNKSSSSVVVVAAAADDEEGGGG
Starts = -----------------------------------M---------------M------------
Base1 = TTTTTTTTTTTTTTTTCCCCCCCCCCCCCCCCAAAAAAAAAAAAAAAAGGGGGGGGGGGGGGGG
Base2 = TTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGG
Base3 = TCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAG
Genetic Code [10]
Euplotid Nuclear Code (transl_table=10)
AAs = FFLLSSSSYY**CCCWLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG
Starts = -----------------------------------M----------------------------
Base1 = TTTTTTTTTTTTTTTTCCCCCCCCCCCCCCCCAAAAAAAAAAAAAAAAGGGGGGGGGGGGGGGG
Base2 = TTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGG
Base3 = TCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAG
Genetic Code [11]
Bacterial and Plant Plastid Code (transl_table=11)
AAs = FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG
Starts = ---M---------------M------------MMMM---------------M------------
Base1 = TTTTTTTTTTTTTTTTCCCCCCCCCCCCCCCCAAAAAAAAAAAAAAAAGGGGGGGGGGGGGGGG
Base2 = TTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGG
Base3 = TCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAG
Genetic Code [12]
Alternative Yeast Nuclear Code (transl_table=12)
AAs = FFLLSSSSYY**CC*WLLLSPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG
Starts = -------------------M---------------M----------------------------
Base1 = TTTTTTTTTTTTTTTTCCCCCCCCCCCCCCCCAAAAAAAAAAAAAAAAGGGGGGGGGGGGGGGG
Base2 = TTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGG
Base3 = TCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAG
Genetic Code [13]
Ascidian Mitochondrial Code (transl_table=13)
AAs = FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSSGGVVVVAAAADDEEGGGG
Starts = ---M------------------------------MM---------------M------------
Base1 = TTTTTTTTTTTTTTTTCCCCCCCCCCCCCCCCAAAAAAAAAAAAAAAAGGGGGGGGGGGGGGGG
Base2 = TTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGG
Base3 = TCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAG
Genetic Code [14]
Alternative Flatworm Mitochondrial Code (transl_table=14)
AAs = FFLLSSSSYYY*CCWWLLLLPPPPHHQQRRRRIIIMTTTTNNNKSSSSVVVVAAAADDEEGGGG
Starts = -----------------------------------M----------------------------
Base1 = TTTTTTTTTTTTTTTTCCCCCCCCCCCCCCCCAAAAAAAAAAAAAAAAGGGGGGGGGGGGGGGG
Base2 = TTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGG
Base3 = TCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAG
Genetic Code [15]
Blepharisma Nuclear Code (transl_table=15)
AAs = FFLLSSSSYY*QCC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG
Starts = -----------------------------------M----------------------------
Base1 = TTTTTTTTTTTTTTTTCCCCCCCCCCCCCCCCAAAAAAAAAAAAAAAAGGGGGGGGGGGGGGGG
Base2 = TTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGG
Base3 = TCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAG
Genetic Code [16]
Chlorophycean Mitochondrial Code (transl_table=16)
AAs = FFLLSSSSYY*LCC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG
Starts = -----------------------------------M----------------------------
Base1 = TTTTTTTTTTTTTTTTCCCCCCCCCCCCCCCCAAAAAAAAAAAAAAAAGGGGGGGGGGGGGGGG
Base2 = TTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGG
Base3 = TCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAG
Genetic Code [21]
Trematode Mitochondrial Code (transl_table=21)
AAs = FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNNKSSSSVVVVAAAADDEEGGGG
Starts = -----------------------------------M---------------M------------
Base1 = TTTTTTTTTTTTTTTTCCCCCCCCCCCCCCCCAAAAAAAAAAAAAAAAGGGGGGGGGGGGGGGG
Base2 = TTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGG
Base3 = TCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAG
Genetic Code [22]
Scenedesmus obliquus Mitochondrial Code (transl_table=22)
AAs = FFLLSS*SYY*LCC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG
Starts = -----------------------------------M----------------------------
Base1 = TTTTTTTTTTTTTTTTCCCCCCCCCCCCCCCCAAAAAAAAAAAAAAAAGGGGGGGGGGGGGGGG
Base2 = TTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGG
Base3 = TCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAG
Genetic Code [23]
Thraustochytrium Mitochondrial Code (transl_table=23)
AAs = FF*LSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG
Starts = --------------------------------M--M---------------M------------
Base1 = TTTTTTTTTTTTTTTTCCCCCCCCCCCCCCCCAAAAAAAAAAAAAAAAGGGGGGGGGGGGGGGG
Base2 = TTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGG
Base3 = TCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAG
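Each table above is read column by column: the characters at position i of Base1, Base2 and Base3 spell a codon, the character at position i of AAs is its translation, and an M at position i of Starts marks it as a possible initiation codon. The Python sketch below shows that reading for tables 1 and 2 only. The names TABLES, codon_map, translate and start_codons are hypothetical, and dropping trailing bases that do not fill a codon is an assumption of this sketch.

```python
# The three base rows are identical for every table above.
BASE1 = "TTTTTTTTTTTTTTTTCCCCCCCCCCCCCCCCAAAAAAAAAAAAAAAAGGGGGGGGGGGGGGGG"
BASE2 = "TTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGG"
BASE3 = "TCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAG"

# transl_table number -> (AAs row, Starts row), copied from the listing.
TABLES = {
    1: ("FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG",
        "---M---------------M---------------M----------------------------"),
    2: ("FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSS**VVVVAAAADDEEGGGG",
        "--------------------------------MMMM---------------M------------"),
}


def codon_map(row: str) -> dict:
    """Pair each of the 64 codons with the character in the same column."""
    return {b1 + b2 + b3: ch
            for b1, b2, b3, ch in zip(BASE1, BASE2, BASE3, row)}


def translate(seq: str, table: int = 1) -> str:
    """Translate a DNA or RNA sequence; '*' marks a termination codon."""
    aas = codon_map(TABLES[table][0])
    seq = seq.upper().replace("U", "T")
    usable = len(seq) - len(seq) % 3  # ignore an incomplete final codon
    return "".join(aas[seq[i:i + 3]] for i in range(0, usable, 3))


def start_codons(table: int = 1) -> list:
    """Return the codons marked M in the Starts row, sorted."""
    starts = codon_map(TABLES[table][1])
    return sorted(codon for codon, ch in starts.items() if ch == "M")


if __name__ == "__main__":
    print(translate("ATGTGATAA", table=1))  # M**
    print(translate("ATGTGATAA", table=2))  # MW*
    print(start_codons(1))                  # ['ATG', 'CTG', 'TTG']
```

The example sequence shows why /transl_table matters: TGA is a termination codon under the Standard Code but Trp under the Vertebrate Mitochondrial Code.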
7.5.6 Country Names
Authority International Nucleotide Sequence Database Collaboration
Contact INSDC member databases
Scope /country qualifier
URL http://www.insdc.org/page.php?page=country