DDBJ/EMBL/GenBank Accession Prefix Format
The formats of GenBank accession numbers are:
Nucleotide: | 1 letter + 5 numerals OR 2 letters + 6 numerals |
Protein: | 3 letters + 5 numerals |
WGS: | 4 letters + 2 numerals for WGS assembly version + 6-8 numerals |
MGA: | 5 letters + 7 numerals |
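The formats listed above can be checked mechanically. The following is a minimal sketch, assuming upper-case letters and no version suffix; the names `ACCESSION_PATTERNS` and `classify_accession` are hypothetical and not part of any NCBI tool.

```python
import re

# Patterns transcribed from the format list above.
# The category names are illustrative labels, not official terms.
ACCESSION_PATTERNS = [
    ("nucleotide", re.compile(r"[A-Z]\d{5}|[A-Z]{2}\d{6}")),
    ("protein", re.compile(r"[A-Z]{3}\d{5}")),
    # 4 letters + 2 numerals (assembly version) + 6-8 numerals
    ("wgs", re.compile(r"[A-Z]{4}\d{2}\d{6,8}")),
    ("mga", re.compile(r"[A-Z]{5}\d{7}")),
]


def classify_accession(accession):
    """Return the format category of an accession string, or None."""
    candidate = accession.strip().upper()
    for kind, pattern in ACCESSION_PATTERNS:
        # fullmatch ensures the whole string fits the format
        if pattern.fullmatch(candidate):
            return kind
    return None
```

Because the formats differ in their number of leading letters, at most one pattern can match a given string. A string that matches none of them, such as one containing an underscore, yields `None`.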
The members of the International Nucleotide Sequence Database Collaboration (DDBJ, EMBL, and GenBank) all receive sequence submissions, assign accessions, and exchange data so that each of the three groups holds the total collection. Accession assignment is managed by prior agreement within the collaboration on which group will 'own' which accession prefix. The list of accession number prefixes below should be used as a guide only: in some cases these assignments are not adhered to. For instance, there are ESTs and GSSs from GenBank that carry a prefix allocated to direct submissions.
Allocation of Accession Prefixes
Nucleotide Accession Prefixes
Prefix | Database | Type |
BA,DF,DG | DDBJ | CON division |
AN | EMBL | CON division |
CH,CM,DS,EM,EN,EP,EQ,FA,GG,GL | NCBI | CON division |
C,AT,AU,AV,BB,BJ,BP,BW,BY,CI,CJ,DA,DB,DC,DK,FS | DDBJ | EST |
F | EMBL | EST |
H,N,T,R,W,AA,AI,AW,BE,BF,BG,BI,BM,BQ,BU,CA,CB,CD,CF,CK,CN,CO,CV,CX,DN,DR,DT,DV,DY,EB,EC,EE,EG,EH,EL,ES,EV,EW,EX,EY,FC,FD,FE,FF,FG,FK,FL,GD,GE,GH,GO | GenBank | EST |
D,AB | DDBJ | Direct submissions |
V,X,Y,Z,AJ,AM,FM | EMBL | Direct submissions |
U,AF,AY,DQ,EF,EU,FJ | GenBank | Direct submissions |
AP | DDBJ | Genome project data |
BS | DDBJ | Chimpanzee genome data |
AL,BX,CR,CT,CU | EMBL | Genome project data |
AE,CP,CY | GenBank | Genome project data |
AG,DE,DH,FT | DDBJ | GSS |
B,AQ,AZ,BH,BZ,CC,CE,CG,CL,CW,CZ,DU,DX,ED,EI,EJ,EK,ER,ET,FH,FI | GenBank | GSS |
AK | DDBJ | cDNA projects |
AC,DP | GenBank | HTGS |
E,BD,DD,DI,DJ,DL,DM | DDBJ | Patents |
A,AX,CQ,CS,FB,GM,GN | EMBL | Patents (nucleotide only) |
I,AR,DZ,EA,GC,GP | GenBank | Patents (nucleotide) |
G,BV,GF | GenBank | STS |
BR | DDBJ | TPA |
BN | EMBL | TPA |
EZ | GenBank | TSA |
S | GenBank | From journal scanning |
AD | GenBank | From GSDB |
AH | GenBank | Segmented set header |
AS | GenBank | Other - not currently being used |
BC | GenBank | MGC project |
BK | GenBank | TPA |
BL,GJ,GK | GenBank | TPA CON division |
BT | GenBank | FLI-cDNA projects |
J,K,L,M | GenBank | From GSDB direct submissions |
N | GenBank and DDBJ | N0-N2 were used initially by both groups but have been removed from circulation; N2-N9 are ESTs |
AAAA-AZZZ | GenBank | WGS |
BAAA-BZZZ | DDBJ | WGS |
CAAA-CZZZ | EMBL | WGS |
DAAA-DZZZ | GenBank | WGS TPA |
AAAAA-AZZZZ | DDBJ | MGA |
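A prefix table like the one above lends itself to a simple lookup. The sketch below is illustrative only: it copies a handful of rows from the table rather than the full list, and the names `NUCLEOTIDE_PREFIXES`, `WGS_BY_FIRST_LETTER`, and `lookup_prefix` are hypothetical.

```python
import re

# Excerpt of the nucleotide prefix table above (NOT the complete list).
NUCLEOTIDE_PREFIXES = {
    "D": ("DDBJ", "Direct submissions"),
    "AB": ("DDBJ", "Direct submissions"),
    "X": ("EMBL", "Direct submissions"),
    "AJ": ("EMBL", "Direct submissions"),
    "U": ("GenBank", "Direct submissions"),
    "AF": ("GenBank", "Direct submissions"),
    "AC": ("GenBank", "HTGS"),
    "BK": ("GenBank", "TPA"),
}

# Four-letter WGS prefixes are allocated by range, keyed on the first letter.
WGS_BY_FIRST_LETTER = {
    "A": ("GenBank", "WGS"),
    "B": ("DDBJ", "WGS"),
    "C": ("EMBL", "WGS"),
    "D": ("GenBank", "WGS TPA"),
}


def lookup_prefix(accession):
    """Return (database, type) for a nucleotide accession, or None."""
    match = re.match(r"[A-Z]+", accession.strip().upper())
    if match is None:
        return None
    prefix = match.group(0)  # the leading run of letters
    if len(prefix) == 4:
        return WGS_BY_FIRST_LETTER.get(prefix[0])
    if len(prefix) == 5 and prefix[0] == "A":
        return ("DDBJ", "MGA")  # AAAAA-AZZZZ range
    return NUCLEOTIDE_PREFIXES.get(prefix)
```

Since the allocations are not always adhered to, a result from a lookup of this kind is best treated as a hint about the originating database and data type, not as a guarantee.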
Protein Accession Prefixes
Prefix | Database | Type |
BAA-BZZ | DDBJ | Protein ID |
CAA-CZZ | EMBL | Protein ID |
AAA-AZZ | GenBank | Protein ID |
AAE | GenBank | Protein ID for Patents (note that there are also some patent proteins with AAA and AAC) |
FAA-FZZ | DDBJ | TPA Protein ID |
DAA-DZZ | GenBank | TPA Protein ID |
GAA-GZZ | DDBJ | WGS Protein ID |
EAA-EZZ | GenBank | WGS Protein ID |
HAA-HZZ | GenBank | TPA WGS Protein ID |
O | Swiss-Prot | Protein |
P | Swiss-Prot (Geneva) | Protein |
Q | Swiss-Prot (Hinxton) | Protein |
RefSeq Accession Format
The RefSeq projects are NCBI sequence annotation projects and are not part of DDBJ/EMBL/GenBank. RefSeq accession numbers can be distinguished from GenBank accessions by their distinct format: an underscore in the third position, as in the prefix NM_.
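The third-position rule can be tested with a single character comparison. This is a minimal sketch; the function name `looks_like_refseq` is hypothetical, and the check only tests the format, not whether the record exists.

```python
def looks_like_refseq(accession):
    """Return True if the third character of the accession is an underscore."""
    candidate = accession.strip()
    # Index 2 is the third position; guard against very short strings.
    return len(candidate) > 2 and candidate[2] == "_"
```

For example, `looks_like_refseq("NM_000546")` is true, while a GenBank-style string such as `"AF123456"` gives false.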
Revised March 2, 2009.