Accession Number prefixes: Where are the sequences from?

DDBJ/EMBL/GenBank Accession Prefix Format

GenBank accession numbers have the following formats:

Nucleotide: 1 letter + 5 numerals OR 2 letters + 6 numerals
Protein: 3 letters + 5 numerals
WGS: 4 letters + 2 numerals for WGS assembly version + 6-8 numerals
MGA: 5 letters + 7 numerals
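
The four formats above can be checked mechanically. The following is a minimal Python sketch, not an official NCBI tool: the function name `classify_accession` is hypothetical, and stripping a trailing version suffix such as `.1` is an assumption that goes beyond the format list itself.

```python
import re
from typing import Optional

# One pattern per format listed above (letters are upper-cased before matching).
PATTERNS = [
    ("Nucleotide", re.compile(r"^(?:[A-Z]\d{5}|[A-Z]{2}\d{6})$")),
    ("Protein", re.compile(r"^[A-Z]{3}\d{5}$")),
    # 4 letters + 2-digit assembly version + 6-8 further digits
    ("WGS", re.compile(r"^[A-Z]{4}\d{2}\d{6,8}$")),
    ("MGA", re.compile(r"^[A-Z]{5}\d{7}$")),
]


def classify_accession(accession: str) -> Optional[str]:
    """Return the format name an accession matches, or None if it matches none."""
    # Assumption: a ".N" sequence-version suffix may be present and is ignored.
    base = accession.strip().upper().split(".", 1)[0]
    for name, pattern in PATTERNS:
        if pattern.match(base):
            return name
    return None
```

For example, `classify_accession("AF123456")` falls under the nucleotide format, while a string containing an underscore matches none of the four patterns.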

The three members of the International Nucleotide Sequence Database Collaboration (DDBJ, EMBL, and GenBank) all receive sequence submissions, assign accessions, and exchange data, so that each holds the complete collection. Accession assignment is managed by prior agreement within the collaboration on which group 'owns' which accession prefix. The list of accession number prefixes below should be used as a guide only: there are cases where these assignments are not adhered to. For instance, some ESTs and GSSs from GenBank carry a prefix assigned to Direct submissions.

Allocation of Accession Prefixes

Nucleotide Accession Prefixes
Prefix | Database | Type
BA,DF,DG | DDBJ | CON division
AN | EMBL | CON division
CH,CM,DS,EM,EN,EP,EQ,FA,GG,GL | NCBI | CON division
C,AT,AU,AV,BB,BJ,BP,BW,BY,CI,CJ,DA,DB,DC,DK,FS | DDBJ | EST
F | EMBL | EST
H,N,T,R,W,AA,AI,AW,BE,BF,BG,BI,BM,BQ,BU,CA,CB,CD,CF,CK,CN,CO,CV,CX,DN,DR,DT,DV,DY,EB,EC,EE,EG,EH,EL,ES,EV,EW,EX,EY,FC,FD,FE,FF,FG,FK,FL,GD,GE,GH,GO | GenBank | EST
D,AB | DDBJ | Direct submissions
V,X,Y,Z,AJ,AM,FM | EMBL | Direct submissions
U,AF,AY,DQ,EF,EU,FJ | GenBank | Direct submissions
AP | DDBJ | Genome project data
BS | DDBJ | Chimpanzee genome data
AL,BX,CR,CT,CU | EMBL | Genome project data
AE,CP,CY | GenBank | Genome project data
AG,DE,DH,FT | DDBJ | GSS
B,AQ,AZ,BH,BZ,CC,CE,CG,CL,CW,CZ,DU,DX,ED,EI,EJ,EK,ER,ET,FH,FI | GenBank | GSS
AK | DDBJ | cDNA projects
AC,DP | GenBank | HTGS
E,BD,DD,DI,DJ,DL,DM | DDBJ | Patents
A,AX,CQ,CS,FB,GM,GN | EMBL | Patents (nucleotide only)
I,AR,DZ,EA,GC,GP | GenBank | Patents (nucleotide)
G,BV,GF | GenBank | STS
BR | DDBJ | TPA
BN | EMBL | TPA
EZ | GenBank | TSA
S | GenBank | From journal scanning
AD | GenBank | From GSDB
AH | GenBank | Segmented set header
AS | GenBank | Other - not currently being used
BC | GenBank | MGC project
BK | GenBank | TPA
BL,GJ,GK | GenBank | TPA CON division
BT | GenBank | FLI-cDNA projects
J,K,L,M | GenBank | From GSDB direct submissions
N | GenBank and DDBJ | N0-N2 were used initially by both groups but have been removed from circulation; N2-N9 are ESTs
AAAA-AZZZ | GenBank | WGS
BAAA-BZZZ | DDBJ | WGS
CAAA-CZZZ | EMBL | WGS
DAAA-DZZZ | GenBank | WGS TPA
AAAAA-AZZZZ | DDBJ | MGA
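
The nucleotide table above lends itself to a simple prefix lookup. The sketch below is illustrative only: the function name `lookup_nucleotide_prefix` and both dictionaries are hypothetical, and the one- and two-letter dictionary copies just a handful of rows from the table, so a real lookup would need every row. Because prefix assignments are not always adhered to, the result should be read as a guide rather than a guarantee.

```python
import re

# Excerpt of the table above: prefix -> (database, type).
# Only a few rows are copied here for illustration.
NUCLEOTIDE_PREFIXES = {
    "D": ("DDBJ", "Direct submissions"),
    "AB": ("DDBJ", "Direct submissions"),
    "X": ("EMBL", "Direct submissions"),
    "AJ": ("EMBL", "Direct submissions"),
    "U": ("GenBank", "Direct submissions"),
    "AF": ("GenBank", "Direct submissions"),
    "AY": ("GenBank", "Direct submissions"),
    "AC": ("GenBank", "HTGS"),
    "BC": ("GenBank", "MGC project"),
    "EZ": ("GenBank", "TSA"),
}

# Four-letter WGS prefixes are allocated by first letter (AAAA-AZZZ, and so on).
WGS_BY_FIRST_LETTER = {
    "A": ("GenBank", "WGS"),
    "B": ("DDBJ", "WGS"),
    "C": ("EMBL", "WGS"),
    "D": ("GenBank", "WGS TPA"),
}


def lookup_nucleotide_prefix(accession):
    """Return (database, type) for a nucleotide accession, or None if unknown."""
    # The prefix is the run of letters immediately before the first digit.
    match = re.match(r"^([A-Z]+)\d", accession.strip().upper())
    if match is None:
        return None
    prefix = match.group(1)
    if len(prefix) == 4:
        return WGS_BY_FIRST_LETTER.get(prefix[0])
    return NUCLEOTIDE_PREFIXES.get(prefix)
```

With this excerpt, `lookup_nucleotide_prefix("AF123456")` yields the GenBank direct-submission row, and any prefix not copied into the dictionary returns `None`.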

Protein Accession Prefixes
Prefix | Database | Type
BAA-BZZ | DDBJ | Protein ID
CAA-CZZ | EMBL | Protein ID
AAA-AZZ | GenBank | Protein ID
AAE | GenBank | Protein ID for Patents (note that there are also some patent proteins with AAA and AAC)
FAA-FZZ | DDBJ | TPA Protein ID
DAA-DZZ | GenBank | TPA Protein ID
GAA-GZZ | DDBJ | WGS Protein ID
EAA-EZZ | GenBank | WGS Protein ID
HAA-HZZ | GenBank | TPA WGS Protein ID
O | Swiss-Prot | Protein
P | Swiss-Prot (Geneva) | Protein
Q | Swiss-Prot (Hinxton) | Protein

RefSeq Accession Format

The RefSeq projects are NCBI sequence annotation projects and are not part of DDBJ/EMBL/GenBank. RefSeq accession numbers can be distinguished from GenBank accessions by their format: they have an underscore in the third position.
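
The third-position rule is easy to test in code. This is a minimal sketch under that single rule; the function name `looks_like_refseq` is hypothetical, and the check does not validate the rest of the accession.

```python
def looks_like_refseq(accession: str) -> bool:
    """Return True if the third character of the accession is an underscore."""
    acc = accession.strip()
    # Index 2 is the third position; shorter strings cannot qualify.
    return len(acc) > 2 and acc[2] == "_"
```

A string such as `NM_000546` passes the check, while a GenBank-style accession such as `AF123456` does not.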

 


Revised March 2, 2009.