Annotation Submission
From AAAWiki
Submitted annotation data sets, per species/assembly. See Annotation Coordination for data conventions. DpseRec and DyakRec are the alternate reconciled assemblies.
Protein coding gene annotations are mirrored here as of 2006-06-02. Some details (names, etc.) have been changed to produce a uniform structure.
DGIL
- SNAP gene predictions, see these notes: README
- With Ian Korf's kind help, I've added a prediction set with SNAP using Dmel protein homologies to train and guide gene calls. This produces a closer gene mapping where there is homology, yet retains unique gene calls in non-homologous regions. This SNO set generally has higher exon sensitivity and specificity than the SNP set. --Dongilbert 10:28, 30 May 2006 (PDT)
- The DGIL_SNO and DGIL_SNP prediction GFF files at ftp://eugenes.org/eugenes/genomes/caf1a/ now have phase values added. See README for details. --Dongilbert 12:15, 14 August 2006 (PDT)
- Species annotation directory ftp://eugenes.org/eugenes/genomes/caf1a/ including protein and transcript prediction FASTA files
- GFF SNO: dana dere dgri dmel dmoj dper dpse dsec dsim dvir dwil dyak 30May06
- GFF SNP: dana dere dgri dmel dmoj dper dpse dsec dsim dvir dwil dyak 14May06
- Corrected the GFF on 14 May 06 for dana, dere, dgri, dmoj, dvir to use current CAF1 scaffolds; see README
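For readers unfamiliar with the GFF phase column mentioned above, here is a minimal sketch of how phase is conventionally derived for the CDS segments of one transcript. It is not taken from the DGIL README. The function name cds_phases and the input layout are hypothetical, and the sketch assumes a complete gene model whose first CDS segment begins at the start codon.

```python
def cds_phases(segments, strand):
    """Return the GFF phase (0, 1 or 2) for each CDS segment of one transcript.

    segments: list of distinct (start, end) pairs, 1-based inclusive as in GFF.
    strand:   "+" or "-".
    Assumption: the first segment in transcription order starts a codon.
    """
    # Walk segments in transcription order: ascending on "+", descending on "-".
    ordered = sorted(segments, reverse=(strand == "-"))
    phases = {}
    coding_bases = 0  # CDS bases seen before the current segment
    for start, end in ordered:
        # Phase = bases to skip at the segment start to reach the next codon.
        phases[(start, end)] = (3 - coding_bases % 3) % 3
        coding_bases += end - start + 1
    # Report phases in the same order the segments were given.
    return [phases[seg] for seg in segments]


if __name__ == "__main__":
    print(cds_phases([(1, 10), (21, 30), (41, 50)], "+"))
```

A 10-base first segment leaves one base of an unfinished codon, so the next segment must skip 2 bases to reach a codon boundary, which is why its phase is 2.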
PACH
Gene Predictions
- GeneMapper protein coding and 5'/3' UTR annotations (README)
- Download location: directory
dana dere dgri dmoj dper dpse dsec dsim dvir dwil dyak [dyakrec] and [dpserec] -- not done
Transposable Element Predictions
- TE annotations based on multiple alignment insertion signatures by Caspi and Pachter
- Download location: directory and description
dana dere dgri dmel dmoj dper dpse dsec dsim dvir dwil dyak
OXFD
Transcript and gene predictions
- Download location: directory
dana dere dgri dmoj dper dpse dsec dsim dvir dwil dyak [dyakrec] and [dpserec] -- not done
Orthologs and multiple alignments
Please find further annotations here.
Further annotations include
- ortholog sets : built by clustering pairwise orthology assignments from PhyOp
- multiple alignments : multiple alignments of ortholog sets using dialign and muscle, including bootstraps, dn and ds values and trees.
- codonbias : various codon bias indices (CAI, ENC, ...) and sequence properties (GC content, GC3 content) of predicted transcripts.
- predictions : complete set of predictions including pseudogenes (which were not part of the gene set submitted for consensus annotation)
The readme has more information.
Please let us know if you find this data useful. We would very much welcome suggestions and bug reports.
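As a rough illustration of the sequence properties listed under codonbias, the sketch below computes GC content and GC3 content (GC fraction at third codon positions) for a single coding sequence. It is not the OXFD pipeline. The function names are hypothetical, and the sketch assumes the sequence is in frame, starting at the first base of a codon.

```python
def gc_content(seq):
    """Fraction of G or C bases in a nucleotide sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return sum(base in "GC" for base in seq) / len(seq)


def gc3_content(cds):
    """Fraction of G or C at third codon positions of an in-frame CDS.

    Assumption: cds starts at codon position 1; a trailing partial codon
    contributes no third position and is therefore ignored.
    """
    third_positions = cds.upper()[2::3]
    if not third_positions:
        raise ValueError("sequence shorter than one codon")
    return sum(base in "GC" for base in third_positions) / len(third_positions)


if __name__ == "__main__":
    example = "ATGGCC"
    print(gc_content(example), gc3_content(example))
```

Indices such as CAI and ENC need codon usage tables and are not covered by this sketch.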
Codon usage
Supplementary information to our manuscript in Genetics (Heger & Ponting, Variable strength of translational selection among 12 Drosophila species. Genetics. 2007 Nov;177(3):1337-48.) can be found on our web server.
ROBI
- Species annotation files [1]
- GST annotations
dana dere dgri dmel dmoj dper dpse dpserec dsec dsim dvir dwil dyak dyakrec
- CYP450 annotations
dana dere dgri dmel dmoj dper dpse dsec dsim dvir dwil dyak
EISE
- Updated 8th June 2006
- readme
Gene Models GFF3
- GeneMapper directory dsim dsec dere dyak dyakrec dana dpse dpserec dper dwil dvir dmoj dgri
- Exonerate directory dsim dsec dere dyak dyakrec dana dpse dpserec dper dwil dvir dmoj dgri
- GeneWise directory dsim dsec dere dyak dyakrec dana dpse dpserec dper dwil dvir dmoj dgri
Translations FASTA
- GeneMapper directory dsim dsec dere dyak dyakrec dana dpse dpserec dper dwil dvir dmoj dgri
- Exonerate directory dsim dsec dere dyak dyakrec dana dpse dpserec dper dwil dvir dmoj dgri
- GeneWise directory dsim dsec dere dyak dyakrec dana dpse dpserec dper dwil dvir dmoj dgri
CDS FASTA
- GeneMapper directory dsim dsec dere dyak dyakrec dana dpse dpserec dper dwil dvir dmoj dgri
- Exonerate directory dsim dsec dere dyak dyakrec dana dpse dpserec dper dwil dvir dmoj dgri
- GeneWise directory dsim dsec dere dyak dyakrec dana dpse dpserec dper dwil dvir dmoj dgri
INPARANOID Orthology-Paralogy
- GeneMapper directory dsim dsec dere dyak dyakrec dana dpse dpserec dper dwil dvir dmoj dgri
- Exonerate directory dsim dsec dere dyak dyakrec dana dpse dpserec dper dwil dvir dmoj dgri
- GeneWise directory dsim dsec dere dyak dyakrec dana dpse dpserec dper dwil dvir dmoj dgri
Fuzzy Reciprocal BLAST Orthology-Paralogy
- GeneMapper directory dsim dsec dere dyak dyakrec dana dpse dpserec dper dwil dvir dmoj dgri
- Exonerate directory dsim dsec dere dyak dyakrec dana dpse dpserec dper dwil dvir dmoj dgri
- GeneWise directory dsim dsec dere dyak dyakrec dana dpse dpserec dper dwil dvir dmoj dgri
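This page does not say how the "fuzzy" variant relaxes reciprocal BLAST matching, so the sketch below shows only the plain, strict reciprocal best hit idea that such methods build on. It is not the EISE implementation. The function names and the (query, subject, bit_score) tuple layout are hypothetical.

```python
def best_hits(hits):
    """Map each query to its highest-scoring subject.

    hits: iterable of (query, subject, bit_score) tuples.
    Ties are resolved in favour of the first hit seen.
    """
    best = {}
    for query, subject, score in hits:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Return sorted (a, b) pairs that are each other's best hit."""
    forward = best_hits(a_vs_b)
    reverse = best_hits(b_vs_a)
    return sorted((a, b) for a, b in forward.items() if reverse.get(b) == a)


if __name__ == "__main__":
    a_vs_b = [("a1", "b1", 200), ("a1", "b2", 150), ("a2", "b2", 90)]
    b_vs_a = [("b1", "a1", 190), ("b2", "a1", 140), ("b2", "a2", 80)]
    print(reciprocal_best_hits(a_vs_b, b_vs_a))
```

In the example, a2's best hit is b2, but b2's best hit is a1, so only the a1/b1 pair is reported.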
Also coming:
- Pairwise T_COFFEE protein alignments
- dmel-dxxx, dxxx-dxxx, dxxx-dmel, and dmel-dmel BLASTP results
- Synteny blocks
- Retrieval of regions orthologous to a set of dmel coordinates using synteny and blast.
GOLD
This is an annotation of the kayak gene region in 11 Drosophila species, using D. melanogaster as the reference sequence.
dana dper dere dsec dyak dpse dwil dmoj dvir dgri dsim
NCBI
- See these notes: ftp://ftp.ncbi.nih.gov/genomes/Drosophila_melanogaster/special_requests/CAF1/README
- Species annotation files:
dana dere dgri dmel dmoj dper dpse dpserec dsec dsim dvir dwil dyak dyakrec
TRNA
- tRNA gene predictions (README)
- Download location: CAF1_TRNA_ATT.tgz
dana dere dgri dmoj dper dpse dsec dsim dvir dwil dyak dpserec dyakrec
OLIV
- For description and details see the Gene Validation page.
- Gene/Probe level validation by expression analysis (v. 3.0, Nov 29, 2006):
- Probes with detectable signal on species-specific array (v. 2.0, May, 2006):
dsim dyak dyakrec dana dpse dpserec dmoj dvir
BATZ
- Gene predictions on all 12 species using CONTRAST with no alignment information:
- Updated 5/31: All the coordinates in the previous files were mangled. The problem should be fixed now.
dana dere dgri dmel dmoj dper dpse dsec dsim dvir dwil dyak
- Gene predictions on melanogaster using CONTRAST with a multiple alignment of 7 species (more accurate, especially on coding region boundaries):
BREN
- Gene predictions on 11 species using N-SCAN with melanogaster alignments from dmel_caf1
dana dere dgri dmoj dper dpse dsec dsim dvir dwil dyak
RFAM
- Rfam/INFERNAL predictions of non-coding RNAs in 12 CAF1 assemblies (README):
dana dere dgri dmel dmoj dper dpse dsec dsim dvir dwil dyak
- Predicted homologs of verified melanogaster miRNAs (README):
dana dere dgri dmel dmoj dper dpse dsec dsim dvir dwil dyak
- Mapping of annotated melanogaster snoRNAs (README):
dana dere dgri dmel dmoj dper dpse dsec dsim dvir dwil dyak
RGUI
- geneid and SGP2 predictions (GID,GUT,SGP)
- Annotation directories GID, GUT, SGP2 (coming soon)
- GID GFF : dana dere dgri dmel dmoj dper dpse dsec dsim dvir dwil dyak
MAKA
- Annotation of spliceosomal snRNA genes using BLASTN, MFOLD, and manual inspection in 14 CAF1 assemblies
dana dere dgri dmel dmoj dper dpse dsec dsim dvir dwil dyak dpserec