Annotation Submission


Submitted annotation data sets, per species/assembly. See Annotation Coordination for data conventions. DpseRec and DyakRec are the alternate reconciled assemblies.

Protein-coding gene annotations are mirrored here as of 2006-06-02. Some details (names, etc.) have been changed to produce a uniform structure.



DGIL

  • SNAP gene predictions, see these notes: README
    • With Ian Korf's kind help, I've added a prediction set with SNAP using Dmel protein homologies to train and guide gene calls. This produces a closer gene mapping where there is homology, yet retains unique gene calls in non-homologous regions. This SNO set generally has higher exon sensitivity and specificity than the SNP set. --Dongilbert 10:28, 30 May 2006 (PDT)
    • The DGIL_SNO and DGIL_SNP prediction GFF files at ftp://eugenes.org/eugenes/genomes/caf1a/ now have phase values added. See README for details. --Dongilbert 12:15, 14 August 2006 (PDT)
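
For readers unfamiliar with the phase column, the sketch below shows how phase is conventionally derived for the CDS segments of one transcript under the GFF definition (the number of bases to skip at the start of a segment to reach the next complete codon). It is an illustration only: the function name `cds_phases` is hypothetical, it assumes a complete coding sequence that begins at the start codon, and it is not necessarily how the DGIL files were produced.

```python
from typing import List, Tuple


def cds_phases(segments: List[Tuple[int, int]], strand: str = "+") -> List[int]:
    """Return the GFF phase of each CDS segment, in the same order as the input.

    segments: (start, end) pairs, 1-based inclusive, as in GFF.
    Assumes the first segment in transcript order starts on a codon boundary.
    """
    # Walk the segments in transcript order: ascending for "+", descending for "-".
    order = sorted(
        range(len(segments)),
        key=lambda i: segments[i][0],
        reverse=(strand == "-"),
    )
    phases = [0] * len(segments)
    phase = 0
    for i in order:
        start, end = segments[i]
        phases[i] = phase
        length = end - start + 1
        # Bases left over after the last complete codon determine how many
        # bases the next segment must contribute to finish that codon.
        phase = (3 - (length - phase) % 3) % 3
    return phases


if __name__ == "__main__":
    print(cds_phases([(1, 4), (10, 14)], "+"))
    print(cds_phases([(1, 4), (10, 14)], "-"))
```

On the plus strand the first segment above holds one codon plus one base, so the second segment needs two bases to finish the split codon and gets phase 2.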

PACH

Gene Predictions

dana dere dgri dmoj dper dpse dsec dsim dvir dwil dyak ([dyakrec] and [dpserec] not done)

Transposable Element Predictions

  • TE annotations based on multiple alignment insertion signatures by Caspi and Pachter
  • Download location: directory and description

dana dere dgri dmel dmoj dper dpse dsec dsim dvir dwil dyak

OXFD

Transcript and gene predictions

dana dere dgri dmoj dper dpse dsec dsim dvir dwil dyak ([dyakrec] and [dpserec] not done)

Orthologs and multiple alignments

Further annotations are available here. They include:

  • ortholog sets : built by clustering pairwise orthology assignments from PhyOp
  • multiple alignments : alignments of ortholog sets built with dialign and muscle, including bootstraps, dN and dS values, and trees.
  • codonbias : various codon bias indices (CAI, ENC, ...) and sequence properties (GC content, GC3 content) of predicted transcripts.
  • predictions : complete set of predictions including pseudogenes (which were not part of the gene set submitted for consensus annotation)
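
As a minimal sketch of the two sequence properties named in the codonbias item, the following shows how GC content and GC3 content (GC fraction at third codon positions) can be computed for a coding sequence. The function names are hypothetical and the code assumes an in-frame CDS string; it does not reproduce the pipeline used to generate the distributed files.

```python
def gc_content(seq: str) -> float:
    """Fraction of G or C bases in the whole sequence (0.0 for an empty string)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return sum(base in "GC" for base in seq) / len(seq)


def gc3_content(cds: str) -> float:
    """Fraction of G or C bases at third codon positions.

    Assumes the CDS is in frame, starting at the first base of a codon.
    A trailing partial codon contributes nothing.
    """
    third_positions = cds.upper()[2::3]  # indices 2, 5, 8, ...
    if not third_positions:
        return 0.0
    return sum(base in "GC" for base in third_positions) / len(third_positions)


if __name__ == "__main__":
    example = "ATGGCCAAA"  # codons: ATG GCC AAA
    print(gc_content(example), gc3_content(example))
```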

The readme has more information.

Please let us know if you find this data useful. We would very much welcome suggestions and bug reports.

Codon usage

Supplementary information for our manuscript in Genetics (Heger & Ponting, Variable strength of translational selection among 12 Drosophila species. Genetics. 2007 Nov;177(3):1337-48) can be found on our web server.

ROBI

  • Species annotation files
  • GST annotations

dana dere dgri dmel dmoj dper dpse dpserec dsec dsim dvir dwil dyak dyakrec

  • CYP450 annotations

dana dere dgri dmel dmoj dper dpse dsec dsim dvir dwil dyak

EISE

  • Updated 8th June 2006
  • readme

Gene Models GFF3

Translations FASTA

CDS FASTA

INPARANOID Orthology-Paralogy

Fuzzy Reciprocal BLAST Orthology-Paralogy

Orthology-Paralogy Statistics

Also coming:

  • Pairwise T_COFFEE protein alignments
  • dmel-dxxx, dxxx-dxxx, dxxx-dmel, and dmel-dmel BLASTP results
  • Synteny blocks
  • Retrieval of regions orthologous to a set of dmel coordinates using synteny and blast.

GOLD

This is an annotation of the kayak gene region in 11 Drosophila species, using D. melanogaster as the reference sequence.

dana dper dere dsec dyak dpse dwil dmoj dvir dgri dsim

NCBI

dana dere dgri dmel dmoj dper dpse dpserec dsec dsim dvir dwil dyak dyakrec

TRNA

dana dere dgri dmoj dper dpse dsec dsim dvir dwil dyak dpserec dyakrec

OLIV

  • For description and details see the Gene Validation page.
  • Gene/Probe level validation by expression analysis (v. 3.0, Nov 29, 2006):

dsim dyak dana dpse dmoj dvir

  • Probes with detectable signal on species-specific array (v. 2.0, May, 2006):

dsim dyak dyakrec dana dpse dpserec dmoj dvir

BATZ

  • Gene predictions on all 12 species using CONTRAST with no alignment information:
    • Updated 5/31: All the coordinates in the previous files were mangled. The problem should be fixed now.

dana dere dgri dmel dmoj dper dpse dsec dsim dvir dwil dyak


  • Gene predictions on melanogaster using CONTRAST with a multiple alignment of 7 species (more accurate, especially on coding region boundaries):

dmel

BREN

  • Gene predictions on 11 species using N-SCAN with melanogaster alignments from dmel_caf1

dana dere dgri dmoj dper dpse dsec dsim dvir dwil dyak

RFAM

  • Rfam/INFERNAL predictions of non-coding RNAs in 12 CAF1 assemblies (README):

dana dere dgri dmel dmoj dper dpse dsec dsim dvir dwil dyak

  • Predicted homologs of verified melanogaster miRNAs (README):

dana dere dgri dmel dmoj dper dpse dsec dsim dvir dwil dyak

  • Mapping of annotated melanogaster snoRNAs (README):

dana dere dgri dmel dmoj dper dpse dsec dsim dvir dwil dyak

RGUI

dyak dpserec dyakrec

dyak dpserec dyakrec

MAKA

  • Annotation of spliceosomal snRNA genes using BLASTN, MFOLD, and manual inspection in 14 CAF1 assemblies

dana dere dgri dmel dmoj dper dpse dsec dsim dvir dwil dyak dpserec