From: ncbi-seminar-admin@ncbi.nlm.nih.gov on behalf of Anastasia Nikolskaya [nikolska@ncbi.nlm.nih.gov]
Sent: Thursday, April 11, 2002 9:21 AM
To: ncbi-seminar@ncbi.nlm.nih.gov
Subject: Seminar reminder - David Sherman, 3 pm today

Bldg. 38A, 5th floor conference room
Thursday April 11, 3 PM

The Genolevures genomics project

David J. Sherman
Laboratoire Bordelais de Recherche en Informatique
Universite de Bordeaux

The Genolevures genomics project is a large-scale comparison between S. cerevisiae and thirteen other yeast species from the various branches of the Hemiascomycetous class, conducted by seven French laboratories (Genoscope, Pasteur Institute, INRA INA-PG, Bordeaux, Lyon, Orsay, and Strasbourg) and now financed by the CNRS. We generated a large set of novel data (50 Mbp) by sequencing random genomic libraries from these species, and performed a complete manual annotation by means of exhaustive Blastx comparisons with S. cerevisiae and a collection of complete proteomes. Results (21 articles) were published in a special issue of FEBS Letters.

My talk will first address some of the biological conclusions of this study. Among the more interesting are: a model of molecular evolution in the Hemiascomycetes based on the reiteration of chromosome segment duplication, which creates transient merodiploids that are subsequently resolved by gene deletion events; identification of 1892 ascomycetes-specific genes (half of which are already functionally characterized in S. cerevisiae) plus a total of some 20,000 new genes among the thirteen species; and a degree of gene redundancy conserved across species that argues for a dynamic equilibrium of numerous duplication and deletion events rather than a massive duplication occurring in some branches but not others.

I will also present the practical issues of designing and organizing a genomics database for large-scale genome comparison and the user-oriented services that this entails.
This work was performed under my responsibility in the Center for Bioinformatics at Bordeaux, and continues to grow with new annotations and new data. We are currently preparing to integrate four new complete genomes that we will sequence in the coming year. Several ideas about presenting cross-species data in ways that move away from a simple `annotated sequence' approach will round out this second part.