From wolfsber@aspen Mon Mar 17 10:36:01 1997
To: ncbi-seminar@aspen
Subject: Tuesday seminar
Mime-Version: 1.0

NCBI/CBB Seminar, Tues. March 18, 11 am
Building 38A, 8th floor conference room

I will be giving a two-part seminar.

A Comparison of Expressed Sequence Tags (ESTs) to Human Genomic Sequences
Tyra G. Wolfsberg and David Landsman

The Expressed Sequence Tag (EST) division of GenBank, dbEST, is a large repository of the data being generated by human genome sequencing centers. ESTs are short, single-pass cDNA sequences generated from randomly selected library clones. The ~415,000 human ESTs represent a valuable, low-priced, and easily accessible biological reagent. As many ESTs are derived from as yet uncharacterized genes, dbEST is a prime starting point for the identification of novel mRNAs. Conversely, other genes are represented by hundreds of ESTs, a redundancy which may provide data about rare mRNA isoforms.

Here we present an analysis of >1000 ESTs generated by the WashU-Merck EST project. These ESTs were collected by querying dbEST with the genomic sequences of 15 human genes. When we aligned the matching ESTs to the genomic sequences, we found that in one gene, 73% of the ESTs which derive from spliced or partially spliced transcripts either contain intron sequences or are spliced at previously unreported sites; other genes have lower percentages of such ESTs, and some have none. This finding suggests that ESTs could provide researchers with novel information about alternative splicing in certain genes.

In a related analysis of pairs of ESTs which are reported to derive from a single gene, we found that as many as 26% of the pairs do not BOTH align with the sequence of the same gene. We suspect that some of these unusual ESTs result from artifacts in EST generation, and caution researchers that they may find such clones while analyzing sequences in dbEST.

Analysis and classification of upstream regulatory sequences in Saccharomyces cerevisiae
Tyra Wolfsberg, Chip Lawrence, and David Landsman

The availability of the complete sequence of Saccharomyces cerevisiae allows, for the first time, a complete analysis of the genetic regulatory elements present in a eukaryote. We plan to carry out a large-scale classification of yeast genes based on the sequence properties of their upstream regulatory elements. We will approach our analysis from a number of different directions, including:

1) classification of genes into groups based on, for example, coordinate gene expression or participation in common biochemical pathways, and determination of the sequence elements shared within and between the groups;

2) analysis of the upstream regulatory sequences from members of gene families to assess whether changes in these sequences may have led to a specialization of gene function;

3) characterization of the promoters of uncharacterized ORFs in order to predict their functions.

As a first step in this project, we have collected a set of ~6000 potential yeast promoter sequences. Further analysis will be carried out using the Gibbs sampler for DNA sequences.