6th International Workshop
on the Identification of Transcribed Sequences

October 3-5, 1996 Edinburgh, Scotland

SESSION 3



  1. Gene identification on human chromosome 7q

    S.W. Scherer(1), J.M. Rommens(1,2), S. Soder(1), G. Traverso(1), J. McArthur-Morrison(1), I. Wing-Yuk Szeto(1), L. Osborne(1), E. Belloni(1), H.H.Q. Heng(1), D.W. Martindale(3), B.F. Koop(3) and L.-C. Tsui(1,2)
    (1)Department of Genetics, Research Institute, The Hospital for Sick Children; (2)Department of Molecular and Medical Genetics, University of Toronto, Toronto, Canada; (3)University of Victoria, British Columbia, Canada

    Using a multi-level mapping approach we have constructed an integrated genetic and physical map of the long arm of human chromosome 7 (7q). The map was generated by grouping over 1,600 DNA markers to 30 different intervals with the use of somatic cell hybrids and rearrangement breakpoints and then ordering these amongst 2,443 YAC clones, 2,400 cosmids, and 310 PAC clones. We estimate that over 95% of 7q is now covered in contigs. This mapping strategy has allowed the incorporation of physical and genetic landmarks into a fully integrated map providing a molecular framework for large-scale gene identification and mapping, and positional cloning of disease genes. So far 170 known genes and over 200 ESTs that were previously mapped to chromosome 7 have been positioned on our map. In order to isolate and map the remainder of the estimated 2,500 genes on chromosome 7q three strategies are being used: (1) we are continuing to place all the genes and ESTs in the public domain on our chromosome 7 map; (2) genomic DNA sequencing: our mapping study has identified at least 17 regions of 7q that are not represented in the YAC map. Subsequent analysis has indicated that the majority of the "gap" regions can be filled with cosmids and/or PACs and that these regions are usually gene rich. Therefore these clone contigs are also being used as high resolution sequence-ready maps for gene discovery. In addition, positional cloning experiments of disease gene loci including Williams Syndrome (7q11.23), NIDDM (7q21.3), SHFM1/EEC (7q21.3-q22.1), SLOS (7q32.1), XRCC2 (7q36.1), HPE3 (7q36) and cancer associated regions mapped to 7q22, 7q31.2 and 7q34 are underway. While the techniques of direct cDNA selection, exon-amplification, and searching for evolutionarily conserved sequences have yielded several candidate genes for these loci, we have found that genomic DNA sequencing is a more robust method of gene identification. For example, >500 kb of DNA commonly deleted in Williams-Beuren syndrome patients at 7q11.23 was sequenced (see abstract by Osborne et al.) and 6 transcribed sequences were found that were not identified using the other techniques. (3) In a global approach to isolate transcribed sequences from chromosome 7 we have performed direct cDNA selection experiments on tiling paths of contiguous cosmid clones spanning regions of 7q and a chromosome 7-specific cosmid library. To maximize the number of different chromosome 7 transcribed sequences retrieved we have limited the amount of genomic DNA in each selection experiment to < 3 Mb and then subjected this to two rounds of hybridization with primary cDNA pools from 12 fetal and adult tissues. In each experiment >90% of the clones isolated mapped back to human chromosome 7. Already over 500 cDNA fragments have been sequenced and mapped back to defined positions on chromosome 7. Our data are released at regular intervals at: http://www.genet.sickkids.on.ca/chromosome7/. Supported by the Canadian Genome Analysis & Technology Program.

  2. Use of a novel consensus sequence for regulon mapping

    Gayle E. Woloschak, Tatjana Paunesku, Aleksandar Milosavljevic
    Argonne National Laboratory, Center for Mechanistic Biology and Biotechnology, Argonne, IL, USA; CuraGen Corporation, 322 East Main Street, Branford, CT, USA

    In the process of identifying genes differentially expressed in ultraviolet-exposed cells, we identified a transcript having a 25-base-pair region that is highly conserved among a variety of species, including Bacillus circulans, pumpkin, yeast, Drosophila, mouse, and man. When in the 5' region (flanking region or UTR) of a gene, the sequence is predominantly in the +/+ orientation with respect to the coding DNA strand, while in the coding region and the 3' region (UTR), the sequence is most frequently in the -/+ orientation with respect to the coding DNA strand. In two genes, the element is split into two parts; however, in most cases, it is found only once, with a minimum of 11 consecutive nucleotides exactly matching the original sequence. It is found in a large number of different genes with diverse functions (from human ras p21 to B. circulans chitosanase). Gel shift assays demonstrated the presence of a protein in HeLa cell extracts that binds to the sense and antisense single-stranded consensus oligomers, as well as to the double-stranded oligonucleotide. Because of its location in transcripts, this consensus element can be used for mapping cDNAs that comprise the regulon governed by the element. Experiments are underway using high-density membranes to identify cDNAs with the sequence. This work was supported by the U.S. Department of Energy, Office of Health and Environmental Research, under Contract No. W-31-109-ENG-38 and NIH grant #ES-07141-02.
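
    The matching criterion described above (at least 11 consecutive nucleotides of the 25-bp element, in either orientation) can be sketched as a simple k-mer scan. This is only an illustration: the abstract does not give the consensus sequence, so CONSENSUS below is a made-up placeholder, and the function names are hypothetical.

```python
# Hypothetical placeholder: the real 25-bp consensus is not given in the abstract.
CONSENSUS = "ACGTTGCAAGGCTTAGCCATGGATC"
MIN_RUN = 11  # minimum number of consecutive matching nucleotides (from the abstract)

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def find_element(target, consensus=CONSENSUS, min_run=MIN_RUN):
    """Return (position, strand) for every window of `target` that equals
    some run of `min_run` consecutive consensus nucleotides.

    strand "+" means the element reads in the same direction as `target`;
    strand "-" means its reverse complement does.
    A match longer than `min_run` is reported as several overlapping windows.
    """
    target = target.upper()
    hits = []
    for strand, probe in (("+", consensus), ("-", reverse_complement(consensus))):
        kmers = {probe[i:i + min_run] for i in range(len(probe) - min_run + 1)}
        for pos in range(len(target) - min_run + 1):
            if target[pos:pos + min_run] in kmers:
                hits.append((pos, strand))
    return hits
```

    Scanning the coding strand of each gene and recording the strand of each hit would give the orientation tallies discussed in the abstract.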

  3. The Cot effect and other quantitative PCR issues

    Francoise Mathieu-Daude, John Welsh, Thomas Vogt, Rhonda Honeycutt, Karen Evans, Frank Kullmann, and Michael McClelland
    Sidney Kimmel Cancer Center, San Diego, CA, USA

    RNA fingerprinting by Arbitrarily Primed Polymerase Chain Reaction (RAP-PCR), and its variant, Differential Display (DD), are effective tools for studying differential gene expression in a variety of experimental systems. Once a differentially regulated gene is identified, it is usually necessary to verify its expression pattern independently. Therefore, we have explored several novel quantitative PCR concepts.

    Relative quantitation by low stringency PCR: When a change in the abundance of a specific message is of interest, relative quantitation can be achieved by low stringency PCR using primers directed toward the proper sequence, but encouraged to engage in limited mismatch priming. The protocol is similar to RNA fingerprinting, except that specific (rather than arbitrary) primers are used, and reactions are stopped after various cycle numbers, as in other quantitation protocols. Products from mismatch priming serve as internal controls for a variety of reaction parameters. Because there can be a great many of these mismatched primers, effects due to sequence peculiarities of the control molecules are averaged out.

    The Cot effect: Curiously, as a quantitative PCR reaction proceeds, more abundant products accumulate disproportionately slowly, even before the plateau effect due to limiting reaction components is reached. The "Cot effect" appears to be due to progressively faster reannealing of more abundant products as they accumulate. Conversely, less abundant products accumulate disproportionately rapidly. In RNA fingerprinting, the consequence of the Cot effect is that quantitative differences in an arbitrarily primed product between samples can be erased if the Cot effect is encouraged (e.g. by too many cycles).
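
    The compression of abundance differences described above can be illustrated with a toy model. The model is an assumption made for illustration only, not the authors' analysis: per-cycle efficiency is taken to fall as 1 / (1 + c / k_half), standing in for faster reannealing of more abundant products, and k_half is an arbitrary parameter.

```python
def amplify(initial, cycles, k_half=1e6):
    """Return product amounts after 0..cycles PCR cycles in a toy model.

    Assumed (not from the abstract): the fraction of template copied in a
    cycle is 1 / (1 + c / k_half), so abundant products are copied less
    efficiently, mimicking product reannealing.
    """
    amounts = [float(initial)]
    for _ in range(cycles):
        c = amounts[-1]
        efficiency = 1.0 / (1.0 + c / k_half)
        amounts.append(c * (1.0 + efficiency))
    return amounts


def abundance_ratio(high, low, cycles, k_half=1e6):
    """Ratio of an abundant to a rare product after each cycle."""
    high_series = amplify(high, cycles, k_half)
    low_series = amplify(low, cycles, k_half)
    return [h / l for h, l in zip(high_series, low_series)]


if __name__ == "__main__":
    # A tenfold starting difference shrinks as cycling continues.
    for cycle, ratio in enumerate(abundance_ratio(1000, 100, 40)):
        if cycle % 10 == 0:
            print(cycle, round(ratio, 2))
```

    In this sketch the ratio falls at every cycle, which is the sense in which too many cycles can erase a quantitative difference between samples.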

    Target vs. standard titration: In many examples in the literature, an internal standard is titrated against a fixed amount of RNA, for the purpose of quantitating a specific message by PCR. This approach works, but requires repeated sampling of the reaction during the log-linear stage of the reaction. Another strategy is simpler. First, a calibration curve for several different standard concentrations against a titration of RNA concentrations is generated. Then, the unknown is titrated against a fixed amount of standard. When this is done, the concentration of the target molecule can be read directly from a standard curve, because some point on this curve represents the exact ratio and absolute amounts of the control and target RNAs. Conversely, when the calibration curve is generated by titrating the standard against a fixed amount of the target RNA, the curve does not contain, except by wild chance, the exact ratio and absolute amounts of control and target RNAs.

  4. Improvements to cosmid-based exon trapping

    Nicole Datson, Esther van de Vosse, Paola van der Bent, Hans Dauwerse, Emile de Meijer, Joris Heus, Gert Jan van Ommen and Johan den Dunnen
    MGC-Dept. of Human Genetics, Leiden University, The Netherlands

    Exon trapping has become a recognized method for the efficient isolation of coding sequences from large genomic regions. One of its merits is its independence from gene expression, since it detects coding sequences at the DNA level. Consequently, genes with a complex or low level of spatial or temporal expression are identified in addition to ubiquitously expressed genes. With the complete sequence of the human genome coming within reach, computer prediction will become an important tool to identify new genes. Since some 70-90% of genes can be expected not to be represented by an EST, exon trapping will become an efficient system for in vivo confirmation of predicted genes. We have recently described a series of large-insert cosmid-based exon trap vectors, the sCOGH-vectors, which allow the scanning of entire cosmid inserts in one go for the presence of exonic gene fragments (1). Using this cosmid-based system we have successfully isolated multi-exon spliced products containing up to 7 contiguous exons of the dystrophin gene and 12 of the PKD1 gene. Several new variants have been constructed, including a YAC fragmentation vector. The latter facilitates the systematic and directed scanning of a YAC fragmented at Alu or LINE repeats for exons. The trapping of multiple exons of a gene in a single spliced product reduces the labour-intensive analysis caused by repeated isolation of segments of the same gene. However, amplification of the longer RT-PCR products poses a technical challenge intrinsic to the system. We have introduced several changes in the protocol to improve RT-PCR yields, including long-range RT-PCR conditions, the use of polyA-selected RNA, new primers lowering the disturbing effects of background DNA, and a one-tube RT-PCR system. Currently, we routinely use 3' RACE to enable isolation of polyadenylated transcripts containing a 3' terminal exon, in addition to the RT-PCR directed at the trapping of internal exons. To facilitate simple determination of transfection efficiency we have inserted a Green Fluorescent Protein gene in one of the vector exons. This allows easy non-invasive monitoring by counting the number of fluorescent cells.

    1. Datson et al., NAR 24: 1105-1111 (1996).

  5. Functional correlation of architectural elements on mRNA

    Wai-Choi Leung(1,2), Takashi Kishimoto(1,2), Calvin H. Leung(1,2), Shaoxiong Chen(3), Linda Hyman(3), and Maria FKL Leung(1,2)
    (1)Division of Molecular Pathology, Department of Pathology and Laboratory Medicine, (2)Tulane Cancer Center and (3)Department of Biochemistry Tulane University School of Medicine, New Orleans, LA, USA.

    We have further refined our approach in defining the architecture of mRNA. Briefly, the entire nucleotide sequence of an mRNA is folded into a predicted optimal structure using an energy minimization algorithm. Segment Analysis is then performed on the folded RNA by tracing the RNA polynucleotide backbone in a-tracings and b-tracings, which reveal both short range and long range interactions. A double stranded region which appears in either a- or b-tracings is a Closed Region formed by short range interaction. The RNA sequence in a Closed Region can be independently refolded into a structural element identical to the corresponding structure on the folded entire RNA. On the other hand, an Open Region is contributed by long range interaction. Its sequence is folded into an alternate structure different from the corresponding region on the folded entire RNA.

    An Energy Map can be constructed to describe the location, size, energy content and energy density of Closed Regions on an mRNA. The presence of a Closed Region can be experimentally verified by its ability to reduce specific yields in a modified RT-PCR assay based on our observation that the log specific yield of an amplified RNA segment is inversely proportional to the sum of free energy in a Closed Region.

    This approach readily generates molecular models for mRNAs on which more refined structural analyses can be performed. On the other hand, these molecular models provide sufficient structural information for functional assignment of architectural elements:

    1. RNA Translocation: In Drosophila, the dorsal-ventral axis is determined by translocation of RNA to specific locations in the egg. The 3'UTRs of the translocated RNAs are thought to recognize an adapter protein linking the RNA to the microtubular apparatus which effects migration of RNA. Our analysis of the bicoid mRNA indicated that the 3'UTR region formed an extended stem structure protruding from the body of the folded mRNA, thereby promoting binding with the adapter protein. Similar structures have also been observed in other translocated RNAs involved in development of Drosophila, e.g. nanos, and in Xenopus, e.g. Vg1. Interestingly, in gurken mRNA, the 5'UTR directs translocation and exhibits a similar structure.

    2. Nuclear Export: In HIV, the export of genomic RNA to the cytoplasm is mediated by binding of Rev protein to the Rev Responsive Element (RRE). Our analysis demonstrated that the RRE exhibits an extended architectural element protruding from the viral genomic RNA. We hypothesized that this structure would promote the binding of Rev protein and facilitate export of viral RNA. A similar structure is also observed for the cis-repressive element (CRS), which has previously been defined by genetic analysis as an additional binding site for Rev protein to facilitate nuclear export of the viral genome.

    3. Transcription Termination: Mutagenesis of the adh2 gene of Saccharomyces cerevisiae in a reporter construct indicated that the recognition site for transcription termination resided in a 19 nucleotide stem structure bearing two single nucleotide bulges. Structural analyses of the adh2 RNA model indicated that this stem bears an energy density of about -0.3 kcal/mol/bp. This RNA stem folded into an alternate structure in mutants which lost the termination function and was regenerated in a compensatory mutant which regained two thirds of termination function.

    4. Translational Control: We observed that in a number of mRNAs known to exhibit translational control, the extreme 5' end sequence formed a stem structure with total energy value exceeding -50 kcal/mol. These elements also bear an energy density of -0.8 kcal/mol/bp or more, indicating high stability. Comparison of the stem structures for mRNAs under similar translational control, e.g., p53 and CDK4 mRNAs, revealed a common sequence as a potential recognition site for an RNA-binding protein.

  6. On the mechanism of DNA unwinding in yeast control regions

    Gad Yagil
    Department of Cell Biology, The Weizmann Institute, Rehovot, Israel; EBI, Cambridge, United Kingdom

    The formation of an unwound DNA region is an essential step in gene transcription. The size and state of the unwound regions are nevertheless far from clear. A number of experimental tools for the characterization of unwound DNA are currently available, including: a. Single strand specific nucleases (S1, P1, mung bean nucleases), which serve to identify unwound regions. b. Two dimensional topoisomer analysis, which serves to determine the extent of unwinding. c. Conformation specific DNA reagents, which can distinguish between the various paranemic structures unwound DNA can assume. The paranemic structures include cruciforms, H-form DNA, B-Z junctions, paranemic duplexes and strand separated DNA (cf. Yagil, Crit. Revs. Bioch. and Mol. Biol., 26: 475-559, 1991).

    The three techniques mentioned were applied to two strong yeast promoter regions containing long pyrimidine tracts, inserted into negatively supercoiled plasmids. Pyrimidine tracts are abundant in yeast promoter regions (Yagil, Yeast 10: 603) and are known to be attacked by single strand specific nucleases in eukaryotic nuclei. The principal P1 cleavage points of the DED1 promoter region map within a sequence containing a pyrimidine tract of 40 bases; 2d topoisomer analysis indicates the unwinding of 4 primary turns. In the promoter of CYC1, a region of 33/36 pyrimidines is the principal P1 sensitive region. The limited symmetry of the P1 sensitive regions favors the formation of a paranemic duplex in the unwound region. Previous work showed that the chicken beta globin promoter and SV40 control regions assume an unwound state by 2d topoisomer analysis. It is proposed that polypyrimidine tracts serve as unwinding centers for DNA in eukaryotic genes, their length determining the extent of gene transcription.

    To further examine this possibility, a program designed to list and report the frequency of binary homotracts in sequenced DNA databases was written. Application of the program, TRACTS, to a large selection of eukaryotic sequences led to the conclusion that all-purine or all-pyrimidine tracts (R.Y tracts) are highly overrepresented in almost all eukaryotic genomes. Organelle genomes (mitochondria and chloroplasts) show a similar overrepresentation. Tracts which are all G,T or all A,C (K.M tracts) are overrepresented to a nearly similar degree, while A,T or G,C rich tracts are only marginally overrepresented. In the promoter regions of sequenced yeast chromosomes R.Y tracts longer than 15 nt are present at a nearly 50-fold excess over random DNA. This further strengthens the possibility that the R.Y tracts have a role in gene regulation.
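
    The kind of scan described above can be sketched in a few lines. This is not the TRACTS program itself, whose implementation is not described in the abstract; it is a minimal illustration that lists maximal runs drawn from each two-base class, and the function names and the min_len default are assumptions.

```python
import re
from collections import Counter

# Two-base classes in IUPAC notation: R = purines, Y = pyrimidines,
# K = G/T, M = A/C, W = A/T, S = C/G.
BINARY_CLASSES = {"R": "AG", "Y": "CT", "K": "GT", "M": "AC", "W": "AT", "S": "CG"}


def find_tracts(seq, min_len=15):
    """Return (class, start, length) for each maximal run of at least
    `min_len` bases drawn from a single two-base class.

    Note: a long single-base run (e.g. poly-A) belongs to several classes
    and is reported once per class in this simple sketch.
    """
    seq = seq.upper()
    tracts = []
    for name, bases in BINARY_CLASSES.items():
        pattern = re.compile("[%s]{%d,}" % (bases, min_len))
        for match in pattern.finditer(seq):
            tracts.append((name, match.start(), match.end() - match.start()))
    return sorted(tracts, key=lambda t: (t[1], t[0]))


def tract_length_counts(seq, min_len=15):
    """Tally how many tracts of each (class, length) occur in `seq`."""
    return Counter((name, length) for name, _, length in find_tracts(seq, min_len))
```

    Estimating overrepresentation would additionally require an expected count for random DNA of the same base composition, which this sketch does not attempt.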

  7. In silico cloning of new transcripts using public databases

    P. Sanseau, R.W. Gill, D.S. Montgomery, M.D. Oxer, S. Taylor, I.J. Purvis and C.W. Dykes
    Glaxo-Wellcome Medicines Research Centre, Genomics and Advanced Technology & Informatics Research Units, Gunnels Wood Road, Stevenage, Herts, United Kingdom

    With the continuous growth of the public databases, in silico cloning is becoming a method of choice to obtain new genes of interest. The use of the information available in databases such as dbEST and its minimally redundant version (Unigene) has already significantly decreased the time and resources required to clone target genes. In theory, by using this information one should be able to perform electronic northerns and zooblots, clone genes, or extend known or unknown gene families from one's desk-top computer. Moreover, the chromosomal mapping of thousands of human ESTs using the radiation hybrid panels will help positional cloning projects. Rapid in silico cloning is only possible if bioinformatics tools are available. Therefore new web interfaces have been developed in house to rapidly obtain new genes and analyse the corresponding sequences for characteristics such as potential open reading frames. We will describe the results of our attempts to in silico clone a number of current target genes and extend gene families out of dbEST or other public databases. Finally we will indicate how in silico cloning information points the way to further "wet" experimentation.
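
    As an illustration of the sort of sequence check mentioned above (screening an assembled sequence for potential open reading frames), the following is a minimal sketch. It is not the in-house tool, which is not described in the abstract; the function name and the min_codons threshold are assumptions, and only the three forward frames are scanned.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_codons=30):
    """Return (start, end, frame) for forward-strand ORFs.

    An ORF here runs from the first ATG after the previous stop codon to
    the next in-frame stop codon; `end` is the index just past the stop.
    `min_codons` counts codons from the ATG up to, not including, the stop.
    To cover the other strand, call this on the reverse complement.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None
    return orfs
```

    ORFs that run off the end of a partial EST assembly without a stop codon are ignored by this sketch; a practical tool would probably report them as open-ended.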

  8. Construction and analysis of a gridded full length cDNA library generated from human fetal brain

    Stefan Wiemann, Bernhard Korn, and Annemarie Poustka
    Molecular Genome Analysis, German Cancer Research Center, Heidelberg, Germany

    We have constructed a cDNA library from human fetal brain that contains 70-80 % full length representations of primary transcripts. Single stranded cDNA was generated from fetal brain poly A+ RNA using an oligo dT primer, the "cap switch oligonucleotide", and a reverse transcriptase deficient in RNase activity. Limited amplification of primary cDNA was performed under long range conditions using a combination of Amplitaq and Pfu DNA polymerases to minimize errors during PCR and to allow for amplification and subsequent cloning of long cDNAs as well. The cDNA was cloned directionally into plasmid pAMP1 using the uracil/glycosylase cloning system. The library consists of 120,000 plasmid clones. The average insert size is 1.8 kb due to the selection for short inserts inherent to cloning in a plasmid vector. To date, 3,800 independent clones have been picked in 384 well microtiter plates and spotted on nylon membranes for hybridization analysis. No contaminating clones harboring copy rRNA were detectable in the array of 3,800 clones. As expected, 2 % of all clones were positive with a beta actin probe. The insert size is 1.9 kb in >80 % of these clones, reaching from the poly A tail to the described transcription initiation and cap addition site of the gene. Random sequencing of 100 clones revealed 49 known human genes, 12 homologous genes from human or other species, 26 genes that had hits only in EST databases and 13 novel genes with no hits in any database. 75% of the known genes were full length, most of them extending the published sequences at their 5'-ends. Analysis of sequences upstream of the first ATG revealed a high GC content (64 %) to be present in most cDNAs, with high occurrence of CpG dinucleotides. In most novel cDNA clones the first ATG meets the consensus criteria of translation initiation suggested by M. Kozak. Currently, construction of cDNA libraries from other tissues, and testing of other cloning vectors for the selective cloning of longer (> 4 kb) cDNAs, is underway.
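
    The 5' end analysis described above (GC content and CpG count upstream of the first ATG, and the context of that ATG) can be sketched as follows. This is an illustration under stated assumptions, not the authors' pipeline: the function names are hypothetical, and the Kozak check is reduced to the two commonly cited positions, a purine at -3 and a G at +4.

```python
def kozak_strength(seq, atg):
    """Classify the context of the ATG at index `atg`.

    Simplified rule (an assumption of this sketch): "strong" if there is a
    purine at -3 and a G at +4, "adequate" if only one holds, else "weak".
    """
    minus3 = seq[atg - 3] if atg >= 3 else ""
    plus4 = seq[atg + 3] if atg + 3 < len(seq) else ""
    score = int(minus3 in ("A", "G")) + int(plus4 == "G")
    return ("weak", "adequate", "strong")[score]


def upstream_stats(cdna):
    """Summarise the region 5' of the first ATG of a cDNA sequence.

    Returns None if there is no ATG or no sequence upstream of it.
    """
    seq = cdna.upper()
    atg = seq.find("ATG")
    if atg <= 0:
        return None
    upstream = seq[:atg]
    gc_fraction = (upstream.count("G") + upstream.count("C")) / len(upstream)
    return {
        "atg_pos": atg,
        "gc_fraction": gc_fraction,
        "cpg_count": upstream.count("CG"),
        "kozak": kozak_strength(seq, atg),
    }
```

    Averaging gc_fraction over many clones would give a figure comparable in kind to the 64 % quoted above, although the exact window the authors analysed is not stated.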

  9. Comparative analysis with the Puffer fish, Fugu rubripes

    Greg Elgar
    Molecular Genetics, Department of Medicine and UK HGMP Resource Centre, Cambridge, United Kingdom

    The Japanese Pufferfish, Fugu rubripes, has a haploid genome of 400 Mbases. It contains a similar gene set to mammals with the consequence that gene density is much higher. Many characteristics of genes are shared between mammals and teleost fish including high sequence similarity and conserved gene structure. Generally however, as well as being more densely spaced, Fugu genes are much smaller than their mammalian counterparts. This makes working with the Fugu genome much simpler than working with mammalian genomes. The Fugu genome has been used as a comparative tool in the analysis of gene structure, identification of regulatory elements, identification and confirmation of coding sequences and in whole genome analyses. There is some evidence that Fugu shares a degree of conserved synteny with mammals, a fact that has raised a great deal of interest with a view to positional cloning projects. However, little is known of the extent of these syntenies. The MRC is currently funding the Fugu Landmark Mapping Project. The aim is to sequence scan 1000 cosmids from a well characterized and publicly available genomic library which will provide a resource in a number of different areas including gene identification.

    Each cosmid is analysed for a number of different features and the data presented on the world wide web interface. To date nearly 200 cosmids have been sequenced and analysed. A summary of data from this project will be presented as well as some specific examples of the ways in which the project can aid in gene identification and characterization. Because of the likelihood of finding more than one Fugu gene on a single cosmid clone, some physical linkage data is also available through this project.