Theory and Application of YAC Technology for Genome Research

Published in Probe Volume 1(1-2): Spring-Summer 1991


Brian M. Hauge and Howard M. Goodman
Department of Genetics, Harvard Medical School
Department of Molecular Biology, Massachusetts General Hospital, Boston

A major challenge in plant molecular biology is isolating genes for which the biochemical function of the gene product is unknown. In a variety of plant species, genes controlling a wide range of fundamental developmental and metabolic processes have been identified by mutational analysis and placed on classical genetic linkage maps. Examples include genes conferring resistance to plant pathogens, genes controlling the synthesis of and response to plant hormones, genes for drought tolerance, and genes required for a variety of important developmental pathways. In most cases, while the mutant phenotype and genetic map locations are known, virtually nothing is known about the product of the gene.

Gene-Cloning Methods

There are several ways to clone genes for which the genetic locus, but not the gene product, is known. If a gene can be tagged with a transposable element, the gene can be cloned directly by isolating the sequences flanking the site of insertion. The cloning of genes by transposon tagging has been used extensively in maize. The most widely utilized and best characterized plant transposable elements, the maize Ac and Ds elements (reviewed by Fedoroff in Berg and Howe, 1989), are also capable of transposing in various heterologous plants (Van Sluys et al., 1987), thereby extending the utility of this system to plants having no well-characterized transposable element systems. In addition to endogenous transposons, the T-DNA of Agrobacterium tumefaciens has also been successfully used for gene tagging (Feldmann et al., 1989).

A second alternative is to clone genes corresponding to deletion mutations using the technique of genomic subtraction (Straus and Ausubel, 1990). This method is based on the progressive enrichment for DNA fragments present in the wild-type genome but absent in the mutant genome that harbors the deletion. Following multiple rounds of enrichment, the resultant fragments are amplified by PCR and cloned. The one constraint of the protocol is that the deletion must encompass a single restriction fragment that is composed entirely of non-repetitive DNA. Genomic subtraction has recently been used to clone the GA-1 gene from Arabidopsis (Sun and Ausubel, unpublished results).

As with the gene-tagging strategies, the advantage of genomic subtraction is that one has immediate access to the gene(s) of interest. Both approaches, however, share two disadvantages. First, insertions or deletions in essential genes will be lethal, so genes whose phenotypes are associated only with "leaky" point mutations will not be detected.

Second, transposons as well as different mutagenic agents exhibit some degree of sequence specificity. Therefore, many important loci will be refractory to isolation by transposon tagging or genomic subtraction.

Chromosome-Walking Strategy

A more general approach is to clone genes by chromosome walking. This strategy is general in the sense that the cloning of the gene is based solely on the mutant phenotype and genetic map position. Therefore, chromosome walking can be used to clone any gene that can be genetically identified. The first step toward cloning the gene is to identify DNA probes residing within one to several centimorgans (cM) of the locus of interest. Typically this is achieved by analyzing the meiotic segregation of restriction fragment length polymorphisms (RFLPs). Once one or more linked RFLPs have been identified, they can be used as the starting point to initiate a chromosome walk.

Briefly, chromosome walking entails the progressive isolation and characterization of overlapping sets of genomic clones. The overlapping clones are selected by hybridization using end-specific probes (probes generated from the extremities of the clone/contig). The walk is continued in this manner until the region spanning the intervening gap has been bridged by an overlapping set of clones. While chromosome walking is technically straightforward, in practice the procedure is extremely labor-intensive and ill-suited for large projects where more than a few steps are required.
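The walking procedure can be pictured with a small sketch. The Python example below is illustrative only. Clones are modeled as hypothetical (name, start, end) intervals in kb along one chromosome, and hybridization with an end-specific probe is approximated by asking which clones in the library contain the current end of the contig. The names `walk` and `library` and all coordinates are assumptions made for the example, not data from a real library.

```python
from typing import List, Tuple

Clone = Tuple[str, int, int]  # (name, start_kb, end_kb); hypothetical coordinates


def walk(library: List[Clone], start: Clone, target_kb: int) -> List[Clone]:
    """Walk rightward from `start` until the contig reaches `target_kb`.

    Each step mimics probing the library with the end of the current contig:
    any clone that contains that end position "hybridizes", and the hit that
    extends farthest is taken as the next step of the walk.
    """
    path = [start]
    current_end = start[2]
    while current_end < target_kb:
        # Clones that contain the end-probe position and extend beyond it.
        hits = [c for c in library if c[1] <= current_end < c[2]]
        if not hits:
            raise ValueError(f"walk stalled at {current_end} kb: no overlapping clone")
        best = max(hits, key=lambda c: c[2])
        path.append(best)
        current_end = best[2]
    return path


library = [("c1", 0, 40), ("c2", 30, 70), ("c3", 35, 60),
           ("c4", 65, 105), ("c5", 100, 140)]
steps = walk(library, library[0], target_kb=120)
print([name for name, _, _ in steps])  # ['c1', 'c2', 'c4', 'c5']
```

In this toy library each step advances only a few tens of kb, and every step stands for a separate round of probe preparation, library screening, and clone characterization at the bench.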

Constructing Physical Maps

Recently, interest has focused on strategies for constructing physical maps of entire genomes. By definition, a physical map consists of a linearly ordered set of DNA fragments encompassing the genome or region of interest. Physical maps are of two types: macro-restriction maps and ordered clone maps. The former consists of an ordered set of large DNA fragments generated by using restriction enzymes whose recognition sequences are infrequently represented in the genome (Smith et al., 1987). The macro-restriction map provides information about the organization of DNA fragments at the level of the intact chromosome, thereby providing long-range continuity.

As the name implies, an ordered clone map consists of an overlapping collection of cloned DNA fragments. The DNA may be cloned into any one of the available vector systems--YACs, cosmids, phage, or even plasmids. Major advantages of ordered clone maps are that they are of high resolution and directly provide the clones for further study.

The immediate benefits of having a physical map are twofold. First, the physical map provides ready access to any region of the genome that can be genetically identified. Given a mutation of known genetic map location, the physical map can be used to easily isolate an overlapping collection of clones encompassing the locus of interest. By eliminating the need for labor-intensive steps such as chromosome walking, researchers are free to focus their efforts on the isolation and characterization of the gene of interest. Second, the physical map provides a starting point for studying global genomic organization. As an increasing number of genes are cloned and molecular biological information is accumulated, one can begin to investigate the physical linkage of cloned genes, study the organization and distribution of repetitive elements, and address questions such as how physical distance and genetic distance are correlated. In this context, the map provides the framework for cataloging and integrating molecular biological information. Ultimately, genome organization will be investigated at the nucleotide level. Clearly, physical maps are the logical substrates for genome-sequencing projects.

Laborious Process

Physical mapping of complex genomes, however, is both laborious and computationally intensive. To illustrate the physical mapping problem, the authors' efforts to assemble a complete physical map of the Arabidopsis thaliana genome are briefly described below. The map will ultimately consist of a fully overlapping collection of cloned DNA fragments encompassing the five chromosomes.

The first stage of the mapping project involved the characterization of random cosmid clones by fingerprint analysis (Coulson et al., 1986; Hauge and Goodman, 1991). For the Arabidopsis project, approximately 20,000 random cosmid clones (~10-fold sampling redundancy) from primary libraries were fingerprinted. Using computer matching programs, the clones have been aligned into some 750 overlapping groups or contigs. The contigs encompass approximately 90-95 percent of the Arabidopsis genome (Hauge et al., 1991).
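The idea of aligning fingerprinted clones into contigs can be sketched in a few lines of Python. This is a deliberately simplified toy and not the matching program used for the project. Each fingerprint is reduced to a set of band sizes, two clones are called overlapping when they share at least `min_shared` bands, and each connected group of overlapping clones is reported as one contig. The function name, the threshold, and the band values are all hypothetical. Real matching programs must also allow for measurement error in band sizes and for chance matches, which this sketch ignores.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Set


def assemble_contigs(fingerprints: Dict[str, FrozenSet[int]],
                     min_shared: int) -> List[Set[str]]:
    """Group clones into contigs by shared fingerprint bands.

    Clones sharing at least `min_shared` bands are linked; each connected
    group of linked clones is returned as one contig (a set of clone names).
    """
    parent = {name: name for name in fingerprints}

    def find(x: str) -> str:
        # Follow parent links to the group representative (with path halving).
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(fingerprints, 2):
        if len(fingerprints[a] & fingerprints[b]) >= min_shared:
            parent[find(a)] = find(b)  # merge the two groups

    groups: Dict[str, Set[str]] = {}
    for name in fingerprints:
        groups.setdefault(find(name), set()).add(name)
    return sorted(groups.values(), key=lambda g: sorted(g))


# Hypothetical band sizes, chosen only to illustrate the grouping.
fingerprints = {
    "cosA": frozenset({120, 250, 310, 480, 700}),
    "cosB": frozenset({310, 480, 700, 910, 1300}),
    "cosC": frozenset({910, 1300, 1550, 2100}),
    "cosD": frozenset({95, 430, 615, 880}),
}
contigs = assemble_contigs(fingerprints, min_shared=2)
print([sorted(c) for c in contigs])  # [['cosA', 'cosB', 'cosC'], ['cosD']]
```

Note that cosA and cosC share no bands, yet they fall in the same contig because both overlap cosB. That transitive linkage is what allows many short clones to be assembled into one extended contig.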

In general, some 8-10 genomic equivalents must be fingerprinted to achieve 70-95 percent coverage of the respective genome. The task of ordering the clones and aligning them with respect to the genetic map is formidable to say the least. To illustrate the magnitude of this problem, consider the maize genome, which is estimated to be 3,900,000 kb. Using random cosmid clones containing an average insert size of 40 kb, approximately a million clones would be needed for 10 genomic equivalents.
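The arithmetic behind the maize estimate is easy to reproduce. The short Python sketch below uses only the figures quoted above (a 3,900,000 kb genome, 40 kb inserts, 10 genomic equivalents); the function name `clones_needed` is hypothetical and introduced only for illustration.

```python
import math


def clones_needed(genome_kb: float, insert_kb: float,
                  genome_equivalents: float) -> int:
    """Number of random clones whose inserts add up to the requested
    number of genomic equivalents (rounded up to a whole clone)."""
    return math.ceil(genome_kb * genome_equivalents / insert_kb)


maize = clones_needed(genome_kb=3_900_000, insert_kb=40, genome_equivalents=10)
print(maize)  # 975000, i.e. roughly a million cosmid clones
```

Because the clone count scales inversely with insert size, any increase in the average insert size reduces the number of clones to be handled by the same factor.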

YAC Cloning Vectors

The mapping problem has been greatly simplified by the development of yeast artificial chromosome (YAC) cloning vectors (Burke et al., 1987). The YAC vectors allow for the routine cloning of 0.5 megabase-sized DNA fragments, representing an improvement of at least an order of magnitude over the previously existing techniques. The construction of YAC libraries involves the ligation of large DNA fragments (100-1000 kb) into a vector containing selectable markers and the functional components of a eucaryotic chromosome: ARS elements for autonomous replication, the centromere for proper disjunction during meiosis and mitosis, and telomeres required for the replication of linear molecules (Murray and Szostak, 1983). The clones are transferred into baker's yeast (Saccharomyces cerevisiae), where they are replicated along with the endogenous host chromosomes.

There are two clear advantages of the yeast-cloning system. First, the large size of the inserts means that fewer clones need be examined. Second, and equally important, YACs offer the potential to give a more random representation of clones than is obtained using conventional cloning systems.

Utility of YAC Clones

The following examples of how YAC clones are being employed for Arabidopsis genome mapping illustrate the utility of YAC clones for genome research. Two general approaches are being used to assemble an overlapping YAC library covering the Arabidopsis genome. The first approach is simply to identify YAC clones corresponding to genetically mapped DNA probes (RFLPs and cloned genes). Presently some 380 Arabidopsis RFLP probes (Chang et al., 1988; Nam et al., 1989; S. Hanley and H.M. Goodman, unpublished; E. Meyerowitz, unpublished) are available for correlation of the physical map with the classical genetic linkage map. Using the available YAC libraries (Ward and Jen, 1990; Grill and Somerville, 1991), YAC clones corresponding to 125 RFLP markers have been identified (Hwang et al., 1991). Based on a mean YAC insert size of 160 kb and an average YAC contig size of 220-240 kb, YACs of known genetic map location encompass approximately 30,000 kb or about 30 percent of the Arabidopsis genome (Hwang et al., 1991). Extension of this analysis to the remaining RFLP probes should result in a collection of YAC clones encompassing some 70 percent of the genome. Closure of the gaps can then be achieved either by chromosome walking or by utilizing the cosmid contig map as described below.
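The coverage figure quoted above can be checked with a back-of-the-envelope calculation. The Python sketch below multiplies the 125 anchored markers by the average contig size of 220-240 kb. It assumes that the contigs around different markers do not overlap, so the result is an upper estimate. The genome size of 100,000 kb is also an assumption, taken as the value implied by equating 30,000 kb with about 30 percent of the genome; the function name is hypothetical.

```python
from typing import Tuple


def anchored_span_kb(n_markers: int,
                     contig_kb: Tuple[float, float]) -> Tuple[float, float]:
    """Low and high estimates of the DNA covered if each marker anchors one
    non-overlapping YAC contig within the given size range."""
    low, high = contig_kb
    return n_markers * low, n_markers * high


GENOME_KB = 100_000  # assumption: implied by "30,000 kb or about 30 percent"

low, high = anchored_span_kb(125, (220, 240))
print(low, high)                          # 27500 30000
print(low / GENOME_KB, high / GENOME_KB)  # 0.275 0.3
```

The same calculation cannot simply be scaled up to all available probes: as markers become denser, the contigs anchored by neighboring markers increasingly overlap, so coverage grows more slowly than the number of probes.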

As an alternative strategy, the overlapping cosmid map (Hauge et al., 1991) is a powerful tool for assembling an overlapping YAC map. The linking strategy is to use the YAC clones to probe ordered arrays of cosmid clones that are representative of the contigs (Coulson et al., 1988). Cosmids within a contig are chosen so that there is minimal overlap between flanking clones, yet the clones are representative of the contigs. The cosmids are plated onto nylon membranes as ordered arrays and subsequently probed with labeled YAC clones. Using this strategy, the gaps in the contig map are closed and the YACs are aligned within the framework of the cosmid map, thereby generating an overlapping YAC map.
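The way a single YAC hybridization pattern is read against the ordered cosmid arrays can be sketched as follows. This Python example is a simplified illustration under stated assumptions, not the procedure of Coulson et al. Each contig is represented by an ordered list of hypothetical cosmid names, and a YAC is represented by the set of cosmids it lights up. A pattern is treated as consistent when the positive cosmids in a contig form one unbroken run, and a contig is reported as linked by the YAC only when that run reaches one end of the contig. The function names and cosmid labels are invented for the example.

```python
from typing import Dict, List, Optional, Set, Tuple


def hit_run(order: List[str], hits: Set[str]) -> Optional[Tuple[int, int]]:
    """Return (first, last) positions of the hybridizing cosmids in one contig,
    or None if there are no hits or the hits do not form one unbroken run."""
    idx = [i for i, cosmid in enumerate(order) if cosmid in hits]
    if not idx or idx[-1] - idx[0] + 1 != len(idx):
        return None
    return idx[0], idx[-1]


def candidate_join(contigs: Dict[str, List[str]], hits: Set[str]) -> List[str]:
    """List contigs whose hit pattern is an unbroken run reaching a contig end.

    Two or more such contigs hit by the same YAC suggest that the YAC
    bridges the gap between them.
    """
    linked = []
    for name, order in contigs.items():
        run = hit_run(order, hits)
        if run is not None and (run[0] == 0 or run[1] == len(order) - 1):
            linked.append(name)
    return linked


contigs = {
    "ctg1": ["a1", "a2", "a3", "a4"],
    "ctg2": ["b1", "b2", "b3"],
    "ctg3": ["c1", "c2", "c3"],
}
print(candidate_join(contigs, {"a3", "a4", "b1"}))  # ['ctg1', 'ctg2']
print(candidate_join(contigs, {"a1", "a3", "b1"}))  # ['ctg2']
```

In the second call the hits in ctg1 skip cosmid a2, so that contig is rejected as an illogical fit and no join is proposed.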

An advantage of this approach over traditional strategies that use endprobes to select overlapping clones is that the hybridization patterns are easily tested for a logical fit to the structure of the contig map. Linkage can, therefore, be rapidly established based largely on the results of colony hybridization. In contrast, techniques such as chromosome walking require laborious confirmation of each join and subsequent restriction mapping of the linking clones to determine both the direction and the extent of the walk.

Using a combination of the techniques described above, it is probable that an overlapping YAC library of the Arabidopsis genome will be completed in the near future. The overlapping YAC library will serve to facilitate the cloning of genes and will provide a minimal set of clones covering the Arabidopsis genome. Given the small genome of Arabidopsis, a representative collection of clones can be gridded at high density onto a single filter the size of a microtiter dish. These "polytene" blots (J. Sulston and A. Coulson, personal communication) can then be used to rapidly determine the chromosomal location of any new clone by simple blot hybridization.

YAC clones are likely to play an increasingly important role in future physical mapping projects. The strategies for physical mapping with YACs are essentially the same as those used for other genomic libraries (bacteriophage and cosmids). Using the existing technology, YAC clones may be fingerprinted directly and ordered into contigs (Kuspa et al., 1989). Moreover, the ability to easily generate endprobes from YACs using techniques such as inverse PCR (Ochman et al., 1988) allows for the construction of physical maps based on simple hybridization strategies.

The application of mapping strategies that use YACs should make it possible to undertake projects orders of magnitude larger than those currently under way. It remains to be determined, however, whether YACs will entirely supersede cosmid and phage clone maps, since the smaller clones are generally required for routine procedures such as gene isolation and DNA sequencing.

References

Berg, D. E. and Howe, M. M. (1989). Mobile DNA. (American Society for Microbiology, Washington, D.C.).

Burke, D. T., Carle, G. F., and Olson, M. V. (1987). Cloning of large segments of exogenous DNA into yeast by means of artificial chromosome vectors. Science, 236, 806-812.

Chang, C., Bowman, J. L., DeJohn, A. W., Lander, E. S. and Meyerowitz, E. M. (1988). Restriction fragment length polymorphism linkage map for Arabidopsis thaliana. Proc. Natl. Acad. Sci. USA, 85, 6856-6860.

Coulson, A., Sulston, J., Brenner, S., and Karn, J. (1986). Toward a physical map of the nematode Caenorhabditis elegans. Proc. Natl. Acad. Sci. USA, 83, 7821-7825.

Coulson, A., Waterston, R., Kiff, J., Sulston, J., and Kohara, Y. (1988). Genome linking with yeast artificial chromosomes. Nature, 335, 184-186.

Feldmann, K. A., Marks, M. D., Christianson, M. L., and Quatrano, R. S. (1989). A dwarf mutant of Arabidopsis generated by T-DNA insertion mutagenesis. Science, 243, 1351-1354.

Grill, E. and Somerville, C. (1991). Construction and characterization of a yeast artificial chromosome library of Arabidopsis which is suitable for chromosome walking. Molec. Gen. Genet., in press.

Hauge, B. M. and Goodman, H. M. (1991). Physical Mapping by Random Clone Fingerprint Analysis. In Plant Genomes: Methods for Genetic and Physical Mapping, T. Osborn and J.S. Beckmann eds. (Kluwer). In press.

Hauge, B. M., Hanley, S., Giraudat, J., and Goodman, H. M. (1991). Mapping the Arabidopsis Genome. In Molecular Biology of Plant Development, G. Jenkins and W. Schurch eds. In press.

Hwang, I., Kohchi, T., Hauge, B. M., Goodman, H. M., Schmidt, R., Cnops, G., Dean, C., Gibson, S., Iba, K., Lemieux, B. L., Danhoff, L., and Somerville, C. (1991). Identification and map position of YAC clones comprising one third of the Arabidopsis genome. Submitted.

Kohara, Y., Akiyama, K., and Isono, K. (1987). The physical map of the whole E. coli chromosome: Application of a new strategy for rapid analysis and sorting of a large genomic library. Cell, 50, 495-508.

Kuspa, A., Vollrath, D., Cheng, Y., and Kaiser, D. (1989). Physical mapping of the Myxococcus xanthus genome by random cloning in yeast artificial chromosomes. Proc. Natl. Acad. Sci. USA, 86, 8917-8921.

Murray, A. W., and Szostak, J. W. (1983). Construction of artificial chromosomes in yeast. Nature, 305, 189-193.

Nam, H. G., Giraudat, J., den Boer, B., Moonan, F., Loos, W. D. B., Hauge, B. M., and Goodman, H. M. (1989). Restriction fragment length polymorphism linkage map of Arabidopsis thaliana. Plant Cell, 1, 699-705.

Ochman, H., Gerber, A. S., and Hartl, D. L. (1988). Genetic Applications of an inverse polymerase chain reaction. Genetics, 120, 621-623.

Olson, M. V., Dutchik, J. E., Graham, M. Y., Brodeur, G. M., Helms, C., Frank, M., MacCollin, M., Scheinman, R., and Frank, T. (1986). Random-clone strategy for genomic restriction mapping in yeast. Proc. Natl. Acad. Sci. USA, 83, 7826-7830.

Pruitt, R. E. and Meyerowitz, E. M. (1986). Characterization of the genome of Arabidopsis thaliana. J. Mol. Biol., 187, 169-183.

Smith, C. L., Econome, J. G., Schutt, A., Klco, S., and Cantor, C. R. (1987). A physical map of the Escherichia coli K12 genome. Science, 236, 1448-1453.

Straus, D. and Ausubel, F. M. (1990). Genomic subtraction for cloning of DNA corresponding to deletion mutations. Proc. Natl. Acad. Sci. USA, 87, 1889-1893.

Van Sluys, M. A., Tempe, J., and Fedoroff, N. (1987). Studies on the introduction and mobility of the maize Activator element in Arabidopsis and Daucus carota. EMBO J., 6, 3881-3889.

Ward, E. R. and Jen, G. C. (1990). Isolation of single-copy sequence clones from a yeast artificial chromosome library of randomly-sheared Arabidopsis thaliana DNA. Plant Mol. Biol., 14, 561-568.