
SURVEY OF TRINUCLEOTIDE REPEATS IN THE HUMAN GENOME: ASSESSMENT OF THEIR UTILITY AS GENETIC MARKERS

Julie M. Gastier (1), Jacqueline C. Pulido (1, 2), Sara Sunden (3), Thomas Brody (1), Kenneth H. Buetow (4), Jeffrey C. Murray (5), James L. Weber (6), Thomas J. Hudson (7), Val C. Sheffield (3), Geoffrey M. Duyk (1, 2)*

(1) Department of Genetics, (2) Howard Hughes Medical Institute, Harvard Medical School, Boston, MA 02115; (3) Department of Pediatrics, University of Iowa, Iowa City, IA 52242; (4) Fox Chase Cancer Research Center, Philadelphia, PA 19111; (5) Departments of Pediatrics and Biology, University of Iowa, Iowa City, IA 52245; (6) Marshfield Medical Research Foundation, Marshfield, WI 54449; (7) Center for Genome Research, Whitehead Institute/Massachusetts Institute of Technology, Cambridge, MA 02139

* To whom correspondence should be addressed at present address:

Millennium Pharmaceuticals Incorporated, 640 Memorial Drive, Cambridge, MA 02139

Phone (617) 374-9480 x218, FAX (617) 374-9379

ABSTRACT

Genetic markers based upon PCR amplification of sequence tagged sites (STSs) containing short tandem repeats have become the standard for genetic mapping. We have completed a survey, based on the direct isolation of representative members of each of the ten trinucleotide repeat classes, to determine their relative abundance, repeat size distribution, and general utility as genetic markers. Depending on the repeat class, trinucleotide repeats are 1 to 2 orders of magnitude less frequent than (AC)n repeats. The average length of the trinucleotide repeats sequenced was less than 15 repeat units, and only three of the STSs developed for this study contained more than 25 repeat units. The (AAT)n class is both the most abundant and the most frequently polymorphic. Other trinucleotide repeat classes observed to be frequently polymorphic include (AAC)n, (ACT)n, (ATC)n, and (AAG)n; however, these classes are less abundant than the (AAT)n class. Based upon this initial survey, we have initiated saturation cloning of the (AAT)n class of repeats. At the time of submission of this manuscript, as part of the Cooperative Human Linkage Center (CHLC), we have developed more than 415 new high heterozygosity (AAT)n genetic markers (>2 alleles in 4 individuals) and 200 new low heterozygosity (AAT)n STSs from this larger screening effort combined with the initial survey.

INTRODUCTION

High resolution genetic maps are important tools for mapping disease genes and serve as frameworks for the construction of physical maps. In recent years, collaborative efforts have produced genetic maps with an average resolution of 3 cM (1,2,3), and ongoing efforts will yield human genetic maps with an average resolution of <1 cM (4). The majority of available genetic markers are based on (AC)n simple sequence repeats. (AC)n repeats are extremely abundant, occurring on average once every 30-60 Kb (5-7), and are generally polymorphic if they contain more than ten repeat units (2,8).

In contrast to (AC)n repeats, the ten classes of trinucleotide repeats have not been comprehensively surveyed to determine their utility as genetic markers, although a limited set of trinucleotide repeat-based genetic markers has been published (1,9). Identifying additional sources of PCR-based genetic markers is extremely important, since completing the current generation of genetic maps will require filling the gaps that result from local clustering of (AC)n repeats. High resolution genetic maps will be critical for methods such as allele association analysis of the genetic components of complex traits. In addition, large numbers of markers and STSs will facilitate the completion of physical maps.

There are several advantages to incorporating markers based on higher order short tandem repeats into genetic maps. Trinucleotide and tetranucleotide repeat markers produce cleaner PCR amplification products than dinucleotide repeat markers and are more readily co-amplified. Their amplification products are also more suitable for detection by silver staining than those of (AC)n repeat markers (unpublished data). The increasing use of trinucleotide and tetranucleotide repeats should therefore result in more rapid and accurate genotyping. Marker sets that combine di-, tri-, and tetranucleotide repeat markers may also increase the utility of hybridization-based approaches for high throughput genotyping: multiplex sets could be assembled according to the expected allele size ranges as well as the repeat class, and the amplification products could then be detected by serial rounds of probing with probes specific to each class of simple sequence repeat. Beyond their role in genetic mapping, trinucleotide repeat-based STSs may help identify candidate disease loci involving expansion of repeat sequences within defined genes. To date, trinucleotide repeat expansion has been associated with at least 9 diseases and 4 fragile sites (10-23).

We have previously described a method for generating small insert genomic libraries that are highly enriched for clones containing short tandem repeats (24,25). We have used this method to survey the ten classes of trinucleotide repeats in order to determine their usefulness as genetic markers. Starting with at least fifty small insert human genomic clones for each repeat class, we estimated the size distribution of each type of repeat in the genome, its association with repetitive DNA elements, and the polymorphic character of the repeat class. By screening a cosmid library with probes for each of the trinucleotide repeats, we were also able to estimate the relative abundance of each class in the human genome.

The information provided by this survey will help direct genome-wide efforts to generate high resolution genetic linkage maps. For example, we have screened the most polymorphic and abundant trinucleotide repeat class, (AAT)n, on a much larger scale than in this initial survey to generate large numbers of new markers for the Cooperative Human Linkage Center (4). In this paper, by analyzing the correlation between repeat length and heterozygosity when markers are typed on more individuals, we present evidence that the first 200 (AAT)n markers from that screen are of similar polymorphic quality to the markers generated in the initial survey. In addition, data from the trinucleotide repeat survey will be useful to researchers searching for new markers near a disease locus to produce the local, very high resolution genetic maps that have become essential for positional cloning strategies. Such markers, whether or not they are highly polymorphic, contribute to the establishment of integrated, high quality physical and genetic maps, improve the localization of recombination breakpoints, and facilitate the location of genes through the detection of "ancestral recombinants" by linkage disequilibrium methods.

RESULTS

Relative abundance of individual classes of trinucleotide repeats

The most useful classes of simple sequence repeats for developing large numbers of genetic markers would be highly abundant in the human genome and would allow the rapid and easy development of robust polymorphic markers. Previous studies suggested that trinucleotide simple sequence repeats are less frequent than dinucleotide repeats (26). To estimate the frequency of each trinucleotide repeat class, we screened an amplified human cosmid library with probes for each type of repeat. Table 1 shows the frequency estimates for each of the ten classes of trinucleotide repeats. Estimates were made by screening approximately 30,000 cosmids for each repeat class with a non-radioactive screening system from Lifecodes, Inc. Frequency estimates are based on an average cosmid insert size of 35 Kb and an assumed human genome size of 3000 Mb. Based on hybridization control clones, the hybridization conditions detected repeats of 5 units or greater. However, the most useful estimates for trinucleotide repeats would include only repeats of 8 units or greater, so frequencies were adjusted by the percentage of sequenced clones that carried repeats of 8 units or longer. Because the (ACG)n and (CCG)n clones were difficult to sequence (see below), the frequency estimates for these two classes could not be adjusted. It is critical to note that these frequencies are rough estimates under the conditions specified; comparisons of such screens between laboratories are difficult because of inherent differences in experimental and hybridization procedures.
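The frequency estimate described above reduces to a short calculation. The sketch below is a hedged illustration, not the authors' procedure: it assumes (our simplification) that each hybridization-positive cosmid carries exactly one qualifying repeat and that the amplified library samples the genome uniformly. The function name is hypothetical, and the counts in the usage line are invented for illustration; they are not the Table 1 values.

```python
def estimate_repeat_frequency(cosmids_screened: int,
                              positives: int,
                              frac_ge8: float = 1.0,
                              insert_kb: float = 35.0,
                              genome_mb: float = 3000.0) -> tuple[float, float]:
    """Estimate (repeats per genome, mean spacing in kb) from a cosmid screen.

    Assumes one qualifying repeat per positive cosmid and uniform coverage.
    `frac_ge8` is the fraction of sequenced positives with >= 8 repeat units;
    leave it at 1.0 when no adjustment is possible (as for (ACG)n and (CCG)n).
    """
    screened_kb = cosmids_screened * insert_kb        # total DNA screened
    genome_kb = genome_mb * 1000.0
    genome_equivalents = screened_kb / genome_kb      # 30,000 x 35 kb = 0.35 genomes
    adjusted_positives = positives * frac_ge8         # keep only repeats of >= 8 units
    repeats_per_genome = adjusted_positives / genome_equivalents
    spacing_kb = genome_kb / repeats_per_genome       # equals screened_kb / adjusted_positives
    return repeats_per_genome, spacing_kb


# Hypothetical screen: 30,000 cosmids, 2,000 positives, half with >= 8 units.
n_repeats, spacing = estimate_repeat_frequency(30_000, 2_000, frac_ge8=0.5)
print(f"{n_repeats:,.0f} repeats per genome, one every {spacing:,.0f} kb")
```

Because the genome size cancels out of the spacing, the 3000 Mb assumption affects only the per-genome count, whereas the assumed 35 Kb insert size scales both results.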

Distribution of repeat lengths

We sequenced at least 50 hybridization-positive small insert clones for each repeat class; the repeat length profiles of the clones sequenced from each class are shown in Figure 1. The data presented are based on unique clones in which a repeat length of at least five units was observed. Two of the classes, (ACG)n and (CCG)n, proved very difficult to analyze. Clones harboring plasmids containing these repeats appeared to grow poorly, and only small quantities of template could be obtained from our standard minipreps. Furthermore, most of the DNA sequences derived from these plasmids were of low quality, possibly because of the high G/C content of these repeats and of their flanking sequences. Since our aim was to determine which classes of repeats would be useful for rapidly generating large numbers of informative markers, these two classes were excluded from further analyses.
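Repeat length in Figure 1 is the number of contiguous perfect repeat units in a clone. The sketch below, with hypothetical function names, measures it from a sequenced clone by scanning for the longest perfect run of the class motif in any of its equivalent forms (rotations and reverse complement), then tallies clones with at least five units. It is an illustration under our own assumptions: it ignores interrupted or imperfect repeats, which a real analysis would need to handle, and the example sequences are invented.

```python
from collections import Counter

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def equivalent_motifs(motif: str) -> set[str]:
    """All rotations of `motif` and of its reverse complement,
    e.g. AAT -> {AAT, ATA, TAA, ATT, TTA, TAT}."""
    motif = motif.upper()
    revcomp = motif.translate(COMPLEMENT)[::-1]
    return {m[i:] + m[:i] for m in (motif, revcomp) for i in range(len(m))}


def longest_perfect_run(seq: str, motif: str) -> int:
    """Largest number of contiguous, perfect copies of any motif equivalent
    to `motif` found anywhere in `seq` (0 if none)."""
    seq = seq.upper()
    k = len(motif)
    variants = equivalent_motifs(motif)
    best = 0
    for start in range(len(seq) - k + 1):
        unit = seq[start:start + k]
        if unit not in variants:
            continue
        copies, pos = 1, start + k
        while seq[pos:pos + k] == unit:   # extend the run one unit at a time
            copies += 1
            pos += k
        best = max(best, copies)
    return best


def repeat_length_profile(reads: dict[str, str], motif: str,
                          min_units: int = 5) -> Counter:
    """Histogram of repeat lengths (in units) over clones with >= min_units."""
    lengths = (longest_perfect_run(s, motif) for s in reads.values())
    return Counter(n for n in lengths if n >= min_units)


reads = {  # hypothetical clone sequences
    "clone_01": "GGCT" + "AAT" * 12 + "GCTTA",
    "clone_02": "TTG" + "ATT" * 6 + "CAGG",
    "clone_03": "ACGT" + "AAT" * 3 + "GGA",   # below the 5-unit threshold
}
print(repeat_length_profile(reads, "AAT"))
```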

Development of Sequence Tagged Sites (STSs)

Table 2 presents the success and failure rates for the development of STSs from each class of trinucleotide repeats. The minimum repeat length used for primer design was n=7. This cutoff was chosen because most (AC)n repeats of 10 units or greater (20 bp in total) tend to be polymorphic (2,5), and a trinucleotide repeat of 7 units spans a comparable 21 bp. This initial survey focused on identifying the shortest observed allele size associated with highly polymorphic trinucleotide repeat-based STSs. Very few polymorphic STSs were developed from clones containing fewer than 8 repeat units, and from these initial clones we developed hybridization controls to calibrate our screening conditions. Large scale marker development has subsequently been restricted to clones whose hybridization signal is equal to or greater than that of the (NNN)8 control.

To assess the ease of marker development and to optimize the process, we characterized the common causes of failure that led to rejection of a clone or sequence for marker development (Table 2). For example, where the majority of losses were due to short repeat length, the hybridization conditions were made more stringent for subsequent large scale screens. Duplication rates were also useful for assessing the quality of a small insert library and for determining whether a screen had reached "saturation." The initially high rate of clone duplication observed in the restriction enzyme-based libraries led to the decision to develop higher complexity, randomly sheared libraries. In addition, the relatively large number of clones that required bi-directional sequencing to define the simple sequence repeat, or that were lost because no repeat was observed after sequencing, allowed us to optimize insert size for library construction. Classes of clones that were frequently associated with Alu repetitive elements were prescreened for Alu sequences in subsequent marker development (C. Yandava, et al., in preparation). Finally, one class of repeats, (AGG)n, was frequently very complex, yielding long stretches of C/T-rich sequence with few perfect repeats, which suggests that it would be difficult to obtain large numbers of polymorphic markers from this class.
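One way to see why a rising duplication rate signals saturation is a simple occupancy model. This model is our assumption, not an analysis from the text: if a screen draws k positives uniformly at random, with replacement, from a library containing M distinct, equally represented repeat-bearing clones, the expected number of distinct clones is M(1 - (1 - 1/M)^k). The sketch below (hypothetical function names and library sizes) shows that a lower-complexity library reaches a given duplicate fraction after fewer picks.

```python
def expected_distinct(library_size: int, picks: int) -> float:
    """Expected number of distinct clones after `picks` uniform draws
    (with replacement) from `library_size` equally represented clones."""
    return library_size * (1.0 - (1.0 - 1.0 / library_size) ** picks)


def expected_duplicate_fraction(library_size: int, picks: int) -> float:
    """Fraction of picked clones expected to duplicate an earlier pick."""
    if picks == 0:
        return 0.0
    return 1.0 - expected_distinct(library_size, picks) / picks


# Hypothetical library sizes: the second is six times more complex.
for size in (1_000, 6_000):
    fracs = [expected_duplicate_fraction(size, k) for k in (250, 1_000, 4_000)]
    print(size, [round(f, 2) for f in fracs])
```

Real libraries are not uniformly represented, so observed duplicate rates will typically rise faster than this idealized curve.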

Polymorphism analysis

Following the strategy initially employed by Weissenbach, et al. (2) to estimate the likelihood that a newly isolated STS would be polymorphic in the reference CEPH pedigrees, we estimated the polymorphic potential of all STSs developed in the initial survey by analyzing their alleles in 4 reference CEPH individuals. An STS with at least three alleles in the 4 individuals was categorized as polymorphic and has subsequently been integrated into the high resolution genetic map of the CHLC. The yield of highly polymorphic markers for each trinucleotide repeat class is shown in Figure 2.

Figure 2. Yield of highly polymorphic markers in the initial trinucleotide repeat survey. % polymorphic = (number of STSs with >2 alleles in the four CEPH individuals / total number of STSs that amplified) × 100; n = number of STSs that amplified.
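The four-individual criterion and the Figure 2 yield can be computed as in the sketch below. Allele sizes are in base pairs and are invented for illustration; the data layout and function names are hypothetical. An STS that failed to amplify is recorded as None and is excluded from the denominator, matching the caption's definition of n.

```python
from typing import Optional

# Two allele sizes (bp) observed in one individual.
Genotype = tuple[int, int]
# STS name -> {CEPH individual ID -> genotype}, or None if the STS did not amplify.
SurveyData = dict[str, Optional[dict[str, Genotype]]]


def distinct_alleles(genotypes: dict[str, Genotype]) -> int:
    """Number of different allele sizes seen across all typed individuals."""
    return len({allele for pair in genotypes.values() for allele in pair})


def is_polymorphic(genotypes: dict[str, Genotype], min_alleles: int = 3) -> bool:
    """Survey criterion: more than two alleles among the four individuals."""
    return distinct_alleles(genotypes) >= min_alleles


def percent_polymorphic(survey: SurveyData) -> tuple[float, int]:
    """Return (% polymorphic, n), where n counts only STSs that amplified."""
    amplified = [g for g in survey.values() if g is not None]
    n = len(amplified)
    if n == 0:
        return 0.0, 0
    hits = sum(is_polymorphic(g) for g in amplified)
    return 100.0 * hits / n, n


survey: SurveyData = {  # hypothetical typing results
    "STS_A": {"1331 01": (152, 158), "1331 02": (155, 155),
              "1408 01": (152, 161), "1408 02": (158, 158)},   # 4 alleles
    "STS_B": {"1331 01": (201, 204), "1331 02": (201, 201),
              "1408 01": (204, 204), "1408 02": (201, 204)},   # 2 alleles
    "STS_C": None,                                              # failed PCR
}
print(percent_polymorphic(survey))
```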

Development of an extended set of (AAT)n class of markers

The (AAT)n class of simple sequence repeats was found to be relatively abundant, highly polymorphic, and readily developed into robust markers. We therefore screened more than 60,000 additional (AAT)n-enriched clones (8,000 from each of the six randomly sheared libraries and 4,000 from each of the three restriction enzyme-digested libraries) to expand this pool of potential markers. All combined (AAT)n screens yielded 2877 positive clones that were sequenced, and 13.9% of those sequenced yielded high heterozygosity markers (27). Failures were due to duplicates, Alu repeats, and the other causes listed in Table 2. This percentage reflects only the success rate for clones under the given screening conditions; it does not reflect the percentage of (AAT)n repeats in the entire genome that are polymorphic. For example, (AAT)n repeats located near Alu repeats may be polymorphic but cannot be tested, because unique primers cannot be designed. In the initial survey, we found that (AAT)n repeats are frequently associated with a specific subset of Alu repetitive elements. To reduce the number of sequences that could not be used for primer design because of an Alu element, we prescreened clones with custom Alu repeat probes, which reduced the loss of sequences due to Alu repeats from 32% to <10% (C. Yandava, et al., in preparation). Using this larger set of (AAT)n markers, we examined whether repeat length is related to the degree of polymorphism of a given marker. We analyzed 110 new (AAT)n markers that had been genotyped on DNA from a pool of 40 individuals, together with the first 32 (AAT)n markers integrated into the CHLC maps by genotyping on 8 CEPH families (4), for which heterozygosity frequencies are therefore available. The correlations are shown in Figure 3.
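For readers reproducing the length-versus-informativeness comparison, the sketch below computes expected heterozygosity from allele frequencies, H = 1 - sum(p_i^2), and a Pearson correlation with sequenced repeat length. These are standard estimators chosen here for illustration; the text does not specify how the CHLC heterozygosity values or the Figure 3 correlations were computed, and the marker data in the usage example are invented.

```python
import math


def expected_heterozygosity(allele_counts: list[int]) -> float:
    """H = 1 - sum(p_i^2), with p_i estimated from observed allele counts."""
    total = sum(allele_counts)
    return 1.0 - sum((c / total) ** 2 for c in allele_counts)


def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical markers: (sequenced repeat length in units, allele counts).
markers = [
    (8, [70, 10]),
    (12, [40, 25, 15]),
    (17, [30, 20, 15, 15]),
    (23, [20, 18, 16, 14, 12]),
]
lengths = [float(n) for n, _ in markers]
het = [expected_heterozygosity(counts) for _, counts in markers]
print([round(h, 2) for h in het], round(pearson_r(lengths, het), 2))
```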

DISCUSSION

The development of highly informative markers remains a continuing goal of the Human Genome Project, because a lack of markers is often the limiting factor in positional cloning projects. We have performed a survey of the ten classes of trinucleotide repeats to determine their potential usefulness as new genetic markers. Plasmid clones containing (ACG)n and (CCG)n repeats were difficult to grow and screen, and reliable sequence could not be obtained from them, so these two classes were dropped after the initial analysis. The remaining eight classes of trinucleotide repeats were analyzed in detail.

We found that all classes of trinucleotide repeats are less frequent in the human genome than (AC)n repeats. The most abundant class, (AAT)n, occurs approximately once every 500 Kb, while the least frequent class, (ACT)n, occurs approximately once every 25,000 Kb. Because these repeats are so infrequent, we used marker-enriched libraries with an estimated enrichment factor of at least 100 (unpublished data). This method has allowed us to isolate large numbers of unique, repeat-containing clones for any short tandem repeat class. A previous database search estimating the distribution of trinucleotide repeats in the human genome found (AAT)n, (AAC)n, and (AGC)n to be the most frequent (28). Consistent with these data, we found (AAT)n and (AAC)n to be the most frequent trinucleotide repeats in cosmid screens. However, (AGC)n repeats were roughly 10-fold less abundant. This inconsistency may result from an overrepresentation of (AGC)n repeats among database entries, owing to efforts to screen cDNAs for this type of repeat (29,30), or from a systematic experimental error that we cannot account for.

The distribution of the number of contiguous repeat units varies among the trinucleotide repeat classes. In general, the range of repeat lengths for a given class was indicative of that class's utility for marker development. This correlation suggests that some classes are less likely to undergo expansion and contraction and will therefore be less useful as genetic markers. The biological basis for these differences in the stability and lengths of simple sequence repeats remains unclear, although G/C content appears to be at least one factor. It would be of interest to assess the relative stability of these classes in cells carrying mutations in the mismatch repair system (e.g. in the MSH2 or MLH1 genes) (31,32), and to determine differences in new mutation rates among the classes. Initial mutation rate studies of the CHLC tetranucleotide markers [mainly (AGAT)n and (AAGG)n] suggest that tetranucleotide-based markers have a mutation rate 3-4 times that of dinucleotide-based markers. However, trinucleotide-based markers [mainly (AAT)n] have a mutation rate roughly equivalent to that of dinucleotide repeat markers (unpublished data). Mutation rates at specific loci could help researchers choose the best genetic markers for cases where distinguishing a newly mutated allele from a genuinely different allele is critical (e.g. paternity testing).

In addition, it would appear that the mechanisms that alter repeat length by a few units differ from the mechanisms responsible for the expansion of trinucleotide repeats observed in some disease genes. We have developed over 300 STSs based on the (AGC)n repeat class and found that most of these repeats are short and not highly polymorphic (unpublished data). These results must be interpreted with caution, because there may be selection against the propagation of plasmids carrying G/C-rich inserts and a systematic selection against clones containing long G/C-rich sequences.

In this survey of the trinucleotide repeats, we have categorized the reasons why a given repeat-containing clone failed to be developed into a suitable genetic marker. This analysis is most useful for selecting the classes to use in larger screens as part of comprehensive marker generation efforts. In terms of rate of polymorphism and quality of markers, the most useful classes of simple sequence repeats have been, in order, (AAT)n > (ACT)n > (AAG)n > (ATC)n > (AAC)n. Although we will be able to isolate markers from all of these classes by genome-wide marker selection in support of the CHLC effort, the low density of most trinucleotide repeat classes reduces their utility for more focused local projects. (ACT)n markers tend to be polymorphic, but they are extremely rare in the genome. (ATC)n markers are relatively more abundant, but often less informative. (AAG)n and (AAC)n repeats are frequently associated with Alu elements, making an Alu prescreening step necessary for large scale development. Interestingly, the five most useful classes in this survey are all A/T rich, as are the most useful tetranucleotide classes [(AGAT)n, (AATG)n, (AATC)n, (AAGG)n] (C. Yandava, et al., in preparation). The difference in polymorphic quality among the classes may suggest that mutation results from mismatch repair errors. However, mutation could also result from "slippage" of the DNA polymerase, since slippage would be predicted to be greater for A/T-rich sequences.

Since the (AAT)n class of trinucleotide repeats was the most abundant and the most frequently polymorphic, we initiated large scale screening of this class. We have used the additional markers to correlate repeat length with the polymorphic quality of each STS. As has been observed for (AC)n repeats (8), markers tended to be more informative as the sequenced repeat length increased. To date, we have developed more than 415 new (AAT)n markers as part of the Cooperative Human Linkage Center effort. Additional markers will be developed from the (AAC)n, (AAG)n, (ACT)n, and (ATC)n classes of trinucleotide repeats. This survey has been useful for determining the efficacy of these types of short tandem repeats as new markers, and it suggests which classes to use when screening for additional markers near a disease region.

MATERIALS AND METHODS

Marker-enriched small insert libraries

Plasmid-based, small insert, marker-selected genomic DNA libraries were constructed from AluI-, HaeIII-, or EcoRV/SspI-digested DNA or from randomly sheared human genomic DNA as previously described (25). Briefly, a short tandem repeat oligonucleotide was extended off single-stranded, uracil-containing phage DNA prepared from small-insert human genomic libraries. The extension products were transformed into wild type (ung+) bacteria, which select against the uracil-containing template strand, to enrich for double-stranded products containing short tandem repeats. The estimated complexity of the restriction enzyme-generated libraries was one genome equivalent, and the complexity of the randomly sheared genomic library was estimated to be six genome equivalents (27). As previously described, libraries were picked into 96-well microtiter plates and arrayed by replica plating to simplify and accelerate the screening process (25).
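The complexity figures above can be translated into how completely each library represents the genome using the Clarke-Carbon approximation, which treats clones as randomly sampled fragments: the probability that a given locus is present in a library of c genome equivalents is P = 1 - e^(-c). This is our illustration rather than a calculation from the text, and it is only approximate for the restriction enzyme libraries, whose fragments are not randomly positioned. Function names are hypothetical.

```python
import math


def fraction_of_genome_represented(genome_equivalents: float) -> float:
    """Clarke-Carbon estimate: P(locus present) = 1 - exp(-c)."""
    return 1.0 - math.exp(-genome_equivalents)


def genome_equivalents_needed(probability: float) -> float:
    """Invert the estimate: c = -ln(1 - P)."""
    return -math.log(1.0 - probability)


for label, c in (("restriction enzyme library", 1.0),
                 ("randomly sheared library", 6.0)):
    print(f"{label}: {fraction_of_genome_represented(c):.1%} of loci expected")
```

Under this approximation, one genome equivalent represents about 63% of loci, whereas six genome equivalents represent about 99.75%.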

Identification of Trinucleotide repeat containing clones

Human genomic cosmid clones (Stratagene) and small insert clones containing simple sequence repeats of the desired trinucleotide motif were detected by screening the cosmid library and the marker-enriched libraries, respectively, with the Quick-Light hybridization system (FMC Corporation). This system uses oligonucleotides corresponding to each repeat class, directly conjugated to alkaline phosphatase. Hybridization and wash temperatures were: (AAC)n, 57 °C; (AAG)n, 56 °C; (AAT)n, 32 °C; (ACC)n, 55 °C; (ACG)n, 58 °C; (ACT)n, 45 °C; (AGC)n, 58 °C; (AGG)n, 58 °C; (ATC)n, 45 °C. Clones for the (CCG)n class were screened with an alternative protocol to reduce background due to its high G/C content: a custom (CCG)15 probe was constructed by Lifecodes Corporation, hybridization was performed at 48 °C in Quick-Light hybridization buffer, and washes consisted of 2X SSC at room temperature; 3M TEMAC, 0.1% SDS at 60 °C; and 2X SSC at room temperature.

Development of Sequence Tagged Sites (STSs)

Hybridization positive, small insert clones were subjected to single pass cycle sequencing using the M13 (-21), M13 reverse, and/or SP6 dye primer kits including Taq polymerase (Applied Biosystems, Inc.) with the ABI373 automated sequencer (Applied Biosystems, Inc.). Template DNA was prepared using the Magic Minipreps kit (Promega Corporation). Duplicate clones were detected using Sequencher (GeneCodes Corporation). Primers flanking the repeat were chosen using the Primer program (Whitehead Institute/MIT) as implemented by the CHLC primer pipeline. For information on the CHLC pipeline server, send a blank email message to: primer-server@chlc.org.
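Before primers can be chosen, a clone needs a perfect repeat of at least 7 units (the minimum used for primer design) and enough unique sequence on both sides of it. The sketch below locates the longest perfect run and returns the two flanks that would be handed to a primer design program. The 40-nt minimum flank length and all function names are hypothetical choices for illustration; the actual primer selection by the Primer program and the CHLC pipeline is not reproduced here. A clone rejected by the flank check resembles the case in which bi-directional sequencing would be needed to define the repeat.

```python
from typing import Optional

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def motif_variants(motif: str) -> set[str]:
    """Rotations of the motif and of its reverse complement."""
    fwd = motif.upper()
    rc = fwd.translate(COMPLEMENT)[::-1]
    return {m[i:] + m[:i] for m in (fwd, rc) for i in range(len(m))}


def repeat_with_flanks(seq: str, motif: str, min_units: int = 7,
                       min_flank: int = 40) -> Optional[tuple[str, str, int]]:
    """Return (left flank, right flank, units) for the longest perfect repeat,
    or None if the repeat is too short or either flank is too short for primers.
    min_flank=40 is an illustrative assumption, not a value from the text."""
    seq, k = seq.upper(), len(motif)
    variants = motif_variants(motif)
    best_start, best_units = -1, 0
    for start in range(len(seq) - k + 1):
        unit = seq[start:start + k]
        if unit not in variants:
            continue
        units = 1
        while seq[start + units * k:start + (units + 1) * k] == unit:
            units += 1
        if units > best_units:
            best_start, best_units = start, units
    if best_units < min_units:
        return None                      # repeat below the n=7 design cutoff
    left = seq[:best_start]
    right = seq[best_start + best_units * k:]
    if len(left) < min_flank or len(right) < min_flank:
        return None                      # not enough unique sequence for primers
    return left, right, best_units


clone = "GC" * 25 + "AAT" * 9 + "CG" * 25   # hypothetical sequenced clone
result = repeat_with_flanks(clone, "AAT")
print(None if result is None else (len(result[0]), len(result[1]), result[2]))
```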

Polymorphism analysis

The STSs generated in the initial survey of 50 STSs from each trinucleotide simple sequence repeat class were genotyped on four reference CEPH individuals (1331 01, 1331 02, 1408 01, and 1408 02) using standard PCR conditions. The degree of polymorphism and the allele size range of subsequent STSs were analyzed by PCR amplification of pooled DNA samples derived from 40 CEPH individuals (27). Where heterozygosity frequencies are listed, the data reflect typing of 8 CEPH families (4).

Chromosome assignment

Tentative localization of each STS to a specific human chromosome was accomplished by PCR based screening of the National Institute of General Medical Sciences (NIGMS) somatic cell hybrid mapping panels #1 and/or #2.

Electronic access to data

CHLC maps and marker information are available through several electronic information sources: anonymous ftp (ftp.chlc.org), Gopher (gopher.chlc.org), and World Wide Web (http://www.chlc.org).

Nomenclature

In this manuscript, trinucleotide repeat classes are referred to by their "alphabetically minimal" names (listed in ref. 33) to simplify the literature on short tandem repeats. For example, (AAT)n repeats are equivalent to (ATA)n, (TAA)n, (TAT)n, (TTA)n, and (ATT)n repeats. Unfortunately, this nomenclature was not adopted at the start of the CHLC database. The names of the trinucleotide repeat-based STSs in the CHLC database are identical to those used in this paper except for the following (name in this paper = name in the CHLC database): (AAT)n = (ATA)n, (ACC)n = (CAC)n, (AGG)n = (CCT)n, (AAG)n = (CTT)n, (AGC)n = (GCT)n, and (CCG)n = (CGC)n.
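The "alphabetically minimal" rule can be stated precisely: a class is named by the lexicographically smallest string among all rotations of the motif and of its reverse complement. The sketch below is an illustration of that rule with hypothetical function names; ref. 33 may phrase the convention differently.

```python
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical_motif(motif: str) -> str:
    """Alphabetically minimal name: the smallest string among all rotations
    of the motif and of its reverse complement."""
    motif = motif.upper()
    rc = motif.translate(COMPLEMENT)[::-1]
    return min(m[i:] + m[:i] for m in (motif, rc) for i in range(len(motif)))


def trinucleotide_classes() -> list[str]:
    """Canonical trinucleotide classes, excluding mononucleotide runs."""
    names = {canonical_motif("".join(t)) for t in product("ACGT", repeat=3)}
    return sorted(n for n in names if len(set(n)) > 1)


print(canonical_motif("TTA"), canonical_motif("CTT"), canonical_motif("CGC"))
print(trinucleotide_classes())
```

Enumerating all 64 trinucleotides this way and discarding the two mononucleotide classes (AAA and CCC) leaves exactly the ten classes surveyed in this paper, and the first line maps the CHLC-style names (TTA)n, (CTT)n, and (CGC)n to (AAT)n, (AAG)n, and (CCG)n.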

ACKNOWLEDGMENTS

We are grateful to David Kwiatkowski, Richard Baldarelli, and George M. Church for critical reading of the manuscript. We thank the following for their invaluable technical assistance: Jelveh Ghazizadeh (Harvard Medical School), Gretel Mattes, John Beck, Brian Thompson, Tom Businga, Kerry Wiles, Dee Even (University of Iowa), Matt Stephenson, Donna David (Marshfield Clinic), Robert K. Stodola, Frank J. Manion, Raymond Reichard, Michel van der List, and John Quillen (Fox Chase Cancer Research Center).

REFERENCES

1. NIH/CEPH Collaborative Mapping Group. (1992) A comprehensive genetic linkage map of the human genome. Science, 258, 67-86.
2. Weissenbach, J., Gyapay, G., Dib, C., Vignal, A., Morissette, J., Millasseau, P., Vaysseix, G., Lathrop, M. (1992) A second-generation linkage map of the human genome. Nature, 359, 794-801.
3. Gyapay, G., Morissette, J., Vignal, A., Dib, C., Fizames, C., Millasseau, P., Marc, S., Bernardi, G., Lathrop, M., Weissenbach, J. (1994) The 1993-94 Genethon human genetic linkage map. Nature Genet., 7, 246-339.
4. Murray, J.C., Buetow, K.H., Weber, J.L., Ludwigsen, S., Scherpbier-Heddema, T., Manion, F., Quillen, J., Sheffield, V.C., Sunden, S., Duyk, G., Weissenbach, J., Gyapay, G., Dib, C., Morissette, J., Lathrop, G.M., Vignal, A., White, R., Matsunami, N., Gerken, S., Melis, R., Albertsen, H., Plaetke, R., Odelberg, S., Ward, D., Dausset, J., Cohen, D., Cann, H. (1994) A comprehensive human linkage map with centimorgan density. Science, 265, 2049-2054.
5. Weber, J.L., May, P.E. (1989) Abundant class of human DNA polymorphisms which can be typed using the polymerase chain reaction. Am. J. Hum. Genet., 44, 388-396.
6. Litt, M., Luty, J.A. (1989) A hypervariable microsatellite revealed by in vitro amplification of a dinucleotide repeat within the cardiac muscle actin gene. Am. J. Hum. Genet., 44, 397-401.
7. Stallings, R.L., Ford, A.F., Nelson, D., Torney, D.C., Hildebrand, C.E., Moyzis, R.K. (1991) Evolution and distribution of (GT)n repetitive sequences in mammalian genomes. Genomics, 10, 807-815.
8. Weber, J.L. (1990) Informativeness of human (dC-dA)n·(dG-dT)n polymorphisms. Genomics, 7, 524-530.
9. Edwards, A., Civitello, A., Hammond, H.A., Caskey, C.T. (1991) DNA typing and genetic mapping with trimeric and tetrameric tandem repeats. Am. J. Hum. Genet., 49, 746-756.
10. Verkerk, A.J., Pieretti, M., Sutcliffe, J.S., Fu, Y.H., Kuhl, D.P., Pizzuti, A., Reiner, O., Richards, S., Victoria, M.F., Zhang, F., Eussen, B.E., van Ommen, G.-J.B., Blonen, L.A.J., Riggins, G.J., Chastain, J.L., Kunst, C.B., Galjaard, H., Caskey, C.T., Nelson, D.L., Oostra, B.A., Warren, S.T. (1991) Identification of a gene (FMR-1) containing a CGG repeat coincident with a breakpoint cluster region exhibiting length variation in Fragile X syndrome. Cell, 65, 905-914.
11. La Spada, A.R., Wilson, E.M., Lubahn, D.B., Harding, A.E., Fischbeck, K.H. (1991) Androgen receptor gene mutations in X-linked spinal and bulbar muscular atrophy. Nature, 352, 77-79.
12. Brook, J.D., McCurrach, M.E., Harley, H.G., Buckler, A.J., Church, D., Aburatani, H., Hunter, K., Stanton, V.P., Thirion, J.-P., Hudson, T., Sohn, R., Zemelman, B., Snell, R.G., Rundle, S.A., Crow, S., Davies, J., Shelbourne, P., Buxton, J., Jones, C., Juvonen, V., Johnson, K., Harper, P.S., Shaw, D.J., Housman, D.E. (1992) Molecular basis of myotonic dystrophy: expansion of a trinucleotide (CTG) repeat at the 3' end of a transcript encoding a protein kinase family member. Cell, 68, 799-808.
13. Mahadevan, M., Tsilfidis, C., Sabourin, L., Shutler, G., Amemiya, C., Jansen, G., Neville, C., Narang, M., Barcelo, J., O'Hoy, K., Leblond, S., Earle-MacDonald, J., De Jong, P.J., Wieringa, B., Korneluk, R.G. (1992) Myotonic dystrophy mutation: an unstable CTG repeat in the 3' untranslated region of the gene. Science, 255, 1253-1255.
14. Fu, Y.-H., Pizzuti, A., Fenwick, R.G., Jr., King, J., Rajnarayan, S., Dunne, P.W., Dubel, J., Nasser, G.A., Ashizawa, T., De Jong, P., Wieringa, B., Korneluk, R., Perryman, M.B., Epstein, H.F., Caskey, C.T. (1992) An unstable triplet repeat in a gene related to myotonic muscular dystrophy. Science, 255, 1256-1258.
15. The Huntington's Disease Collaborative Research Group. (1993) A novel gene containing a trinucleotide repeat that is expanded and unstable on Huntington's disease chromosomes. Cell, 72, 971-983.
16. Orr, H.T., Chung, M., Banfi, S., Kwiatkowski, T.J., Jr., Servadio, A., Beaudet, A.L., McCall, A.E., Duvick, L.A., Ranum, L.P.W., Zoghbi, H.Y. (1993) Expansion of an unstable trinucleotide CAG repeat in spinocerebellar ataxia type 1. Nature Genet., 4, 221-226.
17. Knight, S.J.L., Flannery, A.V., Hirst, M.C., Campbell, L., Christodoulou, Z., Phelps, S.R., Pointon, J., Middleton-Price, H.R., Barnicoat, A., Pembrey, M.E., Holland, J., Oostra, B.A., Bobrow, M., Davies, K.E. (1993) Trinucleotide repeat amplification and hypermethylation of a CpG island in FRAXE mental retardation. Cell, 74, 127-134.
18. Koide, R., Ikeuchi, T., Onodera, O., Tanaka, H., Igarashi, S., Endo, K., Takahashi, H., Kondo, R., Ishikawa, A., Hayashi, T., Saito, M., Tomoda, A., Miike, T., Naito, H., Ikuta, F., Tsuji, S. (1994) Unstable expansion of CAG repeat in hereditary dentatorubral-pallidoluysian atrophy (DRPLA). Nature Genet., 6, 9-13.
19. Nagafuchi, S., Yanagisawa, H., Sato, K., Shirayama, T., Ohsaki, E., Bundo, M., Takeda, T., Tadokoro, K., Kondo, I., Murayama, N., Tanaka, Y., Kikushima, H., Umino, K., Kurosawa, H., Nihei, K., Inoue, T., Sano, A., Komure, O., Takahashi, M., Yoshizawa, T., Kanazawa, I., Yamada, M. (1994) Dentatorubral and pallidoluysian atrophy expansion of an unstable CAG trinucleotide on chromosome 12p. Nature Genet., 6, 14-18.
20. Nancarrow, J.K., Kremer, E., Holman, K., Eyre, H., Doggett, N.A., Le Paslier, D., Callen, D.F., Sutherland, G.R., Richards, R.I. (1994) Implications of FRA16A structure for the mechanism of chromosomal fragile site genesis. Science, 264, 1938-1941.
21. Burke, J.R., Wingfield, M.S., Lewis, K.E., Roses, A.D., Lee, J.E., Hulette, C., Pericak-Vance, M.A., Vance, J.M. (1994) The Haw River syndrome: dentatorubropallidoluysian atrophy (DRPLA) in an African-American family. Nature Genet., 7, 521-524.
22. Kawaguchi, Y., Okamoto, T., Taniwaki, M., Aizawa, M., Inoue, M., Katayama, S., Kawakami, H., Nakamura, S., Nishimura, M., Akiguchi, I., Kimura, J., Narumiya, S., Kakizuka, A. (1994) CAG expansions in a novel gene for Machado-Joseph disease at chromosome 14q32.1. Nature Genet., 8, 221-227.
23. Parrish, J.E., Oostra, B.A., Verkerk, A.J.M.H., Richards, C.S., Reynolds, J., Spikes, A.S., Shaffer, L.G., Nelson, D.L. (1994) Isolation of a GCC repeat showing expansion in FRAXF, a fragile site distal to FRAXA and FRAXE. Nature Genet., 8, 229-235.
24. Ostrander, E.A., Jong, P.M., Rine, J., Duyk, G. (1992) Construction of small-insert genomic DNA libraries highly enriched for microsatellite repeat sequences. Proc. Natl. Acad. Sci. USA, 89, 3419-3423.
25. Pulido, J.C., Duyk, G.M. (1994) In Dracopoli, N.C., Haines, J.L., Korf, B.R., Moir, D.T., Morton, C.C., Seidman, C.E., Seidman, J.G., Smith, D.R. (eds.), Current Protocols in Human Genetics. Wiley, New York, pp. 2.2.1-2.2.33.
26. Beckmann, J.S., Weber, J.L. (1992) Survey of human and rat microsatellites. Genomics, 12, 627-631.
27. Sheffield, V.C., Weber, J.L., Buetow, K.H., Murray, J.C., Even, D.A., Wiles, K., Gastier, J.M., Pulido, J.C., Yandava, C., Sunden, S.L., Mattes, G., Businga, T., McClain, A., Beck, J., Duyk, G.M. (1995) A collection of tri- and tetranucleotide repeat markers used to generate high quality, high resolution human genome-wide linkage maps. Hum. Molec. Genet., this issue.
28. Stallings, R.L. (1994) Distribution of trinucleotide microsatellites in different categories of mammalian genomic sequence: implication for human genetic diseases. Genomics, 21, 116-121.
29. Riggins, G.J., Lokey, L.K., Chastain, J.L., Leiner, H.A., Sherman, S.L., Wilkinson, K.D., Warren, S.T. (1992) Human genes containing polymorphic trinucleotide repeats. Nature Genet., 2, 186-191.
30. Li, S.-H., McInnis, M.G., Margolis, R.L., Antonarakis, S.E., Ross, C.A. (1993) Novel triplet repeat containing genes in human brain: cloning, expression, and length polymorphisms. Genomics, 16, 572-579.
31. Fishel, R., Lescoe, M.K., Rao, M.R.S., Copeland, N.G., Jenkins, N.A., Garber, J., Kane, M., Kolodner, R. (1993) The human mutator gene homolog MSH2 and its association with hereditary nonpolyposis colon cancer. Cell, 75, 1027-1038.
32. Liu, B., Nicolaides, N.C., Markowitz, S., Willson, J.K.V., Parsons, R.E., Jen, J., Papadopoulos, N., Peltomaki, P., de la Chapelle, A., Hamilton, S.R., Kinzler, K.W., Vogelstein, B. (1995) Mismatch repair gene defects in sporadic colorectal cancers with microsatellite instability. Nature Genet., 9, 48-55.
33. Jin, L., Zhong, Y., Chakraborty, R. (1994) The exact numbers of possible microsatellite motifs. Am. J. Hum. Genet., 55, 582-583.


Last modified: Thur Nov 30 16:10:16 EDT 1995

The CHLC Informatics Group (help@chlc.org)